A.I. President

Can machine learning tweet like Trump?

A tale of tweets and two Markov chains

Trump's tweets put through "algorithmic paces"

Statistical algorithms and natural language processing applied to tweet generation.

I was curious whether machine learning could mimic natural language, and Twitter and one of its most iconic members seemed like a great source of data. In pursuit of an answer, approximately 50 pages of tweets were extracted and processed in an attempt to analyze and mimic President Trump's writing style. You are cordially invited to judge the sometimes hilarious results for yourself and decide whether machine learning achieved this feat.

Tweets, the old-fashioned way.

The first step was to get the data

Why copy and paste when the computer can do it for us? The video below shows how a script automated a browser to scroll through Trump's Twitter history while extracting and storing the text. The entire process took about 20 minutes, and along the way the script decided which posts to keep and which to ignore. For example, re-tweets were skipped. For those who are unfamiliar with Twitter, "re-tweets" are usually recycled posts from other users, and processing them would dilute what could be attributed to the President.
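The scroll-and-filter loop described above can be sketched roughly as follows. This is not the project's actual script: the `RawPost` shape and the `read_visible` and `scroll_down` callables are hypothetical stand-ins for whatever the browser automation tool exposes, and the stopping rule (quit when a scroll reveals nothing new) is an assumption.

```python
from typing import Callable, Iterable, NamedTuple


class RawPost(NamedTuple):
    """One post as read from the page (hypothetical shape for this sketch)."""
    post_id: str
    text: str
    is_retweet: bool


def collect_tweets(
    read_visible: Callable[[], Iterable[RawPost]],
    scroll_down: Callable[[], None],
    max_scrolls: int = 50,
) -> list[str]:
    """Scroll a timeline, keeping original posts and skipping re-tweets."""
    seen: set[str] = set()
    kept: list[str] = []
    for _ in range(max_scrolls):
        new_posts = 0
        for post in read_visible():
            if post.post_id in seen:
                continue  # already handled on an earlier scroll
            seen.add(post.post_id)
            new_posts += 1
            if post.is_retweet:
                continue  # recycled content, not the account owner's words
            kept.append(post.text.strip())
        if new_posts == 0:
            break  # nothing new loaded, so assume the end of the history
        scroll_down()
    return kept
```

In a real run, the two callables would wrap the browser driver: one reads the posts currently rendered on the page, the other scrolls and waits for more to load. Keeping them as parameters means the filtering logic can be tried out on canned data without opening a browser.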

Scraping the data through browser automation

The real tweets

Roughly 200 of President Trump's tweets from October 2020 were extracted from approximately 50 pages of tweet history, and a few of them are displayed in the screenshot below. The full set of genuine Trump tweets is linked here and also in the navigation bar above.

A screenshot of October 2020 tweets.

Putting the data through an analysis library

A Markov chain breaks sentences into smaller chunks and analyzes word frequencies and patterns in order to assess and regurgitate likely word sequences. The first step was to run the Twitter data through an off-the-shelf natural language processing package called Markovify, which provides a convenient way to gauge how likely the approach was to succeed. Check out the screenshot below and/or view the table of full results.
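To make the "word frequencies and patterns" idea concrete, here is a minimal sketch that counts which word follows which and turns the counts into probabilities. It uses only the Python standard library and is not how Markovify is implemented internally; the function name and the toy sentences are invented for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(sentences):
    """Map each word to the relative frequency of the words that follow it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        # Pair every word with its successor: (w0, w1), (w1, w2), ...
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return {
        word: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for word, followers in counts.items()
    }


# Toy sentences invented for illustration; these are not real tweets.
toy = ["we will win big", "we will build it", "we did it"]
probs = transition_probabilities(toy)
print(probs["we"])    # "will" follows "we" twice, "did" once
print(probs["will"])  # "win" and "build" each follow "will" once
```

Generating text then amounts to starting from a word and repeatedly sampling a successor according to these probabilities.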

Markovify package results.

Custom algorithm

Taking the project to the next level meant writing a custom Markov chain script and comparing its output with the Markovify results. Both processes generated tweets with a Trump-like flavor and, perhaps oddly, a slapstick style of humor. The final results of the custom script are here: the table of full results.
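For readers wondering what a custom Markov chain text generator can look like, the following is a minimal sketch, not the script from this project. It assumes a chain of order 2 (each state is the previous two words), sentinel tokens for the start and end of a tweet, and a 280-character limit; all names are chosen for this example.

```python
import random
from collections import defaultdict

BEGIN, END = "<BEGIN>", "<END>"  # sentinel tokens, chosen for this sketch


def build_chain(tweets, order=2):
    """Map each tuple of `order` words to the list of words seen after it."""
    chain = defaultdict(list)
    for tweet in tweets:
        words = [BEGIN] * order + tweet.split() + [END]
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return chain


def generate(chain, order=2, max_chars=280, rng=None):
    """Random-walk the chain from the start state until END or the length cap."""
    rng = rng or random.Random()
    state = (BEGIN,) * order
    out = []
    while True:
        followers = chain.get(state)
        if not followers:
            break  # empty corpus or unknown state
        nxt = rng.choice(followers)  # duplicates in the list weight the choice
        if nxt == END:
            break
        if len(" ".join(out + [nxt])) > max_chars:
            break  # adding this word would exceed the tweet length limit
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out)


# Toy sentences invented for illustration; these are not real tweets.
toy = ["we will win big", "we will build it", "we did it"]
chain = build_chain(toy)
print(generate(chain, rng=random.Random(0)))
```

Storing every observed successor in a list, duplicates included, keeps the sampling step to a single `rng.choice` call at the cost of some memory. Passing a seeded `random.Random` makes a run repeatable, which helps when comparing output against another generator.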

Custom script results.

Project on GitHub

Interested in the code? All scripts are available for download in the code repository.

GitHub repository: A.I. President.

The next step

If the project were developed further, the first step I would take is to gather more data, as the results were almost "too good". This is especially true of the Markovify results, which were sometimes hard to tell apart from the genuine tweets. That suggests there may not have been enough information to make a mistake, as it were: with a small dataset, few word sequences have more than one possible continuation, so a Markov chain tends to replay stretches of the original text. Therefore, one way to test that the scripts are truly working would be to process a much larger dataset. Additionally, the custom script would benefit from some conditional clauses to normalize its output. For example, the custom script results contained quite a few stray characters that did not appear in the results from Markovify, a Python package.
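Both follow-ups could start from something like the sketch below. It is only an illustration under stated assumptions: "stray characters" is taken to mean tokens made up entirely of punctuation (with "&" kept, since it often stands in for "and"), and "too good" is measured as the share of generated tweets that match a source tweet exactly. The function names are hypothetical.

```python
import re
import string

# Tokens made up entirely of punctuation are treated as stray (an assumption).
_PUNCT_ONLY = re.compile(rf"^[{re.escape(string.punctuation)}]+$")
_KEEP = {"&"}  # punctuation-only tokens that still carry meaning


def clean_tweet(text):
    """Drop stray punctuation-only tokens and collapse repeated whitespace."""
    tokens = [
        t for t in text.split()
        if t in _KEEP or not _PUNCT_ONLY.match(t)
    ]
    return " ".join(tokens)


def verbatim_fraction(generated, originals):
    """Share of generated tweets that exactly match a source tweet."""
    if not generated:
        return 0.0
    source = {clean_tweet(t).lower() for t in originals}
    copies = sum(clean_tweet(g).lower() in source for g in generated)
    return copies / len(generated)
```

A high verbatim fraction on the current dataset would support the suspicion that the chain is mostly replaying its input, and watching that number fall as the dataset grows would be one way to confirm it.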

Thanks for reading and checking out the project!