In early September I started collecting #MeToo tweets and stumbled into a big-data look at #MeToo and the Kavanaugh confirmation. As I posted previously, the Twitter network is radically split into red and blue factions. (FiveThirtyEight subsequently wrote about this divide, based not on Twitter but on polls, and arrived at the same conclusion.) Now I want to post an update on this project, which is more than ever a work in progress. I especially want input from #MeToo movement organizers, who I hope have real questions that can guide where this research goes next.
I am still collecting tweets. Here is an updated map, showing the same left-right split. The network appears to have a couple of significant bridges and a “super-left” tail:
#MeToo Twitter network, Sept 25-27
Which tweets exactly are being counted? Over Sept 15-27 I curated a list of hashtags to track, aiming to capture the #MeToo spirit without taking unnecessary Kavanaugh crossfire. During that time, #BelieveSurvivors grew from zero to the number one trending hashtag on Twitter on Sept 24. My final list of 20 “#MeToo hashtags” also includes #WhyIDidntReport, #BelieveWomen, #MenToo, #MeTooMvmt, #SurvivorCulture, and #HimToo. Below is a word cloud showing all the top hashtags from the 5 million tweets charted above.
Word cloud for all #MeToo tweets Sept 25 – Oct 2
All these maps and charts are a nice start, but how can we better understand what’s happening on the two sides of this “conversation”? One way is to make separate word clouds, one for each side:
Separate word clouds for left and right network clusters
We can see some important differences based on these word clouds, like #HimToo on the right and #WhyIDidntReport on the left. But the differences are obscured by the overwhelming similarities. For example, barely a day after #BelieveSurvivors exploded on the left, it became just as huge on the right, and so both word clouds feature this hashtag prominently, which does not help us understand the differences between left and right.
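To make the problem concrete, here is a minimal sketch of the counting that sits behind a pair of word clouds. The toy tweets and the helper name `hashtag_counts` are hypothetical, not taken from the real dataset or from the code used for the charts above. It only shows how a hashtag popular on both sides ends up near the top of both lists.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")

def hashtag_counts(tweets):
    """Count lowercased hashtags across an iterable of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts

# Hypothetical toy data for illustration only.
left_tweets = [
    "I stand with her #BelieveSurvivors #WhyIDidntReport",
    "#BelieveSurvivors #StopKavanaugh",
]
right_tweets = [
    "Due process matters #HimToo #BelieveSurvivors",
    "#ConfirmKavanaughNow #HimToo",
]

left_counts = hashtag_counts(left_tweets)
right_counts = hashtag_counts(right_tweets)

# Hashtags that appear on both sides crowd both clouds and hide the differences.
shared = set(left_counts) & set(right_counts)

print(left_counts.most_common(3))
print(right_counts.most_common(3))
print(sorted(shared))
```

Raw frequency alone cannot separate the two sides when they share their biggest hashtag, which is the motivation for the approach that follows.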
Let’s look at this problem another way. We’ve got 5 million tweets from 1.5 million users. Based on network clusters, we can categorize many (maybe most) of those users as “left” or “right.” What happens if we make one bucket of tweets from known “left” users, another bucket of tweets from known “right” users, and then teach a computer program to recognize the difference between a “left” tweet and a “right” tweet? If we succeed, then we can use that computer program to score any #MeToo tweet on left-vs-right partisanship, including tweets from unknown users and without even drawing a network map.
We have formulated a classic problem of machine learning. Skipping some technical detail, we train a classifier to recognize our two categories of #MeToo tweets with roughly 87% accuracy. Not bad. If we crack open the resulting classifier, we find model coefficients that tell us exactly which words are most strongly associated with each side of the #MeToo divide. The bigger the bar, the more influence that word has on our “prediction”:
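For readers who want to see the idea in code, here is a minimal sketch of training on two buckets of tweets and then reading off the coefficients. It rests on assumptions: the kind of classifier is not specified here, so the sketch uses a small bag-of-words logistic regression written with only the Python standard library. The function names (`tokenize`, `train`, `predict`), the hyperparameters, and the four toy tweets are all hypothetical.

```python
import math
import re
from collections import defaultdict

TOKEN = re.compile(r"#?\w+")

def tokenize(text):
    """Lowercase a tweet and return its set of words and hashtags."""
    return set(TOKEN.findall(text.lower()))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, text):
    """Estimated probability that a tweet is 'right' (0 = left, 1 = right)."""
    z = bias + sum(weights.get(tok, 0.0) for tok in tokenize(text))
    return sigmoid(z)

def train(examples, epochs=200, lr=0.5, l2=0.01):
    """Fit logistic regression by stochastic gradient descent.

    examples: list of (text, label) pairs, label 0 = left, 1 = right.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            error = predict(weights, bias, text) - label
            bias -= lr * error
            for tok in tokenize(text):
                # Gradient step with a small L2 penalty on each word weight.
                weights[tok] -= lr * (error + l2 * weights[tok])
    return dict(weights), bias

# Hypothetical toy training set; the real buckets hold tweets from known users.
examples = [
    ("we must #stopkavanaugh #believesurvivors", 0),
    ("#survivorculture matters #believesurvivors", 0),
    ("#himtoo due process #believesurvivors", 1),
    ("#confirmkavanaughnow #believesurvivors", 1),
]

weights, bias = train(examples)

# "Cracking open" the model: sort words by coefficient.
ranked = sorted(weights.items(), key=lambda kv: kv[1])
print("most left:", ranked[:2])
print("most right:", ranked[-2:])
```

In this sketch, negative coefficients pull a tweet toward “left” and positive ones toward “right.” A hashtag that appears in every training tweet, like #believesurvivors in the toy data, carries little information about the side, so its coefficient stays small compared with the one-sided hashtags.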
The words listed above exclude the hashtags that are popular on both sides. We are looking at the most significant single-word indicators that a tweet is either “left” or “right.” The top two and bottom two make perfect sense. The left champions #SurvivorCulture and #StopKavanaugh. The right champions #HimToo (a cry to protect men from false accusations) and #ConfirmKavanaughNow. Some words in the list are not obviously partisan (#world), and we’d want to do more model training if we were serious about classifying lots of future tweets accurately.
Let’s run with our first-draft model for now. With it, we can actually compute, for any #MeToo tweet, the probability that it’s left or right. If a tweet scores 0.0001, then it’s almost certainly left, and if it scores 0.9999 then it’s almost certainly right. If we can score tweets this way, then we can aggregate tweet scores user by user and estimate how far each individual leans left or right (on a zero-to-one scale), based on what they’re literally saying and without having to bother with a map. Below we see a curve of tweet scores based on Sept 25-27.
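The user-by-user aggregation could be sketched as follows. The choice of a simple mean is an assumption, since other aggregates (median, retweet-weighted average) would also fit the description. The function name `user_leanings`, the user names, and the scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def user_leanings(scored_tweets):
    """Average tweet scores per user.

    scored_tweets: iterable of (user, score) pairs, where score is the
    classifier's probability that a tweet is 'right' (0 = left, 1 = right).
    """
    by_user = defaultdict(list)
    for user, score in scored_tweets:
        by_user[user].append(score)
    return {user: mean(scores) for user, scores in by_user.items()}

# Hypothetical users and scores for illustration only.
scored = [
    ("alice", 0.05), ("alice", 0.15),
    ("bob", 0.90), ("bob", 0.70),
    ("carol", 0.40),
]

leanings = user_leanings(scored)
for user, leaning in sorted(leanings.items()):
    print(user, round(leaning, 2))
```

A user with only one or two tweets gets a noisy estimate this way, so a real analysis would probably want a minimum tweet count before trusting a user’s score.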
The rainbow in the chart above shows how we assign a color to each score value from zero to one. This will be handy when we start assigning scores to nodes and edges in network maps.
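One way to build such a score-to-color mapping is to sweep the hue from blue to red. This is a sketch under assumptions: the exact color scale used in the chart is not reproduced here, and the sketch assumes blue for a score of zero (left) and red for a score of one (right), passing through green in the middle. The function name `score_to_hex` is hypothetical.

```python
import colorsys

def score_to_hex(score):
    """Map a score in [0, 1] to a rainbow color: 0 -> blue, 1 -> red."""
    score = min(1.0, max(0.0, score))  # clamp out-of-range inputs
    # Hue runs from 240 degrees (blue) at score 0 down to 0 degrees (red) at score 1.
    hue = (1.0 - score) * (240.0 / 360.0)
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))

for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(s, score_to_hex(s))
```

The returned hex strings can be passed as node or edge colors to most plotting and graph-drawing tools.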
Based on the distribution above, let’s consider a more nuanced classification than the binary “left” vs “right.” I’ve proposed four categories, and selected 2-3 of the most-retweeted examples within each category. It looks good at the far ends, with a miss or two in the mid-left and mid-right.
Far Left: ~500K tweets scoring 0.0-0.15
“i was raped at Yale. i was groped at parties in dke’s house—#kavanaugh’s fraternity at yale—and was told as a freshman to avoid their “rape basement.” multiple dear friends were raped by yale dke brothers & by boys from elite prep schools. i believe ramirez. #believesurvivors”
“by scheduling a vote on judge kavanaugh before dr. ford has even testified, senate republican leaders are saying loud and clear: they don’t care what she says. #believesurvivors”
“mr. president, enough. a supreme court nomination is not worth more than the lives of survivors. there must be a full investigation of these allegations of criminal behavior, and judge kavanaugh’s nomination must be withdrawn.”
Mid-Left: ~300K tweets scoring 0.15-0.35
“tune in as democrats show our support for dr. christine blasey ford. #believesurvivors”
“so, the same party that wants to force teenage boys and girls to shower together in the name of transgender rights is also leading #metoo against sexual predators?”
Mid-Right: ~200K tweets scoring 0.35-0.75
“modern feminism has never been about equality with men.
it has always been about special treatment and exemption from all responsibility. many condemned me for being one of the first to speaking out against #metoo. now it’s toxicity is on full display. #defendourboys”
“you can like or not like @michaelavenatti but what he just put out is a sworn affidavit alleging that kavanaugh and mark judge regularly gang raped women including once his client julie swetnick. i believe survivors.”
“it’s all about #metoo & #webelievesurvivors unless the survivors support @realdonaldtrump or the sexual predator is a democrat. ain’t that right @keithellison @maziehirono @senfeinstein & @billclinton @dnc the party of hypocrisy”
Far Right: 100K tweets scoring 0.75-1.0
“i’m loving the hashtag #himtoo. it appears to be a movement built of men who have had their lives and families destroyed by false allegations and a lack of due process. radical feminism has become problematic and needs to be addressed. dr. luke, brett kavanaugh… #himtoo”
“serious question. are keith ellison, sen. sherrod brown, sen. booker and sen. tom carper signing on? i know they’re democrats but thought it’s only fair to ask given their history’s on this subject.”
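The four score bands above can be written as a small lookup. The cutoffs (0.15, 0.35, 0.75) come from the category ranges listed above. How to treat a score that lands exactly on a cutoff is not specified, so this sketch assumes it goes to the higher band. The names `BOUNDARIES`, `LABELS`, and `band` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

BOUNDARIES = [0.15, 0.35, 0.75]
LABELS = ["Far Left", "Mid-Left", "Mid-Right", "Far Right"]

def band(score):
    """Return the category for a tweet score between 0 and 1.

    Assumption: a score exactly on a cutoff belongs to the higher band.
    """
    return LABELS[bisect_right(BOUNDARIES, score)]

# Hypothetical scores for illustration only.
scores = [0.0001, 0.10, 0.20, 0.50, 0.80, 0.9999]
print(Counter(band(s) for s in scores))
```

Counting tweets per band this way is how tallies like the ones above (~500K, ~300K, and so on) would be produced from a full list of scores.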
With a good scoring and coloring system, along the lines described above, we can apply those colors to every node and edge of a Twitter map, and see “exactly” where left- and right-leaning discussions are happening, along with some shades in between the extremes. Something like this:
Prototype #MeToo Twitter network map, with color spectrum to indicate extent of left vs right expression.
Let me know what you think. I am especially interested in hearing from movement organizers who have suggestions for making this method more relevant and useful as a source of actionable information.
Originally published 10/9/18 at Connective Associates.