Linguists Out Men Impersonating Women On Twitter
Hugh Pickens writes "Remember when the Gay Girl in Damascus revealed himself as a middle-aged man from Georgia? On a platform like Twitter, which doesn't ask for much biographical information, it's easy (and fun!) to take on a fake persona, but now linguistic researchers have developed an algorithm that can predict the gender of a tweeter based solely on the 140 characters they choose to tweet. The research is based on the idea that women use language differently than men. 'The mere fact of a tweet containing an exclamation mark or a smiley face meant that odds were a woman was tweeting, for instance,' reports David Zax. Other research corroborates these findings, showing that women tend to use emoticons, abbreviations, repeated letters and expressions of affection more than men, and linguists have also developed a list of gender-skewed words used more often by women, including love, ha-ha, cute, omg, yay, hahaha, happy, girl, hair, lol, hubby, and chocolate. Remarkably, even when only provided with one tweet, the program could correctly identify gender 65.9% of the time (PDF). Depending on how successful the program is proven to be, it could be used for ad-targeting, or for socio-linguistic research."
I hope that extra 15% certainty didn't cost millions in research grants, as a blind guess has a 50% chance of being right.
ELOI, ELOI, LAMA SABACHTHANI!?
Huh...the word "hubby" is used more by women. Who knew!
The mere fact of a tweet containing an exclamation mark or a smiley face meant that odds were a woman was tweeting
or a Mac user.
A statistically significant level of accuracy based on a single statement of at most 140 characters is not a small thing, so long as it scales with more. If that means that with a few statements or a longer statement you get into the high 90s, then that would be quite interesting. If it is 65% right all the time, then yes, it was rather a waste.
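To make "scales with more" concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every tweet is classified independently with the same 65.9% accuracy and that the per-tweet guesses are combined by simple majority vote. Neither assumption comes from the paper, and tweets from one author are unlikely to be truly independent, so treat the numbers as an optimistic upper bound. The function name is made up.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that more than half of n independent guesses are right.

    Assumption: each guess is correct with probability p, independently.
    n must be odd so that a strict majority always exists.
    """
    if n % 2 == 0:
        raise ValueError("use an odd n so a majority always exists")
    needed = n // 2 + 1  # smallest number of correct guesses that wins the vote
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1)
    )


if __name__ == "__main__":
    # 0.659 is the single-tweet accuracy quoted in the summary.
    for n in (1, 3, 11, 51):
        print(n, round(majority_vote_accuracy(0.659, n), 3))
```

Under these assumptions the accuracy climbs steadily as more tweets are pooled, which is the "quite interesting" case; if errors are strongly correlated per author, it would stay near 65%.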
> Or it can be used as a training tool for would-be impersonators.
Or to test gender-altering scripts. OMG! :)
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
What was the gender distribution of the tweets this was tested against? If 65.9% of the tweets were from males, the algorithm "return Gender.male;" will get the gender right 65.9% of the time...
It would also be fun to see what it would do with my lesbian friends, many of whom are immense tomboys.
I guess I don't quite see what their weight has to do with anything...
#DeleteChrome
From the paper, in their data set 47.7% of tweets were from females, 32.8% were from males, and the rest was unspecified. Tossing out the unspecified ones, guessing "female" all the time would then give ~59% accuracy. On the surface that makes the 65.9% figure in the summary very lackluster, though better figures are reported with more information elsewhere in the article.
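The ~59% figure is just the female share renormalized over the labeled tweets. A minimal sketch of that arithmetic, using only the percentages quoted above (the function name is made up for illustration):

```python
def majority_baseline(female_share: float, male_share: float) -> float:
    """Accuracy of always guessing the larger class.

    Shares are given as percentages of all tweets; tweets with unspecified
    gender are dropped, so the two shares are renormalized to sum to 1.
    """
    labeled = female_share + male_share
    return max(female_share, male_share) / labeled


baseline = majority_baseline(47.7, 32.8)  # percentages quoted from the paper

if __name__ == "__main__":
    print(f"always-guess-female baseline: {baseline:.1%}")
    print(f"gain of the 65.9% classifier: {65.9 - 100 * baseline:.1f} points")
```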
Not entirely true, I am afraid.
Several experiments were conducted in the 60s and 70s on children raised in gender neutral parenting conditions, that focused on toy choices.
The experiment was intended to show the impact of societal imperatives on children's gender identities and gender-specific behaviors, using toy preferences as the metric.
The result of the test STILL had little girls favoring dollies with bright colors, and boys favoring machines and soldier-type toys, even when very carefully imposed gender-neutral parenting was in effect, even from very young ages.
This is somewhat reinforced by more modern research into the physiological differences between male and female nervous systems.
The idea that men and women might intrinsically focus more on different concepts (and thus, relate to their environments differently from each other, and as such, describe them differently in literature) is not really all that far-fetched.
It is simply politically incorrect to state that women might actually have a biological proclivity toward being the "Domestic" partner in relationships, given the current political climate of our western post-suffrage societies.
Somehow, "Staying home, taking care of babies, and doing the chores all day." is seen as a degrading thing, while "Standing in an assembly line inserting part A into assembly B ad nauseam all day" is somehow seen in an idealized fashion as a kind of "Freedom", however sick that might be in reality notwithstanding.
Now, if you want to complain about women being statistically paid less than men, I will strongly support your argument that it (the practice) is based on pure bull---. But the statement that men and women are innately gender neutral and get conditioned exclusively by stereotypes? That is not supported by behaviorists.
Gender stereotypes simply reinforce already existent behaviors, for better or for worse.
Sneakers and Steel-toed boots. We apparently have different jobs.
Supporter of the +1 Over Dramatic mod option. In memory of apk.
It's pretty easy to tell if she often tweets about her penis.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
True. But there are people who are good at identifying those situations where the gender doesn't match the behavior. In real life, it's called 'gaydar'. Online, it could just be a phony picture and a poser.
The gender-behavior mismatch is evident (I've been told) from the writing of the subjects in question. Not just the choice of words or little hearts where the periods should be, but based on the style of writing and subject matter. Apparently, a transcript of a conversation (or series of e-mails) between individuals produces a more accurate determination than an essay.
Yes, humans widely use language differently based on their own subcultures. Women, particularly in some cultures, speak an entirely different language from the gender-neutral language spoken by everyone. In some languages, such as Japanese, gendered language is readily apparent, and when I was chatting in Japanese chatrooms, it was nice to be able to identify the gender of the speaker from one or two lines of text.
In much the same way, while we often are of the belief that men and women use language the same way in English, because it's not readily apparent, we do actually use language differently. Here is another interesting one: women use fewer contractions than men. Weird but oddly true.
All of this has less to do with "gaydar" than that every subculture speaks a slightly different dialect. Gay men have a selection of words that set them off, (I actually commented to a gay-rights group, where I was an "ally" of gay-rights, that they were using "fabulous" like... A LOT. And I was all, "um... do you REALLY want to be projecting the notion that this stereotype is valid and accurate? Because that is what you are doing.") and this does not mean that gay men talk like women. They actually talk differently and distinctly from women, but in this world of false dichotomies that we live in, we presume that if gay men don't talk the same way as straight men, then they must talk like women. But, in reality, this isn't actually correct.
WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
Now THAT is insightful. That would explain things. It would explain why some people told me over the years that I "talked like a girl" because I spoke properly and precisely, in that nerdy way. By my standards, most men are sloppy speakers. Even my sister pointed this out to me at a drive thru some years back; she said most men would say "I wanna burger, fries, and coke." and then stop and drive on, while I said, "I would like a hamburger, medium order of fries and medium coke, please, and that will be all (to prevent the annoying upsell for dessert or anything else)"
Then again, I am transgendered, and that might affect things alongside the nerdy precision.
It's even worse. The initial assumption was that 55% of the users were female, so basically a hardcoded 'return "female";' would already guess with 55% accuracy. Bumping it to 65% is actually only a 10-point bump.
But that assumption is purely based on what people declared on their account on Twitter, i.e., basically trusting that everyone who labeled themselves "female" is actually female, and everyone who labeled themselves "male" is actually male. The caveat there need not be detailed.
Basically, they have 100,654 female users, 83,075 male users, and 53,817 unspecified. Taking the known ones, there are 183,729 users of known gender. (With the caveat in the previous paragraph.) Out of that, the probability of being female is about 55%.
BUT if they guess at individual tweets, then it's pretty much the number of tweets from each that counts. There were 2,429,621 tweets from (self-labeled) females, vs. 1,672,813 tweets from (self-labeled) males, plus those from users of unspecified gender. That totals 4,102,434 tweets with "known" gender. Out of those, the tweets from "known" females were a bit over 59%.
So basically an algorithm which takes one tweet and just does a hard-coded 'return "female";' would be right over 59% of the time. Bumping that to 65% is such a ridiculously marginal effect that, really, it's funny.
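For anyone who wants to check the numbers, here is a small sketch that scores the hard-coded 'return "female";' guesser against the counts quoted above. The counts are the ones from this comment; the constant and function names are made up for illustration.

```python
# Counts as quoted in the comment (self-declared gender only).
FEMALE_USERS = 100_654
MALE_USERS = 83_075
FEMALE_TWEETS = 2_429_621
MALE_TWEETS = 1_672_813


def always_female(_tweet: str) -> str:
    """The trivial 'classifier': ignores its input entirely."""
    return "female"


def baseline_accuracy(female_count: int, male_count: int) -> float:
    """Fraction of labeled items the always-female guesser gets right."""
    return female_count / (female_count + male_count)


per_user = baseline_accuracy(FEMALE_USERS, MALE_USERS)
per_tweet = baseline_accuracy(FEMALE_TWEETS, MALE_TWEETS)

if __name__ == "__main__":
    print(f"per-user baseline:  {per_user:.1%}")
    print(f"per-tweet baseline: {per_tweet:.1%}")
    print(f"gain of 65.9% over per-tweet baseline: "
          f"{65.9 - 100 * per_tweet:.1f} points")
```

The per-tweet figure is the relevant one for the single-tweet result, since women in this data set tweet more per account than men.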
And actually what worries me is not so much the research grants as the hordes of morons who don't understand the ecological fallacy (extrapolations from whole-population "ecological" studies to individuals are stupid) and who'll take this as some infallible identi-kit or, worse, as a scientific justification for sexism. Even the summary makes strong claims of outing males pretending to be females, or that flat-out "women use language differently than men". No, they don't really. The difference is marginal, and there is massive overlap between any word's usage by males and females.
E.g., one of the "strongly male indicators" they churned out is using the word http (presumably tweeting a link?), where for any given instance of it, the probability that the user is female is actually 50.6%, according to their table. So it's really a 50-50 split on the use of this word. One of the few actual strongly male words was Google, but even there it's only a 2/3 and 1/3 split between male and female. Conversely, strongly female stuff like mentioning "love" was basically still a 2/3 and 1/3 split in the other direction.
Not that it will stop morons from taking it as some scientifically proven rule that women talk about love and cute stuff, and guys talk about http and Google. And that, for example, therefore we need to hire fewer women in IT.
A polar bear is a cartesian bear after a coordinate transform.
Don't most people pretending to be female on Twitter fill their tweets with stereotypical female language? This would only catch pretenders who are really lazy and incompetent.