Linguists Out Men Impersonating Women On Twitter
Hugh Pickens writes "Remember when the Gay Girl in Damascus revealed himself as a middle-aged man from Georgia? On a platform like Twitter, which doesn't ask for much biographical information, it's easy (and fun!) to take on a fake persona but now linguistic researchers have developed an algorithm that can predict the gender of a tweeter based solely on the 140 characters they choose to tweet. The research is based on the idea that women use language differently than men. 'The mere fact of a tweet containing an exclamation mark or a smiley face meant that odds were a woman was tweeting, for instance,' reports David Zax. Other research corroborates these findings, finding that women tend to use emoticons, abbreviations, repeated letters and expressions of affection more than men and linguists have also developed a list of gender-skewed words used more often by women including love, ha-ha, cute, omg, yay, hahaha, happy, girl, hair, lol, hubby, and chocolate. Remarkably, even when only provided with one tweet, the program could correctly identify gender 65.9% of the time. (PDF). Depending on how successful the program is proven to be, it could be used for ad-targeting, or for socio-linguistic research."
I hope that extra 15% certainty didn't cost millions in research grants, as a blind guess has a 50% chance of being right.
ELOI, ELOI, LAMA SABACHTHANI!?
Huh...the word "hubby" is used more by women. Who knew!
Or it can be used as a training tool for would-be impersonators.
I refuse to use
The mere fact of a tweet containing an exclamation mark or a smiley face meant that odds were a woman was tweeting
or a Mac user.
The internet: where men are men, women are men, and children are federal agents (also men.)
Really. Who cares?
How do you fold a fitted sheet and why do you need more than 2 pairs of shoes.
Apparently I have very feminine text messages/tweets, as I use excessive emoticons, exclamation points, and affectionate pet names (though those are directed towards females). And here I thought I had solidified my masculinity when I burnt all my pink shirts.
Then again, the nickname probably isn't helping either...
Motorcycles, Robots, Space Gossip and More!
A statistically significant level of accuracy based on a single statement of at most 140 characters is not a small thing, so long as it scales with more. If that means that with a few statements or a longer statement you get into the high 90s, then that would be quite interesting. If it is 65% right all the time, then yes, it was rather a waste.
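Whether it "scales with more" can be roughed out on the back of an envelope. The sketch below assumes, unrealistically, that every tweet from one user is classified independently with the same 65.9% accuracy, and that the final guess is a plain majority vote. Neither assumption comes from the paper, and the function name is made up for illustration.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Chance that a majority of n independent guesses is right,
    when each single guess is right with probability p.

    n must be odd so that a tie is impossible.
    """
    if n % 2 == 0:
        raise ValueError("use an odd n so there are no ties")
    need = n // 2 + 1  # smallest number of correct guesses that wins the vote
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(need, n + 1)
    )


if __name__ == "__main__":
    # Hypothetical tweet counts per user, chosen only for illustration.
    for n in (1, 5, 21, 101):
        print(n, round(majority_vote_accuracy(0.659, n), 4))
```

Real tweets from one author are correlated, so actual gains would likely be smaller than this idealized model suggests.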
where men are women, women are women and kids are cops.
The mind conceives, the body achieves, the spirit manifests.
I'm a guy. And I used to keep up a personal blog. Once upon a time there was a website that would analyze your blog for you and guess if you were male or female much like this twitter nonsense.
It guessed I was female. Dude! I'm not even gay! STOP SAYING THAT!!!
I mean...seriously! Jesus Christ!
Ponies!!!
Nobel laureate V. S. Naipaul recently caused an uproar when he claimed, among other things, that he could identify the gender of an author from their work:
In what must have been an attempt to be as offensive as possible, he continued, saying that men’s and women’s writing is “quite different. I read a piece of writing and within a paragraph or two I know whether it is by a woman or not. I think [it is] unequal to me.”
I guess this means he was right? Although, for the record, he still seems like an arrogant, sexist SOB--just not for this particular reason.
The test set represented 18000 users. The probability of flipping 18000 coins and getting 65.9% heads or more is 8.1e-405.
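For anyone who wants to check that order of magnitude, here is a minimal sketch. It assumes the comparison is against a fair coin and that "65.9% or more" of 18000 means at least 11862 heads. The function name is hypothetical. The tail is summed with exact integers and only converted to a logarithm at the end, because the probability itself underflows a float.

```python
from math import comb, log10


def log10_fair_coin_tail(n: int, k_min: int) -> float:
    """log10 of P(X >= k_min) for X ~ Binomial(n, 1/2).

    Sums the binomial coefficients exactly, then divides by 2**n in log space.
    """
    term = comb(n, k_min)  # C(n, k_min)
    total = 0
    for k in range(k_min, n + 1):
        total += term
        # Exact recurrence: C(n, k + 1) = C(n, k) * (n - k) / (k + 1)
        term = term * (n - k) // (k + 1)
    return log10(total) - n * log10(2)


if __name__ == "__main__":
    exponent = log10_fair_coin_tail(18000, 11862)
    print(f"P(at least 65.9% heads) is about 10^{exponent:.2f}")
```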
WhEn YoU SeE pOsTs LiKe ThIs It iS aLwAyS a GiRl!!!!
Quite seriously though, I'm a straight guy, and I make heavy use of exclamation marks, emoticons, "omg", "haha" and "love" in IM conversations, although not so much when blogging. I don't use Twitter, so one can't say whether I'd show these girly traits there.
Do we have a benchmark for how well a human can detect genders? I understand being automagic has some special applications, but it seems like a useful point of comparison for its accuracy.
This is my signature. There are many like it, but this one is mine.
I'm a lesbian trapped in a man's body.
Heh. In our house, my wife occasionally comments on how several well-known online companies (including netflix and google) seem to have decided that she's a gay male. If so, she's very good at impersonating a straight female when I'm around. ;-)
So far, we haven't actually found any downside to this, but it's not hard to imagine situations where it could cause serious problems. For example, the guys who killed Matt Shepard might not believe that her "disguise" isn't a disguise. Such things happen in our world.
So I'm not really all that comfortable with the idea that a piece of software somewhere will be inferring things about me related to sex, and giving its conclusions to people who I don't know, to do with as they like. Our society has a long, sorry history telling us what this can lead to.
One thought, I suppose, might be "How can a lot of us work to sabotage things like this and poison their inferences?" Another might be "Is there a way we can learn about who is getting such inferred info about us, and what they're planning to do with it?" Or "Is there a way we can find out who has bought this information, and sue the perpetrators if the information is incorrect?"
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
linguists have also developed a list of gender-skewed words used more often by women including love, ha-ha, cute, omg, yay, hahaha, happy, girl, hair, lol, hubby, and chocolate. Remarkably, even when only provided with one tweet,
Yay omg, you are cute! I want to be a happy girl! I have no hubby and want u to rub chocolate in my hair. Hahaha!
This is exactly what I was going to say... Where's my mod points?
Peter predicted that you would "deliberately forget" creation 2000 years ago...
I've spent a long time online. And I can pretty easily and reliably deduce gender from what someone "says". Women write differently. I wouldn't say they write worse, just differently.
paintball
I wonder what would happen if you fed my stuff to this algorithm. I'm transsexual and hang out in very different environments depending on which of my friends I spend time with. It can range from LANs to baking parties. Overall I'd say I'm a poor fit for both male and female stereotypes. It would also be fun to see what it would do with my lesbian friends, many of whom are immense tomboys.
What was the gender distribution of the tweets this was tested against? If 65.9% of the tweets were from a male, the algorithm "return Gender.male;" will get the gender right 65.9% of the time...
They've done studies, you know. They say 65.9% of the time, it works every time.
Linguists know that "out" is not a verb.
The paper is done by MITRE Corporation, I gave up reading it, scanned most of it and the results and method seem convoluted and unclear. By the way, the company's motto is: "Applying systems engineering and advanced technology to critical national problems"
Have gnu, will travel.
One thought, I suppose, might be "How can a lot of us work to sabotage things like this and poison their inferences?" Another might be "Is there a way we can learn about who is getting such inferred info about us, and what they're planning to do with it?" Or "Is there a way we can find out who has bought this information, and sue the perpetrators if the information is incorrect?"
There really isn't a way to be able to sue them, unless you consider being called the wrong sex defamation, but even if you do, I doubt that courts would really recognize it as an actionable claim.
WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
This will only lead to better impersonations skills
From the paper, in their data set 47.7% of tweets were from females, 32.8% were from males, and the rest was unspecified. Tossing out the unspecified ones, guessing "female" all the time would then give ~59% accuracy. On the surface that makes the 65.9% figure in the summary very lackluster, though better figures are reported with more information elsewhere in the article.
Then all the women would know that the algorithm is a man ;-)
This study in no way outs men impersonating women. In fact it specifically identifies gender for analysis by comparing it to the linked blog/website profile information and assuming that "the effort involved in maintaining this deception in two different places suggests that the blog labels on Twitter data are largely reliable". Basically it assumes that anyone attempting to impersonate the opposing gender is a tech ignorant moron that has made no effort to create a persona - something that is contrary to pretty much every piece of information we have on people who do this.
Overall, a poorly constructed study that oversells what it discovers and is then exaggerated and stretched by the media, who claim things that even the study isn't pretending that it does - in other words, a typical day in research and scientific reporting.
So now if I tweet that I "love" something, or god forbid, use a smiley face or an exclamation point, I'm going to get tampon and other feminine product ads? Super.
Depending on how successful the program is proven to be, it could be used for ad-targeting, or for socio-linguistic research
Better written as
Depending on how successful the program is proven to be, it could be used for ad-targeting (EVIL), or for socio-linguistic research (GOOD)
Make sure everyone's vote counts: Verified Voting
I read this article and didn't find anything about impersonation of gender. If anything, part of detection algorithm is based on looking for terms like "my wife/my gf" vs. "my hubby/my boyfriend". I think these terms are pretty self-explanatory (well, at least the former, in most states, the latter may belong to a gay user). In any case, this is a fairly trivial method of determining gender and one that seems to be quite basic and naive, in a way that any impersonator, even not particularly determined, would be able to defeat.
Like, omg, whatever.
This study might also identify gay men from straight men. From reading this, I certainly would be classified as a woman. like, lolll hehe:)
Not entirely true I am afraid.
Several experiments were conducted in the 60s and 70s on children raised in gender neutral parenting conditions, that focused on toy choices.
The experiment was intended to show the impact of societal imperatives on children and gender identities and gender specific behaviors, using toy preferences as the metric.
The result of the test STILL had little girls favoring dollies with bright colors, and boys favoring machines and soldier type toys, even when very carefully imposed gender neutrality parenting was in effect, even from very young ages.
This is somewhat reinforced by more modern research into the physiological differences between male and female nervous systems.
The idea that men and women might intrinsically focus more on different concepts (and thus, relate to their environments differently from each other, and as such, describe them differently in literature) is not really all that far-fetched.
It is simply politically incorrect to state that women might actually have a biological proclivity toward being the "Domestic" partner in relationships given the current political climate of our western post-suffrage societies.
Somehow, "Staying home, taking care of babies, and doing the chores all day." is seen as a degrading thing, while "Standing in an assembly line inserting part A into assembly B ad nauseam all day" is somehow seen in an idealized fashion as a kind of "Freedom"-- however sick that might be in reality notwithstanding.
Now, if you want to complain about women being statistically paid less than men, I will strongly support your argument that it (the practice) is based on pure bull--- But the statement that men and women are innately gender neutral and get conditioned exclusively by stereotypes? that is not supported by behaviorists.
Gender stereotypes simply reinforce already existent behaviors, for better or for worse.
One of the gender differences in English uses that has interested me most is the male tendency to use absolutes more often. A lot of it seems to stem from the sort of "fish story" and humor-based phase of social bonding that begins for most boys in grade school. Men are more likely to say "always" when they mean "usually", "never" when they mean "rarely", etc... which tends to mean that those pedants among us who try to use more precise language sometimes end up appearing more effeminate, or weak (i.e. "You talk like a fag, and your shit's all retarded.") I often wonder how the social interactions between geeks and non-geeks, including bullying, are affected/effected by linguistic cues like these.
OMG! Awsoooooomeee! Ha-ha, TV. How cute.
Stop showing me ads for tampons, damnit. I'M A MAAAANNNNN!!!!!!
how is babby formed?
The closest thing in Unicode to an i with heart is U+01D0 "LATIN SMALL LETTER I WITH CARON", which appears not to be on Slashdot's code point whitelist.
It wouldn't accept the HTML code for it either. Stripped it right out.
Slashdot instituted a code point whitelist after the erocS incident.
Aren't Mac users a subset of women?
LOL. This doesn't surprise me at all. I totally fit the bill. I use emotes and abbreviations ALL the time. Maybe I should stop just so I don't perpetuate gender stereotypes
cat
It's pretty easy to tell if she often tweets about her penis.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
between a gay man and a woman. I happen to use lot of exclamation points and gooey terms in my texts!!!
XXXOOO,
--kev
What does the researcher's research say about effeminate males? Or males that are even slightly more in touch with their emotions (which does not equate to being homosexual as I'm sure many will immediately ponder)? Stereotyping... geeze. It's the 21st century! :)
program could correctly identify gender 65.9% of the time.
Vs. 50% for random?
Hey, thanks a lot, you insensitive clods! I was in a great, loving, completely hot relationship with a lesbian, and now she's dumped me.
Leading to the oblig. 'Bloom County' http://www.gocomics.com/bloomcounty/1988/07/07/
"The greatest lesson in life is to know that even fools are right sometimes" - Winston Churchill
"used more often by women including love, ha-ha, cute, omg, yay, hahaha, happy, girl, hair, lol, hubby, and chocolate."
Interesting that males would drop behaviours that are now "female" despite being the originators of the behaviours. :)
Many of those phrases started when computing was almost exclusively male.
I remember seeing a stat on TV in the BBS (bulletin board) days about how computer users at that time were 96% male. Yet emoticons (smileys) were used all the time, as were lol, omg, hahaha and exclamation points. Although "hahaha" often took the form of "Muhahaha".
"girl" and "omg" could be used together as well - "OMG! there's a girl on the board..."
And the phrase "press ctrl-alt-delete to continue" was actually a prank?
Ok, enough reminiscing.
Linguistics, and socio-linguistics in particular, is one of those fields where "researchers" almost NEVER do true science. I studied it for 4 years and ended up so disgusted that I switched to computing. You can do almost anything with statistics and when one of the basic premises of the discipline is that "exceptions are a normal and expected situation", it's party time. So you can invent a "scientific generalisation", which you will codify with formulae and everything, and then when presented with obvious and repeated examples disproving the "scientific theory" being proposed, you simply say "the exception that confirms the rule". I LITERALLY had a full university professor bust that one out in my presence... It is anti-science. "Linguists" are charlatans. The only truly great linguist is Roy Harris - former chair of Linguistics at Oxford. He was the first chair, and know what he wanted to do at the end of his tenure? Abolish it!
72.8% of tweets are sent by men. Simply identifying all tweets from a sample as 'male' would yield a higher success rate.
love, ha-ha, cute, omg, yay, hahaha, happy, girl, hair, lol, hubby, and chocolate
how about "your back is all white" and "don't teach me how to live?"
Any guest worker system is indistinguishable from indentured servitude.
I've tested these things on my own writing, and they seem to be unsure of my gender. The same analyzer will report different genders on different articles, even though they were all written by me, within a couple of months of each other. Even when they're right, they claim to only be about 50-60 percent sure of my gender. I'm male, if that makes a difference. These analyzers seem to think it doesn't.
Now, if you want to complain about women being statistically paid less than men, I will strongly support your argument that it (the practice) is based on pure bull--- But the statement that men and women are innately gender neutral and get conditioned exclusively by stereotypes? that is not supported by behaviorists.
Gender stereotypes simply reinforce already existent behaviors, for better or for worse.
I'm going to argue with you a bit here, but not in the ways that you probably imagine. I do not think that men and women are innately gender neutral. We certainly are born with some innate sense of gender, and whom we want to mimic and conform ourselves to. Many of those behaviors, though, are not innate; in fact, I would say that "most" behaviors and stereotypes are acquired. And this "genderless rearing", from where? The kids were unable to watch TV, they were completely cut off from books, culture, and others? We impose these stereotypes quite subtly, and it is nearly as difficult to cool something to absolute zero as it is to keep a child in a purely genderless environment.
That said, our children pick up and acquire much of these stereotyping self-conforming behaviors so easily and readily that they have to be innately directed. Not innate themselves, but acquired innately. Like language. We don't actually teach children language, they are exposed to it, and simply acquire it... but the language that one speaks is still culturally determined. The word for snow is "snow" or it's "xue", the child learns both readily and easily, because it's innately acquired, but the exact nature of the word itself is yet arbitrary and culturally imposed.
"I only watched "Will And Grace" one time one day
Wish I hadn't 'cause TiVo now thinks I'm gay"
- Couch Potato
I listen to both RIAA and non-RIAA stuff if I like the music, tangential business/politics notwithstanding.
On the old Galaxynet IRC network a middle-aged man impersonated a young girl for at least half a decade and managed to get promoted to IRCop on several servers. "Kimba" presented "herself" as a teen-age girl who had a history of being sexually molested. Her persona was bolstered by the job the real person had which was as a night security guard in a bank in Perth, Australia. By virtue of his job he could manage to be on-line a great deal of the time. "Kimba" had a succession of male "boyfriends" over the years but all of them carefully chosen for their geographic locations: e.g.: not close to Perth, Australia.
Some guessed that Kimba was a guy but no one knew for sure until "she" confessed to me and explained how the impersonation was done, described "her" real life (married with children) and then quit IRC for (as far as I know) good.
I must admit that some things "she" said made me wonder. One time "she" claimed that she had to take a quick bathroom break because it was "poking its head out, mate". I thought to myself that those Australian girls were certainly earthy. LOL. I bet that semantics program would have caught "her" out in a few minutes.
No one ever had to evacuate a city because the solar panels broke!
Aren't Mac users a subset of women?
Girly men are not women no matter how effeminate.
Calling someone a "hater" only means you can not rationally rebut their argument.
I am not a MAC user! ;-)
Don't fight for your country, if your country does not fight for you.
Some background information that explains this.
http://www.guardian.co.uk/education/2008/dec/16/play
On a program I saw (probably BBC) they once showed what happened when they let a person play with a kid. Different people were told that the kid was male or female. What happened was that the adult was picking the toys to let the kid play with.
So conditioning of the gender specific toys is done by the adults. And this is done from pre-birth on. Blue for boys. Pink for girls. And not only the parents will have influence on this. Grandma buying the toys for the kid will be an influence as well.
I personally do not think this is a bad thing. It worked for thousands (or 6.000 if you wish) of years, so why change? Just to be PC?
> Several experiments were conducted in the 60s and 70s on children raised in gender neutral parenting conditions, that focused on toy choices.
Was the experiment done in a "double blind" fashion?
I.E you tell the adults and the children that some of the boys are girls and vice-versa?
Otherwise I doubt that the experiment is very interesting: even if we try to be neutral, we treat differently boys and girls..
If that chart is to be believed, all of the men on Twitter are actually just URL fields masquerading as people.
Could very well be the truth. Maybe they are detecting spam bots, not men.
Tell your friends about xenu.net
Great:( I'm a guy, not trying to impersonate a woman, but I profile like one.
Learn the Lady Language, or your life ain't gonna go good
You can't handle the truth.
I wonder how it works if you just write in normal English, with no stupid slang?
To have a right to do a thing is not at all the same as to be right in doing it
yeah well, we dont want them either.
To elude that evil detector of gender, we need a tool which obfuscates male or female posts in a way that it looks like the opposite sex. ha-ha :)
It has been in documented use in the English language since the 14th century and most probably was in use prior to that, because "out" is the equivalent of the Old English utian, which was often used as a transitive verb.
I always think of this, i never assume that this person on the other end of a virtual conversation is of either sex. This keeps me out of trouble and avoiding doing things on the internet i would certainly regret
It's even worse. The initial assumption was that 55% of the users were female, so basically a hardcoded 'return "female";' would already guess with 55% accuracy. Bumping it to 65% is actually only a 10-percentage-point bump.
But that assumption is purely based on what people declared on their account on Twitter, i.e., basically trusting that everyone who labeled themselves "female" is actually female, and everyone who labeled themselves "male" is actually male. The caveat there needs not be detailed.
Basically, they have 100,654 female users, 83,075 male users, and 53,817 unspecified. Taking the known ones, there are 183,729 users of known gender. (With the caveat in the previous paragraph.) Out of that, the probability to be a female is about 55%.
BUT if they guess at individual tweets, then it's pretty much the number of tweets from each that counts. There were 2,429,621 tweets from (self-labeled) females, vs 1,672,813 tweets from (self-labeled) males, and unspecified. Total 4,102,434 tweets with "known" gender. Out of those the tweets from "known" females were a bit over 59%.
So basically an algorithm which takes one tweet and just does a hard-coded 'return "female";' would be right over 59% of the time. Bumping that to 65% is such a ridiculously marginal effect that, really, it's funny.
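The arithmetic above is easy to reproduce. The sketch below uses only the user and tweet counts quoted in this comment; the function name and the dictionary layout are assumptions made for illustration, not anything from the paper.

```python
def majority_baseline(counts: dict[str, int]) -> tuple[str, float]:
    """Return the most common label and the accuracy of always guessing it."""
    label = max(counts, key=counts.get)
    return label, counts[label] / sum(counts.values())


# Counts of self-labeled users and tweets as quoted above (unspecified dropped).
users = {"female": 100_654, "male": 83_075}
tweets = {"female": 2_429_621, "male": 1_672_813}

if __name__ == "__main__":
    for name, counts in (("users", users), ("tweets", tweets)):
        label, accuracy = majority_baseline(counts)
        print(f'{name}: always guessing "{label}" is right {accuracy:.1%} of the time')
```

Comparing a classifier against this constant guess, rather than against a coin flip, is the fairer yardstick whenever the classes are unbalanced.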
And actually what worries me is not as much the research grants, as the hordes of morons who don't understand the ecological fallacy (extrapolations from whole population "ecological" studies to individuals are stupid) and who'll take this as some infallible identi-kit or worse, as a scientific justification for sexism. Even the summary makes strong claims of outing males pretending to be females, or that flat-out "women use language differently than men". No they don't really. The difference is marginal, and there is massive overlap between any word's usage by males and females.
E.g., one of the "strongly male indicators" they churned is using the word http (presumably tweeting a link?) where actually any given instance of it, the probability of the user to be female is 50.6%, according to their table. So it's really a 50-50 split on the use of this word. One of the few actual strongly male words was Google, but even there it's only a 2/3 and 1/3 split between male and female. Conversely strongly female stuff like mentioning "love" was basically still a 2/3 and 1/3 split in the other direction.
But not that it will stop morons from taking it as some scientifically proven rule that women talk about love and cute stuff, and guys talk about http and Google. And that, for example, therefore we need to hire less women in IT.
A polar bear is a cartesian bear after a coordinate transform.
I can't see how a detector like this would 'out' somebody impersonating a female. If it sees "my wife", it takes it as a very strong hint that the poster is not a female. But wouldn't someone impersonating a female rather say "my boyfriend" or "my period is due" or some other stereotypically feminine things rather than "my nigga" or "my balls itch"?
Every end has half a stick.
If they included all the adolescent twitters in the statistics, it's not so impressive the software can tell who's female...
"OMFG! LoL! ha ha! ur so qt!"
Without even analyzing the text, I could just say each time it's a male and probably have a better success rate than 65.9
This was a real problem for me on WoW since I liked to play female toons. Somehow, I don't know why, everybody assumed I was female in real life. I realize there are always the 14 year old boys who want to talk to any 'girl', but even the guild leader (a woman herself) and several adults in the guild were shocked when I finally got on Ventrilo. I don't think I type using feminine words or expressions. I guess there might have been a lack of male lunkheadedness on my part. One guildie even tried getting me into cyber sex chat (a rather lame attempt too) before I had to put on the brakes and tell him I was really a guy.
Because I keep getting 'identified' as female. It's really annoying when I browse a site and obviously female oriented ads keep popping up. Google identifies me as female and that made no sense to me. 95% of my browsing is to IT/Tech/Computer sites, why would that identify as female? But now I realize they must be identifying me via my gmail account and other written text. The confusion likely stems from the fact I'm an exceptionally verbose male, predisposed towards descriptive language and precise grammar. That and I've always used emoticons as part of my attempt to overcome the lack of expression available in written text.
I would bet any amount my twitter also identifies as female. My wife is likely going to find this very amusing...
Now that's a creepy phone app just waiting to be made.
Don't most people pretending to be female on twitter fill their tweets with stereotypical female language? This would only catch pretenders who are really lazy and incompetent.
Is it? Guessing is only 15% less accurate, but needs infinite% less input data.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
omg I'm a happy, girl :) LOL I love my cute hubby's chocolate hair. yay ! hahaha
Hmmm ... So maybe I should learn how to convince the analysis code that I'm a transvestite female pretending to be a gay male (to satisfy my "wife" who they believe is a gay male). I wonder who could give me lessons in this sort of fakery. I mean, I have no experience actually being female (or gay or transvestite), so I probably wouldn't do it right without some lessons.
The word "out" has been documented as being used as a transitive verb since the 14th century. Moreover, it is a derivative of a Middle English word (owte) which is derived from an Old English word (utian) which is also well documented as being used as a transitive verb.
Its oldest meaning as a verb? To reveal something previously hidden.
So "outing" someone is only a very slight adaption of a very old usage.
Good grief, what do they teach kids at the university these days?!
Ultracrepidasts "know" that "out" is not a verb.
Don't believe me? Go look up the OED entry for "out" as a transitive verb. Look at the historical section. Notice the entries spanning from the 14th century up through to the present day.