Linguists Out Men Impersonating Women On Twitter
Hugh Pickens writes "Remember when the Gay Girl in Damascus revealed himself as a middle-aged man from Georgia? On a platform like Twitter, which doesn't ask for much biographical information, it's easy (and fun!) to take on a fake persona but now linguistic researchers have developed an algorithm that can predict the gender of a tweeter based solely on the 140 characters they choose to tweet. The research is based on the idea that women use language differently than men. 'The mere fact of a tweet containing an exclamation mark or a smiley face meant that odds were a woman was tweeting, for instance,' reports David Zax. Other research corroborates these findings, finding that women tend to use emoticons, abbreviations, repeated letters and expressions of affection more than men and linguists have also developed a list of gender-skewed words used more often by women including love, ha-ha, cute, omg, yay, hahaha, happy, girl, hair, lol, hubby, and chocolate. Remarkably, even when only provided with one tweet, the program could correctly identify gender 65.9% of the time. (PDF). Depending on how successful the program is proven to be, it could be used for ad-targeting, or for socio-linguistic research."
I hope that extra 15% certainty didn't cost millions in research grants, as a blind guess has a 50% chance of being right.
ELOI, ELOI, LAMA SABACHTHANI!?
I'm a lesbian trapped in a man's body.
Huh...the word "hubby" is used more by women. Who knew!
but only just?
Those cunning linguists!
Or it can be used as a training tool for would-be impersonators.
I refuse to use
The mere fact of a tweet containing an exclamation mark or a smiley face meant that odds were a woman was tweeting
or a Mac user.
The internet: where men are men, women are men, and children are federal agents (also men.)
Given that a drunken monkey with a dart gun could manage roughly 50% accuracy given only two choices, 65.9% doesn't exactly seem like a major advance...
65%?? that's very close to myself flipping a coin...
Really. Who cares?
How do you fold a fitted sheet and why do you need more than 2 pairs of shoes?
But will it help me score?!?
Now the fakers will all be using exclamation marks and smileys.
Hahahahahaha
Uh, that's not very impressive.
OK, so you're suggesting they should replace "man" with "likes girls" and "woman" with "likes boys"? I'd buy that...
Apparently I have very feminine text messages/tweets, as I use excessive emoticons, exclamation points, and affectionate pet names (though those are directed towards females). And here I thought I had solidified my masculinity when I burnt all my pink shirts.
Then again, the nickname probably isn't helping either...
Motorcycles, Robots, Space Gossip and More!
"http, google"
That's it. Those are the words guys use more? They link to stuff and Google? Really?
A statistically significant level of accuracy based on a single statement of at most 140 characters is not a small thing, so long as it scales with more text. If that means that with a few statements or a longer statement you get into the high 90s, then that would be quite interesting. If it is 65% right all the time, then yes, it was rather a waste.
where men are women, women are women and kids are cops.
The mind conceives, the body achieves, the spirit manifests.
I'm a guy. And I used to keep up a personal blog. Once upon a time there was a website that would analyze your blog for you and guess if you were male or female much like this twitter nonsense.
It guessed I was female. Dude! I'm not even gay! STOP SAYING THAT!!!
I mean...seriously! Jesus Christ!
Ponies!!!
So dont get this guys! Gonna pretend you do it right..right ;)
Disclaimer: It took over 15 minutes for me to write one line which didn't seem unequivocally to have been written by a guy, and when I finally thought I had managed such a feat I found it so jarring that I was moved to append a disclaimer explaining it. Must be a girl thing.
Nobel laureate V. S. Naipaul recently caused an uproar when he claimed, among other things, that he could identify the gender of an author from their work:
In what must have been an attempt to be as offensive as possible, he continued, saying that men’s and women’s writing is “quite different. I read a piece of writing and within a paragraph or two I know whether it is by a woman or not. I think [it is] unequal to me.”
I guess this means he was right? Although, for the record, he still seems like an arrogant, sexist SOB--just not for this particular reason.
I'd be more impressed if it was wrong 70% of the time.
OMG, does this mean I'm really gay?!!! NLOL, WMS!!!
[NLOL, WMS = not laughing out loud, wetting my self]
It's a 140 character sex-change operation!
WhEn YoU SeE pOsTs LiKe ThIs It iS aLwAyS a GiRl!!!!
Do we have a benchmark for how well a human can detect genders? I understand being automagic has some special applications, but it seems like a useful point of comparison for its accuracy.
This is my signature. There are many like it, but this one is mine.
"The research is based on the idea that women use language differently than men." I have another idea - that old people use language differently *from* you.
linguists have also developed a list of gender-skewed words used more often by women including love, ha-ha, cute, omg, yay, hahaha, happy, girl, hair, lol, hubby, and chocolate. Remarkably, even when only provided with one tweet,
Yay omg, you are cute! I want to be a happy girl! I have no hubby and want u to rub chocolate in my hair. Hahaha!
This is exactly what I was going to say... Where's my mod points?
Peter predicted that you would "deliberately forget" creation 2000 years ago...
I've spent a long time online. And I can pretty easily and reliably deduce gender from what someone "says". Women write differently. I wouldn't say they write worse, just differently.
paintball
I wonder what would happen if you fed my stuff to this algorithm. I'm transsexual and hang out in very different environments depending on which of my friends I spend time with. It can range from LANs to baking parties. Overall I'd say I'm a poor fit for both male and female stereotypes. It would also be fun to see what it would do with my lesbian friends, many of whom are immense tomboys.
What was the gender distribution of the tweets this was tested against? If 65.9% of the tweets were from a male, the algorithm "return Gender.male;" will get the gender right 65.9% of the time...
They've done studies, you know. They say 65.9% of the time, it works every time.
Linguists know that "out" is not a verb.
The paper is by MITRE Corporation. I gave up reading it closely and skimmed most of it; the results and method seem convoluted and unclear. By the way, the company's motto is: "Applying systems engineering and advanced technology to critical national problems"
Have gnu, will travel.
Bitches love smilies.
This will only lead to better impersonation skills.
From the paper, in their data set 47.7% of tweets were from females, 32.8% were from males, and the rest was unspecified. Tossing out the unspecified ones, guessing "female" all the time would then give ~59% accuracy. On the surface that makes the 65.9% figure in the summary very lackluster, though better figures are reported with more information elsewhere in the article.
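The ~59% figure above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the two percentages are the ones quoted in the comment, and the helper name `majority_baseline` is made up for illustration, not something from the paper.

```python
def majority_baseline(shares):
    """Return the most common label and the accuracy of always guessing it.

    `shares` maps each label to its share of the data; the shares are
    renormalized, so dropping the "unspecified" group is handled for free.
    """
    total = sum(shares.values())
    label = max(shares, key=shares.get)
    return label, shares[label] / total


# Shares quoted in the comment, with the unspecified tweets tossed out.
label, accuracy = majority_baseline({"female": 47.7, "male": 32.8})
print(f"always guess {label!r}: {accuracy:.1%} accurate")
```

Always answering "female" on the labeled tweets lands at roughly 59%, which is the bar the 65.9% single-tweet figure has to be compared against.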
The only reason for the stipulated deduction that men and women speak differently is due to the enforcement of gender stereotypes. Given no outside influence, they wouldn't speak like this. A small percentage yes. Given the gender normatives, one could easily impersonate the other gender as the scripters have erroneously created it. The study is utter bollocks.
I used UK vernacular...I must be from the UK! ...I'm from Texas...I don't say Y'all and I don't have an accent. I also use the phrase No Worries. Why? Because it is the nicest way to say what you intend. The list can go on ad infinitum.
Cultural enforcement studies are abhorrent.
This study in no way outs men impersonating women. In fact it specifically identifies gender for analysis by comparing it to the linked blog/website profile information and assuming that "the effort involved in maintaining this deception in two different places suggests that the blog labels on Twitter data are largely reliable". Basically it assumes that anyone attempting to impersonate the opposing gender is a tech ignorant moron that has made no effort to create a persona - something that is contrary to pretty much every piece of information we have on people who do this.
Overall, a poorly constructed study that oversells what it discovers and is then exaggerated and stretched by the media, who claim things that even the study isn't pretending that it does - in other words a typical day in research and scientific reporting.
omg lol! hair? Does that make me gay :'(
Call me when they are able to identify people who actually try to impersonate the other gender. Otherwise, this isn't very useful, as asking people is probably going to yield more accurate results.
So now if I tweet that I "love" something, or god forbid, use a smiley face or an exclamation point, I'm going to get tampon and other feminine product ads? Super.
Depending on how successful the program is proven to be, it could be used for ad-targeting, or for socio-linguistic research
Better written as
Depending on how successful the program is proven to be, it could be used for ad-targeting (EVIL), or for socio-linguistic research (GOOD)
Make sure everyone's vote counts: Verified Voting
I read this article and didn't find anything about impersonation of gender. If anything, part of detection algorithm is based on looking for terms like "my wife/my gf" vs. "my hubby/my boyfriend". I think these terms are pretty self-explanatory (well, at least the former, in most states, the latter may belong to a gay user). In any case, this is a fairly trivial method of determining gender and one that seems to be quite basic and naive, in a way that any impersonator, even not particularly determined, would be able to defeat.
Like, omg, whatever.
This study might also identify gay men from straight men. From reading this, I certainly would be classified as a woman. like, lolll hehe:)
They could try this on SecondLife, but what would be the point?
One of the gender differences in English uses that has interested me most is the male tendency to use absolutes more often. A lot of it seems to stem from the sort of "fish story" and humor-based phase of social bonding that begins for most boys in grade school. Men are more likely to say "always" when they mean "usually", "never" when they mean "rarely", etc... which tends to mean that those pedants among us who try to use more precise language sometimes end up appearing more effeminate, or weak (i.e. "You talk like a fag, and your shit's all retarded.") I often wonder how the social interactions between geeks and non-geeks, including bullying, are affected/effected by linguistic cues like these.
OMG! Awsoooooomeee! Ha-ha, TV. How cute.
Stop showing me ads for tampons, damnit. I'M A MAAAANNNNN!!!!!!
how is babby formed?
The closest thing in Unicode to an i with heart is U+01D0 "LATIN SMALL LETTER I WITH CARON", which appears not to be on Slashdot's code point whitelist.
It wouldn't accept the HTML code for it either. Stripped it right out.
Slashdot instituted a code point whitelist after the erocS incident.
Aren't Mac users a subset of women?
LOL. This doesn't surprise me at all. I totally fit the bill. I use emotes and abbreviations ALL the time. Maybe I should stop just so I don't perpetuate gender stereotypes
cat
I have taken a number of these "are you male or female" tests over the years, most of which consisted of analyzing paragraphs (at least) of writing. When I tried to write like a stereotypical man, they identified me as male. When I tried to write as a stereotypical female, they identified me as female. When I wrote in my normal style, they would usually identify me as male, but with a pretty low degree of certainty.
It isn't difficult to "write like a woman" if you have a rudimentary knowledge of how men and women in general use English differently.
It's pretty easy to tell if she often tweets about her penis.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
between a gay man and a woman. I happen to use a lot of exclamation points and gooey terms in my texts!!!
XXXOOO,
--kev
What does the researcher's research say about effeminate males? Or males that are even slightly more in touch with their emotions (which does not equate to being homosexual as I'm sure many will immediately ponder)? Stereotyping... geeze. It's the 21st century! :)
program could correctly identify gender 65.9% of the time.
Vs. 50% for random?
Hey, thanks a lot, you insensitive clods! I was in a great, loving, completely hot relationship with a lesbian, and now she's dumped me.
If you RTFA, you'll see this, and instantly doubt the article, given that they had more females than males as the sample.
http://images.fastcompany.com/upload/a_onadate.png
Did they actually study gay women versus straight men? Lesbians are 3 times more likely to be gender non-conforming than straight women.
Leading to the oblig. 'Bloom County' http://www.gocomics.com/bloomcounty/1988/07/07/
"The greatest lesson in life is to know that even fools are right sometimes" - Winston Churchill
"used more often by women including love, ha-ha, cute, omg, yay, hahaha, happy, girl, hair, lol, hubby, and chocolate."
Interesting that males would drop behaviours that are now "female" despite being the originators of the behaviours. :)
Many of those phrases started when computing was almost exclusively male.
I remember seeing a stat on TV in the BBS (bulletin board) days about how computer users at that time were 96% male. Yet emoticons (smileys) were used all the time, as were lol, omg, hahaha and exclamation points. Although "hahaha" often took the form of "Muhahaha".
"girl" and "omg" could be used together as well - "OMG! there's a girl on the board..."
And the phrase "press ctrl-alt-delete to continue" was actually a prank?
Ok, enough reminiscing.
First, they're not linguists; they are computer scientists using data mining to obtain a result. The input data just happens to be English, haha.
Second, the impressive part is the fact that they found a way to train their classifier on all 3,280,532 training tweets on a single computer in around seven minutes!
While the actual male/female classifier may not be hugely useful, this process could be used to build other classifiers quickly and then update them as new information appears. :)
Linguistics, and socio-linguistics in particular, is one of those fields where "researchers" almost NEVER do true science. I studied it for 4 years and ended up so disgusted that I switched to computing. You can do almost anything with statistics and when one of the basic premises of the discipline is that "exceptions are a normal and expected situation", it's party time. So you can invent a "scientific generalisation", which you will codify with formulae and everything, and then when presented with obvious and repeated examples disproving the "scientific theory" being proposed, you simply say "the exception that confirms the rule". I LITERALLY had a full university professor bust that one out in my presence... It is anti-science. "Linguists" are charlatans. The only truly great linguist is Roy Harris - former chair of Linguistics at Oxford. He was the first chair, and know what he wanted to do at the end of his tenure? Abolish it!
72.8% of tweets are sent by men. Simply identifying all tweets from a sample as 'male' would yield a higher success rate.
love, ha-ha, cute, omg, yay, hahaha, happy, girl, hair, lol, hubby, and chocolate
how about "your back is all white" and "don't teach me how to live?"
Any guest worker system is indistinguishable from indentured servitude.
Am I the only one who noticed that in the Atlantic Wire article linked in the post, there are the tables listing various things men and women are more likely to say? Women includes a pile of things like "cute, omg, hair, shopping, chocolate, or sigh" and the only thing listed under men is "http" and "google".
If that chart is to be believed, all of the men on Twitter are actually just URL fields masquerading as people.
What?
I've tested these things on my own writing, and they seem to be unsure of my gender. The same analyzer will report different genders on different articles, even though they were all written by me, within a couple of months of each other. Even when they're right, they claim to only be about 50-60 percent sure of my gender. I'm male, if that makes a difference. These analyzers seem to think it doesn't.
The mere fact of a tweet containing an exclamation mark or a smiley face meant that odds were a woman was tweeting
or a Mac user.
Come on, we all know that they have female brain structures
"I only watched "Will And Grace" one time one day
Wish I hadn't 'cause TiVo now thinks I'm gay"
- Couch Potato
I listen to both RIAA and non-RIAA stuff if I like the music, tangential business/politics notwithstanding.
On the old Galaxynet IRC network a middle-aged man impersonated a young girl for at least half a decade and managed to get promoted to IRCop on several servers. "Kimba" presented "herself" as a teen-age girl who had a history of being sexually molested. Her persona was bolstered by the job the real person had which was as a night security guard in a bank in Perth, Australia. By virtue of his job he could manage to be on-line a great deal of the time. "Kimba" had a succession of male "boyfriends" over the years but all of them carefully chosen for their geographic locations: e.g.: not close to Perth, Australia.
Some guessed that Kimba was a guy but no one knew for sure until "she" confessed to me and explained how the impersonation was done, described "her" real life (married with children) and then quit IRC for (as far as I know) good.
I must admit that some things "she" said made me wonder. One time "she" claimed that she had to take a quick bathroom break because it was "poking its head out, mate". I thought to myself that those Australian girls were certainly earthy. LOL. I bet that semantics program would have caught "her" out in a few minutes.
No one ever had to evacuate a city because the solar panels broke!
Aren't Mac users a subset of women?
Girly men are not women no matter how effeminate.
Calling someone a "hater" only means you can not rationally rebut their argument.
The article and the research it is based on, as well as the comments here, are obnoxiously sexist and heteronormative. Not cool.
And the "research" is worthless, probably being quite offensive to a number of people if applied to any real-world use, like advertising.
I am not a MAC user! ;-)
Don't fight for your country, if your country does not fight for you.
Where the men are men and the women are too.
Excellent! The time for an exact measure of gayness of a post is near.
Great:( I'm a guy, not trying to impersonate a woman, but I profile like one.
Learn the Lady Language, or your life ain't gonna go good
You can't handle the truth.
I wonder how it works if you just write in normal English, with no stupid slang?
To have a right to do a thing is not at all the same as to be right in doing it
yeah well, we dont want them either.
that the poster was a male geek, whose closest exposure to "women" is via twitter. Actually, I expand that prediction to 99% of the people reading this comment! :-)
3 Ya all!
To elude that evil detector of gender, we need a tool which obfuscates male or female posts in a way that it looks like the opposite sex. ha-ha :)
And what about someone now using this data to adjust their writing style, if they're purposely out to fake their gender? I imagine the effectiveness of such a strategy would fall away over longer and longer pieces of text. It would be interesting to know how hard it is for men to forge a feminine writing style and vice versa. Also interesting would be the effects of sexuality as opposed to just gender on writing style.
It has been in documented use in the English language since the 14th century and most probably was in use prior to that, because "out" is the equivalent of the Old English utian, which was often used as a transitive verb.
I always think of this: I never assume that the person on the other end of a virtual conversation is of either sex. This keeps me out of trouble and keeps me from doing things on the internet I would certainly regret.
It's even worse. The initial assumption was that 55% of the users were female, so basically a hardcoded 'return "female";' would already guess with 55% accuracy. Bumping it to 65% is actually only a 10% bump.
But that assumption is purely based on what people declared on their account on Twitter, i.e., basically trusting that everyone who labeled themselves "female" is actually female, and everyone who labeled themselves "male" is actually male. The caveat there needs not be detailed.
Basically, they have 100,654 female users, 83,075 male users, and 53,817 unspecified. Taking the known ones, there are 183,729 users of known gender. (With the caveat in the previous paragraph.) Out of that, the probability to be a female is about 55%.
BUT if they guess at individual tweets, then it's pretty much the number of tweets from each that counts. There were 2,429,621 tweets from (self-labeled) females vs. 1,672,813 tweets from (self-labeled) males, plus tweets from unspecified users. That makes a total of 4,102,434 tweets with "known" gender. Out of those, the tweets from "known" females were a bit over 59%.
So basically an algorithm which takes one tweet and just does a hard-coded 'return "female";' would be right over 59% of the time. Bumping that to 65% is such a ridiculously marginal effect that, really, it's funny.
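For anyone who wants to redo that arithmetic, here is a rough sketch in Python. The counts are the ones quoted in this comment and 65.9% is the single-tweet figure from the summary; the variable and function names are just illustrative, and this says nothing about how the paper's classifier itself works.

```python
# Counts as quoted in the comment above (self-labeled gender only).
FEMALE_USERS, MALE_USERS = 100_654, 83_075
FEMALE_TWEETS, MALE_TWEETS = 2_429_621, 1_672_813
REPORTED_SINGLE_TWEET_ACCURACY = 0.659  # figure from the summary


def female_share(female, male):
    """Accuracy of a hard-coded 'return "female"' on a two-class sample."""
    return female / (female + male)


user_baseline = female_share(FEMALE_USERS, MALE_USERS)
tweet_baseline = female_share(FEMALE_TWEETS, MALE_TWEETS)

# Gain of the reported result over the constant guess, in percentage points.
gain_points = (REPORTED_SINGLE_TWEET_ACCURACY - tweet_baseline) * 100

print(f"per-user baseline:  {user_baseline:.1%}")
print(f"per-tweet baseline: {tweet_baseline:.1%}")
print(f"gain over per-tweet baseline: {gain_points:.1f} points")
```

The per-user and per-tweet baselines differ because the self-labeled female accounts tweeted more on average, so the per-tweet number is the fair comparison for a classifier that sees one tweet at a time.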
And actually what worries me is not as much the research grants, as the hordes of morons who don't understand the ecological fallacy (extrapolations from whole population "ecological" studies to individuals are stupid) and who'll take this as some infallible identi-kit or worse, as a scientific justification for sexism. Even the summary makes strong claims of outing males pretending to be females, or that flat-out "women use language differently than men". No they don't really. The difference is marginal, and there is massive overlap between any word's usage by males and females.
E.g., one of the "strongly male indicators" they churned is using the word http (presumably tweeting a link?) where actually any given instance of it, the probability of the user to be female is 50.6%, according to their table. So it's really a 50-50 split on the use of this word. One of the few actual strongly male words was Google, but even there it's only a 2/3 and 1/3 split between male and female. Conversely strongly female stuff like mentioning "love" was basically still a 2/3 and 1/3 split in the other direction.
But not that it will stop morons from taking it as some scientifically proven rule that women talk about love and cute stuff, and guys talk about http and Google. And that, for example, therefore we need to hire less women in IT.
A polar bear is a cartesian bear after a coordinate transform.
I can't see how a detector like this would 'out' somebody impersonating a female. If it sees "my wife", it takes it as a very strong hint that the poster is not a female. But wouldn't someone impersonating a female rather say "my boyfriend" or "my period is due" or some other stereotypically feminine things rather than "my nigga" or "my balls itch"?
Every end has half a stick.
Bollocks!
If they included all the adolescent twitters in the statistics, it's not so impressive the software can tell who's female...
"OMFG! LoL! ha ha! ur so qt!"
Without even analyzing the text, I could just say each time it's a male and probably have a better success rate than 65.9
"love, ha-ha, cute, omg, yay, hahaha, happy, girl, hair, lol, hubby, and chocolate"
Yep, that about sums it up. See? Women aren't so complicated.
Because I keep getting 'identified' as female. It's really annoying when I browse a site and obviously female oriented ads keep popping up. Google identifies me as female and that made no sense to me. 95% of my browsing is to IT/Tech/Computer sites, why would that identify as female? But now I realize they must be identifying me via my gmail account and other written text. The confusion likely stems from the fact I'm an exceptionally verbose male, predisposed towards descriptive language and precise grammar. That and I've always used emoticons as part of my attempt to overcome the lack of expression available in written text.
I would bet any amount my twitter also identifies as female. My wife is likely going to find this very amusing...
Now that's a creepy phone app just waiting to be made.
This is old news. Internet lore tells us that bitches love smiley faces.
Don't most people pretending to be female on twitter fill their tweets with stereotypical female language? This would only catch pretenders who are really lazy and incompetent.
Is it? Guessing is only 15% less accurate, but needs infinite% less input data.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
omg, my dick just shrunk :)
I've been doing this on IRC for over a decade. Can usually get within 10 years on age, too. It's not that hard.
talk about cunning linguists. Amazing!
omg I'm a happy, girl :) LOL I love my cute hubby's chocolate hair. yay ! hahaha
The word "out" has been documented as being used as a transitive verb since the 14th century. Moreover, it is a derivative of a Middle English word (owte) which is derived from an Old English word (utian) which is also well documented as being used as a transitive verb.
Its oldest meaning as a verb? To reveal something previously hidden.
So "outing" someone is only a very slight adaption of a very old usage.
Good grief, what do they teach kids at the university these days?!
Ultracrepidasts "know" that "out" is not a verb.
Don't believe me? Go look up the OED entry for "out" as a transitive verb. Look at the historical section. Notice the entries spanning from the 14th century up through to the present day.
;-) I use lots of those!
I must have been wrong my whole life, I must be a female!
Except, I have been equipped with a penis and testicles since birth, not a vagina and womb, and I'm very secure in my identity as a male. Okay, I like poetry, I like love novels and movies, I like making pretty paintings, I love to play with and take care of kids, I love snuggling more than banging (except when my musth sets in when a female ovulates and starts to smell that way), but since I have been a very muscular and large male specimen all my life, nobody has ever mistaken me for "girly".
Perhaps I use smilies and exclamation marks because:
- I'm not a native English user, I use the smilies and exclamation marks because they may compensate for any misinterpretations caused by my lack of English skills.
- I started using the internet in 1995, when smilies were very popular (but almost all people connected to the internet were male).
- I'm not a US-American, I don't have the typical US male insecurities, I don't need to play the role of a "grown up" macho man all the time. I can be childish, I'm allowed to use smilies. I'm allowed to show my feelings, I can use exclamation marks.