Slashdot Mirror


Linguists Out Men Impersonating Women On Twitter

Hugh Pickens writes "Remember when the Gay Girl in Damascus revealed himself as a middle-aged man from Georgia? On a platform like Twitter, which doesn't ask for much biographical information, it's easy (and fun!) to take on a fake persona but now linguistic researchers have developed an algorithm that can predict the gender of a tweeter based solely on the 140 characters they choose to tweet. The research is based on the idea that women use language differently than men. 'The mere fact of a tweet containing an exclamation mark or a smiley face meant that odds were a woman was tweeting, for instance,' reports David Zax. Other research corroborates these findings, finding that women tend to use emoticons, abbreviations, repeated letters and expressions of affection more than men and linguists have also developed a list of gender-skewed words used more often by women including love, ha-ha, cute, omg, yay, hahaha, happy, girl, hair, lol, hubby, and chocolate. Remarkably, even when only provided with one tweet, the program could correctly identify gender 65.9% of the time. (PDF). Depending on how successful the program is proven to be, it could be used for ad-targeting, or for socio-linguistic research."

34 of 350 comments

  1. Let's hope that 15%... by Lead+Butthead · · Score: 4, Insightful

    I hope that extra 15% certainty didn't cost millions in research grants, since a blind guess has a 50% chance of being right.

    --
    ELOI, ELOI, LAMA SABACHTHANI!?
    1. Re:Let's hope that 15%... by MightyMartian · · Score: 2

      We'll just pay the researchers in bitcoins.

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    2. Re:Let's hope that 15%... by Anonymous Coward · · Score: 2, Insightful

      It seems to me that there are more men than women posting on twitter, so guessing man on every tweet might yield a higher accuracy than this algorithm.

    3. Re:Let's hope that 15%... by bwayne314 · · Score: 4, Insightful

      HAHA! omg, thats soooo cute! ....

      oh, yea, :)

    4. Re:Let's hope that 15%... by he-sk · · Score: 2

      That's a 65% prediction rate based on a single tweet. The authors report a 92% success rate for the best classifier on the entire set. If they restrict the data set just to tweet texts (but more than one), they achieved a 76% success rate. That still might not satisfy you, but the authors also report that only 5 in 130 people correctly classified 100 tweets with a higher accuracy.

      --
      Free Manning, jail Obama.
    5. Re:Let's hope that 15%... by Macgrrl · · Score: 2

      I suppose I could always google it. Oops, I think I broke the algorithm.

      --
      Sara
      Designer, Gamer, Macgrrl in an XP World
    6. Re:Let's hope that 15%... by Macgrrl · · Score: 2

      *pout*

      *sigh*

      sooooo tired of the 'no women teh interwebs' meme

      --
      Sara
      Designer, Gamer, Macgrrl in an XP World
    7. Re:Let's hope that 15%... by Hognoxious · · Score: 2

      since there is a good chance an east Asian guy would be considered female

      I thought Lay-D-Boy was a counterfeit armchair till I went to Bangkok.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  2. Who Knew! by DoomHamster · · Score: 2

    Huh...the word "hubby" is used more by women. Who knew!

    1. Re:Who Knew! by MightyMartian · · Score: 5, Funny

      The mere fact that you show emotion outs you. Real men only use periods and commas, AND TYPE IN ALL CAPS BECAUSE REAL MEN ARE ALWAYS SHOUTING.

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    2. Re:Who Knew! by el3mentary · · Score: 2

      I'm totally going to hell for this but in order to reinforce my manhood, I must say:

      My zipper was down and my wife found my gf. My nigga wanted my beer and my shorts! I took my jeep and my woman to my vegas timeshare.

      (Here!)

      You used an exclamation mark; you are clearly a woman.

      --
      I reject your reality and substitute my own.
    3. Re:Who Knew! by snowgirl · · Score: 2

      I don't have mod points, so I have to post a comment to tell you, "funniest thing I read all day". You boys are weird :P

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    4. Re:Who Knew! by RivenAleem · · Score: 3, Funny

      We use a FULL STOP. Cus when I tell that sentence to end it motherfucking does. Bitches.

  3. Linguists Need to Visit a Starbucks Occasionally by RobotRunAmok · · Score: 5, Funny

    The mere fact of a tweet containing an exclamation mark or a smiley face meant that odds were a woman was tweeting

    or a Mac user.

  4. Well depends on how it increases by Sycraft-fu · · Score: 3, Insightful

    A statistically significant gain in accuracy based on a single statement of at most 140 characters is not a small thing, so long as it scales with more text. If that means that with a few statements or a longer statement you get into the high 90s, then that would be quite interesting. If it is 65% right all the time, then yes, it was rather a waste.

    1. Re:Well depends on how it increases by wealthychef · · Score: 2

      How is this "huge?" What the hell are you going to do with it? Someone tweets and uses an exclamation point, so you... what now?

      --
      Currently hooked on AMP
    2. Re:Well depends on how it increases by raehl · · Score: 5, Funny

      I go to my congressional office, take my shirt off, arrange my family photos in the background, and take a picture to send to them.

    3. Re:Well depends on how it increases by IceNinjaNine · · Score: 3, Funny

      If that means that with a few statements or a longer statement you get in to the high 90s then that would be quite interesting.

      Interesting stuff. I wrote the first revision of my best friend's profile for match.com (I'm a man, she's a woman) simply because she was just awful at putting her best foot forward. She tweaked it, but I wonder how that would have come out under such analysis.

      Noooo! She's not a lithe fifty year old target shooting yoga instructor, she's a MAN! ;)

    4. Re:Well depends on how it increases by LordLucless · · Score: 3, Insightful

      I wonder what proportion of the tweets are deliberately intended to be misleading. Getting a 65% hit rate on people who are attempting to deceive is much more impressive than 65% on people who aren't making any attempt to obfuscate their gender.

      --
      Just because you're paranoid doesn't mean there isn't an invisible demon about to eat your face
    5. Re:Well depends on how it increases by Strange+Ranger · · Score: 2

      "How is this "huge?" What the hell are you going to do with it? Someone tweets and uses an exclamation point, so you... what now?"
       
      You look at more of their tweets until you're 98% sure. Then target your advertising.
       
      GONG! Thanks for playing.

      --

      Operator, give me the number for 911!
    6. Re:Well depends on how it increases by kenj0418 · · Score: 2

      How is this "huge?" What the hell are you going to do with it?

      That's what she said. Or maybe it was a he -- I'm confused now.

  5. Re:Or... by John+Hasler · · Score: 2

    > Or it can be used as a training tool for would-be impersonators.

    Or to test gender-altering scripts. OMG! :)

    --
    Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
  6. Recognizing gender 65.9% based on one tweet by wurp · · Score: 3, Insightful

    What was the gender distribution of the tweets this was tested against? If 65.9% of the tweets were from a male, the algorithm "return Gender.male;" will get the gender right 65.9% of the time...

    1. Re:Recognizing gender 65.9% based on one tweet by ahziem · · Score: 3, Informative

      55% female according to the linked paper

  7. Re:Oh this ought to be good by 93+Escort+Wagon · · Score: 3, Funny

    It would also be fun to see what it would do with my lesbian friends, many of which are immense tomboys.

    I guess I don't quite see what their weight has to do with anything...

    --
    #DeleteChrome
  8. Gender Inequality by FrootLoops · · Score: 3, Insightful

    From the paper, in their data set 47.7% of tweets were from females, 32.8% were from males, and the rest was unspecified. Tossing out the unspecified ones, guessing "female" all the time would then give ~59% accuracy. On the surface that makes the 65.9% figure in the summary very lackluster, though better figures are reported with more information elsewhere in the article.
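
The ~59% figure in the comment above can be checked with a few lines of arithmetic. This is only a sketch using the two percentages quoted in the comment; the variable names are made up for illustration and nothing here comes from the paper's own code.

```python
# Shares of tweets by self-reported gender, as quoted in the comment above.
female_share = 47.7
male_share = 32.8

# Drop the unspecified remainder and renormalize over the labeled tweets only.
labeled_share = female_share + male_share
always_female_accuracy = female_share / labeled_share

# Margin of the reported single-tweet accuracy (65.9%) over that constant guess.
margin = 0.659 - always_female_accuracy

print(f"Always guessing 'female': {always_female_accuracy:.1%}")
print(f"Margin of the one-tweet classifier: {margin * 100:.1f} percentage points")
```

Under these numbers the constant guess lands a little above 59%, which is where the "lackluster" reading of the 65.9% figure comes from.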

    1. Re:Gender Inequality by Demogoblin · · Score: 3, Informative

      From TFA (http://images.fastcompany.com/upload/a_variousfields.png):

      Feature: Accuracy
      Baseline (Female): 54.9%
      One tweet: 65.9%
      Description: 71.2%
      All tweets: 75.8%
      Screen Name: 77.1%
      Full Name: 89.1%
      Tweets + screen name: 81.4%
      Screen name + description + all tweets: 84.3%
      All four fields: 92%

      Honestly, 77% based on screen name alone was the most interesting result to me.
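
To compare the rows of the table above on equal footing, one option is to express each accuracy as a gain over the 54.9% baseline. The sketch below just re-keys the figures quoted in the comment; the dictionary and variable names are illustrative assumptions, not anything from TFA.

```python
# Accuracy figures (in percent) copied from the table quoted above.
accuracy = {
    "One tweet": 65.9,
    "Description": 71.2,
    "All tweets": 75.8,
    "Screen name": 77.1,
    "Full name": 89.1,
    "Tweets + screen name": 81.4,
    "Screen name + description + all tweets": 84.3,
    "All four fields": 92.0,
}
baseline = 54.9  # always guessing "female"

# Gain over the baseline, in percentage points, rounded to one decimal.
gain = {feature: round(value - baseline, 1) for feature, value in accuracy.items()}

# Print from the smallest gain to the largest.
for feature, points in sorted(gain.items(), key=lambda item: item[1]):
    print(f"{feature}: +{points:.1f} points over baseline")
```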

  9. Re:The only reason for the deduction is... by wierd_w · · Score: 4, Insightful

    Not entirely true I am afraid.

    Several experiments were conducted in the 60s and 70s on children raised in gender neutral parenting conditions, that focused on toy choices.
    The experiment was intended to show the impact of societal imperatives on children and gender identities and gender-specific behaviors, using toy preferences as the metric.

    The result of the test STILL had little girls favoring dollies with bright colors, and boys favoring machines and soldier type toys, even when very carefully imposed gender neutrality parenting was in effect, even from very young ages.
    This is somewhat reinforced by more modern research into the physiological differences between male and female nervous systems.

    The idea that men and women might intrinsically focus more on different concepts (and thus, relate to their environments differently from each other, and as such, describe them differently in literature) is not really all that far-fetched.
    It is simply politically incorrect to state that women might actually have a biological proclivity toward being the "Domestic" partner in relationships given the current political climate of our western post-suffrage societies.

    Somehow, "Staying home, taking care of babies, and doing the chores all day." is seen as a degrading thing, while "Standing in an assembly line inserting part A into assembly B ad nauseam all day" is somehow seen in an idealized fashion as a kind of "Freedom", however sick that might be in reality.

    Now, if you want to complain about women being statistically paid less than men, I will strongly support your argument that it (the practice) is based on pure bull--- But the statement that men and women are innately gender neutral and get conditioned exclusively by stereotypes? That is not supported by behaviorists.

    Gender stereotypes simply reinforce already existent behaviors, for better or for worse.

  10. Re:The Phrase to Type by Skidborg · · Score: 2

    Sneakers and Steel-toed boots. We apparently have different jobs.

    --
    Supporter of the +1 Over Dramatic mod option. In memory of apk.
  11. Number 1 Clue by Greyfox · · Score: 2, Funny

    It's pretty easy to tell if she often tweets about her penis.

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  12. Re:man vs. machine by snowgirl · · Score: 3, Informative

    True. But there are people who are good at identifying those situations where the gender doesn't match the behavior. In real life, it's called 'gaydar'. Online, it could just be a phony picture and a poser.

    The gender-behavior mismatch is evident (I've been told) from the writing of the subjects in question. Not just the choice of words or little hearts where the periods should be, but based on the style of writing and subject matter. Apparently, a transcript of a conversation (or series of e-mails) between individuals produces a more accurate determination than an essay.

    Yes, humans widely use language differently based on their own subcultures. Women particularly in some cultures speak an entirely different language from the gender-neutral language spoken by everyone. In some languages such as Japanese gendered language is extremely readily apparent, and when I was chatting on Japanese chatrooms, it was nice to be able to identify the gender of the speaker in one or two lines of text from them.

    In much the same way, while we often are of the belief that men and women use language the same way in English, because it's not readily apparent, we do actually use language differently. Here is another interesting one: women use fewer contractions than men. Weird but oddly true.

    All of this has less to do with "gaydar" than that every subculture speaks a slightly different dialect. Gay men have a selection of words that set them off, (I actually commented to a gay-rights group, where I was an "ally" of gay-rights, that they were using "fabulous" like... A LOT. And I was all, "um... do you REALLY want to be projecting the notion that this stereotype is valid and accurate? Because that is what you are doing.") and this does not mean that gay men talk like women. They actually talk differently and distinctly from women, but in this world of false dichotomies that we live in, we presume that if gay men don't talk the same way as straight men, then they must talk like women. But, in reality, this isn't actually correct.

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  13. Re:Hyperbole and Male Language Use by CronoCloud · · Score: 2

    Now THAT is insightful. That would explain things. It would explain why some people told me over the years that I "talked like a girl" because I spoke properly, and precisely in that nerdy way. By my standards, most men are sloppy speakers. Even my sister pointed this out to me at a drive thru some years back, she said most men would say "I wanna burger, fries, and coke." and then stop and drive on, while I said, "I would like a hamburger, medium order of fries and medium coke, please, and that will be all (to prevent the annoying upsell for dessert or anything else)"

    Then again, I am transgendered, and that might affect things alongside the nerdy precision.

  14. It's even worse by Moraelin · · Score: 2

    It's even worse. The initial assumption was that 55% of the users were female, so basically a hardcoded 'return "female";' would already guess with 55% accuracy. Bumping it to 65% is actually only a bump of 10 percentage points.

    But that assumption is purely based on what people declared on their account on Twitter, i.e., basically trusting that everyone who labeled themselves "female" is actually female, and everyone who labeled themselves "male" is actually male. The caveat there need not be detailed.

    Basically, they have 100,654 female users, 83,075 male users, and 53,817 unspecified. Taking the known ones, there are 183,729 users of known gender. (With the caveat in the previous paragraph.) Out of that, the probability to be a female is about 55%.

    BUT if they guess at individual tweets, then it's pretty much the number of tweets from each that counts. There were 2,429,621 tweets from (self-labeled) females vs. 1,672,813 tweets from (self-labeled) males, not counting the unspecified accounts. That totals 4,102,434 tweets with "known" gender. Out of those, the tweets from "known" females were a bit over 59%.

    So basically an algorithm which takes one tweet and just does a hard-coded 'return "female";' would be right over 59% of the time. Bumping that to 65% is such a ridiculously marginal effect that, really, it's funny.
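
The two baselines above (per user vs. per tweet) can be reproduced from the counts given in this comment. A minimal sketch, assuming those counts are quoted correctly; `majority_share` is a hypothetical helper written for this example, not something from the paper.

```python
# Counts of self-labeled users and tweets, as quoted in the comment above.
female_users, male_users = 100_654, 83_075
female_tweets, male_tweets = 2_429_621, 1_672_813


def majority_share(a: int, b: int) -> float:
    """Accuracy of always guessing whichever of the two classes is larger."""
    return max(a, b) / (a + b)


user_baseline = majority_share(female_users, male_users)
tweet_baseline = majority_share(female_tweets, male_tweets)

print(f"Always 'female', per user:  {user_baseline:.1%}")
print(f"Always 'female', per tweet: {tweet_baseline:.1%}")
```

The per-tweet baseline is the relevant one when the classifier is scored on single tweets, because users who tweet more contribute more test cases.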

    And actually what worries me is not as much the research grants, as the hordes of morons who don't understand the ecological fallacy (extrapolations from whole population "ecological" studies to individuals are stupid) and who'll take this as some infallible identi-kit or worse, as a scientific justification for sexism. Even the summary makes strong claims of outing males pretending to be females, or that flat-out "women use language differently than men". No they don't really. The difference is marginal, and there is massive overlap between any word's usage by males and females.

    E.g., one of the "strongly male indicators" they churned out is using the word http (presumably tweeting a link?), where for any given instance of it the probability of the user being female is actually 50.6%, according to their table. So it's really a 50-50 split on the use of this word. One of the few actual strongly male words was Google, but even there it's only a 2/3 and 1/3 split between male and female. Conversely, strongly female stuff like mentioning "love" was basically still a 2/3 and 1/3 split in the other direction.

    Not that it will stop morons from taking it as some scientifically proven rule that women talk about love and cute stuff, and guys talk about http and Google. And that, for example, therefore we need to hire fewer women in IT.

    --
    A polar bear is a cartesian bear after a coordinate transform.
  15. Accurate by ZombieBraintrust · · Score: 2

    Don't most people pretending to be female on twitter fill their tweets with stereotypical female language? This would only catch pretenders who are really lazy and incompetent.