Using Facebook Data, Algorithm Predicts Personality Better Than Friends

Posted by Soulskill on Monday January 12, 2015 @11:51AM from the worst-scifi-plot-ever dept.

sciencehabit writes: A new study of Facebook data shows that machines are now better at sussing out our true personalities than our friends. One of the standard methods for assessing personality is to analyze people's answers to a 100-item questionnaire with a statistical technique called factor analysis. There are five main factors that divide people by personality—openness, conscientiousness, extraversion, agreeableness, and neuroticism—which is why personality researchers call this test the Big Five. People can accurately predict how their friends will answer the Big Five questions. ... Compared with humans predicting their friends' personalities by filling out the Big Five questionnaire, the computer's prediction based on Facebook likes was almost 15% more accurate on average, the team reports online today in PNAS (abstract). Only people's spouses were better than the computer at judging personality.

Re:2015: Still using Facebook by Anonymous Coward · 2015-01-12 13:28 · Score: 2, Interesting

So is death. If you accept the idea of the existence of God, then you probably accept the existence of some kind of afterlife and/or reincarnation. What's a lifetime of suffering, compared to an eternity of bliss? What is death, when it's the beginning of a happier time than life? God is the misunderstood parent getting their child inoculated. A moment of discomfort to avoid a worse fate later.

The idea of the Christian God as evil isn't a new one, though. The Gnostics had the same idea nearly 2000 years ago, and I'm sure that it wasn't a new idea then, either.

Re:2015: Still using Facebook by Anonymous Coward · 2015-01-12 13:36 · Score: 2, Interesting

...or... the notion of disbelieving in God is just a self-rationalization to enable one to live their life without feeling like they actually have any real responsibility for their choices.
If, when you die, that's it... you are done and over with, and none of the choices you would have made will actually have any bearing on you, then you can do whatever you want, live your life as irresponsibly as you want, in full assurance that death will enable you to escape whatever consequence might otherwise befall you.
Or maybe... just maybe.... your choices in this life have an actual eternal implication. That's a heckuva lot of responsibility, and I don't blame you for preferring to disbelieve in it, because it's dramatically easier to cope with.
Doesn't make it true, however. I'm not saying that you're wrong, only that disbelieving in God can be seen as just as cowardly an approach to life as belief in God is sometimes accused of being as a world view.

Re: Why are these factors? by Anonymous Coward · 2015-01-12 20:53 · Score: 2, Interesting

Except that the Big Five aren't orthogonal, which means they are fairly useless as a personality theory.
Nothing is going to be perfectly orthogonal, and forcing the factors to be orthogonal doesn't make the conceptual issue you seem to have any better or worse. (N.b., orthogonal means a lack of correlation between the factors. What the parent is complaining about is that each of the latent factors is meaningfully correlated with the other four, to different extents.) First, we are of course talking about an exploratory factor analysis (EFA) approach (I haven't read the article, but the 10-fold CV referenced above makes sense), and partly about the distinction between principal components and factor analysis. The Big 5 model itself has been tested using SEM and confirmatory factor analysis, and the structure of five interrelated but non-redundant latent factors has been validated repeatedly. Second, remember that EFA solved using maximum likelihood can be used to test the null hypothesis that no more factors are necessary to produce acceptable fit within the sample. Thus (although this, from a statistical fishing perspective, would be bad) we can sequentially find the minimum number of factors necessary to reproduce a correlation matrix that is not significantly different from the original sample's. Therefore, with multiple independent studies (and k-fold CV like this study did) we can say that five is pretty well empirically demonstrated.
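To make the sequential idea above concrete, here is a minimal sketch in Python. It does not fit any model. It only computes the degrees of freedom of the usual ML likelihood-ratio test for k factors on p items, ((p - k)^2 - (p + k)) / 2, and then walks through a set of p-values to pick the smallest non-significant k. The function names and the p-values are hypothetical, invented purely for illustration, and are not taken from the study.

```python
def efa_degrees_of_freedom(n_items: int, n_factors: int) -> int:
    """Degrees of freedom of the ML likelihood-ratio test for a k-factor EFA model."""
    diff = n_items - n_factors
    # ((p - k)^2 - (p + k)) / 2; the numerator is always even.
    return (diff * diff - (n_items + n_factors)) // 2


def smallest_adequate_k(p_values_by_k: dict, alpha: float = 0.05):
    """Return the smallest k whose fit test is non-significant, or None."""
    for k in sorted(p_values_by_k):
        if p_values_by_k[k] > alpha:
            return k
    return None


# Hypothetical p-values, invented for illustration only (not from the study).
hypothetical_p_values = {1: 0.0001, 2: 0.0004, 3: 0.003, 4: 0.02, 5: 0.31, 6: 0.44}

for k in sorted(hypothetical_p_values):
    # Columns: number of factors, test degrees of freedom, made-up p-value.
    print(k, efa_degrees_of_freedom(100, k), hypothetical_p_values[k])
print("smallest adequate k:", smallest_adequate_k(hypothetical_p_values))
```

Running the same test over and over like this is exactly the "statistical fishing" worry mentioned above, which is why replication across independent samples matters more than any single run.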
Now, the distinction between PCA and EFA. PCA is a technique explicitly designed to remove redundant covariation between items, and as such, the more dimensions you allow to represent the data, the better your overall fit. If you have nine items, nine principal components will capture 100% of the total nine-item variance. However, it may be that 1 PC captures 65% of the variance, 2 capture 90%, and the remaining 7 PCs make up the remaining 10%. EFA works with correlations, that is, with the variance the items share, and as such the most variance that can be reproduced is not 100%, but instead something analogous to the signal-to-noise ratio in engineering. It's a technique designed to identify and structure signals within noisy data, and therefore by default it doesn't assume everything being input is actually pure signal. Again, we're not measuring one thing chopped into 5 bits (or two, three, etc.) but 5 different things that have been repeatedly found to best fit the data when tested simultaneously, and therefore while controlling for each other. That means that the structure found represents statistically distinct latents.
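The PCA half of that can be checked with a small sketch. The Python below is only an illustration under stated assumptions: the nine "items" are simulated from two made-up latent variables plus noise (nothing here comes from the study), and the eigenvalues of the correlation matrix are found with a plain cyclic Jacobi routine so that only the standard library is needed. All function names are hypothetical.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Angle that zeroes a[i][j]: tan(2*theta) = 2*a_ij / (a_ii - a_jj).
                diff = a[i][i] - a[j][j]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[i][j] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns i and j
                    aki, akj = a[k][i], a[k][j]
                    a[k][i] = c * aki + s * akj
                    a[k][j] = -s * aki + c * akj
                for k in range(p):  # rotate rows i and j
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k] = c * aik + s * ajk
                    a[j][k] = -s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def cumulative_variance_share(eigenvalues):
    """Running share of total variance captured by the first 1, 2, ... components."""
    total = sum(eigenvalues)
    shares, running = [], 0.0
    for value in eigenvalues:
        running += value
        shares.append(running / total)
    return shares


# Simulated data (an assumption for illustration): items 0-4 load on one
# latent variable, items 5-8 on another, each with independent noise.
rng = random.Random(0)
rows = []
for _ in range(500):
    f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append([(f1 if j < 5 else f2) + rng.gauss(0, 0.7) for j in range(9)])

eigenvalues = symmetric_eigenvalues(correlation_matrix(rows))
shares = cumulative_variance_share(eigenvalues)
for n_components, share in enumerate(shares, start=1):
    print(n_components, round(share, 3))
```

With all nine components the cumulative share reaches 100% by construction, because the eigenvalues of a correlation matrix sum to the number of items. That is the sense in which PCA fit always improves as dimensions are added.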
However, that is not to say that the five latent factors do not share commonality that is meaningful (although when you run these procedures, a correlation of .3-.5 is generally pretty high, meaning at most roughly 9-25% information redundancy between factors, since shared variance is the squared correlation). If interested, and you have a sample and the required number of parameters, you can build hierarchical factor models, in which common latents underlie multiple lower-level latents, which then underlie the observed item responses. Alternatively, you could even say that there is just one personality latent, let's call it 'everything', and that only one latent underlies (it helps if you think of latents as causes of the observed variables/items) all 100 or whatever personality items, like in this study. There is a specific rotation procedure for this, the bifactor/Schmid–Leiman factor rotation.
What this will do is examine global model fit: the question of whether the regression slopes from the observed items to the common covariances meaningfully reproduce the sample's covariances; does the data here empirically validate the correlational pattern we would expect if only one informational construct were represented (measured) in the data? Next (actually simultaneously), it will estimate whether, controlling for that one believed general latent factor, there are still meaningful latents estimable from the data. So, it's asking: are there still statistically significant relationships between items, once we've rem