When Metadata Analytics Goes Awry
jfruh writes "When blogger Dan Tynan started seeing lots of Latvians in his LinkedIn People You May Know list, it was pretty funny, considering he'd never been to Latvia or ever met anyone from there. But now that shadowy spy agencies are using algorithms similar to LinkedIn's to see if we're terrorists, mistakes like this are a lot scarier. From the article: 'More than ever -- and online in particular -- who you know can be more important than who you are. In fact, who somebody thinks you know may be more important than who you are, especially if that somebody is a faceless government bureaucracy with limitless power to izjaukt savu dzīvi (mess up your life).'"
I created a new gmail id to get price quotes from auto dealers. And now Google keeps telling me I might know someone named Steve Lexus and wants me to add him to my circles. Well, at least they seem to have filtered out Jane Honda and Palvayantheeswaran Toyota and Poponopoulous Mitsubishi.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
If the NSA just reverses a similar algorithm, what happens when it says that Mahmoud Ahmadinejad may know me? Especially if I have access to centrifuges.
Then I have to prove a negative, that I do not know this person. All their evidence points to the opposite. "He was in New York at the same time!" (BUT I LIVE THERE) "Doesn't matter". "Your father's cousin's uncle's former roommate went to Iran as an exchange student", etc, etc.
Silence is a state of mime.
Time to worry about the real problems affecting people's lives.
much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
Back in the day when gmail was still signing people up on an invite-only basis, I got my account from a rather prolific inviter handing them out on a forum I frequent. Gmail would automatically add the new gmail address of each invitee to the inviter's address book, which, needless to say, is where the dozens of random people came from when the inviter decided to let LinkedIn access his address book...
When FB or Amazon recommends something/someone, I can usually see some sense behind it. LinkedIn is just plain random. I don't know 95% of the people it seems to want to connect me with. It is a joke.
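Sveiki, Slashdot. Es mīlu jūs visus. Lūdzu, sūtiet man karstu putraimi. Visas jūsu bāzes ir pie mums. Jūs esat forši. (Hello, Slashdot. I love you all. Please send me hot grits. All your bases are with us. You are cool.)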
Stop using social media. Some of the crud I've seen on LinkedIn is as bad as Facebook and I do not want to be associated with it.
Side Note: If you do use LinkedIn, it is not a dating site. Some of my female colleagues have started complaining about unwanted attention. Just because she met you at that training class last month, and accepted your connection, does not mean she is interested in 'knowing' you. Sheesh.
Noting some key facts about the US terrorist watch list:
Terror watch list grows to 875,000
As of December 2012, a factsheet from the center states, TIDE contained over 875,000 entries. Each one represents a known or suspected terrorist and includes all their known aliases and spelling variations on their name, the official said.
Less than one percent, or fewer than 9,000, were Americans, including both citizens and legal permanent residents, he said, adding the center does not release exact numbers.
So if there are only 9,000 known or suspected terrorists in the US out of 310,000,000+ Americans, how much impact is that likely to have? I wouldn't necessarily expect terrorists to be highly connected to people outside of their purpose.
much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
Seriously, if you have provided incomplete and inaccurate information to LinkedIn, how is it their fault that they suggest you only bullshit? Oh, and the NSA will blame that on you too if they shoot you "just in case". Oh oops, we were 99.99% sure that he was a terrorist. What an idiot, why didn't he fill out the facebook profile completely. We did nothing wrong. :-(
Originally, when the concept of "degrees of separation" was invented, the idea was that everybody was connected to everybody through 6 degrees of separation.
At the same time, people think that they are "the good guy" who does not keep "bad company".
With social media the length of the separation chain has considerably shrunk.
Add to this that most people who "do something interesting" (like making really nice flower arrangements, for instance) will tend to travel and meet a "much smaller" crowd of people who "move around".
In this "smaller world" you can make "very short chains" to quite shady people. Actually it is trivial to create a chain from any US politician to the "big list of officially evil guys" that is at most 4 levels deep. (For instance, ex-HP head Carly Fiorina went to KSA and met large HP clients, including the heads of SBL, managed by the brother of that really bad guy who did get some support from the ex-President(s) Bush when he was against the Soviets...)
And now comes the "suspicion creep": if you know Fiorina and one or two of the Bushes, then you know 2 suspicious characters that are 3 or fewer levels away from Really Suspicious guy.
So "one" could be ok, but 2 humm very bad...
So unless you take great pains to avoid anybody that might "be out of the ordinary", you are immediately 100% sure to become somehow "in contact" with somebody "suspicious".
Or seen another way, being not completely boring gets you something like 200 contacts, among which you can expect at least 3 "super connectors" who do not really overlap, particularly if you are travelling. So even taking diminishing returns into account, it is hard to end up with fewer than 3M "level 3 contacts",
or 1/1000 of all adults in the world.
The probability that fewer than 2 of them are "bad guys" is quite low.
so be boring or be afraid, very afraid...
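The back-of-envelope numbers above can be checked with a rough sketch. Only the 3M "level 3 contacts" and the "1/1000 of all adults" figures come from the comment. Everything else is an assumption made for illustration: that "bad guys" are spread uniformly at random over the adult population (real social graphs are not like that), that the 875,000 TIDE figure quoted elsewhere in this thread is the number of "bad guys", and that a Poisson approximation is good enough.

```python
import math

LEVEL3_CONTACTS = 3_000_000             # from the comment
WORLD_ADULTS = 1000 * LEVEL3_CONTACTS   # implied by "1/1000 of all adults"
WATCHLIST_SIZE = 875_000                # assumption: TIDE figure quoted in this thread


def expected_bad_contacts(n_contacts, n_bad, population):
    """Expected number of listed people among n_contacts, assuming uniform mixing."""
    return n_contacts * n_bad / population


def log10_prob_fewer_than_two(lam):
    """log10 of P(X < 2) for X ~ Poisson(lam), i.e. of exp(-lam) * (1 + lam).

    Computed in log space because the probability itself underflows a float.
    """
    return (-lam + math.log1p(lam)) / math.log(10)


lam = expected_bad_contacts(LEVEL3_CONTACTS, WATCHLIST_SIZE, WORLD_ADULTS)
print(f"expected listed people within 3 levels: {lam:.0f}")
print(f"log10 P(fewer than 2): {log10_prob_fewer_than_two(lam):.1f}")
```

Under these (very crude) assumptions the expected count is in the hundreds, so "quite low" is an understatement: the chance of having fewer than 2 is effectively zero.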
As I used to work for an American company that has an office in Dubai (full of people with Arabic names, and lots of Muslims), a working team in India (very close to Pakistan, never mind the fact that the two countries hate each other almost as much as Chicago Bears and Green Bay Packers fans), and a development/support team in the Philippines (close to China, with a similar relationship to India and Pakistan, and with their own domestic terrorism issues), clients in sub-Saharan Africa, Russia and Texas, my LinkedIn and Facebook profiles are full of people in those areas.
Given that the NSA does not stop at analyzing your own contacts, I am apparently a person of interest if one of my contacts has any dubious friends, or if one of my contact's contacts has any dubious friends.
Kevin Bacon is indeed going to be screwed, we might as well just lock him up and start waterboarding him now, and save the NSA the trouble.
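The "contacts of contacts" reach described above can be sketched as a breadth-first search with a hop limit. This is only an illustration of the idea, not a claim about how any agency does it; the `within_hops` function and every name in the toy contact graph are hypothetical.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {person: hop_count} for everyone reachable within max_hops of start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if dist[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for contact in graph.get(person, ()):
            if contact not in dist:
                dist[contact] = dist[person] + 1
                queue.append(contact)
    return dist


# Hypothetical contact graph (adjacency lists), made up for this example.
contacts = {
    "me": ["dubai_colleague", "manila_dev"],
    "dubai_colleague": ["me", "client"],
    "manila_dev": ["me"],
    "client": ["dubai_colleague", "dubious_friend"],
    "dubious_friend": ["client", "stranger"],
    "stranger": ["dubious_friend"],
}

reach = within_hops(contacts, "me", 3)
for person, hops in sorted(reach.items(), key=lambda item: item[1]):
    print(hops, person)
```

With a limit of 3 hops, "dubious_friend" lands in my neighbourhood even though I have never heard of them; "stranger", at 4 hops, does not.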
If you are truly concerned about this problem, the real question to ask is why on earth you signed up with LinkedIn (or G+ or Facebook) in the first place.
My father always gets stopped at airports because he shares a name with a terror suspect. His first name is within the 10 most common first names in the US, his last name is within the 10 most common last names. I wonder how many other people this affects? I wonder how it may get worse if they include 3 hops of separation?
This is what I, and a host of others, have been screaming about for years. People are blindly using analytics and "big data" to make important decisions about health care, insurance, credit ratings, terrorist affiliations, etc. I have encountered so much bad data in my career that the thought of it being taken as "gospel" makes me sick. Bad data are out there, and cleaning up a polluted data stream, when possible, is expensive and takes a long time.
Then you add in the use of NoSQL database engines such as MongoDB, which are not ACID compliant. You are virtually guaranteeing data will be corrupted. But then again, maybe I "just don't get it". But personally I think contributing to bad data is unethical.
putting the 'B' in LGBTQ+
Purely speculative and all conjecture. I know nothing of the algorithms involved and make the following assumptions about the meta-data and the algorithms.
Meta-data
1: Geo-location of the event/person
2: Time of the event/person
Algorithm
3: Compare location +1 correlation.
4: Compare time +1 correlation.
5: Location compares street.
6: Time within 1 day.
Given these very simplistic assumptions, we have two people. We'll call them Good Steve and Evil Steve, abbreviated to GS and ES for the purpose of the demonstration. They have never met, never seen each other. GS lives at Street A #15 and is homebound; ES works at Street A #12 and is plotting embezzlement.
Day 1, City A, Street A #12: ES makes 4 calls, which get logged.
Day 1, City A, Street A #15: GS makes 2 calls, which get logged.
Correlation between ES and GS: 6.
Already, the correlation between ES and GS is 6 after one day. Because they're on the same street, just a few street #s away from each other.
Suppose this goes on in the same way for a few months. Say 3. The correlation is 540 after three months. Now, say that the person that ES was calling has half that, assuming calling the same person. In the ensuing metadata analysis after the embezzlement is discovered, there is a link formed between ES and GS that is GREATER in this admittedly VERY simple model than that of ES and the person ES was conspiring against. Another example, say this sort of thing happened but ES called a bank, and GS called the same bank after or before ES. There becomes a tenuous link between the bank, ES and GS based on both location and time and even number called, a stat not directly recorded by this algorithm.
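The toy scoring above can be sketched in a few lines. The post does not spell out exactly how the 6 is counted, so this is one reading that reproduces its numbers, stated as an assumption: every logged call by one person scores +1 if the other person has at least one logged call on the same street on the same day. The `Event` type and the `correlation` and `simulate` helpers are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    person: str
    street: str
    day: int


def correlation(events, a, b, max_day_gap=0):
    """Assumed rule: +1 per event that has a same-street, close-in-time event from the other person."""
    a_events = [e for e in events if e.person == a]
    b_events = [e for e in events if e.person == b]

    def has_match(event, others):
        return any(
            o.street == event.street and abs(o.day - event.day) <= max_day_gap
            for o in others
        )

    return (sum(has_match(e, b_events) for e in a_events)
            + sum(has_match(e, a_events) for e in b_events))


def simulate(days):
    """ES logs 4 calls and GS logs 2 calls per day, both on Street A."""
    events = []
    for day in range(1, days + 1):
        events += [Event("ES", "Street A", day)] * 4
        events += [Event("GS", "Street A", day)] * 2
    return events


print(correlation(simulate(1), "ES", "GS"))   # one day
print(correlation(simulate(90), "ES", "GS"))  # roughly three months
```

Under this reading the score is 6 after one day and 540 after 90 days, matching the figures in the post, even though the two Steves never interact. Note that street-level resolution is doing all the damage: nothing in the score distinguishes "neighbours" from "associates".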
The actual reality should be far more complex, but I would imagine a meta-data analysis would rely on more rules with finer resolutions among other things... At least I hope so, so that the probability correlations of a connection between two people or a person and a group of people is more solid and worth investigation than the example I demonstrated as a worst case scenario.
In some cases metadata can be useful, but I do not think it is a reasonable, serious leg for investigations to stand on. Certainly it is useful in an investigatory sense to draw lines between connected people and groups, but an investigation is necessarily an activity that takes place AFTER something has gone down that requires investigation. It is NOT for government to do an ongoing investigation into its citizens without due cause, oversight and a full accounting after the fact.
To do otherwise would be to invite the temptation to use the knowledge and insight such an ongoing investigation would make available to tamp down on things the government in power would really not prefer to allow. It's not hard to imagine a far religious right government doing so, but we must also be wary of the far left as well. To allow the left also to tamp down on private and personal freedoms would be as bad as the far right doing the same.
To make it more amenable to the lovers of LOTR out there...
It is analogous to Frodo offering the ring to Gandalf. Here's the quote:
I dare not take it. Not even to keep it safe. Understand, Frodo. I would use this ring (knowledge) from a desire to do good... But through me, it would wield a power too great and terrible to imagine.
They are searching for patriots who may decide that they prefer some form of actual democracy to oligarchy.
My work requires me to keep up to date with the computer industry. This means I must be connected with the hacker sites, and ipso facto it also demands I am 1 degree separated from Mr. Snowden and many others of whom the US Government takes a dim view. Get real, people: mere contact isn't criminality; in the investigator's case it is a necessity. This is why the whole concept of Probable Cause is such a necessity!
"You" might "want" to "back off" on the "quotes".
This would be easy to abuse:
1. Become someone's friend. Politicians certainly want to have lots of friends. NSA agents are probably a lonely bunch...
2. Make contact with known criminals, mafiosi and terrorists. (Not that hard, terrorists like those who sympathize with their cause...)
Your marks are now connected to all sorts of lowlifes through only one link - you! A fake profile might be handy if you don't want yourself associated with terrorists...
So if I were a terrorist network, I would take a few new recruits, preferably of the "innocent-seeming" sort and dedicate them to fouling the system. I would have them keep their innocence, send them to no training camps, rather send them out onto social media and have them be as social and friendly as possible. I'd have them make as many connections as possible and get well entrenched in the system.
Next I'd make sure they make online associations with known terrorists - enough online associations so it can't be missed by any sort of metadata search. At the same time I'd keep the rest of their outward appearance innocent, and keep them in close touch with all of their friends.
Assuming I had a dozen or so of these people working for me, I'd also have them cross-link some of their contacts, so that most of their truly innocent friends had multiple friends inside my organization - cross-linking.
Consider the NSA 3-degrees-of-separation exploitable. Snarl the system. Cause innocent people to be annoyed by the NSA / watch list / no-fly list.
The living have better things to do than to continue hating the dead.
I have heard it said, perhaps apocryphally - If you look at the birth and death records for the State of Florida, you will conclude that a majority of people in that state are born Latino and die Jewish. Having reams of data is a start; but you must also have an accurate model.
It's not as if it's binary. It's not a know vs don't-know that makes for a link in this maze. There are weights attached to all links in graphs and what determines these weights is what makes the algorithm -- not the mere presence or absence of a link. As an extreme case, what if you considered yourself linked to everyone, just with links of weight 0? This whole "we are all connected" crap is not very meaningful without the subtle answer to the question "how much?"
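A minimal sketch of the "how much?" point above. How weights would really be chosen or combined is not stated anywhere in this thread; multiplying per-link weights in [0, 1] along a chain is just one simple assumption, and the people, weights and `path_strength` helper are all made up for illustration.

```python
def path_strength(weights, path):
    """Multiply link weights along a path; a missing link counts as weight 0."""
    strength = 1.0
    for a, b in zip(path, path[1:]):
        strength *= weights.get(frozenset((a, b)), 0.0)
    return strength


# Hypothetical undirected link weights between 0 (no tie) and 1 (very close).
weights = {
    frozenset(("alice", "bob")): 0.9,            # talk daily
    frozenset(("bob", "pizza_shop")): 0.05,      # occasional takeout order
    frozenset(("pizza_shop", "suspect")): 0.05,  # suspect also orders pizza
}

chain = ["alice", "bob", "pizza_shop", "suspect"]
print("hops:", len(chain) - 1)
print("strength:", path_strength(weights, chain))
print("direct link:", path_strength(weights, ["alice", "suspect"]))
```

Counted in hops, alice is "3 degrees" from the suspect; counted by weight, the chain is worth about 0.002, and the direct link is exactly the weight-0 case described above.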
Any guest worker system is indistinguishable from indentured servitude.
The "mess up your life" translates as "sabojat tavu dzivi". Right now it says "mess up one's own life". And no, slashdot is still not friendly to non-latin alphabets.
TLAs that do their own analysis have some experience at separating random links from meaningful ones. Many terrorists call for pizza delivery, so that phone number is a dead end. Since meaningful links are actionable (maybe even with a SWAT team), it pays to distill this list down to a minimum.
Have gnu, will travel.
I find the situation where a government treats the whole world as suspect quite objectionable.
Privacy is terrorism.
Except you would not need to do anything. Nobody lives in a vacuum, so just the "natural" connectors are sufficient to "poison" the network.
....it's worse than it has been portrayed:
http://www.privacysos.org/node/1122
After all, Latvia is the home of Dr. Victor Von Doom.
My wife decided to update her Linked-In profile (to increase job contacts). She also happened to be signed into Gmail webmail (NOT using no-script, I might add). Once she was done with her updates, she got a popup asking her if she wanted to update her contacts. Linked-In then promptly sent off 70, yes 70, emails, either updating current contacts (maybe 10 of them) or asking the rest, if they knew her, to add her to their Linked-In contacts.
She was rather shocked, as there was no warning by Linked-In that she was about to send off 70 updates. Plus, most of the invites went to people from her Gmail account to whom she had never sent an invite. Some of those people she never wanted to send a request to. Most of the people did have a closer relationship with her, but not all, so Linked-In's algorithms still need some work.
She tried asking Linked-In about the promiscuous behaviour, but there was never any response.
So watch out with Linked-In, the company is doing some very strange stuff, probably via javascript & cookies. I regularly block linked-in, twitter, facebook, and yes as much as possible google scripts on most websites for exactly this reason.