81% of Tor Users Can Be De-anonymized By Analysing Router Information
An anonymous reader writes: A former researcher at Columbia University's Network Security Lab has conducted research since 2008 indicating that traffic flow software included in network routers, notably Cisco's 'Netflow' package, can be exploited to deanonymize 81.4% of Tor clients. Professor Sambuddho Chakravarty, currently researching Network Anonymity and Privacy at the Indraprastha Institute of Information Technology, uses a technique which injects a repeating traffic pattern into the TCP connection associated with an exit node, and then compares subsequent aberrations in network timing with the traffic flow records generated by Netflow (or equivalent packages from other router manufacturers) to individuate the 'victim' client. In laboratory conditions the success rate of this traffic analysis attack is 100%, with network noise and variations reducing efficiency to 81% in a live Tor environment. Chakravarty says: 'it is not even essential to be a global adversary to launch such traffic analysis attacks. A powerful, yet non-global adversary could use traffic analysis methods [...] to determine the various relays participating in a Tor circuit and directly monitor the traffic entering the entry node of the victim connection.'
is to maximize bandwidth utilization with junk traffic between all connected nodes, substituting junk data for legitimate data as needed.
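A minimal sketch of the padding idea in the comment above, assuming a link that sends one fixed-size cell per clock tick. `CELL_SIZE` and `next_cell` are hypothetical names for illustration, not Tor's actual padding mechanism, and the sketch assumes cells are encrypted on the wire so an observer cannot tell junk from real data.

```python
import os
from collections import deque

CELL_SIZE = 512  # hypothetical fixed cell size in bytes (assumption)


def next_cell(queue):
    """Pick the cell to send on this tick.

    The link always sends exactly one cell per tick. If real data is
    waiting, it takes the slot; otherwise a junk cell is sent instead,
    so the observable rate never changes.
    Assumes every queued item is already exactly CELL_SIZE bytes.
    """
    if queue:
        return True, queue.popleft()
    return False, os.urandom(CELL_SIZE)


# Usage: one real cell queued, then the link goes idle.
pending = deque([b"a" * CELL_SIZE])
for tick in range(3):
    is_real, cell = next_cell(pending)
    print(tick, "real" if is_real else "junk", len(cell))
```

The cost is the one the comment implies: the link runs at full rate all the time, whether or not anyone has anything to send.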
Viable Slashdot alternatives: https://pipedot.org/ and http://soylentnews.org/
By "can be" De-anonymized, we mean "have been".
Sincerely,
The NSA
I've been repeatedly told I was paranoid regarding TOR traffic analysis by the /. hive mind. So this can't be true.
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
While I haven't read the paper, the article seems to have a reasonably big "correlation for non-victim" bar. If this means false positives, it makes this technique at least a lot less useful than the "81%" deanonymization rate that they claim. It might make it useless for anything really.
Honestly, this all seems like more headline and less news. But I do still have to read the paper.
So, apparently this "individuate" word has been around since the 17th century and somehow, after a lifetime of academic and recreational reading and watching media, I've never seen or heard it used anywhere, ever.
I attribute that to taste. Either I have the good taste not to read/watch things where it might appear, or people generally have good enough taste to avoid it, or both.
I thought Tor immediately cuts traffic to any exit found to be passing anything other than exactly what was requested.
The whole point of Tor, for those who are morally and ethically sane, is that it makes monitoring the populace orders of magnitude more expensive!
Forcing the NSA and their ilk to actually target people individually, instead of just passively collecting plain text data on everyone, is exactly what needs to happen!
Use Tor as much as possible, it is the only thing stopping complete internet surveillance.
Everything can be hacked and/or de-anonymized sooner or later. What is the point in using anti-virus and firewalls, Tor and the like? Seems everything is flawed by design.
Basically what they are saying is that you should not use Tor at home or at work, but in other places, where you don't do your normal browsing. Make normal and Tor browsing mutually network exclusive!
You can add a fingerprint without changing the data. One way is by timing. A 10 Mbps cable modem, for example, can send at maybe 50 Mbps for 100 milliseconds, then stop for 400 ms to average 10 Mbps, the speed you paid for. If I want to mark a traffic flow I'm relaying, I can send the packets out in bursts of 120KB, 60KB, 120KB, 60KB. Assuming a sufficiently uncongested network, that pattern will be visible several routers further down the line.
I've relayed precisely the data I was sent, I just modulated the rate at which I sent it.
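A rough sketch of that kind of rate modulation, assuming a relay that controls when each burst leaves. The function name `modulate`, the 500 ms period (100 ms sending plus 400 ms idle, from the cable modem example) and the all-zero payload are illustrative assumptions, not anything taken from the paper.

```python
from itertools import cycle


def modulate(payload, burst_sizes, period_ms):
    """Split payload into bursts whose sizes follow a repeating pattern.

    Yields (send_time_ms, chunk) pairs. The bytes themselves are left
    untouched; only the size and timing of each burst carry the mark.
    """
    if any(size <= 0 for size in burst_sizes):
        raise ValueError("burst sizes must be positive")
    send_time = 0
    pos = 0
    for size in cycle(burst_sizes):
        if pos >= len(payload):
            break
        yield send_time, payload[pos:pos + size]
        pos += size
        send_time += period_ms


# Usage: mark 540 KB of relayed data with the 120KB/60KB pattern.
payload = bytes(540_000)
bursts = list(modulate(payload, [120_000, 60_000], period_ms=500))
for send_time, chunk in bursts:
    print(send_time, len(chunk))

# Reassembling the bursts gives back exactly what was relayed.
assert b"".join(chunk for _, chunk in bursts) == payload
```

Anyone counting bytes per interval downstream sees the alternating burst sizes, even though the content is identical to what was sent.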
It's clear that there are significant limitations to the tested identification methods. Firstly, it requires that the server endpoint be under the control of the entity attempting identification. Secondly, the TOR *entry* node being used must be identified (if you have the resources, I guess you could monitor traffic flows from *all* entry nodes) in order for the Netflow data to be compared between the Server-->Exit Node and the Entry Node-->potential target client. Thirdly, in order to generate enough traffic to have enough collected data for correlation, large (the authors' term, they do not identify the size of the file/data required, only that downloads must last ~seven minutes to collect enough data) amounts of data must be downloaded from the server.
It's an interesting piece of work, but pulling off an identification like this requires the anonymized client to both connect to a server specifically configured to generate traffic flows that can be identified, and once connected, the client must be induced to download a "large" file/dataset. What is more, those attempting the identification must also be able to gather Netflow records from the interface(s) associated with the specific (and likely unknown) TOR entry node as well, or monitor flows from *all* TOR entry nodes.
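To make the comparison step concrete, here is a toy sketch of matching an injected pattern against per-interval byte counts of the kind flow records might yield for each candidate client. The client names and numbers are made up for illustration, and plain Pearson correlation is used as a stand-in; the paper's actual statistic, interval alignment, and handling of sampled records are not reproduced here.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no pattern to correlate
    return cov / (var_x * var_y) ** 0.5


def rank_candidates(injected, candidates):
    """Return (correlation, name) pairs, best match first."""
    scored = [(pearson(injected, series), name)
              for name, series in candidates.items()]
    return sorted(scored, reverse=True)


# Hypothetical per-interval KB counts: the server-side pattern, and what
# was seen between the entry node and each candidate client.
injected = [120, 60, 120, 60, 120, 60]
candidates = {
    "client_a": [118, 65, 110, 58, 125, 63],
    "client_b": [90, 88, 95, 91, 87, 92],
    "client_c": [70, 75, 90, 100, 80, 85],
}
for r, name in rank_candidates(injected, candidates):
    print(name, round(r, 3))
```

The top-ranked candidate is only a guess: an unrelated flow that happens to correlate is the false positive raised earlier in the thread, which is why a long download is needed to separate the real match from the noise.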
It seems to me, that while the above scenario is certainly feasible, if you can get a potential target to visit a server that's under your control and download a large file, you can probably infect the client with malware from that server, and have said malware phone home without TOR, producing a specific identification without false positives or negatives. Which would be much less resource intensive and more useful, IMHO.
No, no, you're not thinking; you're just being logical. --Niels Bohr
I WENT THROUGH 7 PROXIES GOOD LUCK.
(and I'm not yelling, I'm quoting, lameness filter)
So if you can spy on the traffic from the user to the tor entry node, and can spy on the traffic leaving the tor exit node at the same time... then you can tell that the traffic you saw going to the entry node is linked to the traffic leaving the exit node?
NO FREAKING DUH!?
Good luck being able to sniff traffic on *both* ends.
We have been doing that for years, with just a requirement for all ISPs to keep netflow data for 3 years.
Best regards,
The FSB.
But who would allow an anonymous server to modulate data back to the client computer? I mean, it's not like people are connecting to Facebook through Tor.
In other words, you're only "anonymous" if you don't matter.
I do not fail; I succeed at finding out what does not work.
Security researcher proves that knowing your plaintext password greatly increases the speed of cracking its hashed value.
Where do people get the idea that privacy is some sort of inalienable right? I'll agree that it's a civic courtesy, and certainly it's impolite to disregard another person's privacy, but to that end, I see it as more of a social contract than any sort of actual right. I would suggest that any appearance of privacy we might seem to have is actually just an illusion offered by the fact that other people are either making a deliberate choice to be polite in that regard, or else they are simply not interested enough in what we think is private for others to be bothered with it. Either way, it's not something that you can actually control... it's largely determined by what other people do or want.
File under 'M' for 'Manic ranting'
Name one 'service' on TOR that has been up for long enough to get attention and not been busted?
Just enumerate the TOR services that are clearly, without question, 100% legal and socially acceptable and there's your list.
But for the fact that it's brand new, Facebook would be on that list.
I'm pretty sure reddit probably through google analytics may have started doing this around eighteen months ago. I tested trolling them with sock puppets and they could identify my house through tor but could not differentiate between individual computers in the house. So pretty much anybody that uses google analytics probably has this capability.
I am here to clarify some misconceptions. '81%' of Tor traffic DOES NOT represent all the Tor traffic but only those that we used in our experiment, at a certain point of time. The paper primarily explores the practical challenges involved in actually carrying out a traffic analysis attack and the number shows that it can be used, but certainly NOT that 81.4% of ALL Tor traffic can be attacked. Please do not be paranoid. I have all the respect for the good work done by Tor folks.
Sambuddho
I'm not surprised. I wrote a paper back in 2003, Techniques for Cyber Attack Attribution, that listed a LONG list of ways to do attribution. This sounds like a variant combining "modify transmitted messages" and "matching streams" via timing (see the paper).
Real anonymity is HARD. If someone wants to attribute you, it's hard to prevent.
- David A. Wheeler (see my Secure Programming HOWTO)