The Evil in E-Mail
Frenchy in Ontario writes "An Ontario university researcher is devising ways to help law enforcement agencies better pinpoint likely criminal behavior in e-mails. His theory is that people who are "up to something" are more likely to write differently than people who aren't - either by avoiding using certain words at all that could be flagged for possible criminal context (like "bombed") or to examine patterns that might indicate criminal activity - like several people e-mailing one person but not each other, which is how some criminal networks operate. There's also an interesting paragraph on why Enron's emails aren't as valuable as you might think for this sort of work."
From TFA:
Super. I'm predicting a whole lot of false positives...especially during the initial phase of this operation...
Also from TFA:
Great...so words like 'bombed' get the email flagged...as well as an absence of the word 'bombed'? So far, Skillicorn's test appears 100% sensitive...too bad it's 0% specific.
Some more from TFA:
OMG! This is the pattern of emails in my company! My whole company is a giant terrorist organization! I had no idea!
But here's the kicker...again with the quoting:
So let me get this straight...if criminals are okay with their criminal activity (like...say...terrorists), they'll 'slip under the radar'??? Great test, Skillicorn...sounds a lot like a standard polygraph test, which experienced criminals can fool at will, while innocent people fail them 50% of the time. That's what the War on Terror really needs...another inaccurate 'test' that does nothing but throw false positives.
I'm just glad that this method is so obviously stupid that it will never be implemented by our government...
Oh, wait...one more from TFA:
Crap.
____
~ |rip/\/\aster /\/\onkey
This may work well for English, etc., but may not work with other languages.
The emails you send would be encrypted instead of plaintext.
Real criminals aren't dumb; only the bad ones who get caught are.
There are no atheists when recovering from tape backup.
This line in the lead jumped out at me: We have an address "techsupport@internaldomain" which matches this pattern to a T.
--MarkusQ
P.S. Back when we were on MS-Windows, it would have been OK, because the people asking for TechSupport were often sending each other worms at the same time.
Pattern recognition has been around a long time - from analyzing the causes of infection to finding likely cheats on expense reports (and the latter uses the frequencies of certain digits, rather than looking for the text entries).
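The expense-report check mentioned above sounds like a first-digit test of the kind usually associated with Benford's law, under which a leading digit d is expected with frequency log10(1 + 1/d). That reading is an assumption, since the comment does not name the technique. A minimal sketch follows; the function names are made up for illustration, and real audits use proper statistical tests rather than raw differences.

```python
import math
from collections import Counter

def benford_expected(d):
    """Expected share of leading digit d (1-9) under Benford's law."""
    return math.log10(1 + 1 / d)

def leading_digit(amount):
    """First significant digit of a non-zero amount, via scientific notation."""
    return int(f"{abs(amount):e}"[0])

def first_digit_deviation(amounts):
    """Return {digit: observed_share - expected_share} for digits 1-9."""
    digits = [leading_digit(a) for a in amounts if a != 0]
    if not digits:
        raise ValueError("need at least one non-zero amount")
    counts = Counter(digits)
    n = len(digits)
    return {d: counts[d] / n - benford_expected(d) for d in range(1, 10)}

if __name__ == "__main__":
    # Hypothetical claims that cluster suspiciously just under 100.
    claims = [98.50, 99.00, 97.25, 95.10, 12.40, 99.99, 96.00]
    for digit, delta in first_digit_deviation(claims).items():
        print(digit, f"{delta:+.3f}")
```

A large positive deviation on one digit (here 9) is only a hint that the numbers deserve a second look, not proof of cheating.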
I do disagree with his statement about this not being useful for fighting spam - recognizing patterns in spam is already in use. The same words, or a significant number of occurrences of the same words, sent by the same person to multiple users at the same site and at different sites, or the same basic message sent to multiple users from different mailers / return addresses, might be a good indicator of spam. The challenge is how do you monitor all the traffic?
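One way to make "the same basic message from different return addresses" concrete is to compare messages by overlapping word sequences (shingles) and Jaccard similarity. This is a sketch of that general idea, not a description of any particular spam filter; the function names and the 0.5 threshold are arbitrary choices for illustration.

```python
import re

def shingles(text, k=3):
    """Set of k-word sequences in the text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def likely_same_campaign(msg_a, msg_b, threshold=0.5):
    """True if two message bodies share enough shingles."""
    return jaccard(shingles(msg_a), shingles(msg_b)) >= threshold

if __name__ == "__main__":
    a = "Buy cheap watches now at our store today"
    b = "Buy cheap watches now at our shop today"
    print(jaccard(shingles(a), shingles(b)))  # 0.5
    print(likely_same_campaign(a, b))         # True
```

This only answers the "is it the same message" part; it does nothing about the harder question raised above of how to see all the traffic in the first place.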
I'm a consultant - I convert gibberish into cash-flow.
Ah, my alma mater Queen's makes it onto Slashdot!
I don't know if using the Enron e-mails as his test material is such a good idea. Corporate malfeasance is probably not conducted the same way that every other criminal (or terrorist) network runs. At least their communication might be different, due not to a "lack of guilt" but to the fact that it's probably so easy to make a naughty memo sound like an innocent one without being obvious. After all, these memos would be mixed in with a lot of legitimate company business the conspirators are also conducting.
How does automated analysis separate a memo saying "I think we should go ahead and promote Price out of the mailroom" - which means "Have Price-Waterhouse cook those spreadsheets I sent you", from one which just leads to some dude getting promoted out of the mailroom? Of course if they are not bothering to use code words then the system might work very well.
A related trick, he says, is to examine patterns in who e-mails whom. As an example, in criminal networks it is common to find several people communicating regularly with the same person, but never with each other. This is meant to ensure that if one lawbreaker is caught, he or she is unlikely to lead authorities to too many others. But it can also be a clue to suspicious activity.
Traffic analysis is probably more promising, since you can reconstruct relationships between players with it. The traffic pattern could look like a terrorist cell, or it could look like a bunch of guys who know each other - as he says, there's a difference. But this is old news, though automating it would make snoops' lives easier.
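The "several people talk to one person but never to each other" pattern can be checked from sender/recipient pairs alone. A minimal sketch, assuming messages are available as (sender, recipient) tuples; the function names and the min_spokes threshold are invented for illustration and are not from the article.

```python
from collections import defaultdict
from itertools import combinations

def build_contacts(messages):
    """Map each person to the set of people they exchanged mail with."""
    contacts = defaultdict(set)
    for sender, recipient in messages:
        if sender != recipient:
            contacts[sender].add(recipient)
            contacts[recipient].add(sender)
    return dict(contacts)

def star_hubs(messages, min_spokes=3):
    """People with at least min_spokes contacts, none of whom mail each other."""
    contacts = build_contacts(messages)
    hubs = []
    for person, neighbours in contacts.items():
        if len(neighbours) < min_spokes:
            continue
        # Flag only if no pair of this person's contacts is in touch directly.
        if not any(b in contacts[a] for a, b in combinations(neighbours, 2)):
            hubs.append(person)
    return sorted(hubs)

if __name__ == "__main__":
    mail = [("ann", "hub"), ("bob", "hub"), ("cat", "hub"), ("hub", "ann")]
    print(star_hubs(mail))                   # ['hub']
    print(star_hubs(mail + [("ann", "bob")]))  # []
```

As others in the thread point out, a shared support mailbox or a popular friend produces exactly the same shape, so a hit here says nothing by itself about intent.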
At any rate I find this line of inquiry disturbing for civil rights reasons, but I don't believe we should attack the researcher for working on it. Academic freedom is a very useful concept and ultimately does us more good than harm, IMO.
Freedom: "I won't!"
That should keep me safe for a few days.
--
Registered .sig quotient : 1337
Personally, I can't see how this would ever work. It is typical of the attitude that "all terrorists are bad, they are all the same and we just have to deal with them all in the same way".
Isn't it obvious that different terrorists will have different styles, different levels of literacy, different levels of security awareness, different languages, different aims, different approaches - the list goes on and on. Normal emails all have these traits too. I can't imagine there is any way of applying Bayesian filtering to help with this task.
He's just using statistics to detect emails that are "different". So, anyone who isn't conforming is flagged up. Organising an anti-war protest? There you are, flagged. Say goodbye to freedom, if you hadn't already. Or encrypt all your emails, and try and persuade everyone you know to. Maybe we can make encryption widespread enough that these things are useless.
I am trolling
...or to examine patterns that might indicate criminal activity - like several people e-mailing one person but not each other, which is how some criminal networks operate.
Not to mention most social networks. Or is everyone you know equally popular?
Dr Skillicorn has obviously never done any work with or for a law enforcement or intelligence agency. After spending three years in this area working on data mining of electronic communication, I can say this fella has not done his research properly. He has failed to note that grammatical and spelling mistakes, let alone "missing" words, have become so frequent now in the SMS TXT generation that this will cause a major problem when scanning messages on this scale. I really can't be bothered to pick any more holes in this because it is time for a bacon and ketchup sandwich.
Everyone knows that you just have to check the evil bit. (Some terrorists may be sophisticated enough to tamper with the evil bit but if they use Windows, the lack of the bit will stick out like a sore thumb.)
One line blog. I hear that they're called Twitters now.
So it is no longer good enough to just have the ability to subpoena your records if you're arrested? Now the government wants to actively sort/monitor the emails of an entire nation. Hmm, I smell more violations of the rights of the people. How much more of this are we willing to accept? How much longer until dissidents start a revolution? That's right, I said it: a revolution. This sounds like a combo of search/packet sniffing software. Last I heard, PGP and RSA encryption were still unbreakable. This will NOT be effective for the worst thieves or terrorists.
-Tacitus
Government is already too invasive. I'm already forced to seek a building permit before I can erect a structure on my own property. The fines for ignoring this, (and say, having the gall to build a solar powered house which is not connected to the AC power grid, or (horrors!) a straw-bale house), are huge and the government's reasons for these laws are utterly ridiculous.
Any professor who suggests that we should be looking to monitor email content is not thinking clearly. The Government already has their nose in everything, and telling us that, "It's For Our Own Good," is NOT a valid excuse.
It's MUCH more important that people be able to make mistakes -and even die through their own faults- than live ensnared in the safe-keeping of a bunch of ignorant civil servants who are trying to build a Starfleet future where everybody dresses the same, and nobody is allowed to think or act outside a bunch of pre-set 'safe' boundaries designed for middle-class suburbanites who exist in eternal ignorance of the real world, who actually believe in the Discovery Channel, who drink milk, and live in absolute terror of anything you can't experience beyond the confines of a nice, respectable department store.
-FL
Letter from College:
Hi Mom,
I blew it and bombed the final exam. The physics
prof put the gun on my head and told me to work harder.
I could kill him. I feel like having a knife
at my throat. The anger feels like poison in my
blood, but I know it is my fault, and it is all
blamed on that virus I had been laboring with
for quite a while. I'm working on it mom! I promise
to make you proud. I can not wait to be on the subway
home to work on my final project on weapons of
mass destruction in my political science class. It's
mental terror.
Love
Your son
P.S. The powder you sent me works well for my
skin infection. Strong agent.
How many criminals are going to send plain text emails discussing criminal activities?
This is clearly just designed to appeal to the government of Police State America, probably to get more funding.
This whole obsession with 'terrorists' is just becoming tiring. There are very few 'terrorists' in the world that the Americans didn't create through their own acts of terror. If America would stop its interference in the affairs of other countries, there would probably be almost none at all outside of the White House.
- Many languages are conjunctive/agglutinating in nature (e.g. Turkish, Finnish, Swahili). This means that the words of a sentence aren't isolated (as in most European languages) but are in fact formed from 'parts' that change depending on the surrounding words. Moreover, modifying pre-/suffixes are used as inflections, e.g. for verb paradigms. This results in languages that effectively have literally billions or even an infinite number of possible "words". It is impossible to do keyword-based analysis on such languages without a full morphological parser for each language to break a word into its 'parts' - and building such a parser is a massive task.
- Chinese is the opposite: it is totally "isolating", meaning each word is distinct with no inflections, and because different characters are used for different words there are NO SPACES between words. So you cannot begin to analyse Chinese data at all unless you have a full "Chinese segmenter" to locate word boundaries.
The need to do further disambiguation further complicates all of this analysis.
There is pretty much no way for this type of analysis to be really accurate under the current level of written language analysis technologies.
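To show why segmentation and disambiguation are a problem even before any keyword matching starts, here is a toy forward maximum matching segmenter. It is a sketch under loud assumptions: the two dictionaries are tiny and made up, greedy longest-match is only one classic baseline, and real segmenters are far more elaborate.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy segmentation: at each position take the longest dictionary word.

    Falls back to a single character when nothing in the dictionary matches.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

if __name__ == "__main__":
    # Hypothetical mini-dictionaries, for illustration only.
    print(forward_max_match("我喜欢北京大学", {"喜欢", "北京", "大学", "北京大学"}))
    # ['我', '喜欢', '北京大学']
    print(forward_max_match("研究生命起源", {"研究", "研究生", "生命", "起源"}))
    # ['研究生', '命', '起源']
```

The second example is the disambiguation problem in miniature: the greedy pass grabs "研究生" (graduate student) and strands "命", while the intended reading is "研究 / 生命 / 起源" (research / life / origin). A keyword list run over the wrong segmentation would miss "生命" entirely.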
Yes, the people who are "up to something" actually write differently. Most of the time they use phrases like "validate your bank account", "please verify your credit card information", etc.
--- Eat my sig.