Software Sorts Electronic Evidence
securitas writes: "The New York Times has a very interesting article about the legal industry using new search software to sort through electronic evidence such as e-mail, documents and recovered files, and the process that they go through to make the evidence usable. It has spawned an industry."
Lawyers discover grep?
The legal industry probably generates 50% of the Fed Ex traffic ;P Shipping all those reams of paper back and forth.
Software to sort intelligently has been around for quite some time. Google's "I'm Feeling Lucky" link is probably a great example of this. The number of times I've hit what I'm looking for with a few judicious '+'s using Google is unbelievable.
Sounds almost like they are reinventing the wheel...
This is why it is critical to have a retention schedule so old emails don't come back to bite you. I haven't seen employees take this issue seriously even when there is a clear policy.
Hey, where am I supposed to type my nytimes registration blah blah?? I guess you got the link wrong; please be more careful next time!
I thought the most interesting part was: "Responding to a request for documents during a merger review by the Federal Trade Commission took one company nearly a year because the electronic documents were kept in offices all over the world and in all manner of different formats". But the twist there is: would you like to have everything easy to find (and if so, how do you do that, what with all the fragmentation between Office 97 and Office 2000, not to mention different file servers and such)? Or are you better off making it inconvenient for other litigants to get at the data?
*golf clap*
-- Brett
Wonder if they would be kind enough to run that against my email box and sort out all the spammers for me? Then I could take it to court to request compensation for the bandwidth consumption as well as "emotional damages" because of all the pron spam :)
Electronic Frontier Foundation for online civil rights information
For the convenience of everyone who likes using the registration link, here ya go.
;)
-- Brett
The continuing investigations into the Clinton presidency ran into a major problem. Every email sent to, and within, the administration was saved. But not indexed. Apparently there are terabytes of the stuff. On tape. Sorting through it looking for evidence is likely to take years. And then it has to go to the national archives. And that's just the email. Add in all the other digital content they generated, all of which had to be saved, and there's a major problem.
Best Slashdot Co
To do all the stuff we think is too boring, too difficult or too time consuming. Perhaps this will speed up things in the process of investigation, in which case it is only good. Of course.
I can see a potential for even more widespread abuse. Couldn't searching for keywords give some bloodthirsty prosecutor the ability to present a biased, subjective, out-of-context version of what was communicated? We already know of several instances where a lack of understanding of the technology coupled with a lack of understanding of the context under which a message was communicated has led to abuse by those in positions of authority.
You're using her as bait, Master!
Here's the correct link for those who'd like to register before they read the article.
Yours Sincerely, Michael.
when lawyers, attorneys and judges have been replaced by computer programs.
Lawyer Eliza: (Walks up to the witness) So how are you today?
Witness: I am fine thank you.
L Eliza: How long have you been fine thank i?
Witness: I don't understand the question...
L Eliza: Don't you really understand the question?
Witness: That's right.
L Eliza: Is it really that right?
Prosecutor Eliza: Objection Your Honor! She is harassing the witness!
Judge Eliza: Why are you concerned about my honour she is harassing the witness?
and so on...
Will work for bandwidth
I'm not looking for a karma whore, but could anyone see for sure what the product is? or what the company is?
The only company I see is OnTrack Data International. Anyone?
I'm interested here cuz I do tech/litigation support in a law firm.
The article does miss one important little detail. The first level of sorting is done by clerks or paralegals. Associates do the law-related grunt work, but that's AFTER someone making $10-$15/hour has gone through and sorted out the pr0n (trust me, lawyers get A LOT!) and other pointless crap.
Great, now the NYTimes will see archive.nytimes.com getting /.ed and will realize that people use it to avoid registering.
I worked on a prototype very similar to this for my senior project in college.
I worked for one of the state level governments in N.A. and had access, and "da-buck-stops-here" responsibility for the IT side of "Archives". Archives is legislatively required to hold in permanent storage, "All materials relating to the ongoing business of the government". This caused some real problems:
1) we had a case of an outgoing elected official low level formatting their HDD on the way out the door. Had to be sent out to a special data recovery lab. (they can do some amazing things with scanning electron microscopes on half tracks and such)
2) there are stacks and stacks of 8" floppy disks, in formats like IBM DisplayWriter, and other chunks of physical hardware that haven't been seen by mortal man in 20 yrs.
Finding a chunk of info is damn tricky, but after you find it, you have to find something that can read the punchcard/papertape/magtape/floppydisk/harddisk in question. And due to a quirk in how the original act was written (keeping in mind that these things were written back when data was carved on rock slates and format wasn't a big consideration) we were required to keep it in its original form.
I feel for someone with my job in 50 yrs. I ran away from govt work after that. It was scary!
One plus side. EMP has a hard time taking out papertape!
On the whole, I find that I prefer Slashdot posts to twitter ones because I don't get limited to 140 chars before
This is definitely an interesting bit. I am not sure if it's a good thing or a bad thing. Of course, yes, it will be easier to nab the bad guys. But for the good guys who commit trivial crimes (occasionally downloading pirated software), it can be a way to track down individuals who downloaded something questionable. Yes, I may seem a bit paranoid, but every cloud has its silver lining, and every silver lining its cloud.
The leader in the industry is a company called Megaputer, and their clients included the US government, Boeing, the CDC, and many large companies.
---
"Of course, that's just my opinion. I could be wrong." --Dennis Miller
I know the tendency for unix enthusiasts to believe that you invented every useful technology in the 1970's, but it's simply not the case. Grep simply isn't suitable for interactive searches of gigabytes of data broken up into millions of files.
To efficiently work with several years' worth of email, more advanced techniques are required. Specifically, you need a text indexing program tied to a relational database. While this doesn't give you any more power than recursive searches using the grep and find combo, it's much, much faster, as your keyword and message attributes can use b-tree index lookups and a cost-based optimizer to reduce disk reads.
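To make the indexing point concrete, here is a minimal sketch in Python of the general idea: build the index once, then answer each keyword query by lookup instead of rescanning every file. It is an in-memory inverted index, not the b-tree-backed database setup described above, and the names (`build_index`, `search`, the sample `messages`) are made up for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(messages):
    """Map each word to the set of message ids that contain it.

    `messages` is a hypothetical dict of {message_id: body}.
    """
    index = defaultdict(set)
    for msg_id, body in messages.items():
        for word in tokenize(body):
            index[word].add(msg_id)
    return index

def search(index, query):
    """Return the ids of messages containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the first word's postings and intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Invented sample data, standing in for a real mail archive.
messages = {
    1: "Please shred the merger documents before Friday.",
    2: "Lunch on Friday?",
    3: "The merger review documents are on the file server.",
}
index = build_index(messages)
print(sorted(search(index, "merger documents")))  # prints [1, 3]
```

The cost of reading every message is paid once, at indexing time. Each later query touches only the posting sets for its keywords, which is where the speedup over a recursive grep comes from.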
That being said, it's still not that impressive of software. I'm certain that I could build the search component in a couple of weeks using Microsoft SQL Server (with the neat full text indexing feature) and a moderately adept gui developer could hammer out a decent interface in the same amount of time.
Still, there's a difference between "trivial to implement with any decent RDBMS" and "I can do it with a 2 line bash script". You would do well to remember it.
Your friend,
--Shoeboy
I am all for the government writing any kind of search program, block program, security program or what have you for whatever purpose it wants.
If we really want the Internet to permeate into our lives, then it should go into our lives as they really are. Perhaps some people will be less wary about leaving evidentiary data lying about on the Net.
When we decry censorware, or searchware or whatever, we are decrying a social use of technology and not the technology itself. Rather than stifling the development of search technologies or other supposedly "authoritarian" tech, we should be adding to the debate about what kind of a society we live in.
I will be writing a variant of this for a controversial website soon, in support of rigidly restricted appliance computers and limited-access proprietary content AOL style networks developing alongside the open Internet. In this society we have prisons, in which the prisoners can't use the Internet much because the software and hardware that would allow them to use it within prison rules (reliably, monitored by non-technical prison officials) does not exist.
I would rather the educational and self-betterment resources available on the Net be extended to prisoners with the blessing of prison officials, so prisons which have lost their education budgets can restore these services cheaply.
Goat sex free since 2001
In reality, the biggest difference between grep and so-called "forensics" software is the emphasis on examining the data without modifying it and maintaining the chain of custody and audit trail. In fact, many experienced computer investigators do their jobs with little more than dd, grep, and various other Unix utilities. Most of the digital forensics software out there simply attempts to make this functionality more accessible to your less tech-savvy investigator. (The problems caused by inexperienced/unqualified investigators performing this type of analysis are beyond the scope of this response.)
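As a rough illustration of "examine without modifying, and keep an audit trail", here is a minimal Python sketch. It is not how EnCase or any other real tool works. It only shows the basic habit: open the evidence read-only, record a cryptographic hash, and log who looked and when, so that a later hash can show the data is unchanged. The function names, log format, and file names are all assumptions made up for this example.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks, opening it read-only so it is never written to."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def log_examination(path, examiner, log_path):
    """Append one JSON line recording what was hashed, by whom, and when."""
    entry = {
        "file": os.path.abspath(path),
        "sha256": sha256_of(path),
        "examiner": examiner,
        "utc_time": datetime.now(timezone.utc).isoformat(),
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry

# Demo on a throwaway file standing in for a disk image (hypothetical names).
workdir = tempfile.mkdtemp()
evidence = os.path.join(workdir, "image.dd")
audit_log = os.path.join(workdir, "audit.log")
with open(evidence, "wb") as f:
    f.write(b"hello")

before = log_examination(evidence, "examiner-1", audit_log)
after = log_examination(evidence, "examiner-1", audit_log)
print(before["sha256"] == after["sha256"])  # prints True: the file is unchanged
```

In practice the hash would be taken of an image made with something like dd, and all later work done on a copy. The matching before and after hashes are what let an examiner say the original was not altered.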
I am currently the designer and project lead for a cross-platform open source (GPL) digital evidence processing suite. It is intended to bring together the various functionalities required to perform this type of work, and (ideally) operate on whatever platform the investigator desires. Our primary development platform is RedHat 7.1.
There are currently software packages out there that attempt to do this, including EnCase and The Forensic Toolkit in the commercial arena and The Coroner's Toolkit in the open source arena, however they lack the broad filesystem support and/or true ease of use to make them usable by everyone. The other barrier is price as EnCase, for example, costs thousands of dollars per copy.
We're well funded, and have already done a significant amount of work. We have some of our core components functional and plan on starting beta testing and releasing our first code drop later this year. If this field interests you and you'd like more information, or you work in the investigative field and have thoughts on what you'd like to see in such a tool, I'd love to hear from you.
---
All my evil schemes are copyright protected and encrypted.
The lawyers cannot learn of my deeds without creating a circumvention device. If they find out what I have done they will go to jail for much longer.
It seems to me that this could also bring about another niche for programmers... software that goes through a company's systems and completely wipes out data. I'm sure this would cause havoc with a company's ISO procedures, but with more and more companies getting caught because of e-mail or other data that is archived on their tapes, I could see the need to make sure that they have control over the "evidence" that exists on their own systems.
I'm not saying it's the right thing to do, or even that it's legal, but since most companies are not even aware of what data they do have backed up, and what is retrievable and what isn't, I could see this happening, if it hasn't already.
w o w... that is about all that I have to say about this. Why are they even worried about this?
Admin - www.newspad.org
NewsPAD - the daily news source for geeks!
If this software is so good at finding "hot" (i.e. incriminating or embarrassing) documents, how long before the virus writers "discover" the same techniques? Rather than just SIRCAM'ing out a random file out of the My Documents folder, spider the whole hard disk, and all reachable network drives, and selectively mail out those items that score high on a "hotness" scale. This would make opening those SIRCAM attachments (using a Linux office suite, for safety...) much more rewarding...
About a half dozen lawyers spent weeks at a time in the period leading up to the trial in 1998 wading through many thousands of pages of printed electronic documents. "It was a lot of paper," said one former government lawyer who worked on the case.
Which explains a lot about why litigation is so expensive! :) What I find humorous, being in the jury pool for our county's Superior Court, is that we (as a society) can afford to pay a half dozen lawyers to sit around poring over printed e-mails, but can't afford to validate parking for jurors. Assuming you can prove the validity of your evidence, I can see how a method to automate this would be very attractive.
One criticism of the NYT article is that it makes it sound like the legal profession just yesterday caught on to digital discovery and forensics. Although there are always some Luddites out there, lawyers who do major commercial or product liability litigation have been using digital techniques for years.
As far as user-friendly interfaces for forensic-ware, and other suggestions by comment-posters for improving the technologies, don't forget that in order to be useful to a lawyer, digital forensic evidence must be admissible in a court of law. Nobody is going to settle a lawsuit based on some damning piece of deleted email recovered from their hard drive, unless you convince them that the jury trying their case is going to see a big blow-up poster of all the bad things they said in it. In order to get that recovered data into evidence (at least in the USA), the lawyer must "lay a foundation" that the evidence has some reliability.
An eyewitness to an event, for example, can testify about things she was able to see or hear from her particular location, but her testimony about what might have been happening out of her eye-earshot is not admissible in court. Another way to lay a foundation is through a qualified expert opinion, for example, an accident reconstruction expert who measures the skid marks and applies a scientific method to determine whether the car was speeding before the accident. The point being, even if I as a lawyer could read up on the relationship between skid marks and vehicle speed, make those measurements on my own, and perform the calculations just as accurately, that would not do me a bit of good. I would still have to go out and retain someone with considerable expertise in such matters in order to get the court to admit the results of the calculations into evidence, or I never get to put them on my blow-up poster for the jury.
And this is not just a gimme. Especially in federal court, there are specific criteria for the qualifications that an expert must have, and the demonstrated reliability of the expert's method, before the results can be admitted in court.
So for those of you who are devising tomorrow's user-friendly forensics - a warning. No matter how point-and-clicky you make them, my lawyer colleagues and I will likely never touch them. Even though I am technically literate enough to grep anything you can grep, I'll keep on hiring one of you technical experts when I need some digital forensics done, because I need your experience, credentials and signature to convince a court that the results are reliable and not just wishful computer hokey-pokey by a lawyer who wants her client to win. (Also, lawyers don't testify in their own cases, as a rule, for various reasons.) This is especially true with things that *sound* somewhat unreliable, like recovering from low-level formats and such. The more extrapolation and guesswork is involved in the "recovery," the less likely it is to get into evidence.
And if you're developing a search method, or some other new technique for data recovery, keep in mind that in order to qualify yourself and the technique as proper expert testimony, you're likely going to have to disclose quite a bit about how you did it in order to lay the foundation for admissibility. You can just throw those valuable little trade secrets and patentable methods out the window. That's another reason why legal tech forensic shops tend to rely on things like grep and dd rather than innovating - where's the big payoff? Now if you don't care about admissibility, and are just mining the hard drives of your ex-employees (or ex-spouses, or whatever) for business reasons, maybe that's a different story. But most people don't think they're about to get into a lawsuit until it happens, so I wouldn't be so sure.
No, no, no. This is not a sig.
Now that is a hit load of links.
As you can see from the Google cache, the comment displayed just fine in a standard browser. It was not deleted because it contained malformed HTML. It was deleted for no reason, save that they didn't like it. That is in direct conflict with the statements in the FAQ.
I wouldn't be so sure about this. The Google cache version wouldn't display properly in Netscape, but worked fine in IE.
From that original start, they then (allegedly) gained the interest of the intelligence services, and then the media companies and dot coms, to become the players they are today.
Oh, this just screams for people to pull a NSA Line Eater-like filler into their mail, if just to PO the lawyer at the other side of the litigation and render their indexing useless.
"Trust me - I know what I'm doing."
- Sledge Hammer
Sounds to me like electronically sifting through evidence might require breaking software protections or interpreting custom file formats. Isn't that against the law now in the USA?
I don't think the DMCA is a good idea. In fact, I think it sucks. The best way to defeat that trash legislation is to hold EVERYONE, especially the legal community, to the letter of the language.
--- -- - -
Give me LIBERTY, or give me a check.
Could you give some more detail on this? There's a few pages on the web about the theoretical aspects of recovery but very little on actual cases.