Slashdot Mirror


Software Sorts Electronic Evidence

securitas writes: "The New York Times has a very interesting article about the legal industry using new search software to sort through electronic evidence such as e-mail, documents and recovered files, and the process that they go through to make the evidence usable. It has spawned an industry."

85 comments

  1. grep by astafas · · Score: 5, Funny

    Lawyers discover grep?

    1. Re:grep by CrazyBrett · · Score: 1
      % grep "incriminating evidence" /var/log/carnivore/*
      %

      Damn.

    2. Re:grep by xcomputer_man · · Score: 1

      Oh no. The First Church of Digital Grepping is becoming a government agency now?

    3. Re:grep by swinginSwingler · · Score: 1

      I'm gonna start my own consulting firm too. Start charging $250/hr and patent the whole process...

      patent #afdvg5q9j4qt
      Use of existing programming techniques to produce filtered text in a way anyone else could have thought up, only I patented it first.

    4. Re:grep by keesh · · Score: 2

      Or maybe agrep. Isn't approximate matching more useful for this kind of task?
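      Approximate matching of the kind agrep offers can be sketched in a few lines. This is a minimal Python sketch, not agrep's own algorithm (agrep uses a faster bit-parallel method). It assumes whole-word matching and plain Levenshtein edit distance, and the names `edit_distance` and `fuzzy_find` are hypothetical.

```python
# Hypothetical helpers illustrating approximate (fuzzy) keyword matching.
# Assumption: whole-word matching with plain Levenshtein distance, not agrep's bit-parallel algorithm.


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = cur
    return prev[-1]


def fuzzy_find(pattern: str, text: str, max_errors: int = 1) -> list[str]:
    """Return the words in text within max_errors edits of pattern, case-insensitively."""
    target = pattern.lower()
    return [w for w in text.split() if edit_distance(target, w.lower()) <= max_errors]


if __name__ == "__main__":
    memo = "Pleese shred the incrimnating files before the audit"
    print(fuzzy_find("incriminating", memo))  # ['incrimnating']
```

      A misspelled "incrimnating" slips past an exact grep but is only one edit away here, which is the kind of hit approximate matching is meant to catch.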

  2. Information Overload by purduephotog · · Score: 1

    The legal industry probably generates 50% of FedEx's traffic ;P shipping all those reams of paper back and forth.

    Software to sort intelligently has been around for quite some time; Google's "I'm Feeling Lucky" button is probably a great example of this. The number of times I've hit what I'm looking for with a few judicious '+'s using Google is unbelievable.

    Sounds almost like they are reinventing the wheel...

  3. policy to the rescue by Anonymous Coward · · Score: 2, Interesting

    This is why it is critical to have a retention schedule so old emails don't come back to bite you. I haven't seen employees take this issue seriously even when there is a clear policy.
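    A retention schedule boils down to a date cutoff applied to every stored item. A minimal Python sketch, assuming each message carries a sent date and a hypothetical `on_legal_hold` flag (items under a litigation hold are kept regardless of age); the class and function names are illustrative only.

```python
# Illustrative retention-schedule check; Message and expired() are hypothetical names.
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Message:
    msg_id: int
    sent: date
    on_legal_hold: bool = False  # assumption: held items are never purged, however old


def expired(messages: list[Message], today: date, retention_days: int) -> list[int]:
    """Return ids of messages older than the retention period and not on hold."""
    cutoff = today - timedelta(days=retention_days)
    return [m.msg_id for m in messages if m.sent < cutoff and not m.on_legal_hold]


if __name__ == "__main__":
    inbox = [
        Message(1, date(1999, 3, 15)),
        Message(2, date(2001, 6, 1)),
        Message(3, date(1998, 1, 1), on_legal_hold=True),
    ]
    print(expired(inbox, date(2001, 8, 1), retention_days=365))  # [1]
```

    The policy only helps if it actually runs on a schedule; as the post notes, a written rule that employees ignore leaves the old mail sitting there.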

  4. password!!! by Anonymous Coward · · Score: 0

    Hey, where am I supposed to type my nytimes registration blah blah?? I guess you got the link wrong; please be more careful next time!

  5. Yes, but what kind of electronic format? by kingdon · · Score: 2, Interesting

    I thought the most interesting part was: "Responding to a request for documents during a merger review by the Federal Trade Commission took one company nearly a year because the electronic documents were kept in offices all over the world and in all manner of different formats". But the twist there is: would you like to have everything easy to find (and if so, how do you do that, with all the fragmentation between Office 97 and Office 2000, not to mention different file servers and such)? Or are you better off making it inconvenient for other litigants to get at the data?

    1. Re:Yes, but what kind of electronic format? by shibut · · Score: 1

      I believe one of the services of these companies, in particular Ontrack, is to put all the data in one format and provide you with a "viewer" (with an Outlook feel) to view the data they regenerated. This way you can have different formats, but if you pay enough you can get it uniform...

  6. Progress! by CrazyBrett · · Score: 1
    Wow, someone actually included the "archive" link within the article. Yay!

    *golf clap*

    -- Brett

  7. Useful against spam? by M_Talon · · Score: 3, Insightful

    Wonder if they would be kind enough to run that against my email box and sort out all the spammers for me? Then I could take it to court to request compensation for the bandwidth consumption as well as "emotional damages" because of all the pron spam :)

    --
    Electronic Frontier Foundation for online civil rights information
    1. Re:Useful against spam? by Anonymous Coward · · Score: 1, Funny

      Wasn't sniffing through e-mail done by something called Carnivore, which everyone here bitched about as an invasion of privacy? Bleh, be consistent about this stuff, if you don't want something looking through your e-mail, stick with that opinion.

    2. Re:Useful against spam? by astafas · · Score: 2, Interesting

      I knew a guy who used to write code for the defense (electronic) industry. He told me that it could cost up to about $75 million to trace someone's email or telnet sessions if they did it smart enough. The problem comes in trying to get so many companies and foreign countries to cooperate with you enough to look at their logs. He said it was because there isn't much goodwill towards the US in many parts of the world.

    3. Re:Useful against spam? by M_Talon · · Score: 3, Informative

      Just to clarify, there's a difference between this stuff and Carnivore. Carnivore is/was basically a wiretap, a way to monitor ongoing communications both incoming and outgoing. What's being discussed here is a way to organize and sift through information that is archived and already subpoenaed. Apples and oranges, my friends.

      --
      Electronic Frontier Foundation for online civil rights information
    4. Re:Useful against spam? by Anonymous Coward · · Score: 0
      He told me that it could cost up to about $75 million to trace someone's email or telnet sessions

      People can tell you a lot of things, but that doesn't make them true.

  8. alternate link by CrazyBrett · · Score: 1

    For the convenience of everyone who likes using the registration link, here ya go.

    ;)

    -- Brett

    1. Re:alternate link by sydb · · Score: 1

      Damn, you beat me to it.

      --
      Yours Sincerely, Michael.
  9. The amount of evidence is a problem by wiredog · · Score: 4, Interesting

    The continuing investigations into the Clinton presidency ran into a major problem. Every email sent to, and within, the administration was saved. But not indexed. Apparently there are terabytes of the stuff. On tape. Sorting through it looking for evidence is likely to take years. And then it has to go to the national archives. And that's just the email. Add in all the other digital content they generated, all of which had to be saved, and there's a major problem.

    1. Re:The amount of evidence is a problem by PD · · Score: 2

      On the bright side, archiving this ever increasing amount of material is very expensive, and the size of the archive is increasing faster than the size and sophistication of our army. Mankind will indeed live in a machine imposed era of peace, except instead of holding us hostage with our own weapons, we will be forced to use the resources formerly dedicated to making war for the preservation of the machine records.

  10. This is what computers are for by Anonymous Coward · · Score: 0

    To do all the stuff we think is too boring, too difficult or too time-consuming. Perhaps this will speed up the investigation process, in which case it is only good. Of course.

  11. Biased Searches? by Robber+Baron · · Score: 3, Insightful

    I can see a potential for even more widespread abuse. Couldn't searching for keywords give some bloodthirsty prosecutor the ability to present a biased, subjective, out-of-context version of what was communicated? We already know of several instances where a lack of understanding of the technology, coupled with a lack of understanding of the context in which a message was communicated, has led to abuse by those in positions of authority.

    --

    You're using her as bait, Master!

    1. Re:Biased Searches? by Kenyaman · · Score: 1

      Sure. That's why we have defense lawyers as well.
      Waiting for my 20 seconds to expire....

    2. Re:Biased Searches? by Logic+Bomb · · Score: 2

      What you're worried about is nothing new; lawyers have done this using relatively (or extremely) out-of-context quotes found on paper or remembered from conversation since courts were invented. :-) Besides, both sides can use similar software, and communications that might be used to present a different perspective could be obtained just as easily.
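      One way to let a reader check the context of each keyword hit, rather than trusting a bare match, is a keyword-in-context (KWIC) listing that shows the words around every occurrence. A minimal Python sketch, assuming whitespace-separated words and a fixed window; `kwic` is a hypothetical name, not any vendor's API.

```python
# Hypothetical keyword-in-context (KWIC) helper; assumes whitespace-separated words.
import re


def kwic(text: str, keyword: str, window: int = 3) -> list[str]:
    """List every occurrence of keyword with up to `window` words on each side."""
    words = text.split()
    hits = []
    for i, word in enumerate(words):
        # Match case-insensitively, ignoring punctuation attached to the word.
        if re.sub(r"\W", "", word).lower() == keyword.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            hits.append(f"{left} [{word}] {right}".strip())
    return hits


if __name__ == "__main__":
    memo = "We should not destroy the records. Legal says destroy nothing."
    for line in kwic(memo, "destroy", window=2):
        print(line)
```

      Seen in context, both hits for "destroy" argue against destroying anything, which a bare keyword count would hide.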

  12. Link by sydb · · Score: 3, Funny

    Here's the correct link for those who'd like to register before they read the article.

    --
    Yours Sincerely, Michael.
  13. In the future by shd99004 · · Score: 2, Funny

    when lawyers, attorneys, and judges have been replaced by computer programs.

    Lawyer Eliza: (Walks up to the witness) So how are you today?
    Witness: I am fine thank you.
    L Eliza: How long have you been fine thank i?
    Witness: I don't understand the question...
    L Eliza: Don't you really understand the question?
    Witness: That's right.
    L Eliza: Is it really that right?
    Prosecutor Eliza: Objection Your Honor! She is harassing the witness!
    Judge Eliza: Why are you concerned about my honour she is harassing the witness?

    and so on...

    --
    Will work for bandwidth
    1. Re:In the future by Anonymous Coward · · Score: 0

      That sounds like part of a "Futurama" episode.
      Good thing you posted that logged in -- now you can sue him for residuals when he steals the idea.

    2. Re:In the future by Anonymous Coward · · Score: 0

      My MissDr Eliza model in Prolog could talk better than the robots in Spielberg's movie.

      It's a language thing.

      Language is just another state of mind.

  14. Name of the software? by nm42 · · Score: 2, Informative

    I'm not trying to be a karma whore, but could anyone tell for sure what the product is, or what the company is?
    The only company I see is OnTrack Data International. Anyone?
    I'm interested because I do tech/litigation support at a law firm.

    The article does miss one important little detail. The first level of sorting is done by clerks or paralegals. Associates do the law-related grunt work, but that's AFTER someone making $10-$15/hour has gone through and sorted out the pr0n (trust me, lawyers get A LOT!) and other pointless crap.

    1. Re:Name of the software? by Anonymous Coward · · Score: 0

      Yes, Ontrack Data International

      http://www.ontrack.com

      They seem like a pretty happening company....

  15. Guess I am going to have to register now. by Anonymous Coward · · Score: 0

    Great, now the NYTimes will see archive.nytimes.com getting /.ed and will realize that people use it to avoid registering.

  16. I have done this by Anonymous Coward · · Score: 0

    I worked on a prototype very similar to this for my senior project in college.

    1. Re:I have done this by Anonymous Coward · · Score: 0

      This is also not unlike what was done to identify computer systems "at risk" for Year 2000 problems. The main differences lie in:

      1. The Search Keywords used.

      2. Extending the search to "deleted" areas of media, and e-mail.

  17. The pointy end of the search problem. by Minupla · · Score: 5, Interesting

    I worked for one of the state-level governments in N.A. and had access to, and "da-buck-stops-here" responsibility for, the IT side of "Archives". Archives is legislatively required to hold in permanent storage "All materials relating to the ongoing business of the government". This caused some real problems:

    1) we had a case of an outgoing elected official low level formatting their HDD on the way out the door. Had to be sent out to a special data recovery lab. (they can do some amazing things with scanning electron microscopes on half tracks and such)

    2) there are stacks and stacks of 8" floppy disks, in formats like IBM DisplayWriter, and other chunks of physical hardware that haven't been seen by mortal man in 20 yrs.

    Finding a chunk of info is damn tricky, but after you find it, you have to find something that can read the punchcard/papertape/magtape/floppydisk/harddisk in question. And due to a quirk in how the original act was written (keeping in mind that these things were written back when data was carved on rock slates and format wasn't a big consideration), we were required to keep it in its original form.

    I feel for someone with my job in 50 yrs. I ran away from govt work after that. It was scary!

    One plus side. EMP has a hard time taking out papertape!

    --
    On the whole, I find that I prefer Slashdot posts to twitter ones because I don't get limited to 140 chars before
    1. Re:The pointy end of the search problem. by hansk · · Score: 1

      there are stacks and stacks of 8" floppy disks, in formats like IBM DisplayWriter, and other chunks of physical hardware that haven't been seen by mortal man in 20 yrs.

      Hell, just try to find a 5 1/4" drive these days.

    2. Re:The pointy end of the search problem. by RacerX69 · · Score: 1

      I used to have a cassette tape drive for a VIC-20. Whenever I had to load a program, I had to rewind the tape to a point before where I knew the file was saved. The computer would respond with "Press play to continue...". I also had to be sure to keep them separate from my regular audio cassettes. It sounded like a cat fight from Hell if I mistook one for a music cassette.

      Those were the days.

  18. Interesting.... by Anonymous Coward · · Score: 0

    This is definitely an interesting bit. I am not sure if it's a good thing or a bad thing. Of course, yes, it will be easier to nab the bad guys. But for the good guys who commit trivial crimes (occasionally downloading pirated software), it can be a way to track down individuals who downloaded something questionable. Yes, I may seem a bit paranoid, but every cloud has its silver lining, and every silver lining its cloud.

  19. This isn't new by AntiFreeze · · Score: 2
    This process has been around for years, and is still being refined. It is referred to as "text-mining" and there is some spectacular software out there to accomplish these tasks.

    The leader in the industry is a company called Megaputer, whose clients have included the US government, Boeing, the CDC, and many large companies.

    --

    ---
    "Of course, that's just my opinion. I could be wrong." --Dennis Miller

  20. It's not merely grep you halfwits by Shoeboy · · Score: 2, Funny

    I know the tendency for Unix enthusiasts to believe that you invented every useful technology in the 1970s, but it's simply not the case. Grep simply isn't suitable for interactive searches of gigabytes of data broken up into millions of files.

    To efficiently work with several years' worth of email, more advanced techniques are required. Specifically, you need a text indexing program tied to a relational database. While this doesn't give you any more power than recursive searches using the grep and find combo, it's much, much faster, since your keyword and message attributes can use B-tree index lookups and a cost-based optimizer to reduce disk reads.
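    The indexed approach can be sketched with Python's built-in `sqlite3` module standing in for SQL Server's full-text feature. This is a minimal sketch under stated assumptions: a hypothetical `postings` table mapping each term to the messages that contain it, a naive tokenizer, and boolean-AND queries only. SQLite stores indexes as B-trees, so a term lookup can seek into the index instead of scanning every message.

```python
# Minimal inverted index in SQLite; the postings table and function names are hypothetical.
import re
import sqlite3


def build_index(messages: dict[int, str]) -> sqlite3.Connection:
    """Build a term -> message inverted index in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE postings (term TEXT NOT NULL, msg_id INTEGER NOT NULL)")
    for msg_id, body in messages.items():
        # Naive tokenizer (assumption): lowercase runs of letters and digits, one row per distinct term.
        terms = set(re.findall(r"[a-z0-9]+", body.lower()))
        conn.executemany(
            "INSERT INTO postings (term, msg_id) VALUES (?, ?)",
            [(term, msg_id) for term in terms],
        )
    # B-tree index on (term, msg_id) lets keyword lookups use an index seek, not a full scan.
    conn.execute("CREATE INDEX idx_postings_term ON postings (term, msg_id)")
    return conn


def search_all(conn: sqlite3.Connection, *keywords: str) -> list[int]:
    """Return ids of messages that contain every keyword (boolean AND), in id order."""
    kws = sorted({k.lower() for k in keywords})
    if not kws:
        return []
    placeholders = ",".join("?" for _ in kws)
    rows = conn.execute(
        f"SELECT msg_id FROM postings WHERE term IN ({placeholders}) "
        "GROUP BY msg_id HAVING COUNT(DISTINCT term) = ? ORDER BY msg_id",
        (*kws, len(kws)),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    mail = {
        1: "Q3 numbers look bad, shred the memo",
        2: "Lunch on Friday?",
        3: "Do NOT shred anything, legal hold",
    }
    index = build_index(mail)
    print(search_all(index, "shred"))          # [1, 3]
    print(search_all(index, "shred", "memo"))  # [1]
```

    The indexing cost is paid once up front; after that each query touches only the postings for its keywords, which is the speed advantage over re-running grep across millions of files.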

    That being said, it's still not that impressive a piece of software. I'm certain that I could build the search component in a couple of weeks using Microsoft SQL Server (with its neat full-text indexing feature), and a moderately adept GUI developer could hammer out a decent interface in the same amount of time.

    Still, there's a difference between "trivial to implement with any decent RDBMS" and "I can do it with a 2-line bash script". You would do well to remember it.

    Your friend,
    --Shoeboy

    1. Re:It's not merely grep you halfwits by seanadams.com · · Score: 1

      Grep simply isn't suitable for interactive searches of gigabytes of data broken up into millions of files.

      For interactively searching gigabytes of files, we have:

      find . -exec grep "cum-stained blue dress" {} \; -print | more

  21. search software is good by perdida · · Score: 1, Troll

    I am all for the government writing any kind of search program, block program, security program or what have you for whatever purpose it wants.

    If we really want the Internet to permeate into our lives, then it should go into our lives as they really are. Perhaps some people will be less wary about leaving evidentiary data lying about on the Net.

    When we decry censorware, or searchware or whatever, we are decrying a social use of technology and not the technology itself. Rather than stifling the development of search technologies or other supposedly "authoritarian" tech, we should be adding to the debate about what kind of society we live in.

    I will be writing a variant of this for a controversial website soon, in support of rigidly restricted appliance computers and limited-access proprietary content AOL style networks developing alongside the open Internet. In this society we have prisons, in which the prisoners can't use the Internet much because the software and hardware that would allow them to use it within prison rules (reliably, monitored by non-technical prison officials) does not exist.

    I would rather the educational and self-betterment resources available on the Net be extended to prisoners with the blessing of prison officials, so prisons which have lost their education budgets can restore these services cheaply.

  22. Digital Evidence Software by D3TH · · Score: 5, Interesting

    In reality, the biggest difference between grep and so-called "forensics" software is the emphasis on examining the data without modifying it and on maintaining the chain of custody and audit trail. In fact, many experienced computer investigators do their jobs with little more than dd, grep, and various other Unix utilities. Most of the digital forensics software out there simply attempts to make this functionality more accessible to the less tech-savvy investigator. (The problems caused by inexperienced/unqualified investigators performing this type of analysis are beyond the scope of this response.)
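    The "examine without modifying" requirement usually comes down to working from an image (for example, one made with dd) and recording a cryptographic hash, so anyone can later confirm the copy is unchanged. A minimal Python sketch, assuming the image is an ordinary file; the function names and audit-log fields are hypothetical and not taken from any forensic package.

```python
# Hypothetical chain-of-custody helpers: hash an image read-only and log who checked it, and when.
import hashlib
import os
import tempfile
import time


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash an image file in fixed-size chunks so multi-gigabyte images need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:  # opened read-only: examination never writes to the image
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_custody(path: str, examiner: str, log: list[dict]) -> dict:
    """Append an audit-trail entry: which image, who hashed it, when (UTC), and the digest."""
    entry = {
        "image": path,
        "examiner": examiner,
        "sha256": sha256_of(path),
        "utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    log.append(entry)
    return entry


def verify(path: str, log: list[dict]) -> bool:
    """True if the image still matches the first digest recorded for it."""
    first = next((e for e in log if e["image"] == path), None)
    if first is None:
        raise KeyError(f"no custody record for {path}")
    return sha256_of(path) == first["sha256"]


if __name__ == "__main__":
    # Stand-in for a dd image: a small temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as img:
        img.write(b"abc")
    custody: list[dict] = []
    print(log_custody(img.name, "examiner-1", custody)["sha256"])
    print(verify(img.name, custody))  # True while the image is untouched
    os.remove(img.name)
```

    Any single changed byte produces a different digest, so a matching hash at trial time is strong evidence that the analyzed copy is the one originally acquired.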

    I am currently the designer and project lead for a cross-platform open source (GPL) digital evidence processing suite. It is intended to bring together the various functionalities required to perform this type of work, and (ideally) operate on whatever platform the investigator desires. Our primary development platform is RedHat 7.1.

    There are currently software packages out there that attempt to do this, including EnCase and The Forensic Toolkit in the commercial arena and The Coroner's Toolkit in the open source arena; however, they lack the broad filesystem support and/or true ease of use to make them usable by everyone. The other barrier is price: EnCase, for example, costs thousands of dollars per copy.

    We're well funded, and have already done a significant amount of work. We have some of our core components functional and plan on starting beta testing and releasing our first code drop later this year. If this field interests you and you'd like more information, or you work in the investigative field and have thoughts on what you'd like to see in such a tool, I'd love to hear from you.

    --
    ---
    1. Re:Digital Evidence Software by Anonymous Coward · · Score: 0
      Wasn't NASA using Beowulf clusters to analyze hard disks? I think they set up a lab that could scan a hard disk in a fraction of the time using a cluster and high-speed networking....


      BTW, your stuff sounds VERY interesting, but could you please post a link to your web site?

    2. Re:Digital Evidence Software by Anonymous Coward · · Score: 0

      I'm peripherally aware of NASA's work (the NASA Office of the Inspector General, to be exact) and know they have used Linux in their analysis. They have even been involved in developing Linux software and have released their enhancements under the GPL, one example being the enhanced loopback driver they developed.

      They're some of the best in the business when it comes to computer forensics, and their feedback has been invaluable in the design of my current project.

      We have a site for the project, but I'm not really interested in getting slashdotted this early in our development process. I'll probably post something about it when we're ready for our first code drop, and of course if you want to know when we get there just drop me a line. I have a list of interested parties (primarily Law Enforcement personnel) that I'm going to notify when we take the project public and the site goes live.

    3. Re:Digital Evidence Software by D3TH · · Score: 1

      Gah. Serves me right for not logging in to post that last one. The correct link for the enhanced loopback driver is here

      Enjoy.

      --
      ---
  23. I am safe by Anonymous Coward · · Score: 0

    All my evil schemes are copyright protected and encrypted.

    The lawyers cannot learn of my deeds without creating a circumvention device. If they find out what I have done, they will go to jail for much longer.

  24. Spawning another industry by AllieA · · Score: 2, Interesting

    It seems to me that this could also bring about another niche for programmers... software that goes through a company's systems and completely wipes out data. I'm sure this would cause havoc with a company's ISO procedures, but with more and more companies getting caught because of e-mail or other data archived on their tapes, I could see the need to make sure that they have control over the "evidence" that exists on their own systems.

    I'm not saying it's the right thing to do, or even that it's legal, but since most companies are not even aware of what data they do have backed up, and what is retrievable and what isn't, I could see this happening, if it hasn't already.

    1. Re:Spawning another industry by ethereal · · Score: 1

      Obligatory Netscape story.

      Obligatory text to avoid the postercomment compression filter. I'm beginning to think that the trolls are right about Taco not being able to code; considering how much ASCII art I've seen in the last couple weeks, it's amazing that my little bit of HTML won't fly...

      --

      Your right to not believe: Americans United for Separation of Church and

  25. wow, that's all i can say by jkott · · Score: 1

    w o w...that is about all that I have to say about this. Why are they even worried about this?

    Admin - www.newspad.org
    NewsPAD - the daily news source for geeks!

  26. Other application of this tech: viruses! by cyberdonny · · Score: 3, Insightful

    If this software is so good at finding "hot" (i.e. incriminating or embarrassing) documents, how long before virus writers "discover" the same techniques? Rather than just SIRCAM'ing out a random file from the My Documents folder, a virus could spider the whole hard disk and all reachable network drives, and selectively mail out the items that score high on a "hotness" scale. This would make opening those SIRCAM attachments (using a Linux office suite, for safety...) much more rewarding...

  27. No wonder litigation is so expensive by Kenyaman · · Score: 1

    About a half dozen lawyers spent weeks at a time in the period leading up to the trial in 1998 wading through many thousands of pages of printed electronic documents. "It was a lot of paper," said one former government lawyer who worked on the case.

    Which explains a lot about why litigation is so expensive! :) What I find humorous, being in the jury pool for our county's Superior Court, is that we (as a society) can afford to pay a half dozen lawyers to sit around poring over printed e-mails, but can't afford to validate parking for jurors. Assuming you can prove the validity of your evidence, I can see how a method to automate this would be very attractive.

  28. Forensics must be "admissible" to be useful by raresilk · · Score: 4, Informative

    One criticism of the NYT article is that it makes it sound like the legal profession just yesterday caught on to digital discovery and forensics. Although there are always some Luddites out there, lawyers who do major commercial or product liability litigation have been using digital techniques for years.

    As for user-friendly interfaces for forensic-ware, and other suggestions by comment-posters for improving the technologies, don't forget that in order to be useful to a lawyer, digital forensic evidence must be admissible in a court of law. Nobody is going to settle a lawsuit based on some damning piece of deleted email recovered from their hard drive, unless you convince them that the jury trying their case is going to see a big blow-up poster of all the bad things they said in it. In order to get that recovered data into evidence (at least in the USA), the lawyer must "lay a foundation" that the evidence has some reliability. An eyewitness to an event, for example, can testify about things she was able to see or hear from her particular location, but her testimony about what might have been happening out of her sight or earshot is not admissible in court. Another way to lay a foundation is through a qualified expert opinion, for example, an accident reconstruction expert who measures the skid marks and applies a scientific method to determine whether the car was speeding before the accident. The point being, even if I as a lawyer could read up on the relationship between skid marks and vehicle speed, make those measurements on my own, and perform the calculations just as accurately, that would not do me a bit of good. I would still have to go out and retain someone with considerable expertise in such matters in order to get the court to admit the results of the calculations into evidence, or I never get to put them on my blow-up poster for the jury. And this is not just a gimme. Especially in federal court, there are specific criteria for the qualifications that an expert must have, and the demonstrated reliability of the expert's method, before the results can be admitted in court.

    So for those of you who are devising tomorrow's user-friendly forensics - a warning. No matter how point-and-clicky you make them, my lawyer colleagues and I will likely never touch them. Even though I am technically literate enough to grep anything you can grep, I'll keep on hiring one of you technical experts when I need some digital forensics done, because I need your experience, credentials and signature to convince a court that the results are reliable and not just wishful computer hokey-pokey by a lawyer who wants her client to win. (Also, lawyers don't testify in their own cases, as a rule, for various reasons.) This is especially true with things that *sound* somewhat unreliable, like recovering from low-level formats and such. The more extrapolation and guesswork is involved in the "recovery," the less likely it is to get into evidence.

    And if you're developing a search method, or some other new technique for data recovery, keep in mind that in order to qualify yourself and the technique as proper expert testimony, you're likely going to have to disclose quite a bit about how you did it in order to lay the foundation for admissibility. You can just throw those valuable little trade secrets and patentable methods out the window. That's another reason why legal tech forensic shops tend to rely on things like grep and dd rather than innovating - where's the big payoff? Now if you don't care about admissibility, and are just mining the hard drives of your ex-employees (or ex-spouses, or whatever) for business reasons, maybe that's a different story. But most people don't think they're about to get into a lawsuit until it happens, so I wouldn't be so sure.

    --
    No, no, no. This is not a sig.
  29. hit load of links. by kego · · Score: 1

    Now that is a hit load of links.

  30. Re:Michael Sims is a Censor! by Anonymous Coward · · Score: 0

    As you can see from the Google cache, the comment displayed just fine in a standard browser. It was not deleted because it contained malformed HTML. It was deleted for no reason, save that they didn't like it. That is in direct conflict with the statements in the FAQ.

    I wouldn't be so sure about this. The Google cache version wouldn't display properly in Netscape, but worked fine in IE.

  31. This is where Autonomy started by JPMH · · Score: 2
    What is now Autonomy, the knowledge management company, started about ten years ago when Mike Lynch's PhD research was sponsored by the police in the UK to find ways to scan the mass of witness statements gathered in a major incident enquiry (often inconsistent, with varying content and terminology), automatically identify important features, and cross-reference them.

    From that original start, they (allegedly) gained the interest of the intelligence services, and then the media companies and dot-coms, to become the players they are today.

  32. NSA Line Eater by mfarah · · Score: 1

    Oh, this just screams for people to put an NSA Line Eater-like filler into their mail, if only to PO the lawyer on the other side of the litigation and render their indexing useless.

    --
    "Trust me - I know what I'm doing."
    - Sledge Hammer
  33. Don't lawyers heed the DMCA? by KarmaBlackballed · · Score: 2

    Sounds to me like electronically sifting through evidence might require breaking software protections or interpreting custom file formats. Isn't that against the law now in the USA?

    I don't think the DMCA is a good idea. In fact, I think it sucks. The best way to defeat that trash legislation is to hold EVERYONE, especially the legal community, to the letter of the language.

    --

    --- -- - -
    Give me LIBERTY, or give me a check.
  34. Low level format -- Recovery %? by Anonymous Coward · · Score: 0
    we had a case of an outgoing elected official low level formatting their HDD on the way out the door. Had to be sent out to a special data recovery lab. (they can do some amazing things with scanning electron microscopes on half tracks and such)

    Could you give some more detail on this? There are a few pages on the web about the theoretical aspects of recovery, but very little on actual cases.