Slashdot Mirror


Wayback Archives as a Law Tool

Carl Bialik from the WSJ writes "The Wayback Machine's internet archive and Google's cached pages are becoming indispensable tools for some lawyers, especially specialists in intellectual-property law. Dell has used copies of expired websites to get the domain name DellComputersSuck.com transferred to it, the Wall Street Journal reports. EchoStar used Wayback in a case against a Polish TV company. Playboy checks Wayback to look for infringers of its trademark bunny or other images. And Wayback was even used to discredit a witness and reach a mistrial in a Canada murder case."

53 of 198 comments

  1. Text (Yes) Images (not always) by bigwavejas · · Score: 3, Informative

    WBM works great for pulling up historical text content, but I've noticed it tends to be hit-and-miss with images. Try pulling up a website and chances are you'll see broken image links.

    --
    "Simplify, simplify, simplify!" Thoreau
    1. Re:Text (Yes) Images (not always) by el_gordo101 · · Score: 3, Informative

      That's because they don't save the image files, as far as I can tell. The images actually point back to the archived site through some sort of redirect on the WBM site. If the image files no longer exist on the original site, they will not display on the WBM archived page.

      --
      TODO: Insert witty sig
    2. Re:Text (Yes) Images (not always) by Adult+film+producer · · Score: 2, Informative

      That's because they don't save the image files, as far as I can tell. The images actually point back to the site that was archived through some sort of re-direct on the WBM site.

      I don't think that's the case; I don't think it even tries to grab the images from the server. At least with my webpage, I just clicked on the Nov 17th, 2001 archive, and it had all the old images that I've long since deleted, yet the server logs show no hits or 404s for those images.

      Maybe that's not always how it operates though, *shrug*

    3. Re:Text (Yes) Images (not always) by cybersaga · · Score: 2, Funny

      They seem to have captured Goatse just fine.

    4. Re:Text (Yes) Images (not always) by el_gordo101 · · Score: 3, Informative

      Interesting. Our sites date back to 1999. The image files from the older versions of the sites do not show up, as those files were deleted long ago. The newer versions, which use image files that are still resident in our /images directory, work fine. I am not sure how they handle images.

      --
      TODO: Insert witty sig
    5. Re:Text (Yes) Images (not always) by el_gordo101 · · Score: 2, Interesting
      From the FAQ:

      Why am I getting broken or gray images on a site? Broken images (when there is a small red "x" where the image should be) occur when the images are not available on our servers. Usually this means that we did not archive them. Gray images are the result of robots.txt exclusions. The site in question may have blocked robot access to their images directory.
      --
      TODO: Insert witty sig
    6. Re:Text (Yes) Images (not always) by the_true_cirrus · · Score: 2, Insightful

      As far as I can tell from experimenting with an old site of mine, WBM tends to keep multiple copies of sites from different times. If an image, page, or any other object is missing from one of those archived copies, it redirects to the next most recent copy of that file in the WBM, and so on. If the latest archived copy still doesn't have the file, THEN it redirects to the actual website.

      I've often noticed that WBM pages load very slowly and I suspect that this is partly due to all the chained redirects your browser goes through before it finally reaches a copy of an image or a definitive 404 error if it is gone.
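      The fallback order described above can be modeled in a few lines. This is a minimal sketch of the commenter's observation, not the Wayback Machine's actual code: the in-memory index, the `resolve` function, and the rule of walking forward in time to later captures are assumptions; only the `web.archive.org/web/<timestamp>/<url>` snapshot URL pattern is real.

```python
from bisect import bisect_left

# Hypothetical index: for each archived URL, the sorted list of capture
# timestamps (14-digit YYYYMMDDhhmmss strings sort chronologically).
Index = dict[str, list[str]]


def resolve(index: Index, url: str, requested_ts: str) -> str:
    """Return where a request for `url` at `requested_ts` would end up:
    the first capture at or after the requested time, else the live site."""
    captures = index.get(url, [])
    i = bisect_left(captures, requested_ts)
    if i < len(captures):
        # Next most recent archived copy (possibly the exact one requested).
        return f"https://web.archive.org/web/{captures[i]}/{url}"
    # No later archived copy exists: fall back to the original website.
    return url
```

      Each miss in this model corresponds to one extra redirect in the browser, which fits the commenter's guess about why pages with many missing images load slowly.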

  2. And this is a big deal why? by Shadow+of+Eternity · · Score: 5, Insightful

    People go to the library and dig through hundreds of old newspapers and records, so what's the big deal with using Wayback for websites?

    --
    A bullet may have your name on it but splash damage is addressed "To whom it may concern."
    1. Re:And this is a big deal why? by ucahg · · Score: 2, Insightful

      For one, it's not quite as verifiable. Who is to say, for example, that someone with access to the Wayback servers couldn't put their own content and dates on there, and then use that as "evidence" for some suit?

      I don't know how (or if) it's regulated. Any insights into this?

    2. Re:And this is a big deal why? by TTK+Ciar · · Score: 5, Interesting

      For one, it's not quite as verifiable. Who is to say, for example, that someone with access to the Wayback servers couldn't put their own content and dates on there, and then use that as "evidence" for some suit?

      I don't know how (or if) it's regulated. Any insights into this?

      I work at The Archive. There are only two people, three at most, with the expertise and access to pull something like this off, and if someone tried Brad would almost definitely notice. There are checks in place to detect bitrot in the web archive, and altering older ARCs to include new information would be detected as bitrot and flagged for closer attention. They would then be compared against the copies in our sister organization's data cluster in Europe, and possibly also compared against the copies in the datacenter in Egypt.

      To make it work, you'd pretty much have to get Brad to play along, and he is fanatical about the integrity of the web data. I don't think you could pay him enough to do it, and he doesn't have any sons or daughters you could kidnap for blackmail.

      How one would go about demonstrating all of this in court, though, I do not know. IANAL.

      -- TTK
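      A bitrot check of the kind described above might look like the following. This is a hypothetical sketch, not The Archive's system: it assumes a SHA-256 digest was recorded when each record was written, and that the off-site copies (Europe, Egypt) are available as plain bytes. A local copy that no longer matches its digest is flagged, then checked against the replicas by majority vote.

```python
import hashlib
from collections import Counter


def sha256(data: bytes) -> str:
    """Hex SHA-256 digest of a stored record."""
    return hashlib.sha256(data).hexdigest()


def check_record(local: bytes, recorded_digest: str, replicas: list[bytes]) -> str:
    """Classify one archived record (hypothetical scheme, not The Archive's).

    'ok'         - the local copy still matches the digest recorded at write time
    'restorable' - the local copy changed, but most replicas still match the digest
    'suspect'    - neither the local copy nor a majority of replicas match
    """
    if sha256(local) == recorded_digest:
        return "ok"
    # The local copy no longer matches: possible bitrot or tampering,
    # so compare against the off-site replicas.
    votes = Counter(sha256(r) for r in replicas)
    if votes[recorded_digest] > len(replicas) // 2:
        return "restorable"
    return "suspect"
```

      The point of the replicas is that a deliberate edit to one copy looks exactly like bitrot to this check and gets flagged the same way.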

    3. Re:And this is a big deal why? by Kenja · · Score: 4, Funny

      You're right! We must put a stop to this movable type menace before the serfs use it to spread anti-authoritarian pamphlets!

      --

      "Have you ever thought about just turning off the TV, sitting down with your kids, and hitting them?"
  3. Not everything is archived .... by awacs · · Score: 4, Insightful

    ... WBM respects a site's decision not to allow archiving. Unfortunately, the sites that might be the most interesting know that, and know that they can block archiving.

    1. Re:Not everything is archived .... by pilgrim23 · · Score: 5, Interesting

      Another point of note: Net Nanny, SurfWatch, and other such tools block the main sites. They do NOT block the WBM archive of goatse.cx or the like. AND THAT IS A GOOD THING!!!

      Example:

      www.copstalk.com used to be the home page of a maker of cross-platform tools for Macintosh-to-PC communication via AppleTalk. The company was later bought out. If you wish to look at documentation on its older products, go to the WBM. These days www.copstalk.com IS A PORN SITE.

      --
      - Minutus cantorum, minutus balorum, minutus carborata descendum pantorum.
  4. In Contrast to by goneutt · · Score: 3, Interesting

    A few weeks ago /. had a bit on an Internet Archive that got sued for having material that was ordered withdrawn by a court. How can they have it both ways? I think I'm beginning to understand why law school is so hard: you have to learn to think like a lawyer, which is like learning to type with your nose.

    --
    Bacardi + slashdot = negative karma.
    1. Re:In Contrast to by fritter · · Score: 4, Funny

      OMG D00D you're totally right! Some lawyers argue for one thing, while others argue for another! I just heard about this case the other week where this lawyer made a case that some guy was guilty of murder, but another lawyer was arguing the EXACT SAME GUY was innocent! I cannot believe they tried to have it both ways! Unlike every other profession in the world, where everyone thinks exactly alike!

      Seriously, how did this get modded "Interesting"?

    2. Re:In Contrast to by Macadamizer · · Score: 2, Insightful

      Lawyers don't have this long-term morality thing that we do

      Has nothing to do with morality; being a lawyer is a job. A lawyer can argue one day that some tools are poor quality and the next day that those same tools are high quality (as long as it is in a different case, that is!) because that's his or her job: to zealously represent the interests of his or her client. Lawyers don't deal with "truth" in that way (the truth is for the jury to determine), so it's not a moral issue of lying one day and telling the truth the next. Besides, based on the particular facts of each case, maybe the Sears tools ARE poor quality in the context of one case and ARE high quality in the context of the other.

      Besides, other than the paycheck and their oath to zealously represent their clients, lawyers typically don't have "a dog in the fight" -- civil cases are typically not personal, so again, it's all just another job, and you do your best for your boss, whoever that might be on a particular day...

      --

      "That's not even wrong..." -- Wolfgang Pauli
  5. Employers are using Google too by October_30th · · Score: 4, Insightful

    Maybe this is a bit off-topic, but employers are also known to use Google and web archives to check up on the past of a potential employee. So be careful what kind of statements you make on the net using your real name.

    --
    The owls are not what they seem
    1. Re:Employers are using Google too by op12 · · Score: 4, Informative

      It's like that CNN article (I think it was posted here) from a few days back:

      Bloggers learn the price of telling too much

  6. I wonder... by Bryansix · · Score: 2, Insightful

    I wonder if people will try to sue website owners for content that they have already pulled off their websites. I would hope not, but I could see how this could happen. A person realizes that certain content is copyrighted and pulls it, and later some lawyer for the owner of that content sues and uses the Google cache or WBM to prove the site posted copyrighted material.

    1. Re:I wonder... by Peyna · · Score: 3, Insightful

      Just because you stopped doing something, doesn't mean it wasn't illegal while you were doing it.

      See, if I am beating the crap out of you, but stop before the police get there and witness it, that doesn't mean I wasn't beating the crap out of you and therefore guilty of battery.

      It's a weird example, but it works.

      If you've ever read some of the RIAA threat letters you'll notice they specifically state that just because you listen and pull down the offending material doesn't mean they're giving up their right to sue you for posting it in the first place.

      --
      What?
    2. Re:I wonder... by Peyna · · Score: 2, Informative

      Right, I know that the website committed a crime in the past, but they didn't know it was a crime at the time, so it's not the same as beating the crap out of someone.

      Ignorance of the law is no excuse.

      I can't believe I actually have to point that out to someone.

      --
      What?
    3. Re:I wonder... by Peyna · · Score: 2, Interesting

      Let's say person A posted copyrighted info on your web site. You didn't know it was copyrighted at the time. Person B tells you that it is and you pull the content. Copyright owner (C) later sues your ass since they found it on the WBM.


      It's copyright infringement, whether or not you know the material is copyrighted, or have reason to know.

      Copyright law does provide for reduced penalties in the case where the "infringer was not aware and had no reason to believe that his or her acts constituted an infringement of copyright."

      So, the law has in essence created a duty to users of documents to determine if they are copyrighted prior to distributing.

      --
      What?
  7. It also got wayback sued. by Peyna · · Score: 4, Interesting
    --
    What?
  8. Infinite Loop by kensai · · Score: 4, Funny

    If Google cached pages from the WBM and the WBM archived Google's cached pages, wouldn't that cause an infinite loop? j/k

  9. One easy workaround... by DJ+Rubbie · · Score: 5, Informative

    $ cat robots.txt
    User-agent: ia_archiver
    Disallow: /

    My site is not archived there, problem solved.

    (Of course, if another of these services pops up...)
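    To confirm what a robots.txt like the one above tells a compliant crawler, Python's standard `urllib.robotparser` can evaluate it offline. A minimal sketch, assuming the crawler identifies itself as `ia_archiver` and honors robots.txt, as the Wayback Machine's crawler did when this was written; the `is_allowed` helper name is illustrative.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the comment above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether a crawler that honors robots.txt may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "ia_archiver", "http://example.com/"))  # False
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/"))    # True
```

    Other crawlers are unaffected, which is the caveat in the parenthetical: each new service needs its own `User-agent` block, or a `User-agent: *` rule that shuts out everyone.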

    --
    Please direct all bug reports to /dev/null
    1. Re:One easy workaround... by baadger · · Score: 2, Interesting

      What about the HTTP/1.1 header "Cache-Control: no-store"? Although the RFC's description of what it does seems rather confusing to me.
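      Sending the header is easy; whether it keeps a page out of an archive is another matter. RFC 2616 (section 14.9.2) defines `no-store` as an instruction to caches not to store any part of the request or response, and it is an assumption, not something established here, that an archive crawler would treat it as an opt-out. A minimal sketch with Python's standard `http.server`; the page body and port 8000 are placeholders.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class NoStoreHandler(BaseHTTPRequestHandler):
    """Serves a fixed page and asks caches not to store the response."""

    def do_GET(self) -> None:
        body = b"<html><body>Hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        # Directive from RFC 2616 section 14.9.2 (HTTP/1.1 Cache-Control).
        self.send_header("Cache-Control", "no-store")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the demo quiet


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), NoStoreHandler).serve_forever()
```

      Unlike robots.txt, which a crawler reads before fetching, this header only arrives with the page itself, so it can at most influence what gets stored, not what gets requested.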

  10. So perhaps censoring the archive is wrong? Signed? by expro · · Score: 4, Informative

    Destroying evidence? If I don't want to be caught, and I ask for older web pages that may contain incriminating evidence (such as illegal copies of things or illegal links) to be removed, is this different from a request by any other copyright holder to have his pages removed, and can it be punished? What are the archive's retention policies, and have legal orders been served to prevent destruction of evidence?

    Even better would be if the archives digitally signed their captures and kept the signatures even for material they had been asked to remove, so that the validity of a copy made for legal purposes could still be established after the material was censored (SCO, the Scientologists, and other cases come to mind).
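    The core of this idea is that a fingerprint can outlive the content it describes. The sketch below swaps in a plain SHA-256 digest record for a real digital signature: it shows that a copy can still be checked after the original is removed, but unlike a public-key signature it does not let an outsider verify who made the record. The `DigestLedger` class and its methods are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


class DigestLedger:
    """Hypothetical record of capture fingerprints that outlives removal.

    Only SHA-256 digests and capture times are kept, so the content itself
    can be taken down while a later copy can still be checked against it.
    """

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], str] = {}

    def record(self, url: str, captured_at: datetime, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        self._entries[(url, captured_at.isoformat())] = digest
        return digest

    def verify(self, url: str, captured_at: datetime, copy: bytes) -> bool:
        expected = self._entries.get((url, captured_at.isoformat()))
        return expected is not None and hashlib.sha256(copy).hexdigest() == expected


if __name__ == "__main__":
    ledger = DigestLedger()
    when = datetime(2005, 7, 1, tzinfo=timezone.utc)
    ledger.record("http://example.com/", when, b"original page")
    # The page is later removed from public view; only the digest remains.
    print(ledger.verify("http://example.com/", when, b"original page"))  # True
    print(ledger.verify("http://example.com/", when, b"doctored page"))  # False
```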

  11. Hey Mr. Peabody! by DigitalReverend · · Score: 4, Interesting

    Stand back, Sherman, as we set the dials on the Wayback Machine to 1845...

    Seriously, there have been many times I wanted to kiss the person running Wayback. I lost my home a few years back, and with it several websites, because I hosted them out of my house. Using the archive, I have been able to rebuild those original sites, or come fairly close to duplicating them.

    As for lawyers, if there weren't somebody already archiving all these sites, they'd hire someone to do it for them, and then the archive would not be accessible to the public. I guess we need to take the good with the bad on this.

    --
    I read Slashdot for the headlines, because the headlines, unlike the articles, are usually original and never duplicated
  12. But see, they signed a piece of paper by Mr+Guy · · Score: 4, Insightful

    In the article, it mentions one of the archive's technicians signing an affidavit saying they think it's a true archive. No one would ever lie about that for a big corporate payout.

    1. Re:But see, they signed a piece of paper by sholden · · Score: 2, Informative

      They might. But if the other side can provide some evidence that it isn't a true archive, then said technician is in deep shit.

    2. Re:But see, they signed a piece of paper by sholden · · Score: 2, Interesting

      And?

      That the original point isn't a problem was my point; that it was sarcasm is irrelevant to what was expressed.

      Determining who to believe is what courts do. The technician swears under oath that the archive supplied hasn't been tampered with. The other side can argue the testimony is untrustworthy or that the technician can't know. It's no different than a doctor saying "yes those X-rays are of the person in question and were taken on this date".

      Or a police officer saying "He ran a red light".

      Yes, it's not foolproof. People can lie. People can be paid off. But it's the system as it stands, and the opposing side gets a chance to discredit the person making the claims and also to provide evidence that contradicts them.

  13. Childish Grudges by SilentShriek · · Score: 2, Insightful

    The childish nature of these corporations is ridiculous. Digging through up to nine years of archives just to point out: "Hey, you said we suck!" Who cares?

    If Dell did not suck, they would not have to be so defensive.

    1. Re:Childish Grudges by shawb · · Score: 2, Informative

      Well, the DellComputersSuck.com website actually DESERVED to be sued. Is this a free speech issue? Not really. They were selling another brand of computers, so trademark infringement would actually come into play here.

      --
      I'll never make that mistake again, reading the experts' opinions. - Feynman
  14. Proof and Evidence : The 'secret' to winning a case by markpapadakis · · Score: 3, Interesting

    Sometimes being able to see in the past is more valuable than being able to see in the future. It makes sense for lawyers to try to find ways to look back, for evidence and proof can only be found in the past.

    If you have evidence, you can prove your claims. If you can prove your claims, you win a dispute. If you win the dispute in favor of your client, that makes you one good lawyer.

    --
    Technology ramblings : Simple is Beautiful
  15. False info being planted by up2ng · · Score: 3, Interesting

    In the future, what will stop someone or some entity from falsifying information, knowing that the legal system will use it?
    How can the info in the Wayback Machine or Google be trusted? You can serve redirect pages, triggered by Googlebot or the Wayback crawler, that have nothing to do with what is really on the site.

    In the article it is mentioned that vodaphone.com was taken by a squatter and they used Wayback to show that her site was "intended to misleadingly attract consumers." What if she had sent WBM's bot to a "nice" page instead of the real site? That case could have gone differently.
    I think that if someone wants to, they can plan ahead and use this in a nefarious fashion.
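    The trick described above is usually called cloaking, and it works because the User-Agent header is whatever the client says it is. A minimal sketch of the decision a cloaking site would make; the marker strings and page names are hypothetical. It is also why an archived capture only shows what the server sent to the crawler, not what it sent to everyone else.

```python
# Substrings that hypothetically identify archive and search crawlers.
CRAWLER_MARKERS = ("ia_archiver", "googlebot")


def select_page(user_agent: str) -> str:
    """Return which page a cloaking site would serve for this User-Agent."""
    ua = user_agent.lower()
    if any(marker in ua for marker in CRAWLER_MARKERS):
        return "nice.html"  # what the archive records
    return "real.html"      # what human visitors actually see
```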

    --
    Success is not the result of spontaneous combustion, you must set yourself on fire.
    1. Re:False info being planted by shawb · · Score: 2, Informative

      That's why the archive's administrators signed an affidavit stating that the information was, to the best of their knowledge, not tampered with. And it would be up to the other lawyers to prove that you were falsifying the information, which could lead you into further trouble or at the very least remove any doubt of your guilt from the jury and show that you were indeed acting in bad faith.

      --
      I'll never make that mistake again, reading the experts' opinions. - Feynman
  16. robots.txt by Cyburbia · · Score: 3, Informative
    Even if the Wayback Machine has already archived your site, adding an appropriate robots.txt file to your Web site's root directory will make _all_ previous archives inaccessible to the public. I discovered this after I accidentally blocked the Wayback Machine robot while trying to control malicious spiders. After I modified robots.txt again, all the old archives reappeared after a few weeks.

    I used the Wayback machine to grab thousands of messages from an old WWWBoard-based message board that I ran, for eventual conversion to vBulletin. Some years, the Wayback Machine crawled every month; others it didn't even visit. Probably 80% of the messages that were posted before 2000 are lost to the ether of cyberspace. Guess you can't expect it to archive everything.
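    The comment doesn't say how the messages were retrieved. One way to list every capture of a board today is the Wayback Machine's CDX server API; this is a present-day interface, and the sketch assumes its `url`, `output=json`, `from`, and `to` parameters and its header-row JSON format. Network access is left out so the parsing can be shown on sample data; `cdx_query_url` and `snapshot_urls` are illustrative names.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(url: str, year_from: str, year_to: str) -> str:
    """Build a CDX query listing captures of `url` between two years."""
    params = {"url": url, "output": "json", "from": year_from, "to": year_to}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_urls(cdx_json: str) -> list[str]:
    """Turn a CDX JSON response into replayable snapshot URLs.

    With output=json the first row holds the field names; each later row
    describes one capture.
    """
    rows = json.loads(cdx_json)
    if not rows:
        return []
    header, captures = rows[0], rows[1:]
    ts, orig = header.index("timestamp"), header.index("original")
    return [f"https://web.archive.org/web/{row[ts]}/{row[orig]}" for row in captures]
```

    Fetching the query URL (for example with `urllib.request.urlopen`) and passing the response text to `snapshot_urls` yields one URL per capture, which makes gaps like the missing pre-2000 months easy to spot.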

  17. Re:Proof and Evidence : The 'secret' to winning a c by (A)*(B)!0_- · · Score: 3, Funny
    Thank you Captain Obvious.

  18. I like hardware sites... by Momoru · · Score: 2, Insightful

    From Apple.com circa 1996:

                New PowerBook Family
    Addressing the needs of customers in small offices, home offices, business and education, Apple announced the Macintosh PowerBook 1400 series, combining 117MHz PowerPC speed with a removable CD-ROM drive and expansion options.

    Ah, 1996: how long ago was that? I remember lusting after that 117MHz.

  19. Old Drivers by TheSeventh · · Score: 4, Informative

    I was able to use the WBM last year to find some old device drivers for a no-name motherboard I had from '97. The company went out of business and its remaining stockpiles were bought by some other Chinese company, but the WBM actually had old copies of the drivers, and even a BIOS update for the board. Now I always check there when I am having a hard time finding stuff that I know used to be around.


    --
    Just because you're paranoid, it doesn't mean that they're not out to get you.
  20. Playboy by floppy+ears · · Score: 4, Funny

    Playboy checks Wayback to look for infringers of its trademark bunny or other images.

    So they're basically just sitting around surfing porn too, eh?

    --

    "If I could live to be several hundred
    I could take a walk and really wander, really wonder."
  21. Re:School by LnxAddct · · Score: 4, Informative

    It's been a while since I was in high school and had to get around filters, but the most surefire way is to run an SSH server at home on a port that your school's firewall lets through (most let 22 through, but to be less suspicious choose something like 25 or 443), and then carry PuTTY around on a pen drive. Whenever you need unrestricted access, open PuTTY, connect to your server, and create a dynamic tunnel. On that pen drive you can also keep Firefox with a SOCKS proxy set to use port 1080, or whatever port you chose for the dynamic proxy. There you go: unlimited, encrypted surfing that bypasses your school's filters by being tunneled through your house. This all assumes that your school's firewall blocks based on ports rather than protocol.
    Regards,
    Steve

  22. Someone had to do it by jolande · · Score: 3, Funny

    yuhuhpoiu sauick

  23. If only there was a Firefox extension by Rurik · · Score: 3, Informative

    Google Caching and Wayback lookups. You could easily look URLs up by right-clicking on them.

    Oh, wait, there is one.

    /shameful plug

  24. disclaimer by TTK+Ciar · · Score: 3, Informative

    I do not speak for The Archive. The above post should not be considered to reflect the official position of The Archive. It is purely my own personal opinion, and it was uttered under the influence of painkillers (I had my wisdom teeth yanked out of my jaw Wednesday; q.v. my Slashdot journal entry). Otherwise I probably would have refrained: talking about this at all while there's a court case pending was probably a really stupid idea, and I (usually) know better.

    -- TTK

  25. has anyone used this for finding terrorists by asscroft · · Score: 3, Interesting

    I imagine if you searched for sites talking about 9-11 pre-9-11-2001 you'd find some interesting things. Post 9-11 there were a zillion references, but pre-9-11 there couldn't have been that many.

    Same with the London bombings, no?

    --
    because I have been enjoined by this Holy Office to abandon the false opinion which maintains that the Sun is the centre
    1. Re:has anyone used this for finding terrorists by gatkinso · · Score: 3, Funny

      I would imagine that pre 9/11/2001 very few web sites mentioned the September 11th attacks...

      --
      I am very small, utmostly microscopic.
  26. Re:What about copyrights? by Ngwenya · · Score: 5, Informative

    OK, you may look at WBM as a library, but IS it?

    Yes. It really is. It's a registered member of the American Library Association. Details on http://www.archive.org/about/about.php

    It's an honest-to-God library, which also means that Section 108 of the U.S. Copyright Act (Title 17 of the U.S. Code) applies. Public libraries in the US (and here in the UK) have some pertinent exemptions from the copyright restrictions that bind us mere mortals.

    --Ng

  27. logos and other images by notnAP · · Score: 3, Funny

    Playboy checks Wayback to look for infringers of its trademark bunny or other images.

    In a related story, managers at Playboy have taken note of productivity differences between John Salem, who was tasked with finding instances of people illegally using the Playboy logo, and Henry Waxman, who has been looking for instances of "other images," but has been observed taking frequent bathroom breaks.

  28. Canada murder case? by djKing · · Score: 2, Funny

    They killed Canada! Those bastards!

    - Peace

    --
    Free as in "the Truth shall set you..."
  29. There ought to be a Law by Nom+du+Keyboard · · Score: 2, Informative
    Requests from third parties to remove information are generally denied. The Wayback Machine makes exceptions in certain circumstances, for example if the Web pages contain personal information provided in confidence, such as medical data.

    I bet the (so-called) Church of Scientology gets everything they want pulled.

    In addition, Web-site operators can prevent material from remaining in the public domain by using a piece of computer code, known as a robots.txt file, which stops bots belonging to the Wayback Machine and regular search engines from copying pages.

    This is pretty bogus because it only works if there is still a current website at the spidered address that is online and can deliver a robots.txt file saying DON'T! It has already been alleged in another case that rapid-fire multiple requests to WBM will cause it to give up pages even when robots.txt says not to.

    I see two ways to fix this problem of misuse of a valuable archive:

    1: Federal law PROHIBITING the use of evidence from the Wayback Machine in court trials. This is a valuable historical archive that will become less valuable if people worry that it can be used against them in unforeseen ways in the future, and block it from archiving their sites. How many sites already block the WBM crawler's IP address range?

    2: WBM could simply announce that they refuse to cooperate in any future trials -- AND THEN DO EXACTLY THAT! Without them to attest to the accuracy of the retrieved data, many cases relying on that data would fall flat on their faces.

    Think for a moment. The WBM was not created to make lawyers' lives easier and their law firms richer!

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
    1. Re:There ought to be a Law by mousse-man · · Score: 2, Insightful

      What I'd do is force WBM to either become legally compliant (i.e., archive its data on WORM media) or be inadmissible in court.

      Problem solved, since WBM cannot possibly keep that much data on revision-proof media.
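      Software can approximate write-once storage with a hash chain, where each entry's hash covers the one before it. A minimal sketch with a hypothetical `AppendOnlyLog` class. Unlike real WORM media, whoever controls the whole log could recompute every hash after an edit, so the scheme only helps if the latest hash is published or held by a third party.

```python
import hashlib


class AppendOnlyLog:
    """Hypothetical hash-chained log: a software stand-in for WORM storage.

    Each entry's hash covers the previous hash, so rewriting any earlier
    entry breaks the chain and is caught by verify().
    """

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[tuple[bytes, str]] = []

    def append(self, data: bytes) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        digest = hashlib.sha256(prev.encode() + data).hexdigest()
        self.entries.append((data, digest))
        return digest

    def verify(self) -> bool:
        prev = self.GENESIS
        for data, digest in self.entries:
            if hashlib.sha256(prev.encode() + data).hexdigest() != digest:
                return False
            prev = digest
        return True
```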

  30. Re:So the archives are still kept even w / robots. by Cyburbia · · Score: 2, Interesting

    Here's what happens, I think:
    1) At first, there is no reference to the Wayback Machine in robots.txt. The site is spidered and the archive placed online.
    2) Add the Wayback Machine to robots.txt. The site is no longer spidered, and old archives are hidden from public view.
    3) Remove the Wayback Machine from robots.txt again. Spidering resumes, and all the old archives of the site reappear. However, there is no archive of your site from the time the block was in robots.txt; remember, it wasn't spidered then.
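    The three steps above amount to a simple rule: captures are only made while the crawler is allowed, and stored captures are hidden, not deleted, while it is blocked. A toy model of that observed behavior; `ArchiveModel` is hypothetical and is not how the Archive implements exclusions.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveModel:
    """Toy model of the commenter's observations, not the Archive's code."""

    blocked: bool = False  # does the live robots.txt currently exclude the crawler?
    captures: list[str] = field(default_factory=list)  # stored capture timestamps

    def crawl(self, timestamp: str) -> None:
        # Steps 1 and 3: an allowed crawl stores a capture.
        # Step 2: while blocked, the site is not spidered, so nothing is stored.
        if not self.blocked:
            self.captures.append(timestamp)

    def visible(self) -> list[str]:
        # Old captures are hidden while blocked and reappear once unblocked.
        return [] if self.blocked else list(self.captures)
```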