Slashdot Mirror


Checksumming Webpages Patented

Just when you thought nothing else stupid could be patented, Wahfuz noted a story running about a company called Pumatech who has apparently patented storing a checksum of a webpage to determine if it has updated or not. I guess from now on everyone who wants to detect changes in web pages will need to store full copies of the pages in question, because I'm sure nobody thought of anything so complex as piping it through md5 and saving the output.
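The md5-and-save approach the editor describes fits in a few lines of Python. This is a minimal sketch that assumes the page has already been fetched as bytes. `page_changed` and the in-memory `store` dict are illustrative names and do not come from the patent or from Pumatech.

```python
import hashlib

def md5_hex(body: bytes) -> str:
    """Return the hex MD5 digest of a page body."""
    return hashlib.md5(body).hexdigest()

def page_changed(store: dict, url: str, body: bytes) -> bool:
    """Compare the body's digest with the stored one, then record the new digest.

    Returns True if the page differs from the last stored version
    (or has never been seen before).
    """
    digest = md5_hex(body)
    changed = store.get(url) != digest
    store[url] = digest  # keep 32 hex characters per URL, not the whole page
    return changed

store = {}
print(page_changed(store, "http://example.com/", b"<html>v1</html>"))  # True (first sighting)
print(page_changed(store, "http://example.com/", b"<html>v1</html>"))  # False
print(page_changed(store, "http://example.com/", b"<html>v2</html>"))  # True
```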

61 of 234 comments

  1. Oh. I see. by Wakko Warner · · Score: 2
    So, if I just stored it and didn't ever do anything with it, I'd be okay.

    Ever heard the story of the CD-WOM? It was a device consisting of two blocks of ordinary wood and a cable connecting it to the user's PC. CD media was placed between the two blocks and data was written to the CD. The process was foolproof (I challenge you to prove to me that no data was written to write-only media!)

    That's about how useful storing a checksum of a webpage would be without *doing* anything with the data. Sure, the checksum exists, but if you don't bother to do anything with it, the data is as worthless as a CD-WOM. Obviously, someone creating MD5 hashes of all their webpages would also build some sort of system around it to make use of those hashes!

    - A.P.

    --
    Forget Napster. Why not really break the law?

    --
    "Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
  2. Uhh... ok.. by jCaT · · Score: 3

    I'm sure nobody thought of anything so complex as piping it through md5 and saving the output.

    Yeah- this is one of those "Why didn't I think of that?" things- but I have yet to hear of a web cache or proxy that uses md5sums instead of last-modified headers- are there any out there? And if so, wouldn't that count as the all-important prior art?

    Just because something seems simple once somebody else thought of it doesn't mean it wasn't a good idea in the first place.

    1. Re:Uhh... ok.. by artdodge · · Score: 2
      Hmm... you could use the MD5 of a document as the entity tag (ETag) and use the If-None-Match conditional header. By spec, ETags are totally opaque, but there's no reason they couldn't be checksums.

      IOW, the mechanism is there, but I'm not aware of that particular policy (tag==MD5sum) ever being used.

    2. Re:Uhh... ok.. by artdodge · · Score: 2
      Ehrm?

      You have an old copy of the page and a checksum of that copy. You send a request to the server saying "If the checksum is no longer X, please send me the new copy, otherwise send me a 304 Not Modified message". The server has a checksum Y of whatever version of the page is current. If X==Y, send "304 Not Modified" (a few hundred bytes). If X!=Y, send the new version. This is standardized behavior (see ETag and If-Match/If-None-Match in RFC2616).

      You and Taco must be smoking the same crack today.
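Here is the server's side of the exchange artdodge describes. The sketch assumes the server uses the MD5 of the body as its entity tag. That is one possible policy, since RFC 2616 treats ETags as opaque. `make_etag` and `respond` are hypothetical helpers, and weak validators (`W/"..."`) are not handled.

```python
import hashlib

def make_etag(body: bytes) -> str:
    # A strong entity tag is a quoted string; the MD5 of the body is one possible choice.
    return '"%s"' % hashlib.md5(body).hexdigest()

def respond(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a GET with an optional If-None-Match header."""
    etag = make_etag(body)
    if if_none_match is not None:
        # The header may list several tags, or be "*" (matches any current entity).
        candidates = [t.strip() for t in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, {"ETag": etag}, b""  # Not Modified: no body is sent
    return 200, {"ETag": etag}, body

status, headers, _ = respond(b"hello")
print(status, headers["ETag"])                # 200 "5d41402abc4b2a76b9719d911017c592"
print(respond(b"hello", headers["ETag"])[0])  # 304
```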

    3. Re:Uhh... ok.. by kijiki · · Score: 5

      You couldn't be more wrong.

      January 1997 -- rfc2068 HTTP/1.1

      See section 14.20, 14.25, 14.26, and 14.43.

      It describes the "ETag: " header, an opaque validator that can be, for example, an MD5 hash of the resource.

      The client can then validate the resources in its cache by sending a request with a "If-None-Match: " header with the ETag associated with the copy in its cache.

      The server will either respond "Not modified" in which case the client simply uses the version in its cache, or the server will resend the resource if the ETags don't match.

      Since this patent was filed for in 1999, this is pretty clear prior art, in the most commonly used protocol on the largest network in the world. If the patent office can't locate prior art in incredibly obvious (obvious to anyone skilled in the art, that is) cases like this one, what hope do we have for them intelligently handling more subtle cases?
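On the client side, the revalidation kijiki describes might look like this with Python's standard `urllib.request`. This is a sketch only. `build_conditional_request` and `revalidate` are hypothetical names. The code relies on `urlopen` raising `HTTPError` for a 304 response, which is how the default handlers treat non-2xx statuses.

```python
import urllib.error
import urllib.request

def build_conditional_request(url: str, cached_etag: str) -> urllib.request.Request:
    """Build a GET asking the server to answer 304 if our cached copy is still current."""
    req = urllib.request.Request(url)
    req.add_header("If-None-Match", cached_etag)
    return req

def revalidate(url: str, cached_etag: str, cached_body: bytes):
    """Return (body, etag): the cached pair on 304, or the fresh pair on 200."""
    req = build_conditional_request(url, cached_etag)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read(), resp.headers.get("ETag", cached_etag)
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep using the cached copy
            return cached_body, cached_etag
        raise
```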

    4. Re:Uhh... ok.. by gorilla · · Score: 2

      I wrote a program in 1998 which monitored several pages and emailed me if any of them had been changed. I used it when I was having problems with a content generation system losing its connection to the database. I couldn't use the Last-Modified header, because this was dynamically generated content without one.

    5. Re:Uhh... ok.. by signe · · Score: 2

      Actually, I seem to recall that Inktomi Traffic Server has this functionality. However, I'm not sure if they implemented it prior to 1999 or not.

      Their design was more geared towards hashing something like redball.gif and allowing a single instance in the cache to be referenced for multiple sites, thereby saving space in the cache.
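The space-saving idea signe attributes to Traffic Server is to store one copy of a body that several URLs share, which amounts to content-addressed storage. The sketch below illustrates that general technique and is not Inktomi's actual design. `DedupCache` is a hypothetical class, and it never evicts bodies that are no longer referenced.

```python
import hashlib

class DedupCache:
    """Cache in which identical bodies fetched from different URLs share one stored copy."""

    def __init__(self):
        self.blobs = {}   # digest -> body, stored once
        self.by_url = {}  # url -> digest

    def put(self, url: str, body: bytes) -> str:
        digest = hashlib.md5(body).hexdigest()
        self.blobs.setdefault(digest, body)  # only the first copy is kept
        self.by_url[url] = digest
        return digest

    def get(self, url: str) -> bytes:
        return self.blobs[self.by_url[url]]

cache = DedupCache()
gif = b"GIF89a...redball"
cache.put("http://a.example/redball.gif", gif)
cache.put("http://b.example/img/redball.gif", gif)
print(len(cache.blobs))  # 1
```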

      -Todd

      --
      "The details of my life are quite inconsequential..."
    6. Re:Uhh... ok.. by bmajik · · Score: 2

      I have.

      In fact, getting the whole page and doing an md5 sum, and then comparing it to a stored value in a mysql database is exactly how mine works. This patent can go fuck itself, thank you very much :)

      I don't remember if I actually coded the sum/compare part, because by the time I got to that part, I was sick of the idea anyhow. But the bookmarks db and all its entries are live on my machine at home, and I use it for storing and retrieving bookmarks from anywhere I happen to be using a computer.

      A patent for this is ridiculous. I am fucking tired of people patenting totally naive and obvious approaches to trivial problems.

      --
      My opinions are my own, and do not necessarily represent those of my employer.
    7. Re:Uhh... ok.. by nehril · · Score: 2

      the patent in question does checksums of *parts* of an html document, so the system is more complex than just wget | md5gen or whatever. It's supposed to be able to fetch only the diffs. Perhaps still not patentable, but still different from what has already been done.

    8. Re:Uhh... ok.. by lpontiac · · Score: 2
      Yeah- this is one of those "Why didn't I think of that?" things- but I have yet to hear of a web cache or proxy that uses md5sums instead of last-modified headers- are there any out there? And if so, wouldn't that count as the all-important prior art?

      I know of a website that keeps an index of links to people's weblogs (it's a semi-private thing on a cable modem, so sorry, but no link). It polls the websites every 15 minutes to see whether they've changed, and orders the list of links accordingly, so you can visit the page and see at a glance who's updated their weblog.

      This was all accomplished using a homegrown Perl script. Originally, it stored a checksum of the pages it retrieved for later comparison, to determine when a page was last updated. This was later replaced with a simple byte count of the page's size: a checksum of the whole page generates "false alarms" when people use hit counters on their pages, whereas the size of the page tends to be more stable, yet is unlikely to remain the same between updates.
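The trade-off lpontiac describes shows up with a toy page whose only dynamic element is a hit counter. The page text below is made up. In this example the checksum flags every counter tick, while the byte count moves only when the content does.

```python
import hashlib

def fingerprints(page: str):
    """Return (MD5 hex digest, byte length) for a page."""
    data = page.encode("utf-8")
    return hashlib.md5(data).hexdigest(), len(data)

before = "<p>My weblog</p><p>Visitors: 1041</p>"
after_hit = "<p>My weblog</p><p>Visitors: 1042</p>"  # only the counter ticked
after_post = "<p>My weblog, new entry!</p><p>Visitors: 1043</p>"

old_sum, old_len = fingerprints(before)
for label, page in [("counter tick", after_hit), ("new post", after_post)]:
    new_sum, new_len = fingerprints(page)
    print(f"{label}: checksum changed={old_sum != new_sum}, size changed={old_len != new_len}")
# counter tick: checksum changed=True, size changed=False
# new post: checksum changed=True, size changed=True
```

The byte count has its own blind spots. It misses any edit that leaves the length unchanged, and it raises a false alarm when the counter rolls over from 999 to 1000.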

    9. Re:Uhh... ok.. by SCHecklerX · · Score: 2
      Hashing web pages (or anything else, for that matter) is a standard security procedure. You monitor the hashes, and notify if something isn't right.

      I use hashes for database work...for example, when I want to make a link to a data element I just added, I create a hash of that record to refer to (because you don't know the primary key ID that was assigned by the database, and that's how you would call the record.)
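One way to derive such a reference is to hash a canonical serialization of the record's fields. The sketch assumes records are JSON-serializable dicts. `record_key` is a hypothetical helper, and two records with identical fields get the same key.

```python
import hashlib
import json

def record_key(record: dict) -> str:
    """Derive a stable reference string from a record's field values."""
    # sort_keys plus fixed separators yields identical bytes for identical
    # fields, whatever order the dict was built in.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

a = record_key({"title": "Slashdot", "url": "http://slashdot.org/"})
b = record_key({"url": "http://slashdot.org/", "title": "Slashdot"})
print(a == b)  # True
```

If the database driver can report the generated key, that is usually simpler. For example, a Python `sqlite3` cursor exposes it as `lastrowid`.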

      Anyway...yes it is obvious, and yes there is quite a bit of prior art. Even if neither were the case, patenting processes is fscking stupid anyway. If you have a prototype of a device, by all means, patent it. Otherwise go away.

  3. I know I'll get flamed for this but... by tgd · · Score: 2

    If you read the press release, the patent isn't on storing checksums of HTML pages, but is for storing checksums of sections of a page between pre-identified HTML nodes.

    Now, perhaps there is prior art for this, but it's a damn good idea and I sort of doubt it, because I've been around the block a few times and haven't seen ANY caching mechanisms that can determine if a page has changed based on a checksum calculated from just a portion of the page (presumably so things like today's date on a page don't affect the state of the cache).

    That seems pretty damn innovative to me. I'm no big fan of software patents, but as software patents go, this is a lot more justifiable than most.

    So flame away, but there is a lot of posturing going on here about prior art, and none of them seem to come close.

  4. No, it's not, it's a very targeted patent. by tgd · · Score: 2

    And, unfortunately, probably perfectly valid in the US where something as stupid as software patents can be "valid".

    I quote:

    a checksum generator, coupled to receive the fresh copy of the document from the periodic fetcher, for generating a fresh checksum of a portion of the fresh copy of the document and comparing the fresh checksum to the original checksum, the checksum generator signaling a detected change to the remote client when the fresh checksum does not match the original checksum,

    Note the key phrase, "a fresh checksum of a portion of the fresh copy." Contrary to the inflammatory headlines, this patent does NOT cover blindly checksumming webpages, but rather strategically checksumming the critical part of a page, so the fluff doesn't affect the cache status.

  5. You think that's bad.... by Francis · · Score: 5

    I used to work at Pumatech. (Actually, I worked in the wireless web-browsing end of things, as an engineer.)

    Anyways, we were checking our emails one day (this was about 6 months ago) and there's some big "congratulations" email - we got another patent!

    A large portion of the company is built around synchronization software. (Synchronize your PIM, laptop, whatever.) We'd just received a patent on a revolutionary new technique - time-based syncing! Sync data based on their TIME STAMPS!

    We had a good laugh.


    --
    #include <malloc.h>
    free(your.mind);
  6. Re:Individual patents, no. Collective patents, yes by jimhill · · Score: 2

    Actually, a number of technologies relevant to nuclear weapons were patented prior to and during the Manhattan Project. For some reason, Mr. Stalin failed to adhere to such Intellectual Property law as might have existed at that time. Now that I think about it, I can't imagine a notion more antithetical to the Communist Manifesto than intellectual "property".

    --
    Learn to spell: nickel, missile, lose, solely, amendment, speech, kernel, probably, ridiculous, deity, hierarchy, versus
  7. defeating it by Barbarian · · Score: 2

    If they're using a simple checksum, then someone should figure out how to fool it--add like a comment field to a webpage with the correct characters to make the checksum the same.

    If they're using md5sums, well, I guess this won't work.
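Barbarian's point is easy to demonstrate with a naive additive checksum, which cannot see byte order at all. The page bytes below are made up, and `byte_sum` is a hypothetical stand-in for "a simple checksum". One caveat: deliberate MD5 collisions have since been constructed. MD5 should therefore not be relied on against an attacker, and SHA-256 is the usual replacement.

```python
import hashlib

def byte_sum(data: bytes) -> int:
    """A naive 16-bit additive checksum: the sum of all bytes modulo 65536."""
    return sum(data) % 65536

original = b"<p>Price: 10 dollars</p>"
tampered = b"<p>Price: 01 dollars</p>"  # same bytes, two of them swapped

# An additive sum ignores byte order, so the swap goes unnoticed...
print(byte_sum(original) == byte_sum(tampered))  # True
# ...while MD5 changes completely.
print(hashlib.md5(original).hexdigest() == hashlib.md5(tampered).hexdigest())  # False
```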

  8. Anything obvious is unpatentable. by Christopher Thomas · · Score: 3

    Just because something seems simple once somebody else thought of it doesn't mean it wasn't a good idea in the first place.

    And just because they (allegedly) were the first to think of it, doesn't mean it's patentable.

    Patents are supposed to be given only for things that aren't "obvious to anyone skilled in the art". In practice, this isn't assessed well by the patent office, but that's another can of worms.

    1. Re:Anything obvious is unpatentable. by HiThere · · Score: 2

      They've got a lot of work to do before I'll believe that they're even trying. And hearing about cases like this ... well, it sure doesn't make me think more highly of them!


      Caution: Now approaching the (technological) singularity.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
  9. Re:Previous Works by MikeFM · · Score: 2

    I have thousands of MD5 sums stored from web pages and various files linked to web pages, along w/ many of the original files. I've been sucking such info off the net and using MD5 sums to verify that these files are unique for a couple years at least. Never even considered the lame ass idea of patenting such a thing. Damn, maybe I should patent all my shell scripts. :)

    --
    At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
  10. Re:When? by Samrobb · · Score: 2

    Likewise. Company I used to work for did something very similar, using a CRC calculated using the text of a web page to determine web page "identity". I would be surprised if the Lycos (or Altavista, or Webcrawler, or Hotbot...) spiders didn't do something very similar.

    Which brings up an interesting question - if, by 1997, there were enough companies implementing this sort of "technology" already, then can't it be argued that the Pumatech patent is obviously invalid because at the time they applied for it, it was already in use by multiple companies... which seems to me to indicate that their "innovative" technology is "obvious to a practitioner skilled in the arts".

    --
    "Great men are not always wise: neither do the aged understand judgement." Job 32:9
  11. Re:Patents become irrelevant by HiThere · · Score: 2

    You have neglected one significant cost. These *** patents make it much more difficult for a small company. A small company won't have cross-license agreements, won't have a large legal staff, won't get a "good-buddy" licensing price, and is generally operating on a shoe-string budget anyway.

    So this is one of the factors that causes many new companies to fold. Think of it as a social control mechanism ... and it is, whether intentional or not. Because of this, I tend to think of these "spurious" patents as a large evil. Not the biggest one, but not a small one either.


    Caution: Now approaching the (technological) singularity.

    --

    I think we've pushed this "anyone can grow up to be president" thing too far.
  12. Re:Indeed by HiThere · · Score: 2

    Yes indeed. Text is so highly differentiated that if you know about doing something to the whole thing, doing something to a part of it is patentworthy. ????

    You have an extremely low standard for what should be patentable. Considering the cost of defending against a patent, if trivialities are patentable, soon only the rich will be able to legally initiate any action. Is this a social good? Is it in compliance with the constitutional provisions enabling the patent law? (I don't remember the precise phrasing, sorry. It isn't "To promote the general welfare ...", but that was the idea behind it.)

    E.g.: There may be no prior art in the archives of the patent law covering eating using a metallic or otherwise rigid, or somewhat stiff, divided instrument to convey the nutritive material from a holding container to the grinding apparatus. Should this be patentable?


    Caution: Now approaching the (technological) singularity.

    --

    I think we've pushed this "anyone can grow up to be president" thing too far.
  13. Re:Akamai does this already by Skapare · · Score: 2

    Can they testify that they have been doing this since prior to Feb 18, 1999?

    --
    now we need to go OSS in diesel cars
  14. check_www did this since 1-Dec-2000 by vik · · Score: 2

    check_www is a series of scripts and filters that I created under the GPL last year to automatically advise me of when web pages change, popping up alert boxes and pre-loaded browsers as appropriate. It includes filters to remove unwanted constantly changing information and search for terms. It is available on http://olliver.family.gen.nz/check_www.tgz Ironically, I was alerted to this article by it. Vik :v)
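The filtering step check_www describes could look like the sketch below: strip the constantly changing bits first, then compare checksums. The regular expressions are assumptions for illustration and are not check_www's actual filters.

```python
import hashlib
import re

# Hypothetical filters for content that changes on every load.
VOLATILE = [
    re.compile(r"\d{1,2}:\d{2}(?::\d{2})?"),  # clock times such as 05:22:53
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),     # ISO dates such as 2001-04-24
    re.compile(r"Visitors:\s*\d+"),           # hit counters
]

def stable_digest(page: str) -> str:
    """MD5 of the page after the volatile parts have been stripped out."""
    for pattern in VOLATILE:
        page = pattern.sub("", page)
    return hashlib.md5(page.encode("utf-8")).hexdigest()

monday = "<p>News</p><p>Generated 2001-04-24 05:22:53</p><p>Visitors: 17</p>"
tuesday = "<p>News</p><p>Generated 2001-04-25 11:02:07</p><p>Visitors: 18</p>"
print(stable_digest(monday) == stable_digest(tuesday))  # True: only volatile bits differ
```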

  15. Lawyering... by Merk · · Score: 2

    Now I know you can go after the police for malicious prosecution, and I know people have sued to recover court costs before. Could something like that be used to go after companies that file obvious patents that have been in use for a long time?

    Say you're an independent coder, and you create a way to check if a file is current using checksums, and you use it on your personal web site, never thinking about it. Years later a company patents exactly what you're doing.

    A normal reaction might be to yell and scream about how you were already doing it and how the patent is worthless. What about if you instead copied their product, using their supposedly patented technology. Seeing that, they'd come after you for patent violations. You could then show you were using the algorithm for much longer than them. Then, after you won the case, you could sue them to recover the costs associated with defending the case.

    I dunno, maybe some variation on this might work. It sure would be nice to be able to turn the screws on the screwers.

    Disclaimer: I am not a lawyer licensed in your jurisdiction or in any other jurisdiction. I'm not a lawyer at all, and I'm probably not even in your country. If I were in your jurisdiction and were a lawyer I'd probably not want to give out free legal advice anyhow... but who knows what I'd do, cuz I'd probably be pretty depressed at being a lawyer.

  16. rsync and rproxy. by himi · · Score: 2

    rsync does a block by block checksum of a file, then searches another file for matching blocks, thus making it a generalisation of this idea to /any/ file. It's been around for a /long/ time - the mailing list archives go back to 1991.

    rproxy applies the rsync protocol to http caching. I first heard about it at CALU in July 1999, and checked out some cvs code that worked at that time.

    The general idea has been floating around for ages, though - look on the rproxy site for links to other people's ideas about this kind of thing.

    This /is/ yet another case of a really dumb patent.
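At the heart of rsync is a cheap "rolling" checksum that can slide along a file one byte at a time, so every offset can be tested for a matching block. The sketch follows the weak checksum described in the rsync technical report, which uses two 16-bit sums, a and b. Both inputs are local here, so the sketch confirms candidate matches by comparing bytes. Real rsync compares a strong hash instead, because the two files live on different machines. The function names are hypothetical.

```python
M = 1 << 16  # both partial sums are kept modulo 2**16

def weak_checksum(block: bytes):
    """rsync-style weak checksum of a block, returned as the pair (a, b)."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M  # earlier bytes weigh more
    return a, b

def roll(a: int, b: int, out_byte: int, in_byte: int, n: int):
    """Slide an n-byte window one byte forward in O(1): drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b

def find_block(data: bytes, block: bytes):
    """Return every offset in data where block occurs, testing each offset via the rolling sum."""
    n = len(block)
    if n == 0 or len(data) < n:
        return []
    target = weak_checksum(block)
    a, b = weak_checksum(data[:n])
    hits = []
    for k in range(len(data) - n + 1):
        # Cheap check first; confirm candidates by comparing the actual bytes.
        if (a, b) == target and data[k:k + n] == block:
            hits.append(k)
        if k + n < len(data):
            a, b = roll(a, b, data[k], data[k + n], n)
    return hits

data = b"the quick brown fox jumps over the lazy dog"
print(find_block(data, b"the"))  # [0, 31]
```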

    himi

    --

    My very own DeCSS mirror.
  17. Hey Pumatech, HEADs up! by RomulusNR · · Score: 2

    % telnet slashdot.org 80
    Trying 64.28.67.150...
    Connected to slashdot.org.
    Escape character is '^]'.
    HEAD / HTTP/1.0

    HTTP/1.1 200 OK
    Date: Tue, 24 Apr 2001 05:22:53 GMT
    Server: Apache/1.3.12 (Unix) mod_perl/1.24
    Connection: close
    Content-Type: text/html

    Connection closed by foreign host.
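Whether a HEAD request is enough depends on which validators the server sends back. The response in the transcript has neither `Last-Modified` nor `ETag`, so there is nothing to compare against. A small parser makes that explicit. `validators` is a hypothetical helper that reads a raw header block like the one shown.

```python
def validators(raw_response: str) -> dict:
    """Extract the cache validators (Last-Modified, ETag) from a raw HTTP header block."""
    found = {}
    for line in raw_response.splitlines()[1:]:  # skip the status line
        if not line.strip():
            break                               # a blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() in ("last-modified", "etag"):
            found[name.strip()] = value.strip()
    return found

slashdot_head = """HTTP/1.1 200 OK
Date: Tue, 24 Apr 2001 05:22:53 GMT
Server: Apache/1.3.12 (Unix) mod_perl/1.24
Connection: close
Content-Type: text/html
"""
print(validators(slashdot_head))  # {}
```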

    --
    Terrorists can attack freedom, but only Congress can destroy it.
  18. Cisco CSS by ryanr · · Score: 2

    Née ArrowPoint, the web balancers Slashdot itself uses.

    It stores an MD5 checksum of a webpage to determine if the page it retrieved is complete. This is part of its timing mechanism to determine load. Pretty sure they did this prior to Feb. 99.

  19. It's all been done before. by schon · · Score: 2

    Yeah- this is one of those "Why didn't I think of that?" things

    No, it isn't.

    but I have yet to hear of a web cache or proxy that uses md5sums instead of last-modified headers- are there any out there?

    No, because that's a completely different question.

    Just FYI, this has been going on for _ages_ There was a 'web page change detector' available back in my 14.4kbps modem days (early 1995 - I can't remember what it was called, tho - been too damn long) that used this very technique... you fed a URL into a CGI, and it would poll the page every so often and email you if it had changed. And guess what? It used a checksum of the page to determine if it had changed (since storing all those pages would just take way too much storage space.)

    This is _NOT_ new, and it's _NOT_ non-obvious.

  20. Too much prior art by segmond · · Score: 2

    Ask web crawler designers. When I was working on a web crawler, I wondered what would happen when pages got updated and how I would go about getting the latest update, so I had the crawler store each page with the date it was fetched and a checksum of the page. If a page hasn't been fetched in 10 days and is crawled, it is fetched, the checksum is compared, and if different it is parsed for potential new links/keywords... This is so obvious, I am sure that Google and the major search engines probably do this.
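segmond's crawler logic can be sketched as below: store the fetch date and a checksum, refetch after 10 days, and re-parse only when the checksum differs. The record layout and function names are assumptions for illustration. The 10-day interval comes from the comment.

```python
import hashlib
from datetime import datetime, timedelta

REFETCH_AFTER = timedelta(days=10)  # the interval from the comment

def due_for_refetch(record, now: datetime) -> bool:
    """True if the page was never fetched, or was last fetched more than 10 days ago."""
    return record is None or now - record["fetched_at"] > REFETCH_AFTER

def on_fetch(index: dict, url: str, body: bytes, now: datetime) -> bool:
    """Store the fetch date and checksum; return True if the page should be re-parsed."""
    digest = hashlib.md5(body).hexdigest()
    old = index.get(url)
    index[url] = {"fetched_at": now, "checksum": digest}
    return old is None or old["checksum"] != digest

index = {}
t0 = datetime(2001, 4, 1)
url = "http://example.com/"
print(on_fetch(index, url, b"v1", t0))                                # True (new page)
print(due_for_refetch(index[url], t0 + timedelta(days=5)))            # False
print(on_fetch(index, url, b"v1", t0 + timedelta(days=11)))           # False (unchanged)
```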

    --
    ------ Curiosity killed the cat. {satisfaction brought it back | it didn't die ignorant | lack of it is killing mankind
  21. Tripwire prior art and USPTO by poopie · · Score: 2

    http://www.geek-girl.com/ids/1995/0306.html

    lots of postings here from 1995 about tripwire and its predecessors. . .

    maybe the USPTO should post their patent requests to slashdot and let us find the prior art before they issue patents.

    How about a site like http://find-prior-art.com that pays out money to the first people to find prior art for patent requests?

  22. Re:ahem.. by coyote-san · · Score: 2

    I can think of at least two excellent reasons off the top of my head.

    First, it's a considerable expense and hassle. Patent attorneys are not optional - the claims have to be properly worded for the USPTO to accept them *and* to prevent some business from stealing your idea by rewording an ineffectual claim ever so slightly. If you're a business and want to create market entry barriers to your competition, $10-20k might be a good investment. If you're a working stiff, that's a lot harder to justify. If you're still in college, forget it!

    Second, by seeking patents for "obvious" things we're implicitly accepting the validity of all other obvious patents. A sadly too common analogy is elections in corrupt regimes - you can organize a voter boycott because the election is corrupt, you can run your own candidate, but you can't do both.

    --
    For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
  23. Re:Right on. Enough of this irresponsible hype. by Malcontent · · Score: 2

    He's an idiot; don't expect him to actually think about things like that.

    He thinks that if you disagree about a patent you are a communist. What kind of a moron thinks like that?

    --

    War is necrophilia.

  24. Content-MD5 already in HTTP/1.1 by gotan · · Score: 2

    I don't know why anyone needs this. HTTP/1.1 (RFC 2068) already defines expiration dates and conditional loading of pages once they have expired, so instead of creating a hash, a server honouring requests such as 'If-Modified-Since' would do the job perfectly. The RFC also defines an entity tag. Deriving it from a hash is one possible way to create such a tag; encoding the document location and the date of the last change is another.
    But in general, a server that sends the file's last modification date in the 'Last-Modified:' header would do the job well; failing that, an entity tag would. The hash would only make sense if the document could be retrieved under different URLs, and even then a sensibly created entity tag would do the job.

    Then there is the Content-MD5 field for an integrity check (from rfc 2068):
    The Content-MD5 entity-header field, as defined in RFC 1864 [23], is an MD5 digest of the entity-body for the purpose of providing an end-to-end message integrity check (MIC) of the entity-body. (Note: a MIC is good for detecting accidental modification of the entity-body in transit, but is not proof against malicious attacks.)

    This is in the RFC dated January 1997. There are also guidelines on how proxies or clients should use these tags to check for expired documents. It's all there.
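`Content-MD5` carries the base64 encoding of the raw 16-byte digest, not the familiar 32-character hex string. The sketch below shows the computation. `content_md5` and `verify` are hypothetical helpers. The header was later dropped from the HTTP/1.1 specification in RFC 7231.

```python
import base64
import hashlib

def content_md5(body: bytes) -> str:
    """Content-MD5 header value: base64 of the raw 16-byte MD5 digest (RFC 1864)."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

def verify(body: bytes, header_value: str) -> bool:
    """Integrity check only: detects accidental corruption, not deliberate tampering."""
    return content_md5(body) == header_value.strip()

print(content_md5(b""))  # 1B2M2Y8AsgTpgAmY7PhCfg==
```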

    --
    "By the way if anyone here is in advertising or marketing... kill yourself." -- Bill Hicks
  25. Ok patent zipping parts of files by gotan · · Score: 2

    I mean, how ridiculous can it get? You look up something you deem a good idea, then modify it slightly and patent it? Note that the method in the RFC doesn't refer to patents and thus is probably not patented. The authors thought it obvious to mark the document with tags giving the date of last modification, a unique id (for documents retrieved under this URL) and a checksum for an integrity check. Now some morons come along, see it already done, do it on parts and get a patent.

    I would like to patent transporting morons. In parts.

    --
    "By the way if anyone here is in advertising or marketing... kill yourself." -- Bill Hicks
  26. Re:smells like troll but ... by sconeu · · Score: 2

    Besides, the US let out the REAL secret at Alamogordo, Hiroshima, and Nagasaki. Namely, that it was possible to build a working atomic bomb. Once the Russkis had that, the rest was engineering. They already knew the theory.

    --
    General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
  27. stupid idea by maraist · · Score: 2

    Why would you want to checksum a file to see if it's changed? As a web server, the time stamp is adequate to determine if it's changed, and as a web browser or web proxy, HEAD is adequate to check the time stamp.
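For static files, the timestamp approach maraist mentions needs no hashing at all. This is a minimal server-side sketch of `If-Modified-Since` handling that uses only Python's `email.utils` date helpers. `conditional_status` and `last_modified_header` are hypothetical names. The approach only helps when the server knows a meaningful modification time, which dynamically generated pages often lack.

```python
from email.utils import formatdate, parsedate_to_datetime

def last_modified_header(file_mtime: float) -> str:
    """Format a file's modification time as an HTTP date (always GMT)."""
    return formatdate(file_mtime, usegmt=True)

def conditional_status(file_mtime: float, if_modified_since=None) -> int:
    """Return 304 if the file has not changed since the client's date, else 200."""
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            return 200  # unparseable date: ignore the header and send the full page
        # HTTP dates have one-second resolution, so compare whole seconds.
        if int(file_mtime) <= since:
            return 304
    return 200

mtime = 988089773.0
header = last_modified_header(mtime)
print(conditional_status(mtime, header))       # 304
print(conditional_status(mtime + 60, header))  # 200
```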

    While we're at it. I'm going to rush to the patent office and see if I can "patent" 64bit date time stamps, so I have a lead in on the next big crisis!

    -Michael

  28. Re:already found prior art by bwt · · Score: 2

    Did the patent office even try a Google search before stamping its approval on this patent?

    Obviously not: http://www.google.com/search?q=web+checksum

    Hit #2 is prior art: "BIBLINK.Checksum - an MD5 message digest for Web pages" . Note that: "This article last updated/links checked on 23-Sept-1998"

  29. Prior art by FauxPasIII · · Score: 2

    Not that I figure prior art will be hard to come by for this, but I did this in Squeak/Smalltalk for a CS project my sophomore year in college, 1998. And they've been using this project for several years of this class.

    --
    25% Funny, 25% Insightful, 25% Informative, 25% Troll
  30. It's more than it seems by Pulzar · · Score: 2

    Taking a look at the patent content, it's not as simple as running the page through a checksum generator. This wouldn't work with some dynamically generated pages, for example, because their dates of creation will change every time.

    The process in the patent allows you to select a portion of the web page, and then the server only tracks changes in that portion. It also generates a checksum for each portion of content between HTML tags, and it is smart enough not to tell you that the content changed if certain sections got reordered, but the content's the same. It will also show you exactly which portions changed, since it has a separate checksum for each section.

    It's not fusion power, but it's an ok idea, and I don't think anyone has used it before. So, let them have the patent.
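Pulzar's description calls for one checksum per run of content between HTML tags, with reordering ignored and changed sections reported individually. The sketch below approximates that under stated assumptions and is not the patented method. Splitting on a tag regex is crude, identical sections collapse into one, and `changed_sections` is a hypothetical name.

```python
import hashlib
import re

def section_digests(html: str) -> dict:
    """Map the MD5 of each non-empty text run between tags to that run's text."""
    sections = [s.strip() for s in re.split(r"<[^>]+>", html)]
    return {hashlib.md5(s.encode("utf-8")).hexdigest(): s for s in sections if s}

def changed_sections(old_html: str, new_html: str) -> list:
    """Text runs present in the new page but not in the old one.

    Comparing sets of digests ignores order, so moved sections are not reported.
    """
    old = section_digests(old_html)
    return [text for digest, text in section_digests(new_html).items() if digest not in old]

old = "<ul><li>Apples</li><li>Pears</li></ul>"
print(changed_sections(old, "<ul><li>Pears</li><li>Apples</li></ul>"))  # []
print(changed_sections(old, "<ul><li>Apples</li><li>Plums</li></ul>"))  # ['Plums']
```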


    --
    Never underestimate the bandwidth of a 747 filled with CD-ROMs.
  31. Pumatech has fallen down before... by drin · · Score: 2

    This is the same company that developed and sold the synchronization software that supposedly worked with the Palm HotSync app to allow synchronization to other schedulers. Their conduit software worked once you took the days required to figure out how to install it correctly.

    It figures that they'd come up with yet another harebrained scheme....

    -drin

  32. False assumption by matthewn · · Score: 2

    The posting begins, "Just when you thought nothing else stupid could be patented" . . . um, hello? Why the heck would ANY of us think that? Did I miss the story about the patent office coming to its senses?

  33. Integrity Checkers by Lish · · Score: 2
    There is probably major prior art from Tripwire and other file-integrity checkers. Basically the exact same idea, with the purpose of detecting when important files have been altered through a breakin.
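The Tripwire-style check Lish refers to works in three steps: hash every file once to make a baseline, hash again later, and report the differences. This is a minimal sketch. It uses SHA-256 rather than the MD5-era hashes of the time, and `snapshot`, `compare`, and `write` are hypothetical names. A real integrity checker would also keep the baseline on read-only media and track permissions and ownership.

```python
import hashlib
import os
import tempfile

def snapshot(root: str) -> dict:
    """Map each file (path relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digests[os.path.relpath(path, root)] = hashlib.sha256(f.read()).hexdigest()
    return digests

def compare(baseline: dict, current: dict) -> dict:
    """Report files added, removed, or modified since the baseline snapshot."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(p for p in baseline.keys() & current.keys()
                           if baseline[p] != current[p]),
    }

def write(root: str, name: str, text: str, mode: str = "w") -> None:
    with open(os.path.join(root, name), mode) as f:
        f.write(text)

# Demo in a throwaway directory: record a baseline, simulate a break-in, re-check.
with tempfile.TemporaryDirectory() as root:
    write(root, "passwd", "root:x:0:0\n")
    write(root, "motd", "hello\n")
    baseline = snapshot(root)
    write(root, "passwd", "evil:x:0:0\n", mode="a")
    os.remove(os.path.join(root, "motd"))
    write(root, "backdoor.sh", "#!/bin/sh\n")
    report = compare(baseline, snapshot(root))

print(report)
# {'added': ['backdoor.sh'], 'removed': ['motd'], 'modified': ['passwd']}
```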


    --
    "This message is composed of 100% recycled electrons."
  34. Prior art in the HTTP Protocol? by Carnage4Life · · Score: 3

    Isn't this just doing what strong validators à la entity tags in HTTP requests and responses do to determine whether a page has changed (i.e. whether the cached copy is still good)?

    The only difference I can see is that they generate an ETag-like entity for text highlighted by the user as well as for the entire webpage. Doesn't seem worthy of a patent though.

  35. Pretty Broad Claim by Artagel · · Score: 2

    Claim 1 of the patent reads:

    1. A change-detection web server comprising:

    a network connection for transmitting and receiving packets from a remote client and a remote document server;

    a responder, coupled to the network connection, for communicating with the remote client, the responder registering a document for change detection by receiving from the remote client a uniform-resource-locator (URL) identifying the document, the responder fetching the document from the remote document server and generating an original checksum for a checked portion of the document, the checked portion being less than the entire document;

    archival storage means, coupled to the responder, for receiving the URL and the original checksum from the responder when the document is registered by the remote client, the archival storage means for storing a plurality of records each containing a URL and a checksum for a registered document;

    a periodic fetcher, coupled to the archival storage means and the network connection, for periodically re-fetching the document from the remote document server by transmitting the URL from the archival storage means to the network connection, the periodic fetcher receiving a fresh copy of the document from the remote document server,

    a checksum generator, coupled to receive the fresh copy of the document from the periodic fetcher, for generating a fresh checksum of a portion of the fresh copy of the document and comparing the fresh checksum to the original checksum, the checksum generator signaling a detected change to the remote client when the fresh checksum does not match the original checksum,

    whereby a change in the document is detected by comparing a checksum for the checked portion of the document, wherein changes in portions of the document outside the checked portion are not signaled to the remote client.

    So, the usual flame-before-reading crowd isn't entirely unjustified. (That's not to endorse flaming before reading, much less thinking, but hey, even a blind pig finds the occasional acorn.)

    Oh, btw, the priority date is January 14, 1997. Leave it to the guys who do the press release to give the wrong impression of when the thing was invented. Not that doing a checksum and not recording non-changes wasn't just as obvious in 1997 as 1999.
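Read as software, claim 1 has four parts. A responder registers a URL and checksums a checked portion, archival storage holds URL/checksum records, a periodic fetcher re-fetches documents, and a checksum generator signals a change. The sketch below maps those parts onto one small class so the claim's scope is easier to judge. It is an illustration, not Pumatech's implementation. The following are all assumptions: the marker-based `checked_portion` selector, the injected `fetch` function standing in for the network, storing the markers alongside each checksum, and returning a list instead of notifying a client.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

def checked_portion(doc: str, start: str, end: str) -> str:
    """Hypothetical selector: text between two markers, or the whole document if they are missing."""
    i = doc.find(start)
    if i < 0:
        return doc
    j = doc.find(end, i + len(start))
    return doc[i + len(start):j] if j >= 0 else doc

class ChangeDetector:
    def __init__(self, fetch: Callable[[str], str]):
        self.fetch = fetch  # stands in for the network connection to the document server
        # "Archival storage": url -> (start marker, end marker, checksum of checked portion)
        self.records: Dict[str, Tuple[str, str, str]] = {}

    def _checksum(self, doc: str, start: str, end: str) -> str:
        return hashlib.md5(checked_portion(doc, start, end).encode("utf-8")).hexdigest()

    def register(self, url: str, start: str, end: str) -> None:
        """Responder: fetch the document and store the original checksum of its checked portion."""
        self.records[url] = (start, end, self._checksum(self.fetch(url), start, end))

    def poll(self) -> List[str]:
        """Periodic fetcher + checksum generator: return URLs whose checked portion changed."""
        changed = []
        for url, (start, end, old) in list(self.records.items()):
            fresh = self._checksum(self.fetch(url), start, end)
            if fresh != old:
                changed.append(url)  # "signal" the change by reporting the URL
                self.records[url] = (start, end, fresh)
        return changed

pages = {"http://example.com/": "<b>Updated 10:01</b><!--watch-->Price: 10<!--/watch-->"}
detector = ChangeDetector(lambda url: pages[url])
detector.register("http://example.com/", "<!--watch-->", "<!--/watch-->")
pages["http://example.com/"] = "<b>Updated 10:02</b><!--watch-->Price: 10<!--/watch-->"
print(detector.poll())  # [] (only the timestamp outside the checked portion moved)
pages["http://example.com/"] = "<b>Updated 10:03</b><!--watch-->Price: 12<!--/watch-->"
print(detector.poll())  # ['http://example.com/']
```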

  36. When? by Trinition · · Score: 2
    The last company I worked for had been doing checksums of web pages since about 1997. Depending on when Pumatech started, this may be prior art. In my brief skim, I didn't see any initial date.

    Anyways, it's a silly patent. Checksums are a pretty fundamental thing to do! I don't even think my last company tried to patent it because it was so blatantly obvious!

  37. Er ... no ... by Chillas · · Score: 3

    Ahem ... no, they have patented a system for creating, storing, and using the checksum. An entire system, not just the storage of a checksum. Once again, alarmist headlines from /. I think we'd all appreciate it if these stories had accurate headlines.

    --
    --- Math illiteracy affects 8 out of every 5 people.
    1. Re:Er ... no ... by agentZ · · Score: 2

      Well now hang on a second here. Is it really the company that's trying to get rich, or the lawyers. Who has more to gain from this? The company might make some money suing another company some day, but the lawyers definitely make money (and keep their job security) by encouraging patenting technology.

  38. Re:Umm you ARE responsible for what your pet does, by agentZ · · Score: 2

    Lawyers can be like any other consultant. A lot of their advice can be such that it requires the constant presence of a lawyer to keep you out of legal trouble. I don't trust 'em any farther than I can throw 'em.

  39. Re:already found prior art by unformed · · Score: 2

    nope, that would involve...ummm...technical competence

  40. Re:first claim by GrassSnake · · Score: 2
    I wrote a similar system as a college freshman. From the UMBC Agent Web:

    A new and improved diffAgent server has been released which includes additional mediators. "A diffAgent watches information sources available via the web and e-mails you when it detects changes. In particular, it can:

    • Watch your FedEx package for you and e-mail you when it sees the words "Package has been Delivered!" (make a package watcher agent)
    • Monitor a list of query results at a search service like Altavista to see when new pages on your topic appear (make a web topic watcher agent)
    • Keep track of news articles on a topic and mail you when it finds new ones (make a news topic watcher agent)
    • Mail you when your name appears in a list of papers at an electronic archive (make a web page watcher agent)
    • Tell you when the word "snow" appears on the Pittsburgh weather page (make a web page watcher agent)
    8/15/96

    diffAgent had two modes. In the first mode, it stored a CRC checksum of the page, periodically compared checksums, and notified you of changes.
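    The first mode can be sketched in a few lines of Python. This is not diffAgent's code: `zlib.crc32` stands in for whatever CRC it used, the `CrcWatcher` class is hypothetical, and fetching pages and sending e-mail are left out.

```python
import zlib

def page_crc(page: bytes) -> int:
    """CRC-32 of the raw page body."""
    return zlib.crc32(page)

class CrcWatcher:
    """Keeps one CRC per URL; reports a change when a new fetch differs."""

    def __init__(self) -> None:
        self.last: dict[str, int] = {}

    def check(self, url: str, page: bytes) -> bool:
        crc = page_crc(page)
        previous = self.last.get(url)
        self.last[url] = crc
        # The first sighting is recorded but not reported as a change.
        return previous is not None and previous != crc
```

    Storing only a 32-bit number per URL is what makes this mode cheap; the trade-off is that it can say *that* a page changed but not *what* changed.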

    In the second mode, it stored the whole page, ran diff --context=3 over it to detect changed lines, and then grep'd for user-specified words of interest.
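    The second mode might look roughly like this in Python. `difflib.unified_diff` with `n=3` stands in for `diff --context=3` (same amount of context, unified rather than context format), and a case-insensitive substring test stands in for grep. The function name is hypothetical.

```python
import difflib

def changed_lines_with_keywords(old: str, new: str, keywords: list[str]) -> list[str]:
    """Diff two stored copies with 3 lines of context, then keep only the
    added lines that mention a keyword of interest (a grep over the diff)."""
    diff = difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="old", tofile="new", n=3, lineterm="",
    )
    hits = []
    for line in diff:
        # '+' marks an added line; '+++' is the file header, so skip it.
        if line.startswith("+") and not line.startswith("+++"):
            text = line[1:]
            if any(k.lower() in text.lower() for k in keywords):
                hits.append(text)
    return hits
```

    For example, diffing `"Pittsburgh weather\nSunny\n"` against `"Pittsburgh weather\nSnow likely\n"` with `["snow"]` yields `["Snow likely"]`.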

    I believe the NetMind web page was already up at that time, but they may not have had all of the features important to the patent. IMO, the NetMind technology is not worth a patent, but it is a bit beyond the diffAgent, and not entirely trivial to implement even if it is trivial to think of.

  41. New business model in the New Economy by Xibby · · Score: 4

    If you patent it, they will come.

    --
    I'm going to go back in my box and will think within the limits of my box: MS Sucks Linux Good I read too much Slashdot.
  42. Dang! What next? by Dr.+Awktagon · · Score: 2

    I've ALWAYS used checksums to do that kind of stuff. Unfortunately in scripts that aren't distributed publicly, but cripes, any damn fool could come up with that idea!

    Another trick I've used is in scripts that generate static .html pages from a database: take the data used in the page (not the page itself) and make an md5 of the concatenation. Since most md5 routines can take data in chunks, you can compute it as you're fetching the data. Then save the md5sum in a comment at the top of the page. Later, you can compare the md5sum stored in the page with the md5sum of the current data. If the page carries a "last modified" date or something similar, this means the page (and its date) is only updated when the data actually changes.
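    A minimal Python sketch of that trick, assuming a hypothetical `<!-- md5: ... -->` comment on the first line and a caller-supplied `render` function; the commenter's actual scripts and comment format aren't shown.

```python
import hashlib
from typing import Callable, Iterable, Optional

PREFIX, SUFFIX = "<!-- md5: ", " -->"  # hypothetical marker format

def data_digest(records: Iterable[str]) -> str:
    """MD5 of the concatenated source data, fed in as it is read."""
    h = hashlib.md5()
    for rec in records:
        h.update(rec.encode("utf-8"))  # md5 accepts the data in chunks
    return h.hexdigest()

def stored_digest(html: str) -> Optional[str]:
    """Read the digest back out of the page's first-line comment."""
    first_line = html.split("\n", 1)[0]
    if first_line.startswith(PREFIX) and first_line.endswith(SUFFIX):
        return first_line[len(PREFIX):-len(SUFFIX)]
    return None

def regenerate(existing_html: Optional[str], records: list[str],
               render: Callable[[list[str]], str]) -> Optional[str]:
    """Return a new page only if the data changed; otherwise None."""
    digest = data_digest(records)
    if existing_html is not None and stored_digest(existing_html) == digest:
        return None  # data unchanged: leave the page (and its date) alone
    return f"{PREFIX}{digest}{SUFFIX}\n" + render(records)
```

    Plain concatenation means `["ab", "c"]` and `["a", "bc"]` hash the same; feeding a separator between records would avoid that, if it matters for your data.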

    I also use this trick in an automatic DNS update script that creates zone files from a master data file. I can't just regenerate the zone files every time, because then the serial numbers would be bumped constantly.
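    For the zone-file case, the same idea reduces to "bump the serial only when the data's hash changes". A hedged sketch: `next_serial` is a hypothetical helper, and it simply adds 1 rather than following any particular serial convention such as YYYYMMDDnn.

```python
import hashlib
from typing import Optional, Tuple

def next_serial(master_data: bytes, old_digest: Optional[str],
                old_serial: int) -> Tuple[str, int, bool]:
    """Return (digest, serial, changed); the serial moves only on real changes."""
    digest = hashlib.md5(master_data).hexdigest()
    if digest == old_digest:
        return digest, old_serial, False  # nothing changed: keep the serial
    return digest, old_serial + 1, True
```

    The caller stores the returned digest alongside the zone file and passes it back in on the next run.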

    So if anybody patents this silly idea (maybe they already have?), I've been using it for like eight years!! I'm publicly announcing it here on /.!!

    Blah.

    Besides I don't use NetMind anymore, I use SpyOnIt.

  43. already found prior art by Anthony+Boyd · · Score: 2

    I note that Linux Focus already uses md5 to allow mirrors to check for updates to the pages. See that here.
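    The comment doesn't say how Linux Focus's mirroring works in detail. One plausible shape, assumed here purely for illustration, is a manifest of per-page md5 sums that a mirror compares against its own copy to decide what to re-fetch:

```python
import hashlib

def manifest(pages: dict[str, bytes]) -> dict[str, str]:
    """Map each page path to the MD5 of its content."""
    return {path: hashlib.md5(body).hexdigest() for path, body in pages.items()}

def pages_to_fetch(master: dict[str, str], mirror: dict[str, str]) -> list[str]:
    """Paths that are new on the master or whose checksum differs."""
    return sorted(p for p, digest in master.items() if mirror.get(p) != digest)
```

    Only the small manifest has to cross the network on every check; full pages are transferred only when their sums disagree.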

    Did the patent office even try a Google search before stamping its approval on this patent?

  44. Re:Previous Works by FKell · · Score: 2
    Actually, MySQL probably keeps it running VERY fast. We had a MySQL server handle upwards of 15,000 requests to write a webpage (PHP API), and the page would be written within seconds.

    Now to get on topic, does the patent office do any background checking on anything dealing with a computer program? Or do they just assume that since this was the first they read about this function, that it is obviously the first time it was implemented?

  45. Prior art. by Zeinfeld · · Score: 2
    I have prior art for the specific first patent claim: a message I sent to the www-talk mailing list in 1994.

    The HTTP protocol itself has included Jeff Mogul's cache optimization protocol since at least 1996.
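    The comment doesn't name the mechanism. One HTTP/1.1 feature in this spirit is entity-tag validation: the server sends an `ETag`, the client echoes it in `If-None-Match`, and the server answers 304 if nothing changed. Whether that is what the commenter means is an assumption. The sketch below is a hypothetical pure function (no real server), and using an MD5 of the body as the ETag is a common server choice, not something HTTP requires.

```python
import hashlib
from typing import Optional

def conditional_get(body: bytes,
                    if_none_match: Optional[str]) -> tuple[int, dict[str, str], bytes]:
    """Answer a GET the way a server validating entity tags would."""
    etag = '"' + hashlib.md5(body).hexdigest() + '"'  # quoted, as ETags are
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may list several tags, or '*'; a W/ prefix is ignored
        # because If-None-Match uses weak comparison.
        tags = [t.strip().removeprefix("W/") for t in if_none_match.split(",")]
        if "*" in tags or etag in tags:
            return 304, headers, b""  # client's cached copy is still valid
    return 200, headers, body
```

    A cache that stored the ETag from a previous 200 response can revalidate with one request and skip the body entirely when the page is unchanged.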

    It is yet another bogus patent. Time to adopt the proposal I made: bring a civil action for perjury against people who make fraudulent patent claims. I suspect that approach would cut down on the number of bogus applications.

    --
    Looking for an Information Security student project suggestion?
    Try http://dotcrimeManifesto.com/
    1. Re:Prior art. by Zeinfeld · · Score: 2
      Publicly available prior art: the [Harvest] distributed Internet search system, programmed in 1994, and still freely available for download, compilation and use today, includes exactly what is claimed here. (Related to Zeinfeld's work?)

      I had forgotten how Harvest worked. I suspect the number of similar cases is very large.

      --
      Looking for an Information Security student project suggestion?
      Try http://dotcrimeManifesto.com/
  46. Patents become irrelevant by MrNovember · · Score: 3
    When laws such as copyright and patent become misused in idiotic ways, the masses will simply ignore them in what amounts to large scale civil disobedience.

    The danger of patents like these is not, IMHO, that someone is going to ask you to pay a license fee for your two-line Perl program that uses checksumming, but that when you really invent something original and worthwhile, patent protection will have been rendered meaningless by people simply ignoring it.

  47. Re:Your own evidence refutes you. by poemofatic · · Score: 2

    I believe people who work hard and ethically have a right to their billion dollars.


    Hello? Heelllooo?!

    No one makes a billion dollars by working 100,000 times harder than someone making 10K.
    They make a billion dollars by having a horde of people who are earning 10K work for them. Check out Nike.
    Phil Knight doesn't work any harder than the Vietnamese girls who make the shoes. Those girls are not *lazy*.

    He makes his money by siphoning off the value of their labor, since they work under a corrupt government where unions and occupational safety codes are written by dictators who have no interest in protecting these "lazy" poor people.

    There is no relationship, for example, between executive compensation and productivity.

    What really lets people make huge amounts of money is not hard work (the Mexicans who wash the dishes in the restaurant where you dine are working very hard), and it's not intelligence (the college profs who taught you are probably pulling in 60K on average; the grad students are making 15-20K). It's being able to position yourself in a role where you manage people, or money, or both. Or maybe get a fat government monopoly on something (e.g. patents) that others use, and skim off their income. That, or just let your money "work" for you.

    In any case, the key to making big bucks is to park your behind right in the middle of some productivity intersection and start taking tolls.


    And if anyone objects, there will always be Ayn Rand-worshipping ideologues such as yourself to keep up the PR war, believing that this is somehow the ethical way to do business.

    --

    When in doubt, have a man come through a door with a gun in his hand.

  48. Individual patents, no. Collective patents, yes by petri+dish · · Score: 2

    Does an individual deserve to own a patent on checksumming? Surely not. But is there an argument to be made for collective ownership of the patent? I believe there is.

    You see, when a patent is granted to an individual, the benefits don't accrue solely to the individual. The entire society benefits, because the country now has a citizen who owns the patent and can wield it against other countries' citizens. GNP as a whole is raised by efforts like these.

    You can imagine how much richer the US economy would have been if we'd managed to patent the transistor before Japan got its own electronics markets running. You can imagine how much safer the world would be from nuclear warfare if the US had successfully patented atomic weapons before the Russians got their own projects going. Though the lifespan of a patent is only about 18 years, that would have been enough time to get some diplomatic solutions in place and prevent the escalated arms races of the Cold War.

    What does this have to do with checksumming? Not much, I'm afraid. That's a stupid patent and we all know it. But let's not cut off our nose to spite our face when so much good can be done by a proper patent system.

  49. No worries by redcup · · Score: 2

    Ha! I just patented 1-Click check sums... The rest of you will have to use the inferior "2-click" check sum...

    --

    RC