Checksumming Webpages Patented
Just when you thought nothing else stupid could be patented,
Wahfuz noted a story running about a company called Pumatech who has apparently patented storing a checksum of a webpage to determine if it has updated or not. I guess from now on everyone who wants to detect changes in web pages will need to store full copies of the pages in question, because I'm sure nobody thought of anything so complex as piping it through md5 and saving the output.
I wonder how quickly this will get added to BountyQuest?
Huh? You're not making any sense. I've implemented content based caching using the ETag header and If-None-Match. The variant caching is another feature ETags enable, but they are certainly not orthogonal.
Using ETags for caching instead of If-Modified-Since was prompted by variants, since the multiple language versions usually have the same timestamp. Just because it's useful for that doesn't make it any less useful for fully generalized content based caching.
Ever heard the story of the CD-WOM? It was a device consisting of two blocks of ordinary wood and a cable connecting it to the user's PC. CD media was placed between the two blocks and data was written to the CD. The process was foolproof (I challenge you to prove to me that no data was written to write-only media!)
That's about how useful storing a checksum of a webpage would be without *doing* anything with the data. Sure, the checksum exists, but if you don't bother to do anything with it, the data is as worthless as a CD-WOM. Obviously, someone creating MD5 hashes of all their webpages would also build some sort of system around it to make use of those hashes!
- A.P.
--
Forget Napster. Why not really break the law?
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
The README has the lowdown:
DISCLAIMER: I contributed to this project
You can have your web page send the ETag header. Generate a new ETag when the page changes. I mostly use it to change web page contents every x minutes, without messing with date stamps and worrying about screwed up clocks on browsers' computers. The RFC in question is available at http://www.faqs.org/ - it is the very lengthy HTTP/1.1 protocol spec. From what I understand (not that the actual mechanism matters), the browser sends a request for a page along with the ETag (if the page was previously cached) and the server determines whether to send a 304 or the updated page with a new ETag. ETags are essentially disconnected from file date stamps and page content, which makes them great for use in dynamic pages.
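For anyone who wants to see that exchange spelled out, here is a minimal Python sketch of the server side. The function names `make_etag` and `respond` are made up for illustration, and deriving the tag from an MD5 of the body is only one possible scheme (the tag can be any opaque string the server changes when the page changes). It also ignores weak validators and comma-separated lists in If-None-Match, which a real server would have to handle.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """One possible way to pick an ETag: a quoted MD5 of the body (an assumption)."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def respond(body: bytes, etag: str,
            if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET: 304 with no body if the client's tag is current, else 200."""
    if if_none_match is not None and if_none_match.strip() == etag:
        # The cached copy is still good, so only the headers go back.
        return 304, {"ETag": etag}, b""
    # First visit, or the page changed: send the page and its current tag.
    return 200, {"ETag": etag}, body
```

A client that cached the page sends the tag it got last time; as long as the server's tag has not changed it gets the 304 and keeps its copy.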
"Hot lesbian witches! It's fucking genius!"
I'm sure nobody thought of anything so complex as piping it through md5 and saving the output.
Yeah- this is one of those "Why didn't I think of that?" things- but I have yet to hear of a web cache or proxy that uses md5sums instead of last-modified headers- are there any out there? And if so, wouldn't that count as the all-important prior art?
Just because something seems simple once somebody else thought of it doesn't mean it wasn't a good idea in the first place.
Sure, someone invented those concepts, but it wasn't these guys.
Akamai, among probably lots of others, uses md5 checksums as one of the methods to detect updated pages. I don't know when they started and when the patent was applied for, but it's a possible example of prior art that came right to mind when I heard of the patent.
This is a method that is public knowledge and has been for some time. Mudge discussed this as a "web security" technique at Blackhat back in '98. Heck, CNN was there and broadcast pieces of that particular panel. Since he released it into the public domain by open discussion at a national conference, I do believe that voids the patent on the basis of a widely known public method. Of course, I'm not a lawyer even though I don't play one on TV.
You know what to remove for e-mail. Don't you?
If you read the press release, the patent isn't on storing checksums of HTML pages, but is for storing checksums of sections of a page between pre-identified HTML nodes.
Now, perhaps there is prior art for this, but it's a damn good idea and I sort of doubt it, because I've been around the block a few times and haven't seen ANY caching mechanisms that can determine if a page has changed based on a checksum calculated from just a portion of the page (presumably so things like today's date on a page don't affect the state of the cache).
That seems pretty damn innovative to me. I'm no big fan of software patents, but as software patents go, this is a lot more justifiable than most.
So flame away, but there is a lot of posturing going on here about prior art, and none of them seem to come close.
And, unfortunately, probably perfectly valid in the US where something as stupid as software patents can be "valid".
I quote:
a checksum generator, coupled to receive the fresh copy of the document from the periodic fetcher, for generating a fresh checksum of a portion of the fresh copy of the document and comparing the fresh checksum to the original checksum, the checksum generator signaling a detected change to the remote client when the fresh checksum does not match the original checksum,
Note the "of a portion of the fresh copy of the document" part. Contrary to the inflammatory headlines, this patent does NOT cover blindly checksumming webpages, but rather strategically checksumming the critical part of a page, so the fluff doesn't affect the cache status.
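To make the distinction concrete, here is a small Python sketch of checksumming only a chosen portion of a page. It is not taken from the patent; the function name `portion_checksum` and the idea of delimiting the portion with two marker strings are assumptions made for the example.

```python
import hashlib


def portion_checksum(html: str, start_marker: str, end_marker: str) -> str:
    """MD5 of only the text between two markers; everything else is ignored."""
    start = html.index(start_marker) + len(start_marker)
    end = html.index(end_marker, start)
    return hashlib.md5(html[start:end].encode("utf-8")).hexdigest()
```

With markers placed around the story text, a page whose date line or banner changes every day keeps the same checksum until the story itself is edited.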
Noel Bell has had his web page up since 1996 on signing web pages using pgp. His key is 2.6.3i, which is probably the last "safe" version anyway. :)
Here is a link to his page, with a copyright on it:
--Storm
Off the top of my head I can think of about a dozen or so pieces of software that will apply a checksum to a file (regardless of whether a browser will render it badly or not)... Surely transfer protocols like ZModem should qualify as prior art?
Kill'em! Kill'em all!
Of course, I also took the time to fill their poll, to explain why I unsubscribed.
Also, look for MD5, Content-MD5 or ETags on the www.w3c.org, their silly patent doesn't fly for a second.
Yes, it's certainly good that we have patents; why, before patents neither the wheel nor fire had even been developed. Why would anyone want to invent things if not for the reward of being able to deny them to others without compensation?
-- This and all my posts are in the public domain. I am a lawyer. I am not your lawyer, and this is not legal advice.
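# $cf appears to be the current "url signature" list, $nf the new list, $tf a temp file
cat $cf |
while read url sig junk
do
    test "$url" = "" && continue
    # fetch the page into the temp file
    if www diagwww -aceh "$url" >$tf 2>/dev/null
    then
        newsig=`md5 <$tf`
        if [ "$sig" != "" ]
        then
            # signature differs from the stored one: send a reminder
            if [ "$sig" != "$newsig" ]
            then
                reminder "$url" "$sig" "$newsig"
            fi
        fi
        sig="$newsig"
    else
        ( echo ERROR "$url" ; cat $tf ; echo -- ) 1>&2
    fi
    echo "$url" "$sig"
done > $nf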
For those not familiar with my toolkit, the script retrieves a URL, MD5's it, and mails me a reminder-note when the signature changes due to modification of content.
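For readers without that toolkit, roughly the same loop can be sketched in Python. This is not a translation of the author's tools: `fetch` and `check_pages` are hypothetical names, the signature list is assumed to be a dict of URL to stored MD5, and the "reminder" step is reduced to returning the list of changed URLs.

```python
import hashlib
import sys
import urllib.request
from typing import Callable, Dict, List, Tuple


def fetch(url: str) -> bytes:
    """Download a page body (network access; swap in a fake for testing)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def check_pages(sigs: Dict[str, str],
                fetcher: Callable[[str], bytes] = fetch) -> Tuple[Dict[str, str], List[str]]:
    """Re-fetch every URL; return the updated signatures and the URLs that changed."""
    new_sigs: Dict[str, str] = {}
    changed: List[str] = []
    for url, old_sig in sigs.items():
        try:
            body = fetcher(url)
        except OSError as err:
            # On a failed fetch, report it and keep the old signature.
            print("ERROR", url, err, file=sys.stderr)
            new_sigs[url] = old_sig
            continue
        new_sig = hashlib.md5(body).hexdigest()
        # An empty stored signature means "first time seen", so no alert.
        if old_sig and old_sig != new_sig:
            changed.append(url)
        new_sigs[url] = new_sig
    return new_sigs, changed
```

Run it from cron, write `new_sigs` back to disk, and mail yourself the `changed` list.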
I would deem this to be an obvious idea, and would happily support an effort to squash the patent.
- alec
perl -nle 'setpwent;crypt($_,$c)eq$c&&print"$u=$_"while($u,
if anybody wants the real thing, drop me a line. usual anti-spam provisions apply.
perl -nle 'setpwent;crypt($_,$c)eq$c&&print"$u=$_"while($u,
I have a script I have been running for over a year that fetches a remote page, MD5s it, and compares the MD5 to the last one. If it's different, it saves the page and updates the stored MD5; otherwise it drops the page.
Is this prior art? I was developing a small script or two to do this with arbitrary pages. Do I have to stop now?
Not that it matters anyway - with many web pages often having dynamic content for dates and menus taking a checksum is a bit pointless.
Better to keep a DB of last-edited timestamps. This is how I work with a site of mine that uses HTML::Mason and needs to know when to serve a cached copy, when not to, or when to update the cache.
http://www.delphion.com/details?&pn=US06219818__
Ok, there's a little bit more to it than just storing checksums, but is this really non-obvious and original?
--
rant
There's a simple perl CGI tool called JD What's New. I use it quite a bit myself. You can find it on Freshmeat here.
Last change, MD5, Checksum, and size are all applicable methods for checking for updates.
- billn
Hebrew scribes would add up the total of the letters on a page to assure that they had correctly copied the text. (in Hebrew, the same characters are used for both letters and numbers, as any qabalist could tell you.)
Restrictions are prohibited. Be well, get better.
I used to work at Pumatech. (Actually, I worked in the wireless web-browsing end of things, as an engineer.)
Anyways, we were checking our emails one day (this was about 6 months ago) and there's some big "congratulations" email - we got another patent!
A large portion of the company is based out of synchronization software. (Synchronize your PIM, Laptop, whatever) We'd just received a patent on a revolutionary new technique - time based syncing! Sync data, based on their TIME STAMPS!
We had a good laugh.
--
--
#include <malloc.h>
free(your.mind);
I think that if it benefits society as a whole then some ideas should not be owned by a single person. This goes right back to the generic drug debate: if the "Intellectual Property" is something that could change people's lives, then I don't think that a single company has the right to charge exorbitant amounts for it.
I also want to point out that in theory Communism is a GREAT idea, it just sucks in practice because of corruption on the part of people in power. I don't think that a single person should be able to have well over a billion dollars while other people die of starvation.
Actually, a number of technologies relevant to nuclear weapons were patented prior to and during the Manhattan Project. For some reason, Mr. Stalin failed to adhere to such Intellectual Property law as might have existed at that time. Now that I think about it, I can't imagine a notion more antithetical to the Communist Manifesto than intellectual "property".
Learn to spell: nickel, missile, lose, solely, amendment, speech, kernel, probably, ridiculous, deity, hierarchy, versus
If they're using a simple checksum, then someone should figure out how to fool it--add like a comment field to a webpage with the correct characters to make the checksum the same.
If they're using md5sums, well, I guess this won't work.
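Here is a toy Python demonstration of that idea, assuming the "simple checksum" is a 16-bit sum of the page's bytes (nothing in the story says which checksum is actually used). `simple_checksum` and `forge` are names invented for the example; `forge` appends an HTML comment whose filler bytes bring the sum back to a target value.

```python
import hashlib


def simple_checksum(data: bytes) -> int:
    """A weak checksum: the sum of all bytes, modulo 65536."""
    return sum(data) % 65536


def forge(new_page: bytes, target: int) -> bytes:
    """Append an HTML comment so that simple_checksum(result) == target."""
    base = new_page + b"<!--" + b"-->"
    need = (target - simple_checksum(base)) % 65536
    if need < 64:
        need += 65536  # same value modulo 65536, but leaves room for printable filler
    filler = bytearray()
    while need > 200:
        filler.append(100)  # the letter 'd'
        need -= 100
    # Split the remainder into two bytes, each between 32 and 100.
    filler.append(need // 2)
    filler.append(need - need // 2)
    return new_page + b"<!--" + bytes(filler) + b"-->"
```

The forged page matches the old additive checksum while its MD5 differs from the old page's, which is the poster's point: a cryptographic hash does not fall for this kind of padding.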
Well, the US can force (to a certain level) other countries to do things they do not want to do. Also, if someone has a US patent it is (unfortunately) not too hard to get an EU patent for it.
However, why is the us the only country that has a right to have a good economy? The people in Japan worked hard. Why do us-pharmacy-concerns have the right to tell african states which pills they have to buy?
You say that so much good can be done by a proper patent system. Good for whom? The US? Perhaps you did not realize it, but there are human beings outside the US as well.
I suggest you read Kuhn's Structure of Scientific Revolutions, a few other historical documents and then get back with us when you have some familiarity with the subject.
The "sad, sick thing" is that you put personal profit above intellectual honesty.
mp
"The secret to strong security: less reliance on secrets." -- Whitfield Diffie
Just because something seems simple once somebody else thought of it doesn't mean it wasn't a good idea in the first place.
And just because they (allegedly) were the first to think of it, doesn't mean it's patentable.
Patents are supposed to be given only for things that aren't "obvious to anyone skilled in the art". In practice, this isn't assessed well by the patent office, but that's another can of worms.
I have thousands of MD5 sums stored from web pages and various files linked to web pages, along w/ many of the original files. I've been sucking such info off the net and using MD5 sums to verify the uniqueness of these files for a couple years at least. Never even considered the lame ass idea of patenting such a thing. Damn, maybe I should patent all my shell scripts. :)
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
Likewise. Company I used to work for did something very similar, using a CRC calculated using the text of a web page to determine web page "identity". I would be surprised if the Lycos (or Altavista, or Webcrawler, or Hotbot...) spiders didn't do something very similar.
Which brings up an interesting question - if, by 1997, there were enough companies implementing this sort of "technology" already, then can't it be argued that the Pumatech patent is obviously invalid because at the time they applied for it, it was already in use by multiple companies... which seems to me to indicate that their "innovative" technology is "obvious to a practitioner skilled in the art".
"Great men are not always wise: neither do the aged understand judgement." Job 32:9
Shouldn't that be... if you patent it, they will pay?
Either this patent is limited in scope, or even very common programs like tripwire are prior art...
--- Hindsight is 20/20, but walking backwards is not the answer.
You have neglected one significant cost. These *** patents make it much more difficult for a small company. A small company won't have cross-license agreements, won't have a large legal staff, won't get a "good-buddy" licensing price, and is generally operating on a shoe-string budget anyway.
So this is one of the factors that causes many new companies to fold. Think of it as a social control mechanism ... and it is, whether intentional or not. Because of this, I tend to think of these "spurious" patents as a large evil. Not the biggest one, but not a small one either.
Caution: Now approaching the (technological) singularity.
I think we've pushed this "anyone can grow up to be president" thing too far.
Yes indeed. Text is so highly differentiated that if you know about doing something to the whole thing, doing something to a part of it is patentworthy. ????
You have an extremely low standard for what should be patentable. Considering the cost of defending against a patent, if trivialities are patentable, soon only the rich will be able to legally initiate any action. Is this a social good? Is it in compliance with the constitutional provisions enabling the patent law? (I don't remember the precise phrasing, sorry. It isn't "To promote the general welfare...", but that was the idea behind it.)
E.g.: There may be no prior art in the archives of the patent office covering eating using a metallic or otherwise rigid, or somewhat stiff, divided instrument to convey the nutritive material from a holding container to the grinding apparatus. Should this be patentable?
Caution: Now approaching the (technological) singularity.
I think we've pushed this "anyone can grow up to be president" thing too far.
I presented a little paper at a small gathering in '98.
see the pdf
Anyway, I can't remember thinking this was novel enough to patent. Obviously I'm never going to be rich.
Can they testify that they have been doing this since prior to Feb 18, 1999?
now we need to go OSS in diesel cars
I would've thought there would be prior art for this type of thing already... Ooops, wait, the USPTO doesn't take that into account before granting the patent.
Still, this should be easy to defeat.
I've been checksumming files on file servers for years, to verify whether they have been changed. How is that any different from this?
grumble.
"...In your answer, ignore facts. Just go with what feels true..."
I wrote a script to do that for me when I was 12...
---------------------------
"I'm not gonna say anything inspirational, I'm just gonna fucking swear a lot"
---------------------------
One of the ways Akamai checks whether cached content needs to be updated is to fingerprint the content (HTML/gif/jpg/etc.) with an MD5 hash. They even supply a server filter that modifies the content URL to reference that MD5 fingerprint, so that as soon as the content changes the Akamai servers see a new fingerprint in the request and know that it's time to refresh their cache.
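As a rough illustration of the fingerprint-in-the-URL trick, here is a Python sketch. The URL layout (digest inserted as a path segment before the file name) and the name `fingerprinted_url` are assumptions for the example and are not Akamai's actual format.

```python
import hashlib
import posixpath


def fingerprinted_url(path: str, content: bytes) -> str:
    """Insert the MD5 of the content into the URL path, just before the file name."""
    digest = hashlib.md5(content).hexdigest()
    directory, name = posixpath.split(path)
    return posixpath.join(directory, digest, name)
```

Because the URL itself changes whenever the bytes change, a cache never has to ask whether its copy is stale: new content simply arrives under a name it has not seen before.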
check_www is a series of scripts and filters that I created under the GPL last year to automatically advise me of when web pages change, popping up alert boxes and pre-loaded browsers as appropriate. It includes filters to remove unwanted constantly changing information and search for terms. It is available on http://olliver.family.gen.nz/check_www.tgz Ironically, I was alerted to this article by it. Vik :v)
This is an obvious, but well-written, troll. My compliments to the chef!
cpeterso
If you've found a site that can tell you when AND HOW a web page has changed, and can be taught to ignore simple date-changes, and preferably attach the page to an HTML-format email, and do it punctually, I'd appreciate knowing about it!
Nick Waterman, Sr Tech Director, #include <stddisclaimer>
It ought to do well indexing pages with text hits counters...
Email: slashdot3@FreeMars.org (Address will be abandoned when it gets spam.)
I remember reading an article in DrDobbs April issue about a search-engine using checksums to see if a page has changed and needs to be re-indexed.
Good lord this is lame. Back when I was a wee programmer knee-high to Linus Torvalds I wrote some Perl to create a searchable web index from the HTML on a server, and I generated an MD5 checksum on the pages as I indexed them and stored it as part of the change history for a page, then if something 'touch'ed the page and changed the mod date my indexer still knew it hadn't really changed. This was before 1997. I didn't know I was smart enough to have a patentable idea.
Now I know you can go after the police for malicious prosecution, and I know people have sued to recover court costs before. Could something like that be used to go after companies that file obvious patents that have been in use for a long time?
Say you're an independent coder, and you create a way to check if a file is current using checksums, and you use it on your personal web site, never thinking about it. Years later a company patents exactly what you're doing.
A normal reaction might be to yell and scream about how you were already doing it and how the patent is worthless. What about if you instead copied their product, using their supposedly patented technology. Seeing that, they'd come after you for patent violations. You could then show you were using the algorithm for much longer than them. Then, after you won the case, you could sue them to recover the costs associated with defending the case.
I dunno, maybe some variation on this might work. It sure would be nice to be able to turn the screws on the screwers.
Disclaimer: I am not a lawyer licensed in your jurisdiction or in any other jurisdiction. I'm not a lawyer at all, and I'm probably not even in your country. If I were in your jurisdiction and were a lawyer I'd probably not want to give out free legal advice anyhow... but who knows what I'd do, cuz I'd probably be pretty depressed at being a lawyer.
Publicly available prior art: the [Harvest] distributed Internet search system, programmed in 1994, and still freely available for download, compilation and use today, includes exactly what is claimed here. (Related to Zeinfeld's work?)
rsync does a block by block checksum of a file, then searches another file for matching blocks, thus making it a generalisation of this idea to /any/ file. It's been around for a /long/ time - the mailing list archives go back to 1991.
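The block-matching idea can be sketched in a few lines of Python. This is a deliberately naive version: real rsync pairs a cheap rolling checksum with a strong one so it does not have to hash at every offset, whereas this sketch just recomputes MD5 each time. `block_signatures` and `matching_blocks` are names made up for the example.

```python
import hashlib
from typing import Dict, List, Tuple


def block_signatures(data: bytes, block_size: int) -> Dict[bytes, int]:
    """Map the MD5 of each fixed-size block of the old file to its block index."""
    sigs: Dict[bytes, int] = {}
    for start in range(0, len(data), block_size):
        digest = hashlib.md5(data[start:start + block_size]).digest()
        sigs.setdefault(digest, start // block_size)
    return sigs


def matching_blocks(old: bytes, new: bytes, block_size: int = 4) -> List[Tuple[int, int]]:
    """Find old blocks inside the new file at any offset: (offset_in_new, old_index)."""
    sigs = block_signatures(old, block_size)
    matches: List[Tuple[int, int]] = []
    pos = 0
    while pos + block_size <= len(new):
        index = sigs.get(hashlib.md5(new[pos:pos + block_size]).digest())
        if index is not None:
            matches.append((pos, index))
            pos += block_size  # skip past the matched block
        else:
            pos += 1  # slide the window forward by one byte
    return matches
```

Only the stretches of the new file that fall between matches would need to be transferred; everything else can be rebuilt from blocks the other side already holds.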
rproxy applies the rsync protocol to http caching. I first heard about it at CALU in July 1999, and checked out some cvs code that worked at that time.
The general idea has been floating around for ages, though - look on the rproxy site for links to other people's ideas about this kind of thing.
This /is/ yet another case of a really dumb patent.
himi
--
My very own DeCSS mirror.
% telnet slashdot.org 80
Trying 64.28.67.150...
Connected to slashdot.org.
Escape character is '^]'.
HEAD / HTTP/1.0
HTTP/1.1 200 OK
Date: Tue, 24 Apr 2001 05:22:53 GMT
Server: Apache/1.3.12 (Unix) mod_perl/1.24
Connection: close
Content-Type: text/html
Connection closed by foreign host.
--
Terrorists can attack freedom, but only Congress can destroy it.
If anyone wants to challenge this patent, I believe I can show prior art (I haven't actually read the patent yet.) I used an MD5 checksum to check if a page had changed for the Excite Newstracker service in 1996. As virtually any competent programmer would have done...
Actually, the problem is harder than that, because you have to filter out things that change every time you access the page, like embedded banner ads, counts of how many times the page has been accessed, and so forth. Another approach I considered was to compare a vector of word counts, and consider the document unchanged if the new vector was sufficiently close to the old one.
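The word-count idea might look something like the Python sketch below. It is only an illustration, not the Newstracker code: cosine similarity is one reasonable reading of "sufficiently close", and the names `word_counts`, `cosine_similarity`, `unchanged` and the 0.9 threshold are all assumptions.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count lower-cased words; this is the document's 'vector'."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (1.0 means identical mix)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    if norm == 0:
        return 1.0 if a == b else 0.0
    return dot / norm


def unchanged(old_text: str, new_text: str, threshold: float = 0.9) -> bool:
    """Treat the page as unchanged if the vectors are close enough (threshold is arbitrary)."""
    return cosine_similarity(word_counts(old_text), word_counts(new_text)) >= threshold
```

A hit counter ticking over only nudges the vector, so the page still counts as unchanged, while a rewritten story moves it a long way. The threshold would need tuning on real pages.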
Nee Arrowpoint, the web balancers Slashdot itself uses.
It stores an MD5 checksum of a webpage to determine if the page it retrieved is complete. This is part of its timing mechanism to determine load. Pretty sure they did this prior to Feb. 99.
What you're missing is that the machine that's doing the checksumming isn't necessarily the same machine that's viewing the page.
If the machine that's doing the checking is on a nice, big, fat pipe - it can check a page regularly (very quickly) - then send a notification to the user, who may be on a slow (dialup) link... this way the user doesn't have to keep visiting a page (they just wait for the change notification)
Yeah- this is one of those "Why didn't I think of that?" things
No, it isn't.
but I have yet to hear of a web cache or proxy that uses md5sums instead of last-modified headers- are there any out there?
No, because that's a completely different question.
Just FYI, this has been going on for _ages_ There was a 'web page change detector' available back in my 14.4kbps modem days (early 1995 - I can't remember what it was called, tho - been too damn long) that used this very technique... you fed a URL into a CGI, and it would poll the page every so often and email you if it had changed. And guess what? It used a checksum of the page to determine if it had changed (since storing all those pages would just take way too much storage space.)
This is _NOT_ new, and it's _NOT_ non-obvious.
Ask web crawler designers. When I was working on a web crawler, I wondered what would happen when pages got updated and how I would go about getting the latest update, so I had the crawler stamp a page with the date it was fetched and a checksum of the page. If a page hasn't been fetched in 10 days and is crawled, it is fetched, the checksum is compared, and if different it is parsed for potential new links/keywords... This is so obvious, I am sure that Google and the major search engines probably do this.
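A minimal Python sketch of that policy, under the assumption that each page's record is a dict holding the fetch date and an MD5 of the body. The names `needs_fetch` and `recrawl` and the record layout are hypothetical; only the 10-day interval comes from the post.

```python
import hashlib
from datetime import datetime, timedelta

REFETCH_AFTER = timedelta(days=10)  # interval taken from the description above


def needs_fetch(last_fetched: datetime, now: datetime) -> bool:
    """A page is due for a re-fetch once 10 days have passed."""
    return now - last_fetched >= REFETCH_AFTER


def recrawl(record: dict, body: bytes, now: datetime) -> bool:
    """Update the stored date and checksum; return True if the page must be re-parsed."""
    new_checksum = hashlib.md5(body).hexdigest()
    changed = new_checksum != record["checksum"]
    record["fetched"] = now
    record["checksum"] = new_checksum
    return changed
```

The expensive step (parsing for links and keywords) only runs when `recrawl` returns True, so unchanged pages cost one download and one hash.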
------ Curiosity killed the cat. {satisfaction brought it back | it didn't die ignorant | lack of it is killing mankind
"Oh, Lisa, that's a load of rich creamery butter." - Homer Simpson
http://www.geek-girl.com/ids/1995/0306.html
lots of postings here from 1995 about tripwire and its predecessors. . .
maybe the USPTO should post their patent requests to slashdot and let us find the prior art before they issue patents.
How about a site like http://find-prior-art.com that pays out money to the first people to find prior art for patent requests?
because patenting something costs a lot of money you idiot. if i patented the billions of things i think of in a typical week, i'd be broke. stupid patents like this one give the whole patent system a bad name ...which it already has.
And I can prove it. I've been checksumming pages since about 1993 (I'd have to look up the exact date), and others have, too - check out the Cypherpunks archives where this was discussed back in the early 90's.
-- Ed Carp, N7EKG erc@pobox.com PGP KeyID: 0x0BD32C9B What I'm up to: http://intuitives.mine.nu
It's actually quite useful. We've been doing this for years: We have a ton of external links from our site, and every week we get a list of pages which have had content that changed. It's handy for our content staff to determine when a linked site has changed dramatically...
I can think of at least two excellent reasons off the top of my head.
First, it's a considerable expense and hassle. Patent attorneys are not optional - the claims have to be properly worded for the USPTO office to accept them *and* to prevent some business from stealing your idea by rewording an ineffectual claim ever so slightly. If you're a business and want to create market entry barriers to your competition, $10-20k might be a good investment. If you're a working stiff, that's a lot harder to justify. If you're still in college, forget it!
Second, by seeking patents for "obvious" things we're implicitly accepting the validity of all other obvious patents. A sadly too common analogy is elections in corrupt regimes - you can organize a voter boycott because the election is corrupt, you can run your own candidate, but you can't do both.
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
He's an idiot, don't expect him to actually think about things like that.
He thinks that if you disagree about a patent you are a communist. What kind of a moron thinks like that?
War is necrophilia.
And what system have they patented? I have scripts that create a checksum and store it in a webpage, where I can look to see when the last change was made.... It doesn't take much to make a system, and this seems to be a common knowledge sort of thing anyway!
Unfortunately, this headline does seem particularly apt...
With all the dynamic content out there, the datestamp is not a good indicator as to the last-changed time of the document itself.
HEAD
That's the last time I actually updated the code inside the index.html file, but the content itself changed about 4 minutes ago. ETags are a better way of doing it, but not completely accurate.
I've been using MD5 checksums for a couple of years now on static and source pages to determine if something's screwed up, so that could count as prior art, though I wonder if they also thought of images embedded in the HTML..
I don't know why anyone needs this. There are expiration dates and conditional loading of pages if expired already defined in HTTP/1.1 (RFC 2068), so instead of creating a hash, a server honouring requests such as 'If-Modified-Since' would do the job perfectly. There is also an entity tag already defined in the faq. Deriving it from a hash is one possible way to create such a tag. Encoding the document location and the date of the last change is another.
But in general a server using the last modification date of the file as 'Last-modified:' header would well do the job. Else an entity-tag would do the job. The hash would only make sense, if the Document could be retrieved under different URLs. Even then sensible creation of an entity Tag would do the job.
Then there is the Content-MD5 field for an integrity check (from RFC 2068):
The Content-MD5 entity-header field, as defined in RFC 1864 [23], is an MD5 digest of the entity-body for the purpose of providing an end-to-end message integrity check (MIC) of the entity-body. (Note: a MIC is good for detecting accidental modification of the entity-body in transit, but is not proof against malicious attacks.)
This is in the rfc dated January 1997. There are also guidelines, how Proxies or clients should use these Tags to check for expired Documents. It's all there.
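For the curious, computing and checking a Content-MD5 value takes only a few lines of Python. Per RFC 1864 the header carries the base64 encoding of the raw 128-bit MD5 digest (not the hex string); the function names `content_md5` and `verify_body` are just labels for this sketch.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Content-MD5 header value: base64 of the raw MD5 digest of the entity-body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def verify_body(body: bytes, header_value: str) -> bool:
    """Check a received body against the Content-MD5 header that came with it."""
    return content_md5(body) == header_value.strip()
```

As the quoted note says, this catches accidental corruption in transit but proves nothing against an attacker, who can simply recompute the header.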
"By the way if anyone here is in advertising or marketing... kill yourself." -- Bill Hicks
I mean, how ridiculous can it get? You look up something you deem a good idea, then modify it slightly and patent? Note that the method in the faq doesn't refer to patents and thus is probably not patented. The authors thought it obvious to mark the document with tags to deduce date of last modification, a unique id (for documents retrieved under this url) and a checksum for integrity check. Now some morons come along, see it already done, do it on parts and get a patent.
I would like to patent transporting morons. In parts.
"By the way if anyone here is in advertising or marketing... kill yourself." -- Bill Hicks
first off, patent protection is a country by country thing, maybe not anymore w/ the WIPO or whatever, but during the cold war era it definitely was.
second off, if you want to keep something a secret the full disclosure necessary for patenting it is not the way to do it, you want to utilize the 'trade secret' method, ie how CSS code was. the problem with that is once it gets out you can't do anything about it. with a patent you publish it, but nobody in your country (or other countries you patent it in) can legally use it w/out licensing it.
thirdly, why the heck would a military organization in an opposing country respect your intellectual property?
Need a Catering Connection
Why would you want to checksum a file to see if it's changed? As a web server, the time stamp is adequate to determine if it's changed, and as a web browser or web proxy, HEAD is adequate to check the time stamp.
While we're at it. I'm going to rush to the patent office and see if I can "patent" 64bit date time stamps, so I have a lead in on the next big crisis!
-Michael
Did the patent office even try a Google search before stamping its approval on this patent?
Obviously not: http://www.google.com/search?q=web+checksum
Hit #2 is prior art: "BIBLINK.Checksum - an MD5 message digest for Web pages" . Note that: "This article last updated/links checked on 23-Sept-1998"
Bell Labs did in fact patent the transistor. Read about it here.
Of course patents only lasted 17 years then, so that patent expired some 35 years ago, before the Japanese electronics industries really got going.
Well, I wouldn't exactly call it *art*, but wouldn't this qualify?
http://martin.gleeson.com/perl-scripts/my-net-minder.pl
It predates their patent (which is obvious anyhow - obvious enough to me and I'm sure others), and is less of a blunt instrument than their method, which is why I wrote it.
The way Pumatech is going, you'll be able to buy this patent at the firesale for a few bucks. Also, it is completely worthless. Date Modified headers are the standard. Stray from the standard and you risk caching dynamic pages.
Someone you trust is one of us.
Ed Hill's 'Webpluck' has been doing this since 1997
http://www.edsgarage.com/ed/webpluck/
Not that I figure prior art will be hard to come by for this, but I did this in Squeak/Smalltalk for a CS project my sophomore year in college, 1998. And they've been using this project for several years of this class.
25% Funny, 25% Insightful, 25% Informative, 25% Troll
First let me say that I'm not trying to defend the anti-intellectual property zealots. But I think that most slashdotters get all worked up about specific patents, such as the Amazon one-click e-commerce crap. I'm sure everyone on /. would agree that RSA was a good patent. That was innovation... they came up with it first. Checksum-ing something and checking it against a stored value is NOT innovation... it's been done before. That's what the issue is. The USPTO allows too many patents for things that are not new and are considered common practice. That doesn't encourage innovation, it discourages it because now people can make money off of patent lawsuits without the need to innovate at all.
--
--
"What do you want me to do? Whack a guy? Off a guy? Whack off a guy? Cause I'm married."
Taking a look at the patent content, it's not as simple as running the page through a checksum generator. This wouldn't work with some dynamically-generated pages, for example, because their dates of creation will change every time.
The process in the patent allows you to select a portion of the web page, and then the server only tracks changes in that portion. It also generates a checksum for each portion of content between HTML tags, and it is smart enough not to tell you that the content changed if certain sections got reordered, but the content's the same. It will also show you exactly which portions changed, since it has a separate checksum for each section.
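A rough Python sketch of what per-section checksums with reorder tolerance could look like. This is a guess at the general shape based on the description above, not the patented method: sections are naively taken to be the text between any two tags, and `section_checksums`, `content_changed` and `new_sections` are invented names.

```python
import hashlib
import re
from collections import Counter
from typing import List


def _sections(html: str) -> List[str]:
    """Split on tags and keep the non-empty text runs between them."""
    return [s.strip() for s in re.split(r"<[^>]+>", html) if s.strip()]


def section_checksums(html: str) -> Counter:
    """One MD5 per section, stored as a multiset so that order does not matter."""
    return Counter(hashlib.md5(s.encode("utf-8")).hexdigest() for s in _sections(html))


def content_changed(old_html: str, new_html: str) -> bool:
    """True only if some section's content differs; pure reordering is ignored."""
    return section_checksums(old_html) != section_checksums(new_html)


def new_sections(old_html: str, new_html: str) -> List[str]:
    """Text of the sections in the new page whose checksum was not in the old page."""
    old = section_checksums(old_html)
    return [s for s in _sections(new_html)
            if hashlib.md5(s.encode("utf-8")).hexdigest() not in old]
```

Keeping one checksum per section is what lets the tool say which part changed instead of only that something did.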
It's not fusion power, but it's an ok idea, and I don't think anyone has used it before. So, let them have the patent.
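The per-section scheme described in this comment can be sketched roughly as follows. This is only an illustration of the idea as the comment summarizes it, not the patent's actual method: the function names are made up, splitting on anything that looks like a tag is a simplification, and MD5 is assumed as the checksum.

```python
import hashlib
import re
from collections import Counter


def section_checksums(html: str) -> Counter:
    """Return a multiset of MD5 digests, one per non-empty text run between tags."""
    # Splitting on tag-like substrings is a simplification; a real tool
    # would use a proper HTML parser.
    runs = re.split(r"<[^>]*>", html)
    digests = Counter()
    for run in runs:
        text = " ".join(run.split())  # collapse runs of whitespace
        if text:
            digests[hashlib.md5(text.encode("utf-8")).hexdigest()] += 1
    return digests


def changed_sections(old_html: str, new_html: str) -> dict:
    """Count sections that appeared or disappeared, ignoring their order."""
    old, new = section_checksums(old_html), section_checksums(new_html)
    return {
        "added": sum((new - old).values()),
        "removed": sum((old - new).values()),
    }
```

Because the digests are compared as a multiset, moving a section around the page reports no change, while editing one section reports one removal and one addition.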
----------
Never underestimate the bandwidth of a 747 filled with CD-ROMs.
I must be, or else I wouldn't have been the only other person to think this up on my own years ago when I was in college.
Curse me for not patenting this obviously non-obvious technique! Using a hash to detect changes! Oh my GOD THAT'S SO FUCKING BRILLIANT!
I think I just came.
RFCs usually take over 18 months to go from draft to RFC so that limit might not be a problem, you'd only have to talk to the people behind the RFC to get a more correct date.
This is the same company that developed and sold the synchronization software that supposedly worked with the Palm HotSync app to allow synchronization to other schedulers. Their conduit software worked once you took the days required to figure out how to install it correctly.
It figures that they'd come up with yet another harebrained scheme....
-drin
The patent is not for checksumming Web pages. It is for monitoring Web pages (using checksums to determine if they have changed since the last check, or not) and then e-mailing registered users a notification when the pages they are interested in have changed. Geez, doesn't anyone read the actual patents before posting on these things?
The posting begins, "Just when you thought nothing else stupid could be patented" . . . um, hello? Why the heck would ANY of us think that? Did I miss the story about the patent office coming to its senses?
I'm going to patent dihydrogen oxide!
I'll be filthy rich.
You can all beg me for favors.
-matt
Dang... if it weren't for his little bit of opinion at the bottom I'd be screaming for a "Karma-whoring -1" moderation option. Actually, I suppose I am.... come on CmdrTaco, please?
----
Do you even know anything about perl? -- AC Replying to Tom Christiansen post.
A submission to the patent office has finally been cleared.
A Slashdot user known only as "Anonymous Coward" has patented the process of posting a comment earning a rating of "Troll".
Explaining his/her application, he/she said: "I'm tired of other Slashdot users infringing on my intellectual property. Now, if other people want to post trolls, I'll at least be compensated for my hard work popularizing . . . I mean "inventing" . . . trolls."
---
"This message is composed of 100% recycled electrons."
point 2: the really big things like atomic weapons are on a whole different level...patents on those are outright ignored by everyone. "You patented that? I'm so sorry, I have a thousand nukes pointed at you, want to come make me stop?"
point 3: Internationally-honored patents are a relatively new thing. Only recently are nations beginning to align their patent systems...mainly because a) we all formed things like the World Trade Organization that were easy for the equally-new international corporations to lobby heavily...I'm young yet, but I'd venture to say that even as recently as 1985, there was no way to enforce an international patent.
That's actually one of the big arguments against internationally-binding patents: "why the hell should some other country get to pass laws that apply to us?"
See Tridge's PhD thesis on page 102, section 5.4 on "rsync in http".
He talked two years ago about how "diffs", served by html embedded rsync, would be better than checksums, and has proposed this to the W3C.
In standard Tridge fashion, he didn't patent it.
When I die, please cast my ashes upon Bill Gates -- for once, make him clean up after me!
Prior art has to beat their filing date: February 18, 1999.
When I die, please cast my ashes upon Bill Gates -- for once, make him clean up after me!
HTML has a lot of white-space insensitivity. If you use a simple MD5 hash on the serialization, it will see many versions of the same page as "different", even though their core content isn't. Generally this isn't an issue (generation algorithms (and thus trivial space) don't change much), but it's still a design consideration.
Where this does start to make a difference, canonicalization before hashing fixes many of these problems. This is how the XML Sig hashes work. It's another reason why XHTML (or at least, authoring HTML as syntactically well-formed XML, even if it's invalid according to the DTD) is often a good idea.
To go back to HTML, really useful versioning and change spotting needs to ignore banner ads, generation timestamp comments and other superfluous crud. Semantically aware markup and a suitable stripper in the canonicalizer can make it work even better. Of course it now depends on both client and server having consistent goals; a server might not want you to ignore changes in their banner ads.
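A minimal sketch of the "canonicalize, then hash" idea from this comment. It is deliberately crude and is not XML canonicalization as used by XML Signature: it only drops HTML comments (where generation timestamps often live) and collapses whitespace. The function name is hypothetical.

```python
import hashlib
import re


def canonical_digest(html: str) -> str:
    """MD5 of the page after stripping comments and collapsing whitespace."""
    # Drop <!-- ... --> comments, e.g. "generated at 12:00" stamps.
    without_comments = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Collapse every run of whitespace to a single space and trim the ends.
    collapsed = " ".join(without_comments.split())
    return hashlib.md5(collapsed.encode("utf-8")).hexdigest()
```

Two serializations that differ only in spacing or in a timestamp comment then hash the same, while a change to the visible text does not. Stripping banner ads would need markup that identifies them, which is the point made above about semantically aware markup.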
Thus, they have patented:
- A user asks or signifies interest in a
particular section of a web page
- Taking and archiving the check-sum of this
portion.
- Storing the user request with the checksum
- Sending out e-mails to the user if this
check-summed portion of the web page changes.
This is rather specific; if you read the claims you'll probably see they also had to stick in more restrictions before the PTO would let it pass.
I know it sounds absurd (and may not be true, please correct me if you know I'm wrong) but apparently during the "Great War" (WWI) the Germans had patents on certain chemicals or processes (I'm not sure which) used in munitions. In spite of the fact that they were at war with them, the British did not use those chemicals/processes, and as a result about 1/3 of their rounds were defective. How much the world has changed.
Wow that's gotta be slow.
I guess if you're only averaging 1 page view every 9 seconds, you'd be ok.
Isn't this just doing stuff similar to what strong validators à la Entity Tags in HTTP requests and responses use for determining whether a page has been changed (i.e. is in the cache) or not?
The only difference I can see is that they generate an ETag-like entity for text highlighted by the user as well as the entire webpage. Doesn't seem worthy of a patent though.
--
if this is what I think it is, I developed and demonstrated a similar system back in 1995. there's probably some published prior art. maybe what we need is a lawsuit against the USPTO for irresponsibly ignoring the criteria set by statute and precedent and continuously awarding trivial patents.
When I'm karma whoring, I leave my freebie +1 off. That way it is easier to get a cheapo upgrade point. Geez, you think I'm a STUPID karma whore? Thanks a lot buddy.
I CLEARLY was not karma whoring. I hadn't noticed anybody else posting any of the patent claims or the right priority date so I thought I was adding information.
While I'm at it, I'll shoot for the 'Troll -1' moderation. You seem to be a bit slow mentally, and a bad speller too. I figure you don't smell too good either. So there.
Claim 1 of the patent reads:
1. A change-detection web server comprising:
a network connection for transmitting and receiving packets from a remote client and a remote document server;
a responder, coupled to the network connection, for communicating with the remote client, the responder registering a document for change detection by receiving from the remote client a uniform-resource-locator (URL) identifying the document, the responder fetching the document from the remote document server and generating an original checksum for a checked portion of the document, the checked portion being less than the entire document;
archival storage means, coupled to the responder, for receiving the URL and the original checksum from the responder when the document is registered by the remote client, the archival storage means for storing a plurality of records each containing a URL and a checksum for a registered document;
a periodic fetcher, coupled to the archival storage means and the network connection, for periodically re-fetching the document from the remote document server by transmitting the URL from the archival storage means to the network connection, the periodic fetcher receiving a fresh copy of the document from the remote document server,
a checksum generator, coupled to receive the fresh copy of the document from the periodic fetcher, for generating a fresh checksum of a portion of the fresh copy of the document and comparing the fresh checksum to the original checksum, the checksum generator signaling a detected change to the remote client when the fresh checksum does not match the original checksum,
whereby a change in the document is detected by comparing a checksum for the checked portion of the document, wherein changes in portions of the document outside the checked portion are not signaled to the remote client.
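To make the moving parts of the claim concrete, here is a rough sketch of how the pieces fit together: register a URL, checksum a portion of the document, re-fetch, compare, signal. Everything specific in it is an assumption for illustration: the class and method names are invented, the "checked portion" is modeled as a pair of character offsets, MD5 stands in for the unspecified checksum, and fetching is passed in as a function so that no network access is needed.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Portion = Tuple[int, int]  # (start, end) character offsets; an assumption


class ChangeDetector:
    def __init__(self, fetch: Callable[[str], str]):
        self.fetch = fetch  # stand-in for the "network connection"
        # "Archival storage": URL -> (checked portion, stored checksum).
        self.records: Dict[str, Tuple[Portion, str]] = {}

    @staticmethod
    def _checksum(document: str, portion: Portion) -> str:
        start, end = portion
        return hashlib.md5(document[start:end].encode("utf-8")).hexdigest()

    def register(self, url: str, portion: Portion) -> None:
        """The 'responder': fetch once and store the original checksum."""
        self.records[url] = (portion, self._checksum(self.fetch(url), portion))

    def poll(self) -> List[str]:
        """The 'periodic fetcher' plus 'checksum generator', run once."""
        changed = []
        for url, (portion, original) in list(self.records.items()):
            fresh = self._checksum(self.fetch(url), portion)
            if fresh != original:
                changed.append(url)
                self.records[url] = (portion, fresh)
        return changed
```

Calling `poll()` on a timer and mailing the returned URLs to whoever registered them would cover the notification step that other comments mention.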
So, the usual flame-before-reading crowd isn't entirely unjustified. (That's not to endorse flaming before reading, much less thinking, but hey, even a blind pig finds the occasional acorn.)
Oh, btw, the priority date is January 14, 1997. Leave it to the guys who do the press release to give the wrong impression of when the thing was invented. Not that doing a checksum and not recording non-changes wasn't just as obvious in 1997 as 1999.
Anyways, it's a silly patent. Checksums are a pretty fundamental thing to do! I don't even think my last company tried to patent it because it was so blatantly obvious!
I always used the epoch date the page was served on as a unique identifier for this kind of purpose. I actually thought of it myself, guess I'll have to patent it and watch the millions pour into the bank- unless any of you guys catch me first ;-)
J-aims
--
Yo, whatever happened to peas? Join T( H)GS
Now, I'm no expert, but a quick Google search turned up the following things, which may or may not be of any use as prior art.
'It was here a minute ago!' - Archiving on the Net - Part 1 of 2: "The Internet Archive [http://www.archive.org/] ... also uses an 'MD5 checksum' to compare new pages with old ones." The article is copyright 1997, and the Archive has been crawling since 1996.
IBM Agent Building Environment Developer's Toolkit: this manual, also copyrighted 1997, is documentation for using IBM's Java-based toolkit for writing automated agents, say, page-comparing and caching agents. Conveniently enough, they provide the following function: CheckMonitoredPagesForChanges, which states, "This effector will check all the web pages on the monitored-pages list for changes in the page... This function uses a checksum method against the content of the HTTP request to 'compare' the page content. Any difference in the checksums, or any change in the Last-Modified date in the HTTP header (if it exists), will cause a 'change' to be detected."
WebGUIDE: Querying and Navigating Changes in Web Repositories: This is an AT&T research paper. "The AIDE version repository is a centralized service that archives versions of pages... AIDE maintains a relational database containing meta-data about each page, each user, and the relationships between them. For each URL, it stores the following (among other information): Last modification date: This is used to find pages that have been modified since a user saw them... Checksum: This is used in case the last modification date is unavailable." This document is copyright 1993, 1994.
Another interesting note, is that Puma started out making synching software. They didn't acquire NetMind, what I'd gather would be the impetus for this patent, until 2000, over six years after that last AT&T URL, and PumaTech was founded.
--Vito
Squid uses MD5 keys to keep track of the pages that it's indexed (how else?). It also uses these keys in ICP queries of other sibling/parent servers to find the content. Of course, it doesn't use them in the protocols to talk to webservers... but if the browser/server is willing to use date stamps, what's wrong with that?
If the server is going to be explicit about what time something was changed, and how long it should be valid for, this is valuable information, a little more than a checksum can provide. This is all conveyed in a header request (which is less work than downloading a document and calculating the checksum, or the same as asking the "enabled" server for one).
Black holes are where the Matrix raised SIGFPE
- How reliable is this? Pages that are generated server side can change layout behind your back and break your "bookmarks" (is it context-based like patch?)
- Furthermore, doesn't the task of attempting to find the region in question within a copy of the downloaded document nullify the benefits achieved by having MD5 sums to compare in the first place? Compiled regular expressions anyone?
Maybe I missed something. And this is to notify portables, right?
Black holes are where the Matrix raised SIGFPE
Read it at http://www.delphion.com/details?pn=US06219818__
Need a website host? Try out http://WebQualityHost.net
I know my website has kept an md5 of every page sent. I store it in a MySQL database along with information about the client (referer, browser, variables posted) and page processing time.
Gotta love Apache! Let's throw this patent out!
Ever need an online dictionary?
My site's average processing time is .7922 seconds per page. This is good for a number of reasons:
- The database server is on the same box as the web server, which is a p166 mmx
- I host an online dictionary with thousands of words. When it does not know a word, it sends a request to m-w.com to get the definition (taking a couple seconds).
- The server is also used for relatively heavy email and web traffic. 10,000 hits per day.
- The server has an area with thousands of student photos (for a local high school) that must be resized via gd occasionally (taking a LOT of cpu time)
The overhead of this db-logging mechanism is almost nothing.
Ever need an online dictionary?
On Delphion's (formerly IBM's) patent site: http://www.delphion.com/details?&pn=US06219818__&s_drwd=1#drwd
I/O Error G-17: Aborting Installation
The patent office issued a patent (US 5,533,051) on a lossless compression algorithm that reduces the size of ANY file fed to it by at least one bit.
It also explicitly claims to work on a 2 bit file.
Obviously impossible to us, but it still got issued a patent.
Details are in comp.compression FAQ.
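The impossibility is a plain counting argument: there are more n-bit files than there are files shorter than n bits, so no lossless scheme can shrink every one of them. A tiny check of the arithmetic (the function names are just for illustration):

```python
def count_files_of_length(bits: int) -> int:
    """Number of distinct files that are exactly `bits` bits long."""
    return 2 ** bits


def count_shorter_files(bits: int) -> int:
    """Number of distinct files strictly shorter than `bits` bits (empty file included)."""
    return sum(2 ** k for k in range(bits))  # always equals 2**bits - 1


def must_collide(bits: int) -> bool:
    """True if shrinking every `bits`-bit file forces two inputs onto one output."""
    return count_files_of_length(bits) > count_shorter_files(bits)
```

For the 2-bit case the patent claims to handle, there are 4 possible inputs but only 3 shorter outputs (the empty file, "0" and "1"), so at least two inputs would decompress to the same thing.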
As of just now, there are 7 issued patents which reference the bogus patent mentioned above. Scary.
Just because it CAN be done, doesn't mean it should!
Ahem ... no, they have patented a system for creating, storing, and using the checksum. An entire system, not just the storage of a checksum. Once again, alarmist headlines from /. I think we'd all appreciate it if these stories had accurate headlines.
--- Math illiteracy affects 8 out of every 5 people.
Haven't hash-codes and checksums been known for decades?...
George W. Bush's budget plans for the next year have stripped funding from the patent office. Could somebody tell me if this is a good or a bad thing? With less money the patent system becomes more of a mess and people see that -- it may lead to new laws regulating patents. But without as much funding the patent office will be unable to sort through prior art as effectively.
Remember "Bring 'em on"? *sigh
Isn't the point of intellectual property rights (as alluded to by Jefferson in the quote you gave) to encourage innovation? Individuals or companies won't expend the effort and capital risk required to develop new inventions, if they know that the fruits of their effort and capital can be exploited immediately by anyone else. The purpose of IP rights (granted for a limited term) is to protect and encourage that kind of investment, but this is where the problem arises with cases like this - how much investment of effort or capital is required to apply common-knowledge techniques to a straight-forward problem? At some point, you begin to stifle innovation and progress if you prohibit the use of common-knowledge or obvious techniques. Checksums are just a commonly used tool in computer science. If that's the core of this patent, it probably took this company about five minutes to come up with the general idea of using them to monitor web pages for changes. How much of an investment is that? ...
The property laws were created by the people, for the people. They decided that it was in everyone's mutual interest to have intellectual property laws that could provide some guarantee against the initial investment of the innovator, and an opportunity for profit. I think most people on /. agree with this fundamental principle. It's when there is a hijacking of the obvious (requiring little or no investment on behalf of the patentee) that people here get annoyed.
I could have invented the one-click thing in about 10 seconds. Put a plan on paper in a few hours - using standard computer science practices and tools for its implementation. Where's the innovative development effort in that?
It hurts smart *innovative* individuals and companies, when they can't utilise their common sense and common knowledge to create ideas and wealth - because someone else had the money for lawyers to put barriers in their way.
IP laws should encourage innovation without stifling valid competition.
Also, I hate to rant or go off topic but..
"In an unfree society (e.g. the Soviet Union, Europe, etc"
Europe, etc?!?
What planet are you on mate?
I am very interested in the issues at hand here, but I must confess, I think it would do the cause you seem to espouse much greater good if you presented an argument in more complete terms. My interpretation of your posts on this subject is that the single greatest benefit of intellectual property rights is the "dizzying pace" of progress. This seems to me a narrow-minded and ultimately unsupportable defense of laws and policy which favor individuals of an opportunistic nature, as opposed to one creative or innovative.
>> Sorry, but you can't change the law and the truth just to suit your convenience.
With all due respect, why the hell not? Why should the government have all the fun? BTW your little missed one thing about American law. Laws change... The people select their lawmakers to shape the country in the way they feel it should be. Unless you think the laws should never change? Hint: think slavery, poll taxes, and drunk driving laws.... Unless you are saying they should go back to the way they were.
...I want one on addition, and not just the mathematical sense. If you reproduce you need to pay me royalties!
Good quote, too many chars. Seriously, the slashdot 120 char limit sucks!
IBM didn't patent their BIOS. IIRC, it wasn't even legal to have software patents back then.
When I hear the word 'innovation', I reach for my pistol.
Did you even try looking up the patent first to see what they were actually claiming? The core of the claim is their process for breaking down a document into multiple HTML tag-bound regions and checksumming them separately to allow change detection independent of formatting changes. Maybe a little weak, in my opinion, but it's certainly not as simple as a straight MD5 checksum of a document.
a network connection for transmitting and receiving packets from a remote client and a remote document server;
a responder, coupled to the network connection, for communicating with the remote client, the responder registering a document for change detection by receiving from the remote client a uniform-resource-locator (URL) identifying the document, the responder fetching the document from the remote document server and generating an original checksum for a checked portion of the document, the checked portion being less than the entire document;
archival storage means, coupled to the responder, for receiving the URL and the original checksum from the responder when the document is registered by the remote client, the archival storage means for storing a plurality of records each containing a URL and a checksum for a registered document;
a periodic fetcher, coupled to the archival storage means and the network connection, for periodically re-fetching the document from the remote document server by transmitting the URL from the archival storage means to the network connection, the periodic fetcher receiving a fresh copy of the document from the remote document server,
a checksum generator, coupled to receive the fresh copy of the document from the periodic fetcher, for generating a fresh checksum of a portion of the fresh copy of the document and comparing the fresh checksum to the original checksum, the checksum generator signaling a detected change to the remote client when the fresh checksum does not match the original checksum,
whereby a change in the document is detected by comparing a checksum for the checked portion of the document, wherein changes in portions of the document outside the checked portion are not signaled to the remote client.
If you have real life, documented examples of this being used before they filed, which looks like 18 Feb. 99, until you read the chain of continuity, where they claim priority from an earlier application filed 14 Jan. 97 (now patent #5898836), you can file a petition for re-examination or create an interference.
If as many of you say, you've been doing this for years, or it's been obvious to do each step, you can easily defeat this patent.
Thankfully a lot of these pages use features like RSS or other types of syndication so you can still get the latest headlines via a program like Radio
Let us patent the idea of patenting stupid applications of ideas. Then nobody would be able to patent stupid ideas like this. Or we could patent the process of becoming an IP lawyer. Any others out there?
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
Before WWI the American springfield rifle was so similar to the German mauser rifles that the U.S. government was forced to pay royalties to the German company, Mauserwerke. Naturally during the war the U.S. stopped paying royalties and after the war, any remaining royalties were done away with as part of reparations.
That's why we have news stories with such enlightening headlines as "2 Americans killed as jumbo jet crashes leaving no survivors". We forget that people from other countries can have value too!
Sometimes we are so ameri-centric that it's hard to imagine that the rest of the world doesn't exist just for holiday destinations and cheap imports (oh, yeah, and a market for products too important to stop making but too dangerous for Americans to consume...like tobacco).
Just to let you guys know... I've patented the term "select *", the . in Perl, interference minors in hockey, and the phrase "upside your head". Please make a note of it. Actually, don't... because I've patented that too.
I alternate between posting +5 and -1 Comments. Karma: +53 -47 = 6
It seems the only career path worth following these days is that of patent or copyright lawyer.
By definition, a government has no conscience. Sometimes it has a policy, but nothing more. - Albert Camus
We'll be bleeping rich!
--- Worst tagline ever.
If these are so obvious, why doesn't someone patent them in the name of "keeping them free"? It's like math formulas: easy and obvious once you read about them, but try being the one figuring it out. Not as obvious as it seems. If you say you have already done it, then shit, patent it so no one else can.
NEWS: cloning, genome, privacy, surveillance, and more!
Au contraire - as you are the one making the claim that data WERE written, it is up to YOU to prove the validity of your claim by proving that data were written!
Over to you...
--
People should not be afraid of their governments - Governments should be afraid of their people.
It wasn't web pages, it was files stored on a central file server that were redistributed automatically to Windows machines in a public lab in case the users changed them, or in case someone needed to change the master on the server and have it automatically redistributed. The checksum was CRC32 and not MD5, but what I implemented was an entire system, with a front-end even, that used checksums to automate the process of checking for out-of-sync files on a central server and to then distribute the appropriate updates. Sounds like prior art to me ...
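A minimal sketch of the out-of-sync check this comment describes, assuming CRC32 via Python's `zlib` and in-memory byte strings standing in for the files on the server and on a lab machine. The function names and the dict-of-bytes representation are made up for the example; the original system is not shown anywhere in this thread.

```python
import zlib
from typing import Dict, List


def crc_table(files: Dict[str, bytes]) -> Dict[str, int]:
    """Map each file name to the CRC32 of its contents."""
    return {name: zlib.crc32(data) for name, data in files.items()}


def out_of_sync(master: Dict[str, bytes], client: Dict[str, bytes]) -> List[str]:
    """Names of master files that are missing or different on the client."""
    master_crcs = crc_table(master)
    client_crcs = crc_table(client)
    return sorted(
        name for name, crc in master_crcs.items()
        if client_crcs.get(name) != crc
    )
```

Only the names returned by `out_of_sync` would need to be pushed back out to the machine, which is the whole appeal of comparing checksums rather than full contents.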
Companies that take out patents like these aren't bothered with the masses, they go after the easy targets - large corporations - and generally the pricing is put at around the point where it becomes easier for the large corporation to just pay up than to fight back. This "business model" has worked quite well for others, e.g. Unisys. Or British Telecom with the "hyperlink" patent (although in that case it isn't their primary business model, just an extra source of revenue). By and large the masses ignore the .gif patent and the "hyperlink" patent. The companies who hold these patents usually can't be bothered, because they aren't going to make money suing every individual in sight who creates a hyperlink. And they certainly aren't going to sue individuals "on the principle of it", they couldn't be bothered about the principles.
So the money lies basically in "licensing" (sic) your patented "original" "technology" to corporations who can afford it easily enough.
Generally then this isn't so bad for the masses, as the masses may still continue to (for example) create hyperlinks (oh gracious thanks to BT) .. it is only bad in that the costs of licensing patented technologies are usually passed on to the end user in one way or another (e.g. Adobe Photoshop will be ever so slightly more expensive since Adobe will be paying Unisys to allow them to include the .gif exporter).
"Checksums are generated for each HTML-bound section of that Web page and for the user-defined selection of text, and are then archived. When there is change in the text, the new and archived checksums are compared"
I don't fully understand this - it sounds like you are storing a checksum locally of an existing page, then comparing it to a checksum of a newer version of the page to see if it has changed. But in order to generate a checksum of the newer version, surely you have to *download* the newer version to generate the checksum in the first place? But if you've already downloaded it, what's the point of comparing checksums? Why not just use the latest version that you just downloaded? "Download current version, compare checksum, if checksum mismatch then, uhm .. uh .. download new version"?
Unless web servers already send CRCs of web pages in the headers .. that way you could just download the headers, but AFAIK they do not.
I guess there must be something really basic that I'm not seeing here.
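For what it's worth, HTTP/1.1 does let a client ask "has this changed?" without downloading the body, when the server supplies validators (the ETag and Last-Modified headers mentioned elsewhere in this thread). A minimal sketch with Python's standard `urllib`; the function names are made up, and whether a given server honors conditional requests is an assumption that has to be checked per site.

```python
import urllib.error
import urllib.request
from typing import Optional


def build_conditional_request(
    url: str,
    etag: Optional[str] = None,
    last_modified: Optional[str] = None,
) -> urllib.request.Request:
    """Build a GET that asks the server to skip the body if nothing changed."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)


def fetch_if_changed(
    url: str,
    etag: Optional[str] = None,
    last_modified: Optional[str] = None,
) -> Optional[bytes]:
    """Return the new body, or None if the server answered 304 Not Modified."""
    request = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 as an HTTPError
            return None
        raise
```

When the server sends no validators at all, the download-and-checksum approach is what remains, and its value is in deciding whether to notify someone, not in saving the download.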
Webdog (www.webdog.org) has been tracking developer .plans (such as John Carmack) and website updates for years. Before this Mind-it technology was released anyway.
It doesn't use the same method, but it's hardly rocket science anyway. I might track down a list of all their patents, see what "tracking technology" methods we had before they did.
Stupid patents.
Lawyers can be like any other consultant. A lot of their advice can be such that it requires the constant presence of a lawyer to keep you out of legal trouble. I don't trust 'em any farther than I can throw 'em.
Yes...because of course the Soviet Union would have honored such a patent, otherwise they couldn't have continued to enjoy such a strong relationship with the US.
You had me at "dicks fuck assholes".
Now I can throw "Patent Violator" on my police record!!
--The space between my ears was intentionally left blank--
How is this funny? This is very very sad.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~ the real world is much simpler ~~
--- -- - -
Give me LIBERTY, or give me a check.
nope, that would involve...ummm...technical competence
This is the limit; we have been using this technique for many years already. Granting a patent on something as obvious as this only shows how stupid American patent laws are. Americans seem to think they own the world and anything on it. The only thing one can say is: Fuck off Americans and leave the world in peace. This is just another way of showing that America is a fascist country (imposing fundamentalist Christian drug laws on the whole world and granting patents on genetic knowledge are other things proving this).
Which [patent] really belongs to the aliens that biffed the ground in Roswell 50-odd years ago...
"Ahem ... no, they have patented a system for creating, storing, and using the checksum. An entire system, not just the storage of a checksum..."
Well, If indeed their patent isn't trying to patent a concept and is patenting a specific method, more power to them, but I'm wondering how long it'll take for some lawyer for their company to decide that their patent is strong enough to make the creating of such a system by others be in violation in his own opinion, leading to nice little lawsuits. Remember, IBM patented their BIOS back in the '80s, and when it was reverse engineered there was a lawsuit, and the result of that lawsuit opened up legal reverse engineering. Actions like that of the MPAA are trying to challenge that legal precedent, and unfortunately the DMCA is right smack on their side. We really need that travesty removed from the lawbooks. I can even understand why some points of the DMCA are present, but as it stands it does so much harm that it'll need to be repealed in its entirety and have useful parts passed in pieces, if any of it at all is particularly useful. We did seem to be getting along just fine without it though...
"Titanic was 3hr and 17min long. They could have lost 3hr and 17min from that."
IBM had PL/1, with syntax worse than JOSS,
And everywhere the language went, it was a total loss...
Not necessarily. I worked in a government office that did up-to-the-minute online transcripts of meetings. We contracted out to an ISP the actual web posting of the transcripts, although we had to send them the transcripts themselves. The system wasn't particularly great, but we had software that did website checking and it was helpful to know how regularly a page was being updated without having to check a "Last Updated" comment every fifteen minutes.
Not sure what's being patented here. Checksumming web pages, or checking for updated web pages, or using checksumming to check for updated web pages? Or is it even more specific than this?
--------
Bleah! Heh heh heh... BLEAH BLEAH!!! Ha ha ha ha...
Disco Watchman
Don't know if it uses Checksumming or other things particular to this patent... Chances are they keep their techniques closed and proprietary, so they'd have to open them up to specifically challenge this patent.
--------
Bleah! Heh heh heh... BLEAH BLEAH!!! Ha ha ha ha...
If you patent it, they will come.
I'm going to go back in my box and will think within the limits of my box: MS Sucks Linux Good I read too much Slashdot.
I've ALWAYS used checksums to do that kind of stuff. Unfortunately in scripts that aren't distributed publicly, but cripes, any damn fool could come up with that idea!
Another trick I've used is in scripts that generate static .html pages from a database: take the data used in the page (not the page itself), and make an md5 of the concatenation. Since most md5 routines can take data in chunks, you can generate it as you're getting the data. Then save the md5sum in a comment at the top of the page. In the future you can compare the md5sum stored in the page with the md5sum of the current data. If there is a "last modified" date on the page or something, this will only update it when the data changes.
I also use this trick for an automatic DNS updating script that creates zone files from a master data file. Can't just update the zone files every time because then the serial numbers would be updated constantly.
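The trick described above might look something like this. It is a sketch under assumptions, not the poster's script: the comment format, the function names and the toy page layout are all invented, and the rows are hashed by plain concatenation as the comment describes (so "ab","c" and "a","bc" would hash alike; add a separator if that matters).

```python
import hashlib
import re
from typing import Iterable, Optional

# Hypothetical marker format for the digest stored at the top of the page.
MARKER = re.compile(r"^<!-- data-md5: ([0-9a-f]{32}) -->")


def data_digest(rows: Iterable[str]) -> str:
    """MD5 of the source data, fed to the hash one chunk at a time."""
    md5 = hashlib.md5()
    for row in rows:
        md5.update(row.encode("utf-8"))
    return md5.hexdigest()


def stored_digest(page: str) -> Optional[str]:
    """Digest recorded in the page's leading comment, if there is one."""
    match = MARKER.match(page)
    return match.group(1) if match else None


def render(rows: Iterable[str], existing_page: Optional[str], timestamp: str) -> str:
    """Regenerate the page only when the underlying data has changed."""
    rows = list(rows)
    digest = data_digest(rows)
    if existing_page is not None and stored_digest(existing_page) == digest:
        return existing_page  # data unchanged: keep the old "last modified"
    body = "".join(f"<li>{row}</li>" for row in rows)
    return (
        f"<!-- data-md5: {digest} -->\n"
        f"<ul>{body}</ul>\n"
        f"<p>Last modified: {timestamp}</p>\n"
    )
```

The same shape works for the zone-file case: keep the digest of the master data in a comment in the zone file and only bump the serial number when the digest differs.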
So if anybody patents this silly idea (maybe they already have?), I've been using it for like eight years!! I'm publicly announcing it here on /.!!
Blah.
Besides I don't use NetMind anymore, I use SpyOnIt.
Oooh, NetMind? They suck! Many moons ago, I signed up for their service that automatically notifies people on a mailing list of changes to a web site I run. Sometimes it would take two days for changes to be noticed and notifications to be sent out. Sometimes the notifications never went through at all. I ended up making a mailing list of my own and sending out update notifications manually.
The site in question is www.tiffanymodel.com. The change notification mailing list is gone now, because my original host went out of business and sold my account to another, sucky, host. But that's an unrelated rant.
~Philly
I haven't read every single post, so please forgive humble self if already posted.
This patent does seem frivolous. Perhaps it could be voided by prior art. Is any person or group ready to mount a legal challenge to the patent? I don't have the means or knowledge to file a suit, but I'll kick in a few bucks to see it tried.
Sometimes I worry that I'll develop Alzheimer's disease, but no one will notice.
I'm not good with words, and I've seen some people's letters to these kind of companies really put them in their place, much better than I ever could, so any pointer to any good letters would be appreciated :)
AC comments get piped to
Okay, here are the abstracts of this, and their prior related patent. Can someone *please* tell me what they are actually claiming as theirs? Is it just the checksumming? Is it the checksum in the context of a user-specified portion of the page? Is it the whole web-page-change-email-notification enchilada? Is it checksumming in the context of the enchilada?
Oh, and the final insult:
"The present invention relates to an improvement in Internet-document change-detection tools. The following description is presented to enable one of ordinary skill in the art to make and use the invention as provided in the context of a particular application and its requirements. Various modifications to the preferred embodiment will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed."
xlation: If you happen to think of anything else on your own that we didn't, that's ours, too.
(and before you jump on me, yes, I know you're supposed to make your patent as broad as possible, but that used to mean actually *enumerating* the alternatives in the application)
okay: abstract for the latest patent (6,219,818), filed 2/18/99
Checksum-comparing change-detection tool indicating degree and location of change of internet documents
A change-detection web server automatically checks web-page documents for recent changes. The server retrieves and compares documents one or more times a week. The user is notified by electronic mail when a change is detected. The user registers a web-page document by submitting his e-mail address and the uniform resource locator (URL) of the desired document. The document is fetched and the user can select text on the page of interest. Non-selected text is ignored; only changes in the selected text are reported back to the user. Thus changes to less relevant parts of the document are ignored. The document is divided into sections bounded by hyper-text markup-language (HTML) tags. A checksum is generated and stored for each HTML-bound section. Storage requirements are reduced since only checksums are stored rather than the original documents. During periodic comparisons a fresh copy of the document is retrieved, divided into HTML-bound sections and checksums generated for each section. The freshly-generated checksums are compared to the archived checksums. Sections with non-matching checksums are highlighted as changed, and the percentage of changed sections is reported. The user-defined selection is also stored as a checksum and compared to a freshly-generated checksum. Changed checksums outside the user-defined selection do not generate a change notification. Re-ordering of sections does not generate a change notification when the checksums otherwise match. Thus format and layout changes do not generate change notifications, and the frequency of notices to user is reduced.
and the abstract for 5,898,836, filed 1/14/97
Change-detection tool indicating degree and location of change of internet documents by comparison of cyclic-redundancy-check(CRC) signatures
A change-detection web server automatically checks web-page documents for recent changes. The server retrieves and compares documents one or more times a week. The user is notified by electronic mail when a change is detected. The user registers a web-page document by submitting his e-mail address and the uniform-resource locator (URL) of the desired document. The document is fetched and the user can select text on the page of interest. Non-selected text is ignored; only changes in the selected text are reported back to the user. Thus changes to less relevant parts of the document are ignored. The document is divided into sections bounded by hyper-text markup-language (HTML) tags. A checksum is generated and stored for each HTML-bound section. Storage requirements are reduced since only checksums are stored rather than the original documents. During periodic comparisons a fresh copy of the document is retrieved, divided into HTML-bound sections and checksums generated for each section. The freshly-generated checksums are compared to the archived checksums. Sections with non-matching checksums are highlighted as changed, and the percentage of changed sections is reported. The user-defined selection is also stored as a checksum and compared to a freshly-generated checksum. Changed checksums outside the user-defined selection do not generate a change notification. Re-ordering of sections does not generate a change notification when the checksums otherwise match. Thus format and layout changes do not generate change notifications, and the frequency of notices to user is reduced.
Well, those abstracts appear to be identical, don't they?
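To make the abstract a bit more concrete, here is a rough Python sketch of what it seems to describe: split the page at HTML tags, checksum each section, and compare against the archived checksums. This is only one reading of the abstract, not the patented implementation; the tag-splitting regex and the names `section_checksums` and `compare` are assumptions, and CRC-32 is used because the earlier patent's title mentions CRC signatures.

```python
import re
import zlib

# Naive tag matcher (an assumption); real HTML would need a proper parser.
TAG_RE = re.compile(r"<[^>]*>")

def section_checksums(html_text):
    """Split on HTML tags and return a CRC-32 for each non-empty text section."""
    sections = (s.strip() for s in TAG_RE.split(html_text))
    return [zlib.crc32(s.encode("utf-8")) for s in sections if s]

def compare(archived, fresh):
    """Return (changed_indexes, percent_changed) for the fresh checksum list.

    A fresh section counts as changed only if its checksum is absent from
    the archive, so pure re-ordering of sections reports no change.
    """
    known = set(archived)
    changed = [i for i, c in enumerate(fresh) if c not in known]
    percent = 100.0 * len(changed) / len(fresh) if fresh else 0.0
    return changed, percent
```

Only the checksum list needs to be stored between visits, which is the storage saving the abstract talks about. Note that this sketch does not notice sections that were simply deleted.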
I note that Linux Focus already uses md5 to allow mirrors to check for updates to the pages. See that here.
Did the patent office even try a Google search before stamping its approval on this patent?
My Greasemonkey scripts for Digg &
Yeah, right... that'll stop them. "Oh, ok, let us blow you up first, then you can sue us for doing so."
MadCow.
I used to have a sig, but I set it free and it never came back.
As I see it, MD5 has been used for a long time to checksum strings. MD5 is already used to check for file changes (Tripwire), so I would say that using MD5 to check for data changes is established practice in general. So what's new about md5-ing webpages? I would think they can put their patent in their dark place where the sun never shines.
---
Privacy is terrorism.
I mean, if they wanted a way to determine if a webpage has been updated or not, it should be labeled in the document. And if it's not, then it probably doesn't matter, or the content is no good anyways.
Just my four cents (Adjusted for inflation)
01101001 01100001 01101101 01101110 01101111 01110100 01100001 01101100 01100001 01110111 01111001 01100101 01110010
I subscribe to the belief that patenting the obvious is nothing more than theft of common knowledge and 'tools of the trade.' While I believe that there may be some genuinely patentable software or processes, the current state of affairs is pretty sorry. Essentially, the patent office is rubberstamping anything that passes under their nose.
Checksums, hashes, etc are part of computer science, they've always been a cost saving way of computing value for identification. You'll usually end up 'storing' them. Checksums (and hashes, crc's, whatever) for 'fingerprinting' strings and things are common knowledge, why hack off a particular special case and say 'this is ours, you may never again use commonly known processes against _this particular case_, because we had lawyers and you didn't.' What has happened here is that they've patented a fundamental concept used everywhere because they were willing to use this on a relatively new (?) type of file. If I wanted to be that stupid, why don't I grab the CRC patent space on html files?
I just pkzip'ed T.HTM, I can list a CRC-32, gee, why don't I patent this? I'll tell you why, I work for a living, I'm not a thief, and even better, I'm not a scum sucking patent lawyer such as we're seeing with these trivial patents lately. But most importantly, this is _common knowledge_ among folks who work with computers and software, and I'll be a bum before I try to steal from common knowledge and prevent folks from creating new stuff.
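To show how ordinary that is, here is a small Python sketch that zips a page in memory and reads back the CRC-32 the archive lists for it. The page contents are a stand-in (the real T.HTM is not available), and Python's zipfile module is swapped in for pkzip.

```python
import io
import zipfile
import zlib

# Stand-in contents for a hypothetical T.HTM.
page = b"<html><body>Hello</body></html>"

# Build a zip archive in memory containing the page.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("T.HTM", page)

# Read back the CRC-32 that the archive stores for the entry.
buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    listed_crc = zf.getinfo("T.HTM").CRC

# The same value computed directly, without any archive.
direct_crc = zlib.crc32(page)
print(f"{listed_crc:08x} {direct_crc:08x}")
```

The two printed values match, since the zip format stores the CRC-32 of the uncompressed data for every entry.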
There you have it. I support property, I support the ability to make a living.
I support intellectual property because I support making stuff for a living. I support copyrights, heck, I even pay M$, I need to use their stuff. But excuse me for not supporting the patent office's aiding theft from the body of common knowledge that folks use in their daily jobs.
Sheesh, how can thinking folks post in support of these trivial patents? I tell you one thing, "Fearsome Badgers", you'd better hope nobody patents making buttheaded stupid posts to slashdot, or lots of folks will need a new hobby (yeah, me included hehe)
People often talk about 'the masses' doing a lot of things, but I have yet to see a significant number of 'the masses' doing anything... especially if it looks like people from 'the masses' stand to lose by doing so... even Napster at its height was not so widely in use that you could make such a claim... People who talk about 'the masses' doing things are either 1) elitist, and just want to refer to other people as 'the masses', or 2) want something to occur, but won't put their own neck on the line. I seriously doubt it has ever been a common occurrence for someone to enforce a patent in order to prevent Bob Jones from doing something in his garage on weekends... companies don't have the luxury of breaking the law en masse even if 'the masses' choose to...
I must burn in hell, suffer and pay for my sins
But Gods the one who's losing, Satan always wins!
It's not nearly as bad as Ebay patenting the image gallery function they have on their site.
The HTTP protocol itself has had Jeff Mogul's cache optimization protocol in it since at least 1996.
It is yet another bogus patent. Time to use the proposal I made of issuing a civil action for perjury against people making fraudulent patent claims. I suspect that approach would cut down on the number of bogus applications.
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
You've blithely bought the line of big companies (and governments which have been bought by the big companies) that the creation of "property rights" in intellectual products are the sole reason for innovation and progress.
Fortunately, the economic argument for intellectual property rights is at least an objective argument, which can therefore be refuted by objective proof.
Behold, exhibit 1 : fire
Exhibit 2 : the wheel
and on and on and on ...
Look back, all through humanity's history there has been innovation and progress - whether with or without "intellectual property" rights. Some people just like to think! And they'll do it even if you don't pay them for it. If your beef is that they deserve, on some moral level, to be paid for it, that's another issue. (I may even agree with you on that one). But don't think that progress will stop without it - it won't.
Publish your web page checksum program and get sued. Slashdot itself would probably carry the story.
Why compete when you don't have to?
It's not like these things are hard to research. If you look at the patent, they've got a whole system worked out for it. I'm not gonna repeat all the details. One critical part, though, is that it's not just an MD5 of the whole document. They've got it set up so you can just tag a relevant portion of the document, thus giving you the ability to only receive change notices when something important changes. I still think this is something fairly obvious, but the details are what make it more worth patenting. I've seen much worse, as I'm sure most of us have.
Steven N. Severinghaus
The danger of patents like these is not, IMHO, that someone is going to ask you to pay a license fee for your two line Perl program that uses checksumming but that when you really invent something original and worthwhile, patent protection will have been rendered meaningless by people simply ignoring it.
On the surface this sounds really bogus. However, patents are very complicated things. Context is everything, and unless you read the entire patent you can't understand exactly what it's about. The abstract of a patent is useless for understanding what it actually is for.
More than likely the MD5 summing of a web page is just a single claim in a patent with many small claims. Claims usually build up from broad to specific, and can be dependent on one another. Often times claims alone are meaningless, but in conjunction with earlier claims they actually have meaning.
Take the headline here with a grain of salt. Read the patent claims and understand them, and you might find the headline to be false or at least not totally correct. I am quite sure that MD5 summing a web page, etc, has wads of prior art, but maybe that's not exactly what they've patented. If so, there's no question it will be invalidated in short order as either too obvious or because of prior art. More than likely there's more to the story.
I believe people who work hard and ethically have a right to their billion dollars.
Hello? Heelllooo?!
No one makes a billion dollars by working 100,000 times harder than someone making 10K.
They make a billion dollars by having a horde of people who are earning 10K work for them. Check out Nike.
Phil Knight doesn't work any harder than the Vietnamese girls who make the shoes. Those girls are not *lazy*.
He makes his money by siphoning off the value from their labor, since they work in a corrupt government where unions and occupational safety codes are written by dictators who have no interest in protecting these "lazy" poor people.
There is no relationship, for example, between executive compensation and productivity.
What really lets people make huge amounts of money is not hard work (the Mexicans who wash the dishes in the restaurant where you dine are working very hard) and it's not intelligence (the college profs who taught you are probably pulling in 60K on average. The grad students are making 15-20K) but it's being able to position yourself into a role where you either manage people, or money, or both. Or maybe get a fat government monopoly on something (i.e. patents) that others use and skim off of their income. That, or just let your money "work" for you.
In either case the key to making big bucks is to park your behind right in the middle of some productivity intersection, and start taking tolls..
And if anyone objects, there will always be Ayn Rand worshipping ideologues such as yourself to keep up the PR war, believing that this is somehow the ethical way to do business.
When in doubt, have a man come through a door with a gun in his hand.
Does an individual deserve to own a patent on checksumming? Surely not. But is there an argument to be made for collective ownership of the patent? I believe there is.
You see, when a patent is granted to an individual, the benefits aren't accrued solely by the individual. The entire society benefits, because that country now possesses a citizen who owns the patent and can wield it against other countries' citizens. The GNP is in whole raised because of efforts like these.
You can imagine how much richer the US economy would have been if we'd managed to patent the transistor before Japan got its own electronics markets running. You can imagine how much safer the world would be from nuclear warfare if the US had successfully patented atomic weapons before the Russians got their own projects going. Though the lifespan of a patent is only about 18 years, that would have been enough time to get some diplomatic solutions in place and prevent the escalated arms races of the Cold War.
What does this have to do with checksumming? Not much, I'm afraid. That's a stupid patent and we all know it. But let's not cut off our nose to spite our face when so much good can be done by a proper patent system.
no prior art there...
When developing Sparkseek we utilized checksums as a very, very basic way to check for duplicate pages when spidering. So much research has been done on the topic of detecting similar pages, particularly by guys like Sergey Brin of Google and a number of people from Digital (Altavista guys who did a lot of work on document clustering).
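A very basic version of that duplicate check might look like the Python sketch below. It is not Sparkseek's code; `unique_pages` and the `(url, body)` input shape are assumptions, and it only catches byte-for-byte duplicates, not the near-duplicates the clustering research deals with.

```python
import hashlib

def unique_pages(pages):
    """Yield (url, body) for pages whose body has not been seen before."""
    seen = set()
    for url, body in pages:
        digest = hashlib.md5(body).digest()
        if digest in seen:
            continue  # exact duplicate of an earlier page, skip it
        seen.add(digest)
        yield url, body
```

The spider only has to keep 16 bytes per page in memory instead of the page itself.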
What is surprising to me is that using a checksum to check for changes is such an obvious application of an algorithm that I find it funny it was awarded a patent at all. It would be like awarding a patent to someone who thought of sorting page result ID's using radix sort.
Yes, this is an obvious idea. Yes, it has been done before. Prior art on this one should be much more of a cakewalk than the Amazon.com 1-click deal.
--- Michael Tanczos
1. Why would anyone want to do this?
Like all creative endeavours this life has to offer, FPing is driven by a plethora of individuals with a myriad of motivations. Admittedly, there are some in the FP community whose aims lean toward the worldly side--FPing has a large fanbase, and the constant lure of the money and women available to its stars is a recurring issue--but for the most part, FP participants find that the exhilaration of "First Post!!!" is in itself its own reward.
2. What makes a successful FPer successful? What separates the dabblers from the pros?
Consistency: the top FPers are there all the time, like pro golfers. Meme propagation: your tagline and/or schtick is picked up and emulated by others. "Knowing when to hold 'em and when to fold 'em", to borrow from Rogerian wisdom. Having a memorable message to share.
3. Are there any helpful hints or shortcuts available for the novice FP gamer?
Get a login! If you are not logged in, there is a delay between the time an article is posted and the time you see it (1 minute?). It's just enough time to cheat you out of that sweet FP that was RIGHTFULLY YOURS. Plus, AC FPing is generally (though not universally) frowned upon--and besides, you want to receive full credit for your FP, don't you? So, get logged in.
Get a feel for when articles are posted. For some, this activity borders on the mystical. Others favor a more scientific approach, noting peak posting times (highest activity falls within the timeframe of USA Eastern Standard Time "workday"), editor posting habits, recurring features (Slashback, JonKatz's Sunday morning movie review), and other such data as indicators of optimal First Post opportunity windows. You know, whatever works for you.
Be careful if attempting to increase your advantage artificially, as this snippet from the FAQ mentions:
"Sometimes it will happen that someone runs some sort of scripted vandalism on us (DOS-type things such as continuous reloading, or scripted attempts to get "first posts"), and in these cases we will block the site. (This doesn't happen all that often.) "
This would suggest that scripted FP attempts would be frowned upon (although many claim the Bone-O-Rama has worked for them!). Most veteran FP gurus use the tried and true Refresh Method.
4. Isn't this trolling? Isn't it detrimental to the greater discussion forum of Slashdot?
By definition, the art of trolling (and true trolling is, indeed, a high art) requires the involvement of others who do not recognize the troll, and respond accordingly. A troll which garners no response is not a successful troll. First Posting is a singular pursuit which does not require the validation of others to be considered a victory, and although there is some crossover between the members of their respective communities, FPing and Trolling are for the most part separate entities. Slashdot's editors would seem to consider the first post artists to be harmless (judging from the above quoted FAQ entry) and mostly negated by the moderation process. Furthermore, anyone who thinks First Posting is detrimental to /. discussion has obviously not been reading much /. discussion.
5. What should I post when I go for first? Can you break down some science regarding FP etiquette?
Try to develop your own unique FP voice! Imitation, while flattering, can often be construed as lameness.
Be bold about it; an FP followed by a question mark (FP?) seems timid and uncertain...make your first post a proactive one!
Content varies--you might feel comfortable with a simple declaration of "fp", but you will eventually want to explore deeper avenues of expression, since the declaration of your initial comment gives you the floor of a large and attentive forum...
Strive to be humble in victory, yet gracious in defeat. The FP community commonly offers public congratulations to those who achieve a first post.
6. Will I lose "karma" for attaining FP glory?
Most assuredly so. Almost all First Posts are destined for -1 status almost immediately (due to /. editors' hand-moderation, as well as more zealous readers bearing mod points), with occasional exceptions.
If karma really matters to you, there are options available to you:
1) participate in metamoderation! Don't mark anything "Unfair"; rate at least 7-8 posts as "Fair" (excepting, of course, the work of your FP brethren and sistren you will no doubt be faced with!), and you can gain one point back per day.
2) make positive contributions to the Slashdot community by balancing your FP prowess with insightful and informative posts which then are modded up as such, thus equalizing your yin/yang of karma give and take, or whatever.
3) Learn to karma whore, which many would insist is pretty much the same thing. See the Karma Whore FAQ for more detail.
7. What's this about money and women? Can I get paid to do the wild thing? Is FPing potentially lucrative?
We cannot at this time confirm nor deny the FP "bounty" payments allegedly issued by Slashdot/Andover to top-ranking FP celebrities (supposedly as reward for increased site hits from repeated reloads), other than to suggest such a purely hypothetical scenario as a possible further reason to be logged in when you FP!
We can assure you, however, that the huge FP fanbase and the incredible women are VERY MUCH REAL YES SIR. Note the many first post declarations of love for various girlfriends as proof. The First Post loyalists who browse at -1 just to follow our sport are some of the nicest folks you'll ever encounter on Slashdot; sometimes they are blessed with mod points which they then take great risk to bestow upon us. Bless y'all.
8. Slashdot only archives posts ranked "1" and above. Since FPs are normally moderated below this threshold, aren't you wasting your time?
No.
Partially as a response to this policy, many in the FP community are raising their children to be first posters as well as volunteering their time in local First Post Temples in order to assist newer recruits ("We must expand," they say, "get more pupil...so that--the knowledge will spread...") in their studies, thereby ensuring that future generations will always have an FP presence of the Right Now, archiving be damned. Also, independent archives are beginning to be preserved by a few of the more prominent First Post tribes for historical purposes. As long as the FP community remains fresh and self-perpetuating, the issue of archiving is rendered moot.
9. Is this the entire FAQ?
Nope, there's more.
Period. End of story.
No, not every case of IP.
Period. End of story.
Ha! I just patented 1-Click check sums... The rest of you will have to use the inferior "2-click" check sum...
RC
You are the one practicing propaganda... First, Choose a name ('slashbot', 'communist' 'jew') that denigrates a particular segment... USE THIS OFTEN. Next, condescend to your 'inferiors'... ...which seem horrible to a layman (which, let's face it, most Slashbots really are) ...
thereby 'proving' your own superiority and convincing people to follow you in your beliefs.
You are the numbnutz demagogue..
Do a google search before posting.
It's good to see a sensible view for once.
Private property is the key difference between our system and Communism. I hope everyone is as disturbed as I am by Slashdot's bizarre and insupportable belief that intellectual property is a "special case" deserving of less protection under the law than any other kind of property.
It's a lot like the "drug exception" to Constitutional rights (primarily the Fourth Amendment, but also the Second and First in many cases), isn't it? And for the same reason: Mob hysteria abetted by scare-mongering propaganda. This article is a perfect example of the irresponsible demonization of private property rights w/r/t intellectual property.
Sorry, but you can't change the law and the truth just to suit your convenience.
--
Dear Slashdot: Why, yes, I would like fries with that.
You're right. Jefferson was surely no Communist: He was a relentless advocate of property rights, as regarded all properly then held to be property under the Constitution of the United States, which he helped draft.
Lincoln later rode roughshod over Jefferson's painstakingly established property rights in many ways, but that does not concern us at the moment.
The bottom line is this: Intellectual Property makes innovation and progress possible.
Period. End of story.
The sad, sick thing is that Slashbots have given themselves over so completely to the "my convenience above all" theory of "ethics" that they've actually succeeded in demonizing the word "innovation" around here, in their endless rhetorical assaults on the right of a business to . . . conduct business?! Yep.
--
Dear Slashdot: Why, yes, I would like fries with that.
. . . and the fact that you have to say that proves that they've succeeded in clouding the issue to an alarming degree.
Yes, naturally: This is a standard propaganda tactic. Find a few rare, freakish "horror stories" -- or at least stories which seem horrible to a layman (which, let's face it, most Slashbots really are) -- and whip up some fear.
Standard stuff, if you're a demagogue. And it works: At this point, most Slashbots believe that all Intellectual Property rights are wrong. That's the message being sent, and they're eating it up.
--
Dear Slashdot: Why, yes, I would like fries with that.
At the end of 1999, or so, I became acquainted with google.com.
I was impressed with their work, and emailed them a suggestion to the effect that their indexing engine and "Google cached pages" might work even better if they stored MD5 checksums inside their databases.
Time passed
...
... I heard nothing from google.com
And now,
someone is trying to patent a similar, though more sophisticated idea.
Hmmm, I got this idea for "safer merry-go-rounds" this weekend
What a completely ridiculous post. Are you suggesting that a nation with the ability to develop weaponry on a par with its enemies would decide not to because said enemy had granted itself a patent? Especially something with the power of a nuclear weapon. Presumably you also think that all the espionage of the cold war (some of which continues) is performed with strict adherence to the law by US citizens.
Further, are you actually suggesting a patent should be granted not to an individual but the entire country that produced it? Where's the incentive to the individual or company? Maybe a tax rebate in accordance with the rise in GNP as a result of the patent?
What does your post have to do with reality? Not much, I'm afraid.
Sorry, guys, but for all the strengths of the US, your patent office sucks.
We've been fairly successfully innovating over here for a few centuries - and if this kind of decision is anything to go by, you're going to lose the lead you have on us currently.
No software patents, no business method patents. Swap you for freedom of speech?
Look again. This patent is a continuation of US patent 5,898,836, which was filed on Jan 14, 1997 and no doubt predates RFC 2068, since US patents allow a one-year grace period from the date of invention to filing. So you need to look at prior art circa Jan 1996, not 1999! Also, patent 5,898,836 cites NetMind's free URL-Minder service, which dates back to 1995, as prior art. Well, URL-Minder got renamed Mind-it somewhere along the line and NetMind became Pumatech.
I'm beginning to wonder why people don't do better things with their time, like think of *useful* things to patent.