Deja, Google, Open Source, Oh My
blkros writes: "Over on Wired there's an article about Deja News and the plans to try to get Google to open source the Usenet archives it got when it bought Deja News. Part of the plan is to have the Library of Congress oversee it and put it on university mainframes. Google has taken the archives off the web for now Aaagh!"
Actually, google.com makes real money off ads - it's just that they're not obnoxious (and easily blocked) banners. Sometimes, when you do any of the somewhat generic searches, there is a URL returned at the top of the page, above the search results, which is 100% topical, but paid for. Advertising like that, I can appreciate.
Read more about it here: http://www.google.com/ads/index.html - they boast a clickthrough rate 4-5x the industry average, and you bet they make you pay for it!
"I will take the Ring," he said, "though I do not know the way."
NOBODY else kept archives of Usenet? Not even the
core hierarchies like comp.* and soc.* ??
That's very surprising to me. It's not like
dejanews was ever that good, that *nobody* else
needed to keep a usenet archive.
Talk about your single point of failure...
-fb Everything not expressly forbidden is now mandatory.
Are they obligated not only to delete their copy of it but correct your oversight in not saving a copy for yourself?
I see even classic Slashdot is pretty much unusable on dial-up now.
A tiny minority do. I just grepped through several thousand sitting in the spool here, and 47 articles had expiration dates. Most were posted by the same crackpots who add X-No-Archive headers to their posts. Expires: headers are basically irrelevant to the discussion.
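For anyone who wants to repeat that kind of count on their own spool, here is a rough Python sketch. It assumes a traditional one-file-per-article spool directory; the function names and the sample articles are made up for illustration, and this is not the command the poster actually ran.

```python
import os
import tempfile
from collections import Counter

def read_headers(path):
    """Return a dict of lower-cased header names to values for one article file."""
    headers = {}
    last = None
    with open(path, "r", encoding="latin-1") as fh:
        for line in fh:
            line = line.rstrip("\r\n")
            if line == "":
                break  # a blank line ends the header block
            if line[0] in " \t" and last is not None:
                headers[last] += " " + line.strip()  # folded continuation line
            elif ":" in line:
                name, _, value = line.partition(":")
                last = name.strip().lower()
                headers[last] = value.strip()
    return headers

def count_headers(spool_dir, wanted=("expires", "x-no-archive")):
    """Walk the spool and count articles carrying each wanted header."""
    counts = Counter()
    for root, _dirs, files in os.walk(spool_dir):
        for name in files:
            counts["articles"] += 1
            headers = read_headers(os.path.join(root, name))
            for header in wanted:
                if header in headers:
                    counts[header] += 1
    return counts

if __name__ == "__main__":
    # Build a tiny throwaway spool (hypothetical articles) and count it.
    with tempfile.TemporaryDirectory() as spool:
        samples = {
            "1": "From: a@example.com\nExpires: 1 Jan 2002 00:00:00 GMT\n\nbody\n",
            "2": "From: b@example.com\nX-No-Archive: Yes\n\nbody\n",
            "3": "From: c@example.com\n\nExpires: not a header, just body text\n",
        }
        for name, text in samples.items():
            with open(os.path.join(spool, name), "w", encoding="latin-1") as fh:
                fh.write(text)
        print(dict(count_headers(spool)))
```

Only the header block is read, so the word "Expires:" appearing in a message body is not counted, which a plain grep over whole files would get wrong.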
Storing a few weeks of Usenet isn't that complicated (but it's more than "a simple script"). Storing and being able to retrieve several years' worth is something else entirely. Come back to us once you've actually dealt with terabytes of data being randomly accessed by millions of people.
"Patriotism is your conviction that this country is superior to all other countries because you were born in it." -- GBS
That is NOWHERE near true, alas.
Only because the LOC can only contain works which have been registered. Copyright law currently recognizes your IP right (whether you like it or not) to anything you create at the moment you create it. Registration, which will get your work into the LOC under appropriate circumstances, is only a tool to strengthen the copyright you already have as the creator of a new work. So if you create something, it's copyrighted by default even though the LOC doesn't have a copy, which is true of many of the works ever created.
Brackets contain world's first nanosig, highly magnified:[.]
So does this mean that the Library of Congress will now be home to the greatest collection of ASCII porn ever assembled by man?
47.5% Slashdot Pure(52.5% Corrupt)
The archives have no source; they are not code. They are going to make them "free", I would guess under a license that does not put them in the public domain, but that is not open source: by definition, open source is code. Sorry, but this has been bugging me for a long time.
Cypherpunks: Civil Liberty Through Complex Mathematics. Those who live by the sword die by the arrow.
While many web forums offer a search function, this is usable at the site only and not indexable by net-wide spiders (such as Google). While in some cases this is a feature, it locks up the content in a way that prevents it from being found, used and archived by net users in general.
I know a few subject matters very well and am happy to be helpful to pass on knowledge, answer questions and participate in dialogue. When this becomes lost I have to answer the same questions again and again, wasting my time. Furthermore, my answers that may be of help to others are lost, depriving them of knowledge that may have helped them.
I have surface knowledge of a great many more topics. I research these, I try to further my knowledge in some, I have to learn about others for work or for other reasons. Being able to easily find information is invaluable and my publicly archived questions may be useful to others.
I know little or nothing about an even greater range of knowledge. Being able to read what others have asked and answered is a wonderful way to start bridging those gaps.
Unarchivable web forums, mailing lists that don't archive messages on the web and even IRC let this human knowledge slip away.
Not that there isn't a place for all of the above, but I wish more people would consider things beyond their immediate needs.
Bleh!
Personally, I think Deja dropped the ball when they "improved" the site to include all the ads and such. I think the google interface, even though it's more limited, is light years ahead of where Deja was going in the last few years. Deja was soooo slow .. it went from like 2-3 seconds per page to like 20-25 seconds per page a few years back (when the ads came in with the new interface). Now with Google, it's lightning fast. Since google has taken over, I guess that I have benefitted from the improvement in speed that Google has to offer that in my mind makes the service much more valuable. I especially like the ability to view multiple responses simultaneously and the highlighting is much better. I can't say enough good stuff about the google interface, and as far as what's missing from Deja .. the _only_ thing I miss is posting .. Even the older articles I used less frequently and am willing to sacrifice in the name of the tremendous speed!
.. it's a major part of my very existence.
I use groups.google.com at least 10 times every day
--
- Aaron Hightower - Lead Programmer - Rush2049 Coin-op
My FOIA inquiry got a PDQ PFO from the CIA. They can FOD, FWIW.
---
---
Slashdot: News For Zealots. Stuff That's Hypocritical.
I don't pay taxes to the US government; they have no jurisdiction over me, and hence no obligations to me on either moral or legal grounds. So while they might choose to make their resources (say, a Library of Congress USENET archive) available to me as a courtesy, such a 'right' to their newsgroup archives would be even more tenuous than the relationship between me and a company providing archive access to customers (be the customers paying fees, or viewing ads, or whatever).
So if the archive ever did go to the Library of Congress, I would encourage them to make the archive available for high-profile mirroring; if the National Library of Australia had a copy I'd feel a lot better.
The tone of the article was such that it implied that Google should be providing this information to the public at large, simply *because* they bought it from what was Deja.com. There's the accusation that Google are doing something morally wrong by taking the archive offline - meanwhile ignoring the fact that Deja.com had already taken a large portion of the archive offline with little or no warning. Google, apparently, are online villains of the deepest dye for wanting to get some form of commercial return for the money that they paid to acquire the archive in the first place.
So let's start from first principles here: the fact that Deja had such a comprehensive archive is not remarkable. The remarkable bit is that *nobody else has done anything similar*. Deja's value as a resource, both in the commercial sense and in the historical sense, is in its rarity. Google, in acquiring the deja.com archives, *prevented* this resource from being lost forever. Yet they're apparently villains for not immediately doing whatever the Open-Source community wants them to. Talk about bloody-minded ingratitude.
There's an argument being made that this information is ours already, although from what I understand, this is legally problematic. However, if you don't agree with Google being able to commercially exploit *your* precious Usenet postings, the answer is straightforward: start posting with "X-No-Archive: Yes" in your headers, and write a *polite* email to Google asking them to remove all your posts from their archive.
For myself, I'm quite glad to see that Google have obtained the archive, and if they do as good a job of running it for easy access as they have with their search engine database, I'll be extremely pleased.
Meg Thornton.
Perkin's Postulate: Online tech support is designed to provide everything short of actual help.
Historians love to read old snail-mail -- reading letters written by Victorians tells far more about their culture than any books written in the time. Why should it be any different for USENET?
Google has taken the archives off the web for now Aaagh!
Google has taken the archive down only until they can integrate it with their own archive. Once this is done, it sounds like we will once again have a reliable source of old newsgroup postings.
I highly doubt that they will ever open source the information though. The terabytes of data that they purchased as a part of deja.com is probably the most valuable part of the deal. Why would they then want to turn it over to the government? What financial incentive is there for them? The only way they are going to recover their investment is to create a service like Deja's, only better and integrated with their own.
The following is from Google's press release on their acquiring the data:
Available now at http://groups.google.com, this powerful new Usenet search feature enables Google users to access the wealth of information contained in more than six months of Usenet newsgroup postings and message threads. Once the full Deja Usenet archive is added, users will be able to search and browse more than 500 million archived messages with the speed and efficiency of a Google search. In addition to expanding the amount of searchable data, Google will soon provide improved browsing capabilities and newsgroup posting.
A friend who works at Google said that they got the archives "barely" -- they were apparently copying data as technicians were tearing apart what was left of Deja's systems and hauling the equipment away.
I got the impression that there was a lot of work to be done to fix the data so it was in a coherent form, much less fit into Google's existing storage and databasing environment.
As long as they're still collecting news, plan on improving the existing search engine (my source says yes to this one thing) and it remains free-as-in-beer I'll be satisfied.
What kills me overall is the decline in the overall quality of USENET. Too much good content has gone to crap, non-archived, non-searchable web forums (ahem) and what's left on USENET outside of a few newsgroups is spam, porn and isn't worth the time to search.
When they took down the old deja, I was quite mad. Google's way of viewing the messages sucked. I can't post there anymore either. So I found news.interbulletin.com. It is a usenet service that allows you to view about 30,000 newsgroups and lets you post under any name. It also uses frames, which I like because the whole page doesn't have to be reloaded. Their server is very slow right now because lots of deja users switched to their service. It's usually around 200 bytes per second on my 56k modem. Anyone who was mad too can use news.interbulletin.com for a while and hope the old deja.com will come back alive sometime (probably not).
--
Hm. Does Google own the database of Usenet postings?
Google does indeed own a copy of the database of usenet postings. More on this later.
You see, since every person who ever wrote to the Usenet still retains copyright to their postings, isn't it in the slightest bit illegal to actually *sell* the database? Or at least immoral?
This is a funny bit of Usenet culture/law. While it is generally accepted that usenet users are giving others permission to copy their works, they *do* retain copyright. So why can deja go around selling this work? IANAL, but here is how I see it; I think I'm (mostly) right.
1) When you post to usenet, you're sending your work to whatever archives are in place, and you know it. By posting, you are giving any other user permission to view and archive the material. In fact, you yourself are commanding that the message be forwarded to all other connected computers, and therein lies the implied permission.
2) This strikes me as an important point. What deja.com is selling is not the rights to the posts, or the posts themselves, but the work that they put into archiving the posts, which is considerable. It is the same way that free software distributors sell CDs with open source programs on them. They are selling the data itself and the work that went into collecting the data, not the rights to the data. So, while you may have put a lot of effort into writing that post for alt.silly.rantings, deja.com didn't sell that work; deja.com merely sold the work that went into collecting your work.
Do you see what I'm saying? Or am I just rambling?
Stupid like a fox!
I don't really understand what's so interesting about Deja's code. Should be no major problem to create a search engine / interface with all the code that is out there for indexing etc. and all the capable people willing to write / enhance free software.
;-)
The archived postings are the interesting part. At groups.google.com it says that there is a terabyte of data. Maybe it could be made available for download via FTP, one tar.bz2 file per month per newsgroup. Different projects could then try to use the data... Tools like MG (Managing Gigabytes) can create an inverted index that reduces textual data to about 40 percent and is searchable. Well, that's still 400 GB, but HDDs are getting cheaper all the time.
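To make the inverted-index idea concrete, here is a toy Python sketch. It is not how MG works internally (MG also compresses both the text and the index); it is just a plain in-memory map from each word to the set of postings containing it, and the message IDs and texts are invented for the example.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")

def build_index(docs):
    """Map each term to the set of document IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in TOKEN.findall(text.lower()):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents containing every term in the query (AND search)."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

if __name__ == "__main__":
    # Hypothetical postings keyed by made-up message IDs.
    postings = {
        "msg1": "Deja had the only big Usenet archive",
        "msg2": "Google bought the Deja archive",
        "msg3": "Usenet is mostly spam these days",
    }
    idx = build_index(postings)
    print(sorted(search(idx, "deja archive")))
    print(sorted(search(idx, "usenet")))
```

A lookup only touches the entries for the query terms rather than scanning every posting, which is why the approach scales to an archive that is far too big to grep.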
Um, that's probably because Yahoo is now using Google as its search engine. See this press release.
The freaking article is entitled Deja 'Revolt' Against Google; how anyone could have completely misread it and given us the horrible write-up we just got is quite amazing.
This leads me to the main question: Major sites such as Google, eBay and Amazon have become a valuable part of the 'Net and an intrinsic part of the World Wide Web experience for many people. Yet these companies have yet to prove their viability and could collapse at any time if their investors grow tired of shouldering their debts and underperformance. What will happen to the 'Net when the next big dot-com to fall is eBay or Amazon, or Google? Especially since Google's USENET archive and WWW cache have become invaluable to a number of people.
Does this justify asking the government to step in and take over these resources so they are preserved for posterity, as Frank Davies and many others have suggested, or would this be undue interference by the government?
Finagle's First Law
Does Google own the database of Usenet postings?
isn't it in the slightest bit illegal to
actually *sell* the database?
I would have to think that by the act of posting a message on a newsgroup you have given permission for it to be distributed and copied via NNTP to the various and sundry news-servers on the net.
Very few of these servers are available on an open basis. ISPs almost always require some sort of compensation for access.
Whether the Deja archive is a news-server or something more would be a point for lawyers to argue.
I would say that it is quite clear that the transfer of a news-server and its contents from one commercial entity to another is a common occurrence - any time an ISP is bought out this will obviously occur. So the idea of your posts getting bought and sold - get over it, it's already happened, and will continue to happen.
For my own case, I feel that the usefulness of the Deja archive as a source of knowledge far outweighs the loss of whatever small value my postings may have, and as such I happily provide such under the BSD license.
I hope that other Open Source users will take the same view.
MOVE 'ZIG'.
I have to admit, the immediate goals of such a project elude me. USENET postings are already public. These are open forums, and the groups can be read from most libraries or other public sources.
I don't see the value in the long-term archival of USENET postings. The Library of Congress contains just about every copyrighted work ever written. This serves not only as a national archive of our authors' produced works, but gives our legislature access to the documentation and research they need to do their job. Would the archiving of USENET postings serve the long-term mission of the nation's library?
It also bothers me slightly to think that people's comments and flame wars will languish forever in the federal library. I don't think access to USENET postings is something the nation craves or needs. What the nation needs is access to works that have been researched and published, works from professionals. The Library of Congress is a library of professional works, not the "my 2 cents" postings that tend to dominate USENET.
----------------------
Kurt A. Mueller
kurtm3@bigfoot.com
PGP key id:0x4FB5FB1D
Lawrence Lessig is my personal hero.
Hm. Does Google own the database of Usenet postings? You see, since every person who ever wrote to the Usenet still retains copyright to their postings, isn't it in the slightest bit illegal to actually *sell* the database? Or at least immoral?
:-)
At least I am giving no permission whatsoever for someone to sell my posts...
Someone could argue, though, that by posting to the Usenet you have implicitly made your work public domain, but I doubt that you can get rid of your copyright that easily. Books still have copyright, and you even paid money for them, so shouldn't you be getting more?
The submitter was correct. Google only has the archives from August of 2000 and after up on the Web at the moment. Currently the archives going all the way back to 1995 are offline.
Has anyone tried submitting FOIA requests to the CIA, FBI, NSA, NRO, etc, to try to get copies of any Usenet archives they may have? If they have such archives, it is unlikely that they will meet any of the criteria that would allow them to deny a FOIA request, e.g., privacy, national security, etc.
If you love God, burn a church!
Ewige Blumenkraft!