Is There Demand For A Better Usenet Search Engine?
Anonymous Employee writes: "I was asked for a feasibility analysis to provide high-quality searching in a large Usenet archive (all except binary/porn groups and several years' worth of archives). This is similar to what Dejanews wanted to provide before they re-branded to Deja last year. Do you think there is a need for this or is high-quality Web searching + Usenet browsing meeting your everyday needs in terms of information retrieval? If not, do the existing Usenet search interfaces suffice (Deja, one year's worth of archives, not-so-good search interface - Remarq, three months' worth of archives, okay search interface)? ...and also, is real-time indexing (i.e., you can search for an article 'very soon' after it has been posted) important?" In light of Deja's recent faux pas, I think this question is rather timely, and I have to admit, I wouldn't mind the ability to search Usenet posts older than one year.
cut back to yesterday's talk : the internet is a public place. And records should be kept public, and accessible.
Mode (3) smart-aleck mode. Press * to return to main menu.
it's pretty much the whole freakin thing :)
-Jon
this is my sig.
Usenet can be a valuable resource but currently there isn't a good interface for combing through the tons of historical posts. I usually go to Deja and do a power search when I'm researching a product because I believe the people are the best judges rather than some lazy, paid reviewer. I hope more search engines begin to tap the usenet.
-- You see, there would be these conclusions that you could jump to
Even with the advent and the humongous popularity of web-based message boards, I think in the end Usenet still remains the best message-board system available. The old (relatively) methods are always the best. FTP is still the staple of file transfer, and IRC remains a chat king (does AIM have more users?). Usenet has had a following from the beginning and is still thought of as a place where the intelligent and learned go to converse (at least from my perspective, and don't get me wrong, it has its share of trolls, just like anywhere else). I know my oldest brother to be one of the most knowledgeable civilians in the country when it comes to military aviation, and where does he go to chat it up? Usenet. There's a wealth of useful information there that not many people know how to access.
-- From my Best Friend (Written to me over ICQ): "i was gonna go to a party...but i had to reinstall windows"
Napster is not "lawbreaking" and/or an "illegal activity." The courts are deciding what should happen day by day, but as of now it's not illegal. "Those supporting Napster and MP3.com are just peddling in stolen material." I am an artist, I don't have any illegal MP3s, and I support both Napster and MP3. I have turned over my music to companies such as these in the hope that people will enjoy my music and to prove that these services can be used legally. I support them both; do I peddle stolen materials? moron
The Dejanews usenet page has been my home page for years now. Whenever I needed to find something out, it was far easier to see if someone else had asked the same question I had in Usenet than it was to wade through Microsoft's MSDN site or page after page of crappy vendor HTML.
In the last few months, the quality of the results that I'm turning up has decreased markedly. Deja has decided to shelve all their 1995-1999 Usenet archives and concentrate on just the newer stuff, apparently because that older traffic only accounts for 10% or so of their bandwidth.
WHAT? Of course it does! There are enough people using Deja as their Usenet client for this to be obvious. The 10% or so of their traffic that was a result of the 1995-1999 archives was the result of hundreds of thousands of other people like me searching and finding answers.
Deja has made a mistake in alienating the audience that made them one of the most visited sites on the web. For this, I predict that Deja will either fold or massively reorganize within the next year.
They screwed us over and broke a trust. You can't regain THAT in an IPO.
The signal/noise ratio of usenet, and particularly the more Web-oriented forums, is getting exponentially low. I, personally, would be interested in the moderated technical groups being archived & searchable eg clcm. I'm not sure how effective the others may be. It'd be an extraordinary pain getting some decent info out of all the noise.
----
Greetings, Recently we moved the Deja.com servers to a new facility in order to provide greater reliability and performance. The move is now complete and we thank you for your patience.
Please note that currently our Usenet Discussion Service only retrieves messages from the past year (back through June 1999). As announced, we are reconfiguring the service that provides messages posted more than 1 year ago in order to provide greater reliability and performance. This will take some time though, possibly a few months. Have no fear: We're committed to bringing these messages back online as soon as possible.
-----
So I would wait for a few more weeks, and see if the situation improves.
"Pinky, you've left the lens cap of your mind on again." - P&TB
"I can see my house from here!" - ST:
As regards 70% disk / 10% bandwidth, they do have to draw the line somewhere, but you can't go with just what makes the most money. I'd be curious to see if the cost of bandwidth and storage outweighs revenue from advertising, and other intangible revenue such as name recognition.
There is a real lot of demand for this. The only really working one (at least among known ones) is Deja, and its Usenet search capability is rapidly becoming third-grade-of-importance add-on to their commercial setups. Compare this to how many Internet searches we have (and how many we had before Google - and it's still the top one I use). We really need the Usenet Google - there's a lot of useful information among that noise, and we need a tool to extract it.
Also, fast indexing would be a real bonus, so that if you look for a comment on recently released software, for example, you won't get two-month-old data. But at least decent search and archival is necessary. Deja desperately needs a strong competitor.
-- Si hoc legere scis nimium eruditionis habes.
If you could get a high quality search engine with archives going back for many years (at least 1991 would be nice), I'd pay for a subscription to a service like that. But a free front end with ads would be acceptable.
:-)
I have several clients who have almost completely abandoned deja because the quality has disappeared. They've asked me how hard (i.e. how much $$$) it would be to set up a similar service for them internally. I give them the cost estimates for a full time usenet+searchengine system admin and a pair of good machines. Then they ask if there is a company out there who would do the same thing for less money than the US$100k/year it would cost to do it themselves.
It would be especially nice to see corporate accounts set up as well, so any employee in a company could do high quality searches.
My own opinions on deja are pretty vituperous right now. If you could buy a copy of their old archives and provide a better service than those losers, you'd have a fairly large audience. Try doing what dejanews did when they started, going around to usenet admins and asking for copies of backup tapes. Be prepared to get old DC-150 carts and 9 track reel to reel and many other esoteric formats. Could be a fun project
the AC
Hemos is like...sci-fi fans;he thinks technology is cool, but he hasn't bothered to understand the science it's based on
The old Dejanews site was a great way to find information, and was displayed in an easy to navigate format. The new layout is awful. (Not to mention their stupid policy of advertising placement -- I post my messages to Usenet, NOT to Deja.)
I don't know what the overall reaction to a site like this would be, but definitely count me in. jh
-- Gah!
Now, sadly, the internet is a corporate money-making buzzword. Companies try to reach any audience with advertising they can, and the internet is a cheap way of reaching millions of potential customers. However, in the need and greed of modern society, people wanting to make a quick buck outnumber the people wanting a nice place for discussion, and it is easy for the money-grubbing people to write one message to hundreds or thousands of groups. Thus many, many groups are significantly more spam than relevant posts. This drives off the people who would otherwise have been frequent posters, makes good posts hard to find, and generally makes the experience too unenjoyable for a large part of the public to continue to have any interest in. True, fringe groups will continue for a decade or more, but sooner or later the NNTP protocol will become too much of a bandwidth hog (thousands of spam email messages can do that) and most servers will close down.
Alas, poor Usenet, I knew it well
Little Brother, watching the watchers
If you provided comprehensive Usenet posting indexes in V-Twin, er, Apple Information Access Toolkit, format -- you would have the entire Mac Evangelism Strike Force bowing to you. I would pay a significant amount to have those indexes mailed to me quarterly. So would many, many, other Mac users I don't doubt.
What
It's a very small niche. I suspect only a small fraction of net users know what USENET is. As it stands, it will be a small island inhabited by old-time hackers and net users.
To much of the public, the net is becoming something like TV. Not many people on the street know what a newsgroup is (if you don't believe this, you don't know many people on the street - try asking random people, you'll be surprised.)
Given this scenario, it's not likely that a usenet search engine will last for very long. People who want to use USENET will actually use it. Now that's a surprise.
w/m
I would concentrate on the comp.* and other technical newsgroups rather than trying to mirror the whole damn thing. I would hazard a guess that a lot of that 10% of traffic that Deja said made up their backpost searching was looking for technical support, hardware information, or software help. Having a 15 year backlog of rec.humor.jokes or alt.fan.brittany-spears (or any other pop-culture NG, of which there are thousands) might be cute, but it's really rather worthless.
... !) Usenet has since its inception been a celebration of free expression. Stifling that because people have to worry about repercussions far in the future would be kind of shitty.
:)
Just as food for thought, there are also some privacy issues here. You have to ask yourself: do you really want a decade or two of your scribblings to be instantly available and indexable and searchable by anyone on the planet? Think about it - every immature flame, every embarrassing post, every moment you'd love to live down, now showcased and painfully easy to find by someone with a couple of minutes and a computer... I'm kind of glad that Slashdot "forgets" or de-indexes my comments after a few weeks. There are a lot that I'd just love to bury and in effect have as soon as they exit my user info page and leave the search index. Now imagine them staying with you for years, even decades.
And it can get worse than simple embarrassment. I know for a definite fact of one case where two guys were engaged in a long-standing flamefest in a NG. Guy 1 went on dejanews.com to look at what else the other guy (Guy 2) was posting... and found some two-year-old backposts to a cancer support group because Guy 2 was battling some form of cancer. Guy 1 brought that up in his next flame and really just humiliated Guy 2 in front of hundreds of people. Until deja killed their backlog, you could still find both those posts, and hundreds more just like them. Imagine trying to live that down.
It's incidents like that that really cause me to agree with privacy advocates about the danger the Internet poses. Never in human history has it been as easy to delve into a person's past as it is now even without a superorganized listing of their thoughts and opinions of everything they felt compelled to write about for years into the past. Such a complete archive really would pose a lot of problems for many people (imagine just a 10 year log of alt.support.cancer
With that in mind, like I say stick to the tech newsgroups, and you'll run into far fewer problems.
--
I think there is a world market for maybe five personal web logs.
I cannot fathom ever needing to search Usenet for anything. I can find everything I need with an AltaVista search on the web.
If you aren't part of the solution, there is good money to be made prolonging the problem
yes.
--
share and enjoy
I'm glad this subject has come up, because I recently posted a similar question as an "Ask Slashdot" but guess it was rejected.
The Dejanews Usenet archive was one of the best resources on the Internet. I'd always check there before doing searches on company sites. The recent decrease in the archive database has reduced the usefulness of the service dramatically.
I suppose the important question should be: Is the old Usenet archive worth preserving (for general use, or even as an historical record (it might appear that most of it is useless waffle, but who knows what people will think in 100 years))?
If the answer to the above is "yes", then how can the archive be saved? Leaving it in the hands of a single company (Deja) means it's vulnerable to any silly decisions that the company makes.
Perhaps a better solution would be a huge distributed database, where sites archive particular groups for a particular time (eg. some of the big Linux companies could "sponsor" the comp.os.linux.* newsgroups for the dates between 1995 and 1998). These could then be mirrored by other sites with the same interests.
The two negative points I can think of (aside from the nightmare administrative aspects) are 1) what would the sponsor get out of this, and 2) just how big would the archive be?
It is possible that a Gnutella-like system could evolve where people could search for archives, with a set of "root servers" providing searching facilities. With 100GB disks becoming available, archiving the smaller newsgroups becomes a possibility.
All we need then is to persuade Deja to reimplement the full database (which I believe they eventually intend to do), and then get a tool to archive interesting articles. Anyone out there think they have the skills to write a "deja extractor"?
a market research tool? The Slashdot users' ideas are being exploited for the commercial benefit of others. And we don't even get the free food/coffee that is at most focus groups.
i have misplaced my signature.
Seems like News archiving could benefit from the forgetless nature of Gnutella and similar technologies. The part I don't like about those designs is the HTTP-based transactions. It seems that since Usenet traffic is already encapsulated in messages and relies on the mailbox synchronization services of NNTP, we could just create a massive message file system (take a gander at what MS has in line for Office and Exchange). As more people get on line with permanent connections we could easily offer a small part of our disk space for a shared mailbox file system accessible via IMAP. Information would simply drift to where it is needed.
The biggest risk with such an automatic scheme is that some data would eventually timeout because no one requested it anymore. I guess these messages would start to be treated like endangered species. Maybe we could just send them out into a deep-space, time-delay file system to save them.
Well. To a certain point, I can see what would motivate Deja to do this. If the revenue from it isn't coming close to the cost of upkeep, I can see where they'd rightfully shelve it.
However, the Usenet junkie in me is kicking and screaming over this. My Usenet service from my ISP drops posts off after a couple weeks. And even the group specific server that my group's moderation pool uses drops stuff after a couple months because of the sheer quantity (and because its only purpose is support of the moderation bot).
Personally I'd pay money for and/or put up with a reasonable amount of banner ads to be able to search back through all the content.
Chas - The one, the only.
THANK GOD!!!
A few months ago, Deja made an announcement about the site move. According to the announcement, which has not been updated since its original release, the old messages would temporarily be taken down, but we should "have no fear: [Deja is] committed to bringing these messages back online as soon as possible."
In the meantime, Deja has been transformed into a mere free Web-based Usenet server that happens to have unusually long retention, but no binaries access.
It has been a couple of months since then. Last month, Deja announced that the move was "complete"; however, most of the old posts are still nowhere to be found. There was an interesting Usenet discussion on the state of things, which included at least one thoughtful post as well as possibly a little light at the end of the tunnel.
Perhaps not all is lost. When (if) the Deja archive ever comes back in its entirety, it will still be the best Usenet archive around, hands down.
I disagree with the Slashdot article's claim that Deja has a merely "okay search interface". As long as one uses the Deja Classic Power Search, Deja has one of the cleanest interfaces around, with extremely flexible and powerful query options.
One would be hard-pressed to come up with something better at this point. Even if one were able to cook up a better interface with even better query features, where would the content come from? Who has been archiving Usenet all these years other than Deja, Remarq, and perhaps a few other little-known entities?
I daresay that none of the current archive holders would be willing to grant archive access without considerable compensation. Unfortunately, one would have no choice; it's a little too late to start archiving the old stuff now!
All in all, I would probably be in favor of just trying to get Deja back up in its full glory; this would be so much easier than starting from scratch. Perhaps all Deja needs is to hear (from thousands of concerned Slashdot readers) that their "old" archive is their most valuable resource, and should thus be given the attention that it deserves. I personally consider the "old" archive so valuable that I would be willing to pay a subscription fee to access it; I'm sure I'm not alone in this.
So shall we all write Deja now, and let them know what we think?
--
begin 644
Hits pay the bills, so people are moving to message forums.
What about Ultimate Bulletin Board exporting messages to news servers?
Then it's the best of both worlds.
Only newsgroups I use anymore are hardware vendors.
I'm on a dozen webbased message forums, all those banner ads.
-Brook Harty
[I have the Enemy Flag heading back to our base. Clear the mines from our flag...]
My news server has just the big 8 and only the alt groups that users request. With a 15 gig news spool, I only have to expire articles after two months.
Doesn't take a math whiz to extrapolate that to see how much disk space a year's worth of REAL usenet newsgroups would take.
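To put rough numbers on that extrapolation, here is a back-of-the-envelope sketch. It assumes traffic volume stays constant over time (it almost certainly grows), and the function name is made up for illustration. The 15 GB and two-month figures are the ones quoted above for a partial feed.

```python
def spool_size_gb(current_gb: float, retention_months: float, target_months: float) -> float:
    """Scale a spool size linearly with retention time (assumes constant traffic)."""
    if retention_months <= 0:
        raise ValueError("retention_months must be positive")
    return current_gb * target_months / retention_months


if __name__ == "__main__":
    # Figures from the post above: 15 GB holds about two months of articles.
    for months in (12, 60):
        print(f"{months} months: ~{spool_size_gb(15, 2, months):.0f} GB")
```

For this partial feed that works out to roughly 90 GB for one year and 450 GB for five. A full feed would be larger by however much the full feed exceeds what this server carries.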
They should have never trashed 1995-99 without notice. 95 was when the net started to explode and removing that removed history that can never be recreated.
(Then again, I'm glad some of my old posts finally went away. x-no-archive works, but since everyone these days just quotes entire articles when replying with one line at the top, x-no-archive was a bit useless anyway...)
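The quoting problem is easy to see in a sketch of how an archiver might honor the header. This is only an illustration, not how Deja actually does it: the function name and the sample articles are invented, and it assumes the archiver looks only at the `X-No-Archive` header of each article.

```python
from email import message_from_string


def may_archive(raw_article: str) -> bool:
    """Return False when the article carries 'X-No-Archive: yes' in its headers."""
    msg = message_from_string(raw_article)
    return msg.get("X-No-Archive", "").strip().lower() != "yes"


# Hypothetical articles: the original opts out, the reply quotes it in full.
original = (
    "From: alice@example.com\n"
    "Subject: something I may regret\n"
    "X-No-Archive: yes\n"
    "\n"
    "An embarrassing paragraph.\n"
)
reply = (
    "From: bob@example.com\n"
    "Subject: Re: something I may regret\n"
    "\n"
    "Me too!\n"
    "> An embarrassing paragraph.\n"
)

if __name__ == "__main__":
    print(may_archive(original))  # the original is skipped
    print(may_archive(reply))     # the reply, quoted text included, is kept
```

The header only protects the article that carries it, so the quoted copy in the reply ends up in the archive anyway.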
I have been using Deja for a while, and in the past year or so the quality has gone down exponentially.
So now I am looking for alternatives. A search on Google doesn't reveal much.
Can people name any similar services that exist? A poster mentioned Dogpile. Any others?
I would really like to see Google getting into this.
Deja probably can't keep everything that is on USENET (anyone have any idea of the total number of newsgroups there are?), and for them to make any money, the ability to search an increasingly large archive becomes infeasible.
I used to love USENET, but now it's nothing more than a tool for would-be marketers to send unsolicited e-mail and to shamelessly promote crap. It basically has lost a lot of its appeal lately.
One good way to get around the search problem is to distribute the content of USENET to separate archives based upon their hierarchy, so that comp.os.linux gets archived by VALinux (or similar), and all you need is a client tool that would search the indexes of these separate archives.
Unfortunately, it is somewhat late to actually implement this. If only this was a requirement far earlier in the process of creating newsgroups...
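For what it's worth, the client-tool half of that idea is not much code. Below is a minimal sketch under heavy assumptions: the registry, the archive names, and the in-memory `Archive` class are all hypothetical stand-ins for real remote archives, and matching is a plain substring test on subjects.

```python
from typing import Dict, List, Optional, Tuple


class Archive:
    """Stand-in for one sponsor's archive of a single hierarchy."""

    def __init__(self, name: str, articles: List[Tuple[str, str]]):
        self.name = name
        self.articles = articles  # (newsgroup, subject) pairs

    def search(self, term: str) -> List[Tuple[str, str]]:
        t = term.lower()
        return [(g, s) for g, s in self.articles if t in s.lower()]


def route(newsgroup: str, registry: Dict[str, Archive]) -> Optional[str]:
    """Pick the longest registered hierarchy prefix that covers a newsgroup."""
    best = None
    for prefix in registry:
        if newsgroup == prefix or newsgroup.startswith(prefix + "."):
            if best is None or len(prefix) > len(best):
                best = prefix
    return best


def federated_search(term: str, registry: Dict[str, Archive]) -> List[Tuple[str, str, str]]:
    """Ask every archive in turn and tag each hit with the archive it came from."""
    hits = []
    for prefix in sorted(registry):
        archive = registry[prefix]
        hits.extend((archive.name, g, s) for g, s in archive.search(term))
    return hits


# Hypothetical registry: hierarchy prefix -> whoever sponsors that archive.
REGISTRY = {
    "comp.os.linux": Archive("linux-sponsor", [("comp.os.linux.misc", "Kernel panic on boot")]),
    "comp": Archive("general-comp", [("comp.lang.c", "Pointer panic")]),
}

if __name__ == "__main__":
    print(route("comp.os.linux.networking", REGISTRY))
    print(federated_search("panic", REGISTRY))
```

Routing by longest prefix lets a specific sponsor take over comp.os.linux.* while a general archive still covers the rest of comp.*.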
The really good stuff was from the start (1990ish?) to when the net exploded (1995ish).
During that period, in the sci. and comp. groups, there was a lot of good information as it was mainly university people exchanging info. I remember back then reading about people archiving the net to glass disks and such. Where did all the old net go?
Is it me or is Dogpile's usenet search limited? I can't seem to search by date, language, sort by, etc. like Deja.com's Power Search. :(
Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
It's too easy to spam; even semi-private NNTP servers, like the Netscape/Mozilla server, are hit with enough spam to piss you off. All the serious development takes place on mailing lists, and that can be a lot of email (and it's not spam-proof either; the kernel list is spammed at least once a week).
We need a public key infrastructure added to NNTP such that to post you need to use a key that has been submitted to a server, at least for technical forums. Call it automatic moderation: once you're trusted you can do whatever, and spam causes your trust to be yanked. This has the added benefit of building up the key web.
Services like Deja provide a useful service, once you weed the crap out. Most of what they archive is junk. And there is so much of it that they can't keep it all online. There have been some critical usenet threads that need to be archived for easy access. There are still important threads and messages posted.
What about that group that says they mirrored the "internet" going back so many years. I wonder how much of dejanews is stored in their archives, or any other usenet->web gateway that existed however briefly.
Any sufficiently advanced civilization is indistinguishable from Gods.
I find USENET to still be one of the most useful internet resources. It has never been easy to search, even with deja; I was delighted to discover the link to dogpile in a post below, I'll definitely be making use of it.
I have found that at my current place of employment, where there is no news feed, it is considerably harder to get work done without that resource. I used to ssh into my cable modem box to browse and post from there, but now I've moved to an apartment without cable. So I'm negotiating with our sysadmins to try to get a newserver set up.
The dogpile interface is definitely better than the old deja. In particular, each "hit" gives you the entire thread, instead of spreading a thread out among individual article links.
I think that many of the complaints about usenet being swamped with spam and useless are from people who are not familiar with better news readers. You can filter a lot of that stuff.
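As a rough idea of what such filtering looks like, here is a sketch of a killfile-style rule. The thresholds and the banned phrase are arbitrary assumptions, the function name is invented, and real newsreaders have their own scoring syntax; this only shows the logic of dropping heavily crossposted articles and known junk subjects.

```python
from email import message_from_string
from typing import Sequence


def is_probable_spam(
    raw_article: str,
    max_groups: int = 5,
    banned_phrases: Sequence[str] = ("make money fast",),
) -> bool:
    """Flag articles crossposted too widely or with a blacklisted subject phrase."""
    msg = message_from_string(raw_article)
    groups = [g.strip() for g in msg.get("Newsgroups", "").split(",") if g.strip()]
    subject = msg.get("Subject", "").lower()
    return len(groups) > max_groups or any(p in subject for p in banned_phrases)


if __name__ == "__main__":
    sample = (
        "Newsgroups: comp.lang.c,comp.lang.perl,rec.pets,alt.test,sci.math,misc.jobs\n"
        "Subject: Hello\n"
        "\n"
        "Buy now.\n"
    )
    print(is_probable_spam(sample))
```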
One thing I would definitely like is a usenet interface to slashdot. If it was read-only and you had to log in and go through the web interface that would be fine.
I agree, and I've thought about that some. But I think the only way you could guarantee a spam free USENET is either strict moderation of everything, or a registration system where you would not be allowed to post until you registered, giving address, phone, etc., and having it verified. It would have to be controlled by an independent organization that would have a contract with ISPs -- they would not get posting access until they guaranteed that all their users registered with it.
I dunno -- it would be a mess, but would sure solve some problems. Of course, kiss privacy and anonymity goodbye...
One of my current projects is a search engine that combs both the web and usenet based on similarity data. A portion of this data is computed using analysis of files, locations, etc and the rest is done by a sort of moderation system similar to Slashdot that lets users group and rate files. The system can search both text and binary files. So if you found a pic you liked you could use it as your sample and the search engine would return all of the others that matched the search you specified. You might get back pics that matched the same signature as the sample, pics w/ a similar name, or pics that had been group moderated into the same class as the sample. Right now I'm doing a lot of research on file signatures, ways of telling how similar one pic (or mp3, or anything) is to another file of the same type (pic, sound, text).
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
You can connect to this server using any USENET newsreader program, read any article you like that way (but not post, of course). That would let people use their favored newsreading environment--which already has functions for threading and searching individual messages--to go in and read whatever they want without having to screw around with USENET search engines' moronic interfaces (of which I have never yet found one that worked decently).
Granted, this would make it harder to do global searches across multiple years...but I'd gladly sacrifice that in exchange for an interface more useful to me in searches of smaller scope.
--
Editor Emeritus and Senior Writer, TeleRead.org
This is what you used to be able to do before AltaVista's redesign:
Type in a search phrase and pull up all matching web pages and usenet articles which matched.
This was EXTREMELY valuable, especially if you were hunting down the answer to a question or problem. One search did it all, and the breadth of knowledge in the web + usenet could not be beat.
AltaVista modified their service so that they no longer do usenet searches, about a year ago, I think it was. I couldn't believe that they would take out that excellent feature, and I wrote them and complained, but of course they didn't care (for the record, you can still do some kind of usenet search, but it's not on articles; it just returns pointers to "relevant newsgroups").
I never use AltaVista anymore. I use Google, which doesn't do usenet either (although I suggested it to them and they said they would be considering it in the future), but I'll be damned if I use AltaVista again.
Oh yea, there's a demand all right, I'd certainly like a better search than deja but would I be willing to pay, will advertisers be willing to pay, who else is going to pay? Last year a business plan that skips the payment part might have worked, these days "show me the money".
Instead of asking "is there a demand" ask
is there a demand if each search costs the user $0.01
or
is there a demand if each results page has three banner ads (and is there a demand from advertisers for this)
So, what's your projected cost per search and how do you intend to cover it?
development.lombardi.com
I pretty much gave up reading Usenet on a regular basis probably about 3-4 years ago due to the volume and noise. Dejanews at least provided me with a way of looking for specific info when I needed it. Granted, for that kind of use a shorter, year length history is usually fine. However, the much longer history has proved indispensable on a number of occasions when dealing with older systems. One particular project I worked on was upgrading the version of Tcl/Tk from one of 5 years ago (when the project's software & hardware were frozen) to the current version. There were a number of major changes in the language & libraries in that time, but instead of having to find all the difficult migration issues (some of which the docs didn't mention) myself, I did a bit of searching on Dejanews and saved weeks of time and considerable frustration. One is rarely the first to have a problem -- the interactive knowledge base of Usenet often provides the solution. I'm really disappointed by deja cutting back this resource and would love to see a more complete & better archive out there!
Also, aside from the practical aspects, the contents of the archive are a valuable bit of .net history & group memory -- it'd be a shame to lose it...
Why rely on one centralized commercial archive? I guess most of the users of the usenet have their own small archive of newsgroups and posts they see as valuable.
All we need is an open protocol for communication between newsreaders, for passing around searches and results, and enough readers to support this. Searches would be independent of the moods and financial motives of the owners of a single archive and consistency could be assured by the redundancy of hundreds of people having the same article in their archives.
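The merging side of such a scheme could look something like the sketch below. No such protocol exists as far as this thread knows, so everything here is an assumption: peers are simulated as in-memory lists, the function names are invented, and the only real-world fact relied on is that every Usenet article carries a unique Message-ID, which makes duplicates across archives easy to collapse.

```python
from typing import Dict, List, Tuple

Article = Dict[str, str]  # keys: "message_id", "subject", "body"


def search_local(archive: List[Article], term: str) -> List[Article]:
    """Plain substring search over one person's saved articles."""
    t = term.lower()
    return [a for a in archive if t in a["subject"].lower() or t in a["body"].lower()]


def search_peers(peers: Dict[str, List[Article]], term: str) -> List[Tuple[Article, int]]:
    """Query every peer, merge by Message-ID, and count how many peers hold each hit."""
    merged: Dict[str, Article] = {}
    copies: Dict[str, int] = {}
    for archive in peers.values():
        for art in search_local(archive, term):
            mid = art["message_id"]
            merged.setdefault(mid, art)
            copies[mid] = copies.get(mid, 0) + 1
    return [(merged[mid], copies[mid]) for mid in sorted(merged)]


# Hypothetical peers with overlapping personal archives.
shared = {"message_id": "<1@example>", "subject": "Tcl upgrade notes", "body": "migration tips"}
PEERS = {
    "alice": [shared, {"message_id": "<2@example>", "subject": "Tk widgets", "body": "tcl bindings"}],
    "bob": [shared],
}

if __name__ == "__main__":
    for art, n in search_peers(PEERS, "tcl"):
        print(art["message_id"], art["subject"], f"(held by {n} peers)")
```

The copy count doubles as a crude consistency check: an article many peers hold identically is less likely to have been lost or tampered with.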
Not only does the Internet need a better Usenet search engine, it needs an entire Usenet frontend. More and more ISPs are either not offering news service, or simply pointing people to supernews.com. To me, that's a waste of time, as supernews is often overloaded, and the traditional Usenet interface isn't exactly as user-friendly as it once was, what with it being mostly spam and porn these days
I'd like to see a site that I can not only use to search, but also to post and reply to Usenet articles. I could give out a lot of help and free advice if only I didn't have to fire up a news reader. Don't get me wrong, because I love command-line interfaces, but just being able to bring up the relevant information is much more helpful than having to look through a bunch of posts about how I should buy these printers or use these domain registration services.
Brad Johnson
--We are the Music Makers, and we
are the Dreamers of Dreams
In alt.fiction and other groups, people post original works, and some copyright those works. There is an implied right for that work to propagate through Usenet and be used for a certain amount of time, but Deja keeping such posts for YEARS, rebranding them, slapping an ad on them, and turning a profit on them is nothing short of blatant copyright violation (indeed, US copyright law doesn't require a copyright notice on the work).
I doubt you'll see ANY service carry more than a year's worth of articles and for many corporate lawyers a year may be too long. One day someone IS going to sue these services for copyright violation and the Usenet services live in fear of that day.
... was Jeremy Nixon's Deja power search, especially after the redesign/relaunch. It's basically just a reorganization of the form from Deja's own power search page, but I find the slightly different interface (with no unnecessary graphics and no scrolling) to be simpler and quicker to use.
...
Unfortunately Jeremy doesn't have his own back archives
----
lake effect weblog
{Network engineer in Chicago--looking for work!}
Should have a full USENET archive back to when they started Alexa. They kept the library after selling Alexa to Amazon.
Deja was useful and still is, but they seemed to decide supporting USENET was not where the money is. No surprise, USENET is largely abandoned by the software development community and money community now. The top newsreader is from microsoft!
Has it been over a year since you last donated to the Electronic Frontier Foundation
A search frontend, like this one? You can also go find some more frontends, they're out there, or you can just write your own.
But even for web searches, I'm coming to the conclusion that Altavista is just sucky. Firstly, it's way out of date: half the links are broken and it's still indexing my homepage as having content that was changed over two months ago. But mostly I get annoyed that the search page refreshes itself every 5 minutes. Presumably this is done in an effort to fake ad impressions, but it is annoying to have a search page disappear while you're trying to read it (especially since at work I frequently have to disable the proxy for development purposes, so it refreshes to an unavailable page). It doubly sucks at home, where I have a dialup connection and leaving Altavista on screen means my phone line is always busy.
Sorry for the rant, I just had to say it. Suffice it to say that I'm looking around for a decent alternative (I'm starting to use Google from Monday, I think).
Rich
We definitely need a good Usenet site for searching and timely browsing/retrieval. I have used deja religiously. When my company blocked it as explicit content I freaked and sent an e-mail to everyone in the company I knew. A slew of people backed me up, and those are only the people who cc'd me. They wrote long descriptions of why the site was so valuable, ordering our firewall group to return its access. Took about a week but we got it back.
We must start an open source project to provide this functionality. It is our duty to expose this wonderful free forum to the general internet community. The problem is the technical limitations. Such a program (probably distributed) must have these characteristics:
1) Fast Searching - this is rather tricky. Dejanews was, or still is, capable of searching the full text of 100 million articles VERY quickly. I don't know how they did it, but I would imagine this would require custom database-like functionality that could preprocess all articles, indexing each word such that all the work was done in advance. This would certainly be the most difficult part of the project.
2) Quicker Posting/Retrieval Response Time - As it is, deja does not respond too quickly to posts. It takes several hours, I believe, before a message is searchable on the site. I don't know about browsing particular groups, but it would have to be on the order of minutes. This should be possible as MS Outlook News Reader is quite fast at this and we all know it has nothing to do with MS Software :~)
3) Ability to populate database with mailing lists. It would be a very cool feature if one could add an arbitrary mailing list such as that of an open source project to the indexable archive.
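For what it's worth, the "do all the work in advance" idea in item 1 is usually called an inverted index: a map from each word to the articles that contain it. Here is a minimal sketch in Python, assuming a toy in-memory corpus. The names `build_index` and `search` and the sample articles are made up for illustration, and nothing here claims to be how Deja actually does it.

```python
import re
from collections import defaultdict

# Hypothetical sample articles keyed by message ID (not real posts).
ARTICLES = {
    "<1@example>": "How do I configure sendmail for a dialup link",
    "<2@example>": "Sendmail relay rules and DNS lookups",
    "<3@example>": "Best printers for Linux",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(articles):
    """Map each word to the set of message IDs that contain it."""
    index = defaultdict(set)
    for msg_id, body in articles.items():
        for word in tokenize(body):
            index[word].add(msg_id)
    return index

def search(index, query):
    """Return the sorted IDs of articles containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

INDEX = build_index(ARTICLES)
print(search(INDEX, "sendmail dns"))
```

The expensive part (reading every article) happens once in `build_index`; a query then only touches the sets for its own words. A real archive would need the index on disk and probably split across machines, which this sketch does not attempt.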
Also I believe someone could make a viable business out of it. A Usenet search site could be very profitable. I'm surprised no one has caught on to this. Deja dropped the ball and hid the "discussions" pages behind its new facade. Odd. I can only imagine how many hits a site like that would get. If the USENET search capability was presented as the premier offering of a site it might not seem so obscure to average users. They would perhaps discover Usenet as the great alternative source of information that it is. Usenet would be the latest and greatest thing! With all the "instant messaging" going on it would not seem so foreign to people. Also I would think some people would be willing to pay for this service. I would be willing to pay a small fee. Say $20 a year?
KidSock
This doesn't only relate to usenet. The problem with search engines today is that they don't in any way cater to people who know *exactly* what they're searching for. I think it's about time that someone comes out with a search engine for geeks that forgoes all the frou-frou simple language stuff for oodles of terabytes of pages and a way of searching the damn thing with regular expressions. Hell, I'd pay good money (maybe $100 a year) for a service like that.
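To make the regular-expression idea concrete, here is a rough sketch in Python, assuming the archive is just a list of (message ID, text) pairs. The function name `regex_search` and the sample posts are hypothetical.

```python
import re

# Hypothetical (message ID, text) pairs standing in for an archive.
POSTS = [
    ("<a@example>", "Error 0x80040154 when loading the DLL"),
    ("<b@example>", "error 404 on my homepage"),
    ("<c@example>", "No errors here"),
]

def regex_search(posts, pattern, flags=0):
    """Return IDs of posts whose text matches the regular expression."""
    compiled = re.compile(pattern, flags)
    return [msg_id for msg_id, text in posts if compiled.search(text)]

# Posts mentioning an 8-digit hex code such as 0x80040154.
print(regex_search(POSTS, r"\b0x[0-9A-Fa-f]{8}\b"))
# Posts that start with "error" followed by a number, in any case.
print(regex_search(POSTS, r"^error \d+", re.IGNORECASE))
```

This scans every post for every query, which is fine for a demo and hopeless for terabytes. A real service would presumably need some index to narrow the candidates before running the regex, and that is the hard part.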
Of course we all expect binary/porn groups and several years worth of archives...
What? Oh, you meant except? Damn...
"The best we can hope for concerning the people at large is that they be properly armed." - Alexander Hamilton
Brewster Kahle's Internet Archive has an archive of Usenet from 1996-1998, but they stopped for some reason (did they think Deja was doing a better job?). And it's only about 600 GB, so the disk space should be pretty cheap.
In a world where technology companies can sink overnight, are we to leave this to Deja and others? Is there no public repository where these Usenet articles are archived for all time? Is this publicly generated mass of information not available directly to the public?
No, the question is not whether we need access to a better search engine for Usenet but whether we should have access to Usenet itself. I should be able to order a 100 (or whatever) DVD set of Usenet posts to have in my own home. This is too important to leave to the whim of the market; this needs to be preserved for humanity for all time.
Maybe some of the people who have copyright on Usenet posts could bring some pressure to bear on the archive companies to make something like this available.
Rich
The Internet Archive Project has Usenet archives from 1996-1998...it is a .5 terabyte collection, but it is currently all on tape. However, they STOPPED archiving Usenet in 1998. www.archive.org
The Internet Archive Project is the project attempting to archive the entire web and related internet contents as a matter of public record. They currently have around 15 terabytes in the archive.
Push them to resume archiving of Usenet, and to get their old stuff online from the tapes. This is HISTORY, people! Historians 50-100 years from now will be DYING to look at this stuff, and won't be able to believe that we threw it all away, even though the cost of storing it was dropping exponentially.
I would kinda hope that my great-great-grandchildren could get to know me by reading some of my better usenet posts.
--Braddock Gaskill
I started reading Netnews back in 1982 or so, before the Great Renaming ... heck, back when "the Internet" was a Larry Landweber proposal to replace ARPANET. At the time, aside from mailing lists, it was the only electronic discussion medium in existence. Like the current e-mail network and the World Wide Web, and IRC at one point, it had an interesting property: there was *one* network. Some sites got full feeds, some partial feeds, and there were a handful of local groups, but everyone was on "the 'Net" whether they were at UC Berkeley or Bell Labs or the Pentagon. If you wanted a discussion, you took it to a mailing list, or you took it to Netnews. (There was FIDO, but rounded to the nearest hundred thousand, it had zero users.)
That creates an effect not everyone sees. Usenet was the birthplace of hundreds, maybe thousands, of electronic communities, long before people were using "e-" as a prefix. Those communities, the people and personalities and cultures, are what made Netnews so attractive, so involving. (The current buzzword is "sticky.") Of course you're going to come back to see if your favorite netscum posted something outrageous, or if someone answered your question or replied to your answer.
Web-based discussions didn't kill Usenet, but they darned sure hurt it. Instead of one "'Net," there are tens of thousands, maybe more. I can't count the number of Web-based discussion forums I've seen. This conversation we're having right now is off in some tiny little corner instead of in a news group. There are lots of advantages to having it here
... and *all* those advantages have hurt Usenet when it comes to mindshare, and to the ability to attract the people who make 'Net communities work.
Instead of a grand city, with some wonderful neighborhoods and some seedy ones, we've got suburban sprawl.
Netnews could have survived spam. It could have survived the astonishing growth of online participants in the past five years. (It survived AOL, in many senses.) It's having a hard time surviving its current competition. Part of me is very sad to see it wither.
Ironically, the Web is both the medium in which Dejanews tried to grow, and the medium that choked off some of its best source material.
I'm saddened by Deja's dwindling support for Netnews archives. (Did they use to go back as far as 1990?) I understand why they failed to turn a profit on the business, why they've got a terabyte and a half (literally) of archived material they consider too expensive to keep online. I appreciate what they've done, and I'm glad to have what they still offer. I wish the Dejanews business had thrived; I still wish it well. --PSRC
"I'm not speaking for the company, I'm just speaking my mind." (my old Netnews .sig)
Stupid job ads, weird spam, occasional insight at
As of May 15, all messages posted approximately a year ago or more have become temporarily inaccessible via Deja.com. We will be taking this opportunity to reconfigure the service that provides messages posted prior to September, 1999. Therefore, these messages will not be accessible on the site for some time, possibly a few months. Have no fear: We're committed to bringing these messages back online as soon as possible. We request your patience as moving our server bed to a new facility will greatly increase our reliability and performance.
- A.P.
--
"One World, one Web, one Program" - Microsoft promotional ad
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
In a word? YES. There are so many times that my USENET searches get all weird on me, like returning things I didn't want and such, and I haven't found a search site that implements the kind of filtering that I want.
Fight crime, shoot back.
What? That's "Socialism?"
Fine, let Microsoft compete against AOL/Time Warner/CNN/Netscape on an equal footing -- no break up.
Seastead this.
I'm sure that there is a demand for the sort of service you suggest, but I doubt that there's enough of a demand to make it commercially feasible. In my opinion, if it could have been a money maker, then Deja would have hit the jackpot. The changes they made (or tried to make) to their service a year or so ago were innovative and interesting, but never seemed to catch on, or perhaps weren't implemented properly. For instance, Deja created a feature that would generate an E-Mail message to a user's mailbox if a response to his post was detected: what a great idea! However, it never seemed to work.
:)
I was really excited when the new Deja went into Beta testing of their expanded capabilities. Unfortunately, the potential was never completely developed, and now Deja has changed directions: Usenet is almost an after-thought, now.
Another example of Deja's Usenet scale-back: some of the slick graphical Usenet navigation tools have been removed. Remember the four-way arrows introduced early last year? I believe the up and down arrows would jump to the next thread. The left and right arrows allowed movement within a thread. Very handy tool. Now it's back to the old style, still effective, but not as user-friendly as the arrows.
It's a shame that Deja has moved away from Usenet, but I suppose it was inevitable. As a 5-year veteran of Usenet, a self-admitted newsgroup junkie, and an unapologetic devotee of Agent, a piece of software that's seen little modification in two years, I have to admit that Usenet is not a tool that is easily mastered. Well, at least not by the majority of moderate-use Internet visitors, that is. I'm still explaining the concept to my co-workers but, for some odd reason, they seem to be intimidated by Usenet. Guess if it gets beyond point and click, homepage and favorites, most people lose interest.
To sum up, although I'd like to see a service similar to the one that you mention, I don't think it's a money maker. If it were, then Deja would be promoting, expanding, and improving their Usenet capability, rather than scaling back and minimizing it.
There is, of course, at least one alternative possibility: Deja mismanaged their upgrade, and squandered its potential.
I don't know enough about the inner workings of the company to say one way or the other. However, I tend to think that the problem lies not with Deja, but with the nature of Usenet. Usenet is intimidating to many Internet users. For some, the concept can be difficult to grasp. Obviously, it's not as simple as the Web, and of course, the simplicity of the Web spoils many Internet users. My point is this: I don't believe Usenet, outside of the binary groups, particularly MP3 and porno, will ever attract the level of usage that the Web generates, even with tools such as you propose. And, of course, you specify that binary groups will not be implemented in the proposed service (and rightly so). So, although I'd like to see you give a favorable report, I doubt that you will. Please let us know one way or the other.
One good thing that will come of this: Deja's "Power Search" has had some of its fangs pulled. All of those embarrassing posts I made to Usenet years ago, before I realized they could all be traced back to me: as the years go by and Deja loses interest in archiving, they'll be that much harder to access.
But be sure to make your stuff better than Deja and to better respect people's rights and feelings. I.e. NO editing of the messages, please, and certainly not to insert ads. As a matter of principle, I'm busy nuking all my posts from Deja because of the ad issue. The home ones are gone already, the work ones will be as soon as I get back to the office.
--
Linux user since early January 1992.
I think the store-and-forward scheme that Usenet uses is just obsolete, and this is causing the gradual decline of all the other technologies built on it. If I post a short article to an obscure newsgroup with a handful of readers, how many megabytes of storage is that innocuous message taking up on thousands of news servers around the world?
My ISP gets newsfeeds from several sources, and still I get maybe 30% of articles missing. Loss of articles is simply not acceptable in a modern system, when you have to compete with message boards which have no article loss at all.
Usenet needs a rebirth to position it as a viable alternative to message boards. It needs to concentrate on high traffic groups which actually benefit from worldwide mirroring.
Meanwhile, private message boards can take over the low end of the market, while providing NNTP interfaces and all the features of a real news server. The big difficulty here, in fact, is that most web hosts would not allow their users to set up a 'server', restricting the idea to the big boys, which is the opposite of the proposed low-end niche message boards should have. Perhaps something could be jury-rigged to send text via HTTP, with a client-side translator program acting as a proxy news server. (Too techy, though)
And news readers, the final point in the triangle, need to have solid and seamless support for getting groups from many servers.
Even the best web-based message board is nowhere near a good newsreader for ease of use. Sadly, the Web is seen as the only interface needed for internet applications these days.
I saw the article summary up there which said that Deja now only archives back about a year. So I dug back into it with the email address I still use as my primary address (the one that goes on resumes etc.) and yes! They're finally not indexing all the irresponsible crap I was saying as a Linux zealot and a Chaos Magick dabbler a few years back. I say Usenet should go to hell. I'm glad they've pitched the deep archiving they used to provide (going back several years).
Soon, with the help of Web discussion sites like Slashdot, maybe Usenet will just cease to exist.
Make all the search engines you want. You don't have the DATA! You can make all the front ends you want, but when (and not if) DEJA goes down the toilet, it ain't gonna matter. It will be gone!!!
This is nearly inevitable. Any single node of control becomes a point that will fail. It may not happen this year, but it will happen.
Multiple copies via diverse methods is the only nearly secure approach. That way when one of the versions fails, there will probably be time for one of the copies to replicate.
The problem comes in selecting which information to preserve, since one can't save everything. Probably the best approach is for those who would back up the internet to specialize in certain areas. Remember that nobody can learn everything anymore. Well, nobody can store everything either. Other groups need to take the place of public libraries, and index the sources of information in various ways. It won't be as easy as it used to be, but consider the rate at which new information is being created.
I think we've pushed this "anyone can grow up to be president" thing too far.
Is there a DVD-R set that contains the entire usenet archives? usenet is distributed - is there any node that has been archiving since the start?
Thanks!
I just want to say that the old DejaNews postings were a gold mine of useful information. I have worked at a major ISP and answers I found on DejaNews have saved my ass many times. The connectivity, e-mail and DNS of tens of thousands of users has been affected because of this. Imagine this being replicated hundreds or even thousands of times a day around the globe. DejaNews improved the fabric of the internet. The loss of the older posts is a major blow to the internet and needs to be fixed ASAP.
What do you mean, "Off topic"? We're talking about USENET search engine problems, right?
I think one of the problems that usenet is having is that it is becoming fragmented. Many companies have decided to host their own newsgroups on their own servers and not share them with the rest of the world (the borland.* newsgroups come to mind, for example).
If a new search engine would allow me to search these "private usenets" as well, I would definitely use it over deja! Especially since I can't access NNTP from work because of our firewall and well, I fully agree that deja's new focus on reviews is stupid. They should've spawned a new site for reviews.
There was discussion of this on the NeXTSTEP newsgroups just a few weeks ago. So I'd say a lot of people are unhappy with the loss of a resource (deja) which they've grown to like.
--Matthew
The biggest advantage Dejanews has in dealing with all this Usenet data is its ability to sort by subject, newsgroup, date, author, and so on. One can't do this with web pages, either. With Deja, you can be really imaginative, trying to recreate in your mind how someone might have phrased a particular question or who might have been interested in a certain topic which you want information about. This is why Dogpile (which another poster mentioned), which does not offer such options, is inferior.
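As a rough illustration of why those fields matter, here is a small Python sketch that filters and sorts article records by newsgroup, author and date. The `Article` record, the `query` function and the sample data are all invented for this example and are not Deja's actual schema or interface.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Article:
    subject: str
    newsgroup: str
    author: str
    posted: date

# Hypothetical records; a real archive would hold millions of these.
ARCHIVE = [
    Article("Re: modem init strings", "comp.dcom.modems", "alice@example.com", date(1998, 3, 2)),
    Article("modem init strings", "comp.dcom.modems", "bob@example.com", date(1998, 3, 1)),
    Article("Breakfast in Pittsburgh", "pgh.food", "alice@example.com", date(1999, 7, 4)),
]

def query(articles, newsgroup=None, author=None, sort_by="posted", newest_first=False):
    """Filter by optional newsgroup/author, then sort by the named field."""
    hits = [
        a for a in articles
        if (newsgroup is None or a.newsgroup == newsgroup)
        and (author is None or a.author == author)
    ]
    return sorted(hits, key=lambda a: getattr(a, sort_by), reverse=newest_first)

# Everything one author posted, newest first.
for art in query(ARCHIVE, author="alice@example.com", newest_first=True):
    print(art.posted, art.newsgroup, art.subject)
```

Web pages have no reliable author or date field to hang this on, which is the point being made above: Usenet headers give you structured fields for free.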
Deja hosts chats about obscure technical questions, breakfasts in Pittsburgh, debates about graduate schools--thousands of real communities, and opinions, categorized to the tiniest niches, so nicely sorted and searchable, and which can be captured through time. What a sociological resource, if nothing else!
Usenet is a database, and Deja provides a proper search feature for it. That's value. Great value. What Deja refuses to realize is that it could charge for the resource. I would pay $20 a month for it, easily. Especially if Deja promised to maintain it well (and of course put the old archives back online).
Deja could add even more value to it. If it had been a little more ambitious, it would have added to its Usenet database discussion forums akin to ForumOne's. These are Web discussion forums: Salon, the Utne Reader, etc. Searching each one alone for a topic of interest is arduous, since each forum's population can be comparatively small and topics broad. Usenet archives are useful because they provide enough of a range for niche subjects to be covered.
ForumOne, though in concept magnificent, is practically useless because you cannot use boolean and you cannot sort by author, date, etc. But can you imagine being able to search the entire range of discussion on the web PLUS Usenet with one search engine, and being able to sort via useful database fields? It would be a treasure trove easily equal to the Web in value.
Deja could charge even more for that.
that older traffic only accounts for 10% or so of their bandwidth.
The more important question is, "How much revenue does that older traffic generate?" Deja has turned far more commercial of late: the shopping review emphasis, the embedded advert links. We, the geeks who used it for Usenet searches, just aren't a useful revenue stream to them, so they've dumped us.
Face it guys, Deja is no longer your handy geek-friendly Usenet archive. It's now a "What HomeVideoTheatre Pork-Rind-O-Rama" review site, selling dumbed-down content and adverts to the stupids.
Time for pastures new. Maybe Dogpile.
If you are looking for old porn and warez from years back, this would be helpful. News is slowly dying; (un)fortunately, portals like /. are becoming more the norm.
Does anyone have any info on what a full usenet feed (with binaries) is running now? When I left Deja (April) we were getting 70+GB a day. But I don't think that was all the binaries.
Just curious
"We are not tolerant people. We prefer drastically effective solutions"
Eric
"Seven Deadly Sins? I thought it was a to-do list!"
I've only been on the internet since 1989, but I, too, used to read Usenet. I haven't for several years. Why? The sheer volume of it makes it too difficult to keep up with.
As you point out, Usenet used to be the default place for discussion. But that in itself--not the rise of other discussion forums--led to the current state, where a much smaller fraction of people on the internet use Usenet for discussion. As the volume of Usenet users grew, newsgroups simply became too big for the average person to follow. Face it, if you want to discuss the latest episode of your favorite TV show, would you rather do it in a group of twenty people, or in a group of a thousand? Sure, the larger group will be more diverse and the best comments will be more insightful than in the smaller group, but that's a secondary consideration compared to the time it takes to follow the group.
It's almost paradoxical--fewer people use Usenet now because so many people use it that it's inefficient. I imagine a much smaller percentage of people use it than did 10 years ago, while the absolute numbers who use it are still growing.
The rise of web-based and email-based discussion forums is a result of the death of usenet, not its cause. It's a far from ideal solution. The implicit goal is to cut the time it takes to follow a discussion to a reasonable amount. The solution is to create smaller groups of people simply on the basis that not everybody knows about the forum. (Not that this was intentional, mind you--I'm not saying anyone ever sat down and said, "I'll create a web-based forum so that only a handful of people will know about it and thus discussion will occur at a manageable level, despite the fact that I'm not going to actively exclude anyone." It's more of a Natural Selection sort of pressure.) But for the average user, it's better than Usenet.
Never take moderation advice from sigs, including this one.
Henry Spencer at UTZOO kept all of usenet on tape. These tapes went to weber@ucsd, then magi@uwo.edu, and somebody (it might have been me) got them to Brewster@archive.org. I've been trying to find a post I made in 88/89, and so far nobody has made these available online for public searching.
I think Henry's tapes go back to '80 or '82. Brewster is very good at making them available; there is a couple of years of gaps. Deja has them but won't share. I think 91-92 is the missing bit.
Need Mercedes parts ?
"Nobody goes there anymore. It's too crowded."