Is There Demand For A Better Usenet Search Engine?
Anonymous Employee writes: "I was asked for a feasibility analysis to provide high-quality searching in a large Usenet archive (all except binary/porn groups and several years' worth of archives). This is similar to what Dejanews wanted to provide before they re-branded to Deja last year. Do you think there is a need for this or is high-quality Web searching + Usenet browsing meeting your everyday needs in terms of information retrieval? If not, do the existing Usenet search interfaces suffice (Deja, one year's worth of archives, not-so-good search interface - Remarq, three months' worth of archives, okay search interface)? ...and also, is real-time indexing (i.e., you can search for an article 'very soon' after it has been posted) important?" In light of Deja's recent faux pas, I think this question is rather timely, and I have to admit, I wouldn't mind the ability to search Usenet posts older than one year.
it's pretty much the whole freakin thing :)
-Jon
this is my sig.
The Dejanews usenet page has been my home page for years now. Whenever I needed to find something out, it was far easier to see if someone else had asked the same question I had in Usenet than it was to wade through Microsoft's MSDN site or page after page of crappy vendor HTML.
In the last few months, the quality of the results that I'm turning up has decreased markedly. Deja has decided to shelve all their 1995-1999 Usenet archives and concentrate on just the newer stuff, apparently because that older traffic only accounts for 10% or so of their bandwidth.
WHAT? Of course it does! There are enough people using Deja as their Usenet client for this to be obvious. The 10% or so of their traffic that was a result of the 1995-1999 archives was the result of hundreds of thousands of other people like me searching and finding answers.
Deja has made a mistake in alienating the audience that made them one of the most visited sites on the web. For this, I predict that Deja will either fold or massively re-organize within the next year.
They screwed us over and broke a trust. You can't regain THAT in an IPO.
----
Greetings, Recently we moved the Deja.com servers to a new facility in order to provide greater reliability and performance. The move is now complete and we thank you for your patience.
Please note that currently our Usenet Discussion Service only retrieves messages from the past year (back through June 1999). As announced, we are reconfiguring the service that provides messages posted more than 1 year ago in order to provide greater reliability and performance. This will take some time though, possibly a few months. Have no fear: We're committed to bringing these messages back online as soon as possible.
-----
So I would wait for a few more weeks, and see if the situation improves.
"Pinky, you've left the lens cap of your mind on again." - P&TB
"I can see my house from here!" - ST:
If you could get a high quality search engine with archives going back for many years (at least 1991 would be nice), I'd pay for a subscription to a service like that. But a free front end with ads would be acceptable.
:-)
I have several clients who have almost completely abandoned deja because the quality has disappeared. They've asked me how hard (i.e. how much $$$) it would be to set up a similar service for them internally. I give them the cost estimates for a full time usenet+searchengine system admin and a pair of good machines. Then they ask if there is a company out there who would do the same thing for less money than the US$100k/year it would cost to do it themselves.
It would be especially nice to see corporate accounts set up as well, so any employee in a company could do high quality searches.
My own opinions on deja are pretty vituperous right now. If you could buy a copy of their old archives and provide a better service than those losers, you'd have a fairly large audience. Try doing what dejanews did when they started, going around to usenet admins and asking for copies of backup tapes. Be prepared to get old DC-150 carts and 9 track reel to reel and many other esoteric formats. Could be a fun project
the AC
Hemos is like...sci-fi fans;he thinks technology is cool, but he hasn't bothered to understand the science it's based on
I would concentrate on the comp.* and other technical newsgroups rather than trying to mirror the whole damn thing. I would hazard a guess that a lot of that 10% of traffic that Deja said made up their backpost searching was looking for technical support, hardware information, or software help. Having a 15 year backlog of rec.humor.jokes or alt.fan.brittany-spears (or any other pop-culture NG, of which there are thousands) might be cute, but it's really rather worthless.
... !) Usenet has since its inception been a celebration of free expression. Stifling that because people have to worry about repercussions far in the future would be kind of shitty.
:)
Just as food for thought, there are also some privacy issues here. You have to ask yourself: do you really want a decade or two of your scribblings to be instantly available and indexable and searchable by anyone on the planet? Think about it - every immature flame, every embarrassing post, every moment you'd love to live down, now showcased and painfully easy to find by someone with a couple of minutes and a computer... I'm kind of glad that Slashdot "forgets" or de-indexes my comments after a few weeks. There are a lot that I'd just love to bury and in effect have as soon as they exit my user info page and leave the search index. Now imagine them staying with you for years, even decades.
And it can get worse than simple embarrassment. I know for a definite fact of one case where two guys were engaged in a long-standing flamefest in a NG. Guy 1 went on dejanews.com to look at what else the other guy (Guy 2) was posting... and found some two-year-old backposts to a cancer support group because guy 2 was battling some form of cancer. Guy 1 brought that up in his next flame and really just humiliated guy 2 in front of hundreds of people. Until deja killed their backlog, you could still find both those posts, and hundreds more just like it. Imagine trying to live that down.
It's incidents like that that really cause me to agree with privacy advocates about the danger the Internet poses. Never in human history has it been as easy to delve into a person's past as it is now even without a superorganized listing of their thoughts and opinions of everything they felt compelled to write about for years into the past. Such a complete archive really would pose a lot of problems for many people (imagine just a 10 year log of alt.support.cancer
With that in mind, like I say stick to the tech newsgroups, and you'll run into far fewer problems.
--
I think there is a world market for maybe five personal web logs.
My news server has just the big 8 and only the alt groups that users request. With a 15 gig news spool, I only have to expire articles after two months.
Doesn't take a math wiz to extrapolate that to see how much disk space a year's worth of REAL usenet newsgroups would take.
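For anyone who wants the extrapolation spelled out, here is a rough sketch. It assumes traffic volume stays flat, which it certainly doesn't, and it only covers the partial feed described above (big 8 plus requested alt groups). The function name is made up for illustration; a full feed would be larger by some factor this post doesn't give.

```python
def spool_size_gb(spool_gb, retention_months, target_months):
    """Linearly scale a news spool to a longer retention period.

    Assumes constant posting volume, so treat the result as a floor.
    """
    return spool_gb * target_months / retention_months


# 15 GB currently holds about two months of the partial feed.
one_year = spool_size_gb(15, 2, 12)
print(f"One year of the same partial feed: about {one_year:.0f} GB")
```

Scaling the same way to several years of archives, and then to a full feed, is where the numbers start to hurt.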
They should have never trashed 1995-99 without notice. 95 was when the net started to explode and removing that removed history that can never be recreated.
(Then again, I'm glad some of my old posts finally went away. x-no-archive works, but since everyone these days just quotes entire articles when replying with one line at the top, x-no-archive was a bit useless anyway...)
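To show why quoting defeats x-no-archive, here is a minimal sketch assuming an archiver that simply skips any article carrying an "X-No-Archive: yes" header. The function name and both sample articles are hypothetical; real archivers may check more than this.

```python
from email import message_from_string


def is_no_archive(raw_article: str) -> bool:
    """True if the article carries an 'X-No-Archive: yes' header."""
    msg = message_from_string(raw_article)
    return msg.get("X-No-Archive", "").strip().lower() == "yes"


# Hypothetical articles, made up for illustration.
original = (
    "From: alice@example.com\n"
    "Newsgroups: comp.misc\n"
    "X-No-Archive: yes\n"
    "\n"
    "Please do not archive this.\n"
)
reply = (
    "From: bob@example.com\n"
    "Newsgroups: comp.misc\n"
    "\n"
    "Me too!\n"
    "> Please do not archive this.\n"
)

print(is_no_archive(original))  # the original asks not to be archived
print(is_no_archive(reply))     # the reply does not, but it quotes the original
```

The header only travels with the original article, so the reply gets archived along with everything it quotes.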
One of my current projects is a search engine that combs both the web and usenet based on similarity data. A portion of this data is computed using analysis of files, locations, etc. and the rest is done by a sort of moderation system similar to Slashdot that lets users group and rate files. The system can search both text and binary files. So if you found a pic you liked you could use it as your sample and the search engine would return all of the others that matched the search you specified. You might get back pics that matched the same signature as the sample, pics w/ a similar name, or pics that had been group moderated into the same class as the sample. Right now I'm doing a lot of research on file signatures, ways of telling how similar one pic (or mp3, or anything) is to another file of the same type (pic, sound, text).
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
... was Jeremy Nixon's Deja power search, especially after the redesign/relaunch. It's basically just a reorganization of the form from Deja's own power search page, but I find the slightly different interface (with no unnecessary graphics and no scrolling) to be simpler and quicker to use.
...
Unfortunately Jeremy doesn't have his own back archives
----
lake effect weblog
{Network engineer in Chicago--looking for work!}
I started reading Netnews back in 1982 or so, before the Great Renaming ... heck, back when "the Internet" was a Larry Landweber proposal to replace ARPANET. At the time, aside from mailing lists, it was the only electronic discussion medium in existence. Like the current e-mail network and the World Wide Web, and IRC at one point, it had an interesting property: there was *one* network. Some sites got full feeds, some partial feeds, and there were a handful of local groups, but everyone was on "the 'Net" whether they were at UC Berkeley or Bell Labs or the Pentagon. If you wanted a discussion, you took it to a mailing list, or you took it to Netnews. (There was FIDO, but rounded to the nearest hundred thousand, it had zero users.)
... and *all* those advantages have hurt Usenet when it comes to mindshare, and to the ability to attract the people who make 'Net communities work.
.sig)
That creates an effect not everyone sees. Usenet was the birthplace of hundreds, maybe thousands, of electronic communities, long before people were using "e-" as a prefix. Those communities, the people and personalities and cultures, are what made Netnews so attractive, so involving. (The current buzzword is "sticky.") Of course you're going to come back to see if your favorite netscum posted something outrageous, or if someone answered your question or replied to your answer.
Web-based discussions didn't kill Usenet, but they darned sure hurt it. Instead of one "'Net," there are tens of thousands, maybe more. I can't count the number of Web-based discussion forums I've seen. This conversation we're having right now is off in some tiny little corner instead of in a news group. There are lots of advantages to having it here
Instead of a grand city, with some wonderful neighborhoods and some seedy ones, we've got suburban sprawl.
Netnews could have survived spam. It could have survived the astonishing growth of online participants in the past five years. (It survived AOL, in many senses.) It's having a hard time surviving its current competition. Part of me is very sad to see it wither.
Ironically, the Web is both the medium in which Dejanews tried to grow, and the medium that choked off some of its best source material.
I'm saddened by Deja's dwindling support for Netnews archives. (Did they use to go back as far as 1990?) I understand why they failed to turn a profit on the business, why they've got a terabyte and a half (literally) of archived material they consider too expensive to keep online. I appreciate what they've done, and I'm glad to have what they still offer. I wish the Dejanews business had thrived; I still wish it well. --PSRC
"I'm not speaking for the company, I'm just speaking my mind." (my old Netnews
Stupid job ads, weird spam, occasional insight at
As of May 15, all messages posted approximately a year ago or more have become temporarily inaccessible via Deja.com. We will be taking this opportunity to reconfigure the service that provides messages posted prior to September, 1999. Therefore, these messages will not be accessible on the site for some time, possibly a few months. Have no fear: We're committed to bringing these messages back online as soon as possible. We request your patience as moving our server bed to a new facility will greatly increase our reliability and performance.
- A.P.
--
"One World, one Web, one Program" - Microsoft promotional ad
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
I'm sure that there is a demand for the sort of service you suggest, but I doubt that there's enough of a demand to make it commercially feasible. In my opinion, if it could have been a money maker, then Deja would have hit the jackpot. The changes they made (or tried to make) to their service a year or so ago were innovative and interesting, but never seemed to catch on, or perhaps weren't implemented properly. For instance, Deja created a feature that would generate an E-Mail message to a user's mailbox if a response to his post was detected: what a great idea! However, it never seemed to work.
:)
I was really excited when the new Deja went into Beta testing of their expanded capabilities. Unfortunately, the potential was never completely developed, and now Deja has changed directions: Usenet is almost an after-thought, now.
Another example of Deja's Usenet scale-back: some of the slick graphical Usenet navigation tools have been removed. Remember the four-way arrows introduced early last year? I believe the up and down arrows would jump to the next thread. The left and right arrows allowed movement within a thread. Very handy tool. Now it's back to the old style, still effective, but not as user-friendly as the arrows.
It's a shame that Deja has moved away from Usenet, but I suppose it was inevitable. As a 5-year veteran of Usenet, a self-admitted newsgroup junkie, and an unapologetic devotee to Agent, a piece of software that's seen little modification in two years, I have to admit that Usenet is not a tool that is easily mastered. Well, at least not by the majority of moderate-use Internet visitors, that is. I'm still explaining the concept to my co-workers but, for some odd reason, they seem to be intimidated by Usenet. Guess if it gets beyond point and click, homepage and favorites, most people lose interest.
To sum up, although I'd like to see a service similar to the one that you mention, I don't think it's a money maker. If it were, then Deja would be promoting, expanding, and improving their Usenet capability, rather than scaling back and minimizing it.
There is, of course, at least one alternative possibility: Deja mismanaged their upgrade, and squandered its potential.
I don't know enough about the inner workings of the company to say one way or the other. However, I tend to think that the problem lies not with Deja, but with the nature of Usenet. Usenet is intimidating to many Internet users. For some, the concept can be difficult to grasp. Obviously, it's not as simple as the Web, and of course, the simplicity of the Web spoils many Internet users. My point is this: I don't believe Usenet, outside of the binary groups, particularly MP3 and porno, will ever attract the level of usage that the Web generates, even with tools such as you propose. And, of course, you specify that binary groups will not be implemented in the proposed service (and rightly so). So, although I'd like to see you give a favorable report, I doubt that you will. Please let us know one way or the other.
One good thing that will come of this: Deja's "Power Search" has had some of its fangs pulled. All of those embarrassing posts I made to Usenet years ago, before I realized they could all be traced back to me, will be that much harder to access as the years go by and Deja loses interest in archiving.