Domain: archive.org
Stories and comments across the archive that link to archive.org.
Comments · 7,005
-
Oldest /. entry
Look, ma - no trolls!! But anti-MS comments in da hizzouse!!
I much prefer the current /. -
not new at all
Excite White Paper
Check the copyright date.
BugBear -
Re:But...
and many linux distros only have beta quality 64 bit OS'es.
LFS + CFLAGS="-O2 -m64" + Building a x86-64 toolchain
Haven't tried it myself, having no Opteron and motherboard. :(
I can dream, can't I? -
Re:Use the Archive's crawler
From the FAQ:
I need to crawl/archive a set of websites, can I use Heritrix?
Eventually. For now, the crawler is still in early development, and only if you are comfortable grabbing code directly from CVS, wrestling with incomplete documentation, and running into undocumented limitations, would you want to use the current software. -
Wayback machine
Give us a link anyway. There is a chance it is in the Internet Archive Wayback Machine. So give me a link, so I can check if it is archived.
-
Use the Archive's crawler
How about using Heritrix, the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler?
-
Here is DeadRat
-
Re:For all the PostgreSQL zealots out there...
"If you re-read my post, you'll realize that my point was not to target specific features of PostgreSQL, but rather to bring attention to some examples that were glossed over by the PostgreSQL zealots a couple of years ago"
Actually, the 8K row limit was well-documented. Checking archives of the PostgreSQL mailing list confirms that there were many posts about it, prior to PostgreSQL 7.1 (released in early 2001) which removed the limit. In fact, there is a web page, PostgreSQL Limits, located just 2 clicks off the main PostgreSQL site, which spelled out these limits. This page has been there for years, as verified by the Wayback Machine.
Further, the need to regularly vacuum databases hasn't been (and isn't) glossed over. It's well-documented, listed in numerous FAQs and tuning guides, and comes up all the time in #postgresql on openproject.net.
I'm really not sure where you got the idea that these aspects of PG (one of which was removed over 2 years ago) were "glossed over." Nor do I see what relevance they have today. -
Anyone can be fooled
I got conned out of a few hundred dollars around 2 years ago. It was so good that I didn't even really believe it was a con till now. Anyone remember cyberrebates.com? They had all sorts of stuff that was free or almost free after a rebate that they sent you in a few months. The catch was the stuff was ridiculously marked up in the first place (eg $100 for a flashlight).
The site meticulously documented the many thousands of people who already received their rebates and the millions in merchandise already sold.
But my rebate check never came. They suddenly went bankrupt and disappeared into the night. At the time I thought it was just another dot bomb. But now I see it has the earmarks of a huge con that must have made the principals insanely rich.
As with the original story I guess the sweetest con is the one where the mark never even realizes he was hit. Looking back at the cyberrebates story, at the time it was widely considered legit. I'm still not sure!
-
Re:That's a bit overcritical....
The idea of eCommerce over the web started taking off in the '98-'99 time frame... Prior to that, CGI scripts and ColdFusion (only a short time prior) were the only options... To be asking for 7+ years experience in "web technologies" is like asking for Marc Andreessen himself
Hmm... this time in 1996, I'd just started working for a company producing e-commerce sites for British Telecom. Looks like I'm Marc Andreessen :-)
FWIW, the back end was using an Oracle database running on Solaris. The web server interface was something called Oracle Web Agent, IIRC. This stuff was all available in late '96, two years before your time frame.
-
Re:Insightful 50%, Funny 50% !?!!!
If you go here you'll find a whole entirely data driven website (no static pages at all) which I wrote in January 1996. It obviously wasn't my first such website, but it's the oldest one that the Wayback machine has records of as most of the earlier ones were on intranets. Parts of it were written in C CGIs but most of it was written in PHP version 2.
-
Re:guilty until proven innocent?
Ok, this is what I thought the OP was referring to. Etree has a mailing list with a bunch of ftp servers with "taper friendly" only music. There's also the Live Music Archive serving legal downloads through http, and Furthurnet which is P2P whitelisted for taper friendly bands. As a side note the Hendrix estate has been gracious enough to permit "liberation" of old bootlegs. There are 8 or so shows on Furthurnet, that alone makes it well worth the effort imho.
-
Re:guilty until proven innocent?
Furthurnet for one provides free legal lossless music downloads. Archive.org is loaded with fun stuff to saturate your pipe with. Perhaps I want to send digitized home movies to my parents across the country, or do the webcam thing. Maybe I run Gentoo. Just because you can't think of good uses for your bandwidth doesn't mean there aren't any.
-
Re:guilty until proven innocent?
What legitimate need does a single person have when downloading 40 gigs of data over a short period of time?
As others have said, there are plenty of legitimate rich media sources on the net and reasonable ways to use the net that result in a lot of traffic. My favorites are downloading (free) music from places like archive.org and doing distributed backups via rsync.
Here in NZ, while you can get unlimited dialup access (it's hard to do 10Gb in a month of dialup), virtually every national service that goes faster than dialup is capped at 5/10/15Gb with 10 being the most common. If you want to go faster than 256k, then the cap is more likely to be 500-1500Mb :-(
Oh and the local monopoly telco just fucked the gamers over with a hardware upgrade..
[/rant] :) -
Re:guilty until proven innocent?
What legitimate need does a single person have when downloading 40 gigs of data over a short period of time?
The single most obvious answer is videophone. Someone streaming the high-res output of a firewire camera can generate gigabytes of new data every hour, copyrighted only to him.
Another possible answer: He may be downloading music and movie files, and he could've paid for them. Or (more likely, today) he could be collecting hundreds of huge, public-domain movies.
While it's currently true that no major legit service offers decent digital movie downloads, the ISP industry shouldn't assume it has to stay this way. If they advertise unlimited, they should try to provide it, or change the ads.
It's quite reasonable to suspect that if 40GB of data was taking place over the port Kazaa uses, that he's not transferring a family photo album or business documents from his office network.
If criminal activity is suspected, they should contact the police. -
SleepyCat RPL ???
While most may be comfortable with OSS/FS or FOSS - free software under the GPL & LGPL along with software under various approved open source licenses there are some potential surprises.
The OSI approved SleepyCat license is used with a number of software projects including XAO Apache Web Services and the very widely used dual licensed Berkeley DB software products. The WayBackMachine has a WinterSpeak interview from 2001 with Sleepycat President & CEO, Michael Olson on How to make money with the GPL
...Berkeley DB is embedded in network infrastructure products like routers and switches, DNS and Web content caches, email servers and clients,
With just a few very limited exceptions SleepyCat license payment may be required should one "redistribute" the Berkeley DB software, even when just done internally. ... Companies like Cisco, Sun, HP, IONA, Amazon and Sendmail use Berkeley DB. Open source projects like Cyrus, Squid, RPM, Postfix, and MySQL include it.
The OSI approved Reciprocal Public License (RPL), while used infrequently, is reportedly more viral than GPL, actually extremely viral per Technical Pursuit which dual licenses Tibet, potentially requiring payment under TPL Biz licensing when not in compliance with RPL.
Are there other projects, licensing & circumstances of note that might be similarly surprising or problematic to OSS/FS users ???
-
Re:The Windows XP file system is crippled.
Yes, it's up there now, but back when I got XP I'm pretty sure that page didn't exist ( http://web.archive.org/web/*/http://www.microsoft.com/windowsxp/winxpqanda.asp )
Also, due to the damn slashcode, all my pound signs have been removed. Remember that 360 Pounds Sterling is 628 US Dollars! I'm sorry, but not wanting to spend $600+ on getting two computers to connect to a domain isn't being cheap. -
Re:You went instead of his *girlfriend* ???
All Bush said was that the major combat was over.
No. He said Combat Operations in Iraq Have Ended
and it was later altered (without a notice indicating it) to say Major Combat Operations in Iraq Have Ended
When caught in this lie, the Bush administration web-masters made it harder to catch these revisionist tactics by disallowing spiders on the web-site
Another link:
http://www.lessig.org/blog/archives/001619.shtml
-
A few clarifications...
for those few who didn't RTFA (I heard that happens on /.)
I am one of the people building SFLAN. Our map is a little outdated (and the San Bruno Mountain node is in the wrong spot). SFLAN and BARWN have some 30 nodes in as many locations in San Francisco and a few outliers in surrounding counties. If you are in San Francisco and want to try it out, Cole Street is well covered. The SSIDs are sflanNN or BARWN-xxxxx; DHCP, no WEP.
The nodes are owned and paid for by individuals, many of whom are members of the Bay Area Wireless User Group. The Internet bandwidth for SFLAN is sponsored by the Internet Archive. If you live in SF and want to buy a node to connect your house and your neighbors, contact us.
We like to keep these networks as free (as in speech and beer) as possible. And it's working out so far. I hear Tim Pozar's neighbors keep him happy with occasional pies...
-
The media-driven ad factor is underreported.
I think the advertising factor is underrated. People like to focus on the Internet and Dean's campaign funds, but I think a good deal of this is a result of the coverage his campaign receives to the exclusion of other campaigns.
As recently as the ABC Democratic party debate, Dean gets a lot of air time from mainstream corporate news agencies. Dean's ideas get covered but we don't hear about the other candidates' ideas. In the ABC debate, the first quarter (roughly) of this debate was spent with a Dean-centric version of the old joke "Enough of me talking about Howard Dean. What do you think about Howard Dean?" while more important national issues took a back seat. At the end of the debate, Al Sharpton was denied the chance to make a formal closing statement.
When Dennis Kucinich, Al Sharpton, and Carol Moseley Braun pointed out Koppel's bias of focusing on campaign funds and poll standing, Ted Koppel repeated his focus on the horse-race saying this was about "money and polls" and all but said these three candidates either had "vanity candidac[ies]" or should get out. Kucinich got the longest applause of the night with his response about how the media views politics. He has since said people who used to ignore him now applaud him for his forthrightness in addressing the media issue.
The day after the debate, ABC News decided to pull their embedded reporters from Kucinich, Sharpton, and Braun's campaigns. They claim they will still cover these campaigns by phone and they claim their coverage is more than other networks. Meanwhile, Braun, in an interview with Democracy Now! today, noted that their campaigns are polling higher in some places than John Edwards' but ABC is not pulling their reporters from day-to-day coverage of Edwards. I encourage you to hear the interview for yourself and hear how these three candidates and ABC explain ABC's decision.
-
The largest databases aren't what you think
Stanford Linear Accelerator Center weighs in at 500TB. They run Objectivity.
Internet Archive weighs in at 300-400TB and runs Linux.
Google is probably somewhere in that range, but they don't tell. A rough guess would be 3307998701 pages * 100KB/page / 1024KB/MB / 1024MB/GB / 1024GB/TB = 308TB. They run pigeons -
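The back-of-envelope Google estimate above is easy to check. This is only a sketch of the comment's own arithmetic: the page count and the 100KB-per-page average are the poster's guesses, not measured figures, and the function name is made up for illustration.

```python
# Re-run the comment's rough estimate: pages * KB/page, converted to TB
# using binary units (1024 KB per MB, and so on).
PAGES_INDEXED = 3_307_998_701   # page count quoted in the comment
KB_PER_PAGE = 100               # the comment's assumed average page size


def estimate_terabytes(pages: int, kb_per_page: int) -> float:
    """Convert a page count and an average page size in KB to terabytes."""
    total_kb = pages * kb_per_page
    return total_kb / 1024 / 1024 / 1024  # KB -> MB -> GB -> TB


if __name__ == "__main__":
    print(f"{estimate_terabytes(PAGES_INDEXED, KB_PER_PAGE):.0f}TB")
```

Changing KB_PER_PAGE shows how sensitive the guess is to the assumed page size; halving it halves the total.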
Re:What about the WWW?
How about www.archive.org?
Makes you wonder what the hell kind of data France Telecom is storing....
Yes, Jean-Pierre on the 11 December 2003 at 11:03pm you called "Chaud et Sauvage" escorts and ordered a brunette 5'6", to arrive wearing a Napoleon hat, snorkel and flippers..... -
Archive.org not on the list?
They claim to have over 300TB of data.
Quote:
"The Internet Archive Wayback Machine contains over 300 terabytes of data and is currently growing at a rate of 12 terabytes per month." Taken from here -
Hmmm
OK so this is obviously only vendors of databases and RDBMS systems.
In a broader sense aren't such things as the wayback machine a database? What about the truly massive amounts of data gathered at research labs, e.g. CERN. Who's the daddy of these guys?
-
Re:Are you an RIAA spokesperson?
I can understand what you are saying. However, I believe the RIAA only succeeds at making untalented "artists" survive. If you have talent, most likely you will be able to develop a fanbase without the help of the RIAA. At least that's how it is in today's world. There are so many good bands out there that get free promotion from their fans regardless of their label affiliations or lack thereof. They may not be superstars, but they're not unknown.
-
Re:Not that easy...
Yes you can. Check out the Archive.org Wayback Machine. They cache tons and tons of content with pictures even!
As you can clearly see. -
Mirror
Here's a backup of the WTC page in the wayback machine.
Here's another guy with a lego WTC
--I prefer the term "Karma Slut" -
Re:What I am surprised by NOT seeing yet
In fact, McDonalds did a very specific test of robotics in their food lines. It was a Fanuc A-510 that had its cast components replaced so it had a stainless steel body (for wash-down purposes). It also had the regular grease replaced with non-toxic grease. It was edible, but NOT tasty! :P
You can see archive.org's cache of a page that mentions the A-510's successor, the A-520i, here. Needless to say, it never made it past the initial study.
Also, to be technical, there is a difference between the term "robot" and what is called "hard automation". I have seen people claim that a dishwasher is a robot. It is not. A robot is programmable and multi-functional. A dishwasher has a single purpose (two if you count torturing the cat). The same applies to factory automation that is driven by automated equipment running off of cams or pneumatic/hydraulic cylinders. Those are "hard automation" devices, as they perform a single function until they are mechanically altered.
-
Putting the "whore" in "camwhore"
(warning: slow links ahead)
I recall a little rant from long ago in response to then-19-year-old Jenni admitting to cheating on her boyfriend. Thrice. With the same 32-year-old guy. She caps off the cyber-confession by committing some good ol' fashioned copyright infringement against Peter Gabriel.
Good reading, really.
Oh, and good riddance you lousy, cheating, exhibitionist bitch. -
Re:David Koenig is a genius
Well, 12 terabytes seems to be an upper bound per archive.org's FAQ:
How large is the Archive?
The Internet Archive Wayback Machine contains over 300 terabytes of data and is currently growing at a rate of 12 terabytes per month. This eclipses the amount of text contained in the world's largest libraries, including the Library of Congress. If you tried to place the entire contents of the archive onto floppy disks (we don't recommend this!) and laid them end to end, it would stretch from New York, past Los Angeles, and halfway to Hawaii. -
Nanotechnology timeline
-
Re:Internet archive
-
Re:Internet archive
-
Internet archive
There is a major source of information about what SCO did and didn't do at the Internet Archive site.
Look here and enjoy SCO's own words on how they supported and contributed to Linux. Year 2000 is the best.
A corporate sponsor of Linux International, SCO has always supported open standards, UNIX Systems and server-based technologies and solutions that benefit business computing. Our engineers have continuously participated in the Open Source movement, providing source code such as lxrun, and the OpenSAR kernel monitoring utility.
Compare this to the legal filing they made here a few days ago telling the Judge that they never contributed Code.
-
Been there, done that, two or three years ago.
Should I bother releasing my patch to AALib? Replaces the monochrome buffer with a 32bpp RGB buffer, uses a much more tuned colour-selection system than libCACA appears to.
A screenshot of the BattleToads title screen, linked out of Archive.org as I don't have a current website for the TextNES emulator worth pointing to at the moment, unfortunately.
But yeah... been there, did that, didn't think anyone would be interested so I never released the patch. -
We have this now. It's an archive.org reference.
Here's an old Slashdot page in the Internet archive. Decoding the URL http://web.archive.org/web/20000301205131/http://www.slashdot.org/ is straightforward. It's just the archiving site ("web.archive.org"), the medium being archived ("web"), the date and time, and the original URL being archived.
There's another copy of the archive at "archive.bibalex.org", in Egypt. Brewster Kahle wants to have four copies worldwide; then, he thinks, the information will be safe.
One problem with the Internet Archive is that the server farm is unreliable. Sections of the archive drop offline for days at a time. It's built out of thousands of commodity PCs sitting on shelves in a building in San Francisco.
Another problem is that web sites that are too complex don't get archived properly. If there are links embedded in JavaScript, Java, or Flash, they won't be properly adjusted to the appropriate archive references. This becomes more of a problem as more pages are created with overly complex authoring tools.
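The URL layout described earlier in this comment (archiving site, medium, date and time, original URL) can be pulled apart with a few lines of Python. This is a minimal sketch, assuming the 14-digit YYYYMMDDhhmmss timestamp form seen in the example URL; the function name and the dictionary keys are made up for illustration and are not part of any archive.org API.

```python
from datetime import datetime


def parse_wayback_url(url: str) -> dict:
    """Split an archive reference into site, medium, capture time, original URL."""
    # Drop the outer scheme, then split only on the first three slashes so
    # that the slashes inside the original URL are left alone.
    rest = url.split("://", 1)[1]
    site, medium, timestamp, original = rest.split("/", 3)
    return {
        "site": site,                                             # archiving site
        "medium": medium,                                         # e.g. "web"
        "captured": datetime.strptime(timestamp, "%Y%m%d%H%M%S"),  # date and time
        "original": original,                                     # URL being archived
    }


if __name__ == "__main__":
    example = "http://web.archive.org/web/20000301205131/http://www.slashdot.org/"
    for key, value in parse_wayback_url(example).items():
        print(f"{key}: {value}")
```

The sketch deliberately rejects anything that does not carry a full 14-digit timestamp, such as the "*" wildcard form; strptime raises ValueError in that case.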
-
Re:archive.org and copyright?
How come archive.org seems to be above copyright law?
Archive.org invokes the DMCA safe harbor provisions (see bottom of that page for the DMCA boilerplate), which are described in Title II of the DMCA.
However, you'll find a careful reading of the DMCA reveals that none of the exclusions really quite applies to them; a good lawyer might be able to get them protected but I would bet against them.
Mostly they get by because they will remove content if requested, and nobody who cares cares quite enough to sue them on behalf of "the world" when they are satisfied to have their own content removed. In other words, they are basically OK because nobody cares to sue them. Strictly speaking, archive.org probably is the world's largest copyright violation.
This goes to show that sometimes if you break the law in a big enough way, you can get away with it. ;-)
(Not responsible for the results of any actions based on taking that sentence to heart. For entertainment purposes only. etc.) -
Re:archive.org and copyright?
Archive.org's terms
They get around copyright in two ways. First of all, the copyright owner can request that their material be removed from the archive. Beyond that, they basically describe an honor system; if you're not supposed to view something, don't. -
Rigidity stifles creativity
Any extra effort required to make web pages and their URLs preserved for eternity makes it more difficult for people to create them in the first place, which will mean less knowledge available, not more. Something unobtrusive that goes around preserving pages for posterity, like the Internet Archive, is the best solution.
-
Obvious solution...
Provide an alternative link to the source material on the Wayback Machine or archive.org.
What was the problem again?
-
This has already been done with industrial films
The Your Name Here Story did the same thing years ago.
We already have form letters, form movies, and form music. Not surprising we get form commercials as well. -
Hey kids, let's learn how to make links!
Like this: The Your Name Here Story
-
Not true
According to The Register, the contents of MP3.com will be hosted at archive.org
-
hasn't anyone ever heard of
Freecache? I've never used it, but I've also not seen it used widely yet and I wonder why. Please check it out. It's perfect for this type of situation. Unlike BitTorrent, there is no seeding, no extra steps. Quote:
An example:
Say an up-and-coming rock band, the RockLobsters, has a website that has a large file, say
http://www.rocklobsters.com/videos/my-new-rock-video.mpg
that is 5MB-1GB in size. If it gets popular, they will lose their guitars and homes to their ISP because their bandwidth bill will shoot up.
While keeping their big file on their webhost, the RockLobsters change the URL on their webpage to point to:
http://freecache.org/http://www.rocklobsters.com/videos/my-new-rock-video.mpg
When a user clicks on this,
- the user downloads the file from a nearby machine on their ISP's network, and
- the user is happy because it was fast.
- The RockLobsters are happy because they distributed their file to another user but did not have to send the file from their ISP.
- The RockLobsters' website's weblog registers that a download happened so they can ratchet up their expectation of breaking into the big leagues.
- The user's ISP is happy because they only downloaded it to their network once and served it to many users thereby saving on their Internet connectivity bill.
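The URL change described in the quote is just a prefix, so it can be sketched in a couple of lines. This assumes only the "http://freecache.org/" prefix scheme shown in the quoted example; the helper names are hypothetical, and nothing here checks whether the service would actually accept or cache the file.

```python
# Prefix scheme taken from the quoted example; everything else is illustrative.
FREECACHE_PREFIX = "http://freecache.org/"


def to_freecache_url(original_url: str) -> str:
    """Return the cached-download form of a URL by prepending the prefix."""
    return FREECACHE_PREFIX + original_url


def from_freecache_url(cached_url: str) -> str:
    """Recover the original URL from its prefixed form."""
    if not cached_url.startswith(FREECACHE_PREFIX):
        raise ValueError("not a freecache-style URL")
    return cached_url[len(FREECACHE_PREFIX):]


if __name__ == "__main__":
    video = "http://www.rocklobsters.com/videos/my-new-rock-video.mpg"
    print(to_freecache_url(video))
```

The file itself stays on the band's own webhost; only the link on the page changes.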
-
Re:Pot, meet kettle.
What's the moderation for "Wrong"?
A quick check on the wayback machine shows that the name was used at least since August 2002, fo0bar corrected his initial statement, but the correction has score 2 and the wrong statement score 5. -
Re:Pretty boneheaded move on Red Hat's part
The last entry on the Wayback Machine doesn't show that TM on the FEDORA Project site.
However, it does appear that they have been using the Fedora name longer than the original Fedora Linux Project, but not longer than Red Hat has been associated with the Fedora.