World's Largest Databases Ranked
prostoalex writes "Winter Corp. has summarized its findings of the annual TopTen competition, where the world's largest and most hard-working (in terms of load) databases are ranked. The results are in, and this year the contestants were ranked on size, data volume, number of rows and peak workload. I wrote up a brief summary of the top three winners in each category for those too lazy to browse the interactive WinterCorp chart."
I would've expected to see Google in there somewhere.
scored a measly 17th. Oh well, time for more surfing.
Does the SQL Server mean MS-SQL?
I would have liked to see SQL vs non-SQL ranking too.
"The price good men pay for indifference to public affairs is to be ruled by evil men." ~Plato (427-347 BC)
My organisation is an order, statement and invoice processing/clearing company ($5bn worth of transactions a year) and our database is 100GB.
The largest in the survey is 30GB.
Is my organisation the new record holder?
"It's not your information. It's information about you" - John Ford, Vice President, Equifax
I wonder how many of the spammers allowed their databases to be evaluated for this list.
Food not Bombs is a nice platitude but it breaks down when you notice that the Bombees are usually well fed
You're off by 3 orders of magnitude. The largest is 30TB.
I thought that 90% of the world's data was irretrievably trapped in IMS? Seriously though, I am surprised that an IMS system isn't on the list. Probably because it isn't relational, and the people making the list figure that RDBMS are the only DB around.
Lasers Controlled Games!
I thought the Sloan Digital Sky Survey would have made the ranking too. But damn! It's not even close!
Based on all the hype about the national Do Not Call registry, I would have expected to see that up there somewhere. Then again, it probably consists of like one table and 3 fields. It certainly would qualify as a very popular database.
OK so this is obviously only vendors of databases and RDBMS systems.
In a broader sense aren't such things as the wayback machine a database? What about the truly massive amounts of data gathered at research labs, e.g. CERN. Who's the daddy of these guys?
my other sig is written in brainfuck
I would imagine that the Winter Corporation's db is now climbing up the peak performance for online transactions right now ;o)
$ strings FTP.EXE | grep Copyright
@(#) Copyright (c) 1983 The Regents of the University of California.
I have none, nada, zip experience in big databases. But it surprised me that the peak workloads were measured in 100s of concurrent queries. If I had to make a wild guess, I would have guessed 10s of thousands. My blessed ignorance destroyed.
Certainly. Microsoft is developer friendly. Their software just works, and it works well. For example, I just finished an LDAP login module, and after testing it against Novell's eDir and several other vendors, ActiveX Directory was hands down the fastest.
Also, Microsoft produces the only C++ IDE I'll ever use - Visual Studio. I just haven't found anything else for ANY platform that compares.
As long as Microsoft is on top, I can continue getting their solutions cheap, and they can continue to produce excellent products.
Shouldn't the World Wide Web be ranking 1st with its huge pr0n database?
Real geeks use acronyms.
29TB actually. (Due to rounding, precisely 28.547 TB)
"If anyone needs me, I'm in the angry dome."
I honestly doubt that 29.2 Terabytes is the biggest database in the world. But anyway...
I recognize Oracle and DB2, but could someone give a brief synopsis of what the other database systems are? And what is an MPP archetype?
.
1. SCO's database of threatening Lawsuits
2. The United States DoD of truly amazing useless shit
3. Slashdot's collection of slashdotted sites.
3. My pr0n database tops the chart at 437 nonabytes.
http://www.nonabyte.org
Ride on my dear friend. Ride on!
AT&T 94,305GB Daytona SMP AT&T Sun Sun
I wonder how much of this database is records of every time users switched to and from AT&T to get those cash bonuses!
Seems to me that an image DB would be one of the largest on the net these days.
Case in point - look at all the goatse.cx mirrors there are.
Wait I'm confused... Is this supposed to be sarcasm?
Real geeks use acronyms.
I know where I work we recently (for an IT pat on the back) calculated our total network-accessible storage capacity and came in at a rough estimate of about 150TB. Now that is a giant swath of data, and a decent amount is in databases (MSSQL farm), but scattered across thousands of DBs.
It takes a truly amazing staff to maintain (backup, administration, maintenance, sitting and staring at screens) the servers and maintain the integrity of the data, but good lord...
A 94.3TB database? My utmost and highest kudos to those DBMAs and admins there. That is one gigantic task to operate. Being that it's AT&T, and assuming a great deal of it is billing and maintenance functions, these have to be up, I'm sure, a good three nines if not greater.
Regardless of the result of the study (without actually reading the entire study, the end results are simply a short read of a geek pissing contest), I find it truly amazing how much work, man-hours, and midnight pager calls go into maintaining these databases. I know I don't want our DBMAs' jobs and certainly wouldn't want to be a DBMA on a 94.3TB farm, but I know those that do and love doing it. It's a specialty skill and apparently these guys do it right...
Kudos...
Quote:
"The Internet Archive Wayback Machine contains over 300 terabytes of data and is currently growing at a rate of 12 terabytes per month." Taken from here
If you can read this sig - the bitch fell off.
CIA and RIAA are given a raw deal!
I cannot see what OS each DB is running on. Is that irrelevant?
France Telecom uses Oracle Corp. as its DBMS, Hewlett-Packard Co. as its storage and system vendor, and employs an SMP (symbol manipulation program) architecture.
:-)
A case of acronym confusion, I guess.
Sig ?
And the cost of an individual "unit" in a commoditized market is pretty close to the marginal cost to produce that "unit".
And the marginal cost to produce a copy of software is about as close to zero as any product will ever be.
In other words, Microsoft is trying to buck the tide of the process that reduces prices in damn near every other mass market in human history.
Get over it. Free software is the future.
Stanford Linear Accelerator Center - 828 293 GB. Almost 30 times more than in France Telecom's database.
I wonder if the anonymous DB2 database is microsoft's PC activation/snoopware database?
Especially in the peak workload category. I've seen a lot of MS SQL databases working overtime when Slammer first came out!
Lastly, in the Windows OTLP category HP servers were used by 7 of 10 organizations, and Microsoft SQL Server was the DBMS choice for seven respondents.
Neither WindowsNT, nor MS SQL are generally a choice for the top databases. In fact, to make the entry in this list, a Windows-Database was required to be only half as big as databases on other platforms:
In order to qualify for the TopTen program consideration, any commercial production database implementation was required to feature a minimum of 500 GB of data for Microsoft Corp.'s Windows and NT platforms and 1 TB of data for all other platforms
ms
... database? Or is p2p not really a database... per se?
Sure there are a lot of redundant (read: hilton video) files, however there's something like 4,627,200GB of data available.
or... not...
Large yes, but I'm sure that their list of who they want to sue must be huge! (Atoms in the universe huge.)
One line blog. I hear that they're called Twitters now.
Not only does Anonymous say a lot of things and write some music and paint, but he also has one of the world's largest databases.
While it is nice to see the ranking in terms of size and usage, it would be nice if the survey ranked other factors like maintenance time and number of users to see how they really compare in operation. Largest number of OLTP might signify lower downtime but maybe not.
Well, there's spam egg sausage and spam, that's not got much spam in it.
The guy who did the summary is going to have a bill on his way... :)
Quote: "If this is your website please contact Verve Hosting"
And Verve hosting address is billingadmin@vervehosting.com...
JP
Yeh... anyone care to post a google cache link of dat?
troll ?, Thought it was kinda funny
wanted: one clever sig,apply within
Winter Corp's own results database shoots to number one in the 'Peak Workload' rankings after being linked to from Slashdot...
++ Say to Elrond "Hello.".
Elrond says "No.". Elrond gives you some lunch.
Hmm - how to /. your own website in one simple step?
Boy, is this slanted. I work on large IBM machines with DB2 built in... Where are those?
Someone else wrote about Google; it should be in this listing too, even if it is using an in-house developed DB.
Platforms: Windows or Unix... BAH!
does anybody believe that the "SMP" used in reference to the French Telecom DB means "symbol manipulation program" rather than "symmetric multiprocessing"? how are we supposed to take seriously a study (or at least a report about the study) where they just look up acronyms with no understanding?
Even better is the Google Glossary to solve your acronym hell.
Some are scored 10-1
shouldn't the overall best performer have been ranked 1984? and the rest from there?
every day http://en.wikipedia.org/wiki/Special:Random
From the article: "the TopTen Program featured 141 qualified and validated surveys representing 23 countries spanning all major DBMS, server and storage vendor products." So it just has to be a DataBase Management System, not necessarily Relational.
I'm absolutely shocked that the NCBI's (National Center for Biotechnology Information - part of the NIH) genomic and proteomic search engine BLAST isn't included in the list. BLAST is consistently used by scientists worldwide to search the genomes of several organisms. I'm similarly shocked that MEDLINE / PubMed isn't included as it's the primary database for searching published scientific literature. When I think of databases, I think of these two sites - not Amazon.
Oracle is 1st (France Telecom). I bet Larry Ellison is launching a *big* advertising campaign based on these data.
They are going to exploit this thing "ad nauseam". Wait and see.
Why am I simultaneously frightened and amazed to note that two of the winners are the United States' customs and border patrol database and Experian's credit rating database? If you've ever checked your credit rating you'd realize this company and its peers (Equifax etc.) maintain a tremendous amount of information on you, and charge you to verify it. Finding out why your credit is bad, and in the case of a mistake, changing it, is an expensive and time-consuming task.
If this were Usenet, I'd killfile the lot of you.
Anyone else notice if you go to wintercorp.com it states:
Makes you wonder how definitive this survey really is.
#exclude <ms/windows.h>
Even funnier is that there's no such thing as OTLP: it's OnLine Transaction Processing. On-Transaction Line Processing???
This was tested against a live directory with the same number of users and objects each time?
How was your test environment organised?
Oh no, you were being ironic, I must pay more attention.
Slashdot, as the biggest SCO Flames database ..
The registry of some of my NT5 servers that has become HUGE after 2 years
My pr0n cd's sql database : )
WTF am I doing replying to an AC at 5 A.M on a Friday night?
in my life God comes first.... but Linux is pretty high after that
Francis Smit
Gee, it's too bad they couldn't get any responses from some of the big SPAMMERS. I bet their db tables and #rows are pretty PHAT too!!
pi=sigma{n:0-infinity}[(1/16)^n][(4/(8n+1))-(2/(8n+4))-(1/(8n+5))-(1/(8n+6))]
I believe you have the right to, once a year, get your credit rating, for free, on demand (usually written.)
Here in Colorado, Equifax sends me a notice every year that my credit was checked and offers me a free copy of the report they (allegedly) are sending out.
What remains scary is that, although my credit report was dead on, I have in the past had reports that were so wildly inaccurate I had to laugh out loud. But because the person whose information was included on my report had such great credit the credit reporting company (not Equifax, the other one), told me to just leave it on there and take the benefits.
So thank you, Mr. X in Texas! Without your lack of control and deep pockets we probably wouldn't have got our house. Merry Christmas!
Obviously, you would be crazy to not use some middleware, but things aren't as simple as any of the PR guys claim. Running queries asynchronously creates a different set of problems and complicates the entire architecture. If you look at the biggest installations, they all use middleware and most of them use Tuxedo. This includes most, if not all, MS SQL Server deployments. OLEDB can't handle that kind of load and neither can standard COM+. Just read the full disclosures for TPC. You'll see all the MS SQL Server tests wrapped Tuxedo with COM+. As much as Microsoft likes to slam EJB and Tuxedo for being too expensive, you can't scale SQL Server without using Tuxedo for really heavy deployments.
I find it interesting that the largest database is only 2TB larger than the one I recently built. It is a medical system. 66 mysql servers bear the load but I only usually have 30 of them actually active as the rest are mirrors and logging masters. Typical connections: 4500 at any given time.
Bad Panda! No Bamboo for you! In matters of importance ACs will not be responded to. Want to say something critical,OK
I wonder if any of these are large government surveillance databases?
you mean 28.547 Tibi byte (TiB?) or 30TB, right?
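For anyone trying to follow where 28.547 comes from: a minimal sketch, assuming the survey's France Telecom figure of 29,232 is in GB (as the AT&T row quoted elsewhere in this thread suggests) and that the earlier poster divided by 1024. The function names are made up for illustration.

```python
# Survey figure quoted in this thread; the GB unit is an assumption.
FRANCE_TELECOM_GB = 29_232

def gb_to_tb_binary(gb):
    """Convert using 1 TB = 1024 GB (the convention that yields 28.547)."""
    return gb / 1024

def gb_to_tb_decimal(gb):
    """Convert using 1 TB = 1000 GB."""
    return gb / 1000

print(round(gb_to_tb_binary(FRANCE_TELECOM_GB), 3))   # 28.547
print(round(gb_to_tb_decimal(FRANCE_TELECOM_GB), 3))  # 29.232
```

So the same number reads as either 28.5 or 29.2 depending on which convention you pick; neither is 30.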
OK I'll be flamed for technical illiteracy, but there are a number of archival systems which go into the Petabyte (1000 Terabyte) range but are still relational databases with row level access.
One I worked on stored the output of Cray supercomputers running modelling programs 24x7. The data was output to a bank of Teradata boxes and then archived to tape. The system had a robot tape librarian at the back end but could still operate as a relational database.
The historical data should all be in there by now, which would make around 1.5PB.
The vendor of the software that managed it all was talking about telephone companies planning similar systems to put up to 5PB in a system.
Anyone top that?
I had always heard that walmart maintained one of, if not the biggest database in the world. Kmart appears on one or two of the top ten lists here, but not walmart. Anybody know what gives?
I'd truly expect the truly largest databases to be maintained by financial institutions (banks, credit card companies, the stock market, etc) based on the sheer volume of transactions. Either them or the NSA or the FBI.
Well, the results are wrong. Where I work they were told by Microsoft they had the largest MS SQL operational (all live) database, at 18 terabytes...
My database professor gave us the rundown of the technologies that the NIH databases employ - it's some impressive business! Researchers all over the world are indexing and adding papers... SCREW Amazon!
In the future, I would want to not be isolated from my friends in the Space Station.
I'm surprised that Walmart is not listed.
Could the rest be just logs of past telephone traffic? All phone traffic ever made through the company? What portion of these databases contains actual used data (data that is likely to be used in business), rather than just stored historic data? Are companies keeping huge amounts of old data because they can? Because it gives the db administrator a stiffie to think he's got $many terabytes in his db rather than on old tapes in the basement?
We have databases in our organization (Star Schema, Red Brick) where the fact tables literally have billions of rows. I'm sure there are many other organizations (especially government entities) that have huge databases not on this "list". For those interested in operating at this scale, other interesting hardware/software data mining solutions in the same vein as Teradata are Netezza Corp's database appliances.
The Gnutella Network used to have network sizes of aprox. 1 TB, according to LimeWire servents. Gone are the days of 1TB Gnutella networks...
I left MasterCard in 1999 after working with their data warehouse. At the time they recently bought a 3 terabyte Sun E10000 with Oracle. They quickly ran out of space and added another terabyte. I'm also surprised to not see them on the list. They work closely with Oracle, who have an office down the street, since they have high volume. Just the credit card transactions table alone gets 14 million new records on average every day.
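To put the 14 million records per day in perspective, a quick back-of-the-envelope sketch. It assumes inserts are spread evenly over 24 hours, which real card traffic certainly is not, so peak rates would be higher.

```python
RECORDS_PER_DAY = 14_000_000       # figure from the comment above
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate only; assumes a perfectly flat load across the day.
avg_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY
print(round(avg_per_second))  # 162
```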
I agree that there are many companies who would not want to be in that list. There's a small competitive advantage if you keep what technology you use secret.
Developers: We can use your help.
France Telecom uses Oracle Corp. as its DBMS, Hewlett-Packard Co. as its storage and system vendor, and employs an SMP (symbol manipulation program) architecture.
<grin>
Somebody give Mr. Fonseca a clue. With so many unemployed geeks running around, why can't eWeek find somebody who knows this stuff (even cursorily) to write?
I would be surprised if some government databases, such as Social Security's, would not rank on this list if they were allowed to be analyzed.
Is it just me, or is this the first time anyone has heard of AT&T's Daytona? A quick Google search reveals a pdf and 8 links before Daytona becomes Daytona Beach. For such a high ranking, I'd think AT&T would want to make it better known that they have this system.
I used to be really interested in what DB2, MySQL etc. could do until I was turned on to Hans Reiser's vision with respect to file systems. In his view, the storage layers above the file system (complex database software) can be replaced by a more intelligent filesystem that itself acts like a database. I'm currently trying out ReiserFS (a filesystem included in the Linux 2.4 kernel) which internally uses balanced trees to achieve much higher performance in large directories. ReiserFS also wastes much less space in the storage of small files.
I'd say Oracle. Have you tried installing that baby? It makes MS Office look like Twiggy.
---- It puts the lotion on its skin or else it gets the hose again. It does this whenever it's told.
If you go to the article, you will find that AT&T had the largest listed database at 94.1 TB - that's 9 times your speculation for Google.
And I used to work with some of the AT&T databases. Heck, the payroll system alone would have probably made the list in those days. (And I was the DBA for payroll for a while).
Also, some of the winners were using IDMS - a network implementation of a DBMS - not relational.
n/t
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
From my forays into mainframe land, 93tb could be supported by:
2 sysprogs,
2 dbas,
2-3 operators,
1 applications programmer,
and thousands of data entry personnel.
Everything would run batch (including dumps into satellite dbs for regional or department uses) except for the online data entry, and the apps programmer would be setting up jobs for the operators to run at night.
And the first time any of them hose up the DB would be their last day of work on a mainframe.
putting the 'B' in LGBTQ+
This is over a month old already. Oddly enough I thought that I read it on slashpot in the first place. But maybe here.
I was never so happy as the day I was able to burn all my Teradata manuals, cause I ain't going back to one of those turkeys ever again.
The largest machines were about 250 nodes (Kmart, and look where they are today, and Walmart). I worked on machines up to about 135 nodes (Amps) (and 50 or so COPS) The performance never matched anything I've seen in DB2.
And even today, the performance tuning tools suck.
Oh, and as for your 1500 node limit, better check your manuals. Tucked away in the manual, and hardcoded into the operating system is a little limit - 1024 nodes - the origin of the name Teradata....
Oh, and it only takes 54 legal commands to crash one of those suckers (if you know the right commands, because of a hard coded limitation in the os as well).
I used to work for a company called Epsilon Data Management[1], in Burlington MA. They've been bought since I left them a while ago, but they were the keeper of the AmEx customer transaction database for data mining and direct marketing (junk mail and phone calls).
Big. 7 data silos big. Each silo holds 50k tapes, each tape was 30gb, and it usually took 4 days to load.
[1] Epsilon was originally an AmEx division, which was spun off to keep other customers happy (banks and other CC companies).
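Multiplying out the silo figures above gives a rough raw capacity. This is only a sketch: it assumes every silo is full and every tape holds the full 30 GB, and it uses decimal units.

```python
SILOS = 7
TAPES_PER_SILO = 50_000
GB_PER_TAPE = 30

# Raw capacity if every slot holds a full tape (an assumption).
total_gb = SILOS * TAPES_PER_SILO * GB_PER_TAPE
print(total_gb)              # 10500000
print(total_gb / 1_000_000)  # 10.5  (PB, taking 1 PB = 1,000,000 GB)
```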
III.IIVIVIXIIVIVIIIVVIIIIXVIIIXIIIIIIIIVIIIIVVIII
My first reaction is that, if France Telecom has the largest (non-hybrid) proprietary relational data storage, at 29 TB, ahead of AT&T and SBC, at around 26TB each, that France Telecom must have a bunch of redundant data lying around.
As of 2001-01-01, France had a population of about 59 Million. As it turns out, however, France Telecom (FTE) provides services to a dozen countries, not just France. Checking Yahoo! Finance, I see that
FTE had 2002 revenues of 49B, with 240,000 employees.
ATT had 2002 revenues of 40B, with 71,000 employees.
Finally, SBC had 2002 revenues of 43B, with 175,000 employees.
So nothing terribly unusual about the size of their database. But it's obvious that the French employees are a bunch of unproductive slackers...
The cure for cancer is coming: Reovirus
I once helped out in a study for the largest database AT&T wanted to do. To just store the data would have been 6 times the huge Walmart database's size or more. And this was just for a 3 month rolling store of the calls made on the AT&T network.
The 94.3 TB database is nowhere near what AT&T has to store. That is just one of 7 (last count I had) data centers they maintain. The total size of all the AT&T data approaches several THOUSAND terabytes. They maintain a converted bunker just to store tapes in!
Think about it, they have to keep records for YEARS about every call made on the worldwide network.
This is like ranking projects based on largest number of lines of code.
Without system descriptions (like in TPC) it merely shows that such a top-end is feasible.
What about total cost?
annual cost?
time to build?
software versions?
hardware?
staffing composition?
I mean really, a 500 gbyte database on a modest single CPU server is far more challenging than a 2 TB database on a 64-CPU E10k.
Under the category: Database Size, All, DSS
#7 Claria Corporation 12,100 Oracle SMP Oracle Sun Hitachi
Mine's 20 feet long. And you?
Rank them by load, and you'll note the winners =)
Am I the only one surprised to not find eBay on the list? I suppose on one hand it is respectable to have a large and complex database, but on the other companies with massive databases as part of their business that DON'T show up on the list impress me more.
I work for a company that makes billing software for tier 1 telcos. My job is to tweak performance of the billing system and environment as we deploy into the client's production environment.
My team has an internal 17 TB database we use to test performance against, and every one of our clients has at least a 15 TB database. I can list four of our clients who maintain at least a 40 TB database. Not one of our clients is listed on that list (nor are we).
"The market alone cannot provide sufficient constraints on corporation's penchant to cause harm." -- Joel Bakan
The largest DB I've done was about 1 billion rows, processing the weblogs of a large ISP into SQL Server. It was about 1.5 TB.
:-)
I wrote some queries that reduced the processing time from 6 hours to 45 minutes
Me = smart
Microsoft OLE DB Provider for ODBC Drivers error '80004005'
[Microsoft][ODBC SQL Server Driver]Timeout expired
/vldb/2003_TopTen_Survey/TopTenWinners.asp, line 99
Bad web monkey!
1) ASP blows
2) You didn't catch your error
3) You let your error get spit out on the web page for me to start learning about your source code.
4) You should have used the OLEDB driver.
5) You should have cached those results instead of crippling your sql server fetching the same damn info 1 million times.
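Points 2, 3 and 5 can be sketched in a few lines. This is Python rather than ASP, and everything here (TTLCache, render_winners, the run_query callable) is a hypothetical illustration, not what the site actually runs.

```python
import logging
import time

log = logging.getLogger("topten")

class TTLCache:
    """Keep one computed value per key for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]              # still fresh: skip the database
        value = compute()              # may raise; nothing is cached then
        self._store[key] = (now, value)
        return value

def render_winners(cache, run_query):
    """Return page text without leaking driver errors to the visitor."""
    try:
        rows = cache.get("winners", run_query)
    except Exception:
        # Full details go to the server log, not the web page.
        log.exception("winners query failed")
        return "Sorry, the results are temporarily unavailable."
    return "\n".join(rows)
```

With a cache lifetime of even a minute, a slashdotting turns into one query per minute instead of one per page view.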
I didn't pay attention to politics until my country started to scare me. Recently.
FUCK tibi, gibi, mebi, and kebi. FUCK THEM.
That's a lot of data, folks. For comparison, Microsoft's TerraServer, in cooperation with the USGS (the U.S. Geological Survey), maps the entire surface of the United States with photographs from the air, satellites, and so on.
That database of pictures is around 6 TB.
Some of the databases listed in the survey are even larger - approaching tens of terabytes!
I wonder what Skyserver will be (new successor to terraserver, designed to collect and stitch together a map of the entire sky in 3d from all known and future telescope pictures)
Natural != (nontoxic || beneficial)
The BaBar Database System provides data persistency for the BaBar project. The system acts as an interface between about 5 million lines of custom written C++ code and a commercial Object Oriented Database System (Objectivity/DB), all running on 2000 CPUs and 100 servers.
This database is arguably the largest in the world: as of Fri Dec 12 00:01:26 2003, over 895.0 TB has been stored in 847149 files.
Since neither PostgreSQL nor MySQL showed up in the list (not surprisingly), does anybody know what the largest databases are running either of them?
I would guess that PostgreSQL maxes out larger than MySQL. </fuel-on-the-fire>
France Telecom : 29,232 : Oracle : SMP : Oracle : HP : HP
AT&T: 26,269 : Daytona : SMP : AT&T : Sun : Sun
SBC : 24,805 : Teradata : MPP : Teradata : NCR : LSI
***Anonymous*** : 16,191 : DB2 for Unix : MPP/Cluster : IBM : IBM : IBM
16 terabytes, and anonymous.... Hmmmm.... I know! It's the motherlode of all porn sites! Either that or the NSA. Same thing, really...
Mmmmm Condeeeeeeeeee!
RS
Shoes for Industry. Shoes for the Dead.
I'd be very surprised if there aren't megalithic databases churning away in black budget projects operated by unnamed government agencies that make these commercial ones look puny by comparison.
For that matter, I'm curious as to who "Anonymous", the operator of the #3 db in terms of size, is...
---anactofgod---
---anactofgod---
"Equal opportunity swindling - *that* is the true test of a sustainable democracy."
Null Value != "unknown value"
A Null value is one that does not have a value.
For example:
I have a table of data about people, including eye colour.
You have an entry in this table, with your current eye colour.
I poke your eyes out.
What do you put in this column? To put 'no eyes' in would be inaccurate, as that isn't an eye colour. A null value would be accurate.
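Whatever you think NULL ought to mean, here is how a SQL engine actually treats it. A minimal sketch using SQLite through Python's standard sqlite3 module; the table and names are made up to match the eye-colour example, and other engines may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, eye_colour TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("alice", "brown"), ("bob", None)],   # None is stored as SQL NULL
)

# A comparison with NULL is never true, so "= NULL" matches nothing.
eq_null = conn.execute(
    "SELECT name FROM people WHERE eye_colour = NULL").fetchall()
# IS NULL is the test that finds the row with no eye colour.
is_null = conn.execute(
    "SELECT name FROM people WHERE eye_colour IS NULL").fetchall()
# The NULL row is also excluded from "not brown".
not_brown = conn.execute(
    "SELECT name FROM people WHERE eye_colour <> 'brown'").fetchall()

print(eq_null)    # []
print(is_null)    # [('bob',)]
print(not_brown)  # []
```

Note that bob shows up in neither the "brown" nor the "not brown" result, which is why the no-value versus unknown-value argument keeps coming back.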
As a former employee (in the store, not at ISD) I know that most of that 240 terabytes is going to be in a database, not just files. I know Walmart keeps a lot of stuff a secret, but they are rather proud of their IT stuff, and I'm surprised it didn't make the list.
I find it hard to believe that K-Mart has a bigger inventory database than Wal-Mart!
How big is Slashdot's database?
Stanford Linear Accelerator Center weighs in at 500TB. They run Objectivity.
Internet Archive weighs in at 300-400TB and runs Linux.
Google is probably somewhere in that range, but they don't tell. A rough guess would be 3307998701 pages * 100KB/page / 1024KB/MB / 1024MB/GB / 1024GB/TB = 308TB. They run pigeons
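The Google guess above checks out arithmetically. A quick sketch of the same calculation; the 100KB average page size is the commenter's assumption, not a published figure.

```python
PAGES = 3_307_998_701   # page count quoted in the comment
KB_PER_PAGE = 100       # the commenter's assumed average page size

total_kb = PAGES * KB_PER_PAGE
# KB -> MB -> GB -> TB, dividing by 1024 at each step as in the comment.
total_tb = total_kb / 1024 / 1024 / 1024
print(round(total_tb))  # 308
```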
You missed one important point!
FT provides a public service in France which means that they are not expected to make as much profit as a company in the free market.
For example, FT has the obligation to maintain the telecommunications for remote parts of France (mountains, islands,...).
A private company would just refuse to do it or would charge a lot more than FT.
Wally World, errr Walmart is suspiciously absent from that list.
They have a HUGE (200+ node) Teradata install.
Wal-Mart's Teradata database is more than twice that size. > 70TB
Wherever You Go, There You Are
(insert 'insert into' joke here)
So, what database does Slashdot use and how big is it ?
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
http://www.slac.stanford.edu/BFROOT/www/Public/Computing/Databases/
The relational theory of data is generally credited to having been launched with Ted Codd's famous paper, published in 1969. Interestingly, that puts it as dating from the exact same year as object oriented programming, which kicked off with Simula, also in 1969. It's funny because the oop people often criticise relational databases as being "out of date" somehow.
See this interesting site for details.
You would think that the largest DSS database record holders would be renowned for the use of such data to improve their business. So I was quite surprised to see that instead of WalMart, the list was populated by K-Mart. Yes, K-Mart!
Number one in Decision Support System Peak Workload for Windows!
Number eight in the same category for all platforms!
See, small guys can do big things! We're a small to midsize consulting firm (50 or so employees), and yet we're on the top ten list of largest databases in the world!
*pops the champagne*
I am disrespectful to dirt! Can you see that I am serious?!
FT also runs Minitel, which some might scoff at but is not trivial to run. I actually found it more ominous that the "health insurance review agency" had so many top 10 entries. Makes ya wonder who the heck THEY are and what info they've got hanging around.
Judging from the username, sql*kitten, it's a chick. In a geek forum like /. chicks get modded up faster than any other post, except maybe Linus'.
Perhaps a database of all the filthy jews "cured" by our beloved fuhrer.
Or as in the case of the US, would be required to provide service to everyone in their service region. I've been in some rather isolated areas in the US that had cheap affordable phone service. But I got to agree, we can't expect a public service to be run efficiently.
I looked and looked and couldn't find MS SQL Server. I wonder why.. Must be a flawed survey. ;)
required msg
My guess is NSA or another agency (DoE?). Maybe Echelon. Definitely somebody who doesn't want to throw away what they accumulate - the obvious possibility is intelligence; of course customer and usage data is also possible.
Travis
Particle physics experiments routinely collect far more data than this. The Babar experiment that I participated in stores enough data that its database is an order of magnitude greater in size than anything in this article (Current size: 895.0 TB).
See: BaBar Database for details, it uses an OO database (which in my experience was very painful for users)
Dan
I remember back when iPhoto was announced, first thing my friends and I thought when seeing the library was "PORN!!!!".
;-)
My db is the OS X library file.
Or largest publicly admitted-to databases.
Mine at work is 44TB, DB/2 for AIX, running on an RS/6000 system with 128 nodes. DSS only.
But I work for a really huge US company who doesn't talk to the media much.
It makes me wonder how many really huge ones are also flying "under the radar screen". Such as SCO's database of all Linux users, perhaps...
I don't understand their counting. Not that I am happy with it, but we (BaBar) certainly have a much larger database than all of these companies. And, since we also have several computing farms summing up to several thousand CPUs which process the data constantly, I doubt that they have higher load.
Press release:
http://www.slac.stanford.edu/slac/media-info/20020412/database.html
Cheers
KdenLive/PIAVE - non-linear video editing
"France Telecom uses Oracle Corp. as its DBMS, Hewlett-Packard Co. as its storage and system vendor, and employs an SMP (symbol manipulation program) architecture."
The author of this article just failed my bullshit filter. SMP in this context is "symmetric multiple processors" -- yes, SMP "Symbolic Manipulation Programme" was the name of what Stephen Wolfram wrote back in the early 80's while a grad student at Caltech, and open-sourced, and got heaps of shit for, because of a nasty copyright battle with Caltech over it. He was a student, and felt he owned the code he wrote while a student. Caltech felt differently when it started giving MacSyma a heck of a run for its money -- and Maple started raising their prices.
But this has abso-fucking-lutely nothing to do with database architecture. What "geek dictionary" did this writer look up this acronym in? Doesn't know what he's writing about. At all.
> FT also runs Minitel, which some might scoff
/.! Can I say PORN two, ... three times without being moderated down?
> at but is not trivial to run
Minitel is also one of the largest PORN DATABASES in the world. You can find there millions of PORN images in high minitel resolution (40x25 in 16 colors).
I hope that there is no spam filter on