"Slacker DBs" vs. Old-Guard DBs
snydeq writes "Non-relational upstarts — tools that tack the letters 'db' onto a 'pile of code that breaks with the traditional relational model' — have grabbed attention in large part because they willfully ignore many of the rules that codify the hard lessons learned by the old database masters. Doing away with JOINs and introducing phrases like 'eventual consistency,' these 'slacker DBs' offer greater simplicity and improved means of storing data for Web apps, yet remain toys in the eyes of old guard DB admins. 'This distinction between immediate and eventual consistency is deeply philosophical and depends on how important the data happens to be,' writes InfoWorld's Peter Wayner, who let down his old-guard leanings and tested slacker DBs — Amazon SimpleDB, Apache CouchDB, Google App Engine, and Persevere — to see how they are affecting the evolution of modern IT."
FTA: "The world won't end if some snarky, anonymous comment on Slashdot disappears." :/
What? Nothing more important than anonymous slashdot trolls to moderate
Is it just me or did this article go out of its way to insult people who use "traditional" RDBMSs?
I mean, I'm well versed in SQL and data consistency et al, but I'm still more than willing to consider new technologies. What the hell?
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs
Now that disk space is so cheap and many of the data models don't benefit as much from normalization, ...
You don't want to store the same data in multiple places. Your query might run faster, but your data integrity is going to suck.
And, uh, I have the pleasure of working now with a huge data warehouse that hasn't normalized status codes, so instead of quickly searching for an integer, the queries run slow as hell scanning char fields. It's not good.
Whale
Like the article says, "The world won't end if some snarky, anonymous comment on Slashdot disappears."
even technical folks sometimes don't like change.
Also, calling people who've worked on DBs for a long time codgers and younger DBAs "twerps" is stupid.
Haha, yeah, tongue in cheek. I get it.
Still lame.
Now get off my patch of closely-cropped ground cover, you callous jerks. ;D
Sent from your iPad.
Slacker DBs like CouchDB and SimpleDB have taken off for the simple reason that most developers have absolutely mediocre database knowledge or skills, and rather than learning, it's just as easy to wave it all off as obsolete.
It's no surprise that the creator of CouchDB, for instance, hadn't a clue about databases when he began his project. All of that built-up knowledge was just ignored while someone invented their own, and it's as rational as rolling your own encryption from scratch without the slightest clue about encryption algorithms or theories.
Either is cool with me, as long as they are cool and take care of business, you know what I am saying?
It's all good.
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
Why the need to make it 'old guard' vs 'new guard'... seems like flamebait for fanboys.
"tastes great" vs "less filling", or just explain the merits of both and leave it there.
It's like forum kiddies arguing raid 5 vs raid 1+0.
Yes, if the database is important, you want the most CAREFUL management available. Obviously.
But if these -db apps work fine, and your data isn't corporate mission critical, who cares?
Seems to me convenience and interoperability score higher for most small datasets, am I wrong?
If I could do a security audit on a website by flying through a psychedelic 3D futurescape, I might just become a workaholic.
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs
... it's when they get referred to with "relational" or "management system". DB fine. RDBMS they are not.
What is a Cartesian?....is that the water in Olympia Beer?
"tools that tack the letters 'db' onto a 'pile of code that breaks with the traditional relational model'"
If "database" were intended to mean only "relational database", we wouldn't have had any need for the latter term...
the article is right that in some cases it doesn't matter if a transaction is lost. but in any case where money is involved it's a must. you can't just start a fund from your Oracle or SQL Server savings to pay for mistakes because it will kill your brand and you may lose a lot of future business. and any savings will be eaten up by the extra cost to hire people to solve all the data problems
i've seen this. no constraints on the data that is originally put in, not enough referential integrity and you get customers opening up a lot of trouble tickets and you end up hiring people to clean up the data every time a mistake is found
The problem of distributed consistency has kept researchers occupied for quite a while. For example, see project Scalaris. They are using a distributed hash table to distribute data among many nodes. This should be relatively easy, at least once you have a good hashing function on your hands. But a lot of research has been done on P2P networks during the last decade, so there is quite a lot of stuff to read and take ideas from.
The interesting part is that it can maintain consistency and support ACID properties. From the site it appears that they accomplish that by using a modified Paxos Algorithm which basically is a way to maintain consensus among many different peers in a non-Byzantine system (this means that there are no malevolent peers in the system -- peers can break down and cease working but not sabotage the system). Leslie Lamport of Microsoft Research has done a lot of work on this, anyone interested may take a look at his papers, very advanced stuff there.
Aside from Google, I've never even heard of those "upstarts".
There is a war going on for your mind.
Seriously any Old Guard DBA will put MySQL in the toy category.
I wrote an article about non-relational databases, and there were some interesting comments about the various tradeoffs etc: http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/
Last.fm - join the social music revolution
Relational DB? People forget Network Model Databases (http://en.wikipedia.org/wiki/Network_model) and flat databases.
Network model databases will outperform relational all the time. You just don't have the same flexibility.
Newer models are not based on the design or performance issue, but the distribution of the data. These are not invalid reasons, but the old issues still apply.
I have had arguments with people who consider PC programming different from mainframe. The same rules apply. The difference is that many PC programmers are just sloppier. When you have cheap CPU and memory, people don't analyze and optimize as much.
Fight Spammers!
I've never understood the UNIX world's fascination with relational databases.
Speaking as a programmer in mainframe online transaction environments for the past 20+ years, I've become very familiar with very fast and simple database systems like the "freespace" files we use on the Unisys mainframe platform.
We don't need relations for real-time processing. Most programs just need a place to keep data, and a simple key to retrieve that data. Some efficiency in disk usage is nice, but the primary design factor is performance.
A freespace file is a collection of pre-allocated fixed-length records of various sizes (e.g. 256 bytes, 512 bytes, 1024 bytes, 2048 bytes, 4096 bytes, and 8192 bytes). Each record size is assigned a type number (e.g., 1 through 6 in the above case), and a given file is created and pre-allocated with a mix of various records depending on the usage pattern for that particular file. If you know all you need is tiny records, create a file containing a few hundred or thousand type 1 and maybe type 2 records.
Records not allocated are filled with a deallocated fill pattern.
A program uses a record by performing a Write New operation. That tells the database manager to find a record in that file closest and >= to the size required, stick the presented buffer in the record, save it, and return a key to that record to the calling program. Typical key format is where Record Number is a number from 1 ... n. If your file has 1000 Type 3 records, it'd be from 1...1000 or 0...999.
To read a record, use a key from a previous Write New (stored away somewhere, perhaps in another file) to read that record from a file. Length is not required.
Programs use a very simple read-and-lock mechanism when modifying existing records. If one program has a record locked, another program must wait. Not a problem with intelligent coding.
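For anyone who wants to picture the Write New / read cycle described above, here is a minimal in-memory sketch. It is not the Unisys implementation: the class name `FreespaceFile`, the method names, and the `(record type, file number, record number)` key tuple are all hypothetical, locking and on-disk layout are left out, and the record sizes are just the example sizes from the post.

```python
# Record sizes by type number, taken from the example sizes in the post.
RECORD_SIZES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096, 6: 8192}


class FreespaceFile:
    """Toy in-memory model of a pre-allocated freespace file (hypothetical)."""

    def __init__(self, file_number, counts):
        # counts maps record type -> number of pre-allocated records.
        # None stands in for the "deallocated" fill pattern.
        self.file_number = file_number
        self.slots = {rtype: [None] * n for rtype, n in counts.items()}

    def write_new(self, data):
        """Store data in the smallest free record that fits; return its key."""
        for rtype in sorted(self.slots, key=RECORD_SIZES.get):
            if RECORD_SIZES[rtype] < len(data):
                continue  # this record type is too small for the buffer
            for i, slot in enumerate(self.slots[rtype]):
                if slot is None:
                    self.slots[rtype][i] = bytes(data)
                    # Record numbers run 1..n in this sketch.
                    return (rtype, self.file_number, i + 1)
        raise RuntimeError("no free record large enough")

    def read(self, key):
        """Fetch a record by key; the caller never supplies a length."""
        rtype, file_number, recno = key
        if file_number != self.file_number:
            raise KeyError(key)
        data = self.slots[rtype][recno - 1]
        if data is None:
            raise KeyError(key)
        return data

    def release(self, key):
        """Mark a record as deallocated again."""
        rtype, _, recno = key
        self.slots[rtype][recno - 1] = None
```

The caller has to keep the returned key somewhere, which is the whole trade-off: retrieval is a direct slot lookup, but nothing in the file helps you find a record you have no key for.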
We've used this system in airline systems for 40+ years. It works well. Sometimes an environment has robust commit and rollback/recovery features to allow for an entire series of changes to be rolled back on error, sometimes not. It doesn't seem to matter that much, especially for transient data like weather, flight schedule data, etc.
I would LOVE to see a freespace database ported to Solaris, personally. We'd use it heavily. :-)
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.
I think the question here isn't New DB or Old DB but when do you stop considering any data store a database? There are plenty of ways to write data to disk fast as hell but God help you if you want to do anything with the data later. I see these as specialty data stores - get the data in fast and then batch it out to your "old school" relational database to perform analytics on it later. Relationally Yours, MonoX.
What does this have to do with UNIX? Relational DBs were invented in the mainframe world.
Oops. Forgot that brackets get eaten. Typical record format is RECORDTYPE/FILENUMBER/RECORDNUMBER. The first Type 1 record for File 100 might look like 01-0127-0001 or whatever (specific binary representation in hex or octal would obviously vary depending on implementation and preference).
In our case, it's a 36-bit word shown as 12 octal digits, probably not a popular choice with UNIX folks. :-)
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.
I'm a DB admin, and I use things that aren't toys; but what I've heard here is kinda harsh.
Look, it's all about "right tool for the right job." Why do you need a nuclear-powered drill that can make a tunnel from here to China, when really all you needed was a shovel?
For most daily projects that have small amounts of data, they may be using something like Crystal Reports or Excel or SPSS that just does all the number-crunching client-side anyway. You don't always need Oracle or [favorite DB flavor] for that.
"They said I probly shouldn't fly with just one eye," "I am Bender. Please insert girder."
When I saw the title I thought "I'm old-guard". Then I read the article and JOINs are a key concept to the old-guard.
My first few DB apps involved using a b-tree or ISAM library (or writing our own). Then the "new guys" started wanting to pay for a server that did JOINs. We did JOINs, just at the app layer and without the guaranteed consistency that a good relational design gives you. And getting a server that does it was expensive.
I wouldn't want to go back to pre-relational server days, but am also very thankful that I did write my own DBs from the ground up. I will probably never need to use the entire experience, but can often use bits and pieces of it, and I appreciate a good key/value store.
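For readers who never had to do it, an app-layer JOIN of the kind described above can be sketched roughly like this. It is only an illustration: the function name `hash_join` and the sample rows are made up, and nothing here gives you the consistency guarantees a relational server would.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner-join two lists of dicts on `key`, entirely in application code."""
    # Build an index over one side so the other side can probe it.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            # On a column-name clash the left-hand value wins.
            joined.append({**rrow, **lrow})
    return joined


users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
orders = [
    {"order_id": 10, "user_id": 1},
    {"order_id": 11, "user_id": 1},
    {"order_id": 12, "user_id": 3},  # no matching user: dropped by the join
]
rows = hash_join(users, orders, "user_id")
```

Note that order 12 silently vanishes from the result. With no foreign keys, nothing stops an order from pointing at a user that does not exist.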
Can't quite fit the whole query into the title box, but if you were using one of those databases that Wayner's article talked about, you'd be able to query and find out if you were first...
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
01-0127-0001 is the first type 1 record for file 127. 01-0100-0001 would be the first for file 100. That's what I get for doing patchwork re-editing an existing message before sending it...
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.
...especially when you don't know what "better" is and you're too lazy to learn: unwillingness to learn is stupidity. Like the quote says, ignorance is curable; stupidity is terminal.
People who use these things and think they're great and that they're doing amazing things don't realize that the time they're taking and the problems they're struggling with are long-solved trivialities. Nothing new. Nothing cool. It's like someone struggling with a bunch of complicated excel formulas to make their spreadsheet do something that you could do in a few lines of your favorite scripting language.
Unfortunately this is something that afflicts most of the industry these days, and we end up thinking half-assed pieces of crap are cool just because you see them in your browser.
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
I can't believe there hasn't been any mention of Berkeley DB yet. Guess what, folks: sometimes you just don't need the features of a full relational database. Sometimes all you need is fast, robust, reliable storage of indexed key/value pairs.
I can attest that Berkeley DB does exactly that, and does it really, really well. We use Berkeley DB for all of the data storage in the Citadel system, including the mailboxes themselves. Some sites have tens of gigabytes or even hundreds of gigabytes of data, and Berkeley DB just keeps chugging along, happily and reliably doing its thing. Our biggest problem? People who point at it and say "storing email in a database is unreliable" because they know it constantly explodes when Exchange does it. Well guess what, folks: Berkeley DB ain't the Exchange database (actually, maybe Exchange wouldn't be so unreliable if they switched to Berkeley DB).
Eschewing the full set of RDBMS features isn't slacking. It's choosing the right tool for the job.
Tired of FB/Google censorship? Visit UNCENSORED!
In my experience, most UNIX programmers tend to assume a relational database for almost everything if it isn't a vanilla flat file. That includes programmers in realtime applications, C people, Java people, etc. I can't begin to tell you how many applications I've seen written to use Oracle, Sybase, etc., just to store a simple static table of information. There's no POINT to that!
Most mainframe environments, on the other hand, have many established options, and relational is usually only considered if you actually need that type of functionality. Normally, systems are written with a mix of different database types. Could be flat files, could be freespace, could be RDMS, or could (in our case) be DMS, a network database with some types of set-linking properties but not really table-based.
Again in my experience, working in the airline industry, I've seen a bias towards RDMS for enterprise applications that I simply haven't seen on the mainframe (in my case Unisys transaction processing) side of life.
A web site is a transaction processing facility. Just replace fancy 3270 or Uniscope screens with HTML. Same idea, forms, etc. It wants in and out, fast. Why use something not really made for that?
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.
yeah, who wants consistent data, that's for old guard type people
gee I deposited my paycheck three weeks ago and it's not in my account yet?
bank: don't worry, it will be there eventually.
I dunno.. You know what.. I am all about using the tool that gets the job done but if I'm spending the time to develop something I'm not going to take a chance. If I'm going to build a house, am I going to build on a cracked foundation because it's convenient and cheap? No, I'm going to spend money on a sound foundation. If I am going to go into months of development I'm going to use something fundamentally sound like a relational database. Besides you don't really need to know SQL inside and out anymore anyway. That's what ORMs are for.
Laws are rules for the court, but merely a bottom bar to hit for life. Think beyond laws in your actions always.
Wayner's usually a good writer, and did some good theoretical-computer-science work back in the day, but this article was too short to answer the questions he asks at the beginning, and he mostly highlighted the new shiny things from big ASPs, which is generally what Infoworld wants.
I'm particularly disappointed that while he referred to the name and history of Berkeley DB, aka Sleepycat, aka Oracle Renamed-foo, he didn't actually talk about using it. (OTOH, Infoworld did review one version of it in 2005.) I no longer have my 4.1BSD manual on the shelf, but it was useful if you wanted something faster than using grep/sed/awk/look on tab-separated text files (which were the canonical Unix database format, and what I normally used for databases.)
These days if I want a lightweight database, I usually just build tables in Excel, and then bitch about how it doesn't have a join or even decent text-editing and filtering capabilities, and occasionally have to save it as a CSV file and install vim on Yet Another Work-owned Windows box so I can get some bloody work done. I suppose if Excel did have a join function there'd be fewer people buying MS Access...
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
How does it work for searching though? If I just have my "freespace" file and my pointers to records, does a search for some piece of user requested data have to hit every record or is there a hash somewhere for the data contained in the record? You don't mention it in your description.
It seems that the biggest advantage to a relational DB is that the syntax for accessing it is well known, SQL. It has a human-readable interface and, while sometimes wonky to work with for complex operations, it provides the simplest cross-platform way to access data. I don't need to know which data blocks hold the data, I just ask the database for them "SELECT slashdotid, name FROM users where slashdotid 20000"... and I get rows of data.
Could I just read it from a file? Yes. Would it be simpler? Maybe. But what if I have 200001 records, then I have to do some magic sorting in my program, and I have to manage memory for them, and disk space, etc. It is simpler to let the DB handle that mess and I just ask for the data I need.
It breaks up the process of programming into data storage and data manipulation/presentation. DB's for storage, my bad python for manipulation and presentation.
--Donald
www.rdex.net
I would LOVE to see a freespace database ported to Solaris
Sounds like you might want to check out the "Berkeley Database". It is very fast and has been in wide use for many years.
http://en.wikipedia.org/wiki/Berkeley_DB
Berkeley DB is also one of the underlying data store methods that MySQL can use. I think MySQL can use either Berkeley or a raw file system.
Going way back, IBM had a system like this called ISAM. All of these are very simple
In Re: to my Re:, I like sqlite for simple DB applications, I get DB functionality with a very low overhead. Otherwise I use postgresql.
I have used Oracle and some others before now, but those are my two current DB's (sql-engines?) of choice.
www.rdex.net
Non-normalized databases are fine, and might be faster, for small sites, but when things scale, the sloppy databases (or worse, sloppy frameworks like Ruby's Active Record) just cause problems.
A scalable, normalized database means consistent data, when you have multiple applications hitting it.
For a web forum, sure, a relational database may be the wrong tool, because all you care about is speed on new stuff, the archive can crawl, etc.
However, what happens when your web forum adds some actual data, and then a few years down the road, you need new tools to talk to that data? You can abstract everything through code, and post into your webserver and let Perl/PHP manage it, but then that's a new piece of legacy code to maintain.
I keep all my stuff in a PostgreSQL database, and build Schemas with Views for web apps, etc. So when a new piece of functionality is needed, it's kept segmented off. So you can Prototype a Ruby app, maintain a PHP Web App, and even build custom tools in VB or other environments that talk directly to the database for manipulation. The spreadsheet guys ALWAYS loved when I could set up an ODBC connection, and they could pull real time data into Excel, instead of needing to go through a web interface and grab CSV pulls. Hell, I had a simple Excel spreadsheet that went out to my PostgreSQL database, got the necessary data, prepped it (all in Excel), and then stuck the data into Quickbooks via the SDK (using VBA of all technologies) to prevent needing to double enter.
If you were on a real GL powered with DB2 or Oracle, you could do even fancier things.
RDBMS skills are a good thing to develop. The overhead is pretty minor for starting off, and it gives you great flexibility down the road.
Now, if you have a technology REASON to want a non-relational database, go nuts, new tech can do new things. But if it's a refusal to learn relational theory, pick up a good book and learn the mathematics behind it.
Alex
For the vast majority of web applications, the "key-value pair" class of databases works fine.
I think the real problem is that the "relational database weenies" look down on the key-value pair databases, and there are a lot of non-DB-weenies out there who like using true relational databases as nothing more than key-value pair. It degenerates into name calling, instead of getting the job done, pretty fast.
You may have seen in the news recently how in the last decade or so Wall Street ignored some of the hard-won regulations and guidelines developed in the wake of the Great Depression.
We all know what happened as a result.
The same is true when dealing with data. You don't ignore the rules completely, or follow them only when you feel like it, or when you have time. As the old joke goes, Quality is *not* Job 1.1.
If the data isn't important enough to store correctly, then it's not important enough to be stored at all.
"My country, right or wrong; if right, to be kept right; and if wrong, to be set right." --Senator Carl Schurz (1872)
Databases at a very abstract level are just data structures. Choosing a relational database when you don't need that much functionality is just as wrong as choosing a flat file when you need a database.
Knowing the ins & outs of your data structures is still a vital skill of programming.
Question everything
Maybe the fascination with relational databases is that you can easily work with the data in there.
What you describe just sounds like a file system. A specialized one, but it doesn't really support more than a filesystem does. Everything works fine if you have the key to the data. You can read the data, do your stuff, and update the data. But what if your problem is to find the key? Like you want to know which orders are overdue? Doesn't sound like the freespace file will help me there. Sounds like I have to implement the whole searching by myself.
When I am searching for a database solution, it is probably because I really need that searching, I want it to be fast, and I don't want to do it myself.
What you suggest doesn't sound like a database. It sounds more like an allocation scheme any database could use under the hood. What you suggest may suffice if my requirement is a high performance filesystem. But I don't see how it supports even the most basic database operations. I don't say your solution is bad. It just doesn't solve the same problem as a database.
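To make the "which orders are overdue?" point concrete, here is a rough sketch of the bookkeeping you end up writing yourself when the store only understands keys. Everything in it is hypothetical (the `IndexedStore` class, the `due` field, the sample keys); it just shows that a search by anything other than the primary key means maintaining a secondary index by hand.

```python
import bisect
from datetime import date


class IndexedStore:
    """Key/value store plus a hand-maintained index on the due date."""

    def __init__(self):
        self.records = {}  # primary key -> record dict
        self.by_due = []   # sorted list of (due_date, primary key)

    def put(self, key, record):
        old = self.records.get(key)
        if old is not None:
            # The index must be kept in step on every update,
            # or it silently goes stale.
            self.by_due.remove((old["due"], key))
        self.records[key] = record
        bisect.insort(self.by_due, (record["due"], key))

    def overdue(self, today):
        """Keys of all records whose due date is strictly before `today`."""
        # (today,) sorts before any (today, key) pair, so this is the
        # position of the first entry that is due today or later.
        end = bisect.bisect_left(self.by_due, (today,))
        return [key for _, key in self.by_due[:end]]


store = IndexedStore()
store.put("a", {"due": date(2009, 1, 1)})
store.put("b", {"due": date(2009, 6, 1)})
store.put("c", {"due": date(2009, 3, 1)})
late = store.overdue(date(2009, 4, 1))
```

A relational database does the equivalent of `by_due` for you when you create an index, and keeps it consistent with the data inside the same transaction.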
so you start a small project, "we just need a few hundred/thousand records, a few key value links and the occasional transaction". so you start with a slacker DB. A slacker DB far too often implies a slacker hack software d00d.
Then it grows. Instead of educating themselves (Q: what's the difference between those who can't read and those who don't? A: nothing. ) and finding a better DB solution they thrash around trying to hack in DB functions into their code.
So they lose consistency etc. Soon they have a polluted DB that breaks all the time. Often they are proud of the heroics of the wasted effort they put into it. A good programmer knows how to be the correct form of lazy: do not reinvent the wheel.
putting the 'B' in LGBTQ+
Okay, how do you find the data without a record number? I can see the value of the system but it also seems very inflexible.
I do agree that way too many programmers use MySQL for a file system, flat files, configs, and goodness knows what else.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Every database covered in the article is a toy.
From TFA: "The problem is that JOINs are really, really slow when the data is spread out over several machines."
This is the result of a poor design, not a database flaw. If you are running a web application against multiple databases, either cluster them or store all the data for a user in one database. (i.e. hash the login_id and select the database based on the result). If someone is doing JOINs across multiple machines and doesn't have a very good reason for doing so, then nothing short of a lobotomy is going to help them.
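The "hash the login_id and select the database" idea can be sketched in a few lines. The function name `shard_for` and the shard names are made up for illustration, and SHA-256 is just one reasonable choice of stable hash (Python's built-in `hash()` is randomized per process for strings, so it would not do).

```python
import hashlib


def shard_for(login_id, shards):
    """Pick the database that holds all of one user's data."""
    digest = hashlib.sha256(login_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and take it
    # modulo the number of shards.
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


SHARDS = ["userdb0", "userdb1", "userdb2"]  # hypothetical database names
target = shard_for("alice", SHARDS)
```

Since every row for a given user lands in the same database, JOINs over that user's data stay on one machine. The catch with plain modulo hashing is that changing the number of shards moves most users to a different database.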
From TFA: "Each query can only run 5 seconds. The answer can only hold 250 items. Each item can have only 250 pairs."
Yeah, I'd say that meets the definition of a toy database alright.
From TFA: "Many of the complaints about the other toy databases revolve around how a missing feature makes it impossible to find the right data. If you want to add a bit more functionality to the database here, you can whip up many of the features locally in Python. If you want a JOIN, you can synthesize one in Python and probably customize the memory cache at the same time. This is especially useful for Web applications that let users store their data in the service. If you need to add security to restrict each user to the right data, you can code that in Python too."
The writer must be joking. Who would do this when there are better options that don't involve implementing your own database?
From TFA: "there's no big reason to use Ruby, Python, Java, or PHP on the server when it can all be packaged in JavaScript"
Many people who write web applications actually want to do useful things with the data they store, like generate reports, keep logs, track inventory, or run queries. This doesn't work very well when the "database" is a text file sitting on the user's hard drive.
The answer is: simplicity and making it somebody else's problem. Think of a typical Slashdot web page. You are logged in to Slashdot so it prints out the data you chose. Specifically, it prints out the groups of data under the topics you chose, in the way (page layout) you chose. You could walk the individual data records yourself and decide what to display where, or you could tell something else to do the grunt work and simply apply some string formatting to the results. It has its good sides and its bad sides.
I have an uncle who was first in his old university who went from mainframes to the PCs because as a student he saw they are the future. His arguments were the standard ones - it's smaller, simpler, everyone will have / has one, and for a long while he made very good money selling business applications in dBase, later Clipper and the like. When he discovered those tools (as a student...) he was immediately drawn to them as they were more powerful and easier than what he used on the mainframes (i.e. exactly what you describe), and business was really good. The way you program in dBase/Clipper is really just a single step up from the "freespace" model you describe: additional features are that the library is taking care of maintaining data structures within the files (i.e. "records") and you have a rudimentary indexing capability, even with multiple fields in the data records, which makes searching enormously faster. For any kind of operation you still need to perform a loop over all records (or a subset of records/record IDs returned by the index operation) and do your calculation or other processing. For each record in the loop you can do whatever you like since it's your own code.
It's fair to say that this uncle is now old. dBase and Clipper were children of MS-DOS and as his customers migrated to GUI OS-es (i.e. Windows) so they migrated from his MS-DOS programs, though they were still perfect for the job, jewels (or abominations, depending on your point of view) of micro-optimizations, every kind of trick for calculating taxes, expenses, whatever. They simply clashed with Windows (even more so with Windows networking - his native network environment was Novell). So, the solution was apparent: start coding Windows applications or lose clients.
The thing is: he simply cannot wrap his head about these two things:
The event-driven GUI thing is easier to explain: in the old days, if he wanted the letter "A" to appear in the middle of the screen, he just poked some bytes in memory, and when he wanted input, he looped/blocked on the input function. The idea that something else is reading the user input and notifies you when it happens is... different.
SQL is harder. All his important applications - some developed over the course of 20 years, basically depended on the fact that core business processing would be a loop over some records, examining each record and with a bunch of calculations, IF statements, etc. decide what to do with the records - e.g. to what sum to add it. The idea that you *don't do it yourself* but say something like "SELECT SUM(x) FROM t WHERE cust_id=(SELECT cust_id FROM w WHERE name='xzzy')" is again something hard to swallow. There is an additional problem that he could easily kludge in arbitrary logic into records processing, creating complex special cases with ease. This is very hairy in SQL.
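To show the two mindsets side by side, here is a small sketch using Python's built-in sqlite3 module. The table and column names (`t`, `w`, `x`, `cust_id`, `name`) come from the query quoted above; the sample rows and both function names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE w (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE t (cust_id INTEGER, x REAL);
    INSERT INTO w VALUES (1, 'xzzy'), (2, 'other');
    INSERT INTO t VALUES (1, 10.0), (1, 2.5), (2, 99.0);
""")


def total_by_loop(conn, name):
    """The dBase/Clipper habit: walk every record and decide per row."""
    cust_ids = set()
    for cust_id, cust_name in conn.execute("SELECT cust_id, name FROM w"):
        if cust_name == name:
            cust_ids.add(cust_id)
    total = 0.0
    for cust_id, x in conn.execute("SELECT cust_id, x FROM t"):
        if cust_id in cust_ids:
            total += x
    return total


def total_by_sql(conn, name):
    """The SQL habit: describe the result and let the engine find it."""
    row = conn.execute(
        "SELECT SUM(x) FROM t "
        "WHERE cust_id = (SELECT cust_id FROM w WHERE name = ?)",
        (name,),
    ).fetchone()
    return row[0]
```

Both return the same number for the sample data. The loop version is where special cases are easy to bolt on with an extra IF, which is exactly the habit that is hard to carry over into a single declarative statement.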
It's not that he doesn't see how it works, or that the result is the same as before, or that it's a valid way to do it - the problem is that apparently he can't wrap his head around these concepts. So his code has things like blocking the entire Windows application because he wants total control of the user input or again looping over all records with "SELECT * FROM t W
-- Sig down
If one in 1000 postings fails, the programmer does not care - there is a 99.9% success rate, but as far as that 1 user is concerned, there is a 100% failure.
Fine, codger together some semblance of data storage using notepad, access, abacuses, whatever. If, heaven forbid, these "startups" ever took hold and gained any significant size, this "new model" will break, and I can't even imagine the hell it would be to merge the "new model" into classical rdbms.
Sorry kids, you've bitten off more than you can chew, should have stayed in school and actually attended a class in db modelling. Good luck with this "eventual consistency", you'll need it.
Hi, I Boris. Hear fix bear, yes?
There's more to it than that. If I make a wrapper for a text file that lets me find and delete rows that's not really a database. (It only becomes a database when I call it TextDB and package it with an AJAX API)
// MD_Update(&m,buf,j);
I'm not sure if it is so much age or experience. The things you know end up boxing you in and it can be hard to overcome.
I would LOVE to see a freespace database ported to Solaris, personally. We'd use it heavily. :-)
Sounds like a great open source project so why not start working on that? If you want it badly and would use it heavily and yet you cannot be bothered to do the work of porting one, writing one, or paying someone else to do it then why bother complaining about it?
Don't judge based on this article. The author's "young guys playing fast and loose" vs. "stuffy but reliable old guys" way of explaining things misses the point. Either he's a bad writer, or he doesn't know what he's talking about. A much better treatment can be found here.
Supposedly the benefit of something lightweight and flexible is that... it's that. But really, is it that hard to set up XAMPP and dump some info into it with an INSERT statement? Never mind that stuff like MySQL is free... I don't get what the fuss is when your standard MySQL DB can probably fit onto a USB stick.
FTA:
The field was surprisingly diverse despite the fact that the offerings are so stripped down that they really don't have more than three major commands: Insert, Update, and Delete.
There's a write-only database now?
My guess is that part of the reason is historical - RDBMSs were coming out around the time Unix machines were, and both could be used by small departments as opposed to mainframe production shops.
They're also an extension of the native Unix toolsets, which were flat files with tab-or-comma-separated columns of data, so anybody who learned Unix in its first couple of decades generally had the expectation that you could do ad-hoc queries and build tools to automate them, without needing to spend 6-12 months negotiating a development project with the mainframe database owners. SQL is a bit clunky as a format, but the concept of schemas, where your database structure is stored and manipulated the way the data itself is, really works well if you're a tool-builder.
Is the Berkeley DB stuff close enough to what you need for a database?
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
The term "old-school" in this context makes me laugh. Back in the days when air was clean and sex was dirty, "relational" databases were considered a resource hog and were shunned by competent programmers. The fastest and most efficient databases were the "network" databases, but they also required the most work and the trickiest coding. Right in the middle were the "hierarchical" databases. Many programmers avoided the database problem by using a "reverse ISAM" arrangement which still used up some extra resources, but was easier to maintain. Of course, nowadays, when it is almost impossible to find programmers who can even program apps from tape into 32K systems, I can see why youngsters use the "telephone book" databases (so they can avoid actually having to think about their data!). I guess that's why it's so hard to find good assembly language programmers, too.
Anyone wanting to find out what it used to be like in the bad ol' days could look up CODASYL. Watch out for bad dreams.
"The mind works quicker than you think!"
Sometimes to beat the competition you need to re-invent a DB wheel.
The company I'm working for produces software for a niche market, and the very reason we're beating our competitors to death is that their solution mandates the installation of a SQL DB by their customers, which alienates their mainly IT-clueless customers (SMEs that do not have the budget to pay DBAs).
In that market, we need to crunch data and it's a very particular niche (which I won't name). Basically, by being very clever, we came up with a solution that does not mandate the installation of a SQL DB (we have a one-click install, which our customers love) and that smartly bit-packs everything into memory. The compression that takes place is amazingly efficient, because our solution is completely tailored to the problem domain. You simply can't do that with an all-purpose DB.
We *own* our competitors on data import speeds by one order of magnitude (in our competitors' offerings, the DB is the bottleneck on data acquisition) and we own our competitors on queries. Our customers love it.
So, assertions are great but sometimes you beat the competition because you have a product that is easier and faster to use because you re-wrote the wheel.
Sure, it won't fit your Fortune 500's needs but it beats the crap out of any traditional SQL DB for the problem we're solving.
They probably use it because it is already there. It is backed up. It is what they know. Is it right? Well that depends. What if you have 200 applications all storing their config data all over the place or in 1 file? Which would you rather have?
For simple things flat file is just fine (smaller datasets). But when you start pulling data out to keep it consistent, AND you have say 20k in records. There is a HUGE difference between Log(n) and n^3. You could be looking at 40 records vs 8000000000000 records (with a linear scan). That is just on a 'smallish' join in many databases.
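The two figures quoted above can be checked in a few lines. This is only a rough sketch: the comment does not say how many tables are joined, so the three-table join and the "one indexed lookup per table" cost model are assumptions made here for illustration.

```python
import math

n = 20_000  # records per table, the figure used above

# One indexed (B-tree or binary search) lookup touches about log2(n) entries.
lookups_per_table = math.ceil(math.log2(n))

# Assumption: a three-table join with one indexed lookup per table.
indexed_cost = 3 * lookups_per_table

# Assumption: a naive nested-loop join compares every combination of rows.
nested_loop_cost = n ** 3

print(indexed_cost)      # 45
print(nested_loop_cost)  # 8000000000000
```

Under those assumptions the indexed path lands in the "about 40" range, while the unindexed path is the 8 trillion figure.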
Do not discount ACID just because it's 'in your way' and 'you do not get it'.
Also 'just' in programmer speak is usually 'oh that is probably easy but will take awhile'. And 'probably easy' is programmer speak for 'not very well tested'. How do I know? I use the lingo all the time myself :)
Unix hackers are traditionally fine with octal, as long as you don't try to fit a whole digit in it, though I've generally found hex more useful. And as far as 36-bit words go, I know one local Unix hacker who has a PDP-10 in his garage. (Not sure if it's still there, and it might have been a -20 instead.) I don't think my wife's copy of "Meet Macro-10" survived our mid-90s move, and when I took a compiler course at that school, I decided to use the still-clumsy-at-the-time Amdahl mainframe Unix system at work rather than deal with the PDP-10.
Bill Stewart
Why use something not really made for that?
It's simpler to use something already built and tested, with known strengths and weaknesses, multiple mostly-compatible implementations available, tool support, and plenty of books and trained personnel to choose from, than to use a much simpler solution that is less well understood, or worse, one that I have to design and implement myself.
Or, to put it another way, why do the Chinese and Indians do business with each other in English when Esperanto would suffice?
Well, even something that's based off of at-the-time sound principles can end up being a mess.
Take, for instance, a product called FileMaker. It's a product with a long software lineage - its origins were FoxPro, way back when. I don't know how it performed back then, or how it was designed, but now it's got a massive WYSIWYG themable 'frontend' to make a custom application, and the database is not directly accessible by the designer (just logical containers). It probably can be normalized, to some degree, but...
But it's not a good database for large amounts of data. In fact, I'd argue something like Access might even be faster/better than the modern incarnations. It might work fine for a small, initial dataset, but it doesn't scale all that well.
I guess my point is: a relational database can be poorly normalized, but a 'slacker' database can't be improved upon. The slacker db might work OK for what you initially intend it for, but data will often grow faster than estimated, and beyond the original design.
That's why relational/SQL is preferred by most technical people: not only can it be poorly designed and work well for small stuff (then normalized w/o changing all that much, and used for larger projects), but then it can be relatively easily migrated to a larger/more robust SQL database if need be.
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
Not sure what platform you were using or what years (lots of things had b-trees, though ISAM tended to be on IBM machines), but Unix V7 had a join command, which worked on the canonical tab-delimited ascii flat files that most Unix tools did, and PDP-11s weren't that expensive.
I last used it in the early 90s; I'd prototyped an application in Informix, but my department was too cheap to buy enough licensed copies for production use. You had to sort your data for the join to work, but that also meant you could use "look" to do binary lookups instead of grep. Since I only had to support a small number of scenarios that used join, it was easy to write a shell script to call them.
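The reason the inputs had to be sorted is that a join over sorted files is a single forward pass through both of them. A minimal sketch of that idea follows; it is not the Unix join implementation, the sample rows are made up, and it assumes each key appears at most once per file.

```python
def merge_join(left, right):
    """Join two lists of tab-delimited lines, both sorted on their first field.

    Simplifying assumption: each key appears at most once per input.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, _, lrest = left[i].partition("\t")
        rkey, _, rrest = right[j].partition("\t")
        if lkey == rkey:
            out.append("\t".join([lkey, lrest, rrest]))
            i += 1
            j += 1
        elif lkey < rkey:
            i += 1  # left row has no partner; move on
        else:
            j += 1  # right row has no partner; move on
    return out

# Hypothetical sample data, already sorted on the first field.
flights = ["NW100\tMSP", "NW200\tATL", "NW300\tDTW"]
gates = ["NW100\tG7", "NW300\tF2"]

joined = merge_join(flights, gates)
print(joined)
```

Neither input is ever rewound, which is why this worked on modest hardware with files much larger than memory.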
Bill Stewart
Too funny. I'm old enough to remember when non-relational databases were old-guard proven technology and the newfangled relational stuff was buggy, bloated, complex and unproven. I guess everything old is new again.
Many of these comments seem to focus on using these non-relational databases because the developer is too lazy to use, or doesn't understand, how a proper relational database functions. It is probably true that that happens but that discussion totally overlooks what these non-relational systems are actually for and why they are popping up all over the place.
If all you want is a key-value store then why not use an existing relational database? They are amazingly good at what they do and storing key-value pairs could be considered a small subset of what they do. But even that they do very well. They have very fast data storage formats, they are very good at not losing your data, they have all the networking figured out, authentication, etc, etc. It would seem silly to create a brand new database that does only a strict subset of what existing dbs can do. There is no point unless they can do things that an RDBMS cannot do, or unless they could do that small subset of things better than a traditional RDBMS.
The main reason that these dbs are popping up all over the place is that people want to scale, and scale quickly. Google doesn't use Bigtable because their devs are lazy or un-knowledgeable. Google uses Bigtable because they need to scale. Transactions, constraints, joins, ACID. Doing all of those things in the db makes it harder to scale the db. Implement those features in the app and now your db can scale more easily and the app servers can still scale, thus your app as a whole can scale. That is the idea that is being explored in many different directions by all of these different non-relational dbs.
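To make "implement those features in the app" concrete, here is a small sketch of a join done in application code over a key-value store. The two dicts are hypothetical stand-ins for remote key-value tables, and the function name is made up for this example.

```python
# Hypothetical stand-ins for two key-value "tables"; a real store would be
# reached over the network, but plain dicts are enough to show the idea.
users = {"u1": {"name": "alice"}, "u2": {"name": "bob"}}
comments = {
    "c1": {"user_id": "u1", "text": "first post"},
    "c2": {"user_id": "u2", "text": "RTFA"},
    "c3": {"user_id": "u9", "text": "orphaned"},  # refers to a missing user
}

def comments_with_authors(comments, users):
    """Application-level inner join: one key lookup per comment."""
    rows = []
    for cid in sorted(comments):
        comment = comments[cid]
        user = users.get(comment["user_id"])
        if user is None:
            # Nothing in the store enforces the reference, so the app
            # has to decide what to do with dangling ids.
            continue
        rows.append((cid, user["name"], comment["text"]))
    return rows

rows = comments_with_authors(comments, users)
print(rows)
```

The store only ever answers "get by key", which is easy to shard; the cost is that the referential check the database would have done now lives in every application that touches the data.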
Maybe some of these databases are just jumping on the bandwagon without even knowing what the point is. Maybe some of their users are just too lazy to learn SQL. But the real reason for these new dbs' existence is that scaling a relational database is very hard and people are trying to find easier ways to do it.
I'm still in wait and see mode but that doesn't mean that this new breed of databases doesn't have a place.
From TFA: In the past, the answer was simple: Hook up an official database, pour the data into it, and let the machine sort everything out for you while you spend your time writing big checks to the database manufacturer.
What? Where has this wondrous product been all my life!? I mean, I've always stuck with the free shit like MySQL and Postgres on the assumption that paying top dollar would only get me a bit of extra polish and maybe some support from people who own socks.
Little did I realize that, had I re-mortgaged the house and bought one of these wondrous products, I could "just pour my data in" and I'd miraculously reap the benefits of advanced relational technology without all those tedious decisions about data structures, normalisation, and writing queries. Not only that but (looking at the rest of the article) it sounds like merely using an industrial-strength RDBMS will guarantee data integrity? OMG! That would be cheap at any price! It would certainly be cheaper than succumbing to that nagging feeling that maybe I should park my ego and pay an RDB specialist to do it properly.
Now I feel really stupid. There was me assuming that even a high-end relational DBMS would only be efficient and secure if the database was designed and coded carefully by someone with a clue, and that if you're just going to bosh something together to get the job done you might as well stick it in a flat file (or the ancient pre-InnoDB version of MySQL which comes with your web hosting package) and only worry about scaling it to cope with a billion records when someone paid you to do that.
The scales have fallen from my eyes. I'm writing that check to Oracle right now.
In a survey of 100 programmers, 111111 thought that duck-typing was a good idea.
Yeah, when I first came across this article I thought it was the dumbest thing I'd ever heard, but after reading it, it made a lot of sense. It's basically just using a simple schema like the "slacker" DBs for canonical storage, and then using additional tables as 'indexes.'
How FriendFeed uses MySQL to store schema-less data
Given their needs in terms of adding features, altering the schema, and building indexes, being able to make the indexes "eventually consistent" was huge. You have to remember that to keep things nice and normalized, you need lots of tables and joins, and that MySQL (or any other FOSS RDBMS) CANNOT build indexes across tables.
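A minimal sketch of that pattern, using SQLite in place of MySQL so it runs anywhere: one canonical table of opaque blobs, plus a separate table acting as an index that is rebuilt on its own schedule. The table, column, and function names are made up for this example and are not taken from the article.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Canonical storage: an id plus an opaque JSON blob, no per-field columns.
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
# A separate table used purely as an index on one field of the blob.
conn.execute(
    "CREATE TABLE index_user_id (user_id TEXT, entity_id TEXT, "
    "PRIMARY KEY (user_id, entity_id))"
)

def put(entity):
    conn.execute(
        "INSERT OR REPLACE INTO entities VALUES (?, ?)",
        (entity["id"], json.dumps(entity)),
    )

def reindex():
    """Rebuild the index from the canonical rows; this could run later."""
    conn.execute("DELETE FROM index_user_id")
    for eid, body in conn.execute("SELECT id, body FROM entities").fetchall():
        conn.execute(
            "INSERT INTO index_user_id VALUES (?, ?)",
            (json.loads(body)["user_id"], eid),
        )

def find_by_user(user_id):
    rows = conn.execute(
        "SELECT e.body FROM index_user_id i "
        "JOIN entities e ON e.id = i.entity_id "
        "WHERE i.user_id = ? ORDER BY e.id",
        (user_id,),
    ).fetchall()
    return [json.loads(body) for (body,) in rows]

put({"id": "e1", "user_id": "u1", "title": "hello"})
put({"id": "e2", "user_id": "u2", "title": "other"})
stale = find_by_user("u1")  # index not built yet, so nothing is found
reindex()
fresh = find_by_user("u1")  # index has caught up
```

Adding a new indexed field means creating another index table and running a backfill, with no ALTER TABLE on the canonical data. The window between `put` and `reindex` is the "eventually consistent" part.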
It turns out that there actually _are_ neurological reasons that music from your teenage years is extra-evocative, just as language-learning works better with young kids. Go read "This is Your Brain on Music" for more details.
A certain amount of music sensitivity appears to be hardwired into our brains, and the extra hormones after puberty increase music-remembering ability and the emotional aspects of it that younger kids don't have as much of. There's also a lot of intellectual development going on in those years, and it's easier to pick up more complex ideas from the music than you could when you were younger.
As you get older, that still happens a bit, and you'll still run into music that's new and cool which you'll enjoy years later, but now it's competing with lots of other cool music that's in your head which your teenage-years music wasn't.
What's much more annoying is when you find yourself tuning by a different radio station and wondering "What is all this noise those kids are listening to? They should turn that crap down and listen to good stuff" just like your parents said when you were a kid. Some of that's because 90% of everything is crap, and it's not the crap that you find evocative because it was around when you were a kid, and some of it's because 90% of everything on the radio is highly-packaged commercial crap, making it 99% crap instead of only 90%. And some of that's because kids always want to listen to new stuff and piss off their parents, and musicians always like to do new stuff, and if you want to bust into the Top 40 you've either got to do identical commercial crap better than anybody who's already there or else do something new. Rap was creative and interesting, but the whole gangstas-dissing-women motif that dominated it was offensive. Hip-hop took that music and started doing lots of interesting things with it, though I haven't followed it. I'm finding myself playing a lot of old-timey (average hair color in our jam session == gray, leaning toward white :-), and starting to listen to jazz more (lots of deep classical stuff in there, which I haven't had the patience to listen to for a while.)
Bill Stewart
does any dbms implement the relational model properly?
I presume he said that because SQLite doesn't actually keep track of a column's data type.
Oh, the dynamic typing issue. Then I guess someone should write a tool that compiles column types into triggers that enforce them in much the same way that the genfkey tool compiles foreign key constraints into triggers that enforce them.
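As a rough idea of what such a generated trigger might look like, here is a hand-written one for a single column, run through Python's sqlite3 module. The table, column, and trigger names are hypothetical, and a real tool would also need a matching BEFORE UPDATE trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No declared type on qty, so SQLite stores whatever it is given.
conn.execute("CREATE TABLE stock (item TEXT, qty)")
conn.execute("""
    CREATE TRIGGER stock_qty_is_integer
    BEFORE INSERT ON stock
    WHEN typeof(NEW.qty) <> 'integer'
    BEGIN
        SELECT RAISE(ABORT, 'qty must be an integer');
    END
""")

# An integer value passes the trigger.
conn.execute("INSERT INTO stock VALUES (?, ?)", ("widget", 5))

# A text value is rejected by the trigger.
try:
    conn.execute("INSERT INTO stock VALUES (?, ?)", ("gadget", "five"))
    rejected = False
except sqlite3.DatabaseError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM stock").fetchone()[0]
```

The check relies on SQLite's built-in `typeof()` function and the `RAISE(ABORT, ...)` form that is only valid inside triggers.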
This distinction between immediate and eventual consistency is deeply philosophical and depends on how important the data happens to be.
Ah, the naivete of youth... These guys clearly have never spent a few weeks debugging a concurrency problem. If your data is important enough to keep around, it's important enough to get it right.
There's nothing deeply philosophical about corrupting the relationships between various data sets because your database doesn't enforce consistency. A certain desktop recently discovered just how bad poorly enforced consistency can make things. Those *young whippersnappers* won't stay young for very long trying to debug that seemingly impossible to find data corruption problem, or worse, a web site which displays garbage pages at random because your data storage mechanism isn't consistent when it needs to be.
Consistency in databases has always been a ground rule because consistency checks are more easily done by a database than an application programmer. Consider, for example, the prototypical record read-update-write operation on a database with strict consistency and enforced locks:
Now consider the same operation with a database which enforces no consistency, or does so rather lazily:
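A minimal sketch of the difference between the two cases, written as a deterministic simulation rather than real concurrent code. The `Record` class, the two clients, and the amounts are all made up for illustration.

```python
class Record:
    def __init__(self, value):
        self.value = value
        self.locked_by = None

def unlocked_increment_pair(rec):
    """Two clients interleave read-update-write with nothing enforcing order."""
    a = rec.value       # client A reads 100
    b = rec.value       # client B reads 100 before A has written
    rec.value = a + 10  # A writes 110
    rec.value = b + 5   # B writes 105, silently discarding A's update
    return rec.value

def locked_increment_pair(rec):
    """Same work, but each client holds the record lock for its whole cycle."""
    for client, delta in (("A", 10), ("B", 5)):
        assert rec.locked_by is None  # B cannot start until A has released
        rec.locked_by = client
        current = rec.value
        rec.value = current + delta
        rec.locked_by = None
    return rec.value

lost = unlocked_increment_pair(Record(100))  # 105: one update vanished
safe = locked_increment_pair(Record(100))    # 115: both updates applied
```

Nothing fails or raises in the unlocked case, which is what makes this class of bug so hard to track down afterwards.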
And let's not forget how confusing this is for users:
If you don't want a database, you can restrict your web app to a single thread and use flat files. For a lot of amateurs and personal web pages, this is perfectly fine. But don't call it a new kind of database: IBM was using flat files in the 60's. The reason why flat files were abandoned was because they didn't scale well and couldn't handle concurrency correctly. It is not a matter of *size* but of correctness.
The society for a thought-free internet welcomes you.
conveniently, all *nix systems come with a write only database.
Just pipe your data to /dev/null. I think you'll be impressed by the write speed!
[badum-ching]
A Human Right
I just wanted to give kudos to the submitter for linking to the print version of the article.
When I read 'InfoWorld' in the summary I was at first hesitant to click the link. And really, the original article spreads over 8 pages, contains a giant ad in the middle of what little text is shown on each page and even tries to open a popup.
I can't actually comment on the quality of the submission itself, as I haven't RTFA, but the quality of the link should serve as an example to everyone.
(USER WAS PUT ON PROBATION FOR THIS POST)
Choosing a relational database when you don't need that much functionality is just as wrong as choosing a flat file when you need a database.
Unless you need more functionality than a flat file but less functionality than a relational database, and there aren't any key-value databases installed on the system you plan to deploy on. Or you don't need a relational database yet, but you might in the near future as you add features to meet customer demand. Then you might reach for the SQLite.
It's the same principle. Why are we still using POSIX and SUS when there is Plan 9? Because the former is established and works just well enough. People know how to use it and they don't have to learn something new... and there is a *huge* set of stuff built on the platform that is not available on the new one. So basically the new one is a superior design but a failure in support.
It's the same with databases. RDBMSs are not the best we can get in terms of design, but they are established and we have a bunch of tools and technicians who know how to deal with them. Newer concepts mostly lack that support.
An example is ZODB. It's neat. It incorporates ACID + Transparency + Undo + Pluggable Storages and you mostly get rid of the Bobby Tables problem. Still, you don't have the technicians who understand how to deal with it and you don't have the myriad of tools accompanying it.
Check out Berkeley DB. It's pretty much exactly what you're talking about, and it's on all the major OSes.
From TFA:
This extra layer of customizability is often quite useful. Many of the complaints about the other toy databases revolve around how a missing feature makes it impossible to find the right data. If you want to add a bit more functionality to the database here, you can whip up many of the features locally in Python. If you want a JOIN, you can synthesize one in Python and probably customize the memory cache at the same time. This is especially useful for Web applications that let users store their data in the service. If you need to add security to restrict each user to the right data, you can code that in Python too.
So what they're saying is that if I need some of the "advanced" functionality offered by RDBMS, then it's not a problem because I can always roll out my own.
But why should I, if the result will just be a poorly implemented, underperforming RDBMS?
I hereby propose a new rule along the lines of Greenspun's Tenth Rule:
"Any sufficiently complicated program manipulating large amounts of data that does not use an RDBMS contains an ad hoc, informally specified, bug-ridden, and slow implementation of an RDBMS."
I can remember the day when all we used was flat files and writing multi-file merges in cobol.
We didn't have fancy things like b-trees or indexes, we just sorted the file!
I work in the Wang VS world, a type of system originally patterned after the IBM 360/370 but with an OS designed from the ground up to be interactive. We have multiple file types at the OS file system level... consecutive, indexed, object, print, relative, etc. Indexed files not only store data retrievable by a key, but by up to 17 keys. Unlike some juvenile "database" products that stored data in a .DAT file and indices in separate files, our indexed files contain a mini-db structure inside, with chains of data blocks, index blocks and free blocks, all managed by the file system. It's impossible for the various parts to get out of sync because they are all integrated within the indexed file.
We also have file compression at the OS file system level. Most file types except object can be tagged to be compressed and some are compressed by default. The OS file system uses machine instructions to compress before writing and expand after reading. It's completely transparent to the app code.
We also have PACE, a native 4GL / RDBMS that was developed by Wang in the mid-1980s and had referential integrity rules in the data dictionary and distributed database with two-phase commit, all from the beginning.
I used Oracle 5.1 from 1989 through 1992 and was shocked to learn that Oracle had no referential integrity at the time. What Oracle did was fake it by generating SQL*Forms triggers in their CASE tool. Heaven help anyone trying to build apps without the CASE tool or anyone touching any of the generated triggers.
I also recall reading of the struggles of the mainstream db vendors with distributed database technology and the eventual development and adoption of two-phase commit, many years after Wang had it as a standard feature in their clustered environments.
In 2004 I co-founded a company to virtualize the aging Wang VS. We have been very successful and are now the official source for all Wang VS systems and software. Our virtual Wang VS ranges up to 220% of the performance of the legacy high-end VS18950 released in 1999 and runs in Linux mostly on Dell PowerEdges. The high end supports 500-1000 users, not quite in the IBM mainframe arena but far, far easier to program, operate and use.
The original Wang VS80, released in 1977, supported up to 32 users and scores of devices in no more than 512KB of memory. Right... KiloBytes. Half a MegaByte. Later models grew to be much more capacious but try to imagine supporting 32 connected users running real apps and manipulating real data in half a MB of memory.
All of this reminds me of the horrible disconnect that occurred with the introduction of microcomputers. The folks who worked in the microcomputer field either didn't know about or ignored all the existing OS technologies and reinvented everything. PC users had to wait 10-15 years before MS discovered "pre-emptive multitasking," which was the rule in large systems, even in minicomputers, from the 1960s forward.
Microcomputers, while very enabling of individuals, actually took us backward in OS technology and caused us to have to live through a 10-15 year hiatus while the microcomputer engineers and OS developers rediscovered things that had been standard stuff in the mini and mainframe worlds.
Look at the bright side: there's always seppuku.
I get tired of hearing the same old discussion about whether or not the relational database is going to die. It's not. But the new breed of *specialized* databases work well for their *specialized* purposes. Big surprise. But all of them inevitably make a trade-off. Anyone who works seriously with database design knows that it's all about trade-offs.
One of the main motivations for the new breed of databases is that the standard SQL database relies on things such as foreign keys and other constraints for data consistency, but that requires the data to be directly managed by that running DBMS process. When you require data to be distributed over a network (i.e. over many separate processes), then the only way a *foreign key* can work is if the DBMS process has some sort of link over the network to the separate DBMS process and then uses that somewhat as if it were local. (Other strategies involve using external application code for consistency rather than foreign keys, etc.) Of course, the DBMS process can't use its usual local low-level optimizations behind the scenes in order to handle that query efficiently over the network, so it doesn't scale. Specialized DBMSs for distributed data focus on optimizing being distributed, while the typical SQL DBMS optimizes storage and retrieval of data as if it were local. The bottom line is that the traditional SQL database scales well vertically, but not horizontally, concerning hardware. Or rather, when you scale horizontally, you forgo a lot of its advantages. The new breed of databases trade off consistency and other assurances for the sake of "good enough" consistency and really fast retrieval of domain-specific data.
But not everyone is trying to be Google or Amazon. Financial institutions such as banks can't tolerate "good enough" consistency. The biggest problem with relational databases I see nowadays is that people are ignorant about why "relational" is such a good idea, and how SQL only gets you part of the way to "relational" and that SQL's shortcomings are a different issue. The second biggest problem is that most people are used to only one or two data usage patterns, and if it "works for them", then they assume it should *always* be done that way. For example, the hordes of people who barely know Excel (i.e. not a relational database) or Access, and then like to give "expert" advice. Or a web programmer who believes that ORMs are the One True Way because they abstract away choices of DBMS in order to keep favorite language X, even though other people's needs are the opposite: perhaps we want to abstract away the choice of programming language so that we can keep the same database, and so maybe it's a good idea if the database itself can ensure data consistency rather than relying on the ORM, etc.
That doesn't seem like such a good general purpose solution. For a trivial application, it might work, especially if you place an enormous amount of logic into the application code, but I can foresee problems even then.
How do you deal with disk space wasted by fragmentation? If the "record ID" is essentially an offset, you can't defragment, especially if you want to do it live. That's not even mentioning internal fragmentation - most disk caches store large blocks (64KB or larger), so you're wasting, on average, 50% of your caching capacity because of the mismatch between block and record sizes.
What happens when you've pre-allocated, say, 1000 small blocks and 1000 large blocks, and it turns out you actually need 1001 large blocks? You may have 30% free space left in the small block section, but you can't use it! Creating a new file sounds expensive (has to be filled with a pattern!), whereas creating new files of arbitrary size is essentially constant time in most modern databases (they don't even ask the OS to fill them with a 0 pattern).
This also sounds like it can't handle out-of-order writes. This may be less of a problem now with battery-backed RAM caches on disk controllers, but it would have sucked a decade ago. Without an intent log, you have to perform every write in-order or risk corruption.
Actually, what happens if the program accidentally loses a block key? Would it... leak storage space? How would you reverse that if all the blocks are identical looking binary blobs?
Not to mention that you get the joy of re-inventing the wheel any time you want to do anything other than "retrieve by key". If you want to locate, say, a passenger by name across ALL flights in a day, you'd probably have to scan all records or write your own index or something.
But if you were really keen on using such a trivial system, implementing it wouldn't be that hard in any modern programming language. A few thousand lines of Java or C# ought to do it.
Feh. Mods have no humour nowadays. :^(
Journalism ain't for you.
is not so much working with these databases as a programmer. Given time, a programmer could always work out the data scheme. The trouble ensues when an Analyst tries to get at the data with a report writer and stumbles trying to get the data. A lot of commercial software which uses these embedded databases will include its own reporting tools to mitigate the issue though.
Let A={1, 2, 3} and B={4, 5}. Then the Cartesian product of A and B, denoted AxB, is {(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)}, that is, the set of all ordered pairs whose first coordinates are in the first set and whose second coordinates are in the second set.
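The same product can be produced with `itertools.product` from the Python standard library, which is a quick way to check the definition against the example. An unconstrained join of two tables yields exactly this set of pairs.

```python
from itertools import product

A = [1, 2, 3]
B = [4, 5]

# Every ordered pair (a, b) with a taken from A and b taken from B.
AxB = list(product(A, B))
print(AxB)  # [(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)]
```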
True "eventually consistent" systems are quite difficult in general. Game designers struggle with this. A typical example is a distributed game in which A shoots at B. A's client knows where B was at the last update, but due to lag, is behind on knowing where the (authoritative) server says B is now. A's client has to decide whether A's shot at B hit B.
A typical trick is that A's client projects B's current position assuming B's user doesn't input a direction change, and computes a hit or miss on that basis in the client. The actions of A are also forwarded to the server, which makes the official decision on whether A's shot hit B, and that information is sent back to the clients of A and B, after transmission delay.
The trick is making the visuals work for this. One way to hide the problem is that when A's client computes that A's shot hit B, B is displayed as hit and staggering, but not falling. This buys time until the server update comes in to A's client. If the server says it was a hit, B is displayed in A's client falling down. If the server says it was a miss, B is displayed in A's client as staggering and recovering. This is an illusion created for user A to hide the lag.
Meanwhile, in B's client, B doesn't stagger at all if there's a miss, because, by the time B's client hears about the shot from the server, the hit/miss decision is known. So user A and user B see different things during the lag period, but come back into sync after the update.
Randy Farmer and Chip Morningstar invented this back in the 1980s for Lucasfilm's "Habitat", and called it "surreal time".
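The client-side projection and the later server correction can be sketched as follows. This is an illustration only, not how Habitat or any particular game implements it; the 2D positions, the hit radius, and all the numbers are assumptions.

```python
def project(pos, vel, lag):
    """Dead reckoning: assume B keeps its last known velocity for `lag` seconds."""
    return (pos[0] + vel[0] * lag, pos[1] + vel[1] * lag)

def is_hit(aim, target, radius=1.0):
    """A shot counts as a hit if it lands within `radius` of the target."""
    dx, dy = aim[0] - target[0], aim[1] - target[1]
    return dx * dx + dy * dy <= radius * radius

# Hypothetical numbers: B last seen at (0, 0) moving right at 5 units/s,
# and A's client is 0.2 s behind the server.
predicted_b = project((0.0, 0.0), (5.0, 0.0), 0.2)
aim = (1.0, 0.0)

# Provisional result computed on A's client: show B staggering, not falling.
client_says_hit = is_hit(aim, predicted_b)

# Authoritative result: on the server, B had actually turned away.
actual_b = (1.0, 3.0)
server_says_hit = is_hit(aim, actual_b)

# When the server update arrives, A's client resolves the animation.
display = "fall" if server_says_hit else "stagger and recover"
```

The client never commits to anything it cannot take back, which is the general shape of a well-behaved eventually consistent interface.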
Web-based "eventually consistent" systems are usually much dumber than this. Most are more like "becomes consistent after the user manually reloads the page a few times". Distributed cache consistency can be done efficiently (every shared memory multiprocessor CPU does it), but modern cache interlocking technology never seems to have made it to web caches. There really should be little cache-invalidation messages pushed around between the servers in a big web farm, but there usually aren't.
I've been kicking around the idea of dynamic relational for a couple of years now. We have dynamic application languages, so why not dynamic databases? The "static" and the dynamic kind serve different needs and can coexist. Why should DB's be any different?
Table-ized A.I.
If you aren't building temp tables you aren't even straining the query engine.
Slacker.
Make your DBA cry. Submit endless long running queries then complain about the server being slow.
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
Google's BigFile storage system is quite similar in design to this.
Check out the relevant papers in Google Labs.
As a DB admin myself, I find these "Us vs Them" arguments to be ultimately pointless. A company will choose a database based on the application's needs. If "immediate consistency" is needed they will choose a standard relational database. If "eventual consistency" is acceptable, the company may opt for one of the other "not-so-relational" databases. The fact that there are other options is actually a good thing. The "old guard" needs to find the positives and embrace change, or run the risk of being left behind in an evolving world of technology.
A conversation a few years ago between myself, immediately upon my arrival at the office, and my already present friend and co-worker:
Me: We're old.
Friend: What?
Me: We're old.
Friend: What are you talking about?
Me: I was jockeying radio stations on the way here, trying to find something I liked.
Friend: So?
Me: I finally found something on the classic rock station.
Friend: So what? I like some classic rock, too - doesn't make me old.
Me: It was "Shock the Monkey".
Friend: (pause) Crap, we're old.
- T
We generally use a simple flat file as an index. Field 1 is a sorted index field (say a flight number), field 2 is the key to a freespace record. A simple binary search is fast even on a large sorted list.
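A minimal Python sketch of that kind of index, for illustration only. The line layout ("key, whitespace, record number"), the sample flight numbers, and the function names are assumptions, not the actual file format we use.

```python
import bisect


def load_index(lines):
    """Parse 'key record_number' lines; the file is assumed already sorted by key."""
    keys, recnos = [], []
    for line in lines:
        key, recno = line.split()
        keys.append(key)
        recnos.append(int(recno))
    return keys, recnos


def lookup(keys, recnos, key):
    """Binary-search the sorted keys; return the record number or None."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return recnos[i]
    return None


# Hypothetical index contents: flight number, then freespace record number.
index_lines = ["DL0042 7", "NW0100 3", "NW1492 12", "UA0900 5"]
keys, recnos = load_index(index_lines)
print(lookup(keys, recnos, "NW1492"))
print(lookup(keys, recnos, "AA0001"))
```

The lookup costs O(log n) comparisons, so even a million-entry index needs only about 20 probes.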
No, it isn't useful for certain types of applications. Relational databases exist for a reason. :-) But if you have to store something for a well-known static set of fields (say weather stations or flights in an OAG schedule file), something a lot simpler isn't a bad method at all.
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.
Design issue. Optimally you use this sort of thing in places where you will always have an easily-available record number. It isn't a replacement for a relational database in places where relations are nice. :-)
In the airline context in which I work, you tend to see sorted indexes containing keys which are accessed by things like IATA station code (e.g., MSP or ATL), or airline code, or flight-date-origin (e.g., NW1492-24-MSP), which provide unique reference items that are easy to sort.
We also sometimes maintain a relational database on another lower-usage box just for searching and reporting. That frees the freespace file on the primary box to do its thing quickly, and transactions against that fast database are also split off and inserted as rows into the relational database.
That allows for the speed of a freespace file in production while also giving reporting/query capabilities for past history ... and in a way which doesn't impact production.
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.
I think you'd be surprised at the complexity of the applications which use such file structures. :-) When the data itself is simple to store, the complexity of the application isn't really relevant.
Disk space on the mainframe isn't my problem as an applications programmer, and it doesn't seem to be an issue for a freespace file. Contiguous space for each record is allocated up front when the file is created, and you are reading/writing data as fixed record sizes. A freespace record is ALWAYS a fixed multiple of a disk allocation size to prevent allocation issues. In our case, I don't actually know what the hardware does (and as an applications programmer that isn't a problem I care about), but logically we're taught that disk is allocated in 28-word sectors. On an OS2200 mainframe, freespace records are multiples of 112 words, and this is somehow tied to the way that operating system allocates data on disk.
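To show why fixed record sizes make this fast, here is a small Python sketch of the addressing arithmetic. It is only an analogy: `RECORD_SIZE` is in bytes here as a stand-in for the 112-word unit, an in-memory `BytesIO` stands in for the disk file, and the function names are invented.

```python
import io

RECORD_SIZE = 112  # hypothetical: bytes here, standing in for the 112-word unit


def create_file(f, num_records):
    """Allocate every record up front, zero-filled."""
    f.seek(0)
    f.write(b"\x00" * (RECORD_SIZE * num_records))


def write_record(f, recno, payload):
    """Overwrite record `recno` in place; payload is padded to the fixed size."""
    if len(payload) > RECORD_SIZE:
        raise ValueError("payload larger than record")
    f.seek(recno * RECORD_SIZE)  # offset is pure arithmetic, no searching
    f.write(payload.ljust(RECORD_SIZE, b"\x00"))


def read_record(f, recno):
    """Read record `recno` and strip the zero padding."""
    f.seek(recno * RECORD_SIZE)
    return f.read(RECORD_SIZE).rstrip(b"\x00")


f = io.BytesIO()
create_file(f, 4)
write_record(f, 2, b"NW1492 MSP")
print(read_record(f, 2))
```

Because every record starts at `recno * RECORD_SIZE`, a read or write is one seek plus one transfer, and writing a record never moves its neighbors.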
What happens when you've pre-allocated, say, 1000 small blocks and 1000 large blocks, and it turns out you actually need 1001 large blocks?
Not an issue with a well-designed database. And historically it hasn't been. Keep in mind that I've been working with this file format almost continually since 1988, so I have plenty of experience seeing it in use in production transaction systems. For well-defined datasets, that isn't an issue.
Out of order writes are not an application issue. All the application cares about is that it can read and write data to a logical record. The underlying record handler has to worry about the specifics of getting that data to and from the disk.
I've never seen a program lose a key, but in our case the base freespace record management system has routines which run periodically against the database to find orphans, etc.
Something like scanning for a passenger name across all flights would be an interesting problem, but I wouldn't bother to do that in freespace. I'd have a relational database off to the side and populated in parallel for that sort of query. Freespace is about *speed*, remember.
I don't think it would be hard to implement at all. It just takes time, something I can't spend (at work) writing code that isn't directly related to our paying customer needs. But I'm considering it for a home project. Probably in C. We'll see. :-)
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.
I think I need to check this one out as well as the Berkeley Database mentioned above.
Thank you. :-)
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.