"Slacker DBs" vs. Old-Guard DBs

slashdot insult? :( by FlashBuster3000 · 2009-03-24 06:34 · Score: 5, Funny

FTA: "The world won't end if some snarky, anonymous comment on Slashdot disappears."
What? Nothing more important than anonymous slashdot trolls to moderate :/

Re:slashdot insult? :( by Anonymous Coward · 2009-03-24 07:41 · Score: 0, Insightful

Well, fuck them very much!
--AC

*mods article -1, Flamebait* by TheSpoom · 2009-03-24 06:34 · Score: 5, Insightful

Is it just me or did this article go out of its way to insult people who use "traditional" RDBMSs?

I mean, I'm well versed in SQL and data consistency et al, but I'm still more than willing to consider new technologies. What the hell?

--
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs

Re:*mods article -1, Flamebait* by Anonymous Coward · 2009-03-24 06:58 · Score: 0

lol, insult to injury with the flamebait tag. sorry man.
Re:*mods article -1, Flamebait* by Just+Some+Guy · 2009-03-24 06:59 · Score: 1

I read it exactly the other way, that they were slagging on the newcomers in favor of us old fogies (PostgreSQL FTW!).

--
Dewey, what part of this looks like authorities should be involved?
Re:*mods article -1, Flamebait* by Hognoxious · 2009-03-24 07:00 · Score: 1

It did. Are you one of those fossilised old farts who insists on using a remote control as a remote control?

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Re:*mods article -1, Flamebait* by teknopurge · 2009-03-24 07:42 · Score: 1

Cloud zealots (Google employees, Amazon EC2 developers, Wall St. analysts trying to attach their name to the "Next Big Thing", etc.) tend to have an insulting tone to their rhetoric.

--
Website Hosting
Re:*mods article -1, Flamebait* by peterwayner · 2009-03-24 07:43 · Score: 1

Phew--- At least I know that I was reasonably balanced. I guess it depends upon how you feel about the intensity of insult buried inside the use of words like "slacker", "codger" etc. If they're all about the same level, well, I've threaded the needle.
Re:*mods article -1, Flamebait* by Just+Some+Guy · 2009-03-24 07:52 · Score: 3, Funny

You seem to imply there was more to the story than the summary. This confuses me.

--
Dewey, what part of this looks like authorities should be involved?
Re:*mods article -1, Flamebait* by Anonymous Coward · 2009-03-24 08:06 · Score: 0

Ha, didn't think the writer would be on here. (Though I should really expect that nowadays... practically all of the tech world is.)
That was just the vibe I got when I started reading the article. Understandable if you didn't mean it that way, and your conclusion does provide a little more balance; I just think early on, it's slightly unbalanced.
Re:*mods article -1, Flamebait* by peterwayner · 2009-03-24 08:13 · Score: 1

It's perfectly fine to say how it reads to you. If I could be all things to all people, oh, the things I could accomplish.
Re:*mods article -1, Flamebait* by peterwayner · 2009-03-24 08:14 · Score: 1

Ha. RTFA where F=fantastic or fantabulous.
Re:*mods article -1, Flamebait* by Anonymous Coward · 2009-03-24 08:19 · Score: 0

Who is the moron who modded this "flamebait"? His point is a valid one.
Re:*mods article -1, Flamebait* by Sarusa · 2009-03-24 08:26 · Score: 1

It went out of its way to be irreverent to everyone: 'The new twerps really get those codgers steamed when they talk about how all of the computers in the cluster will get around to replicating the data' is a playful slap at both sides. And just like the PS3 Fanboys/XBots, if you identify too much with one of the sides you will nod your head knowingly at one and gasp and fan yourself as you suffer an attack of the vapours of offense at the other.
Re:*mods article -1, Flamebait* by turing_m · 2009-03-24 09:43 · Score: 1

Dvorak much?

--
If I have seen further it is by stealing the Intellectual Property of giants.
Re:*mods article -1, Flamebait* by fractoid · 2009-03-24 13:07 · Score: 1

WTF's wrong with these people anyway? This "cloud" thing has been used for decades, and it has a name. Us people who use it daily call it "the internet".

You know, online storage (like webmail) that you can access anywhere? Online services that you can use to perform common tasks? Communicating with other "internet" users? Really, I'm struggling to see what "cloud computing" gives us except another buzzword to describe what we already have. I'm looking forward to stuffing it firmly in a box along with "information superhighway" and "web 2.0".

--
Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
Re:*mods article -1, Flamebait* by Dreen · 2009-03-24 14:51 · Score: 1

Same here. Just quit being such a god-damned elitist idiot; it's technology, not lifestyle habits or art. The only thing that counts is getting the job done.
Re:*mods article -1, Flamebait* by bsdaemonaut · 2009-03-24 23:57 · Score: 1

I don't agree that a DB that breaks with the "traditional" RDBMS model is necessarily "slack." There are plenty of non-RDBMS out there that are fully ACID compliant. Have your cake and eat it too.
Re:*mods article -1, Flamebait* by Raenex · 2009-03-25 00:21 · Score: 1

I'd rather eat my cake and have it too, then I can have seconds.
Re:*mods article -1, Flamebait* by Anonymous Coward · 2009-03-25 14:45 · Score: 0

Yeah, it was pretty insulting. And completely ignorant of what normalization is all about.
There's a place for "slacker" databases - Ray Ozzie figured that out in the 80's when he invented Lotus Notes. Put the same database model in the cloud and it's big news. Yawn.

Normalization doesn't exist to save disk space by qoncept · 2009-03-24 06:35 · Score: 5, Insightful

Now that disk space is so cheap and many of the data models don't benefit as much from normalization, ...

You don't want to store the same data in multiple places. Your query might run faster, but your data integrity is going to suck.

And, uh, I have the pleasure of working now with a huge data warehouse that hasn't normalized status codes, so instead of quickly searching for an integer, the queries run slow as hell scanning char fields. It's not good.

--
Whale

Re:Normalization doesn't exist to save disk space by TheSpoom · 2009-03-24 06:37 · Score: 1

Couldn't you index a char just as easily as you could an int? Or are you saying their status codes are strings?

--
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs
Re:Normalization doesn't exist to save disk space by prozaker · 2009-03-24 06:40 · Score: 1

I once had a project that wasn't normalized and, on top of that, was written in Access, with VBA code for the forms. High-level shit.
Re:Normalization doesn't exist to save disk space by qoncept · 2009-03-24 06:42 · Score: 2, Informative

"CHAR(50)"

Oracle doesn't have a "string" datatype.

--
Whale
Re:Normalization doesn't exist to save disk space by TheSpoom · 2009-03-24 06:48 · Score: 2, Informative

Ah, my apologies. Really, it should be an indexed enum (or whatever Oracle equivalent there is... it's been a while since I used it) if there's no additional data to go along with the status code... or another table if there is additional data.

--
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs
Re:Normalization doesn't exist to save disk space by qoncept · 2009-03-24 06:53 · Score: 2, Interesting

My point exactly. :) There are a lot of things our data warehouse should be and isn't. We're working on redesigning it now, though, so we should be resolving a lot of the issues. But most people aren't about to redesign their databases, because it's a huge deal. We have 8 different apps using the warehouse, plus hundreds of reports and people hitting it that we don't even know about, all of which will become obsolete. The cost to redesign is huge, and we only have the opportunity now because a project it depends on is being redesigned.

--
Whale
Re:Normalization doesn't exist to save disk space by Hognoxious · 2009-03-24 07:06 · Score: 5, Funny

You don't want to store the same data in multiple places.

But if one of them is wrong, you can check the others and correct it.
My boss - a lead senior senior lead developer from Android Whorehouse & Douche - several years back, when I tried explaining "why I'd missed some fields out of one of the tables".

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Re:Normalization doesn't exist to save disk space by mooingyak · 2009-03-24 07:12 · Score: 4, Funny

But if one of them is wrong, you can check the others and correct it.
My boss - a lead senior senior lead developer from Android Whorehouse & Douche - several years back, when I tried explaining "why I'd missed some fields out of one of the tables".
I was about to post something explaining to you why that's bad, and then I reread your post and the whooshing noise around me quieted down.

--
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
Re:Normalization doesn't exist to save disk space by Abcd1234 · 2009-03-24 07:20 · Score: 1

And, uh, I have the pleasure of working now with a huge data warehouse that hasn't normalized status codes, so instead of quickly searching for an integer, the queries run slow as hell scanning char fields. It's not good.
What the hell does that have to do with schema normalization? Normalization has to do with how you architect your tables and relations. The types you use for the columns, and how you standardize their values, is an entirely different, though somewhat related, discussion.
And as it happens, in a data warehouse application, it's not at all unusual to denormalize tables, in order to speed up query performance during data mining operations.
Re:Normalization doesn't exist to save disk space by Hognoxious · 2009-03-24 07:23 · Score: 1

To be fair, I should have quoted the first bit. Maybe that's why I never made it to "lead senior senior lead developer".

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Re:Normalization doesn't exist to save disk space by qoncept · 2009-03-24 07:25 · Score: 4, Informative

Right, and, boss, which one is right?

People that haven't done it don't realize how easy it is to end up in that situation. Say, I write reports about people, and Robin writes reports about assets, whose owners are people, and puts a person's name in her table to make it faster. Someone gets married, their name changes, and now Robin's reports are wrong.
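The scenario above can be reproduced in a few lines. This is only a minimal sketch using Python's built-in sqlite3 module rather than the Oracle warehouse under discussion; the table and column names (people, assets_denorm, assets_norm) and the sample names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Denormalized (hypothetical): the owner's name is copied in "to make it faster".
    CREATE TABLE assets_denorm (
        id INTEGER PRIMARY KEY, owner_id INTEGER, owner_name TEXT, item TEXT
    );
    -- Normalized (hypothetical): only the key is stored; the name lives in people.
    CREATE TABLE assets_norm (
        id INTEGER PRIMARY KEY, owner_id INTEGER REFERENCES people(id), item TEXT
    );
    INSERT INTO people VALUES (1, 'Pat Smith');
    INSERT INTO assets_denorm VALUES (1, 1, 'Pat Smith', 'laptop');
    INSERT INTO assets_norm VALUES (1, 1, 'laptop');
""")

# Someone gets married and the name changes in the one authoritative place.
conn.execute("UPDATE people SET name = 'Pat Jones' WHERE id = 1")

# The copied name is now stale.
stale_name = conn.execute(
    "SELECT owner_name FROM assets_denorm WHERE item = 'laptop'"
).fetchone()[0]

# The joined lookup follows the change.
current_name = conn.execute(
    "SELECT p.name FROM assets_norm AS a "
    "JOIN people AS p ON p.id = a.owner_id WHERE a.item = 'laptop'"
).fetchone()[0]

print(stale_name, "vs", current_name)
```

The copy can of course be kept in sync with triggers or application code, but then every writer has to know about every copy, which is exactly what tends to go wrong.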

--
Whale
Re:Normalization doesn't exist to save disk space by Hognoxious · 2009-03-24 07:30 · Score: 1

He probably means standardised rather than normalised, but I'm guessing it's pretty hard doing joins where the common field has different values meaning the same thing, or same values meaning different things.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Re:Normalization doesn't exist to save disk space by Abcd1234 · 2009-03-24 07:35 · Score: 1

If that's true, then his comments have nothing to do with the quoted section about how, "Now that disk space is so cheap and many of the data models don't benefit as much from normalization, ...". Because that comment is *specifically* about schema normalization in the formal sense.
Re:Normalization doesn't exist to save disk space by rackserverdeals · 2009-03-24 07:51 · Score: 1

You don't want to store the same data in multiple places. Your query might run faster, but your data integrity is going to suck.

I wonder if normalization is going to be less important as de-duplication gets integrated into more file systems.
It's going to be a part of ZFS later this year I think. It looks like it's going to be a block level implementation within ZFS rather than file level. Since ZFS uses a copy on write model it seems fairly easy for them to implement compared to other file systems.
Databases that work on top of existing filesystems could benefit from this whereas databases that use block level addressing (Oracle) may not, but they might be dealing with this in other ways.
It kinda makes sense. Build your DB for the best performance and let the underlying file system handle normalization.
I don't know enough about the tech to determine if this would be an applicable usage but I don't see why not.

--
Dual Opteron < $600
Re:Normalization doesn't exist to save disk space by jahudabudy · 2009-03-24 08:16 · Score: 1

What the hell does that have to do with schema normalization? Normalization has to do with how you architect your tables and relations.

Right. And say I have two lookup tables for, say, county codes. One has county name and FIPS code, one has county name and internal county code. This is denormalization of data (braindead denormalization, if you ask me, which they didn't). And say one of them spells county name one way, and the other another. This is destandardization. However, it is only possible because of the denormalization.

--
...sometimes, in order to hurt someone very badly, you have to tell that person terrible lies. - PA
Re:Normalization doesn't exist to save disk space by profplump · 2009-03-24 08:34 · Score: 1

Then all you have to do is make sure that all your duplicated data is exactly 1 disk block in length. That's got to be easier than just not storing it twice in the first place, or adding more disks.
Re:Normalization doesn't exist to save disk space by Lord+Ender · 2009-03-24 08:45 · Score: 1

A short char field isn't necessarily slower than an integer, though. Right? They could both be indexed with log(n) search time.
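One way to check this on a given engine is to ask the planner. The following is a rough sketch against SQLite (not Oracle, whose planner and CHAR padding rules differ), with hypothetical table and index names; it only shows that an equality lookup on an indexed text column gets an index search just as an integer column does, not how the two compare in raw speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: same data, status stored as text vs. as an integer code.
    CREATE TABLE orders_char (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE orders_int (id INTEGER PRIMARY KEY, status INTEGER);
    CREATE INDEX idx_char_status ON orders_char(status);
    CREATE INDEX idx_int_status ON orders_int(status);
""")

def plan(sql, params):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " ".join(row[3] for row in rows)

char_plan = plan("SELECT id FROM orders_char WHERE status = ?", ("SHIPPED",))
int_plan = plan("SELECT id FROM orders_int WHERE status = ?", (3,))

print(char_plan)
print(int_plan)
```

Both plans should name their index rather than report a full scan. The slowness described upthread is more likely to come from unindexed or padded CHAR(50) columns than from the column being text as such.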

--
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
Re:Normalization doesn't exist to save disk space by Abcd1234 · 2009-03-24 08:53 · Score: 1

Yeah, but as you yourself point out, this isn't specifically because the tables are denormalized. It's because they're denormalized *badly*.
Denormalization, itself, does not equate to poor performance. In fact, quite the opposite is true, when it's done properly and thoughtfully. Which brings us back to the original quote: "Now that disk space is so cheap and many of the data models don't benefit as much from normalization" is *absolutely* true. Does it have to be done carefully? Yes. Is denormalizing a schema a universally good idea? No, of course not. But these days, given the cheapness of storage, if your game is large-scale read-only or minimal write databases where you're performing a lot of data mining, denormalization can be a very useful tool. It just needs to be done properly.
Re:Normalization doesn't exist to save disk space by Anonymous Coward · 2009-03-24 09:14 · Score: 0

How can you assume data integrity is going to suck by storing data in multiple places? I don't even get the basic logic behind your reasoning.
Having data stored in multiple places somehow affects data integrity negatively?
If data is mined or input from a central location, then replicated (or sent to multiple locations to begin with) data integrity is IMPROVED. Queries are IMPROVED. Redundancy is IMPROVED.
This was modded +5? Seriously, I am beginning to think that there isn't a single person on /. that knows what he is talking about or even doing.
I'm getting tired of pseudo-techs giving pseudo-advice.
Re:Normalization doesn't exist to save disk space by Anonymous Coward · 2009-03-24 09:16 · Score: 0

Luckily if she gets divorced she gets all the assets and this would not be a problem.
Re:Normalization doesn't exist to save disk space by PseudoIdiot · 2009-03-24 09:29 · Score: 2, Informative

Why in the world would you allow access to a line in a database while simultaneously allowing access to another DB with the same line of data? This is easily disallowed by properly using reference IDs, which should have been implemented during the conceptualization of the DB at the very beginning but could still easily be attached to the entire DB. Don't allow data edits without first locking that line in the database, cross-referencing the reference ID, and preventing that same reference ID (regardless of which other DB it exists on) from being modified. If you don't know how to accomplish this with Oracle, or even SQL, you do not have any business touching the database to begin with. Based on your answers, and how ridiculously they've been modded, qoncept, you barely understand databasing at all.
Re:Normalization doesn't exist to save disk space by NotVeryOriginal · 2009-03-24 09:37 · Score: 1

I guess it all depends on what he means by "scanning char fields". What I imagined was a single field called Status with many values in it:

"status1, status2, status3"

Obviously a case of a denormalization, though it wouldn't be too bad if you didn't have to search based on these values as they would return quickly. Still, makes me a little queasy just to look at it.

If he's talking about a column with a single char value vs a column with a single numerical value, then I don't see his point either.
Re:Normalization doesn't exist to save disk space by Tupper · 2009-03-24 09:42 · Score: 1

The meta-rule in computer science is "Once And Only Once". In the database world this is called Normalization.
Normalization holds that there should be exactly one definitive place for each bit of data. For example, customer's name should be in her row in the Customer table. It should not be in 5 different tables and in the Orders table once per order. If it is in 5 different tables sooner or later they will get out of sync. That is bad. If the database is large and important it is very bad indeed.
Backups, data warehouses etc. don't count here as definitive places, because while there may be circumstances that will make them definitive, in the normal course of events they are not.
Denormalization is sometimes done to improve speed, but it is dangerous.
Caches and replication can also be dangerous if they can get out of sync, or worse, if they muddy the concept of what is the definitive version of the data.
Re:Normalization doesn't exist to save disk space by Hognoxious · 2009-03-24 09:48 · Score: 2, Funny

Total failure to understand the situation. You, I mean he, didn't understand the concept of "factoring out" common information - say, the customer details on an order - from the variable per item data - product code, quantity.
What he, er, you appear to be talking about is natural vs surrogate keys.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Re:Normalization doesn't exist to save disk space by Anonymous Coward · 2009-03-24 10:14 · Score: 0

Not necessarily.
If there's a timestamp associated with the now stale data, it's arguably more correct to use the old name.
Names change for valid reasons. UIDs, PIDs etc. don't.
Re:Normalization doesn't exist to save disk space by tahpot · 2009-03-24 10:18 · Score: 1

And, uh, I have the pleasure of working now with a huge data warehouse that hasn't normalized status codes, so instead of quickly searching for an integer, the queries run slow as hell scanning char fields. It's not good.
That's a problem that can easily be fixed. Replace the char fields with the ints you so crave, maybe adding any missing indexes (indices?).
Do it once and enjoy the results (and speed) forever.
Re:Normalization doesn't exist to save disk space by Anonymous Coward · 2009-03-24 10:42 · Score: 0

Use a function-based index if your database supports it.
Re:Normalization doesn't exist to save disk space by leperkuhn · 2009-03-24 11:34 · Score: 1

As someone that's worked on a few decent sized projects (2-3 million pageviews / day), I can say that a normalized database is a complete nightmare to try to work with. It just doesn't scale. I'm not sure what constitutes a "huge" data warehouse, however, if you're running a SELECT with a where clause that triggers a table scan against more than a million rows, yeah, it'll take a while, no matter if you're selecting against an integer or a char field.

--
http://www.rustyrazorblade.com
Re:Normalization doesn't exist to save disk space by Anonymous Coward · 2009-03-24 13:54 · Score: 0

>>Right, and, boss, which one is right?
I had a "programmer" that came across a problem like this. He arbitrarily changed one of the values so they matched (in a transaction that was specified as being "read only" on the DB). I asked him that question.
He said "this one."
"How do you know?"
"That's the one the program read first!"
I tried to fire him, but senior management had a hiring freeze on and he couldn't be replaced. He was shuffled from one project to another till he was lost in a basement office and quit.
Re:Normalization doesn't exist to save disk space by thethibs · 2009-03-24 14:47 · Score: 1

Odd. In my lexicon, a data warehouse is a read-only copy of the production database that's been denormalized to streamline queries. Why would anyone design one to slow down queries?

--
I'm a Programmer. That's one level above Software Engineer and one level below Engineer.
Re:Normalization doesn't exist to save disk space by petermgreen · 2009-03-24 15:45 · Score: 2, Insightful

and in the Orders table once per order.
I'd disagree on this one; it seems to me like it would be a good idea to record the customer's name and address (probably both billing and shipping) at the time of an order, even if they later change the details on their account.
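As a sketch of what that might look like (Python's sqlite3 again; the schema and the place_order helper are hypothetical, and only the name is snapshotted to keep it short), the order stores its own copy of the name as a fact about the order, while the account keeps the current one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        ship_name TEXT NOT NULL  -- name as it was when the order was placed
    );
    INSERT INTO customers VALUES (1, 'Robin Smith');
""")

def place_order(order_id, customer_id):
    """Create an order, copying the customer's current name into it."""
    conn.execute(
        "INSERT INTO orders (id, customer_id, ship_name) "
        "SELECT ?, id, name FROM customers WHERE id = ?",
        (order_id, customer_id),
    )

place_order(100, 1)

# The account is updated later; the historical order is left alone on purpose.
conn.execute("UPDATE customers SET name = 'Robin Jones' WHERE id = 1")

ship_name, account_name = conn.execute(
    "SELECT o.ship_name, c.name FROM orders AS o "
    "JOIN customers AS c ON c.id = o.customer_id WHERE o.id = 100"
).fetchone()

print(ship_name, "/", account_name)
```

The two columns answer different questions (who was it shipped to, and who is the customer now), so this is arguably not duplication in the normalization sense.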

--
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
Re:Normalization doesn't exist to save disk space by Tupper · 2009-03-24 16:26 · Score: 2, Informative

I'll buy that: the name on a mailing label or shipment may be a different concept than the primary name associated with the account. If so, it's wrong to conflate them.
Re:Normalization doesn't exist to save disk space by Anonymous Coward · 2009-03-24 16:26 · Score: 0

Denormalizing a data warehouse is reasonable performance tuning if you're starting with sane inputs. The real problem with these "slacker DBs" is they lure you into starting out with a system of record that's denormalized and has no constraints, which means every mistake anyone makes fucks you over permanently.
Re:Normalization doesn't exist to save disk space by fractoid · 2009-03-24 16:40 · Score: 1

He was shuffled from one project to another till he was lost in a basement office and quit.
Did he then burn the place down?

--
Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
Re:Normalization doesn't exist to save disk space by einhverfr · 2009-03-24 17:09 · Score: 1

I agree but would add a point.
The big issue here is what apps have access to your data. If you are just using the db to store data and retrieve it for a single app, it doesn't make a difference what db you are using and in fact non-relational db's have a number of advantages over relational ones.
The big change happens when you shift from thinking of data as something which is stored in order to give it persistence within the program, and view it instead as something which needs to be managed and leveraged in different ways across programs. In this case, the only way to go is with a relational db. As soon as your data has value independent of your application, you need to go with a relational design, and a good one (PostgreSQL over MySQL etc).

--

LedgerSMB: Open source Accounting/ERP
Re:Normalization doesn't exist to save disk space by fractoid · 2009-03-24 17:56 · Score: 1

I'm not a database specialist (I only work with them, and tend to have a programmerly point of view on them) but it seems to me that you should have two 'layers' of database for very large, performance-sensitive, usually-read-only data. You have the raw data, which is what you write to. This one's all nicely normalised and so forth. Then you have regular updates that write to a second database which is denormalised but lets data be recalled easily in useful arrangements.
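A toy version of that two-layer arrangement, for illustration only: it uses Python's sqlite3 with both layers in one in-memory database, and the table names and the refresh_report function are hypothetical. Writes go to the normalized tables; a periodic job rebuilds a flat table that reads can hit without joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Write side: normalized, each fact stored once.
    CREATE TABLE manufacturers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        manuf_id INTEGER NOT NULL REFERENCES manufacturers(id),
        label TEXT NOT NULL
    );
    -- Read side: denormalized copy, rebuilt on a schedule.
    CREATE TABLE item_report (item_id INTEGER PRIMARY KEY, label TEXT, manuf_name TEXT);
    INSERT INTO manufacturers VALUES (1, 'Acme');
    INSERT INTO items VALUES (10, 1, 'widget');
""")

def refresh_report():
    """Rebuild the flat reporting table from the normalized tables in one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM item_report")
        conn.execute("""
            INSERT INTO item_report (item_id, label, manuf_name)
            SELECT i.id, i.label, m.name
            FROM items AS i JOIN manufacturers AS m ON m.id = i.manuf_id
        """)

refresh_report()

# A write lands on the normalized side only...
conn.execute("UPDATE manufacturers SET name = 'Acme Corp' WHERE id = 1")
before = conn.execute(
    "SELECT manuf_name FROM item_report WHERE item_id = 10"
).fetchone()[0]

# ...and shows up on the read side after the next refresh.
refresh_report()
after = conn.execute(
    "SELECT manuf_name FROM item_report WHERE item_id = 10"
).fetchone()[0]

print(before, "->", after)
```

The read side is stale between refreshes by design; the point is that the staleness is bounded and the normalized side stays the single place where corrections are made.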

An old job of mine involved working with an Oracle database that was a wonder of normalisation. No data was ever stored twice, barring the 80% of the data that were (often multi-part) foreign keys. Any useful query had to join at least half a dozen tables, and most of the larger queries joined 20 or more different tables by long chains of foreign-key values. To get a manufacturer's name and stocked store names for an item ID, for instance, you'd have to do:

Select Manufacturers.Name, Stores.Name from
Items, ItemTemplates, Manufacturers, ManufStores, Stores, StoreTemplates
Where
Items.TemplateID = ItemTemplates.ID AND
ItemTemplates.ManufID = Manufacturers.ID AND
ManufStores.StoreID = Stores.ID AND
ManufStores.ManufID = Manufacturers.ID AND
Stores.TemplateID = StoreTemplates.ID AND
Items.ID = %itemID%

Almost all queries used the 'distinct' keyword to prevent duplicate returns, because so many of the relationships between tables were 1:Many that large joins invariably turned into Many:Many. It had been like this for years and a certain person who was now the division manager was once upon a time the DBA who designed the database, so refactoring was verboten.

Most pages on the web front end took at least 10 seconds to load over a LAN. It was a valuable lesson for me, and taught me that simply normalising your data representation isn't enough to make a database schema non-god-awful.

--
Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
Re:Normalization doesn't exist to save disk space by fractoid · 2009-03-24 17:59 · Score: 1

The meta-rule in computer science is "Once And Only Once".
That's funny, I thought the rule was "Less Than Three". If you do something once, you just do it. If you do it twice, then you probably make a utility function to do it. If you do it three times (no longer Less Than Three) then you build a generic module/library/object/package to solve the problem, because you'll be seeing it again.

--
Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
Re:Normalization doesn't exist to save disk space by angel'o'sphere · 2009-03-24 18:10 · Score: 1

Don't allow data edits without first locking that line in the database, cross-referencing the reference ID, and preventing that same reference ID (regardless of which other DB it exists on) from being modified.

So, I could now start explaining why this is not very easy ... but I'm lazy now.
I propose you implement it, and you'll be the next Nobel Prize winner ^^ (if one existed for computer science). Anyway, just implement and patent it and you're a multi-billionaire.
If you would think "one single day" about how you can implement that you would realize very soon that your claim is utter nonsense, sorry.
angel'o'sphere

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Re:Normalization doesn't exist to save disk space by Vintermann · 2009-03-24 20:23 · Score: 1

Once, you just do it. Twice, you grumble, and do it. Three times, and you start writing the universal mega-library/framework that solves the problem once and for all, and gives everyone a pony.
Yeah, sounds about right.

--
xkcd is not in the sudoers file. This incident will be reported.
Re:Normalization doesn't exist to save disk space by timothy_haak · 2009-03-24 21:40 · Score: 1

I agree. Hard drive reads are more of a bottleneck than just about anything else, so the less data we have to pull off the hard drive to perform a query, the quicker it is going to be. That is where normalisation comes in. As for having multiple copies allowing you to go back and find out which is correct: sometimes they are both correct, which then requires manual intervention to merge the data :) That could have been avoided by the second set of changes updating the first set. I must say I find the fact that the article says randomly losing data is OK :) rather scary. Sure, there are certain transactions where you don't care whether they work or not, but there are others where it matters a lot. To use his example of losing one cent: sure, not a major problem. But if that one cent was your paycheck going in at the end of the month, you would care a lot.
Re:Normalization doesn't exist to save disk space by odourpreventer · 2009-03-25 02:41 · Score: 1

> Your query might run faster
Yes, a few milliseconds at most in 99.9% of all cases. Besides, what bugs me is the line
> 'db' onto a 'pile of code that breaks with the traditional relational model'
If the database doesn't have "relational" in its name, then yes, you should not assume that it adheres to the "traditional relational model", issues with MySQL notwithstanding.

--
Adventure, Romance, MAD SCIENCE!
Re:Normalization doesn't exist to save disk space by jadavis · 2009-03-25 05:15 · Score: 1

huge data warehouse that hasn't normalized status codes
What do status codes have to do with database normalization? Database normalization begins with the assumption that you've already decided upon all the constraints your data must satisfy, including type constraints.

--
Social scientists are inspired by theories; scientists are humbled by facts.
Re:Normalization doesn't exist to save disk space by jadavis · 2009-03-25 05:30 · Score: 1

How can you assume data integrity is going to suck by storing data in multiple places?
You're right, but I think there's some confusion here still.
Normalization is about organizing base relation variables so that many common constraints (e.g. functional dependencies) can be enforced within an individual relation variable via keys.
This helps avoid "update anomalies" where, in order to maintain a consistent database, you must update several relation variables at once. Update anomalies can compromise the logical integrity of a database if all the constraints are not still strictly enforced by some other means (e.g. a triggered procedure and careful locking).
You can physically store the data as many places as you want, to ensure physical integrity. That makes perfect sense.
So physical integrity is orthogonal to logical consistency. Storage is entirely physical, while normalization (and other forms of organization) are entirely logical.

--
Social scientists are inspired by theories; scientists are humbled by facts.
Re:Normalization doesn't exist to save disk space by DuckDodgers · 2009-03-25 06:43 · Score: 1

Your example query shouldn't be that bad, especially if you have indexes and use table partitions for large volumes of data. But even my worst queries (and I had some awful ones as I climbed the SQL learning curve) never joined more than a dozen tables. I can't imagine 20.

Mod this down by Anonymous Coward · 2009-03-24 06:35 · Score: 5, Funny

Like the article says, "The world won't end if some snarky, anonymous comment on Slashdot disappears."

Re:Mod this down by shutdown+-p+now · 2009-03-24 10:51 · Score: 1

In fact, the article misses the point there. Comment disappearing is not a problem. But when one query reports that comment is there, and another fails to retrieve its contents - now that may be a problem. And that's what you're going to get if you don't care about data consistency.
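For what it's worth, here is a minimal sketch of the guarantee being described, using Python's sqlite3. The two-table layout (a listing table plus a body table) and the post_comment helper are hypothetical and are not how Slashdot actually stores comments. Both writes go in one transaction, so a failure halfway leaves neither behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comment_index (id INTEGER PRIMARY KEY, subject TEXT NOT NULL);
    CREATE TABLE comment_body (
        id INTEGER PRIMARY KEY REFERENCES comment_index(id),
        body TEXT NOT NULL
    );
""")

def post_comment(comment_id, subject, body):
    """Write the listing row and the body row as a single all-or-nothing unit."""
    with conn:  # commits if both inserts succeed, rolls back both otherwise
        conn.execute("INSERT INTO comment_index VALUES (?, ?)", (comment_id, subject))
        conn.execute("INSERT INTO comment_body VALUES (?, ?)", (comment_id, body))

post_comment(1, "Mod this down", "Like the article says...")

# Simulate a failure on the second write: a NULL body violates NOT NULL.
try:
    post_comment(2, "Half-written comment", None)
except sqlite3.IntegrityError:
    pass

listed = [row[0] for row in conn.execute("SELECT id FROM comment_index ORDER BY id")]
stored = [row[0] for row in conn.execute("SELECT id FROM comment_body ORDER BY id")]

print(listed, stored)
```

Without the transaction, comment 2 would be listed by one query and missing from the other, which is the inconsistency described above. Losing the whole comment is the benign failure mode.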

moral of the story by Em+Emalb · 2009-03-24 06:35 · Score: 1

even technical folks sometimes don't like change.

Also, calling people who've worked on DBs for a long time codgers and younger DBAs "twerps" is stupid.

Haha, yeah, tongue in cheek. I get it.

Still lame.

Now get off my patch of closely-cropped ground cover, you callous jerks. ;D

--
Sent from your iPad.

Re:moral of the story by Anonymous Coward · 2009-03-24 06:59 · Score: 1, Interesting

I think the question is "to what end will this change benefit anyone".
I view these new "ideas" on data storage and retrieval as a dumbing down of DBs the way higher level languages have dumbed down programming. On the one hand, it's much, much nicer to be able to whip together a working PHP app in a day than it is to have to constantly comb C code to make sure every little exception has been handled and every little bit of data checked. On the other hand, I don't feel quite the same way as I'm constantly zapping little bugs introduced by laziness in type checking or data validation.
It's all well and good that this allows people with less knowledge of the field to put together "good enough" applications, but sometimes I wonder if we're really that much more productive, or if we just shifted all our workload from building stable apps to constantly maintaining buggy ones.
Re:moral of the story by Mr.+Slippery · 2009-03-24 07:18 · Score: 1

even technical folks sometimes don't like change.

There are two types of fool. One who says, "this is old and therefore good", and one who says "this is new and therefore better".
Change for its own sake is great when you're dealing with your air or water supplies, but is a lousy problem-solving strategy. These "slacker DBs" don't seem to introduce anything new. They're keyed value storage, dbm with glitter and racing stripes glued on.

--
Tom Swiss | the infamous tms | my blog
You cannot wash away blood with blood
Re:moral of the story by Ninnle+Labs,+LLC · 2009-03-24 07:19 · Score: 1

I view these new "ideas" on data storage and retrieval as a dumbing down of DBs the way higher level languages have dumbed down programming.
Yeah, seriously. All you noobs who don't write in machine code should get the fuck off my lawn.
Re:moral of the story by Anonymous Coward · 2009-03-24 07:48 · Score: 0

Was that a joke or did you knot your panties up by not bothering to read beyond the first sentence?
I can't tell with this place sometimes...
Re:moral of the story by Ninnle+Labs,+LLC · 2009-03-24 07:59 · Score: 1

No, I just found it amusing how you berate "high-level languages" and then go on about using C as if it isn't a high-level language.

Laziness Rules by ergo98 · 2009-03-24 06:37 · Score: 5, Insightful

Slacker DBs like CouchDB and SimpleDB, have taken off for the simple reason that most developers have absolutely mediocre database knowledge or skills, and rather than learning it's just as easy to just wave it all off as obsolete.

It's no surprise that the creator of CouchDB, for instance, hadn't a clue about databases when he began his project. All of that built up knowledge just ignored while someone invented their own, and it's as rational as rolling your own encryption from scratch without the slightest clue about encryption algorithms or theories.

Re:Laziness Rules by Samschnooks · 2009-03-24 06:51 · Score: 3, Interesting

... and rather than learning it's just as easy to just wave it all off as obsolete.
I don't know about that. But maybe these slacker DBs are perfect for what they're doing? Glancing at those mentioned in the FA, it just looks like they're simple tools to do simple things.
Don't get me wrong. I once had the pleasure of working with an Oracle god. This dude was about to take his final Oracle exam in a series of exams and he turned my Join that took ten seconds into a Join that took less than a thousandth. I have no idea what he did to this day, but it took several lines of PL/SQL. We were dealing with tens of millions of rows that had to be processed every night.
My point is if it's something simple to do, why all the RDBM overhead? Many times, just a simple flatfile is all you need and maybe a little more.
Re:Laziness Rules by KagatoLNX · 2009-03-24 07:01 · Score: 3, Interesting

In the end, the problem is that people just want a "default tool". They don't want to think about their requirements for data consistency. The really scary bit is that while RDBMses are the "default tool" of yesterday and slacker DBs are the "default tool" of tomorrow, neither of them are really the "problem".
The "default tool" attitude IS the problem. Unless you carefully weigh your data consistency requirements, you shouldn't be making that call at all.
I welcome the slackers and all of their new options along the spectrum of speed versus consistency. It's just that most of the people developing applications scare the shit out of me. They're so cavalier (or should I say, "agile", or maybe "pragmatic") about requirements that it's truly disturbing.
That said, if you're really interested in all of the options, I also recommend checking out memcachedb, memcacheq, and redis.

--
I think Mauve has the most RAM. --PHB (Dilbert Comic)
Re:Laziness Rules by phoenix321 · 2009-03-24 07:02 · Score: 2, Insightful

Problem is, you're re-inventing the wheel several times over in the process. Hint: "a flatfile and maybe a little more" could very well be all the storage technology invented today only a few years down the road.
At first, all you need is to store key:value pairs. That works with a flat file or with Oracle. Then you need some consistency checks, which can be modelled fast in Oracle or reasonably fast in your software. Then you need some triggers, which could be written fast in Oracle and not-so-fast in your software. And so on until you have progressed through the whole platform effect with several squeaky wheels invented and thousands of hours wasted.
Any project worth doing that involves storing key:value pairs is worth a real database. Take the tiniest, lowliest member of the crowd as long as it can somehow speak SQL and allows to be linked and unlinked into the project. Everything else will require at least a medium rewrite at some point when you switch over to a real database. You could of course extend everything upon a glorified flatfile until your reinvented wheels strangles all your progress.
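
The progression described above (key:value pairs, then a consistency check, then a trigger) can be sketched on a small SQL engine. This uses Python's built-in sqlite3 module; the table names kv and kv_audit and the trigger name are made up for illustration, and a real project would pick its own rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Step 1: plain key:value storage
    CREATE TABLE kv (
        key   TEXT PRIMARY KEY,
        value TEXT NOT NULL CHECK (length(value) > 0)  -- Step 2: a consistency check
    );
    -- Step 3: a trigger that records every change to a value
    CREATE TABLE kv_audit (key TEXT, old_value TEXT, new_value TEXT);
    CREATE TRIGGER kv_log_update AFTER UPDATE ON kv
    BEGIN
        INSERT INTO kv_audit VALUES (OLD.key, OLD.value, NEW.value);
    END;
""")

conn.execute("INSERT INTO kv VALUES ('theme', 'dark')")
conn.execute("UPDATE kv SET value = 'light' WHERE key = 'theme'")

try:
    conn.execute("INSERT INTO kv VALUES ('empty', '')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the CHECK constraint refused the empty value

audit_rows = conn.execute("SELECT * FROM kv_audit").fetchall()
print(audit_rows, rejected)
```

Each step is a line or two of declaration here; doing the same on top of a flat file means writing and testing that logic yourself.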
Re:Laziness Rules by metalhed77 · 2009-03-24 07:02 · Score: 4, Informative

Damien Katz, CouchDB's creator, worked at MySQL prior to writing CouchDB, and worked on Lotus Notes prior to that...

--
Photos.
Re:Laziness Rules by ergo98 · 2009-03-24 07:11 · Score: 3, Interesting

I'm just going on the statements he made about his own (lack of) knowledge in this video.
Re:Laziness Rules by Ambiguous+Puzuma · 2009-03-24 07:12 · Score: 4, Insightful

If you want "a little more" than a simple flat file, perhaps SQLite is the answer? The people on the Firefox team seem to think so, for example.
SQLite has been a pleasure to use for a small personal project involving a few Perl scripts. Granted my background is with SQL Server and Oracle, so perhaps I'm not the target audience, but I found it extremely easy to use and surprisingly efficient--and I didn't need to set up a server or anything. I didn't even need to explicitly create a database!
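
A minimal sketch of the "no server, no explicit database creation" point above. The poster used Perl; this uses Python's built-in sqlite3 module as a stand-in, and the file name notes.db is hypothetical.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "notes.db")  # hypothetical file name

# No server to install or start: connecting creates the database file on demand.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("remember the milk",))
conn.commit()

rows = conn.execute("SELECT id, body FROM notes").fetchall()
conn.close()
print(rows, os.path.exists(db_path))
```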
Re:Laziness Rules by jellomizer · 2009-03-24 07:13 · Score: 1

For the most part the Overhead of running a Real DB is usually made up over time.
Small Apps tend to grow into big ones over time. Having Baby Databases can become a stumbling block for your application, as well as for the organization. They may want to warehouse your application data for making better business decisions and integration across apps. So your little 1 million record database will need to be integrated into a billion/trillion record database.

--
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Re:Laziness Rules by Anonymous Coward · 2009-03-24 07:19 · Score: 5, Funny

Thanks for validating the OP comments....
Re:Laziness Rules by sl0ppy · 2009-03-24 07:21 · Score: 2, Informative

first some context. i architect data warehouses for a living. i also live in a world of building fairly specialized frameworks to deal with data warehouses architected as star and snowflake schemas. i tend to spend quite a lot of time in pseudo-relational databases that don't fully implement codd's rules.
for fun, i like to spend some time toying with couchdb, using it for loose data warehousing, extending it, and generally enjoying the application development freedom it gives me.
that said, let me respond to some of your points:

Slacker DBs like CouchDB and SimpleDB, have taken off for the simple reason that most developers have absolutely mediocre database knowledge or skills, and rather than learning it's just as easy to just wave it all off as obsolete.
map/reduce solves a specific problem in data warehousing - column based lookups given specific rules, able to be broken down into atomics and performed in massive parallel. this allows for very cheap horizontal scaling over a large dataset.

It's no surprise that the creator of CouchDB, for instance, hadn't a clue about databases when he began his project.
this just shows ignorance. even just a cursory scan of damien's resume says otherwise.
Re:Laziness Rules by diamondsw · 2009-03-24 07:25 · Score: 4, Insightful

>Damien Katz, CouchDB's creator ... worked on Lotus Notes prior to that...
That's not exactly a ringing endorsement.

--
I don't know what kind of crack I was on, but I suspect it was decaf.
Re:Laziness Rules by AkiraRoberts · 2009-03-24 07:25 · Score: 1

Because many times, you start off needing a simple flat file and then you need a little more and a little more and a little more and suddenly that simple flatfile is the foundation of a steaming mound of crap and your life is horrible. I'd rather think things through a bit more at the start. Yes, sometimes that simple flat file might do the trick. But sometimes you might want to put together something that, right in the moment, will seem like a bit much and then will prove, 2 years from now, exactly what was needed.

--
words, words, words, lemur, words, words words
Re:Laziness Rules by morgan_greywolf · 2009-03-24 07:26 · Score: 1

Well, as ergo98 said, he hadn't a clue about databases when he began his project.
*ducking*

--
My blog
Re:Laziness Rules by 0xdeadbeef · 2009-03-24 07:27 · Score: 1

It's no surprise that the creator of CouchDB, for instance, hadn't a clue about databases when he began his project. All of that built up knowledge just ignored while someone invented their own, and it's as rational as rolling your own encryption from scratch without the slightest clue about encryption algorithms or theories.
It's funny how some people react by attributing ignorance to others when confronted with things they themselves don't understand.
Re:Laziness Rules by Anonymous Coward · 2009-03-24 07:37 · Score: 0

MySQL. Lotus Notes. I rest my case.
Re:Laziness Rules by ergo98 · 2009-03-24 07:45 · Score: 1

map/reduce solves a specific problem in data warehousing - column based lookups given specific rules, able to be broken down into atomics and performed in massive parallel. this allows for very cheap horizontal scaling over a large dataset.
That's great. No one said they had no use. Do you think you're refuting something? You aren't. Linking to Wikipedia doesn't really make you authoritative, as an aside.

this just shows ignorance. even just a cursory scan of damien's resume [209.85.173.132] says otherwise.
Ignorance indeed. Again, as stated otherwise, I was basically quoting from what he himself said in a presentation that he gave. But to humor your point, both a cursory and an intensive look at that resume gives me zero hints that he has any credible database knowledge prior to CouchDB. Do you mind guiding where it says otherwise?
Re:Laziness Rules by 0xdeadbeef · 2009-03-24 07:45 · Score: 1

Nevermind, I watched part of that video. Yikes!
But that underscores the validity of these schema-less database systems. Due to the technical depth required to use relational databases effectively, many people are better off not using them at all. They will get farther faster and with fewer bugs with something that is flexible and forgiving.
Re:Laziness Rules by sl0ppy · 2009-03-24 07:52 · Score: 1

Linking to Wikipedia doesn't really make you authoritative, as an aside.
no, it puts things in context, and helps to make sure that we are using common terminology via definition.
anyways, your attempts at flamebait won't draw me in - you've already shown in this, and previous comments that you're just out looking for a fight.
if you were willing to learn about something, and not just make inflammatory comments, that would be something else entirely. i feel kind of sorry for you.
Re:Laziness Rules by Anonymous Coward · 2009-03-24 07:52 · Score: 2, Informative

Damien Katz, CouchDB's creator, worked at MySQL prior to writing CouchDB, and worked on Lotus Notes prior to that...
He started work on CouchDB in 2005. Prior to that he was a Notes grunt of little significance.
He started at MySQL in 2007.
The point holds.
Re:Laziness Rules by ergo98 · 2009-03-24 07:56 · Score: 1

anyways, your attempts at flamebait won't draw me in - you've already shown in this, and previous comments that you're just out looking for a fight.
You added some noise dressed up with faux authoritarianism, and you consider being called on it a demand for a fight?
Hardly.

if you were willing to learn about something, and not just make inflammatory comments, that would be something else entirely. i feel kind of sorry for you.
Every word of your two posts has been pure troll. I enjoy trolls (it's a weakness), so yes, I do respond.
Tell me again what in Katz' resume is so impressive that my absolutely true statement, which was derived from the lips of Katz himself, is "ignorance"?
Re:Laziness Rules by samkass · 2009-03-24 07:57 · Score: 1

The fundamental problem of enforced consistency with our system is that it requires hard locking, and that is at odds with distributed scalability. When you're going over a few satellite hops and you've got users real-time deep collaborating (ie. they're all heavy writers) with >2s latency, a traditional DB isn't going to scale very well. Even the Facebook MySQL+memcached is going to break down in that environment.

--
E pluribus unum
Re:Laziness Rules by zip_000 · 2009-03-24 08:01 · Score: 1

That guy's resume needs some work.
Re:Laziness Rules by Seakip18 · 2009-03-24 08:03 · Score: 1

After all, who wouldn't mind an application that says you need to restart your computer because the application crashed? A text pushing application at that....

--
import system.cool.Sig;
Re:Laziness Rules by sl0ppy · 2009-03-24 08:07 · Score: 2, Insightful

Everything else will require at least a medium rewrite at some point when you switch over to a real database. You could of course extend everything upon a glorified flatfile until your reinvented wheels strangles all your progress.
not really. i think that you (and, unfortunately, the FA) are missing the point that the map and reduce functionality, while powerful, have one major advantage: scalability. simply put, a query can be, by definition of the map function, broken up into several discrete operations and performed simultaneously on the data.
while this can be done in Oracle, using RAC, to some extent, the cost and complication is a major barrier to entry. Cache-Fusion, while typically good, can also end up being a liability when the cost based optimizer attempts to split up the query into atomic tasks in order to correctly parallelize the query. for instance, on one application of RAC (multiple multi-core servers, fibrechannel disks, and oracle clustered filesystem), across 100,000,000+ rows, when heavy writes were occurring, it was cheaper computationally to force a full disk scan, using hints, than to rely on Cache-Fusion to figure out what data was stale and what data was fresh. this was discovered after several days spent neck deep in tkprof output.
conversely, map, by design, already does this.
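
A rough sketch of why map/reduce parallelizes by construction, as claimed above. This is plain Python, not CouchDB's view API; the documents, the partitioning, and the function names are all invented for illustration, and threads stand in for what would be separate nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical documents, split into partitions as if they lived on separate nodes.
partitions = [
    [{"author": "ergo98", "score": 5}, {"author": "sl0ppy", "score": 2}],
    [{"author": "ergo98", "score": 3}, {"author": "tepples", "score": 2}],
    [{"author": "sl0ppy", "score": 1}],
]

def map_partition(docs):
    """Map step plus a local combine: sum scores per author within one partition."""
    totals = Counter()
    for doc in docs:
        totals[doc["author"]] += doc["score"]
    return totals

def merge(left, right):
    """Reduce step: combine two partial results; merge order does not matter."""
    return left + right

# Each partition is processed independently, so the work can run side by side.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(map_partition, partitions))

score_by_author = dict(reduce(merge, partials, Counter()))
print(score_by_author)
```

Because map_partition never looks outside its own partition and merge is associative, nothing has to decide at query time what can safely run in parallel.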
Re:Laziness Rules by Trifthen · 2009-03-24 08:35 · Score: 1, Insightful

That's what I don't quite understand about all this. It's been the case for a while now that:
1. If you want a full RDBMS, use Oracle, or PostgreSQL, or a similar ACID + SQL92 compliant DB.
2. If you don't really care, use MySQL.
3. If you want ridiculous speed, and actively hate your data, use SQLite.
4. If you have one file, or maybe two, use BerkeleyDB or similar.
5. Flat files are fine for config.
I'm not sure we need yet another category here. Then again, we're now seeing things surfacing like database sharding which currently limits all data interaction to whichever application managed the data distribution. It would be nice to see a DB capable of hiding such things behind the classic SQL engine so not every client app and API requires the chosen sharding method implemented in possibly mutually exclusive and buggy ways.

--
Read: Rabbit Rue - Free serial nove
Re:Laziness Rules by tepples · 2009-03-24 08:52 · Score: 2, Insightful

If you want ridiculous speed, and actively hate your data, use SQLite.
Care to explain why SQLite requires one to "actively hate [one's] data"?
Re:Laziness Rules by mlwmohawk · 2009-03-24 09:26 · Score: 1

1. If you want a full RDBMS, use Oracle, or PostgreSQL, or a similar ACID + SQL92 compliant DB.
2. If you don't really care, use MySQL.
3. If you want ridiculous speed, and actively hate your data, use SQLite.
4. If you have one file, or maybe two, use BerkeleyDB or similar.
5. Flat files are fine for config.
This is a perfect example of the thinking of those who know nothing about databases. It is nonsensical conventional wisdom created by ignorance.
"If you don't really care, use MySQL."
How do you know beforehand if you care or not? The floor is littered with projects that started as "simple" and grew. MySQL is a terrible database and almost any full RDBMS is better.
"If you want ridiculous speed, and actively hate your data, use SQLite."
What is "ridiculous speed?" On reads? Writes? Concurrency? Joins? Complex data selection? A full RDBMS will have HUGE speed advantages over MySQL and SQLite most every case.
"If you have one file, or maybe two, use BerkeleyDB or similar."
What does BerkeleyDB even bring to this discussion?
"Flat files are fine for config."
What if you want to share config?
Re:Laziness Rules by Anonymous Coward · 2009-03-24 09:41 · Score: 1, Informative

I presume he said that because SQLite doesn't actually keep track of a column's data type. So there's nothing in the database that explicitly keeps you from writing addresses and blog posts in a column titled "Date of Birth" (which in another DB would explicitly be a date type). At least, that's the only explanation I can think of.
Re:Laziness Rules by klui · 2009-03-24 09:42 · Score: 1

But unfortunately the FF developers used SQLite in such a way that I dread having to open my history pane.
Re:Laziness Rules by v(*_*)vvvv · 2009-03-24 09:54 · Score: 1

... Then you need some consistency checks... Then you need some triggers...
You are making the presumption that the project is evolving towards requiring these features. Not every project evolves. Many are bloated. They can be small and contained.

Clearly the developers of these databases did so to satisfy a need. And those with similar needs have contributed to their adoption. To think they were just too lazy to learn a real database and went ahead and built a database platform instead is just too interesting to be true. So I don't think the parent's question is something that can be refuted:

maybe these slacker DBs are perfect for what they're doing?
With that said, sure, there are a ton of people who need a real database who start off with a fake one... That can probably be said about a lot of other things we do.
Re:Laziness Rules by Anonymous Coward · 2009-03-24 10:08 · Score: 0

Lotus Notes doesn't even refer to itself as a "database" - in the text I read, they restrained themselves to calling it "document management system"
Re:Laziness Rules by H0p313ss · 2009-03-24 10:09 · Score: 1

Damien Katz, CouchDB's creator, worked at MySQL prior to writing CouchDB, and worked on Lotus Notes prior to that...
So many jokes, so little time.

--
XML is a known as a key material required to create SMD: Software of Mass Destruction
Re:Laziness Rules by Sancho · 2009-03-24 10:19 · Score: 1

I'm curious--are you actually advocating a full RDBMS for just about every data storage problem?
Re:Laziness Rules by encoderer · 2009-03-24 10:34 · Score: 1

Actually, he took the job at MySQL after CouchDB.
He worked on Lotus Notes (among other things) before CouchDB.
Re:Laziness Rules by shutdown+-p+now · 2009-03-24 10:57 · Score: 1

Isn't any simple SQL SELECT statement with an aggregate operator in it essentially equivalent to map-reduce? If so, what's stopping the database from optimizing it accordingly?
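
For a simple aggregate the two formulations do coincide, which a small sketch can show. The data and table name are made up; this says nothing about whether a particular engine actually parallelizes the SQL form, which is the open question in the comment above.

```python
import sqlite3
from collections import defaultdict

rows = [("ergo98", 5), ("sl0ppy", 2), ("ergo98", 3), ("tepples", 2)]  # made-up data

# Relational version: the engine decides how to group and sum.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (author TEXT, score INTEGER)")
conn.executemany("INSERT INTO comments VALUES (?, ?)", rows)
sql_result = dict(conn.execute("SELECT author, SUM(score) FROM comments GROUP BY author"))

# Map/reduce version: map emits (key, value) pairs, reduce folds the values per key.
emitted = defaultdict(list)
for author, score in rows:          # map
    emitted[author].append(score)
mr_result = {author: sum(scores) for author, scores in emitted.items()}  # reduce

print(sql_result == mr_result)
```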
Re:Laziness Rules by Trillan · 2009-03-24 11:04 · Score: 1

Do you have any benchmarks that back up MySQL being faster than SQLite? Every set I've seen has SQLite being slightly faster on most operations, which makes a lot more sense considering how much less work it does.
Re:Laziness Rules by metalhed77 · 2009-03-24 11:27 · Score: 1

I know most of these comments aren't too serious, but oddly enough his experience with notes is part of why couch has turned out so well.
His experience with notes is pretty relevant as notes and couchdb are both document oriented databases. CouchDB is VERY different than notes in a lot of ways, Damien designed Couch with a lot of the problems of notes in mind. So, you can think of his time with notes as time he spent learning what not to do.
IBM funds CouchDB development through Apache paying Damien. They initially wanted him to work for IBM as a regular employee, but he told them he was unwilling to work in that corporate structure, leading to the current compromise.

--
Photos.
Re:Laziness Rules by afidel · 2009-03-24 12:36 · Score: 1

Why? I keep 99 days of history and I'm a 20-50 tab at a time guy and my history tab takes all of 3-5 seconds to open.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:Laziness Rules by mlwmohawk · 2009-03-24 12:50 · Score: 1

I'm curious--are you actually advocating a full RDBMS for just about every data storage problem?
Not at all, but the above "conventional wisdom" list is bogus. It has nothing to do with how you choose a data storage technology. In the spirit of Billy Madison, we are all stupider for having read it.
Re:Laziness Rules by mlwmohawk · 2009-03-24 12:54 · Score: 1

Do you have any benchmarks that back up MySQL being faster than SQLite? Every set I've seen has SQLite being slightly faster on most operations, which makes a lot more sense considering how much less work it does.
This is my biggest problem with these discussions, before we talk about "performance" we need to understand how to quantify performance. Is it read? write? update? complex extraction? joins? concurrency? What combination?
To say SQLite is "fast" is meaningless unless you define the criteria by which you come to this conclusion.
Re:Laziness Rules by story645 · 2009-03-24 15:45 · Score: 1

Granted my background is with SQL Server and Oracle, so perhaps I'm not the target audience
I probably am the target audience as I'd never worked with db's (or even taken a course on them) before being required to last year. My first experiment with db's was the python interface to SQLite and I found it rather simple and pain free once I got the hang of it. Now I'm working with SQLAlchemy, and I think it's another great newbie db tool 'cause it offers a dozen ways to do things, so a newbie can start with sql/sql like commands and work up to orms or whatever else.

--
open source modern art: laser taggi
Re:Laziness Rules by Anonymous Coward · 2009-03-24 17:00 · Score: 0

his experience with notes is part of why couch has turned out so well.
What I learned from Notes is that any app will turn its database into a sack of incomprehensible crap after a couple of years of active development and turnover. You can't fix this with views unless you know how to interpret the fields of the underlying documents, and there comes a point where not even a developer can look at them and give you a definitive answer.
If your DB doesn't enforce a schema and constraints and migration of old data, either you build another database system on top of it (far more effort than just choosing one that works from the start) or the whole system is living on borrowed time.
Re:Laziness Rules by Trillan · 2009-03-24 17:32 · Score: 1

When I posted that original reply, I'd misread what you wrote and thought you said SQLite had a huge speed advantage over MySQL. They're usually comparable, with some operations being faster for one or the other. I don't think speed is a good reason to pick one or the other, given how different they are in other ways, and from the way I'd misunderstood your question I was really wondering what you'd come up with.
But now that I actually understand what you wrote (heh): what areas does a real database system have a huge speed advantage? Concurrency, certainly, is a big weakness of SQLite (though it isn't bad in the small scale).
Re:Laziness Rules by dkf · 2009-03-24 18:15 · Score: 1

But now that I actually understand what you wrote (heh): what areas does a real database system have a huge speed advantage? Concurrency, certainly, is a big weakness of SQLite (though it isn't bad in the small scale).
Curiously, the SQLite devs say that the point when you should be switching up to something like Postgres or Oracle is when you are getting real problems with concurrent writes. In other words, they acknowledge this as an issue and say that it is a non-goal. (On the other hand, the other DB engines listed are a lot more heavyweight in administration terms, so you really are paying for this additional power. That is good and proper...)

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:Laziness Rules by fractoid · 2009-03-24 18:30 · Score: 1

To say SQLite is "fast" is meaningless unless you define the criteria by which you come to this conclusion.
And yet it is somehow meaningful to say that "A full RDBMS will have HUGE speed advantages over MySQL and SQLite most every case."?

--
Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
Re:Laziness Rules by fractoid · 2009-03-24 18:36 · Score: 1

I can't help feeling that this is the best argument yet for something like SQLite. You start off with an ultralightweight DBMS, and then when you need a 'real' database your app is already written around using a SQL DB, so it's less painful to convert.

--
Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
Re:Laziness Rules by mlwmohawk · 2009-03-24 22:30 · Score: 1

To say SQLite is "fast" is meaningless unless you define the criteria by which you come to this conclusion.
And yet it is somehow meaningful to say that "A full RDBMS will have HUGE speed advantages over MySQL and SQLite most every case."?
Yes, because in almost every metric, a full blown RDBMS WILL have an advantage. Look at something like Oracle, DB2, or PostgreSQL. The objective of these systems starts and ends with "performance" and reliability on fairly large machines. MySQL started life as mSql, a small "sql" like database for small machines.
MySQL is not designed to be "high performance" and every site that relies on it, like Slashdot for instance, has to do a lot of heavy lifting just to work around its limitations.
In a 99% read / 1% write environment, MySQL may be fine as long as you are doing simple queries with simple data. Add some complexity to the data and queries and the SQL planner does a bad job at constructing access plans for queries, and performs like crap.
As a contractor in 2006 I wrote a data analysis system for Yahoo. I told them I needed PostgreSQL or Oracle, they wanted FreeBSD, so I used PostgreSQL. I had one very cool query, it went through a few hundred million rows of data and aggregated and characterized performance data for an arbitrary grouping of machines. Well, that query ran in an acceptable 2 seconds. A new VP came in and wanted MySQL because they had internal support. On MySQL it took 20 minutes. They had some "MySQL Experts" on hand to help me and they couldn't make it run any faster.
What people don't realize is that a good RDBMS is a powerful beast, and it isn't until you run into issues related to the science of data access that you appreciate just how awesome they are.
Re:Laziness Rules by pstorry · 2009-03-25 00:46 · Score: 1

Having followed Damien's CouchDB project since its inception, I'd say it's important to note that he didn't know much about TRADITIONAL databases when he started.
By traditional, he'd mean relational.
But he's trying to build a document-oriented database.
A relational database has about as much to do with a document-oriented database as American Football has to do with Football. There's a little common ground (flat green playing surface, two teams, a football) but even in that common ground we find that the definitions hide differences which make them impossible to interchange. (Try changing a football for a football and then playing football. Doesn't matter which side you approach that sentence from, you're not going to be happy with the results...)
By building a schema that splits "documents" apart and puts them into records in many separate tables, an RDBMS gets consistency and efficiency across the whole store. It does so at certain costs - for instance higher maintenance costs on change, as the schema must be updated and current data massaged to fit.
By building without schemas and allowing each document to contain what it likes, a document-oriented database gets consistency and flexibility for all documents. It does so at certain costs - for instance higher costs of storage and querying performance, as identical fields in documents are reproduced rather than referenced, and must be stored/queried each time.
But both approaches have their benefits. The RDBMS gets faster querying and better storage efficiencies. The document-oriented database gets document integrity and the possibility of easier versioning and security functions.
Neither approach is globally right. Each has its use cases. If I had to build a database that tracked ticket sales/seat allocations in realtime for a worldwide airline, I'd pick an RDBMS.
But if I wanted to build a discussion forum or a document management system, I'd be more inclined to pick the document-oriented database system.
To say "it's as rational as rolling your own encryption from scratch without the slightest clue about encryption algorithms or theories" merely shows your own ignorance of the document-oriented approach.
You might as well say that it's as rational as trying to build a boat when he has no knowledge of motorbike maintenance...
Still, your title was right. Laziness rules when it comes to people informing themselves before posting on Slashdot... ;-)
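
The trade-off described above, between storing a document whole and splitting it across tables, can be sketched side by side. This is an illustration only: the story/comment layout is invented, JSON text stands in for whatever a document store actually persists, and sqlite3 stands in for the RDBMS.

```python
import json
import sqlite3

# Document style: one self-contained record; field names repeat in every document.
document = {
    "title": "Slacker DBs",
    "comments": [
        {"author": "ergo98", "body": "Laziness rules"},
        {"author": "pstorry", "body": "Neither approach is globally right"},
    ],
}
stored_document = json.dumps(document)  # stand-in for what a document store persists

# Relational style: the same data split across two tables and reassembled on read.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE story (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE comment (story_id INTEGER REFERENCES story(id), "
             "position INTEGER, author TEXT, body TEXT)")
conn.execute("INSERT INTO story VALUES (1, ?)", (document["title"],))
conn.executemany(
    "INSERT INTO comment VALUES (1, ?, ?, ?)",
    [(i, c["author"], c["body"]) for i, c in enumerate(document["comments"])],
)

# Rebuilding the document takes a query per table (or a join).
title = conn.execute("SELECT title FROM story WHERE id = 1").fetchone()[0]
comments = [
    {"author": a, "body": b}
    for a, b in conn.execute(
        "SELECT author, body FROM comment WHERE story_id = 1 ORDER BY position")
]
rebuilt = {"title": title, "comments": comments}
print(rebuilt == json.loads(stored_document))
```

Adding a new field to one document is a one-line change on the document side; on the relational side it means altering the schema for every row, which is the maintenance cost the comment mentions.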
Re:Laziness Rules by jedidiah · 2009-03-25 01:26 · Score: 1

> Look at something like Oracle, DB2, or PostgreSQL. The objective of these systems starts
> and ends with "performance" and reliability on fairly large machines. ...um no.
The number one objective of something like Oracle or DB2 is correctness.
Then comes reliability.
"performance" is a tertiary consideration at best.
If SQLite is for when you "hate" your data, Oracle is for when you "love" it.

--
A Pirate and a Puritan look the same on a balance sheet.
Re:Laziness Rules by Prof.Phreak · 2009-03-25 02:43 · Score: 1

Urgh. I still use Lotus Notes every freaking day. Urgh!

--
"If anything can go wrong, it will." - Murphy
Re:Laziness Rules by DuckDodgers · 2009-03-25 07:07 · Score: 1

What's in your Postgres bag of tricks? We certainly have acceptable speed with it, but we have one table with 5 million rows, 1 with 2 million rows, and the rest are all tiny. I understand and use indexes - what are the other tricks? Master the query analyzer? Liberal use of partitions?
Re:Laziness Rules by mlwmohawk · 2009-03-25 13:39 · Score: 1

Well, what do you mean by understanding indexes? One of the biggest problems I've seen is stuff like this:
create table froboz
(
foo integer,
bar integer, .....
);
create index froboz_foo on froboz(foo);
create index froboz_bar on froboz(bar);
then doing this:
select * from froboz where foo='x' and bar='y';
And expecting that both indexes will help. Depending on the distribution of the data within the table using both indexes could REDUCE performance.
However, if you did this:
create index froboz_foobar on froboz(foo,bar);
Your query could be orders of magnitude faster.
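A minimal sketch of how one might check this, using Python's built-in sqlite3 module as a stand-in. SQLite's planner is not PostgreSQL's, so this only illustrates the idea of asking the planner which index it picks; the helper name plan_for and the sample data are assumptions made up for the example.

```python
import sqlite3

def plan_for(index_statements):
    """Build a small froboz table, create the given indexes, and return
    the text of SQLite's query plan for the two-column lookup."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE froboz (foo INTEGER, bar INTEGER)")
    # Made-up data: 100 distinct foo values, 7 distinct bar values.
    conn.executemany(
        "INSERT INTO froboz VALUES (?, ?)",
        [(i % 100, i % 7) for i in range(10000)],
    )
    for stmt in index_statements:
        conn.execute(stmt)
    conn.execute("ANALYZE")  # refresh planner statistics
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM froboz WHERE foo = 1 AND bar = 1"
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail.
    return " ".join(str(row[-1]) for row in rows)

separate_plan = plan_for([
    "CREATE INDEX froboz_foo ON froboz(foo)",
    "CREATE INDEX froboz_bar ON froboz(bar)",
])
composite_plan = plan_for([
    "CREATE INDEX froboz_foobar ON froboz(foo, bar)",
])

print("separate :", separate_plan)
print("composite:", composite_plan)
```

On PostgreSQL the equivalent check would be EXPLAIN (or EXPLAIN ANALYZE) on the same query, before and after adding the composite index.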
Re:Laziness Rules by DuckDodgers · 2009-03-26 01:57 · Score: 1

Thanks for the example. I'll check if there are places I made that exact error, or any similar to it. We implemented indexes by the simple expedient of adding an index, running vacuum analyze to update the query planner, and then running benchmark queries against the database. Rinse, lather, and repeat for a few different columns in a few different tables, and keep the indexes that gave a significant query performance boost.

But I have a few queries I just can't figure out how to speed up, or determine a database refactor that makes sense. I still have a lot to learn.
Re:Laziness Rules by mlwmohawk · 2009-03-26 02:18 · Score: 1

I'm available for a nominal fee :-)
Take a look here:
http://www.mohawksoft.org/?q=node/56
Re:Laziness Rules by Trillan · 2009-03-26 09:13 · Score: 1
SQLite locks the entire database (a single file) using the file system. This is fast and simple, but breaks down when you have a lot of tasks trying to update the database at once. Fixing it would introduce a lot more complexity, so they're just not going to do it.
If you need a lot of concurrent updates, SQLite is clearly not your engine.
The best description I ever read was that SQLite was a replacement for fopen, not an RDBMS.
But I've always been curious where else (if anywhere) it really fails as a database.
- I've run into some poorly-performing queries, but I'm not convinced a "real" database would handle any of them any better.
- The query optimizer is weak (that it exists at all is something of a surprise), but that's just a matter of tuning queries better. The query optimizer rarely causes trouble, and in that case you just disable it using CROSS JOIN instead of JOIN.
- Lack of foreign key constraints is troubling, but not critical.
- "Manifest typing" (store anything in any column) combined with "column affinity" (convert if possible to the column's type, but store anyway if you can't) is as often an advantage as a disadvantage.
So I'd say SQLite is a good pick for non-server applications, and possibly small-scale server applications where a "real" database isn't available. Certainly, I wouldn't hesitate to use it in a desktop (or embedded) application, but I'd probably use something else on a server. Particularly since that "something else" would probably already be installed and running.
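To make the manifest typing and column affinity point concrete, here is a small sketch using Python's built-in sqlite3 module. The table name t and the sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column n is declared INTEGER, which gives it integer affinity.
conn.execute("CREATE TABLE t (n INTEGER)")

# Insert two strings: one that looks like a number and one that does not.
conn.executemany("INSERT INTO t VALUES (?)", [("42",), ("forty-two",)])

# typeof() reports the storage class SQLite actually used for each value.
stored = conn.execute("SELECT n, typeof(n) FROM t ORDER BY rowid").fetchall()
conn.close()

for value, storage_class in stored:
    print(repr(value), storage_class)
```

The numeric-looking string is converted to an integer, while the other string is stored as text in the same INTEGER column without any error.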

Well, it's like... by oldhack · 2009-03-24 06:37 · Score: 3, Funny

Either is cool with me, as long as they are cool and take care of business, you know what I am saying?

It's all good.

--
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.

Re:Well, it's like... by Anonymous Coward · 2009-03-24 09:25 · Score: 0

Sup dawg. We heard you like slacker dbs. So we deleted a column out of each of your tables so your db can slack on returning all the data.

Old guard, new guard, right guard... who cares? by Anonymous Coward · 2009-03-24 06:38 · Score: 1, Insightful

Why the need to make it 'old guard' vs 'new guard'... seems like flamebait for fanboys.
"tastes great" vs "less filling", or just explain the merits of both and leave it there.
It's like forum kiddies arguing raid 5 vs raid 1+0.

Yes, if the database is important, you want the most CAREFUL management available. Obviously.
But if these -db apps work fine, and your data isn't corporate mission critical, who cares?

Seems to me convenience and interoperability score higher for most small datasets, am I wrong?

Re:Hackers. by TheSpoom · 2009-03-24 06:41 · Score: 1, Funny

If I could do a security audit on a website by flying through a psychedelic 3D futurescape, I might just become a workaholic.

--
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs

DBs they may be ... by Anonymous Coward · 2009-03-24 06:42 · Score: 1

... it's when they get referred to with "relational" or "management system". DB fine. RDBMS they are not.

Cartesian products are GONE!!! YAYYYYY...... by bodland · 2009-03-24 06:43 · Score: 1

What is a Cartesian?....is that the water in Olympia Beer?

Re:Cartesian products are GONE!!! YAYYYYY...... by KagatoLNX · 2009-03-24 07:03 · Score: 1

Actually, cartesian products and joins aren't gone. It turns out that they just end up being done client side.
It's the tragedy of "join-less" databases. Joins do something that you need. The lack of joins forces people to correctly normalize, which ironically they should have been doing anyway. It doesn't take away any genuine need to join though. :(
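A rough sketch of what "done client side" tends to look like, with plain Python dicts standing in for two key/value stores. The users and orders data and the function name client_side_join are hypothetical, not taken from any of the products discussed.

```python
# Two "tables" held in separate key/value stores (plain dicts here).
users = {
    1: {"name": "alice"},
    2: {"name": "bob"},
}
orders = {
    100: {"user_id": 1, "item": "widget"},
    101: {"user_id": 2, "item": "gadget"},
    102: {"user_id": 1, "item": "gizmo"},
}

def client_side_join(orders, users):
    """Do by hand what 'orders JOIN users ON user_id' would do in SQL."""
    joined = []
    for order_id, order in sorted(orders.items()):
        user = users.get(order["user_id"])  # one extra lookup per order
        if user is not None:                # inner-join semantics
            joined.append((order_id, user["name"], order["item"]))
    return joined

for row in client_side_join(orders, users):
    print(row)
```

Against a real remote store, each users.get() would be a network round trip, which is where the cost of joining in the application shows up.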

--
I think Mauve has the most RAM. --PHB (Dilbert Comic)
Re:Cartesian products are GONE!!! YAYYYYY...... by aquatone282 · 2009-03-24 07:42 · Score: 1

What is a Cartesian?....is that the water in Olympia Beer?
That wasn't water the Cartesians put in the beer...
Ask anyone who ever drank Oly and they'll tell you what it was.
Signed,
A former Olympia resident.

--
What?
Re:Cartesian products are GONE!!! YAYYYYY...... by Deagol · 2009-03-24 14:21 · Score: 1

Are there, or aren't there, Cartesians?
Do you think one will ever be found?

--
Method of processing duck feet

a base of data by poot_rootbeer · 2009-03-24 06:44 · Score: 4, Insightful

"tools that tack the letters 'db' onto a pile of code that breaks with the traditional relational model"

If "database" were intended to mean only "relational database", we wouldn't have had any need for the latter term...

who needs transactions? by alen · 2009-03-24 06:47 · Score: 3, Insightful

the article is right that in some cases it doesn't matter if a transaction is lost. but in any case where money is involved it's a must. you can't just start a fund from your Oracle or SQL Server savings to pay for mistakes because it will kill your brand and you may lose a lot of future business. and any savings will be eaten up by the extra cost to hire people to solve all the data problems

i've seen this. no constraints on the data that is originally put in, not enough referential integrity and you get customers opening up a lot of trouble tickets and you end up hiring people to clean up the data every time a mistake is found

Re:who needs transactions? by rackserverdeals · 2009-03-24 08:04 · Score: 1

i've seen this. no constraints on the data that is originally put in, not enough referential integrity and you get customers opening up a lot of trouble tickets and you end up hiring people to clean up the data every time a mistake is found
Really not trying to troll here, but this isn't too far from what a lot of people are dealing with when they use MySQL, especially the MyISAM engine.
A lot of people are using MySQL so it's just another step in the same direction.
In some projects, RDBMSs aren't necessary. Look at what Google's been able to do with Bigtable/MapReduce. The open source equivalent seems to be Apache's HBase in the Hadoop project.

--
Dual Opteron < $600
Re:who needs transactions? by TheSpoom · 2009-03-24 09:50 · Score: 1

Oh, come on. MySQL suffers from the same thing that PHP does; that it's industry standard and easy to use.
That doesn't make it a bad tool.
If you want transactions, all you have to do is use an engine that supports them, like InnoDB, which is fully ACID compliant and has been in MySQL for a long, long time. Using MyISAM for things that require transactions (such as purchases or finances) is just sloppy design.

--
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs
Re:who needs transactions? by Dragonslicer · 2009-03-24 11:18 · Score: 2, Interesting

Oh, come on. MySQL suffers from the same thing that PHP does; that it's industry standard and easy to use.
The bigger thing that they both suffer from is having a rather poor history. The problem with people saying how bad they are is that the complaints are based on old versions. PHP5 is much better than PHP4 or PHP3, and MySQL is steadily becoming something resembling a real database (5.0 is good, in particular if you use InnoDB, 4.1 was decent, but anything below 4.1 barely qualifies as a database).
Re:who needs transactions? by Simetrical · 2009-03-25 03:46 · Score: 1

MySQL is steadily becoming something resembling a real database (5.0 is good, in particular if you use InnoDB, 4.1 was decent, but anything below 4.1 barely qualifies as a database).
4.0 works fine for Wikipedia.

--
MediaWiki developer, Total War Center sysadmin

distributed databases and P2P by thanasakis · 2009-03-24 06:52 · Score: 4, Informative

The problem of distributed consistency has kept researchers occupied for quite a while. For example, see project Scalaris. They are using a distributed hash table to distribute data among many nodes. This should be relatively easy, at least once you have a good hashing function on your hands. But a lot of research has been done on P2P networks during the last decade, so there is quite a lot of stuff to read and take ideas from.
The interesting part is that it can maintain consistency and support ACID properties. From the site it appears that they accomplish that by using a modified Paxos Algorithm which basically is a way to maintain consensus among many different peers in a non-Byzantine system (this means that there are no malevolent peers in the system -- peers can break down and cease working but not sabotage the system). Leslie Lamport of Microsoft Research has done a lot of work on this, anyone interested may take a look at his papers, very advanced stuff there.
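As an illustration of the "distribute data among many nodes with a hash function" part only, here is a minimal consistent-hashing sketch. It is not Scalaris's actual scheme and it says nothing about Paxos or ACID; the class name HashRing, the helper _point, and the node names are assumptions for the example.

```python
import hashlib
from bisect import bisect_right

def _point(label):
    """Map a string to a position on the ring using SHA-1."""
    return int(hashlib.sha1(label.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Each key belongs to the first node clockwise from the key's hash."""

    def __init__(self, nodes):
        self.ring = sorted((_point(node), node) for node in nodes)
        self.points = [point for point, _ in self.ring]

    def node_for(self, key):
        # Wrap around to the first node when the key hashes past the last one.
        index = bisect_right(self.points, _point(key)) % len(self.ring)
        return self.ring[index][1]

ring = HashRing(["node-a", "node-b", "node-c"])
for key in ["user:1", "user:2", "comment:42"]:
    print(key, "->", ring.node_for(key))
```

The property that makes this attractive for peer-to-peer systems is that when a node leaves, only the keys that node owned have to move to another node.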

Who? by Jaysyn · 2009-03-24 06:56 · Score: 0

Aside from Google, I've never even heard of those "upstarts".

--
There is a war going on for your mind.

Re:Who? by DigitalSorceress · 2009-03-24 07:25 · Score: 1

... and from the description, the Google solution reminds me a LOT of Zope.
For those who don't know, Zope is a CMS written in Python and the database is pretty much a really big object. (In fact, I think it's technically an OODBMS instead of an RDBMS, but I could just be talking out my arse.)
Zope doesn't have a learning curve, it has a learning butte, but once you scratch and claw your way to the summit, the view is kinda nice... It's just that when you get there, you can't help but to notice that there's a whole tour bus of geriatric patients that got a nice air conditioned ride to the same spot on the SQL express.
I think I had a point in there somewhere.
Oh yeah, I remember now... GET OFF MY LAWN!

--

The Digital Sorceress

MySQL not listed as toy ? by nicolas.kassis · 2009-03-24 06:57 · Score: 1

Seriously any Old Guard DBA will put MySQL in the toy category.

Re:MySQL not listed as toy ? by dacut · 2009-03-24 07:18 · Score: 3, Insightful

MySQL strives to provide RDBMS and ACID semantics, though its quality of service (QoS) may fall short. By contrast, these "slacker" databases don't even try to support RDBMS or ACID; even if they operated perfectly, they won't provide RDBMS/ACID.
I work for one of the companies in question (no, I don't speak for them). We rely heavily on a combination of these "slacker" dbs, Berkeley dbs, memcached, Oracle, flat files, and tape backups. Each fills a niche. I wish these articles would quit trying to create a false dichotomy.
Re:MySQL not listed as toy ? by Anonymous Coward · 2009-03-24 07:42 · Score: 0

At least every "Old Guard DBA" will be relieved to know that you are here to speak in their name.

List of non-relation DBs by captainclever · 2009-03-24 07:00 · Score: 1

I wrote an article about non-relational databases, and there were some interesting comments about the various tradeoffs etc: http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/

--
Last.fm - join the social music revolution

Re:List of non-relation DBs by Anonymous Coward · 2009-03-24 07:27 · Score: 0

I wrote an article about non-relation databases [....]
Yeah, I wrote one, too. But I stored it in my non-relational DB and now it's gone.
*sigh*

You young whippersnappers don't know nothing! by www.sorehands.com · 2009-03-24 07:00 · Score: 4, Interesting

Relational DB? People forget Network Model Databases (http://en.wikipedia.org/wiki/Network_model) and flat databases.

Network model databases will outperform relational all the time. You just don't have the same flexibility.

Newer models are not based on the design or performance issue, but the distribution of the data. These are not invalid reasons, but the old issues still apply.

I have had arguments with people who consider PC programming different from mainframe. The same rules apply. The difference is that many PC programmers are just sloppier. When you have cheap CPU and memory, people don't analyze and optimize as much.

--
Fight Spammers!

Re:You young whippersnappers don't know nothing! by hey · 2009-03-24 08:16 · Score: 1

That network model looks useful. Too bad there don't seem to be any readily available implementations to try out.
Re:You young whippersnappers don't know nothing! by www.sorehands.com · 2009-03-24 10:24 · Score: 1

Implementations are available, but they are not free. See http://www.mdbs.com/prod_tde.htm and http://www.raima.com/

--
Fight Spammers!

I've never understood the UNIX world's fascination by Richard+Steiner · 2009-03-24 07:00 · Score: 5, Informative

I've never understood the UNIX world's fascination with relational databases.

Speaking as a programmer in mainframe online transaction environments for the past 20+ years, I've become very familiar with very fast and simple database systems like the "freespace" files we use on the Unisys mainframe platform.

We don't need relations for real-time processing. Most programs just need a place to keep data, and a simple key to retrieve that data. Some efficiency in disk usage is nice, but the primary design factor is performance.

A freespace file is a collection of pre-allocated fixed-length records of various sizes (e.g. 256 bytes, 512 bytes, 1024 bytes, 2048 bytes, 4096 bytes, and 8192 bytes). Each record size is assigned a type number (e.g., 1 through 6 in the above case), and a given file is created and pre-allocated with a mix of various records depending on the usage pattern for that particular file. If you know all you need is tiny records, create a file containing a few hundred or thousand type 1 and maybe type 2 records.

Records not allocated are filled with a deallocated fill pattern.

A program uses a record by performing a Write New operation. That tells the database manager to find a record in that file closest and >= to the size required, stick the presented buffer in the record, save it, and return a key to that record to the calling program. Typical key format is where Record Number is a number from 1 ... n. If your file has 1000 Type 3 records, it'd be from 1...1000 or 0...999.

To read a record, use a key from a previous Write New (stored away somewhere, perhaps in another file) to read that record from a file. Length is not required.

Programs use a very simple read-and-lock mechanism when modifying existing records. If one program has a record locked, another program must wait. Not a problem with intelligent coding.

We've used this system in airline systems for 40+ years. It works well. Sometimes an environment has robust commit and rollback/recovery features to allow for an entire series of changes to be rolled back on error, sometimes not. It doesn't seem to matter that much, especially for transient data like weather, flight schedule data, etc.

I would LOVE to see a freespace database ported to Solaris, personally. We'd use it heavily. :-)
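For readers who have never seen one, a toy in-memory sketch of the Write New / read-by-key idea described above might look like this. It is not the Unisys implementation: the class name FreespaceFile, the method names, the (type, record number) tuple used as a key, and the fall-through to a larger record type when the best-fitting type is full are all assumptions made for illustration, and locking is left out entirely.

```python
# Record type number -> fixed record size in bytes (first three sizes above).
RECORD_SIZES = {1: 256, 2: 512, 3: 1024}

class FreespaceFile:
    """Toy model: pre-allocated fixed-length slots, addressed by key."""

    def __init__(self, counts):
        # counts maps a record type to how many records to pre-allocate.
        # None marks a deallocated slot (a real file would hold a fill pattern).
        self.slots = {rtype: [None] * n for rtype, n in counts.items()}

    def write_new(self, data):
        """Store data in the smallest free record that fits; return its key."""
        for rtype in sorted(self.slots, key=lambda t: RECORD_SIZES[t]):
            if len(data) > RECORD_SIZES[rtype]:
                continue  # record type too small for this buffer
            records = self.slots[rtype]
            for index, slot in enumerate(records):
                if slot is None:
                    records[index] = bytes(data)
                    return (rtype, index + 1)  # record numbers start at 1
        raise RuntimeError("no free record large enough")

    def read(self, key):
        """Fetch a record by the key returned from write_new."""
        rtype, number = key
        data = self.slots[rtype][number - 1]
        if data is None:
            raise KeyError(key)
        return data

weather = FreespaceFile({1: 2, 2: 1})
key = weather.write_new(b"MSP 24/1953Z wind 270 at 10")
print(key, weather.read(key))
```

The caller has to keep the returned key somewhere, which is the whole retrieval model: no searching, just direct access by key.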

--
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.

When is a database not a database? by MonolithicX · 2009-03-24 07:04 · Score: 1

I think the question here isn't New DB or Old DB but when do you stop considering any data store a database? There are plenty of ways to write data to disk fast as hell but God help you if you want to do anything with the data later. I see these as specialty data stores - get the data in fast and then batch it out to your "old school" relational database to perform analytics on it later. Relationally Yours, MonoX.

Re:I've never understood the UNIX world's fascinat by Anonymous Coward · 2009-03-24 07:06 · Score: 0

What does this have to do with UNIX? Relational DBs were invented in the mainframe world.

Re:I've never understood the UNIX world's fascinat by Richard+Steiner · 2009-03-24 07:06 · Score: 1

Oops. Forgot that brackets get eaten. Typical record format is RECORDTYPE/FILENUMBER/RECORDNUMBER. The first Type 1 record for File 100 might look like 01-0127-0001 or whatever (specific binary representation in hex or octal would obviously vary depending on implementation and preference).

In our case, it's a 36-bit word shown as 12 octal digits, probably not a popular choice with UNIX folks. :-)

--
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.

Harsh? by Bobb+Sledd · 2009-03-24 07:09 · Score: 2, Insightful

I'm a DB admin, and I use things that aren't toys; but what I've heard here is kinda harsh.

Look, it's all about "right tool for the right job." Why do you need a nuclear-powered drill that can make a tunnel from here to China, when really all you needed was a shovel?

For most daily projects that have small amounts of data, they may be using something like Crystal Reports or Excel or SPSS that just does all the number-crunching client-side anyway. You don't always need Oracle or [favorite DB flavor] for that.

--
"They said I probly shouldn't fly with just one eye," "I am Bender. Please insert girder."

Re:Harsh? by mattygabe · 2009-03-24 07:48 · Score: 1

I'm a DB admin, and I use things that aren't toys; but what I've heard here is kinda harsh.
Look, it's all about "right tool for the right job." Why do you need a nuclear-powered drill that can make a tunnel from here to China, when really all you needed was a shovel?
For most daily projects that have small amounts of data, they may be using something like Crystal Reports or Excel or SPSS that just does all the number-crunching client-side anyway. You don't always need Oracle or [favorite DB flavor] for that.
What?! We shouldn't suggest that our company buy the biggest, baddest, best-performing supermachine just based on the cost efficiency that it can achieve if pushed to its optimum limit? Even if we only need a shovel? You're no fun!
Re:Harsh? by Bobb+Sledd · 2009-03-24 09:31 · Score: 1

No no!!
I just don't want their shitty little projects on my supermachine!
So I give them MS-Access or the crappy MSSQL version that runs on their own box or something... and I keep it off my big machine!

--
"They said I probly shouldn't fly with just one eye," "I am Bender. Please insert girder."
Re:Harsh? by maaleron · 2009-03-24 11:46 · Score: 1

Cuz it's a long way to China? Duh...
Re:Harsh? by EvilBudMan · 2009-03-27 05:12 · Score: 1

XTreeGold 3.0 still seems to work for what I need. Put stuff in folders and make use of alt log branch /-}

I feel old by a2wflc · 2009-03-24 07:12 · Score: 2, Informative

When I saw the title I thought "I'm old-guard". Then I read the article and JOINs are a key concept to the old-guard.

My first few DB apps involved using a b-tree or ISAM library (or writing our own). Then the "new guys" started wanting to pay for a server that did JOINs. We did JOINs, just at the app layer and without the guaranteed consistency that a good relational design gives you. And getting a server that does it was expensive.

I wouldn't want to go back to pre-relational server days, but am also very thankful that I did write my own DBs from the ground up. I will probably never need to use the entire experience, but can often use bits and pieces of it, and I appreciate a good key/value store.

Re:I feel old by __aasqbs9791 · 2009-03-24 07:25 · Score: 3, Funny

I was listening to the radio (didn't pay attention to the station it was on) one day and generally liking the music I was listening to on it. Then the station ID came across between songs. It was the "oldies" station. I suddenly felt like I needed a cane (or perhaps a walker). Why does that happen? And is it going to happen every 10 years or so? I don't think I can take too many more of those moments.
Re:I feel old by sysrammer · 2009-03-24 12:14 · Score: 1

I know what you mean. I remember back when the local oldies station KRTH started playing classic rock. I said, "No, no, that's not oldies! Oldies are 50's songs, pre-Beatles stuff. We have a quite serviceable classic rock station w/ KLOS. WTF?" I was quite annoyed.
sr

--
His ignorance covered the whole earth like a blanket, and there was hardly a hole in it anywhere. - Mark Twain

SELECT * FROM SNARKY_COMMENTS by billstewart · 2009-03-24 07:13 · Score: 4, Funny

Can't quite fit the whole query into the title box, but if you were using one of those databases that Wayner's article talked about, you'd be able to query and find out if you were first...

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

Re:I've never understood the UNIX world's fascinat by Richard+Steiner · 2009-03-24 07:14 · Score: 1

01-0127-0001 is the first type 1 record for file 127. 01-0100-0001 would be the first for file 100. That's what I get for doing patchwork re-editing an existing message before sending it...

--
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.

Worse is better by oGMo · 2009-03-24 07:15 · Score: 1

...especially when you don't know what "better" is and you're too lazy to learn: unwillingness to learn is stupidity. Like the quote says, ignorance is curable; stupidity is terminal.

People who use these things and think they're great and that they're doing amazing things don't realize that the time they're taking and the problems they're struggling with are long-solved trivialities. Nothing new. Nothing cool. It's like someone struggling with a bunch of complicated excel formulas to make their spreadsheet do something that you could do in a few lines of your favorite scripting language.

Unfortunately this is something that afflicts most of the industry these days, and we end up thinking half-assed pieces of crap are cool just because you see them in your browser.

--

Don't think of it as a flame---it's more like an argument that does 3d6 fire damage

Re:Worse is better by Anonymous Coward · 2009-03-24 13:51 · Score: 0

ignorance is curable; stupidity is terminal.
Hopelessly optimistic.

Berkeley DB is awesome by IGnatius+T+Foobar · 2009-03-24 07:18 · Score: 4, Interesting

I can't believe there hasn't been any mention of Berkeley DB yet. Guess what, folks: sometimes you just don't need the features of a full relational database. Sometimes all you need is fast, robust, reliable storage of indexed key/value pairs.

I can attest that Berkeley DB does exactly that, and does it really, really well. We use Berkeley DB for all of the data storage in the Citadel system, including the mailboxes themselves. Some sites have tens of gigabytes or even hundreds of gigabytes of data, and Berkeley DB just keeps chugging along, happily and reliably doing its thing. Our biggest problem? People who point at it and say "storing email in a database is unreliable" because they know it constantly explodes when Exchange does it. Well guess what, folks: Berkeley DB ain't the Exchange database (actually, maybe Exchange wouldn't be so unreliable if they switched to Berkeley DB).

Eschewing the full set of RDBMS features isn't slacking. It's choosing the right tool for the job.

--
Tired of FB/Google censorship? Visit UNCENSORED!

Re:Berkeley DB is awesome by Foresto · 2009-03-24 08:57 · Score: 3, Informative

For others who are interested in Berkeley-style key-value stores, check out Tokyo Cabinet.
Re:Berkeley DB is awesome by shutdown+-p+now · 2009-03-24 11:00 · Score: 1

Well guess what, folks: Berkeley DB ain't the Exchange database (actually, maybe Exchange wouldn't be so unreliable if they switched to Berkeley DB).
Exchange uses a database backend similar in features to BDB internally. Yes, it also has journalling and ACID transactions. I don't know what the problem with Exchange mail loss might be, but it may not necessarily be the DB problem.
Re:Berkeley DB is awesome by drfreak · 2009-03-24 13:42 · Score: 1

I've looked into Berkeley DB. It is a great engine, but the issue still exists where if you are not familiar with the application that is using it, you're out of luck. Maybe it has changed since when I looked, but the application handles its own metadata and must enforce its own relations if you need any referential integrity.
Re:Berkeley DB is awesome by IGnatius+T+Foobar · 2009-03-29 15:31 · Score: 1

Maybe it has changed since when I looked, but the application handles its own metadata and must enforce its own relations if you need any referential integrity.
Ummm ... yeah, that's pretty much exactly the point. If you want the database to enforce relations then you use a relational database. If you want that logic to be in the application's domain then you use something like Berkeley DB.

This isn't a set of tinkertoys to be used for a barista-turned-programmer to develop a shiny but useless Web 2.0 application. It's a library that implements a very simple but insanely reliable data store that embeds directly into the application that uses it. It does its job exceptionally well, and doesn't attempt to do any other tool's job.

--
Tired of FB/Google censorship? Visit UNCENSORED!

Re:I've never understood the UNIX world's fascinat by Richard+Steiner · 2009-03-24 07:23 · Score: 1

In my experience, most UNIX programmers tend to assume a relational database for almost everything if it isn't a vanilla flat file. That includes programmers in realtime applications, C people, Java people, etc. I can't begin to tell you how many applications I've seen written to use Oracle, Sybase, etc., just to store a simple static table of information. There's no POINT to that!

Most mainframe environments, on the other hand, have many established options, and relational is usually only considered if you actually need that type of functionality. Normally, systems are written with a mix of different database types. Could be flat files, could be freespace, could be RDMS, or could (in our case) be DMS, a network database with some types of set-linking properties but not really table-based.

Again in my experience, working in the airline industry, I've seen a bias towards RDMS for enterprise applications that I simply haven't seen on the mainframe (in my case Unisys transaction processing) side of life.

A web site is a transaction processing facility. Just replace fancy 3270 or Uniscope screens with HTML. Same idea, forms, etc. It wants in and out, fast. Why use something not really made for that?

--
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.

"eventual consistency"? WTF? by wardk · 2009-03-24 07:24 · Score: 1

yeah, who wants consistent data, that's for old guard type people

gee I deposited my paycheck three weeks ago and it's not in my account yet?

bank: don't worry, it will be there eventually.

It's about the foundation by fluffernutter · 2009-03-24 07:24 · Score: 1

I dunno.. You know what.. I am all about using the tool that gets the job done but if I'm spending the time to develop something I'm not going to take a chance. If I'm going to build a house, am I going to build on a cracked foundation because it's convenient and cheap? I'm going to spend money on a sound foundation. If I am going to go into months of development I'm going to use something fundamentally sound like a relational database. Besides you don't really need to know SQL inside and out anymore anyway. That's what ORMs are for.

--
Laws are rules for the court, but merely a bottom bar to hit for life. Think beyond laws in your actions always.

Old vs. New Simple DB's by billstewart · 2009-03-24 07:32 · Score: 2, Funny

Wayner's usually a good writer, and did some good theoretical-computer-science work back in the day, but this article was too short to answer the questions he asks at the beginning, and he mostly highlighted the new shiny things from big ASPs, which is generally what Infoworld wants.

I'm particularly disappointed that while he referred to the name and history of Berkeley DB, aka Sleepycat, aka Oracle Renamed-foo, he didn't actually talk about using it. (OTOH, Infoworld did review one version of it in 2005.) I no longer have my 4.1BSD manual on the shelf, but it was useful if you wanted something faster than using grep/sed/awk/look on tab-separated text files (which were the canonical Unix database format, and what I normally used for databases.)

These days if I want a lightweight database, I usually just build tables in Excel, and then bitch about how it doesn't have a join or even decent text-editing and filtering capabilities, and occasionally have to save it as a CSV file and install vim on Yet Another Work-owned Windows box so I can get some bloody work done. I suppose if Excel did have a join function there'd be fewer people buying MS Access...

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

Re:Old vs. New Simple DB's by Anonymous Coward · 2009-03-24 10:48 · Score: 0

Wayner's usually a good writer, and did some good theoretical-computer-science work back in the day, but this article was too short to answer the questions he asks at the beginning, and he mostly highlighted the new shiny things from big ASPs, which is generally what Infoworld wants.
I'm particularly disappointed that while he referred to the name and history of Berkeley DB, aka Sleepycat, aka Oracle Renamed-foo, he didn't actually talk about using it. (OTOH, Infoworld did review one version of it in 2005.) I no longer have my 4.1BSD manual on the shelf, but it was useful if you wanted something faster than using grep/sed/awk/look on tab-separated text files (which were the canonical Unix database format, and what I normally used for databases.)
These days if I want a lightweight database, I usually just build tables in Excel, and then bitch about how it doesn't have a join or even decent text-editing and filtering capabilities, and occasionally have to save it as a CSV file and install vim on Yet Another Work-owned Windows box so I can get some bloody work done. I suppose if Excel did have a join function there'd be fewer people buying MS Access...
So you're one of those "use Excel for everything" guys... I don't understand why people don't use Access more, It's not rocket science.
Re:Old vs. New Simple DB's by fractoid · 2009-03-24 12:39 · Score: 1

Sounds to me like he's one of those "everything they ask me to do is easiest done in Excel" guys. Access isn't rocket science, but when a one-page spreadsheet that prints out an invoice is all you need, then you'd have to be crazy (or very bored) to build an Access database for it.

--
Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
Re:Old vs. New Simple DB's by colinrichardday · 2009-03-24 13:48 · Score: 1

I don't understand why people don't use Access more, It's not rocket science.
Do they have access to Access, or do you expect them to buy it?
Re:Old vs. New Simple DB's by petermgreen · 2009-03-24 15:34 · Score: 2, Insightful

the thing that always puzzled me about Berkeley DB is its incessant format breakage requiring dumps and restores.
On a database server at least data upgrading can be handled centrally but on a file based DB where datafiles can be scattered anywhere a lack of a stable data format seems like a fatal flaw.

--
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register

Re:I've never understood the UNIX world's fascinat by dcowart · 2009-03-24 07:38 · Score: 4, Insightful

How does it work for searching though? If I just have my "freespace" file and my pointers to records, does a search for some piece of user requested data have to hit every record or is there a hash somewhere for the data contained in the record? You don't mention it in your description.

It seems that the biggest advantage of a relational DB is that the syntax for accessing it, SQL, is well known. It has a human-readable interface and, while sometimes wonky to work with for complex operations, it provides the simplest cross-platform way to access data. I don't need to know which data blocks hold the data, I just ask the database for them "SELECT slashdotid, name FROM users where slashdotid 20000"... and I get rows of data.

Could I just read it from a file? Yes. Would it be simpler? Maybe. But what if I have 200001 records, then I have to do some magic sorting in my program, and I have to manage memory for them, and disk space, etc. It is simpler to let the DB handle that mess and I just ask for the data I need.

It breaks up the process of programming into data storage and data manipulation/presentation. DB's for storage, my bad python for manipulation and presentation.
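
A minimal sketch of that split, assuming Python's built-in sqlite3 module and a made-up users table. The comparison operator is missing from the query as quoted above, so the `<` here is only a guess, and the rows are invented for illustration:

```python
import sqlite3

# Hypothetical table and rows, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (slashdotid INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (slashdotid, name) VALUES (?, ?)",
    [(30000, "carol"), (150, "alice"), (19999, "bob")],
)

# Storage side: the database filters and sorts.
# The "<" is an assumption; the operator is missing in the quoted query.
rows = conn.execute(
    "SELECT slashdotid, name FROM users WHERE slashdotid < ? ORDER BY slashdotid",
    (20000,),
).fetchall()

# Presentation side: the program only formats what it was handed.
for slashdotid, name in rows:
    print(f"{slashdotid:>6}  {name}")
```

The program never sorts or manages memory for the full table; it only sees the rows it asked for.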

--Donald

--
www.rdex.net

Re:I've never understood the UNIX world's fascinat by Anonymous Coward · 2009-03-24 07:41 · Score: 0

I would LOVE to see a freespace database ported to Solaris

Sounds like you might want to check out Berkeley DB. It is very fast and has been in wide use for many years.

http://en.wikipedia.org/wiki/Berkeley_DB

Berkeley DB is also one of the underlying data stores that MySQL can use. I think MySQL can use either Berkeley DB or a raw file system.

Going way back, IBM had a system like this called ISAM. All of these are very simple.

Re:I've never understood the UNIX world's fascinat by dcowart · 2009-03-24 07:41 · Score: 1

In Re: to my Re:, I like sqlite for simple DB applications, I get DB functionality with a very low overhead. Otherwise I use postgresql.

I have used Oracle and some others before now, but those are my two current DB's (sql-engines?) of choice.

--
www.rdex.net

The problem is growth... by alexhmit01 · 2009-03-24 07:42 · Score: 1

Non-normalized databases are fine, and might be faster, for small sites, but when things scale, the sloppy databases (or worse, sloppy frameworks like Ruby's Active Record) just cause problems.

A scalable, normalized database means consistent data, when you have multiple applications hitting it.

For a web forum, sure, a relational database may be the wrong tool, because all you care about is speed on new stuff, the archive can crawl, etc.

However, what happens when your web forum adds some actual data, and then a few years down the road, you need new tools to talk to that data? You can abstract everything through code, and post into your webserver and let Perl/PHP manage it, but then that's a new piece of legacy code to maintain.

I keep all my stuff in a PostgreSQL database, and build Schemas with Views for web apps, etc. So when a new piece of functionality is needed, it's kept segmented off. So you can prototype a Ruby app, maintain a PHP web app, and even build custom tools in VB or other environments that talk directly to the database for manipulation. The spreadsheet guys ALWAYS loved when I could set up an ODBC connection, and they could pull real-time data into Excel, instead of needing to go through a web interface and grab CSV pulls. Hell, I had a simple Excel spreadsheet that went out to my PostgreSQL database, got the necessary data, prepped it (all in Excel), and then stuck the data into Quickbooks via the SDK (using VBA of all technologies) to prevent needing to double enter.

If you were on a real GL powered with DB2 or Oracle, you could do even fancier things.

RDBMS skills are a good thing to develop. The overhead is pretty minor for starting off, and it gives you great flexibility down the road.

Now, if you have a technology REASON to want a non-relational database, go nuts, new tech can do new things. But if it's a refusal to learn relational theory, pick up a good book and learn the mathematics behind it.

Alex

Not all databases have to be relational... by shoppa · 2009-03-24 07:43 · Score: 1

For the vast majority of web applications, the "key-value pair" class of databases works fine.

I think the real problem is that the "relational database weenies" look down on the key-value pair databases, and there are a lot of non-DB-weenies out there who like using true relational databases as nothing more than key-value pair. It degenerates into name calling, instead of getting the job done, pretty fast.

Re:Not all databases have to be relational... by The+Slashdolt · 2009-03-24 08:38 · Score: 1

The problem is that the programmers think they own the data when they don't. Your app will come and go and the data will remain. People will want to query that data, report on that data, or even transfer it into other databases. Database people think beyond the current requirements of your particular app.

--
mp3's are only for those with bad memories

Ignore The Rules At Your Peril by Prototerm · 2009-03-24 07:45 · Score: 2, Insightful

You may have seen in the news recently how in the last decade or so Wall Street ignored some of the hard-won regulations and guidelines developed in the wake of the Great Depression.

We all know what happened as a result.

The same is true when dealing with data. You don't ignore the rules completely, or follow them only when you feel like it, or when you have time. As the old joke goes, Quality is *not* Job 1.1.

If the data isn't important enough to store correctly, then it's not important enough to be stored at all.

--
"My country, right or wrong; if right, to be kept right; and if wrong, to be set right." --Senator Carl Schurz (1872)

Re:Ignore The Rules At Your Peril by avandesande · 2009-03-24 09:55 · Score: 1

I can't ever remember regretting that I had properly normalized data.
Ever.

--
love is just extroverted narcissism

Just data structures by Thaelon · 2009-03-24 07:45 · Score: 4, Insightful

Databases at a very abstract level are just data structures. Choosing a relational database when you don't need that much functionality is just as wrong as choosing a flat file when you need a database.

Knowing the ins & outs of your data structures is still a vital skill of programming.

--

Question everything

Re:Just data structures by Anonymous Coward · 2009-03-24 08:40 · Score: 0

No, knowing the ins and outs of your data structures is definitely not a vital skill of programming.
Good programmers know about data structures but most programmers are clueless monkeys who don't know jack sh*t about data structures, don't know jack sh*t about security, etc.
The state of affairs is pathetic. Just look at the job market and at all the products that are out there: 95% of programmers are clueless and produce crap.
Re:Just data structures by pthreadunixman · 2009-03-24 09:52 · Score: 1

Nonsense. A flat file is a database.

Re:I've never understood the UNIX world's fascinat by Anonymous Coward · 2009-03-24 07:49 · Score: 1, Insightful

Maybe the fascination with relational databases is that you can easily work with the data in there.

What you describe just sounds like a file system. A specialized one, but it doesn't really support more than a filesystem does. Everything works fine if you have the key to the data. You can read the data, do your stuff, and update the data. But what if your problem is to find the key? Like you want to know which orders are overdue? Doesn't sound like the freespace file will help me there. Sounds like I have to implement the whole searching by myself.

When I am searching for a database solution then probably because I really need that searching and I want it to be fast, and I don't want to do it myself.

What you suggest doesn't sound like a database. It sounds more like an allocation scheme any database could use under the hood. What you suggest may suffice if my requirement is a high performance filesystem. But I don't see how it supports even the most basic database operations. I don't say your solution is bad. It just doesn't solve the same problem as a database.
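
To make the difference concrete, here is a rough sketch with a plain Python dict standing in for the record store. The order records, field names, and helper functions are all hypothetical, not anything from the parent's system:

```python
from datetime import date

# Hypothetical key-value store: record number -> order record.
orders = {
    101: {"customer": "acme", "due": date(2009, 3, 1), "paid": False},
    102: {"customer": "initech", "due": date(2009, 4, 15), "paid": False},
    103: {"customer": "acme", "due": date(2009, 2, 10), "paid": True},
}

def get_order(key):
    """Fast path: a single lookup when the key is already known."""
    return orders[key]

def overdue_keys(today):
    """Slow path: with no index on 'due', every record has to be examined."""
    return sorted(
        key for key, rec in orders.items()
        if not rec["paid"] and rec["due"] < today
    )

print(get_order(102)["customer"])
print(overdue_keys(date(2009, 3, 24)))
```

The second function is the searching I would have to write, tune, and keep correct myself for every new question; a database with an index on the due date does that part for me.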

The problem is scaling by plopez · 2009-03-24 07:50 · Score: 2, Insightful

so you start a small project, "we just need a few hundred/thousand records, a few key value links and the occasional transaction". so you start with a slacker DB. A slacker DB far too often implies a slacker hack software d00d.

Then it grows. Instead of educating themselves (Q: what's the difference between those who can't read and those who don't? A: nothing. ) and finding a better DB solution they thrash around trying to hack in DB functions into their code.

So they lose consistency etc. Soon they have a polluted DB that breaks all the time. Often they are proud of the heroics of the wasted effort they put into it. A good programmer knows how to be the correct form of lazy: do not reinvent the wheel.

--
putting the 'B' in LGBTQ+

Re:The problem is scaling by oGMo · 2009-03-24 09:04 · Score: 1

A good programmer knows how to be the correct form of lazy: do not reinvent the wheel.

YES. Good lazy is "I shouldn't have to do all this work, either use someone else's, or make the computer do it for me." Bad lazy is "whine, I don't want to figure anything out, I just want to get it done." The difference is crucial; the first is willing to spend time learning to save unnecessary labor, the latter is willing to do unnecessary labor to save learning. The former is laziness, the latter is stupidity.

--
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
Re:The problem is scaling by H0p313ss · 2009-03-24 10:13 · Score: 1

A good programmer knows how to be the correct form of lazy: do not reinvent the wheel.
YES. Good lazy is "I shouldn't have to do all this work, either use someone else's, or make the computer do it for me." Bad lazy is "whine, I don't want to figure anything out, I just want to get it done." The difference is crucial; the first is willing to spend time learning to save unnecessary labor, the latter is willing to do unnecessary labor to save learning. The former is laziness, the latter is stupidity.
True laziness lies in recognizing that for over 30 years really smart people have built this wonderful transactional, distributed, multi-user technology that is far easier to adopt than to reinvent badly.

--
XML is a known as a key material required to create SMD: Software of Mass Destruction
Re:The problem is scaling by BigGerman · 2009-03-24 12:47 · Score: 1

If people followed your advice, 70% of programmers would not have a job.
Re:The problem is scaling by CAIMLAS · 2009-03-24 14:30 · Score: 1

Oh man, you just described one of my previous employer's primary software tool: a custom app written in a nominal WYSIWYG/Access type database package, which had undergone 6+ years of near-constant linear modification and addition. This, on top of one or two upgrades over the years that broke functionality which needed to be hacked around.
So, question: how do you work around something like that? This particular situation, in my estimation, required an entirely new product. What they had couldn't be fixed, not by one or ten people, in a reasonable time frame. I was working on moving the whole mess on over to MySQL + PHP after their dismissal of alternatives, when I got sacked: "we don't need you anymore, we decided to go with a vendored app".

--
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
Re:The problem is scaling by arevos · 2009-03-24 21:12 · Score: 1

so you start a small project, "we just need a few hundred/thousand records, a few key value links and the occasional transaction". so you start with a slacker DB.
Eh? No, if anything you'd start with a relational DB, and then increasingly use non-relational databases as you scale. Relational DBs have a lot of functionality, but don't scale particularly well.

Re:I've never understood the UNIX world's fascinat by LWATCDR · 2009-03-24 07:56 · Score: 2, Insightful

Okay, how do you find the data without a record number? I can see the value of the system but it also seems very inflexible.
I do agree that way too many programmers use MySQL for a file system, flat files, configs, and goodness knows what else.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.

All toys by zig43 · 2009-03-24 08:00 · Score: 2, Interesting

Every database covered in the article is a toy.

From TFA: "The problem is that JOINs are really, really slow when the data is spread out over several machines."

This is the result of a poor design, not a database flaw. If you are running a web application against multiple databases, either cluster them or store all the data for a user in one database. (i.e. hash the login_id and select the database based on the result). If someone is doing JOINs across multiple machines and doesn't have a very good reason for doing so, then nothing short of a lobotomy is going to help them.
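
As a rough sketch of the "hash the login_id" idea, assuming a fixed list of database hosts (the host names and the shard_for helper are placeholders I made up):

```python
import hashlib

# Hypothetical list of database hosts; the names are placeholders.
SHARDS = ["db0.example.internal", "db1.example.internal", "db2.example.internal"]

def shard_for(login_id: str) -> str:
    """Map a login_id to one shard using a stable hash.

    hashlib is used instead of the built-in hash() because hash() of a
    string is randomized per process, so it would not pick the same
    shard across restarts.
    """
    digest = hashlib.sha256(login_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]

# All of one user's rows live on one machine, so their JOINs stay local.
print(shard_for("zig43"))
```

One caveat with plain modulo: changing the number of shards moves most users to a different machine, so the shard count has to be planned for or the mapping made smarter.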

From TFA: "Each query can only run 5 seconds. The answer can only hold 250 items. Each item can have only 250 pairs."

Yeah, I'd say that meets the definition of a toy database alright.

From TFA: "Many of the complaints about the other toy databases revolve around how a missing feature makes it impossible to find the right data. If you want to add a bit more functionality to the database here, you can whip up many of the features locally in Python. If you want a JOIN, you can synthesize one in Python and probably customize the memory cache at the same time. This is especially useful for Web applications that let users store their data in the service. If you need to add security to restrict each user to the right data, you can code that in Python too."

The writer must be joking. Who would do this when there are better options that don't involve implementing your own database?

From TFA: "there's no big reason to use Ruby, Python, Java, or PHP on the server when it can all be packaged in JavaScript"

Many people who write web applications actually want to do useful things with the data they store like generate reports, keep logs, track inventory, or run queries. This doesn't work very well when the "database" is a text file sitting on the user's hard drive.

Re:I've never understood the UNIX world's fascinat by ivoras · 2009-03-24 08:13 · Score: 1

A web site is a transaction processing facility. Just replace fancy 3270 or Uniscope screens with HTML. Same idea, forms, etc. It wants in and out, fast. Why use something not really made for that?

The answer is: simplicity and making it somebody else's problem. Think of a typical Slashdot web page. You are logged in to Slashdot so it prints out the data you chose. Specifically, it prints out the groups of data under the topics you chose, in the way (page layout) you chose. You could walk the individual data records yourself and decide what to display where, or you could tell something else to do the grunt work and simply apply some string formatting to the results. It has its good sides and its bad sides.

I have an uncle who was the first at his old university to go from mainframes to PCs, because as a student he saw they were the future. His arguments were the standard ones - it's smaller, simpler, everyone will have / has one, and for a long while he made very good money selling business applications in dBase, later Clipper and the like. When he discovered those tools (as a student...) he was immediately drawn to them as they were more powerful and easier than what he used on the mainframes (i.e. exactly what you describe), and business was really good. The way you program in dBase/Clipper is really just a single step up from the "freespace" model you describe: additional features are that the library is taking care of maintaining data structures within the files (i.e. "records") and you have a rudimentary indexing capability, even with multiple fields in the data records, which makes searching enormously faster. For any kind of operation you still need to perform a loop over all records (or a subset of records/record IDs returned by the index operation) and do your calculation or other processing. For each record in the loop you can do whatever you like since it's your own code.

It's fair to say that this uncle is now old. dBase and Clipper were children of MS-DOS and as his customers migrated to GUI OS-es (i.e. Windows) so they migrated from his MS-DOS programs, though they were still perfect for the job, jewels (or abominations, depending on your point of view) of micro-optimization, with every kind of trick for calculating taxes, expenses, whatever. They simply clashed with Windows (even more so with Windows networking - his native network environment was Novell). So, the solution was apparent: start coding Windows applications or lose clients.

The thing is: he simply cannot wrap his head about these two things:

Event-driven GUI programming
SQL

The event-driven GUI thing is easier to explain: in the old days, if he wanted the letter "A" to appear in the middle of the screen, he just poked some bytes in memory, and when he wanted input, he looped/blocked on the input function. The idea that something else is reading the user input and notifies you when it happens is... different.

SQL is harder. All his important applications - some developed over the course of 20 years, basically depended on the fact that core business processing would be a loop over some records, examining each record and with a bunch of calculations, IF statements, etc. decide what to do with the records - e.g. to what sum to add it. The idea that you *don't do it yourself* but say something like "SELECT SUM(x) FROM t WHERE cust_id=(SELECT cust_id FROM w WHERE name='xzzy')" is again something hard to swallow. There is an additional problem that he could easily kludge in arbitrary logic into records processing, creating complex special cases with ease. This is very hairy in SQL.
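
To put the two mindsets side by side, here is a small sketch using Python's built-in sqlite3 module. The table and column names (t, w, x, cust_id, name) come from the query above; the rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE w (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE t (cust_id INTEGER, x INTEGER);
    INSERT INTO w VALUES (1, 'xzzy'), (2, 'other');
    INSERT INTO t VALUES (1, 10), (1, 32), (2, 99);
""")

# dBase/Clipper style: walk every record and decide in your own code.
cust_id = None
for cid, name in conn.execute("SELECT cust_id, name FROM w"):
    if name == "xzzy":
        cust_id = cid

loop_total = 0
for cid, x in conn.execute("SELECT cust_id, x FROM t"):
    if cid == cust_id:
        loop_total += x

# SQL style: describe the result and let the engine do the walking.
(sql_total,) = conn.execute(
    "SELECT SUM(x) FROM t "
    "WHERE cust_id = (SELECT cust_id FROM w WHERE name = 'xzzy')"
).fetchone()

print(loop_total, sql_total)
```

Both give the same sum. The loop version is where the special cases are easy to bolt on, which is exactly why it is so hard to give up.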

It's not that he doesn't see how it works or that the result is the same as before, or that it's a valid way to do it - the problem is that apparently he can't wrap his head around these concepts. So his code has things like blocking the entire Windows application because he wants total control of the user input or again looping over all records with "SELECT * FROM t W

--
-- Sig down

Programmers Vs Users by A+Pressbutton · 2009-03-24 08:18 · Score: 1

If one in 1000 postings fail, the programmer does not care- there is a 99.99% success rate , but as far as that 1 user is concerned, there is a 100% failure.

Re:Programmers Vs Users by Slashcrap · 2009-03-24 22:23 · Score: 1

If one in 1000 postings fail, the programmer does not care- there is a 99.99% success rate
And what an excellent example of comment failure you have provided.

Mish mash of data storage by BigJClark · 2009-03-24 08:20 · Score: 1

Fine, codger together some semblance of data storage using notepad, access, abacuses, whatever. If, heaven forbid, these "startups" ever took hold and gained any significant size, this "new model" will break, and I can't even imagine the hell it would be to merge the "new model" into a classical RDBMS.

Sorry kids, you've bitten off more than you can chew, should have stayed in school and actually attended a class in db modelling. Good luck with this "eventual consistency", you'll need it.

--

Hi, I Boris. Hear fix bear, yes?

a top of lap, a book of note by kestasjk · 2009-03-24 08:24 · Score: 1

There's more to it than that. If I make a wrapper for a text file that lets me find and delete rows that's not really a database. (It only becomes a database when I call it TextDB and package it with an AJAX API)

--
// MD_Update(&m,buf,j);

Re:I've never understood the UNIX world's fascinat by EastCoastSurfer · 2009-03-24 08:27 · Score: 1

I'm trying not to seem offensive, but it very much looks like a paradigm problem - it seems that it's in human nature that they can't wrap their heads around new things past a certain age. And yes, I notice it in myself also.

I'm not sure if it is so much age or experience. The things you know end up boxing you in and it can be hard to overcome.

Re:I've never understood the UNIX world's fascinat by CodeBuster · 2009-03-24 08:32 · Score: 1

I would LOVE to see a freespace database ported to Solaris, personally. We'd use it heavily. :-)

Sounds like a great open source project so why not start working on that? If you want it badly and would use it heavily and yet you cannot be bothered to do the work of porting one, writing one, or paying someone else to do it then why bother complaining about it?

Better link by try_anything · 2009-03-24 08:44 · Score: 1

Don't judge based on this article. The author's "young guys playing fast and loose" vs. "stuffy but reliable old guys" way of explaining things misses the point. Either he's a bad writer, or he doesn't know what he's talking about. A much better treatment can be found here.

I don't get it. by tarlss · 2009-03-24 08:47 · Score: 1

Supposedly the benefit of something lightweight and flexible is that... it's that. But really, is it that hard to set up XAMPP and dump some info into it with an INSERT statement? Never mind that stuff like MySQL is free... I don't get what the fuss is when your standard MySQL DB can probably fit onto a USB stick.

Just 'Insert, Update, and Delete'? by radio4fan · 2009-03-24 08:48 · Score: 1

FTA:

The field was surprisingly diverse despite the fact that the offerings are so stripped down that they really don't have more than three major commands: Insert, Update, and Delete.

There's a write-only database now?

Ad-Hoc Complex Queries, Native Unix Toolsets by billstewart · 2009-03-24 08:52 · Score: 1

My guess is that part of the reason is historical - RDBMSs were coming out around the time Unix machines were, and both could be used by small departments as opposed to mainframe production shops.

They're also an extension of the native Unix toolsets, which were flat files with tab-or-comma-separated columns of data, so anybody who learned Unix in its first couple of decades generally had the expectation that you could do ad-hoc queries and build tools to automate them, without needing to spend 6-12 months negotiating a development project with the mainframe database owners. SQL is a bit clunky as a format, but the concept of schemas, where your database structure is stored and manipulated the way the data itself is, really works well if you're a tool-builder.

Is the Berkeley DB stuff close enough to what you need for a database?

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

Forgotten Technology? by meburke · 2009-03-24 08:53 · Score: 1

The term "old-school" in this context makes me laugh. Back in the days when air was clean and sex was dirty, "relational" databases were considered a resource hog and were shunned by competent programmers. The fastest and most efficient databases were the "network" databases, but they also required the most work and the trickiest coding. Right in the middle were the "hierarchical" databases. Many programmers avoided the database problem by using a "reverse ISAM" arrangement which still used up some extra resources, but was easier to maintain. Of course, nowadays, when it is almost impossible to find programmers who can even program apps from tape into 32K systems, I can see why youngsters use the "telephone book" databases... (so they can avoid actually having to think about their data!) I guess that's why it's so hard to find good assembly language programmers, too.

Anyone wanting to find out what it used to be like in the bad ol' days could look up CODASYL. Watch out for bad dreams.

--
"The mind works quicker than you think!"

But sometimes to beat the competition... by Anonymous Coward · 2009-03-24 08:53 · Score: 0

Sometimes to beat the competition you need to re-invent a DB wheel.

The company I'm working for produces software for a niche market, and the very reason we're beating our competitors to death is that their solution mandates the installation of a SQL DB by their customers, which alienates their mainly IT-clueless customers (SMEs that do not have the budget to pay DBAs).

In that market, we need to crunch data and it's a very particular niche (which I won't name). Basically by being very clever, we came up with a solution that does not mandate the installation of a SQL DB (we have a one-click install, which our customers love) and that smartly bit-packs everything into memory. The compression that takes place is amazingly efficient, for our solution is completely tailored to the problem domain. You simply can't do that with an all-purpose DB.

We *own* our competitors on data import speeds by one order of magnitude (in our competitors' offerings, the DB is the bottleneck on data acquisition) and we own our competitors on queries. Our customers love it.

So, assertions are great but sometimes you beat the competition because you have a product that is easier and faster to use because you re-wrote the wheel.

Sure, it won't fit your Fortune 500's needs but it beats the crap of any traditional SQL DB for the problem we're solving.

Re:But sometimes to beat the competition... by tarlss · 2009-03-24 08:59 · Score: 1

uhmmmmmmm. Don't -these- toy databases require an install too? I'm fine with integrated enterprise apps, whatever, that's great, but I'm pretty sure these toy DBs will take up the role of the greater SQL DB. They're not designed to be app specific.

Re:I've never understood the UNIX world's fascinat by Anonymous Coward · 2009-03-24 08:55 · Score: 0

They probably use it because it is already there. It is backed up. It is what they know. Is it right? Well that depends. What if you have 200 applications all storing their config data all over the place or in 1 file? Which would you rather have?

For simple things flat file is just fine (smaller datasets). But when you start pulling data out to keep it consistent, AND you have say 20k in records. There is a HUGE difference between Log(n) and n^3. You could be looking at 40 records vs 8000000000000 records (with a linear scan). That is just on a 'smallish' join in many databases.
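
For a feel of the gap, here is the back-of-the-envelope arithmetic as a sketch. It assumes a base-2 logarithm for an indexed lookup and a naive nested loop over three 20k-record tables; both are my assumptions, and other assumptions give somewhat different small numbers, but the shape of the comparison does not change:

```python
import math

n = 20_000  # the "20k in records" case

# Comparisons needed to find one record with a binary search or tree index.
indexed_lookup = math.ceil(math.log2(n))

# Records touched by one linear pass over a single flat file.
linear_scan = n

# Record combinations examined by naive nested loops over three such files.
nested_three_way = n ** 3

print(f"indexed lookup:   {indexed_lookup:,}")
print(f"linear scan:      {linear_scan:,}")
print(f"nested 3-way:     {nested_three_way:,}")
```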

Do not discount ACID just because its 'in your way' and 'you do not get it'.

Also 'just' in programmer speak is usually 'oh that is probably easy but will take awhile'. And 'probably easy' is programmer speak for 'not very well tested'. How do I know? I use the lingo all the time myself :)

36-bit words, octal digits, old Unix hackers by billstewart · 2009-03-24 09:00 · Score: 1

Unix hackers are traditionally fine with octal, as long as you don't try to fit a whole digit in it, though I've generally found hex more useful. And as far as 36-bit words go, I know one local Unix hacker who has a PDP-10 in his garage. (Not sure if it's still there, and it might have been a -20 instead.) I don't think my wife's copy of "Meet Macro-10" survived our mid-90s move, and when I took a compiler course at that school, I decided to use the still-clumsy-at-the-time Amdahl mainframe Unix system at work rather than deal with the PDP-10.

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

Re:I've never understood the UNIX world's fascinat by try_anything · 2009-03-24 09:02 · Score: 1

Why use something not really made for that?

It's simpler to use something already built and tested, with known strengths and weaknesses, multiple mostly-compatible implementations available, tool support, and plenty of books and trained personnel to choose from, than to use a much simpler solution that is less well understood, or worse, one that I have to design and implement myself.

Or, to put it another way, why do the Chinese and Indians do business with each other in English when Esperanto would suffice?

For a time... by CAIMLAS · 2009-03-24 09:02 · Score: 1

Well, even something that's based off of at-the-time sound principles can end up being a mess.

Take, for instance, a product called FileMaker. It's a product with a long software lineage - its origins were FoxPro, way back when. I don't know how it performed back then, or how it was designed, but now it's got a massive WYSIWYG themeable 'frontend' to make a custom application, and the database is not directly accessible by the designer (just logical containers). It probably can be normalized, to some degree, but...

But it's not a good database for large amounts of data. In fact, I'd argue something like Access might even be faster/better than the modern incarnations. It might work fine for a small, initial dataset, but it doesn't scale all that well.

I guess my point is: a relational database can be poorly normalized, but a 'slacker' database can't be improved upon. The slacker db might work OK for what you initially intend it for, but data will often grow faster than estimated, and beyond the original design.

That's why relational/SQL is preferred by most technical people: not only can it be poorly designed and work well for small stuff (then normalized w/o changing all that much, and used for larger projects), but then it can be relatively easily migrated to a larger/more robust SQL database if need be.

--
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers

V7 on PDP-11 had joins by billstewart · 2009-03-24 09:10 · Score: 1

Not sure what platform you were using or what years (lots of things had b-trees, though ISAM tended to be on IBM machines), but Unix V7 had a join command, which worked on the canonical tab-delimited ascii flat files that most Unix tools did, and PDP-11s weren't that expensive.

I last used it in the early 90s; I'd prototyped an application in Informix, but my department was too cheap to buy enough licensed copies for production use. You had to sort your data for the join to work, but that also meant you could use "look" to do binary lookups instead of grep. Since I only had to support a small number of scenarios that used join, it was easy to write a shell script to call them.
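
For anyone who never used them, this is roughly the idea behind join and look on sorted flat files, sketched in Python rather than shell. The data, the merge_join and look function names, and the unique-left-key simplification are all mine; the real join(1) handles more cases than this:

```python
import bisect

# Hypothetical tab-separated "files", already sorted on the first field.
users = ["100\talice", "200\tbob", "300\tcarol"]
posts = ["100\tfirst post", "100\tsecond post", "300\thello"]

def merge_join(left, right):
    """Join two sorted lists of 'key<TAB>value' lines on the key.

    Simplification: keys in `left` are assumed unique, and both inputs
    must be sorted with the same (plain string) ordering.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, lval = left[i].split("\t", 1)
        rkey, rval = right[j].split("\t", 1)
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            out.append(f"{lkey}\t{lval}\t{rval}")
            j += 1  # left keys are unique, so only the right side advances
    return out

def look(lines, prefix):
    """Binary-search sorted lines for those starting with prefix."""
    start = bisect.bisect_left(lines, prefix)
    end = start
    while end < len(lines) and lines[end].startswith(prefix):
        end += 1
    return lines[start:end]

print("\n".join(merge_join(users, posts)))
print(look(users, "200\t"))
```

Both routines only work because the input is sorted, which is why the sort step was not optional.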

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

IMS and ADABAS are suddenly newfangled? by FeatherBoa · 2009-03-24 09:19 · Score: 1

Too funny. I'm old enough to remember when non-relational databases were old-guard proven technology and the newfangled relational stuff was buggy, bloated, complex and unproven. I guess everything old is new again.

Lazy developer or lazy database by rgigger · 2009-03-24 09:22 · Score: 1

Many of these comments seem to focus on using these non-relational databases because the developer is too lazy to use, or doesn't understand, how a proper relational database functions. It is probably true that that happens, but that discussion totally overlooks what these non-relational systems are actually for and why they are popping up all over the place.

If all you want is a key-value store then why not use an existing relational database? They are amazingly good at what they do and storing key-value pairs could be considered a small subset of what they do. But even that they do very well. They have very fast data storage formats, they are very good at not losing your data, they have all the networking figured out, authentication, etc, etc. It would seem silly to create a brand new database that does only a strict subset of what existing dbs can do. There is no point unless they can do things that an RDBMS cannot do, or unless they can do that small subset of things better than a traditional RDBMS.

The main reason that these dbs are popping up all over the place is that people want to scale, and scale quickly. Google doesn't use Bigtable because their devs are lazy or un-knowledgeable. Google uses Bigtable because they need to scale. Transactions, constraints, joins, ACID: doing all of those things in the db makes it harder to scale the db. Implement those features in the app and now your db can scale more easily and the app servers can still scale, thus your app as a whole can scale. That is the idea that is being explored in many different directions by all of these different non-relational dbs.

Maybe some of these databases are just jumping on the bandwagon without even knowing what the point is. Maybe some of their users are just too lazy to learn SQL. But the real reason for these new dbs' existence is that scaling a relational database is very hard and people are trying to find easier ways to do it.

I'm still in wait and see mode but that doesn't mean that this new breed of databases doesn't have a place.

Where can I buy this wonderful product? by itsdapead · 2009-03-24 09:24 · Score: 1

From TFA: In the past, the answer was simple: Hook up an official database, pour the data into it, and let the machine sort everything out for you while you spend your time writing big checks to the database manufacturer.

What? Where has this wondrous product been all my life!? I mean, I've always stuck with the free shit like MySQL and Postgres on the assumption that paying top dollar would only get me a bit of extra polish and maybe some support from people who own socks.

Little did I realize that, had I re-mortgaged the house and bought one of these wondrous products, I could "just pour my data in" and I'd miraculously reap the benefits of advanced relational technology without all those tedious decisions about data structures, normalisation, and writing queries. Not only that but (looking at the rest of the article) it sounds like merely using an industry strength RDBMS will guarantee data integrity? OMG! That would be cheap at any price! It would certainly be cheaper than succumbing to that nagging feeling that maybe I should park my ego and pay an RDB specialist to do it properly.

Now I feel really stupid. There was me assuming that even a high-end relational DBMS would only be efficient and secure if the database was designed and coded carefully by someone with a clue, and that if you're just going to bosh something together to get the job done you might as well stick it in a flat file (or the ancient pre-InnoDB version of MySQL which comes with your web hosting package) and only worry about scaling it to cope with a billion records when someone paid you to do that.

The scales have fallen from my eyes. I'm writing that check to Oracle right now.

--
In a survey of 100 programmers, 111111 thought that duck-typing was a good idea.

"Schema-less" storage with MySQL by kc8jhs · 2009-03-24 09:27 · Score: 2, Interesting

Yeah, when I first read this article I thought that was the dumbest thing I'd ever heard, but reading it made a lot of sense. It's basically just using a simple schema like the "slacker" DBs for canonical storage, and then using additional tables as 'indexes.'

How FriendFeed uses MySQL to store schema-less data

Given their needs in terms of adding features, altering the schema, and building indexes, being able to make the indexes "eventually consistent" was huge. You have to remember that to keep things nice and normalized, you need lots of tables and joins, and that MySQL (or any other FOSS RDBMS) CANNOT build indexes across tables.
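
A rough sketch of that pattern, to show the shape of it. This is not FriendFeed's actual code: it uses Python's sqlite3 instead of MySQL, and the table names (entities, index_user_id) and the user_id field are assumptions made for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Canonical storage: an opaque id plus a serialized body, no per-field columns.
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
# A separate table that acts as an index on one field (hypothetical: user_id).
conn.execute(
    "CREATE TABLE index_user_id ("
    " user_id TEXT NOT NULL, entity_id TEXT NOT NULL,"
    " PRIMARY KEY (user_id, entity_id))"
)

def save(entity):
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO entities VALUES (?, ?)",
            (entity["id"], json.dumps(entity)),
        )

def index_entity(entity):
    # This could run later, e.g. in a background job. That delay is what
    # makes the index "eventually consistent" with the entities table.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO index_user_id VALUES (?, ?)",
            (entity["user_id"], entity["id"]),
        )

def find_by_user(user_id):
    rows = conn.execute(
        "SELECT e.body FROM index_user_id i"
        " JOIN entities e ON e.id = i.entity_id"
        " WHERE i.user_id = ? ORDER BY e.id",
        (user_id,),
    ).fetchall()
    # Re-check the field on the entity itself, since the index may be stale.
    return [e for e in (json.loads(r[0]) for r in rows) if e.get("user_id") == user_id]

post = {"id": "e1", "user_id": "u7", "title": "hello"}
save(post)
before = find_by_user("u7")   # the index has not caught up yet
index_entity(post)
after = find_by_user("u7")    # now the entity is findable
```

Adding a new "index" is just creating another table and backfilling it, with no ALTER TABLE on the canonical store.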

Re:"Schema-less" storage with MySQL by Anonymous Coward · 2009-03-24 11:58 · Score: 0

Looks like FriendFeed reinvented RDF. Nothing wrong with that, and in fact it's a perfectly appropriate model for what they do. Doesn't really blow my skirt up, especially with mysql and its horrendous join performance.

Music from your teenage years gets extra cred by billstewart · 2009-03-24 09:31 · Score: 2, Interesting

It turns out that there actually _are_ neurological reasons that music from your teenage years is extra-evocative, just as language-learning works better with young kids. Go read "This is Your Brain on Music" for more details.

A certain amount of music sensitivity appears to be hardwired into our brains, and the extra hormones after puberty increase music-remembering ability and the emotional aspects of it that younger kids don't have as much of. There's also a lot of intellectual development going on in those years, and it's easier to pick up more complex ideas from the music than you could when you were younger.

As you get older, that still happens a bit, and you'll still run into music that's new and cool which you'll enjoy years later, but now it's competing with lots of other cool music that's in your head which your teenage-years music wasn't.

What's much more annoying is when you find yourself tuning by a different radio station and wondering "What is all this noise those kids are listening to? They should turn that crap down and listen to good stuff" just like your parents said when you were a kid. Some of that's because 90% of everything is crap, and it's not the crap that you find evocative because it was around when you were a kid, and some of it's because 90% of everything on the radio is highly-packaged commercial crap, making it 99% crap instead of only 90%. And some of that's because kids always want to listen to new stuff and piss off their parents, and musicians always like to do new stuff, and if you want to bust into the Top 40 you've either got to do identical commercial crap better than anybody who's already there or else do something new. Rap was creative and interesting, but the whole gangstas-dissing-women motifs that dominated it were offensive. Hip-hop took that music and started doing lots of interesting things with it, though I haven't followed it. I'm finding myself playing a lot of old-timey (average hair color in our jam session == gray, leaning toward white :-), and starting to listen to jazz more (lots of deep classical stuff in there, which I haven't had the patience to listen to for a while.)

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

Re:Music from your teenage years gets extra cred by Anonymous Coward · 2009-03-26 01:36 · Score: 0

Generalizations don't work. The great majority of the music I enjoyed when I was younger now sounds dull, bland or downright facepalm-inducing.

the traditional relational model by Channing · 2009-03-24 09:46 · Score: 1

does any dbms implement the relational model properly?

Then someone needs to write a genfkey for types by tepples · 2009-03-24 09:49 · Score: 1

I presume he said that because SQLite doesn't actually keep track of a column's data type.

Oh, the dynamic typing issue. Then I guess someone should write a tool that compiles column types into triggers that enforce them in much the same way that the genfkey tool compiles foreign key constraints into triggers that enforce them.
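
A minimal sketch of what such a tool might generate, assuming Python's sqlite3 module. The function name type_triggers and the trigger naming scheme are hypothetical; typeof() and RAISE(ABORT, ...) are standard SQLite.

```python
import sqlite3

def type_triggers(table, columns):
    """Build CREATE TRIGGER statements that reject rows whose storage class
    differs from the expected one. `columns` maps a column name to the name
    typeof() should return ('integer', 'real', 'text' or 'blob').
    Table and column names are interpolated directly, so they must be trusted.
    Note that NULL values are rejected too, since typeof(NULL) is 'null'."""
    stmts = []
    for event in ("INSERT", "UPDATE"):
        for col, expected in columns.items():
            stmts.append(
                f"CREATE TRIGGER {table}_{col}_type_{event.lower()} "
                f"BEFORE {event} ON {table} FOR EACH ROW "
                f"WHEN typeof(NEW.{col}) <> '{expected}' "
                f"BEGIN SELECT RAISE(ABORT, '{table}.{col} must be {expected}'); END"
            )
    return stmts

conn = sqlite3.connect(":memory:")
# Columns declared without types, so SQLite applies no type affinity at all.
conn.execute("CREATE TABLE t (id, name)")
for stmt in type_triggers("t", {"id": "integer", "name": "text"}):
    conn.execute(stmt)

conn.execute("INSERT INTO t VALUES (1, 'ok')")          # accepted
try:
    conn.execute("INSERT INTO t VALUES ('oops', 'bad')")  # text in an integer column
    rejected = False
except sqlite3.DatabaseError:
    rejected = True
row_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

A real tool would read the declared types out of the schema instead of taking a dict, but the generated triggers would look much like these.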

Consistency unimportant? Really!!? by gillbates · 2009-03-24 09:51 · Score: 1

This distinction between immediate and eventual consistency is deeply philosophical and depends on how important the data happens to be.

Ah, the naivete of youth... These guys clearly have never spent a few weeks debugging a concurrency problem. If your data is important enough to keep around, it's important enough to get it right.

There's nothing deeply philosophical about corrupting the relationships between various data sets because your database doesn't enforce consistency. A certain desktop recently discovered just how bad poorly enforced consistency can make things. Those *young whippersnappers* won't stay young for very long trying to debug that seemingly impossible to find data corruption problem, or worse, a web site which displays garbage pages at random because your data storage mechanism isn't consistent when it needs to be.

Consistency in databases has always been a ground rule because consistency checks are more easily done by a database than an application programmer. Consider, for example, the prototypical record read-update-write operation on a database with strict consistency and enforced locks:

1. Read the record. The database automatically locks it for you.
2. Update the record.
3. Write the record back to the database.

Now consider the same operation with a database which enforces no consistency, or does so rather lazily:

1. Read the record. (Someone else might also read it in the interim, but you'll never know.)
2. Update the record.
3. Read the record again. Has someone changed it?
4. Someone else changed the record. Reread the record.
5. Before updating the record, check to see if you are going to modify any of the fields previously modified by the intervening write.
6. Write the old, conflicting values to a log file for manual reconstruction later.
7. Update the record, and commit it back to disk.
8. Ooops! - someone else read the record while you were updating it and didn't get your latest changes. Maybe the other reader is going to create another invoice for the customer because they read it before you'd committed your "invoice sent" flag back to disk. Or maybe one poster's comment will show up under another's username. Maybe not. Who knows? Anything can happen!
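
For comparison, here is a sketch of the usual middle ground between those two lists: optimistic concurrency with a version column. This is a different technique from the automatic locking described above, and the table layout (invoice, sent, version) is an assumption made for illustration. The database still does the check, in a single conditional UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice ("
    " id INTEGER PRIMARY KEY, sent INTEGER NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO invoice VALUES (1, 0, 0)")
conn.commit()

def read(invoice_id):
    return conn.execute(
        "SELECT sent, version FROM invoice WHERE id = ?", (invoice_id,)
    ).fetchone()

def mark_sent(invoice_id, seen_version):
    """Write only if nobody has changed the row since we read it."""
    with conn:
        cur = conn.execute(
            "UPDATE invoice SET sent = 1, version = version + 1"
            " WHERE id = ? AND version = ?",
            (invoice_id, seen_version),
        )
    # rowcount is 0 when the version no longer matches, i.e. we lost the race.
    return cur.rowcount == 1

# Two workers read the same version of the record...
_, version_a = read(1)
_, version_b = read(1)
# ...but only the first write succeeds; the second must reread and retry.
first = mark_sent(1, version_a)
second = mark_sent(1, version_b)
```

The loser finds out immediately that its read was stale, so only one invoice goes out.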

And let's not forget how confusing this is for users:

1. User posts reply to comment.
2. User doesn't see the comment on the page. After a few refreshes, decides to post comment again.
3. Database finally gets around to committing the changes.
4. User looks like an idiot for double posting the same comment, or admin thinks this guy is being abusive because he's posting the same comment twice.

If you don't want a database, you can restrict your web app to a single thread and use flat files. For a lot of amateurs and personal web pages, this is perfectly fine. But don't call it a new kind of database: IBM was using flat files in the 60's. Flat files were abandoned because they didn't scale well and couldn't handle concurrency correctly. It is not a matter of *size* but of correctness.

--
The society for a thought-free internet welcomes you.

Write only databases are standard on *nix... by grassy_knoll · 2009-03-24 10:03 · Score: 1

Conveniently, all *nix systems come with a write only database.

Just pipe your data to /dev/null. I think you'll be impressed by the write speed!

[badum-ching]

--
A Human Right

Re:Write only databases are standard on *nix... by Anonymous Coward · 2009-03-24 11:54 · Score: 0

Just pipe your data to /dev/null. I think you'll be impressed by the write speed!
Maybe you're impressed with the speed of your lame-o software null device, but I have dedicated hardware for that. My /dev/dmanull has a write speed that'll make you cry.

Thanks for the print version link by fedxone-v86 · 2009-03-24 10:29 · Score: 1

I just wanted to give kudos to the submitter for linking to the print version of the article.

When I read 'InfoWorld' in the summary I was at first hesitant to click the link. And really, the original article spreads over 8 pages, contains a giant ad in the middle of what little text is shown on each page and even tries to open a popup.

I can't actually comment on the quality of the submission itself, as I haven't RTFA, but the quality of the link should serve as an example to everyone.

--
(USER WAS PUT ON PROBATION FOR THIS POST)

Between flat file DB and relational DB by tepples · 2009-03-24 10:47 · Score: 1

Choosing a relational database when you don't need that much functionality is just as wrong as choosing a flat file when you need a database.

Unless you need more functionality than a flat file but less functionality than a relational database, and there aren't any key-value databases installed on the system you plan to deploy on. Or you don't need a relational database yet, but you might in the near future as you add features to meet customer demand. Then you might reach for the SQLite.

Good is still the enemy of better by Seth+Kriticos · 2009-03-24 10:56 · Score: 1

It's the same principle. Why are we still using POSIX and SUS when there is Plan9? Because the former is established and works just well enough. People know how to use it and they don't have to learn something new... and there is a *huge* set of stuff built on the platform that is not available to the new one. So basically the new is a superior design but a fail in support.

It's the same with databases. RDBMSs are not the best we can get in terms of design, but they are established and we have a bunch of tools and technicians who know how to deal with them. Newer concepts mostly lack that support.

An example is ZODB. It's neat. It incorporates ACID + Transparency + Undo + Pluggable Storages and you mostly get rid of the Bobby Tables problem. Still you don't have the technicians who understand how to deal with it and you don't have the myriad of tools accompanying it.

Re:I've never understood the UNIX world's fascinat by Anonymous Coward · 2009-03-24 10:57 · Score: 0

Check out Berkeley DB. It's pretty much exactly what you're talking about, and it's on all the major OSes.

"Write your own" by shutdown+-p+now · 2009-03-24 11:02 · Score: 1

From TFA:

This extra layer of customizability is often quite useful. Many of the complaints about the other toy databases revolve around how a missing feature makes it impossible to find the right data. If you want to add a bit more functionality to the database here, you can whip up many of the features locally in Python. If you want a JOIN, you can synthesize one in Python and probably customize the memory cache at the same time. This is especially useful for Web applications that let users store their data in the service. If you need to add security to restrict each user to the right data, you can code that in Python too.

So what they're saying is that if I need some of the "advanced" functionality offered by RDBMS, then it's not a problem because I can always roll out my own.

But why should I, if the result will just be a poorly implemented, underperforming RDBMS?

I hereby propose a new rule along the lines of Greenspun's Tenth Rule:

"Any sufficiently complicated program manipulating large amounts of data that does not use an RDBMS, contains an ad hoc, informally specified, bug-ridden, and slow implementation of RDBMS".
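
For a sense of what "synthesize a JOIN in Python" amounts to, here is a minimal sketch of an inner equi-join over lists of dicts. The function name hash_join and the sample data are invented for illustration.

```python
from collections import defaultdict

def hash_join(left, right, key):
    """Inner equi-join of two lists of dicts on `key` (a classic hash join)."""
    # Build phase: bucket the right-hand rows by join key.
    buckets = defaultdict(list)
    for row in right:
        buckets[row[key]].append(row)
    # Probe phase: look up each left-hand row's key and merge the matches.
    joined = []
    for l_row in left:
        for r_row in buckets.get(l_row[key], []):
            joined.append({**l_row, **r_row})
    return joined

users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
orders = [
    {"user_id": 1, "item": "disk"},
    {"user_id": 1, "item": "tape"},
    {"user_id": 3, "item": "card"},
]
result = hash_join(users, orders, "user_id")
```

That is the easy part. Outer joins, NULL handling, inputs that don't fit in memory, and choosing a join order are where the "ad hoc, informally specified" bit starts.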

Re:"Write your own" by belg4mit · 2009-03-24 16:09 · Score: 1

One word: MORK

--
Were that I say, pancakes?

I feel older by Anonymous Coward · 2009-03-24 11:06 · Score: 0

I can remember the day when all we used was flat files and writing multi-file merges in cobol.

We didn't have fancy things like b-trees or indexes, we just sorted the file!

This is really weird by vacuum_tuber · 2009-03-24 11:29 · Score: 1

I work in the Wang VS world, a type of system originally patterned after the IBM 360/370 but with an OS designed from the ground up to be interactive. We have multiple file types at the OS file system level... consecutive, indexed, object, print, relative, etc. Indexed files not only store data retrievable by a key, but by up to 17 keys. Unlike some juvenile "database" products that stored data in a .DAT file and indices in separate files, our indexed files contain a mini-db structure inside, with chains of data blocks, index blocks and free blocks, all managed by the file system. It's impossible for the various parts to get out of sync because they are all integrated within the indexed file.

We also have file compression at the OS file system level. Most file types except object can be tagged to be compressed and some are compressed by default. The OS file system uses machine instructions to compress before writing and expand after reading. It's completely transparent to the app code.

We also have PACE, a native 4GL / RDBMS that was developed by Wang in the mid-1980s and had referential integrity rules in the data dictionary and distributed database with two-phase commit, all from the beginning.

I used Oracle 5.1 from 1989 through 1992 and was shocked to learn that Oracle had no referential integrity at the time. What Oracle did was fake it by generating SQL*Forms triggers in their CASE tool. Heaven help anyone trying to build apps without the CASE tool or anyone touching any of the generated triggers.

I also recall reading of the struggles of the mainstream db vendors with distributed database technology and the eventual development and adoption of two-phase commit, many years after Wang had it as a standard feature in their clustered environments.

In 2004 I co-founded a company to virtualize the aging Wang VS. We have been very successful and are now the official source for all Wang VS systems and software. Our virtual Wang VS ranges up to 220% of the performance of the legacy high-end VS18950 released in 1999 and runs in Linux mostly on Dell PowerEdges. The high end supports 500-1000 users, not quite in the IBM mainframe arena but far, far easier to program, operate and use.

The original Wang VS80, released in 1977, supported up to 32 users and scores of devices in no more than 512KB of memory. Right... KiloBytes. Half a MegaByte. Later models grew to be much more capacious but try to imagine supporting 32 connected users running real apps and manipulating real data in half a MB of memory.

All of this reminds me of the horrible disconnect that occurred with the introduction of microcomputers. The folks who worked in the microcomputer field either didn't know about or ignored all the existing OS technologies and reinvented everything. PC users had to wait 10-15 years before MS discovered "pre-emptive multitasking," which was the rule in large systems, even in minicomputers, from the 1960s forward.

Microcomputers, while very enabling of individuals, actually took us backward in OS technology and caused us to have to live through a 10-15 year hiatus while the microcomputer engineers and OS developers rediscovered things that had been standard stuff in the mini and mainframe worlds.

--
Look at the bright side: there's always seppuku.

They're a niche, not a full replacement by PostPhil · 2009-03-24 11:32 · Score: 2, Interesting

I get tired of hearing the same old discussion about whether or not the relational database is going to die. They're not. But the new breed of *specialized* databases work well for their *specialized* purposes. Big surprise. But all of them inevitably make a trade-off. Anyone who works seriously with database design knows that it's all about trade-offs.

One of the main motivations for the new breed of databases is that the standard SQL database relies on things such as foreign keys and other constraints for data consistency, but that requires the data to be directly managed by that running DBMS process. When you require data to be distributed over a network (i.e. over many separate processes), then the only way a *foreign key* can work is if the DBMS process has some sort of link over the network to the separate DBMS process and then uses that somewhat as if it were local. (Other strategies involve using external application code for consistency rather than foreign keys, etc.) Of course, the DBMS process can't use its usual local low-level optimizations behind-the-scenes in order to handle that query efficiently over the network, so it doesn't scale. Specialized DBMS's for distributed data focus on optimizing being distributed, while the typical SQL DBMS optimizes storage and retrieval of data as if it were local. The bottom line is that the traditional SQL database scales well vertically, but not horizontally concerning hardware. Or rather, when you scale horizontally, you forgo a lot of its advantages. The new breed of databases trade off consistency and other assurances for the sake of "good enough" consistency and really fast retrieval of domain-specific data.

But not everyone is trying to be Google or Amazon. Financial institutions such as banks can't tolerate "good enough" consistency. The biggest problem with relational databases I see nowadays is that people are ignorant about why "relational" is such a good idea, and how SQL only gets you part of the way to "relational" and that SQL's shortcomings are a different issue. The second biggest problem is that most people are used to only one or two data usage patterns, and if it "works for them", then they assume it should *always* be done that way. For example, the hordes of people who barely know Excel (i.e. not a relational database) or Access, and then like to give "expert" advice. Or a web programmer who believes that ORMs are the One True Way because they abstract away the choice of DBMS in order to keep favorite language X, even though other people's needs are the opposite: perhaps we want to abstract away the choice of programming language so that we can keep the same database, and so maybe it's a good idea if the database itself can ensure data consistency rather than relying on the ORM, etc.

Re:I've never understood the UNIX world's fascinat by bertok · 2009-03-24 11:32 · Score: 1

That doesn't seem like such a good general purpose solution. For a trivial application, it might work, especially if you place an enormous amount of logic into the application code, but I can foresee problems even then.

How do you deal with disk space wasted by fragmentation? If the "record ID" is essentially an offset, you can't defragment, especially if you want to do it live. That's not even mentioning internal fragmentation - most disk caches store large blocks (64KB or larger), so you're wasting, on average, 50% of your caching capacity because of the mismatch between block and record sizes.

What happens when you've pre-allocated, say, 1000 small blocks and 1000 large blocks, and it turns out you actually need 1001 large blocks? You may have 30% free space left in the small block section, but you can't use it! Creating a new file sounds expensive (has to be filled with a pattern!), whereas creating new files of arbitrary size is essentially constant time in most modern databases (they don't even ask the OS to fill them with a 0 pattern).

This also sounds like it can't handle out-of-order writes. This may be less of a problem now with battery-backed RAM caches on disk controllers, but it would have sucked a decade ago. Without an intent log, you have to perform every write in-order or risk corruption.

Actually, what happens if the program accidentally loses a block key? Would it... leak storage space? How would you reverse that if all the blocks are identical looking binary blobs?

Not to mention that you get the joy of re-inventing the wheel any time you want to do anything other than "retrieve by key". If you want to locate, say, a passenger by name across ALL flights in a day, you'd probably have to scan all records or write your own index or something.

But if you were really keen on using such a trivial system, implementing it wouldn't be that hard in any modern programming language. A few thousand lines of Java or C# ought to do it.

Re:Hackers. by Anonymous Coward · 2009-03-24 11:35 · Score: 0

Feh. Mods have no humour nowadays. :^(

Keep your night job kdawson by Anonymous Coward · 2009-03-24 12:02 · Score: 0

Journalism ain't for you.

The real challenge by drfreak · 2009-03-24 13:38 · Score: 1

is not so much working with these databases as a programmer. Given time, a programmer could always work out the data scheme. The trouble ensues when an Analyst tries to get at the data with a report writer and stumbles trying to get the data. A lot of commercial software which uses these embedded databases will include its own reporting tools to mitigate the issue though.

Perhaps I'm too pedantic. . . by colinrichardday · 2009-03-24 14:08 · Score: 1

Let A={1, 2, 3} and B={4, 5}. Then the Cartesian product of A and B, denoted AxB, is {(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)}, that is, the set of all ordered pairs whose first coordinates are in the first set and whose second coordinates are in the second set.
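
The same definition, checked mechanically with Python's itertools.product. In database terms this is what a join with no join condition produces.

```python
from itertools import product

A = {1, 2, 3}
B = {4, 5}

# Every ordered pair (a, b) with a drawn from A and b drawn from B.
AxB = set(product(A, B))
size = len(AxB)   # |A x B| = |A| * |B|
```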

"Eventually consistent" by Animats · 2009-03-24 16:20 · Score: 1

True "eventually consistent" systems are quite difficult in general. Game designers struggle with this. A typical example is a distributed game in which A shoots at B. A's client knows where B was at the last update, but due to lag, is behind on knowing where the (authoritative) server says B is now. A's client has to decide whether A's shot at B hit B.

A typical trick is that A's client projects B's current position assuming B's user doesn't input a direction change, and computes a hit or miss on that basis in the client. The actions of A are also forwarded to the server, which makes the official decision on whether A's shot hit B, and that information is sent back to the clients of A and B, after transmission delay.

The trick is making the visuals work for this. One way to hide the problem is that when A's client computes that A's shot hit B, B is displayed as hit and staggering, but not falling. This buys time until the server update comes in to A's client. If the server says it was a hit, B is displayed in A's client falling down. If the server says it was a miss, B is displayed as A's client as staggering and recovering. This is an illusion created for user A to hide the lag.

Meanwhile, in B's client, B doesn't stagger at all if there's a miss, because, by the time B's client hears about the shot from the server, the hit/miss decision is known. So user A and user B see different things during the lag period, but come back into sync after the update.
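
A toy sketch of the trick described above, from A's client's point of view. All names here (Snapshot, project, client_hit_guess, display_state) and the hit radius are invented for illustration; a real game would do this per frame with proper collision geometry.

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    """Last state of player B received from the server."""
    x: float
    y: float
    vx: float
    vy: float
    t: float   # server timestamp of this update

def project(snap, now):
    # Dead reckoning: assume B kept the same velocity since the last update.
    dt = now - snap.t
    return (snap.x + snap.vx * dt, snap.y + snap.vy * dt)

def client_hit_guess(aim, snap, now, radius=0.5):
    # The client's provisional verdict, based on B's projected position.
    px, py = project(snap, now)
    return (aim[0] - px) ** 2 + (aim[1] - py) ** 2 <= radius ** 2

def display_state(client_guess, server_verdict):
    """What A's client shows for B. server_verdict is None until it arrives."""
    if server_verdict is None:
        return "staggering" if client_guess else "unharmed"
    if server_verdict:
        return "falling"
    return "recovering" if client_guess else "unharmed"

snap = Snapshot(x=0.0, y=0.0, vx=2.0, vy=0.0, t=10.0)
guess = client_hit_guess(aim=(0.5, 0.0), snap=snap, now=10.25)
during_lag = display_state(guess, None)       # stagger buys time
after_miss = display_state(guess, False)      # server overrules the client
after_hit = display_state(guess, True)        # server confirms
```

B's own client never goes through the provisional branch, since it only hears about the shot along with the server's verdict.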

Randy Farmer and Chip Morningstar invented this back in the 1980s for Lucasfilm's "Habitat", and called it "surreal time".

Web-based "eventually consistent" systems are usually much dumber than this. Most are more like "becomes consistent after the user manually reloads the page a few times". Distributed cache consistency can be done efficiently (every shared memory multiprocessor CPU does it), but modern cache interlocking technology never seems to have made it to web caches. There really should be little cache-invalidation messages pushed around between the servers in a big web farm, but there usually aren't.

Dynamic Relational by Tablizer · 2009-03-24 16:39 · Score: 1

I've been kicking around the idea of dynamic relational for a couple of years now. We have dynamic application languages, so why not dynamic databases? The "static" and the dynamic kind serve different needs and can coexist. Why should DB's be any different?

--
Table-ized A.I.

Tablescan? We used to dream of tablescans. by HornWumpus · 2009-03-24 18:08 · Score: 1

If you aren't building temp tables you aren't even straining the query engine.

Slacker.

Make your DBA cry. Submit endless long running queries then complain about the server being slow.

--
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'

Re:I've never understood the UNIX world's fascinat by Anonymous Coward · 2009-03-24 23:21 · Score: 0

Google's BigFile storage system is quite similar in design to this.

Check out the relevant papers in Google Labs.

Pointless by EvilIntelligence · 2009-03-25 02:01 · Score: 2, Insightful

As a DB admin myself, I find these "Us vs Them" arguments to be ultimately pointless. A company will choose a database based on the application's needs. If "immediate consistency" is needed they will choose a standard relational database. If "eventual consistency" is acceptable, the company may opt for one of the other "not-so-relational" databases. The fact that there are other options is actually a good thing. The "old guard" needs to find the positives and embrace change, or run the risk of being left behind in an evolving world of technology.

Old conversation - Was: Re:I feel old by Anonymous Coward · 2009-03-25 14:51 · Score: 0

A conversation a few years ago between myself, immediately upon my arrival at the office, and my already present friend and co-worker:

Me: We're old.
Friend: What?
Me: We're old.
Friend: What are you talking about?
Me: I was jockeying radio stations on the way here, trying to find something I liked.
Friend: So?
Me: I finally found something on the classic rock station.
Friend: So what? I like some classic rock, too - doesn't make me old.
Me: It was "Shock the Monkey".
Friend: (pause) Crap, we're old.

- T

Re:I've never understood the UNIX world's fascinat by Richard+Steiner · 2009-03-26 04:37 · Score: 1

We generally use a simple flat file as an index. Field 1 is a sorted index field (say a flight number), field 2 is the key to a freespace record. A simple binary search is fast even on a large sorted list.
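
Roughly, in Python terms (a sketch only: the flight numbers and record keys are made up, and the real thing runs on a mainframe against a file, not a list in memory):

```python
import bisect

# Each index entry is (flight_number, freespace_record_key), sorted by flight number.
index = [("NW0012", 7), ("NW0455", 2), ("NW1492", 5), ("NW2001", 0)]
keys = [flight for flight, _ in index]

def lookup(flight):
    """Binary-search the sorted index; return the record key or None."""
    i = bisect.bisect_left(keys, flight)
    if i < len(keys) and keys[i] == flight:
        return index[i][1]
    return None

found = lookup("NW1492")
missing = lookup("NW9999")
```

A sorted list of a million entries needs only about 20 comparisons per lookup.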

No, it isn't useful for certain types of applications. Relational databases exist for a reason. :-) But if you have to store something for a well-known static set of fields (say weather stations or flights in an OAG schedule file), something a lot simpler isn't a bad method at all.

--
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.

Re:I've never understood the UNIX world's fascinat by Richard+Steiner · 2009-03-26 04:44 · Score: 1

Design issue. Optimally you use this sort of thing in places where you will always have an easily-available record number. It isn't a replacement for a relational database in places where relations are nice. :-)

In the airline context in which I work, you tend to see sorted indexes containing keys which are accessed by things like IATA station code (e.g., MSP or ATL), or airline code, or flight-date-origin (e.g., NW1492-24-MSP), which provide unique reference items which are easy to sort.

We also sometimes maintain a relational database on another lower-usage box just for searching/reporting purposes. That frees the freespace file on the primary box to do its thing quickly, and transactions against that fast database are also split off and inserted as rows against the relational database.

That allows for the speed of a freespace file in production while also giving reporting/query capabilities for past history ... and in a way which doesn't impact production.

--
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.

Re:I've never understood the UNIX world's fascinat by Richard+Steiner · 2009-03-26 05:00 · Score: 1

I think you'd be surprised at the complexity of the applications which use such file structures. :-) When the data itself is simple to store, the complexity of the application isn't really relevant.

Disk space on the mainframe isn't my problem as an applications programmer, and it doesn't seem to be an issue for a freespace file. Continuous space for each record is allocated up front when the file is created, and you are reading/writing data as fixed record sizes. A freespace record is ALWAYS a fixed multiple of a disk allocation size to prevent allocation issues. In our case, I don't actually know what the hardware does (and as an applications programmer that isn't a problem I care about), but logically we're taught that disk is allocated in 28-word sectors. On an OS2200 mainframe, freespace records are multiples of 112 words, and this is somehow tied to the way that operating system allocates data on disk.
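
The general idea of fixed-size records addressed by record number can be sketched in a few lines of Python. This is an analogy, not the OS2200 implementation: the record layout, field names and the 100-record preallocation are all assumptions, and an in-memory buffer stands in for the file.

```python
import io
import struct

# Hypothetical fixed layout: 8-byte flight, 16-byte station, 4-byte seat count.
RECORD = struct.Struct("<8s16sI")

def write_record(f, recno, flight, station, seats):
    # The record number alone determines the offset; no index lookup needed.
    f.seek(recno * RECORD.size)
    f.write(RECORD.pack(flight.encode(), station.encode(), seats))

def read_record(f, recno):
    f.seek(recno * RECORD.size)
    flight, station, seats = RECORD.unpack(f.read(RECORD.size))
    # struct pads short strings with NUL bytes, so strip them on the way out.
    return flight.rstrip(b"\0").decode(), station.rstrip(b"\0").decode(), seats

# Space for every record is allocated up front, as when the file is created.
store = io.BytesIO(bytes(RECORD.size * 100))
write_record(store, 42, "NW1492", "MSP", 120)
record = read_record(store, 42)
empty = read_record(store, 0)   # never written, still zero-filled
```

Because every record is the same size, a read or write is one seek plus one transfer, which is where the speed comes from.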

What happens when you've pre-allocated, say, 1000 small blocks and 1000 large blocks, and it turns out you actually need 1001 large blocks?

Not an issue with a well-designed database. And historically it hasn't been. Keep in mind that I've been working with this file format almost continually since 1988, so I have plenty of experience seeing it in use in production transaction systems. For well-defined datasets, that isn't an issue.

Out of order writes are not an application issue. All the application cares about is that it can read and write data to a logical record. The underlying record handler has to worry about the specifics of getting that data to and from the disk.

I've never seen a program lose a key, but in our case the base freespace record management system has routines which run periodically against the database to find orphans, etc.

Something like scanning for a passenger name across all flights would be an interesting problem, but I wouldn't bother to do that in freespace. I'd have a relational database off to the side and populated in parallel for that sort of query. Freespace is about *speed*, remember.

I don't think it would be hard to implement at all. It just takes time, something I can't spend (at work) writing code that isn't directly related to our paying customer needs. But I'm considering it for a home project. Probably in C. We'll see. :-)

--
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.

Re:I've never understood the UNIX world's fascinat by Richard+Steiner · 2009-03-26 05:09 · Score: 1

I think I need to check this one out as well as the Berkeley Database mentioned above.

Thank you. :-)

--
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.

267 comments