I said "trolling" because the tone of your messages is "give it to me" instead of "what solutions do I have". You seem very set on an RDBMS, even when it's not suited to storing 2GB data streams. That's why I said synchronous sectors. If you DO make a change to the data, an RDBMS will require a read-update-replace to update a single 2GB BLOB. In a system DESIGNED for large data streams, the updates would only touch a sector or two. Basically: a logging filesystem with metadata, and an RDBMS for keeping track of things. In essence, an SQL DB with a RAIDed ReiserFS.
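A minimal Python sketch of the "sector or two" point, assuming the stream lives in an ordinary file and a 512-byte sector (an assumption: real drives and filesystems vary, and the OS may write whole filesystem blocks rather than single sectors). `patch_in_place` and `sectors_touched` are hypothetical helpers. Overwriting a small byte range only touches the sectors that range spans, whereas a read-update-replace of a BLOB rewrites the whole object.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumed sector size; many newer drives use 4096


def sectors_touched(offset, length, sector_size=SECTOR_SIZE):
    """Return the range of sector indices covered by bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return range(first, last + 1)


def patch_in_place(path, offset, data):
    """Overwrite len(data) bytes at offset without rewriting the rest of the file."""
    with open(path, "r+b") as f:  # read/write mode, no truncation
        f.seek(offset)
        f.write(data)
    return sectors_touched(offset, len(data))


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(4096))  # small stand-in for a 2 GB stream
        path = tmp.name
    touched = patch_in_place(path, 1000, b"0123456789")
    print("sectors touched:", list(touched))  # prints: sectors touched: [1]
    os.remove(path)
```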
I'm open to any suggestions.
But just trying to synchronize your file system [housing your "data"] with your RDBMS [housing your "metadata"] is an enormous task in and of itself.
I'd be more than interested in any products which prepackage the synchronization of the two.
As for the "RDBMS" side of things, well, yeah, ultimately I'd like to be able to "search" the binary data in much the way that people use traditional ASCII-ish/SQL-ish RDBMS to "search" through ASCII strings.
For instance, in a traditional ASCII-ish/SQL-ish RDBMS, you can do something like the following:
FIND ALL PEOPLE WHERE ((LASTNAME == SMITH) && (AREACODE == 555))
Ultimately, I'd like to have my binary data [for instance, my "sound files"] searchable for queries like
FIND ALL TENTHS OF A SECOND DURING SONG FOR WHICH ((SOUNDPRESSURELEVEL >= 85DBL) && ({PERCENTAGE OF THAT PORTION OF THE FFT WHICH IS >= 100kHz} >= 50%))
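A hedged Python sketch of what evaluating a query like this over raw samples might involve. Assumptions: samples are floats normalized to [-1, 1]; loudness is measured in dB relative to full scale as a stand-in for calibrated sound pressure level (real SPL needs a calibrated microphone chain); the high band is spectral energy at or above a cutoff, computed with a naive DFT that is fine for illustration but far too slow for real data. `find_windows` and its parameters are hypothetical. A 100 kHz cutoff only makes sense if the sample rate exceeds 200 kHz (the Nyquist limit); at 96 kHz nothing above 48 kHz is captured, so the sketch rejects such a query.

```python
import cmath
import math


def level_dbfs(window):
    """RMS level of a window in dB relative to full scale (samples in [-1, 1])."""
    mean_square = sum(s * s for s in window) / len(window)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def high_band_fraction(window, rate_hz, cutoff_hz):
    """Fraction of spectral energy (bins 1..n/2, DC excluded) at or above cutoff_hz.

    Uses a naive O(n^2) DFT purely for illustration.
    """
    n = len(window)
    total = high = 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(window))
        power = abs(coeff) ** 2
        total += power
        if k * rate_hz / n >= cutoff_hz:
            high += power
    return high / total if total > 0 else 0.0


def find_windows(samples, rate_hz, min_dbfs, cutoff_hz, min_fraction, window_s=0.1):
    """Return start times (seconds) of windows that are loud enough and mostly high-band."""
    if cutoff_hz >= rate_hz / 2:
        raise ValueError("cutoff is at or above the Nyquist frequency; no such content exists")
    win = round(rate_hz * window_s)  # samples per window (a tenth of a second by default)
    hits = []
    for start in range(0, len(samples) - win + 1, win):
        w = samples[start:start + win]
        if level_dbfs(w) >= min_dbfs and high_band_fraction(w, rate_hz, cutoff_hz) >= min_fraction:
            hits.append(start / rate_hz)
    return hits


if __name__ == "__main__":
    rate = 1000  # toy sample rate so the naive DFT stays fast
    loud_high = [0.9 * math.sin(2 * math.pi * 400 * t / rate) for t in range(100)]
    loud_low = [0.9 * math.sin(2 * math.pi * 50 * t / rate) for t in range(100, 200)]
    print(find_windows(loud_high + loud_low, rate, -6.0, 250, 0.5))  # [0.0]
```

A production version would swap the naive DFT for an FFT library and read samples from disk in blocks.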
But I guess just a file system synchronized with some RDBMS would be a start.
Dude, if it's not done yet, and you don't do anything except complain, that's not going to motivate people to solve the problem. Go hire a developer team to do it for you, and make millions selling the resulting product to others.
Actually, the more I study the problem, the more I sense that that's what's gonna happen, only I'll be "the developer team" all by my lonesome.
Which raises the question of who'll have the time to analyze the data being generated...
If a shop managing large amounts of data in the multimedia or scientific computing fields can't hire an administrator or two (at less than $75,000, at least around here; the public sector pays pretty low compared to the private sector), they deserve whatever foul-ups and data loss occur. If they haven't budgeted for administrators along with all that fancy 64-bit hardware, they're asking for trouble. Who runs the servers? It's a strawman, anyway.
As for rsync and whatnot, most database systems can call external programs from stored procedures.
There's also filesystem clustering, SAN solutions that could tie systems together, and SAN mirroring between storage chassis.
In 2004, we shouldn't be worrying about much of this, I agree. The point is, you can either work with it, around it, or write your own operating system with a built-in RDBMS.
But that's a huge, huge amount of work you've described.
Look, I'm the guy who's supposed to be doing the "science" or "mathematical" end of things. I just want some big repository where I can dump my data, and then spend my time doing the analysis of the data.
Heck, even if a pre-packaged database product existed, it would take me MONTHS just to learn the thing, get the servers purchased, get it installed on the servers, get it up and running stably, and THEN start writing the client interface to access the damned thing.
But if I have to write the database product ITSELF from scratch, then I'm looking at literally YEARS of work before I ever even get to the point where I can start analyzing the data.
Personally, though, I'd rather just leave my files on a real filesystem and serve them using Samba or NFS or even HTTP. Frankly as far as audio is concerned I'd keep the metadata there too, leaving the database to act as a cache for metadata searches, without turning my entire audio collection into a monstrously huge data file I can only access through SQL. That's just me though; I can't pretend to have a deep understanding of the problems you're dealing with.
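A hedged sketch of that arrangement using Python's standard `sqlite3`, `hashlib`, and `os` modules: the files stay on the filesystem, and a `files` table caches their size, modification time, and SHA-256 hash for searching. The schema and the `sync_catalog` function are hypothetical, and a real deployment would use a server RDBMS, locking, and a scheduler. The point is only that re-scanning and comparing is one simple way to notice when the files and the catalog have drifted apart.

```python
import hashlib
import os
import sqlite3


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a large recording never sits in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sync_catalog(conn, root):
    """Bring the `files` table in line with the files under root.

    Returns (new_or_changed, removed) as sorted lists of paths relative to root.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, size INTEGER, mtime REAL, sha256 TEXT)"
    )
    known = {
        row[0]: row[1:]
        for row in conn.execute("SELECT path, size, mtime, sha256 FROM files")
    }
    seen, changed = set(), []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            st = os.stat(full)
            seen.add(rel)
            old = known.get(rel)
            if old is not None and old[0] == st.st_size and old[1] == st.st_mtime:
                continue  # shortcut: same size and mtime is treated as unchanged
            digest = file_sha256(full)
            if old is None or old[2] != digest:
                changed.append(rel)
            conn.execute(
                "INSERT OR REPLACE INTO files (path, size, mtime, sha256) VALUES (?, ?, ?, ?)",
                (rel, st.st_size, st.st_mtime, digest),
            )
    removed = sorted(set(known) - seen)
    conn.executemany("DELETE FROM files WHERE path = ?", [(p,) for p in removed])
    conn.commit()
    return sorted(changed), removed
```

Treating an unchanged size and mtime as "no change" skips rehashing multi-gigabyte files on every pass, at the cost of missing edits that preserve both.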
Right, but even doing what you've described is a HUGE undertaking.
Do you know of anybody out there who's got a product that automates some of this crap?
In his sample figures, the poster uses 24-bit, 96 ksample/s recording. I understand that recording like this is done for mastering audio CDs and other studio processes. OK, so I can see wanting to work with that as your medium.
The poster then refers to this as "medium quality sound" and laments that he can only store 2 hrs of it in a (file/object/dataspace?).
The poster follows this up by declaring that 32 bits are worthless and that he needs 64 bits. The discussion at this point is computer / database and maybe I don't understand. Does the poster require the ability to record regular audio with 64 bit precision? Every bit doubles your resolution, and you want 40 more of them than are available in studio master recordings?
Okay, imagine you're the recording studio geek. Say you're in charge of recording the latest, I dunno, Sting [ex-Police] album.
Monday AM. The drummer comes in, and you record two channels [i.e. stereo] of his riffs. Recording Time: 4 hours, Data Size Total: probably in excess of 8 GB. [But at $0.05 per gigabyte, who cares?]
Monday PM. The bass player comes in. 4 hours, 2 channels, another 8 gigabytes.
Tuesday AM. The lead guitarist comes in. 4 hours, 2 channels, another 8 gigabytes.
Tuesday PM. The pianist comes in. 4 hours, 2 channels, another 8 gigabytes.
And so on, and on, and on, over the course of weeks or more: The lead singer, the backup singers, the violins, the mandolin, the bongos, and who knows? Maybe Yo-Yo Ma will show up with his cello.
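A quick Python tally of the schedule above, using the figures from the thread (96,000 samples per second per channel, 3 bytes per sample, stereo, 4-hour sessions). It covers only the first two days, and "GB" here means 10^9 bytes; both are illustrative choices. Each session on its own already exceeds a single 2^32-byte object.

```python
import math

SAMPLE_RATE = 96_000   # samples per second per channel
BYTES_PER_SAMPLE = 3   # 24-bit samples
BLOB_LIMIT = 2 ** 32   # the 2^32-byte ceiling discussed in the thread


def session_bytes(hours, channels):
    """Uncompressed size of one recording session."""
    return SAMPLE_RATE * BYTES_PER_SAMPLE * channels * int(hours * 3600)


sessions = [
    ("Mon AM drums", 4, 2),
    ("Mon PM bass", 4, 2),
    ("Tue AM lead guitar", 4, 2),
    ("Tue PM piano", 4, 2),
]

total = 0
for name, hours, channels in sessions:
    size = session_bytes(hours, channels)
    total += size
    objects = math.ceil(size / BLOB_LIMIT)  # how many 2^32-byte objects it would span
    print(f"{name:<20} {size / 1e9:5.2f} GB -> {objects} x 2^32-byte objects")
print(f"{'two days total':<20} {total / 1e9:5.2f} GB")
```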
Point is, you're generating absolutely massive amounts of data, and you've got to have somewhere to put it. Unfortunately, the industry standard language for database access, namely SQL-99, can't support anything larger than 2^32 bytes of data in any one place, so you're just overwhelming this antiquated language.
What you need is a language [in combination with an architecture] that allows you to address your data in 64-bits, rather than 32. You also need a coherent database product that puts all this stuff together, so you can move it from your client workstation to the server, and mirror it to your failsafe redundant server, and make nightly tape backups, and keep track of just what the hell it is that you've been recording [with what's usually called "Metadata"].
Similar problems are faced by pretty much anyone else in the content creation business [such as the animation guys at a place like Pixar, or the particle physics guys at a place like Fermilab]: How do you keep track of these massive amounts of data that just overwhelm ancient paradigms like SQL?
Perhaps I misunderstood that part. But I'm pretty sure that 24 / 96 recorded audio gets filed under the heading of 'high quality'.
Right now, that's pretty much where the industry's at as a standard for the recording of audio performances [although the audio recording industry has a Moore's Law just like everybody else].
But if you're doing something a little more scientific-ish, like high-speed ultrasound, then 24 / 96 is at the low end of things.
Honestly, I'm not well versed in SQL or POSIX-compliant OSes (I'm a Windows programmer by trade), but I can tell you that usually when a datatype is 32 bits in size, it indicates that the data is set in WORD-size chunks. As most processors are 32-bit, the datatype is 32 bits in size as well. So, in other words, I would guess that the BLOB type is dependent on the CPU, and once SQL gets ported to 64-bit OSes installed on machines with 64-bit CPUs, this should take care of the size limitations you are experiencing.
The problem with SQL [like all the ancient languages, such as BASIC, or C] is that it doesn't have a good sense of datatype. Practically everything in SQL is little more than ASCII [and often only the first 7-bits of that].
A 2^32 byte BLOB is just that: 2^32 bytes, i.e. 2^32 single-byte ASCII characters.
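One common way a hard size ceiling arises regardless of CPU is a fixed-width length field in a storage or wire format. The Python sketch below uses the standard `struct` module; `length_prefix` and `max_length` are hypothetical helpers, not any particular database's on-disk format. A 4-byte unsigned length field cannot describe an object of 2^32 bytes or more, while an 8-byte field can, on 32-bit and 64-bit machines alike.

```python
import struct


def length_prefix(n, width=4):
    """Encode a byte count as a little-endian unsigned length field of 4 or 8 bytes."""
    fmt = {4: "<I", 8: "<Q"}[width]
    return struct.pack(fmt, n)  # raises struct.error if n does not fit


def max_length(width):
    """Largest byte count a width-byte unsigned length field can describe."""
    return 2 ** (8 * width) - 1


if __name__ == "__main__":
    print(max_length(4))  # 4294967295
    print(max_length(8))  # 18446744073709551615
    try:
        length_prefix(2 ** 32, width=4)
    except struct.error as exc:
        print("32-bit field overflows:", exc)
    print(length_prefix(2 ** 32, width=8).hex())  # 0000000001000000
```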
Boy do I wish there had been some honest-to-goodness "scientists" or "engineers" or "mathematicians" around when these languages were being invented. Instead, we had a bunch of ivory tower morons who were trying to create some kind of abstract "natural" language, un-encumbered by what "scientists" and "engineers" and "mathematicians" [and recording studio geeks] really need, which is data with a strong sense of type.
Look, you seem to be writing this from the point of view that I've got a PhD that involves both RDBMS theory and file system encoding theory.
Now, quite frankly, neither of those "theories" is particularly esoteric, and I could, with a good 3-4 months of study, and a good 3-4 decades of coding, write my own 64-bit database with support for strongly-typed data primitives.
The point is, though, that I have no interest in doing any such thing. I can't re-invent the wheel every damned time I need something. Eventually, at least SOME of this stuff needs to be written for me, or I'll grow old and die trying to write it for myself.
What you've outlined sounds like a great idea for a kid who's got both ten years of his life to blow off and some serious financing from a Sugar Daddy with very deep pockets. But I ain't got either the ten years, or the Sugar Daddy, M-Kay???
And please don't call me a troll. I am desperate for a product like this, i.e. something that can store really large pieces of data, with some tracking of at least the Metadata end of things, and maybe the ability for me to define my own binary datatypes [along with methods to act on those datatypes], and with some coherent integration with industry standard products like BackupExec or ArcServe.
If such a product exists, PLEASE, PLEASE tell me about it.
There must be somebody out there peddling such a thing. What does EMI or Sony use to keep track of the data that they record in their studios? What does Pixar use to keep track of their animation graphics and soundtracks?
I know that Computer Associates used to peddle a product called "Jasmine" that was supposed to do these sorts of things, but one day they just up and cancelled it and left all their clients high and dry [I had to work all the way up the ladder to one of their Senior VPs to find out about this].
I know that Progress Software used to peddle something called "ObjectStore," but it has a terrible reputation, and, as far as I can tell, Progress is letting it wither on the vine in favor of their new financial software initiatives.
I know that Microsoft just announced an initiative called ".NET ObjectSpaces," but, for the foreseeable future, it won't be anywhere near ready to use in a mission-critical environment.
File reference data type? Give me a break. Haven't you used a char or varchar to store a filename before? Or heck, generate the filename using your primary key, if possible.
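A minimal Python/`sqlite3` sketch of that suggestion: the row holds the metadata plus a plain text path, and the file name is derived from the row's integer primary key. The table name, columns, directory fan-out, and `.wav` extension are all hypothetical choices, and actually writing the audio to that path is left out.

```python
import os
import sqlite3


def open_catalog(db_path=":memory:"):
    """Open (or create) a catalog with one row per recording."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recordings "
        "(id INTEGER PRIMARY KEY, title TEXT, path VARCHAR(255))"
    )
    return conn


def path_for_id(root, rec_id, ext=".wav"):
    """Derive a path from a primary key, fanning out into subdirectories
    (id 1234567 -> root/0001/234/0001234567.wav) so no single directory gets huge."""
    name = f"{rec_id:010d}"
    return os.path.join(root, name[:4], name[4:7], name + ext)


def add_recording(conn, root, title):
    """Insert a metadata row, then store the path derived from its new key."""
    cur = conn.execute("INSERT INTO recordings (title) VALUES (?)", (title,))
    rec_id = cur.lastrowid
    path = path_for_id(root, rec_id)
    conn.execute("UPDATE recordings SET path = ? WHERE id = ?", (path, rec_id))
    conn.commit()
    return rec_id, path


if __name__ == "__main__":
    catalog = open_catalog()
    print(add_recording(catalog, "/data/audio", "Drums, Monday AM"))
```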
If I'm in the business of writing address-mapping software that translates things like binary file nodes to ASCII characters, then, for all intents and purposes, I'm writing a new computer programming language. [Hell, in this case, I'm practically writing a new operating system.]
Look, it's 2004, not 1984 - all of this stuff should have been done for me by now. I shouldn't have to spend weeks upon weeks of my life writing this kind of crap.
ArcServe and Backup Exec will back up all files in a directory hierarchy, at least if the hierarchy is the only thing defined. Otherwise more than a few sysadmins would have to rebuild backup jobs every single day.
But do they do it in conjunction with the database itself, or separately? I.e. can I get one single BackupExec/ArcServe copy of both the database and the file system, or do I have to do two backups every night?
And the overwhelming majority of shops that do scientific computing, or multimedia computing, don't have a budget to hire a bunch of $75,000 administrators. Remember that each of your $75,000 administrators costs about $150,000 a year [or more] when you factor in all the overhead of benefits and office space and the like.
Rsync or a shell script can duplicate the data between servers.
Will Rsync talk to Oracle/DB2/SQLServer? Will Oracle/DB2/SQLServer talk to Rsync? What if someone makes a change [i.e. a delta] to the file? Will Rsync tell Oracle/DB2/SQLServer? What if someone makes a change [i.e. a delta] to the Metadata? Will Oracle/DB2/SQLServer inform the filesystem about it?
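As a partial illustration of detecting "a change [i.e. a delta] to the file", here is a simplified Python sketch of block-level change detection: hash fixed-size blocks, keep the hashes (for example, in the metadata database), and compare them later. This is not rsync's algorithm, which also uses a rolling checksum so it can match data that has shifted position; `block_hashes`, `changed_blocks`, and the 64 KiB block size are assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # assumed block size


def block_hashes(path, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            digests.append(hashlib.sha256(block).digest())
    return digests


def changed_blocks(old, new):
    """Indices of blocks that differ, including blocks added or dropped at the end."""
    changed = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
    changed.extend(range(min(len(old), len(new)), max(len(old), len(new))))
    return changed
```

Only the blocks listed by `changed_blocks` would need to be copied to a mirror, and the stored hash list could live next to the file's other metadata.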
Like I said above, in 2004, we shouldn't have to be worrying about all of this crap.
If I'm in the business of writing file-splitting software to store a piece of 64-bit data into multiple instances of 32-bit data types, then, for all intents and purposes, I'm writing a new computer programming language.
Look, it's 2004, not 1984 - all of this stuff should have been done for me by now. I shouldn't have to spend weeks upon weeks of my life writing this kind of crap.
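For scale, a minimal Python/`sqlite3` sketch of the file-splitting approach being objected to: store a large stream as numbered chunks, each well under any per-object limit, and reassemble them in order. The table layout, the `store_stream`/`read_stream` names, and the tiny chunk size in the example are hypothetical; it sidesteps none of the backup, mirroring, or metadata problems raised above.

```python
import io
import sqlite3


def store_stream(conn, stream_id, src, chunk_size):
    """Copy a binary file-like object into the chunks table; return the chunk count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks "
        "(stream_id TEXT, seq INTEGER, data BLOB, PRIMARY KEY (stream_id, seq))"
    )
    seq = 0
    while True:
        data = src.read(chunk_size)
        if not data:
            break
        conn.execute("INSERT INTO chunks VALUES (?, ?, ?)", (stream_id, seq, data))
        seq += 1
    conn.commit()
    return seq


def read_stream(conn, stream_id, dst):
    """Write the chunks back out, in sequence order, to a binary file-like object."""
    rows = conn.execute(
        "SELECT data FROM chunks WHERE stream_id = ? ORDER BY seq", (stream_id,)
    )
    for (data,) in rows:
        dst.write(data)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    payload = bytes(range(256)) * 10  # 2,560 bytes standing in for a huge recording
    n = store_stream(conn, "take-1", io.BytesIO(payload), chunk_size=1000)
    out = io.BytesIO()
    read_stream(conn, "take-1", out)
    print(n, out.getvalue() == payload)  # 3 True
```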
Why are you trying to put the "song" into the database? Store it as a file, then put a pointer into your database... Perhaps I am simplifying it too much? Or are you making it too complicated?
Okay, is there a "file reference data type" in SQL-99?
If I "reference the file," will Seagate/Veritas Backup-Exec [or CA/Cheyenne ArcServe] automatically back up the file when I do my nightly backups of the database [even though, strictly speaking, the file isn't part of the database]? Or will I have to go in and manually configure BackupExec or ArcServe for each file I need to have backed-up?
If I "reference the file," will the database automatically move copies of the file [and/or deltas of changes to the file] to the failsafe mirrors of the database, and/or to the load-balancing mirrors of the database?
And will all of these things be done in an ANSI/IEEE/ISO/whatever sort of a standard, so that if I decide to port my code to a different vendor's product, it won't take me forever and a year to figure out how to do the port?
What I'm asking for would have been SOOOOOO simple if only the idiots on the SQL committee had had an ounce of foresight.
PS: The things we have aren't technically "songs," although I suppose that our high-speed ultrasounds might qualify as such.
Which is how large digital media automation systems do it. I know of some that are more than 40 TB of spinning disks and one that will be in excess of 100 TB. They use an open-standard-based system.
Please, please, please expound.
Who are "they"? Who sells these "digital media automation systems"? What is this "open standard"?
You've just proven why a music database would have made a great deal of sense, for your application, or at least, a set of sql functions/extensions like GIS, only applied to your field, with AUDIO64 types being defined, with custom fields like author, copyright, an instruments detail subquery and the like.
YES, YES, YES!!!
With strongly-typed data primitives! [96-bit IEEE Doubles, 128-bit IEEE Doubles, 128-bit LabVIEW TIMESTAMPS, etc.] Or, if they aren't pre-packaged, at least the ability to define strongly-typed data primitives on the fly.
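As an example of the kind of user-defined primitive meant here, a Python sketch that decodes a 128-bit LabVIEW-style timestamp from raw bytes. It assumes the flattened layout is a big-endian signed 64-bit count of whole seconds since 1904-01-01 00:00:00 UTC followed by a big-endian unsigned 64-bit fraction of a second (units of 2^-64 s); verify that against the LabVIEW documentation before relying on it. `decode_lv_timestamp` and `to_datetime` are hypothetical names.

```python
import struct
from datetime import datetime, timedelta, timezone

LV_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)


def decode_lv_timestamp(raw):
    """Split 16 bytes into (whole seconds since 1904, fraction in units of 2**-64 s).

    Both parts stay exact integers; converting to float would discard precision.
    """
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    return struct.unpack(">qQ", raw)


def to_datetime(raw):
    """Convert to a datetime; datetime only resolves microseconds, so most of the fraction is lost."""
    seconds, fraction = decode_lv_timestamp(raw)
    micros = (fraction * 1_000_000) >> 64  # exact integer arithmetic, truncating
    return LV_EPOCH + timedelta(seconds=seconds, microseconds=micros)


if __name__ == "__main__":
    # 2,082,844,800 s after the 1904 epoch is the Unix epoch; 2**63 is half a second.
    print(to_datetime(struct.pack(">qQ", 2_082_844_800, 2 ** 63)))
```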
In practical terms, as you've indicated, the hardware side of things just ain't the Great Satan that everyone makes it out to be.
The real problem is software compression. What we need on the software side is, basically, no interference whatsoever: Record the sound at 96,000+ samples per second per channel, at 24+ bits of resolution, and just dump it to a hard drive. Screw all of these damned compression technologies that give us altered sound - we want the real thing.
PS: For those of you who remember any of your grade school mathematics,
(96,000 samples per channel per second) X (3 bytes per sample) X (2 channels) = 576,000 bytes per second
and we're at about 576,000 bytes per second just for stereo [i.e. for two channels]. Multiply that figure appropriately if you're interested in 5+1, 7+1, or 9+2 channels.
That's looking like about
(576,000 bytes per second) X (3600 seconds per hour) = 2,073,600,000 bytes per hour
So we can fit at most about two hours worth of two channels worth of medium-quality sound into 2^32 bytes.
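The same arithmetic in Python, extended to the multichannel layouts mentioned above. It assumes 5+1, 7+1, and 9+2 mean 6, 8, and 11 discrete channels at the same 96 kHz / 24-bit settings, and shows how quickly the 2^32-byte ceiling shrinks as channels are added.

```python
SAMPLE_RATE = 96_000   # samples per second per channel
BYTES_PER_SAMPLE = 3   # 24 bits
LIMIT = 2 ** 32        # bytes


def bytes_per_second(channels):
    """Uncompressed data rate for the given channel count."""
    return SAMPLE_RATE * BYTES_PER_SAMPLE * channels


def hours_in_limit(channels, limit=LIMIT):
    """How many hours of uncompressed audio fit in `limit` bytes."""
    return limit / (bytes_per_second(channels) * 3600)


for label, channels in [("stereo", 2), ("5+1", 6), ("7+1", 8), ("9+2", 11)]:
    rate = bytes_per_second(channels)
    print(f"{label:>6}: {rate:>9,} B/s, {hours_in_limit(channels):.2f} h per 2^32 bytes")
```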
Here's my dilemma: WHAT WERE THE IDIOTS THINKING WHO DESIGNED SQL-99 AND THE 32-BIT [SO-CALLED] "BINARY LARGE OBJECT" [BLOB]?
Why don't they be honest, and call it what it really is - the Binary SMALL Object [BSOB]?
32-bits are absolutely worthless. We need true 64-bit platforms so that we can dump these things into our databases.
I've asked this until I'm blue in the face, but I'll do it again: Does anyone know of a company [preferably using some sort of ANSI or IEEE standard] that has a product that will allow us to dump truly large [i.e. necessarily 64-bit] amounts of data into a database?
And yes, we are generating these kinds of datasets, but no one seems to want to create a product for us to house them in...
5. Allow the user to browse their own hard drive, and categorize content automatically ("this is a document about lambs"... "this is a picture of a sunflower") and let them group and search for items. Eg. "Pictures like this" or "Documents about cats."
Look, these are all great ideas, but you've just outlined several man-centuries worth of work.
Microsoft has a small army of PhDs, from the best Universities in the world, and several billion dollars in spare change to finance them, yet they're having a helluva time just trying to do something so simple as adding searchable metadata to NTFS. Compare:
Some of the stuff you're talking about is just very, very, very difficult to do, and in the real world of stable, regression-tested, end-user friendly, shippable products, I'd advise you not to hold your breath waiting for this sort of thing to appear anytime soon.
PS: Yeah, I know /.-ers will follow up with a bunch of snide remarks about how Microsoft doesn't ship stable products, but again, I'd caution you not to underestimate how truly difficult these things really are.
No, for the same reason that the idiots at IBM don't own the rights to Bill Gates's operating systems.
But if the Swiss patent office had had a dime's worth of foresight, they could have forced Einstein to sign a piece of paper that might very well have given them ownership over his ideas.
PS: For the record, Einstein was a really abominable human being, and, among other things, a dyed-in-the-wool Bolshevik, so, had he possessed the character to be intellectually consistent [which, of course, he didn't], he would have renounced his own intellectual property rights.
To start with they used a 40-bit memory address rather than 64-bit, since we're not going to need 18 million terabytes of memory anytime soon. Therefore a 40-bit address allows up to 1 terabyte of memory. That's enough, considering that you won't find a motherboard with support for 1024 sticks of 1GB RAM anytime soon.
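The arithmetic behind these figures, as a quick Python check. "Terabyte" is read two ways here: 2^40 bytes is exactly 1 TiB (1,024 GiB), while 2^64 bytes comes to roughly 18.4 million terabytes only when a terabyte is taken as 10^12 bytes.

```python
GiB = 2 ** 30
TiB = 2 ** 40


def address_space_bytes(bits):
    """Number of distinct byte addresses an address of this width can name."""
    return 2 ** bits


print(address_space_bytes(40) == TiB)      # True: a 40-bit address covers 1 TiB
print(address_space_bytes(40) // GiB)      # 1024 sticks of 1 GiB
print(address_space_bytes(64) / 10 ** 12)  # about 1.84e7 decimal terabytes
```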
So there are only 40 physical wires on the motherboard/chipset, and physical memory only lives from 0 to 2^40 - 1, yet algebraically the operating system is expecting 64 wires on the motherboard/chipset?
Wonder what would happen if you threw a pointer at the address 2^40?
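Part of the answer is that the 40 bits describe physical memory, while programs use virtual addresses that the CPU's page tables translate. A hedged Python sketch, assuming the AMD64 arrangement of the period: 48-bit virtual addresses that must be "canonical" (bits 63 through 47 all equal) and, on early implementations, 40 physical address bits. Under those assumptions 2^40 is a perfectly legal virtual address, and whether touching it faults depends on whether the OS has mapped a page there. `is_canonical` and `fits_physical` are hypothetical helpers.

```python
def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits - 1) of a 64-bit address are all 0 or all 1."""
    if not 0 <= addr < 2 ** 64:
        raise ValueError("not a 64-bit address")
    top = addr >> (va_bits - 1)  # the top virtual-address bit plus every bit above it
    return top == 0 or top == (1 << (64 - va_bits + 1)) - 1


def fits_physical(addr, pa_bits=40):
    """True if addr lies inside a pa_bits-wide physical address space."""
    return 0 <= addr < 2 ** pa_bits


for addr in (2 ** 40 - 1, 2 ** 40, 2 ** 47, 0xFFFF_8000_0000_0000):
    print(hex(addr), is_canonical(addr), fits_physical(addr))
```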
Gosh, it's been a while since I wrote any C code, but something like the following:
Presumably the OS would catch it, but it would be kinda funny if it didn't.
Thanks! I'm looking over Caché as we speak [although I really oughta go home and get some sleep - I gotta be back here in just a few hours].
If you can think of any other names, please post them.
Thanks again!
In the professional recording studio, that is, not some guy who has some digital recording equipment and a hard drive, the data goes onto tape through a Nagra digital tape drive. What you're talking about is strictly low-budget, small-time audio recording. If you're actually talking about recording audio, you'd better be using tape, at least until it's time for editing and mixing.
Tell that to these guys:
Or to these guys:
Thanks for the vote of confidence.
And if you come across such a beast, give me a holler.
Dude, if it's not done yet, and you don't anything except complaining, that's not going to motivate people to solve the problem. Go hire a developer team to do it for you, and make millions selling the resulting product to others.
Actually, the more I study the problem, the more I sense that that's what's gonna happen, only I'll be "the developer team" all by my lonesome.
Which begs the question of who'll have the time to analyze the data being generated...
If a shop managing large amounts of data in the multimedia or scientific computing fields can't hire an administrator or two (At less than $75,000, at least around here, public sector pays pretty low compared to private), they deserve whatever foul-ups and data-loss that occurs. If they haven't budgeted for administrators along with all that fancy 64-bit hardware, they're asking for trouble. Who runs the servers? It's a strawman, anyway.
As for rsync and whatnot, most database systems can call external programs from stored procedures.
There's also filesystem clustering, SAN solutions that could tie systems together, SAN mirroring between storage chassis.
In 2004, we shouldn't be worrying about much of this, I agree. The point is, you can either work with it, around it, or write your own operating system with a built-in RDBMS.
But that's a huge, huge amount of work you've described.
Look, I'm the guy who's supposed to be doing the "science" or "mathematical" end of things. I just want some big repository where I can dump my data, and then spend my time doing the analysis of the data.
Heck, even if a pre-packaged database product existed, it would take me MONTHS just to learn the thing, get the servers purchased, get it installed on the servers, get it up and running stably, and THEN start writing the client interface to access the damned thing.
But if I have to write the database product ITSELF from scratch, then I'm looking at literally YEARS of work before I ever even get to the point where I can start analyzing the data.
Dude, if it's not done yet, and you don't anything except complaining, that's not going to motivate people to solve the problem. Go hire a developer team to do it for you, and make millions selling the resulting product to others.
Actually, the more I study the problem, the more I sense that that's what's gonna happen, only I'll be "the developer team" all by my lonesome.
Which begs the question of who'll have the time to analyze the data being generated...
Personally, though, I'd rather just leave my files on a real filesystem and serve them using Samba or NFS or even HTTP. Frankly as far as audio is concerned I'd keep the metadata there too, leaving the database to act as a cache for metadata searches, without turning my entire audio collection into a monstrously huge data file I can only access through SQL. That's just me though; I can't pretend to have a deep understanding of the problems you're dealing with.
Right, but even doing what you've described is a HUGE undertaking.
Do you know of anybody out there who's got a product that automates some of this crap?
The poster uses in his sample figures 24 bit, 96 ksample recording. I understand that recording like this is done for mastering audio CDs and other studio processes. OK, so I can see wanting to work with that as your medium.
The poster then refers to this as "medium quality sound" and laments that he can only store 2 hrs of it in a (file/object/dataspace?).
The poster follows this up by declaring that 32 bits are worthless and that he needs 64 bits. The discussion at this point is computer / database and maybe I don't understand. Does the poster require the ability to record regular audio with 64 bit precision? Every bit doubles your resolution, and you want 40 more of them than are available in studio master recordings?
Okay, imagine you're the recording studio geek. Say you're in charge of recording the latest, I dunno, Sting [ex-Police] album.
Monday AM. The drummer comes in, and you record two channels [i.e. stereo] of his riffs. Recording Time: 4 hours, Data Size Total: probably in excess of 8 GB. [But at $0.05 per gigabyte, who cares?]
Monday PM. The bass player comes in. 4 hours, 2 channels, another 8 gigabytes.
Tuesday AM. The lead guitarist comes in. 4 hours, 2 channels, another 8 gigabytes.
Tuesday PM. The pianist comes in. 4 hours, 2 channels, another 8 gigabytes.
And so on, and on, and on, over the course of weeks or more: The lead singer, the backup singers, the violins, the mandolin, the bongos, and who knows? Maybe Yo-Yo Ma will show up with his cello.
Point is, you're generating absolutely massive amounts of data, and you've got to have somewhere to put it. Unfortunately, the industry standard language for database access, namely SQL-99, can't support anything larger than 2^32 bytes of data in any one place, so you're just overwhelming this antiquated language.
What you need is a language [in combination with an architecture] that allows you to address your data in 64-bits, rather than 32. You also need a coherent database product that puts all this stuff together, so you can move it from your client workstation to the server, and mirror it to your failsafe redundant server, and make nightly tape backups, and keep track of just what the hell it is that you've been recording [with what's usually called "Metadata"].
Similar problems are faced by pretty much anyone else in the content creation business [such as the animation guys at a place like Pixar, or the particle physics guys at a place like Fermilab]: How do you keep track of these massive amounts of data that just overwhelm ancient paradigms like SQL?
Perhaps I misunderstood that part. But I'm pretty sure that 24 / 96 recorded audio gets filed under the heading of 'high quality'.
Right now, that's pretty much where the industry's at as a standard for the recording of audio performances [although the audio recording industry has a Moore's Law just like everybody else].
But if you're doing something a little more scientific-ish, like high-speed ultrasound, then 24 / 96 is at the low end of things.
Honestly, I'm not well versed in SQL or POSIX-compliant OSes (I'm a Windows programmer by trade), but I can tell you that usually when a datatype is 32 bits in size, it indicates that the data is handled in WORD-sized chunks. As most processors are 32-bit, the datatype is 32 bits in size as well. In other words, I would guess that the BLOB type is dependent on the CPU, and that once SQL databases get ported to 64-bit OSes installed on machines with 64-bit CPUs, the size limitations you are experiencing should go away.
The problem with SQL [like all the ancient languages, such as BASIC, or C] is that it doesn't have a good sense of datatype. Practically everything in SQL is little more than ASCII [and often only the first 7-bits of that].
A 2^32 byte BLOB is just that: 2^32 bytes, i.e. 2^32 single-byte ASCII characters.
Boy do I wish there had been some honest-to-goodness "scientists" or "engineers" or "mathematicians" around when these languages were being invented. Instead, we had a bunch of ivory tower morons who were trying to create some kind of abstract "natural" language, un-encumbered by what "scientists" and "engineers" and "mathematicians" [and recording studio geeks] really need, which is data with a strong sense of type.
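To make the 2^32 ceiling concrete: if a length is kept in a 32-bit unsigned field (an assumption for illustration; each database product stores BLOB lengths its own way), the largest describable size is 2^32 - 1 bytes, and anything bigger silently wraps. Both helpers below are hypothetical.

```c
#include <stdint.h>

/* Nonzero if a length can be represented in a 32-bit unsigned field,
   whose maximum is UINT32_MAX = 2^32 - 1. */
static int fits_in_32_bit_length(uint64_t true_length)
{
    return true_length <= UINT32_MAX;
}

/* What a 32-bit length field would actually hold: conversion to an
   unsigned 32-bit type reduces the value modulo 2^32, with no error. */
static uint32_t store_length_32(uint64_t true_length)
{
    return (uint32_t)true_length;
}
```

An 8,294,400,000-byte session fails `fits_in_32_bit_length`, and forcing it into the field with `store_length_32` leaves 3,999,432,704, i.e. the true size minus 2^32.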
If a shop managing large amounts of data in the multimedia or scientific computing fields can't hire an administrator or two (at less than $75,000 a year, at least around here, where the public sector pays pretty low compared to the private sector), they deserve whatever foul-ups and data loss occur. If they haven't budgeted for administrators along with all that fancy 64-bit hardware, they're asking for trouble. Who runs the servers? It's a strawman, anyway.
As for rsync and whatnot, most database systems can call external programs from stored procedures.
There's also filesystem clustering, SAN solutions that could tie systems together, SAN mirroring between storage chassis.
In 2004, we shouldn't be worrying about much of this, I agree. The point is, you can either work with it, around it, or write your own operating system with a built-in RDBMS.
But that's a huge, huge amount of work you've described.
Look, I'm the guy who's supposed to be doing the "science" or "mathematical" end of things. I just want some big repository where I can dump my data, and then spend my time doing the analysis of the data.
Heck, even if a pre-packaged database product existed, it would take me MONTHS just to learn the thing, get the servers purchased, get it installed on the servers, get it up and running stably, and THEN start writing the client interface to access the damned thing.
But if I have to write the database product ITSELF from scratch, then I'm looking at literally YEARS of work before I ever even get to the point where I can start analyzing the data.
Personally, though, I'd rather just leave my files on a real filesystem and serve them using Samba or NFS or even HTTP. Frankly as far as audio is concerned I'd keep the metadata there too, leaving the database to act as a cache for metadata searches, without turning my entire audio collection into a monstrously huge data file I can only access through SQL. That's just me though; I can't pretend to have a deep understanding of the problems you're dealing with.
Right, but even doing just what you've described is a HUGE undertaking.
Do you know of anybody out there who's got a product that automates some of this crap?
Look, you seem to be writing this from the point of view that I've got a PhD that involves both RDBM theory and file system encoding theory.
Now, quite frankly, neither of those "theories" is particularly esoteric, and I could, with a good 3-4 months of study, and a good 3-4 decades of coding, write my own 64-bit database with support for strongly-typed data primitives.
The point is, though, that I have no interest in doing any such thing. I can't re-invent the wheel every damned time I need something. Eventually, at least SOME of this stuff needs to be written for me, or I'll grow old and die trying to write it for myself.
What you've outlined sounds like a great idea for a kid who's got both ten years of his life to blow off and some serious financing from a Sugar Daddy with very deep pockets. But I ain't got either the ten years, or the Sugar Daddy, M-Kay???
And please don't call me a troll. I am desperate for a product like this, i.e. something that can store really large pieces of data, with some tracking of at least the Metadata end of things, and maybe the ability for me to define my own binary datatypes [along with methods to act on those datatypes], and with some coherent integration with industry standard products like BackupExec or ArcServe.
If such a product exists, PLEASE, PLEASE tell me about it.
There must be somebody out there peddling such a thing. What does EMI or Sony use to keep track of the data that they record in their studios? What does Pixar use to keep track of their animation graphics and soundtracks?
I know that Computer Associates used to peddle a product called "Jasmine" that was supposed to do these sorts of things, but one day they just up and cancelled it and left all their clients high and dry [I had to work all the way up the ladder to one of their Senior VPs to find out about this].
I know that Progress Software used to peddle something called "ObjectStore," but it has a terrible reputation, and, as far as I can tell, Progress is letting it wither on the vine in favor of their new financial software initiatives.
I know that Microsoft just announced an initiative called ".NET ObjectSpaces," but, for the foreseeable future, it won't be anywhere near ready to use in a mission-critical environment.
File reference data type? Give me a break. Haven't you used a char or varchar to store a filename before? Or heck, generate the filename using your primary key, if possible.
Compare my recent rant:
If I'm in the business of writing address-mapping software that translates things like binary file nodes to ASCII characters, then, for all intents and purposes, I'm writing a new computer programming language. [Hell, in this case, I'm practically writing a new operating system.]
Look, it's 2004, not 1984 - all of this stuff should have been done for me by now. I shouldn't have to spend weeks upon weeks of my life writing this kind of crap.
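The "generate the filename using your primary key" suggestion can be sketched in a few lines of C. The directory root `audio`, the `.raw` suffix, and the two-level bucket scheme are all assumptions for illustration; the database row would then only need the key, with the path derived on demand.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout: two levels of subdirectories taken from digits of the
   key, e.g. key 1234567 -> "audio/01/23/1234567.raw".
   Returns 0 on success, -1 if the buffer is too small or snprintf fails. */
static int path_for_key(uint64_t key, char *buf, size_t buflen)
{
    unsigned level1 = (unsigned)((key / 1000000u) % 100u);
    unsigned level2 = (unsigned)((key / 10000u) % 100u);
    int n = snprintf(buf, buflen, "audio/%02u/%02u/%llu.raw",
                     level1, level2, (unsigned long long)key);
    return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}
```

Bucketing by digits of the key keeps any one directory from growing to millions of entries. The trade-off is that the layout is baked into the code, so moving files means migrating paths, which is part of the synchronization problem raised in this thread.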
ArcServe and Backup Exec will back up all files in a directory hierarchy, at least when the backup job is defined on the hierarchy itself. Otherwise more than a few sysadmins would have to rebuild their backup jobs every single day.
But do they do it in conjunction with the database itself, or separately? I.e. can I get one single BackupExec/ArcServe copy of both the database and the file system, or do I have to do two backups every night?
And the overwhelming majority of shops that do scientific computing, or multimedia computing, don't have a budget to hire a bunch of $75,000 administrators. Remember that each of your $75,000 administrators costs about $150,000 a year [or more] when you factor in all the overhead of benefits and office space and the like.
Rsync or a shell script can duplicate the data between servers.
Will Rsync talk to Oracle/DB2/SQLServer? Will Oracle/DB2/SQLServer talk to Rsync? What if someone makes a change [i.e. a delta] to the file? Will Rsync tell Oracle/DB2/SQLServer? What if someone makes a change [i.e. a delta] to the Metadata? Will Oracle/DB2/SQLServer inform the filesystem about it?
Like I said above, in 2004, we shouldn't have to be worrying about all of this crap.
Split the file across BLOBs.
Compare my recent rant:
If I'm in the business of writing file-splitting software to store a piece of 64-bit data into multiple instances of 32-bit data types, then, for all intents and purposes, I'm writing a new computer programming language.
Look, it's 2004, not 1984 - all of this stuff should have been done for me by now. I shouldn't have to spend weeks upon weeks of my life writing this kind of crap.
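For a sense of what "split the file across BLOBs" involves at the bookkeeping level, here is a minimal C sketch that carves a stream tracked with a 64-bit length into pieces small enough for a 32-bit length field. The function names and the chunk size are hypothetical; reassembly, partial updates, and transactions around the pieces are not shown.

```c
#include <stdint.h>

/* Number of pieces needed for total_len bytes when each piece holds at most
   chunk_len bytes (ceiling division, written so it cannot overflow). */
static uint64_t chunk_count(uint64_t total_len, uint32_t chunk_len)
{
    if (chunk_len == 0)
        return 0;
    return total_len / chunk_len + (total_len % chunk_len != 0);
}

/* Byte offset and length of piece i (0-based) within the original stream.
   Returns 0 on success, -1 if i is out of range or chunk_len is zero. */
static int chunk_extent(uint64_t total_len, uint32_t chunk_len, uint64_t i,
                        uint64_t *offset, uint32_t *len)
{
    if (chunk_len == 0 || i >= chunk_count(total_len, chunk_len))
        return -1;
    *offset = i * (uint64_t)chunk_len;
    uint64_t remaining = total_len - *offset;
    *len = (uint32_t)(remaining < chunk_len ? remaining : chunk_len);
    return 0;
}
```

With a 1 GiB (1,073,741,824-byte) chunk size, an 8,294,400,000-byte session needs 8 pieces, the last one holding 778,207,232 bytes starting at offset 7,516,192,768.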
Why are you trying to put the "song" into the database? Store it as a file, then put a pointer into your database... Perhaps I am simplifying it too much? Or are you making it too complicated?
From my other reply:
PS: The things we have aren't technically "songs," although I suppose that our high-speed ultrasounds might qualify as such.
Which is how large digital media automation systems do it. I know of some that are more than 40 TB of spinning disk and one that will be in excess of 100 TB. They use an open-standards-based system.
Please, please, please expound.
Who are "they"? Who sells these "digital media automation systems"? What is this "open standard"?
Thanks!!!
You've just proven why a music database would have made a great deal of sense, for your application, or at least, a set of sql functions/extensions like GIS, only applied to your field, with AUDIO64 types being defined, with custom fields like author, copyright, an instruments detail subquery and the like.
YES, YES, YES!!!
With strongly-typed data primitives! [96-bit IEEE Doubles, 128-bit IEEE Doubles, 128-bit LabVIEW TIMESTAMPS, etc.] Or, if they aren't pre-packaged, at least the ability to define strongly-typed data primitives on the fly.
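The AUDIO64 idea above could start from a descriptor like the following C struct: a 64-bit byte length plus the format fields needed to interpret the samples, next to descriptive metadata. Every field name and size here is a hypothetical illustration, not any vendor's type, and the instruments-detail subquery is left out.

```c
#include <stdint.h>

/* Hypothetical "AUDIO64" descriptor. */
typedef struct {
    uint64_t byte_length;      /* size of the raw sample data; 64-bit, no 2^32 cap */
    uint32_t sample_rate_hz;   /* e.g. 96000 */
    uint16_t bits_per_sample;  /* e.g. 24 */
    uint16_t channels;         /* e.g. 2 for stereo */
    char     author[64];
    char     copyright[64];
} audio64_t;

/* Duration in seconds implied by the format fields; 0.0 if the format
   fields are incomplete (zero rate, sample size, or channel count). */
static double audio64_duration_s(const audio64_t *a)
{
    uint64_t bytes_per_second = (uint64_t)a->sample_rate_hz
                              * (a->bits_per_sample / 8u) * a->channels;
    return bytes_per_second ? (double)a->byte_length / (double)bytes_per_second
                            : 0.0;
}
```

For a 4-hour stereo 24 / 96 take of 8,294,400,000 bytes, `audio64_duration_s` returns 14,400 seconds.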
Compare my recent rant:
Does anybody make a product like this???
and reference the file
Okay, is there a "file reference data type" in SQL-99?
If I "reference the file," will Seagate/Veritas Backup-Exec [or CA/Cheyenne ArcServe] automatically back up the file when I do my nightly backups of the database [even though, strictly speaking, the file isn't part of the database]? Or will I have to go in and manually configure BackupExec or ArcServe for each file I need to have backed-up?
If I "reference the file," will the database automatically move copies of the file [and/or deltas of changes to the file] to the failsafe mirrors of the database, and/or to the load-balancing mirrors of the database?
And will all of these things be done in an ANSI/IEEE/ISO/whatever sort of a standard, so that if I decide to port my code to a different vendor's product, it won't take me forever and a year to figure out how to do the port?
What I'm asking for would have been SOOOOOO simple if only the idiots on the SQL committee had had an ounce of foresight.
Do you have any URLs for articles that talk about this phenomenon?
Thanks!
In practical terms, as you've indicated, the hardware side of things just ain't the Great Satan that everyone makes it out to be.
The real problem is software compression. What we need on the software side is, basically, no interference whatsoever: Record the sound at 96,000+ samples per second per channel, at 24+ bits of resolution, and just dump it to a hard drive. Screw all of these damned compression technologies that give us altered sound - we want the real thing.
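"Just dump it to a hard drive" with no compression amounts to writing raw PCM. Here is a minimal C sketch of packing signed 24-bit samples (held in `int32_t`) into 3 bytes each. Little-endian order is an assumption here (it is the order WAV files use), and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack n signed 24-bit samples into 3 bytes each, least significant byte first.
   `out` must have room for 3 * n bytes. */
static void pack_pcm24_le(const int32_t *samples, size_t n, uint8_t *out)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t v = (uint32_t)samples[i];  /* well-defined: reduces modulo 2^32 */
        out[3 * i + 0] = (uint8_t)(v & 0xFFu);
        out[3 * i + 1] = (uint8_t)((v >> 8) & 0xFFu);
        out[3 * i + 2] = (uint8_t)((v >> 16) & 0xFFu);
    }
}

/* Rebuild one signed sample from 3 little-endian bytes. Bit 23 is the sign
   bit, so subtracting its weight (2^23) sign-extends portably. */
static int32_t unpack_pcm24_le(const uint8_t *p)
{
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
    return (int32_t)(v & 0x7FFFFFu) - (int32_t)(v & 0x800000u);
}
```

Packing and unpacking are lossless for any value in the 24-bit range, -8,388,608 to 8,388,607, which is the "real thing" the post is asking for.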
PS: For those of you who remember any of your grade school mathematics,
96,000 samples per second × 24 bits (3 bytes) per sample = 288,000 bytes per second per channel,
and we're at about 576,000 bytes per second just for stereo [i.e. for two channels]. [Multiply that figure appropriately if you're interested in 5+1, 7+1, or 9+2 channels.]
That's looking like about 2^32 bytes / 576,000 bytes per second ≈ 7,457 seconds ≈ 124 minutes ≈ 2.07 hours.
So we can fit at most about two hours' worth of two channels' worth of medium-quality sound into 2^32 bytes.
Here's my dilemma: WHAT WERE THE IDIOTS THINKING WHO DESIGNED SQL-99 AND THE 32-BIT [SO-CALLED] "BINARY LARGE OBJECT" [BLOB]?
Why don't they be honest, and call it what it really is - the Binary SMALL Object [BSOB]?
32-bits are absolutely worthless. We need true 64-bit platforms so that we can dump these things into our databases.
I've asked this until I'm blue in the face, but I'll do it again: Does anyone know of a company [preferably using some sort of ANSI or IEEE standard] that has a product that will allow us to dump truly large [i.e. necessarily 64-bit] amounts of data into a database?
And yes, we are generating these kinds of datasets, but no one seems to want to create a product for us to house them in...
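The PS arithmetic generalizes to other channel layouts. This C sketch divides 2^32 bytes by the 24 / 96 data rate (96,000 × 3 bytes per channel per second); the function name is hypothetical and the channel counts are just example inputs.

```c
/* Seconds of uncompressed 24-bit (3-byte) / 96 kHz audio that fit in a
   container whose length is capped at 2^32 bytes. Returns 0.0 for zero channels. */
static double seconds_in_2pow32(unsigned channels)
{
    const double cap_bytes = 4294967296.0;               /* 2^32 */
    const double bytes_per_second = 96000.0 * 3.0 * channels;
    return channels ? cap_bytes / bytes_per_second : 0.0;
}
```

Stereo comes out at about 7,456 seconds (a little over two hours), 6 channels (5+1) at about 2,486 seconds (about 41 minutes), and 8 channels (7+1) at about 1,864 seconds (about 31 minutes). With a 64-bit length the stereo figure grows by a factor of 2^32, to roughly a million years.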
5. Allow the user to browse their own hard drive, and categorize content automatically ("this is a document about lambs").
Look, these are all great ideas, but you've just outlined several man-centuries worth of work.
Microsoft has a small army of PhDs, from the best Universities in the world, and several billion dollars in spare change to finance them, yet they're having a helluva time just trying to do something so simple as adding searchable metadata to NTFS. Compare:
Some of the stuff you're talking about is just very, very, very difficult to do, and in the real world of stable, regression-tested, end-user friendly, shippable products, I'd advise you not to hold your breath waiting for this sort of thing to appear anytime soon.
PS: Yeah, I know /.-ers will follow up with a bunch of snide remarks about how Microsoft doesn't ship stable products, but again, I'd caution you not to underestimate how truly difficult these things really are.
No, for the same reason that idiots at IBM doesn't own the rights to Bill Gates's operating systems.
== No, for the same reason that the idiots at IBM don't own the rights to Bill Gates's operating systems.
It's late, and I'm tired.
Do the Swiss own special relativity?
No, for the same reason that idiots at IBM doesn't own the rights to Bill Gates's operating systems.
But if the Swiss patent office had had a dime's worth of foresight, they could have forced Einstein to sign a piece of paper that might very well have given them ownership over his ideas.
PS: For the record, Einstein was a really abominable human being, and, among other things, a dyed-in-the-wool Bolshevik, so, had he possessed the character to be intellectually consistent [which, of course, he didn't], he would have renounced his own intellectual property rights.
It's just morally wrong to claim exclusive ownership over something nonrivalrous.
So Wiles shouldn't claim ownership over semi-stable Taniyama-Shimura?
And Hilbert shouldn't claim ownership over the Nullstellensatz?
And Gauss shouldn't claim ownership over Theorema Egregium or Theorema Aureum?
And neither Newton nor Leibniz should claim ownership over the Fundamental Theorem of Calculus?
And Galileo shouldn't claim ownership over Conservation of Momentum?
And Scotus shouldn't claim ownership over the concept of infinity?
And Archimedes shouldn't claim ownership over the volume of solids?
And Hippasus shouldn't claim ownership over the irrationals?