One could actually argue--as I sometimes do--that the success of commercial personal computing and operating systems has actually led to a considerable retrogression in many, many respects.... So I think the lack of a real computer science today, and the lack of real software engineering today, is partly due to this pop culture.
I'd call it "Fad-driven Development" more so than pop culture. But the lack of computer science/engineering causes fad-driven development and vice-versa. It's a feedback loop.
...the adoption of programming languages has...been somewhat accidental, and the emphasis has...been on how easy it is to implement the programming language rather than on its actual merits and features.... it started spreading Basic around just because it was there, not because it had any intrinsic merits whatsoever.
HTML, XML are prime examples of this - and also fad-driven development. Verbose, tag-based, require parsing every time, etc. -- not a very good language in any respect. Yet, people can read it. No technical intrinsic merits push XML over some other format, yet here we are.
All of these ideas could be part of both software engineering and computer science, but I fear--as far as I can tell--that most undergraduate degrees in computer science these days are basically Java vocational training.
This relates back to the failure of CS and fad-driven development.
Re: I think you and I are defining the same thing (on "Rob Pike Responds", Score: 2, Interesting)
What is metadata? Literally it's 'data about data', but everyone knows that. What is metadata in the context of data management? The schema is metadata! The metadata is the schema!
It is the case that, in order for a search to work across systems, there would have to be some commonality of the schema. I would think, though, that this would not be as impossible as you are making it. The "semantic web" people are developing just that, although are using yucky XML to do it. Think of all the standards out there - from SOAP to TCP/IP to RDF to ASCII. All have to do with a common format for exchange and/or meaning. The need for search would drive application developers to adopt similar logical models. The nice thing about the RDBMS concept is that you can define any logical view of the data that you want. If you have someone that cannot access your data your way, you can create a view of that data which corresponds to their schema.
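The point about views can be made concrete with a small sketch. It uses Python's built-in sqlite3 module as a stand-in (SQLite is an SQL product, not a true RDBMS in the sense argued here), and every table, column, and view name below is a hypothetical chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- My own (hypothetical) logical model for email.
    CREATE TABLE my_email (
        msg_id    INTEGER PRIMARY KEY,
        sender    TEXT NOT NULL,
        recipient TEXT NOT NULL,
        subject   TEXT NOT NULL,
        sent_on   TEXT NOT NULL
    );
    INSERT INTO my_email VALUES
        (1, 'alice@example.com', 'bob@example.com', 'Schemas', '2004-10-18');

    -- Someone else expects different column names (also hypothetical).
    -- A view presents the same data in their schema; nothing is copied.
    CREATE VIEW partner_email AS
        SELECT msg_id    AS id,
               sender    AS from_addr,
               recipient AS to_addr,
               subject   AS title
        FROM my_email;
""")

# The other party queries the view as if it were their own table.
rows = conn.execute("SELECT from_addr, title FROM partner_email").fetchall()
print(rows)
```

The base table stays the single source of truth; the view is only a second logical description of it, so the two schemas do not need to converge before they can interoperate.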
I would think, over time, schemas would converge. Traditional, text-based search tools would perhaps need to be employed to search information that is not correctly defined, but why should that restrict someone from providing more information about their data the way the RDBMS does?
Oh, I also want to thank you for the discussion. It has been, at least to me, thought provoking and remarkably civil. It is rare form these days to have an IT discussion that doesn't involve name calling, flaming, and/or other stupidity.
Re: He got #5 wrong... (on "Rob Pike Responds", Score: 3, Interesting)
"the technical point has largely been ignored by the computer industry"
Simply because most of the world is ignorant does not make it a particularly welcome idea to willingly embrace their ignorance. Hence, I try to use correct terms whenever possible (RDBMS vs. SQL DBMS, Cracker vs. Hacker, the Terrorist Attacks of September 11, 2001 vs. 9/11, etc.). But that is neither here nor there.
I'll address your points briefly before I get to the root of my initial desire for posting.
"Technically superior solutions may have aesthetic superiority"
That seems like a contradiction. Something that is "technically" (I am assuming you mean 'of a technical nature' and not 'abstractly') superior is certainly more than aesthetically superior!!
RDBMS certainly have considerably more functionality than SQL DBMS products. This is clear once you read the original theories and the foundations behind them. Your sentence illustrates the myth that, in the IT industry, technically superior products will rise to the top. Your mention of "worse is better" (I really, really hate that title, it should really be "Worse is Sometimes More Marketable" or the like) reinforces this point exactly.
"The query, my name Versailles pictures is probably good enough"
It is good enough only in the micro. There is a statistic which mentions the geometric (maybe even exponential) rate at which we are creating and storing new data. Sure, for your current family album this level of granularity may suffice - but I suspect in the future our family albums will be composed of video, audio, stills, etc. at a magnitude that makes getting 10,000 results impossible to sort through by hand. You'll require more accurate search results and will want to ask more precise questions. The RDBMS is the way to get this; read my other posts on this thread for some suggestions re: metadata. In short, tagging metadata is obviously not a 'solved problem' yet, mostly because no one has seriously tried to study it. Having a complete RDBMS there would also aid immensely with relating and tagging your information. Properly implemented (whatever that may be), I would think that there would be little typing required.
The reason why I decided to post my initial reply was that this was a questionnaire by a guy at Google. If there is one company that could/would implement a D-RDBMS it would be Google.
It's obvious that Microsoft, Oracle, et al would not lead the way in this sort of innovation. Their products, marketing strategy, and internal politics would not allow for a TRDBMS to be at the core of any Microsoft operating system and Office Suite they ship, nor would Oracle want to adapt to something which required a shift from SQL or allow for easy migration to a competing product.
That brings us back to Google. Google is just the right kind of company to pull it off: it's got the technical expertise, name recognition and reputation, and the willingness to truly revolutionize the way we work with computers.
Ideally, Google would start using a form of RDBMS for all the search indexes it creates for their desktop search tool (I don't know what kind of DB it uses now). It would take a given document, rip it into their RDBMS, and then allow for searching. Since Google has virtually written the textbook on large scale data distribution they could load your local DB into their pool, so now whenever you log into Google.com you can search (and with enough bandwidth, retrieve) your information anywhere, any time (this would be perfect for companies trying to manage data for projects, etc.).
But, since it was in a RDBMS, other applications could be written to extend the idea. I could extend my product with the Google tool by storing my data in some format edible by the search tool. I now have Google Search built into my application. Or, I write a different UI which allows you to abandon the Windows "Explorer.exe" altogether - it gets rid of the archaic 'files' an
Re: He got #5 wrong... (on "Rob Pike Responds", Score: 2, Interesting)
Of course, there are other benefits to universal RDBMS storage:
- Application programs are easier to write
  - No need to write custom file format
  - RDBMS embeds business logic and simplifies code generation (WHAT, not HOW)
- Perfect extension of "Information Wants [sic] to be Free" - more accurately "Your Information is Free"
  - Any application can read data created by any other application (security controls permitting)
- Source code can be stored in RDBMS
  - No longer have 'tabs vs. spaces' arguments
  - Can convert from one language to another easily (just a different view of same data)
- Data and programs no longer are confined to 'web' or 'local'
  - It will be impossible to tell whether or not a particular piece of data lives on your own PC, or on a central server (or cluster)
  - (Depends on universal connectivity)
"But this seems to contradict what you were saying in your original post, about how structures that relied on the user giving you data were not going to be successful - which I agree with."
True. As I was writing that, I didn't feel comfortable with it, but I wasn't sure how to phrase it any other way. When you are creating content you would have to have an easy method of declaring certain attributes. This comes down to the application program that you use to create the data - and this type of application would be much easier to write because it would contain a search tool in it to specify the metadata; therefore it is much more likely that the application does this in a user-friendly, automatic way (and the user is most likely to provide metadata as necessary).
"Any system must be able to deal with data that primarily is not going to have any meaning assigned to it at any point by the user"
Well, when you create data, the application should probably perform this process and then you would approve/clarify the intent. This, to me, sounds much better than some method out there which takes the content and tries to derive meaning from it independent of you, the creator. Given Joe User or Google, I'll take Joe's word that this document means what he says it means (of course, you get into the whole trust topic, but there is nothing to say that you can't have content aggregators {Google, et al} which rate the assertions made by the metadata).
"Metadata... That's a little abstract, so a few examples are..."
Attributes and metadata are an integral part of the relational model. The types of queries you propose are certainly solvable (well, you would have to define what "like" meant, first) in a RDBMS by a search tool which has access to the schema of the data you are searching.
What I am proposing is not exactly 180 degrees from what you and the other poster have suggested; a search tool would be doing something like what a RDBMS would be doing but would be defining the schema on the fly, without the data creator's input, and with limited information. So, the RDBMS would have a clear advantage because the metadata is already defined up front - you merely have to query it.
Re: He got #5 wrong... (on "Rob Pike Responds", Score: 2, Interesting)
I mentioned in another post that SQL products are NOT RDBMS; so such an implementation of D-DBMS would be unwieldy at best in Oracle. That said, I will reply as if we were discussing a generalized RDBMS and not a poor, incomplete implementation (MySQL is an extremely poor, incomplete implementation).
PageRank is an algorithm of popularity and not an algorithm of relevancy and as such, it really bears little relevance to implementation of relevancy algorithms as we are discussing. Of course, relevancy algorithms could contain page rank as a heuristic. See http://www.google.com/technology/
Google, in essence, is creating a schema for every page that it indexes. This schema is, in virtually every case, incomplete (because Google's algorithms are not perfect). In order for you to create a document (in the New World Order there really is no such thing as a web page any more) about Atlantic Slave Trade, you would have to have some sort of schema that defines it (by definition, it would require one). Of course, there would probably be schemas for historical documents, product literature, etc. which share a common foundation and attributes (kind of like inheritance in OO). This is not an impossible task; HTML was standardized, we have standards for everything nowadays.
This idea ("All the world's an RDBMS") merely formalizes this process. Note that this hinges upon whatever document creation software you use to perform this process for you; the nice part is this is not a fundamentally unsolvable problem (Codd did a lot of the hard theoretical work already; all someone now needs to do is implement it). The XML guys are trying to do it with the semantic web; unfortunately they chose a poor implementation technology (XML).
Your metadata is lost because current algorithms are imperfect. Let's consider your trip to Versailles. You probably want to share that information with the world, as people are wont to do. Currently, you probably type in something like: Here are pics from my trip: X Y Z
How does Google know that "my trip" refers to the trip that you took from 01-OCT-1995 to 20-OCT-1995 with three friends? What about the content of the pictures?
Could you ask Google "Where was I on 12-OCT-1995?" What about "Who was I with?" or "Where was this picture taken? What is this picture of?".
You could make "your trip" link to another HTML page which has some information about your trip, but then Google has no capability to make those connections (unless you explicitly have matchable text in the document). If your query was against a defined schema, then those attributes would be available for the engine to utilize. Of course, we're not talking about a magic AI engine that does this; the schema drives and defines the query.
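To make the Versailles example concrete, here is a rough sketch using Python's built-in sqlite3 module (again only an SQL stand-in, not a true RDBMS). The tables, columns, function names, and the "Friend A/B/C" rows are all hypothetical placeholders; only the place, the dates, the three friends, and the pictures X, Y, Z come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trip (
        trip_id    INTEGER PRIMARY KEY,
        place      TEXT NOT NULL,
        start_date TEXT NOT NULL,  -- ISO dates compare correctly as text
        end_date   TEXT NOT NULL
    );
    CREATE TABLE companion (
        trip_id INTEGER NOT NULL REFERENCES trip(trip_id),
        name    TEXT NOT NULL
    );
    CREATE TABLE picture (
        file_name TEXT PRIMARY KEY,
        trip_id   INTEGER NOT NULL REFERENCES trip(trip_id)
    );
    INSERT INTO trip VALUES (1, 'Versailles', '1995-10-01', '1995-10-20');
    INSERT INTO companion VALUES (1, 'Friend A'), (1, 'Friend B'), (1, 'Friend C');
    INSERT INTO picture VALUES ('X', 1), ('Y', 1), ('Z', 1);
""")

def where_was_i(day):
    """Answer 'Where was I on <day>?' from the trip table."""
    row = conn.execute(
        "SELECT place FROM trip WHERE ? BETWEEN start_date AND end_date", (day,)
    ).fetchone()
    return row[0] if row else None

def who_was_i_with(day):
    """Answer 'Who was I with?' by joining companions to the matching trip."""
    rows = conn.execute(
        "SELECT c.name FROM companion c JOIN trip t ON t.trip_id = c.trip_id "
        "WHERE ? BETWEEN t.start_date AND t.end_date ORDER BY c.name", (day,)
    ).fetchall()
    return [name for (name,) in rows]

def where_was_this_taken(file_name):
    """Answer 'Where was this picture taken?' via the picture's trip."""
    row = conn.execute(
        "SELECT t.place FROM picture p JOIN trip t ON t.trip_id = p.trip_id "
        "WHERE p.file_name = ?", (file_name,)
    ).fetchone()
    return row[0] if row else None

print(where_was_i("1995-10-12"))
print(who_was_i_with("1995-10-12"))
print(where_was_this_taken("X"))
```

None of these answers need text matching or AI: each question is a short query because the attributes were declared when the data was created.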
Finally, there is no requirement for a 'formal language' - when you do a Google search do you have to specify a formal language? That is a matter of implementation (as is look/feel: when you view a particular web page, you are unaware as to the source of the data - it could be generated by a DBMS or is simply a static text file.)
In order to provide relevant results the search algorithm must derive at least some meaning from the data. The RDBMS does just this in a well-known, accurate manner. Why not give the algorithm more data with which to make its inferences? That would lead directly to algorithms that are:
1) Less complicated
2) More accurate
3) Easier to develop/debug (probably ties to #1)
And of course the end-user is going to derive more meaning from the data than computers can (currently, without true AI) provide. But the point is that users give data specific meaning when they create the data - meaning which is currently lost when storing 'plain text' (HTML, Word document, etc.). Storing in a RDBMS attempts to preserve as much of that meaning as is possible.
... but then again so did the person who posed the question.
I understand the idea that anything user-facing should probably be as simple as possible. This means that ideas that require user-supplied metadata (as the typical XML-in-filesystem ideas require) are probably not going to be successful. I also agree that Joe User doesn't care whether or not his data is stored in a RDBMS or in a plain text file if his search tool does a good job.
The phrase "structure is meaningless; search is king" is a non-sequitur to someone aware of data management fundamentals. Structure gives meaning which in turn allows you to relate the data to others. The problem today is that we're creating data and storing it in 'plain text' (or flat file, proprietary, etc.) physical formats instead of storing emails, word processing documents, etc. in a RDBMS.
The RDBMS is more than simply a search tool: it has a sound model, provides for easier application development, etc. Wouldn't search be significantly easier to do if your data is given a consistent logical view? If you know the semantics of a particular piece of data, you no longer need to waste your time classifying it to search.
It seems that a proper solution would be that every PC contained a RDBMS, all data is stored in one, and that the internet would simply be a series of interconnected, distributed RDBMS (D-RDBMS). This idea would probably be fairly difficult to implement, but is already being performed at Google anyway (albeit in a slightly different format). Back when Codd developed the model he was primarily concerned with institutional databases -- centralized schema validation/data storage/etc. The problems implementing D-RDBMS products are not trivial, but then again are not insurmountable. The world has been able to standardize on protocols, etc. so I don't think it is out of the realm of possibility to suggest that different companies/users/applications could agree on a particular schema for, say, emails.
The really funny thing is that XML promised "automatic" communication via machines. Namely that the schema would be used to (magically, some would say) divine knowledge from the stream.
I agree wholeheartedly. PostgreSQL and Firebird would suit LiveJournal way better than MySQL. However, PostgreSQL's replication is not exactly fail-safe (not sure if that's a requirement here) nor automatic, nor does it have the kind of partitioning features that some of the 'bigger' boys have.
I was thinking mostly of Sybase Replication Server combined with Sybase ASE or Oracle 10g/Oracle Clustering, things that would go really, really nicely in the environment and workload the LiveJournal folk are experiencing.
I can't speak to the "Perl Sux!" allegation but I would say that MySQL is at least partially at fault, too, especially considering the limited clustering, partitioning, replication, and locking schemes it has.
They could/should have moved to a much better DBMS. Although the DBMS licensing fee would've been non-trivial it would have meant SIGNIFICANTLY reduced hardware costs and much much less application code development. I even suggested this several years ago but I was told that licensing costs were prohibitive even as they were throwing away $40K on useless hardware.
To me, the biggest drawback is its popularity: the hardware is insanely fragmented. Want to write a Symbian app? Browse the device list http://www.symbian.org/devices
App developers have to support:
1) mix of touch and non touch screens
2) Insanely different display resolutions
3) Crazy list of hardware buttons (some have keyboards, some none, some have the 10 digit numeric, etc.)
4) Different form factors (clamshell, block, etc.)
Basically, writing a very good, elegant app that people WANT TO PAY FOR in Symbian is a disaster. Best to write for iOS and Android. Although both hardware platforms are fragmented they are not nearly as bad to deal with as Symbian. That, and there's a culture of "It's OK and normal to buy apps" (much more so on iOS than Android, of course) that doesn't appear to exist on other platforms (yet).
Why, in this day and age, are we talking about NUMBERS? Do we address websites via IP address? No, we have DNS.
Why isn't there a DNS for phones? I pick a name, perhaps even something as simple and unique as MY EMAIL ADDRESS, and then anyone who knows my email address can contact me. Or, just like DNS, I can set up any number of unique names for various things (my-recruiters@gmail;) that point to some sort of numeric based phone.
You could even call it Phone Name System.
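A toy sketch of what such a lookup might look like, in Python. Everything here is hypothetical: the class, its methods, and the example.com addresses and 555 number are placeholders, and a real system would need delegation, caching, and authentication the way DNS does.

```python
class PhoneNameSystem:
    """Toy resolver: names map to numbers, aliases map to other names."""

    def __init__(self):
        self._numbers = {}  # name -> phone number (like an address record)
        self._aliases = {}  # alias -> another name (like a CNAME)

    def register(self, name, number):
        self._numbers[name.lower()] = number

    def alias(self, alias, target):
        self._aliases[alias.lower()] = target.lower()

    def resolve(self, name, max_hops=8):
        """Follow aliases until a number is found, with a hop limit."""
        current = name.lower()
        for _ in range(max_hops):
            if current in self._numbers:
                return self._numbers[current]
            if current not in self._aliases:
                raise KeyError(f"no such name: {name}")
            current = self._aliases[current]
        raise RuntimeError(f"alias chain too long for: {name}")


pns = PhoneNameSystem()
pns.register("me@example.com", "+1-555-0100")
pns.alias("my-recruiters@example.com", "me@example.com")
print(pns.resolve("my-recruiters@example.com"))
```

The caller only ever sees names; the number behind them can change (new carrier, new phone) without anyone updating a contact list.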
Crucial's M225 (I own the 128GB version) 1711 firmware had significant bugs and was quickly yanked. In order to upgrade to the latest 1819 you have to downgrade back to 1571.
http://www.crucial.com/support/firmware.aspx
Seems as if most consumer SSD products are still a bit in the "beta" stage.
No, not really.
RAID-5 survives a disk failure via distributed block parity. ECC recovers from single-bit errors.
The "Memory RAID" design should prevent a larger issue (multi-bit/DIMM failure/etc. that ECC cannot prevent) from taking the whole system out.
I would imagine that ECC memory would be used in conjunction with higher-level striping or mirroring to prevent and recover from both failures.
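The parity idea can be sketched in a few lines of Python. This is only an illustration of XOR parity across "DIMMs" modelled as byte strings, with a single dedicated parity block rather than RAID-5's distributed parity; the function names and sample bytes are made up, and real memory controllers would do this in hardware.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(data_dimms):
    """Parity block = XOR of all data blocks."""
    return xor_blocks(data_dimms)

def rebuild(surviving_dimms, parity):
    """Recover one lost block: XOR of the survivors and the parity block."""
    return xor_blocks(surviving_dimms + [parity])

# Three hypothetical data DIMMs, four bytes each.
dimms = [b"\x10\x20\x30\x40", b"\x0f\x0e\x0d\x0c", b"\xaa\xbb\xcc\xdd"]
parity = make_parity(dimms)

# Pretend DIMM 1 fails entirely, which is beyond what ECC alone can correct.
recovered = rebuild([dimms[0], dimms[2]], parity)
print(recovered == dimms[1])
```

ECC would still handle single-bit flips inside each DIMM; the parity layer only comes into play when a whole module's contents are lost.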
RAM is dirt cheap and most server systems support significantly more RAM than most people bother to install. For critical systems, ECC works but that doesn't prevent everything (double bit errors etc.). Is it time for a Redundant Array of Inexpensive DIMMs? Many HA servers now support Memory Mirroring (aka RAID-1 http://www.rackaid.com/resources/rackaid-blog/server-dysfunction/memory_mirroring_to_the_rescue/) but should there be more research into different RAID levels for memory (RAID5-6, 10, etc?)
No worries -- the concept is right :)
"Obviously the best way of accomplishing such a database is to denormalize any value that might be null"
That's normalizing -- the table in this example is de-normalized
for a few reasons, the biggest of which is that no one in their right mind would use ASE on Windows to begin with (thus probably wouldn't be running IIS)...
But seriously, ASE doesn't use xtype in such a way, nor do (most) of the (x)type ID's match up to meaningful ASE datatypes (the TEXT type IDs do match).
Anyway, ASE admins need not fear any more than Oracle or MySQL or DB2 or PostgreSQL or $DB admins; this script would have to be modified to run successfully on ASE.
A 7-minute intro to soldering with lots of good closeups and explanations: Soldering Introduction Video and Picture Gallery
Another video on surface mount soldering: Surface_Mount_Soldering/101
I suggested this (albeit more generally/less well-written) a while ago: http://slashdot.org/comments.pl?sid=126075&cid=10560028
Jesus Christ, no. The solution is simple:
(1) Have every PC OS contain a DBMS (this is not as difficult as you would think)
(2) Always keep your data in a DBMS
(3) Have said DBMS transfer the data via whatever method it would like. Chances are this would be some sort of compact, efficient binary method.
I think I actually implied otherwise, that there are a host of reasons why products and ideas succeed in the marketplace, technical superiority being only one of them.
...without requiring an earthshattering breakthrough.
You said if RDBMS provided "some order[s] of magnitude greater functionality" which it does. Replacement costs for any new technology will never be zero. But I agree those two factors alone will not propel RDBMS development and adoption. The IT industry is too fad driven and too ignorant (paraphrased Date/Pascal) to adopt a true RDBMS; Google is smarter and different than that and has the ability to get it right the first time, but I'm afraid does not have the luxury to mess it up.
Or better heuristics, getting 10000 results isn't a problem as long as my target is in the first 10-30 results.
Yet better heuristics, in the long run, will either approximate what a RDBMS would provide (e.g. a schema) or will become prohibitively expensive (those NP-Complete problems). If they end up where a RDBMS starts, why go through the hassle in the first place?
I beg to disagree, I think most people, on looking at the problem realize how hard it is to solve
Yet you provide a solution later? I was suggesting that the UI problem is not yet solved, mostly because 1) people have not seriously really investigated it and 2) if they have, they found they didn't have adequate tools with which to define the metadata. Obviously the RDBMS provides the solution to #2, and #1 then is merely an exercise in good UI.
I'd refer to Pike's comments vis a vis kernels: there's plenty of research on kernels....
Actually, there's little research in kernels (well, he discusses OS research which I assume is what you were getting at?) according to Rob's own article (http://freshmeat.net/articles/view/175/). Rob mentions things are different at Google in Question #2, and then proceeds to completely miss the potential of #5, which is what prompted me to post my original message.
Not being on the inside of Google myself I don't know if they think they're running into scaling limits, but outside evidence is that there is no problem.
By scaling I was talking about human factors (dealing with search results, defining queries and views, programming new algorithms, etc.) and also computational ones (searching and organizing lots of data, etc.). There's no question that Google is doing a phenomenal job. The point is that they can utilize their talent in a way that not only solves the search problem but also the distributed application and data problems.
As for correct implementations, it seems to me that applying the query as metadata to search results would solve the problem of metadata generation without requiring a radical breakthrough in AI.
A-ha! Clarity! And yet, what is metadata? The relational schema is the metadata (and vice versa), and what is a relational query but a definition of a new schema?
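A small illustration of "a query is a definition of a new schema", using Python's built-in sqlite3 module as a stand-in. The tables and sample row are hypothetical; the point is only that the result of a query carries its own column definitions, derived from the query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (doc_id INTEGER PRIMARY KEY, author TEXT, title TEXT);
    CREATE TABLE tag (doc_id INTEGER, label TEXT);
    INSERT INTO document VALUES (1, 'me', 'Versailles pictures');
    INSERT INTO tag VALUES (1, 'travel');
""")

# The query names and combines attributes from two relations...
cur = conn.execute("""
    SELECT d.author, d.title, t.label AS topic
    FROM document d JOIN tag t ON t.doc_id = d.doc_id
""")

# ...and its result has a heading of its own: the derived schema.
derived_schema = [col[0] for col in cur.description]
rows = cur.fetchall()
print(derived_schema)
print(rows)
```

The heading of the result is metadata that nobody typed in separately; it falls out of the query and the base schemas.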
The breakthroughs have already been made in Codd's work (and the subsequent work of everyone who has researched RDBMS such as Date, etc.) in the past forty years. We'd require "earth shattering breakthroughs" to solve this problem any other way.
I'm still not seeing as to why Google, implementing a D-RDBMS, cannot solve this problem and provide significant additional benefits (irrespective of political, emotional, or other ancillary factors). I would like to know why you think this is so.
What is metadata? Literally it's 'data about data', but everyone knows that. What is metadata in the context of data management? The schema is metadata! The metadata is the schema!
It is the case that, in order for a search to work across systems, there would have to be some commonality of the schema. I would think, though, that this would not be as impossible as you are making it. The "semantic web" people are developing just that, although are using yucky XML to do it. Think of all the standards out there - from SOAP to TCP/IP to RDF to ASCII. All have to do with a common format for exchange and/or meaning. The need for search would drive application developers to adopt similar logical models. The nice thing about the RDBMS concept is that you can define any logical view of the data that you want. If you have someone that cannot access your data your way, you can create a view of that data which corresponds to their schema.
I would think, over time, schemas would converge. Traditional, text-based search tools would perhaps need to be employed to search information that is not correctly defined, but why should that restrict someone from providing more information about their data the way the RDBMS does?
Oh, I also want to thank you for the discussion. It has been, at least to me, thought provoking and remarkably civil. It is rare form these days to have an IT discussion that doesn't involve name calling, flaming, and/or other stupidity.
"the technical point has largely been ignored by the computer industy"
Simply because most of the world is ignorant does not make it a particularly welcome idea to willingly embrace their ignorance. Hence, I try to use correct terms whenever possible (RDBMS vs. SQL DBMS, Cracker vs. Hacker, the Terrorist Attacks of September 11, 2001 vs. 9/11, etc.). But that is neither here nor there.
I'll address your points briefly before I get to the root of my initial desire for posting.
"Technically superior solutions may have aesthetic superiority"
That seems like a contradiction. Something that is "technically" (I am assuming you mean 'of a technical nature' and not 'abstractly') superior is certainly more than aesthetically superior!!
RDBMS certainly have considerably more functionality than SQL DBMS products. This is clear once you read the original theories and the foundations behind them. Your sentence illustrates the myth that, in the IT industry, technically superior products will rise to the top. Your mention of "worse is better" (I really, really hate that title, it should really be "Worse is Sometimes More Marketable" or the like) reinforces this point exactly.
"The query, my name Versailles pictures is probably good enough"
It is good enough only in the micro. There is a statistic which mentions the geometric (maybe even exponential) rate at which we are creating and storing new data. Sure, for your current family album this level of granularity may suffice, but I suspect that in the future our family albums will be composed of video, audio, stills, etc. at a magnitude that makes 10,000 results impossible to sort through by hand. You'll require more accurate search results and will want to ask more precise questions. The RDBMS is the way to get this; read my other posts on this thread to see some suggestions re: metadata. In short, tagging metadata is obviously not a 'solved problem' yet, mostly because no one has seriously tried to study it. Having a complete RDBMS there would also aid immensely with relating and tagging your information. Properly implemented (whatever that may be), I would think that there would be little typing required.
The reason why I decided to post my initial reply was that this was a questionnaire by a guy at Google. If there is one company that could/would implement a D-RDBMS it would be Google.
It's obvious that Microsoft, Oracle, et al would not lead the way in this sort of innovation. Their products, marketing strategy, and internal politics would not allow for a TRDBMS to be at the core of any Microsoft operating system and Office Suite they ship, nor would Oracle want to adapt to something which required a shift from SQL or allow for easy migration to a competing product.
That brings us back to Google. Google is just the right kind of company to pull it off: it's got the technical expertise, name recognition and reputation, and the willingness to truly revolutionize the way we work with computers.
Ideally, Google would start using a form of RDBMS for all the search indexes it creates for their desktop search tool (I don't know what kind of DB it uses now). It would take a given document, rip it into their RDBMS, and then allow for searching. Since Google has virtually written the textbook on large scale data distribution they could load your local DB into their pool, so now whenever you log into Google.com you can search (and with enough bandwidth, retrieve) your information anywhere, any time (this would be perfect for companies trying to manage data for projects, etc.).
But, since it was in a RDBMS, other applications could be written to extend the idea. I could extend my product with the Google tool by storing my data in some format edible by the search tool. I now have Google Search built into my application. Or, I write a different UI which allows you to abandon the Windows "Explorer.exe" altogether - it gets rid of the archaic 'files' an
Of course, there are other benefits to universal RDBMS storage:
- Application programs are easier to write
  - No need to write custom file format
  - RDBMS embeds business logic and simplifies code generation (WHAT, not HOW)
- Perfect extension of "Information Wants [sic] to be Free" - more accurately "Your Information is Free"
  - Any application can read data created by any other application (security controls permitting)
- Source code can be stored in RDBMS
  - No longer have 'tabs vs. spaces' arguments
  - Can convert from one language to another easily (just a different view of same data)
- Data and programs no longer are confined to 'web' or 'local'
  - It will be impossible to tell whether or not a particular piece of data lives on your own PC, or on a central server (or cluster)
  - (Depends on universal connectivity)
There are more, but that was a quick bullet list.
"But this seems to contradict what you were saying in your original post, about how structures that relyed on the user giving you data were not going to be successful - which I agree with."
True. As I was writing that, I didn't feel comfortable with it, but I wasn't sure how to phrase it any other way. When you are creating content you would have to have an easy method of declaring certain attributes. This comes down to the application program that you use to create the data - and this type of application would be much easier to write because it would contain a search tool in it to specify the metadata; therefore it is much more likely that the application does this in a user-friendly, automatic way (and the user is most likely to provide metadata as necessary).
"Any system must be able to deal with data that primarily is not going to have any meaning assigned to it at any point by the user"
Well, when you create data, the application should probably perform this process and then you would approve/clarify the intent. This, to me, sounds much better than some method out there which takes the content and tries to derive meaning from it independent of you, the creator. Given Joe User or Google, I'll take Joe's word that this document means what he says it means (of course, you get into the whole trust topic, but there is nothing to say that you can't have content aggregators {Google, et al} which rate the assertions made by the metadata).
"Metadata... That's a little abstract, so a few examples are..."
Attributes and metadata are an integral part of the relational model. The types of queries you propose are certainly solvable (well, you would have to define what "like" meant, first) in a RDBMS by a search tool which has access to the schema of the data you are searching.
What I am proposing is not exactly 180 degrees from what you and the other poster have suggested. A search tool would be doing something like what a RDBMS would be doing, but it would be defining the schema on the fly, without the data creator's input, and with limited information. So the RDBMS would have a clear advantage because the metadata is already defined up front; you merely have to query it.
I mentioned in another post that SQL products are NOT RDBMS; so such an implementation of D-DBMS would be unwieldy at best in Oracle. That said, I will reply as if we were discussing a generalized RDBMS and not a poor, incomplete implementation (MySQL is an extremely poor, incomplete implementation).
PageRank is an algorithm of popularity and not an algorithm of relevancy, and as such it has little bearing on the implementation of the relevancy algorithms we are discussing. Of course, relevancy algorithms could contain PageRank as a heuristic. See http://www.google.com/technology/
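As a toy illustration of "popularity as one heuristic inside a relevancy score", consider the sketch below. It is not PageRank and not anything Google is known to do; the function name, the term-overlap measure, and the 0.2 weight are all assumptions picked only to show the shape of the idea.

```python
def relevance(query_terms, doc_terms, popularity, popularity_weight=0.2):
    """Blend a content-match score with a popularity score.

    popularity is assumed to be pre-normalized to the range [0, 1];
    popularity_weight is an arbitrary illustrative value.
    """
    query = set(query_terms)
    if not query:
        return 0.0
    # Fraction of the query terms that actually appear in the document.
    match = len(query & set(doc_terms)) / len(query)
    return (1 - popularity_weight) * match + popularity_weight * popularity


query = ["versailles", "pictures"]
# An obscure page that matches the query completely...
obscure_but_relevant = relevance(query, ["versailles", "pictures", "trip"], 0.1)
# ...versus a very popular page that does not match at all.
popular_but_irrelevant = relevance(query, ["news", "sports"], 0.9)
print(obscure_but_relevant > popular_but_irrelevant)
```

With these weights the content match dominates, and popularity only breaks ties between documents that match about equally well.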
Google, in essence, is creating a schema for every page that it indexes. This schema is, in virtually every case, incomplete (because Google's algorithms are not perfect). In order for you to create a document (in the New World Order there really is no such thing as a web page any more) about the Atlantic Slave Trade, you would have to have some sort of schema that defines it (by definition, it would require one). Of course, there would probably be schemas for historical documents, product literature, etc. which share a common foundation and attributes (kind of like inheritance in OO). This is not an impossible task; HTML was standardized, and we have standards for everything nowadays.
This idea ("All the world's an RDBMS") merely formalizes this process. Note that this hinges upon whatever document creation software you use to perform this process for you; the nice part is this is not a fundamentally unsolvable problem (Codd did a lot of the hard theoretical work already; all someone now needs to do is implement it). The XML guys are trying to do it with the semantic web; unfortunately they chose a poor implementation technology (XML).
Your metadata is lost because current algorithms are imperfect. Let's consider your trip to Versailles. You probably want to share that information to the world, as people are wont to do. Currently, you probably type something in like:
Here are pics from my trip: X Y Z
How does Google know that "my trip" refers to the trip that you took from 01-OCT-1995 to 20-OCT-1995 with three friends? What about the content of the pictures?
Could you ask Google "Where was I on 12-OCT-1995?" What about "Who was I with?" or "Where was this picture taken? What is this picture of?".
You could make "your trip" link to another HTML page which has some information about your trip, but then Google has no capability to make those connections (unless you explicitly have matchable text in the document). If your query was against a defined schema, then those attributes would be available for the engine to utilize. Of course, we're not talking about a magic AI engine that does this; the schema drives and defines the query.
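Here is a rough sketch of what "the schema drives and defines the query" could look like for the trip example. SQLite stands in for the RDBMS only because it ships with Python; the table layout and the companion names are hypothetical, while the dates and destination come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trip (
        trip_id     INTEGER PRIMARY KEY,
        destination TEXT NOT NULL,
        start_date  TEXT NOT NULL,  -- ISO 8601, so string comparison orders correctly
        end_date    TEXT NOT NULL
    );
    CREATE TABLE companion (
        trip_id INTEGER NOT NULL REFERENCES trip(trip_id),
        name    TEXT NOT NULL
    );
    INSERT INTO trip VALUES (1, 'Versailles', '1995-10-01', '1995-10-20');
    -- Placeholder names for the three friends.
    INSERT INTO companion VALUES (1, 'Friend A'), (1, 'Friend B'), (1, 'Friend C');
""")

day = "1995-10-12"

# "Where was I on 12-OCT-1995?"
where_was_i = conn.execute(
    "SELECT destination FROM trip WHERE ? BETWEEN start_date AND end_date",
    (day,),
).fetchall()

# "Who was I with?"
who_with = conn.execute(
    """
    SELECT c.name
    FROM companion AS c
    JOIN trip AS t ON t.trip_id = c.trip_id
    WHERE ? BETWEEN t.start_date AND t.end_date
    ORDER BY c.name
    """,
    (day,),
).fetchall()

print(where_was_i, who_with)
```

Neither question needs any text matching; the answers fall out of attributes that were declared when the data was created.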
Finally, there is no requirement for a 'formal language': when you do a Google search, do you have to specify a formal language? That is a matter of implementation (as is look/feel: when you view a particular web page, you are unaware of the source of the data; it could be generated by a DBMS or simply be a static text file).
In order to provide relevant results the search algorithm must derive at least some meaning from the data. The RDBMS does just this in a well-known, accurate manner. Why not give the algorithm more data with which to make its inferences? That would lead directly to algorithms that are:
1) Less complicated
2) More accurate
3) Easier to develop/debug (probably ties to #1)
And of course the end-user is going to derive more meaning from the data than computers can (currently, without true AI) provide. But the point is that users give data specific meaning when they create the data - meaning which is currently lost when storing 'plain text' (HTML, Word document, etc.). Storing in a RDBMS attempts to preserve as much of that meaning as is possible.
Oh, and by RDBMS I do not mean current SQL products; their limitations would probably make such a solution clumsy at best and unworkable at worst.
... but then again so did the person who posed the question.
I understand the idea that anything user-facing should probably be as simple as possible. This means that ideas that require user-supplied metadata (as the typical XML-in-filesystem ideas require) are probably not going to be successful. I also agree that Joe User doesn't care whether or not his data is stored in a RDBMS or in a plain text file if his search tool does a good job.
The phrase "structure is meaningless; search is king" is a non sequitur to someone aware of data management fundamentals. Structure gives meaning, which in turn allows you to relate the data to other data. The problem today is that we're creating data and storing it in 'plain text' (or flat file, proprietary, etc.) physical formats instead of storing emails, word processing documents, etc. in a RDBMS.
The RDBMS is more than simply a search tool: it has a sound model, provides for easier application development, etc. Wouldn't search be significantly easier to do if your data were given a consistent logical view? If you know the semantics of a particular piece of data, you no longer need to waste your time classifying it in order to search it.
It seems that a proper solution would be that every PC contained a RDBMS, all data is stored in one, and that the internet would simply be a series of interconnected, distributed RDBMS (D-RDBMS). This idea would probably be fairly difficult to implement, but is already being performed at Google anyway (albeit in a slightly different format). Back when Codd developed the model he was primarily concerned with institutional databases -- centralized schema validation/data storage/etc. The problems implementing D-RDBMS products are not trivial, but then again are not insurmountable. The world has been able to standardize on protocols, etc. so I don't think it is out of the realm of possibility to suggest that different companies/users/applications could agree on a particular schema for, say, emails.
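To give a feel for what an agreed-upon email schema buys you, here is a small sketch in which two separate databases (standing in for two machines) share one table definition and are searched with a single query. SQLite and its ATTACH DATABASE are used only as a local approximation of a D-RDBMS; the column names, addresses, and messages are invented for the example.

```python
import sqlite3

# One schema that every participant is assumed to have agreed on.
EMAIL_SCHEMA = """
    CREATE TABLE {db}.email (
        message_id TEXT PRIMARY KEY,
        sender     TEXT NOT NULL,
        recipient  TEXT NOT NULL,
        sent_on    TEXT NOT NULL,
        subject    TEXT NOT NULL
    )
"""

conn = sqlite3.connect(":memory:")
# A second, independent in-memory database plays the role of a remote node.
conn.execute("ATTACH DATABASE ':memory:' AS remote")
for db in ("main", "remote"):
    conn.execute(EMAIL_SCHEMA.format(db=db))

conn.execute(
    "INSERT INTO main.email VALUES (?, ?, ?, ?, ?)",
    ("m1", "alice@example.com", "bob@example.com", "2004-01-05", "Project plan"),
)
conn.executemany(
    "INSERT INTO remote.email VALUES (?, ?, ?, ?, ?)",
    [
        ("m2", "alice@example.com", "carol@example.com", "2004-01-06", "Budget"),
        ("m3", "dave@example.com", "bob@example.com", "2004-01-07", "Lunch"),
    ],
)

# Because both stores share the schema, one query spans them.
sender = "alice@example.com"
hits = conn.execute(
    """
    SELECT 'main' AS source, subject FROM main.email WHERE sender = ?
    UNION ALL
    SELECT 'remote', subject FROM remote.email WHERE sender = ?
    ORDER BY source
    """,
    (sender, sender),
).fetchall()
print(hits)
```

A real distributed system would have to deal with networking, trust, and partial failure, none of which this sketch attempts.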
The really funny thing is that XML promised "automatic" communication via machines: namely, that the schema would be used to (magically, some would say) divine knowledge from the stream.
Having a standard XML format is an oxymoron!
I agree wholeheartedly. PostgreSQL and Firebird would suit LiveJournal way better than MySQL. However, PostgreSQL's replication is not exactly fail-safe (not sure if that's a requirement here) or automatic, nor does it have the kind of partitioning features that some of the 'bigger' boys have.
I was thinking mostly of Sybase Replication Server combined with Sybase ASE or Oracle 10g/Oracle Clustering, things that would go really, really nicely in the environment and workload the LiveJournal folk are experiencing.
I can't speak to the "Perl Sux!" allegation but I would say that MySQL is at least partially at fault, too, especially considering the limited clustering, partitioning, replication, and locking schemes it has.
They could/should have moved to a much better DBMS. Although the DBMS licensing fee would've been non-trivial it would have meant SIGNIFICANTLY reduced hardware costs and much much less application code development. I even suggested this several years ago but I was told that licensing costs were prohibitive even as they were throwing away $40K on useless hardware.