Of course you're right that what works for Wikipedia may (will!) not work for everyone. Still, there are a fair number of case studies on the MySQL web site covering other situations. It's not all OK that the database servers are overloaded, just a fact of life for as long as I've been around - make things fast and usage rapidly rises to fill the gap. The request rate graph shows what's been happening - six doublings from March 2004 to October 2005, roughly one every three months. We're gradually changing to a more "serious" place as people's availability and performance expectations rise - and the amount of money raised to support that rises with the audience, so it's looking quite doable.
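As a rough check on the growth figures above, here is a minimal sketch. It assumes the span from March 2004 to October 2005 counts as 19 months; the function name is made up for illustration.

```python
def growth_summary(doublings: int, months: float) -> tuple[int, float]:
    """Return (overall growth factor, average months per doubling)."""
    return 2 ** doublings, months / doublings


# Six doublings over roughly 19 months (March 2004 to October 2005, assumed).
factor, months_per_doubling = growth_summary(6, 19)
print(f"{factor}x growth, one doubling every {months_per_doubling:.1f} months")
```

Six doublings is a 64-fold increase in request rate, which is why "roughly one every three months" adds up so quickly.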
You're also right that 20 inserts per second is nothing compared to some application needs. Just happens to be what it is for this one. The selects aren't all extremely simple but none are extremely complex and we've tuned most of the problematic ones by now; most of the remaining slow ones were improved by a schema change in the most recent MediaWiki software release. For us, slow queries are a denial of service attack vulnerability, so we have to be careful with anything which is really slow.
5.0 has a new greedy query optimiser, limits on search depth and the ability to use more than one index per table alias in a query, so it's a major step forward and should help the 20 table cases quite a bit. But it probably still needs work in that area.
The total data set is around half a terabyte to a terabyte. Pretty small compared to serious warehousing and some serious corporate or data collection applications. Pretty big compared to most other things, but nowhere near MySQL's limits.
We take a lot of load off the database servers with caching of various sorts. Not that it would be a problem to add more database servers to scale read rates. Writes are tougher. Doable, of course, but not necessary for us yet. We happen to have a data set which can be partitioned at the application level moderately well, so that offers a convenient way to deal with write load without getting too fancy. We're just starting to do that now, since it's more cost-effective than not doing it. But it's certainly the case that not all applications are so easy to partition at the application level.
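For anyone wondering what application-level partitioning can look like, here is a minimal sketch. It is not the actual MediaWiki code: the partition key (one wiki per shard), the shard names and the function name are all assumptions made for illustration.

```python
# Hypothetical shard map: which database cluster holds which wiki.
# All names here are illustrative, not real hosts.
SHARD_MAP = {
    "enwiki": "db-cluster-1",
    "dewiki": "db-cluster-2",
    "frwiki": "db-cluster-2",
}
DEFAULT_SHARD = "db-cluster-3"  # everything not listed explicitly


def shard_for(wiki: str) -> str:
    """Pick the cluster that takes both reads and writes for this wiki."""
    return SHARD_MAP.get(wiki, DEFAULT_SHARD)


print(shard_for("enwiki"))
print(shard_for("eswiki"))
```

The application picks the cluster before it opens a connection, so each master only sees the write load for its own share of the data. It only works this simply when queries rarely need to join across partitions.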
>> So the discovery of the method to make steel should not be a patentable invention?
This appears to be simply a fact (discovery of scientific information) that you can make steel. Specific processes to do it efficiently might be another matter. Steel (today) is relatively easy to make, so it's no longer a revolutionary process. If the process was something not now known, like how to make diamonds in gem quality (say, D IF) and 5 carat size for 50 cents a carat, I'd probably agree that that is new and patentable.
>> A new process that can manufacture amazing new hard drives should not be patentable? A new process that produces an existing product for 10% of the cost isn't an innovation that promotes science?
The key bit in those two is the word "new". I doubt that anyone here has not heard of reviews or consumer reviews, so the smell test for a patent involving them should be starting out with some pretty high hurdles for showing that there's really something other than what we already know being patented, not just the thought of doing them with a green pen instead of a blue pen or a computer screen instead of a piece of paper.
The problem for the cases which are mentioned on Slashdot is that they don't pass the smell test and are generally not considered new or innovative by those here who are professionals in the field the patent involves. When the patent system is granting such patents, then it's obvious that those same professionals are going to lose respect for the system.
For those involved in the patent system, I can see that it is a problem, because the system is intended to reward real innovation.
It's also quite tough because almost all of the new work being done in the field doesn't in any way involve the patent system, so I can understand it being hard to know what is novel and what isn't.
In this particular field, the patent system is in many ways granting a patent on a new way of doing art, when it sees perhaps as much as one in a billion of the works of art which are happening and is trying to work out what is new in the field. Meanwhile the typical artists are just getting on and producing the art, without involving the patent system until some person comes and tells them that they have been granted a patent on what they already knew.
The chances of me considering filing a patent application for a ten minute or two day job are close to zero, yet those time thresholds are those often involved in innovation in the field - and are routinely done by the first person to be asked to solve the particular problem presented. Different person, same problem, very likely the same solution, applying the same mixture of known techniques. And those problems aren't unusual. They are typically involved in every job anyone in the field does, as a normal part of their working life, doing it many times a day.
I expect that you both know what you're talking about. The problem is, the person you replied to is suggesting that the law in this area is currently failing any useful rationality or benefit test. Your understanding of this portion of the law doesn't make it more or less rational or beneficial.
Such decisions tend to make me favor the view that patents on non-physical items are a significant barrier to innovation and that we should consider ending them as a means of encouraging innovation.
Which are those? That's cheap enough to be interesting for people using databases with transaction logs being flushed after every transaction.
How does RAID improve the latency or throughput of a database server transaction log file which is being flushed to disk at the end of every transaction?
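The question seems to point at the cost of one synchronous flush per commit. Here is a minimal sketch for measuring that cost on a given disk. It assumes a plain file stands in for the transaction log, and the function name and record sizes are made up; it is not how any particular database server writes its log.

```python
import os
import time


def timed_writes(path: str, records: list[bytes], fsync_each: bool) -> float:
    """Append records to path and return the elapsed seconds.

    With fsync_each=True every record is forced to disk before the next
    one is written, like a log flushed at the end of every transaction.
    """
    start = time.perf_counter()
    with open(path, "wb") as log:
        for record in records:
            log.write(record)
            if fsync_each:
                log.flush()
                os.fsync(log.fileno())
        # One final flush so both modes end with the data on disk.
        log.flush()
        os.fsync(log.fileno())
    return time.perf_counter() - start
```

Calling it twice on the same records, once with `fsync_each=True` and once with `False`, shows how much of the time goes on waiting for individual flushes rather than on moving bytes.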
MySQL and PostgreSQL differ in interesting ways. For example, MySQL now supports the XA distributed transaction standard, beyond the two phase commit planned for PostgreSQL 8.1. It's well worth it for PG users to try MySQL with their real applications, IMO - seems likely to be the only way to find out which really is better for each application, unless the app needs something which is only present in one or the other of them.
DDL causes an automatic commit, so you can't roll it back.
So, with things like this going on, what's the relative prevalence of rootkits on music purchased on CD from a store and music downloaded (legally or illegally) from file trading networks?
It's starting to look as though it's more secure to go with the file trading networks than the stores.
You forgot to say why: both were affected by flaws in two caching disk controller brands. Even though they had battery backup, the controllers didn't turn off the drive caches, so they lost what was in the drive cache. Completely defeated the point of having the battery backup in the first place. Both controller vendors subsequently did some work to address this issue.
Personally, I recommend getting controllers which don't throw away the data they are supposed to be protecting with their battery.
Both LiveJournal and Wikipedia have asked MySQL to try to be more tolerant of screwed up hardware like that.
You're right about the disks. There was some thought that a schema optimisation with the new MediaWiki software version a few months ago and/or the switch from 8GB to 16GB might change the database servers from disk limited to CPU limited. Neither did, and it appears that something like 10-12 15K SCSI drives will be about right for dual Opterons, maybe 18-22 for dual dual-core Opterons. Not sure of the exact status of the order but we've at least one drive box ordered to get some data on how a greater number of drives will do.
The number six happened originally because that was the number of hot swap bays in the 2U cases offered by Silicon Mechanics. Still is.
Yes, I'm the one you're thinking of. There's an email link on my Wikipedia user page if you'd like to email me for a private discussion. How have things been going?
Sorry, I can't say much useful about the Sun v20z boxes. We haven't had them long enough, nor under high enough load, to have much in the way of useful comparative comments yet. Same applies to the HP DL140 boxes. Just too soon. If not otherwise specified you can assume that all the systems are built by Silicon Mechanics.
Correction: Samuel is a 16GB dual Opteron, not 32GB.
MySQL 5.0 can at times use multiple indexes for the same table alias for the same query. See the Index Merge Optimization.
Adler is a dual Opteron with 16GB of RAM and 6 15,000 RPM SCSI drives. We have two like that, one with 8GB and 15K SCSI, several with 4GB and slower drives.
Suda, a dual Opteron 4GB box with 6 10K SCSI drives, has 8 day uptime and 580 qps average, but it's probably been out of normal load for quite a bit of that time for various chores.
Holbach, a dual Opteron 4GB box with 6 10K SCSI drives, has 28 day uptime and 616 qps average.
Ariel, a dual Opteron 8GB box with 6 15K SCSI drives, has 8 day uptime after repairs and 1280 qps.
Samuel, the current master, is another 32GB dual Opteron with 6 15K SCSI drives and has 83 day uptime (that is, no crashes or deliberate MySQL server shutdowns for 83 days). Only 367 qps, and I think it was not in service for quite a while before it was made master - think we put it into service earlier than planned because of the hardware problem on Adler. It's typically running nearer 2000-4000 qps now that it is in service.
khaldun and bacon, both 4GB dual Opterons, one with 10K SATA and the other with 7200RPM SATA, are both down. All are running Fedora Core.
Add up those query averages and it comes to 385 million per day across 5 servers. We might pass the billion select per day mark this year; hard to predict.
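The 385 million figure can be reproduced with a short sketch. One assumption: this comment gives no average for Adler, so the sketch uses the 1620 selects per second quoted for Adler in the SHOW STATUS comment. The other four values are the averages listed above.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Average queries per second per server. Adler's value is an assumption
# taken from the separate SHOW STATUS comment; the rest are listed above.
AVERAGE_QPS = {
    "Adler": 1620,
    "Suda": 580,
    "Holbach": 616,
    "Ariel": 1280,
    "Samuel": 367,
}

total_qps = sum(AVERAGE_QPS.values())
queries_per_day = total_qps * SECONDS_PER_DAY

print(f"{total_qps} qps across {len(AVERAGE_QPS)} servers")
print(f"{queries_per_day / 1e6:.1f} million queries per day")
```

That comes to 4463 qps, or about 385.6 million queries per day, which matches the total quoted.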
Some people will say that MySQL is incapable of doing serious work, even though just 5 main database servers are powering a top 100 site delivering 1 in every 1,000 web pages viewed in Alexa.com's sample. Others will say they use MySQL because it gets the job done. Including me.
You might not believe an anon but how about believing Google's CEO? There's more to Google than you know.
"Google views its ability to innovate as critical key to its long-term success against rivals such as Yahoo Inc. and Microsoft Corp. Schmidt said. But this quest for new ideas is also behind his company's embrace of open source technologies such as the Linux operating system and the MySQL database, both of which are heavily used by the Mountain View, California, search company.
Schmidt had originally wanted Google to use a commercial database supplier such as Oracle Corp. or Sybase Inc. for Google's back end, but his engineers convinced him that MySQL was actually better suited to the company's needs"
http://computerworld.com.sg/ShowPage.aspx?pagetype=2&articleid=1236&pubid=3&issueid=49
You can use the MySQL client libraries with at least 20 non-GPL licenses, including PHP, BSD and LGPL. See MySQL FLOSS License Exception.
Nice summary. One small clarification: in production 5.0, trailing spaces are not removed from VARCHARs:
"When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed."
"VARCHAR values are not padded when they are stored. Handling of trailing spaces is version-dependent. As of MySQL 5.0.3, trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL. Before MySQL 5.0.3, trailing spaces are removed from values when they are stored into a VARCHAR column; this means the spaces also are absent from retrieved values."
http://dev.mysql.com/doc/refman/5.0/en/char.html
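The rules quoted above can be modelled in a few lines. This is only a sketch of the documented behaviour, not MySQL code: the function names are made up, and truncation of values longer than the column is deliberately left out.

```python
def store_char(value: str, length: int) -> str:
    """CHAR: right-pad with spaces to the declared length when storing."""
    return value.ljust(length)


def retrieve_char(stored: str) -> str:
    """CHAR: trailing spaces are removed when the value is retrieved."""
    return stored.rstrip(" ")


def store_varchar(value: str, version: tuple[int, int, int]) -> str:
    """VARCHAR: never padded; trailing spaces kept as of 5.0.3, stripped before."""
    if version < (5, 0, 3):
        return value.rstrip(" ")
    return value


print(repr(retrieve_char(store_char("ab ", 5))))
print(repr(store_varchar("ab  ", (5, 0, 3))))
print(repr(store_varchar("ab  ", (4, 1, 0))))
```

So a CHAR value comes back without its trailing spaces on any version, while a VARCHAR keeps them only on 5.0.3 and later.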
10.0.0.101 is Adler. Its uptime is currently 2017391 seconds (23 days). Adler's uptime is that short because it had a hardware repair. It was probably overload - several DB servers are dead right now and Monday is the busiest day for the site. So far the site is consistently filling to capacity all the hardware which is ordered and that shows no sign of stopping. It's now at 4500 pages per second, 400 megabits/s. For scale, the biggest Slashdotting the site saw was about 650 pages per second.
Averages over 23 days for this one server: 1620 selects per second, 10 inserts and 3 replaces per second. That is: 140 million selects per day average. Peak rates are about double average rates, typically in the 3000-5000 qps range.
I'm one of the roots at Wikipedia. Figures from SHOW STATUS just before typing this reply.
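The arithmetic behind those figures, as a minimal sketch. The uptime and the per-second select rate are the values quoted above; the helper name is made up. Presumably the per-second averages come from dividing counters such as Com_select by Uptime, but that is an assumption about how the numbers were derived.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_day(rate_per_second: float) -> float:
    """Convert an average per-second rate into a per-day total."""
    return rate_per_second * SECONDS_PER_DAY


uptime_seconds = 2017391  # uptime quoted above
uptime_days = uptime_seconds / SECONDS_PER_DAY
selects_per_day = per_day(1620)  # average selects per second quoted above

print(f"uptime: {uptime_days:.2f} days")
print(f"selects per day: {selects_per_day / 1e6:.1f} million")
```

That gives a little over 23 days of uptime and just under 140 million selects per day for this one server.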
MySQL commercial licenses are already available for many platforms. Producing one certified for (tested on) SCO is simply more of the same. If you don't want to buy that from SCO, go right ahead and ask MySQL for a commercial (closed source) license instead. Your choice. As is the decision to need a commercial license instead of going with open source.
I've never used or even installed InnoDB Hot Backup to back up the few hundred gigabytes we have at Wikipedia. What we do is take one of the database slaves out of production service and then either use mysqldump or shut down its server and copy the database files. It's effective, without adding the unnecessary cost of InnoDB Hot Backup. This approach also avoids adding undesired disk load to a server people are using.
>> is never going to be cheaper than
Sabre, according to the MySQL site: "estimated a 40% TCO savings when budgeting for the $100 million project. In many cases, the ongoing TCO savings are expected to be up to 80% - twice what was estimated".
Any corporate IT manager who isn't interested in the prospect of saving $40 million in costs and improving results through open source seems quite likely to find their company out-competed by those who are.
My own background is with a little place which is now serving about one in every thousand web pages viewed in Alexa's sample; and with some database servers down it still showed billion query per day capability on a handful of MySQL database servers. About 3,500 pages per second (a bad Slashdotting is about 600 pages per second). A few hundred gigabytes of compressed data, in the terabyte range without compression.
The world's in the process of changing and those who want to keep up need to be moving fast.
Can probably trust both but only MySQL found a way to have SCO fund the development of GPL'd software it had condemned... :)
It's not so much that rewriting is bad but that there are bad times to rewrite. Really old and stable code isn't a good target. Really new code with completely new function and an architecture which has been found not to be a good match for the real world objective it's addressing would be a much better target.
You might remember that it's possible to dedicate software to the public domain prior to the expiration of a copyright term, including at the time it's created. Such software is the most readily reusable form of open source there is, since it is compatible with public domain works as well as all open source licenses.