What privacy? He's dead and can no longer care about anything. Looks as though there will no longer be a de.wikipedia domain, courtesy of his family. Hope they like their own Wikipedia entry, since the press coverage for this causes them to pass the threshold for qualifying for one.
So the German chapter loses all of its assets over the content of the German language (note German) Wikipedia. That's fine. Beats giving the German or Swiss or other courts control of all German language content.
It's already been established that a collectors' guide can contain images and titles of every item in a set of copyrighted works. Specifically, Beanie Babies in the case Ty, Inc. v. Publications International, Ltd. The fair use arguments in that case are particularly interesting, since they cover the requirement for a collector's guide to be complete to be successful and the transformative nature of the use, both of which would apply to the use of baseball statistics in fantasy games.
One of the well established principles of US copyright law is that no amount of hard work creates a copyright. Only original creation does. The "sweat of the brow" factor doesn't matter at all.
You can create custom security zones which don't show up in IE. Those zones are site-specific and could configure just the Windows Update site to have access to ActiveX. Microsoft could ship Windows with such a zone set up.
You may well already know this, but it might be of interest to others:
I recommend reading the full Assessment Technologies v. WIREdata (slow to load) decision because it's a very well written summary of this area of law. In this case the use of proprietary components to prevent the use of underlying public domain data was found to be invalid.
As you note, creativity can still prevent a compilation from being in the public domain, if there's some significant original creativity involved. One of the interesting bits of Assessment Technologies v. WIREdata was the requirement to hand over even the bits which might be copyrightable - the database structure - so that the data would be available.
"We've also made our data available under Creative Commons License 2.5". Data is ineligible for copyright cover in the United States, so no license is needed or can apply.
They wouldn't bundle an unnecessary license with useful data just after writing about bundling unnecessary software with desired applications, would they? :)
It is useful outside the US, though, so this is actually a bit tongue in cheek. :)
When load starts to matter you always do have to design with knowledge of the properties of the storage engine in use. It's one of the reasons for the huge range of tweaking options some database servers have.
Consider one Wikipedia example. We keep the full version history for every article. We used to store those in order added, regardless of the article, so each version of each article would be in different pages in different locations on disk. That's the way the MyISAM engine always does it and if I understand correctly, the way PostgreSQL does it (but I don't know PostgreSQL that well, so I could be wrong).
Trouble is, that was extremely inefficient. With each version of one article in a different page we ended up doing fifty page reads to display a list of fifty versions. Worse, the code at that time wasn't very clever about how it went back further and you could ask for the versions between 50,000 and 50,050 and cause 50,050 pages and hence 50,050 disk seeks to be needed. That was slow and a denial of service attack vulnerability.
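To make that arithmetic concrete, here's a minimal Python sketch, not the actual MediaWiki code. It assumes the old behaviour amounted to walking the history from the start and discarding rows until the requested offset was reached. `offset_page` and `keyset_page` are hypothetical names, and the keyed lookup is just one common alternative shown for contrast.

```python
from bisect import bisect_left


def offset_page(rev_ids, offset, limit):
    """Walk from the start, discard `offset` rows, keep the next `limit`.

    Returns the page and the number of rows that had to be visited.
    """
    visited = 0
    page = []
    for rev_id in rev_ids:
        visited += 1
        if visited > offset:
            page.append(rev_id)
        if len(page) == limit:
            break
    return page, visited


def keyset_page(rev_ids, start_id, limit):
    """Jump straight to the first revision >= start_id, as an index could.

    Assumes rev_ids is sorted. Only the returned rows are visited.
    """
    start = bisect_left(rev_ids, start_id)
    page = rev_ids[start:start + limit]
    return page, len(page)


# Hypothetical article with 100,000 revisions, numbered 1..100000.
rev_ids = list(range(1, 100_001))
slow_page, slow_visited = offset_page(rev_ids, offset=50_000, limit=50)
fast_page, fast_visited = keyset_page(rev_ids, start_id=50_001, limit=50)
print(slow_visited, fast_visited)
```

Both calls return the same fifty revisions. The difference is only in how many rows get touched on the way, and when each row sits in a different page that count is also the number of disk seeks.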
So, we exploited the way InnoDB stores records physically in primary key order and made the article ID the first part of the primary key. Now one page contains many revisions of the same article and displaying any number of revisions is painless.
None of this matters at low load. Once things get tough, though, I end up thinking about cache hit rates and disk seeks. That Wikipedia case changed lots of disk seeks with terrible cache hit rate to few seeks and excellent hit rate.
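A toy simulation of that change, under loudly stated assumptions: 50 rows per page, 1,000 articles whose revisions arrive in random interleaved order, and every distinct page counted as one read. None of these numbers come from the real Wikipedia database, and `build_layouts` and `pages_touched` are made-up helper names. It only illustrates the difference between storing rows in arrival order and storing them in (article ID, revision ID) order.

```python
import random

ROWS_PER_PAGE = 50  # assumption: how many revision rows fit in one disk page


def build_layouts(n_articles, n_revisions, seed=0):
    """Return the same rows in two physical orders.

    Each row is (article_id, rev_id); rev_id increases with arrival time.
    """
    rng = random.Random(seed)
    # Arrival order: revisions of all articles are interleaved on disk.
    arrival = [(rng.randrange(n_articles), rev) for rev in range(n_revisions)]
    # Clustered order: rows sorted by (article_id, rev_id), the way a
    # primary key starting with the article ID would place them.
    clustered = sorted(arrival)
    return arrival, clustered


def pages_touched(layout, article_id, limit):
    """Distinct pages read to fetch the newest `limit` revisions of one article."""
    positions = [i for i, (art, _rev) in enumerate(layout) if art == article_id]
    wanted = positions[-limit:]
    return len({pos // ROWS_PER_PAGE for pos in wanted})


arrival, clustered = build_layouts(n_articles=1_000, n_revisions=200_000)
arrival_pages = pages_touched(arrival, article_id=7, limit=50)
clustered_pages = pages_touched(clustered, article_id=7, limit=50)
print(arrival_pages, clustered_pages)
```

Fifty consecutive rows can never span more than two 50-row pages, so the clustered count is at most two. In arrival order the same fifty revisions are scattered, so the count lands close to one page per revision.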
Ignoring RAM-only engines for the moment, PostgreSQL and other databases can't avoid such concerns because they also need to put data on a disk, read it and manage a cache efficiently.
Here's where the different MySQL engines also become interesting, because if you can't get a suitable architecture with one, there's a fair chance that another engine exists or can be developed to do what you need to get really efficient.
I see your point about atomicity but why do the engines in MySQL which don't offer ACID guarantees matter? Just don't pick one of those if you need that capability.
InnoDB isn't necessarily slower. It depends on the application. For LiveJournal they found it faster when they switched from MyISAM to InnoDB, for their mostly write load. It has the advantage of controlling both index and data record caching and it seems to do that significantly better than the operating system, so if it can stay ahead on transaction fsyncs it can come out ahead.
Benchmarks say that MyISAM is faster for parts of the benchmark load if transactions aren't needed, particularly for truly random read only access, but I'm not so sure in the real world, where you're able to design based on knowing the properties of the engines and can exploit them instead of having fixed schemas.
You probably know this anyway, but just for anyone who doesn't, because I really don't like telling people that they have no way to recover their data:
If your power goes out (as it will for everyone, if only due to emergency power off switch activation in a data center) then I hope you have a replication slave in a location on a completely different power supply, like a different state or country, because power outages have spanned many states and most of some countries. One day, the filesystem or database or whole computer isn't going to survive the outage (or fire or flood or whatever) without corruption and copying from a slave beats backup restore and binary log replay.
For load and transactions, I'm pretty comfortable with it. Could probably find a suitable architecture for most real-world cases. Not all.
That's not all there is to reliability, though - the other part is harder: protecting the data from application developers screwing up or deliberately attacking the system. MySQL is definitely not yet suitable for cases where inside developers will be attacking, unless there's a trusted middleware layer or other protection present. The old data validation criticism of MySQL is just one tiny part of this particular problem.
At Wikipedia we're still concentrating on using the donations to try to keep up with the growth and that's enough strain on resources already. Not likely to get much more reliable until growth slows down and donations have a chance to catch up and start paying for reliability more than sheer capacity. Can't stop trying to grow the capacity because that would cause timeout failures in page loadings and worse apparent reliability.
The best remedy for now is donations when money is asked for. It probably won't dramatically improve reliability this year but it might let us keep up. Might help reliability more next year. Depends on how donations to pay for the capacity for this year's growth go and on how long the growth continues and at what rate.
It's an interesting challenge to do what Wikipedia is doing on donations.
True, but MySQL qualifies as one of the poster boys of the open source movement and it does reassure quite a lot of people to have a well funded company to go to. If that's what it takes to get big government and corporate types interested in FOSS, that's useful enough - they are also part of the world FOSS has to take over.
Agreed, it can be important. My own personal preference is for BSD-like licenses or pure PD - I'm not keen on copyleft. But GPL-based plus other open source is MySQL's approach and the company seems pretty consistent in its objectives.
I don't think that MySQL is using open source simply to push a proprietary agenda, but I've had the pleasure of meeting the two founders still with the company as well as many other employees, and I know that MySQL looks for people who are committed to open source goals. It's really tough to show that online, so the best I can really do is suggest that you take an opportunity to meet MySQL technical people whenever you can and find out for yourself what those people are like.
Instead, I think that anything MySQL does with proprietary software, be it licenses or anything else, is intended to push the adoption of open source software, including developing a strong and profitable MySQL database company as one of the key players.
What happened to their failover system? What happened to their pager? What happened to their backups? Was the server down, or did someone rename it and not change the PHP connection settings, or... lots of other things?
If you use one server, you're going to get downtime sometimes. It's a fact of life. Design your systems for it because the power supply will fail, someone will turn the wrong box off or any one of a thousand other things will go wrong.
One box = certain failure. Just a matter of time. People often don't design in reliability and the people not doing that will not do it whatever the database server they are using is, because it's a people or budget failure, not a failure of the database server.
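A back-of-envelope Python sketch of why it's only a matter of time. The 1% chance of an outage in any given month is a made-up illustrative figure, not a measurement, months are assumed independent, and `chance_of_outage` is a hypothetical helper name.

```python
def chance_of_outage(p_per_month: float, months: int) -> float:
    """Probability of at least one outage over `months` months.

    Assumes each month fails independently with probability p_per_month,
    so the chance of surviving every month is (1 - p) ** months.
    """
    return 1.0 - (1.0 - p_per_month) ** months


# Assumed 1% monthly outage chance for a single box; watch it climb.
for months in (12, 60, 120):
    print(months, round(chance_of_outage(0.01, months), 3))
```

Whatever small monthly figure you plug in, the result heads towards 1 as the months add up, which is the argument for designing in a second box rather than hoping.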
How does a site handling 6,000 page views per second, around a billion queries per day on five database servers and in the top 40 sites in the world according to Alexa.com sound?
Or how does Google's main revenue source or Travelocity's booking system or big chunks of Yahoo or... do I really need to continue with more examples of massive web traffic using MySQL?
Site design can be screwed up. It can also be done right. People regularly do it both ways. The database server usually isn't the reason. The people using it are.
So, someone wants to tightly link a GPL core to a proprietary tool and redistribute without releasing all of the source code. Who's supposed to be upset at this, other than the person releasing the proprietary product?
Want to argue that binary compatibility is OK - go have fun on the Linux kernel mailing list and argue that a device driver doesn't need to be GPL.
If someone is sure that a library tightly bound to a binary interface isn't a derivative work, they are perfectly free to act on that belief.
MySQL seems committed to the free software objective of making more software free. The company licensing and views support that objective.
Other projects have a different view and accept commercial use with no payback to the community or developers. Their call. MySQL's is that if you're using MySQL, you should either also be releasing free software or you should be contributing to the development of the server the free community and everyone else is using.
It appears that MySQL believes that's the practice which produces a strong open source database company. With more than a million downloads in just the first three weeks after MySQL 5 was released a few months ago, as well as several hundred employees, it's getting pretty hard to argue with the success of that view.
Close to two years after Wikipedia switched to Silicon Mechanics we're still happy with them a hundred or so servers later. When there's a problem, they deal with it well. Recommended.
MySQL estimates 8 million users. That's a mass market intellectual property product.
The film "Fahrenheit 9/11" was released on the internet without copy protection with the approval of the director, who said "I don't have a problem with people downloading the movie and sharing it with people as long as they're not trying to make a profit off my labour". The US version is currently ranked around 1,000 in Amazon.com's DVD sales list. It also has the highest box-office receipts of any general release documentary and won the Palme d'Or (highest award) at Cannes. First day DVD sales were reported at two million copies, also a record for documentaries. Box office revenues are currently in excess of US$360 million worldwide.
Seems to have just the combination of very wide viewership from a mass market internet distribution and sales to meet the requirements of your suggested test.
Companies can and do make money with a free customer to paying customer ratio in the 1,000 to 1 range. One pretty well known example, though not the only one: MySQL.
Movie studios and record labels can compete with those not paying license fees using things like faster and assured high quality delivery of ownership of the work. If you can pay a dime and get it at high speed from a known reliable source, why bother with a file trading network delivering a version of unknown quality with days or weeks of waiting before you get it?
Anyone can make their own version of MySQL and sell it, or change it and sell that. Except, they don't, because MySQL the company keeps ahead of the game and makes it unnecessary.
At present the film and record labels are delivering lower video or audio quality with DRM, so you can't readily move it from computer to computer as you change room, operating system or company you do business with. It's not surprising that they are having problems - it's a comedy of errors.
A donation of $5 would pay for something like 200,000 to 500,000 more pages delivered over the course of a year.
If no capacity expansion at all happened, with money raised used only for running costs, not expansion of page serving capacity, the $5 would pay for about 180,000 page views over a year. But it will increase capacity, so it'll actually deliver more value than that.
Numbers are _very_ approximate, based on ballpark capacity of the system today (about 6,000-8,000 pages per second, 500 million per day) and ballpark equipment costs to get there, adjusted for guesstimated efficiency improvements.
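For anyone who wants to check the ballpark, a small Python sketch of the unit conversions. It only restates figures already given above; the function names are mine and this is not any official Wikimedia cost model.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def pages_per_day(pages_per_second: int) -> int:
    """Pages served in a day if the per-second rate were sustained all day."""
    return pages_per_second * SECONDS_PER_DAY


def pages_per_dollar(pages: int, dollars: float) -> float:
    """Page views bought per dollar donated."""
    return pages / dollars


print(pages_per_day(6_000))          # 518400000
print(pages_per_day(8_000))          # 691200000
print(pages_per_dollar(180_000, 5))  # 36000.0
print(pages_per_dollar(200_000, 5))  # 40000.0
print(pages_per_dollar(500_000, 5))  # 100000.0
```

So 6,000 pages per second sustained around the clock is a bit over 500 million a day, and the $5 estimates work out to somewhere between 36,000 and 100,000 page views per dollar.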
I'm one of the roots on the Wikimedia Foundation servers.
>> It's fantastic publicity, but when is enough enough?
When people stop causing page views per second to double every three or four months. :)
It's demand-driven - the more people want, the more money is needed to scale up the capacity to serve the pages they are after. 6,000+ page views per second of capability today... and still rising.
Not "stop serving pages" serious. "Unable to keep up with growth" serious.
This is the year when Wikipedia page views will pass Google page views if growth continues as it has in the past. That's a hardware capability of 6,000+ page views per second today and 3-5 doublings expected this year, taking it to 50,000-180,000 page views per second.
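The doubling arithmetic, as a minimal Python sketch. It takes the 6,000 pages per second starting point and the 3-5 doublings from the paragraph above; `projected_rate` is just an illustrative name and the result is a projection, not a measurement.

```python
def projected_rate(current_rate: int, doublings: int) -> int:
    """Rate after the given number of doublings."""
    return current_rate * 2 ** doublings


for doublings in (3, 4, 5):
    print(doublings, projected_rate(6_000, doublings))
# 3 48000
# 4 96000
# 5 192000
```

That's the same ballpark as the rounded 50,000-180,000 range quoted above.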
When growth will stop is an interesting question. Nobody knows.
One certainty: hundreds of thousands of authors writing an encyclopedia accessible to anyone free of charge hosted by a charitable Foundation and in the top 25, likely ending in the top 5 sites on the net, is a great achievement for the open source model and people getting together to build and support what they want: an ad-free ever-improving (and ever-imperfect) information resource for all.
It's many end users writing this, tremendously broadening participation in the open source model beyond the programmers who've traditionally been involved.
Some have suggested that people who have donated in the past aren't donating and that's why more money is needed. Not really. When you're doubling what you serve every three or four months you also need to substantially increase the hardware and donations to keep up with the ever-increasing demand for more, though we've managed to do considerably better than doubling the hardware for each doubling in load.
I'm one of the roots on the Wikimedia Foundation servers.
Shouldn't that be Barry "bail" Bonded to protect the commercial interest in the name Barry Bonds? :)
The key question: Is MLB claiming that the statistics are original creative works (made up numbers :)) it can get a copyright on, or facts? :)
Probably using the publicity rights of the players instead of copyright law. Not really good to claim you're making up the numbers...
Sorry about that. :)
There's more discussion of the general principle at Feist Publications v. Rural Telephone Service, which contains a fair overview of this aspect of US copyright law.