The 1-Click patent was equally obvious and should not have been patentable. Amazon suing Barnes and Noble was ridiculous and should have been slapped out of court from the start.
"Yes [it requires updating the reference count for read transactions], but as mentioned before, only the act of updating or deleting a tuple causes a version conflict."
You probably inferred my point, but I should have made it clear. My main "concern" was the overhead involved in updating a reference count for a simple read operation.
The grandparent compared vacuum with garbage collection. Your suggestion sounded to me like reference counting, so I was just pointing out the known issues.
Unfortunately I haven't had the pleasure of hacking PG code, so I'm just speculating.
"I don't think MySQL has this feature. (Please correct me if I am wrong.)"
Just correcting my own FUD here. Evidently MySQL does have MVCC, via InnoDB.
See folks, you can't go wrong either way! :-)
"I personally don't understand Postgres's issue. If a row is updated or deleted, Postgres knows which tuples are affected; why not keep a running pointer count on these tuples, and when all other references go out of scope, automatically put it into the free-space-map?"
Wouldn't it also have to update the reference count for read transactions?
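To make the overhead concern concrete, here is a toy Python model of the reference-counting idea being proposed. Everything in it (`TupleVersion`, `RefCountedTable`, and the `free_space` list standing in for the free-space map) is hypothetical. PostgreSQL itself does not work this way: it decides visibility from transaction IDs stored in each tuple and leaves reclamation to VACUUM. The sketch shows that under reference counting every read, not just every update or delete, has to write to shared bookkeeping.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare versions by identity, not by field values
class TupleVersion:
    value: str
    refcount: int = 0         # open transactions that can still see this version
    superseded: bool = False  # set once an UPDATE/DELETE replaces it
    reclaimed: bool = False   # set once handed to the free-space list


class RefCountedTable:
    """Hypothetical reference-counting scheme; not how PostgreSQL works."""

    def __init__(self) -> None:
        self.free_space: list[TupleVersion] = []  # stand-in for a free-space map
        self.refcount_writes = 0                  # bookkeeping writes performed

    def read_begin(self, version: TupleVersion) -> str:
        # Even a plain SELECT has to write to shared state here.
        version.refcount += 1
        self.refcount_writes += 1
        return version.value

    def read_end(self, version: TupleVersion) -> None:
        version.refcount -= 1
        self.refcount_writes += 1
        self._maybe_reclaim(version)

    def supersede(self, old: TupleVersion, new_value: str) -> TupleVersion:
        # An UPDATE leaves the old version in place for readers still using it.
        old.superseded = True
        self._maybe_reclaim(old)
        return TupleVersion(new_value)

    def _maybe_reclaim(self, version: TupleVersion) -> None:
        if version.superseded and version.refcount == 0 and not version.reclaimed:
            version.reclaimed = True
            self.free_space.append(version)


table = RefCountedTable()
v1 = TupleVersion("old")
table.read_begin(v1)              # a long-running reader is looking at v1
v2 = table.supersede(v1, "new")   # UPDATE: v1 cannot be reused yet
assert not v1.reclaimed
table.read_end(v1)                # last reader leaves: v1 goes to free space
assert v1.reclaimed

for _ in range(1000):             # 1,000 short read-only requests on v2
    table.read_begin(v2)
    table.read_end(v2)
print(table.refcount_writes)      # 2002: two bookkeeping writes per read
```

In a real multi-user server those counter updates would also have to be atomic or locked, since many sessions read the same hot rows at once, so the cost would be contention as well as extra writes.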
For a better understanding of where PostgreSQL sits with respect to MySQL, it's worth reading the history of PostgreSQL on Wikipedia.
The short story is, it has deep roots in academia. It was Michael Stonebraker's experimental, "post-relational" database. It had "advanced" features, relative to its precursor INGRES, some of which still remain (e.g. extensible types). (Others, like built-in storage and querying of time-series data, do not.) After the academic project was abandoned, two of Stonebraker's grad students ripped out some of the more esoteric (and unstable) features and added a real SQL parser.
Anyway, I wasn't involved in any of the academic work. However, I was an early adopter in this transition period, circa 1995 (when it was called Postgres95). It was buggy, but it was very, very cool.
I think when MySQL came along, Postgres still had not fully shed its "academic" pedigree and was still complex, quirky, and buggy. MySQL was lightweight and simple, and "just worked."
I love PostgreSQL, use it daily, and have had no stability problems in the last five years. But it was not quite the right DBMS at the right time.
Sorry, brain fart: "Multi-Version Concurrency Control."
One of the great features of PostgreSQL is Multi-Version Concurrency Control (MVCC). In a nutshell, readers never block writers and writers never block readers: "while querying a database each transaction sees a snapshot of data (a database version) as it was some time ago."
If you have a single, long-running write transaction (e.g. a batch process) and many short-running read transactions (e.g. serving web requests), this works very well. When the batch process commits, readers "atomically" switch to the newly committed version: anyone reading after the commit sees all of the batch's changes at once, never a half-finished state. This (drastically) simplifies the batch process, since you don't have to worry about readers blocking or seeing inconsistent state. (Things get more complex in the many-writers scenario, however.)
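A minimal sketch of the batch-writer/web-reader scenario, assuming a much simplified MVCC model: each write appends a new version tagged with its transaction id, and a reader's snapshot is just the set of transaction ids that had committed when it started. `MVCCStore` and its methods are hypothetical names for illustration, not PostgreSQL's internals, and the model ignores deletes, write-write conflicts, and cleanup of old versions.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: str
    created_by: int  # id of the transaction that wrote this version


class MVCCStore:
    """Toy multi-version store; a simplification, not PostgreSQL's internals."""

    def __init__(self) -> None:
        self._next_txid = itertools.count(1)
        self.committed = {0}  # txid 0 stands for the initial data load
        self.versions: dict[str, list[Version]] = {}

    def load(self, key: str, value: str) -> None:
        self.versions.setdefault(key, []).append(Version(value, 0))

    def begin(self) -> int:
        return next(self._next_txid)

    def snapshot(self) -> frozenset:
        # A reader's snapshot: which transactions had committed when it started.
        return frozenset(self.committed)

    def write(self, txid: int, key: str, value: str) -> None:
        # Writers append new versions instead of overwriting, so readers never wait.
        self.versions.setdefault(key, []).append(Version(value, txid))

    def commit(self, txid: int) -> None:
        # One set insertion makes all of this transaction's versions visible at once.
        self.committed.add(txid)

    def read(self, snap: frozenset, key: str) -> Optional[str]:
        # Newest version written by a transaction that is inside the snapshot.
        for version in reversed(self.versions.get(key, [])):
            if version.created_by in snap:
                return version.value
        return None


store = MVCCStore()
store.load("price", "10")
store.load("stock", "5")

batch = store.begin()                # the long-running batch writer
store.write(batch, "price", "12")
before = store.snapshot()            # a web request arrives mid-batch
store.write(batch, "stock", "3")
store.commit(batch)
after = store.snapshot()             # a web request after the commit

print(store.read(before, "price"), store.read(before, "stock"))  # 10 5
print(store.read(after, "price"), store.read(after, "stock"))    # 12 3
```

The request that arrived mid-batch keeps seeing the old price and stock together, and the request after the commit sees both new values, which is the "atomic" switch described above.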
I don't think MySQL has this feature. (Please correct me if I am wrong.)
This reminds me of another problem with the current patent system (besides the fact that the USPTO essentially rubber-stamps all patent applications): the ridiculous language patents are written in, which makes it very difficult to determine what a patent is supposed to mean.
We programmers (er, "software engineers") are always complaining about the poor quality of our specifications. Could you imagine if the specs were written in patent-ese!?!
Patents are not "the same thing" as free speech rights. Since patents are used to restrict what you and I can do, they are more akin to certain prohibitions of free speech, such as obscenity laws or the proverbial shouting "fire" in a crowded theater.
Huge corporations such as IBM and Microsoft hold thousands of patents. The costs of searching, writing, and filing patents are non-trivial, so patents are not really available to everyone. I think it is fair to say that, in today's world, patents protect huge corporations from having "their" ideas "stolen".
In my opinion, abolishing patents is going too far. We need to reform the system by enforcing the rules that are already in place: do not grant patents when the technique is obvious to a reasonably skilled person in the field or when there is prior art! The patent office isn't performing its duty of vetting patent applications. It is essentially rubber-stamping them and letting the courts sort it out, which is an extremely costly and high-stakes process.
Also, because of the quickening pace of technological change, I support shortening a patent's duration.
Your last line about "soon in the future, laws will be passed, and that problem [corporate lobbying of politicians] will be gone" is so laughable, I wonder if I have been trolled.
I agree that editing the hosts file is a useful technique for blocking ad servers today, but how hard would it be for advertisers to work around this? All they need to do is use IP addresses rather than domain names. Are any of them doing this today?
Blocking IP addresses at the router/firewall would overcome that problem. But as others have pointed out it isn't foolproof, since some sites put ads on their regular image server, along with the "good" images.
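A small sketch of the point about IP addresses, assuming made-up block lists: a hosts file only takes part in turning host names into addresses, so an ad URL that embeds an IP address skips it, while a router/firewall rule on the destination address still applies. `blocked_by_hosts_file` and `blocked_by_firewall` are hypothetical helpers, and the name and address come from reserved example/documentation ranges.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical block lists; 203.0.113.0/24 is a documentation-only range.
HOSTS_FILE_NAMES = {"ads.example.com"}  # a hosts file can only list names
FIREWALL_IPS = {ipaddress.ip_address("203.0.113.7")}


def is_ip_literal(host: str) -> bool:
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def blocked_by_hosts_file(url: str) -> bool:
    # The hosts file only takes part in turning names into addresses, so a
    # URL that already contains an IP address never consults it.
    host = urlsplit(url).hostname or ""
    return not is_ip_literal(host) and host in HOSTS_FILE_NAMES


def blocked_by_firewall(dest_ip: str) -> bool:
    # A router/firewall filters on the destination address itself,
    # however the browser came to know it.
    return ipaddress.ip_address(dest_ip) in FIREWALL_IPS


print(blocked_by_hosts_file("http://ads.example.com/banner.gif"))  # True
print(blocked_by_hosts_file("http://203.0.113.7/banner.gif"))      # False
print(blocked_by_firewall("203.0.113.7"))                          # True
```

The firewall check only sees an address, which is why it cannot tell an ad from a "good" image when a site serves both from the same server.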
Finally, I have heard of users with extraordinarily large hosts files (say, 1 MB) experiencing slow name resolution and other problems. I searched around but couldn't find the definitive word on this. Anyway, just don't go too overboard with this approach and you should be okay.
Lightweight: mod_tcl
Heavyweight: OpenACS
Score:5, it must be real, right? Could someone please translate it for me?
Clearly both the patent system and the jury share the blame for this awful decision. It's bad for us all, because it ties up technology in the hands of a few.
But at least Bezos had the balls to call for reform afterward. Straight from the horse's mouth. But does anybody honestly think eBay would do anything that would harm their precious portfolio? Somehow I can't see Meg coming out against the whole patent system.
Quite simply, because you shouldn't be able to patent a basic concept (auctioning, or a Buy it Now! variation) just because it is applied to the web.
Was I the only one who dropped Gentoo when they went from 1.2 -> 1.3 and you couldn't do a simple "emerge -u world"? There was something like four manual update scripts to run. When that didn't work right off the bat, I decided to punt. Not that I couldn't have gotten it to work, but I was worried that this would happen with every major (or even minor) release.
:-)
Emerging applications was sometimes flaky as well. I particularly recall having difficulty upgrading KDE.
I was also occasionally frustrated with portage scripts lagging behind the latest tarballs (or not existing at all), but of course that happens with every package system.
I had always wished the USE variables would get set automatically, too, so that if I had, say, Postgres and Tcl installed, the --with-tcl configure option would be set without my having to fiddle with the USE variable. That's a weak complaint, though, since that feature is pretty much unique to Gentoo anyhow.
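A sketch of the automatic behaviour wished for here, assuming a hypothetical table that links installed packages to USE flags and to PostgreSQL `configure` options. This is not Portage code; the functions are made up for illustration, and plain names like `tcl` simplify Gentoo's category/package atoms.

```python
# Hypothetical table: installed package -> USE flag of the same name,
# and the PostgreSQL configure option that flag would imply.
FEATURES = {
    "tcl": "--with-tcl",
    "perl": "--with-perl",
    "python": "--with-python",
}


def auto_use_flags(installed: set) -> list:
    """Enable a flag whenever the matching package is already installed."""
    return sorted(pkg for pkg in FEATURES if pkg in installed)


def configure_args(use_flags: list) -> list:
    """Translate USE flags into configure options for the package being built."""
    return [FEATURES[flag] for flag in use_flags]


flags = auto_use_flags({"postgresql", "tcl", "kde"})
print(flags)                   # ['tcl']
print(configure_args(flags))   # ['--with-tcl']
```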
Now I'm using Red Hat fairly happily. However, I seem to spend a lot of time building custom RPMs to get the equivalent of Gentoo's USE. *sigh* Still, grabbing whole suites of packages from jpackage.org et al. via apt-get is pretty sweet.
Anyway, I'm not trying to spread Gentoo FUD. With the amount of popularity and support Gentoo has going for it, I'm sure some or all of these issues have already been addressed, right? I'll have to check it out when I finally decide to cannibalize my Windows box!