There is simply no way for Google to know that those pages are any good until people start linking to them.
Exactly, except turned upside down. It's "there is no way for Google to know that those pages are spam", so they get positive weight until proven otherwise.
from the links from their institution front pages
A few links will make the cluster discoverable by crawlers but won't make a difference for PR. It's the cross links within the cluster that make the difference.
I am sharing first-hand knowledge. I've seen it done this way. You seem to be continuing this conversation just for the sake of argument. But others reading this thread may actually learn something useful.
I can tell you that it does not have the property you assign to it
The delay I mentioned is due to links being made, not links being discovered. Think about some small community of scientists making an almost closed cluster of sites about their niche research subject.
That's not the impression I'm under - I thought that most pages were not part of Google's "root set"
I understand that you have such an impression, but that's a wrong impression. Every page gets a non-zero weight by default. If you think about it you will see that your scheme just would not work: emerging subjects/sites would stay with zero PR for a long long time until links to them propagate all the way to the "roots".
Creating lots of pages with no karma that link to you therefore shouldn't do you any good at all
That's not how it works. You assume it's a zero-sum game, but it's not. Every page gets some weight even if no one links to it. It's small, but it's positive. When one page links to another, the weight of the source page is reduced less than the target page gains. So, here is the business plan:
1. Make a lot of unique pages (G in the PR calculation joins identical or nearly identical pages)
2. Crosslink them in a non-obvious way (i.e. not A <-> B but A -> B -> C -> A).
3. Sell ads on high PR pages with a lot of traffic from G or Y
4. Profit!
It really works.
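The default-weight point can be sketched with the textbook PageRank formula, PR(p) = (1 - d) + d * sum(PR(q) / outlinks(q)). This is only an illustration under that assumption: the damping factor 0.85, the page names and the `pagerank` helper are hypothetical, and nothing here claims to be what G actually runs.

```python
def pagerank(links, damping=0.85, iterations=200):
    """Textbook (non-normalized) PageRank by simple iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Every page starts with, and always keeps, the base weight (1 - d).
    ranks = {page: 1.0 - damping for page in pages}
    for _ in range(iterations):
        new_ranks = {page: 1.0 - damping for page in pages}
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * ranks[source] / len(targets)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks


# A -> B -> C -> A is the cross-linked cluster; D has no links at all.
links = {"A": ["B"], "B": ["C"], "C": ["A"], "D": []}
ranks = pagerank(links)
for page in sorted(ranks):
    print(page, round(ranks[page], 4))
```

In this toy model the unlinked page D keeps its small positive base weight, while the three ring pages lift each other well above it without a single external link.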
This is the most ambitious??? What about Markram & IBM? They must be just fooling around with that Blue Gene (actually I do think they are fooling around, but that's beside the point). What about Izhikevich? He simulated just a puny 100 billion neurons. That's *nothing* compared to this "most ambitious" million.
That's not Braille, but a similar scheme, a raised pattern. The same pattern can be easily added to the dollar bills. It won't cost much and won't require any adjustments to the bill readers.
You are right, but myspace will be using an external database to determine if the music is copyrighted. The audio print tells them something like "this record is equivalent to db entry #123458783 with probability of 97.4%". If that entry happens to be marked as public domain, the upload can proceed.
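The decision described above might look roughly like this. It is a sketch only: the `Match` record, the `allow_upload` function, the 0.95 threshold and the set of public-domain entry ids are all hypothetical names and values, not anything taken from the actual system.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Match:
    """Result of a fingerprint lookup in the external database (hypothetical)."""
    entry_id: int
    probability: float


def allow_upload(match: Optional[Match],
                 public_domain_ids: Set[int],
                 threshold: float = 0.95) -> bool:
    """Return True if the upload may proceed."""
    # No match, or a match too weak to trust: treat the track as unknown.
    if match is None or match.probability < threshold:
        return True
    # Confident match: allow only if the database marks the entry public domain.
    return match.entry_id in public_domain_ids


public_domain = {123458783}
print(allow_upload(Match(123458783, 0.974), public_domain))
print(allow_upload(Match(555, 0.974), public_domain))
```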
There is a serious limiting factor that has to be accounted for - listeners. When you are uploading music you want your listeners to enjoy it, right? If you add too much noise or other distortions, the music will sound like an old scratched vinyl. I worked with this audio fingerprinting system. The hash space is huge. The false positives are very rare. The rate of false negatives can be controlled by tweaking parameters. Then MySpace doesn't need 100% recognition. Even if they prevent just 95% of copyrighted uploads, it would create too much hassle to upload copyrighted music on purpose. Also, myspace would get a much better legal standing by demonstrating that it's actively fighting the copyrighted uploads.
Depending on specifics of the algorithm, it may be very hard to defeat it if you still want the music to be recognizable by the listeners. I am familiar with the audio fingerprinting algorithm from another company. The false positives are not a problem. The hash space is huge thus collisions are very rare. The false negatives can be a problem, but if they can weed out even 95% of attempts to upload copyrighted music, their life is going to be much, much easier. And if you distort the music enough to defeat the fingerprinting, then maybe you just have created a new masterpiece (c) you:-).
Ok, you are right, but it still does not apply to this case. The electrodes in the electrolytic capacitors are separated by the aluminum oxide on the surface of one electrode. The electrolyte acts as the second electrode. In case of nanotube fur, there is nothing like aluminum oxide on the tubes to act as an insulator. Thus they can't just immerse the tubes into electrolyte.
Your explanation is obvious. Obviously, it also does not apply in this case. If you had RTFA, you would have seen the picture there. And if you saw the picture you would know that your explanation could not be correct. The way nanotubes were shown in the picture, the [+] and [-] tubes could not possibly be parallel. They could only be tip-to-tip.
You are describing a different process. You are describing electrochemistry. Indeed in electrochemistry the entire area works (give or take some due to diffusion). Here it is just electrostatics. The area is kind of averaged.
Seems like I am missing something. It's not the area of the capacitor that matters (yes, I know the formula C=A/d for flat electrodes) but an "effective area". These capacitors are supposedly two flat or nearly flat substrate surfaces, each covered with nanotube "fur". There is a gap between these two electrodes. The gap is much larger than the thickness of the nanotube. Consequently, the effective area of the capacitor is not much larger than the area of the flat substrate electrode. What's the advantage of the "fur"? I would understand if [+] and [-] charged nanotubes were alternating inside the fur, but it's clearly not the case judging from the picture.
For instance, take a wire, cut it in half and separate the two pieces by a small gap. That's a capacitor. Its capacitance is going to be somewhat larger than A1/d, where A1 is the area of the wire cross-section, and a lot smaller than A2/d, where A2 is the full surface area of the wire. The same applies to nanotubes.
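To put rough numbers on the two bounds, here is a small sketch. The wire dimensions are made up for illustration, and the code uses the flat-plate estimate with the vacuum permittivity included (C = eps0 * A / d), which is only an approximation for this geometry.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def plate_capacitance(area_m2, gap_m):
    """Flat-plate estimate C = eps0 * A / d, in farads."""
    return EPS0 * area_m2 / gap_m


# Hypothetical wire: 0.5 mm radius, 10 cm long piece, 0.1 mm gap.
radius = 0.5e-3
length = 0.1
gap = 0.1e-3

a1 = math.pi * radius ** 2                   # cross-section facing the gap
a2 = 2 * math.pi * radius * length + a1      # full surface of one piece

lower = plate_capacitance(a1, gap)
upper = plate_capacitance(a2, gap)
print(f"A1/d estimate: {lower:.3e} F")
print(f"A2/d estimate: {upper:.3e} F")
print(f"ratio A2/A1: {a2 / a1:.0f}")
```

With these made-up dimensions the side surface is hundreds of times larger than the cross-section, which is why it matters so much whether the "fur" area actually counts.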
So, obviously, they are doing it differently. How?
Well, some people believe that progress is non-linear. The current velocity of space travel should not be extrapolated to distances like 41 light years. It's unclear what's going to be achievable in 30 years.
Suppose one light year is 1 km. Then the tiniest speck of dust on the monitor is about 5 times bigger than Earth (1 micron), Sun is about half the size of the dot above i (0.1mm), distance from Earth to Sun is the length of the word "length" (1.5cm). The size of the Solar system (Pluto orbit) is about the size of your computer - 0.7 meter. The most distant objects in Oort cloud are probably within your room (a few meters). The nearest star - 4km away, like a gas station. The new planets are 41km away - the state border:-). Our Milky Way galaxy is a few times larger than Earth, maybe half way to the Moon. The nearest spiral galaxy is not too far - just 8 times more distant than Moon. The edge of the Universe (12 bln l.y.) is about the size of Sedna orbit.
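The scaling itself is a one-line conversion, sketched here for a few of the objects. The real-world sizes are rounded reference values that I am supplying, not figures from the comment, and the function name is just for illustration.

```python
KM_PER_LIGHT_YEAR = 9.4607e12  # one light year in kilometres


def model_length_m(real_km):
    """Length in metres in a model where one light year shrinks to 1 km."""
    return real_km / KM_PER_LIGHT_YEAR * 1000.0


# Rounded real-world sizes in km (assumed reference values).
objects_km = {
    "Earth diameter": 12_742,
    "Sun diameter": 1_391_400,
    "Earth-Sun distance": 149_597_871,
}

for name, km in objects_km.items():
    print(f"{name}: {model_length_m(km):.3g} m in the model")
```

Distances already given in light years need no arithmetic at all: 41 light years is 41 km in the model by construction.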
That's only if you take it at face value. The real goal of the article is to reduce out-of-country DVD purchases by scaring some people. And it looks to me the article is working beautifully. Even here everyone discusses the dogs.
That's an incorrect conclusion. Correct would be "Pirated DVDs are not shipped by FedEx". Really, FedEx is not a viable shipping option for a large-scale commercial operation - too expensive. FedEx shipping would add a substantial cost to the DVDs in exchange for saving maybe 1-2 days compared to a no-name cargo air shipment. And that's assuming DVDs have to be air-shipped and not hauled by trucks from Mexico or produced locally.
The article should not be taken at face value. It seems to be pure BS targeted at scaring individuals: "Don't buy disks in other countries, or you will get in trouble". And it is succeeding - the completely cooked-up news event gets a lot of attention. They obviously have talented and creative PR people.
Solar wind in the primordial system pushed elements away from the center. The elements with lower ionization potentials were trapped by the Sun's magnetic field and held closer to the sun, the elements with higher ionization potentials were pushed to the outer fringes.
I think I had this kind of discussion with you before.
What version of PG were you using? Recent versions should be fairly good about minimizing the effect of VACUUM on concurrent operations.
The last one we tested was some 8.1 beta on FreeBSD 6. The queries ran anywhere from 50% to 5 times longer with vacuum running. I am sure there are some uses where people won't notice such a slowdown. Our case is not one of them.
Did you determine what the cause of this alleged performance decrease was?
I love the "alleged". Denial won't help PgSQL gain market share.
This is market data. The records are time stamped and 99% of selects are constrained by the time stamp "ts>'...'". My guess is that inserts skew the timestamp index statistics and the index becomes useless.
If anything, the shared buffer cache is organized around caching frequently used pages, not results.
Fine, you know better where the problem lies. Take a box, install Windoze, MSSQL Server 2000, run queries. Observe there is no problem. Reformat, install FreeBSD and PgSQL 8.1. Same DB, same queries, fresh indexes. The first load of daily bars takes maybe 1.5 sec. Subsequent queries for the daily data take less than 50 ms. Fine. Now load minute bars, then go for the daily again. It takes the same 1.5 seconds again. WTF? If it's not the cache problem then what is it?
I don't want to use mssql. I would love to dump it in favor of pgsql. We have been testing nearly every release of pgsql since the days of 6.4. No luck so far.
OS you should use, that's really up to you, Postgres works well on a lot of platforms
I know that. My real question was: since pgsql relies on OS caching, which one does a proper job of caching for pgsql?
Well, if it's "broken" it seems hard to envision how to fix it. What would you suggest?
Well, I am a user, not a DB developer. I am sure something can be done. Like keep a single "big" counter per table, read it into the transaction on start, calculate the difference in the transaction. When the transaction ends, update the "big" counter. It took only 7 years of constant nagging, but pg developers did find a way to fix min & max. There is simply no excuse to have such an often-used feature broken in a DBMS which positions itself for enterprise.
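The counter idea can be sketched as a toy model. The `Table` and `Transaction` classes are hypothetical; the sketch handles inserts only and ignores what concurrent transactions should or should not see, which a real DBMS would have to deal with.

```python
class Table:
    """Toy table that keeps one shared row counter next to its rows."""

    def __init__(self):
        self.rows = []
        self.row_count = 0  # the single "big" counter


class Transaction:
    """Toy transaction: snapshots the counter, tracks its own difference."""

    def __init__(self, table):
        self.table = table
        self.base_count = table.row_count  # read the counter on start
        self.pending = []                  # rows inserted by this transaction

    def insert(self, row):
        self.pending.append(row)

    def count(self):
        # Counter snapshot plus the local difference, no table scan needed.
        return self.base_count + len(self.pending)

    def commit(self):
        # Apply the rows and fold the difference back into the big counter.
        self.table.rows.extend(self.pending)
        self.table.row_count += len(self.pending)
        self.base_count = self.table.row_count
        self.pending = []


table = Table()
tx = Transaction(table)
tx.insert({"ts": "2006-01-01", "price": 10.5})
tx.insert({"ts": "2006-01-02", "price": 10.7})
print(tx.count(), table.row_count)
tx.commit()
print(tx.count(), table.row_count)
```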
They are right in my opinion, and any of the above mentioned OSes will do. They all are quite adept at file caching. PostgreSQL accesses a table file, OS reads it and caches it. PostgreSQL accesses table file again, OS serves file from cache. Where exactly lies the problem?
The problem lies in my inability to use the product. Say DBMS Abc on a certain box runs fine. DBMS Cba on the same box has performance problems because the cache does not work as I would expect. The Cba's developers keep saying it's not a bug, but a feature. The cache works right, just as designed. OK, it's a fine feature then, but it makes their product unusable *for me*. Maybe this exact feature makes others happy, but I can't use Cba because of it. That's all.
This thread is "Top 5 Reasons People Dismiss PostgreSQL". These are my own purely subjective reasons why *I* dismiss it. I tried very hard to make it work for us. The team spent probably over 2 man-months on it.
You know that count() does full table scans in MySQL too when you use InnoDB? And for MyISAM the result is simply wrong while updating.
What's your point? It's like me saying "Hyundai is not the greatest car" and you replying with "Yugo is even worse". So what? There are other real DBMS's that don't have this problem. Why compare to MySQL? PGSQL positions itself as a "real DBMS" as opposed to "toy MySQL". Compare to Oracle, DB2 or Sybase.
That's right. Postgres requires that you specify how much memory it should use. As for every complex program, it requires some skill.
Ahm. Well, aaa. I don't even know how to start. First of all, it doesn't let you use all available memory. There is no such configuration option. And don't point to shared buffers. That's a different thing. Second, complexity is not an excuse for poor functionality. Oracle or MSSQL are no less complex.
VACUUM is a pain. It's true that VACUUM is annoying, but later releases (especially 8.0 and 8.1) make VACUUM much more tolerable;
Vacuum kills performance. Some uses may be OK with losing 50% or more while VACUUM runs. In some uses it's unacceptable. In our case (a lot of inserts with the majority of selects going for the newly inserted records) performance degrades within 6-8 hours after running VACUUM & friends. VACUUM takes ~20 minutes to complete, which is completely unacceptable during the day, and we can't delay it till night.
No, AUTOVACUUM is not an answer because it kicks in unexpectedly and makes random queries run unexpectedly slow at unexpected times. Usual VACUUM makes all queries run slow at a predetermined time. Not a very appealing choice.
More reasons:
No memory management. For example, here is a 1GB database on a dedicated host with 2GB of RAM. PG should suck the whole DB into RAM and run selects from there, right? Wrong. PG is extremely frugal about memory management. It caches the last few results, but otherwise goes to disk for data even if there is enough RAM to cache the whole DB. The PG developers keep saying that it's the job for the OS. Now, which OS should we use then? FreeBSD, Linux, Windows? Which one?
Forever broken COUNT(). Although MIN & MAX were fixed in the latest release, COUNT() is still broken and there is no fix in sight. Yes, I believe a 10-second execution time for count() on a table with just a few million records qualifies it as a broken feature.
Of course, the query optimizer/planner can be improved, but that's understandable and can be applied to pretty much any DBMS
In the days long gone, before the mp3s or even the web itself, there were projects at NCSA. One was called Mosaic. Another one was Collage. Mosaic grew to become The Web while Collage withered. Among other things, Collage was about presenting scientific data in novel ways, such as audio. That was ca. 1992.
The proposed restriction would affect software for the content encryption only, i.e. the server side of the DRM.
Suppose someone made a GPL 2 compliant DRMed book reader. The reader comes with the source, so there is no possibility of security through obscurity: the source can be modified and recompiled. In order to read the DRMed content it must be decrypted. That means a secret key has to be given to the user and passed to the reader. The source for the reader is available and can be modified to save the key or the decrypted copy. Consequently, GPL2 is sufficient to make client-side DRM ineffective.
Client-side DRM software, at least in its present form, depends on the closed source software, on obscurity. GPL3 restriction would only affect content creation, the encryption part of the DRM.
Hardware-assisted DRM may be different, but I can't see it right now.
Oooh, hints of dark and secret knowledge!
It's only dark and secret for a newbie
who knows as little about this as I do
How do you know that?
Look at the item number 3 in the picture: http://www.cbr.ru/eng/bank-notes_coins/bank-notes/main.asp?file=priznak_2004_eng/Opisan_50R_eng.htm
So, 41 light years is relatively near :-).
Everybody knows the best place to find cracks and keygens is http://astalavista.box.sk/ :-)
Hydridic Earth theory: