Re: I think I have an easier way to explain it
on Google TrustRank · Score: 2, Insightful
I think you're thinking of Hilltop (scoring links from pages that score well on the same term but are not otherwise related). Hilltop had a different problem: it would favour sites with a home page on the topic and ignore authority sites with the relevant page buried, because other sites tend to link to the top page.
PageRank works differently, since it covers all search terms and is ignorant of the search phrase. It is a "how important is this page on the net" rank.
This article is about TrustRank, a sort of "this page has X% probability of being spam" rank.
I think TrustRank is part of the current Google algo; it shows all the signs. I don't think it's used like that in Yahoo.
Google's current results are on topic but weak = It is inevitable that some authority pages that are on topic but untrusted will be kicked down, letting second-tier, non-authority sites come to the top -> shallow results of non-authority sites.
Deep searches are hit or miss: will Google find it or not? Sometimes yes, sometimes no = A page about "hotels near Montserrat monastery" is spam to someone looking for information on "Montserrat monastery" but might be perfectly on topic for "hotel with disabled access ramp near Montserrat monastery". The probability of spam is meaningless for that long phrase, since it's so obscure that it is not a target of spam -> zero probability of spam. Yet TrustRank would penalise the page by the same amount for both phrases.
That's why I think they should have a cutoff per search phrase: on heavily spammed search phrases the cutoff would be high, trimming off many sites; on lightly spammed phrases it would be low or zero. Say the page has a trust score of 0.2. For "cheap viagra" it would be below the 0.5 cutoff for that heavily spammed term and thrown away, but for "effects of viagra and how to obtain the same result on the cheap" it would be above the 0.001 cutoff and hence kept as on topic.
Strange fluctuations in SERPs: two algos pushing against each other using the same basic page and link information -> chaotic, butterfly-wing-flapping behaviour.
I think I have an easier way to explain it
on Google TrustRank · Score: 2, Interesting
Suppose you had the perfect Oracle that could check every search result and clean it of spam.
Ranking by on-page text, links, etc. (the items that make a page relevant or not) gives you:
A. 1st most relevant
B. 2nd most relevant
C. Spam
D. 3rd most relevant
E. 4th most relevant
F. Spam
After your Oracle has hand-checked every site you get:
A. 1st most relevant
B. 2nd most relevant
C. 3rd most relevant
D. 4th most relevant
Not:
A. 10th most relevant
B. 2nd most relevant
C. 8th most relevant
D. 5th most relevant
Ranking by trust as well as relevance gives you a clean but not very relevant result set.
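To make the Oracle comparison concrete, here is a rough Python sketch. Everything in it is made up for illustration: the `Result` tuple, the relevance and trust numbers, and the choice of relevance * trust as the blend. It is not how Google or the TrustRank paper combine anything; it only shows that filtering spam keeps the relevance order, while blending trust into the score reshuffles it.

```python
from typing import NamedTuple

class Result(NamedTuple):
    page: str
    relevance: float  # higher is more relevant (illustrative numbers)
    is_spam: bool     # what the perfect Oracle would say
    trust: float      # stand-in trust score in [0, 1] (assumed)

# Same shape as the A..F list above: C and F are spam.
results = [
    Result("A", 0.9, False, 0.3),
    Result("B", 0.8, False, 0.9),
    Result("C", 0.7, True, 0.1),
    Result("D", 0.6, False, 0.4),
    Result("E", 0.5, False, 0.8),
    Result("F", 0.4, True, 0.2),
]

def oracle_filter(rs):
    """Drop spam and keep the survivors in relevance order."""
    ranked = sorted(rs, key=lambda r: -r.relevance)
    return [r.page for r in ranked if not r.is_spam]

def blended_rank(rs):
    """Sort by relevance * trust, one possible f(relevance, trust)."""
    ranked = sorted(rs, key=lambda r: -(r.relevance * r.trust))
    return [r.page for r in ranked]

print(oracle_filter(results))  # ['A', 'B', 'D', 'E']
print(blended_rank(results))   # ['B', 'E', 'A', 'D', 'F', 'C']
```

With these made-up numbers the blended ranking drops the most relevant page (A) to third place, and the spam pages are still in the list, just lower down.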
Sounds like a confused algorithm
on Google TrustRank · Score: 2, Insightful
I've read it but it sounds mixed up. Isn't the ideal result from a search engine:
Matches - spam - offtopic, sorted by relevance
not
Matches sorted by f(pagerank, trustrank)
Google used PageRank + on-page text as a measure of how relevant a page is, but that's not reliable anymore because the set contains spam pages.
The 'trusted' value tells you nothing about relevance; it only gives the likelihood of the page being spam or not spam. If it's spam you want it removed; if it's not spam, then its PageRank determines its relevance, not some function of PageRank and TrustRank.
i.e. they should not promote or demote pages because of TrustRank; they should simply define a cutoff value K: if the trust is less than K then it's likely spam and should be removed.
Since spam follows money terms, they should have K(keyphrase), so they can change the value of K for each keyphrase to remove the spam. Otherwise they will filter non-money terms where no spam exists, and there their algo can only do harm!
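Roughly what I mean by K(keyphrase), as a Python sketch. The cutoff table, the default value and the (page, pagerank, trust) tuples are all assumptions for illustration, reusing the 0.5 and 0.001 figures from my earlier example. Trust is only used to remove pages; what survives is ordered by PageRank alone.

```python
# Hypothetical per-keyphrase cutoffs: heavily spammed money terms get a high K.
CUTOFFS = {"cheap viagra": 0.5}
DEFAULT_CUTOFF = 0.001  # assumed K for phrases nobody bothers to spam

def cutoff(keyphrase: str) -> float:
    """K(keyphrase): look up the cutoff, falling back to the low default."""
    return CUTOFFS.get(keyphrase.lower(), DEFAULT_CUTOFF)

def search(keyphrase, matches):
    """matches: list of (page, pagerank, trust) tuples that match the phrase.

    Pages with trust below K(keyphrase) are removed as likely spam;
    the rest are ranked by pagerank only, trust plays no part in the order.
    """
    k = cutoff(keyphrase)
    kept = [m for m in matches if m[2] >= k]
    return [page for page, pagerank, trust in sorted(kept, key=lambda m: -m[1])]

matches = [("pharma-info", 0.6, 0.2), ("trusted-clinic", 0.4, 0.9)]
print(search("cheap viagra", matches))
print(search("effects of viagra and how to obtain the same result on the cheap", matches))
```

The page with trust 0.2 is cut for the money term but kept, and ranked first on PageRank, for the long obscure phrase.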
Here's the brochure for the D50:
http://www.europe-nikon.com/uploads/ngb/Brochures/GB/D50_Leaflet.pdf
"Compressed NEF (RAW): 12-bit compression,
JPEG: JPEG baseline-compliant
Exif 2.21, Compliant DCF 2.0 and DPOF"
Notice it says NEF (RAW) without stating the missing white balance information.
Further down it talks about the camera supporting white balance.
"Auto (TTL white balance with 420-pixels RGB sensor), six manual modes, preset white balan"
And the only mention of software is in the "Optional Accessories".
"Optional Accessories....Nikon Capture 4 (ver.4.3) Software"
So, you have:
1. A misleading statement that NEF is RAW format.
2. A statement that the camera supports white balance.
3. A statement that the capture software is extra.
"due diligence "
Due diligence requires proper information.
The reasonable expectation is that the camera doesn't raise any artificial obstacles to you getting at your data.
This is what you think you're getting at purchase, yet it's not so.
Hence it should be labelled, and any adverts saying it supports RAW should add that the white balance info is only available to some software makers.
It's not an open source vs closed thing.
It's a 'Company adds a little GOTCHA in their product and doesn't disclose it on the box' thing.
It's an artificial barrier raised between you and your pictures for the purpose of extracting more money from you.
" So return the damn camera!"
Great idea, will Nikon give me my RAW photos that I've already taken, when I return it, or will I be forced to get them in degraded form?
You take a photograph. You think it's yours: taken with a camera you bought, of a subject you chose, with all permissions sorted.
However, you then find there's an extra little catch.
You can only access your picture with software that your camera maker has decided to approve.
You didn't agree to any of this. It didn't warn you on the box, nobody told you that the pictures are only yours subject to some extra pre-conditions, and you had a reasonable expectation that the camera would not raise artificial obstacles to you getting at your picture.
And this situation is somehow supposed to be acceptable?
Also No,
because it does nothing to stop the patent play companies who never make a product to be marked.
It does nothing to help disclose prior art outside of patented products, since only patented things need to be marked.
It does nothing for tarball products. Imagine receiving a Windows XP with a readme listing 70000 patent numbers.
There is no real penalty for overspecifying; it's just bytes in a file. So companies will simply claim their software utilises all their patents. Without the code, who can prove it doesn't? That renders the disclosure worthless.
The paper mentions the IBM progress bar patent from 1990: Patent on progress bar
Here's a screenshot from the Apple IIGS (actually it's running on a GUS emulator because it's way too old): AppleIIgs screen shot
Notice the progress bar it displayed as it was starting up. That's from 1983?
That's a European patent.
It's in the nature of software that you can release the product without explaining how the internal algorithms work.
So there is no paradox for software, he can both show it and not reveal how the black box works inside.
2 cent opinion.
"Kenneth Arrow's information paradox, which describes the problem faced by an inventor selling an idea. Anybody contemplating the purchase of this idea will, naturally, want to know what it is. But if the inventor reveals his idea, he no longer would have anything to sell."
He could implement it and show its advantages and sell it on its advantages.
Recall Fox Software's 'Rushmore' database technology. They showed the benefit without revealing the technique.
Of course he, the inventor, must be able to implement it, or how else could anyone else?! Also he must be able to show advantages or it has no worth.
Patent use of concrete extruder to make schools, use of concrete extruder to make offices. Patent making window holes using concrete extruder. Patent concrete extruder with stone chipping attachment. Patent concrete extruder with sharp corner making attachment. Patent concrete extruder with hole prodder for making lighter walls.
Better still, wait a couple of years, patent making houses with a concrete extruder, attach it to your abandoned (really failed) earlier patent applications, wait ten years till nobody can remember who invented what, then sue, sue, sue!
Wikipedia search problem, maybe Google could donate one of their search boxes?
:(
Google search problem: looks like that page is new. The top result I get now is the title minus the description, so it looks like a fresh page just pulled but not yet in all the data centres.
They had an update in early Feb, and a fix in March; it's a bit rough right now.
"Dns cache poisoning"
Encarta:
Separate articles on Cache, DNS and Poison, none useful.
Wikipedia:
None found. Suggests searching Wikipedia with Google or Yahoo; Google suggests this:
http://en.wikipedia.org/wiki/Spoofing_attacks
Which has a link to this one:
http://en.wikipedia.org/wiki/DNS_cache_poisoning
Shows you how fresh Wikipedia is: it looks like the DNS cache poisoning page is too new to be indexed by either Google or Yahoo.
More to the point I can see why Microsoft wants to go the same way.
"offset-balast on a DC-motor with a simple motor speed control"
You mean an eccentric wheel! Why don't you use the proper word for this!
"eccentric, noun, A disk or wheel having its axis of revolution displaced from its center so that it is capable of imparting reciprocating motion."
Can I trade these?
If I make 8 movies and don't sue anyone for pirating them, can I trade that for killing someone?
Nobody important, I accept that Gates is worth at least 20 movies....
Within anti-trust rules, yes, they are free to screw over their customers.
The reason they are in court for this one is that their Windows monopoly clients favour their servers, and they seek to keep it that way by withholding the interoperability information.
You can't leverage a monopoly to gain another monopoly like that; it's a breach of anti-trust rules, and the withholding of the APIs is just the lever they're using.
Now, as the price for giving up that lever, they want their competitors to be $80/seat or 5% of revenue more expensive, and to exclude some competitors they haven't been able to compete against, those pesky open source servers.
It's worth pointing out that they can do this to Real, Sun etc. today, but tomorrow it could be any major corp that's been stupid enough to become over-dependent on MS.
Imagine you depended on .NET and .NET changed and became deprecated, but the new .NET would cost you your patents. Ouch!
Same goes with their DRM: can you imagine what will happen when you are dependent on MS to access your documents? How much will you pay then!?
Welcome to the real world indeed.
So if they're doing this for server protocols, what next, a future Windows API with a license?
i.e.:
1. You build your corporate application on Windows.
2. Your company becomes dependent on it.
3. Windows XP is discontinued. The new version has a 'new' 'enhanced' slightly different API.
4. You want the documentation.
5. Microsoft says, no problem, but it will cost you.
6. You're screwed: you take the hit of shifting your people over to a new platform, or you take the hit and give Microsoft what it wants.
There should be an extra tag
[noindex]....... [/noindex]
So that *parts* of a page can be excluded from indexing.
AFP could then put that in the story text it gives to other sites.
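A minimal Python sketch of how an indexer could honour such a tag. The [noindex] marker is only my proposal here, not an existing standard, and `indexable_text` is a made-up helper name.

```python
import re

# Hypothetical [noindex]...[/noindex] markers, as proposed above.
# Non-greedy so that several marked spans on one page are handled separately.
NOINDEX = re.compile(r"\[noindex\].*?\[/noindex\]", re.DOTALL | re.IGNORECASE)

def indexable_text(page_text: str) -> str:
    """Return the page text with every [noindex] span removed."""
    return NOINDEX.sub("", page_text)

page = "Site news. [noindex]Syndicated AFP story text[/noindex] Footer."
print(indexable_text(page))
```

The crawler would index only what `indexable_text` returns, so the syndicated story never shows up under the hosting site.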
Maybe it's just me, but if the 302 is to a different domain, do you have to assign it across?
I see lots of 302s used for country shifts, e.g. a French visitor is shifted from www.foo.com to fr.foo.com, but it's under the same domain foo.com.
For the ones shifted to other domains, does it matter if you ignore the 302 and take visitors directly to fr.foo.com?
Yahoo could let you put per-page targeted phrases on the page. So in the advert code you could put "Chocolate Confectionary;Wooden Clackers;Pink Pajamas" and if Yahoo hasn't analysed the page yet it serves up an ad for Chocolate Confectionary, Wooden Clackers or Pink Pajamas....
There are lots of sites that generate pages on the fly, but Google can't serve up an ad until it's parsed the page, so the first showing of that page (the most important) shows no adverts.
Same with general news sites: context analysis is terrible for general news, so it would be better to let the news site specify the keyphrases on a per-page basis.
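Something like this, as a Python sketch of the fallback. `pick_ad_phrase`, the `analysed` lookup and the semicolon-separated format are my own assumptions for illustration, not anything Yahoo or Google actually offer.

```python
import random

def pick_ad_phrase(page_url, analysed, publisher_phrases, rng=random):
    """Choose a keyphrase to target an ad at.

    analysed: dict of url -> phrases found by context analysis (hypothetical).
    publisher_phrases: semicolon-separated phrases from the advert code.
    Falls back to the publisher's phrases when the page is not analysed yet.
    """
    if analysed.get(page_url):
        return rng.choice(analysed[page_url])
    fallback = [p.strip() for p in publisher_phrases.split(";") if p.strip()]
    return rng.choice(fallback) if fallback else None

phrases = "Chocolate Confectionary;Wooden Clackers;Pink Pajamas"
print(pick_ad_phrase("/fresh-page", {}, phrases))
```

So the first showing of a freshly generated page still gets an advert, and once the page has been analysed the engine's own phrases take over.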
They're suggesting:
Dropping the fee for small businesses applying for patents. I don't think that helps. Patents are worthless protection; it's the *lawsuits* protecting the patented idea that cost the money, not the patent application. Without the lawsuits a patent offers no protection at all.
Patent office to focus on quality not quantity. The problem I have with this is: how is the patent office supposed to determine if software is new and novel? i.e. I think they're patenting rubbish simply because they don't know all the prior art available. It's all closed source and cannot be determined.
Microsoft are complaining about the patent situation in the USA *after* the vote in Europe. Before the vote they held shows for the Commission showing how innovative they are and for all the mentions in this story their lobbyists were there.
So I doubt they're angels here.
Yes he can have it both ways. He is fully aware that software patents are bad, and he intends to take advantage of the badness to further his aims.
This is the mark of an evil person. He understands he's doing damage and does it anyway because he can benefit from it.
I wonder if this 'MICROSOFT IS INNOVATIVE' story is timed to coincide with the patent vote in Europe tomorrow.
..predicts upcoming traffic conditions..
http://www.its.berkeley.edu/conferences/trb/00326.pdf
Let's see, MSN Desktop search....
http://desktop.google.com/
Teddy bear running Windows...
http://www.aibo-europe.com/
Navigating photo libraries....
http://www.flickr.com/ ?
TouchLight,
http://www.minorityreport.com/