Google TrustRank
Philipp Lenssen writes "Google registered a trademark for the word "TrustRank", as Search Engine Watch reveals. Is this a sign we can expect a follow-up to Google's PageRank? An earlier, possibly related paper on TrustRank is available; it proposes techniques to semi-automatically separate good pages from spam by the use of a small selection of reputable seed pages."
Yes, this is always a problem. How can you possibly know whether or not a site is spam just by looking at who's linked to it? A lot of great sites have very few external links to them and often they're from blogs and other sites that will likely be identified as spam prone.
This is a basic problem of filtering web-content. How do you avoid throwing out the baby with the bath water? I'm running into that problem in designing a custom filter to keep my son from inadvertently seeing pornography as he looks for his "r0mz," but that's peanuts compared to Google's dilemma.
The fact is, spam filtering is inherently censorship. This kind of interference will always have a negative impact on the marketplace of ideas that is the modern internet. On the other hand, as a side effect, removing blogs from search results (as this trust metric very likely will) may increase the usability of Google overall. I suspect there will be some people who are not as happy about that as I am.
-- Molly Lipton, Born Again Technologist.
You fail to see beyond your nerd reality. This is not a "bit of a hack". If you are assuming so because of the name, it is obvious you know nothing about marketing. Even if it has nothing to do with PageRank, Google will continue the xRank naming convention, as it is known and trusted. RTFP (paper) before you spout off that this is a "hack". It is a whole new methodology.
The fact is, we really don't have enough information yet to conclude whether this is a patch to PageRank or a secondary system running alongside PageRank. One can assume it to be the former, but the latter could work just as well with Google's new corporate concept.
Imagine going into your Gmail account settings, adding a string of a few websites you deem to be "superior" or of better quality, and then letting TrustRank grab the collection of all of these, note where the highest votes go, and use these as more "Trustworthy" search results. Or, using PageRank, it simply adds an option "Vote these sites higher because they are linked to the user defined site settings."
Both schemes make search engine spamming more controllable by Google (simply by terminating accounts linked to spammers), and could have an interesting effect. Can't wait to see what happens with TrustRank.
"Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
So, links from pages of bad reputation give your page bad reputation?
I can see this already....
This page contains very objectionable content.
If you are easily offended, don't enter.
Blah, blah, blah.
Blah, blah, blah.
Do you agree to these conditions?
Yes No
Anagram("United States of America") == "Dine out, taste a Mac, fries"
To see Google's TrustRank Trademark info on the USPTO site, click here, click "New User Form Search (Basic)", and search for "TrustRank".
TrustRank is basically the same as resetting PageRank.
What happens is that humans select some webpages which they trust. The idea is that these trustworthy webpages only link to good sites. So the trustworthy webpages are used as seeds into a regular webcrawler.
At first glance, this looks like a low pass filter to me, i.e. the same result could be achieved by cutting all PR 5 sites.
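For anyone who wants to poke at the seed idea, here is a minimal sketch. It assumes trust is spread as a PageRank-style iteration that restarts from the hand-picked seeds, which is how I read the paper; the function name `trust_scores`, the damping value, and the `.example` sites are all made up for illustration and are not from the paper or from Google.

```python
def trust_scores(links, seeds, damping=0.85, iterations=50):
    """Propagate trust from seed sites along outbound links.

    links: dict mapping each site to the list of sites it links to.
    seeds: set of hand-picked trusted sites (hypothetical here).
    """
    sites = set(links) | {t for targets in links.values() for t in targets}
    # All of the initial trust sits on the seeds, split evenly between them.
    seed_share = {s: (1.0 / len(seeds) if s in seeds else 0.0) for s in sites}
    trust = dict(seed_share)
    for _ in range(iterations):
        # Every round, part of the trust is handed back to the seeds only.
        nxt = {s: (1.0 - damping) * seed_share[s] for s in sites}
        for src in sites:
            targets = links.get(src, [])
            if not targets:
                continue  # a site with no outbound links passes nothing on
            share = damping * trust[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        trust = nxt
    return trust


# Toy graph: one trusted chain and one isolated pair of spam sites.
links = {
    "university.example": ["library.example"],
    "library.example": ["blog.example"],
    "spamfarm1.example": ["spamfarm2.example"],
    "spamfarm2.example": ["spamfarm1.example"],
}
scores = trust_scores(links, seeds={"university.example"})
for site in sorted(scores, key=scores.get, reverse=True):
    print(f"{site}: {scores[site]:.4f}")
```

In this toy graph the two spam sites only link to each other, so no trust ever reaches them, while trust fades the further a site is from the seed. That fading is the part that differs from simply cutting everything below some PageRank value.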
Couldn't they just look for links in gmail messages and use those as
weights in a trust system?
Links in messages identified as spam could be given a negative
weight. That weight could be determined by the number of people
identifying messages with that link as spam. Links from those sites
would be given less trust than a completely unknown page, unless they
are positively weighted themselves or linked to by a positively weighted
site. Links found in non spam messages could be given positive weights
by the same rules.
This would also have the advantage of offering spam filtering rules
based on trustrank weights. Setting a minimum trustrank would allow the
system to weight the email by checking the links in the email, and using
their trustrank for the message itself. The automated spam filtering
gmail offers could thus affect trustrank, increasing the impact of both
systems (email and searching) and possibly allowing it to be extended
to google groups/Usenet filtering.
Potential Examples
(moving each weight given by linking 1 point towards 0)
site1 [+5] - url found in 5 non spam messages
site2 [-5] - url found in 5 spam messages
site3 [+4] - url linked to from site1 (5 + -1)
site4 [-4] - url linked to from site2 (-5 + 1)
site5 [0] - url linked to from site1 and site2 (5 + -5)
site6 [3] - url linked to from site1, site3, and site2. (((5 + 4) + -5) + -1)
Email1 [-5] - contains links to site2, site4, and site6 (((-5 + -4) + 3) + 1)
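The arithmetic in these examples can be reproduced with a few lines of Python. This is only a sketch of how I read the sums above: the weights of the linking sites are added up, then the total is moved one point towards 0. The helper names `toward_zero` and `inherited_weight` are invented for illustration.

```python
def toward_zero(value, step=1):
    """Move value `step` points towards 0 without crossing it."""
    if value > 0:
        return max(value - step, 0)
    if value < 0:
        return min(value + step, 0)
    return 0


def inherited_weight(source_weights):
    """Weight gained through links: sum the sources, then decay once."""
    return toward_zero(sum(source_weights))


# Starting weights come straight from message counts (5 non-spam, 5 spam).
weights = {"site1": 5, "site2": -5}
weights["site3"] = inherited_weight([weights["site1"]])
weights["site4"] = inherited_weight([weights["site2"]])
weights["site5"] = inherited_weight([weights["site1"], weights["site2"]])
weights["site6"] = inherited_weight(
    [weights["site1"], weights["site3"], weights["site2"]]
)
# An email is scored the same way, from the sites it links to.
email1 = inherited_weight([weights["site2"], weights["site4"], weights["site6"]])

print(weights)
print("Email1:", email1)
```

Decaying each incoming weight separately before summing happens to give the same numbers for these particular examples, but the two readings would diverge on other inputs.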
Not perfect perhaps, but workable and easy to combine with a simple
rule set for weighting parts of a url to create an 'intelligent' system
guided by user preferences.
- Christine
I think TrustRank would be more useful in Gmail to give a reading on how "spammy" an email is. They already have something like it, where a box shows up warning you that the sender may have spoofed their address.
How is this different from applying a weighting to PageRank?
It attempts to detect clusters of pages which have few inbound links, while also propagating "trust" scores to all other sites by using their linking structure. For sites that have many inbound links (high scoring in PageRank), the authors claim this modification tends to classify spam and reputable sites differently.
Will the owners of the pages / sites deemed to fall within the set of trusted seed sites get any money for all their hard work (i.e. hand-maintaining pages of links)?
No.
However, they will get better search engine visibility, which is quite valuable.
What if such an owner decides to link to a page of commercial or spam links - will they get any money from the owner of the linked site?
The paper suggests using only highly reputable organizations with long-term stability for the seed pages. Government organizations, universities, very well known companies.
The analysis in the paper is based on a per-site graph, not per-page, by the way. They lacked the resources to try these computations on such a large data set.
Is this a possible method of abuse?
Presumably, the small set of seed pages/sites will need to be monitored by staff employed by the search engine company. If one of the trusted seed sites "went bad", they would need to be removed from the list.
Will that cool poster of links between websites now become 3D to give trusted links more prominence?
Probably not.
PJRC: Electronic Projects, 8051 Microcontroller Tools
It should be noted that the Slashdot user No More Free Stuff catalogs such links, and by adding this user as a friend and assigning a negative bonus to foes of friends, you can lower the moderated value of any such posts.
The truth about Scientology, Xenu, and you: Operation Clambake
This sounds very similar to the trust system used by Vipul's Razor and Cloudmark software. I have used the Spamnet product since 2001 and run Vipul's Razor on my mailserver. It is the most accurate filter that I've found (and believe me, I've tested them all). Kudos to Google!