Google TrustRank
Philipp Lenssen writes "Google registered a trademark for the word "TrustRank", as Search Engine Watch reveals. Is this a sign we can expect a follow-up to Google's PageRank? An earlier, possibly related paper on TrustRank is available; it proposes techniques to semi-automatically separate good pages from spam by the use of a small selection of reputable seed pages."
Are these going to be used in conjunction? It would be very nice to be able to sort out those pages that have nothing but a long list of keywords on them. It's probably all in vain, as someone will sooner or later find a way to get around this as well.
Portland, North Dakota Puppies
This is a step in the right direction conceptually, but giving a smaller number of "seed sites" more rank influence increases the potential fallout from any rank cheats that may be found in the future (see Google Bomb and the Google 302 exploit).
Google may be better off as they are, currently leaving all sites initially equal in influence before the PageRank calculation.
Then again, Google has a great track record for testing their ideas before committing them to general service...
It's not censorship. Google couldn't censor even if they wanted to. Rather than explaining to you what censorship means, let me just tell you that Google is simply doing their job better. I don't want to find spam when searching for anything, and neither does anyone else. Ergo, eliminating spam from the search results makes everyone (except spammers) happier.
Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
Yes, this is always a problem. How can you possibly know whether or not a site is spam just by looking at who's linked to it? A lot of great sites have very few external links to them and often they're from blogs and other sites that will likely be identified as spam prone.
This is a basic problem of filtering web-content. How do you avoid throwing out the baby with the bath water? I'm running into that problem in designing a custom filter to keep my son from inadvertently seeing pornography as he looks for his "r0mz," but that's peanuts compared to Google's dilemma.
The fact is, spam filtering is inherently censorship. This kind of interference will always have a negative impact on the marketplace of ideas that is the modern internet. On the other hand, as a side effect, removing blogs from search results (as this trust metric very likely will) may increase the usability of Google overall. I suspect there will be some people who are not as happy about that as I am.
-- Molly Lipton, Born Again Technologist.
I've got a TR7 site with four links available...
That's why us open source programmers always throw out and completely rewrite our programs from version 2.6 to version 2.8
Two points: 1) Any new system Google implements will run alongside PageRank; they've invested too much to completely switch all of Google over to TrustRank. The system might even augment current PageRank by running an algorithm over the data that PageRank returns. We can only speculate as of now. But I can assure you that one will not replace the other, and there will probably be a way to use both systems in the future if you like. Hell, using your Gmail account, you may even be able to specifically tune PageRank, making pages that are more relevant to you appear higher in search results.
2) You have the option of not using Google. Yahoo is a completely independent search engine now.
"Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
You know, new solutions are most often patched old ones.
:)
Should Google just throw away their many years of research, and start from scratch?
I find this trust-based approach interesting, but I wonder how it's gonna work for smaller sites (which the few trusted seeds will never link to). Then again, I guess the smaller sites don't really have a problem as it is, because only specific search terms are targeted.
There's also the problem of allowing new websites into the game, but I guess that's for the Google developers to figure out.
My <1000 UID is with a hot chick
How is this different from applying a weighting to PageRank?
Will the owners of the pages / sites deemed to fall within the set of trusted seed sites get any money for all their hard work (i.e. hand-maintaining pages of links)?
What if such an owner decides to link to a page of commercial or spam links - will they get any money from the owner of the linked site? Is this a possible method of abuse?
Will that cool poster of links between websites now become 3D to give trusted links more prominence?
You fail to understand that Google is incapable of actually censoring anything. Them not displaying a webpage in their results does not, indeed, remove it from the web.
Google's primary responsibility now is to its shareholders, which means increasing the chance that you and I find exactly what we are trying to look for, and not to unabashedly display every peddler that serves up content over HTTP.
The fact is, we really don't have enough information as of yet to conclude whether this is a patch to PageRank or a secondary system running alongside PageRank. One can assume it to be the former, but the latter could work just as well with Google's new corporate concept.
Imagine going into your Gmail account settings, adding a string of a few websites you deem to be "superior" or of better quality, and then letting TrustRank grab the collection of all of these, note where the highest votes go, and use these as more "Trustworthy" search results. Or, using PageRank, it simply adds an option: "Vote these sites higher because they are linked to the user-defined site settings."
Both schemes make search engine spamming more controllable by Google (simply by terminating accounts linked to spammers), and could have an interesting effect. Can't wait to see what happens with TrustRank.
So, links from pages of bad reputation give your page bad reputation?
I can see this already....
This page contains very objectionable content.
If you are easily offended, don't enter.
Blah, blah, blah.
Blah, blah, blah.
Do you agree to these conditions?
Yes No
Anagram("United States of America") == "Dine out, taste a Mac, fries"
This sounds very similar to Advogato's trust metric, which uses a "seed" of trusted accounts to filter out trolls/spammers. The difference might be that it should be even easier to implement in the case of web pages, because they already have links to each other, avoiding the reliance on users to manually "certify" other user accounts in order to build the graph.
-- If no truths are spoken then no lies can hide --
It's not necessarily censorship. They could just present the "trustworthy" pages first. You could always skip to the later pages if you wanted, just like you can browse /. at -1 if you want.
And yes, this means that the system could be abused, just like PageRank and /. moderation. Anyone want to do away with those?
To see Google's TrustRank Trademark info on the USPTO site, click here, click "New User Form Search (Basic)", and search for "TrustRank".
Google, as we all know, is a reputable service provider; they get the job done efficiently and innovatively. Now they are continuing their attack on the ills of the Internet, which was started by Gmail spam filtering. By developing this tool, Google is helping to clean the Internet up and enable it to become the massive source of pure information it has such potential to be. The "negative" sites on the Internet, such as keyword sites with no real content which invade search results, are a bane to the community, and by helping get rid of them, Google is yet again doing us all a favour. Google, I salute you.
Given that one of the authors of the referenced paper is an employee of Yahoo, I have to wonder if whatever Google has in mind has anything whatsoever to do with the trustrank scheme we're talking about here. I mean, all we know is they trademarked the word, nothing more.
"Google couldn't censor even if they wanted to."
Since when? Google is a privately owned corporation. They've got stockholders to answer to now, but it still stands that they can do what they want with what they own. They're not obligated to give you unfiltered results on their free, privately owned service.
TrustRank is basically the same as resetting PageRank.
What happens is that humans select some webpages which they trust. The idea is that these trustworthy webpages only link to good sites. So the trustworthy webpages are used as seeds for a regular web crawler.
At first glance, this looks like a low-pass filter to me. I.e., the same result could be achieved by cutting all PR 5 sites.
The funny thing is that one of the authors of the TrustRank paper is from Yahoo.
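The seed idea described above can be sketched as a PageRank-style iteration in which the "reset" share goes only to the hand-picked seed pages instead of to every page equally. This is a minimal sketch, not Google's method: the function name `trust_scores`, the damping value, and the example domains are all assumptions made for illustration.

```python
def trust_scores(links, seeds, damping=0.85, iterations=50):
    """Propagate trust from hand-picked seed pages along outgoing links.

    links: dict mapping each page to the list of pages it links to.
    seeds: set of pages a human reviewer marked as trustworthy.
    (Hypothetical helper; names and defaults are illustrative only.)
    """
    pages = set(links) | {t for outs in links.values() for t in outs}
    known_seeds = [p for p in pages if p in seeds]
    if not known_seeds:
        raise ValueError("at least one seed must appear in the link graph")

    # Only seed pages receive the reset share; plain PageRank would give
    # every page an equal share here.
    seed_share = {
        p: (1.0 / len(known_seeds) if p in seeds else 0.0) for p in pages
    }
    trust = dict(seed_share)

    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * seed_share[p] for p in pages}
        for page, outs in links.items():
            if not outs:
                continue  # pages without outgoing links pass nothing on
            share = damping * trust[page] / len(outs)
            for target in outs:
                nxt[target] += share
        trust = nxt
    return trust


# Made-up link graph: one trusted directory, and two spam pages that only
# link to each other.
web = {
    "directory.example": ["news.example"],
    "news.example": ["blog.example"],
    "spam-a.example": ["spam-b.example"],
    "spam-b.example": ["spam-a.example"],
}
scores = trust_scores(web, seeds={"directory.example"})
```

In this toy graph the spam pages end up with zero trust because no path from a seed reaches them, while trust fades with each hop away from the seed.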
Underholdning.info
"You fail to understand that google is incapable of actually censoring anything."
Yes, they can. Their search results.
* When I say sleezeballs and tweeking, I mean the people who will try outrageous stunts to game the system, rather than the consultants who will help you increase rank by the stunning tactic of actually improving your site. Radical, but sometimes it works.
One line blog. I hear that they're called Twitters now.
It's censorship in the same way that excluding undesirable content from television or radio is censorship.
I don't want to find spam when searching for anything, and neither does anyone else. Ergo, eliminating spam from the search results makes everyone (except spammers) happier.
an anon has replied, "what is spam?" and i pose the same question. "spam" or unwanted content is far too complex an issue to be derived by a script. i could have a mood swing (or multiple personality disorder, or any number of other examples) and decide i want viagra one day. how will google's tech know what is and isn't crap? granted, there are bound to be select sites that actually do ship viagra, but there are countless millions more of simple shit. it would be a very complex task to weed out these unwanted pages, especially the 'small business' types (upstarts) with no PageRank preference.
this sig no verb
I read another post speculating that gmail users could be used as voters to choose trusted sites. Something that would probably actually work would be tagging domains that are received by a certain percentage of the gmail population and NOT marked as junk, and then giving them weight according to their percentage.
Because we gmailers are picky.
It would probably have to be integrated with something else, because I bet there are a few pr0n mailing lists that lots of people have.
Please stop stalking me, bro.
Considering how much market share Google has, them not displaying a web page in their results (or dropping it a few hundred places) effectively removes it from the web.
Google's primary responsibility now is to its shareholders. Google makes money from advertising. If Google can encourage you to patronize its advertisers instead of trusting its index for everything (which right now is pretty easily gamed), then Google makes more ad revenue and shareholders are happy.
For more information, click here.
No, probably the other way round.
If you're linked by a trusted page, then your rank goes up, but there's no negative for being linked by untrusted pages - your pagerank stays the same.
The google-watch page on PageRank already mentions how pagerank, over the years, has switched from an actual score of popularity (number of links to a page), to a trustrank-like index, based on the reputability of the links to a page. This makes it much harder for the newbie to get a good pagerank, and empowers way too much the owners of old web sites and corporate pages.
Even though it contains way too much rant for my taste, Google Watch is worth a full read by all /.ers.
but you can always go to another search engine or go to the page directly.. so if they wanted/tried to censor, they'd only lose users.
so when google decides what's trusted for us, what is good content and what isn't, are they still not being "evil"?
Are you fucking kidding me? This is just another mechanism for deciding whether particular pages should be shown for queries. Show me a search engine that doesn't do that.
If you use a search engine, then by definition you are trusting them to show you relevant results. If you don't want to trust Google, then use another search engine. If you don't want to trust another search engine, then stop using them entirely.
Your basic complaint is against the very nature of a search engine. The hysteria surrounding Google now that they have gone public has blinded people to common sense. You don't have to scrutinise every little thing Google does to see if it's "evil". You just have to use some sense.
This is weird. It is the 19th hit (on the second page) of a Google search for "trustrank". It requires a login from Google's results page, but Google's cache reveals a directory including the paper linked to by /.
I guess we weren't supposed to read this. And you shouldn't have read *this*!
"so when google decides what's trusted for us, what is good content and what isn't, are they still not being "evil"?"
Yes.
Why is it that everyone is constantly striving to find Google's evil? Ranking the relevancy of pages to a search is Google's job. By ranking spam as relevant to my search they have failed. Using the concept of a web of trust to establish relevancy is a fairly obvious solution and has well established analogs in other fields (e.g. PKI).
If you're looking for evil, try GE, GM, or Unilever. Google doesn't even begin to rank on the evil-o-meter.
It would be amazing if Google gave us the ability to assign trust values to sites that we ourselves trust. This way, for example, I might give Wikipedia or the BBC a 10/10 trust rating for all their off-site links (and set it so that links off the linked sites are at 50% of their parent trust rating etc.). If we could also subscribe to someone else's trust ratings then technically illiterate people could hand over the responsibility of managing their trust database to someone else. From first thoughts, this looks like it could solve the problem of malicious SEO.
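A rough sketch of how such per-user trust ratings might propagate, assuming the 50%-per-hop rule from the comment above. The function name `personal_trust`, the hop limit, and the example domains are hypothetical; nothing here reflects an actual Google feature.

```python
from collections import deque


def personal_trust(ratings, links, decay=0.5, max_hops=3):
    """Spread a user's own site ratings to linked sites.

    ratings: dict of site -> rating the user assigned by hand (e.g. 10).
    links:   dict of site -> list of sites it links to.
    Each hop passes on `decay` times the parent's trust; a site keeps the
    best value that reaches it. (Hypothetical helper for illustration.)
    """
    trust = dict(ratings)
    queue = deque((site, 0) for site in ratings)
    while queue:
        site, hops = queue.popleft()
        if hops >= max_hops:
            continue
        passed_on = trust[site] * decay
        for target in links.get(site, []):
            if passed_on > trust.get(target, 0.0):
                trust[target] = passed_on
                queue.append((target, hops + 1))
    return trust


my_ratings = {"wikipedia.org": 10.0, "bbc.co.uk": 10.0}
my_links = {
    "wikipedia.org": ["a.example"],
    "a.example": ["b.example"],
}
my_trust = personal_trust(my_ratings, my_links)
```

Subscribing to someone else's ratings would then amount to passing their `ratings` dict instead of your own.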
"Freedom of Speech"
First of all, the only protection that is guaranteed you here is that the gov't will make no law abridging the freedom of speech.
Google, as influential as they might be, are not the government (insert 'Do No Evil' joke here). Therefore, they are not bound to this "Freedom of Speech" argument.
Secondly, "Freedom of Speech" is not the universal, higher-being-ordained, preserve-at-all-costs idea that we have transformed it into.
Freedom of speech does not give you the right to spray-paint your slogan all over my front door, nor, in this case, does it give you a 'right' to be listed on Google. Nor do you have a 'right' to have your name printed on the front page of your local paper in 36pt font.
Not being listed in Google does not amount to censorship in any definition of the word. The net existed before google, and people still managed to find web-sites. Google gives (through PageRank or whatever mechanism they choose) free advertisement to 'good' sites. They have every right to only display sites that pay money, if they so desired. You have absolutely zero (0) 'rights' to be listed for free on Google.
Trotting out the Freedom of SPEECH argument is nothing more than whining about Big Brother coming to get you because what you have to say isn't worth hearing. Guess what? If you want to be heard, say something that's worth listening to. All that glitters is not gold, and much that is said (or printed) is worthless drivel. Much like this post.
Censorship would be making the content unavailable. They're simply bringing more relevant content to the top of the search, which is what a search engine is supposed to do in the first place. If you want what's considered to be spam, hit next a few times.
Couldn't they just look for links in gmail messages and use those as
weights in a trust system?
Links in messages identified as spam could be given a negative
weight. That weight could be determined by the number of people
identifying messages with that link as spam. Links from those sites
would be given less trust than a completely unknown page, unless they
are positively weighted themselves or linked to by a positively weighted
site. Links found in non spam messages could be given positive weights
by the same rules.
This would also have the advantage of offering spam filtering rules
based on trustrank weights. Setting a minimum trustrank would allow the
system to weight the email by checking the links in the email, and using
their trustrank for the message itself. The automated spam filtering
gmail offers could thus affect trustrank, increasing the impact of both
systems (email and searching) and possibly allowing it to be extended
to google groups/Usenet filtering.
Potential Examples
(moving each weight given by linking 1 point towards 0)
site1 [+5] - url found in 5 non spam messages
site2 [-5] - url found in 5 spam messages
site3 [+4] - url linked to from site1 (5 + -1)
site4 [-4] - url linked to from site2 (-5 + 1)
site5 [0] - url linked to from site1 and site2 (5 + -5)
site6 [3] - url linked to from site1, site3, and site2. (((5 + 4) + -5) + -1)
Email1 [-5] - contains links to site2, site4, and site6 (((-5 + -4) + 3) + 1)
Not perfect perhaps, but workable and easy to combine with a simple
rule set for weighting parts of a url to create an 'intelligent' system
guided by user preferences.
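
The arithmetic in the examples above appears to follow one rule: add up the weights of everything linking to (or contained in) an item, then move the total one point towards 0. A small sketch under that reading; the function names `toward_zero` and `derived_weight` are made up for illustration.

```python
def toward_zero(weight, step=1):
    """Move a weight `step` points towards 0 without crossing it."""
    if weight > 0:
        return max(weight - step, 0)
    if weight < 0:
        return min(weight + step, 0)
    return 0


def derived_weight(source_weights):
    """Weight of a page (or email) derived from the items linking to it."""
    return toward_zero(sum(source_weights))


site1 = 5    # url found in 5 non-spam messages
site2 = -5   # url found in 5 spam messages
site3 = derived_weight([site1])                  # linked from site1
site4 = derived_weight([site2])                  # linked from site2
site5 = derived_weight([site1, site2])           # linked from site1 and site2
site6 = derived_weight([site1, site3, site2])    # linked from site1, site3, site2
email1 = derived_weight([site2, site4, site6])   # contains links to site2, site4, site6
```

With these inputs the sketch reproduces the values listed in the examples.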
- Christine
I've read it but it sounds mixed up. Isn't the ideal result from a search engine:
Matches - spam - offtopic, sorted by relevance
not
Matches sorted by f(pagerank,trustrank)
Google used pagerank + on-page text as a measure of how relevant a page is, but that's not reliable anymore because the set contains spam pages.
The 'trusted' value tells you nothing about relevance; it only gives the likelihood of the page being spam or not spam. If it's spam you want it removed; if it's not spam, then its page rank determines its relevance, not some function of pagerank and trustrank.
I.e., they should not promote or demote pages based on trust rank; they should simply define a cutoff value K. If the trust is less than K, then it's likely spam and should be removed.
Since spam follows money terms, they should have K(keyphrase), so they can change the value of K on each keyphrase to remove the spam. Otherwise they will filter non-money terms where no spam exists, and their algo can only do harm!
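The K(keyphrase) idea can be sketched as a filter applied before ranking: trust only decides whether a result is dropped, and relevance alone decides the order. The cutoff table, scores, and field names below are invented for illustration.

```python
# Hypothetical per-keyphrase cutoffs: strict for "money terms", and no
# filtering at all (cutoff 0.0) for phrases where spam is not a problem.
DEFAULT_CUTOFF = 0.0
CUTOFFS = {"cheap viagra": 0.6}


def filter_then_rank(keyphrase, results):
    """Drop results whose trust is below K(keyphrase), then sort the rest
    by relevance only. Trust never changes the order of surviving results."""
    k = CUTOFFS.get(keyphrase, DEFAULT_CUTOFF)
    kept = [r for r in results if r["trust"] >= k]
    return sorted(kept, key=lambda r: r["relevance"], reverse=True)


candidates = [
    {"url": "pharmacy.example", "relevance": 0.7, "trust": 0.9},
    {"url": "keyword-farm.example", "relevance": 0.9, "trust": 0.1},
    {"url": "forum.example", "relevance": 0.5, "trust": 0.6},
]
money_term = [r["url"] for r in filter_then_rank("cheap viagra", candidates)]
other_term = [r["url"] for r in filter_then_rank("trustrank paper", candidates)]
```

For the money term the low-trust page is removed; for the other phrase nothing is filtered and the list is ordered purely by relevance.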
I think TrustRank would be more useful in Gmail to give a reading on how "spammy" an email is. They already have something like it, where a box shows up warning you that the sender may have spoofed their address.
... would be nice if you could use adblock style filtering on Google search results, then if you wanted to get rid of certain results, (i.e. from blog or "sales" sites), you could block their domains.
Probably wouldn't be that difficult to get around it, but it might help a bit.
Suppose you had the perfect Oracle that could check every search result and clean it of spam.
Ranking by on-page text, links, etc. (the items that make a page relevant or not) gives you:
A. 1st most relevant.
B. 2nd most relevant
C. Spam
D. 3rd most relevant
E. 4th most relevant.
F. Spam
After your Oracle has hand checked every site you get:
A. 1st most relevant.
B. 2nd most relevant
C. 3rd most relevant
D. 4th most relevant.
Not:
A. 10th most relevant
B. 2nd most relevant
C. 8th most relevant
D. 5th most relevant
Ranking by trust as well as relevance gives you a clean but not very relevant result set.
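The contrast can be shown with a toy result set. The scores are made up, and multiplying relevance by trust is just one assumed way of "ranking by trust as well"; the point is only that removing spam keeps the relevance order, while blending reshuffles it.

```python
# Toy results, already in relevance order. "is_spam" stands in for the
# perfect Oracle; the trust numbers are invented.
results = [
    {"name": "A", "relevance": 0.9, "trust": 0.3, "is_spam": False},
    {"name": "B", "relevance": 0.8, "trust": 0.9, "is_spam": False},
    {"name": "C", "relevance": 0.7, "trust": 0.0, "is_spam": True},
    {"name": "D", "relevance": 0.6, "trust": 0.8, "is_spam": False},
]

# Oracle approach: remove spam, leave the relevance order untouched.
oracle_order = [r["name"] for r in results if not r["is_spam"]]

# Blended approach (one assumed f): sort everything by relevance * trust.
blended_order = [
    r["name"]
    for r in sorted(results, key=lambda r: r["relevance"] * r["trust"], reverse=True)
]
```

Here the Oracle keeps the most relevant page first, while the blended ranking pushes it below less relevant but more trusted pages.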
I've got an idea: Anytime you see an informative and/or insightful post whose contents you would like to see modded up, but which has a spam-o-licious free [product] link in the sig, just copy the informative content into a new Anonymous Coward post, which the mods can then moderate higher, while the spammy parent can be modded down into oblivion.
I'm sure, eventually, they'd learn ;)
This reminds me of the PageRank problem where all the porn sites link to Disney.com and if you searched for "sex" it would rank tops... Think of the implications...
Please stop censoring me.
English is easier said than done.