Google TrustRank
Philipp Lenssen writes "Google registered a trademark for the word "TrustRank", as Search Engine Watch reveals. Is this a sign we can expect a follow-up to Google's PageRank? An earlier, possibly related paper on TrustRank is available; it proposes techniques to semi-automatically separate good pages from spam by the use of a small selection of reputable seed pages."
with the newly proposed AdSense plans?
Have you metaroderated recently?
So when Google decides what's trusted for us, what is good content and what isn't, are they still not being "evil"? Additionally, how are the pages separated? On what criteria? By man or machine (potential for flaws on either side)?
this sig no verb
Are these going to be used in conjunction? It would be very nice to be able to sort out those pages that have nothing but a long list of keywords on them. It's probably all in vain, as someone will sooner or later find a way to get around this as well.
Portland, North Dakota Puppies
This sounds like a bit of a hack to me. PageRank worked once (and was great), but now that spam is a real problem, all Google can do is try to modify their original tech?
We need new solutions, not patches on old ones.
This is a step in the right direction conceptually, but giving a smaller number of "seed sites" more rank influence increases the potential fallout from any rank cheats that may be found in the future (see Google Bomb and Google 302 exploit).
Google may be better off as they are currently, leaving all sites initially equal in influence before the PageRank calculation.
Then again, Google has a great track record for testing their ideas before committing them to general service...
I've got a TR7 site with four links available...
How is this different from applying a weighting to PageRank?
Will the owners of the pages / sites deemed to fall within the set of trusted seed sites get any money for all their hard work (i.e. hand-maintaining pages of links)?
What if such an owner decides to link to a page of commercial or spam links - will they get any money from the owner of the linked site? Is this a possible method of abuse?
Will that cool poster of links between websites now become 3D to give trusted links more prominence?
So, links from pages of bad reputation give your page bad reputation?
I can see this already....
This page contains very objectionable content.
If you are easily offended, don't enter.
Blah, blah, blah.
Blah, blah, blah.
Do you agree to these conditions?
Yes No
Anagram("United States of America") == "Dine out, taste a Mac, fries"
This sounds very similar to Advogato's trust metric, which uses a "seed" of trusted accounts to filter out trolls/spammers. The difference might be that it should be even easier to implement in the case of web pages, because they already have links to each other, avoiding the reliance on users to manually "certify" other user accounts in order to build the graph.
-- If no truths are spoken then no lies can hide --
Sounds to me like they will be using a similar approach to how most Spam filters work. Given a sample of "good" pages, and "bad" pages, they can classify new pages (and assign a rank accordingly).
To see Google's TrustRank Trademark info on the USPTO site, click here , click "New User Form Search (Basic)", and search for "TrustRank".
Google, as we all know, is a reputable service provider; they get the job done efficiently and innovatively. Now they are continuing their attack on the ails of the internet which was started by Gmail spam filtering. By developing this tool, Google is helping to clean the Internet up and enable it to become the massive source of pure information it has such potential to be. The "negative" sites on the Internet, such as keyword sites with no real content which invade search results, and the like are a bane to the community and by helping get rid of them, Google is yet again doing us all a favour. Google, I salute you.
your loyal minions know not what they do... Okay yeah. I think it's a smart idea. I hope it works. But there IS room for human error. (just as always)
I always find it annoying to find irrelevant pages. If this works I'm happy, else I won't be mad at my Lord. Just a little disappointed.
You have been warned.
Given that one of the authors of the referenced paper is an employee of Yahoo, I have to wonder if whatever Google has in mind has anything whatsoever to do with the trustrank scheme we're talking about here. I mean, all we know is they trademarked the word, nothing more.
Trustrank is basically the same as resetting pagerank.
What happens is that humans select some webpages which they trust. The idea is that these trustworthy webpages only link to good sites. So the trustworthy webpages are used as the seed for a regular webcrawler.
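For anyone who wants to poke at the seed idea, here is a minimal sketch in Python. It assumes trust spreads the way a PageRank score does, except that the "random jump" always lands on a hand-picked seed page. The function name trust_rank, the damping value of 0.85, and the toy link graph are all made up for illustration; nothing here is known to be what Google actually runs.

```python
def trust_rank(links, seeds, damping=0.85, iterations=50):
    """Propagate trust from seed pages along outgoing links.

    links: dict mapping each page to the list of pages it links to.
    seeds: set of hand-picked trusted pages (an assumption of this sketch).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    # Trust is injected only at the seed pages, split evenly among them.
    seed_share = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_share)
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * seed_share[p] for p in pages}
        for page, targets in links.items():
            if not targets:
                continue  # pages without outgoing links pass nothing on
            share = damping * trust[page] / len(targets)
            for target in targets:
                nxt[target] += share
        trust = nxt
    return trust


# Toy graph: one seed, two pages reachable from it, and a spam pair
# that only link to each other.
links = {
    "seed": ["good1", "good2"],
    "good1": ["good2"],
    "good2": [],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}
scores = trust_rank(links, seeds={"seed"})
```

In this toy graph the two spam pages never receive any trust, because no page reachable from the seed links to them. Being linked only by untrusted pages leaves a page at zero rather than pushing it negative.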
At first glance, this looks like a low pass filter to me. I.e. the same result could be achieved by cutting all PR 5 sites.
bah, if they could include a green-orange-red light in their toolbar... Go to your bank's website to "verify" your password and a little red light starts flashing in your toolbar? Could be good.
10 ?"Hello World" life was simple then
The funny thing is that one of the authors of the TrustRank paper is from Yahoo.
Underholdning.info
I think I'll try being less agreeable from now on.
Please stop stalking me, bro.
* When I say sleezeballs and tweeking, I mean the people who will try outrageous stunts to game the system, rather than the consultants who will help you increase rank by the stunning tactic of actually improving your site. Radical, but sometimes it works.
One line blog. I hear that they're called Twitters now.
I read another post speculating that gmail users could be used as voters to choose trusted sites. Something that would probably actually work would be tagging domains that are received by a certain percentage of the gmail population and NOT marked as junk, and then giving them weight according to their percentage.
Because we gmailers are picky.
It would probably have to be integrated with something else, because I bet there are a few pr0n mailing lists that lots of people have.
Please stop stalking me, bro.
No, probably the other way round.
If you're linked by a trusted page, then your rank goes up, but there's no negative for being linked by untrusted pages - your pagerank stays the same.
The google-watch page on PageRank already mentions how pagerank, over the years, has switched from an actual score of popularity (number of links to a page), to a trustrank-like index, based on the reputability of the links to a page. This makes it much harder for the newbie to get a good pagerank, and empowers way too much the owners of old web sites and corporate pages.
Even though it contains way too much rant for my taste, google watch is worth a full read by all /.ers.
Are we now reporting Google news from the future?
I read a very interesting article on the possible outcomes of a semantic web, and a google "trust rank" actually appeared in it.
If "Google trusts fooPage" becomes a standard, recognised triplet, I see no reason why this won't be extended to "Google trusts userX", which becomes "ebay trusts userX" etc.
It's very possible they're looking to the future, and have more in mind than "there's probably no pr0n on this page"...
If I search for "stoned whores" what sites should be considered trusted?
This is weird. It is the 19th hit (on the second page) of a Google search for "trustrank". It requires a login from Google's results page, but Google's cache reveals a directory including the paper linked to by /.
I guess we weren't supposed to read this. And you shouldn't have read *this*!
It would be amazing if Google gave us the ability to assign trust values to sites that we ourselves trust. This way, for example, I might give Wikipedia or the BBC a 10/10 trust rating for all their off-site links (and set it so that links off the linked sites are at 50% of their parent trust rating etc.). If we could also subscribe to someone else's trust ratings then technically illiterate people could hand over the responsibility of managing their trust database to someone else. From first thoughts, this looks like it could solve the problem of malicious SEO.
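A rough Python sketch of how such personal trust ratings might propagate. It assumes sites linked directly from a site I rated inherit the full rating, and each further hop gets 50% of its parent's value, as suggested above. The function name personal_trust, the max_hops limit, and the example domains are hypothetical.

```python
from collections import deque


def personal_trust(links, my_ratings, decay=0.5, max_hops=3):
    """Spread my own ratings outwards along links.

    links: dict mapping a site to the sites it links to.
    my_ratings: dict of sites I rated myself, e.g. {"wikipedia.org": 10.0}.
    """
    trust = {}
    # Each queue entry: (site, value its links inherit, hop number).
    queue = deque((site, rating, 1) for site, rating in my_ratings.items())
    while queue:
        site, value, hop = queue.popleft()
        if hop > max_hops:
            continue
        for target in links.get(site, []):
            # Keep the best value seen so far for each linked site.
            if value > trust.get(target, 0.0):
                trust[target] = value
                queue.append((target, value * decay, hop + 1))
    return trust


links = {
    "wikipedia.org": ["a.example"],
    "a.example": ["b.example"],
    "b.example": ["c.example"],
}
ratings = personal_trust(links, {"wikipedia.org": 10.0})
```

Subscribing to someone else's trust database would then just mean passing their my_ratings dict instead of your own.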
Who cares if they change staff, it's not like it's going to impact on world peace/trade or anything else as a matter of fact.
Might impact the development of one of the most critical tools on the Internet, but you're right, that'd just be news for nerds.
Oh, wait...
I really don't understand how these two concepts became conflated in the minds of so many people. "Censorship" is the act of preventing some unpleasant words, sounds, or images from being published. That implies the use or threat of physical force to do the preventing, because there are other publishers who have differing standards. "Editorial discretion" is something entirely different - it's the act of deciding what words, sounds, or images you personally wish to publish. It is, in fact, inherent in freedom of speech itself.
If Google decides that they didn't want to refer any traffic to certain kinds of information, then the people who want that stuff will have to find another search engine. They have to weigh that against the others who will welcome search results that they are looking for. Google's reputation is everything -- if they erode it, they cease to be relevant, and therefore profitable.
[100% ISO 646 Compliant]
SVM, ERGO MONSTRO.
Couldn't they just look for links in gmail messages and use those as
weights in a trust system?
Links in messages identified as spam could be given a negative
weight. That weight could be determined by the number of people
identifying messages with that link as spam. Links from those sites
would be given less trust than a completely unknown page, unless they
are positively weighted themselves or linked to by a positively weighted
site. Links found in non spam messages could be given positive weights
by the same rules.
This would also have the advantage of offering spam filtering rules
based on trustrank weights. Setting a minimum trustrank would allow the
system to weight the email by checking the links in the email, and using
their trustrank for the message itself. The automated spam filtering
gmail offers could thus affect trustrank, increasing the impact of both
systems (email and searching) and possibly allowing it to be extended
to google groups/Usenet filtering.
Potential Examples
(moving each weight given by linking 1 point towards 0)
site1 [+5] - url found in 5 non spam messages
site2 [-5] - url found in 5 spam messages
site3 [+4] - url linked to from site1 (5 + -1)
site4 [-4] - url linked to from site2 (-5 + 1)
site5 [0] - url linked to from site1 and site2 (5 + -5)
site6 [3] - url linked to from site1, site3, and site2. (((5 + 4) + -5) + -1)
Email1 [-5] - contains links to site2, site4, and site6 (((-5 + -4) + 3) + 1)
Not perfect perhaps, but workable and easy to combine with a simple
rule set for weighting parts of a url to create an 'intelligent' system
guided by user preferences.
- Christine
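The example weights above can be reproduced with a few lines of Python. This is only a sketch of the rule as I read it: add up the weights of everything that links to a site (or everything an email links to), then move the total 1 point towards 0. The helper names toward_zero and inherited_weight are made up for illustration.

```python
def toward_zero(weight, step=1):
    """Move a weight `step` points towards 0 without crossing it."""
    if weight > 0:
        return max(0, weight - step)
    if weight < 0:
        return min(0, weight + step)
    return 0


def inherited_weight(source_weights):
    """Weight passed on by linking: sum of the sources, decayed towards 0."""
    return toward_zero(sum(source_weights))


site1 = 5    # url found in 5 non spam messages
site2 = -5   # url found in 5 spam messages
site3 = inherited_weight([site1])                 # linked from site1
site4 = inherited_weight([site2])                 # linked from site2
site5 = inherited_weight([site1, site2])          # linked from site1 and site2
site6 = inherited_weight([site1, site3, site2])   # linked from site1, site3, site2
email1 = inherited_weight([site2, site4, site6])  # links to site2, site4, site6
```

Run as written, this yields the same numbers as the list above: +4, -4, 0, 3 and -5.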
Google ! critical?
It's like saying that Microsoft Windows is the only O.S. that exists.
The fact that seeds are chosen manually restricts the use of such a scheme to seeds that never change or are known for a fact to be reliable (such as Fortune 1000 companies, government sites or media outlets). Otherwise, a site can start as one that has terrific content only to switch to a spam site the moment it gets a good trust ranking (and other abuses).
I don't favor this type of scheme because it is not adaptive enough.
A much better manner for achieving the goals that Google is reaching for would be to hammer out multiple metrics for automated ranking (call them whatever you wish; trust, linkage, etc..) and then apply operations on the ranking vectors to sort site listings. The simple fact that there are multiple metrics would allow statistical analysis to be performed to catch attempted rank hikes.
Having multiple schemes would allow Google to use much more advanced statistical analysis techniques to catch cheats. My assertion here is that while the quality of each individual metric shouldn't be ignored, it is the emergence of patterns between metrics where the most valuable data lies. Ensuring that all data is purely adaptive and algorithmic (rather than hand-chosen seeds, etc..) will provide much more robust results.
A simple example might be an offense meter. Much like TiVo has thumbs-up and thumbs-down, Google could have a manner for acquiring feedback from the users regarding the level of offense of sites acquired. But this in itself may not be that interesting. When combined (think vectorspace combined) with other metrics, it becomes quite valuable.
The reason that it can be true that 1+1 > 2 is that very peculiar nonzero value of the + operator
This isn't an original idea, but I can't remember where I most recently read about the concept, so I'll go ahead and say it's mine:
Trust for things like email senders and web sites shouldn't be centralized. My web of trusted entities, which should be easy to maintain (unlike, say, blacklists or whitelists) and should evolve semi-automatically, should be based on the interaction of my trusted sites/entities, and, in turn, their trusted sites/entities. Sort of like TrustRank, but where each person determines their own initial seed of trusted sites/entities. Of course, if you didn't want to deal with choosing seeds, you'd just pick Google as your trusted site.
This is of course a horribly abstract idea, and I have no idea how I'd implement this for 1 or a million users, but hey, you gotta start with the vision.
Simple Unexpected Concrete Credible Emotional Stories
So de facto censorship is okay, as long as it's not de jure?
Funny, we used to use this same reasoning to keep blacks from voting in the South (unfortunately still do, in some parts).
I mean, minorities are perfectly free to exercise their legal right to vote. And their employer is perfectly free to fire them from their job afterward. And their banker is perfectly free to foreclose on their mortgage after he sees them at the polls and they miss a single payment.
You don't have to use the law to strip someone of their civil rights. De facto censorship is every bit as powerful as de jure censorship (more so, I would argue).
-Eric
SJW: Someone who has run out of real oppression, and has to fake it.
Troll? How the fuck is this a troll?
This is the most insightful comment I've seen so far.
Did the language offend your delicate American sensibilities, dear moderators?
If you're going to post crazed rants, post the counter, too: Google-Watch Watch.
I've read it but it sounds mixed up. Isn't the ideal result from a search engine:
Matches - spam - offtopic, sorted by relevance
not
Matches sorted by f(pagerank,trustrank)
Google used pagerank + on-page text as a measure of how relevant a page is, but that's not reliable anymore because the set contains spam pages.
The 'trusted' value tells you nothing about relevance, it only gives the likelihood of the page being spam or not spam. If it's spam you want it removed; if it's not spam, then its page rank determines its relevance, not some function of pagerank and trustrank.
I.e. they should not promote or demote pages based on trust rank, they should simply define a cut-off value K: if the trust is less than K then it's likely spam and should be removed.
Since spam follows money terms, they should have K(keyphrase), so they can change the value of K on each keyphrase to remove the spam. Otherwise they will filter non money terms where no spam exists and their algo can only do harm!
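A small Python sketch of the cut-off idea, under the assumption that every result already carries a relevance score and a trust score. The field names, the cutoffs table and the default cut-off of 0.0 (meaning "filter nothing") are invented for the example.

```python
def filter_results(results, keyphrase, cutoffs, default_cutoff=0.0):
    """Drop results whose trust is below K(keyphrase); rank the rest by relevance only."""
    k = cutoffs.get(keyphrase, default_cutoff)
    kept = [r for r in results if r["trust"] >= k]
    # Trust decides only whether a page stays, never where it ranks.
    return sorted(kept, key=lambda r: r["relevance"], reverse=True)


results = [
    {"url": "a.example", "relevance": 0.9, "trust": 0.8},
    {"url": "spam.example", "relevance": 0.95, "trust": 0.1},
    {"url": "b.example", "relevance": 0.7, "trust": 0.3},
]
# Only money terms get a cut-off; everything else is left unfiltered.
cutoffs = {"cheap viagra": 0.5}

money = filter_results(results, "cheap viagra", cutoffs)
hobby = filter_results(results, "stucco repair", cutoffs)
```

For the money term only a.example survives. For the non-money term nothing is removed and the order is purely by relevance.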
Sorry, I'll clarify: The critical tool I was referring to was Internet searching/indexing as a whole, not Google in particular. Thus, any new development by Google (or any other group that focuses on such things) represents an overall development in that area. Even moreso when it's Google since they're such a major player in the field--when they change something, it has the potential to affect many, many people.
Anyways, that wasn't really the point of my comment.
Of course, that wasn't something the Good Old Boys would tolerate, so they used the coercive power of government to make it illegal to, for instance, let black train passengers ride in the same car as the whites. The railroads didn't want to have to segregate, because that made it more expensive for them to operate. They only cared what color someone's money was. Insensitive to the cultural peculiarities of a region - damn capitalist bastards!
You can have the Sheriff and his deputies conveniently engaged in some activity far removed from the Klan meeting until after the Guest of Honor has expired. But that's still the use of force, and I don't believe that Google has any plans to send in hooded thugs to harass folks who use Yahoo! instead. However, if they do, and the law enforcement officers turn their head the other way, then those officers are guilty of depriving people of their civil rights, you betcha.
[100% ISO 646 Compliant]
SVM, ERGO MONSTRO.
I think TrustRank would be more useful in Gmail to give a reading on how "spammy" an email is. They already have something like it, where a box shows up warning you that the sender may have spoofed their address.
O.k thanks for the clarification! I agree with this comment!
Seeing a large number of replies so far, it appears that most people seem to see this as some kind of censorship.
I haven't read the article, but the name suggests they will do something similar to how pagerank works: not actually trimming the results, but re-ordering them. It doesn't hide any content, just displays the content that is more likely to be what you want higher up.
Or am I the confused one here?
.sigs are for losers
... would be nice if you could use adblock style filtering on Google search results, then if you wanted to get rid of certain results, (i.e. from blog or "sales" sites), you could block their domains.
Probably wouldn't be that difficult to get around it, but it might help a bit.
Suppose you had the perfect Oracle that could check every search result and clean it of spam.
Ranking by on-page text, links, etc. (the items that make a page relevant or not) gives you:
A. 1st most relevant.
B. 2nd most relevant
C. Spam
D. 3rd most relevant
E. 4th most relevant.
F. Spam
After your Oracle has hand checked every site you get:
A. 1st most relevant.
B. 2nd most relevant
C. 3rd most relevant
D. 4th most relevant.
Not:
A. 10th most relevant
B. 2nd most relevant
C. 8th most relevant
D. 5th most relevant
Ranking by trust as well as relevance gives you a clean but not very relevant result set.
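The difference between the two outcomes can be shown with a toy Python comparison. The relevance and trust numbers are invented, and "relevance times trust" is just one assumed stand-in for f(pagerank, trustrank).

```python
# (name, relevance, trust, is_spam) for the six results A..F, invented numbers.
pages = [
    ("A", 0.9, 0.2, False),
    ("B", 0.8, 0.9, False),
    ("C", 0.7, 0.0, True),
    ("D", 0.6, 0.5, False),
    ("E", 0.5, 0.8, False),
    ("F", 0.4, 0.0, True),
]

# Oracle approach: remove spam, keep the pure relevance order.
by_relevance = sorted(pages, key=lambda p: p[1], reverse=True)
filtered = [name for name, _, _, spam in by_relevance if not spam]

# Blended approach: sort everything by relevance * trust.
by_blend = sorted(pages, key=lambda p: p[1] * p[2], reverse=True)
blended = [name for name, _, _, _ in by_blend]
```

Filtering keeps A, B, D, E in relevance order, while the blended sort puts B and E ahead of A even though A is the most relevant page and is not spam.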
No problem! And to be honest, I'm not really in total disagreement with your original post--it does seem like we see an awful lot of Google stories. They're an interesting corporate phenomenon, though, and one that appeals to geeks and "regular folk" alike. That wide appeal, combined with all the genuinely nifty things they do, probably contributes to the above-average proportion of Google postings we see.
Google has reported that it is now indexing more pages than it previously indexed. This is exciting news for Slashdot readers as it means that there will be something more interesting than John Cleese whoring himself out to comment on. Google also mentioned something about Google, GoogleLabs, PageRank, Google, and...oh yeah, Google.
Think I'm being a dink? See for yourself. I'm as big a user of Google as the next person, but I'm actually missing the news for nerds stuff, not the Google press releases. Is there some pending buyout of OSDL by the behemoth that is Google that we should know about?
"Nokia is not a country, it's the capital of Finland!" -Moderated "Informative". Yeesh.
I find it funny that I didn't see OP's post.
Slashdot censorship at its finest!
"DAYTOOK'RFREEDUMASPEECH!"
Oh man, that was bad. I feel dirty.
put the what in the where?
If google kept their 'secret sauce' as secret as possible, it makes work so much more difficult for scummy spammers.
This is the opposite idea: instead of rewarding pages that are linked from trusted Web sites, pages linking to bad Web sites are punished. See: Web Spam, Propaganda and Trust.
I immediately made and wrote a new idea to the usenet for prior art records :)
Could this TrustRank be a form of Wikipedia, where people who search for stuff and get results click on a web page, then immediately hit the back button? Could Google be looking at the time it takes for the user to make a repeat request back to their search engine, thus judging whether the user found usable content on the page, and adjusting the ranking accordingly? mattcriticalcode(dot)com
Yes, I am lame. I don't have an account but feel I have something to say.
I've got an idea: Anytime you see an informative and/or insightful post whose contents you would like to see modded up, but which has a spam-o-licious free [product] link in the sig, just copy the informative content into a new Anonymous Coward post, which the mods can then moderate higher, while the spammy parent can be modded down into oblivion.
I'm sure, eventually, they'd learn ;)
This sounds very similar to the Trust system used by Vipul's Razor and Cloudmark software. I have used the Spamnet product since 2001 and run Vipul's Razor on my mailserver; it is the most accurate filter that I've found (and believe me, I've tested them all). Kudos to Google!
Does this still mean that .edu, .gov, and .org domains will be given a bit more trust over .com and other domains? (just like pagerank)
- Teja
I think you'll agree that it's better than Duped Google news from the past.
PtPete
This is definitely a good answer to my original post about google-watch. I felt there was too much rant into google-watch for it to be very reliable.
Nevertheless, google-watch is right in that to get a good pagerank, it is currently better to be linked from one highly ranked page (in a TrustRank fashion) than from 10 low ranked pages, which was favored by the original PageRank scoring method.
Please stop censoring me.
English is easier said than done.
When they said "trusted sites" I was all geared up for some sort of Bayesian analysis. After all, it worked for spam.
This may be of interest: http://collabrank.org
Yes [goatse.cx] No [disney.com]
Two things. First, "Guy Opens Ass To Show Everyone" has moved here. Second, in this era of counterproductive copyright, lowering Disney's reputation is a Good Thing.
It would link to any of the myriad quotes in the Bible. Specifically to those giving the story in the New Testament where some bring a woman caught in adultery to Jesus and say to stone her (the group throws rocks at her until she dies). He tells them "let he who is without sin cast the first stone" and when none of them does and they leave, Jesus tells her that He does not condemn her and to "go and sin no more."
I don't have the book, chapter & verse on hand, but I'm sure you can google "go and sin no more" and find it rather quickly. My memory may be off, but there may be accounts of it in more than one of the Gospels.
I wanted to do a search to see if latex would bond to stucco
You have to know how to use the right keywords. I can't think of a lot of "Rated Thirty in Roman Numerals" sites that have a good ranking for "stucco".
i could have a moodswing (or multiple personalitydisorder, or any number of other examples) and decide i want viagra one day.
VIAGRA (sildenafil citrate) is available only by prescription. If you want to find legit VIAGRA on Google, then you should be searching Google Local for physicians.
Sounds like Google is gearing up for the semantic web, much like that mentioned in this Slashdot article. http://slashdot.org/article.pl?sid=04/08/01/2058206&tid=217&tid=133&tid=95
My right to free speech does not require that a broadcaster supply me his transmitter
On the contrary, search for "fairness doctrine". Broadcasters in the United States are FCC licensees who in turn are public trustees who hold a government-granted oligopoly on broadcasting. If no broadcaster airs a viewpoint, then it has been censored from television. The FCC has overturned a regulatory version of the fairness doctrine, but Congress can legislate it back into effect at any time and is likely to do so given that the current President doesn't veto anything.
You haven't established that de jure censorship is the result of the practices being contemplated here.
Perhaps in the case of Google this is true because Google does not hold a monopoly. However, FCC licensees do hold a government-granted oligopoly on broadcasting in the United States.
[If mortgage payment dates are "conveniently" scheduled on election days,] Sounds like a good reason to do business with a more friendly banker.
Until the equal housing laws came into effect, there often wasn't "a more friendly banker".
A day or two before this announcement of an increase in pages, Googlebot swarmed all over my web site (which from Google's perspective has about 15,000 pages)... it was requesting pages and URI formats that hadn't existed for months or years.
So what I think happened was they went back through their "every URL we have ever seen" database, and respidered every page, ignoring a lot of the quality criteria that was holding down their page counts - 404 errors, no current valid incoming links, duplicate content, etc..
When doing a search, I now see a much larger quantity of 404 errors, and pages with content that is years old and horribly out of date than before this "improvement".
The folks over at the 7search search engine have a service called "TrustGauge" (trustgauge.com) that sends a report monthly showing the domain's apparent credibility based on traffic, incoming web links and other factors - the most important of which is you pay them to be "validated" by one of their other companies.
Final 2006 "Proof of Global Warming" US Hurricane Count -> 0
or am I just delving too deep into cyberpunk?
Sorry about the writing. Robot fingers, you know? Cliff Steele in DOOM PATROL #23