Cracking the Google Code... Under the GoogleScope
jglazer75 writes "From the analysis of the code behind Google's patents: "Google's sweeping changes confirm the search giant has launched a full out assault against artificial link inflation & declared war against search engine spam in a continuing effort to provide the best search service in the world... and if you thought you cracked the Google Code and had Google all figured out ... guess again. ... In addition to evaluating and scoring web page content, the ranking of web pages are admittedly still influenced by the frequency of page or site updates. What's new and interesting is what Google takes into account in determining the freshness of a web page.""
How the hell is this about my rights online?
500GB of disk, 5TB of transfer, $5.95/mo
Does this mean I'm first for once? Probably not.
-Those who know do not say, Those who say do not know
So will this make it easier or harder to find porn?
The days of the digital watch are numbered.
Now I'll see more "Get ranked #1 in search engines" spam.
http://www.anologger.com/
by TheSpoom (715771) Uncaring Linux user here. I have nothing to add to this but please continue. *munches popcorn*
First first post to be indexed highly by Google!
To crush artificial link inflation and hear the lamentations of search engine spam
The linked article is slashgoogled. It's a googlewar. Googlers are all googling.
You can't handle the truth.
Cracking the Google Code... Under the GoogleScope
...if you thought you cracked the Google Code and had Google all figured out ... guess again.
... guess again.
Google's US Patent confirms information retrieval is based on historical data.
Publication Date: 5/8/2005 9:51:18 PM
Author Name: Lawrence Deon
An Introduction:
Google's sweeping changes confirm the search giant has launched a full out assault against artificial link inflation & declared war against search engine spam in a continuing effort to provide the best search service in the world... and if you thought you cracked the Google Code and had Google all figured out
Google has raised the bar against search engine spam and artificial link inflation to unrivaled heights with the filing of a United States Patent Application 20050071741 on March 31, 2005.
The filing unquestionably provides SEOs with valuable insight into Google's tightly guarded search intelligence and confirms that Google's information retrieval is based on historical data.
What exactly do these changes mean to you?
Your credibility and reputation on-line are going under the Googlescope! Google has defined their patent abstract as follows:
"A system identifies a document and obtains one or more types of history data associated with the document. The system may generate a score for the document based, at least in part, on the one or more types of history data."
Google's patent specification reveals a significant amount of information both old and new about the possible ways Google can (and likely does) use your web page updates to determine the ranking of your site in the SERPs.
Unfortunately, the patent filing does not prioritize or conclusively confirm any specific method one way or the other.
Here's how Google scores your web pages.
In addition to evaluating and scoring web page content, the ranking of web pages is admittedly still influenced by the frequency of page or site updates.
What's new and interesting is what Google takes into account in determining the freshness of a web page.
For example, if a stale page continues to procure incoming links, it will still be considered fresh, even if the page header (Last-Modified: tells when the file was most recently modified) hasn't changed and the content is not updated or 'stale'.
According to their patent filing Google records and scores the following web page changes to determine freshness.
The frequency of all web page changes
The actual amount of the change itself... whether it is a substantial change or a redundant or superfluous one
Changes in keyword distribution or density
The actual number of new web pages that link to a web page
The change or update of anchor text (the text that is used to link to a web page)
The numbers of new links to low trust web sites (for example, a domain may be considered low trust for having too many affiliate links on one web page).
Although there is no specific number of links indicated in the patent it might be advisable to limit affiliate links on new web pages. Caution should also be used in linking to pages with multiple affiliate links.
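As a rough illustration of how the change signals listed above could be folded into a single freshness number, here is a minimal sketch. The patent application does not publish weights or a formula, so the field names, the weights, and the simple weighted sum below are all assumptions made for illustration, not Google's method.

```python
from dataclasses import dataclass


@dataclass
class PageHistory:
    """Hypothetical per-page history signals, each normalized to 0..1."""
    change_frequency: float     # how often the page changes
    change_magnitude: float     # how substantial each change is
    keyword_shift: float        # change in keyword distribution or density
    new_inbound_links: float    # rate of new pages linking in
    anchor_text_change: float   # churn in the anchor text of inbound links
    low_trust_outlinks: float   # share of new links pointing at low-trust sites


# Illustrative weights only; the filing does not prioritize the signals.
WEIGHTS = {
    "change_frequency": 0.2,
    "change_magnitude": 0.2,
    "keyword_shift": 0.1,
    "new_inbound_links": 0.3,
    "anchor_text_change": 0.1,
    "low_trust_outlinks": -0.3,  # treated here as a penalty (assumption)
}


def freshness_score(page: PageHistory) -> float:
    """Weighted sum of the signals, clamped to the range 0..1."""
    raw = sum(weight * getattr(page, name) for name, weight in WEIGHTS.items())
    return max(0.0, min(1.0, raw))


# A page whose content never changes but which keeps attracting new links
stale_but_linked = PageHistory(0.0, 0.0, 0.0, 1.0, 0.0, 0.0)
# The same page with no new links at all
stale_and_ignored = PageHistory(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
```

Under these made-up weights, the unchanged page that keeps earning links scores higher than the unchanged page nobody links to, which mirrors the "stale page continues to procure incoming links" example earlier in the article.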
Developing your web page augments for page freshness.
Now I'm not suggesting that it's always beneficial or advisable to change the content of your web pages regularly, but it is very important to keep your pages fresh regularly and that may not necessarily mean a content change.
Google states that decayed or stale results might be desirable for information that doesn't necessarily need updating, while fresh content is good for results that require it.
How do you unravel that statement and differentiate between the two types of content?
An excellent example of this methodology is the roller coaster ride seasonal results might experience in Google's SERPs based on the actual season of the year.
A page related to winter clothin
It just occurred to me that, as Google changes its algorithms, it'll just create more business for the Search Engine Optimization consultant. When web sites drop in the Google rankings, they'll want to make changes to move back up, and will hire the SEO again to do so.
Have you read my blog lately?
It's obvious Google and Yahoo are moving on to trust-based (or perceived trust) ranking for sites based on what they see users clicking on through the web accelerator, Yahoo's MyWeb, etc. Hopefully this will help grade down the obvious spam...although you only find out it's spam by going to the page...we'll see.
Brought to its knees already.
But when I search on Tiger, a mail-order company's site still comes up above Apple's. Is anyone at Google listening?
you will be googleated. Or googleaten. Whichever.
Borgle.
You can't handle the truth.
...that google is still a "not evil" company? This proxy "web-accelerator" thing really still has me freaked out. Am I just paranoid or is there legitimate reason for concern?
The article is not written by a Google employee, nor did the author speak with anyone at Google. It's simply his analysis of the patent document filed by Google.
Also, at the bottom of the article after the author's name, there's a link to some search optimization service's website.
There are 2 kinds of people in this world. Those that can keep their train of thought,
Seems the egotistical owner of the whiteboxlinux.net and whiteboxlinux.com domains has decided to offer them on ebay as a peace offering between wbel and himself.
This is really great news, let's hope someone with WBEL enthusiasm steps up to build a respectable community site.
Not even this great code can prevent slashdotting.
..that Google's search dominance is a direct result of it clinging onto a patent for PageRank ?
Sorry kids, but patents and "Do no evil" are mutually incompatible concepts
regards
While the article was in the "mysterious future", I clicked on it, skimmed the article, then clicked "printer friendly version" and closed the window with the original browser friendly page. The printer friendly version never came up and the original page was no longer accessible because in those few seconds the article went live on slashdot and the server was knocked out. I guess I'll just have to search my cache or find a mirror.
One of my Google accounts has just been closed [maybe temporarily]. At first I thought Gmail was just down, but then I realised that any other user can log in from my computer just fine, except me. [I know it's a troll but I am too mad and I have to let it out]
Think like a hacker, act like a hacker, but never become a hacker !
Who Is Pamela Jones?
By Maureen O'Gara
Friday May 6, 2005 - A few weeks ago I went looking for the elusive harridan who supposedly writes the Groklaw blog about the SCO v IBM suit.
The now-famous opinion-shaping open source leader Pamela Jones, aka PJ, doesn't give conventional face-to-face interviews. Never has, near as anyone knows. All communication is virtual. Only one person in the world has ever claimed to have met her - in the pressroom at LinuxWorld in Boston complete with a Pamela Jones badge - and described her as a fortyish reddish-blonde who giggled a lot.
Oh yeah? Wonder what cold crème she uses.
Pamela Jones is a 61-year-old Jehovah's Witness who lives in a shabby genteel garden apartment in desperate need of an interior decorator on a heavily trafficked commercial road at 304 North Central Avenue in Hartsdale, New York. Hartsdale is in Westchester and Westchester is IBM territory.
See, even though Groklaw treats cell phones like they were Kleenex and changes its unpublished numbers regularly, one number it left with a journalist led to this flat and - wouldn't you know it but - some calls from there had been placed to the courts in Utah and to the Canopy Group so obviously this just isn't any Pamela Jones.
Pamela has lived in apartment 1A for 10 years at least, according to the super, who says he's watched people move in, have children, and the children marry and move away.
Now, this isn't your usual anonymous New York apartment. It's practically a self-contained village where the super goes for the old ladies' groceries when there's snow on the ground and people know each other's business.
But the super didn't know much about Pamela except that she had a computer, worked at home (maybe sometimes) for a lawyer, was "paranoid" - his word - and "sensitive to smells."
He remembered how he was cleaning paintbrushes one day and she came running down the stairs screaming "Fire."
She was also missing and had been for weeks.
Nobody there knew where she was.
She had up and disappeared one day, and the super was worried about her. He said her son had dropped by and he didn't know where she was, and that some strange man that "nobody knew," as the super described him, had tried to get into her apartment while she was gone - the Medeco lock she had had installed on her door - something nobody else in the complex seemed to feel a need for - was more expensive than the door. But, as it happened, the super said, she had just sent in her rent in an envelope postmarked Connecticut.
Like an episode out of "Where in the World is Carmen San Diego," the trail led to 10 Bittersweet Trail in Norwalk, Connecticut, 24 miles away. Sure enough, parked in the driveway was Pamela's car, just as the super had described it, a dark gray '90s Japanese number with a bunch of Jehovah Witness pamphlets tossed on the backseat.
The woman at the house, Barbara Sharnik, told a disjointed story. She didn't know Pamela, Pamela hated her, Pamela wasn't there, Pamela left her car there because it got bumped, Pamela left her car there because she left town, and so on.
Afterwards Barbara called the cops, and then the cops called the number we left with her and the cops said that she was Pamela's mother and that Pamela was on the run and had shacked up with her mother because she had gotten "threatening mail" weeks before and that she had just gotten spooked again because "people were getting hurt around [my] stories" and had lighted out for Canada.
Odd, the subject of my stories - or any stories - never came up during our brief interview. I was just looking for Pamela.
That left Pamela's son, Nicolas Richards, who, as it happens, had been in the software business in Manhattan until - why, my goodness - things seem to have come a cropper right around the time Groklaw came into existence.
Nick and his ma were apparently involved together in Medabiliti Inc, an ISV, because one Pamela Jones with a Westche
I use google quite a bit to check on recent spyware/malware (used it this morning) and with all due respect, the first few links typically are for spyware products that don't work, domain parking sites (search engines themselves), requiring some amount of diligence to get to the "real" sites that have information.
If this claim is true, I guess we'll have to wait the typical "four to six weeks for delivery."
The "war" metaphor really is cute. Geeky competition in search relevance is really a lot like bombing cities, shooting ranks of soldiers, and destroying bridges and railways. Burnt, bloody bodies everywhere! And clean datacenters with mathematical algorithms.
--
make install -not war
One of the most interesting (and obvious) effects of Google's changes: The company which once ranked first for the phrase "search engine optimization", SEOinc, is now nowhere to be found -- even a search for the company's name doesn't bring up the company's website. SEOinc's response has been a somewhat ineffective attempt to make those reporting on its fall "cease and desist".
Google United - Google Patent Examined
Google's newest patent application is lengthy. It is interesting in some places and enigmatic in others. Less colourful than most end user license agreements, the patent covers an enormous range of ranking analysis techniques Google wants to ensure are kept under their control.
Publication Date: 4/7/2005 7:41:24 AM
By Jim Hedger, StepForth News Editor, StepForth Placement Inc.
Thoughts on Google's patent... "Information retrieval based on historical data."
Google's newest patent application is lengthy. It is interesting in some places and enigmatic in others. Less colourful than most end user license agreements, the patent covers an enormous range of ranking analysis techniques Google wants to ensure are kept under their control. Some of the ideas and concepts covered in the document are almost certainly worked into the current algorithm running Google. Some are being worked in as this article is being written. Some may never see the blue-light of electrons but are pretty good ideas so it might have been considered wise to patent them. Google's not saying which is which. While not exactly War and Peace, it's a pretty complex document that gives readers a glimpse inside the minds of Google engineers. What it doesn't give is a 100% clear overview of how Google operates now and how the various ideas covered in the patent application will be integrated into Google's algorithms. One interesting section seems to confirm what SEOs have been saying for almost a year, Google does have a "sandbox" where it stores new links or sites for about a month before evaluation.
Google is in the midst of sweeping changes to the way it operates as a search engine. As a matter of fact, it isn't really a search engine in the fine sense of the word anymore. It isn't really a portal either. It is more of an institution, the ultimate private-public partnership. Calling itself a media-company, Google is now a multi-faceted information and multi-media delivery system that is accessed primarily through its well-known interface found at www.google.com.
Google is known for its from-the-hip style of innovation. While the face is familiar, the brains behind it are growing and changing rapidly. Four major factors (technology, revenue, user demand and competition) influence and drive these changes. Where Microsoft dithers and .dll's over its software for years before introduction, Google encourages its staff to spend up to 20% of their time tripping their way up the stairs of invention. Sometimes they produce ideas that didn't work out as they expected, as was the case with Orkut, and sometimes they produce spectacular results as with Google News. The sum total of what works and what doesn't work has served to inform Google what its users want in a search engine. After all, where the users go, the advertising dollars must follow. Such is the way of the Internet.
In its recent SEC filing, the first it has produced since going public in August 2004, Google said it was going to spend a lot of money to continue outpacing its rivals. This year they figure they will spend about $500 million to develop or enhance newer technologies. In 2004 and 2003, Google spent $319 million and $177 million respectively. The increase in innovation-spending corresponds with a doubling of Google's staff headcount which has jumped from 1628 employees in 2003 to 3021 by the end of 2004.
Over the past five years Google has produced a number of features that have proven popular enough to be included among its public-search offerings. On their front page, these features include Image Search, Google Groups, Google News, Froogle, Google Local, and Google Desktop. There are dozens of other features which can be accessed by cli
sigs, as if you care.
What do those guys actually *do* in any case? I mean, legitimately. I guess you can tweak things a bit, but... how much does that actually get you if you simply aren't a popular site?
Server Error in '/main' Application.
Server Too Busy
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.Web.HttpException: Server Too Busy
Source Error:
An unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.
Stack Trace:
[HttpException (0x80004005): Server Too Busy]
System.Web.HttpRuntime.RejectRequestInternal(HttpWorkerRequest wr) +146
Version Information: Microsoft .NET Framework Version:1.1.4322.2032; ASP.NET Version:1.1.4322.2032
Use ask.com. This is the best.
Spam: Any activity on the internet to gain popularity without paying advertising companies like Google.
From the article: GOOGLE has plans that will dramatically improve the results of internet news searches, by ranking them according to quality rather than simply by their date and relevance to search terms. The ambitious system is revealed by patents filed in the US and around the world (WO 2005/029368) by researchers based at the company's headquarters in Mountain View, California.
Give your website mentos, the freshmaker
There is no sig
Almost any algorithm can be spoofed fairly easily. A spammer can insert very small text that's the same color as the background, then change that text whenever they want Google to think the page has been updated. The viewer can't tell the difference, but the source code changes. Or they could just use comments in Javascript, or just create Javascript that never gets used.
Also, a page with frames might get penalized since its content doesn't change, although the content of the frames may change frequently.
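One way a crawler might blunt the comment-and-unused-script trick described above is to fingerprint only the text a reader would see, rather than the raw source. The sketch below is an assumption about how that could be done, not something taken from the patent; it uses Python's standard html.parser, and it does not catch same-color hidden text, which would need CSS and rendering analysis.

```python
import hashlib
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, skipping <script> and <style> bodies.
    HTML comments never reach handle_data, so they are ignored as well."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_fingerprint(html: str) -> str:
    """Hash of the reader-visible text only."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def raw_fingerprint(html: str) -> str:
    """Hash of the full page source."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


# Two versions of a page that differ only in a comment and an unused script
old = "<html><body><p>Same article.</p><!-- build 1 --><script>var x = 1;</script></body></html>"
new = "<html><body><p>Same article.</p><!-- build 2 --><script>var x = 2;</script></body></html>"

source_changed = raw_fingerprint(old) != raw_fingerprint(new)
visible_changed = visible_fingerprint(old) != visible_fingerprint(new)
```

Here the source fingerprint changes while the visible fingerprint does not, so a freshness signal keyed to visible text would not count this as an update.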
Original server is /.ed. Coral cache link here.
Fifteen minutes on /., and the server (IIS .Net) is toast.
Hmmmm. Wasn't there a "crash IIS" contest or something like that going on in the past couple or three weeks?
Racing is an addiction that makes heroin look like a vague hankering for something crunchy.
Don't you love those .net error messages? They are so much more pretty than the apache error messages. They really make it clear, "This is a .net error, we have the prettiest errors, and something is really wrong. Matter of fact, the server is just TOO BUSY."
Then after saying they're too busy, they get into the BSOD type text for us REALLY techie types, talking about unhandled exceptions and stack traces.
What I don't quite understand is why they take such care to make sure these beautiful error messages are nicely coded with CSS, but there is ONE font tag tainting the whole thing. It really is a shame...
I really feel better knowing that IIS is giving everyone such appealing error messages and in so much detail. Thanks IIS
Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
When it comes to linking, taco is a scumbag , you should clearly avoid the hocus pocus or magic bullet linking schemes
http://www.computerweekly.com/articles/article.asp?liArticleID=138415&liArticleTypeID=1&liCategoryID=2&liChannelID=22&liFlavourID=1&sSearch=&nPage=1/
Do you see?
You are witnessing a transformation...
Don't use .net
Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
Since the story submission didn't end the post with a question, I feel compelled to add one:
How will this affect the ranking of insightful FAQs, which by nature may not change frequently?
Another shout-out poll to my homeboy Slashdotters: Do you pronounce FAQs as "F-A-Q's" or "Faks"?
When it comes to linking, taco is a scumbag, you should clearly avoid the hocus pocus or magic bullet linking schemes... nice troll, dickhead
Slashdot: Googlenews for googlenerds. Googlestuff that googlematters.
Note that Google is now looking at domain ownership information. This may result in a much lower level of bogus information in domain registrations. It's probably a good idea to make sure that your domain registration information, business license, D&B rating, on-site contact info, and SSL certificates all match.
"Domain cloaking" will probably mean that you don't appear anywhere near the top in Google. So that's on the way out.
Do a search within the grandparent for 'taco is a scumbag'; you will see that nothing has been inserted. It's just the parent (and co-parent) trying to cause trouble.
Truthfully, a search for "Tiger" should point to Apple's OS because that's what most people are looking for. I think you're thinking of wikipedia.
It will be great if it works. I like how Google tries new things to get the most relevant results. I hope this one doesn't backfire. How often do webmasters update how to fix a broken disk or how to patch an older version of Solaris? If you only get the newest results, you will likely get blogs and RSS feed results for your broken disk... as they are updated often.
Google has millions upon millions of click history on their search results that say what it is people really are looking for, as well as which ones appeared good fodder for first clicking.
No one else has such a large database of what humans have actually picked.
Such a click history and search term history asset is worth even more if it gets correlated with Evil Direct Marketing information from the cookie traders.
Although, it seems possible that large ISPs could also grab and analyze their members' Google interactions to figure out people's tastes, assuming such interactions remain unencrypted.
I have to wonder how many companies with static IP addresses have, unbeknownst to them, built up extensive history logs at Google showing their search term preferences and click selections. If I were a technology startup with a hot idea to research I'd be a little more paranoid about something like that.
"Provided by the management for your protection."
Please direct ALL google/applevertisments to mailto:cmdrtaco@slashdot.org along with obligatory paypal payment.
Thanks,
Rob Malda
I've been running a fairly popular website now for over two years. The main search term for it yields us at about position 6, while a URL that hasn't even been online for that entire time is ranked 5. Maybe now we can finally get moved up over the non-existent website.
What does that mean? At the highest level, it means that most of the Google algorithm is constructed by a machine. You give the machine human-constructed examples of how to rank a sample set of pages (notice those want ads where Google is hiring people who can inspect and assess the quality of web pages?) and it then uses essentially brute-force techniques to test every possible combination of your ranking variables to find the simplest formula that ranks pages the same way the human did.
There is no human at Google "twisting dials" to alter individual parameters of a formula. The machine constructs the algorithm, and it can therefore easily be so complex that no human can understand it. Tweaking the algorithm becomes a process of changing or adding to your "training set" of human-ranked pages, and letting the data mining process come up with a revised algorithm.
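To make the "machine constructs the algorithm" idea concrete, here is a toy sketch of the brute-force search described above: try every combination of weights and keep the simplest one that reproduces a human's ranking. The pages, feature values, and the integer weight grid are all invented for illustration; nothing here is known to reflect how Google actually does it.

```python
from itertools import product

# Hypothetical training set, listed in the order a human rater ranked the
# pages (best first). Features: (inbound links, content quality, page age).
TRAINING = [
    ("page_a", (9, 8, 2)),
    ("page_b", (7, 9, 5)),
    ("page_c", (8, 3, 9)),
    ("page_d", (2, 4, 1)),
]


def reproduces_human_order(weights):
    """True if the weighted scores are strictly decreasing down the list."""
    scores = [sum(w * f for w, f in zip(weights, feats)) for _, feats in TRAINING]
    return all(a > b for a, b in zip(scores, scores[1:]))


def fit_weights(max_weight=3):
    """Try every integer weight combination and return the simplest match:
    fewest non-zero weights first, then the smallest total weight."""
    matches = [
        w for w in product(range(max_weight + 1), repeat=3)
        if reproduces_human_order(w)
    ]
    return min(matches, key=lambda w: (sum(1 for x in w if x), sum(w)), default=None)


best = fit_weights()
```

With this invented data no single feature reproduces the human order, and the search settles on equal weight for links and quality with age ignored. "Tweaking" such a system means editing TRAINING and re-running the search, not hand-editing the weights.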
For example, Google could invent a new variable called "category", and identify each page as belonging to category Astronomy, Botulism, Country, [...] and Other. Once that variable is thrown into the mix, then the Google "algorithm" is essentially free to vary wildly from one type of subject matter to the next. For example, you might see someone with a Real Estate site swearing up and down that inbound links are no longer as important, while someone with an Astronomy site might swear that, no, inbound links are more important than ever. You can see exactly this kind of bickering in most of the forums that people who hope to do Search Engine Optimization frequent.
The other big mistake people make in trying to see how to game the Google algorithm is "delay". In studying how people manage (or fail to manage) complex systems, psychologists learned that people generally would fail if a delay was introduced between their actions and the results of their actions.
In one very simple test, people were charged with trying to stabilize the temperature in a virtual refrigerator. They had one dial, and there was exactly one piece of feedback: the current temperature in the fridge. However, they were not explicitly told that there was a delay between moving the dial and when the results of that action would stabilize.
The responses of those test subjects were eerily similar to what we see in Google-gaming webmasters these days. Some people swore up and down that some human behind the scenes was directly tweaking the results to thwart whatever they did. Others became frustrated and decided that nothing they did really mattered, so they would just swing the dial back and forth between its minimum and maximum settings.
What does this have to do with Google? These days, Google can change their algorithm relatively frequently, and the algorithm can vary by the relative date of various things. The net sum is, there's a delay between when your page is first ranked and when it is likely to arrive at a relatively stable ranking. This can drive webmasters nuts as they think they've done something clever to rank their page high, but then it drops a week later. Although it doesn't occur to them, the important question is: did the change cause the high ranking or did it cause the sudden decline?
The few people who did master the simple refrigerator system? Well, they sounded more like some of the people who are more successful at gaming Google. Those folks tend to say things like: "just make one change and then leave it alone for a while to see what happens."
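The effect of that delay is easy to reproduce in a toy simulation. The model below is an invented one (a pure three-step delay, with made-up ambient and goal temperatures), not the psychologists' actual experiment: the fridge always shows the result of the dial setting from a few steps ago, and the controller corrects by whatever error it currently sees.

```python
def simulate(adjust_every, steps=12, delay=3, ambient=20, goal=4):
    """Return the observed temperatures for a controller that turns the dial
    by the current error, but only once every `adjust_every` steps."""
    history = [0] * delay          # dial settings still "in the pipe"
    dial = 0
    temps = []
    for step in range(steps):
        # The fridge reflects the dial setting from `delay` steps ago.
        temp = ambient - history[-delay]
        temps.append(temp)
        if step % adjust_every == 0:
            dial += temp - goal    # naive correction by the visible error
        history.append(dial)
    return temps


impatient = simulate(adjust_every=1)   # reacts to every reading
patient = simulate(adjust_every=4)     # one change, then waits out the delay
```

In this model the impatient controller keeps turning the dial while its earlier changes are still in flight and overshoots well below the goal, while the patient one makes a single change, waits, and settles on the goal temperature.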
Can you still game the Google algorithm? Undoubtedly in specific cases. But it's getting harder. The Google algorithm was always complex, but what's changing is that the days when a few variables (such as inbound link count) generally swamped the effects of all the others are drawing to a close. We are approaching the day when the best technique to rank highly with Google will be: sit down at your keyboard and make more good content every day.
There seems to be a lot of weight put on web page freshness. I host a friend's site containing the collection of poems by Ella Wheeler Wilcox. She lived in the 1800s so one cannot expect to see any new material from her.
The site is mostly static but is rich with cultural value. It's currently the number one hit on Google. I'm hoping that Google's emphasis on "freshness" won't make his site fall in ranking.
The race isn't always to the swift... but that's the way to bet!
Searching the error code seems to indicate a database connectivity error.
Disclaimer: I'm a J2EE dev so my opinion may not count.
It appears that they are either not properly implementing connection pooling and running out of connections, or overloading the database by failing to cache non-changing data.
Maybe they just aren't used to developing for a high traffic web site.
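For anyone wondering what those two fixes look like in practice, here is a minimal sketch in Python. Everything in it is an assumption for illustration: nobody outside knows what stack the site runs, so sqlite3 stands in for the real database, and ConnectionPool, make_cached_lookup, and the articles table are made-up names.

```python
import queue
import sqlite3
from contextlib import contextmanager
from functools import lru_cache


class ConnectionPool:
    """Tiny fixed-size pool: connections are opened once and then reused."""

    def __init__(self, database, size=4):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a pooled connection be used by
            # whichever worker thread borrows it.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout=5.0):
        # When every connection is busy, wait up to `timeout` seconds instead
        # of opening a new one; queue.Empty is raised if none frees up.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back


def make_cached_lookup(pool, maxsize=1024):
    """Return a title lookup that only hits the database on a cache miss."""

    @lru_cache(maxsize=maxsize)
    def article_title(article_id):
        # Hypothetical schema: articles(id INTEGER PRIMARY KEY, title TEXT).
        with pool.connection() as conn:
            row = conn.execute(
                "SELECT title FROM articles WHERE id = ?", (article_id,)
            ).fetchone()
        return row[0] if row else None

    return article_title
```

The pool caps how many connections can exist, so a traffic spike makes requests wait rather than exhausting the database. The cache means a page that never changes costs one query, not one query per visitor.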
----- If communism is a system where the government owns business, what do you call a system where business owns govern
People expect SEO to get more complex as time goes on. This isn't news and SEO is not going to disappear. What will happen is people with little motivation or resources will be further discouraged from doing SEO as competition increases. That's it. TrustRank will take over PageRank. Link history will become more important than simply having links. Easily created SEO tools such as link farms and blog spammers will decrease in value. Everyone expects these things to happen. SEO will always exist, largely because there will always be a need to rank higher in search engines.
This is the last article I'm ever reading about Google on here. Does Google pay Slashdot to cover them every day? I'm personally sick of seeing Google articles. It's a search engine, it's not a cure for cancer. I don't see any noticeable improvements in finding things on the web between Google and Alta Vista, yet every day a new article hailing Google as the search engine to end all search engines appears on Slashdot. The short-lived NlightN search engine back in the mid 90's indexed a lot more than Google does and it included indexing of phrases and individual words, including stop words like 'the'. So let's all herald Google for not being as good as a search engine that was out in 1995.
Isn't this "page update frequency" hullabaloo a bit premature? If Google wants relevant results I can only see update frequency being but a minor factor in any page rank determination algorithms. For example: Information sites (historical information, dictionaries, encyclopedias, collections, etc...) are often at once the most relevant (if info is what you're looking for) and the least updated sites. I can't really imagine the Oxford Faculty meeting every week to decide new words for their dictionary to retain their www.oed.com pagerank. Just imagine what it would do to the English language : )
Seriously, this little article is going to get Webmasters thinking a little more but I don't see anything to panic about. Not yet, anyways.
No, no sig. Really.
ThePromenader
one solution to get really reliable results is to rank any non-registered commercial pages as low as possible and to have a strong policy for commercial subscribers (and affordable registration fee). when searching, i get drowned in 'best price' advertising, price comparison sites and all this kind of irrelevant stuff. i'm usually looking for technical specs, good reviews,... if people cheat, spam,... it's to sell something. from my experience, most irrelevant results point to sites trying to sell something. so lowering the number of results pointing to sites trying to sell something should automatically improve the relevance of the results. btw, if companies don't want to register and respect google policy concerning web page contents, there are chances that their page should get a low ranking.
A patent itself is like any other tool: a gun, a can of spray paint, an email service. How you use a tool is where evil can arise. If I randomly shoot children - evil. If I paint my tag on public buildings - evil. If I spam - evil.
If I use a patent to economically enrich myself but as a result impede the use of information - possibly evil.
If I patent something then create a free license for it so that no one can restrict its use through commercial monopoly - good.
I'm not saying that Google is using their PageRank patent for good, just that simply owning patents is an evil-neutral stance. It all comes down to use.
How about the following button on the Google Toolbar:
"This page is Spam."
Google could take into account how many people report a page as spam, who they are (trusted users or aggressive competitors?) and aggregate results. While not determinative, lots of "This is Spam" reports on a page could trigger some sort of heightened scrutiny (maybe put it back in sandbox).
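A rough sketch of how that aggregation could work, in Python. This is only an illustration of the idea above, not anything Google is known to do: the SpamReport type, the per-reporter trust values, and the review threshold are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpamReport:
    reporter_id: str
    page_url: str


def spam_scores(reports, reporter_trust, default_trust=0.1):
    """Sum trust-weighted reports per page, counting each reporter once per page."""
    seen = set()
    scores = {}
    for report in reports:
        key = (report.reporter_id, report.page_url)
        if key in seen:
            continue  # repeat clicks from the same reporter add nothing
        seen.add(key)
        # Unknown reporters get a small default weight; a known aggressive
        # competitor can be assigned a trust of 0.0.
        trust = reporter_trust.get(report.reporter_id, default_trust)
        scores[report.page_url] = scores.get(report.page_url, 0.0) + trust
    return scores


def pages_needing_review(scores, threshold=2.0):
    """Pages whose weighted score crosses the (arbitrary) scrutiny threshold."""
    return sorted(url for url, score in scores.items() if score >= threshold)
```

Because the output is only a list of pages to look at more closely, a pile of reports from untrusted accounts cannot bury a competitor on its own.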
weighting the number of referrals by common content and proximity of search terms in referrer and referenced page makes it much more difficult to inflate rank.
i.e.
-references are weighted very lightly if there is no other content in common and heavily if there is a lot of similar material.
-references are also weighted based on the 'distance' between search terms in the referring and referenced page.
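One way to read the two rules above, sketched in Python. The choices here are my own assumptions, not a known ranking formula: "content in common" is measured as Jaccard overlap of the two pages' word sets, and "distance" as the shortest run of words in a page that contains every search term.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a_tokens, b_tokens):
    """Shared vocabulary as a fraction of the combined vocabulary."""
    a, b = set(a_tokens), set(b_tokens)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def smallest_window(tokens, terms):
    """Length in tokens of the shortest run containing every term, or None."""
    needed = set(terms)
    if not needed:
        return None
    counts = Counter()
    have = 0
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in needed:
            counts[tok] += 1
            if counts[tok] == 1:
                have += 1
        # Shrink from the left while the window still covers all terms.
        while have == len(needed):
            width = right - left + 1
            if best is None or width < best:
                best = width
            drop = tokens[left]
            if drop in needed:
                counts[drop] -= 1
                if counts[drop] == 0:
                    have -= 1
            left += 1
    return best


def proximity(tokens, terms):
    """1.0 when the terms sit side by side, smaller as they spread out."""
    window = smallest_window(tokens, terms)
    if window is None:
        return 0.0
    return len(set(terms)) / window


def link_weight(referrer_text, target_text, query):
    """Weight of one reference: content overlap scaled by term closeness."""
    ref, tgt, terms = tokenize(referrer_text), tokenize(target_text), tokenize(query)
    closeness = min(proximity(ref, terms), proximity(tgt, terms))
    return jaccard(ref, tgt) * closeness
```

A link from a page that shares no vocabulary with the target, or that never mentions the search terms, ends up with a weight of zero, which is what makes link farms less useful under this scheme.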
Page = broke, it's been slashdotted to the brink of system failure ;D
It seems nobody has asked the question: what if a spammer wants to lower the rank of more reputable companies? If a spammer link spams a site that is already fairly popular, couldn't it harm the page rank of a company that has nothing to do with the spam?
MacroHard - Boning you in a big way! (TM)
This could then lead into making everything google displayed as linking to a blog.
Shameless linkage http://laborassistance.com/
How to stop spamming? Get rid of weblogs.
Or at least they start out spelling the same way.
Its not nice to fool Mother Nature.
In case you hadn't noticed, Google links are direct.
Oh no, not quite. Very recently they started changing all the search results links to (something like, I can't quite remember):
http://google.com?url=http://websiteyouclicked.com
This seemed to be an experiment which lasted a few days as I recall and seems to be abandoned for now at least. Up until a few days ago on google.co.uk I could still find those url= links by going way down the search results. I'm trying to find an example for you now but I can't find one this second.
I was surprised there wasn't a big outcry about this at the time
Also a bit of fancy javascript could probably also grab what you clicked and send it to google.
No modpoints here. Mod parent up!
Is this a sign of the "evil Google" that some are waiting for? I mean, all this optimization is directed at commerce; it does prevent spam (low paying/no paying customers) in favor of legit webmasters. At first this seems like a win-win, and it is if what you are looking for is to purchase something, but how will this optimization help me find things like how to tie a box knot, who first called statistics worse than "damn lies," or how to integrate ln(cos(x))? Most of my searches are in search of some fact or another. How will this help me?
Maybe the SEOs do realize it, but can't resist the offer of easy money from the thousands of MLM and "me too" sites trying to sell useless crap.
"The history of SQL and relational databases traces back to E.F. Codd, an IBM researcher who first published an article on the relational database idea in June 1970. Codd's article started a flurry of research, including a major project at IBM. Part of this project was a database query language named SEQUEL, an acronym for Structured English Query Language. The name was later changed to SQL for legal reasons, but many people still pronounce it SEQUEL to this day."
http://www.provue.com/proVUE/Fact_SQLServer.html
just a bit of history.
"if i'd known it was harmless, i'd have killed it myself"
And in return, /. has cracked the site's code ;)
--
www.nitemarecafe.com
One of my gripes is that Google can not find me. I have put my personal identification information on a page, e.g. complete name, schools attended, cities lived in, etc. If I enter these into Google, it still will not find that page. I would think that my page with my specific info would be unique, and someone searching for this page with this specific info should be able to find it.
For those who didn't read to the bottom of the article: "Left on 5/9/2005 2:47:21 PM by Anonymous Comments: I looked over the Google patent application. One of the salient facts about the application is that it was filed 12/31/03, with origins going back to 9/30/03. Given this topic relates to what I understand is a dynamic cataloging technology, the fact that the patent application is over 18 months old (not 2 mos. old as the article indicates) limits the relevance of this analysis."
They also measure the increase in incoming links, because a steady increase indicates that the site continues to be of interest. Also, if the site has links from university Lit departments and other "high influence" sites, those links count heavily.
The only way for me to knock your friend's site out of the top 10 would be to put up a site with equally interesting content about poems by Ella Wheeler Wilcox, get equally high-quality links to it, and ... that's too much work. I'd link to it rather than try to duplicate it.
Special interest sites, such as your friend's site, and sites selling a product they make themselves are easy to get high ranks for - in their niche. The sites that are scrambling for position are the ones re-selling Tahitian Noni juice and other common commodities.
It's the one about the database, right?
... as long as however they change Google, searches for steaming load still return William Shatner as the number one hit.
Mod down people who tell people how to mod in their sigs
It just goes to show you how all the hollering about privacy will just result in pointless noise-making anyway. Companies know that they can invade people's privacy and ultimately it just doesn't really matter, and nobody's going to stop them. They're going to get away with it, people are going to spy on you, and guess what? You're gonna accept it. As long as they wrap it in pretty paper, we as a society have a history of taking the bait damn near EVERY SINGLE TIME.
The proof is in the pudding, folks. There was lots of noise over Social Security Numbers back in the day. How many of you have SSN's?
Lots of noise over drug testing and whether people should be allowed to do what they want to their own bodies. How many of us still accept a drug test when we apply for a new job?
There was lots of noise over Windows XP activation. How many of you use Windows XP?
Lots of noise over Gmail's mail scanning. How many of you use Gmail?
Lots of noise over this Real ID stuff. How many of us are going to have Real ID's?
Lots of noise over all this DRM stuff. How many of us still buy music from iTMS?
Lots of noise over this RFID stuff. How many of us are going to have RFID's implanted into our hands or foreheads...... you get the idea.
The government knows it. Corporations know it. They just have to give it time. You'll accept it. Eventually you'll stop complaining, and they will have their way. You'll be forced into compliance without ever "really" accepting it. They just have to do it methodically. All it takes is a little bait, and a bit of time. It won't be long until we don't really have any concept of privacy anymore anyway, and people will wonder what all the hoop-holla was about to begin with. There are countless examples. These are just a couple.
Sleep well!
A community-oriented lyrics site
> She lived in the 1800s so one cannot expect to see any new material from her....
> I'm hoping that Google's emphasis on "freshness" won't make his site fall in ranking.
So you're afraid that your friend's page is going to be bumped by a page that more frequently updates these poems from the 1800s?
Instead of the links they were using before:
http://google.com?url=http://websiteyouclicked.com
Google is now using what seems to be a geographically-based IP to generate these 'links'.
In doing so, they not only eliminate a large amount of DNS traffic, they are also partitioning off the computation based on the area of the country you are searching from. FYI - In my location the IP is 64.233.167.104
A co-worker of mine wrote an article on the Google patents that might help people understand them without sifting through the legal mumbo-jumbo.
His article can be found at:
http://www.socengine.com/seo/guide/google-historical-data-patent.html
Using lighttpd instead of Apache, a shared hosting account on TextDrive took a "full-frontal slashdotting" -- and that was generating pages in PHP from a database backend. Without affecting the other sites on the same server!
The results are below. First is a website about tigers being endangered. Second is TigerDirect, the company that owns the trademark "Tiger" in relation to computers. Third is the Mac OS "Tiger". Fourth is a rescue facility for large cats. Fifth is some UK government program titled Tiger.
...
...
... Mac OS X Tiger delivers 200+ new features which make it easier than ever to find, ... Mac OS X Tiger will change the way you use a computer. ...
So I call you an idiot.
google results for tiger below
5 TIGERS : The Tiger Information Center
Dedicated to providing information to help preserve the remaining five subspecies of tigers. News, research, a kid's section, and a live tiger cub cam feed
www.5tigers.org/ - 16k - May 9, 2005 - Cached - Similar pages
TigerDirect.com - Computer Parts, PC Components, Desktop Computers
Large stock of components. Also offers great bundles with CPUs already included.
www.tigerdirect.com/
Apple - Mac OS X
www.apple.com/macosx/
Tiger Haven
A sanctuary for lions, tigers, jaguars, cougars and leopards in need. Includes background, virtual tour, photos, and how to help. Kingston, Tennessee.
www.tigerhaven.org/
Tailored Interactive Guidance on Employment Rights - TIGER Home Page
The TIGER (Tailored Interactive Guidance on Employment Rights) web site is designed to provide a user-friendly guide through UK employment law.
www.tiger.gov.uk/
From the article:
What's interesting though is Google is interested in tracking the behavior of web surfers through bookmarks, cache, favorites, and temporary files (most likely with the Google toolbar and/or the Google desktop search tool).
Something I would love to see is sites that are composed of valid XHTML placed higher in the ranks than totally invalid/outdated code (sorry Slashdot).
Their index is filled with MILLIONS of pages from one particular e-commerce site alone!
If all sites were limited to ONE AND ONLY ONE webpage from a bona-fide unique web domain, Google would probably need only a fraction of the computer systems to store and process 4 billion webpages.
This would also get rid of all the e-commerce affiliates who have set up shop in some directory on some public homepage webserver and not paid for their own domain.
This would also improve the performance and search results given out by Google by not having to index and catalogue more than one page of an e-commerce site.
If all sites were limited to ONE page only of any domain it would not be a search engine. It would be a joke. I know local sites which call themselves search engines when they are a directory instead.
Google is not that kind of site. It is a search service and to be useful it must search a large number of pages from every site. But there is a limit to the number, I believe.
SCIREV.NET - fanfics,reviews & more
Right, and TFA specifically mentions that sites that need freshness (however that's determined) are the ones which are given bonuses for being updated, whereas sites with essentially static info don't get either a boost or a penalty.
This article talks quite a bit about "freshness" and suggests updating your pages frequently so that the "Last-Modified" header changes.
How does this apply to dynamic content, specifically dynamic content hidden behind Apache mod_rewrite to look and act static? I would assume that any time Googlebot hits such a URL it will see a file listed as modified. This is especially true if the content varies dynamically with things like, for instance, 'latest comments' or 'newest' boxes and the like.
In this way, every page on my new site is always "fresh" to some degree as small pieces are constantly changing and random. Anyone want to venture a guess as to how Google treats this situation?
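To venture one guess, purely as speculation: a crawler does not have to trust modification dates at all. It could compare the text it fetched last time with the text it fetches now and only call the page "updated" when enough of it differs. The Python sketch below uses word shingles for that comparison; the shingle size, the 0.3 threshold, and the function names are arbitrary choices of mine, not anything taken from the patent or the article.

```python
import re


def shingles(text, size=4):
    """Set of overlapping word runs of length `size`."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def change_ratio(old_text, new_text, size=4):
    """0.0 for identical text, 1.0 when no shingle survives."""
    old, new = shingles(old_text, size), shingles(new_text, size)
    if not old and not new:
        return 0.0
    return 1.0 - len(old & new) / len(old | new)


def is_meaningfully_updated(old_text, new_text, threshold=0.3):
    """Treat small churn (a rotating 'newest' box) as no real update."""
    return change_ratio(old_text, new_text) >= threshold
```

Under a scheme like this, a page whose only moving part is a 'latest comments' box would score a tiny change ratio on each crawl and would not look fresh just because something on it changed.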
Personally I think this whole "freshness" idea is misguided. It just doesn't make much sense.
Have you ever asked yourself, Is It Normal?.
The large info/e-commerce sites already have a built-in search engine for their site ALREADY on that site's homepage. Why 'force' Google to do that job for them (via their 'Add URL' page) and clutter up their (already spamdexed) index?
Google is SO spamdexed right now that if you want to find GENUINE product reviews you have to use search terms somewhat like this:
foobarproduct -shipping -checkout -shopping -cart -[name of that big e-commerce site that has 'spamdexed' Google]
Would be great if there was a -https Google option that worked properly. This option would probably be all that is needed to blow all the e-commerce sites away that are on or link to a HTTPS URL. Of course, this would also deep-six legitimate, non e-commerce sites that choose to host their site on a HTTPS URL -- I've come across a few such sites.
It's getting harder and harder to avoid sales pitches when you are trying to use Google to find useful, non-commercial information. My ideas would go a long way to solve the current problem with spamdexing within Google.
Too bad you can't get away from the 'Sponsored Links' on the right hand side of Google's return results -- I just ignore them virtually all the time.
say what? "never has been, based, even in part, on inbound link count."
..um, unless i am insanely high on crack and misreading what you are saying, you are COMPLETELY mistaken. google PR is definitely weighted towards inbound and reciprocal linking, and (to put it into very simple terms) inbound PR is divided amongst various pages linked to from the originating site. If PAGE A links to SITE B and has a PR of 4, and SITE B is the only page linked, then all the weight of that PR 4 gets passed to SITE B (with some sort of modifier, obviously). I am by no means an SEO expert, but i am 99% sure of this fact.
If you have evidence to support your claim (that inbound links do not affect PR) please post it.. because this concept (inbound/reciprocal links = GOOGLE LOVE) is one of the fundamental tenets of 'successful' SEO nowadays (or at least it was the last time I worked a fulltime job that required me to understand SEO to any basic degree.. which wasn't /that/ long ago). Here is a link if you want to do some checking to find someone to support the veracity of your claims.
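For reference, the "PR gets divided among the pages you link to" idea can be sketched in a few lines of Python. This follows the normalized textbook version of the published PageRank formula with the usual 0.85 damping factor. It is not Google's production ranking, which is not public, and the function name and the way dead-end pages are handled are my own choices.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    if not pages:
        return {}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page's rank is split evenly across its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dead-end page: spread its rank over every page.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

With links = {"A": ["B"], "C": ["B", "D"], "B": ["A"], "D": []}, page B collects all of A's share plus half of C's, so it ends up ranked above D, which only gets the other half of C's. That is the inbound-link effect being argued about here.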