Nice description
by
torqer
·
· Score: 3, Informative
In case you were like me and really had no idea what the submitter was talking about in his description...
The link is to an article that gives some insight into how google searches through the hordes and hordes of webpages. And bashes other search engines.
Note to submitter: while brevity may be the soul of wit, try to remember we haven't read the article yet and need just a little more information.
There's a problem with this
by
Tim+Ward
·
· Score: 5, Insightful
Now that Google will find anything you want so easily, isn't there a danger that people will stop putting links to useful and interesting sites on their pages?
I don't need to tell people, via a link, about some wonderful site I've found if they can find it for themselves quicker and easier using Google. So I might not bother to maintain my collections of useful links, and Google will lose its information source. A victim of its own success.
What happens then?
Re:There's a problem with this
by
GigsVT
·
· Score: 4, Interesting
I've thought of this myself. I know I don't do nearly as much "surfing" between related sites now that Google is here and works. I usually hit Google up, then if that site isn't what I want, I don't bother clicking their links section, I just go straight back to Google.
The one thing that may save us though is AOLers. Bear with me here. :) I think that maybe we have found the most efficient way to get the information we want, mostly because the novelty of the Internet has mostly worn off for us. We no longer spend hours bouncing from site to site, just reading random stuff. We use the Internet as a tool to expand our effective knowledge and intelligence.
This is obvious from the various Googlebots that have sprung up in lots of IRC chat rooms. This happens a lot in help rooms: if no one knows the answer, or no one wants to take the time to explain it fully, they just !google and the bot returns the first link in the search.
So while people like us, if we were the only people on the net, would cause Google to fail, so long as there are still "surfers" out there, it should allow Google to remain meaningful.
Just my two cents.
-- I've had enough abrasive sigs. Kittens are cute and fuzzy.
How to abuse Google
by
AftanGustur
·
· Score: 5, Informative
Re:How to abuse Google
by
PeterClark
·
· Score: 3, Informative
Well, this has been known for a long time. But really, it's not as big a deal as one might think. "Scientology" as a search term pulls up an entire page of Scientologist sites, except for #4, which is xenu.net. However, the first page for "Scientology secrets" is full of sites that debunk Scientology. So yes, the Church of Scientology has a virtual monopoly on the search "Scientology" but is far, far from controlling other search terms. It all works out in the end.
:Peter
More Google Links
by
Schwarzchild
·
· Score: 5, Informative
I would go on worrying if i were you
by
limbop
·
· Score: 4, Insightful
Google works on the recursive principle that an important document is one linked to by a lot of important documents. search for "child pornography" and (i'm generalizing here) you're likely to find two kinds of sites: sites offering child pornography and sites opposing it. those will probably create two separate cliques (if you look at the web as a graph) or clusters. It will be quite easy to offer them as two separate lists both satisfying the search query. i believe northern light (http://www.northernlight.com/) does exactly this.
Now how about a similar principle for people? A suspicious person is one who communicates with suspicious people. If you have access to Email messages sent on the internet this is quite easy to achieve. Filter the messages to those mentioning "child pornography" and now do the same analysis as google does. voila! you are left with lists of child pornographers and of internet vigilantes. easy. automatic. you can start worrying again.
btw, if you are looking for an interesting technical description of the best search engine around, the original google article (http://citeseer.nj.nec.com/brin98anatomy.html) by Brin and Page does the job a lot better than Doctorow's.
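For anyone who wants to see the recursive principle in action, here is a minimal sketch of link-based ranking by repeated iteration. It is only an illustration, not Google's actual code: the toy graph, the function name, the damping value of 0.85 and the iteration count are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by links. `links` maps each page to the pages it links to.

    Illustrative sketch only: names and parameter values are assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance
    for _ in range(iterations):
        # every page keeps a small baseline share of importance
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # a page passes its importance on to the pages it links to
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # a page with no outgoing links spreads its rank evenly
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "hub", nobody links to "loner".
toy_web = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "loner": ["hub"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Swap the pages for email addresses and the links for "sent a message to" and the same loop ranks people instead of documents, which is the worrying part.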
A puff piece with poor logic
by
XDG
·
· Score: 4, Insightful
The article boils down more or less to the following:
1. "Old" search technologies (Altavista, Yahoo) failed because they used approaches that found words but not content (Altavista) or relied on non-scalable human editorial judgement (Yahoo).
2. Google works (and is cool) because it uses available information about the number of links to determine (a) valuable content and (b) smart judges of other valuable content
3. The government efforts at creating the Panopticon will fail because they'll be stuck using "old" keyword approaches that can't pick out real content.
This argument is flawed in two key ways:
1. The author confuses the nature of the "search". Web searching is about finding *content* and the challenge is differentiating "good" content from "bad" content. Governmental "security" searching is more akin to traffic analysis and the goal is identifying dangerous *individuals* based on the content and pattern of their traffic. The challenge there is differentiating "good" (safe) speakers from "bad" (dangerous) speakers.
2. The author assumes (based apparently simply on opinion and what is popularly reported in the press) that the government will blindly apply "alta-vista style" techniques. His lack of fear of the Panopticon is based on an assumption of incompetence in the application of surveillance methods. Given the motivation and resources (both of which the government now has in spades), there is no reason to believe that more sophisticated and effective techniques will not be developed and pursued. Assuming Echelon has really been in operation, it's hard to imagine that, in the closed halls of the NSA, researchers aren't well aware of the limitations of keyword search and are far along applying cryptanalytical techniques to the real problem identified above.
It would seem that the author is trying to take advantage of hype and concern about government surveillance not to make a serious comment about it or whether one should truly be concerned, but rather to get an audience for his opinion that Google is really cool, which most of us already knew anyway.
-XDG
Re:A puff piece with poor logic
by
sam_handelman
·
· Score: 3, Insightful
the challenge is differentiating "good" content from "bad" content.... The challenge there is differentiating "good" (safe) speakers from "bad" (dangerous) speakers.
I agree with all else you say - including that the government has the resources to come up with new approaches to the problem - but I don't think that this challenge is really different from distinguishing between good and bad content. In so far as the government is trying to do what it shouldn't even remotely be doing, using this technology to identify subversives, you are right. However, in so far as carnivore might *actually* be used to intercept a criminal communique, I think that the challenge is very similar to what is faced by google.
Suppose that Inoccuous260@hotmail.com only ever sends one message, from some terminal in a public library, and it is the delivery schedule for a nuclear weapon. The best, most morally (if not legally) defensible use of Carnivore would be to intercept this message and hand it over to the Feds. If the Feds can do this, even once, Carnivore will be with us forever, however else it may be abused, b/c you will never rally the public will to end use of such a tool. The problem of identifying that message, and I don't want to brainstorm ideas here, but I'm sure we could come up with several, is very similar to the problem of picking out a biographical sketch of Alan Turing among all the sci-fi and hoopla, which Google can do using characterisation by links, and which the government would be hard-pressed to do without that human resource.
So, the author raises a fair point about the limitations on the "legitimate", let us say intended, use of carnivore. However, the unintended/illegitimate use, simple identification of dissidents, could indeed be carried out by a clever 10 year old, and is plenty worrisome even if Carnivore never does what it was supposedly intended to do.
-- The good and new comes from no quarter where it is looked for, and is always something different from what is expected.
Wrong about email
by
Karellen
·
· Score: 5, Informative
He's wrong about one thing. Email does have links. It has links indicating who it came from and who it went to. Even without the content, that sort of information, about who is talking to whom, and in what patterns, can be really informative to those who know what they're looking for.
If you include the content, it's a goldmine.
URLs embedded in email would make it better again
Aside from that though, great article.
-- Why doesn't the gene pool have a life guard?
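To make the "email does have links" point concrete, here is a rough sketch that builds a who-talks-to-whom table from headers alone, never touching the body. The sample messages and the function name are made up for illustration; only the standard library email parsing calls are real.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parseaddr


def traffic_edges(raw_messages):
    """Count (sender, recipient) pairs using only the From/To/Cc headers."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        sender = parseaddr(msg.get("From", ""))[1].lower()
        recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
        for _name, addr in recipients:
            if sender and addr:
                edges[(sender, addr.lower())] += 1
    return edges


# Hypothetical messages; the addresses are invented for the example.
sample = [
    "From: alice@example.org\nTo: bob@example.org\nSubject: hi\n\nbody one\n",
    "From: alice@example.org\nTo: bob@example.org, carol@example.org\n\nbody two\n",
    "From: Bob <bob@example.org>\nTo: alice@example.org\n\nbody three\n",
]
edges = traffic_edges(sample)
for (sender, recipient), count in sorted(edges.items()):
    print(sender, "->", recipient, count)
```

The resulting pairs are a directed graph just like the web's link graph, so any link-analysis trick can be pointed at it.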
Wrong panopticon
by
dallen
·
· Score: 5, Insightful
Doctorow's point, I believe, is that we have a luxury of choices for searching information, but those who want to wiretap us do not have the luxury of infinite time and infinitely improved ways to find the information they want.
If they could only track us via the public internet, I would probably agree.
I would say we don't know what sort of technology they ultimately have for searching our data; until we know that, we should not assume, as he has, that they're not able to keep up with the flood of data.
Remember that they're not only recording elements of email, phone, and other communications; but they are also tracking who is sending and receiving it; and those who are under "wiretap" are nearly perfectly trackable as long as they can associate an identity to an IP to a person. That is the Panopticon, the prison with ideal surveillance; mapping a person to their communication and selectively watching those who bear suspicion.
incredibly short-term viewpoint
by
AdamBa
·
· Score: 4, Funny
1) Google sucks. All search engines suck right now. Altavista may suck 99% and Google may only suck 97%, but they are all terrible, and will remain so until they can actually start to understand what a page is about. The author may bag on AI, and it is bad now, but it's the only hope for workable search engines in the future.
2) What is this absolute crapola about how bytes are more reliable than allegedly "fragile" books? Does this tubesteak realize that there are 500 year old books that are completely legible, while 15-year-old electronic data is unreadable? Yeesh. The only bright spot is that this guy's ravings are in electronic form, so future generations won't have to worry about them.
This article is insightful? It is deceiving.
I read something interesting about the "Panopticon" not long ago...
"The agency which Poindexter will run is called the Information Awareness Office. You want to know what that is? Think, Big Brother is Watching You. IAO will supply federal officials with 'instant' analysis on what is being written on email and said on phones all over the US. Domestic espionage."
--John Sutherland of UK's Guardian.
Remember John Poindexter? Mr. Iran-Contra? He lied to Congress and kept Ronald out of the loop. He also was responsible for shredding lots of docs on the subject as well. Now he'll be spying on US domestic electronic transmissions.
There is some irony in him destroying thousands of emails to cover his ass then and now being in charge of watching everyone else's emails.
I'm also sure that the billions of dollars for his new office may be able to overcome shortcomings of certain search engines. Nobody's going to have to type all those boolean operators.
Cheers to all the spooks! I think it is a job well done!
-b.
alleged fragility of books
by
AdamBa
·
· Score: 3, Insightful
Maybe 500 was an exaggeration (given that the printing press was about that old)...but there are certainly 300 year-old books that are fine (not having been vacuum-sealed) and 100 year-old books are not even that unusual.
The article (or that part of it) reminds me of the people who claimed that newspapers were going to fall apart and they all needed to be microfilmed and stored that way...now the newspapers that were dumped are in such great shape that The Sharper Image is selling them for $30 a pop, and the microfilms are deteriorating, that is the ones that were made legible to begin with.
Copying bytes may be easy but every time I switch computers I have to worry about moving stuff and where is it stored, then there is 20-year-old stuff on 5 1/4" floppies...meanwhile my books from childhood are all doing great. Even the cheap-o dot-matrix printouts from my BBS days in 1983 are perfectly preserved, which is more than I can say for any data I had from back then.
- adam
Google sometimes defies explanation.....
by
fwc
·
· Score: 3, Interesting
I was talking to a friend about "mystery email attachments", and wanted to find this user friendly strip.
So, without thinking I fire up google and type the search:
"user friendly the comic strip" email attachment
and then clicked on search. The first hit is the cartoon I wanted, so I click on it. When I pull up the page, I realize that the text words "email attachment" don't appear anywhere on the screen other than the graphic text in the comic itself, so google shouldn't have found the page - at least according to how I thought google worked. So I pulled up the source to see if there was a meta tag there which would explain this. Nope.
The only thing I can think of is that google OCRs the pictures (seems scary, and that font which Illiad uses doesn't look very OCR-able). The other thing I thought about is that perhaps google also matches text found within <A> tags which link to that page or something.
I've shot a message off to google to ask about this but I haven't heard back yet. I'll be interested to find out how the *@(#*$ they did this.
I think that I saw an ad somewhere which said "How the @(#$* did they do that?" was the highest praise one web designer could give to another. If that's true, they've definitely earned my praise in this case. Regardless, some wizard at google got their search engine to do exactly what I wanted with whatever technology they used. Technology sufficiently advanced is indistinguishable from magic. And google is definitely magic.
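Here is a small sketch of the second guess, matching on the text inside <A> tags that point at a page. It is a guess at the general idea, not a description of what google really does; the class and function names and the comic.example URLs are invented for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, link text) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":  # the parser lowercases tag names, so <A> matches too
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def build_anchor_index(pages):
    """Index each word of link text under the page the link points TO."""
    index = defaultdict(set)
    for html in pages:
        collector = AnchorCollector()
        collector.feed(html)
        collector.close()
        for href, text in collector.anchors:
            for word in text.lower().split():
                index[word].add(href)
    return index


# Hypothetical pages that link to two comic strips.
linking_pages = [
    '<p>Funny: <A HREF="http://comic.example/strip1">email attachment cartoon</A></p>',
    '<p>See <a href="http://comic.example/strip2">another strip</a></p>',
]
index = build_anchor_index(linking_pages)
hits = index["email"] & index["attachment"]
print(hits)
```

The target page gets found for "email attachment" even though those words only exist as pixels in the comic, because other people described it that way in their links.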
Actually, Google's system can be, and is being, abused.
echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
Undocumented Google Commands
Google Time Bombs
Google Science-Fiction
"sweet dreams are made of this..."
HOWTO get better dates on slashdot
The quote above is from the UK's Guardian... Check out what you might have been missing
An interesting story, curiously not in CNN..
Nor MSNBC...
Couldn't find it in Washington Post..
Article in LA times on his appointment does not describe what he is to do in his new job except to blather about Sputnik and stealth aircraft.
Not in CBC.ca : (