Google Expanding To IRC?
AnimeFreak writes "In this article from The Register, Google has apparently been involved in a bit of activity in various IRC channels. According to Google, when asked by IRC Junkie, they're researching ways to improve their service and the activity is only temporary. Could this mean the ability to search for information contained on IRC? Services such as Netsplit.de and Search IRC already exist, and both let you get information from various IRC networks. Is Google trying to replicate what these sites have done?"
"Search for w4r3z complete. Results 1-10 of eleventy billion:"
--saint
and believe it or not it's called xgoogle.com
Being Asperger's AND pulling chicks... I enjoy the challenge!
The "information" on IRC is 99% crap. I'm concerned that, by integrating IRC searching into Google, the signal-to-noise ratio of Google will go way down. If, however, Google keeps it as a separate service like Usenet, I suspect it will go away due to lack of interest.
Who really wants to search IRC, except the Justice Department?
2005 - Google indexes all the things ever said on soap operas and talk radio.
2007 - Did you forget what you said in your high school cafeteria in 1998? Don't worry, Google now has it indexed.
2010 - Lost your car keys? Don't worry, Google knows. Just do a search and you will find them.
Don't blame Durga. I voted for Centauri.
Well, how do you build a reliable IRC database? There are many servers and bots on IRC, and most of them deal with warez and are therefore only up temporarily. So if Google really wants to build an IRC search engine, they have to find a way to get rid of dead links, and also of links that point to illegal copies (you can be sued for pointing to warez, can't you? See the DeCSS case).
I personally would be glad, since IRC is a little bit, well, unstructured, and a search engine would definitely do some good, but the problems of building a database and an interface on top of it seem enormous to me.
".Sig Stealer" was here
Well, yes and no. xGoogle is designed largely around finding shared files on IRC IIRC (always wanted to do that). As far as I know, it depends not upon channel content, but on server/channel names and perhaps M'sOTD.
-theGreater Pedant.
Well, recalling where I get my "news" (read: 90% useless but funny content via links), IRC (IRCnet, which is popular in Germany) is an incredibly fast way to distribute links.
Assuming that Google is interested in finding new sites as soon as possible, they should crawl the IRC channels.
This does not mean that they are going to index it.
It seems Tony Collen had the original scoop on this story. It is more informative than the Register link.
If you scroll down his original web log on this topic you will see Google's first official acknowledgment of their IRC activity.
Remember... ZG9uJ3QgZm9yZ2V0IHRvIGRyaW5rIHlvdXIgb3ZhbHRpbmU=
...a/s/l?
Ita erat quando hic adveni.
that spam will extend itself to irc?
Thousands if not millions of bogus irc channels with specific keywords inserted in the topic only to attract hits on the main google search page?
Hack your mind out of its sandbox.
XGoogle.ORG -> Error: Cannot Connect to Data Base
Too many connections
Slashdotted already? We slashdotters are more dangerous than a beowulf cluster of... something.
Founder of Mirror Moon - Tsukihime Game Trans
How IRC users would react to a bot from microsoft.com is an exercise left to the reader.
If IRC is anything like it was when I last brushed through, not many will even notice, or attempt to engage the 'bots in "virtual intimate acts".
Of course, there would always be someone, likely a Mac or Linux user, who would notice and scream about how MicroSoft is 'spying' on the IRC network, which in turn would lead to several more or less well-informed blogs writing about it, which in turn would lead to a /. headline close to "Micro$oft trying to take over IRC, will shut out 3rd-party clients"...
Everything in the world is controlled by a small, evil group to which, unfortunately, no one you know belongs.
The IRC admins, at least for most of the better channels, will simply set up a config to kick/ban the Google bot. Many channels don't allow non-human connections unless they were set up by the channel admins. Unlike the annoying spammers who use legit and stolen access points, Google will likely come from a single legitimate source, making the process of denying access easier.
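Since the bot would likely connect from a single source, a ban mask on its host would cover it. A minimal sketch of how such a ban could be checked, assuming standard IRC `nick!user@host` ban masks with `*` and `?` wildcards; the crawler domain `crawler.example.com` and the helper names are hypothetical, and real IRC servers do this matching themselves.

```python
import re


def mask_to_regex(mask: str) -> re.Pattern:
    """Translate an IRC ban mask such as '*!*@*.example.com' into a compiled regex.

    '*' matches any run of characters and '?' matches exactly one; every other
    character is literal, so '[' and ']' in nicknames are not treated as
    character classes (which is why fnmatch is avoided here).
    """
    parts = []
    for ch in mask:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # Simplification: ordinary case-insensitive matching instead of RFC 1459
    # casemapping (where e.g. '[' and '{' are also considered equal).
    return re.compile("".join(parts), re.IGNORECASE)


def is_banned(hostmask: str, ban_masks: list[str]) -> bool:
    """True if a full nick!user@host string matches any of the ban masks."""
    return any(mask_to_regex(m).fullmatch(hostmask) for m in ban_masks)


# Hypothetical ban list: one crawler domain plus one troublesome nick.
BANS = ["*!*@*.crawler.example.com", "l33t!*@*"]

print(is_banned("bot1!crawl@node7.crawler.example.com", BANS))  # True
print(is_banned("alice!alice@home.example.net", BANS))          # False
```

Banning by host rather than by nick is the point here: the bot can change nicks freely, but a single legitimate source keeps the same host suffix.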
Google shouldn't be trying to find more content, they should be working on filtering out the mass of garbage sites that already exist.
With the importance of Google in our everyday lives steadily increasing, I don't dare think of what might happen if Google et al. stop being our good friends at some distant point. Centralized repositories are just not the way to go; we need a distributed search engine owned by its user base. Maybe in the next Matrix movie...
like archiving email, usenet, and web traffic before it - this is simply a reminder that nothing you type through an open network is -private-. this is a lesson most of us should have learned a long time ago.
but this isn't an invasion of privacy. there's no expectation of privacy when you log onto a public chat board. just as there's no expectation of privacy should you decide to walk naked through a park.
the best you can hope for online is pseudonymity.
but that's out the window with the combined power of google, which is quickly becoming the internet's inadvertent Big Brother.
the primary difference being, google works -for- the people just as much as it works -against- the people.
// "Can't clowns and pirates just -try- to get along?"
Bill Gates: Speak.
Neo: The search engine Google has grown beyond your control. You cannot stop him -- but I can.
Bill Gates: And if you fail?
Neo: I won't.
--- several scenes later ---
Google: Mr. Anderson! Welcome back, we missed you.
* Google pauses and looks around at the multitude of web sites and irc channels he has cached
Google: Like what I've done with the place?
Neo: It ends tonight.
Google: I know it does; I've had some researchers figure out the answer for me. That's why the rest of me is just going to enjoy chatting on IRC while we fight. I've seen the logs, and IRC'ers already know that I'm the one who beats you, so they're just gonna download from some leet XDCC bots.
Now we have a new category of stuff to search for other than p0rns. :)
bloodninja: Ok baby, we got to hurry, I don't know how long I can keep it ready for you.
j_gurli3: thats ok. ok i'm a japanese schoolgirl, what r u.
bloodninja: A Rhinocerus. Well, hung like one, thats for sure.
j_gurli3: haha, ok lets go.
j_gurli3: i put my hand through ur hair, and kiss u on the neck.
bloodninja: I stomp the ground, and snort, to alert you that you are in my breeding territory.
j_gurli3: haha, ok, u know that turns me on.
j_gurli3: i start unbuttoning ur shirt.
bloodninja: Rhinoceruses don't wear shirts.
j_gurli3: No, ur not really a Rhinocerus silly, it's just part of the game.
bloodninja: Rhinoceruses don't play games. They f*cking charge your ass.
j_gurli3: stop, cmon be serious.
bloodninja: It doesn't get any more serious than a Rhinocerus about to charge your ass.
bloodninja: I stomp my feet, the dust stirs around my tough skinned feet.
j_gurli3: thats it.
bloodninja: Nostrils flaring, I lower my head. My horn, like some phallic symbol of my potent virility, is the last thing you see as skulls collide and mine remains the victor. You are now a bloody red ragdoll suspended in the air on my mighty horn.
bloodninja: Goddam am I hard now.
(Original post from bash.org)
*Goes into new google IRC search mechanism and searches for term "Warez"*
Result: "Warez" is a very common word and was not included in your search
Mad Hatter
The idea of searchable IRC logs kind of scares me. An investigative team would only need to go to Google to search for discussions by someone with the nickname "l33t".
Of course, IRC logs are already out there, often made available by the denizens in charge of the channel in question. But they're not hooked up to a common database.
The speed of information dissemination is great for research and development, but that applies to both you, and people who want to learn about you.
I've mentioned several times on IRC that I have a brain disorder (Asperger's syndrome, specifically), but I may have been operating under the assumption that the information wasn't important enough to be spread around to twenty or thirty Googleable sites. To be honest, I don't care who knows, which is why I'm saying it here.
For example, I would like to search and browse the chatter on the SUSE acquisition and KDE vs Ximian situation on #gnome @ irc.gimp.org.
If Google could allow me to do that, that would be fantastic.
As an aside, does anyone know of IRC logs for #gnome?
(Please browse at -1 to read this comment.)
The odd thing is that people are reporting the robot joining channels, doing /whois on users and more.
What value could the /whois info from random users have?
The only thing one can safely say about this whole situation is: Google is doing some testing on IRC.
Personally, this is how I look at it:
Google ranks websites according to many criteria, ranging from keyword density, keywords, and text placement on the page to incoming links and what the text within those links says. What use could IRC be? It is possible that topics actively being discussed in real time could be used to boost rankings for subjects that are currently hot, similar to how Google currently gives newly indexed pages a temporary boost. This is, of course, pure speculation.
As others have no doubt already thought, actually posting private user information would be useless: ChatScan, which had several million in funding, couldn't pull it off with the IRC community two years ago. However, using that information to derive popular subjects might work.
OTOH, Google can likely glean similar information from the millions of searches users enter into its search engine each day.
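The speculation about boosting pages on subjects that are hot in real time can be sketched as a sliding-window word counter. This is a hypothetical illustration, not how Google ranks anything: `TrendTracker`, the one-hour window, and the linear `weight` are all assumptions chosen for clarity.

```python
import re
from collections import Counter, deque

WORD_RE = re.compile(r"[a-z0-9']+")


class TrendTracker:
    """Count words seen in channel messages over a sliding time window.

    Hypothetical design: messages must be added in timestamp order.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.events: deque = deque()   # (timestamp, word) pairs, oldest first
        self.counts: Counter = Counter()

    def _expire(self, now: float) -> None:
        # Forget words whose message is at least window_seconds old.
        while self.events and self.events[0][0] <= now - self.window:
            _, word = self.events.popleft()
            self.counts[word] -= 1
            if self.counts[word] == 0:
                del self.counts[word]

    def add_message(self, timestamp: float, text: str) -> None:
        # Count each word once per message, so repeating it within one line doesn't inflate it.
        for word in set(WORD_RE.findall(text.lower())):
            self.events.append((timestamp, word))
            self.counts[word] += 1
        self._expire(timestamp)

    def boost(self, now: float, page_terms: list[str], weight: float = 0.25) -> float:
        """Multiplicative ranking boost: 1.0 plus `weight` per recent mention of the page's terms."""
        self._expire(now)
        return 1.0 + weight * sum(self.counts[t.lower()] for t in page_terms)


tracker = TrendTracker(window_seconds=3600)
tracker.add_message(0, "anyone seen the SUSE news?")
tracker.add_message(60, "suse bought ximian??")
print(tracker.boost(60, ["SUSE"]))    # 1.5
print(tracker.boost(7200, ["SUSE"]))  # 1.0 (both mentions have expired)
```

The window is what makes this "real time": a topic only helps while people are still talking about it, much like the temporary boost the commenter describes for newly indexed pages.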
SearchIRC - Now with live chat directory!
From the information I've seen, Google is capturing URLs in channels, not the actual conversations.
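If the bot really keeps only links, the core step could be as simple as pulling URLs out of each message and queueing the ones not seen before. A rough sketch under that assumption; the regex is deliberately loose, and `extract_urls` and `build_crawl_queue` are hypothetical names, not anything Google has described.

```python
import re

# Rough pattern for http, https and ftp links; stops at whitespace, quotes and angle brackets.
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s<>\"']+", re.IGNORECASE)
# Punctuation that usually belongs to the sentence rather than the link.
TRAILING = ".,;:!?)]}"


def extract_urls(message: str) -> list[str]:
    """Return the links in one channel message, discarding the surrounding chatter."""
    # Limitation: this also strips a legitimate closing ')' at the end of a URL.
    return [m.rstrip(TRAILING) for m in URL_RE.findall(message)]


def build_crawl_queue(lines: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Turn (channel, message) pairs into first-seen (channel, url) pairs, skipping repeats."""
    seen: set[str] = set()
    queue = []
    for channel, message in lines:
        for url in extract_urls(message):
            if url not in seen:
                seen.add(url)
                queue.append((channel, url))
    return queue


chatter = [
    ("#gnome", "lol check http://example.org/news. so funny"),
    ("#linux", "anyone? (see https://example.com/faq)"),
    ("#gnome", "old: http://example.org/news"),
]
print(build_crawl_queue(chatter))
# [('#gnome', 'http://example.org/news'), ('#linux', 'https://example.com/faq')]
```

Nothing but the channel name and the link survives, which fits the idea of using IRC as a fast source of new sites to crawl rather than as content to index.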
I visited the site, but it only indexes channel names, not discussions. E.g., you can't search for: "Cannot open /dev/dsp" +quake 3
I should have mentioned it, but I meant searching past discussions.
I have reviewed several logs of IRC chat rooms, and have not yet seen a good log format. Reading something like:
klax: So what'd you eat for dinner
bryan: Does anyone know how to recompile a kernel?
ray: I had french fries and a beer
Provides little to no structure. Google currently caches PDF files, and if your search returns a PDF file, all your keywords are highlighted. I would imagine Google would use the same approach for its log format, yet even this does not provide a friendly, browsable view. I don't have any recommendation for a proper format, as I have not seen any well-formatted logs.
My Thoughts, Kyndig
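The format question is left open above. Purely as an illustration, one possible approach combines the timestamped `<nick>` layout many IRC clients already log with search-term highlighting and a few lines of context around each hit, so a reader doesn't have to scan every interleaved conversation. `LogLine`, `render`, `search`, the `**term**` markers, and the sample timestamps are all hypothetical choices, not a format Google uses.

```python
import re
from dataclasses import dataclass


@dataclass
class LogLine:
    minute: int  # minutes since midnight (hypothetical timestamp field)
    nick: str
    text: str


def render(line: LogLine, terms: list[str]) -> str:
    """Format a line as '[HH:MM] <nick> text', wrapping each search term in **...**."""
    text = line.text
    terms = [t for t in terms if t]  # an empty term would match everywhere
    if terms:
        # Longest terms first so a longer term wins over a shorter one it contains.
        pattern = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
        text = re.sub(pattern, lambda m: f"**{m.group(0)}**", text, flags=re.IGNORECASE)
    hours, minutes = divmod(line.minute, 60)
    return f"[{hours:02d}:{minutes:02d}] <{line.nick}> {text}"


def search(log: list[LogLine], terms: list[str], context: int = 0) -> list[str]:
    """Render lines containing every term, plus `context` neighboring lines on each side."""
    keep = set()
    for i, line in enumerate(log):
        lowered = line.text.lower()
        if all(t.lower() in lowered for t in terms):
            keep.update(range(max(0, i - context), min(len(log), i + context + 1)))
    return [render(log[i], terms) for i in sorted(keep)]


log = [
    LogLine(1262, "klax", "So what'd you eat for dinner"),
    LogLine(1263, "bryan", "Does anyone know how to recompile a kernel?"),
    LogLine(1263, "ray", "I had french fries and a beer"),
]
for row in search(log, ["kernel"]):
    print(row)
# [21:03] <bryan> Does anyone know how to recompile a **kernel**?
```

With `context=0` only bryan's question comes back; raising `context` pulls in the surrounding lines when the reader wants to see the conversation around a hit.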