Google Expanding To IRC?
AnimeFreak writes "According to this article in The Register, Google has apparently been involved in a little bit of activity in various IRC channels. According to Google, when asked by IRC Junkie, they're researching ways to improve their service and the activity is only temporary. Could this mean an ability to search for information that is contained on IRC? Services such as Netsplit.de and Search IRC exist, and both let you get information from various IRC networks. Is Google trying to replicate what both these sites have done?"
Well, yes and no. xGoogle is designed largely around finding shared files on IRC, IIRC (always wanted to do that). As far as I know, it relies not on channel content but on server and channel names, and perhaps MOTDs.
-theGreater Pedant.
that spam will extend itself to irc?
Thousands, if not millions, of bogus IRC channels with specific keywords inserted in the topic, only to attract hits on the main Google search page?
Hack your mind out of its sandbox.
Well, how do you build up a reliable IRC database?
Have your bots sit in channels worth archiving. Break logs down into manageable chunks (hourly, by size, etc), and index them. Searches pull up these chunks of log with your search terms highlighted.
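A rough sketch of the chunk-and-index idea above, in Python. The timestamped log line format, the sample lines, and the function names are all assumptions made up for illustration; a real bot would write whatever format its IRC client produces.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) log line format: "YYYY-MM-DD HH:MM:SS <nick> message"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2} <([^>]+)> (.*)$")

# Made-up sample data, not a real channel log.
SAMPLE_LOG = [
    "2003-11-05 14:02:11 <bryan> Does anyone know how to recompile a kernel?",
    "2003-11-05 14:03:40 <ray> I had french fries and a beer",
    "2003-11-05 15:10:02 <klax> The kernel docs cover that",
]

def chunk_by_hour(lines):
    """Group log lines into chunks keyed by 'YYYY-MM-DD HH'."""
    chunks = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # silently skip lines that do not fit the assumed format
            hour, nick, text = m.groups()
            chunks[hour].append((nick, text))
    return dict(chunks)

def build_index(chunks):
    """Map each lowercase word to the set of chunk keys that contain it."""
    index = defaultdict(set)
    for key, messages in chunks.items():
        for _nick, text in messages:
            for word in re.findall(r"\w+", text.lower()):
                index[word].add(key)
    return index

def search(term, chunks, index):
    """Return every chunk containing the word, with the term marked *like this*."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    results = {}
    for key in sorted(index.get(term.lower(), ())):
        results[key] = [
            f"<{nick}> " + pattern.sub(lambda m: f"*{m.group(0)}*", text)
            for nick, text in chunks[key]
        ]
    return results
```

Calling `search("kernel", chunks, index)` on the sample returns both hourly chunks, each as a full list of lines with the term highlighted, so the hit is shown in its surrounding conversation rather than as a single isolated line.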
I mean, there are many servers and bots and so on on IRC, and most of them deal with warez and are therefore only up temporarily. So if Google really wants to build an IRC search engine, they have to find a way to get rid of the dead links, and also of links that point to illegal copies.
Ever try searching for warez on Google Groups? Good luck. They don't archive the binary newsgroups, and it is simple to weed out the posts that contain binaries in regular newsgroups.
Google is pretty smart, let's wait and see what they come up with.
Ironically, the word ironically is often used incorrectly.
With the importance of Google in our everyday lives steadily increasing, I don't dare think of what might happen if Google et al. stop being our good friends at some distant point. Centralized repositories are just not the way to go; we need a distributed, user-owned search engine. Maybe in the next Matrix movie...
For example, I would like to search and browse the chatter on the SUSE acquisition and KDE vs Ximian situation on #gnome @ irc.gimp.org.
If Google could allow me to do that, that would be fantastic.
As an aside, does anyone know of IRC logs for #gnome?
(Please browse at -1 to read this comment.)
The advantage of IRC, though, compared to the Web, is that it is more reliable - in a very weak sense, but nonetheless.
Think of Google's PageRank algorithm: it is in great danger of being made useless by link farms.
That is because Google has trouble separating link farms from "real" pages that link to each other and thereby lend each other some trust (PageRank).
With well-populated IRC channels, Google's bots can have more confidence that the channels are not artificial, the way link farms are.
Although you are right that the information found there is crap in most cases, I could imagine Google harvesting known-good help channels (linux-help, etc.) for URLs posted in conversations ("#bla-expert shouts: If you want to know more about bla, go to www.bla-project.org/documentation") in order to better qualify web pages.
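To make the URL-harvesting idea concrete, here is a minimal sketch in Python. Nothing here is known about what Google actually does; the list of trusted channels, the loose URL pattern, and the rule of counting each URL once per nick are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Assumption: a hand-picked set of channels considered trustworthy.
TRUSTED_CHANNELS = {"#linux-help"}

# Deliberately loose pattern: http(s):// links and bare "www." hosts.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s\"'<>)]+", re.IGNORECASE)

def harvest_urls(messages):
    """Count URL mentions in trusted channels, at most once per (nick, url).

    `messages` is an iterable of (channel, nick, text) tuples.
    """
    seen = set()
    counts = Counter()
    for channel, nick, text in messages:
        if channel not in TRUSTED_CHANNELS:
            continue
        for url in URL_RE.findall(text):
            url = url.rstrip(".,;:!?")  # drop trailing sentence punctuation
            if (nick, url) not in seen:
                seen.add((nick, url))
                counts[url] += 1
    return counts
```

Counting once per nick is a crude guard against one person (or bot) repeating the same link; the resulting counts could then serve as one extra trust signal for the linked pages.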
Wouldn't it also be nice for google to have an IRC interface to their search engine?
Google bots in popular channels. It could work.
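For what an IRC front end to a search engine might look like, here is a minimal Python sketch that only handles the message parsing. The `!google` trigger and the reply wording are made up, and instead of fetching results the bot just answers with a search URL; network code is left out entirely.

```python
import re
from urllib.parse import urlencode

# Shape of a message as relayed by an IRC server:
#   :nick!user@host PRIVMSG <target> :<text>
PRIVMSG_RE = re.compile(r"^:([^!\s]+)!\S+ PRIVMSG (\S+) :(.*)$")

TRIGGER = "!google "  # hypothetical bot command

def handle_line(line):
    """Return the raw IRC reply for a '!google <query>' message, else None."""
    m = PRIVMSG_RE.match(line.rstrip("\r\n"))
    if not m:
        return None
    nick, target, text = m.groups()
    if not text.startswith(TRIGGER):
        return None
    query = text[len(TRIGGER):].strip()
    if not query:
        return None
    url = "https://www.google.com/search?" + urlencode({"q": query})
    # Answer in the channel, or privately if the bot was messaged directly.
    reply_to = target if target.startswith(("#", "&")) else nick
    return f"PRIVMSG {reply_to} :{nick}: {url}"
```

A real bot would feed every line read from the server socket through `handle_line` and write any non-None result back, followed by `\r\n`.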
From the information I've seen, Google is capturing URLs in channels, not the actual conversations.
I have reviewed several logs of IRC chat rooms, and have not yet seen a good log format. Reading something like:
klax: So what'd you eat for dinner
bryan: Does anyone know how to recompile a kernel?
ray: I had french fries and a beer
provides little to no format. Google currently caches PDF files, and should your search return a PDF, all your keywords are highlighted. I would imagine that Google would use the same approach for IRC logs, yet even this does not provide a friendly, browsable view. I don't have any recommendation for a proper format, as I have not seen any well-formatted logs.
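Purely as an illustration of one possible direction (not a recommendation of a proper format), the flat "nick: message" lines above could at least be parsed into structured records, which makes them filterable by speaker and easy to serialize. The field names `seq`, `nick`, and `text` are invented for this sketch.

```python
import json

# The three sample lines quoted above.
SAMPLE = [
    "klax: So what'd you eat for dinner",
    "bryan: Does anyone know how to recompile a kernel?",
    "ray: I had french fries and a beer",
]

def parse_log(lines):
    """Turn 'nick: message' lines into dicts with a sequence number."""
    records = []
    for seq, line in enumerate(lines, start=1):
        nick, sep, text = line.partition(": ")
        if not sep:
            continue  # skip lines that do not look like "nick: message"
        records.append({"seq": seq, "nick": nick.strip(), "text": text.strip()})
    return records

def by_nick(records, nick):
    """Return only the records spoken by the given nick."""
    return [r for r in records if r["nick"] == nick]

if __name__ == "__main__":
    # One JSON object per line, so a viewer can filter or group as it likes.
    for record in parse_log(SAMPLE):
        print(json.dumps(record))
```

This does nothing to untangle interleaved conversations, which is the harder problem; it only gives a viewer something better than raw text to work with.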
My Thoughts, Kyndig