Is IRC All Bad?
An anonymous reader writes "IRC is often portrayed by the media as a haven for illegal activity. The author of IRC Hacks set out to find whether or not this was true. His conclusions are quite alarming, suggesting that 99.9% of IRC usage is illegal although he backs up IRC by saying that it is also used for lots of constructive purposes and is used by open source software developers." Update: 01/21 05:17 GMT by P : The author claimed it was merely 99.9% of traffic "to the top 60 channels" that is illegal, not 99.9% of all IRC traffic.
This entire post is like flamebait for some of us.
I've been an oper on DALnet for six years now, and I currently lead up their coding team, so allow me to shed some light on this - assuming this makes me qualified.
The top 60 channels. Who goes to huge channels to chat? Ever tried talking in a channel with 20 active users? Try 800 active users. Nobody goes to large channels to chat; it's pointless to even try. The folks that join these channels join looking for something specific, or to offer something. They find what they are looking for, and move on.
On DALnet, we've taken aggressive action against warez, child porn, and drones. Drones are unfortunately the only item that I can speak on authoritatively - we reject about 300 drones per second on any given server on our network. This is done through pattern matching on their registration. Drones are a serious problem on any network. A while back (five years or so), dianora of EFnet did some drone hunting, and concluded that around 60% of "users" on IRC were actually drones - hacked end-user computers. Drones are a far worse problem than people realize.
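To give a rough idea of what pattern matching on registration can look like, here is a minimal Python sketch. It is not DALnet's actual code: the `Registration` fields, the `DRONE_PATTERNS` list, and both regular expressions are hypothetical, made up purely for illustration. Real drone signatures are network-specific and change constantly.

```python
import re
from dataclasses import dataclass


@dataclass
class Registration:
    """Fields a client supplies when it registers (NICK/USER)."""
    nick: str
    user: str
    realname: str


# Hypothetical signatures, invented for this example only.
DRONE_PATTERNS = [
    # Nick made of exactly four lowercase letters followed by four digits.
    ("nick", re.compile(r"^[a-z]{4}\d{4}$")),
    # Realname that is a single character repeated six or more times.
    ("realname", re.compile(r"^(.)\1{5,}$")),
]


def looks_like_drone(reg: Registration) -> bool:
    """Return True if any registration field matches a known signature."""
    return any(
        pattern.search(getattr(reg, field))
        for field, pattern in DRONE_PATTERNS
    )
```

A server would run a check like this before completing registration and drop the connection on a match. The hard part in practice is keeping the signatures tight enough that real users are not rejected.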
A few years ago, DALnet was seriously DDOS'd - we went from the top network in the world (around 140,000) to next to nothing. Our servers sometimes got hit with DDOS attacks in the range of 60 Gigabits per second. We shut down major providers, rendered entire datacenters useless, and obviously lost servers quickly. We've since changed our routing methods to rely heavily on anycast, and changed a lot of other things.
In my mind, DALnet is one of the networks that actually has one of the lowest noise ratios around. Quakenet, the current leader in usercount, raises questions with me. Their usercount rose very fast, and I wonder about their userbase. I personally know only -one- person who uses Quakenet. If you mention DALnet, Undernet or EFnet, people identify much more readily. Even more people use small IRC networks with 50-500 users.
99.9% for illegal purposes - bullshit. If you go to IRC only to look for warez, then I think you are in the minority. I'd put illegal purposes around 5% at most. And that means real, live people at the keyboard, looking for illegal material.
.
Yes, this is an extreme example of how NOT to conduct a study. He started by choosing the 60 most popular channels - by definition they were not typical. There are 50,000 channels on Undernet alone with an average of about 3 users each. Then he chose 4 keywords that are likely to be used much more for warez than legitimate conversation. The results would have been very different if the channels and keywords had been chosen randomly. Of course, if he had chosen a small number of keywords randomly, the results would probably have been 0.00% illegal traffic, since the vast majority of the words used on IRC don't name products that are pirated. So the approach of examining the relative rates of legal and illegal use of particular keywords is itself flawed even if your choice of keywords isn't. The relative frequency of many different keywords could give some clues in some cases, though there are statistical problems with this. "ROFLMAO" is more likely to be found in legitimate messages whereas "systemworks" is more likely to be found in piracy or spam (though it can occur in many legitimate contexts as well). A Bayesian filter that looked at ALL keywords could have been trained extensively to separate the legal from the illegal traffic, and then used to extend the study over more messages and channels than could be done by hand.
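For anyone curious what such a Bayesian filter might look like, here is a minimal naive Bayes sketch in Python. It is only an illustration of the idea, not anything the study's author used: the class name, the two labels, and the whitespace tokenizer are all assumptions, and a real filter would need thousands of hand-labelled messages to be worth anything.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Tiny two-class naive Bayes over every word in a message."""

    LABELS = ("legal", "illegal")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.msg_counts = Counter()

    def train(self, label, message):
        """Record one hand-labelled message."""
        self.msg_counts[label] += 1
        self.word_counts[label].update(message.lower().split())

    def classify(self, message):
        """Return the label with the higher log-probability."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_msgs = sum(self.msg_counts.values())
        scores = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed class prior, so an untrained class never gives log(0).
            score = math.log(
                (self.msg_counts[label] + 1) / (total_msgs + len(self.LABELS))
            )
            for word in message.lower().split():
                # Add-one (Laplace) smoothing for words unseen in this class.
                score += math.log(
                    (counts[word] + 1) / (total_words + len(vocab) + 1)
                )
            scores[label] = score
        return max(scores, key=scores.get)
```

Usage would be along the lines of calling `train("legal", ...)` and `train("illegal", ...)` on a hand-labelled sample, then running `classify(...)` over the rest of the logs. Note that the filter still only sees words, so it cannot tell who is behind a message.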
And of course, his statistics (or even much better ones) won't tell you if, for example, 37% of the bots offering downloads are run by BSA, RIAA, and MPAA so they can collect IP addresses of pirates and 87% of the download requests are dummy requests they generate to make it look like everyone is doing it (to make it look like it is safe to download so they can entrap people as well as inflating statistics they can trot out later).