Questionable Data Mining Concerns IRC Community
jessekeys writes "Two days ago an article on TechCrunch about IRSeeK revealed to the community that a service logs conversations in public IRC channels and puts them into a publicly searchable database.
What is especially shocking for the community is that the logging bots are very hard to identify. They have human-like nicks, connect via anonymous Tor nodes and authenticate as mIRC clients. IRSeeK never asked for permission and violates the privacy terms of networks and users. A lot of chatters were deeply disturbed finding themselves on the search engine in logs which could date back to 2005.
As a result, Freenode, the largest FOSS IRC network in existence, immediately banned all Tor connections while the community gathered and set up a public wiki page to share knowledge and news about IRSeeK. The demands are clear: remove all existing logs and stop covert operations in our channels and networks.
Right now, the IRSeeK search is unavailable as there are talks taking place with Freenode staff."
IRC is pretty much a shadow of itself from the good old days of perhaps 10 years ago. Does anyone really even bother with it now? Between the scams/spam/abuse, why bother?
And no, I'm not trolling. I was there in the beginning, but watched it degenerate into a virtual cesspool years ago, and got out before it hit rock bottom. Has it improved?
---- Booth was a patriot ----
Our nicks on IRC provide a level of anonymity, and we know that actual people do keep logs of us. Many of our quotes even end up on http://www.bash.org./ I go onto IRC knowing that my conversation is not necessarily private, and if I ever wanted to discuss private details about myself with someone on IRC, I could simply private message him. I could even set up a private room if I had to discuss private matters with a group of people. I don't know why I'd discuss private issues with those on IRC, but some people may for whatever reasons. It's silly to expect privacy on IRC. Never say anything in public that you don't want to come back at you. If anything, just set up a passworded channel if you're planning a violent revolution.
So what exactly makes an IRC network FOSS? Almost all the major networks have been publishing their code since their inception. Given that I've been part of the coding team for DALnet for the last seven years - and publishing Bahamut as GPL the entire time, saying that freenode is the "largest FOSS network"...
As a side note, DALnet has banned tor nodes quite a while ago, because of services abuse coming from those IP addresses.
The three people who still use IRC are going to be *pissed!*
(Last time I used IRC was in an attempt to get support on a particular open source software package. Worst. Support. Ever. In a room with 50+ connected people, seemingly every single one was AFK for a solid 5 minutes. Of course when someone got back, they just told me I was in the wrong IRC room to ask that question, [you know, the one in the product's documentation!] and I was stupid for not knowing it. The other 49 AFK people never said a word, so I kind of wondered why the hell they even bothered to connect. Of course, maybe they were all secret IRC logging bots, heh.)
Comment of the year
"How many times has someone come into a linux channel asking for help when the same question was answered 5 minutes earlier."
If that question is asked as frequently as you make it seem to be, the person asking it could have found the answer with a websearch. The fact that they didn't search the web tells you that they certainly won't use an irc search engine first either.
Couple things here... hiding your mask is quite possible on freenode, and can be done in a few minutes' time upon request. As for irseek on efnet, they are not using tor there as far as I've seen, and they are not attempting to hide their hostname either. I'd say that does point towards the use of tor being an evasion tactic rather than a hostmask hiding tactic, since they haven't attempted to hide hostmasks elsewhere.
Probably not. I strongly doubt they would put the logs on the web.
The Tao of math: The numbers you can count are not the real numbers.
The second type of communication is peer-to-peer. A user sends a message to a specific user. Examples include e-mail, phone communication, and the like.
Anyone can ensure the privacy of peer-to-peer communication. Consider two users who want to exchange e-mail messages. First, the users pick a reliable encryption tool (many are readily available on the Internet) and an encryption key. Then, each user encrypts a message before sending it via e-mail to the other user. Even the NSA will be unable to crack the message (if the users pick a good encryption tool).
Encryption can also be applied to voice communication. The users can use an Internet-phone software application to communicate by voice via the Internet. Each user merely needs to encrypt the data packets before sending them to the other user's computer.
If you believe that someone (e.g., a Russian spy) is wiretapping your regular (mobile or landline) phone, then do voice communication via the Internet. In Russia, most people use cell phones, so they just need to ensure that the phone has a data-communication mode in addition to the regular voice-communication mode. To ensure private communication, the user switches the mode of his phone to data-communication mode and uses his phone as a modem. He plugs the modem into his computer and then runs an Internet-phone software application to communicate via the Internet. The FSB (successor to the KGB) can record the entire session of encrypted Internet packets, but the FSB will be unable to decipher the communication.
USENET used to be similar to IRC, in that it was used for casual, short-lived conversations, with expiration times for articles ranging from days to a few weeks. Post-1977, those articles should be automatically copyrighted and companies should not have a right to repurpose them from their originally intended usage. Well, that didn't stop companies like DejaNews from putting everything up on-line and making it searchable. Now, this company is doing the same thing for IRC.
I'm actually all for the principle that if you put it on the web or in a chat or on the public airwaves, people should be able to copy it, archive it, and redistribute it. However, such a principle needs to be formulated and enforced uniformly; it simply isn't right for some groups to get away with ignoring copyright and others to get charged with copyright infringement.
why is it that people on slashdot still are beating this dead horse? you should have NO expectation of privacy in a public forum. that's what public means. get over yourselves. stop acting like your rights to privacy are being trampled when you make an ass out of yourselves in public.
It seems very silly (at best) to expect "privacy" on a public communications channel, especially when probably a lot of the participants keep their own logs anyway.
Let me tell you my favourite "in Soviet Russia" kind of story. The story of how a handful of Party officials held some hundreds of millions of people in line.
;)
Yes, everyone knows about Stalin's brutal mass executions and deportations. Very distasteful business, that. It also created so much resentment that it was unsustainable in the long run.
So it evolved into something more subtle: the idea that somewhere there's a dossier about you, containing a lot of the stupid things you've said in the past. You don't know exactly what or how much. (After all, they were the non-computer kind.) And you don't know when or how it will bite you in the arse later.
Maybe you can kiss any chance of traveling abroad goodbye. Maybe now your chances of promotion or of finding a better paid job, just became nil. Or maybe you're just this far from having to explain it all to the secret police and, if you're lucky, looking forward to a long career somewhere in Siberia. Or maybe it will bite your kid in the arse, if they can't get you. Etc.
In a nutshell, the idea was that you don't have an expectation of privacy. Anything you say, even nodding approvingly when comrade Piotr swears at the government at the pub, might become permanently attached to you and a factor in which way your future goes.
Worse yet, how do you know if comrade Piotr isn't an agent provocateur, trying to get you to say something you'll regret?
So people learned to think twice before opening their mouth, and avoid saying anything that might be used against them. It turned them into a mass of isolated (and thus vulnerable) individuals, because not many risked saying (or even listening to) anything that could have been the start of an organized resistance.
And now back to the topic, here's what I wonder: why the heck do we allow the same in the West, if it's done by corporate PHB's instead of the Communist Party?
The effects, the way I see it, can be exactly the same: anything you ever say or do is recorded _somewhere_. Be it Google, or such recorder bots, or whatever. And in an age where HR drones routinely google employees and prospective employees, it can come back to bite you in the arse.
And to get even more back on topic: even if you started a private conversation with comrade Piotr, how do you know if he's not just baiting you for something to post on Bash?
Yes, nicks are a privacy tool, but for most people it's not as unbreakable as they think. We already know that most ISPs would give away the owner of an IP address without even asking for a court order. Did you ever register that nick? Because if you did, now the IRC server has information linking that nick to an email address. If you think none can be bullied into giving it away, think twice.
Plus, are you paranoid enough to keep _all_ conversation at the level of "I'm evolvearth, you don't need to know my RL name and telephone number"? Well, kudos if you do, but most people don't. For most, online communication seems to be just an extension of RL communication. (And please don't imagine that said in a condemning tone or anything.)
So basically, all these attempts at recording everything we say or do... will they just turn us into obedient serfs to our corporate overlords? You know, better not say anything that makes you sound like a maladjusted anarchist, because some HR drone will google you. That might be your job you're throwing away there. Better not say anything against the government too, because you don't know when your (current or future) company gets a chance at a government pork-barrel contract that requires a thorough background check. Etc.
Yes, you can password protect channels, do it all in private channels, etc, but I'd say even that might not help you much once enough people have learned to just keep their mouths shut and fear strangers asking about certain matters.
Just some (admittedly pessimistic) stuff to think about, if you're bored enough.
A polar bear is a cartesian bear after a coordinate transform.
But in most states it is illegal to record a conversation unless all parties expressly allow it. The owner of a bar can't just start audio recording at all the tables if they want to (video is OK with NO audio, and audio is allowed in "general" or at a register, but recording individuals is highly unethical and probably illegal, let alone publishing that somewhere). I don't see how IRC is any different from an oral conversation, other than that it's "written" because it's typed on a computer, so that may change the rules.
Communicating through plain text on the internet no longer considered private.
More at eleven.
using System.Awesome;
I don't see how making things opt-in and the bot easily identifiable is a demand to go out of business; it sounds very reasonable to me.
Some channels (particularly support types) will have use for a search bot.
It seems a bit underhanded how they disguised the bots as a human and used tor to hide the activity. Look at the web: the only search engines that try and disguise themselves and which ignore robots.txt belong to spammers. Legitimate search engines obey robots.txt and are easily identifiable by their user agent. They don't disguise themselves as MSIE.
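For anyone who hasn't seen the crawler side of this, here is a minimal sketch of what "obeying robots.txt" under an honest user agent can look like, using Python's standard urllib.robotparser. The bot name ExampleLogBot, the example.org URLs and the rules themselves are made up for illustration; a real crawler would fetch the file from the site it is about to visit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this sketch).
ROBOTS_TXT = """\
User-agent: ExampleLogBot
Disallow: /private/

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved bot announces its real name and asks before fetching.
print(allowed("ExampleLogBot", "http://example.org/private/log.txt"))
print(allowed("ExampleLogBot", "http://example.org/public/faq.html"))
```

The whole scheme only works because the bot identifies itself truthfully; a bot that claims to be MSIE (or mIRC) never matches the rule written for it.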
Oolite: Elite-like game. For Mac, Linux and Windows
FWIW, IRSeeK seems to have had a change of heart, or at least is being receptive to privacy concerns:
http://www.irseek.com/blog/
Sounds like a genuine response of concern to me...
And what has IRC been replaced by to a large extent? ICQ, AIM, Yahoo Chat. Individuals sending messages to one another in isolation via a corporate network which was doing who knows what with all of that. On IRC we had DCC chat - direct chat without any middleman watching. Putting aside encryption (for both), it's the principle and design of the thing - we were allowed privacy, not beholden to some corporation. But more importantly, there was a social context: it was not only individuals messaging one another in isolation, although sometimes it was, but people hanging out in groups of like-minded people. It had a social element that AIM lacks. Yes, I know AIM has some awful group-chat thing (which crashes on GAIM constantly) but it is a small tag-on to the isolating thing that AIM is.
Not that IRC is perfect. Sometimes a bunch of idiots would take over the channel. The architecture of control - channel operators, kicking and banning and the like - those are crude tools and something better could have been (or could still be) engineered. Especially in channels more free-wheeling than #gentoo or the like. But it is far better than the isolation of something like AIM.
Some positive things about IRC - Freenode is good. I like Indymedia's IRC network, if that type of thing is up your alley. I also like some uses it has been put to by programs - Wikipedia sends its recent changes to an IRC channel, and a number of different scripts use it to combat vandalism there. Some Gnutella clients used to use it to bootstrap - as do some other p2p programs like Freenet. All inspired uses of a protocol that is ideally suited for the type of social, collaborative efforts going on there.
This is just a surefire way to cause more chans to go invite only (+i).
You feel sleepy. Close your eyes. The opinions stated above are yours. You cannot imagine why you ever felt otherwise.
There is also TopicSpy, which logs any urls (images, docs, videos, ...) found in IRC topics. While not as invasive as IRSeeK, it can expose urls that were not intended to be public. Beware!
I foresee a simple but costly solution to this--channels that don't use pastebins and let people post segments of code are inevitably going to be not merely archived, but reproduced and published. In something like C, I doubt you could call that copyright infringement for five or six lines--but in a sufficiently expressive language like Perl, Python or APL, I'm pretty sure it would be fair game to register said algorithm and make a claim against the people who automatically copy and publish it without notification.
I mean--I hate to advocate flagrant abuse of copyright--but when their idea of "unobtrusive" basically means getting tor banned, lying about their client, and polymorphic usernames to wholly disguise the presence of a logger--pretty much anything you do to undermine them becomes fair game.
I for one used freenode (in particular) under the impression that I would be logged--but only by private, noncommercial parties who would likely only publish limited portions for clarification. This isn't about legal rights--freenode makes it clear that they don't restrict logging--it's pretty much inevitable with a decent client. But I at least would like to know when I'm being logged for commercial purposes. If they can't at least behave respectfully in this regard, I see no reason to grant them the courtesy of prior notice when they infringe a registered copyright--I'm not required to do that by law either. Decent people would give notice of course...
They've already published a clarification on their site http://www.irseek.com/blog/?p=3 . But what I want to know is--why lie about their client, and mask their origin through tor nodes? What non-malevolent purpose could that possibly have had? Their whole bit about being "unobtrusive" is a load of BS--an extra name in the channel that I can mask on, particularly with the name BOT in it, works fine in every other channel.
Until they can justify their past subversive behavior--any future behavior loses the benefit of doubt with respect to intent. In any channels I run, they're now expressly banned in the topic line.
As an IRC user I dislike IRSeek's business model and practices very much. Discussions on IRC channels are by definition available only to the people who join in, and making any log available without asking is bad etiquette and in most places it is against the terms of use. If we wanted to make our discussions public, we would speak in a Web forum or USENET newsgroup, or we would use our own logging facility and post the logs on our webpages.
People who believe IRC is dead or don't appreciate it are obviously not worthy of being called nerds. IRC is alive and well, and it is very interesting and useful. Remember that there are many IRC servers across the globe and many channels in them, just as there are many USENET newsgroups. If one network or channel is touched by the Eternal September, go to another server and at some point you *will* find interesting people.
What would make you more upset?
1) You walk into someone's office at work and find a list of the funniest quotes by you, that they had remembered from previous conversations.
2) You find out that they have been secretly tape recording every conversation you had with everyone at the office.
Essentially, yes. You've summarized my concerns better than my verbose roundabout style ever could. Thanks.
My only question was just how much such logging bots, "do no evil" Google, etc, just move us closer to... well, slavery. "Do no evil" Google has brought a lot of good, for example, but also brought us the reality where you _will_ be googled by your potential employer, and might suffer the consequences for some dumb thing you've said in freshman year.
Sometimes the road to hell can be paved with good intentions. Sometimes the government is just one of the possible evils.
1. To start with the most important part: If you're a highly qualified expert -- I fancy myself one too -- you have that option. Most people don't. Most jobs involve interchangeable peons. No one will lose any sleep over whether they hired someone uber-qualified to operate the cash register, or just the obedient peon who doesn't rock the boat. In fact, in most cases it can be argued that hiring the latter is the _better_ thing to do.
What I'm getting to is:
A) Most people don't have that option to be defiant. So if saying the wrong thing can spell even one extra month of unemployment, they'd rather say what a potential employer wants to hear.
B) A world where only the top 1% of experts can afford to speak their mind is a world which has lost the battle. A small intelligentsia can be bought, arrested on trumped-up charges, discredited, whatever. Stalin did that too.
If everyone except you is too afraid to even listen to your crusade, you've already lost. You've just become the liability to a totalitarian regime -- either the totalitarian government kind, or the corporate-owned kind -- and they'll find a way to render you harmless.
2. In an ideal world, every employer would be logical like you describe.
In the real world, employers are swamped in resumes, and are just dying for a reason, any reason, no matter how arbitrary or lame, to discard some. Some will just mix them and discard the bottom half of the pile. Some smart and successful people argued that you should discard anyone whose email address you don't like the sound of, or whose picture looks unprofessional, or whatever. At least one corporation is using numerology. Add the numbers for each letter in your name (where A=1, B=2, etc), add the digits of the result, repeat the last step until you have a single digit. If it matches the digit for the company's name, you're eligible; if not, no one will even read your resume. At all. Several corporations use tarot. Literally. Etc.
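To show just how mechanical (and how arbitrary) that numerology filter is, here is a minimal sketch of the procedure as described above. The function names and the sample names are my own invention, and I'm assuming non-letters are simply skipped; this is not any real company's screening code.

```python
def name_digit(name: str) -> int:
    """Reduce a name to one digit: A=1, B=2, ..., then sum digits repeatedly."""
    # Assumption: anything that is not a letter A-Z is ignored.
    total = sum(ord(ch) - ord("A") + 1 for ch in name.upper() if "A" <= ch <= "Z")
    while total >= 10:
        total = sum(int(digit) for digit in str(total))
    return total


def eligible(applicant: str, company: str) -> bool:
    """The 'repeatable criterion': the applicant's digit must match the company's."""
    return name_digit(applicant) == name_digit(company)


# P=16, I=9, O=15, T=20, R=18 gives 78, then 7+8=15, then 1+5=6.
print(name_digit("Piotr"))
print(eligible("Piotr", "ABC"))  # ABC is 1+2+3=6, so the digits match
```

Note that nothing in it looks at qualifications at all, which is exactly the point.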
The only thing that matters is having a repeatable criterion, and one that doesn't fall afoul of discrimination laws. So even if you're not allowed to refuse employing someone because they're black, you can safely refuse to hire them because their name sums up to 3. Or because your HR department found something they dislike when googling them.
So even for the top experts, some will realize that they increase their chances of a better job, if they just keep their mouth shut. Even if it's a slight increase, hey, every bit helps. If keeping your big mouth shut gives you even a 1% chance of landing a better paying / more stable / better quality-of-life / etc job, there will be people who'll gladly take that advantage.
For the replaceable peons I've mentioned before? Doubly so. In fact, make it 10 times so.
A polar bear is a cartesian bear after a coordinate transform.
I think this is the same thing. There's going to be resistance to the idea at first, because people aren't used to it and nobody likes something that works to change. But there's no reason why the change has to be for the worse and not for the better. I think an IRC log service could actually be pretty cool. Sure, there's a lot of stuff that goes on there, that I doubt anyone is going to care about later, but particularly in the technical channels there's a lot of good information given out from time to time. A good, well-known archive might prevent a lot of repetition, and allow users to make sure they're not asking things that get covered all the time.
There's no way to have a communications system where you're just screaming unencrypted bits out into the ether for anyone who wants to listen to them -- which is basically what both IRC and Usenet amount to -- and not let people archive them. There's no technical solution (you can try to keep blocking the logbots, but it's a losing battle if they're determined), and there's no real legal solution either (you could just set the archive up in some country that doesn't care about users' copyrights).
The IRC community has a chance now to embrace this, and in doing so, find some sort of middle ground (like the "X-No-Archive" header) that wouldn't get them into a fight with the people who want an archive that nobody can win.
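For reference, honoring "X-No-Archive" on the archiver's side takes only a few lines. A minimal sketch, assuming articles arrive as raw text with standard headers; the function name and the sample article are hypothetical, and what an IRC equivalent would look like is an open question.

```python
from email import message_from_string


def may_archive(raw_article: str) -> bool:
    """Return False when the poster has set 'X-No-Archive: yes'."""
    message = message_from_string(raw_article)
    # Header names are matched case-insensitively by the email package.
    return message.get("X-No-Archive", "").strip().lower() != "yes"


# Hypothetical article used only for illustration.
article = "From: someone@example.org\nX-No-Archive: yes\n\nPlease don't keep this.\n"
print(may_archive(article))
```

The check is trivial; the hard part is getting archive operators to agree to run it.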
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
The thing is, all the channels that want archives - the software development channels, etc - already run their own which they control and can prune information out of that they don't want public. (In practice, I think a decent proportion of the Freenode-based channels I spend time in have some sort of official public log.) I can't see this going down well at all.