MySpace Predator Caught By Code
An anonymous reader writes, "Wired News editor and former hacker Kevin Poulsen wrote a 1,000-line Perl script that checked MySpace for registered sex offenders. Sifting through the results, he manually confirmed over 700 offenders, including a serial child molester in New York who was actively trying to hook up with underage boys on the site and who has now been arrested as a result. MySpace told Congress last June that it didn't have this capability." Wired News says they will publish Poulsen's code under an open-source license later this week.
Sure, it's easy. Suck down the HTML of the search page. Build a routine that does the HTTP POST, and iterate through each name in the offenders list, using it as the value of the "search by real name" field. Parse the returned HTML for the result count string. When the result count is greater than 0, investigate further. Now, how easy is it for MySpace? I'd say about an order of magnitude easier: they have direct access to the database. Roughly something like: SELECT * FROM userbase WHERE EXISTS (SELECT offenders.realname FROM offenders WHERE offenders.realname like '%'+userbase.realname+'%') Sure, there's a little added complexity for slight spelling variations, but SoundEx and the like can be used for such purposes.
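To make the SoundEx suggestion concrete, here is a minimal sketch in Python of phonetic name matching done in memory rather than in the database. The function names (`soundex`, `name_key`, `candidate_matches`) and the sample names are made up for illustration, and nothing here is taken from Poulsen's script. It only produces candidates; as the summary notes, each hit would still need manual confirmation.

```python
from collections import defaultdict

# Standard American Soundex letter groups.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code for name ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels map to "" and reset prev
        if code and code != prev:
            out.append(code)
        prev = code
    return ("".join(out) + "000")[:4]


def name_key(full_name):
    """Hypothetical matching key: Soundex of the first and last name parts."""
    parts = full_name.split()
    if not parts:
        return None
    return (soundex(parts[0]), soundex(parts[-1]))


def candidate_matches(offenders, users):
    """Return (user, offender) pairs whose names sound alike."""
    index = defaultdict(list)
    for offender in offenders:
        key = name_key(offender)
        if key is not None:
            index[key].append(offender)
    hits = []
    for user in users:
        for offender in index.get(name_key(user), []):
            hits.append((user, offender))
    return hits


if __name__ == "__main__":
    print(candidate_matches(["Jon Smith"], ["John Smyth", "Alice Jones"]))
```

Indexing the offender list once and looking up each user by key avoids comparing every user against every offender, which is the same saving the database would get from an indexed Soundex column.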
Isn't this a breach of privacy and wouldn't this person or MySpace be vulnerable to lawsuits?
Anything you put on a public web site is--by definition--not private. It would be a breach of privacy if MySpace used private, personal information, but if the script just culled information from public pages, there's no breach of privacy.
If you are sifting through private information, then one of the following is true:
- If you are a law enforcement official, anything you discover cannot be used to obtain a warrant, nor can this evidence be used against someone without it being lawfully reacquired once a warrant has been issued.
- If you are a private citizen, unless you violated some sort of Terms of Use or other agreement to obtain the information, it is not illegal for you to use it.
Yes. It is perfectly legal for a private citizen, acting on his or her own volition, to perform searches. The illegality occurs when laws are broken to obtain the information (breach of contract, breaking and entering, etc.).
Doing a bunch of HTTP fetches, parsing and extracting the data -- from sources that were probably never designed to be automatically parsed, and hence have lots of weird exceptions and corner cases -- and then performing string compares, easily adds up to 1000 lines, especially with comments and error messages. The task is trivial in theory but somewhat hairy in practice.
And speaking from unpleasant experience, doing something like this in a language without features dedicated to text parsing (like C++ without the Boost regex library and its Perl-style regular expressions) would take at least three times the lines.
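As a rough illustration of why the parsing step gets hairy, here is a minimal sketch in Python of pulling a result count out of returned HTML. The wording it looks for ("N results") and the sample markup are assumptions made up for this example; the thread does not say what the real search page returned. The function name `parse_result_count` is likewise hypothetical.

```python
import re

# Assumed phrasing of the count string; the real page's wording is unknown.
RESULT_COUNT = re.compile(r"(\d[\d,]*)\s+results?\b", re.IGNORECASE)


def parse_result_count(html):
    """Return the result count found in html, or None if no count is present."""
    # Crude tag stripping so a count wrapped in <b>...</b> still matches.
    text = re.sub(r"<[^>]+>", " ", html)
    match = RESULT_COUNT.search(text)
    if match is None:
        return None  # caller decides whether this means "zero" or "page changed"
    return int(match.group(1).replace(",", ""))


if __name__ == "__main__":
    print(parse_result_count("<div>Showing <b>1,204</b> results for Smith</div>"))
    print(parse_result_count("<p>No matches</p>"))
```

Returning None instead of 0 when nothing matches keeps "no results" distinct from "the page layout changed", which is the kind of corner case that inflates the line count in a real scraper.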