freedom-to-tinker.com · Domains

'Login With Facebook' Data Hijacked By JavaScript Trackers (techcrunch.com)

Developers · Facebook · 2018-04-18 12:03 · posted by BeauHD · from the when-it-rains-it-pours dept. · 91 comments

An anonymous reader quotes a report from TechCrunch: Facebook confirms to TechCrunch that it's investigating a security research report that shows Facebook user data can be grabbed by third-party JavaScript trackers embedded on websites using Login With Facebook. The exploit lets these trackers gather a user's data including name, email address, age range, gender, locale, and profile photo depending on what users originally provided to the website. It's unclear what these trackers do with the data, but many of their parent companies including Tealium, AudienceStream, Lytics, and ProPS sell publisher monetization services based on collected user data. The abusive scripts were found on 434 of the top 1 million websites including freelancer site Fiverr.com, camera seller B&H Photo And Video, and cloud database provider MongoDB. That's according to Steven Englehardt and his colleagues at Freedom To Tinker, which is hosted by Princeton's Center For Information Technology Policy.

Over 400 of the World's Most Popular Websites Record Your Every Keystroke (vice.com)

Yro · Privacy · 2017-11-20 15:30 · posted by BeauHD · from the session-replay-scripts dept. · 263 comments

An anonymous reader quotes a report from Motherboard: The idea of websites tracking users isn't new, but research from Princeton University released last week indicates that online tracking is far more invasive than most users understand. In the first installment of a series titled "No Boundaries," three researchers from Princeton's Center for Information Technology Policy (CITP) explain how third-party scripts that run on many of the world's most popular websites track your every keystroke and then send that information to a third-party server. Some highly-trafficked sites run software that records every time you click and every word you type. If you go to a website, begin to fill out a form, and then abandon it, every letter you entered in is still recorded, according to the researchers' findings. If you accidentally paste something into a form that was copied to your clipboard, it's also recorded. These scripts, or bits of code that websites run, are called "session replay" scripts. Session replay scripts are used by companies to gain insight into how their customers are using their sites and to identify confusing webpages. But the scripts don't just aggregate general statistics, they record and are capable of playing back individual browsing sessions. The scripts don't run on every page, but are often placed on pages where users input sensitive information, like passwords and medical conditions. Most troubling is that the information session replay scripts collect can't "reasonably be expected to be kept anonymous," according to the researchers.

DNA-Based Advertising Redefines Commercial "Ad-Targeting"

Science · Advertising · 2015-09-16 10:16 · posted by samzenpus · from the born-to-like-this dept. · 31 comments

An anonymous reader writes: Hidden among the customary disclaimers about how the website intends to use the information it holds about you, ancestry.com states that it reserves the right to leverage the genotyping tests of users (who have contributed their DNA to AncestryDNA research) in order to serve back 'relevant' advertising via the site. Critics of the clause believe that the site's promise to delete a user's genome on request is devalued both by the possibility of data breaches and by the fact that data brokers and other third parties are both unlikely to honor (or even know about) removal requests, and are likely to improve at leveraging genetic information in the future.

White House Names Ed Felten As Deputy U.S. Chief Technology Officer

News · Usa · 2015-05-11 10:36 · posted by samzenpus · from the putting-a-team-together dept. · 27 comments

New submitter bird writes: Ed Felten, Director of Princeton University's Center for Information Technology Policy (CITP) and well-known and outspoken consumer advocate, has been appointed deputy US chief technology officer. His is a voice of reason that needs to be heard when tech policy is made. The press release says: "We are excited to announce that Dr. Ed Felten is joining the White House Office of Science and Technology Policy as Deputy U.S. Chief Technology Officer. Ed joins a growing number of techies at the White House working to further President Obama’s vision to ensure policy decisions are informed by our best understanding of state-of-the-art technology and innovation, to quickly and efficiently deliver great services for the American people, and to broaden and deepen the American people’s engagement with their government."

Bitcoin (Probably) Isn't Broken

It · Bitcoin · 2013-11-09 10:54 · posted by Unknown · from the cheaters-cheating-on-cheaters dept. · 78 comments

Trailrunner7 writes "In the wake of the publication of a new academic paper that says there is a fundamental flaw in the Bitcoin protocol that could allow a small cartel of participants to become powerful enough that it could take over the mining process and gather a disproportionate amount of the value in the system, researchers are debating the potential value of the attack and whether it's actually practical in the real world. The paper, published this week by researchers at Cornell University, claims that Bitcoin is broken, but critics say there's a foundational flaw in the paper's assertions. ... The idea of a majority of Bitcoin miners joining together to dominate the system isn't new, but the Cornell researchers say that a smaller pool of one third of the miners could achieve the same result, and that once they have, there would be a snowball effect with other miners joining this cartel to increase their own piece of the pie. However, other researchers have taken issue with this analysis, saying that it wouldn't hold together in the real world. 'The most serious flaw, perhaps, is that, contrary to their claims, a coalition of ES-miners [selfish miners] would not be stable, because members of the coalition would have an incentive to cheat on their coalition partners, by using a strategy that I'll call fair-weather mining,' Ed Felten, a professor of computer science and public affairs at Princeton University and director of the Center for Information Technology Policy, wrote in an analysis of the paper."

Ed Felten: Why Email Services Should Be Court-Order Resistant

It · Security · 2013-10-15 18:05 · posted by Soulskill · from the it's-not-a-bug-it's-a-feature dept. · 183 comments

Jah-Wren Ryel sends this excerpt from Ed Felten at Freedom to Tinker: "Commentators on the Lavabit case, including the judge himself, have criticized Lavabit for designing its system in a way that resisted court-ordered access to user data. They ask: If court orders are legitimate, why should we allow engineers to design services that protect users against court-ordered access? The answer is simple but subtle: There are good reasons to protect against insider attacks, and a court order is an insider attack. To see why, consider two companies, which we’ll call Lavabit and Guavabit. At Lavabit, an employee, on receiving a court order, copies user data and gives it to an outside party—in this case, the government. Meanwhile, over at Guavabit, an employee, on receiving a bribe or extortion threat from a drug cartel, copies user data and gives it to an outside party—in this case, the drug cartel. From a purely technological standpoint, these two scenarios are exactly the same: an employee copies user data and gives it to an outside party. Only two things are different: the employee’s motivation, and the destination of the data after it leaves the company."

The Linux Backdoor Attempt of 2003

Linux · Security · 2013-10-09 04:04 · posted by Unknown · from the alright-which-one-of-you-did-it dept. · 360 comments

Hugh Pickens DOT Com writes "Ed Felten writes about an incident, in 2003, in which someone tried to backdoor the Linux kernel. Back in 2003 Linux used BitKeeper to store the master copy of the Linux source code. If a developer wanted to propose a modification to the Linux code, they would submit their proposed change, and it would go through an organized approval process to decide whether the change would be accepted into the master code. But some people didn't like BitKeeper, so a second copy of the source code was kept in CVS. On November 5, 2003, Larry McVoy noticed that there was a code change in the CVS copy that did not have a pointer to a record of approval. Investigation showed that the change had never been approved and, stranger yet, that this change did not appear in the primary BitKeeper repository at all. Further investigation determined that someone had apparently broken in electronically to the CVS server and inserted a small change to wait4: 'if ((options == (__WCLONE|__WALL)) && (current->uid = 0)) ...' A casual reading makes it look like innocuous error-checking code, but a careful reader would notice that, near the end of the first line, it said '= 0' rather than '== 0' so the effect of this code is to give root privileges to any piece of software that called wait4 in a particular way that is supposed to be invalid. In other words it's a classic backdoor. We don't know who it was that made the attempt, and we probably never will. But the attempt didn't work, because the Linux team was careful enough to notice that this code was in the CVS repository without having gone through the normal approval process. 'Could this have been an NSA attack? Maybe. But there were many others who had the skill and motivation to carry out this attack,' writes Felten. 'Unless somebody confesses, or a smoking-gun document turns up, we'll never know.'"
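The difference between '= 0' and '== 0' can be sketched in a few lines of C. This is a simplified illustration, not the kernel code: struct demo_task, the DEMO_* constants, and all function names are hypothetical stand-ins. It shows why the check looks harmless: the assignment evaluates to 0, so the error branch never runs and the caller sees nothing unusual, while uid has quietly become 0.

```c
#include <assert.h>

/* Illustrative stand-ins, NOT the kernel's real definitions. */
#define DEMO_WCLONE 0x80000000u
#define DEMO_WALL   0x40000000u
#define DEMO_EINVAL 22

struct demo_task {
    unsigned int uid;   /* 0 means root */
};

/* Same shape as the inserted check. "=" stores 0 in uid, and the value of
 * the assignment expression is 0, so the && is false and the error
 * branch never runs. The caller sees success, but uid is now 0. */
int check_backdoored(struct demo_task *current, unsigned int options)
{
    int retval = 0;
    if ((options == (DEMO_WCLONE | DEMO_WALL)) && (current->uid = 0))
        retval = -DEMO_EINVAL;
    return retval;
}

/* The comparison a casual reader assumes is there: "==" has no side effect. */
int check_compare_only(const struct demo_task *current, unsigned int options)
{
    int retval = 0;
    if ((options == (DEMO_WCLONE | DEMO_WALL)) && (current->uid == 0))
        retval = -DEMO_EINVAL;
    return retval;
}

/* Walks an unprivileged task through both versions. */
void demo_backdoor(void)
{
    struct demo_task t = { 1000 };

    /* Comparison only: uid is untouched. */
    assert(check_compare_only(&t, DEMO_WCLONE | DEMO_WALL) == 0);
    assert(t.uid == 1000);

    /* Ordinary flags: && short-circuits, so the assignment is never reached. */
    assert(check_backdoored(&t, DEMO_WALL) == 0);
    assert(t.uid == 1000);

    /* The "invalid" flag combination: returns 0, yet uid is silently zeroed. */
    assert(check_backdoored(&t, DEMO_WCLONE | DEMO_WALL) == 0);
    assert(t.uid == 0);
}
```

Calling demo_backdoor() from a main function exercises both versions. Only the one specific flag combination triggers the assignment, which is why ordinary callers would never notice a change in behavior.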

FBI Considers CALEA II: Mandatory Wiretapping On Every Device

Yro · Privacy · 2013-05-18 02:20 · posted by timothy · from the putting-it-gently dept. · 318 comments

Techmeology writes "In response to the declining utility of CALEA-mandated wiretapping backdoors due to more widespread use of cryptography, the FBI is considering a revamped version that would mandate wiretapping facilities in end users' computers and software. Critics have argued that this would be bad for security (PDF), as such systems must be more complex and thus harder to secure. CALEA has also enabled criminals to wiretap conversations by hacking the infrastructure used by the authorities. I wonder how this could ever be implemented in FOSS."

Two Florida Judges Quash Copyright Fishing Lawsuits

Entertainment · Music · 2012-03-29 03:14 · posted by timothy · from the don't-let-the-door-hit-you-on-the-way-out dept. · 17 comments

Fluffeh writes with a piece of good news on the privacy front: "Two rulings in related cases this week have dealt a serious blow to the plaintiffs and their dodgy legal strategy. Ordinarily, copyright law is handled by the federal courts, but Florida plaintiffs have begun using an obscure provision of state law called a 'pure bill of discovery' to attempt to force ISPs to reveal the identity of suspected file-sharers. The rulings, one on Monday and one on Wednesday, saw two different judges siding with the objecting ISPs. 'These back-to-back rulings against the plaintiffs suggest that they're likely to lose any time ISPs raise objections to fishing expeditions against their customers.'"

iFixit's Kyle Wiens On the War On DIY Electronics

Hardware · Hardhack · 2012-03-19 14:15 · posted by Unknown · from the insert-car-analogy-here dept. · 760 comments

pigrabbitbear writes with an excerpt from an article at Motherboard: "Anyone planning on buying a new iPad should know what they're getting themselves into by now. In recent years, Apple and other hardware manufacturers have made it liquid-crystal clear that they're not fond of the idea that customers can tear open and fix products without the help of licensed repair specialists. Even if it's as easy as ordering a part online and following a few instructions gleaned from a Google search, hardware companies generally seem to prefer we keep the hood closed. It should not be surprising, then, that the latest version of Apple's much-desired tablet has one 'killer' feature that's finally getting the attention it deserves: A design that stops you from getting inside of it."

Prof. J. Alex Halderman Tells Us Why Internet-Based Voting Is a Bad Idea (Video)

It · 2012-03-12 00:54 · posted by Roblimo · from the paper-ballots-are-still-the-best dept. · 264 comments

On March 2, 2012, Timothy wrote about University of Michigan Professor J. Alex Halderman and his contention that there is no way to have secure voting over the Internet using current technology. In this video, Alex explains what he meant and tells us about an experiment (that some might call a prank) he and his students did back in 2010, when they (legally) hacked a Washington D.C. online voting pilot project. This is, of course, a "professional driver on closed course; do not attempt" kind of thing. If you mess with voting software without permission, you might suddenly find the FBI coming through your door at 4 a.m., so please don't do it.

Factorable Keys: Twice As Many, But Half As Bad

It · Security · 2012-02-15 04:40 · posted by Unknown · from the keep-on-factoring dept. · 40 comments

J. Alex Halderman and Nadia Heninger write in with an update to yesterday's story on RSA key security: "Yesterday Slashdot posted that RSA keys are 99.8% secure in the real world. We've been working on this concurrently, and as it turns out, the story is a bit more complicated. Those factorable keys are generated by your router and VPN, not bankofamerica.com. The geeky details are pretty nifty: we downloaded all the SSL and SSH keys on the internet in a few days, did some math on 100-million-digit numbers, and ended up with 27,000 private keys. (That's 0.4% of SSL keys in current use.) We posted a long blog post summarizing our findings over at Freedom to Tinker."
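The summary does not spell out the math. One well-known way an RSA modulus becomes factorable is when two moduli share a prime factor, because a greatest-common-divisor computation then reveals that prime without any hard factoring. The toy C sketch below assumes that is the kind of weakness involved; it uses tiny 64-bit numbers in place of real key sizes, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Euclid's algorithm for the greatest common divisor. */
uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* If the two moduli share a nontrivial common factor, store it in *shared,
 * store the remaining cofactors in *q1 and *q2, and return 1.
 * Return 0 when the moduli have no usable common factor. */
int factor_if_shared(uint64_t n1, uint64_t n2,
                     uint64_t *shared, uint64_t *q1, uint64_t *q2)
{
    assert(shared != 0 && q1 != 0 && q2 != 0);

    uint64_t g = gcd_u64(n1, n2);
    if (g == 1 || g == n1 || g == n2)
        return 0;           /* coprime, or one modulus divides the other */

    *shared = g;
    *q1 = n1 / g;
    *q2 = n2 / g;
    return 1;
}
```

For example, 10403 = 101 x 103 and 10807 = 101 x 107 share the prime 101, so factor_if_shared(10403, 10807, ...) recovers 101, 103, and 107, while 10403 and 12317 = 109 x 113 share nothing and the function returns 0. Real moduli are hundreds of digits long and need arbitrary-precision arithmetic, but the principle is the same.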

Carl Malamud Answers: Goading the Government To Make Public Data Public

Yro · Government · 2012-01-23 07:24 · posted by timothy · from the one-man-orchestra dept. · 21 comments

You asked Carl Malamud about his experiences and hopes in the gargantuan project he's undertaken to prod the U.S. government into scanning archived documents, and to make public access (rather than availability only through special dispensation) the default for newly created, timely government data. (Malamud points out that if you have comments on what the government should be focusing on preserving, and how they should go about it, the National Archives would like to read them.) Below find answers with a mix of heartening and disheartening information about how the vast project is progressing.

LoC?
by an Anonymous Reader

So how many GB/TB is a Library of Congress? :)

Or, more seriously, how big are you estimating? Are you using raw scans or some sort of compression (JPG, PNG, etc)? What resolution are you using? Do you vary the resolution depending on the document?

What sort of meta data are you putting in?

CM: The reason John Podesta and I suggested a Federal Scanning Commission in our letter at YesWeScan.Org is we really don't know how big the holdings of the government are. I can tell you that the Library of Congress is about 32 million cataloged books (a significant increase from the 6,487 books Thomas Jefferson donated to get them started). But, this is about more than books, it is about paper records, microfilmed technical papers, video, audio, photographs, and much more.

The scale is fairly vast. The Smithsonian has 137 million objects, including about 13 million images. David Ferriero, the Archivist of the United States, estimates he has over 10 billion pages of text documents, 7.2 million maps, and 40 million photographs including everything from past census records to presidential dinner menus, and that includes about 7.5 million motion pictures and sound recordings. The Government Printing Office distributes their documents to the Federal Depository Library Program, and that includes over 60 million pages of collections including the Official Journals of Government such as the Federal Register. That's just scratching the surface, and we recommended a Federal Scanning Commission to begin the process of understanding what we have (and what is worth digitizing).

As to standards? There are lots of pretty good standards on how to digitize. NARA, Library of Congress, GPO all spec out document scans at 400 dpi, for example. For photographs, moving images, and other objects, there are some pretty good and pretty detailed standards at www.digitizationguidelines.gov. I know Brewster Kahle's operation and my own tend to work off those specifications (in fact Brewster does quite a bit of scanning for the government).

As to compression? Well, I've found people tend to overcompress things. That said, sometimes the initial quality isn't that great, so a 600 dpi uncompressed scan would be silly in some cases. But, for photographs I try very hard to keep the TIFF images around and not rely on JPEG. Likewise, for audio it is really nice to keep a nice 48 kHz version of your file around if you can simply because if you screw up the compression maybe somebody else can do a better job in a few years. Disk space is relatively cheap, so that isn't the barrier it used to be. For video, I rip MPEG2 at whatever it is on a DVD; when I'm actually digitizing, I try to get the video bitrate up to 8-10 Mbps when ripping a Betacam or U-matic. Some people think that is overkill, but I'd rather be safe than sorry.

Metadata? Well, you got to have it or you're not going to get very far when it comes to access. Many librarians have made perfect the enemy of the good when it comes to metadata and have resisted any attempt at digitization because we don't have the very best metadata we might have. I'm more in the camp of scan what you have and get as much of the metadata as you can into it. For example, we have 3,200 1000-page volumes of briefs from the 9th Circuit of the U.S. Court of Appeals. We didn't have good metadata, but we had the Internet Archive scan them anyway. Then, after we got our PDF files, I shipped those off to a double-key team in India and they broke the briefs up into individual documents and typed the metadata into a spreadsheet for me, which we hope to release soon.

My point is that sometimes you can shoehorn the metadata in after the fact or you can use a variety of techniques to pull the metadata out of the documents (e.g., smart OCR). In theory, you can use crowdsourcing to get the metadata, but so far I've not had a lot of luck persuading thousands of people to spend their time doing that kind of work. A captcha is a quick thing to do and is between you and something you want, whereas entering metadata in for videos or documents is one of those civic duty things that everybody thinks everybody else should be doing.

Total size? Brewster says a book is about 400 Mbytes (though he's very quick to point out that you could put the words in all the books in the library into a terabyte and if you're distributing PDFs, you can easily throw 130,000 full-color, searchable PDFs onto a 4 TB drive). But, you were probably asking about raw data. Here's some raw numbers:

32 million books at 400 Mbytes each is 12.8 petabytes
50 million photos at 150 Mbytes each is 7.5 petabytes
10 billion pieces of paper ("records") at 100 Kbytes each is 1 petabyte
20 years of video at 8 Mbps is only 630 Tbytes.

(Somebody check my math?)

If you're talking a decade-long federal digitization initiative, we're looking at well south of 50 petabytes, which seems pretty doable in this day and age!
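The arithmetic above can be checked with a short C sketch. It assumes decimal units (1 Mbyte = 10^6 bytes, 1 petabyte = 10^15 bytes), 365-day years, and that "20 years of video" means 20 years of continuous running time; the struct and function names are illustrative only.

```c
#include <assert.h>

/* All sizes are in bytes, using decimal (power-of-ten) units. */
struct scan_estimate {
    double books;
    double photos;
    double records;
    double video;
    double total;
};

/* Size of a continuous video stream of the given length and bitrate. */
double video_bytes(double years, double megabits_per_second)
{
    assert(years >= 0.0 && megabits_per_second >= 0.0);

    double seconds = years * 365.0 * 24.0 * 3600.0;
    double bytes_per_second = megabits_per_second * 1e6 / 8.0;
    return seconds * bytes_per_second;
}

/* Reproduces the four line items quoted in the answer and their sum. */
struct scan_estimate federal_scan_estimate(void)
{
    struct scan_estimate e;

    e.books   = 32e6 * 400e6;           /* 32 million books, 400 Mbytes each    */
    e.photos  = 50e6 * 150e6;           /* 50 million photos, 150 Mbytes each   */
    e.records = 10e9 * 100e3;           /* 10 billion pages, 100 Kbytes each    */
    e.video   = video_bytes(20.0, 8.0); /* 20 years of video at 8 Mbps          */
    e.total   = e.books + e.photos + e.records + e.video;
    return e;
}
```

Under these assumptions the four figures come out to 12.8, 7.5, 1, and about 0.63 petabytes, for a total of roughly 21.9 petabytes, which is consistent with "well south of 50 petabytes."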

Can the rare books collections be digitized?
by autophile

Three closely related questions about the rare books collections at the Library of Congress:

1. I know there is some kind of effort going on to digitize the rare books collections, but can it be sped up? There are many high-quality low-cost archival book scanners out there (such as the ones developed at diybookscanner.org).

2. It gets really annoying to have to receive paper copies of books when copies are requested. Why not DVDs of high-quality images?

3. Why is there no outreach by the LoC to smaller, cheaper book scanning efforts? The Internet Archive, DIYBookscanner.org, and Decapod all come to mind.

CM: In reverse order. I don't know why we aren't distributing and decentralizing our scanning efforts. The Internet Archive is a heavy-duty production shop and they do an amazing job, as do folks like Google Books and the folks digitizing things at the Mormon Church. But, there are a bunch of DIY solutions and it would be really nice if we could get more people pitching in. The biggest problem on distributing the digitization efforts is quality control. I know when it comes to ripping video, I can easily teach other people how to grab an MPEG2 off a DVD, but when it comes to things like digitizing a Betacam, that takes some training. But, we're all trainable and I wish we could all do more.

Getting back paper copies of books and papers when they're doing a copy anyway is just plain dumb. Likewise with things like FOIA results. John Podesta testified before the Senate about FOIA and said if an agency answers a FOIA request, they should also post their result online so others can see it. That seems pretty obvious.

As far as digitizing rare book collections, there are some amazing pockets throughout the government but there is no real coordination and there certainly is no effort to scan at scale or to come up with a realistic national digitization strategy. That is why we called on the White House to lead the effort. Within the Library of Congress there are some amazing collections, but if you look around to places like the National Agricultural Library or the National Library of Medicine or the libraries in the service academies you'll find lots more. Some have argued that digitizing rare books is silly because the audience is just a few academics, but I can tell you from my own experience helping host the network site for the Archimedes Palimpsest that when you make this kind of information available, there is an amazing long tail.

If you scan it, they will come. And, to answer your question, if we all scan it, they will come much sooner.

Real time legislation drafting
by kerskine

Would it be possible to implement a system that would allow real-time and continuous review of legislation while it's being drafted? Much has been made over the past three years about legislation being available for review before voting by the House or Senate. The final draft for review usually is a huge PDF that makes it nearly impossible for citizens, interest groups, and the media to thoroughly analyze it in time.

CM: You want to see the sausage being made, not just buy the hot dog! I'll comment on the U.S. Congress since that's the system I know best. Thomas is a pretty good system if you happen to be stuck in 1994. It does have all the amendments and the actions and the various stages that legislation goes through. But, it isn't real time, more like "pretty quick." As Van Jacobson once quipped, "Same day service in a nanosecond world." And, Thomas isn't really machine processable, it is final form, usually formatted ASCII text (shades of NROFF!). People like Josh Tauberer who built GovTrack.US have spent considerable time crawling those systems and trying to get the data into regularized formats and make it available to others to reuse via APIs, but that isn't the same as exposing the inner workings of the sausage factory.

Majority Leader Cantor's staff has been pushing a system to make the raw data all available in XML from the Clerk's office and I think that is a very promising initiative which hopefully will bear fruit. (They're having a February 2 conference to discuss their plans if you are interested. I have no idea if it will be streamed for those of us who aren't Inside the Beltway and I don't know their schedule for moving past conferences and into production.)

Congress is a pretty complicated beast. I know some folks like Sean McGrath have had better luck with some of the state legislatures. The problem is you need to dig deep into the inner workings of a legislature. In the Congress, that means you're changing things like authoring tools that are used in the Clerk's office and by all the staff members, so you have to be careful or you get a bunch of really angry Congressmen yelling at you because their staff can't crank out the flavor-of-the-week in the form of a bill or amendment.

There's also a bit of an issue of will. My work with the Congress to put hearings on-line showed that you could take the official transcripts of a hearing and use those to generate closed captions on the video. All you need is the official transcript of the hearing, but in order to get those I had to execute a special Memorandum of Understanding with the House Oversight Committee. Other committees guard their transcripts jealously and won't let them out for a long time. When I started processing a bunch of historical videos we purchased from C-SPAN, I went to the Government Printing Office and found that many committees never deliver their transcripts, even a decade after the fact!

How to keep track of legislative activity about open access?
by oneiros27

Recently in the federal register, there were two calls for comments about access to data and research from federally funded research:

http://federalregister.gov/a/2011-28623
http://federalregister.gov/a/2011-28621

I didn't hear about these until ~4 weeks after the original announcement, and with the holidays, it was too late to try to get the societies I'm involved with to prepare and vote on official statements. Are there any places where people can get/post notices of these sorts of things so that we can stay informed and try to help influence policies?

CM: The Federal Register is getting a lot better now that it is a much more open system. The idea of "Federal Register 2.0" was a paper I wrote for the Obama transition, so it is an issue I've tracked pretty closely and frankly, I've been amazed at how much better it is now. What they did is instead of selling the raw data feed for the Federal Register for $17,000/year, they went from SGML to XML and then released the data in bulk for free. A few guys out in San Francisco were looking for something to do to enter a contest and they took that bulk data and dreamed up GovPulse.US. That was such a better version of the Federal Register that the Office of the Federal Register switched the official site over to their open source platform. My point is the tools are there to do better notification mechanisms, and I'm sure the government would welcome somebody grabbing the GovPulse.US code out of GitHub and making it even better.

That's the technical answer. But, the substantive answer is that there is a huge boatload of stuff in the Federal Register and it is pretty hard to figure out what to pay attention to. I also missed that particular call for comment, and I've even missed several Requests for Information coming out of places I try and pay attention to, like the White House's Office of Science and Technology Policy. And, I do this stuff full-time! Perhaps better targeted notification mechanisms are the answer. Maybe it is a social media solution, where you pay attention to things your friends are paying attention to. I hope the answer is not that the only way to pay attention is to be employed with a beltway bandit which can afford hundreds of minions that do nothing but pay attention to Washington. Indeed, there are some very fancy for-pay services from folks like Congressional Quarterly and Bloomberg that cost an arm and a leg, but I can't help but think there has to be a better way that is also open.

What do you think of corporate partnerships?
by mhh5

I'd like to know what you think about corporate partnerships in the process to get public data released. (I'm not sure if Google Patents existed before the USPTO released its databases.) Do corporations that get involved in the process tend to make the process better without question, or are there tradeoffs in some areas because the corporations always want to help but then try to retain a proprietary version of the data for themselves?

CM: The theory is that the government gets some kind of valuable service (like digitization) that the government wouldn't get otherwise so it is a "win-win." But, the reality is all too often the government gets snookered and what we do is give some corporation exclusive access to some pot of data and the government doesn't get much of anything. The deal between Amazon and the National Archives was a good example of that kind of a private fence around the public domain. With a little help from Boing Boing, I started systematically purchasing those public domain videos and re-releasing them in the wild. I have no problem with Amazon selling public domain video, I just hate it when they get a de facto or a contractual exclusive. (My testimony before Congress on this subject is here.)

There are lots of other examples of government getting snookered. For example, the Government Accountability Office let Thomson West get access to 60 million or so pages of federal legislative histories. At great cost to the government, they were all packed up and dispatched to West which digitized them all and then sent them back to the government. West now sells access to this amazing database. What did the government get for its trouble? A few logins for GAO staffers. Even members of Congress need to pay to access the database! (We have an interesting paper trail on this issue.)

I'm glad you brought up the Google Patent system because I was personally involved in making that happen and I can tell you that this one is totally legit. Jon Orwant is the lead developer on this for Google and I played a small part in helping convince the White House and the Patent Office they ought to give Jon access to their data (the heavy lifting on that deal was by Beth Noveck who was the Deputy CTO at the time). Google makes all the data they got from the Patent Office available for bulk access with no strings attached. I can vouch for that because I did a mirror of their system. Last I heard Google was sending out anywhere from 1 to 10 terabytes of data PER DAY to external sources and even normally very critical folks who work in this arena have been really happy.

The big problem in the Patent Office is their computing infrastructure is a real catastrophe. Their power plant is over 95% capacity (e.g., plug in a computer, bring the building down!) and even though the Under Secretary knew that selling DVD subscriptions was silly, he wasn't able to switch over to an FTP service. He cut the deal with Google Patent and it worked out well for the government, for Google, and for everybody else.

What's the difference between the Google deal and the Amazon deal? In the case of the Amazon and GAO/West deals, the government lawyers did all the negotiating and they were totally outsmarted by some sharks in industry. But, when government has people like Under Secretary Kappos and Beth Noveck doing the negotiating, these things can work out just fine. The key is government should partner with people who want to do public service, not people who want to service the public.

Encouraging Governments?
by theNAM666

In a city such as Nashville, things as basic as business ownership and property records are not available online. In states such as New Jersey, public records such as basic corporate filings (officers, operating address/address for service of process) are accessible only for a fee.

What concrete actions can citizens confronting such situations take to encourage accessibility and accountability?

CM: I find you need a carrot and a stick to make this stuff happen, especially at the local level. Folks like Everyblock.Com and CodeForAmerica.Org have done great work prying some of these databases loose, but there is still lots to do.

The first thing you should do is pick up the phone (or pick up your email client) and write/call the people who run the system. Ask them if you can have access to the data. Sometimes, it is as simple as that.

Other times, though, it isn't quite as simple since they want the money (or they want the control or they think this should be done by "private industry" by which they mean some buddy who is a contractor). The nice thing about any government system is somebody usually has oversight responsibilities. So, the next step is to find a city council member or state legislator who has oversight on the agency in question and ask them.

Again, life isn't usually that simple, but sometimes you win! If you can't get anywhere that way, what I usually end up doing is basically competing with the government system. Build a proxy system like RECAPtheLaw.Org did to recycle paid documents. Or, get a sponsor and buy a reasonable number of docs and build a web site that looks like it is going to be a real production system.

Then, go back again and ask. Maybe if you have eyeballs or at least have a nice web site, that is enough to get the government moving. But, if that doesn't do the job, you may have no choice but to compete with them for real, which of course requires a big commitment in time and energy and not everybody can do that. I know in the case of the Patent Office, I started pestering them in 1993, including several times when I spent 6-figure sums purchasing their data, and it still took until 2011 to crack that nut.

The real trick is focus/obsession. Pick one thing you really care about and just keep pestering them until you crack it open. If you're surfing from one opengov problem to another, showing up for a 1-day hackathon then moving on to something else, you're not going to get anywhere. Pick something real and make it your thing.

Privately Owned, Copyrighted Law
by AdamnSelene

I think I have read that the law itself cannot be copyrighted and it should be possible to make it available to everyone. But as a techie who drafts standards and specifications, I was wondering about how far this goes--especially since Congress recently proposed enacting some of our standards into law. (They decided not to, but they read some parts into the committee records as they debated.) Can you still accomplish your project if a governmental body adopts (or considers adopting) a privately owned, copyrighted technical reference manual or set of safety standards as administrative law (or regulations that carry the force of law)? Or would such obstacles keep you from being able to digitize all of the government's laws (and archives of proposed laws)?

CM: The idea that the law has no copyright is a fundamental part of the American system of government. That applies to states and municipalities as well. The basic decision is Wheaton v. Peters from 1834 but that decision has been reaffirmed over and over. The law is sacred in the American system. You can't have equal protection under the law or due process under the law if there is a poll tax on access to justice.

When we get to privately developed standards, however, it turns into a very interesting issue. The basic mechanism is called Incorporation by Reference. The government will take some external document (such as a model building code) and incorporate the entire text to make it the law of the land. A guy named Peter Veeck was responsible for a landmark decision in 2002 when he published the Texas Building Code which was an incorporation of a privately-developed and very expensive model code. The court ruled that while the model code had copyright, the law of the land did not.

Based on the Veeck decision, my group went and posted many of the public safety codes enacted by the states. We started by purchasing model codes, finding the incorporating legislation, and concatenating the two pieces together and posting the resulting PDFs. More recently, we've done some extensive reworking of the California public safety codes, known as Title 24, converting the entire text into valid XHTML, recoding the graphics as SVG graphics, the formulas as MathML, and regenerating the PDF documents as nicely typeset documents instead of low-quality scans. You can see this work on the web but it is also available as a Google Code project.

The federal government also uses this mechanism intensively, with over 2,000 standards incorporated into the Code of Federal Regulations. This is non-trivial stuff, things like all the OSHA safety regulations. The issue was recently considered by a federal group called the Administrative Conference of the U.S. which basically rolled over and endorsed the idea that it is ok for important parts of the law to cost money. (Read EFF's protest letter if you want a good critique of what they did.)

I'm not necessarily saying that government should be able to appropriate any privately-developed standard and make it available. And, I'm not necessarily saying you want OSHA bureaucrats drafting the standards. But, I do think the big standards establishment and the government regulators have cut a deal that results in the law not being available and the costs forked off on private citizens and small business with extortionate monopoly prices. I just paid $847 for a 48-page safety standard from Underwriters Labs and $60 for a 2-page safety standard from the Society of Automotive Engineers, both of which are mandated by law in the CFR. They do need money to run their operations, but let me just point out that in 2009 the 501(c)(3) nonprofit Underwriters Labs paid their CEO $2,138,984 and the nonprofit SAE paid their CEO $412,578.

Ancestry.com
by An Anonymous Reader

What is your opinion about websites like Ancestry.com which make use of public records and charge a subscription fee for access? What is the incentive for the government to migrate old documents into digital form when services like these exist? Do you think Ancestry.com should be a 501(c)(3)?

CM: I'm not a big fan of for-profit corporations that have a business model of monetizing the public domain. I'm fine if they exist and fine if they make billions of dollars, but if they are the only game in town they've taken something that belongs to all of us and turned it into their private property.

The government got snookered on the Ancestry.Com deal. They could have insisted that the raw data be available in bulk for anybody else to use. The folks that approach the government to cut these sweetheart deals argue that is unreasonable because they need a "return on investment" and they argue that if they don't get the return on investment they won't do the deal (and by extension nobody else will do the deal).

But, government can argue much harder! For example, instead of negotiating some exclusive thing with Ancestry.Com, how come they didn't ask the Internet Archive to grab the data? Or put together something creative with a couple of foundations that would pay for the digitization in return for the kind of payback the foundations like to see (e.g., good press, photo opportunity with the President, or other tools of the trade)?

You asked if Ancestry.Com should be a 501(c)(3)? Not all nonprofits do something that I think should be an essential part of their mission, which is to allow others to compete with them. I believe providing open access to all data ought to be a precondition to getting nonprofit status (an idea that Gil Elbaz has been pushing for quite some time). A good example of a nonprofit that builds walls is Guidestar which wants to be the place where you go for all your nonprofit information. The IRS should be making all Form 990 returns of nonprofits available in bulk for anybody to use, which would knock the bottom out of Guidestar's attempts to build walls and force them to stay innovative and provide value.

Pacer Problems
by onyxruby

How much difficulty do you anticipate in getting and publishing records in Pacer? If there's one system that should be free it is the decisions that our courts make and yet you are charged by the page just to view the results. Are you concerned about a court taking an unkind view on your archiving what is in Pacer?

CM: PACER is an abomination. Do they take a dim view of our efforts? Well, the Administrative Office of the U.S. Courts reacted so strongly to our efforts to make their data available that they called the FBI on Aaron Swartz and cancelled the only meaningful public access system they had, which consisted of one terminal in each of 17 public libraries around the country. In this era of rapidly decreasing costs, they just boosted their access charges from 8 cents a page to 10 cents a page, arguing that this is a bargain compared to 25 cents a page for a copy machine.

What I find so disturbing about PACER is that when we did get 20 million pages of docs, we were able to conduct a comprehensive analysis of privacy violations in the courts, an analysis that led to a nice thank-you letter from the Judicial Conference and changes in their privacy rules. In other words, only when public interest groups got access to the data did we begin to address privacy issues. Public access is not just about pro se prisoners defending themselves from a jail cell, which is the view of many in the Administrative Office of the Courts. Public access is about attempts like ours (and many other folks) to make our system of justice function better. When we say we are "an empire of laws not a nation of men" that means we write down what we are doing in our courts so that it is no longer the arbitrary decisions of individuals. The paper trail is there so we can make sure the system is functioning properly. When you limit that access to those that only have a Gold Card, you pervert democracy and you pervert justice.

This principle that access to justice shouldn't hide behind a cash register goes back to the Greeks. Theseus in Euripides' Suppliants said "when there are no public laws, one man holds power by keeping the law all for himself, and there is no more equality. But when the laws are written, the weak man and the rich man have equal justice." The PACER system is justice for the rich man.

Steve Schultze and the team at Princeton did a lot of the heavy lifting on this issue, including the very nice RECAPtheLaw.Org system they built. They've also done a lot of financial analysis that shows that the courts are not only recovering their costs for operating the expensive PACER system, they're making a huge profit (to the tune of $100 million/year) and using their excess profits to do things like buy big-screen TVs in direct violation of the E-Government act.

The basic problem on PACER is the Judicial Conference has delegated the issue to a few techie judges who think what they've built is something great. But, PACER is a hairball of bad PERL code and the result has not served the judges, the bar, or the American people very well. My only hope is that eventually, the Judicial Conference will see that their information technology is 30 years behind the rest of the Internet and feel ashamed at the travesty they have wrought. Until then, we have RECAP.

If you're interested in the issue, a couple of resources to look at are the PACER paper trail and a bit of a rant that I delivered at the Gov 2.0 summit.

How to visualize opened data?
by hardwarejunkie9

The amount of information you're trying to free is entirely staggering and consists, largely, of tables of numbers. These numbers are incredibly significant, but people generally can't see them.

After you free all of this information and make it available to the public (as it should be), then what? What do you expect for the public to do with these numbers? Tables of information are not nearly as useful as graphs. This data needs to be seen, but, more importantly, it needs to be understood.

Do you have any ideas for how to disseminate this information? Perhaps a team-up with someone like gapminder.org's Hans Rosling might be particularly valuable for all of us.

CM: Actually, most of the data I'm looking at is not tables of numbers, it is video, images, textual documents, technical papers, maps, and books.

But, I definitely get what you're saying and there are a lot of numbers. For example, the IRS Form 990s should be structured data instead of PDF documents, so extracting the data from the mass of paper is the initial challenge. There are lots of other examples of this kind of initial extraction, getting what were printed paper docs into structured data. There are some interesting tools, such as OCRopus which does layout analysis, but there needs to be much more. One of the reasons we called for a Federal Scanning Commission is that we think there is a lot of directed R&D that could not only scale up mass digitization but could also work on the important value-added of extraction of structured data and handling some of the tricky issues like detecting the presence of Social Security Numbers.
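As a rough illustration of the Social Security Number problem mentioned above, here is a minimal Python sketch that flags and masks SSN-shaped strings in extracted text. It is not anything Public.Resource.Org is known to use; the names SSN_PATTERN, find_ssn_candidates, and redact_ssns are made up for this example, and it only handles the dashed NNN-NN-NNNN form. Real scanned documents would need far more (OCR noise, undashed numbers, context checks).

```python
import re

# Hypothetical pattern for this sketch: three digits, two digits, four digits,
# separated by hyphens and not embedded in a longer run of digits.
SSN_PATTERN = re.compile(r"(?<!\d)(\d{3})-(\d{2})-(\d{4})(?!\d)")


def _is_plausible(match):
    """Reject number ranges that are never issued as SSNs."""
    area, group, serial = match.groups()
    # Area numbers 000, 666, and 900-999 are not assigned.
    if area in ("000", "666") or area.startswith("9"):
        return False
    # Group 00 and serial 0000 are not assigned either.
    return group != "00" and serial != "0000"


def find_ssn_candidates(text):
    """Return every SSN-shaped substring that passes the plausibility check."""
    return [m.group(0) for m in SSN_PATTERN.finditer(text) if _is_plausible(m)]


def redact_ssns(text):
    """Mask plausible SSNs and leave everything else untouched."""
    return SSN_PATTERN.sub(
        lambda m: "XXX-XX-XXXX" if _is_plausible(m) else m.group(0), text
    )
```

A pass like this would only be a first filter before human review; it errs toward flagging anything that looks like an SSN.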

Once you have the data, as you say, then what? I'm a big fan of the idea that the government starts by providing bulk data, then they provide an API, and then maybe they also build web sites and apps and other things along with everybody else out there. That's a 3-part hierarchy that Ed Felten and some of his students developed and it should be a law that applies to all government information systems that are externally facing.

The issue here is that all too often people look at a problem like "digitize all government information" and they want to see the whole stack of the solution from one place. But, I think you can do a layered approach and count on the fact that there is always somebody smarter out there and our job is to reduce the barriers to entry. So, how would I visualize the data? I have no idea, but I'd make damned sure that folks like Martin Wattenberg at Many Eyes and Hans Rosling at Gapminder knew the data was out there and then I'd sit back and be amazed at whatever they come up with. How's that for pushing the problem downstream?

Why is data access so hard?
by CanHasDIY

Can you provide any explanation as to why it is so difficult and cost-prohibitive to obtain records from the government, especially considering the abundance of laws requiring government compliance with requests for information (AKA "Sunshine Laws")?

Is it simply a matter of government employee ineptitude, or have you found evidence of a more nefarious rationale?

CM: I get that question a lot. Why would a member of Congress take deliberate steps to stop public hearings from being available? Why would a court administrator deliberately restrict access to public court documents? Usually the answer is, as Heinlein said, "you have attributed conditions to villainy that simply result from stupidity." When I'm explaining why something is so broken on a big government system, my usual answer is that there are a lot of people still stuck in the 1970s and 1980s, when information dissemination was really, really hard and it took men in white lab coats and computers the size of freight trains to process data. In other words, the problem with a lot of folks who are government gatekeepers is they just don't get the Internet and they don't get computers. In fact, usually when some senior bureaucrat is throwing stones at me, you can find younger staffers working for them rolling their eyes.

That's an optimistic view, and if I'm right things will get better. But, I'm often wrong on my predictions of the future. (I was the guy who saw TimBL demo the web in 1992 and thought to myself "interesting, but it won't scale.")

But, there is also some more nefarious stuff happening, often the accumulation of power by being able to cut exclusive deals with contractor buddies. If your life in government consists of receiving emissaries from Lockheed Martin, maybe you think you're making everybody happy by letting them build you a $1 billion computer system. Often, you think your problems are so unique that the $1 billion solution is the only answer.

And, in some cases, as we've seen from numerous GAO reports, Inspector General reports, Congressional hearings, and newspaper articles, there are some really evil people out there who think the public domain and the government is their personal business opportunity. Looting the federal government is the kind of civic crime that ranks right up there in my book with stealing cookies from Girl Scouts and selling fake medicines to sick people.

Who is the worst?
by TheBrez

Which government agency is the worst to get information from?

CM: I don't know who the worst are (there's a lot of competition for that slot), but the ones that piss me off the most are the ones that should know better.

Public.Resource.Org is a really small operation. I'm the only staff member. My part-time sysadmin is @mdkail who is pretty busy with his day job as CIO at NetFlix. My ISP is Jim Martin and his team at ISC who are kind of busy running the F-Root. My office net is supported by the amazing systems team at O'Reilly which rents me office space at below-market rates.

I'll grant you government would have a tough time getting that kind of help. But, I'm a one-man shop and we run the 4th most popular U.S. government video channel on YouTube, we're the source for a lot of the on-line presence of the U.S. Court of Appeals, and we've supported efforts for the U.S. Congress, the White House, and the National Archives. If we can do this out of Northern California, couldn't the vast resources of the federal government in Washington, D.C. do a whole lot better than they're doing now?

For me, my current bête noire is the U.S. Congress. We got half-way through processing their archives of video from congressional hearings, publishing about 31 terabytes of data. Then, a couple of staffers decided this was a bad idea and pulled the rug out from under us. They actually decided it was a bad idea to publish video from public congressional hearings.

Like any agency, Congress is a mixed bag. We had tons of support from Darrell Issa, for example, and ran a very successful pilot project for him for a year. We talked to all sorts of people on committees and in the various agencies that support the Congress. But, at the end of the day, a couple of staff members were able to decide that the public archive shouldn't be public and they terminated our project. (If you have some time, you might like to read our rather surreal paper trail.)

So, rather than the worst, I think we need to look for the most shameful, the ones that have the privilege and the power and could easily do better. I know it is in vogue to throw stones at government in general and Washington in particular, but there are times when government can be so useful and so awe inspiring it takes your breath away. Government can be that shining city on the hill but we all have to take an active part in our government to keep those lights shining bright.

Researchers Debut Proxy-Less Anonymity Service

Yro · Privacy · 2011-07-18 03:32 · posted by CmdrTaco · from the you-can't-see-me dept. · 116 comments

Trailrunner7 writes "As state-level censorship continues to grow in various countries around the globe in response to political dissent and social change, researchers have begun looking for new ways to help Web users get around these restrictions. Now, a group of university researchers has developed an experimental system called Telex that replaces the typical proxy architecture with a scheme that hides the fact that the users are even trying to communicate at all."

DC Suspends Tests of Online Voting System

Yro · Government · 2010-10-05 11:18 · posted by timothy · from the vote-erlich-and-often dept. · 170 comments

Fortran IV writes "Back in June, Washington, DC signed up with the Open Source Digital Foundation to set up an internet voting system for DC residents overseas. The plan was to have the system operational by the November general election. Last week the DC Board of Elections and Ethics opened the system for testing and attracted the attention of students at the University of Michigan, with comical results. The DC Board has postponed implementation of the system for 'more robust testing.'" Update: 10/06 02:42 GMT by T : University of Michigan computer scientist J. Alex Halderman provides an explanation of exactly how the folks at Michigan exploited the DC system.

Lineage II Addiction Lawsuit Makes It Past the EULA

Games · Court · 2010-09-01 19:29 · posted by Soulskill · from the four-little-letters dept. · 267 comments

We recently discussed a man who sued NCsoft for making Lineage II "too addictive" after he spent 20,000 hours over five years playing it. Now, several readers have pointed out that the lawsuit has progressed past its first major hurdle: the EULA. Quoting: "NC Interactive has responded the way most software companies and online services have for more than a decade: it argued that the claims are barred by its end-user license agreement, which in this case capped the company's liability to the amount Smallwood paid in fees over six months prior to his filing his complaint (or thereabouts). One portion of the EULA specifically stated that lawsuits could only be brought in Texas state court in Travis County, where NC Interactive is located. ... But the judge in this case, US District Judge Alan C. Kay, noted that both Texas and Hawaii law bar contract provisions that waive in advance the ability to make gross-negligence claims. He also declined to dismiss Smallwood's claims for negligence, defamation, and negligent infliction of emotional distress."

An iPhone App Store That Apple Doesn't Control

Apple · Iphone · 2010-07-30 06:38 · posted by Soulskill · from the mr-jobs-tear-down-that-wall dept. · 144 comments

waderoush writes "Princeton's Ed Felten has criticized the iPhone and iPad as Disneyland-like 'walled gardens' and says there's no way the iTunes App Store can 'offer the scope and variety of apps that a less controlled environment can provide.' Now there's a central marketplace where developers can sell iPhone-optimized apps without going through Apple's gatekeepers. Launched today, it's called OpenAppMkt and it's a showcase for mobile Web apps — not just the type seen back in 2007-2008, before the advent of the App Store, but also for new games and other apps developed using HTML5/CSS/JavaScript (in some cases, the same apps compiled and sold as native iPhone apps). Xconomy has a behind-the-scenes interview with OpenAppMkt's creators, who say they're not out to compete with the native App Store, but that developers deserve new ways to reach users."

Mozilla Debates Whether To Trust Chinese CA

Yro · Mozilla · 2010-02-17 10:02 · posted by timothy · from the but-that-would-never-happen dept. · 276 comments

At his Freedom to Tinker blog, Ed Felten has a thoughtful, accessible piece on the debate at Mozilla about whether Firefox, by default, should trust a Chinese certificate authority (as it has since October). Felten explains in clear language why this is significant, and therefore controversial. An excerpt: "To see why this is worrisome, let's suppose, just for the sake of argument, that CNNIC were a puppet of the Chinese government. Then CNNIC's status as a trusted CA would give it the technical power to let the Chinese government spy on its citizens' 'secure' web connections. If a Chinese citizen tried to make a secure connection to Gmail, their connection could be directed to an impostor Gmail site run by the Chinese government, and CNNIC could give the impostor a cert saying that the government impostor was the real Gmail site."

DMCA Takedown Scandal, Part Two

Yro · Privacy · 2009-12-20 08:11 · posted by kdawson · from the trying-harder dept. · 153 comments

pmdubs writes "Following up on our earlier discussion, Michael Freedman updates us on experience with dubious DMCA takedown notices. As a result of the publicity his initial post received, the Video Protection Alliance has dropped Nexicon, the company to which they had outsourced infringement detection. In this case, while there may be little legal recourse to issuing invalid DMCA notices, the threat of bad press seems to have reined in highly questionable practices."

Sequoia Disclosing Voting System Source To DC

Yro · Government · 2009-06-07 00:11 · posted by timothy · from the watch-whether-my-pity-meter-twitches dept. · 100 comments

buzzinglikeafridge writes "After Sequoia voting machines registered more votes than there were voters in DC's primaries last September, and the city threatened a lawsuit as a result, the company agreed to disclose technical details of the system (including source code) to the city. Although this isn't the first time the company has disclosed the source code of its machines, it is the first time the machines' blueprints will be handed over as well."

You Are Not a Lawyer

Yro · Court · 2009-02-10 06:42 · posted by kdawson · from the help-in-thinking-like-one dept. · 693 comments

Paul Ohm is starting a new "very occasional" feature on the Freedom To Tinker blog called You Are Not a Lawyer — "In this series, I will try to disabuse computer scientists and other technically minded people of some commonly held misconceptions about the law (and the legal system)." In the first installment, Ohm walks through the reasons why many techies' faith in the presence of "reasonable doubt" is so misplaced. "When techies think about criminal law, and in particular crimes committed online, they tend to fixate on [the 'beyond a reasonable doubt'] legal standard, dreaming up ways people can use technology to inject doubt into the evidence to avoid being convicted. I can't count how many conversations I have had with techies about things like the 'open wireless access point defense,' the 'trojaned computer defense,' the 'NAT-ted firewall defense,' and the 'dynamic IP address defense.' ... People who place stock in these theories and tools are neglecting an important drawback. There are another set of legal standards — the legal standards governing search and seizure — you should worry about long before you ever get to 'beyond a reasonable doubt.'"

Damning Report On Sequoia E-Voting Machine Security

Politics · Security · 2008-10-21 10:15 · posted by kdawson · from the worse-than-you-thought dept. · 200 comments

TechDirt notes the publication of the New Jersey voting machine study, the attempted suppression of which we have been discussing for a while now. The paper that the Princeton and Lehigh University researchers are releasing, as permitted by the Court, is "the same as the Court's redacted version, but with a few introductory paragraphs about the court case, Gusciora v. Corzine." What's new is the release of a 90-minute evidentiary video — the researchers have asked the court for permission to release a shorter version that hits the high points, as the high-res video is about 1 GB in size. See TechDirt's article for the report's executive summary listing eight ways the AVC Advantage 9.00 voting machine can be subverted.

Judge Suppresses Report On Voting Systems

Yro · Court · 2008-10-03 00:20 · posted by kdawson · from the tell-me-but-don't-tell-them dept. · 192 comments

Irvu writes "A New Jersey Superior Court Judge has prohibited the release of an analysis conducted on the Sequoia AVC Advantage voting system. This report arose out of a lawsuit challenging on constitutional grounds the use of these systems. The study was conducted by Andrew Appel on behalf of the plaintiffs, after the judge in the case ordered the company to permit it. That same judge has now withheld it indefinitely from the public record on a verbal order."

CSRF Flaws Found On Major Websites, Including a Bank

It · Security · 2008-09-29 13:58 · posted by kdawson · from the wherever-you-look dept. · 143 comments

An anonymous reader sends a link to DarkReading on the recent announcement by Princeton researchers of four major Web sites on which they found exploitable cross-site request forgery vulnerabilities. The sites are the NYTimes, YouTube, Metafilter, and INGDirect. All but the NYTimes site have patched the hole. "... four major Websites susceptible to the silent-but-deadly cross-site request forgery attack, including one on INGDirect.com's site that would let an attacker transfer money out of a victim's bank account ... Bill Zeller, a PhD candidate at Princeton, says the CSRF bug that he and fellow researcher Edward Felten found on INGDirect.com represents ... 'the first example of a CSRF attack that allows money to be transferred out of a bank account that [we're] aware of.' ... CSRF is little understood in the Web development community, and it is therefore a very common vulnerability on Websites. 'It's basically wherever you look,' says [a security researcher]." Here are Zeller's Freedom to Tinker post and the research paper (PDF).
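For readers unfamiliar with the attack class: a CSRF attack works because the browser attaches the victim's session cookie to a request that a hostile page triggers. One widely used defense is to require a secret, session-bound token in every state-changing request, which a third-party page cannot read or guess. The Python sketch below illustrates that idea only; the summary does not say which fixes the affected sites deployed, and SERVER_KEY, issue_csrf_token, and is_request_allowed are hypothetical names invented for this example.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-deployment secret; a real site would load this from its
# configuration instead of regenerating it on every start.
SERVER_KEY = secrets.token_bytes(32)


def issue_csrf_token(session_id: str) -> str:
    """Derive a token tied to one session, to embed as a hidden form field."""
    return hmac.new(SERVER_KEY, session_id.encode("utf-8"), hashlib.sha256).hexdigest()


def is_request_allowed(session_id: str, submitted_token) -> bool:
    """Accept a state-changing request only if it carries the session's token."""
    if not submitted_token:
        # A forged cross-site request carries the cookie but no token.
        return False
    expected = issue_csrf_token(session_id)
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(expected.encode("utf-8"), submitted_token.encode("utf-8"))
```

In this sketch a money-transfer handler would call is_request_allowed before acting, so a request forged from another site, which arrives with the cookie but without the token, is rejected.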
