Domain: google.com
Stories and comments across the archive that link to google.com.
Stories · 3,747
-
Anonymous Hacks Tunisian Islamist Sites
eldavojohn writes "The hacktivist group Anonymous has claimed another victim by taking down Islamist sites in Tunisia. Similar to an earlier attack on Turkish government sites, #optunisia has resulted in several government blogs and sites being replaced with 'Payback is a b****, isn't it?' The message lists censorship as the motivation behind this activity. The AFP is reporting that this is also in response to the reintroduction of Salafist laws and the caliphate. An additional Anonymous message read, 'We are not against religion, we are Muslims, but we are defending freedom in our country.' Censorship continues wholesale in Tunisia." -
Google Unifies Media, Apps Into Google Play
eldavojohn writes "Google has just announced Google Play to merge their existing solutions for music, movies, books and apps in the new cloud based storage system promising that you will never have to worry about losing or moving them across devices ever again. You'll be able to store 20,000 songs for free. The region breakdown is: 'In the U.S., music, movies, books and Android apps are available in Google Play. In Canada and the U.K., we'll offer movies, books and Android apps; in Australia, books and apps; and in Japan, movies and apps. Everywhere else, Google Play will be the new home for Android apps.'" -
X Server Now Available For Android
New submitter mkwan writes "The open-source X Server for Android has hit beta and is now available for download through the Android Market. On Australian networks at least, smartphones are assigned publicly-accessible IP addresses, so it should be possible to display many Linux applications on an Android smartphone simply by setting the DISPLAY environment variable to the phone's IP address followed by :0" The source is available under the MIT license (or Apache; the project page and story disagree) over at Google Code. It doesn't support all of the X protocol and there's no Xlib implementation (i.e. no X11 apps on the device yet except via the NDK if you're lucky), but it is a reimplementation of the X server in Java for Android. You can run remote applications at least. -
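The DISPLAY setting described in the summary can be sketched roughly as follows. This is a minimal illustration, not code from the project: the helper name `x11_env_for` and the address `203.0.113.7` (a documentation-only IP) are assumptions, and it only builds the environment that a remote X client would be launched with.

```python
import os


def x11_env_for(host, display=0, base_env=None):
    """Return a copy of the environment with DISPLAY pointing at host:display.

    `host` stands in for the phone's IP address; nothing is contacted here.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = f"{host}:{display}"
    return env


if __name__ == "__main__":
    # Hypothetical phone address; a real client would be started with this
    # environment, e.g. subprocess.Popen(["xterm"], env=env).
    env = x11_env_for("203.0.113.7")
    print(env["DISPLAY"])
```

Whether a given client then works depends on the parts of the X protocol the server implements, which the summary says is incomplete.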
Google: Best Adaptation of a Novel To a Patent?
theodp writes "The USPTO's Thursday publication of Google's patent application for Inferring User Interests was nicely-timed, coinciding with what ZDNet called Google's privacy policy doomsday. The inventors include Google Sr. Staff Research Scientist Shumeet Baluja, the author of The Silicon Jungle, a cautionary tale of data mining's promise and peril, which Google's Vint Cerf found 'credible and scary.' No doubt some will feel the same about Baluja's patent filing, which lays out plans for mining 'user generated content, such as user interests, user blogs, postings by the user on her or other users' profiles (e.g., comments in a commentary section of a web page), a user's selection of hosted audio, images, and other files, and demographic information about the user, such as age, gender, address, etc.'" -
Torvalds Calls OpenSUSE Security 'Too Intrusive'
jfruh writes "The balance between security and ease of use is always a tricky one to strike, and Linux distros tend to err on the side of caution. But no less a luminary than Linus Torvalds thinks openSUSE has gone too far. When his kid needed to call from school for the root password just so he could add a printer to a laptop, that's when Linus decided things had gone off the rails." -
MINIX 3.2 Released With Some Major Changes
An anonymous reader writes "MINIX 3.2.0 was released today (alternative announcement). Lots of code has been pulled in from NetBSD, replacing libc, much of the userspace and the bootloader. This should allow much more software to be ported easily (using the pkgsrc infrastructure which was previously adopted) while retaining the microkernel architecture. Also Clang is now used as a default compiler and ELF as the default binary format, which should allow MINIX to be ported to other architectures in the near future (in fact, they are currently looking to hire someone with embedded systems experience to port MINIX to ARM). A live CD is available." The big highlight is the new NetBSD based userland — it replaces the incredibly old fashioned and limited Minix userland. There's even experimental SMP support. Topping it all off, the project switched over to git which would make getting involved in development a bit easier for the casual hacker. -
Candidates Sued By Patent Troll For Using Facebook
WrongSizeGlass writes "Ars is reporting that the 'inventor' of the concept of 'providing individual online presences for each of a plurality of members of a group of members,' claims that four million Facebook business account holders, including at least three major presidential candidates, are guilty of infringing his patent. He's suing Facebook for infringing on his patent as well as the three candidates. A Patent Office examiner rejected the patent claims, but the rejections have been appealed." -
World's First Quadruple Limb Transplant Fails
New submitter smoothjazz writes "The world's first quadruple limb transplant failed, according to Hacettepe University. Doctors had to remove the arms and legs that had been transplanted last Friday onto Sevket Çavdar, 27, because of tissue incompatibility. From the article: 'Doctors had first removed one leg from the patient after his heart and vascular system failed to sustain the limb and then the other leg and two arms. "The science council (of the hospital) decided to remove the organs one by one due to additional metabolic complications in the following process," the hospital said in a statement. "Our patient is now in the intensive care unit. The critical process is still continuing," it added.'" -
YouTube Identifies Birdsong As Copyrighted Music
New submitter eeplox writes "I make nature videos for my YouTube channel, generally in remote wilderness away from any possible source of music. And I purposely avoid using a soundtrack in my videos because of all the horror stories I hear about Rumblefish filing claims against public domain music. But when uploading my latest video, YouTube informed me that I was using Rumblefish's copyrighted content, and so ads would be placed on my video, with the proceeds going to said company. This baffled me. I disputed their claim with YouTube's system — and Rumblefish refuted my dispute, and asserted that: 'All content owners have reviewed your video and confirmed their claims to some or all of its content: Entity: rumblefish; Content Type: Musical Composition.' So I asked some questions, and it appears that the birds singing in the background of my video are Rumblefish's exclusive intellectual property." -
Apple Has Too Much Money
Hugh Pickens writes "AP reports that last week during a question-and-answer session at the company's annual shareholders' meeting CEO Tim Cook said he believes Apple has more money than it needs and his next challenge is to figure out whether Apple should break from the cash-hoarding ways of his predecessor, the late Steve Jobs, and dip into its $98 billion bank account to pay shareholders a dividend this year. 'Frankly speaking, it's more than we need to run the company.' The question of how to handle Apple's cash stockpile is a touchy one, partly because company co-founder Jobs had steadfastly brushed aside suggestions that the company restore its quarterly dividend which Jobs suspended in 1995 when it was in such deep trouble that it needed to hold on to every cent to keep from going bankrupt. Marketwatch analyst Mark Hulbert writes that a compelling case can be made that a huge cash hoard actually represents grave danger for Apple. That's because too much cash often burns a hole in managers' pockets, and they end up doing a poor job of investing that cash—engaging instead in foolish pursuits like empire building. Hulbert adds that a good strategy for ensuring that Apple remains a hungry, growth-oriented entrepreneurial company might be for it to distribute much of its cash to shareholders." -
Women More Likely To Unfriend Than Men
Hugh Pickens writes "AFP reports that a study by the Pew Research Center's Internet and American Life Project shows that women are more likely than men to delete friends from their online social networks like Facebook and tend to choose more restrictive privacy settings. Sixty-seven percent of women who maintain a social networking profile said they have deleted friends compared with 58 percent of men. The study also found that men are nearly twice as likely as women to have posted updates, comments, photos or videos that they later regret (PDF). 'Even as social media users become more active curators of their profile, a small group of what might be described as trigger-happy users say they post updates, comments, photos, or videos that they later regret sharing.'" -
'Culturomics' Spreads From Google Books To Scientific Preprints
ananyo writes "The Cultural Observatory at Harvard University in Cambridge, Massachusetts, is to index the whole of the ArXiv pre-print database of papers from the physical sciences, breaking down the full text of the articles into component phrases to see how often a particular word or phrase appears relative to others, a measure of how 'meme-like' a term is. The team has already applied a similar approach to 5 million books in the Google Books database to produce their n-gram viewer. But the Google Books database carries with it a major limitation: because many of the works are under copyright, users cannot be pointed to the actual source material. Applying the tool to ArXiv means it could be used to chart trends in high-energy physics: a quickening pulse of papers citing the Higgs boson, for example, or a peak in papers about supersymmetry, a theory which may soon be waning." -
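The idea of measuring how often a phrase appears relative to others can be sketched as below. This is an illustrative toy, not the Cultural Observatory's method: the function names, the whitespace tokenizer, and the sample texts are all assumptions.

```python
from collections import Counter


def ngrams(text, n):
    """Split text on whitespace and return its lowercase n-word phrases."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def relative_frequency(documents, phrase):
    """Share of all same-length n-grams in the corpus that equal `phrase`."""
    n = len(phrase.split())
    counts = Counter()
    for doc in documents:
        counts.update(ngrams(doc, n))
    total = sum(counts.values())
    return counts[phrase.lower()] / total if total else 0.0


if __name__ == "__main__":
    # Made-up stand-ins for article text.
    corpus = ["the higgs boson", "higgs boson search"]
    print(relative_frequency(corpus, "higgs boson"))
```

Computing this share separately for each publication year would give the kind of trend line the summary describes.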
Is It Time For NoSQL 2.0?
New submitter rescrv writes "Key-value stores (like Cassandra, Redis and DynamoDB) have been replacing traditional databases in many demanding web applications (e.g. Twitter, Google, Facebook, LinkedIn, and others). But for the most part, the differences between existing NoSQL systems come down to the choice of well-studied implementation techniques; in particular, they all provide a similar API that achieves high performance and scalability by limiting applications to simple operations like GET and PUT. HyperDex, a new key-value store developed at Cornell, stands out in the NoSQL spectrum with its unique design. HyperDex employs a unique multi-dimensional hash function to enable efficient search operations; that is, objects may be retrieved without using the key (PDF) under which they are stored. Other systems employ indexing techniques to enable search, or enumerate all objects in the system. In contrast, HyperDex's design enables applications to retrieve search results directly from servers in the system. The results are impressive. Preliminary benchmark results on the project website show that HyperDex provides significant performance improvements over Cassandra and MongoDB. With its unique design, and impressive performance, it seems fitting to ask: Is HyperDex the start of NoSQL 2.0?" -
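A rough sketch of what a multi-dimensional hash could look like follows. It is a toy under stated assumptions, not HyperDex's implementation or API: the class `HyperspaceIndex`, the bucket count, and the sample records are hypothetical. Each attribute is hashed to one coordinate, so a search on any attribute only needs the cells whose coordinate matches.

```python
import hashlib
from collections import defaultdict


def coord(value, buckets):
    """Deterministically hash one attribute value to a coordinate."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


class HyperspaceIndex:
    """Toy index: every object lives in the cell named by its hashed attributes."""

    def __init__(self, attributes, buckets=4):
        self.attributes = tuple(attributes)
        self.buckets = buckets
        self.cells = defaultdict(list)

    def put(self, obj):
        cell = tuple(coord(obj[a], self.buckets) for a in self.attributes)
        self.cells[cell].append(obj)

    def search(self, **criteria):
        # Hash the searched values, then visit only cells that agree on them.
        wanted = {self.attributes.index(a): coord(v, self.buckets)
                  for a, v in criteria.items()}
        hits = []
        for cell, objs in self.cells.items():
            if all(cell[i] == c for i, c in wanted.items()):
                # Different values can share a coordinate, so re-check exactly.
                hits.extend(o for o in objs
                            if all(o[a] == v for a, v in criteria.items()))
        return hits


if __name__ == "__main__":
    index = HyperspaceIndex(["name", "city"])
    index.put({"name": "alice", "city": "Ithaca"})
    index.put({"name": "bob", "city": "Boston"})
    print(index.search(city="Ithaca"))
```

In a distributed store the cells would be spread across servers, so a search would contact only the servers holding matching cells; this single-process sketch just filters a dictionary.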
Adobe Makes Flash on GNU/Linux Chrome-Only
ekimd writes "Adobe has announced their plans to abandon future updates of their Flash player for Linux. Partnering with Google, after the release of 11.2, 'the Flash Player browser plugin for Linux will only be available via the 'Pepper' API as part of the Google Chrome browser distribution and will no longer be available as a direct download from Adobe.' Viva la HTML 5!" And it appears that Mozilla won't be implementing Pepper anytime soon. -
Have Bad Cars Gone Extinct?
Hugh Pickens writes "AP reports that global competition is squeezing lemons out of the market and forcing automakers to improve the quality and reliability of their vehicles. With few exceptions, cars are so close on reliability that it's getting harder for companies to charge a premium. 'We don't have total clunkers like we used to,' says Dave Sargent, automotive vice president with J.D. Power. In 1998, J.D. Power and Associates found an industry average of 278 problems per 100 vehicles, but this year, the number fell to 132. In 1998, the most reliable car had 92 problems per 100 vehicles, while the least reliable had 517, a gap of 425. This year the gap closed to 284 problems. It wasn't always like this. In the 1990s, Honda and Toyota dominated in quality, especially in the key American market for small and midsize cars. Around 2006, General Motors, Ford, and Chrysler were heading into financial trouble and shifted research dollars from trucks to cars after years of neglect and spent more on engineering and parts to close the gap. Meanwhile Toyota's reputation was tarnished by a series of safety recalls, and Honda played conservative with new models that looked similar to the old ones. Now it's 'very hard to find products that aren't good anymore,' says Jeremy Anwyl, CEO of the Edmunds.com automotive website. 'In safety, performance and quality, the differences just don't have material impact.'" -
Microsoft Accuses Google of Violating Internet Explorer's Privacy Settings
New submitter Dupple writes with a followup to Friday's news that Google was bypassing Safari's privacy settings. Now, Microsoft's Internet Explorer blog has a post accusing Google of doing the same thing (in a different way) to Internet Explorer. Quoting: "By default, IE blocks third-party cookies unless the site presents a P3P Compact Policy Statement indicating how the site will use the cookie and that the site’s use does not include tracking the user. Google’s P3P policy causes Internet Explorer to accept Google’s cookies even though the policy does not state Google’s intent. P3P, an official recommendation of the W3C Web standards body, is a Web technology that all browsers and sites can support. Sites use P3P to describe how they intend to use cookies and user information. By supporting P3P, browsers can block or allow cookies to honor user privacy preferences with respect to the site’s stated intentions. ... Technically, Google utilizes a nuance in the P3P specification that has the effect of bypassing user preferences about cookies. The P3P specification (in an attempt to leave room for future advances in privacy policies) states that browsers should ignore any undefined policies they encounter. Google sends a P3P policy that fails to inform the browser about Google’s use of cookies and user information. Google’s P3P policy is actually a statement that it is not a P3P policy." -
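The loophole described in the quote can be illustrated with a deliberately simplified model. This is not Internet Explorer's actual logic: the token sets are a small assumed subset of the P3P compact-policy vocabulary, the choice of which tokens count as unsatisfactory is hypothetical, and the policy string is a stand-in for a header that contains no recognized tokens.

```python
# Assumed subset of P3P compact-policy tokens; the real vocabulary is larger.
KNOWN_TOKENS = {"NOI", "DSP", "COR", "CUR", "ADM", "DEV",
                "PSA", "PSD", "IVA", "IVD", "OUR", "NAV"}

# Hypothetical choice of tokens this sketch treats as declaring tracking.
UNSATISFACTORY_TOKENS = {"IVA", "IVD"}


def recognized_tokens(compact_policy):
    """Keep only tokens the parser knows; undefined ones are ignored."""
    return [t for t in compact_policy.split() if t in KNOWN_TOKENS]


def accepts_third_party_cookie(compact_policy):
    """Simplified decision: block if no policy, or if a known token objects."""
    if compact_policy is None:
        return False  # no compact policy at all: blocked by default
    tokens = recognized_tokens(compact_policy)
    return not any(t in UNSATISFACTORY_TOKENS for t in tokens)


if __name__ == "__main__":
    print(accepts_third_party_cookie(None))
    print(accepts_third_party_cookie("NOI DSP IVA"))
    # Every word is undefined, so nothing objects and the cookie is accepted.
    print(accepts_third_party_cookie("This is not a P3P policy"))
```

The third call shows the gap: a header made entirely of undefined words leaves the parser with nothing to object to.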
Google Working On Password Generator For Chrome
Trailrunner7 writes "Google is in the process of developing a tool to help users generate strong passwords for the various and sundry Web sites for which they need to register and authenticate. The password-generator is meant to serve as an interim solution for users while Google and other companies continue to work on widespread deployment of the OpenID standard. The tool Google engineers are working on is a fairly simple one. For people who are using the Chrome browser, whenever a site presents them with a field that requires creating a password, Chrome will display a small key icon, letting the users know that they could allow Chrome to generate a password for them." -
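For a sense of what generating a strong password involves, here is a minimal sketch using Python's standard `secrets` module. It is not Chrome's algorithm, which the summary does not describe; the default length and the requirement of one character from each class are assumptions.

```python
import secrets
import string


def generate_password(length=16):
    """Return a random password with a lowercase, uppercase, digit and symbol."""
    if length < 4:
        raise ValueError("length must be at least 4 to fit every character class")
    alphabet = string.ascii_letters + string.digits + string.punctuation
    while True:
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        # Retry until every class is present; this rarely takes more than a few tries.
        if (any(c in string.ascii_lowercase for c in candidate)
                and any(c in string.ascii_uppercase for c in candidate)
                and any(c in string.digits for c in candidate)
                and any(c in string.punctuation for c in candidate)):
            return candidate


if __name__ == "__main__":
    print(generate_password())
```

`secrets` draws from the operating system's cryptographic random source, which is the usual choice over `random` for credentials.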
EU Court Rules Social Networks Cannot Be Forced To Police Downloads
arnodf writes "According to EU Observer, 'The European Court of Justice (ECJ) has struck the latest blow in the debate over internet policing, ruling on Thursday (16 February) that online social network sites cannot be forced to construct measures to prevent users from downloading songs illegally. The court, which is the highest judicial authority in the EU, stated that installing general filters would infringe on the freedom to conduct business and on data privacy. ... The case was brought before the ECJ by Sabam, the Belgian national music royalty collecting society, against social network site Netlog. In 2009, Sabam went to the Belgian Court of First Instance to demand that Netlog take action to prevent site-users from illegally downloading songs from its portfolio. It also insisted that Netlog pay a €1,000 fine for every day of delaying in compliance. Netlog legal submission argued that granting Sabam's injunction would be imposing a general obligation to monitor on Netlog, which is prohibited by the e-commerce directive.' In related news, Sabam is going to be prosecuted (Google translation of Dutch original) for 'forging accounts, abuse of trust, bribery, money laundering and forgery,' which took place from the early 90's till 2007." -
Yet Another European Government Drops ACTA
An anonymous reader writes "The government of Bulgaria, which had already signed ACTA, yesterday reversed itself, and announced that it would not seek ratification of the treaty. This comes after similar moves by Poland, Germany and the Netherlands, and a weekend of massive protests against ACTA across the European continent." -
"Liberated" Tunisia Still Censoring Websites
Frequent Slashdot contributor Bennett Haselton writes "Tunisia's high court will decide on Wednesday whether to allow censoring of websites containing pornography or 'calls to violence.' It's disappointing that censorship continues in post-revolutionary Tunisia, but it's enough of an improvement over the old regime, that anti-censorship cyber-activism efforts would probably best be spent on helping other countries." Read on for Bennett's analysis.
In Tunisia, where dictator Zine El Abidine Ben Ali was ousted one year ago amid hopes for a new era of freedom, the high court will decide on Wednesday whether to censor foreign pornographic websites in accordance with local law. Facebook pages that "call for violence" may also be blocked. Conveniently, all the machinery for censoring the Internet in Tunisia is already in place, having been installed under Ben Ali's dictatorship for the purposes of censoring and spying on Tunisian citizens (and, for a while, phishing their Facebook passwords). The irony recalls the situation in Iraq in 2009, when the government announced plans to start censoring foreign websites -- to which Iraqi citizens complained that they thought censorship would end with the fall of Saddam's regime. Actually, apart from the three outlier countries of Turkey, Israel and Lebanon, pornography remains illegal in every Middle Eastern country (and some conservative African nations), including recently "liberated" ones such as Egypt, Iraq and Tunisia. (Although, Iraq's street market in pornography thrives as long as the police have better things to do.)
I'm against such censorship in principle -- I think that even the right to publish and access pornography counts as a fundamental human right. But I think we have to take what progress we can get, and censoring just pornography and calls to violence, is a big improvement over censoring pornography and dissident political speech, which is the norm in most non-"liberated" Middle Eastern countries like Syria, Iran, and Saudi Arabia. Syria blocks foreign opposition sites like All4Syria.info, Iran blocks Facebook and YouTube to keep dissidents from posting or viewing anti-government material, and Saudi Arabia blocks Reporters Without Borders and filters the Amnesty International report on human rights in Saudi Arabia (but not the rest of the Amnesty International site!).
Saudi Arabia blocking the Amnesty International report on human rights in their country (while leaving the rest of the site unblocked), in particular, seems like the kind of thing that a government would do more as a "fuck you" to human rights activists, than a means to achieve a practical goal. For one thing, most of the facts in the human rights report about Saudi Arabia -- about sex discrimination and lack of political and religious freedom -- are already well known to the people who live there. And secondly, what percent of the citizens of a country would ever read the Amnesty International report on human rights in that country, even if it were not blocked? How many Americans even know that Amnesty puts out an annual report about human rights violations in the United States? So it seems more like a symbolic move to remind everyone who's in charge. For all the disappointment in the lack of progress for free speech in post-"liberation" countries, the non-"liberated" ones are indeed worse.
As for the Tunisian proposal to censor "calls to violence", I wouldn't always be against that, even in principle. In most countries, direct incitements to violence can be considered illegal (it depends on what you say and, of course, on what judge you get). In a developing country rife with ethnic tensions, even greater restrictions on calls to violence could be justified. When you finally watched Hotel Rwanda, weren't you hoping someone would bust in on that radio DJ telling everyone to kill Tutsis in the middle of a civil war, and blow him to hell? The biggest problem with a rule against "calls to violence" is that the government could stretch the definition to silence political speech. But it's possible to keep that kind of abuse in check, as has mostly been achieved in the U.S. For that, what you need is an independent judiciary, not an abolishment of all rules against calls to violence.
So the free-speech situation in "liberated" Tunisia may be nothing to write home about, but it sounds much better than it used to be, when writing home to complain about it could get you arrested. A Wall Street Journal article from July 2011 describes how, under Ben Ali's dictatorship, Tunisian cyber-activist Slim Amamou had been imprisoned and abused by the police for calling for peaceful demonstrations. Post-revolution, he was freed and asked to join the interim government, where the strictest restriction placed on him was to "stop sending Twitter messages during internal government meetings to his 25,000 followers". They may not have their porn, but that's still progress.
Of course, if someone in Tunisia wants to circumvent the government filters (using tools like proxy sites, VPNs, Tor, UltraSurf, Psiphon, etc.) and get to a porn site, more power to them. I just wouldn't make it a priority to set aside resources to help them get it. Not while there are Iranians who need help getting around the latest restrictions blocking them from Facebook and Gmail.
Two caveats. First, if someone wants to sell circumvention services to Tunisians who just want to get around the porn blocker, that doesn't count as "setting aside resources", so that's a perfectly noble endeavor. In fact, given the economies of scale in the circumvention business, selling to Tunisians could help to bring the price down for other users, including users in countries like Saudi Arabia where the government does engage in political filtering, and where circumvention services could be a tool for social change. Second, providing circumvention services (free or paid) to Tunisians, does probably make it less likely that the new government would revert to political censorship, knowing that many of its citizens have the tools to beat it, even if those tools are only currently used to access porn sites. So to that extent, setting aside resources to provide circumvention services in Tunisia might be a worthwhile cause.
Still, I think it's a lot less important than using circumvention tools to fight political censorship in truly autocratic countries like Iran. For the next generation of proxy servers that I'm rolling out, I'm working on setting aside some of them just for Iranian IP addresses. Even if Iranians just use them to get on Facebook, that still contributes more to advancing the cause of social democracy than Tunisians using them to get on Playboy.
-
Google Offering Cash For Your Cache
pigrabbitbear writes "The gradual transformation of the web into an ultra-personalized, corporate-owned social space in the cloud has raised more than a few legitimate concerns about data privacy. Google, for obvious reasons, has always been one of the top cheerleaders for this metamorphosis. Touting a fresh new privacy policy that allows data about you from all of their services to coalesce, they've recently been particularly bullish about rendering that increasingly realistic digital portrait of you that lies stuffed away in their servers. It has led us again to question: How much are we comfortable with our machines knowing about us? How much is our privacy really worth? With their new program, Google is now asking those questions quite directly, and preceding them with dollar signs. Are we all on the verge of making our own information age Faustian bargains?" -
Online Privacy Worth Less Than Marshmallow Fluff Six Pack
nonprofiteer writes "With a program called Screenwise, Google is offering a total of $25 in Amazon gift cards to anyone willing to install a Chrome browser extension that will let the search giant track every website the user visits and what they do there over a year-long period. Google says it will study this in order to improve its products and services. Forbes points out that $25 in Amazon credits isn't quite enough to buy a six pack of Marshmallow Fluff ($26.75)." The money isn't much as a pure trade for privacy, but I suspect that many people would like to have their preferences be among those that shape how Google — and other companies, too — actually organize their interfaces. (Note that the tracking can be selectively turned off by the user.) -
Lake Vostok Reached
First time accepted submitter Cyberax writes "After 30 years of drilling and weeks of media attention the Antarctic underground lake Vostok has been reached by Russian scientists (translated article). Deep drilling in the vicinity of Vostok Station in Antarctica began in the 1970s, when the existence of the reservoir was not yet known. Scientists are beginning paleoclimatic studies and further exploration of the lake will continue in 2013-2014." -
83-Year-Old Woman Gets New 3D-Printed Titanium Jaw
arnodf writes "The University of Hasselt (in Belgium) announced today (Google translation of Dutch original) that Belgian and Dutch scientists have successfully replaced an 83-year-old woman's lower jaw with a 3D-printed model. According to the researchers, 'It is the first custom-made implant in the world to replace an entire lower jaw. ... The 3D printer prints titanium powder layer by layer, while a computer controlled laser ensures that the correct particles are fused together. Using 3D printing technology, less materials are needed and the production time is much shorter than traditional manufacturing. The artificial jaw is slightly heavier than a natural jaw, but the patient can easily get used to it.'" -
Anonymous Posts Audio of Intercepted FBI Conference Call
DrDevil writes "A member of the computer hacking group Anonymous has hacked into a telephone conference between the FBI and Scotland Yard (London Police) and posted it on the internet. The Daily Telegraph has a comprehensive article on the hack. The audio of the call can be heard here." Reader eldavojohn snips as well from the AP's story as carried by Google: "Those on the call talk about what legal strategy to pursue in the cases of Ryan Cleary and Jake Davis — two British suspects linked to Anonymous — and discuss details of the evidence gathered against other suspects." -
Google Begins Country-Specific Blog Censorship
bonch writes "Google will begin redirecting blogs to country-specific URLs. Blog visitors will be redirected to a URL specific to their location, with content subject to their country's censorship laws. A support post on Blogger explains the change: 'Over the coming weeks you might notice that the URL of a blog you're reading has been redirected to a country-code top level domain, or "ccTLD." For example, if you're in Australia and viewing [blogname].blogspot.com, you might be redirected to [blogname].blogspot.com.au. A ccTLD, when it appears, corresponds with the country of the reader's current location.'" -
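The redirect in the quoted support post can be sketched as a simple host rewrite. This is an illustration only, not Blogger's implementation: the function name and the country table are assumptions, and the table holds just the Australian example the post gives.

```python
# Only the Australian mapping comes from the quoted post; others are unknown.
CCTLD_SUFFIX = {"AU": "com.au"}

BLOGSPOT_BASE = ".blogspot.com"


def localized_blog_host(host, country_code):
    """Rewrite name.blogspot.com to the reader's ccTLD host when one is known."""
    suffix = CCTLD_SUFFIX.get(country_code.upper())
    if suffix is None or not host.endswith(BLOGSPOT_BASE):
        return host  # unknown country or non-Blogspot host: leave unchanged
    return host[:-len("com")] + suffix


if __name__ == "__main__":
    # "example" is a placeholder blog name.
    print(localized_blog_host("example.blogspot.com", "au"))
```

Keying the host on the reader's location is what allows content to be handled per country, as the summary notes.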
Dutch Supreme Court Sees Game Objects As Goods
thrill12 writes "The Dutch Supreme Court ruled on January 31st that the taking away of possessions in the game Runescape from a 13-year-old boy, who was threatened with a (real) knife, was in fact theft because the possessions could be seen as actual goods. The highest court explained this not by arguing it was software that was copied, but by stating that the game data were real goods acquired through 'effort and time investment,' and 'the principal had the actual and exclusive dominion of the goods' — up until the moment the other guy took them away, that is." -
German Appeals Court Confirms Galaxy Tab 10.1 Ban
New submitter Killer Panda sends word that a German Appeals Court has upheld the injunction prohibiting sales of Samsung's Galaxy Tab 10.1 in Germany. Apple convinced lower courts to issue and uphold the injunction last year by making the case that Samsung's devices "slavishly" copied the iPhone and iPad. "Samsung, which is Apple's supplier as well as a competitor, has been trying to have the German decision overturned while also seeking other means to fight Apple. It redesigned the Galaxy Tab 10.1 for the German market only and named it Galaxy Tab 10.1N to get around the sales ban. Apple challenged the reworked version but a German court last month rejected Apple's claims in a preliminary judgment." The European Union announced some more bad news for Samsung: they'll be investigating the company to see whether its use of patent lawsuits is illegally hindering other companies' use of standardized 3G technology. "Under EU patent rules, a company that holds patents for standardized products is required to license them out indiscriminately at a fair price." -
Some Critics Suggest Apple Boycott Over Chinese Working Conditions
Hugh Pickens writes "The Guardian reports that Apple's image is taking a dive after revelations in the NY Times about working conditions in the factories of some of its network of Chinese suppliers and the dreaded word 'boycott' has started to appear in media coverage of Apple's activities. 'Should consumers boycott Apple?' asked a column in the Los Angeles Times as it recounted details of the bad PR fallout amid detailed allegations that workers at Foxconn suffered in conditions that resembled a modern version of bonded labor, working obscenely long shifts in unhealthy conditions with few of the labor rights that workers in the west would take for granted." Read on, below. Pickens continues: "But Apple has come out fighting, which is no surprise given the remarkable success that the company has seen in recent years with its reputation for 'cool' among hip urban professionals and a generally positive corporate image. In a lengthy email sent to Apple staff, chief executive Tim Cook met the allegations head-on. 'We care about every worker in our worldwide supply chain. Any accident is deeply troubling, and any issue with working conditions is cause for concern,' Cook said. He went on to slam critics of the company. 'Any suggestion that we don't care is patently false and offensive to us ... accusations like these are contrary to our values.' So will we see some kind of movement to boycott Apple products, akin to the campaign several years ago to pressure Nike to improve working conditions in its factories, asks Sam Gustin in Time Magazine? 'You can either manufacture in comfortable, worker-friendly factories, or you can reinvent the product every year, and make it better and faster and cheaper, which requires factories that seem harsh by American standards,' an anonymous current Apple executive told the Times. 'And right now, customers care more about a new iPhone than working conditions in China.'" -
Google+ Officially Open To Teens
hypnosec writes "Google+ made a landmark move and opened itself to users who are over the age of 13. Google+ did not initially target the younger crowd and kept itself available only for users above the age of 18. While opening up to youngsters over the age of 13 the social network also added improved safety features to keep the younger crowd protected. Now it features more rigid default settings for privacy, but they can be overridden nonetheless. The vice president of product management at Google+, Bradley Horowitz, in a Google+ post stated, 'With Google+, we want to help teens build meaningful connections online. We also want to provide features that foster safety alongside self-expression. Today we're doing both, for everyone who's old enough for a Google Account.'" -
Google Consolidates Privacy Policies Across Services
parallel_prankster writes "The Washington Post reported Tuesday that Google will require users to allow the company to follow their activities across e-mail, search, YouTube, and other services; a radical shift in strategy that is expected to invite greater scrutiny of its privacy and competitive practices. The information will enable Google to develop a fuller picture of how people use its growing empire of Web sites. Consumers will have no choice but to accept the changes. The policy will take effect March 1 and will also impact Android mobile phone users. 'If you're signed in, we may combine information you've provided from one service with information from other services,' Alma Whitten, Google's director of privacy, product, and engineering, wrote in a blog post." The angle of the Washington Post article is a bit negative; Google sees this as consolidating an absurd number of privacy policies for its various services into a single, unified document. Reader McGruber adds: "Donald E. Graham, the Washington Post's chairman and CEO, joined Facebook's Board of Directors in January 2009. Curiously, the Washington Post article neglects to disclose that." -
The Google+ Name Game Continues
theodp writes "'Sticks and stones will break my bones,' the old nursery rhyme goes, 'but names will never hurt me.' Unless, of course, you're on Google+. While touting what it calls a move toward a more inclusive naming policy for Google+, the search giant's Name Policy would still make Sister Aloysius Beauvier smile. Names like 'Doctor Stan Livingston,' 'Bill Smithwick DDS,' and 'Rev. Jim Copley, S. P.' are cited as examples of violations that could cost you your Google+ privileges. And since new Google account users are reportedly now forced to join Google+, one wonders if the Name Policy might even preclude one from establishing one of those adorable dear.sophie.lee or dear.hollie accounts." -
Alternative Android Market To House Banned Apps
sl4shd0rk writes "In contrast to Apple's iron-fisted control over their App store, the Android Market is much more open. Google does, on occasion, remove apps it deems inappropriate, such as emulators, legally-questionable music services, tethering apps and one-click root apps. But if Koushik Dutta of CyanogenMod fame has his way, these heretic apps may have a home after all. Dutta plans an 'underground' Android Market complete with an approval process to weed out malicious applications; something Google doesn't do. Ideally, this will give Android users a more trustable source from which to get applications without having to resort to dictatorial software control." -
Carl Malamud Answers: Goading the Government To Make Public Data Public
You asked Carl Malamud about his experiences and hopes in the gargantuan project he's undertaken to prod the U.S. government into scanning archived documents, and to make public access (rather than availability only through special dispensation) the default for newly created, timely government data. (Malamud points out that if you have comments on what the government should be focusing on preserving, and how they should go about it, the National Archives would like to read them.) Below find answers with a mix of heartening and disheartening information about how the vast project is progressing.
LoC?
by an Anonymous Reader
So how many GB/TB is a Library of Congress? :)
Or, more seriously, how big are you estimating? Are you using raw scans or some sort of compression (JPG, PNG, etc)? What resolution are you using? Do you vary the resolution depending on the document?
What sort of meta data are you putting in?
CM: The reason John Podesta and I suggested a Federal Scanning Commission in our letter at YesWeScan.Org is we really don't know how big the holdings of the government are. I can tell you that the Library of Congress is about 32 million cataloged books (a significant increase from the 6,487 books Thomas Jefferson donated to get them started). But, this is about more than books, it is about paper records, microfilmed technical papers, video, audio, photographs, and much more.
The scale is fairly vast. The Smithsonian has 137 million objects, including about 13 million images. David Ferriero, the Archivist of the United States, estimates he has over 10 billion pages of text documents, 7.2 million maps, and 40 million photographs including everything from past census records to presidential dinner menus, and that includes about 7.5 million motion pictures and sound recordings. The Government Printing Office distributes their documents to the Federal Depository Library Program, and that includes over 60 million pages of collections including the Official Journals of Government such as the Federal Register. That's just scratching the surface, and we recommended a Federal Scanning Commission to begin the process of understanding what we have (and what is worth digitizing).
As to standards? There are lots of pretty good standards on how to digitize. NARA, Library of Congress, GPO all spec out document scans at 400 dpi, for example. For photographs, moving images, and other objects, there are some pretty good and pretty detailed standards at www.digitizationguidelines.gov. I know Brewster Kahle's operation and my own tend to work off those specifications (in fact Brewster does quite a bit of scanning for the government).
As to compression? Well, I've found people tend to overcompress things. That said, sometimes the initial quality isn't that great, so a 600 dpi uncompressed scan would be silly in some cases. But, for photographs I try very hard to keep the TIFF images around and not rely on JPEG. Likewise, for audio it is really nice to keep a nice 48 kHz version of your file around if you can simply because if you screw up the compression maybe somebody else can do a better job in a few years. Disk space is relatively cheap, so that isn't the barrier it used to be. For video, I rip MPEG2 at whatever it is on a DVD. When I'm actually digitizing, I try to get the video bitrate up to 8-10 Mbps when ripping a Betacam or U-matic. Some people think that is overkill, but I'd rather be safe than sorry.
Metadata? Well, you got to have it or you're not going to get very far when it comes to access. Many librarians have made perfect the enemy of the good when it comes to metadata and have resisted any attempt at digitization because we don't have the very best metadata we might have. I'm more in the camp of scan what you have and get as much of the metadata as you can into it. For example, we have 3,200 1000-page volumes of briefs from the 9th Circuit of the U.S. Court of Appeals. We didn't have good metadata, but we had the Internet Archive scan them anyway. Then, after we got our PDF files, I shipped those off to a double-key team in India and they broke the briefs up into individual documents and typed the metadata into a spreadsheet for me, which we hope to release soon.
My point is that sometimes you can shoehorn the metadata in after the fact or you can use a variety of techniques to pull the metadata out of the documents (e.g., smart OCR). In theory, you can use crowdsourcing to get the metadata, but so far I've not had a lot of luck persuading thousands of people to spend their time doing that kind of work. A captcha is a quick thing to do and is between you and something you want, whereas entering metadata in for videos or documents is one of those civic duty things that everybody thinks everybody else should be doing.
Total size? Brewster says a book is about 400 Mbytes (though he's very quick to point out that you could put the words in all the books in the library into a terabyte and if you're distributing PDFs, you can easily throw 130,000 full-color, searchable PDFs onto a 4 TB drive). But, you were probably asking about raw data. Here's some raw numbers:
32 million books at 400 Mbytes each is 12.8 petabytes
50 million photos at 150 Mbytes each is 7.5 petabytes
10 billion pieces of paper ("records") at 100 Kbytes each is 1 petabyte
20 years of video at 8 Mbps is only 630 Tbytes.
(Somebody check my math?)
If you're talking a decade-long federal digitization initiative, we're looking at well south of 50 petabytes, which seems pretty doable in this day and age!
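Taking up the "check my math" invitation, here is a minimal sketch that recomputes the four estimates above. It assumes decimal (SI) units throughout and a 365-day year for the video figure; neither assumption is stated in the interview, and the function name `estimates_in_bytes` is purely illustrative.

```python
# Back-of-the-envelope check of the storage estimates quoted above.
# Assumptions (not stated in the interview): decimal SI units,
# i.e. 1 KB = 10**3 bytes ... 1 PB = 10**15 bytes, and a 365-day year.

KB, MB, TB, PB = 10**3, 10**6, 10**12, 10**15
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def estimates_in_bytes():
    """Return each estimate from the interview as a byte count."""
    video_bytes_per_second = 8_000_000 // 8  # 8 Mbps expressed in bytes/s
    return {
        "books": 32_000_000 * 400 * MB,
        "photos": 50_000_000 * 150 * MB,
        "paper records": 10_000_000_000 * 100 * KB,
        "video": 20 * SECONDS_PER_YEAR * video_bytes_per_second,
    }


if __name__ == "__main__":
    sizes = estimates_in_bytes()
    for name, size in sizes.items():
        print(f"{name:>14}: {size / PB:6.2f} PB")
    print(f"{'total':>14}: {sum(sizes.values()) / PB:6.2f} PB")
```

Under those assumptions the four figures match the ones quoted, and they sum to roughly 22 petabytes, which is consistent with the "well south of 50 petabytes" remark.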
Can the rare books collections be digitized?
by autophile
Three closely related questions about the rare books collections at the Library of Congress:
1. I know there is some kind of effort going on to digitize the rare books collections, but can it be sped up? There are many high-quality low-cost archival book scanners out there (such as the ones developed at diybookscanner.org).
2. It gets really annoying to have to receive paper copies of books when copies are requested. Why not DVDs of high-quality images?
3. Why is there no outreach by the LoC to smaller, cheaper book scanning efforts? The Internet Archive, DIYBookscanner.org, and Decapod all come to mind.
CM: In reverse order. I don't know why we aren't distributing and decentralizing our scanning efforts. The Internet Archive is a heavy-duty production shop and they do an amazing job, as do folks like Google Books and the folks digitizing things for the Mormon Church. But, there are a bunch of DIY solutions and it would be really nice if we could get more people pitching in. The biggest problem on distributing the digitization efforts is quality control. I know when it comes to ripping video, I can easily teach other people how to grab an MPEG2 off a DVD, but when it comes to things like digitizing a Betacam, that takes some training. But, we're all trainable and I wish we could all do more.
Getting back paper copies of books and papers when they're doing a copy anyway is just plain dumb. Likewise with things like FOIA results. John Podesta testified before the Senate about FOIA and said if an agency answers a FOIA request, they should also post their result online so others can see it. That seems pretty obvious.
As far as digitizing rare book collections, there are some amazing pockets throughout the government but there is no real coordination and there certainly is no effort to scan at scale or to come up with a realistic national digitization strategy. That is why we called on the White House to lead the effort. Within the Library of Congress there are some amazing collections, but if you look around to places like the National Agricultural Library or the National Library of Medicine or the libraries in the service academies you'll find lots more. Some have argued that digitizing rare books is silly because the audience is just a few academics, but I can tell you from my own experience helping host the network site for the Archimedes Palimpsest that when you make this kind of information available, there is an amazing long tail.
If you scan it, they will come. And, to answer your question, if we all scan it, they will come much sooner.
Real time legislation drafting
by kerskine
Would it be possible to implement a system that would allow real-time and continuous review of legislation while it's being drafted? Much has been made over the past three years about legislation being available for review before voting by the House or Senate. The final draft for review is usually a huge PDF that makes it nearly impossible for citizens, interest groups, and the media to analyze it thoroughly in time.
CM: You want to see the sausage being made not just buy the hot dog! I'll comment on the U.S. Congress since that's the system I know best. Thomas is a pretty good system if you happen to be stuck in 1994. It does have all the amendments and the actions and the various stages that legislation goes through. But, it isn't real time, more like "pretty quick." As Van Jacobson once quipped, "Same day service in a nanosecond world." And, Thomas isn't really machine processable, it is final form, usually formatted ASCII text (shades of NROFF!). People like Josh Tauberer who built GovTrack.US have spent considerable time crawling those systems and trying to get the data into regularized formats and make it available to others to reuse via APIs, but that isn't the same as exposing the inner workings of the sausage factory.
Majority Leader Cantor's staff has been pushing a system to make the raw data all available in XML from the Clerk's office and I think that is a very promising initiative which hopefully will bear fruit. (They're having a February 2 conference to discuss their plans if you are interested. I have no idea if it will be streamed for those of us who aren't Inside the Beltway and I don't know their schedule for moving past conferences and into production.)
Congress is a pretty complicated beast. I know some folks like Sean McGrath have had better luck with some of the state legislatures. The problem is you need to dig deep into the inner workings of a legislature. In the Congress, that means you're changing things like authoring tools that are used in the Clerk's office and by all the staff members, so you have to be careful or you get a bunch of really angry Congressmen yelling at you because their staff can't crank out the flavor-of-the-week in the form of a bill or amendment.
There's also a bit of an issue of will. My work with the Congress to put hearings on-line showed that you could take the official transcripts of a hearing and use those to generate closed captions on the video. All you need is the official transcript of the hearing, but in order to get those I had to execute a special Memorandum of Understanding with the House Oversight Committee. Other committees guard their transcripts jealously and won't let them out for a long time. When I started processing a bunch of historical videos we purchased from C-SPAN, I went to the Government Printing Office and found that many committees never deliver their transcripts, even a decade after the fact!
How to keep track of legislative activity about open access?
by oneiros27
Recently in the federal register, there were two calls for comments about access to data and research from federally funded research:
http://federalregister.gov/a/2011-28623
http://federalregister.gov/a/2011-28621
I didn't hear about these until ~4 weeks after the original announcement, and with the holidays, it was too late to try to get the societies I'm involved with to prepare and vote on official statements. Are there any places where people can get/post notices of these sorts of things so that we can stay informed and try to help influence policies?
CM: The Federal Register is getting a lot better now that it is a much more open system. The idea of "Federal Register 2.0" was a paper I wrote for the Obama transition, so it is an issue I've tracked pretty closely and frankly, I've been amazed at how much better it is now. What they did is instead of selling the raw data feed for the Federal Register for $17,000/year, they went from SGML to XML and then released the data in bulk for free. A few guys out in San Francisco were looking for something to do to enter a contest and they took that bulk data and dreamed up GovPulse.US. That was such a better version of the Federal Register that the Office of the Federal Register switched the official site over to their open source platform. My point is the tools are there to do better notification mechanisms, and I'm sure the government would welcome somebody grabbing the GovPulse.US code out of Github and making it even better.
That's the technical answer. But, the substantive answer is that there is a huge boatload of stuff in the Federal Register and it is pretty hard to figure out what to pay attention to. I also missed that particular call for comment, and I've even missed several Requests for Information coming out of places I try and pay attention to, like the White House's Office of Science and Technology Policy. And, I do this stuff full-time! Perhaps better targeted notification mechanisms are the answer. Maybe it is a social media solution, where you pay attention to things your friends are paying attention to. I hope the answer is not that the only way to pay attention is to be employed with a beltway bandit which can afford hundreds of minions that do nothing but pay attention to Washington. Indeed, there are some very fancy for-pay services from folks like Congressional Quarterly and Bloomberg that cost an arm and a leg, but I can't help but think there has to be a better way that is also open.
What do you think of corporate partnerships?
by mhh5
I'd like to know what you think about corporate partnerships in the process to get public data released. (I'm not sure if Google Patents existed before the USPTO released its databases.) Do corporations that get involved in the process tend to make the process better without question, or are there tradeoffs in some areas because the corporations always want to help but then try to retain a proprietary version of the data for themselves?
CM: The theory is that the government gets some kind of valuable service (like digitization) that the government wouldn't get otherwise so it is a "win-win." But, the reality is all too often the government gets snookered and what we do is give some corporation exclusive access to some pot of data and the government doesn't get much of anything. The deal between Amazon and the National Archives was a good example of that kind of a private fence around the public domain. With help from Boing Boing, I started systematically purchasing those public domain videos and re-releasing them in the wild. I have no problem with Amazon selling public domain video, I just hate it when they get a de facto or a contractual exclusive. (My testimony before Congress on this subject is here.)
There are lots of other examples of government getting snookered. For example, the Government Accountability Office let Thomson West get access to 60 million or so pages of federal legislative histories. At great cost to the government, they were all packed up and dispatched to West which digitized them all and then sent them back to the government. West now sells access to this amazing database. What did the government get for its trouble? A few logins for GAO staffers. Even members of Congress need to pay to access the database! (We have an interesting paper trail on this issue.)
I'm glad you brought up the Google Patent system because I was personally involved in making that happen and I can tell you that this one is totally legit. Jon Orwant is the lead developer on this for Google and I played a small part in helping convince the White House and the Patent Office they ought to give Jon access to their data (the heavy lifting on that deal was by Beth Noveck who was the Deputy CTO at the time). Google makes all the data they got from the Patent Office available for bulk access with no strings attached. I can vouch for that because I did a mirror of their system. Last I heard Google was sending out anywhere from 1 to 10 terabytes of data PER DAY to external sources and even normally very critical folks who work in this arena have been really happy.
The big problem in the Patent Office is their computing infrastructure is a real catastrophe. Their power plant is over 95% capacity (e.g., plug in a computer, bring the building down!) and even though the Under Secretary knew that selling DVD subscriptions was silly, he wasn't able to switch over to an FTP service. He cut the deal with Google Patent and it worked out well for the government, for Google, and for everybody else.
What's the difference between the Google deal and the Amazon deal? In the case of the Amazon and GAO/West deals, the government lawyers did all the negotiating and they were totally outsmarted by some sharks in industry. But, when government has people like Under Secretary Kappos and Beth Noveck doing the negotiating, these things can work out just fine. The key is government should partner with people who want to do public service, not people who want to service the public.
Encouraging Governments?
by theNAM666
In a city such as Nashville, things as basic as business ownership and property records are not available online. In states such as New Jersey, public records such as basic corporate filings (officers, operating address/address for service of process) are accessible only for a fee.
What concrete actions can citizens confronting such situations, take to encourage accessibility and accountability?
CM: I find you need a carrot and a stick to make this stuff happen, especially at the local level. Folks like Everyblock.Com and CodeForAmerica.Org have done great work prying some of these databases loose, but there is still lots to do.
The first thing you should do is pick up the phone (or pick up your email client) and write/call the people who run the system. Ask them if you can have access to the data. Sometimes, it is as simple as that.
Other times, though, it isn't quite as simple since they want the money (or they want the control or they think this should be done by "private industry" by which they mean some buddy who is a contractor). The nice thing about any government system is somebody usually has oversight responsibilities. So, the next step is to find a city council member or state legislator who has oversight on the agency in question and ask them.
Again, life isn't usually that simple, but sometimes you win! If you can't get anywhere that way, what I usually end up doing is basically competing with the government system. Build a proxy system like RECAPtheLaw.Org did to recycle paid documents. Or, get a sponsor and buy a reasonable number of docs and build a web site that looks like it is going to be a real production system.
Then, go back again and ask. Maybe if you have eyeballs or at least have a nice web site, that is enough to get the government moving. But, if that doesn't do the job, you may have no choice but to compete with them for real, which of course requires a big commitment in time and energy and not everybody can do that. I know in the case of the Patent Office, I started pestering them in 1993, including several times when I spent 6-figure sums purchasing their data, and it still took until 2011 to crack that nut.
The real trick is focus/obsession. Pick one thing you really care about and just keep pestering them until you crack it open. If you're surfing from one opengov problem to another, showing up for a 1-day hackathon then moving on to something else, you're not going to get anywhere. Pick something real and make it your thing.
Privately Owned, Copyrighted Law
by AdamnSelene
I think I have read that the law itself cannot be copyrighted and it should be possible to make it available to everyone. But as a techie who drafts standards and specifications, I was wondering about how far this goes--especially since Congress recently proposed enacting some of our standards into law. (They decided not to, but they read some parts into the committee records as they debated.) Can you still accomplish your project if a governmental body adopts (or considers adopting) a privately owned, copyrighted technical reference manual or set of safety standards as administrative law (or regulations that carry the force of law)? Or would such obstacles keep you from being able to digitize all of the government's laws (and archives of proposed laws)?
CM: The idea that the law has no copyright is a fundamental part of the American system of government. That applies to states and municipalities as well. The basic decision is Wheaton v. Peters from 1834 but that decision has been reaffirmed over and over. The law is sacred in the American system. You can't have equal protection under the law or due process under the law if there is a poll tax on access to justice.
When we get to privately developed standards, however, it turns into a very interesting issue. The basic mechanism is called Incorporation by Reference. The government will take some external document (such as a model building code) and incorporate the entire text to make it the law of the land. A guy named Peter Veeck was responsible for a landmark decision in 2002 when he published the Texas Building Code which was an incorporation of a privately-developed and very expensive model code. The court ruled that while the model code had copyright, the law of the land did not.
Based on the Veeck decision, my group went and posted many of the public safety codes enacted by the states. We started by purchasing model codes, finding the incorporating legislation, and concatenating the two pieces together and posting the resulting PDFs. More recently, we've done some extensive reworking of the California public safety codes, known as Title 24, converting the entire text into valid XHTML, recoding the graphics as SVG graphics, the formulas as MathML, and regenerating the PDF documents as nicely typeset documents instead of low-quality scans. You can see this work on the web but it is also available as a Google Code project.
The federal government also uses this mechanism intensively, with over 2,000 standards incorporated into the Code of Federal Regulations. This is non-trivial stuff, things like all the OSHA safety regulations. The issue was recently considered by a federal group called the Administrative Conference of the U.S. which basically rolled over and endorsed the idea that it is ok for important parts of the law to cost money. (Read EFF's protest letter if you want a good critique of what they did.)
I'm not necessarily saying that government should be able to appropriate any privately-developed standard and make it available. And, I'm not necessarily saying you want OSHA bureaucrats drafting the standards. But, I do think the big standards establishment and the government regulators have cut a deal that results in the law not being available and the costs forked off on private citizens and small business with extortionate monopoly prices. I just paid $847 for a 48-page safety standard from Underwriters Labs and $60 for a 2-page safety standard from the Society of Automotive Engineers, both of which are mandated by law in the CFR. They do need money to run their operations, but let me just point out that in 2009 the 501(c)(3) nonprofit Underwriters Labs paid their CEO $2,138,984 and the nonprofit SAE paid their CEO $412,578.
Ancestry.com
by An Anonymous Reader
What is your opinion about websites like Ancestry.com which make use of public records and charge a subscription fee for access? What is the incentive for the government to migrate old documents into digital form when services like these exist? Do you think Ancestry.com should be a 501(c)(3)?
CM: I'm not a big fan of for-profit corporations that have a business model of monetizing the public domain. I'm fine if they exist and fine if they make billions of dollars, but if they are the only game in town they've taken something that belongs to all of us and turned it into their private property.
The government got snookered on the Ancestry.Com deal. They could have insisted that the raw data be available in bulk for anybody else to use. The folks that approach the government to cut these sweetheart deals argue that is unreasonable because they need a "return on investment" and they argue that if they don't get the return on investment they won't do the deal (and by extension nobody else will do the deal).
But, government can argue much harder! For example, instead of negotiating some exclusive thing with Ancestry.Com, how come they didn't ask the Internet Archive to grab the data? Or put together something creative with a couple of foundations that would pay for the digitization in return for the kind of payback the foundations like to see (e.g., good press, photo opportunity with the President, or other tools of the trade)?
You asked if Ancestry.Com should be a 501(c)(3)? Not all nonprofits do something that I think should be an essential part of their mission, which is to allow others to compete with them. I believe providing open access to all data ought to be a precondition to getting nonprofit status (an idea that Gil Elbaz has been pushing for quite some time). A good example of a nonprofit that builds walls is Guidestar which wants to be the place where you go for all your nonprofit information. The IRS should be making all Form 990 returns of nonprofits available in bulk for anybody to use, which would knock the bottom out of Guidestar's attempts to build walls and force them to stay innovative and provide value.
Pacer Problems
by onyxruby
How much difficulty do you anticipate in getting and publishing records in Pacer? If there's one system that should be free it is the decisions that our courts make and yet you are charged by the page just to view the results. Are you concerned about a court taking an unkind view on your archiving what is in Pacer?
CM: PACER is an abomination. Do they take a dim view of our efforts? Well, the Administrative Office of the U.S. Courts reacted so strongly to our efforts to make their data available that they called the FBI on Aaron Swartz and cancelled the only meaningful public access system they had, which consisted of one terminal in each of 17 public libraries around the country. In this era of rapidly decreasing costs, they just boosted their access charges from 8 cents a page to 10 cents a page, arguing that this is a bargain compared to 25 cents a page for a copy machine.
What I find so disturbing about PACER is that when we did get 20 million pages of docs, we were able to conduct a comprehensive analysis of privacy violations in the courts, an analysis that led to a nice thank-you letter from the Judicial Conference and changes in their privacy rules. In other words, only when public interest groups got access to the data did we begin to address privacy issues. Public access is not just about pro se prisoners defending themselves from a jail cell, which is the view of many in the Administrative Office of the Courts. Public access is about attempts like ours (and many other folks) to make our system of justice function better. When we say we are "an empire of laws not a nation of men" that means we write down what we are doing in our courts so that it is no longer the arbitrary decisions of individuals. The paper trail is there so we can make sure the system is functioning properly. When you limit that access to those that only have a Gold Card, you pervert democracy and you pervert justice.
This principle that access to justice shouldn't hide behind a cash register goes back to the Greeks. Theseus in Euripides' Suppliants said "when there are no public laws, one man holds power by keeping the law all for himself, and there is no more equality. But when the laws are written, the weak man and the rich man have equal justice." The PACER system is justice for the rich man.
Steve Schultze and the team at Princeton did a lot of the heavy lifting on this issue, including the very nice RECAPtheLaw.Org system they built. They've also done a lot of financial analysis that shows that the courts are not only recovering their costs for operating the expensive PACER system, they're making a huge profit (to the tune of $100 million/year) and using their excess profits to do things like buy big-screen TVs in direct violation of the E-Government act.
The basic problem on PACER is the Judicial Conference has delegated the issue to a few techie judges who think what they've built is something great. But, PACER is a hairball of bad PERL code and the result has not served the judges, the bar, or the American people very well. My only hope is that eventually, the Judicial Conference will see that their information technology is 30 years behind the rest of the Internet and feel ashamed at the travesty they have wrought. Until then, we have RECAP.
If you're interested in the issue, a couple of resources to look at are the PACER paper trail and a bit of a rant that I delivered at the Gov 2.0 summit.
How to visualize opened data?
by hardwarejunkie9
The amount of information you're trying to free is entirely staggering and consists, largely, of tables of numbers. These numbers are incredibly significant, but people generally can't see them.
After you free all of this information and make it available to the public (as it should be), then what? What do you expect for the public to do with these numbers? Tables of information are not nearly as useful as graphs. This data needs to be seen, but, more importantly, it needs to be understood.
Do you have any ideas for how to disseminate this information? Perhaps a team-up with someone like gapminder.org's Hans Rosling might be particularly valuable for all of us.
CM: Actually, most of the data I'm looking at is not tables of numbers, it is video, images, textual documents, technical papers, maps, and books.
But, I definitely get what you're saying and there are a lot of numbers. For example, the IRS Form 990s should be structured data instead of PDF documents, so extracting the data from the mass of paper is the initial challenge. There are lots of other examples of this kind of initial extraction, getting what were printed paper docs into structured data. There are some interesting tools, such as OCRopus which does layout analysis, but there needs to be much more. One of the reasons we called for a Federal Scanning Commission is that we think there is a lot of directed R&D that could not only scale up mass digitization but could also work on the important value-added of extraction of structured data and handling some of the tricky issues like detecting the presence of Social Security Numbers.
Once you have the data, as you say, then what? I'm a big fan of the idea that the government starts by providing bulk data, then they provide an API, and then maybe they also build web sites and apps and other things along with everybody else out there. That's a 3-part hierarchy that Ed Felten and some of his students developed and it should be a law that applies to all government information systems that are externally facing.
The issue here is that all too often people look at a problem like "digitize all government information" and they want to see the whole stack of the solution from one place. But, I think you can do a layered approach and count on the fact that there is always somebody smarter out there and our job is to reduce the barriers to entry. So, how would I visualize the data? I have no idea, but I'd make damned sure that folks like Martin Wattenberg at Many Eyes and Hans Rosling at Gapminder knew the data was out there and then I'd sit back and be amazed at whatever they come up with. How's that for pushing the problem downstream?
Why is data access so hard?
by CanHasDIY
Can you provide any explanation as to why it is so difficult and cost-prohibitive to obtain records from the government, especially considering the abundance of laws requiring government compliance with requests for information (AKA "Sunshine Laws")?
Is it simply a matter of government employee ineptitude, or have you found evidence of a more nefarious rationale?
CM: I get that question a lot. Why would a member of Congress take deliberate steps to stop public hearings from being available? Why would a court administrator deliberately restrict access to public court documents? Usually the answer is, as Heinlein said, "you have attributed conditions to villainy that simply result from stupidity." When I'm explaining why something is so broken on a big government system, my usual answer is that there are a lot of people still stuck in the 1970s and 1980s, when information dissemination was really, really hard and it took men in white lab coats and computers the size of freight trains to process data. In other words, the problem with a lot of folks who are government gatekeepers is they just don't get the Internet and they don't get computers. In fact, usually when some senior bureaucrat is throwing stones at me, you can find younger staffers working for them rolling their eyes.
That's an optimistic view, and if I'm right things will get better. But, I'm often wrong on my predictions of the future. (I was the guy who saw TimBL demo the web in 1992 and thought to myself "interesting, but it won't scale.")
But, there is also some more nefarious stuff happening, often the accumulation of power by being able to cut exclusive deals with contractor buddies. If your life in government consists of receiving emissaries from Lockheed Martin, maybe you think you're making everybody happy by letting them build you a $1 billion computer system. Often, you think your problems are so unique that the $1 billion solution is the only answer.
And, in some cases, as we've seen from numerous GAO reports, Inspector General reports, Congressional hearings, and newspaper articles, there are some really evil people out there who think the public domain and the government is their personal business opportunity. Looting the federal government is the kind of civic crime that ranks right up there in my book with stealing cookies from Girl Scouts and selling fake medicines to sick people.
Who is the worst?
by TheBrez
Which government agency is the worst to get information from?
CM: I don't know who the worst are (there's a lot of competition for that slot), but the ones that piss me off the most are the ones that should know better.
Public.Resource.Org is a really small operation. I'm the only staff member. My part-time sysadmin is @mdkail who is pretty busy with his day job as CIO at NetFlix. My ISP is Jim Martin and his team at ISC who are kind of busy running the F-Root. My office net is supported by the amazing systems team at O'Reilly which rents me office space at below-market rates.
I'll grant you government would have a tough time getting that kind of help. But, I'm a one-man shop and we run the 4th most popular U.S. government video channel on YouTube, we're the source for a lot of the on-line presence of the U.S. Court of Appeals, and we've supported efforts for the U.S. Congress, the White House, and the National Archives. If we can do this out of Northern California, couldn't the vast resources of the federal government in Washington, D.C. do a whole lot better than they're doing now?
For me, my current bête noire is the U.S. Congress. We got half-way through processing their archives of video from congressional hearings, publishing about 31 terabytes of data. Then, a couple of staffers decided this was a bad idea and pulled the rug out from under us. They actually decided it was a bad idea to publish video from public congressional hearings.
Like any agency, Congress is a mixed bag. We had tons of support from Darrell Issa, for example, and ran a very successful pilot project for him for a year. We talked to all sorts of people on committees and in the various agencies that support the Congress. But, at the end of the day, a couple of staff members were able to decide that the public archive shouldn't be public and they terminated our project. (If you have some time, you might like to read our rather surreal paper trail.)
So, rather than the worst, I think we need to look for the most shameful, the ones that have the privilege and the power and could easily do better. I know it is in vogue to throw stones at government in general and Washington in particular, but there are times when government can be so useful and so awe inspiring it takes your breath away. Government can be that shining city on the hill but we all have to take an active part in our government to keep those lights shining bright. -
Carl Malamud Answers: Goading the Government To Make Public Data Public
You asked Carl Malamud about his experiences and hopes in the gargantuan project he's undertaken to prod the U.S. government into scanning archived documents, and to make public access (rather than availability only through special dispensation) the default for newly created, timely government data. (Malamud points out that if you have comments on what the government should be focusing on preserving, and how they should go about it, the National Archives would like to read them.) Below find answers with a mix of heartening and disheartening information about how the vast project is progressing.
LoC?
by an Anonymous Reader
So how many GB/TB is a Library of Congress? :)
Or, more seriously, how big are you estimating? Are you using raw scans or some sort of compression (JPG, PNG, etc)? What resolution are you using? Do you vary the resolution depending on the document?
What sort of meta data are you putting in?
CM: The reason John Podesta and I suggested a Federal Scanning Commission in our letter at YesWeScan.Org is we really don't know how big the holdings of the government are. I can tell you that the Library of Congress is about 32 million cataloged books (a significant increase from the 6,487 books Thomas Jefferson donated to get them started). But, this is about more than books, it is about paper records, microfilmed technical papers, video, audio, photographs, and much more.
The scale is fairly vast. The Smithsonian has 137 million objects, including about 13 million images. David Ferriero, the Archivist of the United States estimates he has over 10 billion pages of text documents, 7.2 million maps, and 40 million photographs including everything from past census records to presidential dinner menus, and that includes about 7.5 million motion pictures and sound recordings. The Government Printing Office distributes their documents to the Federal Depository Library Program, and that includes over 60 million pages of collections including the Official Journals of Government such as the Federal Register. That's just scratching the surface, and we recommended a Federal Scanning Commission to begin the process of understanding what we have (and what is worth digitizing).
As to standards? There are lots of pretty good standards on how to digitize. NARA, Library of Congress, GPO all spec out document scans at 400 dpi, for example. For photographs, moving images, and other objects, there are some pretty good and pretty detailed standards at www.digitizationguidelines.gov. I know Brewster Kahle's operation and my own tend to work off those specifications (in fact Brewster does quite a bit of scanning for the government).
As to compression? Well, I've found people tend to overcompress things. That said, sometimes the initial quality isn't that great, so a 600 dpi uncompressed scan would be silly in some cases. But, for photographs I try very hard to keep the TIFF images around and not rely on JPEG. Likewise, for audio it is really nice to keep a nice 48 kHz version of your file around if you can simply because if you screw up the compression maybe somebody else can do a better job in a few years. Disk space is relatively cheap, so that isn't the barrier it used to be. For video, I rip MPEG2 at whatever it is on a DVD; when I'm actually digitizing, I try to get the video bitrate up to 8-10 Mbps when ripping a Betacam or U-matic. Some people think that is overkill, but I'd rather be safe than sorry.
Metadata? Well, you got to have it or you're not going to get very far when it comes to access. Many librarians have made perfect the enemy of the good when it comes to metadata and have resisted any attempt at digitization because we don't have the very best metadata we might have. I'm more in the camp of scan what you have and get as much of the metadata as you can into it. For example, we have 3,200 1000-page volumes of briefs from the 9th Circuit of the U.S. Court of Appeals. We didn't have good metadata, but we had the Internet Archive scan them anyway. Then, after we got our PDF files, I shipped those off to a double-key team in India and they broke the briefs up into individual documents and typed the metadata into a spreadsheet for me, which we hope to release soon.
My point is that sometimes you can shoehorn the metadata in after the fact or you can use a variety of techniques to pull the metadata out of the documents (e.g., smart OCR). In theory, you can use crowdsourcing to get the metadata, but so far I've not had a lot of luck persuading thousands of people to spend their time doing that kind of work. A captcha is a quick thing to do and is between you and something you want, whereas entering metadata in for videos or documents is one of those civic duty things that everybody thinks everybody else should be doing.
Total size? Brewster says a book is about 400 Mbytes (though he's very quick to point out that you could put the words in all the books in the library into a terabyte and if you're distributing PDFs, you can easily throw 130,000 full-color, searchable PDFs onto a 4 TB drive). But, you were probably asking about raw data. Here's some raw numbers:
32 million books at 400 Mbytes each is 12.8 petabytes
50 million photos at 150 Mbytes each is 7.5 petabytes
10 billion pieces of paper ("records") at 100 Kbytes each is 1 petabyte
20 years of video at 8 Mbps is only 630 Tbytes.
(Somebody check my math?)
If you're talking a decade-long federal digitization initiative, we're looking at well south of 50 petabytes, which seems pretty doable in this day and age!
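For anyone who wants to take up the invitation to check the math, here is a minimal sketch of the arithmetic. It assumes decimal units (1 petabyte = 10^15 bytes), 365-day years, and video recorded around the clock at a constant bitrate; none of those assumptions are stated in the answer, and the names `estimates` and `video_bytes` are made up for illustration.

```python
# Back-of-the-envelope check of the storage estimates quoted above.
# Assumptions (not from the interview): decimal units, 365-day years,
# and continuous video at a constant bitrate.

PB = 10**15  # bytes in a petabyte (decimal)
TB = 10**12  # bytes in a terabyte (decimal)


def video_bytes(years, megabits_per_second):
    """Bytes needed for `years` of continuous video at the given bitrate."""
    seconds = years * 365 * 24 * 60 * 60
    return seconds * megabits_per_second * 10**6 // 8


estimates = {
    "books": 32_000_000 * 400 * 10**6,        # 32 million books at 400 Mbytes
    "photos": 50_000_000 * 150 * 10**6,       # 50 million photos at 150 Mbytes
    "records": 10_000_000_000 * 100 * 10**3,  # 10 billion pages at 100 Kbytes
    "video": video_bytes(20, 8),              # 20 years of video at 8 Mbps
}

total = sum(estimates.values())

if __name__ == "__main__":
    for name, size in estimates.items():
        print(f"{name:8s} {size / PB:6.2f} PB")
    print(f"{'total':8s} {total / PB:6.2f} PB")
```

Under those assumptions the four figures hold up, and the total comes to roughly 21.9 petabytes, which is consistent with "well south of 50 petabytes."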
Can the rare books collections be digitized?
by autophile
Three closely related questions about the rare books collections at the Library of Congress:
1. I know there is some kind of effort going on to digitize the rare books collections, but can it be sped up? There are many high-quality low-cost archival book scanners out there (such as the ones developed at diybookscanner.org).
2. It gets really annoying to have to receive paper copies of books when copies are requested. Why not DVDs of high-quality images?
3. Why is there no outreach by the LoC to smaller, cheaper book scanning efforts? The Internet Archive, DIYBookscanner.org, and Decapod all come to mind.
CM: In reverse order. I don't know why we aren't distributing and decentralizing our scanning efforts. The Internet Archive is a heavy-duty production shop and they do an amazing job, as do folks like Google Books and the folks digitizing things at the Mormon Church. But, there are a bunch of DIY solutions and it would be really nice if we could get more people pitching in. The biggest problem on distributing the digitization efforts is quality control. I know when it comes to ripping video, I can easily teach other people how to grab an MPEG2 off a DVD, but when it comes to things like digitizing a Betacam, that takes some training. But, we're all trainable and I wish we could all do more.
Getting back paper copies of books and papers when they're doing a copy anyway is just plain dumb. Likewise with things like FOIA results. John Podesta testified before the Senate about FOIA and said if an agency answers a FOIA request, they should also post their result online so others can see it. That seems pretty obvious.
As far as digitizing rare book collections, there are some amazing pockets throughout the government but there is no real coordination and there certainly is no effort to scan at scale or to come up with a realistic national digitization strategy. That is why we called on the White House to lead the effort. Within the Library of Congress there are some amazing collections, but if you look around to places like the National Agricultural Library or the National Library of Medicine or the libraries in the service academies you'll find lots more. Some have argued that digitizing rare books is silly because the audience is just a few academics, but I can tell you from my own experience helping host the network site for the Archimedes Palimpsest that when you make this kind of information available, there is an amazing long tail.
If you scan it, they will come. And, to answer your question, if we all scan it, they will come much sooner.
Real time legislation drafting
by kerskine
Would it be possible to implement a system that would allow real-time and continuous review of legislation while it's being drafted? Much has been made over the past three years about legislation being available for review before voting by the House or Senate. The final draft for review usually is a huge PDF that makes it nearly impossible for citizens, interest groups, and the media to thoroughly analyze in time.
CM: You want to see the sausage being made, not just buy the hot dog! I'll comment on the U.S. Congress since that's the system I know best. Thomas is a pretty good system if you happen to be stuck in 1994. It does have all the amendments and the actions and the various stages that legislation goes through. But, it isn't real time, more like "pretty quick." As Van Jacobson once quipped, "Same day service in a nanosecond world." And, Thomas isn't really machine processable, it is final form, usually formatted ASCII text (shades of NROFF!). People like Josh Tauberer who built GovTrack.US have spent considerable time crawling those systems and trying to get the data into regularized formats and make it available to others to reuse via APIs, but that isn't the same as exposing the inner workings of the sausage factory.
Majority Leader Cantor's staff has been pushing a system to make the raw data all available in XML from the Clerk's office and I think that is a very promising initiative which hopefully will bear fruit. (They're having a February 2 conference to discuss their plans if you are interested. I have no idea if it will be streamed for those of us who aren't Inside the Beltway and I don't know their schedule for moving past conferences and into production.)
Congress is a pretty complicated beast. I know some folks like Sean McGrath have had better luck with some of the state legislatures. The problem is you need to dig deep into the inner workings of a legislature. In the Congress, that means you're changing things like authoring tools that are used in the Clerk's office and by all the staff members, so you have to be careful or you get a bunch of really angry Congressmen yelling at you because their staff can't crank out the flavor-of-the-week in the form of a bill or amendment.
There's also a bit of an issue of will. My work with the Congress to put hearings on-line showed that you could take the official transcripts of a hearing and use those to generate closed captions on the video. All you need is the official transcript of the hearing, but in order to get those I had to execute a special Memorandum of Understanding with the House Oversight Committee. Other committees guard their transcripts jealously and won't let them out for several when. When I started processing a bunch of historical videos we purchased from C-SPAN, I went to the Government Printing Office and found that many committees never deliver their transcripts, even a decade after the fact!
How to keep track of legislative activity about open access?
by oneiros27
Recently in the federal register, there were two calls for comments about access to data and research from federally funded research:
http://federalregister.gov/a/2011-28623 [federalregister.gov] http://federalregister.gov/a/2011-28621 [federalregister.gov]
I didn't hear about these until ~4 weeks after the original announcement, and with the holidays, it was too late to try to get the societies I'm involved with to prepare and vote on official statements. Are there any places where people can get/post notices of these sorts of things so that we can stay informed and try to help influence policies?
CM: The Federal Register is getting a lot better now that it is a much more open system. The idea of "Federal Register 2.0" was a paper I wrote for the Obama transition, so it is an issue I've tracked pretty closely and frankly, I've been amazed at how much better it is now. What they did is instead of selling the raw data feed for the Federal Register for $17,000/year, they went from SGML to XML and then released the data in bulk for free. A few guys out in San Francisco were looking for something to do to enter a contest and they took that bulk data and dreamed up GovPulse.US. That was such a better version of the Federal Register that the Office of the Federal Register switched the official site over to their open source platform. My point is the tools are there to do better notification mechanisms, and I'm sure the government would welcome somebody grabbing the GovPulse.US code out of Github and making it even better.
That's the technical answer. But, the substantive answer is that there is a huge boatload of stuff in the Federal Register and it is pretty hard to figure out what to pay attention to. I also missed that particular call for comment, and I've even missed several Requests for Information coming out of places I try and pay attention to, like the White House's Office of Science and Technology Policy. And, I do this stuff full-time! Perhaps better targeted notification mechanisms are the answer. Maybe it is a social media solution, where you pay attention to things your friends are paying attention to. I hope the answer is not that the only way to pay attention is to be employed with a beltway bandit which can afford hundreds of minions that do nothing but pay attention to Washington. Indeed, there are some very fancy for-pay services from folks like Congressional Quarterly and Bloomberg that cost an arm and a leg, but I can't help but think there has to be a better way that is also open.
What do you think of corporate partnerships?
by mhh5
I'd like to know what you think about corporate partnerships in the process to get public data released. (I'm not sure if Google Patents existed before the USPTO released its databases.) Do corporations that get involved in the process tend to make the process better without question, or are there tradeoffs in some areas because the corporations always want to help but then try to retain a proprietary version of the data for themselves?
CM: The theory is that the government gets some kind of valuable service (like digitization) that the government wouldn't get otherwise so it is a "win-win." But, the reality is all too often the government gets snookered and what we do is give some corporation exclusive access to some pot of data and the government doesn't get much of anything. The deal between Amazon and the National Archives was a good example of that kind of a private fence around the public domain. With help from Boing Boing, I started systematically purchasing those public domain videos and re-releasing them in the wild. I have no problem with Amazon selling public domain video, I just hate it when they get a de facto or a contractual exclusive. (My testimony before Congress on this subject is here.)
There are lots of other examples of government getting snookered. For example, the Government Accountability Office let Thomson West get access to 60 million or so pages of federal legislative histories. At great cost to the government, they were all packed up and dispatched to West which digitized them all and then sent them back to the government. West now sells access to this amazing database. What did the government get for its trouble? A few logins for GAO staffers. Even members of Congress need to pay to access the database! (We have an interesting paper trail on this issue.)
I'm glad you brought up the Google Patent system because I was personally involved in making that happen and I can tell you that this one is totally legit. Jon Orwant is the lead developer on this for Google and I played a small part in helping convince the White House and the Patent Office they ought to give Jon access to their data (the heavy lifting on that deal was by Beth Noveck who was the Deputy CTO at the time). Google makes all the data they got from the Patent Office available for bulk access with no strings attached. I can vouch for that because I did a mirror of their system. Last I heard Google was sending out anywhere from 1 to 10 terabytes of data PER DAY to external sources and even normally very critical folks who work in this arena have been really happy.
The big problem in the Patent Office is their computing infrastructure is a real catastrophe. Their power plant is over 95% capacity (e.g., plug in a computer, bring the building down!) and even though the Under Secretary knew that selling DVD subscriptions was silly, he wasn't able to switch over to an FTP service. He cut the deal with Google Patent and it worked out well for the government, for Google, and for everybody else.
What's the difference between the Google deal and the Amazon deal? In the case of the Amazon and GAO/West deals, the government lawyers did all the negotiating and they were totally outsmarted by some sharks in industry. But, when government has people like Under Secretary Kappos and Beth Noveck doing the negotiating, these things can work out just fine. The key is government should partner with people who want to do public service, not people who want to service the public.
Encouraging Governments?
by theNAM666
In a city such as Nashville, things as basic as business ownership and property records are not available online. In states such as New Jersey, public records such as basic corporate filings (officers, operating address/address for service of process) are accessible only for a fee.
What concrete actions can citizens confronting such situations take to encourage accessibility and accountability?
CM: I find you need a carrot and a stick to make this stuff happen, especially at the local level. Folks like Everyblock.Com and CodeForAmerica.Org have done great work prying some of these databases loose, but there is still lots to do.
The first thing you should do is pick up the phone (or pick up your email client) and write/call the people who run the system. Ask them if you can have access to the data. Sometimes, it is as simple as that.
Other times, though, it isn't quite as simple since they want the money (or they want the control or they think this should be done by "private industry" by which they mean some buddy who is a contractor). The nice thing about any government system is somebody usually has oversight responsibilities. So, the next step is to find a city council member or state legislator who has oversight on the agency in question and ask them.
Again, life isn't usually that simple, but sometimes you win! If you can't get anywhere that way, what I usually end up doing is basically competing with the government system. Build a proxy system like RECAPtheLaw.Org did to recycle paid documents. Or, get a sponsor and buy a reasonable number of docs and build a web site that looks like it is going to be a real production system.
Then, go back again and ask. Maybe if you have eyeballs or at least have a nice web site, that is enough to get the government moving. But, if that doesn't do the job, you may have no choice but to compete with them for real, which of course requires a big commitment in time and energy and not everybody can do that. I know in the case of the Patent Office, I started pestering them in 1993, including several times when I spent 6-figure sums purchasing their data, and it still took until 2011 to crack that nut.
The real trick is focus/obsession. Pick one thing you really care about and just keep pestering them until you crack it open. If you're surfing from one opengov problem to another, showing up for a 1-day hackathon then moving on to something else, you're not going to get anywhere. Pick something real and make it your thing.
Privately Owned, Copyrighted Law
by AdamnSelene
I think I have read that the law itself cannot be copyrighted and it should be possible to make it available to everyone. But as a techie who drafts standards and specifications, I was wondering about how far this goes--especially since Congress recently proposed enacting some of our standards into law. (They decided not to, but they read some parts into the committee records as they debated.) Can you still accomplish your project if a governmental body adopts (or considers adopting) a privately owned, copyrighted technical reference manual or set of safety standards as administrative law (or regulations that carry the force of law)? Or would such obstacles keep you from being able to digitize all of the government's laws (and archives of proposed laws)?
CM: The idea that the law has no copyright is a fundamental part of the American system of government. That applies to states and municipalities as well. The basic decision is Wheaton v. Peters from 1834 but that decision has been reaffirmed over and over. The law is sacred in the American system. You can't have equal protection under the law or due process under the law if there is a poll tax on access to justice.
When we get to privately developed standards, however, it turns into a very interesting issue. The basic mechanism is called Incorporation by Reference. The government will take some external document (such as a model building code) and incorporate the entire text to make it the law of the land. A guy named Peter Veeck was responsible for a landmark decision in 2002 when he published the Texas Building Code which was an incorporation of a privately-developed and very expensive model code. The court ruled that while the model code had copyright, the law of the land did not.
Based on the Veeck decision, my group went and posted many of the public safety codes enacted by the states. We started by purchasing model codes, finding the incorporating legislation, and concatenating the two pieces together and posting the resulting PDFs. More recently, we've done some extensive reworking of the California public safety codes, known as Title 24, converting the entire text into valid XHTML, recoding the graphics as SVG graphics, the formulas as MathML, and regenerating the PDF documents as nicely typeset documents instead of low-quality scans. You can see this work on the web but it is also available as a Google Code project.
The federal government also uses this mechanism intensively, with over 2,000 standards incorporated into the Code of Federal Regulations. This is non-trivial stuff, things like all the OSHA safety regulations. The issue was recently considered by a federal group called the Administrative Conference of the U.S. which basically rolled over and endorsed the idea that it is ok for important parts of the law to cost money. (Read EFF's protest letter if you want a good critique of what they did.)
I'm not necessarily saying that government should be able to appropriate any privately-developed standard and make it available. And, I'm not necessarily saying you want OSHA bureaucrats drafting the standards. But, I do think the big standards establishment and the government regulators have cut a deal that results in the law not being available and the costs forked off on private citizens and small business with extortionate monopoly prices. I just paid $847 for a 48-page safety standard from Underwriters Labs and $60 for a 2-page safety standard from the Society of Automotive Engineers, both of which are mandated by law in the CFR. They do need money to run their operations, but let me just point out that in 2009 the 501(c)(3) nonprofit Underwriters Labs paid their CEO $2,138,984 and the nonprofit SAE paid their CEO $412,578.
Ancestry.com
by An Anonymous Reader
What is your opinion about websites like Ancestry.com which make use of public records and charge a subscription fee for access? What is the incentive for the government to migrate old documents into digital form when services like these exist? Do you think Ancestry.com should be a 501(c)(3)?
CM: I'm not a big fan of for-profit corporations that have a business model of monetizing the public domain. I'm fine if they exist and fine if they make billions of dollars, but if they are the only game in town they've taken something that belongs to all of us and turned it into their private property.
The government got snookered on the Ancestry.Com deal. They could have insisted that the raw data be available in bulk for anybody else to use. The folks that approach the government to cut these sweetheart deals argue that is unreasonable because they need a "return on investment" and they argue that if they don't get the return on investment they won't do the deal (and by extension nobody else will do the deal).
But, government can argue much harder! For example, instead of negotiating some exclusive thing with Ancestry.Com, how come they didn't ask the Internet Archive to grab the data? Or put together something creative with a couple of foundations that would pay for the digitization in return for the kind of payback the foundations like to see (e.g., good press, photo opportunity with the President, or other tools of the trade)?
You asked if Ancestry.Com should be a 501(c)(3)? Not all nonprofits do something that I think should be an essential part of their mission, which is to allow others to compete with them. I believe providing open access to all data ought to be a precondition to getting nonprofit status (an idea that Gil Elbaz has been pushing for quite some time). A good example of a nonprofit that builds walls is Guidestar which wants to be the place where you go for all your nonprofit information. The IRS should be making all Form 990 returns of nonprofits available in bulk for anybody to use, which would knock the bottom out of Guidestar's attempts to build walls and force them to stay innovative and provide value.
Pacer Problems
by onyxruby
How much difficulty do you anticipate in getting and publishing records in Pacer? If there's one system that should be free it is the decisions that our courts make and yet you are charged by the page just to view the results. Are you concerned about a court taking an unkind view on your archiving what is in Pacer?
CM: PACER is an abomination. Do they take a dim view of our efforts? Well, the Administrative Office of the U.S. Courts reacted so strongly to our efforts to make their data available that they called the FBI on Aaron Swartz and cancelled the only meaningful public access system they had, which consisted of one terminal in each of 17 public libraries around the country. In this era of rapidly decreasing costs, they just boosted their access charges from 8 cents a page to 10 cents a page, arguing that this is a bargain compared to 25 cents a page for a copy machine.
What I find so disturbing about PACER is that when we did get 20 million pages of docs, we were able to conduct a comprehensive analysis of privacy violations in the courts, an analysis that led to a nice thank-you letter from the Judicial Conference and changes in their privacy rules. In other words, only when public interest groups got access to the data did we begin to address privacy issues. Public access is not just about pro se prisoners defending themselves from a jail cell, which is the view of many in the Administrative Office of the Courts. Public access is about attempts like ours (and many other folks) to make our system of justice function better. When we say we are "an empire of laws not a nation of men" that means we write down what we are doing in our courts so that it is no longer the arbitrary decisions of individuals. The paper trail is there so we can make sure the system is functioning properly. When you limit that access to those that only have a Gold Card, you pervert democracy and you pervert justice.
This principle that access to justice shouldn't hide behind a cash register goes back to the Greeks. Theseus in Euripides' Suppliants said "when there are no public laws, one man holds power by keeping the law all for himself, and there is no more equality. But when the laws are written, the weak man and the rich man have equal justice." The PACER system is justice for the rich man.
Steve Schultze and the team at Princeton did a lot of the heavy lifting on this issue, including the very nice RECAPtheLaw.Org system they built. They've also done a lot of financial analysis that shows that the courts are not only recovering their costs for operating the expensive PACER system, they're making a huge profit (to the tune of $100 million/year) and using their excess profits to do things like buy big-screen TVs in direct violation of the E-Government act.
The basic problem on PACER is the Judicial Conference has delegated the issue to a few techie judges who think what they've built is something great. But, PACER is a hairball of bad PERL code and the result has not served the judges, the bar, or the American people very well. My only hope is that eventually, the Judicial Conference will see that their information technology is 30 years behind the rest of the Internet and feel ashamed at the travesty they have wrought. Until then, we have RECAP.
If you're interested in the issue, a couple of resources to look at are the PACER paper trail and a bit of a rant that I delivered at the Gov 2.0 summit.
How to visualize opened data?
by hardwarejunkie9
The amount of information you're trying to free is entirely staggering and consists, largely, of tables of numbers. These numbers are incredibly significant, but people generally can't see them.
After you free all of this information and make it available to the public (as it should be), then what? What do you expect for the public to do with these numbers? Tables of information are not nearly as useful as graphs. This data needs to be seen, but, more importantly, it needs to be understood.
Do you have any ideas for how to disseminate this information? Perhaps a team-up with someone like gapminder.org's Hans Rosling might be particularly valuable for all of us.
CM: Actually, most of the data I'm looking at is not tables of numbers, it is video, images, textual documents, technical papers, maps, and books.
But, I definitely get what you're saying and there are a lot of numbers. For example, the IRS Form 990s should be structured data instead of PDF documents, so extracting the data from the mass of paper is the initial challenge. There are lots of other examples of this kind of initial extraction, getting what were printed paper docs into structured data. There are some interesting tools, such as OCRopus which does layout analysis, but there needs to be much more. One of the reasons we called for a Federal Scanning Commission is that we think there is a lot of directed R&D that could not only scale up mass digitization but could also work on the important value-added of extraction of structured data and handling some of the tricky issues like detecting the presence of Social Security Numbers.
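As a rough illustration of the Social Security Number problem mentioned above, here is a minimal Python sketch of a first-pass detector for extracted text. It is not the method used in the privacy analysis described in this interview. The function name and the pattern are assumptions made for the example, and it only catches numbers written in the common 3-2-4 layout.

```python
import re

# Assumption: SSNs appear as 3-2-4 digit groups separated by hyphens or spaces.
# Scanned documents will contain other layouts that this pattern misses.
SSN_PATTERN = re.compile(r"(?<!\d)(\d{3})[- ](\d{2})[- ](\d{4})(?!\d)")


def find_possible_ssns(text):
    """Return substrings that look like SSNs and pass basic range checks."""
    hits = []
    for match in SSN_PATTERN.finditer(text):
        area, group, serial = match.groups()
        # Area numbers 000, 666 and 900-999 are never issued.
        if area in ("000", "666") or area.startswith("9"):
            continue
        # Group 00 and serial 0000 are never issued either.
        if group == "00" or serial == "0000":
            continue
        hits.append(match.group(0))
    return hits


sample = "Debtor SSN 123-45-6789, case 000-12-3456, phone 555-123-4567"
print(find_possible_ssns(sample))  # -> ['123-45-6789']
```

A filter like this only flags candidates for review. It will produce false positives on other nine-digit identifiers and will miss numbers that OCR has garbled.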
Once you have the data, as you say, then what? I'm a big fan of the idea that the government starts by providing bulk data, then they provide an API, and then maybe they also build web sites and apps and other things along with everybody else out there. That's a 3-part hierarchy that Ed Felten and some of his students developed and it should be a law that applies to all government information systems that are externally facing.
The issue here is that all too often people look at a problem like "digitize all government information" and they want to see the whole stack of the solution from one place. But, I think you can do a layered approach and count on the fact that there is always somebody smarter out there and our job is to reduce the barriers to entry. So, how would I visualize the data? I have no idea, but I'd make damned sure that folks like Martin Wattenberg at Many Eyes and Hans Rosling at Gapminder knew the data was out there and then I'd sit back and be amazed at whatever they come up with. How's that for pushing the problem downstream?
Why is data access so hard?
by CanHasDIY
Can you provide any explanation as to why it is so difficult and cost-prohibitive to obtain records from the government, especially considering the abundance of laws requiring government compliance with requests for information (AKA "Sunshine Laws")?
Is it simply a matter of government employee ineptitude, or have you found evidence of a more nefarious rationale?
CM: I get that question a lot. Why would a member of Congress take deliberate steps to stop public hearings from being available? Why would a court administrator deliberately restrict access to public court documents? Usually the answer is, as Heinlein said, "you have attributed conditions to villainy that simply result from stupidity." When I'm explaining why something is so broken on a big government system, my usual answer is that there are a lot of people still stuck in the 1970s and 1980s, when information dissemination was really, really hard and it took men in white lab coats and computers the size of freight trains to process data. In other words, the problem with a lot of folks who are government gatekeepers is they just don't get the Internet and they don't get computers. In fact, usually when some senior bureaucrat is throwing stones at me, you can find younger staffers working for them rolling their eyes.
That's an optimistic view, and if I'm right things will get better. But, I'm often wrong on my predictions of the future. (I was the guy who saw TimBL demo the web in 1992 and thought to myself "interesting, but it won't scale.")
But, there is also some more nefarious stuff happening, often the accumulation of power by being able to cut exclusive deals with contractor buddies. If your life in government consists of receiving emissaries from Lockheed Martin, maybe you think you're making everybody happy by letting them build you a $1 billion computer system. Often, you think your problems are so unique that the $1 billion solution is the only answer.
And, in some cases, as we've seen from numerous GAO reports, Inspector General reports, Congressional hearings, and newspaper articles, there are some really evil people out there who think the public domain and the government is their personal business opportunity. Looting the federal government is the kind of civic crime that ranks right up there in my book with stealing cookies from Girl Scouts and selling fake medicines to sick people.
Who is the worst?
by TheBrez
Which government agency is the worst to get information from?
CM: I don't know who the worst are (there's a lot of competition for that slot), but the ones that piss me off the most are the ones that should know better.
Public.Resource.Org is a really small operation. I'm the only staff member. My part-time sysadmin is @mdkail who is pretty busy with his day job as CIO at NetFlix. My ISP is Jim Martin and his team at ISC who are kind of busy running the F-Root. My office net is supported by the amazing systems team at O'Reilly which rents me office space at below-market rates.
I'll grant you government would have a tough time getting that kind of help. But, I'm a one-man shop and we run the 4th most popular U.S. government video channel on YouTube, we're the source for a lot of the on-line presence of the U.S. Court of Appeals, and we've supported efforts for the U.S. Congress, the White House, and the National Archives. If we can do this out of Northern California, couldn't the vast resources of the federal government in Washington, D.C. do a whole lot better than they're doing now?
For me, my current bête noire is the U.S. Congress. We got half-way through processing their archives of video from congressional hearings, publishing about 31 terabytes of data. Then, a couple of staffers decided this was a bad idea and pulled the rug out from under us. They actually decided it was a bad idea to publish video from public congressional hearings.
Like any agency, Congress is a mixed bag. We had tons of support from Darrell Issa, for example, and ran a very successful pilot project for him for a year. We talked to all sorts of people on committees and in the various agencies that support the Congress. But, at the end of the day, a couple of staff members were able to decide that the public archive shouldn't be public and they terminated our project. (If you have some time, you might like to read our rather surreal paper trail.)
So, rather than the worst, I think we need to look for the most shameful, the ones that have the privilege and the power and could easily do better. I know it is in vogue to throw stones at government in general and Washington in particular, but there are times when government can be so useful and so awe inspiring it takes your breath away. Government can be that shining city on the hill but we all have to take an active part in our government to keep those lights shining bright. -
CES Recap: Gadgets and Blisters
I was in Las Vegas last week to see the 2012 Consumer Electronics Show. (Officially, it's the International Consumer Electronics Show, but no one calls it "ICES.") I've been to CES just a few times before, but usually as the finish line of a marathon drive from Seattle, rather than a plane flight from Tennessee as it was this time around. I've also never arrived with an armload of video equipment, which brings its own hassles. (Did you notice our videos?) Following are a few thoughts about the experience.
I started my trip with a friendly rubdown from a TSA functionary in Knoxville, whose carefully narrated prodding ("Now I'm going to be touching the inside of your waistband ...") failed to scan the recesses of my brain for what evil may there lurk, or take much notice of the should-be-suspicious bundle of batteries and radio-equipped mics in an evil-looking hard-shelled case that smells of gun oil, but did take a while to poke through ("check out") my bag of unremarkable clothing, paper, and sundries. It is a nice bag, after all, and one must have priorities: I could have had cupcakes.
Rooms are typically cheap in Vegas; the city is still America's gambling mecca, but gambling has become so widespread elsewhere that the draw is weaker -- and there are all those hotel rooms looking for occupants. But because CES is the biggest event of the year, even overbuilt Vegas fills up and prices are high that week. So I stayed at the Sin City Hostel, which has a friendly staff, an eclectic clientele (of young international travelers, mostly), and some of the most uncomfortable beds I have ever slept on. On the other hand, it was a thousand bucks cheaper than I was quoted for a room that week at a decent mid-range hotel on the strip, and I have a high tolerance for unusual accommodation. Unless you're actually going to CES, and would rather have a reasonable, luxurious room than a hammock-shaped lump of foam, any other week is probably a smarter time to visit.
Things to note about CES:
It's huge.
CES takes up not just an enormous convention center, but spills over into hotels both nearby and not-especially nearby. Just to touch every booth, suite, and temporary meeting corral would probably take a full-time effort for the whole run of the show, and it might not even be possible (that would make an interesting video game!), especially since some of the dealers are in town for CES, but not officially part of the show as exhibitors. When I met with Steven Isaac of TouchFire, for instance, it was in the lobby of an adjacent hotel. "Nearby" in this case still means a walk of 20 minutes or more just to cross the convention center grounds; it can take nearly that long, too, to walk entrance-to-wall in any of the several large halls at the convention center.
And you'd want to walk all the way back, too. The flashy kiosks operated by companies like Samsung, Nokia, and Motorola tend to be right near the front of the exhibit halls ("them as has, gets"), but back in the low(er) rent districts toward the back of each hall is where a lot of the most interesting stuff collects. Some of these booths might not be that interesting in themselves, but it's fascinating to see the modest public face of companies that sell the bits (LEDs, copper wire, blank circuit boards) and services (custom molding, circuit board layout, high-end fabric embroidery) that underlie even seemingly simple goods. This is also the place to find interesting devices like a ring-mounted mouse, helmet- and goggle-mounted video cameras in great profusion, and a wireless silicone keyboard cleverly molded to fold into what looks almost like a translucent billfold.
For pure technological art, it's hard to beat the industrial sculpture of the High End audio world. But the show is too big: I didn't get a chance this year to see the biggest trove of that, which is at the Venetian rather than the convention center. $50,000 speakers, and amplifiers in the same range, aren't in my budget (see above re. Sin City Hostel), but if you want to see where Monster Cable and Best Buy get their ideas of how to price electronics, it's enlightening to ogle some of the beautiful components and then their price tags.
It's not just electronics.
In fact, it's not just "consumer" electronics, either, as that term is generally used. No mistake, the consumer end of things isn't neglected: there are plenty of TVs (one of the crowd draws this year was LG's super-thin OLED panel (video), which I didn't have time to properly appreciate), plenty of computers and accessories, and a fair number of white goods -- stoves, refrigerators, and washing machines, all of which are ever more "electronics" in their own right. And yes, there are shipping containers' worth of MP3 players, cameras of all descriptions, blinking and hovering toys for all ages, robots, headphones, cell phones from the mundane to the exotic, and stacks of tablets from familiar names as well as unknowns (most of them Android, and a surprising number running Android 4.0). But much of the stuff on offer is aimed squarely at institutional buyers or business users. Fancy your own collapsible walk-through metal-detector, or some high-end eye-tracking technology? A $1300 pointing device? (Or, arguably more consumer-friendly, a $900 skateboard perfect for getting around a factory floor?) This is the show for you.
And lots of the goods on display are meant for end-users, but aren't electronics themselves. There are easily thousands, probably tens of thousands, of phone cases, not to mention USB drive casings, computer bags, tripods for cameras, stands for tablets, and other accoutrements. (You can even buy the world's fanciest piece of string.) I overheard a confident claim that there were more than a hundred vendors selling computer bags; I don't doubt it, but I haven't tried to count, and it's probably a fool's errand: not every company's entry in the massive show directory gives much of a clue just what they sell.
Speaking of selling: show rules (and, I was later told, Nevada tax rules) prohibit sales of goods on the show floor itself, but they go on just the same, ranging from furtive (sideways glances and handshake-with-money) to blatant (large, handwritten sign: "Show special! $499!"). For vendors who've made the trip to CES to show off their wares to potential buyers from companies like Fry's and Best Buy as well as smaller dealers, the inventory they've brought as samples can drop in value as soon as that chance is gone; I ended up with a few iPhone 4 cases that the vendor was trying to foist on anyone not bold enough to refuse. (I don't own an iPhone, and have no plans to. Anyone want a few cases with geometric designs in red and grey?)
Besides vendors, there are organizations on hand, too. I ran short of time, or I would have had a chance to ask the folks at the EFF what a nice bunch like them was doing at a place like this.
It lasts too long.
There are a few days of special events prior to CES proper, and then four days of crowded show floor. By the end of the show, vendors are drooping in the most popular booths, and looking a bit forlorn in the lonelier ones. Elevator pitches are down to the length of a short escalator ride, and grazing show-goers are weighed down by their masses of brochures, business cards and tchotchkes. The hub-bub is impossible to avoid even on the last day, and the crowd is crushing. Even with shoes that seemed comfortable going in, I developed blisters to impress the Devil on three toes and both heels, wore out a few pens, and considered commandeering a massage chair for an hour to beat down the ache in my shoulder from hauling around my camera bag and bulky laptop.
It goes too fast.
Even though seeing it all is an impossible task, and even with inevitably sore feet, the lure of novelty is strong. I saw hundreds of exhibitors I would have liked to return to at least briefly. I had just a few minutes to play with the very attractive Mirasol screens at the Qualcomm booth, for instance (shown off in a handful of small tablets that they insist are "e-readers that play music, display video and can browse the web" rather than "tablets"), and missed out on the chance to see the new OLPC tablet.
Being a newbie with the video camera and operating without a trusty servant, I lost some time fumbling with mics, batteries, cables, and a small tripod that I bought mostly as a grip. (Why bother with external mics? Because the roar of the crowd is overwhelming to the camera's built-in mics, and only partly defeated with a shotgun mic mounted on the camera.) By the final day of the show, I had the routine down a little better, but still caught on tape — or rather, in flash memory — only a fraction of the things that caught my eye. I'm lucky that my video conspirator Roblimo has a knack for finding and assembling the most watchable bits. Good news, if hardly impressive these days: my laptop running Linux Mint had no complaints importing files from my Panasonic HD camera for sending off to him a few thousand miles away in Florida for editing.
I stayed two more nights, foam-hammock and all, sorting through marketing goo and enjoying the neon and buffet offerings along the Las Vegas strip. Because I lucked out and reached a line at the Las Vegas airport that lets the miscreants through easily (no puffer machine or full-body scanner to refuse), my trip home didn't even offer a rub-down, only the chance to catch up on a few hours of sleep. -
Ask Slashdot: What Can You Do About SOPA and PIPA?
Wednesday is here, and with it sites around the internet are going under temporary blackout to protest two pieces of legislation currently making their way through the U.S. Congress: the Stop Online Piracy Act (SOPA) and the Protect-IP Act (PIPA). Wikipedia, reddit, the Free Software Foundation, Google, the Electronic Frontier Foundation, imgur, Mozilla, and many others have all made major changes to their sites or shut down altogether in protest. These sites, as well as technology experts (PDF) around the world and everyone here at Slashdot, think SOPA and PIPA pose unacceptable risks to freedom of speech and the uncensored nature of the internet. The purpose of the protests is to educate people — to let them know this legislation will damage websites you use and enjoy every day, despite being unrelated to the stated purpose of both bills. So, we ask you: what can you do to stop SOPA and PIPA? You may have heard the House has shelved SOPA, and that President Obama has pledged not to pass it as-is, but the MPAA and SOPA-sponsor Lamar Smith (R-TX) are trying to brush off the protests as a stunt, and Smith has announced markup for the bill will resume in February. Meanwhile, PIPA is still present in the Senate, and it remains a threat. Read on for more about why these bills are bad news, and how to contact your representative to let them know it.
Note: This will be the last story we post today until 6pm EST in protest of SOPA.
Why is it bad?
The Stop Online Piracy Act is H.R.3261, and the Protect-IP Act is S.968.
The intent of both pieces of legislation is to combat online piracy, giving the Attorney General and the Department of Justice power to block domain name services and demand that links be stripped from sites not involved in piracy. The problem is that the legislation, as written, is vague and overly-broad. For one thing, it classifies internet sites as "foreign" or "domestic" based entirely on their domain name. A site hosted abroad like Wikileaks.org could be classified as "domestic" because the .org TLD is registered through a U.S. authority. By defining it as "domestic," Wikileaks would then fall under the jurisdiction of U.S. laws. Other provisions are worded even more poorly: in Section 103, SOPA lays out the definition for a "foreign infringing site" as one where "the owner or operator of such Internet site is committing or facilitating the commission of criminal violations punishable under [provisions relating to counterfeiting and copyright infringement]." The problematic word is facilitating, as it opens the door to condemning sites that simply link to other sites.
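The domain-name rule described above can be sketched in a few lines of Python. This is only an illustration of the logic as this story summarizes it, not a rendering of the bill text. The TLD set and the function name are assumptions made for the example.

```python
# Hypothetical list of TLDs whose registries sit under a U.S. authority.
# The story does not enumerate them; this set exists only for the example.
US_REGISTRY_TLDS = {"com", "net", "org", "us"}


def classify_site(domain, hosting_country):
    """Label a site 'domestic' or 'foreign' from its domain name alone."""
    tld = domain.lower().rstrip(".").rsplit(".", 1)[-1]
    # hosting_country is deliberately ignored: under the rule described
    # above, where the servers actually sit plays no part in the label.
    return "domestic" if tld in US_REGISTRY_TLDS else "foreign"


print(classify_site("Wikileaks.org", hosting_country="SE"))  # -> domestic
print(classify_site("example.se", hosting_country="US"))     # -> foreign
```

The unused hosting_country argument is the point of the sketch: a site hosted abroad and a site hosted in the U.S. get their labels from the TLD and nothing else.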
The most obvious implication of this is that search engines would suddenly be responsible for monitoring and policing everything they index. Google indexed its trillionth concurrent URL in 2008. Can you imagine how many people it would take to double check all of them for infringing content? But the job wouldn't end at simply looking at them — Google would have to continually monitor them. Google would also have to somehow keep track of the billions of new sites that spring up daily, many of which would be trying to avoid close scrutiny. Of course, it's an impossible task, so there would need to be automated solutions. Automation being imperfect, it would leave us with false positives. Or perhaps sites would need to be "approved" to be listed. Either way, we'd then be dealing with censorship on a massive scale, and the infringing sites themselves would continue to pop up.
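To put a rough number on the double-checking question, here is a back-of-the-envelope sketch in Python. The 30 seconds per URL and the 2,000 working hours per year are assumptions chosen for illustration, not figures from Google or from the bill. Only the one trillion URL count comes from the paragraph above.

```python
def reviewer_years(urls, seconds_per_url, hours_per_year=2000):
    """Estimate person-years needed to look at every URL once."""
    total_hours = urls * seconds_per_url / 3600
    return total_hours / hours_per_year


# One trillion URLs, with an assumed 30 seconds of human attention each.
print(round(reviewer_years(10**12, 30)))  # -> 4166667
```

Under those assumptions a single pass takes a little over four million person-years, before any of the continual re-checking described above.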
But the problems don't end there; in fact, SOPA defines "Internet search engine" as a service that "searches, crawls, categorizes, or indexes information or Web sites available elsewhere on the Internet" and links to them. That's pretty much what we do here at Slashdot. It's also something the fine folks at Wikipedia and reddit do on a regular basis. The strength of all three sites is that they're heavily dependent on user-generated content. Every day at Slashdot, readers deposit hundreds and hundreds of links into our submissions bin. Thousands of comments are made daily. We have a system to surface the good content, but the chaff still exists. If we suddenly had a mandate to retroactively filter out all the links to potentially copyright-infringing sites in our database, we wouldn't have many options. We're talking about reviewing hundreds of thousands of submissions, and every comment on 117,000+ stories. And we're far from the biggest site around — imagine social networks needing to police their content, and all the privacy issues that would raise.
Small sites and new sites would be hurt, too. A website isn't a single, discrete entity that exists on its own. A new company starting up a site would have to worry about its webhost, registrar, content provider, ISP, etc. The legislation would also raise significant financial obstacles. New companies need investments, and that would be much less likely (PDF) if the company could be held liable for content uploaded by users. On top of that, if the site was unable to live up to the vague standards set by the government and the entertainment industry, they could be on the receiving end of a lawsuit, which would be expensive to fight even if they won (and such laws would never, ever be abused). It's hard to conceptualize the internet without noting its unrivaled growth, and SOPA/PIPA would surely stifle it.
This legislation hits near and dear to the hearts of many Slashdotters; if SOPA/PIPA pass, IT staff for companies small and large are going to have their hands full making sure they aren't opening themselves to legal action or government intervention. Mailing lists, used commonly and extensively among open source software projects, would be endangered. Code repositories would need to be scoured for infringing content; the bill allows for the strangling of revenue sources if its anti-infringement rules aren't being met. VPN and proxy services become only questionably legal. The very nature of the open source community (as the EFF puts it, "decentralized, voluntary, international") is not compatible with the burdens placed on internet sites by SOPA and PIPA.
What can we do?
So, what can we do about it? There are two big things: contact your representative, and spread the word. Slashdot readers, on the whole, are more technically-minded than the average internet user, so you're all in a position to share your wisdom with the less internet-savvy people in your life, and get them to contact their representative, too. Here's some useful information for doing so:
Propublica has a list of all SOPA/PIPA supporters and opponents.
Here is the Senate contact list and the House contact list.
You can also use the EFF's form-letter, the Stop American Censorship form-letter, or sign Google's petition.
If you don't live in the U.S., you can petition the State Department. (And yes, you have a dog in this fight.)
SOPAStrike has a list of companies participating in the protest, and this crowd-sourced Google Doc tracks companies that support the legislation. Tell those companies what you think.
Further reading: Wikipedia has left their SOPA and PIPA pages up. The EFF has a series of articles explaining in more depth what is wrong with the bills. Here are some protest letters written to Congress from human rights groups, law professors, and internet companies.
Go forth and educate. -
Ask Slashdot: What Can You Do About SOPA and PIPA?
Wednesday is here, and with it sites around the internet are going under temporary blackout to protest two pieces of legislation currently making their way through the U.S. Congress: the Stop Online Piracy Act (SOPA) and the Protect-IP Act (PIPA). Wikipedia, reddit, the Free Software Foundation, Google, the Electronic Frontier Foundation, imgur, Mozilla, and many others have all made major changes to their sites or shut down altogether in protest. These sites, as well as technology experts (PDF) around the world and everyone here at Slashdot, think SOPA and PIPA pose unacceptable risks to freedom of speech and the uncensored nature of the internet. The purpose of the protests is to educate people — to let them know this legislation will damage websites you use and enjoy every day, despite being unrelated to the stated purpose of both bills. So, we ask you: what can you do to stop SOPA and PIPA? You may have heard the House has shelved SOPA, and that President Obama has pledged not to pass it as-is, but the MPAA and SOPA-sponsor Lamar Smith (R-TX) are trying to brush off the protests as a stunt, and Smith has announced markup for the bill will resume in February. Meanwhile, PIPA is still present in the Senate, and it remains a threat. Read on for more about why these bills are bad news, and how to contact your representative to let them know it.
Note: This will be the last story we post today until 6pm EST in protest of SOPA.
Why is it bad?
The Stop Online Piracy Act is H.R.3261, and the Protect-IP Act is S.968.
The intent of both pieces of legislation is to combat online piracy, giving the Attorney General and the Department of Justice power to block domain name services and demand that links be stripped from sites not involved in piracy. The problem is that the legislation, as written, is vague and overly broad. For one thing, it classifies internet sites as "foreign" or "domestic" based entirely on their domain name. A site hosted abroad like Wikileaks.org could be classified as "domestic" because the .org TLD is registered through a U.S. authority. By defining it as "domestic," Wikileaks would then fall under the jurisdiction of U.S. laws. Other provisions are worded even more poorly: in Section 103, SOPA lays out the definition for a "foreign infringing site" as one where "the owner or operator of such Internet site is committing or facilitating the commission of criminal violations punishable under [provisions relating to counterfeiting and copyright infringement]." The problematic word is "facilitating," as it opens the door to condemning sites that simply link to other sites.
The most obvious implication of this is that search engines would suddenly be responsible for monitoring and policing everything they index. Google indexed its trillionth concurrent URL in 2008. Can you imagine how many people it would take to double check all of them for infringing content? But the job wouldn't end at simply looking at them — Google would have to continually monitor them. Google would also have to somehow keep track of the billions of new sites that spring up daily, many of which would be trying to avoid close scrutiny. Of course, it's an impossible task, so there would need to be automated solutions. Automation being imperfect, it would leave us with false positives. Or perhaps sites would need to be "approved" to be listed. Either way, we'd then be dealing with censorship on a massive scale, and the infringing sites themselves would continue to pop up.
But the problems don't end there; in fact, SOPA defines "Internet search engine" as a service that "searches, crawls, categorizes, or indexes information or Web sites available elsewhere on the Internet" and links to them. That's pretty much what we do here at Slashdot. It's also something the fine folks at Wikipedia and reddit do on a regular basis. The strength of all three sites is that they're heavily dependent on user-generated content. Every day at Slashdot, readers deposit hundreds and hundreds of links into our submissions bin. Thousands of comments are made daily. We have a system to surface the good content, but the chaff still exists. If we suddenly had a mandate to retroactively filter out all the links to potentially copyright-infringing sites in our database, we wouldn't have many options. We're talking about reviewing hundreds of thousands of submissions, and every comment on 117,000+ stories. And we're far from the biggest site around — imagine social networks needing to police their content, and all the privacy issues that would raise.
Small sites and new sites would be hurt, too. A website isn't a single, discrete entity that exists on its own. A new company starting up a site would have to worry about its webhost, registrar, content provider, ISP, etc. The legislation would also raise significant financial obstacles. New companies need investments, and that would be much less likely (PDF) if the company could be held liable for content uploaded by users. On top of that, if a site were unable to live up to the vague standards set by the government and the entertainment industry, it could be on the receiving end of a lawsuit, which would be expensive to fight even if the site won (and such laws would never, ever be abused). It's hard to conceptualize the internet without noting its unrivaled growth, and SOPA/PIPA would surely stifle it.
This legislation hits near and dear to the hearts of many Slashdotters; if SOPA/PIPA pass, IT staff for companies small and large are going to have their hands full making sure they aren't opening themselves to legal action or government intervention. Mailing lists, used commonly and extensively among open source software projects, would be endangered. Code repositories would need to be scoured for infringing content; the bill allows for the strangling of revenue sources if its anti-infringement rules aren't being met. VPN and proxy services would become only questionably legal. The very nature of the open source community, which the EFF describes as "decentralized, voluntary, international," is not compatible with the burdens placed on internet sites by SOPA and PIPA.
What can we do?
So, what can we do about it? There are two big things: contact your representative, and spread the word. Slashdot readers, on the whole, are more technically minded than the average internet user, so you're all in a position to share your wisdom with the less internet-savvy people in your life, and get them to contact their representative, too. Here's some useful information for doing so:
Propublica has a list of all SOPA/PIPA supporters and opponents.
Here is the Senate contact list and the House contact list.
You can also use the EFF's form-letter, the Stop American Censorship form-letter, or sign Google's petition.
If you don't live in the U.S., you can petition the State Department. (And yes, you have a dog in this fight.)
SOPAStrike has a list of companies participating in the protest, and this crowd-sourced Google Doc tracks companies that support the legislation. Tell those companies what you think.
Further reading: Wikipedia has left their SOPA and PIPA pages up. The EFF has a series of articles explaining in more depth what is wrong with the bills. Here are some protest letters written to Congress from human rights groups, law professors, and internet companies.
Go forth and educate. -
OpenStreetMap Reports Data Vandalism From Google-Owned IPs
An anonymous reader writes "Following reports of misconduct by Google employees in Kenya and India, it has been found that Google IP addresses have been responsible for deliberate vandalism of OpenStreetMap data. While it is unlikely that this was a deliberate or coordinated attack by Google HQ on the competition, multiple such reports do raise the question of whether or not Google has become too big to effectively enforce its 'Don't be evil' philosophy across its massive organization." -
Google Ports Box2D Demo To Dart
mikejuk writes with an excerpt from an article at i-programmer about a neat graphics demo written in Dart: "One of the difficulties in getting a new computer language accepted by a wider audience is that there is doubt that it is real. Is it a toy language that just proves a concept or can it do real work? In the case of Dart, which is Google's replacement for JavaScript, the development is speeding ahead at a rate that is impressive but worrying. To prove that Dart is already a language that can be used, we now have a port of the well known 2D physics engine Box2D, the one Angry Birds uses, to Dart." Box2D has previously been ported to JavaScript. Source is available at Google Code (under the Apache license). Note that you'll need Chromium to run the demos. -
IPv6-Only Is Becoming Viable
An anonymous reader writes "With the success of World IPv6 Day in 2011, there is a lot of speculation about IPv6 in 2012. But simply turning on IPv6 does not make the problems of IPv4 exhaustion go away. It is only when services are usable with IPv6-only that the internet can clip the ties to the IPv4 boat anchor. That said, FreeBSD, Windows, and Android are working on IPv6-only capabilities. There are multiple accounts of IPv6-only network deployments. From those, we now know that IPv6-only is viable in mobile, where over 80% (of a sampling of the top 200 apps) work well with IPv6-only. Mobile especially needs IPv6, since there are only 4 billion IPv4 addresses and nearly 50 billion mobile devices expected in the next 8 years. Ironically, the Android test data shows that the apps most likely to fail are peer-to-peer, like Skype. Traversing NAT and relying on broken IPv4 is built into their method of operating. P2P communication was supposed to be one of the key improvements in IPv6." -
Dutch Court Forces ISPs To Block the Pirate Bay
New submitter swinferno writes "After recent successes in Finland, Italy and Belgium, the Dutch copyright protection organization BREIN has obtained a verdict that forces two major ISPs to block access to The Pirate Bay domains and gives BREIN the right to submit further domains and IP addresses for blocking without a court order." -
Protecting Your Tablet From a Fall From Space
First time accepted submitter xwwt writes "G-Form has a nice video of an iPad launched into the stratosphere via weather balloon and protected using its new protective gear 'Extreme Edge' to see how well the gear worked in the iPad free fall to Earth. The gear is being introduced at this year's CES where our own timothy will be attending and reviewing new products. The cool part of this whole video is really that the iPad survives the free fall from space, remaining fully functional." -
Newspaper Articles Not Copyrightable In Slovakia
Yenya writes "In Slovakia, newspaper articles can be freely aggregated and archived, and are not worth copyright protection. The district court in Bratislava, Slovakia, stated in the case between news publishing house Ecopress and news monitoring company Storin that while news articles manifest traces of creativity, this is not enough for them to be considered worthy of protection under authors' rights (English translation)." -
Google Accused of Interfering With South Korean FTC Investigation
New submitter DCTech writes "South Korea's Fair Trade Commission is accusing Google of methodically interfering with an anti-competition investigation into Android. 'Google deleted files and made its employees work from home in an attempt to frustrate the investigation, alleges the commission in an interview with a South Korean newspaper [machine translation]. The non-cooperation allegedly came after Google's Seoul office was raided by the commission's officials in September. The anti-competition probers were looking into whether Google's Android phones unfairly prioritize Google search and are "systematically designed" to make it difficult to switch to another option'. Now the South Korean watchdog is considering maximum fines for Google's non-compliance. Google is currently under investigation for similar anti-competition issues in Europe and the U.S." -
Shareholder Fight Threatens Mandriva SA
LinuxScribe writes "A shareholder fight (French [Google translation]) has put one of the oldest commercial Linux vendors at risk of shuttering on January 16. If Mandriva can't raise 4 million euro in capital by then, it will have no choice but to cease operations."