Obama Orders Federal Agencies To Digitize All Records
Lucas123 writes "President Obama this week issued a directive to all federal agencies to upgrade records management processes from paper-based systems that have been around since President Truman's administration to electronic records systems with Web 2.0 capabilities. Agencies have four months to come up with plans to improve their records keeping. Part of the directive is to have the National Archives and Records Administration store all long-term records and oversee electronic records management efforts in other agencies. Unfortunately, NARA doesn't have a stellar record itself (PDF) in rolling out electronic records projects. Earlier this year, due to cost overruns and project mismanagement, NARA announced it was ending a 10-year effort to create an electronic records archive."
There are only about 1,300 federal agencies. Each will need its own resources to come up with a specialized method that will work for them...
A couple of trillion should do the trick.
Think of all the jobs this will create! Can you feel the stimulus?
So, how many Library of Congress equivalents worth of material are they intending to scan??
Huh?
Slashdot community, why is this so hard? I've worked for several organisations, and all have failed to implement a simple electronic document store. Nothing fancy, just a database of scanned forms in PDF format and the like. Has anyone worked in this sort of field? Is there some complication I don't understand?
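For reference, the "nothing fancy" store described above can be sketched in a few lines with Python's built-in sqlite3 module. This is a minimal sketch only: the table layout and the form-type values are hypothetical, and it deliberately ignores the scanning, metadata, and access-control problems other commenters raise.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical store of scanned PDF forms."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id        INTEGER PRIMARY KEY,
               form_type TEXT NOT NULL,
               title     TEXT NOT NULL,
               received  TEXT NOT NULL,  -- ISO 8601 date, so text order is date order
               pdf       BLOB NOT NULL   -- the scanned PDF bytes themselves
           )"""
    )
    return conn


def add_document(conn: sqlite3.Connection, form_type: str, title: str,
                 received: str, pdf_bytes: bytes) -> int:
    """Insert one scanned form and return its row id."""
    cur = conn.execute(
        "INSERT INTO documents (form_type, title, received, pdf) VALUES (?, ?, ?, ?)",
        (form_type, title, received, pdf_bytes),
    )
    conn.commit()
    return cur.lastrowid


def find_by_type(conn: sqlite3.Connection, form_type: str) -> list:
    """List (id, title, received) for every form of one type, oldest first."""
    return conn.execute(
        "SELECT id, title, received FROM documents WHERE form_type = ? ORDER BY received",
        (form_type,),
    ).fetchall()
```

The database part really is the easy bit; the cost is in getting accurate scans and metadata into it.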
When all the records are locked in 8.5x11 filing cabinets, sealed in manila envelopes?
And the FOIA headache!
Destroying those records is hard, and some turn up - years after they were declared not to exist!
"Flyin' in just a sweet place,
Never been known to fail..."
http://yro.slashdot.org/story/11/11/26/1823200/palantir-the-war-on-terrors-secret-weapon
orwell
1984
panopticon
big brother
I'd like to see 220 years of Congressional debates in digital form.
Contribute to civilization: ari.aynrand.org/donate
(the number of federal agencies)-odd number of completely incompatible digital records systems proposals.
This is actually the perfect place to incubate distributed object stores (e.g. Hadoop on one end, something like Zimbra on the other): one namespace, .gov, with sub-namespaces, and a CMIS interface. Has anyone seen VMware Project Octopus yet? Take that times 10,000 and you have a pretty nice, platform-independent records management system. There's also Alfresco, which uses the JCR spec and which I believe can be moved to some type of distributed backend; it implements CMIS and has a DoD-spec records management system. So the general spec would be a CMIS framework: each department/branch/whatever makes available a service for document retrieval, with a central .gov listing of the services, which is basically what Amazon does for literally everything it does. Do not compromise; issue the executive order Jeff Bezos style: everything is a service with a public interface. I think it is possible, but it would take a lot of plain buy-in, and our government (the bureaucratic, non-political side) has gotten really, really good at dragging its feet and doing nothing. The cuts are coming, though, and they will have to improve efficiency just like we all have in the private sector. Of course Defense is the worst, but Education could use some work as well.
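The "central .gov listing of the services" idea above amounts to a registry that maps sub-namespaces to each agency's retrieval endpoint. Here is a minimal sketch, assuming dotted namespaces such as `gov.dod.army` and longest-prefix resolution; the namespaces, endpoint URLs, and the `ServiceRegistry` class are all hypothetical, and the sketch does not use the CMIS protocol itself.

```python
class ServiceRegistry:
    """Hypothetical central listing of agency document-retrieval services."""

    def __init__(self) -> None:
        self._services: dict[str, str] = {}

    def register(self, namespace: str, endpoint: str) -> None:
        """Record the retrieval endpoint an agency publishes for its namespace."""
        self._services[namespace] = endpoint

    def resolve(self, document_path: str) -> str:
        """Return the endpoint for the most specific registered namespace.

        document_path looks like "gov.dod.army/records/1234" (hypothetical).
        """
        namespace = document_path.partition("/")[0]
        parts = namespace.split(".")
        # Walk from the most specific namespace back toward "gov".
        for i in range(len(parts), 0, -1):
            candidate = ".".join(parts[:i])
            if candidate in self._services:
                return self._services[candidate]
        raise KeyError(f"no service registered for {namespace!r}")


registry = ServiceRegistry()
registry.register("gov.dod", "https://records.dod.example/cmis")
registry.register("gov.dod.army", "https://records.army.example/cmis")
```

With that setup, `registry.resolve("gov.dod.army/records/1234")` goes to the Army service, while `gov.dod.navy/...` falls back to the department-level one. That fallback is how sub-namespaces could be added incrementally without every branch standing up a service on day one.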
Cool! Amazing Toys.
Now I am truly shocked- no paper records before Truman admin?!!?
No wonder Americans are so ignorant of the past.
...omphaloskepsis often...
Let's spend some more cash. Balance my fuckin' budget, B!
Chewbacon
The Bible is like Wikipedia: written by a bunch of people and verifiable by questionable sources.
I would think that after the wikileaks debacle the government would have learned that you are at higher risk of losing control of digitized records than physical ones. There's a corollary to Murphy's law here. Digital records, given enough time, can and will become public.
Looking at the NARA article, as soon as I saw that some big IT contract was given to Lockheed Martin I saw all I needed to know about this initiative.
We must save our children's heritage. President Obama obviously hates America and its legacy; otherwise, why would he be trying to destroy all the paper records? Undoubtedly, he'll claim that his long form birth certificate was destroyed during the digitization effort. It's obviously an Islamic socialist fascist communist ACORN black panther George Soros funded plot of some sort. Also.
Check your premises.
In 1000 years or more, they'll have no idea what we were up to at all. At least some paper records have a chance of surviving.
Trollish troll is trollin'.
While there is a certain amount of (justified) paranoia that the government would use digitizing records as an opportunity to engage in revisionist history, I have to say that despite a desire to do so, the odds are against the government being able to pull it off.
In order for something like 1984's Ministry of Truth to function, the government would have to be far, far more competent and efficient than is ever to be likely.
Why not just pay Google to do it? They have the infrastructure, the experience, and the coding talent.
"If any question why we died, Tell them because our fathers lied."
Does that include the Declaration of Independence? I suppose it would be much easier to change in digital form...
So there isn't a repeat of this:
http://www.archives.gov/st-louis/military-personnel/fire-1973.html
Sig this!
Seriously? The guy who signed the report is Mr. Powner.
Is this some sort of 4chan joke?
Not going to work. Half these agencies probably have no idea how to accomplish this. Managers will bring in consultants, etc., and have no real idea of what needs to be accomplished. Designs will be worked up and constantly revised, but no work will actually get done, because no one who has any clue about how to accomplish this will be on the payroll for this project. Requirements creep will set in, more useless people will be thrown at the project, deadlines will be missed by months, then years, budgets will be exceeded by millions, and 5 years from now we will have nothing to show for it... Great idea, though.
Why not outsource the whole task to somebody like Iron Mountain? They could get it done quickly and economically. It might even create a few jobs.
Don't you realize making fun of Republicans around here is Politically Incorrect and will cause offense and hurt feelings?
So these systems will encourage the general populace to give them all their personal info (i.e. more than the government already has) in return for free use of some service?
Exactly, except it won't be free because it's your tax dollars.
They will, however, store all your data for you in "the cloud" so it can be hacked by the cyber-terrorists, which will prompt an all-out internet war, creating more jobs for patriotic Americans such as yourself.
You missed the most important question worth considering - in what formats will these records be maintained?
And Obama missed it, too. I don't see anything in his directive about it.
Good archival practice entails preserving original documents, not just scanned copies.
And if the purpose is to place documents on the Internet, then it's a GIGO situation. If you allow garbage, closed formats like .doc or .docx or .xls or .xlsx to be put on the Web, you're not serving transparency very well, and you're defeating your whole purpose of wanting to make data accessible for Web 2.0 mashups and the like.
Why won't government ever "get" it? The prerequisite question is ALWAYS, what formats? If the formats aren't truly open, then the data isn't open, either.
they'll do all the work for free!
p.s. WHERE is all that imaging going to be stored with the shortage of hard drives?
Oh, right, the cloud, it's trustworthy.
They're between a rock and a hard drive.
What do you think Facebook has become?
This would be a good time to write your congresscritter to point out the problems with undocumented file formats, as well as APIs and network protocols.
There are plenty of formats that could be used that are open and vendor neutral.
If Congress doesn't require that in its funding authorization, many of our public records will be stored as Word docs or in MS SQL databases.
Request your free CD of my piano music.
Buffoon Obama has no Money
Because George W. Bush and the Republicans didn't leave him any.
IIRC, NARA didn't end the effort, it just stopped further development because it considered it complete.
Colin Dean Go a year without DRM
Two wars on credit combined with high end tax cuts do tend to drain the coffers with a quickness.
You've really been smoking the hope, huh?
Ask any corporate bean counter about the cost savings (that is, stopping spending money) by going digital.
Here's the thing: the government? It's already all-digital in the places that make sense. Also, you apparently missed the "with Web 2.0 capabilities" bit in the fucking summary, which is a buzzword meaning "giant waste of money." But, hey, it's /., the summary is probably a bit off, right?
plans for improving or maintaining its records management program, particularly with respect to managing electronic records, including email and social media, deploying cloud-based services or storage solutions
Oh, so not so much. That's from the order itself, mind you!
No, this is an order that's going to waste a ton of money to not accomplish anything. The government already uses electronic records where it makes sense. Where it doesn't, they don't.
This tries to force government agencies to move "to the cloud" by executive fiat. It's a recipe for disaster and government waste.
And unlike the previous President, he hasn't been ruling by fiat, executive order and signing statement.
Yeah, not so much, actually.
Web 2.0?
Holy buzzwords batman. What does that have to do with data records?
Why do government records need any web functionality at all? Do they WANT to be hacked?
This is how governments control the masses in "1984".
So you're saying this project is putting a camera in everyone's bedroom and tying rats to their faces? Fear and omnipresence were how the fictional government controlled the masses in 1984; the masses knew the official history was manufactured bullshit in much the same way as people today know that Fox is a right-wing bullshit factory.
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
Dunder Mifflin is gonna be pissed...
Faith is a willingness to accept something w/o complete proof and to act on it. Reason allows you to correct that faith.
"And I want it to be implemented in less than four years. Then we can change all government records to show that presidents can have four terms in office. Then we'll change it to four decades. I call it my 4-4-4-4 plan. Get out your little red kindles, children. We're going to read about democracy."
As a professional historian who has worked in the National Archives in College Park, MD, and at four different presidential libraries (which, incidentally, are also managed by NARA), I need to interject that this is an immense, costly, but valuable project.
Remember "the warehouse" from the Indiana Jones movies? NARA is a little like that in terms of size but is better organized. Aisle upon aisle, shelf upon shelf, row upon row, room upon room, floor upon floor, building upon building of neatly indexed banker's boxes with labelled folders of documents. The labels may have been checked by the archivists at NARA, but they may also simply be the labels affixed to the records by the source federal agency. The individual documents in folders are almost never labelled. In the course of my work, I gathered 30k digital pictures of documents over two months. The acquisition process sounds deceptively easy: look in the index, find key words, and request boxes from the archivist. Then you look through folders to locate individual documents. In point of fact, I probably visually scanned 3M pages to see if they were "interesting" and photo-worthy for future research, usually taking only a few seconds per page to make a snap judgement. My decisions on which boxes of documents to request were far more time-consuming. What is the right keyword for talking about computers in government in 1970? If you said "information automation," then you would be right. A few presidential libraries (Ford especially) have updated electronic files for indexing, which is a huge advantage.
On my trips to the archives, it was interesting to see both professionals and amateurs using a range of technologies. I saw really old-school researchers using 3x5 note cards and taking notes on legal pads. They sometimes supplemented their work by photocopying really important documents at $0.75/copy. Some researchers avoided this cost by using flatbed scanners which they carried in with them. Still other researchers brought in high-end digital cameras and tripods. I used a digital camera freehand. All of these people still need to find a way to actually get into physical proximity with the records. Digitization would open up a new era in research.
On the metadata issue, most of these records already have copious amounts of metadata recorded in well-established fields that are used by NARA.
On the OCR issue, some documents have hand-written notes on them which would not be machine readable and sometimes are not human readable. It is likely that the documents will have to be digitally scanned and flagged if handwriting is detected.
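The scan-then-flag workflow suggested above can be sketched as a simple triage step. This assumes two inputs that the comment does not specify: a mean word-confidence score from whatever OCR engine is used, and a boolean from a separate handwriting detector. The field names and the 0.85 threshold are hypothetical and would need tuning against real documents.

```python
from dataclasses import dataclass


@dataclass
class PageScan:
    page_id: str
    ocr_confidence: float       # assumed: mean word confidence from the OCR engine, 0.0-1.0
    handwriting_detected: bool  # assumed: output of a separate handwriting detector


def needs_manual_review(page: PageScan, min_confidence: float = 0.85) -> bool:
    """Flag a page if it carries handwriting or its OCR result looks unreliable."""
    return page.handwriting_detected or page.ocr_confidence < min_confidence


def triage(pages: list[PageScan]) -> tuple[list[str], list[str]]:
    """Split page ids into (accepted automatically, queued for a human reader)."""
    accepted, flagged = [], []
    for page in pages:
        (flagged if needs_manual_review(page) else accepted).append(page.page_id)
    return accepted, flagged
```

A clean typed page passes straight through, while a typed page with a marginal note, or a faded carbon copy that OCRs poorly, lands in the human queue.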
Making these records available to the general public would be a huge advantage to anyone interested in government and US history. Come to think of it, in terms of size and complexity, it would be a worthy challenge for Google. U.S. government documents run back to the founding of the country and the number of documents only increases over time.
Let's be blunt... this is 2011. The task he set forth will be tied up in bureaucracy for a minimum of a year. There will be arguments such as "Where will we get the budget to do this?" and "Who will do it?" Even if the program gets started, the company that provides the obviously custom system will have underbid the others to land the contract. Once the contract IS under way, the winner will stop partway and claim, "The agency misinformed us as to how much would need to be digitized, and therefore we need more money," at which point the project will be put on hold pending an audit, and it will come out that there was corruption involved in choosing the vendor.
:(
Agencies that have thus far opted NOT to digitize their records have done so for many reasons. And even though they're being forced to digitize now, they'll find many different ways to make the process cost substantially more than it should and to drag it out over extended periods. Let us not forget that most of these documents can only be handled by certain staff with high enough clearance, given their confidential nature. If the exposé writers are to be trusted, there are entire rooms of paper records that only one highly trusted person is allowed to enter.
Let us also point out that many of these records were written in cursive, which, unlike block print, is a screaming nightmare to handle automatically. That means the people who hold the clearance to view the records will need to enter them manually themselves. There will also be issues of encrypting the records so that only certain individuals have access to them. While Obama would like there to be some central database per organization, I'd imagine there will be many individual, sealed networks to guarantee security.
With all these issues, let's be blunt...
1) The agencies will fight it... outright AND through bureaucratic means.
2) The agencies will say "Sure... we did it" and since many of the records are highly classified, no one can actually contradict the statement... so it most likely won't happen. When a given record is asked for they'll claim "oh...we must have missed that box"
3) It will take decades to complete, as there are rooms of records where only a single individual is likely to have access, and I'm guessing their typing speed isn't 100 wpm.
4) Obama is on his way out. Even if he survives this coming election by some miracle (he sucks as much as the next guy, but people know he sucks and are more likely to trust someone else with less of a known suckage) by the time the project is likely to start, it's almost certain whoever takes over will pull the funds from that budget within hours of getting into office.
5) For data security's sake, the agencies will most likely have to design the systems themselves, using whatever crap engineers they manage to find who have high enough clearance and are willing to actually code document management systems. And truthfully... this isn't a TV show... if the agencies have "Super Hackers" on staff, they're probably just as lame as the self-promoting idiots you find everywhere else.
So, I'm willing to say... this will cost a tremendous amount to talk about... but will go nowhere. Sad.
Enron
Lehman Bros.
BP Gulf Oil Spill
Exxon Valdez
Fukushima
Bhopal (Union Carbide)
AIG
WorldCom
Washington Mutual
General Motors
CIT Group
Not to mention all the "too big to fail" financial companies that got bailed out on the backs of the taxpayers. It was just revealed this week that the amount of assets backed up by the US Treasury was about US$77 trillion.
Efficient Business
PS. You're a fucking racist slug.
Why is Snark Required?
Better improve the self-sufficiency of my command centre. Let's see... lightsaber; check...
Open formats are great and all, but why isn't anyone asking about security?
Does this mean the content will be driven by web users, and we can write comments on each other's records and rate them? Wow... this is going to be better than 4c**n!
Who wants to bet they spend about 0.0 seconds figuring out how to secure this data? For the love of god, if we have no hard copies and these digital copies stand as good as the paper record... protect them. Keep the sensitive stuff offline. If someone wants the file, they can make a phone call, get authorization, and get it patched over.
Please tell me the government learned something from wikileaks.
I've decided to stop wasting my time responding to AC trolls/sockpuppets... so if you want a response from me... login.
75 cents of every dollar of the "Bush" tax cuts go to people making less than $250K/year.
Hell, they're not even the "Bush" tax cuts any more. The originals were passed via reconciliation and were subject to a 10 year sunset. That sunset has passed, and the onus for the extension of said tax cuts is squarely on the current occupant of 1600 Pennsylvania Avenue.
Even if a commercial vendor manages to write the required code in the time given and on the budget given, and meets all the interface demands of the various prospective users, there is still no way that certain TLAs (three-letter agencies) will let all their documents be indexed. Thus, the project is DOA.
insert inflammatory comment here!
IMO
Objective: Information Determines Social Change and Technology Application.
Legacy: Technology Determines Social Change and Information Application.
Yes, a paradigm change. Decision makers (.com/.gov/.mil...) are legacy mind-locked on technology always defining and providing the "Information Technology" (IT) solution.
Yes, a paradigm change. Decision makers (.com/.gov/.mil...) must go to academia to help define the new "Information Management" (IM) market place. IM must determine the required IT architecture. Fitting information into an IT solution is always foolish/simple. Fitting an IT solution to information requirements is presently dismissed by decision makers (.com/.gov/.mil...) as foolish/simple; Hence, money, failure, and time wasted.
Technology sector marketing can hook most decision makers (.com/.gov/.mil...); their bait is still seen as the food of career success. Technology sector marketers only ever knew IT and are totally unfamiliar with IM.
Unaccountable leaders are masters, and unrepresented people are slaves. How do US and EU fare?
Well, it is a fictional story based on observations of real events. Clearly it represents an exaggerated world, but one that is so believable when we look at our own. Oceania wasn't just a place with cameras and torture by rats. Look at Winston's own job: sitting at a desk, taking in little scraps of paper, processing them, and then disposing of them. It was very much about control of information flow, control of the historical record.
The masses knew the official story was BS, and people know it today. And by that I don't just mean Fox News; I mean the official story. Many comments on the embassy cables that were released basically just confirmed what everybody already knew. Sure, a lot of specifics came out, and some of those were significant and made a difference. However, was anybody shocked at the kinds of things that were found? Nobody that I have heard comment on them. Hell, a lot of people believe much wilder and crazier versions of reality than all that! Some even believe Fox News... scary... and some people really did love Big Brother.
Going back, though, I think it has always been true. The genuine supporters of BB, and the masses who knew they were being lied to, were not in 1984 just because Orwell dreamed them up from his imagination; this is how it always is.
"I opened my eyes, and everything went dark again"
So I was at a data.gov meeting in the spring, and got to talking to someone from NARA ... he said their digital archive was um ... I can't remember the exact size, but I want to say it could all fit on a single disk, so given the time, 2TB or less.
Some of the government agencies have PB of storage already ... we'd love to turn it over to NARA for long term archiving, but there's no procedures in place, and I don't think they currently have the infrastructure or personnel to deal with it.
(note, I'm taking a broad definition of 'record' here; I help to manage an archive of solar physics images; our discipline's data is growing at a raw, compressed rate of ~1.5TB/day due to SDO ... some of the earth science groups have multiple satellites with those sorts of rates).
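To put the figures in this comment side by side, here is the arithmetic worked out. Both inputs are the commenter's own numbers (~1.5 TB/day for one discipline, and a recollected NARA digital archive of "2TB or less"); decimal units are assumed (1 PB = 1000 TB).

```python
TB_PER_DAY = 1.5        # commenter's figure for compressed solar-physics ingest
NARA_ARCHIVE_TB = 2.0   # commenter's recollection of NARA's whole digital archive

tb_per_year = TB_PER_DAY * 365                       # 547.5 TB per year
pb_per_year = tb_per_year / 1000                     # roughly 0.55 PB per year
days_to_match_nara = NARA_ARCHIVE_TB / TB_PER_DAY    # about 1.3 days

print(f"{tb_per_year} TB/year, {pb_per_year:.2f} PB/year, "
      f"{days_to_match_nara:.1f} days to equal the recalled NARA archive")
```

On those numbers, a single science discipline would generate the equivalent of the entire recalled NARA digital archive in well under two days.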
I'd say bring in the group from PDS (the Planetary Data System) who built OODT, which is now an Apache project, the iRODS folks from UNC, the folks who did LOCKSS, and all of the other large-scale distributed data network groups, and have them discuss the benefits and flaws of each one and come up with a good solution. If they pass this off to yet another standard government IT vendor, it's going to blow up on them again.
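LOCKSS ("Lots of Copies Keep Stuff Safe") rests on the idea that independent sites holding copies of the same item can compare them and repair damaged ones. The sketch below illustrates only that idea as a simple majority vote over SHA-256 digests. It is not the actual LOCKSS polling protocol, which is considerably more involved, and the site names are hypothetical.

```python
import hashlib
from collections import Counter


def audit_replicas(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Compare the copies held at several sites by SHA-256 digest.

    Returns the digest held by a strict majority of sites and the sorted
    list of sites whose copy disagrees with it (candidates for repair).
    """
    digests = {site: hashlib.sha256(data).hexdigest() for site, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        # No strict majority: refuse to guess which copy is authentic.
        raise ValueError("no strict majority among replicas")
    damaged = sorted(site for site, d in digests.items() if d != winner)
    return winner, damaged


def repair(replicas: dict[str, bytes], damaged: list[str]) -> None:
    """Overwrite each damaged copy with a copy from an agreeing site."""
    good = next(data for site, data in replicas.items() if site not in damaged)
    for site in damaged:
        replicas[site] = good
```

Refusing to act without a strict majority is the conservative choice: with two sites that disagree, there is no way to tell which copy is the authentic one.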
Build it, and they will come^Hplain.
"these records have been written in cursive which unlike block is a screaming nightmare to handle automatically."
Bull. The post office has been processing chicken scratch (in milliseconds) since the '80s. IBM has handwriting recognition solutions that would blow your little mind.
I object to power without constructive purpose. --Spock
We did this at our office some time back. There's more to it than you might think, and I wish we'd done it sooner. First, the cost savings are pretty significant. You've no idea how much paper, how many files and file cabinets, and how much sheer storage space is involved until you don't have to use it anymore. Add to that the labor cost of constantly running somewhere to hunt down a paper file, or of having someone file a stack of papers into that paper file. It really is significant if you're in an office environment that creates paperwork. The problem is going from a hard-copy environment to a soft-copy environment. What do you do with all your existing hard copies? What mechanisms or hardware do you use to go from hard copy to soft copy? We opted to implement our change on a going-forward basis: as of a certain date, all future paperwork would be soft copy. The idea (at least in our case) was that the hard-copy files would eventually age into obsolescence and be destroyed. There are other issues, too. What kind of system do you use to store it? Do you run your own server solution? Do you farm it out to a cloud-type solution? In our case, there was excellent proprietary management software geared to our agency, but what happens if that company goes under or is sold? All in all, it's an excellent idea, but the solution isn't as simple as one might expect.
I heard that Laserfiche is a great tool for document management. As it stands, they are at the forefront of the anti-piracy movement and seem to have a stable version that avoids security issues. Maybe this is what they need?
If so, I suggest starting your own business and getting ready to bid on some work. No one is going to do this in-house; they're going to take bids on conversions. I used to work at a company that made quite a bit of money paying people, per page, to OCR patents, correct OCR errors, and tag the documents in XML. And I can assure you that, because of the way the government works, the majority of the work will go to minority-owned small businesses. The work is easy, and you can get college kids to do it for peanuts.
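The last step of that OCR-correct-tag pipeline might look roughly like this with Python's standard `xml.etree.ElementTree`. The element and attribute names (`patent`, `title`, `body`, `p`, `number`) are hypothetical and are not the schema any patent office actually uses.

```python
import xml.etree.ElementTree as ET


def tag_patent(number: str, title: str, paragraphs: list[str]) -> str:
    """Wrap corrected OCR text in a simple, hypothetical XML structure."""
    root = ET.Element("patent", attrib={"number": number})
    ET.SubElement(root, "title").text = title
    body = ET.SubElement(root, "body")
    for i, text in enumerate(paragraphs, start=1):
        # Number paragraphs so later corrections can point at a specific one.
        ET.SubElement(body, "p", attrib={"n": str(i)}).text = text
    # ElementTree escapes characters such as & and < automatically.
    return ET.tostring(root, encoding="unicode")
```

Building the tree with a library rather than by string concatenation means stray OCR characters like `&` or `<` cannot produce malformed XML.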
Wise men say, "Forgiveness is divine, but never pay full price for late pizza."
Seems to me the parent's objections are a classic case of "the perfect being the enemy of the good." Of course the metadata definitions won't be complete enough when these documents are scanned. That leaves the user having to search digitally through the records (if they were OCR'ed decently). How is this not an order of magnitude better than going through the paper?
We're able to see and use census records from the 1800s today because they used simple recording methods.
I only have a few concerns about electronic records... 100... 500 yrs later, when the computers and programs that created the records are no longer used.
* Illegal uses
* Long term migration of data
* Access 30
I've deployed electronic record conversions for 2 Fortune 50 companies. Obviously, these were just departmental, not corporate-wide. Start with the paper forms and keep it simple. 3rd normal form for DBMS is your enemy.
Paper is the best long-term storage method available today. It is possible to keep the most important data in a few tables, then export those as CSV and store them to paper in digital format. It is possible to create the simple table output from 3rd normal form, but that needs to be ensured.
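The "simple table output from 3rd normal form" step can be sketched like this: join the normalized tables back into flat, self-describing rows and write them as CSV, so a future reader needs no schema to interpret a printout. The two tables and their contents are made-up sample data.

```python
import csv
import io

# Hypothetical normalized data: people in one table, forms referencing them by id.
people = {1: "Jane Doe", 2: "John Roe"}
forms = [
    {"form_id": "F-100", "person_id": 1, "filed": "2011-11-28"},
    {"form_id": "F-101", "person_id": 2, "filed": "2011-11-29"},
]


def flatten_to_csv(people: dict[int, str], forms: list[dict]) -> str:
    """Join forms to people and emit one denormalized CSV row per form."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["form_id", "person_name", "filed"])  # header makes rows self-describing
    for form in forms:
        writer.writerow([form["form_id"], people[form["person_id"]], form["filed"]])
    return out.getvalue()
```

The flat table repeats data that 3NF would factor out, but for archival printing that redundancy is exactly what makes each row readable on its own.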
Being digital is important, but losing the ability to easily access the information in future generations is also a concern, especially for government documents.
Ummm no.... source please?
The truth is that NARA's Electronic Records Archive was not scrapped. This year, the final increment of development came to an end (on schedule), the system was formally accepted (on schedule), and the program entered the O&M phase (on schedule).
NARA wasn't "ending a 10-year effort", a 10-year effort was coming to an end.
I hadn't heard anyone discuss the limited lifespan of optical and other eDoc media. They have very short, finite lifespans (5-20 yrs) and have to be remastered or transferred to another medium. Also, would such a system be "safe" from technology attacks like EMP, etc.? Paper is cheap, and good for a couple hundred years.
Tweeks
Reading the title, for a microsecond I thought this was some kind of cultural preservation project, digitizing all the music on analog media...
While there may be some agencies that try that "highly classified" BS story, there are inspectors and other people with security clearance who can go in and verify that even the classified documents are archived in a responsible manner. Some of those inspectors answer only to members of Congress (usually something like the CBO, or perhaps accountants/inspectors tied to specific committees) and are fully cleared to view any classified material, as their need to know is usually within the scope of their official oversight duties.
So yes, there are "3rd parties" that can contradict anybody who says "sure... we did." And if they claim compliance when it hasn't happened, those folks will find their asses nailed to the wall, or possibly find themselves in prison for making a false statement like that.
Keep in mind that classified materials may need to be stored by various means, including completely separate secondary networks (physical-layer separation in the OSI model, not mere VPN separation) or even "off-grid" computers that only use SneakerNet, with couriers, when data needs to be shared between them... and a stack of protocols for sharing that information that would make your head spin.
I used to work on a project that was colocated with the AIRR. We imaged millions of pages and created JPEG images for people to use in our application, both because of network speed issues and because the application our users used to make notes on the documents wouldn't work with the TIFF images we created for NARA.
We were auditing over a hundred years of accounting for how the US government dealt with Native Americans and had Historians and Accountants both working on the program. We had our own little Indiana Jones warehouse.
Several times our leadership recommended that we just scan whole boxes and digitize them, but due to budget restrictions we had to pull and digitize only the relevant information (thousands of documents, millions of pages).
I also was part of a group outing when we delivered a TB or so worth of our records to College Park, fun times.
Once everything is "digitized" then Big Brother can much more easily search for and retrieve any documents that are requested via FOIA.... almost as easily as Big Brother can then digitally "adjust" the contents of those very documents to make them say whatever Big Brother wants them to say, depending on whoever is requesting them, of course.
I work for a federal agency trying to implement this. It is a wonderful idea with many benefits, but it is very expensive to implement. It's not just a matter of scanning documents. The scans have to be verified error-free, and a lot of metadata has to be entered manually for each document. Mandates like this are so often passed down without giving the agencies the resources needed to carry them out, so we often end up with half-assed implementations.
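The "verified error-free plus manual metadata" gate described above could be enforced with an acceptance check before a scan enters the system. This is a minimal sketch: the required field names and the `scan_verified` flag are hypothetical, and each agency would define its own.

```python
REQUIRED_FIELDS = ("record_id", "series", "date", "indexed_by")  # hypothetical field names


def validate_metadata(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the scan can be accepted."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS
                if meta.get(name) in (None, "")]
    # Require an explicit True so an absent or false flag both block acceptance.
    if meta.get("scan_verified") is not True:
        problems.append("scan has not been verified against the paper original")
    return problems
```

Returning every problem at once, rather than failing on the first, saves the person doing data entry a round trip per missing field.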
Paper fades/discolors, is subject to flood damage, and burns at the slightest spark - which weapons far simpler than EMP can readily provide. You can't attach a search engine to paper. Storage in large quantities also becomes an issue.
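"You can't attach a search engine to paper" is the key difference. At its core, a search engine is an inverted index from words to the documents containing them. The sketch below is a toy version with made-up document ids, using naive lowercase word splitting and AND-only queries; real systems add stemming, ranking, and much more.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in WORD.findall(text.lower()):
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every word of the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {  # hypothetical document ids and text
    "memo-a": "Report on information automation in federal agencies",
    "memo-b": "Budget request for computer procurement",
    "memo-c": "Information security guidance",
}
index = build_index(docs)
```

Here `search(index, "information automation")` returns only `memo-a`, without anyone having to guess in advance which keyword a 1970 clerk would have filed it under.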
Disk storage data can be accessed quickly and indexed for optimal search. Most SANs have RAID 5 built in and fire off a variety of bells and whistles when a drive goes bad, so lifespan isn't really an issue. Furthermore, data can easily be copied across several geographic locations (some of them buried deep underground) to pretty much eliminate any threat of EMP or natural disasters.
As a last point, you are asking the /. community to argue the use of paper over technology... you would have better luck going on the Top Gear website and asking people if they would prefer a bicycle over a sports car =P
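The RAID 5 protection mentioned above relies on XOR parity: the parity block is the XOR of the data blocks in a stripe, so any one missing block equals the XOR of all the others. This sketch shows only that recovery math. Real RAID 5 also rotates parity across the disks, which is omitted here, and it survives the loss of only one disk per stripe.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]   # one stripe spread across three data disks
parity = xor_blocks(data)            # stored on a fourth disk

# The second disk fails: rebuild its block from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```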
Deloitte and Accenture were heard celebrating!
What you decide to digitise is a policy choice, not a technology one.
Whenever some analogue-world artifact is digitised - and the original artifact is destroyed - there is a danger of information loss.
Examples: what paper stock was used? What handwritten annotations were made? By whom? When? Says who? What does the paper smell of? (Think I'm kidding? Check out this post.) Has the digital copy been modified since the original analogue information was captured?
See also "In Praise of Paper" to understand more of the arguments; it's never a simple technology issue.
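The last question in that list, whether the digital copy has been modified since capture, is the one technology can answer. The usual approach is a fixity check: record a cryptographic digest at capture time and recompute it later. This minimal sketch uses SHA-256. It only helps if the recorded digest is kept somewhere it can't be altered along with the file, and it says nothing about the paper stock, annotations, or smell that the other questions concern.

```python
import hashlib


def fixity_digest(data: bytes) -> str:
    """SHA-256 digest to record alongside the file at capture time."""
    return hashlib.sha256(data).hexdigest()


def unchanged_since_capture(current: bytes, recorded_digest: str) -> bool:
    """True if the file's bytes still match the digest recorded at capture."""
    return fixity_digest(current) == recorded_digest
```

Any single-byte edit produces a different digest, so even a subtle change made after capture is detectable.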
For years they thought digitize meant feeding it into machines that turn it into tiny bits....of paper.
I haven't thought of anything clever to put here, but then again most of you haven't either.
So, unable to achieve godlike perfection, we ought to do nothing? Besides, why is this an XOR thing? A high degree of precision is easy in dimensions that are curated (DM'd), and Google performs extremely well at searching in dimensions that weren't anticipated. In any given domain it is reasonable to expect the people currently filing bits of paper to know how documents need to be tagged, simply because they already do the curation. In many cases I fail to see any value in transcribing records; apart from matters of ownership, contract, or engineering, bureaucratic records are best forgotten.