Obama Orders Federal Agencies To Digitize All Records
Lucas123 writes "President Obama this week issued a directive to all federal agencies to upgrade records management processes from paper-based systems that have been around since President Truman's administration to electronic records systems with Web 2.0 capabilities. Agencies have four months to come up with plans to improve their record keeping. Part of the directive is to have the National Archives and Records Administration (NARA) store all long-term records and oversee electronic records management efforts in other agencies. Unfortunately, NARA doesn't have a stellar record itself in rolling out electronic records projects. Earlier this year, due to cost overruns and project mismanagement, NARA announced it was ending a 10-year effort to create an electronic records archive."
So, how many Libraries of Congress worth of material are they intending to scan?
Huh?
When all the records are locked in 8.5x11 filing cabinets, sealed in manila envelopes?
And the FOIA headache!
Destroying those records is hard, and some turn up - years after they were declared not to exist!
I'd like to see 220 years of Congressional debates in digital form.
This is actually the perfect place to incubate distributed object stores (e.g. Hadoop on one end, something like Zimbra on the other). One namespace, .gov, with sub-namespaces, and a CMIS interface. Has anyone seen VMware Project Octopus yet? Well, take that times 10,000 and you have a pretty nice, platform-independent records management system. There's also Alfresco, which uses the JCR spec and which I believe can be moved to some type of distributed backend. It implements CMIS and has a DoD-spec records management system. So the general spec would be a CMIS framework: each department/branch/whatever makes available a service for document retrieval, with a central .gov listing of the services, basically what Amazon does for literally everything. Do not compromise; issue an executive order Jeff Bezos style: everything is a service with a public interface. I think it is possible, but it would take a lot of plain buy-in, and our government (the bureaucratic, non-political side) has gotten really, really good at dragging its feet and doing nothing. The cuts are coming, though, and they will have to improve efficiency just like we all have in the private sector. Of course Defense is the worst, but Education can use some work as well.
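To make the "central .gov listing of the services" idea concrete, here is a minimal Python sketch assuming dotted namespaces (gov, gov.dod, ...) and placeholder example.com endpoints. It is not CMIS and not any real government registry; it only shows how a sub-namespace without its own service could fall back to its parent's.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ServiceRegistry:
    """Hypothetical central listing of document-retrieval services under .gov."""

    services: dict[str, str] = field(default_factory=dict)

    def register(self, namespace: str, endpoint: str) -> None:
        # Namespaces are dotted paths, e.g. "gov.dod.army".
        self.services[namespace] = endpoint

    def resolve(self, namespace: str) -> str | None:
        # Try the most specific namespace first, then walk up to its parents,
        # so a sub-agency without its own service falls back to its parent's.
        parts = namespace.split(".")
        while parts:
            candidate = ".".join(parts)
            if candidate in self.services:
                return self.services[candidate]
            parts.pop()
        return None


registry = ServiceRegistry()
registry.register("gov", "https://example.com/gov/cmis")
registry.register("gov.dod", "https://example.com/dod/cmis")
print(registry.resolve("gov.dod.army"))  # https://example.com/dod/cmis
```

The fallback is a design choice: it lets the big departments stand up a service first and the smaller branches register their own later without breaking lookups.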
Cool! Amazing Toys.
Looking at the NARA article, as soon as I saw that some big IT contract was given to Lockheed Martin, I knew all I needed to know about this initiative.
We must save our children's heritage. President Obama obviously hates America and its legacy, otherwise, why would he be trying to destroy all the paper records? Undoubtedly, he'll claim that his long form birth certificate was destroyed during the digitization effort. It's obviously an Islamic socialist fascist communist ACORN black panther George Soros funded plot of some sort. Also.
Check your premises.
In 1000 years or more, they'll have no idea what we were up to at all. At least some paper records have a chance of surviving.
Questions worth considering:
What are the savings for going digital? (Without a doubt, they exist; if not, we'd still all be filling out forms in triplicate at work.)
What is the up front cost to convert?
How long will it take the up front cost to be absorbed by the savings?
I suspect that it will pay for itself faster than you might think. Paper records searches are expensive, to say the least. And they're extremely labor-intensive, not to mention inefficient and error-prone.
I realize that there are people out there who will condemn anything this administration does out of hand, but at least try to pretend that you think about things before you make a judgement.
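The three questions above boil down to a simple payback calculation. A rough Python sketch, with made-up figures and no discounting, inflation, or future migration costs:

```python
import math


def payback_years(upfront_cost: float, annual_savings: float,
                  annual_operating_cost: float = 0.0) -> float:
    """Simple (undiscounted) payback period for a paper-to-digital conversion.

    upfront_cost: one-time scanning, indexing and system cost.
    annual_savings: yearly cost of the paper process that goes away.
    annual_operating_cost: yearly cost of running the new system.
    """
    net_savings = annual_savings - annual_operating_cost
    if net_savings <= 0:
        return math.inf  # the conversion never pays for itself
    return upfront_cost / net_savings


# Hypothetical figures, for illustration only.
print(payback_years(6_000_000, 3_000_000, 1_000_000))  # 3.0
```

Subtracting the running cost of the new system matters: savings that look large on paper can shrink a lot once licensing, storage and staff for the digital side are counted.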
Does that include the Declaration of Independence? I suppose it would be much easier to change in digital form...
So, you condemn Obama for things he doesn't do (e.g., reduce costs), then condemn him for doing things (e.g., reducing costs).
Gotcha.
Remember, this was when Communism tried to invade America, and so to counter it the need for a comprehensive system of records for everyone arose.
You say it like this is a bad thing. What has the downside of WikiLeaks been so far?
So there isn't a repeat of this:
http://www.archives.gov/st-louis/military-personnel/fire-1973.html
Is there some complication I don't understand?
Yes. More than one.
Nothing fancy, just a database of scanned forms in pdf format and the like.
There's the first problem. It's never simple.
First issue - if you're going to put documents in, you're going to want to get them out. How do you search for them? You're going to want to define the metadata, and that's a headache. Got lawyers? They'll want client and matter. But those fields are just about meaningless to anyone else. How do you resolve the incompatibility? Do you use different forms for different groups of users? How will the engineering department find the subpoena papers that the lawyers filed?
What fields are globally useful? Are they so generic that any search will retrieve hundreds of documents? Conversely, are they so specific as to make your metadata field selections horribly long and therefore ambiguous? (Free text metadata? Let's not go there.)
Remember that you've got to fill in that metadata any time you add a document. What's the balance between useful and annoying? Too many fields and nobody will want to fill it in. Too few, and you won't be able to find anything.
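One possible way out of the client/matter problem is a small set of core fields everyone fills in, plus per-group extensions. The field names below are hypothetical; the point is that the engineers can still find the lawyers' subpoena through the core fields without knowing anything about client or matter.

```python
from __future__ import annotations

# Hypothetical schema: a small core every document gets, plus per-group
# fields (e.g. client/matter for the lawyers) that only that group must fill in.
CORE_FIELDS = {"title", "author", "date", "doc_type"}
GROUP_FIELDS = {
    "legal": {"client", "matter"},
    "engineering": {"project", "part_number"},
}


def missing_fields(group: str, metadata: dict[str, str]) -> set[str]:
    """Return the required fields that are absent or blank for this group."""
    required = CORE_FIELDS | GROUP_FIELDS.get(group, set())
    return {name for name in required if not metadata.get(name, "").strip()}


def search_by_fields(docs: list[dict[str, str]], **criteria: str) -> list[dict[str, str]]:
    """Exact-match search on any fields; other groups can use the core ones."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]
```

For example, search_by_fields(docs, doc_type="subpoena") finds the subpoena regardless of which group filed it. Keeping the core small is the "balance between useful and annoying": every extra required field is one more thing people skip.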
That's for new documents. When you first implement a DMS, you have a truckload of documents to be imported. You're not going to do it manually; you're going to use an auto-import. But how do you define the metadata for all those millions of documents you're importing? What if you have client/matter, for instance? Hopefully they're all already sorted, and you can use something like Kofax Capture with a seriously powerful, fast scanner and separator sheets on which you can do forms recognition to define the metadata fields. But there's a lot of work involved up front to get that import working properly.
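The separator-sheet approach amounts to splitting one long stream of scanned pages into documents. A minimal sketch, assuming forms recognition has already turned each separator sheet into a dict of field values (this is a made-up data model, not Kofax Capture's):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    image_id: str
    # Fields read by forms recognition; only set on separator sheets
    # (hypothetical representation for illustration).
    separator_fields: dict[str, str] | None = None


@dataclass
class Document:
    metadata: dict[str, str]
    pages: list[str] = field(default_factory=list)


def split_batch(pages: list[Page]) -> list[Document]:
    """Split a scanned batch into documents at each separator sheet.

    The separator itself is dropped; the pages that follow it inherit
    its recognized metadata until the next separator.
    """
    docs: list[Document] = []
    for page in pages:
        if page.separator_fields is not None:
            docs.append(Document(metadata=dict(page.separator_fields)))
        elif docs:
            docs[-1].pages.append(page.image_id)
        else:
            raise ValueError(f"page {page.image_id} precedes the first separator")
    return docs
```

Raising on pages before the first separator is deliberate: silently filing them under no client/matter is exactly the kind of import error that is expensive to find later.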
Don't forget the OCR. Hopefully all your paper documents are clean and will OCR nicely, so you can do full text indexing.
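Full-text indexing on the OCR output is, at its core, an inverted index from words to documents. A bare-bones sketch (no stemming, ranking or phrase search); a garbled OCR word simply never matches, which is why clean scans matter:

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; real systems add stemming, stop words, etc.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(ocr_text: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the IDs of documents whose OCR text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in ocr_text.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def full_text_search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```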
Security. Better get that set up right. Profile-level security? It's more secure, but people will complain: with profile-level security, a document you don't have permission to access won't even show up in your search results, so you don't know it's there and that you just need to request access. Groups. And by the way, remember to define the permissions on all those millions of documents you're importing.
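The trade-off between the two security models can be shown in a few lines. This sketch assumes each document carries a set of allowed groups; the names are illustrative, not any particular DMS's API:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    title: str
    allowed_groups: frozenset[str]


def visible_results(hits: list[Doc], user_groups: set[str],
                    profile_level: bool) -> list[dict]:
    """Filter search hits according to the security model.

    profile_level=True: documents the user can't open are left out entirely.
    profile_level=False: their profiles still show up (can_open=False), so the
    user at least knows to request access.
    """
    results = []
    for doc in hits:
        can_open = bool(doc.allowed_groups & user_groups)
        if can_open or not profile_level:
            results.append({"id": doc.doc_id, "title": doc.title, "can_open": can_open})
    return results
```

Note that the less strict mode leaks titles, which is precisely why it's less secure: sometimes the existence of a document is itself sensitive.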
Version control. How do you control check in and check out? Do you control check in and check out, or just audit it?
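A toy model of the check-in/check-out question, with a switch between enforced locks and audit-only. Everything here is hypothetical; real DMS products have their own rules for overrides and versioning:

```python
from __future__ import annotations

from datetime import datetime, timezone


class VersionedDocument:
    """Minimal check-out/check-in model; enforce_locks toggles locking vs. audit only."""

    def __init__(self, doc_id: str, enforce_locks: bool = True) -> None:
        self.doc_id = doc_id
        self.enforce_locks = enforce_locks
        self.versions: list[str] = []  # stored content, oldest first
        self.checked_out_by: str | None = None
        self.audit_log: list[tuple[str, str, str]] = []  # (UTC timestamp, user, action)

    def _log(self, user: str, action: str) -> None:
        self.audit_log.append((datetime.now(timezone.utc).isoformat(), user, action))

    def check_out(self, user: str) -> None:
        if self.enforce_locks and self.checked_out_by not in (None, user):
            raise PermissionError(f"{self.doc_id} is checked out by {self.checked_out_by}")
        self.checked_out_by = user
        self._log(user, "check_out")

    def check_in(self, user: str, content: str) -> int:
        if self.enforce_locks and self.checked_out_by != user:
            raise PermissionError(f"{user} does not hold the check-out on {self.doc_id}")
        self.versions.append(content)
        self.checked_out_by = None
        self._log(user, "check_in")
        return len(self.versions)  # new version number
```

In audit-only mode nothing is blocked, but the log still shows who did what and when, which is sometimes all the compliance people actually need.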
I've only just scratched the surface of a document management system. Then there's records management. You'll want to make sure your system is DoD 5015.2 compliant. Setting up the retention schedules... hopefully you've got a records retention policy already; otherwise that's months' worth of work to define those policies and ensure you comply with all regulatory requirements while still balancing your need to purge/archive old records.
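Once a policy exists, applying a retention schedule is mostly date arithmetic. A sketch with made-up record series and periods (real periods come from the approved schedule and the regulations, not from code); nothing here claims DoD 5015.2 compliance:

```python
from __future__ import annotations

from datetime import date

# Hypothetical retention schedule: record series -> (years to retain, action at end).
RETENTION_SCHEDULE = {
    "correspondence": (3, "destroy"),
    "contracts": (6, "destroy"),
    "policy": (None, "transfer to archives"),  # permanent record
}


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 landing in a non-leap year
        return d.replace(year=d.year + years, day=28)


def disposition(series: str, cutoff: date) -> tuple[date | None, str]:
    """Return (eligible date, action) for a record whose retention clock starts at cutoff."""
    years, action = RETENTION_SCHEDULE[series]
    if years is None:
        return None, action
    return add_years(cutoff, years), action
```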
How does something even become a record? Hopefully you've already got knowledgeable librarians (yes, that's what they're called), and you just need to train them on your new RM system.
Are all your boxes already barcoded? Your RM system should be able to register where a record is - building, shelf, box.
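The building/shelf/box tracking is essentially a barcode-keyed lookup table plus charge-out records. A minimal sketch with hypothetical barcodes and locations:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Location:
    building: str
    shelf: str
    box: str


class BoxRegistry:
    """Hypothetical barcode -> location registry with simple charge-out tracking."""

    def __init__(self) -> None:
        self.locations: dict[str, Location] = {}
        self.charged_out: dict[str, str] = {}  # barcode -> borrower

    def register(self, barcode: str, location: Location) -> None:
        self.locations[barcode] = location

    def scan_out(self, barcode: str, borrower: str) -> Location:
        location = self.locations[barcode]  # KeyError if never registered
        if barcode in self.charged_out:
            raise RuntimeError(
                f"{barcode} is already charged out to {self.charged_out[barcode]}")
        self.charged_out[barcode] = borrower
        return location  # where to pull the box from

    def scan_in(self, barcode: str) -> Location:
        self.charged_out.pop(barcode, None)
        return self.locations[barcode]  # where to reshelve it
```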
You're probably getting the idea. The technology is easy. The processes are complicated, and they get exponentially more complicated as the size of your client base grows.
Yeah - as noted, the man can't win. Ask any corporate bean counter about the cost savings (that is, stopping spending money) by going digital.
Also - remember - he's the President. He doesn't make the budget. (That's tied up in the Super Committee.) And unlike the previous President, he hasn't been ruling by fiat, executive order and signing statement.
Why not outsource the whole task to somebody like Iron Mountain? They could get it done quickly and economically. It might even create a few jobs.
You missed the most important question worth considering - in what formats will these records be maintained?
And Obama missed it, too. I don't see anything in his directive about it.
Good archival practice entails preserving original documents, not just scanned copies.
And if the purpose is to place documents on the Internet, then it's a GIGO situation. If you allow garbage, closed formats like .doc or .docx or .xls or .xlsx to be put on the Web, you're not serving transparency very well, and you're defeating your whole purpose of wanting to make data accessible for Web 2.0 mashups and the like.
Why won't government ever "get" it? The prerequisite question is ALWAYS, what formats? If the formats aren't truly open, then the data isn't open, either.
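A practical first step toward answering "what formats?" for an existing collection is an inventory based on file contents rather than extensions. This sketch checks a few well-known leading-byte signatures; note that a ZIP signature alone can't distinguish .docx from OpenDocument files, so a real inventory would need to look inside the container (dedicated tools such as DROID do this far more thoroughly):

```python
# Leading "magic" bytes for a few common formats.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy .doc/.xls/.ppt)"),
    (b"PK\x03\x04", "ZIP container (.docx/.xlsx/.odt/.ods and others)"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]


def identify(header: bytes) -> str:
    """Best-effort format guess from a file's first bytes."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    # 16 bytes covers the longest signature above.
    with open(path, "rb") as f:
        return identify(f.read(16))
```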
This would be a good time to write your congresscritter to point out the problems with undocumented file formats, as well as APIs and network protocols.
There are plenty of formats that could be used that are open and vendor neutral.
If Congress doesn't require that in its funding authorization, many of our public records will be stored as Word docs or in MS SQL databases.
Why is it hard? Too many people have influence in the process. Put one person in charge who will (1) actively be involved in the project and (2) have final say on decisions. No committees, no one-off directives from politicians or bosses who don't know the day-to-day details, no approval process. Just one guy calling the shots. A lot of people will be disappointed because it doesn't do X, Y or Z, or because it uses platform P instead of platform Q, but the project will be completed and will serve its purpose.
Two wars on credit combined with high end tax cuts do tend to drain the coffers with a quickness.
Free text metadata? Let's not go there.
Google and its users seem to be doing a pretty good job of utilizing free text to locate documents.
Actually, I went and read the executive order here:
http://www.scribd.com/doc/74042394/Managing-Government-Records-November-28-2011
which says nothing about Web 2.0, nor about moving to the cloud. The requirements laid out there are business-level, and basically translate to the following: "You have 120 days to come up with system-level requirements to move our data from hard copy to soft copy."
With this said, the section of the order that you're quoting is 2-b-i. It refers to the need for a unified solution for archiving all existing electronic communication. Would you prefer that every department and agency have its own? And here I thought you might be in favor of cutting costs and improving efficiency.
Finally, your link shows that Obama has issued 17 signing statements in 3 years. That's about 6 per year. Bush issued 161 over 8 years. That's 20 per year. The number of executive orders is similar. And honestly, the Democrats in Congress didn't play the cloture games that the Republicans play now. The Republicans made a huge stink about the ONE appointment that the Democrats tried to block (remember the chants of "up or down! up or down!"). Now, the Republicans won't let a damn thing come to the floor of the Senate for a vote unless it explicitly furthers their causes. In other words, false equivalence fail.
Dunder Mifflin is gonna be pissed...
Let's see. A difference of an order of magnitude in the number of signing statements. The difference between putting the war costs in the budget and insisting that they all go through special appropriations, under threat of veto. The difference between starting multiple wars of occupation without a declaration and not. The difference between following the law as created by Congress and accepting what Congress passed (or didn't) as law.
Bush was effective towards his goals. Because Obama doesn't play Bush's games, but the Republicans no longer play by the rules, Obama is not effective. That's part of my point.
No, I'm by no means happy with what Obama has (and hasn't) accomplished. But I'm sick to death of the Republicans and their Rovian games, and of the scorched-earth policy of passing nothing that will help the country (see also: abuse of cloture) and blaming Obama. The Republicans declared in 2008 that they had exactly one goal: to make sure that Obama failed. And everything that they've done during these years of crisis has been aligned with that goal, while America rots.
Finally, if you've something to say, say it for yourself as opposed to trying to spin what I'm saying into the opposite. You aren't very good at it.
or just expensive.
I used to work for a DMS software company at the corporate level, and while the systems are used everywhere from elementary schools to health care providers to governments, retrieval is pretty damn nice IF the system is set up properly. Properly setting up a system for a small pizza shop takes an hour or two; a government agency could take weeks or months to perfect. But the user side of things was a breeze.
As a professional historian who has worked in the National Archives in College Park, MD and at four different presidential libraries, which incidentally are also managed by NARA, I need to interject that this is an immensely costly but valuable project.
Remember "the warehouse" from the Indiana Jones movies? NARA is a little like that in terms of size, but better organized. Aisle upon aisle, shelf upon shelf, row upon row, room upon room, floor upon floor, building upon building of neatly indexed banker's boxes with labelled folders of documents. The labels may have been checked by the archivists at NARA, but they may also simply be the labels affixed to the records by the source federal agency. The individual documents in the folders are almost never labelled. In the course of my work, I gathered 30k digital pictures of documents over two months. The acquisition process sounds deceptively easy: look in the index, find keywords and request boxes from the archivist. Then you look through the folders to locate individual documents. In point of fact, I probably visually scanned 3M pages to see if they were "interesting" and photo-worthy for future research, usually taking only a few seconds per page to make a snap judgement. My decisions on which boxes of documents to request were far more time-consuming. What is the right keyword for talking about computers in government in 1970? If you said "information automation," you would be right. A few presidential libraries (Ford's especially) have updated electronic files for indexing, which is a huge advantage.
On my trips to the archives, it was interesting to see both professionals and amateurs using a range of technologies. I saw really old-school researchers using 3x5 note cards and taking notes on legal pads. They sometimes supplemented their work by photocopying really important documents at $0.75 per copy. Some researchers avoided this cost by carrying in flatbed scanners. Still other researchers brought in high-end digital cameras and tripods. I used a digital camera freehand. All of these people still need to find a way to get into physical proximity with the records. Digitization would open up a new era in research.
On the metadata issue, most of these records already have copious amounts of metadata recorded in well-established fields that are used by NARA.
On the OCR issue, some documents have hand-written notes on them which would not be machine readable and sometimes are not human readable. It is likely that the documents will have to be digitally scanned and flagged if handwriting is detected.
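One way to do that flagging is to use the OCR engine's per-word confidence scores as a rough proxy for handwriting. This sketch assumes those scores are available as numbers between 0 and 1; the thresholds are illustrative, not tuned, and low confidence can also come from stains, faded ink or poor scans:

```python
from __future__ import annotations

from statistics import mean


def needs_review(word_confidences: list[float], min_mean: float = 0.80,
                 max_low_fraction: float = 0.15, low_cutoff: float = 0.50) -> bool:
    """Flag a page for human review based on OCR word confidences (0.0-1.0).

    Low-confidence words serve here as a rough proxy for handwriting or damage.
    """
    if not word_confidences:
        return True  # nothing recognized at all: a person should look
    low = sum(1 for c in word_confidences if c < low_cutoff)
    return mean(word_confidences) < min_mean or low / len(word_confidences) > max_low_fraction
```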
Making these records available to the general public would be a huge advantage to anyone interested in government and US history. Come to think of it, in terms of size and complexity, it would be a worthy challenge for Google. U.S. government documents run back to the founding of the country and the number of documents only increases over time.
Let's be blunt... this is 2011. The task he set forth will be tied up in bureaucracy for a minimum of a year. There will be arguments like "Where will we get the budget to do this?" and "Who will do it?" Even if the program gets started, the company that provides the obviously custom system will underbid the others to land the contract. Once the contract itself IS started, whoever won it will stop partway and claim, "The agency misinformed us as to how much would need to be digitized, and therefore we need more money." At that point the project will be placed on hold pending an audit, which will reveal that there was corruption involved in choosing the vendor.
:(
Agencies that have thus far opted NOT to digitize their records have done so for many reasons. And even though they're being forced to digitize now, they'll find many different ways of making the process cost substantially more than it should and of dragging it out over extended periods. Let us not forget that, given their confidential nature, most of these documents can only be handled by certain staff with high enough clearance. If the exposé writers are to be trusted, there are entire rooms of paper records that only one highly trusted person is allowed to enter.
Let us also point out that many of these records were written in cursive, which, unlike block print, is a screaming nightmare to handle automatically. That means the people who hold the clearance to view the records will need to enter them manually themselves. There will be issues of encrypting the records so that only certain individuals have access to them. While Obama would like a central database per organization, I'd imagine there will be many individual, sealed networks to guarantee security.
With all these issues, let's be blunt...
1) The agencies will fight it... outright AND through bureaucratic means.
2) The agencies will say "Sure... we did it," and since many of the records are highly classified, no one can actually contradict the statement... so it most likely won't happen. When a given record is asked for, they'll claim "oh... we must have missed that box."
3) It will take decades to complete as there are rooms of records where only a single individual is likely to have access and I'm guessing their typing speed isn't 100wpm.
4) Obama is on his way out. Even if by some miracle he survives the coming election (he sucks as much as the next guy, but people know he sucks and are more likely to trust someone else with less of a known suckage), by the time the project is likely to start, it's almost certain that whoever takes over will pull the funds from that budget within hours of taking office.
5) For data security's sake, the agencies will most likely have to design the systems themselves, using whatever crap engineers they manage to find who have high enough clearance and are willing to actually code document management systems. And truthfully... this isn't a TV show... if the agencies have "Super Hackers" on staff, they're probably just as lame as the self-promoting idiots you find everywhere else.
So, I'm willing to say... this will cost a tremendous amount to talk about... but will go nowhere. Sad.
Enron
Lehman Bros.
BP Gulf Oil Spill
Exxon Valdez
Fukushima
Bhopal (Union Carbide)
AIG
WorldCom
Washington Mutual
General Motors
CIT Group
Not to mention all the "too big to fail" financial companies that got bailed out on the backs of the taxpayers. It was just revealed this week that the amount of assets backed up by the US Treasury was about US$77 trillion.
Efficient Business
PS. You're a fucking racist slug.
Why is Snark Required?
Yeah, software would be easy to design if it weren't for all those pesky stakeholders.
Whose purpose, exactly, does it serve if the stakeholders are disappointed?
> What are the savings for going digital? (Without a doubt, they exist; if not, we'd still all be filling out forms in triplicate at work.)
It will save a lot of time for people looking to leak documents to WikiLeaks. On those grounds alone, this is my favorite Obama decision to date.
Finally we may see some real freedom of information acting.
We did this at our office some time back. There's more to it than you might think, and I wish we'd done it sooner. First, the cost savings are pretty significant. You've no idea how much paper, how many files and file cabinets, and how much sheer storage space is involved until you don't need them anymore. Add to that the labor cost of constantly running somewhere to hunt down a paper file, or of having someone file a stack of papers into it. It really is significant if you're in an office environment that creates paperwork.
The problem is going from a hard copy environment to a soft copy environment. What do you do with all your existing hard copies? What mechanisms or hardware do you use to go from hard copy to soft copy? We opted to implement our change on a going-forward basis: as of a certain date, all future paperwork would be soft copy. The idea was that (at least in our case) the hard copy files would eventually age into obsolescence and be destroyed.
There are other issues. What kind of system do you use to store it all? Do you run your own server solution? Do you farm it out to a cloud-type solution? In our case, there was excellent proprietary management software geared to our agency, but what happens if that company goes under or is sold? All in all, it's an excellent idea, but the solution isn't as simple as one might expect.
People still have to be able to locate those scanned PDFs. Now that it's electronic, you need to know where to go to get it. Is it on a network share in a well-organized directory structure? At some point the directory structure gets so close to a taxonomy that you run past the limits of a simple hierarchical mapping.
The traditional way to handle paper records is the method I referred to; you have them stored in a traditional vault and your RM system tracks by building/room/shelf/box. Everything is barcoded to make it quick/efficient to check in/check out.
These are not insurmountable complications. I'm really just pointing out that there are a lot of details to think about, and it has to scale, both in terms of size of company and in terms of longevity of system. It's requirements gathering for a new paradigm in a company's traditional records keeping processes (which includes the workflow, by the way; I didn't even start to get into that). Again, not insurmountable, and the technology is simple. It's the processes that are crucial.
If so, I suggest creating your own business and getting ready to bid on some work. No one is going to do this in house; they're going to take bids on conversions. I used to work at a company that made quite a bit of money paying people, per page, to OCR patents, correct OCR errors, and tag the documents in XML. And I can assure you that, because of the way the government works, the majority of the work will go to minority-owned small businesses. The work is easy and you can get college kids to do it for peanuts.
While some agencies may try that "highly classified" BS story, there are inspectors and other people with security clearances who can go in and verify that even the classified documents are archived responsibly. Some of those inspectors answer only to members of Congress (usually something like the GAO, or perhaps accountants/inspectors tied to specific committees) and are fully cleared to view any classified material, since their need to know usually falls within the scope of their official oversight duties.
So yes, there are "3rd parties" who can contradict anybody who says "sure... we did." And if they claim compliance when it hasn't happened, those folks will find their ass nailed to the wall, or possibly find themselves in prison for making a false statement.
Keep in mind that the need to store classified materials may be met by various means, including completely separate secondary networks (physical-layer separation in the OSI model, not mere VPN separation) or even "off grid" computers that only use SneakerNet, with couriers, when data needs to be shared between them... and a stack of protocols for sharing that information that would make your head spin.
I work for a federal agency trying to implement this. It is a wonderful idea with many benefits, but it is very expensive to implement. It's not just a matter of scanning documents. The scans have to be verified error-free, and a lot of metadata has to be manually entered for each document. Mandates like this are so often passed down without giving the agencies the resources needed to carry them out, so we end up with half-assed implementations.