Washington State Archives Go Digital
prostoalex writes "USA Today and dozens of others report that Washington state archives went online. Over the past two years project participants scanned 1 million documents issued by state and country authorities. The archive is located in my alma mater Eastern Washington University (go Eagles!) The 800 terabyte storage system was developed by Microsoft and EDS."
Although, it has to be said, I hope they make everything accessible for *everyone*, regardless of OS and browser. No doubt a lot of researchers would be using OS X/Linux/Firefox.
Just in case someone actually wanted the address for the archives it's http://www.digitalarchives.wa.gov/
... relate to state anti-competitive actions against Microsoft themselves? :)
I feel safer already.
Posting on EDS site
From the NYSE Site
%blow
%blow: No such job
^how did the sex change go?
Modifier failed
Wow, I'd like that instead of that old 8 GB harddisk in my network server.
God bless you, Toph.
The Province of New Brunswick Provincial Archives have been like this for quite some time now, with birth, death, and marriage certificates and census records. I have been able to search for information about my family history online using their handy dandy search tool, as well as by visiting the Archives themselves at the University of New Brunswick. It never occurred to me that others might be trying to catch up, but I guess that this type of service isn't something that most governments deem necessary for the public.
"Well you're not Fiona Apple, and if you're not Fionna Apple, I don't give a rat's ass."
One thing they will have to concentrate on in the future, as the number of records grows fast, is a good search strategy. Search time is the one thing that will decide whether the masses use this facility.
As far as I have tried it out in these few minutes, the search strategy is good... there are separate searches that researchers can use to look up historical data and the like... This is great.
The 800 terabyte storage system was developed by Microsoft and EDS.
:-)
How would windows have enough drive pointers to be able to access this? Would there be a drive AG:?
-Pete
The site seems to be slowing a bit, so I can't find details, but surely there are some privacy concerns here. I know that this just replicates the publicly available material in the physical archives, but there is a big difference between going to the archives and digging through books, and harvesting info over the web, especially given the sheer amount of info on the site, much of it from recent records.
just wait for Google to index it.
"The 800 terabyte storage system was developed by Microsoft and EDS." I've always wondered what a BSOD looks like on a system with 800 terabytes... I wonder what OS they will be using for their systems.
And I'm still ignoring the fact that machines grow old and have to be replaced. It's a known fact that disks break, so you'll need backups, but how long can you keep an old storage solution around? Sooner or later you'll have to migrate old backup data to newer media.
Note that I don't think that moving everything online is a bad idea, but there are consequences that I don't think everyone has thought of.
Where I live one can go into the royal library and find (and read) an official document written by someone in the 16th century, but can we be sure that 100 or even 50 years from now someone can read a DLT300 tape?
You are not entitled to your opinion. You are entitled to your informed opinion. -- Harlan Ellison
Dang, there are no maps in there. The best stuff in the archives at town hall have always been maps of the town and blueprints of various buildings. But nobody scanned those in the archives. Oh well.
The GeekNights podcast is going strong. Listen!
The 800 terabyte storage system was developed by Microsoft and EDS.
Microsoft was able to confirm the system is expandable and, contrary to previous rumours, will in fact have enough disk space to install Longhorn.
They do, however, state that to do anything actually useful, more upgrades will be required.
liqbase
I opened up the treasures piece from the collections set on the right of the main page. http://www.digitalarchives.wa.gov/Content.aspx?txt=records#topFiveRecords
and tried to open the state seal from http://www.digitalarchives.wa.gov/content/Top5/TerritorialSeal.djvu
What application is associated with the djvu extension?
http://www.digitalarchives.wa.gov/
"God fights on the side with the best artillery." - Napoleon, Marshal of France - speaking truth to power
Over the past two years project participants scanned 1 million documents issued by state and country authorities.
If only someone had told them about Kinko's.
I wouldn't trust M$ with 640 KB of my sensitive data!
Oh, great. When (not if) Microsoft is brought to court for antitrust violations again, all MonkeyBoy has to do is enter a secret backdoor password and, *poof* all those documents containing damning evidence suddenly go "missing" -- or perhaps they simply disappear from the index as if they never existed.
Would you trust a known pedophile to give your kids a bath? If not, then why trust a convicted monopolist who is on the record for perjury with critical documents?
This is how 800TB can be digitally locked forever http://www.emc.com/products/systems/centera.jsp/ and still be online.
http://www.leadmagnet.50megs.com
The system isn't 800TB, but will scale to 800TB, according to this EDS press release. In fact, given that they've spent a mere $2.5M (powerpoint!) there's not a hope in hell that they've got 800TB! The powerpoint says it's a 5TB EMC SAN & an ADIC tape library for backup.
An interesting point is that they're delivering the documents using DjVu by LizardTech, which is GPL'd, and developed by the creators of DjVu in conjunction with LizardTech (after a period of LT not-getting-it). The DjVuLibre home page is here. LizardTech still have the best encoders for the format.
I have family in Washington State. Too bad their information is NOT in the database. They have been born, married, and died there, and not a scrap of data is in this database.
So it is not all that it is cracked up to be.
cheaper product based on open source:
linux based, postgres db.
Not in full release, not free, but very open.
http://www.archivas.com/
Is this the same EDS that is currently fleecing the US Navy for Hundreds of Millions of dollars in, what has been described by everyone I've talked to as extremely poor computer and network support?
FTA -- "If you mention NMCI, there is an automatic groan," he says. "I think the phrase is, 'I've been NMCI'd.' "
The Article
....developed by Microsoft and EDS. At a cost of $6,943,349,453,234,213,166,784.23. Sorry - I've just never seen anything done cost effectively when EDS was involved.
Free Scotland!
Run and hide. If there was ever a combination of resources destined to fail it's Windows and EDS. If it works at all I'll be surprised. If it keeps working I'll be amazed.
That's our life, the big wheel of shit. - The Fat Man, Blue Tango Salvage
What years? This database seems to be limited to older archives... the most recent year for a record I found was 1965.
-Joe
The problem is the notion that tape is an archive format. It's not; it's a backup format (catastrophic recovery). It's only an archive format while you have the capability to read it (if you can read it).
An archive should be a write-once, read-many file system, with the information active on disk (not hierarchical on tape), multiple copies preferably at multiple sites (depending on how valuable the data is), and programs for active file validation (every so often you need to be sure the file is still there and still the same).
In addition, file format migration will be needed. Extend your active validation to include format migration, and policy changes (keep fewer copies of the old version around). And file format changes most likely will need some human checking to make sure things worked (big part).
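To make the "active file validation" idea concrete, here is a minimal sketch in Python. It assumes a SHA-256 digest was recorded for each document at ingest time and that replicas are plain files; the file names and the `validate_copies` helper are made up for illustration and say nothing about how the Washington archive actually does it.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_copies(expected: str, copies: list[Path]) -> dict[Path, bool]:
    """Map each replica to True if it exists and still matches the recorded digest."""
    return {p: p.is_file() and sha256_of(p) == expected for p in copies}


def demo() -> list[bool]:
    """Create two replicas, corrupt one, and report which still validate."""
    with tempfile.TemporaryDirectory() as tmp:
        site_a = Path(tmp) / "site_a.djvu"  # hypothetical replica names
        site_b = Path(tmp) / "site_b.djvu"
        payload = b"scanned territorial record"
        site_a.write_bytes(payload)
        site_b.write_bytes(payload)
        recorded = hashlib.sha256(payload).hexdigest()  # stored at ingest time
        site_b.write_bytes(payload + b"\x00")  # simulate silent corruption
        result = validate_copies(recorded, [site_a, site_b])
        return [result[site_a], result[site_b]]


if __name__ == "__main__":
    print(demo())
```

A real archive would run something like this on a schedule and repair a failed replica from a good one; format migration would then mean recording a new digest for the converted file.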
What are the odds they forget to reboot and it all crashes after 30 days?
Truly a match made in heaven.
Maybe your poor spelling or grammar affected your search.
They're hosting the archives here at Eastern? You know something's going really wrong when Slashdot is your source for current events at the university where you work and attend classes. Where's my ginseng tea?
The digital archives is a big step for my University. Five years ago we were facing a hostile takeover by the drunken WSU; now Eastern is the fastest growing University in the state. The Microsoft focus is to be expected. Redmond pays a lot of money to keep universities in our state in line. Rest assured Eastern is loaded with disgruntled Linux users being forced to learn Visual Basic in their IT courses. There are even a few IT profs pushing for changes, though they haven't made much headway in their efforts.
Also of note, the administrative login is available here: https://www.digitalarchives.wa.gov/WADAAdmin/logon.aspx?ReturnUrl=%2fWADAAdmin%2findex.aspx
It appears not to be susceptible to a common IIS/ASP script injection bug: ' or 0=0 --
Good work.
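For anyone wondering why the `' or 0=0 --` string matters, here is a rough sketch of the difference between a login check that pastes input into the SQL text and one that uses bound parameters. It uses Python's `sqlite3` as a stand-in; the `admins` table and both login functions are hypothetical, and nothing here describes the archive's real backend.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build an in-memory database with one made-up admin account."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE admins (name TEXT, password TEXT)")
    conn.execute("INSERT INTO admins VALUES ('archivist', 's3cret')")
    return conn


def login_concatenated(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # Vulnerable: user input is pasted straight into the SQL text,
    # so a quote in the input can change the structure of the query.
    query = (
        "SELECT COUNT(*) FROM admins "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_parameterized(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # Safer: the input is bound as data and never parsed as SQL.
    query = "SELECT COUNT(*) FROM admins WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0


if __name__ == "__main__":
    conn = make_db()
    attack = "' or 0=0 --"
    print("concatenated:", login_concatenated(conn, attack, "x"))
    print("parameterized:", login_parameterized(conn, attack, "x"))
```

With the concatenated version the attack string closes the quote, adds an always-true condition, and comments out the password check. The parameterized version just looks for a user literally named `' or 0=0 --` and finds none.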
I wasn't able to find my birth record yet. Any mention of how much data is not online yet?
5TB? how much is that in Libraries of Congress?
800TB to store 1M docs means 800MB:doc. It seems cheaper, for storage, transmission and searching, to store most of these docs, which were typed on a machine like a typewriter or word processor, as events and a context. Each doc's colophon in the database would include the font and layout parameters of the process that created the doc, like "1973 IBM Selectric", "TABS: 5, 10, 15", etc, and then a sequence of "UI" events, like keys struck and marks applied. The server could regenerate the docs through simulation, or a separate archival process could reduce all that to PostScript or some other vector format. Then the original docs themselves could be stored in a low-pressure argon-filled crypt, for infrequent exhumation, in the event that some privileged Washingtonian dodges assignment in Iraq and later runs for president.
--
make install -not war
Are there any systems that actually have this much storage now? What comes after the terabyte? Quadrabyte?
Who is "Go Digital", and why are they archiving it? :)
Has anybody figured out the date formats? I'm seeing a lot like this: "02001987". OK, it's either mmddyyyy or ddmmyyyy. But what does 00 mean for month or day? Unknown? It's hard to imagine that they don't have an exact date of death for someone who died as recently as 1987. Or is it a zero-based counting system (00 = Jan, 01 = Feb, ...)?
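One way to handle a value like "02001987" is sketched below in Python. It assumes the mmddyyyy reading and treats 00 as "unknown", which is only one of the guesses floated above and is not confirmed by anything on the site; the `PartialDate` type and `parse_mmddyyyy` function are made up for illustration.

```python
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    year: int
    month: Optional[int]  # None when the field is 00 (assumed to mean unknown)
    day: Optional[int]    # None when the field is 00 (assumed to mean unknown)


def parse_mmddyyyy(raw: str) -> PartialDate:
    """Parse an 8-digit MMDDYYYY string, treating 00 as 'unknown' (an assumption)."""
    if len(raw) != 8 or not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"expected 8 digits, got {raw!r}")
    month, day, year = int(raw[0:2]), int(raw[2:4]), int(raw[4:8])
    if month > 12 or day > 31:
        raise ValueError(f"field out of range in {raw!r}")
    # 0 is falsy, so "00" becomes None while real values pass through.
    return PartialDate(year, month or None, day or None)


if __name__ == "__main__":
    print(parse_mmddyyyy("02001987"))
```

Under this reading "02001987" is February 1987 with the day unknown. If the zero-based theory were right instead, the same string would need a different parser, so the records alone can't settle it.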
It's interesting that the death records include Social Security Numbers. Anybody want to harvest a few thousand inactive SSNs?
I rue the day that I would need a job from EDS. Wait, let me start again. When I go to work, I go in to work. I chose to be a network engineer and PC tech because I love my job and I like to help people. I didn't do it so I could screw people out of money. If you're in it for the cash, more power to you, but you could at least provide good service for good money, which is not the case from what I've heard from the people that have been unfortunately struck by EDS.
Actually, the only reports that I have found of good quality of service are those that come out of Washington, and SURPRISE!!, that's where the money comes from.
Sounds to me like corporate ass kissing.
I'm a bloodsucking fiend! Look at my outfit!
A TERABYTE IS 1000G. And 1G IS A 1000M. So A TERABYTE IS 1,000,000 MEGABYTES. Right?
there are 1 million documents in this database? And it's 800 terabytes? So each doc is 800m in size?
800m EACH? That's freaking huge. Even if the thing is only 8T in size (far more reasonable), each doc is still 8M in size. Again, pretty massive.
is this like that time MSFT bragged about their 1T DB of geological data, and then Oracle built the same database, with the same content using only 300G of space?
Inefficiency is nothing to brag about...or is it?
JON
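The per-document arithmetic above is easy to check. This quick Python sketch uses decimal units (1 TB = 1,000,000 MB) and the 1 million document count from the story; the 800 TB and 8 TB figures are the ones in the comment, and the 5 TB case is the SAN size another commenter cites from the EDS PowerPoint. The `per_document_mb` helper is just an illustrative name.

```python
def per_document_mb(total_tb: float, documents: int) -> float:
    """Average size per document in megabytes, using decimal units (1 TB = 1,000,000 MB)."""
    return total_tb * 1_000_000 / documents


if __name__ == "__main__":
    for total_tb in (800, 8, 5):
        size = per_document_mb(total_tb, 1_000_000)
        print(f"{total_tb} TB / 1,000,000 docs = {size:g} MB each")
```

So 800 MB per scan only follows if all 800 TB were actually in use, which the "scales to 800TB" reading of the press release would rule out.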
At least for marriages, I doubt the database is complete/finished. Marriage records for myself (King County), my parents (Clark County) and my in-laws (King County) are not there. Death records are there though--at least for my family. As others have said, I too would be afraid of people datamining this for personal gain. I hope there are decent safeguards against this.
I worked for the WA Secretary of State, who implemented this system, and I believe it will only be a matter of time before this goes "KaBoom" if they provide the technical support.
The organization has huge problems keeping competent people, and I believe the technical staff providing the oversight have the appropriate skills to work at McDonald's, not to provide services in state government.
and then you have Microsoft and EDS mixed into this too.
Can you say "vulnerability's"?
I thought you could!
All local records do is make it harder for people without money to get them. People with money have always been able to hire private investigators to track down the records they want. This makes it easier for people without money to do so, or for people who are vaguely interested in something but don't care enough to hire a private investigator (or do it themselves) to do so.
If you really want to stop abuse, you'll have to make them completely private, not just "private but inconvenient to get to".
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
My family moved into Washington in 1924 and has been there ever since. At least 22 births, 15 deaths, and 8 marriages have happened in the state in the ensuing years.
Oh well my family must not exist.