Slashdot Mirror


Washington State Archives Go Digital

prostoalex writes "USA Today and dozens of others report that the Washington state archives have gone online. Over the past two years project participants scanned 1 million documents issued by state and county authorities. The archive is located at my alma mater, Eastern Washington University (go Eagles!). The 800 terabyte storage system was developed by Microsoft and EDS."

131 comments

  1. Well, by chewy_2000 · · Score: 4, Insightful
    Personally I would find this, or something like it, very useful in research, even as just an undergrad History major. The number of times I've wished for something like this while digging around in musty old archives...

    Although, it has to be said, I hope they make everything accessible for *everyone*, regardless of OS and browser. No doubt a lot of researchers would be using OS X/Linux/Firefox.

    1. Re:Well, by idamaybrown · · Score: 1

      "a lot" You mean more than 2 people? ;)

    2. Re:Well, by ImaLamer · · Score: 4, Informative

      I'm using Firefox (from Windows sadly) and I can access the content just fine.

      As for OSX and Linux users, a plug-in is needed to view the content, but the site claims to support OSX and "UNIX". The plug-in is called DjVu and has an open source equivalent at sourceforge (with RPMs, OS/2 and even Cygwin support).

    3. Re:Well, by AviLazar · · Score: 1

      No doubt a lot of researchers would be using OS X/Linux/Firefox
      They would? How do you know this?

      --

      I mod down so you can mod up. You're welcome.
    4. Re:Well, by Anonymous Coward · · Score: 0

      LizardTech's DjVu plugin is available for Windows, OS X, and OS 9 here. The Windows version does work on Firefox, although not as well as it does in IE, unfortunately.

      The UNIX versions are all based off of DjVuLibre, and are not made by LizardTech. Those can be found at sourceforge.

    5. Re:Well, by ImaLamer · · Score: 1

      Anyone else hear that echo?

  2. WWW address by JamesD_UK · · Score: 5, Informative

    Just in case someone actually wanted the address for the archives it's http://www.digitalarchives.wa.gov/

    1. Re:WWW address by El+Cubano · · Score: 4, Informative

      Just in case someone actually wanted the address for the archives it's http://www.digitalarchives.wa.gov/

      FYI. Turn on cookies or you receive this extremely helpful error message:

      An error occured on the site. Please try again or come back another time.

      Otherwise, it's pretty cool.

  3. How many terabytes in the archive ... by blowdart · · Score: 2, Funny

    ... relate to state anti-competitive actions against Microsoft themselves? :)

    1. Re:How many terabytes in the archive ... by undertow3886 · · Score: 1

      Do a search for Bill Gates from the front page. Only thing that shows up is a death record :-/.

      --
      Sick of people knocking on Gentoo's greatness in completely unrelated .sigs? Me too!
    2. Re:How many terabytes in the archive ... by TheAntiCrust · · Score: 1

      ":-/" ???

      I hate the twenty second waiting period

    3. Re:How many terabytes in the archive ... by Anonymous Coward · · Score: 0

      The symbol ":-/" is a depiction of a penis hitting a wall. Hope this help! OMG!LOL!!LOL!!!1BBQ!!!1ONE!

  4. Hurrah by Anonymous Coward · · Score: 3, Funny
    "The 800 terabyte storage system was developed by Microsoft and EDS."
    Bill Gates and H. Ross Perot; together at last!

    I feel safer already.
    1. Re:Hurrah by nick-less · · Score: 1

      Bill Gates and H. Ross Perot; together at last!

      hm, according to this link, their CEO is a guy called Michael Jordan ...
      this name seems to attract money a lot better than mine ...

  5. Just another link (or two) by Neumsy · · Score: 5, Informative
    --
    %blow
    %blow: No such job

    ^how did the sex change go?
    Modifier failed
  6. 800 TB! by microtoph · · Score: 0, Redundant

    Wow, I'd like that instead of that old 8 GB harddisk in my network server.

    --
    God bless you, Toph.
  7. NB Archives by X-Phile · · Score: 5, Informative

    The Province of New Brunswick Provincial Archives have been like this for quite some time now, with birth, death, marriage certs and census records. I have been able to search for information about my family history online using their handy dandy search tool, as well as visiting the Archives themselves at University of New Brunswick. It never occurred to me that others might be trying to catch up, but I guess that this type of service isn't something that most governments deem necessary for the public.

    --
    "Well you're not Fiona Apple, and if you're not Fionna Apple, I don't give a rat's ass."
    1. Re:NB Archives by Anonymous Coward · · Score: 1, Interesting

      Yep! And my girlfriend's uncle works for NB's Provincial Archives! He talked to me about how much simpler it's made their lives to refer people to the website instead of having to answer a bunch of questions by phone.

  8. Search capabilities by vinukr · · Score: 4, Insightful

    One thing that they have to concentrate on in the future, when the number of records grows fast, is a good search strategy. The time a search takes is one thing that can make the masses use this facility.

    As far as I have tried it out in these few minutes, the search strategy is good... there are separate searches that researchers can use to find historical data and the like... This is great.

  9. drive letters by chargen · · Score: 2, Funny

    The 800 terabyte storage system was developed by Microsoft and EDS.

    How would Windows have enough drive pointers to be able to access this? Would there be a drive AG:? :-)

    -Pete

    1. Re:drive letters by Anonymous Coward · · Score: 1, Informative

      That's right, you'll put all the data online in single partition disks hung off one server. Why didn't I think of that?

      While there is nothing to stop an NTFS partition being 800 TB, it is far more likely that some sort of nearline hierarchical storage is being used, the sort of system that is used the world over in workflow/image systems.

    2. Re:drive letters by Anonymous Coward · · Score: 0

      ARRRRRRRRRRRRRRRRRRRRRRRRGH:\>
      "Drive ARRRRRRRRRRRRRRRRRRRRRRRRGH?"
      "That's what it says, my lord. Pray shall we search it?"
      ARRRRRRRRRRRRRRRRRRRRRRRRGH:\> dir
      Directory listings for ARRRRRRRRRRRRRRRRRRRRRRRRGH
      There is no grail here, you dirty Windows kniggit!
      I blow my nose in your general diirection! Your mother was a hamster and your father was a dirty peeg!
      ARRRRRRRRRRRRRRRRRRRRRRRRGH:\>
      "Hmmmmmmmm ..."

  10. Privacy by chewy_2000 · · Score: 4, Insightful

    The site seems to be slowing a bit, so I can't find details, but surely there are some privacy concerns here. I know that this just replicates the publicly available material in the physical archives, but there is a big difference between going to the archives and digging through books, and harvesting info over the web, especially given the sheer amount of info on the site, much of it from recent records.

    1. Re:Privacy by Anonymous Coward · · Score: 0

      So you want security by obscurity? You must be an MCSE drone.

    2. Re:Privacy by chewy_2000 · · Score: 3, Insightful

      Try reading what I said. Data mining, the wholesale collection of personal data, is made, I assume, an order of magnitude simpler using an online system vs microfiche or whatever. I would consider this an abuse of the system. I am in no way suggesting the records should have access restricted, this is just a new problem raised by the tech that needs to be addressed.

    3. Re:Privacy by BurritoWarrior · · Score: 1

      Making retrieval difficult is not part of anyone's right to privacy.

    4. Re:Privacy by chewy_2000 · · Score: 1

      And once again I am not suggesting that retrieval should be made difficult, merely pointing out that there is potential for abuse that needs to be considered.

    5. Re:Privacy by lamona · · Score: 2, Insightful

      Absolutely. Making "public" records available universally is a different meaning to "public" in public records in situ. Although the word "public" was used, it really meant the local community. When you change that to "everyone in the world with internet access" you change the context in which the data resides... and for data, context is everything. For one thing, it narrows the scope to a small portion of the population so that accurate identification (or, conversely, less mistaken identity) is facilitated.

      Making it difficult to get to the records DOES provide some privacy, and that is the level of privacy to which we've become accustomed. It's like the difference between your Aunt Mable having a listed phone in the phone book for her town of 2,000, and having her phone listed in the internet white pages. She allowed her phone to be listed because she is a part of that community and feels secure there. She probably doesn't feel the same way about being "visible" beyond that community.

      What this means is that we are going to have to either revise how we define "public", or we're going to have to get used to a different view on privacy. I'd prefer the former.

      no sig, .sig

      --
      I just read /. for the amusing .sigs
    6. Re:Privacy by Anonymous Coward · · Score: 0

      The letter "e" might be offensive to some people. I suggest you stop using this letter, just in case.

    7. Re:Privacy by dtjohnson · · Score: 1, Interesting

      This is a significant erosion of privacy. Governments require you to provide a lot of information for all sorts of things. Now, they are using new technology to make all of this information available to anyone anywhere in the world with a casual 2-minute search. Where will this stop? Tax records, medical records, personal property records, lawsuits, judgments, military records, etc. may all soon be posted online in this way. This is a first step towards that sort of future where anyone can easily sniff out all sorts of information about anyone. That will be a great tool for stalkers, criminals, identity thieves, etc. but for the rest of us, the loss of even a fig leaf of privacy will make our lives less enjoyable. An obvious first step would be to only provide this sort of information for people who are deceased.

    8. Re:Privacy by Idarubicin · · Score: 1
      Tax records, medical records, personal property records, lawsuits, judgments, military records, etc. may all soon be posted online in this way.

      Property (real property) records are already public domain--as they should be. There's no good reason for the government not to tell you who owns what land. Whether you find out at the county tax assessor's office or on the Internet is irrelevant.

      Aside from property tax information, I don't foresee other tax information being released to the public. Knowing the assessed value of other properties gives a landowner the opportunity to evaluate the reasonableness of his own assessment. Realistically, concealing the value of a property is nearly impossible anyway--it's right out in the open. With respect to income tax and other tax info, there is no reason to release that information. If for no other reason, the IRS won't do it because they would fear more people would lie on their taxes.

      Lawsuits and judgements--unless sealed by a judge's order--are also properly public domain. We're supposed to have an open court system, remember? Secret trials are generally a bad thing.

      Medical records of individuals have always been protected, private information. I don't foresee this information being opened online any time soon, nor can I come up with any reason why anyone would think it a good idea. Aggregated medical data (epidemiological information) is in many cases already published. Local and federal organizations (the CDC, for instance) regularly supply public health information. You have to know how many cases of West Nile disease there are in town before you know whether spraying for mosquito larvae is a good use of resources. Making this sort of aggregated information publicly available serves the public good, and should do no harm.

      Where military and immigration records are discussed in the article, the information is described as "historic". Presumably, the people involved are long dead, and the only individuals interested will be family or genealogists.

      --
      ~Idarubicin
  11. Google by Anonymous Coward · · Score: 0

    just wait for Google to index it.

  12. BSOD? by SnowCrashed · · Score: 0, Troll

    "The 800 terabyte storage system was developed by Microsoft and EDS." I've always wondered what a BSOD looked like on a system with 800 terabytes... I wonder what OS they will be using for their systems.

    1. Re:BSOD? by Anonymous Coward · · Score: 0

      According to netcraft, it's Server 2003/IIS 6.0. Though that shouldn't be a surprise.

    2. Re:BSOD? by Anonymous Coward · · Score: 1, Funny

      how about the 50 hours scandisk run after the crash ?

    3. Re:BSOD? by inKubus · · Score: 1

      No kidding. And 800 terabytes--that's like 33.33TB PER DRIVE LETTER!! (excluding A: and B:, of course)

      --
      Cool! Amazing Toys.
    4. Re:BSOD? by Anonymous Coward · · Score: 0

      Only? I make it closer to 13,000 hours.

  13. Digital twilight. by haeger · · Score: 4, Interesting
    How about the "Digital Twilight" that people have talked about? One of the big problems with these kind of archives is that they aren't permanent the way that paper is. Washington could very easily end up the way that Stasi did in East Germany. They have several hundred tapes of data with information about every spy in the West on them, but the information is still "safe" since no one knows any longer how the data was saved to disk or which file format was used.

    And I'm still ignoring the fact that machines grow old and have to be replaced. It's a known fact that disks break, so you'll need backups, but how long can you keep an old storage solution around? Sooner or later you'll have to migrate old backup data to newer media.

    Note that I don't think that this is a bad idea, moving everything online, but there are consequences that I don't think that everyone has thought of.

    Where I live one can go into the royal library and find (and read) an official document written by someone in the 16th century, but can we be sure that 100 or even 50 years from now someone can read a DLT300 tape?

    .haeger

    --
    You are not entitled to your opinion. You are entitled to your informed opinion. -- Harlan Ellison
    1. Re:Digital twilight. by Anonymous Coward · · Score: 0

      systems like this are constantly updated, therefore the hardware that runs them is constantly updated, as new storage technologies come online the old storage archive will be converted to the current technology, be it tape, optical etc.

      The only time that you really get a problem is where the data isn't constantly updated, or used, like the BBC Micro Domesday disk.

    2. Re:Digital twilight. by LousyPhreak · · Score: 4, Insightful

      you still can move the data from the old system to a new one if it's at the end of its lifetime.

      harddrives can easily be replaced (assuming it's a sort of raid with hotswap)

      sql will also stay around really long, and if not there will be at least a gazillion tools to convert to a new format (it is quite sure that the data will be stored on a sql server)

      and as long as the data is safely stored the access mechanism shouldn't be a problem but that's just my .02

      --
      -- Karma: beyond good and evil - mostly affected by posting political
    3. Re:Digital twilight. by mattpalmer1086 · · Score: 1

      Strangely enough, we at the UK National Archives have been involved with a project to rescue the old BBC Domesday Disk. It's coming along quite nicely, by both writing new software and extracting the old data, and with an emulator. We've even got some original working hardware now.

      As another poster in this thread rightly points out, long term digital archiving requires cycles of storage migration. But while this is onerous, it's not the biggest challenge. The biggest challenge is the format that data is written in, and the lack of documentation for systems that have behaviour (dynamic rather than static data). You have to ask the question, who really owns your data - you or the software vendor who encodes it in a proprietary format...? We're seriously looking at open standards for all data formats, or open source formats where standards are not available. We hope that in the future it will become unmarketable for vendors to sell software whose data is unreadable without their software. Infrastructurally, as a society this is too important to be held in the hands of private companies.

      Some people believe that emulation can solve these problems, but most people are going down the road of a slow continuous cycle of migration of data to other formats, for both preservation and presentation (which for digital records may involve different formats).

      To support this the National Archives have recently set up a free for use file format registry, in which we hope to record all essential details of file formats, and the migration strategies to move between them. The data model is still being refined, and we're looking to develop format migration tools that use the data. It's at http://www.nationalarchives.gov.uk/pronom/

      Matt Palmer
      Digital Preservation Department
      The UK National Archives
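
      A registry of this kind is what makes automated format identification possible. As a rough illustration only (this is not PRONOM's data model, nor any tool the National Archives ships), the sketch below matches a file's leading bytes against a tiny hand-written table of well-known signatures. The table `SIGNATURES` and the functions `identify` and `identify_file` are made-up names for this example.

```python
# Minimal sketch: identify a file format from its leading "magic" bytes.
# SIGNATURES, identify() and identify_file() are hypothetical names used
# for illustration; a real registry records far more (versions, offsets,
# migration paths between formats).

SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"II*\x00", "TIFF image (little-endian)"),
    (b"MM\x00*", "TIFF image (big-endian)"),
    (b"AT&TFORM", "DjVu document"),
]


def identify(data: bytes) -> str:
    """Return a format name for the given leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first 16 bytes of a file and identify it."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

      The point of keeping the table separate from the matching code is the same as the point of a public registry: the knowledge about formats outlives any one program that uses it.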

    4. Re:Digital twilight. by haeger · · Score: 1
      you still can move the data from the old system to a new one if it's at the end of its lifetime.

      Yes, you are correct about this but scale it up a bit. If you have to change media every 10-15 years then the data migration becomes a full time job for someone.


      sql will also stay around really long, and if not there will be at least a gazillion tools to convert to a new format (it is quite sure that the data will be stored on a sql server)

      and as long as the data is safely stored the access mechanism shouldn't be a problem but that's just my .02


      I agree with you in the short term, but look 100-500 years into the future. Data on disk is just 0s and 1s and has no inherent meaning. Should the access mechanism be lost, the data is useless. Compare this to the hieroglyphs that the Egyptians used. The data was there, but until we found the Rosetta Stone the data was useless; it didn't tell us anything.

      I'm just trying to point out that while the data might be safe the information isn't.

      But I could be way off, I'm sure that the smart folks who implemented this have thought of the consequences and are in no way influenced by the truckload of money coming their way. ;-/

      .haeger

      --
      You are not entitled to your opinion. You are entitled to your informed opinion. -- Harlan Ellison
    5. Re:Digital twilight. by Anonymous Coward · · Score: 0

      The reality is that digital data is far more persistent than paper. I can see posts to online discussion groups I wrote over 10 years ago stored and searchable on Google. I never thought that information would survive this long. It is scary to think that I could have said something 10, 20, 30 years ago that could embarrass me today. Luckily, computers were not as pervasive back then.

    6. Re:Digital twilight. by LousyPhreak · · Score: 1

      what i meant with "safely stored" was "archived in some sort of database"

      some form of a database will be around almost forever, if a new form of storage is invented which makes databases obsolete there will be enough tools around to move the data from a sql database to the new system, simply for the fact that too much stuff is already stored in various databases to let go of it.

      additionally if this system is heavily used (i.e really everything enters it) it will grow and with it the hardware will need upgrades simply for the fact that it needs to be stored and accessed in reasonable time.

      the only problem could be if the vendor of the frontend goes out of business, the source with its documentation vanishes, and someday iis will be dumped (hopefully ;) ), so the frontend will be unusable. but even in that case it should be possible to reassemble a useable frontend given the case that the system is well designed.

      the big difference between this system and hieroglyphs is that as long as it is used it will be maintained, and if needed adapted to future needs so as long as it is in use there will be no problem. even if new systems are developed data transfer should be quite trivial.

      but this is really something only time can tell, especially with the extreme growth computers (and widespread use) went through in the last ~15 years, so you can't even predict what's gonna happen in the next 50...

      --
      -- Karma: beyond good and evil - mostly affected by posting political
    7. Re:Digital twilight. by mattpalmer1086 · · Score: 1

      Nope. The reality is that digital data is far more *accessible* than paper data, not more *persistent*.

      Reckon on all your digital data being unreadable (through software obsolescence, mostly) in no more than 50 years' time without some form of active intervention. We have paper that's survived for hundreds of years easily with no special requirements.

      Matt Palmer
      Digital Preservation Department
      UK National Archives.

    8. Re:Digital twilight. by BK425 · · Score: 1

      I don't follow, you're allowed to personally go touch (so that you can turn the pages of and read) a 16th century document? I have a hard time imagining that's what you meant, but if it is don't you have concerns about what your finger acids will do to that historical page in 50 years? In 100 years it would certainly show the effects of humidity changes from the breath of users that had handled it. IF it survived the handling... all media have archival problems. All human products have permanence issues (even U235 degrades...).

    9. Re:Digital twilight. by Scorillo47 · · Score: 1

      >>> One of the big problems with these kind of archives is that they aren't permanent the way that paper is. Washington could very easily end up the way that Stasi did in East Germany.

      Paper is not really permanent either. If someone wants to get rid of paper documents, all he needs to do is burn them. Eventually, in an "accidental fire".

      --
      Don't try to use the force. Do or do not, there is no try.
    10. Re:Digital twilight. by haeger · · Score: 1
      Actually yes, that's what I meant. I can (with legitimate reasons) go and read any document as far back as we have records. Any kind of research is a legit reason. A relative of mine has done some digging into our family's history and in doing so has been reading documents from as far back as the 15th century (that's 1400-something, right?). Gloves supplied by the library.

      Yes, I am aware that these records will be destroyed eventually, but they have survived more than 500 years of storage without any intervention. I seriously doubt that in 500 years someone will pick up a CDROM and go "Wow, let's see how they lived around year 2000". I doubt that will be possible in 50.

      Anyway, I just wanted to bring up the issue about the problem with digital storage. I don't have a solution. Paper and pen is clearly not an option anymore. Even I understand that.

      .haeger

      --
      You are not entitled to your opinion. You are entitled to your informed opinion. -- Harlan Ellison
    11. Re:Digital twilight. by swillden · · Score: 1

      some form of a database will be around almost forever, if a new form of storage is invented which makes databases obsolete there will be enough tools around to move the data from a sql database to the new system, simply for the fact that too much stuff is already stored in various databases to let go of it.

      But this presumes that someone cares enough to do the migration while there are still "enough tools around". If no one cares for enough technology generations the data format will be so old that no one can migrate it.

      Here's a real-world example. I have in my possession a 5 1/4" single-sided double density diskette that contains a database I constructed with the Nutshell Database on my dad's old Leading Edge PC, circa 1985. Suppose I wanted to read that data now...

      First, I have to find a 5 1/4" diskette drive. I don't have one, but that's not too hard to find. Imagine if it was on an 8" disk, or some even more obscure medium. Luckily, modern machines still include controllers that can run that drive (that may not be true soon).

      Next, I have to find a copy of the Nutshell Database program, or I have to reverse-engineer the data format. That's harder.

      Assuming I could find a copy of the program, I need to find a machine to run it on. Now in this aspect, I'm lucky, because the program disk is formatted with the FAT file system and the program runs on DOS. Modern computers can understand FAT and can run DOS applications, mostly. But what if it was from some more obscure platform? I might have to find old hardware or find (or build) some emulator to run the program.

      Then, once I had the ability to get at the data, I probably need to find a way to export it in a form I can actually use. Hopefully Nutshell has some kind of export feature, or I'm either back to reverse engineering the data format or else finding some way to screen scrape (manually or automatically).

      And this is data that is only 20 years old, not 500 years old!

      I'm sure someone older than me can chime in with an example of something they have on tape or even a hard drive from 25 or 30 years ago that is now nearly impossible to read. Punched cards, anyone?

      Unless someone is taking an active interest in migrating data, it's not only possible but damn near guaranteed to become inaccessible given enough changes in technology.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
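
      The "active interest" part is mostly routine work: dump the data into a plain, documented format while the tools to read it still exist. A minimal sketch, assuming the data sits in a SQLite database and that CSV plus the saved table definition is an acceptable target. The function names `export_table` and `table_schema` are invented for illustration, and a real archive would have its own formats and procedures.

```python
import csv
import sqlite3


def export_table(conn: sqlite3.Connection, table: str, out) -> int:
    """Write every row of `table` to the text stream `out` as CSV.

    The first CSV row holds the column names. Returns the number of data
    rows written. `table` is assumed to be a trusted identifier, because
    it is interpolated directly into the SQL text.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


def table_schema(conn: sqlite3.Connection, table: str):
    """Return the CREATE TABLE statement SQLite stored for `table`, or None."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return row[0] if row else None
```

      Keeping the schema text next to the CSV is the cheap version of documenting the format: whoever finds the export later can see what each column was meant to hold without needing the original program.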
    12. Re:Digital twilight. by advocate_one · · Score: 1
      --
      Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
    13. Re:Digital twilight. by mysticalreaper · · Score: 1

      the only problem could be if the vendor of the frontend goes out of business, the source with its documentation vanishes, and someday iis will be dumped (hopefully ;) ), so the frontend will be unusable. but even in that case it should be possible to reassemble a useable frontend given the case that the system is well designed.

      Ah, yes. This is a good point. What if the vendor... of the frontend, or backend, or any of the systems, goes out of business? Then they will be screwed!

      Unless, perhaps, they were to use a system consisting ONLY of open standards. So, standard PCs for data entry. Standard hard drives for storage. Standard Ethernet for communications. A standard web front end. I assume that they are doing a lot of this already.

      Why, if they used a FLOSS database, even if no one was using it but them, they could still maintain it, because no one could take it away, and they could modify it and update it themselves.

      The point is, relying on a single vendor, who is the only vendor who can sell you the product you need, is foolish. You've now hinged your entire operation on that single vendor. This is not wise. And in the computer world, patents are now threatening to legally remove your ability to write your own software to do the same thing.

      I guess my point is that if you stick to open standards all the way, you will never end up in the position hypothesized. As long as there is an operating budget, things can carry on as they always have. Heck, if they had the money, they could even arrive at their own custom solution, developed in house. And then as long as the archives exist, and as long as the government is sound, everything should carry on.

      The Germans had a proprietary data format. Since no one knows how the data was arranged, it is now a mystery. But, if you were to use Postgres on Linux, then NOTHING is proprietary, you could never find yourself in a position where you are unable to decipher the info. Also, the hieroglyphs had the same problem where no one knew the key, the method of information storage. However, if you adopted a FLOSS solution, you could literally write down the exact way the info is stored, on paper, and keep it in the archive building, next to the server. Remember, the Free part is not the price, but the information.

      The only way I could see these organizations screwing themselves is by adopting proprietary solutions, and finding themselves in the position of reliance on a single vendor.

      This is the reason I always go for the open standards, and shun the proprietary ones. It fosters competition, and thus multiple options for me, the consumer. Thus, I win, directly, from supporting open standards. Which is why it's always confusing to have people go for proprietary solutions... they are setting themselves up to get screwed.

    14. Re:Digital twilight. by Carnildo · · Score: 1

      Yes, I am aware that these records will be destroyed eventually, but they have survived more than 500 years of storage without any intervention. I seriously doubt that in 500 years someone will pick up a CDROM and go "Wow, let's see how they lived around year 2000". I doubt that will be possible in 50.

      Washington State's doing just that experiment. Back in 1992, they created a time capsule using the latest and greatest storage technology: CD-ROMs. The plan is to add new material every 25 years, and in 2492, to open the archive. The CDs are stored in a sealed vault filled with dry nitrogen, so the physical medium should still be viable. It is significant to note that they did not include any CD-reading devices.

      --
      "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
    15. Re:Digital twilight. by Anonymous Coward · · Score: 0
      but there are consequences that I don't think that everyone has thought of.

      That's right, I'm sure the people at MS and EDS didn't think it through at all before spending millions of dollars on it, but the people at Slashdot know all the problems instantly.

    16. Re:Digital twilight. by Anonymous Coward · · Score: 0

      I'm currently a student at Eastern (and physically located about 10 walking minutes from the new building). During the entire construction phase my friends and I have thought that the project was a great idea. Washington is and has always been known for a few things: the pretty greenery, the constant rain, the volcanoes that spew pyroclastics the world over... in other words, Washington state is known primarily for the conditions on the Western half of the state, whilst few are aware of the much drier, more geologically stable Eastern half.

      When deciding where to store your data, the question of location is one that should be answered first (of course). The format is also important, but should come secondary. The state capital and largest cities are in the West, the old archives are located there as well. When the decision to build new archives in a safer location was made, most everybody decided that it would be logical to also make them as accessible as possible. Thus we have a state-of-the-art digital archive in a pretty safe (somewhat rural and definitely agricultural) region of the state, close to the airport and the second largest city of the state, but still available to the majority of the population. Even those that remain 300 miles away.

      I'm rather certain that the data will be accessible for a long, long time. That was the entire purpose for building the new archive, after all.

    17. Re:Digital twilight. by major.morgan · · Score: 1

      Having had conversations with Adam Jansen (the WA digital archivist quoted in several of the news stories), many consequences have been thought of and addressed (as well as could be expected). I also don't believe that this is intended to replace all physical documentation - I wouldn't expect them to shred the various pieces of legislation after the documents have been scanned. This is just to provide another, easier method of access to the public and researchers. On top of all of this, tell me how you "back-up" the Royal Library - just in case a fire were to take out centuries of data. One of the beautiful things about digital records is that we can have many copies of the data (in addition to the originals).

      As for some ways to deal with these problems: they have collected a decent array of legacy/antique hardware, software and emulators in an effort to assist other government offices with migrating data to newer storage systems - before it's too late. Additionally they have future growth and migration plans (i.e. budget) accounted for - keep moving the data to newer storage.

      Strikes me as not a whole lot different from what has already been dealt with in the past. Think about how records were kept in the frontier days, how those were collected in some cases into the county records office, later transferred to microfiche - and now to a purely digital medium.

      It IS certainly important that we don't forget about these issues in the future. Perhaps this is why we have someone with the title "Digital Archivist" in the State of Washington.

  14. no maps? by Apreche · · Score: 5, Insightful

    Dang, there are no maps in there. The best stuff in the archives at town hall has always been maps of the town and blueprints of various buildings. But nobody scanned those in the archives. Oh well.

    --
    The GeekNights podcast is going strong. Listen!
    1. Re:no maps? by Anonymous Coward · · Score: 0
      ALERT: Terrorist Detected. Launch Protection MeasurSegmentation Fault
    2. Re:no maps? by mikael · · Score: 2, Interesting

      Maps and/or aerial photographs combined together make the best time-lapse animation. It's amazing to see the growth of a city all the way from the first harbour/warehouse in Roman times to the metropolised supercity of today.

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
    3. Re:no maps? by Scaba · · Score: 0, Flamebait

      Maps & blueprints? What are you, some kind of terrorist!!?!?

    4. Re:no maps? by Sta7ic · · Score: 1

      Some of the maps are about a quarter mile away in the JFK Library, downstairs in the Government Archives. There are a lot of CIA intelligence materials down there, including maps and reports. One of the featured items of the month is the Postwar Intelligence Report on Iraq or somesuch. Not only are those not scanned in, but they're inconvenient to access. The government documents are all on these moving shelves that never seemed to work properly, and would consistently beep in the background to say "we're not feeling well". So with four years of construction on this nice new facility, these clunky electromechanical shelves are just *now* getting some attention. Hopefully there aren't too many skeletons crushed up in those shelves. The irony of "progress" in the government.

    5. Re:no maps? by British · · Score: 1

      Plus you could browse through those maps in an emergency like you see in TV & movies(think Max Headroom).

  15. System Spec by LiquidCoooled · · Score: 4, Funny

    The 800 terabyte storage system was developed by Microsoft and EDS.

    Microsoft was able to confirm the system is expandable, and contrary to previous rumours, will in fact have enough disk space to install Longhorn.

    They do, however, state that to do anything actually useful, more upgrades will be required.

    --
    liqbase :: faster than paper
  16. How to view the records? by marlingrando · · Score: 0, Offtopic

    I opened up the treasures piece from the collections set on the right of the main page. http://www.digitalarchives.wa.gov/Content.aspx?txt=records#topFiveRecords and tried to open the state seal from http://www.digitalarchives.wa.gov/content/Top5/TerritorialSeal.djvu What application is associated with the djvu extension?

    1. Re:How to view the records? by Anonymous Coward · · Score: 2, Informative



      DjVu is a web-centric format and software platform for distributing documents and images. DjVu can advantageously replace PDF, PS, TIFF, JPEG, and GIF for distributing scanned documents, digital documents, or high-resolution pictures. DjVu content downloads faster, displays and renders faster, looks nicer on a screen, and consumes fewer client resources than competing formats. DjVu images display instantly and can be smoothly zoomed and panned with no lengthy re-rendering. DjVu is used by hundreds of academic, commercial, governmental, and non-commercial web sites around the world.

      DjVuLibre is an open source (GPL'ed) implementation of DjVu, including viewers, browser plugins, decoders, simple encoders, and utilities.

  17. Thanks for the F'ing Popups by N8F8 · · Score: 3, Informative
    If you can't bother to find a link to a web resource in an article about a web resource, you shouldn't post it!

    http://www.digitalarchives.wa.gov/

    --
    "God fights on the side with the best artillery." - Napoleon, Marshal of France - speaking truth to power
  18. All that work, and.. by Nuclear+Elephant · · Score: 1, Funny

    Over the past two years project participants scanned 1 million documents issued by state and country authorities.

    If only someone had told them about Kinko's.

    1. Re:All that work, and.. by TrancePhreak · · Score: 1

      Kinko's, where looking at a computer costs you $12 a minute.

      --

      -]Phreak Out[-
    2. Re:All that work, and.. by Nuclear+Elephant · · Score: 1

      I'm sure their budget for this project has already exceeded our expectations :)

  19. Keep the original paper copies... by HungSoLow · · Score: 0, Troll

    I wouldn't trust M$ with 640 KB of my sensitive data!

  20. Perfect by IGnatius+T+Foobar · · Score: 1, Interesting

    Oh, great. When (not if) Microsoft is brought to court for antitrust violations again, all MonkeyBoy has to do is enter a secret backdoor password and, *poof* all those documents containing damning evidence suddenly go "missing" -- or perhaps they simply disappear from the index as if they never existed.

    Would you trust a known pedophile to give your kids a bath? If not, then why trust a convicted monopolist who is on the record for perjury with critical documents?

    --
    Tired of FB/Google censorship? Visit UNCENSORED!
    1. Re:Perfect by AviLazar · · Score: 1

      I know you are just being funny, but on a side note - I do not think the state got rid of its paper documents (though that would be cool for recycling)...instead I think they just added this as a nice and easy way for society to benefit from technology...

      --

      I mod down so you can mod up. Your welcome.
    2. Re:Perfect by Anonymous Coward · · Score: 0

      Can't find any (of the right) William Gates or Steve Ballmer in the site already anyway...

    3. Re:Perfect by rjdohnert · · Score: 1

      What Anti-Trust violations?

  21. How 800TB can be locked forever by Leadmagnet · · Score: 1

    This is how 800TB can be digitally locked forever and still be online: http://www.emc.com/products/systems/centera.jsp/

    --
    http://www.leadmagnet.50megs.com
  22. Not 800 Terabytes, & using DjVu by illtud · · Score: 4, Informative

    The system isn't 800TB, but will scale to 800TB, according to this EDS press release. In fact, given that they've spent a mere $2.5M (powerpoint!) there's not a hope in hell that they've got 800TB! The powerpoint says it's a 5TB EMC SAN & an ADIC tape library for backup.

    An interesting point is that they're delivering the documents using DjVu by LizardTech. There is also DjVuLibre, which is GPLd and was developed by the creators of DjVu in conjunction with LizardTech (after a period of LT not-getting-it). The DjVuLibre home page is here. LizardTech still have the best encoders for the format.

    1. Re:Not 800 Terabytes, & using DjVu by sxpert · · Score: 1

      5TB ? that's like 18 of those 400G Hitachi drives, that go for 411 USD a piece these days. if you include the bi-opteron box, and a couple of 3ware Sata cards, that's a total investment of 20 grand or so...

    2. Re:Not 800 Terabytes, & using DjVu by illtud · · Score: 3, Insightful
      5TB ? that's like 18 of those 400G Hitachi drives, that go for 411 USD a piece these days. if you include the bi-opteron box, and a couple of 3ware Sata cards, that's a total investment of 20 grand or so...

      ...come back when you've worked in the real world (or looked at an EMC price list...!)

    3. Re:Not 800 Terabytes, & using DjVu by Anonymous Coward · · Score: 0
      As long as you build in redundancy, there's NOTHING wrong with using COTS HD's and COTS raid controllers. SATA drives are between $.50 and $1.00 per gigabyte! There's no denying that cost effectiveness.

      You sound like you're trapped in the new version of the "nobody ever got fired for buying the expensive brandname".

    4. Re:Not 800 Terabytes, & using DjVu by Anonymous Coward · · Score: 0

      Biggest SCSI drive is still only 144GB or so. Toss in hot-spares, RAID 1+0 (or is it 0+1... eh whatever). We'll be generous and say $500/drive for high-performance.

      5TB = 36 drives, but we'll give ourselves some headroom and up that to 40. Double that to account for RAID 1+0 and we now have 80 drives. Plus hot spares (figure 10 of those, 1 for every 8th drive).

      90 drives @ $500ea = $45,000 (around $10/GB, which is about par for enterprise storage). That cost will go up a lot once you add on things like the systems housing these drives ($5-$8k each at a guess), the bits to connect the drive systems together (which I think is where iSCSI gets involved).

      Then there's the backup issue, which probably costs another $50k-$100k for the hardware. Plus tape costs.

      Then there's the labor costs, software licensing costs, ongoing support costs.

      Maybe even a duplicate system that mirrors everything on the primary box.
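
      The drive-count arithmetic above can be sketched roughly as follows. This is only an illustration of the poster's back-of-the-envelope estimate, not a real sizing method: the function name `estimate_raid10` is hypothetical, 1 TB is taken as 1024 GB so the raw count comes out at 36 drives, and "headroom" is assumed to mean rounding up to the next multiple of ten.

```python
import math


def estimate_raid10(usable_tb, drive_gb=144, price_per_drive=500,
                    round_to=10, drives_per_spare=8):
    """Rough RAID 1+0 drive count and drive-only cost (hypothetical helper)."""
    usable_gb = usable_tb * 1024                      # assumption: 1 TB = 1024 GB
    data_drives = math.ceil(usable_gb / drive_gb)     # drives needed for raw capacity
    padded = math.ceil(data_drives / round_to) * round_to  # assumed headroom rule
    mirrored = padded * 2                             # RAID 1+0 mirrors every drive
    spares = math.ceil(mirrored / drives_per_spare)   # one hot spare per 8 drives
    total = mirrored + spares
    cost = total * price_per_drive
    return {
        "data_drives": data_drives,
        "mirrored_drives": mirrored,
        "hot_spares": spares,
        "total_drives": total,
        "drive_cost": cost,
        "cost_per_usable_gb": cost / usable_gb,
    }


if __name__ == "__main__":
    for key, value in estimate_raid10(5).items():
        print(key, value)
```

      With these assumptions the drives alone come to a little under $9 per usable GB, in the same ballpark as the "around $10/GB" quoted above. Enclosures, interconnects, backup, licensing and labor are not modelled.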

  23. Re:Well, - data lost by jackb_guppy · · Score: 1

    I have family in Washington State. Too bad their information is NOT in the database. They were born, married, and died there, and not a scrap of data is in this database.

    So it is not all that it is cracked up to be.

  24. cheaper linux based product by just+someone · · Score: 1

    cheaper product based on open source:
    linux based, postgres db.
    Not in full release, not free, but very open.
    http://www.archivas.com/

  25. EDS???? by Cboyd0319 · · Score: 1

    Is this the same EDS that is currently fleecing the US Navy for Hundreds of Millions of dollars in, what has been described by everyone I've talked to as extremely poor computer and network support?
    FTA -- "If you mention NMCI, there is an automatic groan," he says. "I think the phrase is, 'I've been NMCI'd.' "
    The Article

    1. Re:EDS???? by dknight · · Score: 1

      Yep. Not just the US Navy either, but the Army as well. I have the misfortune of having them as my network/computer support at work, and it's really sad. Most of their network guys come to me for help, and I'm a Video Teleconferencing Engineer, not a network guy.

      Of course, I can't bash them too hard. I'm hoping they'll hire me ;)

    2. Re:EDS???? by Anonymous Coward · · Score: 0

      "Most of their network guys come to me for help, and I'm a Video Teleconferencing Engineer, not a network guy."

      As one of those network guys, I can say we appreciate your efforts in preparing good coffee. Keep up the good work and maybe one day EDS will hire you. See you later at the office!

  26. ...developed by Microsoft and EDS by bmalnad · · Score: 0

    ....developed by Microsoft and EDS. At a cost of $6,943,349,453,234,213,166,784.23. Sorry - I've just never seen anything done cost effectively when EDS was involved.

    --
    Free Scotland!
  27. Bug Farm by HangingChad · · Score: 1, Funny
    The 800 terabyte storage system was developed by Microsoft and EDS

    Run and hide. If there was ever a combination of resources destined to fail it's Windows and EDS. If it works at all I'll be surprised. If it keeps working I'll be amazed.

    --
    That's our life, the big wheel of shit. - The Fat Man, Blue Tango Salvage
  28. Re:Well, - data lost by jlleblanc · · Score: 2, Informative

    What years? This database seems to be limited to older archives... the most recent year for a record I found was 1965.

    -Joe

  29. What should an archive be by just+someone · · Score: 1

    The problem is the notion that tape is an archive format. It's not, it's a backup format (catastrophic recovery). It's only an archive format while you have the capabilities to read it (if you can read it).

    An archive should be a write-once, read-many file system with active on-disk (not hierarchical on tape) information, multiple copies preferably at multiple sites (depending on how valuable the data is), and programs for active file validation (you need to be sure the file is still there, and still the same, every so often).

    In addition, file format migration will be needed. Extend your active validation to include format migration, and policy changes (keep fewer copies of the old version around). And file format changes most likely will need some human checking to make sure things worked (big part).
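
    A minimal sketch of the "active file validation" idea, assuming each archived file has a SHA-256 digest recorded at ingest time and that copies are plain files on disk. The function names `sha256_of` and `validate_copies` are made up for illustration; a real archive would also need scheduling, repair from a good copy, and logging.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large scans do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_copies(expected_digest, copies):
    """Return the copies that are missing or no longer match the recorded digest."""
    damaged = []
    for copy in copies:
        path = Path(copy)
        if not path.is_file() or sha256_of(path) != expected_digest:
            damaged.append(str(path))
    return damaged
```

    Run periodically over every copy at every site, a check like this answers "is the file still there, and still the same"; anything it returns would be restored from a copy that still matches.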

  30. reboot by jthayden · · Score: 1

    What are the odds they forget to reboot and it all crashes after 30 days?

  31. Microsoft and EDS by chegosaurus · · Score: 1

    Truly a match made in heaven.

  32. Re:Well, - data lost by Anonymous Coward · · Score: 0

    Maybe your poor spelling or grammar affected your search.

  33. I'm awake, really... by Sta7ic · · Score: 1

    They're hosting the archives here at Eastern? You know something's going really wrong when Slashdot is your source for current events at the university where you work and attend classes. Where's my ginseng tea?

  34. Go Eagles! by Pcghost · · Score: 2, Interesting

    The digital archives is a big step for my University. Five years ago we were facing a hostile take over by the drunken WSU, now Eastern is the fastest growing University in the state. The Microsoft focus is to be expected. Redmond pays a lot of money to keep universities in our state in line. Rest assured Eastern is loaded with disgruntled Linux users being forced to learn Visual Basic in their IT courses. There are even a few IT profs pushing for changes, though they haven't made much headway in their efforts.

    1. Re:Go Eagles! by Sta7ic · · Score: 2, Interesting

      Fastest growing because we still have something like space. With WSU or UW not taking anyone with less than a 3.6 GPA for reasons of overcrowding, and in-state tuition being around $1200 for 12-18 credits, this place isn't half bad. But our math department was ranked the absolute worst in the state of Washington between the four and two year colleges last year, which seems to hamstring progression through the CS department. One of our profs has a dubious reputation after 3/4 of the class failed a 300-level probability and sadistics class, which included both seniors and graduate students.

      Oh, and with all these new students, almost three weeks in and the dorm networks are STILL on the fritz. This is an issue with the provider and the infrastructure, though.

    2. Re:Go Eagles! by jhylkema · · Score: 1

      Quoth the fellow Eagle poster:

      The digital archives is a big step for my University. Five years ago we were facing a hostile take over by the drunken WSU, now Eastern is the fastest growing University in the state.

      Yes, it's amazing the strides Eastern has made. I was there in the early-to-mid 90s when they sucked in just about every possible way there is to suck. Now, if my other plans don't pan out, I might actually consider going back there to finish my degree. It is true, however, that EWU is still primarily a teacher's college and thus its tech programs aren't terribly strong as evidenced by the math department's non-rating.

    3. Re:Go Eagles! by CyberDave · · Score: 1

      Yeah, the dorm networks here suck (I'm a grad student myself, just moved off campus after 4 years in the dorms, er, "residence halls")

      It's not only a problem with the provider and the infrastructure, but also the management (EWU Department of Housing and Residential Life) not having a fsckin' clue how to manage a network of this size. I've looked into it on several occasions (from both the end-user and systems design perspectives). It's not pretty. Pretty fsckin' ugly, actually.

      I don't have enough space here to even get started on the subject. Email me separately if you want to have a "friendly discussion" about the subject.

      That said, I thought it was pretty cool to see my school on the front page of Slashdot. Will have to make sure my fellow CS students know about this.

      CyberDave

    4. Re:Go Eagles! by CyberDave · · Score: 1

      Rest assured, though, that the VB situation you described is applicable only to the IT guys (I'm not even sure we have an "IT" program. We have MIS, CIS, CS, etc, but nothing that's actually officially called "IT").

      In the Computer Science Department (where I am now a grad student, having gotten my BS degree here last fall), students are taught Java as the base language for the programming classes.

      After that, we offer electives in C++, C, Python, and a handful of other languages.

      We also have quite a few Linux fans on the staff in the CS Dept., a dedicated Linux lab (rather small at the moment) so Linux users should feel right at home. Not too many Mac OS X people, though (I'm the only Mac fan I know at the moment, though there are a couple users here and there).

      I'm really looking forward to the completion of the new building in the spring. I got a chance to tour it last week. It's a great building and should further fuel the growth of the University.

      As they like to say in the promotional materials around here, "It's a great time to be an Eagle!"

      CyberDave

    5. Re:Go Eagles! by Anonymous Coward · · Score: 0

      Corder: I think we've already talked, or at least, Tom's talked with you about it. We still like the "WARNING: Sanswire networking in progress" signs you made. Or was it Kerry?

      -"Sta7ic" Matt (stupid forgotten PW on a public term)

      (Tom says it was Kerry)

    6. Re:Go Eagles! by CyberDave · · Score: 1

      If my memory is correct, Kerry and I both got a "CAUTION: AT&T Network In Progress" sign from our buddy Rich. It was a pretty poor quality JPEG and at some point I recreated it in Illustrator and printed out a few copies for my neighbors. I think I've still got the .AI file around here somewhere on one of my backup images, but I don't have the inclination at this point to dig it up as evidence to support my claim.

  35. Admin login by Sheepdot · · Score: 1

    Also of note, Administrative login is available here:
    https://www.digitalarchives.wa.gov/WADAAdmin/logon.aspx?ReturnUrl=%2fWADAAdmin%2findex.aspx

    It appears not to be susceptible to a common IIS/ASP script injection bug: ' or 0=0 --

    Good work.
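
    For anyone wondering why that string matters, here is a rough sketch of the general bug class. It uses Python's built-in sqlite3 module as a stand-in; the table, the credentials and both login functions are invented for illustration and say nothing about how the archive's actual login page is implemented.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory user table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")
    return conn


def login_concatenated(conn, name, password):
    """Vulnerable: user input is pasted straight into the SQL text."""
    query = ("SELECT COUNT(*) FROM users WHERE name = '" + name +
             "' AND password = '" + password + "'")
    return conn.execute(query).fetchone()[0] > 0


def login_parameterized(conn, name, password):
    """Safer: input is bound as data and never parsed as SQL."""
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0


if __name__ == "__main__":
    conn = make_db()
    attack = "' or 0=0 --"
    print(login_concatenated(conn, attack, "anything"))
    print(login_parameterized(conn, attack, "anything"))
```

    In the concatenated version the quote closes the string, `or 0=0` makes the condition true for every row, and `--` comments out the password check. Bound parameters treat the whole string as an ordinary (and wrong) user name.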

  36. Apparently I don't exist by AmmoBox · · Score: 1

    I wasn't able to find my birth record yet. Any mention of how much data is not online yet?

    1. Re:Apparently I don't exist by Larry+Lightbulb · · Score: 1

      And I'm neither married nor a citizen. The names I used for those were fairly common, so there should have been some false hits at least, but nothing.

    2. Re:Apparently I don't exist by tyroney · · Score: 1

      Me too. Neither. Gee, I hope they kept the originals somewhere.

    3. Re:Apparently I don't exist by sxtxixtxcxh · · Score: 1
      Birth records aren't on file with the state archives.

      3. How can I get my birth certificate?

      The State Records Center does not have the authority to distribute any agency records to the public. Birth certificates can be obtained through the Department of Health, Center for Health Statistics at (360) 236-4300 or via the internet at www.doh.wa.gov.
      --
      for a minute there, i lost myself...
    4. Re:Apparently I don't exist by AmmoBox · · Score: 1

      Not sure why you say birth records aren't on file with the state archives.

      If that were the case why do they allow you to search for birth records? They even return results too (mainly from Spokane County)... just not my personal record.

    5. Re:Apparently I don't exist by sxtxixtxcxh · · Score: 1
      my bad... birth records aren't available for PURCHASE through that site. anyway.. here's what they have so far.

      Birth Records

      Birth Records contains a listing of people born in the following areas:

      Pierce County (Fox Island, Anderson Island, McNeil Island, and Steilacoom) from 1903 to 1914

      Walla Walla City in January - April 1907

      Spokane County from 1890 to 1906
      --
      for a minute there, i lost myself...
  37. Unit conversion by nounderscores · · Score: 2, Funny

    5TB? how much is that in Libraries of Congress?

  38. What about 1000 Terabytes? by darth_borehd · · Score: 1

    Are there any systems that actually have this much storage now? What comes after the terabyte? Quadrabyte?

    1. Re:What about 1000 Terabytes? by Anonymous Coward · · Score: 0

      What comes after the terabyte? Quadrabyte?

      duh.. wtf are you doing here? it's petabyte, dimwit.

    2. Re:What about 1000 Terabytes? by wolfdvh · · Score: 1

      Petabytes, then Exabytes after that.

    3. Re:What about 1000 Terabytes? by DarkEdgeX · · Score: 1

      It's some site trying to advertise something, but this seems to be the most comprehensive listing of "sizes" for storage--

      http://www.pcsndreams.com/Pages/Articles/Megabytes.htm

      I'd never heard of a Brontobyte before, just Yottabyte.

      --
      All I know about Bush is I had a good job when Clinton was president.
  39. "Go Digital"??? by Theovon · · Score: 1

    Who is "Go Digital", and why are they archiving it? :)

  40. What's the date format by Aidtopia · · Score: 2, Interesting

    Has anybody figured out the date formats? I'm seeing a lot like this "02001987". OK, it's either mmddyyyy or ddmmyyyy. But what does 00 mean for month or day? Unknown? It's hard to imagine that they don't have an exact date of death for someone who died as recently as 1987. Or is it a zero-based counting system (00 = Jan, 01 = Feb, ...)?

    It's interesting that the death records include Social Security Numbers. Anybody want to harvest a few thousand inactive SSNs?
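
    One way to handle such values, sketched under two unconfirmed assumptions: that the field is mmddyyyy, and that 00 means "unknown" rather than a zero-based index. The function name `parse_record_date` is hypothetical and nothing here comes from the archive's own documentation.

```python
def parse_record_date(raw):
    """Split an 8-digit mmddyyyy string; a 00 month or day becomes None."""
    if len(raw) != 8 or not (raw.isascii() and raw.isdigit()):
        raise ValueError("expected exactly 8 digits, got %r" % (raw,))
    month = int(raw[0:2])
    day = int(raw[2:4])
    year = int(raw[4:8])
    # Assumption: 00 marks a component that was not recorded.
    return {"year": year, "month": month or None, "day": day or None}


if __name__ == "__main__":
    print(parse_record_date("02001987"))
```

    Under the mmddyyyy reading, "02001987" would be February 1987 with the day unrecorded. Under ddmmyyyy it would instead be the 2nd of an unknown month, so the ambiguity is in the data, not the parsing.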

  41. teh EDS sux by Cboyd0319 · · Score: 1

    I rue the day that I would need a job from EDS. Wait, let me start again. When I go to work, I go in to work. I chose to be a network engineer and PC tech because I love my job and I like to help people. I didn't do it so I could screw people out of money. If you're in it for the cash, more power to you, but you could at least provide good service for good money, which is not the case from what I've heard from the people that have been unfortunately struck by EDS.
    Actually, the only reports that I have found of good quality of service are those that come out of Washington, and SURPRISE!!, that's where the money comes from.

    Sounds to me like corporate ass kissing.

  42. Search for first name "Bill", last name "Gates" by Eryq · · Score: 1
    Trust me. :-)



    --
    I'm a bloodsucking fiend! Look at my outfit!
  43. Size is out of whack by Maxwell · · Score: 2, Interesting

    A TERABYTE IS 1000G. And 1G IS 1000M. So A TERABYTE IS 1,000,000 MEGABYTES. Right?
    There are 1 million documents in this database? And it's 800 terabytes? So each doc is 800M in size?
    800M EACH? That's freaking huge. Even if the thing is only 8T in size (far more reasonable), each doc is still 8M in size. Again, pretty massive.

    Is this like that time MSFT bragged about their 1T DB of geological data, and then Oracle
    built the same database, with the same content, using only 300G of space?

    Inefficiency is nothing to brag about...or is it?

    JON
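
    The per-document arithmetic can be checked with a few lines. This is just the division spelled out, using decimal units (1 TB = 1,000,000 MB) as in the post; the helper name `average_doc_size_mb` is made up for the example.

```python
def average_doc_size_mb(total_tb, documents, mb_per_tb=1_000_000):
    """Average size per document in MB, assuming decimal units."""
    return total_tb * mb_per_tb / documents


if __name__ == "__main__":
    for total_tb in (800, 8, 5):
        print(total_tb, "TB ->", average_doc_size_mb(total_tb, 1_000_000), "MB per document")
```

    The 5 TB case is the SAN size reported elsewhere in this thread, which would put the average at about 5M per scanned document if that storage were completely full.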

  44. Exhaustive Database? by Papermaker · · Score: 1

    At least for marriages, I doubt the database is complete/finished. Marriage records for myself (King County), my parents (Clark County) and my in-laws (King County) are not there. Death records are there though--at least for my family. As others have said, I too would be afraid of people datamining this for personal gain. I hope there are decent safeguards against this.

    1. Re:Exhaustive Database? by Anonymous Coward · · Score: 0

      Mod parent up, this is important to know.

  45. Digital Archives by Anonymous Coward · · Score: 0

    I worked for the Wa Secretary of State who implemented this system and believe that it will only be a matter of time before this goes "KaBoom" if they provide the technical support.

    The organization has huge problems keeping competent people, and I believe the technical staff providing the oversight have the appropriate skills to work at McDonald's, not to provide services in state government.

    and then you have Microsoft and EDS mixed into this too.

    Can you say "vulnerabilities"?

    I thought you could!

  46. the previous method wasn't great by Trepidity · · Score: 1

    All local records do is make it harder for people without money to get them. People with money have always been able to hire private investigators to track down the records they want. This makes it easier for people without money to do so, or for people who are vaguely interested in something but don't care enough to hire a private investigator (or do it themselves) to do so.

    If you really want to stop abuse, you'll have to make them completely private, not just "private but inconvenient to get to".

  47. Re:Well, - data lost by jackb_guppy · · Score: 1

    Family moved to Washington in 1924 and has been there ever since. At least 22 births, 15 deaths, and 8 marriages have happened in the state in the ensuing years.

    Oh well my family must not exist.