Slashdot Mirror


Washington State Archives Go Digital

prostoalex writes "USA Today and dozens of others report that the Washington state archives went online. Over the past two years project participants scanned 1 million documents issued by state and county authorities. The archive is located at my alma mater, Eastern Washington University (go Eagles!). The 800 terabyte storage system was developed by Microsoft and EDS."

28 of 131 comments (clear)

  1. Well, by chewy_2000 · · Score: 4, Insightful
    Personally I would find this, or something like it, very useful in research, even as just an undergrad History major. The amount of times I've wished for something like this while digging around in musty old archives...

    Although, it has to be said, I hope they make everything accessible for *everyone*, regardless of OS and browser. No doubt a lot of researchers would be using OS X/Linux/Firefox.

    1. Re:Well, by ImaLamer · · Score: 4, Informative

      I'm using Firefox (from Windows sadly) and I can access the content just fine.

      As for OSX and Linux users, a plug-in is needed to view the content, and they claim to support OSX and "UNIX". The plug-in is called DjVu and has an open source equivalent at sourceforge (with RPMs, OS/2 and even Cygwin support).

  2. WWW address by JamesD_UK · · Score: 5, Informative

    Just in case someone actually wanted the address for the archives it's http://www.digitalarchives.wa.gov/

    1. Re:WWW address by El+Cubano · · Score: 4, Informative

      Just in case someone actually wanted the address for the archives it's http://www.digitalarchives.wa.gov/

      FYI. Turn on cookies or you receive this extremely helpful error message:

      An error occured on the site. Please try again or come back another time.

      Otherwise, it's pretty cool.

  3. How many terabytes in the archive ... by blowdart · · Score: 2, Funny

    ... relate to state anti-competitive actions against Microsoft themselves? :)

  4. Hurrah by Anonymous Coward · · Score: 3, Funny
    "The 800 terabyte storage system was developed by Microsoft and EDS."
    Bill Gates and H. Ross Perot; together at last!

    I feel safer already.
  5. Just another link (or two) by Neumsy · · Score: 5, Informative
    --
    %blow
    %blow: No such job

    ^how did the sex change go?
    Modifier failed
  6. NB Archives by X-Phile · · Score: 5, Informative

    The Province of New Brunswick Provincial Archives have been like this for quite some time now, with birth, death, marriage certs and census records. I have been able to search for information about my family history online using their handy dandy search tool, as well as visiting the Archives themselves at University of New Brunswick. It never occurred to me that others might be trying to catch up, but I guess that this type of service isn't something that most governments deem necessary for the public.

    --
    "Well you're not Fiona Apple, and if you're not Fionna Apple, I don't give a rat's ass."
  7. Search capabilities by vinukr · · Score: 4, Insightful

    One thing that they have to concentrate on in the future, when the number of records grows fast, is a good search strategy. The time a search takes is one thing that can decide whether the masses use this facility.

    As far as I have tried it out in these few minutes, the search strategy is good... there are separate searches that researchers can use to find historical data and the like... This is great.

  8. drive letters by chargen · · Score: 2, Funny

    The 800 terabyte storage system was developed by Microsoft and EDS.

    How would windows have enough drive pointers to be able to access this? Would there be a drive AG:? :-)

    -Pete

  9. Privacy by chewy_2000 · · Score: 4, Insightful

    The site seems to be slowing a bit, so I can't find details, but surely there are some privacy concerns here. I know that this just replicates the publicly available material in the physical archives, but there is a big difference between going to the archives and digging through books, and harvesting info over the web, especially given the sheer amount of info on the site, much of it recent records.

    1. Re:Privacy by chewy_2000 · · Score: 3, Insightful

      Try reading what I said. Data mining, the wholesale collection of personal data, is made, I assume, an order of magnitude simpler using an online system vs microfiche or whatever. I would consider this an abuse of the system. I am in no way suggesting the records should have access restricted, this is just a new problem raised by the tech that needs to be addressed.

    2. Re:Privacy by lamona · · Score: 2, Insightful

      Absolutely. Making "public" records available universally is a different meaning to "public" in public records in situ. Although the word "public" was used, it really meant the local community. When you change that to "everyone in the world with internet access" you change the context in which the data resides... and for data, context is everything. For one thing, it narrows the scope to a small portion of the population so that accurate identification (or, conversely, less mistaken identity) is facilitated.

      Making it difficult to get to the records DOES provide some privacy, and that is the level of privacy to which we've become accustomed. It's like the difference between your Aunt Mable having a listed phone in the phone book for her town of 2,000, and having her phone listed in the internet white pages. She allowed her phone to be listed because she is a part of that community and feels secure there. She probably doesn't feel the same way about being "visible" beyond that community.

      What this means is that we are going to have to either revise how we define "public", or we're going to have to get used to a different view on privacy. I'd prefer the former.

      no sig, .sig

      --
      I just read /. for the amusing .sigs
  10. Digital twilight. by haeger · · Score: 4, Interesting
    How about the "Digital Twilight" that people have talked about? One of the big problems with these kinds of archives is that they aren't permanent the way that paper is. Washington could very easily end up the way that the Stasi archives did in East Germany. They have several hundred tapes of data with information about every spy in the west on them, but the information is still "safe" since no one knows any longer how the data was saved to disk or which file format was used.

    And I'm still ignoring the fact that machines grow old and have to be replaced. It's a known fact that disks break, so you'll need backup, but how long can you keep an old storage solution around? Sooner or later you'll have to migrate old backup data to newer media.

    Note that I don't think that this is a bad idea, moving everything online, but there are consequences that I don't think that everyone has thought of.

    Where I live one can go into the royal library and find (and read) an official document written by someone in the 16th century, but can we be sure that 100 or even 50 years from now someone can read a DLT300 tape?

    .haeger

    --
    You are not entitled to your opinion. You are entitled to your informed opinion. -- Harlan Ellison
    1. Re:Digital twilight. by LousyPhreak · · Score: 4, Insightful

      you can still move the data from the old system to a new one when it's at the end of its lifetime.

      harddrives can easily be replaced (assuming it's a sort of raid with hotswap)

      sql will also stay around really long, and if not there will be at least a gazillion tools to convert to a new format (it is quite certain that the data will be stored on a sql server)

      and as long as the data is safely stored the access mechanism shouldn't be a problem, but that's just my .02

      --
      -- Karma: beyond good and evil - mostly affected by posting political
  11. no maps? by Apreche · · Score: 5, Insightful

    Dang, there are no maps in there. The best stuff in the archives at town hall has always been maps of the town and blueprints of various buildings. But nobody scanned those in the archives. Oh well.

    --
    The GeekNights podcast is going strong. Listen!
    1. Re:no maps? by mikael · · Score: 2, Interesting

      Maps and/or aerial photographs combined together make the best time-lapse animation. It's amazing to see the growth of a city all the way from the first harbour/warehouse in Roman times to the metropolised supercity of today.

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
  12. System Spec by LiquidCoooled · · Score: 4, Funny

    The 800 terabyte storage system was developed by Microsoft and EDS.

    Microsoft was able to confirm the system is expandable, and contrary to previous rumours, will in fact have enough disk space to install Longhorn.

    They do however state, that to do anything actually useful, more upgrades will be required.

    --
    liqbase :: faster than paper
  13. Thanks for the F'ing Popups by N8F8 · · Score: 3, Informative
    If you can't bother to find a link to a web resource in an article about a web resource, you shouldn't post it!

    http://www.digitalarchives.wa.gov/

    --
    "God fights on the side with the best artillery." - Napoleon, Marshal of France - speaking truth to power
  14. Re:How to view the records? by Anonymous Coward · · Score: 2, Informative



    DjVu is a web-centric format and software platform for distributing documents and images. DjVu can advantageously replace PDF, PS, TIFF, JPEG, and GIF for distributing scanned documents, digital documents, or high-resolution pictures. DjVu content downloads faster, displays and renders faster, looks nicer on a screen, and consumes fewer client resources than competing formats. DjVu images display instantly and can be smoothly zoomed and panned with no lengthy re-rendering. DjVu is used by hundreds of academic, commercial, governmental, and non-commercial web sites around the world.

    DjVuLibre is an open source (GPL'ed) implementation of DjVu, including viewers, browser plugins, decoders, simple encoders, and utilities.

  15. Not 800 Terabytes, & using DjVu by illtud · · Score: 4, Informative

    The system isn't 800TB, but will scale to 800TB, according to this EDS press release. In fact, given that they've spent a mere $2.5M (powerpoint!) there's not a hope in hell that they've got 800TB! The powerpoint says it's a 5TB EMC SAN & an ADIC tape library for backup.

    An interesting point is that they're delivering the documents using DjVu by LizardTech. DjVuLibre, the open source implementation, is GPLd and was developed by the creators of DjVu in conjunction with LizardTech (after a period of LT not-getting-it). The DjVuLibre home page is here. LizardTech still have the best encoders for the format.

    1. Re:Not 800 Terabytes, & using DjVu by illtud · · Score: 3, Insightful
      5TB ? that's like 18 of those 400G Hitachi drives, that go for 411 USD a piece these days. if you include the bi-opteron box, and a couple of 3ware Sata cards, that's a total investment of 20 grand or so...

      ...come back when you've worked in the real world (or looked at an EMC price list...!)

  16. Re:Well, - data lost by jlleblanc · · Score: 2, Informative

    What years? This database seems to be limited to older archives... the most recent year for a record I found was 1965.

    -Joe

  17. Go Eagles! by Pcghost · · Score: 2, Interesting

    The digital archives is a big step for my University. Five years ago we were facing a hostile takeover by the drunken WSU, now Eastern is the fastest growing University in the state. The Microsoft focus is to be expected. Redmond pays a lot of money to keep universities in our state in line. Rest assured Eastern is loaded with disgruntled Linux users being forced to learn Visual Basic in their IT courses. There are even a few IT profs pushing for changes, though they haven't made much headway in their efforts.

    1. Re:Go Eagles! by Sta7ic · · Score: 2, Interesting

      Fastest growing because we still have something like space. With WSU or UW not taking anyone with less than a 3.6 GPA for reasons of overcrowding, and in-state tuition being around $1200 for 12-18 credits, this place isn't half bad. But our math department was ranked the absolute worst in the state of Washington between the four and two year colleges last year, which seems to hamstring progression through the CS department. One of our profs has a dubious reputation after 3/4 of the class failed a 300-level probability and sadistics class, which included both seniors and graduate students.

      Oh, and with all these new students, almost three weeks in and the dorm networks are STILL on the fritz. This is an issue with the provider and the infrastructure, though.

  18. Unit conversion by nounderscores · · Score: 2, Funny

    5TB? how much is that in Libraries of Congress?

  19. What's the date format by Aidtopia · · Score: 2, Interesting

    Has anybody figured out the date formats? I'm seeing a lot like this: "02001987". OK, it's either mmddyyyy or ddmmyyyy. But what does 00 mean for month or day? Unknown? It's hard to imagine that they don't have an exact date of death for someone who died as recently as 1987. Or is it a zero-based counting system (00 = Jan, 01 = Feb, ...)?

    It's interesting that the death records include Social Security Numbers. Anybody want to harvest a few thousand inactive SSNs?
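A minimal Python sketch of the date question in the comment above, assuming (as the poster guesses, not as the archive documents) that a field of 00 means "unknown" rather than a zero-based index. The function name `split_date` and the `order` parameter are hypothetical and exist only for illustration.

```python
from typing import Optional, Tuple


def split_date(raw: str, order: str = "mdy") -> Tuple[Optional[int], Optional[int], int]:
    """Split an 8-digit date string into (month, day, year).

    order is "mdy" for mmddyyyy or "dmy" for ddmmyyyy.
    A month or day of 00 is returned as None, i.e. treated as unknown.
    This is an assumption about the data, not a documented rule.
    """
    if len(raw) != 8 or not raw.isdigit():
        raise ValueError(f"expected 8 digits, got {raw!r}")
    if order not in ("mdy", "dmy"):
        raise ValueError(f"unknown order {order!r}")
    first, second, year = int(raw[0:2]), int(raw[2:4]), int(raw[4:8])
    month, day = (first, second) if order == "mdy" else (second, first)
    # 0 is falsy, so "x or None" maps a 00 field to None
    return (month or None, day or None, year)


print(split_date("02001987"))          # read as mmddyyyy
print(split_date("02001987", "dmy"))   # read as ddmmyyyy
```

Under the mmddyyyy reading, the sample becomes February 1987 with an unknown day; under ddmmyyyy it becomes the 2nd of an unknown month. Neither reading is confirmed by the site.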

  20. Size is out of whack by Maxwell · · Score: 2, Interesting

    A TERABYTE IS 1000G. AND 1G IS 1000M. SO A TERABYTE IS 1,000,000 MEGABYTES. Right?
    There are 1 million documents in this database? And it's 800 terabytes? So each doc is 800M in size?
    800M EACH? That's freaking huge. Even if the thing is only 8T in size (far more reasonable), each doc is still 8M in size. Again, pretty massive.

    Is this like that time MSFT bragged about their 1T DB of geological data, and then Oracle
    built the same database, with the same content, using only 300G of space?

    Inefficiency is nothing to brag about...or is it?

    JON
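The arithmetic in the comment above can be checked with a short Python sketch. It uses decimal units as the poster does (1 TB = 1,000,000 MB) and the 1 million document count from the story summary. The 5 TB figure is the SAN size quoted earlier in the thread from the PowerPoint, and the function name `mb_per_document` is hypothetical.

```python
MB_PER_TB = 1_000_000  # decimal units: 1 TB = 1000 GB = 1,000,000 MB


def mb_per_document(total_tb: float, documents: int) -> float:
    """Average size per document in MB, if the documents filled the whole store."""
    return total_tb * MB_PER_TB / documents


# 800 TB is the headline capacity, 8 TB the poster's guess,
# 5 TB the SAN size reported elsewhere in the thread.
for total_tb in (800, 8, 5):
    size = mb_per_document(total_tb, 1_000_000)
    print(f"{total_tb} TB over 1,000,000 docs = {size:g} MB each")
```

The numbers only describe an average under the assumption that the scanned documents fill the stated capacity, which the thread suggests they do not.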