Slashdot Mirror


XML for Ancients

Andrew writes: "More than 5,000 years ago, the very first information revolution occurred when some unknown research team in Mesopotamia found a way to download and store language through a killer application called 'writing.' The cuneiform digital library will have 60,000 texts ready in a couple of years. Using SVG and XML to represent their documents. Similar efforts are underway for hieroglyphics."

5 of 118 comments

  1. Slightly off topic..... by MisterPo · · Score: 3, Interesting

    I have been working in IT since 1997, yeah I know, a mere blink of an eye for some Unix Wizards (i.e. beards, strange clothing and their own arcane language). What I have noticed is that every year my handwriting has been getting progressively worse. What with my PDA, laptop, PCs etc., I just have no need to wield a pen no more :)

    Apart from signing my name on credit card chits, the only time I am required to write is for birthday/Christmas and other assorted cards. It's getting so bad now that I start to write a long word and just give up. My once pristine handwriting now looks like a doctor's prescription scrawl.

    Anyone else get this too?

    Po

  2. Will we have to revise unicode? by Ukab the Great · · Score: 2, Interesting

    With all these ancient language/hieroglyphic texts being archived, I have a feeling that we'll be hitting that 65536 character wall very shortly, since someone in the future might need that Cuneiform version of M$ Word (hey, it could happen). Is it time for UTF-32?

  3. XML, Writing and Jabber by Jucius Maximus · · Score: 2, Interesting

    "Using SVG and XML to represent their documents. Similar efforts are underway for hieroglyphics."

    They're using XML? They could integrate this with some sort of retrieval language and couple it with Jabber clients. That way you could send some sort of command-line search/retrieval command to the database using a regular Jabber client and have the XML data sent back, since Jabber natively supports the standard.

  4. Actually... by recursiv · · Score: 4, Interesting

    Unicode is often referred to as a 16-bit system, which would allow for only 65,536 characters, but by reserving some code points for mapping into additional 16-bit planes, it has the potential to cope with over one million unique characters.

    The current version (3.1) of the Unicode Standard, developed by the Unicode Consortium, assigns a unique identifier to each of 94,140 characters.
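
    For anyone who wants to see the "reserved code points" trick concretely, here is a rough Python sketch of the UTF-16 surrogate arithmetic. The function names are made up for illustration, and U+1D11E (a musical symbol from the supplementary planes) is just a sample value.

```python
# Sketch of UTF-16 surrogate pairs: two reserved 16-bit ranges combine
# to address code points beyond U+FFFF. Function names are illustrative.

HIGH_BASE = 0xD800   # first high (lead) surrogate
LOW_BASE = 0xDC00    # first low (trail) surrogate


def to_surrogates(code_point: int) -> tuple:
    """Split a code point in U+10000..U+10FFFF into a (high, low) pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000           # leaves a 20-bit value
    high = HIGH_BASE + (offset >> 10)       # top 10 bits
    low = LOW_BASE + (offset & 0x3FF)       # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((high - HIGH_BASE) << 10) + (low - LOW_BASE)


# 1,024 high surrogates x 1,024 low surrogates: the extra code points
# reachable on top of the original 65,536.
SUPPLEMENTARY_CODE_POINTS = 1024 * 1024

high, low = to_surrogates(0x1D11E)
print(hex(high), hex(low))
```

    That 1024 x 1024 product, added to the original 65,536, is where the "over one million" figure comes from.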

    --
    I used to bulls-eye womp-rats in my pants

  5. Re:XML Overrated? by ukryule · · Score: 2, Interesting

    When you're coding up ancient writing, you want to store much more information about each character or word than with normal text (colour, angle, depth etc.). XML is quite good at storing these attributes, so it makes sense to use it.

    Taking a quote from the hieroglyphics link (can't comment on the cuneiform link as it's /.ed):

    Let's illustrate these points. In the current MCD, data about an individual sign is scattered around it. Look for example at :

    =A1\\r1 -i

    It means "Sign Gardiner A1", as both grammatical and word ending, reversed, rotated. Fine positional data, colour data, and more are hard to add. On the other hand, the current proposal would represent the same sequence as

    <hieroglyph code="A1" gramend="y" wordend="y" rot="90" reversed="y">
    <hieroglyph code="i">

    Of course, as with any use of XML, you could do it with a 'homegrown' solution, but the point is that using XML gives you a well known (and well supported) framework which everyone can standardise on. (And yes I know the XML in the example is malformed ...)
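
    To show what "well supported" buys you, here is a small Python sketch using the standard library's xml.etree.ElementTree. It assumes the two quoted elements are made well-formed by self-closing them and wrapping them in a single root; the root name "line" and the helper read_signs are my own placeholders, not part of the proposal.

```python
import xml.etree.ElementTree as ET

# The two signs from the quote, self-closed and placed under one root.
# The <line> wrapper is a hypothetical name chosen for this sketch.
WELL_FORMED = """<line>
  <hieroglyph code="A1" gramend="y" wordend="y" rot="90" reversed="y"/>
  <hieroglyph code="i"/>
</line>"""

# The snippet as quoted: unclosed elements and no single root.
AS_QUOTED = """<hieroglyph code="A1" gramend="y" wordend="y" rot="90" reversed="y">
<hieroglyph code="i">"""


def read_signs(xml_text: str) -> list:
    """Return one attribute dictionary per <hieroglyph> element."""
    root = ET.fromstring(xml_text)
    return [dict(el.attrib) for el in root.iter("hieroglyph")]


signs = read_signs(WELL_FORMED)
for sign in signs:
    # Attributes that are absent simply fall back to a default.
    print(sign["code"], sign.get("rot", "0"), sign.get("reversed", "n"))

# A stock parser rejects the quoted form outright.
try:
    read_signs(AS_QUOTED)
    quoted_parses = True
except ET.ParseError:
    quoted_parses = False
```

    Adding colour or fine positional data would then just be another attribute, with no change to the parsing code.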