Domain: jclark.com
Stories and comments across the archive that link to jclark.com.
Comments · 37
-
Major flaw in the build-process
This does not affect the users directly, but it is a major pain for integrators/porters. OO.o has a terrible habit of bundling all of the 3rd-party software packages, that it uses, into its own source tree. I'm talking about (probably missed some):
- agg
- bash
- bitstream-vera
- bsh
- bison
- boost
- curl
- db42
- dmake
- expat2
- freetype
- icu
- jpeg
- firefox (or some other Mozilla-based browser)
- libmspack
- libsndfile
- libtextcat
- libwpd
- libxslt
- neon
- nss
- nspr
- python
- sane-backends
- STLport
- unixODBC
- unzip
- vigra
- xmlsec1
- xt
- zip
- zlib
If they could, I'm certain, they would've bundled Java too, but — fortunately — Sun's license prohibits that... Now I realize, that this is done to offer "a single package" to those, who build it on their own, but nobody does. Everybody gets these from their OS' integrators. And the pain for us is enormous, because to force OO.o build to stop its silly ways is a serious undertaking. For some of the above packages there is --with-system-foo configure-flag, but not for all, and the default is to always use the bundled one, so support for the external ones bitrots quickly...
Most of the local builds don't bother and so end up wasting disk space and CPU-time rebuilding packages, which are external to OO.o. The end results are also bloated, duplicating stuff, that's already installed on the users' systems and without bug-fixes, which have already gone into each of the respective packages since their most recent "bundling" into OO.o tarballs.
Download a source tarball and see for yourself... Something like: tar tjf OOo_OOG680_m9_source.tar.bz2 | grep 'z$'. No other software project does this on this scale and for good reasons — it is Just Wrong[TM]. OO.o better clean up their act in this respect...
-
Re:Maximizing Composability and Relax NG Trivia
You can't get around the fact that Java simply does not have those many important features I listed (and linked to their definitions on Wikipedia), which are all extremely useful for implementing things like Relax/NG validators.
James Clark, the guy who wrote the Haskell code, is the SAME guy who wrote the Java code, and he's written a whole lot of other complex Java code, as well as many other languages, and also designed and implemented many XML standards. FYI, he served as the technical lead of the original W3C XML Working Group and as the editor of the XSLT and XPath recommendations.
Kiddo, you have no idea who you're calling a "poor (or stubborn) programmer". James Clark is one of the best programmers on the planet, who has written some of the most important code that's run by millions of people every day. Have you ever heard of Expat, the XML parser? Or XSLT? And no, James Clark is NOT the guy who founded Netscape. That Jim Clark just made millions of dollars off of the open source code generously designed, written and shared by James Clark.
Here is a brilliant interview with James Clark from Dr. Dobb's Journal. I've included some of my favorite parts, but the entire interview is fascinating and well worth reading. A Triumph of Simplicity: James Clark on Markup Languages and XML:
If you peek under the hood of high-profile open-source projects such as Mozilla, Apache, Perl, and Python, you'll find a little program called "expat" handling the XML parsing. If you've ever used the man command on your GNU/Linux distribution, then you've also used groff, the GNU version of the UNIX text formatting application, troff. If you've ever done any work with SGML, from generating documentation from DocBook to building your own SGML applications, you've undoubtedly come across sgmls, SP, and Jade.
Whether you've heard of him or not (and most likely, you haven't), James Clark has made your life easier. In addition to authoring these and other widely used open-source tools (see http://www.jclark.com/ for a complete list), Clark served as the technical lead of the original W3C XML Working Group and as the editor of the XSLT and XPath recommendations. He recently founded Thai Open Source Software Center (http://www.thaiopensource.com/). His latest project is TREX, an XML schema language. Clark sat down with Eugene Eric Kim to discuss markup languages, the standardization process, and the importance of simplicity.
DDJ: How did you get involved with SGML?
JC: I was interested in using SGML as a replacement for one part of what groff was doing. Then I got Charles Goldfarb's book, The SGML Handbook, and I thought, "Hmm, this is an interesting thing. Let's see if I can write a program for it." Then Charles Goldfarb released his ARCSGML SGML parser, and I started working with that. The more I worked with it, the more I felt it needed improvements and bug fixes, and nobody else seemed to be doing that. There seemed to be a real need for turning a research-worthy tool into more of a production-quality tool, and that turned into sgmls. Working with sgmls, I got more and more dissatisfied with its basic internal structure. There were some things in SGML that would have been very hard to implement within sgmls, and I felt that I really understood how SGML parsing worked, and so I produced a completely new SGML parser, SP.
DDJ: Did you feel like there were any major itches that you got to scratch with the specification of XML?
JC: I knew how insanely complex writing an SGML parser was. SGML is really doing something very simple. It's providing a standard way to represent a tree, and your nodes have a label with names and they can have attributes. That's all it's doing. It's not a complicated concept. Yet SGML manages to make writing something that implements it into a several-man-year project.
A lot of the features do have a reasonable mo
-
Re:20-year prediction
I believe James Clark, who co-designed Relax/NG, understands and programs in Lisp pretty well (as well as Haskell, Java, C and many other languages). He helped design and implement DSSSL (wikipedia article), which is based on Scheme, and led to XSLT, which he also designed.
-Don
-
Relax NG's compact non-XML syntax
Relax NG has a compact non-XML syntax. But C++/Java is a horrible syntax to use if you want a language to be readable and easy to understand. Since when was 17 levels of operator precedence easy to understand? Of course any good programmer always uses parentheses to avoid ambiguity, so why should a language have 17 levels of built-in ambiguity just to make it that much easier to make hard-to-find mistakes?
-Don
From my blog: Relax NG Compact Syntax: no to operator precedence, yes to annotations!
James Clark is a fucking genius! He's the guy who wrote the Expat XML parser, works on Relax NG, and does tons of other important stuff. Relax NG is an ingeniously designed, elegant XML schema language based on regular expressions, which also has a compact, convenient non-XML syntax.
I totally respect the way he throws down the gauntlet on operator precedence (take that you Perl and C++ weenies!):
There is no notion of operator precedence. It is an error for patterns to combine the |, &, , and - operators without using parentheses to make the grouping explicit. For example, foo | bar, baz is not allowed; instead, either (foo | bar), baz or foo | (bar, baz) must be used. A similar restriction applies to name classes and the use of the | and - operators. These restrictions are not expressed in the above EBNF but they are made explicit in the BNF in Section 1.
You can translate back and forth between Relax NG's XML and compact syntaxes with full fidelity, without losing any important information. Relax NG supports annotating the grammar with standard and custom namespaces, so you can add standard extensions and extra user defined meta-data to the grammar. That's useful for many applications like user interface generators, programming tools, editors, compilers, data binding, serialization, documentation, etc.
Here's an interesting example of a complex Relax NG application: OpenLaszlo is an XML/JavaScript based programming language, which the Laszlo compiler translates into SWF files for the Flash player. The Laszlo compiler and programming tools use this lzx.rnc Relax NG schema for the OpenLaszlo XML language. This schema contains annotations used by the Laszlo compiler to define the syntax and semantics of the XML based programming language.
The schema starts out by defining a few namespaces:
default namespace = "http://www.laszlosystems.com/2003/05/lzx"
namespace rng = "http://relaxng.org/ns/structure/1.0"
namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
namespace lza = "http://www.laszlosystems.com/annotations/1.0"
The a: namespace defines some standard annotations like a:defaultValue, and the lza: namespace defines some custom annotations private to the Laszlo compiler like lza:visibility and lza:modifiers. Thanks to the ability to annotate the grammar, much of the syntax and semantics of the Laszlo programming language are defined directly in the Relax NG schema in the compact syntax, so any other tool can read the exact same definition the compiler is using!
To show how truly simple and elegant it is, here is the snake eating its tail: The Relax NG XML syntax, written in the Relax NG compact syntax:
# RELAX NG XML syntax specified in compact syntax.
default namespace rng = "http://relaxng.org/ns/structure/1.0"
namespace loc -
He's never heard of the SAX xml parser: expat ...
Originally developed by James Clark, an expatriate Brit living in (of all places) Thailand! http://www.jclark.com/xml/expat.html.
A prime example of great open source code. -
Paging James Clark...
I hope that James Clark will be able to help correct the situation.
In case you haven't heard of James Clark, he wrote groff (for displaying man pages amongst other things), XSLT, the expat XML Parser and the Relax NG schema language. I'd be very surprised if anybody here hasn't used his stuff... Take a look at his bio.
-Dom
-
What I want to see...
...is support for this in automated tools like JavaDoc, Jade/OpenJade (for DocBook), and so forth. There are times when I want to read through a manual that's online, and waiting for the next page in an ordered manual really breaks my concentration sometimes. Sometimes it's so bad that I just run wget -r against the website in question in order to prime my proxy-cache.
On the other hand, it would be nice to be able to specify a maximum bandwidth to use for prefetching as another option. However, perhaps a proxy-cache (like Apache or Squid) could recognize the
X-moz: prefetch
header and give those requests lower priority and more-throttled bandwidth. Hey... the cache could even parse HTML requested from it and fetch those links; then users of older browsers could get the same benefits! -
Re:Um, the answer is in the link you posted.
All very excellent points, but if you need an SGML parser to 'do it the hard way' (?) just grab a copy of James Clark's SP parser from here.
You could probably use that as a starting point if your XML parsers don't like the doc format.
Cheers.
-
Re:What about the other JADE?
Not to mention the Jade Docbook processor and Jade text editor...
-
Re:Jade DSSSL open source project
Please learn how to make links.
<a href="http://www.jclark.com/jade/">Jade DSSSL open source project</a>
(without any spaces put there by Slashdot) yields: Jade DSSSL open source project
If that's too much typing for you, <URL:http://www.jclark.com/jade/>
(without any spaces put there by Slashdot) yields: http://www.jclark.com/jade/ -
jade package in debian is something else
The jade package in Debian is "James Clark's DSSSL Engine", and it's been there since 1997, and has copyright dates going back to 1994, so I think James has precedence over both you and this company.
I doubt if James would mind that your quite different project has the same name, but he might have some interest in an upstart company threatening people who use the name, and might be willing to work with you on dealing with their threats. His home page is at http://www.jclark.com/ (and the jade project page is at http://www.jclark.com/jade/). -
Re: INACCURATE TERMS
You can get 1.3.x for Win32 though it's a bit of work to get it to run. The instructions there aren't 100% complete, meaning you'll have to hunt for a few DLLs. A few of them are just in the lib directory, while a few more are in Expat.
It's better than before, but still not great. -
SGML/XML/SVG
-
Re:Can anyone
-
Re:Can anyone
-
Re:Can anyone
-
Re:Maybe he should have read Knuth
XML parsing (just like the TeX language) has a problem that when there are problems in the input files, the situation diverges into two different cases, one requires an infinite memory and the other infinite time to deal gracefully with errors.
WTF? Perhaps you could explain more about these two cases. As far as I know, general XML parsers such as Expat do not require unlimited memory to parse any finite input document, nor do they require infinite time.
The Document Type Definition (DTD) system is equivalent to a BNF grammar for XML documents. It's not quite as flexible as a full BNF because it enforces that elements are correctly nested, but I don't see this as a bad thing.
And yes, DTDs are machine readable. Other grammars for XML documents such as DSD, XML Schema or Relax-NG are also machine readable.
Just as with BNF grammars and flex(1), you can take a DTD and generate an efficient parser from it using FleXML.
Comparisons with TeX aren't really appropriate because TeX is a Turing-complete language, and so impossible to parse automatically in 100% of cases (unless you want to allow that your program will sometimes fail to terminate, i.e. hang, on particular input files). I don't know what you mean by your subject line 'Maybe he should have read Knuth'...
-
Get a license from somebody else
nothing legally binds you to keep your word that the work is unencumbered by copyright restrictions
Except for language in the typical nearly-public-domain free software license. If Alice can't get a license from you, she can get a license from Bob: "Permission is hereby granted, free of charge, to any person obtaining a copy of ... the Software to deal in the Software without restriction ... and to permit persons to whom the Software is furnished to do so" (emphasis by yerricde). The GNU GPL (a popular copyleft license for software) says it a different way: "Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor". Unlike with a submarine patent, once this type of contract is in place, you can't just revoke the licenses at any time. -
Wow! Slashdotted Already?
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its children. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book>
    <bk:title>Lord of the Rings</bk:title>
    <bk:author>J.R.R. Tolkien</bk:author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
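To make the "process each namespace independently" point concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It is not from the original article; the helper names split_tag and local_names_by_namespace are made up for illustration. It groups the elements of the bookstore document by namespace name, so a bookstore-aware tool and an inventory-aware tool could each pick out only their own elements.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The bookstore document from the example above.
DOC = """\
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book>
    <bk:title>Lord of the Rings</bk:title>
    <bk:author>J.R.R. Tolkien</bk:author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </bk:book>
</bk:bookstore>
"""


def split_tag(tag):
    """Split an ElementTree tag of the form "{uri}local" into (uri, local)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag  # element is in no namespace


def local_names_by_namespace(xml_text):
    """Group element local names by namespace name, in document order."""
    groups = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        uri, local = split_tag(elem.tag)
        groups[uri].append(local)
    return dict(groups)


groups = local_names_by_namespace(DOC)
for uri, names in groups.items():
    print(uri, names)
```

Note that the prefixes bk and inv never appear in the result; only the namespace names do.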
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book>
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
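As a rough illustration of why undeclaring is confusing, the following Python sketch (standard library only, not part of the original article) parses trimmed-down versions of the two bookstore documents and lists the name each element ends up with. ElementTree writes a namespaced name as "{uri}local" and a name in no namespace as the bare local name. The constant and function names are assumptions chosen for this example.

```python
import xml.etree.ElementTree as ET

# Default namespace applies to every unprefixed element.
DEFAULTED = """\
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book>
    <title>Lord of the Rings</title>
  </book>
</bookstore>
"""

# Same document, but the default namespace is undeclared on <book>.
UNDECLARED = """\
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
  </book>
</bookstore>
"""


def element_names(xml_text):
    """Return the namespace-qualified tag of every element, in document order."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


print(element_names(DEFAULTED))
print(element_names(UNDECLARED))
```

In the second document, book and title look the same in the source text as before but are no longer in the bookstore namespace, so code that searches for the namespaced names would silently miss them.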
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
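The resolution rule for element QNames can be sketched in a few lines of Python. This is a hand-rolled illustration, not code from the article or from any XML library: expand_element_qname and the in_scope dictionary (prefix to namespace URI, with the empty string standing for the default namespace) are hypothetical names, and the check only looks for stray colons rather than validating the full NCName production.

```python
def expand_element_qname(qname, in_scope):
    """Resolve an element QName against the in-scope prefix mappings.

    in_scope maps prefix -> namespace URI; the key "" holds the default
    namespace, if any. Returns "{uri}local", or the bare local name when
    the element is in no namespace.
    """
    prefix, colon, local = qname.partition(":")
    if not colon:
        # Unprefixed element name: use the default namespace when one is declared.
        uri = in_scope.get("", "")
        return "{%s}%s" % (uri, qname) if uri else qname
    if not prefix or not local or ":" in local:
        # Prefix and local name must each be colon-free and non-empty.
        raise ValueError("not a valid QName: %r" % qname)
    if prefix not in in_scope:
        raise KeyError("prefix %r is not declared in scope" % prefix)
    return "{%s}%s" % (in_scope[prefix], local)


scope = {"bk": "urn:xmlns:25hoursaday-com:bookstore"}
print(expand_element_qname("bk:title", scope))
print(expand_element_qname("title", scope))
```

A real parser tracks a stack of such mappings as elements open and close; the dictionary here stands for whatever is in scope at one point in the document.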
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal name of the elements and attributes in an XML document are what is important, and not the values of the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is due to its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
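Python's standard xml.etree.ElementTree happens to report element names in this same curly-brace form, which makes the point easy to check. The sketch below is an illustration added here, not part of the original article; the names DOCS and universal_names are made up. It parses the three schema documents from the earlier example and compares the names a namespace-aware application would see.

```python
import xml.etree.ElementTree as ET

# The three documents from the QName example: xs: prefix, xsd: prefix,
# and a default namespace.
DOCS = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema">'
    '<complexType id="123" name="fooType"/></schema>',
]


def universal_names(xml_text):
    """Return the {namespace}local name of every element, in document order."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


names = [universal_names(doc) for doc in DOCS]
for per_document in names:
    print(per_document)
print(names[0] == names[1] == names[2])
```

The prefixes are gone after parsing; all that is left to distinguish elements is the namespace name plus the local name.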
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
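Both attribute rules can be checked in one go with Python's standard xml.etree.ElementTree. The document in this sketch is an assumption made for illustration: it combines the two examples above by declaring the bookstore namespace both as the default and under the bk prefix, which neither original example does.

```python
import xml.etree.ElementTree as ET

# Hypothetical combination of the two examples: the same namespace is both
# the default namespace and bound to the bk prefix.
DOC = """\
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bookstore>
"""

root = ET.fromstring(DOC)
book = root[0]

# The element picks up the default namespace...
print(book.tag)

# ...but the unprefixed attribute does not; only bk:title is namespaced.
attribute_names = sorted(book.attrib)
for name in attribute_names:
    print(name, "=", book.attrib[name])
```

The two attributes end up with different expanded names, which is why the parser accepts both on the same element.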
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resource. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Namespaces and Versioning for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases due to historic reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
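A minimal sketch of the DOM view described above, assuming Python's standard xml.dom.minidom as the DOM implementation (the article itself does not name one). The one-element sample is borrowed from the bookstore document used later on. It should show a namespace declaration surfacing as an ordinary attribute node whose own namespace name is http://www.w3.org/2000/xmlns/.

```python
from xml.dom import minidom

XMLNS_NS = "http://www.w3.org/2000/xmlns/"
BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

# A single element that carries one ordinary attribute and one declaration.
doc = minidom.parseString(
    '<bk:book genre="novel" xmlns:bk="%s"/>' % BOOKSTORE_NS
)
book = doc.documentElement

# The declaration is reachable like any other attribute node.
decl = book.getAttributeNode("xmlns:bk")
plain = book.getAttributeNode("genre")

print(decl.name, decl.namespaceURI, decl.value)
print(plain.name, plain.namespaceURI)
print(book.tagName, book.namespaceURI)
```

The unprefixed genre attribute reports no namespace at all, while the element itself reports the bookstore namespace.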
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has a unique set of namespace nodes for each namespace declaration in scope for that particular element. Namespace nodes are unique to each element in that namespace. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
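Python's standard library does not expose the XPath namespace axis, so the following is only an approximation of the idea: it tracks namespace declarations while parsing and records, for each element, its own copy of the bindings in scope. The function name in_scope_namespaces and the shortened sample document are assumptions made for illustration, and the implicit xml prefix binding that XPath also exposes is left out.

```python
import io
import xml.etree.ElementTree as ET

NS = "urn:xmlns:25hoursaday-com:bookstore"
SAMPLE = (
    '<bookstore xmlns="%s">'
    '<book><title>A</title></book>'
    '<bk:book xmlns:bk="%s"><bk:title>B</bk:title></bk:book>'
    '</bookstore>' % (NS, NS)
)

def in_scope_namespaces(xml_text):
    """Return (tag, {prefix: uri}) for every element, in document order."""
    stack = [{}]     # bindings in scope for the enclosing elements
    pending = {}     # declarations reported for the element about to start
    found = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = payload          # prefix is "" for the default namespace
            pending[prefix] = uri
        elif event == "start":
            scope = dict(stack[-1])        # a fresh copy for each element
            scope.update(pending)
            pending = {}
            stack.append(scope)
            found.append((payload.tag, scope))
        else:                              # "end": leave this element's scope
            stack.pop()
    return found

for tag, scope in in_scope_namespaces(SAMPLE):
    print(tag, scope)
```

Each element gets a separate dictionary even when the bindings are equal, which loosely mirrors the point that namespace nodes belonging to different elements are never identical.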
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class XPathQuery{
    // Concatenates the messages of an exception and all of its inner exceptions.
    public static string PrintError(Exception e, string errStr){
        if(e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message);
    }

    public static void Main(string[] args){
        // Expect a source file and a query, then zero or more prefix/namespace pairs.
        if((args.Length == 0) || (args.Length % 2) != 0){
            Console.WriteLine("Usage: xpathquery source query <zero or more prefix and namespace pairs>");
            return;
        }
        try{
            //Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);

            //create prefix<->namespace mappings (if any)
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for(int i=2; i < args.Length; i+= 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);

            //Query the document
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);

            //print output
            foreach(XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");
        }catch(XmlException xmle){
            Console.WriteLine("ERROR: XML Parse error occurred because " +
                PrintError(xmle, null));
        }catch(FileNotFoundException fnfe){
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        }catch(XPathException xpath){
            Console.WriteLine("ERROR: The following error occurred while querying the document: "
                + PrintError(xpath, null));
        }catch(Exception e){
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
        }
    }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman" and returns:
<title>The Confidence Man</title>
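For readers without the .NET tool at hand, the same three lookups can be sketched with Python's standard xml.etree.ElementTree. This is an assumption-laden stand-in rather than an equivalent of xpathquery.exe: ElementTree supports only a subset of XPath and selects elements only, so the attribute query and the parent-axis predicate are rewritten as plain Python.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
    <price>8.99</price>
  </book>
  <book genre="novel">
    <title>The Confidence Man</title>
    <author><first-name>Herman</first-name><last-name>Melville</last-name></author>
    <price>11.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)   # root is the bookstore element

# /bookstore/book/title
titles = [t.text for t in root.findall("book/title")]

# //@genre, read off every element that carries the attribute
genres = [e.get("genre") for e in root.iter() if e.get("genre") is not None]

# //title[(../author/first-name = 'Herman')]
herman_titles = [
    book.findtext("title")
    for book in root.findall("book")
    if book.findtext("author/first-name") == "Herman"
]

print(titles)
print(genres)
print(herman_titles)
```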
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
Example 2
xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
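The behaviour just described can be observed with Python's standard xml.etree.ElementTree, which writes expanded names as {namespace}local. This is a sketch on a trimmed copy of the namespaced bookstore document, not the output of xpathquery.exe: element names pick up the default namespace, unprefixed attribute names do not, and an unprefixed path therefore matches nothing.

```python
import xml.etree.ElementTree as ET

NS = "urn:xmlns:25hoursaday-com:bookstore"
NAMESPACED = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>"""

root = ET.fromstring(NAMESPACED)

# Unprefixed names in the path mean "no namespace", so nothing matches.
unprefixed_hits = root.findall("book/title")

# Both book elements have the same expanded name, however they were written.
child_tags = [child.tag for child in root]

# Unprefixed attributes stay outside any namespace; bk:genre does not.
attr_names = [sorted(child.attrib) for child in root]

print(unprefixed_hits)
print(child_tags)
print(attr_names)
```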
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the namespace to prefix mappings in the target document, and they must be non-empty prefixes.
Example 3
xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore"> The Confidence Man</bk:title>
xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
bk:genre="fiction"
xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman" and returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore"> The Confidence Man</bk:title>
Note: This last example is the same as the previous examples but rewritten to be namespace aware.
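The prefix-to-namespace mapping idea carries over to other XPath-like APIs. As a hedged illustration, Python's standard xml.etree.ElementTree accepts such a mapping as a dictionary. The prefix b is chosen arbitrarily and differs from the bk prefix in the document, the sample is a copy of the namespaced bookstore file, and the attribute and "Herman" lookups are written in plain Python because ElementTree implements only part of XPath.

```python
import xml.etree.ElementTree as ET

NS = "urn:xmlns:25hoursaday-com:bookstore"
NAMESPACED = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
    <bk:author><bk:first-name>Herman</bk:first-name><bk:last-name>Melville</bk:last-name></bk:author>
  </bk:book>
</bookstore>"""

root = ET.fromstring(NAMESPACED)   # root is the bookstore element
ns = {"b": NS}                     # query-side prefix, independent of the document's

# /b:bookstore/b:book/b:title
titles = [t.text for t in root.findall("b:book/b:title", ns)]

# //@b:genre, using the expanded attribute name
qualified_genre = "{%s}genre" % NS
fiction = [e.get(qualified_genre) for e in root.iter() if qualified_genre in e.attrib]

# //b:title[(../b:author/b:first-name = 'Herman')]
herman_titles = [
    book.findtext("b:title", namespaces=ns)
    for book in root.findall("b:book", ns)
    if book.findtext("b:author/b:first-name", namespaces=ns) == "Herman"
]

print(titles)
print(fiction)
print(herman_titles)
```

Both book elements match b:book even though only one of them uses a prefix in the source, because matching is done on the namespace name rather than on the prefix.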
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Program
Imports System.Xml.Xsl
-
This was my final year project thesis
This was my final year project thesis. Just remember the golden rule: unstructured 2 structured == convert 2 XML. I wrote a [very bad] program in C++/Perl/tcsh (IPC=pipes) to add XML tags to English, and then index them into a search engine which would use the lingual data stored in the XML tags to help the search.
NIST does a MASSIVE competition on this annually. I don't want to be an XML-buzzword whore <Arnold Schwarzenegger accent> (XML commando eats Green berets, C++, Java, Perl, COBOL for breakfast)</Arnold Schwarzenegger accent> but you can't beat XML for easily converting anything that you can make sense out of into computer readable format. Real h3cKoRs use SGML, but us underlings have to stick with things we can understand like XML. As for expandability, if we want to encode something else into the document, then just tag-it-and-go
It took me 200 hours to fish out all these links (before the Google days), I don't want anyone to have to waste as much time as I did feeding the search engines exotic foods. It's a year old so pardon me for the odd broken link, armed with these you could probably turn jello into XML ;-)
My favourite bookmarx
PROJect[21 links]
Beginners' Guide[13 links]
Berkeley Linguistics Dept. Course Summaries, general stuff
Cryptic IR Vocabulary defined
Explanations of weird words like hypernym
How do we produce and understand speech
How Inverted Files are Created - University of Berkeley
NLP Univ. of Indiana, very good basics e.g. word sense d
Simple language - useful....
What is Natural Language Processing, links
What is POS tagging........
Word Sense Disambiguation defined
Word Sense Disambiguation in detail, scroll down far
Word Sense Disambiguator - LOLITA (tested at MUC-7 and SENSEVAL competition as best)
XML for the absolute beginner
HTML, XML stuff + parsers[19 links]
Apache plug-in that uhhh does stuff with XML
Convert COM to XML
convert XML, HTML to Unix pipeable formats
converters to and from HTML
expat XML parser
HTML Tidy - converts HTML 2 XML + source code!!
Parse DB (RDBMS, whatever) to XML
Perl-XML Module List
PHP Manual XML parser functions - what the hell are they talking about, PHP Virtual M...
Public SGML-XML Software
Pyxie - XML Processor for Python, Perl, etc.
SGML+XML tools.org
The XML Resource Centre - massive number of links
W4F wrapper - wrapper converts XML to HTML
XFlat - convert flat file into XML
XML Parsers and other XML stuff
XML.com - Parsers, etc.
XML-Data Catalog System - uhhhh looks close
XTAL's general converter - convert anything 2 XML
other Background[8 links]
Is Linux ready for the Enterprise, scalable...
Linux reliability
Linux Versus Windows NT, Mark(sysinternals bloke)
PC reliability (pcworld)
SPEC - Standard Performance Evaluation Corp.
Systems benchmarks
TPC - Transaction Processing Performance Council
Unix Beats Back NT In EDA Workstation Arena
Proper TREC(-8) QA systems[2 links]
pg. 387 LIMSI-CNRS pretty deep parsing[2 links]
More links....
NLP, IR links - lots to corpii, etc.
pg. 575 U. of Ottawa and NRL (shit system, got 0%)[1 links]
LAKE Lab
pg. 607! University of Sheffield (crap system, but OPEN SOURCE!)[2 links]
GATE - FREE IE app w`source code
LaSIE - ER, coreference, template (cv)
pg. 617 Univ of Surrey (inconclusive matches)[2 links]
System Quirk - Or is this their search system..... Hmmmmmm
Univ of Surrey - pointers (hopefully this is their WILDER search system...)
SMU - Pg. 65[1 links]
Natural Language Processing Laboratory at SMU
Textract[2 links]
Cymfony - Technology
Textract - State of the Art Information Extraction
Xerox uhhhhh maybe[1 links]
Xerox Palo Alto Research Center
(OVERVIEW) 1999 TREC-8 Q&A Track Home Page
NLP bloke, Univ Sussex
Tcl-Tk[4 links]
Tcl tutorial
Tcl-Tk Contributed Programs Index
Tcl-Tk Resources, sources
TclXML - manipulating XML using Tcl-Tk
Artificial Natural Language - Is this what I'm trying to parse into...
Comparison of Indexers - Prise vs. Inquery vs. MG, etc.
Eagles - Language Engineering Standards
Language Technology Group - lots of modules!
LDC - Linguistic Data Consortium, lots of corpora
Lexical Resources
Links 2 resources, indexers.....
Lots of IR stuff, University of uhhh
Managing Gigabytes Indexer
Managing Gigabytes Manuals and stuff
Htdig search system
NLP & IR (NLPIR, NIST) Group
OVERVIEW OF MUC-7-MET-2
Perl XML Indexing - XML search engine type thing
Phrasys Language Processing Software Components (money)
QA HCI bullshit
SIGIR - TREC-type thing, resources
SMART indexer system documentation
Text REtrieval Conference (TREC) Home Page
The Natural Language Software Registry
Thunderstone IE and IR products
WordNet - FREE DOWNLOADABLE lexical English database
Page created with URL+, nice utility for working with internet shortcuts
-
Free software
Distributing copies of the game is clearly copyright infringement.
Not if the game has been released as free software (or even free as in beer). Nintendo's titles aren't free, but mine are. I put my proofs of concept and short utilities under the Expat license and my full games under the GNU GPL. It's only copyright infringement to redistribute binaries of GPL'd software if you neglect to make an offer to distribute machine-readable source code at cost.
Developing and using free software constitutes a substantial non-infringing use of the Visoly Flash Advance Pro cartridges.
-
XP
2002-01-10 21:59:08 Is the XP name copyrighted already? (askslashdot,microsoft) (rejected)
This is from a story I submitted earlier. I'm not going to type it again, but here are the links, you check it out. Look at the copyright date and remember that XP is a generic term for WindowsXP, and Microsoft often uses the term "XP" for Windows XP. It's used in commercials, and elsewhere. Does this cross the line?
XP - an XML Parser in Java
Google: Searched the web for XP. The above page is the second one down.
-
XML/XSLT parsers
There are several good XML parsers, some free, some commercial. Have a look at the following URLs for more info on free versions:
xml.apache.org
users.iclway.co.uk/mhkay/saxon
www.jclark.com/xml
I hope this is of some use to you.
-
Re:DML is not an XML dervied language
As a co-author of several XML FAQs and HOWTOs I should say without doubt that the code I saw there is not XML in any way. It's pure bullshit. Just a bunch of tags that would never run through any parser. Not even non-validating parsers such as my good friend James Clark's expat. If you still insist it is XML, feel free to run it through expat and let me know the results :)
-
Re:Don't use DTD - use XML Schema
SGML's tag minimization is friendlier for authoring by hand--you can write
<h1 id=my-heading/My Heading/
instead of
<h1 id="my-heading">My Heading</h1>
which can add up in a long document. You can use a tool such as James Clark's sx to normalize your document, and get a version that's still valid according to the original DTD but can also be handled by more simple-minded XML parsers (the parse tree doesn't depend on knowledge of the DTD).
-
Re:Ouch !
I guess I should have looked this up before posting, but ouch, bother... Doesn't it have to do with what the characters in the value are...? Alphanumerics are OK, others are not. Anyway, the validator is based on a rigorous parser, SP, so I'm pretty sure it reports only the things that are wrong. :-)
-
Re:markup
SGML would be better. Librarians invented SGML for exactly such purposes (long-term data storage). It allows you to encode all sorts of things, like hyperlinks, proper footnotes, typesetting/formatting information, etc.
IMHO, I think a lightweight SGML variant would be ideal for PG. From that, you could use freeware tools (like Jade) to generate TXT, HTML, DVI, PDF files as necessary, with hyperlinks and/or beautiful TeX-like typesetting as the format allows. And the source language would be stable enough to not be completely irrelevant 100+ years down the line. (which, btw, I think will become the case with HTML)
-
Re:WYSIWYG is your enemy, mod_include is your frie
Sounds like a useful setup. I hope we won't be stuck with this forever, though. Too many files. Bluh.
Once XSLT is out there, you can use an XSLT stylesheet to paste in the header, footer, navigation, and so on. It'll be pretty easy, too. Here's an example stylesheet. It just contains the header, the footer, and three XSLT elements.
<?xml version="1.0"?>
<!-- xsl stylesheet to add template & navigation to pages -->
<html xsl:version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<head>
<title><xsl:value-of select="/html/head/title"/></title>
...
</head>
<body>
... (all sorts of header stuff, including tables and graphics) ...
<h3>Related links:</h3>
<xsl:copy-of select="//div[@id='related']" />
... (more tables and stuff) ...
<!-- now paste in the content -->
<xsl:copy-of select="//div[@id='mainbody']" />
... (all sorts of footer stuff) ...
</body>
</html>
The first one glues in the title of the page; the other two paste in content. Simple-- but it is also a lot more powerful than this example shows.
Share and enjoy:
- XSLT specification (http://www.w3.org/TR/xslt)
- James Clark's partial XSLT implementation (http://www.jclark.com/xml/xt.html) - not standard-compliant yet but very good.
-
Re:Hmm...
Well, the expat library already exists and seems to be quite de facto under Unix.
At least, PHP and Apache use it, and well, as do a couple of other utilities and CPAN modules I've come across. It's fast, small, and not full of unnecessary crap.
Anyway, utilities like Apache and ProFTPD already have meta XML config scripts, and a fair few perl scripts that make use of XML::* as well.
I don't think I like the idea of having a dynamic XML library, and your entire /etc filesystem depending on it. What do you do when it breaks? It's like screwing up your LD.so.1 ... ow :)
Still, a standard, easy to use DOM for utilities would be nice, perhaps some kind of extension of the GetOpt stuff would work? It's succeeded largely because it's there and easy to use, and an XML standard would have to do the same.
-
Re:iCab does this already
SGML validation WRT a particular DTD is well-defined, and several tools like James Clark's excellent SP are available. Any tool that gives different validation errors (as opposed to style or compatibility warnings, which some boneheaded tools don't distinguish) has either unearthed a bug (which can be checked for by hand) or is simply wrong, and I know which way I'd bet.
-
Re:You're looking at the problem the wrong way
XML really doesn't change any of the domains EXCEPT the presentation domain.
I only partially agree. In the presentation domain, XML can be used to isolate the logical structure of the data from the HTML/WML/etc. It's very useful for this, but beware of the slowness of XSLT (as others have commented). I found that using the fastest XSLT (the jclark version) it still took around 300 ms to produce about 20K of HTML from XML.
In my situation, much of the XML was static information, so I decided to generate JSP output using XSLT instead, since JSP is compiled; the same could be done with another compiled scripting language. What was most interesting to me was the problem of isolating the static parts of the page, which could be compiled in JSP, from the dynamic parts, which had to come from the database / application layer. In this case, the tag extensions in the latest JSP (1.1) are very handy. They allow the JSP file to be a well-formed XML document, and therefore easily generated by XSLT, and the extended tags can be programmed to interact with the application layer in a very clean way. The tag extensions could be programmed to interact with either an application object or an XML DOM, although actually the latter is more cumbersome.
I agree that XML is not very valuable as a direct interface to the database -- there should always be a layer between the database server to enforce access control, implement rules, etc. However, XML is useful as an exchange format between loosely connected servers, such as in B2B interactions. In these cases it is better than using distributed objects, because the coupling is looser and easier to define. But I'm of the opinion that the XML should represent a high-level operation, not database rows.
-
Re:Performance Issues, XSL and Available Tools
I'd like to echo your shout out to James Clark's products. On the Java front, his XT library implements XSLT, and uses a SAX parser (which, as was pointed out, implies better performance than DOM).
http://www.jclark.com/xml/xt.html
-
Re:Why XML?
Definitely use XML over HTML. With XML, you can make up your own tag-set that accurately represents the structure of your documents. It would then be trivial for you to write an XSLT (see http://www.w3c.org/TR/xslt) stylesheet to transform your document into HTML (which has very little structure beyond lists and nested 'div' containers) for delivery, complete with auto-generated TOCs, indices, etc.
Then, if you decide to change your HTML style, you can just re-generate it by changing your stylesheet - without touching your content. It's sort of like generating HTML forms out of content in a database.
In terms of internationalization support, XML documents can contain just about any Unicode character. So basically you can write an XML document in practically any language.
Your XML source can capture things like:
12345-67
Whereas in HTML it would be more like:
12345-67, or at best 12345-67
In HTML, the only way to reproduce the 'build-your-own-vocabulary' capability of XML would be to have your whole document be a sea of <div> and <span> elements, with their 'class' attribute set to different values. But processing (and reading) such documents would be a real bitch.
A plethora of XML tools are available and tons more are on the way...
I recommend using James Clark's "Xt" (http://www.jclark.com/xml/xt.html) XSLT engine in conjunction with Sun's "Project X" (http://java.sun.com/xml) XML parser.
James was editor of the XSLT spec, and an outstanding programmer. The Sun XML parser has been shown to be the most conformant, and is quite fast.
Both applications are written in Java.
More XML info:
Open standards!
-
Why use PNG?
SGML is horrendously complex, even for an ISO standard. nsgmls (part of James Clark's excellent SP package) compiles to over a megabyte of x86 code- and it's just a parser.
IMHO specs like PNG that J.Random Hacker can sit down and actually implement correctly are far superior.