Domain: xml.com
Stories and comments across the archive that link to xml.com.
Comments · 183
-
It looks like they offer some kind of an API...
According to this rather old news article, they may offer some kind of API to address getting at the data stored inside. You might want to phone their supportenfolken.
-
Pfft...
Anyone could do that...
[/troll]
I did something similar except as a loadable kernel module. This stuff is actually really cool. And everyone should read Rubini and Corbet. It's free, so why the hell not? -
Re:XML frees us from Perl
Absent free-form text munging, Perl really has no advantage over other languages. At the same time, it has real deficits for people who need to know they have solved a problem correctly and completely.
Absolutely. Once you get beyond text parsing by standardizing the syntax, the goal of a program is to manipulate objects. XML maps very well into object trees and that is why it is commonly processed using Java and Python. If you want the powerful capabilities of a dynamically typed language, with a simple, easy to learn grammar, then you should use Python for processing XML, not Perl. (Perl's object syntax is as obtuse as the rest of the language and offers no advantages over the elegant object model of Python. In fact, Larry Wall borrowed much of the Perl object design from Python. Use the genuine original, not the imitation.) The standard Python library includes a fine package for navigating through XML data and zero text processing code needs to be written to do this. It's objects all the way down.
There is a good article that explains how to use Python generators to process XML content. This is something you will never be able to do as easily in either Java or Perl.
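For a rough idea of what that looks like, here is a minimal sketch. It is not taken from the linked article: it simply assumes the standard library's `xml.etree.ElementTree.iterparse`, and the sample document and the `iter_titles` name are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample document, used only for this illustration.
SAMPLE = b"""<library>
  <book><title>Perl and XML</title></book>
  <book><title>Linux Device Drivers</title></book>
</library>"""

def iter_titles(stream):
    """Lazily yield the text of each <title> element as it is parsed."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "title":
            yield elem.text
            elem.clear()  # drop the element's contents once consumed

titles = list(iter_titles(io.BytesIO(SAMPLE)))
```

Because `iter_titles` is a generator, the caller can stop early or chain it with other generators without building the whole tree first.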
-
hasn't received much attention until recently?
The reviewer is correct, Perl is a good tool for slamming and jammin' text, including XML. What I'm not so sure of is the quote "It's therefore surprising that using Perl for XML processing hasn't received much attention until recently."
I mean one need only scroll down the extensive list of CPAN Modules to see well over 50, as well as many sites/authors devoting time, energy and resource.
Similarly, I would point out some press modules supporting web services via XML, such as SOAP::Lite as far back as 02/26/01 and XML-RPC also in '01 -- or O'Reilly's own XML.com with articles such as "Processing XML with Perl" written shortly after the turn of the millennium.
Point is, though I personally love Perl, blatant plugs such as "... it's just that the world outside of the Perl community doesn't seem to have taken much notice of this work. This is all set to change with the publication of this book and O'Reilly's Perl and XML." don't inspire confidence in the reviewer's objectivity.
-
I tend to agree, but don't dig out the ORB yet...
I really like the approach of the REST guys, a much lighter-weight and more intuitive approach to web services than SOAP.
Basically, they are saying: use HTTP as it was intended to be used, rather than abusing it in ways it was never meant for. -
Re:typical job posting
Actually going to the site and looking up this application, it seems that Harrah's Entertainment is looking for everything from a CIO to multiple "senior application developers" to "technical trainers" to project managers to...
Hmm, but Harrah's apparently is a pretty big chain of casinos. The "Las Vegas" location should've tipped me off. So at least they have the money to throw at it... -
Keep the zealotry to yourself
Hopefully Microsoft will successfully be forced to integrate Java. The combination could help Java smother
I am getting really tired of Open Source zealots criticizing .NET just because Microsoft created it. I am very familiar with both .NET and Java, and IMHO .NET is a better architecture. .NET will soon have at least one Open Source implementation, and Microsoft has actually supported these efforts. People authoritatively claim that Microsoft will use patents to kill these efforts if they become competitive, but there is no evidence to support this paranoia, and in fact Microsoft does not have a history of abusing patents in this manner (unlike another company I could mention).
-
Missing information
For some reason, the table doesn't list the license or price of any of the surveyed tools. Only one tool even has the phrase 'open-source' in its 'Notes' column. One other says 'source available', and a third seems to be hosted on sourceforge.net. Should I simply assume that the rest of these tools are proprietary?
Given the fact that I am (on occasion) willing to part with my $$$ for software if necessary, I definitely would have liked to see pricing information included in this table, if only to rule out those tools which are out of my price-range.
-
Re:Someone give a copy to Microsoft...
I think they already have a copy. The author, Dare Obasanjo, works for them.
-
Article Motivation
When the W3C XML Schema recommendation was first released, there were certain parties who, overwhelmed by its newness, complexity and buggy implementations, began to advocate using as few features as possible, which culminated in the article W3C XML Schema Made Simple by Kohsuke Kawaguchi. However, a year later, with parser implementations getting up to speed and more people using the technology, it is clear that a number of the earlier misgivings about using some parts of the technology were misguided.
This is very similar to the situation with Mozilla and C++. In 1998, a few months after the ISO standard was ratified, a set of guidelines for using C++ was specified by the Mozilla team, which included rules like don't use templates, don't use exceptions, and don't use namespaces. Since then the Mozilla team has looked back and realized that some of the decisions they made were unwise; specifically listed as mistakes were avoiding exceptions and templates. I truly commend the Mozilla team for making their post mortem available online for other [C++ or otherwise based] software development projects to learn from.
This article aims to do the same thing for the XML community and the W3C XML Schema recommendation. -
Re:Whats wrong with html/css2 ?
I don't think the new XML format is meant for documents you wish to publish on the web.
Just being curious here and not a troll, I thought Mozilla supported XML. Take a look at this page, where it appears that XML style sheets can be used to impart some BibTeXian features to information perhaps meant for the web. It looks potentially very useful. -
Re:What is the format?
A starting place. No way to know really how close they'll stick to what they've done up to this point.
-
Re:What do they mean, "XML 1.0 chokes"?
Actually it's not, but I'll admit that my "most of the time" was -- I was mixing thinking about attribute values with thinking about element content. Outside attribute values, though, I repeat, I don't understand what the problem is supposed to be. If you've got a Unicode NEL in your UTF-8 encoded XML document, by and large the parser's going to pass it through to your application. What will happen is that it'll fail to get marked as white space. I'm sure there are people for whom that's hugely relevant, but I bet for the majority of XML applications it's not. If I'm wrong, somebody post a link.
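To make the pass-through claim concrete, here is a small sketch using Python's standard `xml.etree.ElementTree` (an expat-based XML 1.0 parser). The `<note>` document is hypothetical, and `XML10_WHITESPACE` is just a constant written out for the comparison; it lists the four characters XML 1.0 defines as white space.

```python
import xml.etree.ElementTree as ET

NEL = "\u0085"                 # Unicode NEXT LINE
XML10_WHITESPACE = " \t\r\n"   # the only characters XML 1.0 treats as white space

# Hypothetical one-element document with a NEL inside the content.
root = ET.fromstring(f"<note>first{NEL}second</note>")

nel_passed_through = NEL in root.text            # parser hands it to the application
nel_is_xml_whitespace = NEL in XML10_WHITESPACE  # but it is not XML 1.0 white space
```

So the character arrives as ordinary text content; whether that counts as "choking" depends on what the application then does with it.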
Actually your understanding of whitespace in XML is almost completely incorrect.
This comment about external parsed entities is much more relevant. I don't use them in any of my XML applications, but I suppose they could get someone in trouble. However, I still don't see how this qualifies as choking.
This is all clearly explained in the standard itself (W3C XML pages).
Dude, you can't just say "it's clearly explained" and point to an enormous mountain of documents. Why didn't you point to the relevant part of the spec? -
Re:One tiny little update ???
Considering what some other vendors have done to standards, one tiny addition (which is an improvement) proposed by IBM shouldn't be a big deal.
Two wrongs make a right?
IBM has contributed so much, it's only natural that some changes might be characterized in the news as benefitting them more than other parties.
I don't know what that means. This change was requested by IBM and only IBM. As far as I know, no IBM customers have even stood up and asked for it publicly (I could be wrong).
Is anyone that worried about adding a new EOL character in 1.1 that XML 1.0 "chokes" on?
Obviously some people are. Let's keep in mind that there are millions of XML parsers out there and they work together in large part because there is only one version of XML. Now there are two and it will take years to roll out the new parsers universally.
-
The other side of the argument
The Slashdot commentary has been pretty one-sided so I'll try and address the other side. First, IBM has said that this fix is for their mainframe customers, not for themselves. But nobody in the XML world has heard from these customers. As far as I know, no user has submitted a request for this NEL feature. No user has sent a message to the many XML mailing lists. No user has posted to Slashdot. Updating all of the XML parsers in the world is really expensive and if the mainframers don't care enough about the problem to storm the gates then maybe it isn't hurting them that badly. So from a democratic point of view, we're going to make life harder for the people who care enough to scream out loud in order to make life easier for the small minority who perhaps are not even that badly impacted.
Further discussion is on xml.com.
-
What do they mean, "XML 1.0 chokes"?
Does anyone have a link to a page explaining what's really going on? Last I heard, XML doesn't even have a concept of newlines -- most of the time all white space gets normalized (collapsed). The only problem that I could see is if the character wasn't part of the spec for white space. Now, people may have written XML software that chokes, but I think that's a slightly different story. So is the problem that the new character shows up as bogus text content in elements? And is that true for all XML processing software, or does software that relies on a proper Unicode engine not have the problem? What's the deal?
-
Re:hmm
-
Re:SVG has poor form support
No, you're right that SVG doesn't have a full set of widgets - it's still in the primitive, low-level drawing language stage, kind of like Adobe Acrobat.
Not that such a higher-level capability couldn't be developed in future versions of SVG. In that sense, it's like another poster said, where the entire Mac OS X desktop could fit inside Mozilla, or any SVG compliant renderer. Just as we got PostScript enabled printers to abstract away the raw rendering interface onto paper, we could conceivably get SVG enabled frame buffers to abstract away the low level device interfaces. Like Display Postscript would be. It's brave, it's cross-platform, but it's probably inefficient and the SVG spec is probably too new and amorphous to bet hardware design on.
To some extent, I think what you're looking for - forms capability - probably won't develop inside SVG simply because it's the responsibility of a separate working group for XForms.
-
Re:Why parse XML in the first place?
Interoperability is great and all, but I think XML is nothing but hype.
Heck, let me give this my crack... :-)
Ok, obviously the biggest reason for XML's popularity is hype. That's just the way the industry works; it doesn't make XML good or bad.
There are several legitimate technical benefits to XML that might be persuasive in one context or another.
- It looks like HTML, so everyone intuitively "gets" it.
- It's textual (not binary)--but of course, many formats are textual.
- It's reasonably easy for humans to understand without a spec, provided the tag and attribute names are not obfuscated, and the relationships are relatively simple. Note this does not make it easy for programs to understand!
- You don't have to write your own parser. You don't even have to write a grammar--just throw in a tag and the corresponding code to read and write it. This advantage is not as big as some make it out to be: many languages have easy-to-use features for parsing, and those that don't can make use of easy-to-use parser-generator tools.
- There are lots of libraries and tools. Of course, this is self-reinforcing (tools -> popular -> more tools -> more popular -> ...).
Many XML proponents, including some in this thread, would add to this list that XML is a good data storage and/or interchange format. Some "insightfully" note that it is better for data interchange than data storage. This is the biggest delusion over XML: XML is a rotten format for data.
Remember what XML was back before the hype machine was in overdrive? It was a better HTML, and a simpler SGML. HTML and SGML have always been formats for documents, and XML was intended to be the same. XML is indeed a pretty good match for documents. (This is debated of course: documents are complex things, and modeling them is non-trivial. Embedded Markup Considered Harmful, by Ted Nelson, is a good introduction.)
But XML is a poor match for data. This is because an XML document is a tree, and most data are not hierarchical. Consider that the database industry abandoned hierarchical databases many years ago (ok, abandoned is a little strong: we still use LDAP). Hierarchical data formats force you to pick which relationships will define the hierarchy, and any other relationships have to be kluged in.
Take a simple example of the sort of thing people use XML for: address book entries. Say you start out with a person element (I'm not going to write out the examples in XML syntax because it's too painful on slashdot) containing a name element and an address element. Now, you realize that multiple people may live at the same address, and you don't want to duplicate the address (data formats should be normalized). You either have to turn things inside out, putting the person element inside the address element, or make person and address both top-level elements, and link them somehow. In the former case, you have chosen an awkward hierarchy, and have "used-up" your ability to group people. What if you want a different grouping in the future? In the latter case, you have given up a lot of simplicity and read/writability (since now names and their corresponding addresses are in different places) by forcing non-hierarchical data into a hierarchical format.
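For anyone who wants to see the second option spelled out, here is a minimal sketch. The element names, the `id`/`addr` attributes and the sample data are all invented for illustration; the point is only that the link between a person and an address has to be resolved by application code rather than by the tree structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat layout: people and addresses are siblings,
# joined by an invented "addr" attribute that refers to an address id.
DOC = """<addressbook>
  <address id="a1"><street>12 Elm St</street></address>
  <person addr="a1"><name>Alice</name></person>
  <person addr="a1"><name>Bob</name></person>
</addressbook>"""

root = ET.fromstring(DOC)

# The application, not the XML tree, has to resolve the link.
streets = {a.get("id"): a.findtext("street") for a in root.findall("address")}
residents = [(p.findtext("name"), streets[p.get("addr")])
             for p in root.findall("person")]
```

The lookup dictionary is effectively a hand-rolled join, which is the kind of work a relational format would do for you.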
What is the solution? Well, I won't assert that it is the best data model that will ever exist, but the database industry has settled (roughly) on the relational model. So I think we should create a format describing relations, combined with the other advantages of XML: extensible, textual, readable, and most of all, standards-based. Yes, this would mean we would have to learn two technologies, one for documents and one for data. But the technology for data would be so much simpler--and as a bonus, integrate easily into our databases--that it would be a huge win overall. I don't have time to defend this model in depth. But think about it.
By the way, another example of the bad match between XML and data is the great debate over when you should use elements, and when you should use attributes. The fact that there is an arbitrary decision to be made shows that XML has degrees of complexity that only get in the way when you use it as a data format. (If you're going to use XML for data, at least have the decency to eschew attributes except for an id attribute.)
-
Excellent introduction book
I found this book an excellent introduction for Perl programmers who want (or have) to start processing XML. It cuts through the long list of XML modules on CPAN (485 results!) and gives you the basic techniques and tools you can use.
XML is really not that difficult to deal with but it can be a little intimidating. "Perl & XML" is written in a simple and direct style that gives the reader enough information to start writing code, and pointers to find more specific information once they have chosen the tools they need.
Armed with this book, The Perl-XML FAQ and Kip Hampton's column on XML.com, any Perl programmer can start working confidently with XML.
-
Google API
After the introduction of the Google API, some people, especially from the REST camp, criticized the use of SOAP, claiming it just adds superfluous bloat and is generally "unwebby". What do you think about this?
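To show roughly what the two camps are arguing about, here is a sketch of the same search request expressed both ways. Nothing here is the real Google API: the host, the `q` parameter, the `doSearch` operation and its `urn:example:search` namespace are placeholders invented for the comparison. Only the SOAP 1.1 envelope namespace is the real one.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
query = "xml schema"

# REST style: the query is a resource address fetched with a plain HTTP GET.
# The host and parameter name are placeholders, not a real service.
rest_url = "http://search.example.com/results?" + urlencode({"q": query})

# SOAP style: the same request wrapped in an envelope and POSTed to one endpoint.
# "doSearch" and its namespace are invented for the comparison.
envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
call = ET.SubElement(body, "{urn:example:search}doSearch")
ET.SubElement(call, "q").text = query
soap_payload = ET.tostring(envelope, encoding="unicode")
```

The REST form can be bookmarked, linked and cached like any other URL; the SOAP form carries the same information inside a message body that only the endpoint interprets.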
-
rubini for free-e
-
XSL Considered Harmful
Check out this excellent article entitled XSL Considered Harmful by Michael Leventhal from 1999. IMHO it's as true now as it was then. -
SVG
SVG is a W3C approved vector graphic and animation XML language. Development tools for it are coming right along. There is a good series about SVG on XML.COM. The author demonstrates many Flash features using SVG.
-
Re:PNG *is* a god-send.
As a webmaster and web designer, I see the widespread adoption of SVG and PNG as far more important steps than a new JPEG.
SVG is the most fantastic vector based graphics format ever created. Not only is it fully scalable to whatever size you care to scale it to, but it's all done in XML, which means scripting graphics creation is nigh on basic.
Since SVG is basically text, the file sizes are tiny! Add to this the fact that svgz (gzipped svgs) are part of the svg standard, and you end up being able to create fully interactive vector based animations weighing in at less than 1K (try this - it's a perfect example of how cool SVG is)
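As a rough illustration of the scripting point, here is a minimal sketch that generates a small SVG with Python's standard library and gzips it the way an .svgz file would be. The shapes, sizes and colours are arbitrary, and it is a static picture rather than an animation.

```python
import gzip
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without a prefix

# Build a row of circles; the sizes and colours are arbitrary.
svg = ET.Element(f"{{{SVG_NS}}}svg", width="200", height="40")
for i in range(5):
    ET.SubElement(svg, f"{{{SVG_NS}}}circle",
                  cx=str(20 + 40 * i), cy="20", r="15", fill="steelblue")

svg_bytes = ET.tostring(svg, encoding="utf-8")
svgz_bytes = gzip.compress(svg_bytes)  # what would be saved as picture.svgz
```

Comparing `len(svg_bytes)` with `len(svgz_bytes)` shows how much the repetitive markup shrinks under gzip.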
As it stands now, SVG can only be viewed on IE and NS4 with a plug-in, but Mozilla supports it natively if you enable it ;) It's a more important standard to propagate than JP2 IMHO.
On the PNG front: PNG is so much better than any other format for layout graphics on web pages. Its alpha transparency and colour palette are all you need (it runs circles around GIF). PNGs should be the internet standard for non vector graphics, but alas, IE does not render them properly (the colours get twisted and changed as far as I've experienced). If MS could stick to standards, it'd make the internet a whole lot better.
Anyway, in conclusion, JP2 may sound nice, but there are much more important formats out there that need to be adopted before JP2, which will not only cut down the transfer sizes of graphics, but make web development just that much easier for people like me.
-
Re:WTF is SOAP?
AFAIK it is a protocol devised by Dave Winer from UserLand and Microsoft, it has been rubber stamped by the W3C, and its specification can be found on their site: Simple Object Access Protocol (SOAP) 1.1.
I think some of the most interesting things that have been written about SOAP have come out of the REST thesis; probably the best two introductory articles on REST are the ones on XML.com by Paul Prescod: Second Generation Web Services and REST and the Real World.
There has been quite a bit of interesting discussion on SOAP on the W3C's Technical Architecture list, see this thread: SOAP breaks HTTP?.
-
Re:ESR's Flaw
What interests me is IE's handling of XML in the browser
Wrong. Mozilla has better XML support. See bugzilla #64945 for in-depth discussion. More specifically, look at comment #34 and reply. Another useful link is this article.
Cheers,
--fred
-
Objectivity / Bias
It seems the author is trying to proffer REST, a putatively alternative approach to the use of the existing web infrastructure as little more than a transport for messages to be interpreted by the endpoints, like SOAP does, and I think that is the motivation for the FUD article mentioned in this slashdot story. To me that article does not seem to say much besides that the existing web architecture cannot be used to satisfy the additional security demands created by application level web services interaction protocols like SOAP. I do not see that as a "SOAP security problem".
-
This was my final year project thesis
This was my final year project thesis. Just remember the golden rule: unstructured 2 structured == convert 2 XML. I wrote a [very bad] program in C++/Perl/tcsh (IPC = pipes) to add XML tags to English, and then index them into a search engine which would use the lingual data stored in the XML tags to help the search.
NIST does a MASSIVE competition on this annually. I don't want to be an XML-buzzword whore <Arnold Schwarzenegger accent> (XML commando eats Green berets, C++, Java, Perl, COBOL for breakfast)</Arnold Schwarzenegger accent> but you can't beat XML for easily converting anything that you can make sense out of into computer readable format. Real h3cKoRs use SGML, but us underlings have to stick with things we can understand like XML. As for expandability, if we want to encode something else into the document, then just tag-it-and-go
It took me 200 hours to fish out all these links (before the Google days), I don't want anyone to have to waste as much time as I did feeding the search engines exotic foods. It's a year old so pardon me for the odd broken link, armed with these you could probably turn jello into XML ;-)
My favourite bookmarx
PROJect[21 links]
Beginners' Guide[13 links]
Berkeley Linguistics Dept. Course Summaries, general stuff
Cryptic IR Vocabulary defined
Explanations of weird words like hypernym
How do we produce and understand speech
How Inverted Files are Created - University of Berkeley
NLP Univ. of Indiana, very good basics e.g. word sense d
Simple language - useful....
What is Natural Language Processing, links
What is POS tagging........
Word Sense Disambiguation defined
Word Sense Disambiguation in detail, scroll down far
Word Sense Disambiguator - LOLITA (tested at MUC-7 and SENSEVAL competition as best)
XML for the absolute beginner
HTML, XML stuff + parsers[19 links]
Apache plug-in that uhhh does stuff with XML
Convert COM to XML
convert XML, HTML to Unix pipeable formats
converters to and from HTML
expat XML parser
HTML Tidy - converts HTML 2 XML + source code!!
Parse DB (RDBMS, whatever) to XML
Perl-XML Module List
PHP Manual XML parser functions - what the hell are they talking about, PHP Virtual M...
Public SGML-XML Software
Pyxie - XML Processor for Python, Perl, etc.
SGML+XML tools.org
The XML Resource Centre - massive number of links
W4F wrapper - wrapper converts XML to HTML
XFlat - convert flat file into XML
XML Parsers and other XML stuff
XML.com - Parsers, etc.
XML-Data Catalog System - uhhhh looks close
XTAL's general converter - convert anything 2 XML
other Background[8 links]
Is Linux ready for the Enterprise, scalable...
Linux reliability
Linux Versus Windows NT, Mark(sysinternals bloke)
PC reliability (pcworld)
SPEC - Standard Performance Evaluation Corp.
Systems benchmarks
TPC - Transaction Processing Performance Council
Unix Beats Back NT In EDA Workstation Arena
Proper TREC(-8) QA systems[2 links]
pg. 387 LIMSI-CNRS pretty deep parsing[2 links]
More links....
NLP, IR links - lots to corpii, etc.
pg. 575 U. of Ottawa and NRL (shit system, got 0%)[1 links]
LAKE Lab
pg. 607! University of Sheffield (crap system, but OPEN SOURCE!)[2 links]
GATE - FREE IE app w`source code
LaSIE - ER, coreference, template (cv)
pg. 617 Univ of Surrey (inconclusive matches)[2 links]
System Quirk - Or is this their search system..... Hmmmmmm
Univ of Surrey - pointers (hopefully this is their WILDER search system...)
SMU - Pg. 65[1 links]
Natural Language Processing Laboratory at SMU
Textract[2 links]
Cymfony - Technology
Textract - State of the Art Information Extraction
Xerox uhhhhh maybe[1 links]
Xerox Palo Alto Research Center
(OVERVIEW) 1999 TREC-8 Q&A Track Home Page
NLP bloke, Univ Sussex
Tcl-Tk[4 links]
Tcl tutorial
Tcl-Tk Contributed Programs Index
Tcl-Tk Resources, sources
TclXML - manipulating XML using Tcl-Tk
Artificial Natural Language - Is this what I'm trying to parse into...
Comparison of Indexers - Prise vs. Inquery vs. MG, etc.
Eagles - Language Engineering Standards
Language Technology Group - lots of modules!
LDC - Linguistic Data Consortium, lots of corpora
Lexical Resources
Links 2 resources, indexers.....
Lots of IR stuff, University of uhhh
Managing Gigabytes Indexer
Managing Gigabytes Manuals and stuff
Htdig search system
NLP & IR (NLPIR, NIST) Group
OVERVIEW OF MUC-7-MET-2
Perl XML Indexing - XML search engine type thing
Phrasys Language Processing Software Components (money)
QA HCI bullshit
SIGIR - TREC-type thing, resources
SMART indexer system documentation
Text REtrieval Conference (TREC) Home Page
The Natural Language Software Registry
Thunderstone IE and IR products
WordNet - FREE DOWNLOADABLE lexical English database
Page created with URL+, nice utility for working with internet shortcuts -
This was my final year project thesis
This was my final year project thesis. Just remember the golden rule unstructured 2 structured == convert 2 XML I wrote a [very bad] program in C++/Perl/tcsh IPC=pipes to add XML tags to English, and then index them into a search engine which would use the lingual data stored in the XML tags to help the search.
NIST does a MASSIVE competition on this annually. I don't want to be an XML-buzzword whore <Arnold Schwarzenegger accent> (XML commando eats Green berets, C++, Java, Perl, COBOL for breakfast)</Arnold Schwarzenegger accent> but you can't beat XML for easily converting anything that you can make sense out of into computer readable format. Real h3cKoRs use SGML, but us underlings have to stick with things we can understand like XML. As for expandability, if we want to encode something else into the document, then just tag-it-and-go
It took me 200 hours to fish out all these links (before the Google days), I don't want anyone to have to waste as much time as I did feeding the search engines exotic foods. It's a year old so pardon me for the odd broken link, armed with these you could probably turn jello into XML ;-)
My favourite bookmarx
PROJect[21 links]
Beginners' Guide[13 links]
Berkeley Linguistics Dept. Course Summaries, general stuffzzzzzzzzzzzzzzCryptic IR Vocabulary defined
Explanations of weird words like hypernym zzzzzzzzzzzzzzHow do we produce and understand speech
How Inverted Files are Created - Univeristy of Berkeley zzzzzzzzzzzzzzNLP Univ. of Indiana, very good basics e.g. word sense d
Simple langauge - useful.... zzzzzzzzzzzzzzWhat is Natural Language Processing, links
What is POS tagging........ zzzzzzzzzzzzzzWord Sense Disambiguation defined
Word Sense Disambiguation in detail, scroll down far zzzzzzzzzzzzzzWord Sense Disambiguator - LOLITA (tested at MUC-7 and SENSEVAL competition as best)
XML for the absolute beginner
HTML, XML stuff + parsers[19 links]
Apache plug-in that uhhh does stuff with XML zzzzzzzzzzzzzzConvert COM to XML
convert XML, HTML to Unix pipeable formats zzzzzzzzzzzzzzconverters to and from HTML
expat XML parser zzzzzzzzzzzzzzHTML Tidy - converts HTML 2 XML + source code!!
Parse DB (RDBMS, whatever) to XML zzzzzzzzzzzzzzPerl-XML Module List
PHP Manual XML parser functions - what the hell are they talking about, PHP Virtual M... zzzzzzzzzzzzzzPublic SGML-XML Software
Pyxie - XML Processor for Python, Perl, etc. zzzzzzzzzzzzzzSGML+XML tools.org
The XML Resource Centre - massive number of links zzzzzzzzzzzzzzW4F wrapper - wrapper converts XML to HTML
XFlat - convert flat file into XML zzzzzzzzzzzzzzXML Parsers and other XML stuff
XML.com - Parsers, etc. zzzzzzzzzzzzzzXML-Data Catalog System - uhhhh looks close
XTAL's general converter - convert anything 2 XML
other Background[8 links]
Is Linux ready for the Enterprise, scalable... zzzzzzzzzzzzzzLinux reliability
Linux Versus Windows NT, Mark(sysinternals bloke) zzzzzzzzzzzzzzPC reliability (pcworld)
SPEC - Standard Performance Evaluation Corp. zzzzzzzzzzzzzzSystems benchmarks
TPC - Transaction Processing Performance Council zzzzzzzzzzzzzzUnix Beats Back NT In EDA Workstation Arena
Proper TREC(-8) QA systems[2 links]
pg. 387 LIMSI-CNRS pretty deep parsing[2 links]
More links....
NLP, IR links - lots to corpii, etc.
pg. 575 U. of Ottawa and NRL (shit system, got 0%)[1 links]
LAKE Lab
pg. 607! University of Sheffield (crap system, but OPEN SOURCE!)[2 links]
GATE - FREE IE app w`source code
LaSIE - ER, coreference, template (cv)
pg. 617 Univ of Surrey (inconclusive matches)[2 links]
System Quirk - Or is this their search system..... Hmmmmmm
Univ of Surrey - pointers (hopefully this is their WILDER search system...)
SMU - Pg. 65[1 links]
Natural Language Processing Laboratory at SMU
Textract[2 links]
Cymfony - Technology
Textract - State of the Art Information Extraction
Xerox uhhhhh maybe[1 links]
Xerox Palo Alto Research Center
(OVERVIEW) 1999 TREC-8 Q&A Track Home Page
NLP bloke, Univ Sussex
Tcl-Tk[4 links]
Tcl tutorial
Tcl-Tk Contributed Programs Index
Tcl-Tk Resources, sources
TclXML - manipulating XML using Tcl-Tk
Artificial Natural Language - Is this what I'm trying to parse into...
Comparison of Indexers - Prise vs. Inquery vs. MG, etc.
Eagles - Language Engineering Standards
Language Technology Group - lots of modules!
LDC - Linguistic Data Consortium, lots of corpora
Lexical Resources
Links 2 resources, indexers.....
Lots of IR stuff, University of uhhh
Managing Gigabytes Indexer
Managing Gigabytes Manuals and stuff
Htdig search system
NLP & IR (NLPIR, NIST) Group
OVERVIEW OF MUC-7-MET-2
Perl XML Indexing - XML search engine type thing
Phrasys Language Processing Software Components (money)
QA HCI bullshit
SIGIR - TREC-type thing, resources
SMART indexer system documentation
Text REtrieval Conference (TREC) Home Page
The Natural Language Software Registry
Thunderstone IE and IR products
WordNet - FREE DOWNLOADABLE lexical English database
Page created with URL+, nice utility for working with internet shortcuts -
Re:I want my EPS!
You'd prefer some EPS?
Markup languages are not meant to be terse, they're meant to be easy for someone to parse and to be readable by humans if need be. The design goals of the XML spec state as much:
6. XML documents should be human-legible and reasonably clear.
and
10. Terseness in XML markup is of minimal importance.
XML was developed as a reaction against the complexity of SGML, not to be perfectly tailored to your pet domain.
Here's a useful project: Try writing a parser to take "x^2 + 4x + 4 = 0" and spit out the MathML. Ta da, suddenly you can share your math. And you don't have to use unreadable EPS, LaTeX, or send people your Mathematica notebook. -
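For what it's worth, the parser suggested above can be roughed out in a few lines of Python. This is only a minimal sketch, not a real math parser: it assumes single-letter variables, non-negative integers, the operators +, - and =, and exponents written as `^` followed by one number or variable. The function names (`tokenize`, `atom`, `to_mathml`) are made up for illustration, and it emits presentation MathML only.

```python
import re

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# One token: an integer, a single-letter variable, or one of + - = ^
TOKEN = re.compile(r"\s*(\d+|[a-zA-Z]|[-+=^])")


def tokenize(text):
    """Split the input into numbers, single-letter variables and operators."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}: {text[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def atom(token):
    """Wrap one number or variable in the matching MathML element."""
    if token.isdigit():
        return f"<mn>{token}</mn>"
    if token.isalpha():
        return f"<mi>{token}</mi>"
    raise ValueError(f"expected a number or variable, got {token!r}")


def to_mathml(text):
    """Convert a simple flat equation such as 'x^2 + 4x + 4 = 0' to MathML."""
    tokens = tokenize(text)
    parts, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in {"+", "-", "="}:
            parts.append(f"<mo>{tok}</mo>")
            i += 1
        elif i + 1 < len(tokens) and tokens[i + 1] == "^":
            # base ^ exponent becomes a superscript element
            if i + 2 >= len(tokens):
                raise ValueError("'^' must be followed by an exponent")
            parts.append(f"<msup>{atom(tok)}{atom(tokens[i + 2])}</msup>")
            i += 3
        else:
            parts.append(atom(tok))
            i += 1
    body = "".join(parts)
    return f'<math xmlns="{MATHML_NS}"><mrow>{body}</mrow></math>'


if __name__ == "__main__":
    print(to_mathml("x^2 + 4x + 4 = 0"))
```

Parentheses, fractions and operator precedence are all left out, so anything beyond flat polynomials would need a proper grammar.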
An Introduction to XML Signatures (xml.com)
If you want more information about XML Signature, just check this article
http://www.xml.com/pub/a/2001/08/08/xmldsig.html -
Not only Open Source, but also Open Standards
An article on XML.com outlines the US government's new mandate to support only open standards, specifically mentioning W3C. Even cooler, the guidelines expressly forbid competing (proprietary) standards.
See the article
.micah -
Re:Linux Device Drivers
And you didn't link to it? Shame on you!
-
An excellent reference
If this project becomes a centralized point of distribution or access (e.g., SourceForge), this could really help the open-knowledge community.
For example, many people run out to buy expensive assembler books when the best resource is available online. Or, they run out to buy expensive Linux device driver manuals when the best resource is available online.
Open-source software mainly helps people write new software that uses key techniques / algorithms from open software. Open-source documentation, on the other hand, helps impart the foundations on which the open-source programs get created.
Ideally, this openscience approach would spread -- and students wouldn't need to spend $500 per semester on textbooks. And unfortunately, the Project Gutenberg idea to import books as their copyright expires (50 years after the author dies) would never fly for technology-based books.
As a side note, this index of online books has a lot of good information.
-
Re:Many crawlers ignore robots.txt
This is much simpler:
mkdir -p example.com
touch example.com/robots.txt
wget -r -nc http://example.com/
Note the -nc ("no clobber") option, so you don't have to screw around with su. (and you don't need to download the whole site twice if you take a minute to think about where the robots.txt will go...)
Yes, I know about the .wgetrc setting, but this is good if you want to grab something off of just one site (like http://www.xml.com/axml/testaxml.htm) but don't want to have to worry about forgetting to re-enable robots.txt handling. -
Native XML Databases
I recently wrote an introduction to native XML databases for xml.com. My main point there, which applies to this discussion too, is that native XML databases are a tool like any other. For some jobs they're right and for some they're not. I've been working on the technology in the form of dbXML for about a year and a half, and in some cases it's great and in others it really stinks. It's all about the right tool for the job.
It's easy to dismiss a new database technology as irrelevant because of the dominance of the RDBMS, but you should really learn more about it and when it is appropriate and when it's not. It's not going to replace relational, and isn't intended to. Here are a few links where you can learn more beyond what's available on Ronald Bourret's site mentioned in the original post.
The XML:DB Initiative
The dbXML Project (open source native XML database) Soon to become an Apache XML project named Xindice
eXist (another open source native XML database)
My blog on the subject.
Kimbro Staken -
Re:Tim O'Reilly comes through again
You're kidding, right? You don't see the point. Mundie and Rosen were picked just for that reason, to show what we're up against. The old, know your opponent... Yeah, this guy has no idea what he's doing, O'Reilly GPL'ed the Linux Device Driver book to encourage the development of Linux drivers, that company must be crazy... They're actually trying to help the community... that's unbelievable...
-
Resource on XML Data Bases
There was an interesting discussion on XML data bases recently on the XML-Dev mailing list, with several XML experts giving pretty interesting and not too biased opinions on the subject. You can find a summary of it by Leigh Dodds in
XML and Databases? Follow Your Nose. -
Another interesting link
There was a good discussion on XML data bases on the XML-Dev mailing list, which is summarized pretty well by Leigh Dodds in XML and Databases? Follow Your Nose.
-
Re:The XML doesn't work that way
I also wrote an article on the XML format for XML.com which you can find here.
It was written before they did the whole Zip thing (though I do mention the zipping in the article), but some of the pointers should still be valid for anyone looking to be able to read the format. -
Re:Online XML references?
And of course the basic reference is the annotated specification. The spec is actually quite simple (and short!) and the annotations are a great way to get the extra details that you usually can't get unless you sit in the working groups.
It is really a shame that the rest of the XML-related specs (XSLT, DOM...) have forgotten one of the basic design goals of the XML spec: simplicity!
-
Re:Files Easy, Editing Hard
XMetaL is the leader in the XML editing category (in North America, anyhow). They've been in the structured editing business for roughly 15 years. Another strong contender is Documentor.
-
Re:Zope
Three relevant links to read in considering Zope for XML are:
Creating XML Applications With Zope
Create a XML Based Document Repository
In some data management scenarios, using Zope obviates the need for XML markup. In practice, content management issues like security, revision control, and online access through a browser are bigger issues than markup. Zope provides solutions to all these problems.
My main caveat in using Zope is that finding all the relevant documentation for XML or anything else is a veritable Easter egg hunt. The Zope API doesn't seem to be documented in one place. More than once a Zope tutorial seriously proposed that the reader read the Python source code for further information.
-
bandwidth is cheap? On what planet?
So who cares about compression. Personally, I'd much prefer the open and obvious standards of XML to some obfuscated form. Data is confusing enough already; at least XML gives a clear description that I can use with a packet sniffer when trying to debug something.
You're kidding, right? Most CS people I know cringe at the fact that XML can more than double the size of a document with largely redundant tags. The only things to be thankful for are that the documents typically compress very well due to the large number of redundant tags, and that HTTP 1.1 supports compression, especially now that XML over HTTP (i.e. web services) is being beaten to death by a lot of people in the software industry. Numerous articles about XML compression also tend to disagree with you that it is not an issue.
PS: If bandwidth is so cheap how come DSL companies are going out of business and AOL owns Time Warner? This would tend to imply that low bandwidth connections are still the order of the day. -
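To put a rough number on the "compresses very well" point above, here is a small Python sketch that gzips a synthetic, tag-heavy XML document and compares sizes. The element names (`records`, `record`, `id`, `name`, `price`) and the data are made up, so treat the resulting ratio as an illustration of the redundancy argument, not a benchmark of real documents.

```python
import gzip


def build_sample_xml(rows):
    """Build a synthetic, tag-heavy XML document (hypothetical element names)."""
    records = "".join(
        f"<record><id>{i}</id><name>item{i}</name><price>{i * 3 % 100}</price></record>"
        for i in range(rows)
    )
    return f'<?xml version="1.0"?><records>{records}</records>'.encode("utf-8")


def compression_ratio(data):
    """Return gzip-compressed size divided by original size."""
    return len(gzip.compress(data)) / len(data)


if __name__ == "__main__":
    xml_bytes = build_sample_xml(1000)
    print(len(xml_bytes), "bytes raw")
    print(len(gzip.compress(xml_bytes)), "bytes gzipped")
    print(f"ratio: {compression_ratio(xml_bytes):.2f}")
```

Real documents with more free text and fewer repeated tags will compress less dramatically, and the sender and receiver still pay the CPU cost of compressing and parsing.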
Very Nice
This is excellent. I bought the original edition of this book and it did a great job of explaining device driver development for the Linux kernel.
I have written device drivers for Windows NT and Windows 2000, and have learned device driver techniques on those OS's. It is much easier to implement a driver on Linux than in Windows. The simple design of the Linux device driver compared to the Windows driver makes design and implementation go much quicker.
For example, the device driver I wrote for the hardware I am working on for Sonic in Windows NT took about 2 months before it was basically working in our software framework. I wrote the same driver in about 2 weeks for Linux.
I could recommend some good books for Windows NT device drivers, but this O'Reilly book is the best I've seen for Linux drivers.
I found that the link on this page was incorrect. The actual link is http://www.xml.com/ldd/chapter/book/bookindexpdf.html
Michael A. Uman
Sr Software Engineer
softwaremagic.net