elharo · Slashdot Mirror

Re:Useless on An Early Look at JUnit 4 · 2005-09-15 01:35 · Score: 1

I don't believe that code using 3rd party APIs requires either setting up a complex environment or mocking things out. And I see no reason the code should be tested in isolation of third party APIs. If the code uses the third party APIs, so should the tests.

I routinely write tests that access various external APIs such as SAX, DOM, and QuickTime for Java. Some of the time what I'm testing for are workarounds for bugs in specific third party APIs. Perhaps some third party APIs require complex setup (DOM isn't trivial) but that's what fixtures are for.

Really the question of 3rd party APIs is completely orthogonal to the question of complex setup. Some APIs have complex setup. Some don't. Some APIs are third party. Some aren't. And these come in all combinations and flavors.

There are times mock objects are appropriate. I don't want my database tests to affect the production server; but maybe I do want them to hit a real database, not some pseudo thing that doesn't behave like a real database behaves. I want my tests to tell me as much as possible about how my code will operate under real conditions. The more I mock out, the less confidence I have that code works when it has to run in the real world instead of the mock world I've invented.

Re:Useless on An Early Look at JUnit 4 · 2005-09-15 00:44 · Score: 1

I don't believe code does need to be refactored to make it compatible with JUnit. JUnit can and should test the API as it exists. Unit testing should not be a consideration in designing an API. Of course, if the API needs to be changed or refactored for other reasons, that's fine; but unit testing isn't one of those reasons.

It's a common misconception that a unit test must directly access the method it's testing; but this isn't the case. Private and other non-public methods can be thoroughly tested through the public methods that call them. Doing this ensures that tests will mimic the way code is actually executed in a running application. It leads to more reliable, more useful tests.
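A minimal sketch of that idea follows. PriceFormatter and its pad() helper are hypothetical names invented for illustration, and plain checks stand in for JUnit assertions so the example has no dependencies. The private helper is never called directly; the inputs to the public method are chosen so that both of its branches run.

```java
// Hypothetical class under test; not part of JUnit or any real library.
final class PriceFormatter {
    // Public entry point: the only method the checks below call.
    public String format(long cents) {
        return "$" + (cents / 100) + "." + pad(cents % 100);
    }

    // Private helper, reachable only through format().
    private String pad(long n) {
        return n < 10 ? "0" + n : Long.toString(n);
    }
}

final class PriceFormatterCheck {
    // Plain checks standing in for JUnit assertions (an assumption of this sketch).
    static void run() {
        PriceFormatter f = new PriceFormatter();
        check(f.format(1234).equals("$12.34")); // pad() takes the two-digit branch
        check(f.format(1205).equals("$12.05")); // pad() takes the zero-padding branch
        check(f.format(5).equals("$0.05"));
    }

    private static void check(boolean ok) {
        if (!ok) throw new AssertionError("format() returned an unexpected string");
    }
}
```

In a real suite each check would be an assertEquals call inside a test method; the point is only that pad() gets covered without being made public.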

Re:Useless on An Early Look at JUnit 4 · 2005-09-15 00:30 · Score: 1

JUnit's had fixtures for as long as I've been using it, since about 1999 or so. JUnit 4 adds an additional kind of fixture that enables setup and teardown code to run once per test class rather than once per test method.

However, ultimately if there's too much setup and teardown, you may not really have a unit test. You might be trying to run an integration or acceptance test instead. Admittedly the boundaries are fuzzy, and I cross the line routinely in my own work. However, it is important to remember that JUnit is designed for unit testing; and while it can be used for other kinds of tests, it won't fit those as well as it does true unit tests.

Re:Web / GUI on An Early Look at JUnit 4 · 2005-09-15 00:26 · Score: 1

Unit tests for GUIs are indeed challenging, but doable. I'll be talking about this (including Abbot) at Software Development Best Practices in Boston in a couple of weeks. I'll put my notes online after the talk.

Re:Really? on Effective XML · 2005-02-28 23:40 · Score: 1

Binary formats are fundamentally more opaque than text formats. You cannot just open up a binary file in emacs or jEdit and start hacking on it. They require special purpose tools to generate, edit, and consume. The more complex the format the more complex (and expensive) the tools become. Currently it's possible to generate completely well-formed, valid XML using nothing more complex than printf(). With a binary format, standard or otherwise, this would no longer be feasible.
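As a rough illustration of the printf() point, here is a sketch in Java using String.format. The class name PrintfXml and the greeting element are made up for the example, and it only aims at well-formedness (validity would also need a DTD or schema).

```java
final class PrintfXml {
    // Escape the characters that are significant as markup in element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build a tiny well-formed document with nothing but string formatting.
    static String greeting(String name) {
        return String.format(
                "<?xml version=\"1.0\"?>%n<greeting>%s</greeting>%n", escape(name));
    }

    public static void main(String[] args) {
        System.out.print(greeting("Tom & Jerry <cartoon>"));
    }
}
```

The ampersand has to be replaced first; otherwise the ampersands introduced by the other two replacements would be escaped a second time.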

Re:Disgruntled with XML.... on Effective XML · 2005-02-28 22:37 · Score: 1

The big problem arises when you try to process arbitrary XML by binding it to object structures. Consider, for instance, trying to data bind XHTML. It can be done, but you're unlikely to come up with anything simpler than DOM, in which case why not just use DOM?

Data binding tools tend to implicitly subset XML. That is, they assume things like:

Documents have schemas or DTDs.
Documents that do have schemas and/or DTDs are valid.
Structures are fairly flat and definitely not recursive; that is, they look pretty much like tables.
Narrative documents aren't worth considering.
Mixed content doesn't exist.
Choices don't exist; that is, elements with the same name tend to have the same children.
Order doesn't matter.

Different data binding tools have different subsets of these problems, but most have at least some of them.

The problem is that data binding tools tend to view the world through object-colored glasses. They assume elements are just a funny kind of serialized object, and they're not really. That said, if all you're doing with XML is serializing objects, the limitations of a data binding API may not bother you so much because you already have a class- and object-centric view of the world. However, if you start with arbitrary XML, you're unlikely to be able to bind it to anything much simpler than DOM without throwing information away.
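To make the mixed content and ordering points concrete, here is a small sketch using the DOM parser that ships with the JDK. MixedContentDemo is a hypothetical name. It lists the children of a narrative element in document order; a binding that mapped the element to an object with a list of em fields would have nowhere to put the interleaved text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class MixedContentDemo {
    // Returns the kinds of the root element's children, in document order.
    static String childKinds(String xml) {
        try {
            NodeList kids = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getChildNodes();
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < kids.getLength(); i++) {
                Node n = kids.item(i);
                if (i > 0) out.append(' ');
                // Text nodes are reported as "text", elements by their name.
                out.append(n.getNodeType() == Node.TEXT_NODE ? "text" : n.getNodeName());
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

For the input `<p>Read <em>this</em> and <em>that</em> now.</p>` the method returns `text em text em text`: three text runs and two elements whose relative order carries meaning.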

Re:Just because you CAN... on Effective XML · 2005-02-28 12:29 · Score: 1

CDATA sections don't need to nest. If you're trying to nest them, you're doing something wrong. CDATA sections are merely syntax sugar. (Items 9, 14 and 15) You absolutely can include the three character sequence ]]> in XML documents. You just have to escape the greater than sign as &gt;.
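A short sketch of that escape in practice, using the JDK's built-in DOM parser. CdataEndEscape and the data element are hypothetical names. Escaping every greater than sign as &gt; is more than strictly required, but it guarantees the sequence ]]> never appears literally in element content.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class CdataEndEscape {
    // Escape text for element content; escaping every '>' also covers "]]>".
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wrap the escaped text in an element, parse it, and return what the parser reports.
    static String roundTrip(String text) {
        String xml = "<data>" + escape(text) + "</data>";
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("document was not well-formed", e);
        }
    }
}
```

The parser undoes the escapes itself, so roundTrip("x[y[0]]>z") hands back the original string with no CDATA section involved.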

The point is not that escaping is not necessary when creating an XML document. The point is that the escapes you need are predefined and understood by the parser. You don't need to think about them.

I've seen way too many CSV and similar flat-file parsers that keel over and die (or worse, corrupt data without noticing a problem) when presented with data that contains commas, tabs, quotation marks, line breaks and the like.

XML avoids this by providing necessary escapes. Furthermore, when you receive an XML file, you know what the escapes are. You don't have to guess whether this file uses \" or "" or some other mechanism for escaping otherwise reserved characters. It's not that XML's escape mechanism is fundamentally better or worse than other escape mechanisms. It's just that it's standard enough that we can stop worrying about it.
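The failure mode is easy to reproduce. The sketch below contrasts a deliberately naive comma splitter with the JDK's XML parser on the same record; DelimiterDemo and the row, id, name, and state element names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class DelimiterDemo {
    // The kind of flat-file "parser" described above: it knows nothing about quoting.
    static int naiveCsvFieldCount(String line) {
        return line.split(",").length;
    }

    // Count the child elements of a record; commas in the data are just characters.
    static int xmlFieldCount(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getElementsByTagName("*")
                    .getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse record", e);
        }
    }
}
```

Given the line `42,"Harold, Elliotte",NY`, the naive splitter reports four fields instead of three, silently corrupting the record. The XML record `<row><id>42</id><name>Harold, Elliotte</name><state>NY</state></row>` yields three fields.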

Re:Disgruntled with XML.... on Effective XML · 2005-02-28 12:19 · Score: 1

Building dual hierarchies is indeed a problem when documents get large relative to available memory. In these cases you're normally better off using a streaming parser like SAX or StAX or System.Xml.XmlReader rather than a tree API like JDOM, DOM, etc. There's little to no extra overhead there, and it's a lot faster than writing your own grammar (more robust too). Possibly you can take a middle ground with XOM and only build subtrees in memory.
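For a sense of what the streaming approach looks like, here is a minimal StAX sketch. It assumes a JDK that bundles javax.xml.stream, and StreamingCount is a hypothetical name. It counts elements while holding only the current event in memory, so memory use does not grow with document size.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

final class StreamingCount {
    // Count elements with the given name without ever building a tree.
    static int count(String xml, String name) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int n = 0;
            while (reader.hasNext()) {
                // next() advances to the following event and returns its type.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(name)) {
                    n++;
                }
            }
            reader.close();
            return n;
        } catch (Exception e) {
            throw new IllegalStateException("could not read document", e);
        }
    }
}
```

A real program would read from a file or socket instead of a string; the loop stays the same.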

You might want to consider some of the XML data binding APIs. However, you need to be very careful when choosing one, as most of these tools have serious design flaws that are not always apparent at first glance; but if those flaws don't impact your specific application you may be able to get away using one.

Re:Really? on Effective XML · 2005-02-28 12:09 · Score: 1

XML works extremely well as a mechanism for language independent object persistence, precisely because XML is language independent. It's not tied to any one language's structures or data types. The key to using it this way is to simply define an appropriate XML format for your data, and then write the code to persist that format. It's actually quite easy to do.

The problems arise when you start drinking the snake oil that many object-to-XML mappers are trying to sell you, both in the payware and open source worlds. Way too many of these tools treat XML as just a format for persisting objects, and forget that XML structures are much richer than naive object mappings sometimes allow. Mixed content, document order, multiple child elements with the same name, and invalid documents are just some of the bugbears that haunt poorly designed OO-to-XML mapping tools.

However, if objects are what you start with, it's pretty easy to write them to XML and then read them back in again. Starting with arbitrary XML and going to objects is a lot trickier.
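A sketch of the easy direction, objects to XML and back. The Person class and its person, name, and age elements are a made-up format for illustration, and the reading side uses the JDK's DOM parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical value class with a hand-defined XML format.
final class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Write the object in the format we chose: <person><name/><age/></person>.
    String toXml() {
        return "<person><name>" + escape(name) + "</name><age>" + age + "</age></person>";
    }

    // Read the same format back in.
    static Person fromXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            String name = root.getElementsByTagName("name").item(0).getTextContent();
            String age = root.getElementsByTagName("age").item(0).getTextContent();
            return new Person(name, Integer.parseInt(age.trim()));
        } catch (Exception e) {
            throw new IllegalStateException("not a valid person document", e);
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The reader here only accepts documents shaped like the ones the writer produces, which is exactly why this direction is easy and the arbitrary-XML direction is not.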

Re:Really? on Effective XML · 2005-02-28 11:58 · Score: 2, Interesting

There's a very real tension between making examples too trivial to be interesting and making them too long to be readable. I struggle with it in every book I write, and every other programming book author I know does so too. I've tried putting so-called real-world examples in books, and it's hopeless. It can't be done. There wouldn't be any space left for the explanatory text, nor would anyone put up with reading page after page of code.

Most importantly, while I tend to be writing about just one topic at a time, real world programs wander all over the map. I may be trying to explain how to use callbacks in SAX, but a realistic program also has to consider network latency, GUI design, error logging, numerical algorithms, internationalization, and a hundred other things that aren't on topic. Covering them all would obscure the subject I'm actually trying to explain. Some things you just have to leave for other books and other authors.

As an author, I try to strike the right balance between excessive simplicity and excessive length. Sometimes I hit it. Sometimes I don't. I actually think Effective XML hits it fairly well. In fact, this book was one of the toughest I ever had to write, precisely because it was so short that I couldn't spew pages like I did in Processing XML with Java (1100 pages) or the XML 1.1 Bible (1000 pages). I had to be really picky about how much code I included, and make sure that each example carried its weight, demonstrated just the point at hand, and nothing else.

By the way, the chapter with that specific example is online if anyone cares to see for themselves just what it is that makes names a more interesting and complex problem than "John Doe Ph.D" seems to be at first glance.

Re:Disgruntled with XML.... on Effective XML · 2005-02-28 11:42 · Score: 2, Informative

Hmm, that's one I haven't been asked before.

I suspect what it offers is that you don't have to define and write your own BNF grammar, and then implement it in lex and yacc or similar tools.

Grammar design is non-trivial, especially if you need to consider issues like internationalization. Picking XML as the underlying format means you don't have to do this work yourself. Why reinvent the wheel?

Sometimes you do need something different, but a lot of alternative formats don't really have a good reason to exist. More often than not, custom parsers just come about because a programmer is more comfortable writing bad parsing code quickly than learning a new, more robust API in order to use someone else's parser.

Re:Just because you CAN... on Effective XML · 2005-02-28 11:35 · Score: 2, Informative

These days data has to be pretty damn simple to justify using a flat file rather than XML. I wrote more about this in my previous book, Processing XML with Java, than in this one, though. Chapters 1-4 discuss this in some detail.

Real-world data often gets messy in ways that don't lend themselves to flat files. For instance, two of the thorniest problems:

How do you handle encoding detection and international characters?
What do you do when the data contains characters you're using as field delimiters?

Both of these are completely solved by XML with no extra effort on your part, and these are hardly the only issues.
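The encoding half of that claim can be seen in a few lines. In this sketch EncodingDemo and the word element are hypothetical, and the text passed to encode() is assumed to contain no markup characters. The same word is written in two encodings and the parser is handed raw bytes each time, with no hint from the caller about which encoding was used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;

final class EncodingDemo {
    // Serialize a one-element document in the named encoding, declaration included.
    // Assumes text contains no '<' or '&'.
    static byte[] encode(String text, String encoding) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>"
                + "<word>" + text + "</word>";
        return xml.getBytes(Charset.forName(encoding));
    }

    // Parse raw bytes; the parser reads the declaration and picks the decoder itself.
    static String decode(byte[] bytes) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse bytes", e);
        }
    }
}
```

Calling decode(encode("caf\u00e9", "ISO-8859-1")) and decode(encode("caf\u00e9", "UTF-8")) both return the same four-character string, even though the byte sequences differ.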

I certainly agree that it's easier to write a parser for a flat file format than it is to write a parser for XML. However, it's much easier (and much more reliable) to use one of the existing well-tested, debugged XML parsers than it is to write your own flat-file parsing code.

Re:XML Seems Cool on Effective XML · 2005-02-28 11:27 · Score: 2, Informative

Please don't tar XML with the schema brush. One of the unique innovations of XML is that schemas are optional, and need not be agreed on. Schemas can be useful as I discuss in Item 37. However, they are misused and overused far more often than they're used correctly.

Really, schemas are just convenient tools for a few special purposes. Not everyone needs them, and no one needs them all the time. Schemaless XML is a lot more interesting and practical.

Re:Mod parent up on Effective XML · 2005-02-28 11:23 · Score: 1

There is no such thing as "the XML data model". There are XML data models, in fact any number of them. For instance, right now I'm working on a program that processes XML as a linear stream of events, with little if any hint of a tree structure anywhere to be found.
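One way to picture the event-stream view is a SAX handler that simply logs callbacks. This is a generic sketch with a hypothetical EventStream class, not the program mentioned above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class EventStream {
    // Record the parser's callbacks as a flat list; no tree is ever built.
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                log.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                log.add("end:" + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return log;
    }
}
```

For `<a><b/><c/></a>` the list is start:a, start:b, end:b, start:c, end:c, end:a. Whether the consumer treats that as a tree, a table, or just a sequence is its own business.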

There is not now, never has been, and never will be one canonical XML data model. XML is about syntax, not data models. Data models are local and non-exchangeable. Syntax is interoperable and transferable. This is one of the points I try to bring out in the book.

Re:hmmm on Effective XML · 2005-02-28 11:18 · Score: 5, Insightful

Ever try to debug deeply nested LISP in a plain vanilla text editor? Ever try to find exactly which closing parenthesis is missing where? That's why end-tags have names. It's pure human factors. Computers don't care about this. People do.

SGML (XML's precursor) did have minimized end-tags like </>. Experience proved this caused more pain than it alleviated. Hence the lack of minimized end-tags in XML.

Re:Really? on Effective XML · 2005-02-28 11:12 · Score: 1

I'm very skeptical of so-called binary XML formats, as you'll find in Item 50, Compress if Space is a Problem. There are use cases where XML isn't appropriate (and I discuss these in the book, mostly data scanned from nature such as JPEGs and MP3s) but it isn't at all clear how a binary encoding of XML would help these use cases. There are also environments like the smaller cell phones where XML doesn't (yet) work very well. Again, moving to binary doesn't necessarily address the underlying issues here. Furthermore, developing new formats tailored to special purposes and environments, such as cell phones and scientific data, tends to deoptimize XML for other uses. XML isn't an optimal format for any one use case, but it's a very nice compromise across many different areas.

The one use case a binary XML encoding does address well is the need of a number of vendors to sell expensive tools for working with data and hide people's data from them. XML is just too obvious and too cheap to justify lots of expenditures on tools. If you hide the text inside an opaque binary format that programmers need special (even patented) tools to view, why then, companies can sell tools again! Surprisingly, I don't find this use case too compelling. :-)

H1B visas are unfair to immigrants on Debugging Indian Computer Programmers · 2004-12-17 00:53 · Score: 1

The review (and perhaps the book--I haven't read it yet) completely misses the point of opposition to H1-B visas. Doubtless there's bigotry against immigrant workers, but H1-B visas are still a very bad thing for immigrants as well as native workers.

A person who comes to the U.S. on an H1-B visa is an indentured servant, allowed to be here for only a limited time. They must work for the company that hired them. They are paid less than the native workers they replace, laws to the contrary notwithstanding, and they do drive down wages for everyone. H1-B visa holders have almost no negotiating leverage, they are frequently mistreated, and they cannot stay in the U.S. to build businesses, raise families, or become part of the community over the long term unless they're lucky enough to get their visa status changed. The U.S. does get a lot of benefit from foreign workers, but only if they can stay here to become part of the community, not if they're forced to leave after a few years of toiling away in a cubicle.

Guest worker status has to be eliminated. It's unfair to the new immigrants and it's unfair to permanent residents and citizens. If industry and the government really cared about filling skills gaps as they claimed, rather than just driving down the cost of labor, the solution is simple: replace H1-B, L1, H2 and all similar guest worker visas with green cards. Once a person is allowed into the U.S. to work, let them do so, just like any citizen, without any restrictions. They can take any job they want. They can quit at any time to go to a better job or no job at all. They can start their own businesses. Give immigrants the same rights native-born workers get, and the mistreatment of immigrant workers would vanish overnight, at least in high-tech, and it would be significantly reduced in lower-skill jobs like farm labor. The negative pressure on wages would be reduced too. Finally, immigrants would be allowed to really contribute to building the future of the country and their local communities over the long term. This helps the native workers. It helps the immigrant workers. The only ones that it hurts are the corporations that would have to pay a fair market price for labor instead of purchasing indentured servants at auction.

XOM and nu.xom on Niue WiFi Network Gone, .nu TLD May Follow · 2004-01-12 03:34 · Score: 1

I'm currently working on an open source library (LGPL) written in Java called XOM, http://www.xom.nu/. Its package name is nu.xom (a new XML Object Model) and I have registered xom.nu to stay out of the clutches of NSI. Now I'm wondering if I should change that, for reasons of both stability and good karma.

The only thing I'm sure of after reading this thread, and the references linked from it, is that there are at least two sides to this story, and it's not at all clear to me who, if anyone, is in the right. It's possible the administrators of the .nu domain are scamming greedheads trying to make a buck off a resource they don't own without feeding anything back to the people who do own it. It's also possible they're benevolent men of good will who are trying to provide free Internet access to the entire nation in the face of government demands for bribes. I do not know. In my usual cynical nature, I tend to suspect both sides may well be in the wrong.

Regardless of who's the good guy and who's the bad guy, I'm not sure I want to get tied up in this dispute. I'm now wondering if nu.xom is an appropriate package and domain name. Would anyone care to comment? If anyone is more familiar with the issues involving the .nu domain name, I'd love to hear about it. Also, would anyone like to suggest a different non-NSI affiliated top-level domain XOM could use? I haven't declared alpha yet, but I'm very close, and I'd prefer not to make such a major change after that point.

Re:Why do I have the feeling... on Effective XML · 2003-11-25 04:02 · Score: 3, Insightful

You're right about that. It doesn't. Not all technologies that are isomorphic to each other are equally useful, any more than all Turing complete programming languages are the same. The representation matters, and the XML representation has proven more useful and accessible than the S-expression representation.

I'm not fully convinced that S-expressions are isomorphic to XML either. The proper handling of Unicode and non-English, non-ASCII text presented in multiple encodings is a big advantage of XML compared to S-expressions. I suppose something like this could theoretically be added to S-expressions, but has it been?

Re:One thing is for sure... on Effective XML · 2003-11-25 03:52 · Score: 2, Funny

Hmm, I was wondering what I could do for a sequel. (Only half-kidding).

Several chapters are online on Effective XML · 2003-11-25 03:50 · Score: 4, Informative

Nice review. Thanks! It's interesting how many of the comments here relate directly to chapters in the book. For instance, there's a lot of concern about XML's perceived verboseness. This is addressed directly in Item 50, Compress if space is a problem. This chapter and ten others are online at http://www.cafeconleche.org/books/effectivexml/ . Check it out.
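As a rough illustration of why general-purpose compression handles verbosity well, the sketch below gzips a deliberately repetitive document using java.util.zip. CompressXml and the table, row, id, and status elements are invented for the example, and the exact ratio will vary with the data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class CompressXml {
    // Build a repetitive document of hypothetical <row> records.
    static String sample(int rows) {
        StringBuilder sb = new StringBuilder("<table>");
        for (int i = 0; i < rows; i++) {
            sb.append("<row><id>").append(i).append("</id><status>active</status></row>");
        }
        return sb.append("</table>").toString();
    }

    // Compress the UTF-8 bytes of the document with gzip.
    static byte[] gzip(String xml) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new IllegalStateException("compression failed", e);
        }
        // The stream is closed by now, so the gzip trailer has been written.
        return buf.toByteArray();
    }
}
```

Repeated tag names are exactly the kind of redundancy gzip removes, so the compressed form of sample(1000) comes out far smaller than the text while the uncompressed text stays readable in any editor.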

Comments · 21