No Nonsense XML Web Development with PHP

Posted by ryuzaki0 on Wednesday March 15, 2006 @05:46AM from the serious-development dept.

Alex Moskalyuk writes "PHP and XML seem like a marriage made in heaven. Powerful manipulation functions and core-language support in PHP5, combined with the universal extensibility of XML, make it a technology of choice for quite a few Web enthusiasts and companies out there. However, anyone inspired by PHP's ease of use can probably find a good cure for insomnia when faced with the XML specs. With all the DTDs, XML Schemas, XSLT and XPath queries, one can easily get the impression that the world is changing on them, and that sticking to hard-coded HTML with PHP statements combined with SQL statements for data retrieval would be within the zone of comfort." Read the rest of Alex's review.

No Nonsense XML Web Development with PHP
author: Thomas Myer
pages: 354
publisher: SitePoint
rating: 9/10
reviewer: Alex Moskalyuk
ISBN: 097524020X
summary: XML, XSLT, XPath and DOM primer for PHP developers

Thomas Myer's No Nonsense XML Web Development with PHP is an XML primer for those who have been exposed to PHP but have yet to appreciate the elegance of PHP+XML solutions. Over 10 chapters and 2 appendices Myer introduces the reader to different aspects of XML, their best-practice implementations in a LAMP (where the last P stands for PHP) environment, and their relevance to the real world. For the real-world example Myer guides the reader through writing a custom content management system, complete with publishing/admin interface, templating/presentation layer, search engine, RSS feeds and other commonly expected features.

The book is not an introduction to PHP, and it assumes that the Web developer knows what XML is but has never worked with it. So the first chapter covers parsing XML with IE and Firefox, validating an XML document, and the differences between a well-formed and a valid XML document. Overall, it provides a very good introduction for someone who has never dealt with XML, and could probably be skipped by developers with XML exposure.

Chapter 2, XML in Practice, goes into the nitty-gritty details of XML, and 26 pages later the reader knows how to create an XML file to display in the browser, declare proper namespaces, attach a CSS file to an existing XML file and display the resulting XML+CSS file (look, Ma, no <html>!) in the browser. The author earns instant geek credibility by using Firefox screenshots throughout, except for an IE screenshot whenever IE is discussed. At the end of the chapter the author takes us through basic XSLT.

DTDs, XSLT and writing a practical PHP app take up the next three chapters, followed by the XML manipulation chapters. JavaScript enthusiasts will probably find Chapter 6 pretty useful, as it discusses manipulating XML on the client side, working with XSLT, and creating dynamic site navigation based on the XML source. Chapter 7 is what one would expect from a book that has the words PHP and XML in the title: a discussion of the SAX, DOM and SimpleXML parsers, examples of their implementation, and the proper use cases for each of the technologies. The SimpleXML subchapter also contains a good primer on XPath, a query language that lets the developer give the parser a query for navigating down the XML document.
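
To make the XPath idea concrete, here is a minimal sketch. It is written in Python rather than with PHP's SimpleXML, and the document, element names and query are invented for illustration; none of it is taken from the book.

```python
import xml.etree.ElementTree as ET

# Hypothetical CMS content; element and attribute names are assumptions.
XML = """
<articles>
  <article status="live"><headline>XML basics</headline></article>
  <article status="draft"><headline>XSLT tricks</headline></article>
  <article status="live"><headline>RSS feeds</headline></article>
</articles>
"""

def live_headlines(xml_text):
    """Return the headlines of articles whose status attribute is 'live'."""
    root = ET.fromstring(xml_text)
    # ElementTree implements a limited XPath subset, including
    # attribute predicates such as [@status='live'].
    return [h.text for h in root.findall("./article[@status='live']/headline")]

print(live_headlines(XML))
```

The query string does the navigation; the surrounding code never walks the tree by hand, which is the convenience the review attributes to XPath.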

Chapter 8 takes the reader through RDF and RSS, and discusses how syndication feeds are used on the Web nowadays. Since throughout all these chapters we're building a content management system, this is the right time to add the RSS headlines functionality to the site. The next chapter discusses another practical implementation of XML on the Web: XML-RPC calls between sites and proper ways of exchanging data via XML Web services. The chapter discusses SOAP, although not a whole lot, and just mentions REST as another way to implement Web services. As a practical exercise, the author takes readers on a tour of building an XML-RPC client and server and connecting the two.
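
For readers who have not seen XML-RPC on the wire, the following sketch shows what a call looks like once serialized. It uses Python's standard xmlrpc.client module rather than the PHP libraries the book covers, and the method name cms.getHeadlines and its parameters are hypothetical.

```python
import xmlrpc.client

def encode_call(method, *params):
    """Serialize a method call into an XML-RPC request body."""
    return xmlrpc.client.dumps(params, methodname=method)

def decode_call(payload):
    """Parse a request body back into (method name, parameters)."""
    params, method = xmlrpc.client.loads(payload)
    return method, params

# Hypothetical method and arguments, for illustration only.
request = encode_call("cms.getHeadlines", "news", 5)
print(request)
print(decode_call(request))
```

Each argument travels inside a typed element (a string, an int and so on), which is why a client and server written in different languages can agree on what was sent.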

The last chapter talks about using XML with databases. Native XML databases are discussed, but let's face it: most PHP development is done with relational databases anyway. Myer talks about exporting MySQL database contents into XML with phpMyAdmin and mysqldump. The first appendix includes a function reference for SAX, DOM and SimpleXML parsing in PHP, while the second one completes the CMS project by providing the rest of the necessary files.
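
The general shape of a relational-to-XML export can be sketched as follows. This is an assumption-laden illustration: it uses Python with an in-memory SQLite table as a stand-in for MySQL, and the row/field element names are only loosely modeled on typical export output, not the exact format produced by phpMyAdmin or mysqldump.

```python
import sqlite3
import xml.etree.ElementTree as ET

def table_to_xml(conn, table):
    """Dump every row of `table` as <row><field name="col">value</field></row>."""
    # The table name is interpolated directly, so it must come from trusted code.
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    root = ET.Element("table_data", name=table)
    for record in cur:
        row = ET.SubElement(root, "row")
        for col, value in zip(columns, record):
            field = ET.SubElement(row, "field", name=col)
            field.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical table standing in for the CMS content.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER, headline TEXT)")
conn.execute("INSERT INTO articles VALUES (1, 'XML basics')")
print(table_to_xml(conn, "articles"))
```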

I found the author's style approachable and very easy to follow. The code samples are succinct and to the point, and there are no generic discussions such as "Why PHP?" The project chosen for the practical implementation is a bit boring, but at the same time quite real-world. The screenshots are clear, and code examples are nicely highlighted. Errata are provided on the book's Web site, and the code archive is available there as a single file download. The book site also offers a 100% money-back guarantee (less shipping and handling fees) to anyone who bought the title and didn't feel they were getting their money's worth.

However, there are a few drawbacks that I noticed as well. With topics like XSLT and XPath broken into several chapters and discussed in smaller chunks, it's hard to use the book as a reference later on. Appendix A, with its PHP function reference for XML parsing, hardly seems like a worthy addition, since the PHP manual page on the subject contains equivalent information with more real-life examples contributed by users.

With all that, the book is quite informative, educational and useful. The author manages to tackle quite a few difficult topics in the 260 pages provided to him (the count excludes preface and appendices). Kudos to the author for writing chapters on XML without sounding boring, redundant or too academic. I would highly recommend this book to anyone interested in developing PHP-driven Web sites that provide or consume Web services, work with XML data or generate XML for others to use.

You can purchase No Nonsense XML Web Development with PHP from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

19 of 131 comments

Better matched, perhaps, than Perl & XSLT by Orrin+Bloquy · 2006-03-15 05:54 · Score: 3, Insightful

One thing I will concede to PHP is that you tend to be more likely to have XSLT engines installed on a PHP based system, whereas I had to cajole my sysadmin into getting the C-based transformation libraries installed and then locally install the dependent Perl libraries to use it on top of that. In the end the Perl/XSLT solution I created works, but it wasn't fun to install.

--
"Made up/misattributed quote that makes me look smart. I am on /. and I must look smart."
1. Re:Better matched, perhaps, than Perl & XSLT by icklemichael · 2006-03-15 06:00 · Score: 4, Informative
  
  Any recent install of java will almost certainly have an xslt processor on it, you just have to remember the magic incantation:
  java org.apache.xalan.xslt.Process -XSL [template] -IN [file]
Code Download by saberworks · 2006-03-15 05:56 · Score: 4, Informative

I tried to download the code examples but the site is asking me for a special code that is printed inside the book. Bleh.
1. Re:Code Download by twitchkat · 2006-03-15 06:19 · Score: 3, Informative
  
  <?php $f = fopen('/usr/share/dict/words'); while ($w = readline($f)) { $c = curl_init(); curl_setopt($c, CURLOPT_URL, 'http://www.sitepoint.com/books/xml1/code.php'); curl_setopt($c, CURLOPT_POST, 1); curl_setopt($c, CURLOPT_POSTFIELD, array ('word1'=>$w, 'boughtfrom'=>'amazon', 'country'= >'US', 'email'=>'foo@bar.com', 'submit'=>'Downlo ad Code Archive')); curl_setopt($c, CURLOPT_RETURNTRANSFER, 1); $r = curl_exec($c); curl_close($c); if (strstr($r, 'Code Word')) { continue; } write_to_file('result.zip', $r); } ?>
Can't please everyone, can you? by RobertB-DC · 2006-03-15 06:00 · Score: 4, Interesting

From the bn.com reviews, a contrarian view:

Michael, a web programmer, February 7, 2006,
Almost worthless.
Based on the title, one might presume that Myer and Marini wrote the book for people who are already familiar with PHP and XML and want to learn some advanced techniques for combining them. What he gets instead are long (relative to the book itself), superficial introductions to PHP and XML and tiny, trivial examples of their combination. Everything in the book is common sense to someone who already knows PHP and XML. What the book teaches to beginners, however, is effectively useless for its superficiality, so I'd discourage anyone, especially beginners, from reading this book, even if he receives it for free. Time also is too valuable to waste on this book. Read 'PHP and MySQL Web Development' by Luke Welling and Laura Thompson and 'XML 1.1 Bible' by Elliotte Rusty Harold. One can visit SitePoint's web site to find a list of their titles and then return to a vendor site to read product reviews. SitePoint books are generally sub-par. This book is no exception.

Somewhere, someone at bn.com is shaking their head, wondering if this "reader reviews" thing is all that good a deal after all.

(FWIW: I think the book looks like just what I need, with my n00b level of knowledge of PHP and XML but with hopes to put them together myself, if I can just find the right feed.)

--
Stressed? Me? Of course not. Stress is what a rubber band feels before it breaks, silly.
XML/XSLT is often more work than it's worth by markmcb · 2006-03-15 06:01 · Score: 4, Insightful

I authored the site OmniNerd. When I first started writing code, I made a point of storing data either in a database or XML, translating data to XHTML with XSLT, using CSS for all style issues, and controlling everything with PHP. What I struggled with for over a year was the XML/XSLT portion of the site. I was constantly having to jump through all sorts of hoops to get things done that could easily be handled with just PHP and a database.

This isn't intended to be me bashing XML/XSLT, but more of a warning. If you plan to use these two, ensure you fully understand them and how they will tie into your site. I've found with OmniNerd that XML/XSLT solutions are very nice for the more static or semi-static content and that using PHP to generate XHTML directly from the database is better suited for dynamic content.

Whatever you choose to use though, good luck!

--
Mark A. McBride -- OmniNerd.com
1. Re:XML/XSLT is often more work than it's worth by G)-(ostly · 2006-03-15 06:13 · Score: 5, Insightful
  
  XML is not for "storing data". I can't believe people still find that confusing in this day and age. XML is for describing data. It's little more than a loosely built, glorified file format. It serves no more purpose to data than tabs separating "columns" in a text file do.
  
  XML is good for transferring data between systems. It is not good for storing data, which is what databases are for, or presenting data, which is what applications are for.
  
  --
  From now on, I buy only Intel.
2. Re:XML/XSLT is often more work than it's worth by fm6 · 2006-03-15 06:17 · Score: 3, Insightful
  
  That's a very good analysis. I'm a strong XML/XSLT advocate, but only because I work with the kind of documents that need them: big nasty technical manuals and guides that have a lot of complicated structure, are always being updated, and have to be delivered in multiple formats. When someone challenges my XML dogma, they always point to some project they've worked on that would have been much harder if they'd had to use XML. Most of the time (not always!) they're right, usually because the particular project is a one-shot document that will see little or no revision. Of course, that just says that XML is useless to them.
  XML is a key technology, and much underused by my profession, which still relies too much on FrameMaker, Word, and (God help us!) plain old HTML. But it's not the solution to every content management problem.
3. Re:XML/XSLT is often more work than it's worth by markmcb · 2006-03-15 06:29 · Score: 3, Insightful
  
  XML is not for "storing data".
  
  Well, in the classroom you may be correct, but when you're looking for solutions, XML is oftentimes a better place to store static data than a database. A perfect example is on OmniNerd: when one of our articles gets Slashdotted, or we think it's going to be, we bypass the database and create a static copy of our article in XML. It's faster since no "thought" is required to query specific data as it's all just there. The result has been that our server doesn't flinch when the massive wave of HTTP requests hits our site.
  
  I also use it to store data for parts of the site that remain static. Why insert my FAQ into my database if it's not structured in a dynamic manner? It's far easier for me to go edit an XML file than run a bunch of queries, and we already mentioned the removed burden from the database.
  
  Consider the alternative of storing it in an XHTML file. If I change the style of my site, then I have to update the XHTML file too as it's static. I can quickly translate the XML via XSLT with PHP, ASP, etc. There's no need to touch the data when I make a structural change. So given the static nature not requiring a database, the desire for easy updates, and the need to remove data from structure, I still choose XML.
  
  So, yes, from a purist perspective it's for describing data. But from the perspective of someone trying to run a functional and effective site, it can be useful for storing certain data as well.
  
  --
  Mark A. McBride -- OmniNerd.com
4. Re:XML/XSLT is often more work than it's worth by IMarvinTPA · 2006-03-15 06:33 · Score: 3, Informative
  
  Agreed.
  XML is designed to be an exchange format.
  Databases are designed to hold and maintain data.
  Applications are designed to present and modify the data.
  
  My database talks to my application which talks in XML to talk to your application to talk to your database.
  
  XML/XSLT is more work than it is worth because it is forcing the square block into the round hole with a hammer. XSLT is for converting somebody else's XML into the XML your application wants to consume.
  If you have a limited number of users of your raw data, you can just as easily talk to them and give them a simple CSV or TSV file and move on with life.
  
  IMarv
  
  --
  Trusting software vendors is no smarter than trus
5. Re:XML/XSLT is often more work than it's worth by G)-(ostly · 2006-03-15 06:48 · Score: 5, Insightful
  
  You're not storing the data "in XML", you're storing it on the filesystem in files that describe the data via XML. The performance benefit of the static data over the RDBMS data store is provided by the filesystem, not as a function of XML. To the contrary, your retrieval of the data is actually hindered by the XML because it increases the size of the files that must be retrieved and transferred.
  
  --
  From now on, I buy only Intel.
Malice by Savage-Rabbit · 2006-03-15 06:09 · Score: 4, Funny

XML stands for Xtremely Media-hyped Language and PHP stands for Perl-Hater's Platform. They are both very overused and should be ignored from this point on. Oh crap. I guess I get a free downmod for going against Slashdot culture. Oh well.

Dude, calm down! Hating Perl is not something developers do out of malice. It's a bit more like the obvious conclusion a child draws about fire after getting burned for the first time. Of course there are also some people, like you for example, who enjoy pain....

--
Only to idiots, are orders laws.
-- Henning von Tresckow
A compromise? by MasterC · 2006-03-15 06:11 · Score: 4, Insightful
Since I started using PHP's DOM functions, I haven't written a lick of hard coded HTML except for templates that I import into DOM. I create template tags within the template as hook points so on loading the template into DOM I can cache a list of all these template hooks (and remove them so the template is back to valid HTML) and then I can inject my dynamic content directly into where the hooks are.

Some quick advantages:
- You don't have to worry about closing your tags, just assigning parents
- You can modify your tree at any point in execution (such as style changes, removing sections of the page based on user input, etc.)
- Outputting HTML or XHTML doesn't change your DOM tree
- You can more easily write code with more separation between functionality (model) and interface (view)
- If an error occurs then you don't have to worry about the "headers already sent" issue
- You can easily create DOM manipulation libraries to do a lot of the tedious tasks for you (element creation, attribute population, etc.)
So even if you don't want to get into XML, XSLT, etc. then using the DOM for page generation is a much better solution than the traditional mixing HTML into PHP into files. The only qualifier to that I can think of is very small sites and when you don't have said libraries and such built up.
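
The template-hook approach described above can be sketched roughly like this. It is a Python illustration using xml.dom.minidom rather than PHP's DOM functions, and the hook element, its name attribute and the render helper are all hypothetical names chosen for the example.

```python
from xml.dom import minidom

# Hypothetical template: <hook name="..."/> marks an injection point.
TEMPLATE = "<html><body><h1>News</h1><hook name='content'/></body></html>"

def render(template, fragments):
    """Replace each <hook name="..."/> with a <p> holding the matching text."""
    doc = minidom.parseString(template)
    # Copy the result list before modifying the tree.
    for hook in list(doc.getElementsByTagName("hook")):
        para = doc.createElement("p")
        para.appendChild(doc.createTextNode(fragments[hook.getAttribute("name")]))
        hook.parentNode.replaceChild(para, hook)
    return doc.documentElement.toxml()

print(render(TEMPLATE, {"content": "Tags < & > are escaped for you"}))
```

Because the content goes in as a text node, the serializer handles escaping and tag closing, which covers the first advantage in the list; nothing is written to the client until the tree is serialized at the end.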

When else would hard coding HTML be preferred? I'm drawing a complete blank.
--
:wq
1. Re:A compromise? by Bogtha · 2006-03-15 06:22 · Score: 5, Insightful
  
  When else would hard coding HTML be preferred?
  
  The downside to using the DOM as you describe is that you need to generate the whole document before you start sending it. For example, imagine if Slashdot used your approach - on a page with hundreds of comments, you'd have to wait for every last comment to be added to the DOM before you even started to send the headline to the user.
  
  --
  Bogtha Bogtha Bogtha
Re:wut by Anonymous Coward · 2006-03-15 06:24 · Score: 3, Insightful

XML is absolutely not all it's hyped up to be.

That said, as any Lisp programmer will tell you, tree-structured data is a Good Thing(TM). There's a reason why reading in input like:
Mar 15 12:32:31 localhost dhclient: DHCPREQUEST on eth0 to 192.168.5.5 port 67
is complicated and fragile, whereas reading in input like:
(logentry (date (month Mar) (day 15) (time 12:32:31)) (host localhost) (sender dhclient) (message "DHCPREQUEST on eth0 to 192.168.5.5 port 67"))
is so trivial that, well, I just typed this into DrScheme:
(define logdata (read))
and copy-pasted the second one into the input box, and DrScheme understood it perfectly.

Regexps are basically a hack to deal with data, like the first log file (which is what it actually looks like on my system), where the structure has been compressed/eliminated. In a perfect world, everything would be tree-structured, and none of those hacks would be necessary.

But wait... that's XML!
<logentry><date><month>Mar</month> <day>15</day> <time>12:32:31</time></date> <host>localhost</host> <sender>dhclient</sender> <message>"DHCPREQUEST on eth0 to 192.168.5.5 port 67"</message></logentry>
It's harder to read than the parenthetical version, and slightly harder to parse (especially if there are attributes inside the XML tags), but the two are basically equivalent.

In Scheme, at least, you can build a generic XML-to-s-expression parser that will allow you to deal with any XML data that comes at you as easily as if it were parenthetical. And by generic, I mean that it can deal with any (well-formed) XML data ever. By contrast, regexps are fragile by definition. Even splitting along whitespace isn't always safe.
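
The same contrast can be sketched in Python (used here only as a neutral stand-in; the regex and helper names are made up for this example). The flat parser has the log layout baked into its pattern, while the tree walker knows nothing about log files at all.

```python
import re
import xml.etree.ElementTree as ET

LINE = "Mar 15 12:32:31 localhost dhclient: DHCPREQUEST on eth0 to 192.168.5.5 port 67"
XML = ("<logentry><date><month>Mar</month><day>15</day><time>12:32:31</time></date>"
       "<host>localhost</host><sender>dhclient</sender>"
       "<message>DHCPREQUEST on eth0 to 192.168.5.5 port 67</message></logentry>")

def parse_flat(line):
    """Regex approach: the field order and separators live in the pattern."""
    m = re.match(r"(\w{3}) +(\d+) ([\d:]+) (\S+) ([^:]+): (.*)", line)
    names = ("month", "day", "time", "host", "sender", "message")
    return dict(zip(names, m.groups()))

def parse_tree(xml_text):
    """Generic walk: nested dicts for elements with children, text for leaves.

    Simplification: repeated sibling tags would overwrite each other.
    """
    def walk(node):
        children = list(node)
        return {c.tag: walk(c) for c in children} if children else node.text
    return walk(ET.fromstring(xml_text))

print(parse_flat(LINE))
print(parse_tree(XML))
```

If the log format gains a field, parse_flat needs a new pattern, whereas parse_tree keeps working and simply returns one more key.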

As far as PHP goes, I couldn't care less... it's both slower and less flexible than Scheme. What a combo! (Of course, Perl is too... ;)
Re:wut by Cereal+Box · 2006-03-15 06:34 · Score: 3, Informative

The beauty of XML is that the format is simple and there is a huge stack of technologies that build on it. If you store some data in an XML format, you instantly have the ability to transform your data into any number of formats (via arbitrarily-complex XSL transforms), perform automatic validation (via XML schema or a DTD), perform arbitrarily-complex queries on your data (via XPath/XQuery), automatically include other resources (XInclude), etc. Thanks to namespace support, you can aggregate multiple XML data formats into a single document -- an example of which is XHTML, which allows a single web page to include things like mathematical annotations (MathML), vector graphics (SVG), multimedia (SMIL), complicated input forms (XForms), and so on. Like so many other people, you just see XML as a substitute for comma separated value files, and don't realize the rich set of complicated functionality that's available to you "for free" just by storing your data in an XML format.

And BTW, XML is a tree format, not strictly key/value. And when you parse an XML file, you're never having to do direct text manipulation (which is error-prone). You're either receiving the information stored in the XML file as a series of events (SAX) or you're manipulating it via an object model (DOM).
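
A small sketch of the two styles, using Python's standard xml.sax and xml.dom.minidom modules; the feed document and the ItemCollector class are invented for the example. Both functions pull the same data out, one from a stream of events and one from a loaded tree.

```python
import xml.sax
from xml.dom import minidom

XML = "<feed><item>first</item><item>second</item></feed>"

class ItemCollector(xml.sax.ContentHandler):
    """SAX: react to start/characters/end events as the parser streams."""
    def __init__(self):
        super().__init__()
        self.items = []
        self._buf = None  # collects text only while inside an <item>

    def startElement(self, name, attrs):
        if name == "item":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buf))
            self._buf = None

def items_via_sax(xml_text):
    handler = ItemCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.items

def items_via_dom(xml_text):
    """DOM: load the whole tree into memory, then navigate it."""
    doc = minidom.parseString(xml_text)
    return [node.firstChild.data for node in doc.getElementsByTagName("item")]

print(items_via_sax(XML))
print(items_via_dom(XML))
```

In neither case does the application split or pattern-match raw text; the parser owns that job.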
Re:wut by Bogtha · 2006-03-15 06:57 · Score: 3, Informative

I think the parenthetical version and the XML version are about equal in terms of readability once you remember that any decent editor will have syntax highlighting to emphasise the text over the tags and that both versions will typically be split over multiple lines. Linebreaks don't really aid readability when you have short ending delimiters, but they do when you have longer ending delimiters.

The idea that XML is just a reinvention of s-expressions is quite popular, but this article does a decent job of explaining how they differ.

--
Bogtha Bogtha Bogtha
It's not just key/value pairs. by jkeegan · 2006-03-15 07:05 · Score: 3, Informative

No, it's not just key/value pairs. It's hierarchical. The hierarchy itself contains data - where things are located. You can't express rich hierarchical data easily in a flat key/value layout.

In an XML file, I can throw in extra attributes or elements that won't be read by an old version of an app that wasn't looking for them. In a simple comma-separated-values layout, if I add something to the format, it's completely incompatible with previous versions.

The most complicated tools you have for comma-separated values are along the lines of cut and sed. When you have an XML document, you can convert it to *any* other XML format with a simple XSLT stylesheet (or, for that matter, into non-XML formats). SQL-Select-like statements can be represented with XPath, letting you select various fields of nodes which contain a certain attribute, act on them a certain way, etc.
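
The compatibility point can be shown with a short Python sketch; the show/title/channel document and the old_reader function are hypothetical. A reader that asks for elements by name is unaffected when a newer writer adds an attribute and an element it has never heard of.

```python
import xml.etree.ElementTree as ET

# Version 1 of a hypothetical format.
V1 = "<show><title>Nova</title><channel>12</channel></show>"
# A later writer adds a rating attribute and an <episode> element.
V2 = ('<show rating="TV-G"><title>Nova</title><channel>12</channel>'
      "<episode>Comets</episode></show>")

def old_reader(xml_text):
    """Version-1 reader: looks up only the two elements it knows by name."""
    root = ET.fromstring(xml_text)
    return root.findtext("title"), root.findtext("channel")

print(old_reader(V1))
print(old_reader(V2))
```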

And anyway, would you look at an HTML document and say "it's just key-value pairs"? No! The order of elements, the hierarchy of data, etc., all make up the page as a whole. HTML was an application of SGML, which XML was derived from. Use XHTML if that last bit confuses you - it's not key/value pairs.

People have thrown the buzzwords at you because they're either really impressed with the technology, or because they're the kind of people that like buzzwords. Ignore the latter group of people, and try to focus on why those of us in the first group are singing its praises.

--

..Jeff Keegan
seven syllables explain TiVo: kee gan dot org slash ti vo
Joy and Sorrow by Tom · 2006-03-15 11:27 · Score: 4, Insightful

I can't imagine two languages less suited for mixing than PHP and XML.

PHP is loosely typed, full of hacks (excellent hacks that make coding easier) and is great exactly because it allows the coder to be pretty careless and have the language look out for him as far as possible.

XML, on the other hand, is strict and harsh on the coder. Forgot to close a tag? Wrong character somewhere? Not got the tag order correct? Sorry, your entire tree fails parsing.
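
A quick way to see that strictness, sketched in Python with the standard ElementTree parser (the is_well_formed helper and the sample documents are made up for this example): one unclosed tag or one bare ampersand and the whole document is rejected.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses, False on any well-formedness error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<post><title>Hi</title></post>"))  # correct document
print(is_well_formed("<post><title>Hi</post>"))          # <title> never closed
print(is_well_formed("<post>Tom & Jerry</post>"))        # unescaped ampersand
```

This covers well-formedness only; checking tag order against a DTD or schema is a separate validation step that this parser does not perform.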

They just don't mix well, and it shows everywhere. I'm currently coding a PHP app using XML-RPC, and gosh is it convoluted. You've gotta cast practically everything into the special XML-RPC values and back out again. You'd expect the libraries to have functions doing that for you, but you'd be mistaken. On the average line stuffing together an XML-RPC call, the whole "new XML_RPC_VALUE" stuff takes up twice the space of the actual variables.

Doesn't mix well. Sorry, I like PHP a lot and XML is an excellent thing. But they just don't mix well.

--
Assorted stuff I do sometimes: Lemuria.org