Re:Until they bother fixing critical bugs... by slamb · 2008-06-07 02:21 · Score: 1 · on Firefox 3 Hits Release Candidate 2

I wish to download your proxy server, and subscribe to your newsletter.

All right. It's a work in progress (no keepalives yet; requires libevent trunk + a few not-yet-merged patches), but I'm willing to share what I have. I'm traveling now, but I'll put something on my website Monday night. I'll call it FoxIgniter, unless someone has a better name.

Re:Virtualisation negates the need for a compile f by slamb · 2007-03-11 05:28 · Score: 1 · on Alternatives To SF.net's CompileFarm?

Virtualization is great, but it's not a perfect solution here:

Installing a dozen operating systems is a lot of work. It was nice to be able to take advantage of work someone else has already done.
If you use continuous integration tools like buildbot to test after every checkin, it's best to leave the systems running all the time. I don't have enough RAM to have a dozen operating systems running on my machine at once. VMware at least has some ability to be started and stopped programmatically, but that's more work and is obviously slower.
Virtualization tools don't target other architectures, and emulation tools (like QEMU) are generally slow. Not everything is x86.

I used to use SourceForge's Compile Farm (in addition to HP's Test Drive) to test sigsafe. I need to write assembly for the cross-product of supported processors and operating systems. Without the ability to log in, compile, run my automated tests, and use a debugger, I can't support a platform. This decision means I'll have to drop sparc support. It's a shame - I learned a lot from writing assembly for these different platforms.

Re:Perforce? by slamb · 2006-12-01 12:18 · Score: 2, Informative · on Getting a Grip on Google Code

The write up of what I do is here: http://www.mcternan.co.uk/PerforceBackup/

Interesting! I'll have to look it over more later.

For comparison, I've put the latest (not yet deployed) version of our offline checkpoint process here. (It's a NetVault backup script; pre locks and does the checkpoint, post touches a file signalling success to our monitoring and releases the lock). It's a procedure outlined by Perforce, though they didn't mention error handling...

Re:Sure, blame the "untrained" developers.... by slamb · 2006-10-05 05:45 · Score: 1 · on How Prevalent Are SQL Injection Vulnerabilities?

In short, trusting the client (i.e. the web browser) to not send bad values - either through the INPUT tag's maxlength attribute, JavaScript scrubbing or whatever - is entirely the wrong way to go. The web script must check all user input for validity along with properly escaping everything from the database that's getting sent back via HTML.

It's not just "bad" values, unless you believe that all people with names like O'Malley are out to destroy your website. SQL Injection is a very simple problem: people are confusing arbitrary data with fragments of SQL. The former should be passed in through bind variables - or escaped if you're hellbent on destroying your performance - while the latter should be executed directly. It's fortunate for the O'Malleys out there that this mistake is a huge security hole, or they would be rejected by any kind of automated system. People barely care about security; they certainly don't care about "lesser" bugs.

I often see people claim you just check for bad characters - either specific lists of punctuation or anything outside the alphanumeric range - because you can't possibly get all of your database code right. That's an incredibly wrong idea:

You can and must get all of your database code right. I developed Axamol SQL Library for this purpose. Other solutions vary.
The "nothing but alphanumeric" crowd's code doesn't work with Unicode. So José can't type his name in properly.
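A minimal sketch of the bind-variable point, using Python's standard-library sqlite3 module purely for illustration. The table and function names are hypothetical, and this is not the Axamol SQL Library API. The names travel as data, so the apostrophe and the accented letter need no special handling, while an ASCII-only whitelist would turn both people away.

```python
import re
import sqlite3


def add_person(conn, name):
    # The ? placeholder is a bind variable: name is passed as data, never as SQL text.
    conn.execute("insert into people (name) values (?)", (name,))


def find_person(conn, name):
    row = conn.execute("select name from people where name = ?", (name,)).fetchone()
    return row[0] if row else None


def ascii_whitelist_ok(name):
    # The "nothing but alphanumeric" style of check criticized above.
    return re.fullmatch(r"[A-Za-z0-9 ]+", name) is not None


conn = sqlite3.connect(":memory:")
conn.execute("create table people (name text)")
# "Jos\u00e9" is José written with an escape so the source stays ASCII.
for person in ("O'Malley", "Jos\u00e9", "x'; drop table people; --"):
    add_person(conn, person)
```

All three strings are stored verbatim, including the one that looks like an injection attempt, because none of them is ever parsed as SQL.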

Re:Solutions in search of a problem by slamb · 2005-04-27 12:41 · Score: 1 · on What to Expect from Linux 2.6.12

kqueue lets me know, when the file grows. For example, tail(1) on FreeBSD uses it (with -f and -F switches). How would you do that with select/poll?

Switch to using FAM (File Activity Monitor) on both systems. It's a daemon implemented on top of kqueue (on *BSD) or dnotify/inotify (on Linux). It talks to instances on other machines to work properly across network filesystems. It also abstracts the underlying API for you. It just gives you a file handle to plug into your monitoring loop, and an API to call when it has input. No need to care if it's using kqueue, dnotify/inotify, or (in the worst-case scenario) a timed loop on the backend.

There's your practical answer. If you'd like to know why you can't directly see when a file handle grows with select/poll, read on.

The thing is that regular files and sockets/pipes/character devices are treated in fundamentally different ways on Unix systems. Sockets have nonblocking IO and select/poll. Regular files have much less support - a separate, nasty async API on some systems, and inotify/dnotify on Linux or kqueue on BSD for notifications. (There's something on IRIX, too...they built FAM there, after all.) This doesn't make a lot of sense, and people like DJB have argued that it doesn't have to be this way, but...well, that's still how it is.

kqueue is no exception. They've grouped a number of things into the same system call and made it more convenient to safely wait for several types of events at the same time, but you still can't treat them in the same way. On Linux the equivalent of kqueue is accomplished through:

epoll
dnotify/inotify signal handlers
ptrace? I don't remember how you watch processes off-hand.
more signals for async IO

The biggest pain there is handling signals and epoll stuff simultaneously in a correct manner. If you need to, I urge you to check out the documentation for my sigsafe library. It describes some things not to do plus a couple good ways: the self-pipe trick (a popular way if you're using select/poll/epoll) and my own sigsafe_* signal call wrappers.
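A rough sketch of the self-pipe trick, written in Python for brevity rather than C, and not taken from the sigsafe documentation. The function names are made up for illustration. The signal handler does nothing but write one byte to a pipe, and the event loop watches the read end alongside its other descriptors, so signals and I/O readiness arrive through the same select call. It assumes a Unix-like system.

```python
import os
import select
import signal


def make_self_pipe(signum):
    """Install a handler for signum that writes one byte to a pipe."""
    read_fd, write_fd = os.pipe()
    # Nonblocking on both ends: the handler must never block on a full pipe,
    # and the drain loop below must never block on an empty one.
    os.set_blocking(read_fd, False)
    os.set_blocking(write_fd, False)

    def handler(_signum, _frame):
        try:
            os.write(write_fd, b"\0")
        except BlockingIOError:
            pass  # pipe is full, so a wakeup is already pending

    signal.signal(signum, handler)
    return read_fd, write_fd


def wait_for_events(read_fd, other_fds, timeout):
    """Return (signal_seen, ready_fds) after one select call."""
    ready, _, _ = select.select([read_fd, *other_fds], [], [], timeout)
    signal_seen = read_fd in ready
    if signal_seen:
        try:
            while os.read(read_fd, 512):
                pass  # drain every pending wakeup byte
        except BlockingIOError:
            pass
    return signal_seen, [fd for fd in ready if fd != read_fd]
```

Python also ships signal.set_wakeup_fd, which does the write from the C-level handler; the explicit version above is only meant to show the mechanism.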

Re:While it would be nice... by slamb · 2005-04-23 08:50 · Score: 1 · on C++ Creator Confident About Its Future

out of interest, what's wrong with std::string?

It doesn't support Unicode well. Specifically,

It doesn't support variable-length character encodings like UTF-8 or UTF-16.
wchar_t is not a standard size. (It may be two or four bytes, depending on your platform.) So you need to define your own ucs4_t specialization to portably hold wide characters.
There's no decent transcoder in the standard. boost has utf8_codecvt_facet, but currently it's for internal use only. (With a class like this, you could use std::string with above ucs4_t and still do UTF-8 I/O.)
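To make the variable-length point concrete, here is a small sketch in Python (chosen only because it is compact; the complaint itself is about C++). It compares the number of code points in a string with the number of bytes its UTF-8 and UTF-16 encodings need. A byte-indexed container holding UTF-8 sees the byte count, not the character count. The helper name is hypothetical.

```python
def sizes(text):
    """Return (code points, UTF-8 bytes, UTF-16 bytes) for text."""
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")),  # -le avoids counting a byte-order mark
    )


name = "Jos\u00e9"      # 'é' as the single code point U+00E9
emoji = "\U0001F600"    # a code point outside the Basic Multilingual Plane

name_sizes = sizes(name)    # 4 code points, 5 UTF-8 bytes, 8 UTF-16 bytes
emoji_sizes = sizes(emoji)  # 1 code point, 4 UTF-8 bytes, 4 UTF-16 bytes
```

The second case is why a fixed two-byte wchar_t is not enough either: one code point takes two UTF-16 code units.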

The best workaround I've found is to require Glibmm. It's a huge dependency, but it has a UTF-8 string class and appropriate conversion functions.

But fuck workarounds. I'd rather just code in a language with a standardized string class that doesn't suck. This sort of thing is why my largest C++ project is stagnating while I write code in other languages.

Re:I want a real RDBMS by slamb · 2005-04-22 13:29 · Score: 1 · on E-mail As the New Database

I'd love it even more if my email server was actually a true RDBMS where I could have, besides the traditional IMAP interface, a D (Tutorial D or D4 or something the like) language interface where I could query at will, and save my queries as views that would show up in IMAP as (virtual) folders.

IMAP's closer than you think. If you don't think SQL is relational, you certainly won't think IMAP is, but you can do more than most MUAs support. You can save arbitrary tags on email messages. You can execute surprisingly-sophisticated queries. I recently wrote some crude Jython scripts that use the JavaMail API to do queries like this:

to_me = OrTerm([HeaderTerm(header, "slamb@slamb.org") for header in ["To","Cc","Bcc"]])
to_list = OrTerm([HeaderTerm(header, "") for header in ["List-Post","List-Id","List-Archive"]])
msgs = sourceFolder.search(AndTerm(NotTerm(utils.to_me), utils.to_list))

(sorry about the indentation; ecode apparently doesn't like it.)

You have to read RFC 2060 to know all that IMAP can do.

Re:No please... by slamb · 2004-04-28 07:34 · Score: 4, Informative · on Struts Survival Guide

"Struts is such a big over-engineered pile of shit."

Compared to what? Pure JSP? Maybe if you have like two pages total, but if you have any more than that you'll discover that:

You've got huge chunks of Java code in your JSP that make little sense there. You've got two conceptually separate things there - the actions your code is taking and how it's presented. JSP makes sense for the presentation, but not for the real work.
You're either making a lot of redundant pages or redirecting back and forth in weird ways.
You're having to do a lot of work to keep passing users' data back to them in a HTML form when you have a validation error. I.e., when the user fills out a huge form and has an error halfway through. Most of the values need to be defaulted to their previous ones.

If you set out to solve these problems, you'll inevitably end up at struts. It may do other things (don't know; I haven't ventured in that deeply), but it does these in about as simple a way as anyone could.

I challenge you to find any significant amount of redundant code in a project of mine that uses struts: mb. Description here, browse the code here. There's not a lot of code there, and struts is largely responsible.

Internally, struts may be hugely overengineered...but I, as a user, don't care. It helps me keep my applications more terse and well-organized. (Much more maintainable than what I wrote before.)

A new approach is needed by slamb · 2004-04-27 04:59 · Score: 3, Informative · on PHP and SQL Security

Most people are attempting to solve cross-site scripting and SQL injection vulnerabilities (the #4 and #6 causes of web security problems, according to this article) through brute force. Everywhere they use a user-supplied string, they use an escaped version. But this approach doesn't work! For several reasons:

it's hard to notice when something is not there.
people tend to push these farther and farther away from the actual usage, so they get confused about what has been escaped. It's hard to maintain clear contracts between functions about something like this.
even if you're diligent when writing the initial code, it's easy to slip when applying patches

So I think a new approach is needed. One where you don't mix instructions and data so easily, or flag them more readily.

With SQL, this has been around for a while: bind variables. Your SQL queries tend to be static with ? thrown in (or :foo for named bind variables). In Perl, it looks like:

my $sth = $dbh->prepare('select * from mytable where foo = ?');
$sth->execute($foo);

Not everyone is using bind variables, and I don't know why. One reason may be that positional bind variables can be confusing: they require you to correlate two lists in your head to position the correct variables in the correct spots. Not all language/database combos support named bind variables. (JDBC doesn't!) But they can be emulated - that's one reason I made xmldb.
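A minimal sketch of how named bind variables can be emulated on top of positional ? placeholders. This is not xmldb's implementation; the function name and regex are assumptions for illustration, and the regex is deliberately naive (it would also rewrite a :word inside a quoted SQL string literal or a :: cast).

```python
import re

# Matches :name style parameters. Naive: does not skip string literals.
_NAMED_PARAM = re.compile(r":([A-Za-z_][A-Za-z0-9_]*)")


def to_positional(sql, params):
    """Rewrite :name placeholders to ? and return (sql, ordered values)."""
    order = []

    def replace(match):
        order.append(match.group(1))
        return "?"

    positional_sql = _NAMED_PARAM.sub(replace, sql)
    # Look each name up in the order it appeared, so repeats work too.
    return positional_sql, [params[name] for name in order]


sql, values = to_positional(
    "insert into mytable (foo, bar) values (:foo, :bar)",
    {"bar": 2, "foo": 1},
)
```

The caller never has to correlate two lists by position; the mapping from names to slots is done once, mechanically.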

For HTML, it's more rare to find something that does this. Apache Cocoon does, but it's grotesquely complex. I'm working on a simpler system, though it's not ready for production. Here's the idea: my files (XFP) are to a SAX ContentHandler as JSP is to a byte stream.

I like SAX because it's a way of making XML that does things right. Instead of doing something like:

out.println("<elem a=\"" + foo + "\" b=\"blah\">Blah: " + bar + "</elem>");

you write something like:

AttributesImpl a = new AttributesImpl();
a.add("a", foo);
a.add("b", "blah");
out.startElement("elem", a);
out.characters("Blah: " + bar);
out.endElement("elem");

it's nice in that you don't do any of the escaping yourself - you just tell it how you're using each string, so it can do the escaping right. But that's six ugly lines instead of one, and it's worse with real SAX because you need extra arguments for namespaces and things. So I looked at JSP. It sticks Java code inside the text to produce. I stick Java code inside the XML to produce. I write something like this:

<elem b="blah">
  <xfp:attribute name="a">foo</xfp:attribute>
  Blah: <xfp:expr>bar</xfp:expr>
</elem>

...and it turns it into the code above when it makes a .java file. It still knows how to escape things from context. And whenever you stick in literal text, you can write it just like you'd normally write XML - less long-winded. I might change it to this:

<elem a="{foo}" b="blah">Blah: {bar}</elem>

which is shorter still.

My code is all Java. But the concepts should apply to PHP, Perl, Python, anything.
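As a hedged illustration that the idea carries over, here is the same SAX-style construction in Python using the standard library's xml.sax.saxutils.XMLGenerator. It is a sketch of the general technique, not of XFP; the function name is hypothetical. The caller only says how each string is used (attribute value or character data) and the generator does the escaping.

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def render_elem(foo, bar):
    """Emit <elem a=... b="blah">Blah: ...</elem> with escaping done for us."""
    buf = io.StringIO()
    out = XMLGenerator(buf, encoding="utf-8")
    # foo is declared to be an attribute value, so it is escaped as one.
    out.startElement("elem", AttributesImpl({"a": foo, "b": "blah"}))
    # bar is declared to be character data, so markup in it is neutralized.
    out.characters("Blah: " + bar)
    out.endElement("elem")
    return buf.getvalue()


rendered = render_elem("a<b&c", "<script>alert(1)</script>")
```

Nothing in the calling code mentions escaping at all, which is the point: there is no escape call to forget.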

Anyone else working on a system to solve this problem? I'd be interested to share ideas.

Re:Dropping multiple inheritance ? by slamb · 2004-04-19 04:03 · Score: 3, Insightful · on C, Objective-C, C++... D! Future Or failure?

I've never found a situation where I actually required the additional functionality that multiple inheritance allows and couldn't be done better with just interfaces.

How about StreamSocket? Okay, multiple inheritance isn't required in the strictest sense, but object-oriented programming isn't, either. MI makes this class make much more sense - it is both a stream and a socket.

In a language providing only support for multiple interfaces, you'd have to reimplement at least one of those in the derived class. You'd probably end up just dispatching all of the calls in the derived class to a shared implementation elsewhere. Not nearly as clean.

Or you could pull a Java and have a getStream() method on the StreamSocket. (Make the caller do the dispatching to the shared implementation.) I don't like it either.
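A toy sketch of the shape being described, in Python (which also allows multiple inheritance). The classes here are hypothetical stand-ins, not the real StreamSocket, and the "stream" is just an in-memory buffer. The derived class inherits both implementations, so there is nothing to re-implement or forward by hand.

```python
class Socket:
    """Stand-in for the socket half: knows who the peer is."""

    def __init__(self, peer, **kwargs):
        super().__init__(**kwargs)
        self.peer = peer

    def peer_name(self):
        return self.peer


class Stream:
    """Stand-in for the stream half: buffered read and write."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._buffer = bytearray()

    def write(self, data):
        self._buffer += data
        return len(data)

    def read(self, count):
        chunk = bytes(self._buffer[:count])
        del self._buffer[:count]
        return chunk


class StreamSocket(Stream, Socket):
    """Is-a Stream and is-a Socket; no forwarding methods needed."""


conn = StreamSocket(peer="example.org:80")
conn.write(b"hello")
```

With interfaces only, StreamSocket would need its own read, write, and peer_name methods, each delegating to a helper object.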

Plus, if you were gonna copy multiple inheritance from c++ you'd need to copy all those nasty casting operators.

I don't see how eliminating MI makes any of them unnecessary:

static_cast<> - still useful. I like saying "make it an error at compile-time if this can be false". Catching errors earlier = more goodness. C didn't have this, but C didn't ever have a way of knowing one structure could be cast to another safely, since it lacked OOness (inheritance, specifically). Also better performance than a dynamic cast.
dynamic_cast<> - yup, still useful. A little simplified (it would always return the same actual address or NULL). This is basically what Java's cast is.
const_cast<> - nothing to do with MI. (Nonexistent in Java because Java doesn't have constness at all.)
reinterpret_cast<> - nothing to do with MI. Necessary for backwards-compatibility with C stuff.

Re:Why is there only one database access language? by slamb · 2004-03-26 13:12 · Score: 1 · on Prothon - A New Prototype-based Language

Requiring tuples' values to have (valid identifier) names would take SQL even further away from relational algebra, and who knows what other changes in the language would be required to ensure that every value has a name.

What I proposed was solely a syntactic change. Every column in the table already has to have a valid identifier. My modified example just has the column name and associated value closer together; they were both there in the original case. (I'm talking about the insert into mytable (foo, bar, baz) values (:foo, :bar, :baz) case here.)

I would not take away the syntax in which you do not need to specify the column names at all (insert into mytable values (:foo, :bar, ...)). I never use it myself (for similar reasons), but I don't see a need to eliminate it.

VALUES is a table constructor; interleaving identifiers and literals only makes sense if you're inserting a single record.

No, it's not. insert values (...) and insert (...) values (...) always insert a single record. values does not occur anywhere else in SQL, AFAIK.

DBI's encouragement to use dynamic SQL to prepare the same statement over and over is dumb. Perl should support embedded SQL (which uses the :name syntax to access any language's local variables) instead.

I don't like the embedded syntaxes. I prefer an external query library, as provided by my xmldb project. SQL is different enough that I think it should not be lumped in the same file, for maintenance reasons if nothing else.

xmldb also provides a form of named parameters, internally using the ? placeholders. I'd like the JDBC people and all the vendors to support the named syntax, but this way I can use something similar right away.

Re your parameterized view, it seems like all you need is ...

Check your query again. It would return (for each group) all records if and only if they are all on or before :some_date. I'm looking for one that (for each group) returns the latest record on or before :some_date.
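One way to phrase "for each group, the latest record on or before :some_date" is a correlated subquery. The sketch below runs it through Python's sqlite3 module; the table and column names (readings, grp, at, val) are hypothetical and not from this thread, and dates are stored as ISO-8601 text so that string comparison matches date order.

```python
import sqlite3

LATEST_ON_OR_BEFORE = """
    select r.grp, r.at, r.val
      from readings r
     where r.at = (select max(r2.at)
                     from readings r2
                    where r2.grp = r.grp
                      and r2.at <= :some_date)
     order by r.grp
"""


def latest_on_or_before(conn, some_date):
    # :some_date is a named bind variable, filled in from the dict.
    return conn.execute(LATEST_ON_OR_BEFORE, {"some_date": some_date}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table readings (grp text, at text, val integer)")
conn.executemany(
    "insert into readings values (?, ?, ?)",
    [
        ("a", "2004-01-01", 1),
        ("a", "2004-02-01", 2),
        ("a", "2004-04-01", 3),  # after the cutoff used below
        ("b", "2004-03-15", 4),
        ("c", "2004-05-01", 5),  # group with nothing on or before the cutoff
    ],
)
```

Group a still yields a row even though it has a later record, and group c yields nothing because the subquery returns NULL for it.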

Views only exist for access control or optimization of frequent queries.

Those are good reasons, but convenience is important, too.

Can a parameterized view do anything that a subquery on a normal view doesn't?

Not sure. Maybe not.

Re:Hungarian Notation by slamb · 2004-03-21 06:24 · Score: 1 · on Why Programming Still Stinks

Of course, if everyone were using an intelligent editor, then moronic conventions like Hungarian wouldn't be necessary, because your editor could instantly tell you the type of any variable and perhaps display it in small letters above the variable name at all times.

That's what I mean by annotations. Are you saying that you know of an existing editor that does this? Hmm. I'm still using vim. Maybe I need to look around at other editors again. I'd like one that:

is relatively lightweight. I like to go to the commandline and type "vim blah" and have it instantly pop up. I don't like the emacs attitude that you do everything within the editor.
can decouple what I see, type, and store in interesting ways. (Which requires knowledge of the languages I edit: C++, Java, Python, XML (XSLT, XHTML), SQL, etc.) Ways like syntax highlighting, smart indentation, folding, completion, annotations, etc.
has some economy of movement like vim. I love that it doesn't require me to move between the keyboard and mouse often, and that it doesn't require me to contort my fingers in bizarre ways (emacs)
supports collaboration features (like SubEthaEdit).
is configurable: I can tell it style conventions for various projects, templates for new files in those projects, etc.

I've yet to find anything that does all of that. vim is the closest I've come.

Re:Kernel development interests me terribly by slamb · 2004-02-19 05:52 · Score: 4, Interesting · on Behind the Scenes in Kernel Development

I wish I could wrap my head around even the smallest part of the kernel. There is so much code in there and aside from main(), it is hard to find a good place to start studying.

Very recently, I've been writing some low-level code. There was a long while I'd thought this was out of my league. Then I realized several things:

I was not happy with several characteristics of the low-level code other people had written and I was depending on.
I had done some more low-level stuff long ago - like a couple simple but legitimately useful assembly programs in DOS, and even a patch that added a sort of capability system to the OpenBSD kernel. (I never polished up the patch enough to send it in to them or anything, but the point is that it essentially worked, and I wasn't afraid to take it on.)
When I'd done those things back in the day, I wasn't anywhere near as good a coder as I am now.
The only reason I'd been unable to do these things more recently is an attitude that I'm not good enough, not a reality. (It's an attitude a lot of people in low-level code promote, I think. They so much don't want to waste their time with people who really are bad that they probably don't mind scaring off a few people who are in fact good but don't realize it. Also, I think there's ego involved - it's an exclusive club, why not let it stay that way.)

So I think the moral of the story is to just be fearless/persistent. If you're not confident, there are plenty of ways you can improve without even involving anyone else:

Read the code. It sounds obvious, but there's a lot of code I'd stayed away from even looking at because of intimidation.
Try experiments. Make a change, set a hypothesis about what it will do, and run it. Then see why you were wrong, if you were. Then try it again. Even just getting in the habit of running the build system will help, and setting up experiments like this will help your debugging.
Find something lacking and try to fix it.

And then, if you're still not comfortable talking on the linux-kernel list, I think you have at least another couple choices:

If you're lucky, you're friendly with someone more skilled and can use him/her to screen questions.
There's a couple lists like kernel-janitors and kernel-newbies to dip your feet in the water.
Sometimes in the process of writing an eloquent question through email you'll figure out the answer yourself. (Did you see the teddy bear anecdote in the debugging link above?)

As for myself, I'm taking my own advice to make sigsafe - an alternate set of system call wrappers (libc level) that eliminate a couple race conditions involving signals, without a performance penalty. It's going well - the code works, and I have a race condition checker and microbenchmark to prove it. I just released my first version. Now I'm working on the documentation; it still needs a lot of work. (I could use plenty of help with this project! If you want to try low-level programming, it's a great way. It requires writing assembly for each combination of operating system and architecture. I've only written it for two systems. There are plenty left, and public systems to do it on if you don't have access to exotic machines of your own. Plus, you can hopefully gain some low-level understanding by proof-reading and helping me write the documentation.)

Once I have that polished, I've got a couple projects I might try in the Linux kernel (and/or other kernels):

implementing a couple of system calls - the nonblocking_read(2) and nonblocking_write(2) that djb mentions.
implementing SO_RCVTIMEO and SO_SNDTIMEO under Linux. Assuming no one has yet; I haven't checked, so the manpage could just be out of date. Which brings m

Fedora has some great technology by slamb · 2003-11-12 13:19 · Score: 2, Interesting · on OSNews Rates Fedora Core 1 Mild Disappointment

...whatever supposed usability problems Fedora has, there's some great new technology behind it.

For example: they've got a new and shiny version of the glibc & NPTL. This threading support is worlds better than anything I've seen in other distributions or most other operating systems. I wrote a small test for C++-safe thread cancellation support. It failed on pretty much every system I tried. Only Fedora Core 1 and Tru64 passed. This is a behavior more hinted at than mandated by the pthread standard at this point, but realistically, no one would ever use thread cancellation in a C++ program if it didn't work the way it does in Fedora.

There are lots of architectural improvements like that always thrown into a new RedHat release, and I think Fedora will be no different. It leads to their problems with x.0 releases, but I think it's worth it.

In my mind, Fedora Core 1 is RedHat 10 - the name + the community. It even upgraded from my RedHat 9 installation. That's a dead give-away.

Re:Lack of understand of how PHP works? by slamb · 2002-10-29 14:04 · Score: 2 · on Yahoo Moving to PHP

Smarty loops over an array of hashes. [...] The downside is that your templates sometimes end up looking like an uglier version of some ColdFusion page written by a drunken 6 year old.

Actually, I can think of a couple other significant downsides: the information for the entire page must be gathered in memory before you can begin to send it out. A big part of perceived performance is latency, so that's bad. And the memory usage could also be bad. I wouldn't want to use your model on really big pages for those reasons.

I'm working with Jakarta Struts and something roughly like the "Model 2X" mentioned a couple places on the web. It sounds similar to what you're doing, despite a completely different programming language: in my Actions, I do all updates and stuff that can totally change the layout of the page. The results of it (JavaBeans tied to the page, much like your arrays of hashes) are sent to a servlet of mine. The servlet produces SAX events from those and pumps them through XSLT, an incredibly flexible (and unfortunately sometimes overly wordy) template language.

And here's where it differs: my servlet can produce SAX events from arbitrary queries as well. With an incremental XSLT engine (Xalan), it can process a single row and then discard the information; it's no longer needed. I don't think that's quite the structure the Jakarta people envisioned and it does have at least one downside. Error handling of those can't be as smooth since it has already sent out a good deal of the page. But it decreases latency and memory usage, and I think it's worth it. The queries I put there are not likely to error out and an unlikely fugly page is an acceptable compromise.
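The latency and memory argument can be sketched with a Python generator. This is an analogy only: it is not the servlet/XSLT pipeline described here, and the function and table names are hypothetical. Each row is turned into output as soon as it is fetched and can then be discarded, instead of collecting the whole result set first.

```python
import sqlite3
from xml.sax.saxutils import escape


def stream_rows(cursor):
    """Yield markup one row at a time; no row is kept after it is emitted."""
    yield "<table>"
    for row in cursor:
        cells = "".join("<td>%s</td>" % escape(str(value)) for value in row)
        yield "<tr>%s</tr>" % cells
    yield "</table>"


conn = sqlite3.connect(":memory:")
conn.execute("create table posts (title text)")
conn.executemany("insert into posts values (?)", [("a<b",), ("second",)])

# A real handler would write each chunk to the response as it arrives;
# collecting into a list here is only to make the result easy to inspect.
chunks = list(stream_rows(conn.execute("select title from posts order by rowid")))
```

The tradeoff is the one noted above: if the query fails halfway through, the opening chunks have already been sent.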

The whole setup is very much a work in progress. I've got a messageboard currently implemented with one giant, ugly servlet that produces SAX output and sends it through some really nice XSLT [1]. This is my way of keeping the XSLT (which I was proud of) and getting rid of the single giant ugly servlet (which I was definitely not).

Plus, using XSLT has some other big advantages:

correct escaping; no cross-site scripting attacks. With most ways of producing HTML, you have to explicitly quote stuff. [2] Here, it's different: you have to explicitly treat stuff as XML fragments for that to happen. That way is much better from a security perspective.
it's more standard. Lots of people know it.
it's impossible to produce non-well-formed XML, short of disabling output escaping (which I don't). This is because of the first reason and because an XSLT template is itself an XML document which must be well-formed to run.
templates are much more reusable. I've got a standard resultset schema I use for SQL resultsets and a template that produces a nice, shiny stock table from it with greybarring and everything. I just need to override it for the exceptions, so it's much less redundant. A big part of why I'm so proud of my XSLT.

[1] - feel free to check out the xslt if you like. It, the database schema, and the database queries (processed through xslt from the raw xml) are what I'm proud of. The rest of the architecture I've described is being written right now, really.

[2] - I'm not sure if that's true for Smarty. It's certainly true for HTML::Template in Perl, which my first version of this software used. I was tired of having to specify escaping everywhere to be correct.
