What's in Your HTML Toolbox?

Posted by ryuzaki0 on Sunday September 3, 2006 @03:54PM from the utilities-of-the-trade dept.

Milo_Mindbender asks: "I've just ended up in charge of cleaning up an old and rather large website created by some non-technical people. It has all the usual problems: paragraph tags with no ending tag; mixed case file names that work on Windows but not on a Linux webserver; files with mixed Windows/Linux/Mac line endings; duplicates or partial duplicates of files created when working on pages; and the list goes on. I'm wondering what tools you guys keep in your HTML/website toolboxes that work well for cleaning up this sort of mess. Things like pretty-printers, HTML 'lint' programs, dead file detectors, batch renamers (that change links and the files they point to into OS neutral names), and 'diff' programs that ignore HTML whitespace. I'm particularly interested in batch processing tools that actually fix problems (not just report them) because I've got a lot of files to deal with and don't have the time to edit every one by hand. So what's in YOUR toolbox?"

What's in your... by AKAImBatman · 2006-09-03 15:56 · Score: 3, Funny

So what's in YOUR toolbox?

CAPITAL ONE!

[...]

Wait, what was the question again?

--
Javascript + Nintendo DSi = DSiCade
Perl by hahafaha · 2006-09-03 15:58 · Score: 2, Informative

I know many of the geeks out there have forsaken Perl, but it is still, in my opinion, an indispensable tool. I am currently fixing up a website similar to the one you described, especially in terms of the HTML problems. Write a Perl script to fix capitalization, closing of tags, etc. But understand that if the code is not written well to begin with, then in many cases it is impossible to automate the process of fixing it. You are going to have to do some things by hand.

Depending on how bad it is, consider rewriting the HTML and CSS part of the website from scratch. It may be easier than fixing old code.
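
The capitalization fix mentioned above might look something like the sketch below. It is written in Python rather than Perl, and it only handles one piece of the job: lowercasing the path part of relative href/src values so they match lowercased file names. The function name and the regular expression are made up for illustration, and regex-based rewriting of HTML is an assumption that will miss unquoted attributes and links built in scripts.

```python
import re

# Matches href="..." or src="..." with double- or single-quoted values.
# Group 1 keeps the attribute name and any spacing exactly as written.
LINK_ATTR = re.compile(r'''(\b(?:href|src)\s*=\s*)(["'])(.*?)\2''',
                       re.IGNORECASE | re.DOTALL)

# Absolute URLs (scheme:), protocol-relative URLs (//host) and pure fragments.
NOT_LOCAL = re.compile(r"(?:[a-zA-Z][a-zA-Z0-9+.-]*:|//|#)")


def lowercase_local_links(html: str) -> str:
    """Lowercase the path part of relative href/src values; leave the rest alone."""
    def fix(match):
        prefix, quote, target = match.group(1), match.group(2), match.group(3)
        if NOT_LOCAL.match(target):
            return match.group(0)
        # Only the path changes; any query string or fragment is kept as-is.
        path = re.match(r"[^?#]*", target).group(0)
        rest = target[len(path):]
        return prefix + quote + path.lower() + rest + quote
    return LINK_ATTR.sub(fix, html)
```

The files themselves still have to be renamed to match (os.rename can do that in a loop), and anything the script cannot parse is left for the by-hand pass.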
Tidy or Meyer by hedronist · 2006-09-03 16:01 · Score: 4, Informative

There are two approaches: live with it and make as few changes as possible, or bite the bullet and do a complete rebuild. To do a cleanup, check out tidy - it does a good analysis of the existing pages and can generate CSS that is OK, but not beautiful. If you want the final pages to look the same, but be standards compliant, see meyerweb.com and read his books on rebuilding pages ("Eric Meyer On CSS" and "More Eric Meyer on CSS"). Pragmatic is his keyword: lots of examples and he makes sense.

Good luck. You're going to need it.
HTML Tidy by d3ik · 2006-09-03 16:01 · Score: 3, Informative

Been there, try this
1. Re:HTML Tidy by Anonymous Coward · 2006-09-03 17:05 · Score: 3, Informative
  HTML Tidy
  
  A text editor (I like vim and gedit)
  
  The GIMP (image editor)
  
  ImageMagick
  
  W3C validator
  
  Various docs and tips such as Checklist of Checkpoints for Web Content Accessibility Guidelines from W3C
  
  CSS Cheat Sheet
Re:FTW by mr_stinky_britches · 2006-09-03 16:03 · Score: 2, Informative

You can also do batch file processing with vim by using the following commands: vim *.match.files.* and then, once in vim, :argdo %s/^M//ge | w. This removes the funky Windows line endings (mind you, ^M is typed as ctrl-v ctrl-m in vim).

--
Censorship is obscene. Patriotism is bigotry. Faith is a vice. Slashdot 2.0 sucks.
HTMLKit for Windows by SocialEngineer · 2006-09-03 16:07 · Score: 4, Interesting

HTMLKit has a lot of great options for developers, and a good plugin system.

--
"Better to be vulgar than non-existent" -Bev Henson
Obligatory by hahafaha · 2006-09-03 16:07 · Score: 2, Informative

> :argdo %s/^M//ge | w. This removes the funky Windows line endings (mind you, ^M is typed as ctrl-v ctrl-m in vim).

Or, in emacs

M-% (AKA Meta(usually Alt)-Shift-5)
Query Replace: ^M with [nothing] :-)

P.S. Note that ^M is not Caret-M. It is a single character. I usually just copy it out of the file, and then do it in emacs.
1. Re:Obligatory by mr_stinky_britches · 2006-09-03 16:15 · Score: 2, Informative
  
  Or, in emacs
  M-% (AKA Meta(usually Alt)-Shift-5)
  Query Replace: ^M with [nothing] :-)
  
  Question for you: how would you do that across multiple files in emacs?
  The global search and replace command using vim on a single file would be simply:
  :%s/^M//g
  
  --
  Censorship is obscene. Patriotism is bigotry. Faith is a vice. Slashdot 2.0 sucks.
2. Re:Obligatory by maxwell+demon · 2006-09-03 21:38 · Score: 3, Informative
  
  I'd rather use recode. After all, there might be other Windows-specific stuff in there, like replacement of certain ISO-8859-1 high-bit control characters by graphic characters. With recode, those can be handled as well (ideally convert it directly to Unicode).
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
3. Re:Obligatory by ianezz · 2006-09-04 05:48 · Score: 2, Informative
  
  Question for you: how would you do that across multiple files in emacs?
  Use dired on the directory where files are located (i.e. M-x dired), mark the files you are interested in (with 'm'), then use 'Q' (uppercase) to perform what basically is a query-replace-regexp on all the marked files (actually it is dired-do-query-replace-regexp).
  See the GNU Emacs dired documentation for further details.
MY toolbox... by grammar+fascist · 2006-09-03 16:09 · Score: 3, Funny

My toolbox has a little white pill that I take every time I get a hankering to work with HTML. It fixes me up right quick.

--
I got my Linux laptop at System76.
Creating white space by M0b1u5 · 2006-09-03 16:10 · Score: 3, Interesting

The disaster that was "s.gif" (or "trans.gif" in some circles) used as a layout tool was horribly over-used - and the 'net is a worse place because of it. In most projects now, I seek to replace all instances with a "compatible" approach.

I create a class:

.spacer {
  line-height: 0;
  font-size: 0;
}

Then I replace all those hundreds (and sometimes THOUSANDS) of references to s.gif with the following:

I use a span sometimes, as required - if the DIVs alone cause layout issues.

Say hello to faster web pages instantly!

--
How many escape pods are there? "NONE,SIR!" You counted them? "TWICE, SIR!"
Re:Creating white space - apologies by M0b1u5 · 2006-09-03 16:12 · Score: 4, Informative

Oops Sorry! <div class="spacer" style="width:Xpx; height:Ypx;"></div>

--
How many escape pods are there? "NONE,SIR!" You counted them? "TWICE, SIR!"
Re:Why use static HTML? by Phroggy · 2006-09-03 16:26 · Score: 2, Interesting

in that situation, server side includes are just as useful, but faster and more secure.

If what you need is very simple (including footers would count as simple), here's more information about server side includes (SSI). Either rename your pages .shtml, or keep the .html name but set the files as executable (chmod a+x *.html) using the XBitHack.

If you want something more complex, you can use SSI to include a mini-CGI script into the middle of your HTML. CGI scripts can be written in any language, even a shell script:

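#!/bin/sh
# Header line, then the blank line that ends the headers.
echo "Content-type: text/html"
echo
# Quoted so the shell does not choke on the parentheses.
echo "(insert HTML here)"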

--
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
Firefox with plugins by bhav2007 · 2006-09-03 16:29 · Score: 3, Interesting

Firefox with the IE Tab (or IE View), Web Developer, View Formatted Source, and HTML Validator extensions.
Actually... Frontpage by Planesdragon · 2006-09-03 16:42 · Score: 2, Funny
Actually... Frontpage.

No, really, stop laughing.

Frontpage, once you convince it to stop the WYSIWYG crap, has three tools that will make fixing a non-technical user's webpage easy. (Never, ever, let a non-technical user use Frontpage without supervision. It's worse than Word.)
1. "Site Management", where you can let Frontpage check for dead files, orphan files, broken links, and do mass re-names of all HTML-based links. (No script correction here, but non-techies don't do that.)
2. Regular Expressions (or a workable subset thereof)
3. VBA, to invoke things like "optimize HTML" and "standardize name"
I'd be shocked if there aren't better tools out there -- but by and large either they don't do as much, or they cost a significant chunk of change.

(Hey, you, with the laughing -- point me to an app that can do #1 with compatible replacements for #2 and #3, and, er, you'll get good karma for being so mean and laughing.)
Two tidbits by ptaff · 2006-09-03 17:03 · Score: 2, Interesting

Tidy is great, as others mentioned. If you feel confident, it will even let you cherry-pick the data you want to scavenge with XSLT.

Separating grain from chaff

A static HTML project has numerous index2.old.html, index2.html, index_2.html, project2.html.old and so on - files that you just aren't sure are useful?

Copy the project directory (touch all the files) and do a wget -r on the tree; by looking at the access time, you'll know all internal referenced files. Alternatively, scan the webserver logfiles to know which files are useful.

Be sure your filesystem is configured to register access times if you pick the first method...

(As a bonus, a close peek on the 404s might give you some answers on mis-used capitalization of filenames.)
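
The access-time check described above could be sketched in Python roughly as follows. This is a minimal sketch, not a finished tool: it assumes you noted the time at which you touched the copied files (the hypothetical `cutoff` parameter, in seconds since the epoch), that the crawl ran after that, and that the filesystem really records access times. The function name is made up for illustration.

```python
import os
from typing import List


def files_not_read_since(root: str, cutoff: float) -> List[str]:
    """Return paths under root whose last access time is not later than cutoff."""
    stale = []
    for directory, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(directory, name)
            # A file the crawler fetched has an access time after the cutoff;
            # anything else was never requested and is a candidate for removal.
            if os.stat(path).st_atime <= cutoff:
                stale.append(path)
    return sorted(stale)
```

Treat the result as a list of candidates to review, not a delete list: files reached only through scripts or outside links will show up here too.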

Lynx / Links / ELinks

Can be used to dump the text data of old and unmaintainable HTML documents; most useful when trying to scavenge only the text contents to put in a database or so.
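
If a text browser is not at hand, a crude version of the same text dump can be sketched with Python's standard html.parser module. This is an assumption-laden stand-in, not a reimplementation of what Lynx does: it just collects text nodes, skips script and style bodies, and puts each run of text on its own line. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class TextDumper(HTMLParser):
    """Collect the text nodes of a document, skipping script and style bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def dump_text(html: str) -> str:
    """Return the visible text of an HTML string, one text run per line."""
    parser = TextDumper()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

The parser is forgiving about unclosed tags, which is the point with pages like these, but the output loses all layout, so it is only good for scavenging content.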
Re:FTW by Baricom · 2006-09-03 17:20 · Score: 2, Interesting

Agreed. I dismissed people who kept suggesting vim as "crazy UNIX people." I still felt that way about a week into playing with it, but soon after, I realized how powerful it is once you've figured out how the keystrokes work. Since then, I've used vim on every computer I've worked with and gvim (the GUI-enhanced version of vim) is my primary editor on my Windows box.

vim has excellent syntax highlighting, predictive typing, line numbers, search and replace (with regular expressions), code folding, spell-check, built-in help, and more.

Give yourself two weeks with an open mind, and you might be surprised by it. The easiest way to get started is to type vimtutor from almost any shell account.
Re:Oh, the usual by reanjr · 2006-09-03 17:22 · Score: 2, Informative

Nope.

br is not now br /, one must simply write well-formed documents. Well-formed HTML (with all tags closed) also uses br /.
em and strong are still alive and well as of XHTML 2.0.
b and i are still available in XHTML 1.0.
There is no HTML 4.1. Presumably you meant 4.01 strict, which is pretty much XHTML 1.0 Strict.
Cheat with PHP by GloomE · 2006-09-03 17:49 · Score: 2, Informative

$doc = new DOMDocument();
$doc->loadHTML($junky_html);
echo $doc->saveHTML();
Reads in your crappy HTML, turns it into compliant XML, then dumps it out as nice clean HTML.
tidy, web developer FF extension, search & rep by Tumbleweed · 2006-09-03 17:51 · Score: 3, Informative

Tidy, as others have already mentioned, will be your very best new friend.

Install the 'Web Developer' extension for Firefox, and use some of the HTML/CSS validators in the Tools submenu.

Get a good handle on regex searching & replacing (if you're doing this from Windows, I suggest Funduc's "Search & Replace").

If you're migrating your GIFs to PNG (which I would recommend), then you need to get yourself pngout, to compress them to their smallest possible size (Photoshop SUCKS at this).

And as someone else said, make an empty new standards compliant template, and get to cutting and pasting; it can be a *brutal* initial process, but you'll probably save yourself time in the long run, depending on how clean you want to eventually get the code. If you just want it to be standards compliant, then you can just do a clean up job. If you want to do it 'right,' you'll want to develop a new template and coding style to properly integrate the HTML and CSS. Things like not putting everything in a DIV (a sure sign you're a newbie to CSS), just to style something. Figure out why you should be using H1, H2 tags (& TBODY & TH tags if you're using tables for outer layout), etc, without having to use a lot of unnecessary DIVs all over the place. Inline styles = bad.

Figure out why XHTML may not be the best choice over HTML. Know which DTDs to specify. Know the difference in IE6 between standards mode and quirks mode, and which DTD to use to make IE6 behave. Know that IE7's quirks mode is supposedly identical to IE6's; you supposedly won't get the new 'more-standards compliancy' in IE7 without a DTD.

Oh yeah - the guy who posted about replacing spacer gifs with 'spacer DIVs'? Don't do that to yourself, okay? Yikes.

Learn about usability and readability. Learn about typography, and how light-on-black text should be sized differently from black-on-light. Thinking about grey text on black or grey text on white? Don't be stupid. Make the stuff readable! Learn that sans serif fonts are more easily read at screen density (opposite of print). Learn why Verdana is usually not your friend (go for Trebuchet MS or even Arial).

Oh, and learn to indent your freaking HTML!

Some nice resources:

Activating the Right Layout Mode Using the Doctype Declaration

Quirksmode - a GREAT resource. Awesome info here. Memorize it.
Re:Creating white space - apologies by masklinn · 2006-09-03 18:33 · Score: 4, Informative

This is worse than image spacer, please go die in a fire

--
"The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
Re:First, you better learn HTML before complaining by Bazman · 2006-09-03 18:54 · Score: 2, Insightful

The great thing about web standards is... there's so many of them!
Re:Creating white space - apologies by Anonymous Coward · 2006-09-03 19:13 · Score: 4, Informative

Or you could just use the padding / margin features provided by CSS.

margin-top: 1px;
margin-right: 2px;
margin-bottom: 3px;
margin-left: 4px;
or margin: 1px 2px 3px 4px;

padding-top: 1px;
padding-right: 2px;
padding-bottom: 3px;
padding-left: 4px;
or padding: 1px 2px 3px 4px;
Re:Macros by Mad+Merlin · 2006-09-03 19:22 · Score: 3, Informative

Can't even consider vim because of the macro capability in emacs. I have remapped Ctrl-z to be equivalent to 'Ctrl-x e' (repeat last macro -- since I don't use 'suspend', the normal Ctrl-z function). Then I can record a macro ('Ctrl-x (' and type *anything* then close with 'Ctrl-x )') and use Ctrl-z to rapid-repeat the last macro. Makes repetitive editing very efficient. Can also do 'Ctrl-u 50 Ctrl-z' to repeat a macro 50 times, etc.

I'd move to vim if it had similar ease with macro creation / execution. Does it? Huh? Well, does it? Come on, preach it, brother! Make me a vim believer!

q<register> to record a macro, q to finish recording. Execute the macro with @<register>, then you can execute it again with @@. Obviously the @ commands can be prefixed with a number to repeat them that many times, 5@@ would repeat the last macro 5 times, for example.

--
Game! - Where the stick is mightier than the sword!
Re:Creating white space - apologies by julesh · 2006-09-03 19:43 · Score: 2, Informative

Err.. this approach just doesn't work. Images are inline elements, you can't replace them with an equivalently sized block element and expect the page layout to be the same. And setting the CSS 'width' attribute of an inline element doesn't work in Explorer, so the entire approach is flawed. Sorry.
Dreamweaver by Leroy_Brown242 · 2006-09-03 19:48 · Score: 2, Interesting

As much as WYSIWYG editors sometimes suck, Dreamweaver is alright. I like that it helps with the organization but also lets me get as geeky as I'd like.

--
Pretty Pictures!
Re:Creating white space - apologies by soliptic · 2006-09-03 21:30 · Score: 2, Insightful

How is this better than an image spacer? Elements have padding and margin properties, use them!
Web Developer and HTML Validator Extensions! by Selanit · 2006-09-03 22:54 · Score: 4, Informative

My biggest web devel tool is Firefox, with the Web Developer extension and the HTML Validator extension. The former does all sorts of amazingly neat things like letting me get precise info about any element within a page (using "Display Element Information" under the "Information" menu, CTRL+SHIFT+F for short), showing me the HTTP response headers to any given page, adding custom styles to a page, validating links, checking for Section 508 accessibility compliance, resizing the window for simulating lower screen resolutions, and on and on and on!

The latter does instantaneous HTML validation using Tidy and displays any errors or warnings on the "view source" page. It also gives me LINE NUMBERS in the view source window, which is a blessing. The beta version (which I prefer) lets you pick between the Tidy algorithm and the W3C's SGML parser. The SGML parser version gives the same errors as the W3C's own online validator, but without any need to submit the page through an online form.

As for editing HTML, I generally use SciTE or one of its derivatives (eg Notepad2). Sadly, those aren't available under Mac OS X, so when I need to work on a Mac box I use Smultron. THAT, however, is just an editor. People get religious about their editors, so my advice is just to pick one that suits you and ignore anybody what sniggers at you.
Line endings - use dos2unix by bcmm · 2006-09-03 23:36 · Score: 3, Informative

There is a small utility called dos2unix which changes MS-style line endings in text files to Unix style. /usr/bin/mac2unix is symlinked to dos2unix on my Gentoo box, so I guess it can fix Mac OS line endings too.
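
For boxes without dos2unix, the same conversion is easy to sketch in Python. This is a minimal sketch, not a replacement for the real utility: the function names are made up for illustration, it rewrites files in place with no backup, and it assumes you only point it at text files.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF (Windows) and lone CR (classic Mac) line endings to LF."""
    # Replace CRLF first so the CR in each pair is not converted a second time.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(path: str) -> bool:
    """Rewrite one file in place; return True if its contents changed."""
    with open(path, "rb") as handle:
        original = handle.read()
    converted = normalize_newlines(original)
    if converted != original:
        with open(path, "wb") as handle:
            handle.write(converted)
    return converted != original
```

Working on bytes rather than decoded text sidesteps the question of what encoding each old page is in.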

--
# cat /dev/mem | strings | grep -i llama
Damn, my RAM is full of llamas.