What's in Your HTML Toolbox?
Milo_Mindbender asks: "I've just ended up in charge of cleaning up an old and rather large website created by some non technical people. It has all the usual problems: paragraph tags with no ending tag; mixed case file names that work on Windows but not on a Linux webserver; files with mixed Windows/Linux/Mac line endings; duplicates or partial duplicates of files created when working on pages; and the list goes on. I'm wondering what tools you guys keep in your HTML/website toolboxes that work good for cleaning up this sort of mess. Things like pretty-printers, HTML 'lint' programs, dead file detectors, batch renamers (that change links and the files they point to into OS neutral names), and 'diff' programs that ignore HTML whitespace. I'm particularly interested in batch processing tools that actually fix problems (not just report them) because I've got a lot of files to deal with and don't have the time to edit every one by hand. So what's in YOUR toolbox?"
CAPITAL ONE!
[...]
Wait, what was the question again?
Javascript + Nintendo DSi = DSiCade
Dreamweaver FTW! It would be a huge timesaver in this situation.
Good luck!
Censorship is obscene. Patriotism is bigotry. Faith is a vice. Slashdot 2.0 sucks.
I know many of the geeks out there have forsaken Perl, but it is still, in my opinion, an indispensable tool. I am currently fixing up a website similar to the one you described, especially in terms of the HTML problems. Write a Perl script to fix capitalization, closing of tags, etc. But understand that if code is not written well to begin with, then in many cases it is impossible to automate the process of fixing it. You are going to have to do some things by hand.
Depending on how bad it is, consider rewriting the HTML and CSS part of the website from scratch. It may be easier than fixing old code.
There are two approaches: live with it and make as few changes as possible, or bite the bullet and do a complete rebuild. To do a cleanup, check out tidy - it does a good analysis of the existing pages and can generate CSS that is OK, but not beautiful. If you want the final pages to look the same, but be standards compliant, see meyerweb.com and read his books on rebuilding pages ("Eric Meyer On CSS" and "More Eric Meyer on CSS"). Pragmatic is his keyword: lots of examples and he makes sense.
Good luck. You're going to need it.
Been there, try this
I know it's a huge mickey mouse and there's probably (scratch that-- definitely) better ways, but when I need to do something repetitive, but relatively simple, that can be done via command line, I use JavaScript to automatically create all the commands, copy them into a batch file, and done.
I use PHP. Server side includes are perfect for standard headers/footers. I check server variables to change behavior based on whether it's on the dev server or the final webserver.
I'd paste an example, but slashdot seems to think PHP code is "junk characters".
HTMLKit has a lot of great options for developers, and a good plugin system.
"Better to be vulgar than non-existent" -Bev Henson
> :argdo %s/^M//ge | w this would remove the funky windows line endings (mind you, ^M = ctrl-v ctrl-m in vim).
:-)
Or, in emacs
M-% (AKA Meta(usually Alt)-Shift-5)
Query Replace: ^M with [nothing]
P.S. Note that ^M is not Caret-M. It is a single character. I usually just copy it out of the file, and then do it in emacs.
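For anyone who would rather batch this than do it per file in an editor, here is a minimal sketch of the same line-ending fix in Python. The function names and the list of file suffixes are my own assumptions, not anything from the posts above, and it rewrites files in place, so run it on a copy first.

```python
from pathlib import Path


def normalize_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    # Order matters: collapse CRLF first so the lone-CR pass does not
    # turn one Windows line break into two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def fix_tree(root, suffixes=(".html", ".htm", ".css", ".js")) -> int:
    """Rewrite matching files under root in place; return how many changed."""
    changed = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            original = path.read_bytes()
            fixed = normalize_newlines(original)
            if fixed != original:
                path.write_bytes(fixed)
                changed += 1
    return changed
```

Working on bytes rather than text sidesteps guessing each file's character encoding, which is probably inconsistent on a site like this.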
My toolbox has a little white pill that I take every time I get a hankering to work with HTML. It fixes me up right quick.
I got my Linux laptop at System76.
The disaster that was "s.gif" (or "trans.gif" in some circles) used as a layout tool was horribly over-used - and the 'net is a worse place because of it. In most projects now, I seek to replace all instances with a "compatible" approach.
I create a class:
.spacer{
line-height:0;
font-size:0;
}
Then I replace all those hundreds (and sometimes THOUSANDS) of references to s.gif with the following:
I use a span sometimes, as required - if the DIVs alone cause layout issues.
Say hello to faster web pages instantly!
How many escape pods are there? "NONE,SIR!" You counted them? "TWICE, SIR!"
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Oops Sorry!
<div class="spacer" style="width:Xpx; height:Ypx;"></div>
Vim, grep, and sed. I heard they make movies, too! :-)
"All you have to do is be fragile and grateful. So stay the underdog." Chuck Palahniuk, Choke
Bash, Sed, Awk, Perl and vi.
I've used Dreamweaver pretty successfully to clean up a lot of poor HTML since it has pretty good functionality. I don't really have any suggestions as far as other tools go but for general single page cleanup I like DW. I've cleaned up quite a few huge documents that someone just saved as a webpage out of Word and ended up with 2 MB of HTML. Not really sure if that would work for your batch processing needs but if you have excessive issues with single pages I would recommend it.
Really, the only way to do a cleanup of your typical dog's breakfast collection of html is
1. Tidy the pages (using htmltidy)
2. Use a custom written script in whatever language (perl is good) to do as much of the task as possible automatically (things like replacing static headers with includes) - you'll need to be good with regex
3. Open the pages manually, and finish the job - I like Dreamweaver for this particularly if it's a complicated table based layout
Whatever the case, it's going to take you a lot of time and energy; there is no quick fix.
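Step 1 of the list above can be batched with a few lines of Python. This is only a sketch: it assumes the `tidy` command-line program is installed and on your PATH, and the helper names are made up for illustration. The `-m` option tells tidy to modify the file in place and `-q` keeps it quiet.

```python
import shutil
import subprocess
from pathlib import Path


def tidy_command(path):
    """Build the tidy invocation for one file."""
    # -m rewrites the input file in place, -q suppresses non-essential output.
    return ["tidy", "-m", "-q", str(path)]


def tidy_tree(root):
    """Run tidy on every .html/.htm file under root; return {path: exit status}."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy is not on PATH")
    results = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in (".html", ".htm"):
            proc = subprocess.run(tidy_command(path), capture_output=True, text=True)
            # tidy exits with 0 when clean, 1 on warnings, 2 on errors.
            results[str(path)] = proc.returncode
    return results
```

Files that come back with status 2 are the ones tidy could not repair on its own, which gives you the short list for step 3.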
NZ Electronics Enthusiasts: Check out my Trade Me Listings
Firefox with the IE Tab (or IE View), Web Developer, View Formatted Source, and HTML Validator extensions.
Vim for the editing, Emacs for the web server, interpreted language, games, database, web browser to check it with, source code management, image editor, vector graphics editor, e-mail client, e-mail server, ...
'Yes, firefox is indeed greater than women. Can women block pops up for you? No. Can Firefox show you naked women? Yes.'
But, but do you have a Capital One card?!?
I've been using jEdit for a few years now. I've used almost every text editor out there, from Crimson to UltraEdit, and I still think jEdit is the best. When combined with the WebDeveloper extension and DOM inspector for Firefox, it can't be beat.
http://www.jedit.org/
No, really, stop laughing.
Frontpage, once you convince it to stop the WYSIWYG crap, has three tools that will make fixing a non-technical user's webpage easy. (Never, ever, let a non-technical user use Frontpage without supervision. It's worse than Word.)
I'd be shocked if there aren't better tools out there -- but by and large either they don't do as much, or they cost a significant chunk of change.
(Hey, you, with the laughing -- point me to an app that can do #1 with compatible replacements for #2 and #3, and, er, you'll get good karma for being so mean and laughing.)
TextWrangler or BBEdit Lite, vi, telnet, ftp, Photoshop CS (not CS2), GraphicConverter, Firefox, Safari.
vim to edit the XHTML, the W3C HTML validator to check its correctness, and Konqueror to test how it looks.
Leave it as a giant tangled mess and secure your job for the next 3 years. When they threaten to lay you off, tell them you need at least 1 more year of work before you can straighten up the code and 'hand off' the job to the new webmaster.
I know this is completely off topic but Steve Irwin died about 3 hours ago. He was killed by a stingray.
Tidy is great, as others mentioned. If you feel confident, it will even let you cherry-pick the data you want to scavenge with XSLT.
Separating grain from chaff
A static HTML project has numerous index2.old.html, index2.html, index_2.html, project2.html.old and so on - files that you just aren't sure are useful?
Copy the project directory (touch all the files) and do a wget -r on the tree; by looking at the access time, you'll know all internal referenced files. Alternatively, scan the webserver logfiles to know which files are useful.
Be sure your filesystem is configured to register access times if you pick the first method...
(As a bonus, a close peek on the 404s might give you some answers on mis-used capitalization of filenames.)
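If access times are not available, the same idea can be approximated offline by crawling the local copy the way wget -r would. The sketch below is an assumption-laden illustration: the names are hypothetical, it starts from index.html, it only follows href and src attributes, and it ignores references made from CSS url(...) or JavaScript, so treat its output as candidates to review, not a delete list.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlsplit, unquote


class LinkCollector(HTMLParser):
    """Collect every href/src attribute value seen in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def local_targets(html_text, page, root):
    """Resolve a page's href/src values to file paths inside root."""
    parser = LinkCollector()
    parser.feed(html_text)
    targets = set()
    for link in parser.links:
        parts = urlsplit(link)
        if parts.scheme or parts.netloc or not parts.path:
            continue  # external link, mailto:, or a pure #fragment
        rel = unquote(parts.path)
        base = root if rel.startswith("/") else page.parent
        targets.add((base / rel.lstrip("/")).resolve())
    return targets


def unreferenced_files(root, entry="index.html"):
    """Return files under root that are never reached from the entry page."""
    root = Path(root).resolve()
    all_files = {p.resolve() for p in root.rglob("*") if p.is_file()}
    referenced = {root / entry}
    queue = [root / entry]
    while queue:
        page = queue.pop()
        if page.suffix.lower() not in (".html", ".htm") or not page.is_file():
            continue
        text = page.read_text(encoding="utf-8", errors="replace")
        for target in local_targets(text, page, root):
            if target not in referenced:
                referenced.add(target)
                queue.append(target)
    return sorted(all_files - referenced)
```

On a case-sensitive filesystem, links with the wrong capitalization simply fail to match, so the files they were meant to reach also show up in the result, much like the 404s mentioned above.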
Lynx / Links / ELinks
Can be used to dump the text data of old and unmaintainable HTML documents; most useful when trying to scavenge only the text contents to put in a database or so.
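Where no text browser is installed, a rough equivalent of that text dump can be sketched with Python's standard html.parser. This is not what Lynx does internally; the class name, the list of block-level tags, and the whitespace handling are all my own assumptions.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style and breaking on block tags."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "br", "div", "li", "tr", "table", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.chunks.append("\n")  # start a new output line

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the page's text, one line per block, with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)
```

Because the parser does not care whether tags are closed, it copes with the unclosed paragraph tags the original question complains about.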
First, before bitching about something, you should take a moment to learn about it.
"It has all the usual problems: paragraph tags with no ending tag"
There's no end tag required for paragraphs, as per the official spec: http://www.w3.org/TR/REC-html40/index/elements.html
HTML is not XML. Closing tags are optional for some elements, and forbidden for several others. and putting a slash at the end of a tag that doesn't have a closing tag, so it looks "xml-y" is an affectation and a waste of bytes.
Sera
Slashdot, where armchair scientists get shouted down and armchair theologians get modded up.
At my last job, I had to do a LOT of this. Basically, I had to duplicate someone's web site look'n'feel, given nothing more than a URL, and put our (dynamic) content in the middle of it. Then, they could link to our page, and we'd essentially have one page of their site under our control.
First thing: Crack open the source. I would try not to clean it up if I didn't have to. If I didn't like it, that means I had to -- MS FrontPage and all of its DAMN CAPS ON EVERY TAG meant I'd run it through HTML Tidy.
Second thing: Fix the URLs. Since it was on our server, I had to make everything into absolute URLs. Rather than write a general-purpose script for this, I just wrote semi-generic regex search-and-replace in Vim. Replace href="/ with href="http://example.com/. Replace href="../ with href="http://example.com/foo/. And so on, and also with src.
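The search-and-replace described here can also be scripted. A minimal sketch, assuming quoted attribute values and using Python's urllib.parse.urljoin to do the path arithmetic; the function name and the example.com page URL are placeholders, not the poster's actual setup.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src='...' (quoted values only, either quote style).
ATTR = re.compile(r'''(\b(?:href|src)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def absolutize(html: str, page_url: str) -> str:
    """Rewrite every quoted href/src value as an absolute URL."""

    def repl(match):
        prefix, quote, value = match.groups()
        # urljoin leaves already-absolute URLs (http:, mailto:, ...) unchanged.
        return f"{prefix}{quote}{urljoin(page_url, value)}{quote}"

    return ATTR.sub(repl, html)
```

For a page fetched from http://example.com/foo/bar/page.html, a value of "/x.html" becomes "http://example.com/x.html" and "../i.gif" becomes "http://example.com/foo/i.gif". Unquoted attribute values and url(...) references in CSS are not touched by this pattern.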
Now the real challenge: Fix the structure of the document. Some don't need much. Some need major surgery -- fixed table widths, images set to those exactly, fixed heights, all kinds of other stuff in a layout... The worst were the ones where their main textual content was split up arbitrarily, to create things like columns.
Or worse, Adobe GoLive. I simply refused to work with it -- absolutely everything on the page, no matter how small or meaningless the distinction -- list items, everything -- was wrapped in its own div and positioned absolutely with separate CSS. The structure of the code did not match the structure of the visual document at all. And the menu (something I'd always have to customize) was generated entirely from some difficult-to-read JavaScript -- I wish I'd known about the web developer's "view generated source"...
Two main things to remember here: Dom Inspector and the Web Developer Toolbar. Dom Inspector to find where what you're looking for lives in the code, and the Web Developer extension (for Firefox) to edit the CSS and see changes reflected in realtime, as well as way, way more stuff than I could possibly mention here, including "view generated source".
Sometimes I couldn't fix their layout, and I'd have to make a brand new document and paste their content into a brand new layout. Sometimes it worked, often it didn't. So keep that in mind -- I know others have said it, but sometimes it makes sense to just throw the whole thing out. But yours looks like it could work with some simple search/replace in Vim -- look for href=, src=, and in CSS, url('...
Don't thank God, thank a doctor!
Previous posts have mentioned Perl and PHP; seconding those for high-intensity search-and-destroy missions. As for software, you can't go wrong with TextPad, WinSCP, and PuTTY.
For best practices (separation of content from structure from behavior, mostly), keep an eye on what's listed in and around A List Apart and the Web Standards Project. And if you're looking for several sets of outstanding presentation and behavior tools, check out the YUIBlog and the Yahoo! Developer Network. (Hint: their page grid layout, font normalization, and CSS reset libraries are an excellent place to start.)
The only thing more pathetic than a Mac zealot, is a Mac zealot who believes that the type of computer he owns will get him laid.
Even more pathetic than that, is one who brags about this belief on slashdot, while at the same time exposing his sexual isolation by linking to fugmo's who he thinks are attractive women (who consequently, would quit their jobs as call-girls to avoid ever having to touch him).
Fromdos, grep, sed and awk. Possibly some normal pretty printer too.
Tidy, as others have already mentioned, will be your very best new friend.
Install the 'Web Developer' extension for Firefox, and use some of the HTML/CSS validators in the Tools submenu.
Get a good handle on regex searching & replacing (if you're doing this from Windows, I suggest Funduc's "Search & Replace").
If you're migrating your GIFs to PNG (which I would recommend), then you need to get yourself pngout, to compress them to their smallest possible size (Photoshop SUCKS at this).
And as someone else said, make an empty new standards compliant template, and get to cutting and pasting; it can be a *brutal* initial process, but you'll probably save yourself time in the long run, depending on how clean you want to eventually get the code. If you just want it to be standards compliant, then you can just do a clean up job. If you want to do it 'right,' you'll want to develop a new template and coding style to properly integrate the HTML and CSS. Things like not putting everything in a DIV (a sure sign you're a newbie to CSS), just to style something. Figure out why you should be using H1, H2 tags (& TBODY & TH tags if you're using tables for outer layout), etc, without having to use a lot of unnecessary DIVs all over the place. Inline styles = bad.
Figure out why XHTML may not be the best choice over HTML. Know which DTDs to specify. Know the difference in IE6 between standards mode and quirks mode, and which DTD to use to make IE6 behave. Know that IE7's quirks mode is supposedly identical to IE6's; you supposedly won't get the new 'more-standards compliancy' in IE7 without a DTD.
Oh yeah - the guy who posted about replacing spacer gifs with 'spacer DIVs'? Don't do that to yourself, okay? Yikes.
Learn about usability and readability. Learn about typography, and how light-on-black text should be sized differently from black-on-light. Thinking about grey text on black or grey text on white? Don't be stupid. Make the stuff readable! Learn that sans serif fonts are more easily read at screen density (opposite of print). Learn why Verdana is usually not your friend (go for Trebuchet MS or even Arial).
Oh, and learn to indent your freaking HTML!
Some nice resources:
Activating the Right Layout Mode Using the Doctype Declaration
Quirksmode - a GREAT resource. Awesome info here. Memorize it.
When I clicked the page I was so sure I would see at least one "What's in My HTML Toolbox? vi!" comment, modded Funny of course, but no...
Maybe I should check again later...
You just got troll'd!
CSSEdit by Macrabbit.
Awesome program and worth checking out if you use a Mac.
I like big butts and I cannot lie.
If you're using all static HTML, you can get rid of dead pages with wget. Do "wget www.website.com/whatever -r" to download it, and then just use what you've downloaded as your base.
To find broken links, I like to use Xenu. Google it.
http://ablegray.com
My HTML toolbox, which is my little 64Mb thumbdrive, has really only the bare essentials for website development: Notepad++ and WS_FTP.
But, but do you have a Capital One card?!?
It is now.
This is worse than image spacer, please go die in a fire
"The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
A hammer for hitting myself over the head, and a bottle of whiskey to numb the pain... of dealing with HTML.
Game... blouses.
I've used OpenSP a lot. It's a suite of tools that includes onsgmls, the parser that lies at the heart of the W3 validator. Combined with find you can easily validate local copies of all the files. It's faster than using the validator for multiple pages. It also includes osgmlnorm, which is used to normalize SGML. If you have a load of "XHTML without closing p tags" type HTML, change the doctype to an HTML doctype, run it through osgmlnorm, switch the doctype back, and all the closing ps are there. (It's not quite that simple though - you have to clean up lots of spurious > s which get introduced for sensible but obscure SGML reasons, usually after img elements. It's trivial to do the cleanup automatically.)
No it's not. HTML is still SGML, and still alive and well.
The great thing about web standards is... there's so many of them!
Can't even consider vim because of the macro capability in emacs. I have remapped Ctrl-z to be equivalent to 'Ctrl-x e' (repeat last macro -- since I don't use 'suspend', the normal Ctrl-z function). Then I can record a macro ('Ctrl-x (' and type *anything* then close with 'Ctrl-x )') and use Ctrl-z to rapid-repeat the last macro. Makes repetitive editing very efficient. Can also do 'Ctrl-u 50 Ctrl-z' to repeat a macro 50 times, etc.
I'd move to vim if it had similar ease with macro creation / execution. Does it? Huh? Well, does it? Come on, preach it, brother! Make me a vim believer!
Changing old code to new code can rarely be automated. It's not a simple syntax change, it's a paradigm shift, and computers are not yet smart enough to figure out the semantics of old code and rewrite it into an HTML/CSS combo.
HTML Tidy is something free and available which will do the very basic work of cleaning up and fixing the HTML where possible.
Try Notepad++. Syntax highlighting for HTML, PHP, and JS, conversion between Windows/Unix line endings, macros, hex editor, HTML tidy-ier-upper, and more. Lots o' nifty stuff and it's OSS.
Place a curse on Microsoft
Or you could just use the padding / margin features provided by CSS.
margin-top: 1px;
margin-right: 2px;
margin-bottom: 3px;
margin-left: 4px;
or margin: 1px 2px 3px 4px;
padding-top: 1px;
padding-right: 2px;
padding-bottom: 3px;
padding-left: 4px;
or padding: 1px 2px 3px 4px;
Although you're right, flamewar in 5..4..3..2..1.. *ducks*
Set up a subversion repository, or whatever your version control of choice is.
Add everything to it, even the .olds etc. Then remove all the old stuff, to what you 'think' is current, commit.
Checkout the repo to a webserver, see if anything is broken. (someone previously suggested wget, this too would work). Basically, get yourself a nice starting point.
Then go to town on the code. Everything is in version control, so if you accidentally delete something, you can always look back and figure out what it was and re-add it.
I use Adobe Golive for this, and it's served me well. It detects errors like broken links, and offers batch fixing.
Failing that, perl is probably your best bet.
foo mane padme hum
Err.. this approach just doesn't work. Images are inline elements, you can't replace them with an equivalently sized block element and expect the page layout to be the same. And setting the CSS 'width' attribute of an inline element doesn't work in Explorer, so the entire approach is flawed. Sorry.
As much as WYSIWYG editors some times suck, Dreamweaver is alright. I like that it helps with the organization but also lets me get as geeky as I'd like.
Pretty Pictures!
jEdit (www.jedit.org) - best editor in existence, unmatched functionality
Dreamweaver 8 (on OS X) DW is an outdated way to do things, but it still is very powerful
Quanta (Quanta Gold for Win or OS X - > http://www.thekompany.com/products/quanta/; Quanta Plus for Linux -> http://quanta.kdewebdev.org/)
PHPEclipse (has annoyances but very good PHP tools)
For a redo of that old site of yours I recommend simply installing a CMS and migrating the content by hand if necessary. That's probably faster and more effective than anything else. Static HTML just isn't the way to go these days, which eliminates most of the need for a large-type HTML editor. Check out joomla! (www.joomla.org)
We suffer more in our imagination than in reality. - Seneca
> It has all the usual problems: paragraph tags with no ending tag
You said it was HTML, right? Ever read the specification? The closing tag for paragraphs is optional.
If you want to risk your site looking like crap in IE6 you can, yes.
I wrote many webmaster scripts to deal with all kind of problems I run into while building and maintaining my sites. And here is a script that many webmasters may find particularly useful, it reduces the size of html files: htmloptim
How is this better than an image spacer? Elements have padding and margin properties, use them!
I will no doubt get replies that "Scripting Language X would be better", but I have the most experience with Perl. So if time was of the essence, that's what I'd use. Perl is a Swiss Army Knife in this kind of situation, and you can easily get just about any kind of blade or tool you might want to deal with files and formatting via CPAN.
You can use Perl to fix the file names, restructure the directories, extract the content, put it into a database, and even drive the new site if you'd like. No matter what the choice of new site software, Perl can salvage the existing content and transform it into whatever format you require.
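The file-name part of that job looks roughly like this. I've sketched it in Python to match the other examples in this thread (the poster would write it in Perl), and the function names are hypothetical. It lowercases every file and directory name and lowercases the path part of local href/src values to match; external URLs, query strings, and fragments are left alone.

```python
import os
import re
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit

# Matches href="..." or src='...' (quoted values only, either quote style).
ATTR = re.compile(r'''(\b(?:href|src)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def lowercase_links(html: str) -> str:
    """Lowercase the path of every local href/src value."""

    def repl(match):
        prefix, quote, value = match.groups()
        parts = urlsplit(value)
        if parts.scheme or parts.netloc:
            return match.group(0)  # external URL, mailto:, etc.: leave as-is
        fixed = urlunsplit(parts._replace(path=parts.path.lower()))
        return f"{prefix}{quote}{fixed}{quote}"

    return ATTR.sub(repl, html)


def rename_tree(root):
    """Rename files and directories under root to lowercase, deepest first."""
    renamed = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            lower = name.lower()
            if lower == name:
                continue
            old, new = Path(dirpath, name), Path(dirpath, lower)
            if new.exists() and not new.samefile(old):
                # e.g. Index.html next to index.html: needs a human decision
                raise FileExistsError(f"{old} and {new} differ only by case")
            old.rename(new)
            renamed.append((str(old), str(new)))
    return renamed
```

The walk goes bottom-up so a directory is only renamed after everything inside it has been handled. A clash between two names that differ only by case stops the run instead of silently overwriting one of the near-duplicates.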
If I had more time I might choose Ruby instead simply because I like programming in it more. However the choice of ready-made tools via the Ruby CPAN equivalent is somewhat less.
No matter what scripting language you choose, you'll be saving time in the long run. Building tools is always time well spent. Indeed, taking a few hours or even days to write a script that makes a weeks-to-months long job of reformatting take hours is one of the great joys of programming for a living.
Post Scriptum: I'm sure you already did, but just in case: Don't forget to back up the original. Thrice. They'll tell you it's already backed up. That's fine. Make three of your own anyway. If they'll let you, lock one in the safe. "Whenever testing or reconfiguring, always mount a scratch monkey."
What the fuck is REDUNDANT about this. It's about time that all negative moderations get special attention. This place is starting to suck.
NEdit and Firefox & Opera
Politics is Treachery, Religion is Brainwashing
On all of my Windows machines, I keep a copy of CSamp running in the systray at all times. It's a tiny little app that will grab the RGB/Hex values for any pixel on the screen. Great for matching colors in images, or if, like me, you are too lazy to view source and go digging for a color attribute.
Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
WTF? Take a computer course, they should teach basics like this in the first week!
My biggest web devel tool is Firefox, with the Web Developer extension and the HTML Validator extension. The former does all sorts of amazingly neat things like letting me get precise info about any element within a page (using "Display Element Information" under the "Information" menu, CTRL+SHIFT+F for short), showing me the HTTP response headers to any given page, adding custom styles to a page, validating links, checking for Section 508 accessibility compliance, resizing the window for simulating lower screen resolutions, and on and on and on!
The latter does instantaneous HTML validation using Tidy and displays any errors or warnings on the "view source" page. It also gives me LINE NUMBERS in the view source window, which is a blessing. The beta version (which I prefer) lets you pick between the Tidy algorithm and the W3C's SGML parser. The SGML parser version gives the same errors as the W3C's own online validator, but without any need to submit the page through an online form.
As for editing HTML, I generally use SciTE or one of its derivatives (eg Notepad2). Sadly, those aren't available under Mac OS X, so when I need to work on a Mac box I use Smultron. THAT, however, is just an editor. People get religious about their editors, so my advice is just to pick one that suits you and ignore anybody what sniggers at you.
Programming smart indent:
/") /p or /table
Initialisation:
$XHTML_COMPATIBILITY = 1
# set to 0 if you don't want XHTML like "
" (simply "
" instead)
define TagEnd {
start = search("= 0) && (search_string(tag, ">",0) == -1)) {
# it really looks like an HTML tag: " (otherwise, comparison operators in PHP are hard to type)
newtag = replace_in_string(tag, "^\\= 0) {
# If this is a tag without content (like
or
if ($XHTML_COMPATIBILITY && (length(newtag) > 0 )) {
# if we want XHTML compatibility AND there really is a tag
replace_selection(tag "
# insert the XHTML end-of-tag
gotoPos = $1 + 2
# as something was inserted, the cursor needs to be moved
} else {
# no XHTML compatibility or no tag
gotoPos = $1
# essentially: do nothing
}
select(gotoPos, gotoPos)
set_cursor_pos(gotoPos)
# reset the selection and put the cursor to where the user expects it
} else {
# a normal tag with a content
replace_selection(tag "")
# insert closing tag - the matched tag (e.g. p or table) ends with
select($1, $1)
set_cursor_pos($1)
# reset the selection and put the cursor to where the user expects it
}
} else {
# it's not an HTML tag - leave everything alone
select($1,$1)
set_cursor_pos($1)
}
}
Newline:
return -1
Type-in:
if ($2 == ">") {
TagEnd($1, ">")
}
It's not mine; I didn't write it, but I find it fantastic for working with HTML. Wish I could credit the writer.
http://michaelsmith.id.au
What for do you got him in your toolbox?
:)
To fix the holes in your code?
--- I am known for the ones who want to find me on the net. Is that a privacy risk or a privilege? One might wonder..
Never heard of the strict doctype, have you?
There is a small utility called dos2unix which changes MS-style line endings in text files to Unix style. /usr/bin/mac2unix is symlinked to dos2unix on my Gentoo box, so I guess it can fix MacOs line endings too.
# cat
Damn, my RAM is full of llamas.
I agree. Please continue to post your moronic jokes based on something that kinda sounds like something else. They make the place so much nicer than all those negative mods.
Dreamweaver FTW!
What kind of advantage would using dreamweaver give you in a situation like this?
I first started with HTML/websites in the mid 90s with AOLPress, then Adobe Pagemill, NetObjects FUSION, GoLive Cyberstudio (which was bought by adobe and turned into GoLive), and eventually, I dropped all of these studio apps in favour of vim using PERL and eventually moved on to PHP.
I've since started using this great app called TextMate, and when I get a complete site that I need to work on, I pipe the code through a handful of PERL programs I wrote to make it readable and make sure all tags are properly closed, then open it in TextMate to start working.
I haven't used any of those big apps (GoLive et al) since the late 90s, so they may have improved since then, but aside from their WYSIWYG aspect and their built-in validators, what other advantages do they give you? How do those apps aid you when you've got embedded code or PHP or whatever? Do they have built-in interpreters?
I dunno, I've just found that you really need to have a full webserver to properly work on a site. I wonder when Adobe is going to embed apache/php/perl/mysql/etc into GoLive/Dreamweaver to get a proper environment for the previews.
and, I dunno if you can answer this, but how well does Dreamweaver handle Ruby on Rails? I can't imagine it supporting rhtml (erb) or yaml code.
...spike
Ewwwwww, coconut...
I prefer to use vi (of the elvis variety) unless I'm editing a page some a$$hole has used dreamweaver or frontpage to create. I can't stand "^M"! If I'm doing some heavy php work then I use Bluefish.
Having to work for a living is the root of all evil.
http://validator.w3.org/
http://jigsaw.w3.org/css-validator/
Along with awk, sed, vi/pico/nano, and occasionally perl for really complex alterations.
So no, XHTML != HTML.
Perl, especially Template Toolkit, with Emacs takes care of most things.
I had to do some extraction of text from HTML, so I wrote a program for it. It may or may not be useful in this case, and it doesn't always do 100% of the job (but I've found the 98% it does do to be very useful). It is (OF COURSE!) open source so you are free to tinker and improve it for your own use. Download from:
http://jsoftco.8m.com/download.html
Teen Angel - a Ghost Story
For batch changing I have found Advanced Find and Replace to be very effective. I recently had to update a non-standards-compliant site that didn't use CSS to a standards-compliant one with CSS. The site had about 15000 pages at the time, if I remember rightly, but it was quite painless updating it with Advanced Find and Replace.
For HTML, CSS and PHP editing I use TextPad. A great text editor with syntax highlighting and other tools that make writing code easy. For checking the page I use Firefox with Web Developer plugin, Opera (my main browser) and, grudgingly, IE.
I would....
bring all the files into Dreamweaver as a 'site' then as I changed the filenames (i think) DW would automatically update all links to those files.
DW will also report which files are used by no pages in the site. And which pages are not linked to by any pages in the site.
Write some sort of applescript that would open all the files in bbedit and change their line endings (should be simple I think but I've never done it myself).
For fixing the broken html, just run it through one of the many applications based on HTML Tidy. I'm sure there's something automated out there.
One would assume, seeing as it's 2006 and all, that he intends to rebuild the site as a modern standards compliant site. Even if he chose html 4.01 instead of xhtml it's still best practice to close all your tags.
Keep the old version around to review with... then rebuild the whole thing in a CMS.
- Set up your stylesheet to cover all the examples in the old version... just click through the old site and pick out consistent examples of html entities... don't forget to scope your entities by providing IDs around such areas as menus, masthead, sidebars, advertising, etc.
- Ignore anything that is similar enough to look almost the same, no one will complain if you resolve inconsistencies... but will if you make unilateral decisions like 'All lists should look the same'
- Add in any custom classes... for when 'All lists just aren't the same'
- Hire an assistant with no web experience to copy/paste all the plain text scraped from a browser view of the page into a vanilla Dreamweaver generated html page and save it using the page title as filename.... no links, no formatting... just text. Takes 10 minutes to instruct on this one, then they go do it for a day or two.
- Instruct said assistant to go back and use the WYSIWYG viewer to add paragraphs and select lists and convert them to html lists. Takes 10 minutes to instruct, another day to complete.
- Instruct assistant to go back and add h1, h2, etc where needed.
- You can see where I'm going. Delegate the job in easy to do, hard to mess up, bite-sized tasks.
- While they are doing this you can be finishing up the more complicated pages and adding in stuff like form validation, unobtrusive dom based javascript to replace the horrible Dreamweaver scripting that's inevitably in there... and swapping script based mouseovers for CSS based ones... etc. and setting up all the chunks of html that need to be handled more delicately for accessibility.
- When pages are complete... just copy/paste the final html into the CMS according to your layout requirements for content regions
Essentially I'm saying that instead of using Tidy or something like that, which will require you to go back and double check that its automation went well... use a human equivalent, which if constrained to simple tasks will do a much better job.
The nice thing that you get as a bonus is an assistant who knows enough html to be useful but not so much as to be dangerous... and that's hard to come by without paying for a full fledged developer. If that person wants to learn more, great... you can teach him/her the right way and won't have to break them of bad habits. In the meanwhile you can teach them how to make maintenance updates to text via the CMS using FCK or TinyMCE as a WYSIWYG... very easy for making text changes.
A fool throws a stone into a well and a thousand sages can not remove it.
Who said anything about "getting laid"? You're projecting, methinks.
This has got to qualify as the WTF of the Day:
"One would assume, seeing as it's 2006 and all, that he intends to rebuild the site as a modern standards compliant site. Even if he chose html 4.01 instead of xhtml it's still best practice to close all your tags."
HTML 4.01 IS the current HTML (as opposed to XHTML) standard. And some of those bullshit "best practices", like "closing all tags", are forbidden by that very standard. It's not like it's hard to read. I linked to the specific page on the W3C site.
So stop being a Microsoft Weenie (yes - you're easily identified by your willingness to break standards, just as FrontPage breaks those same standards by doing "best practice" shit like closing tags that don't need them).
List of tags that the standard forbids having a closing tag: http://www.w3.org/TR/REC-html40/index/elements.html
Do you close your image tags? Then you're not in compliance with the published standard. So please spare the bullshit about "modern standards compliant site. Even if he chose html 4.01 instead of xhtml it's still best practice to close all your tags.". You don't know what you're talking about, and it shows.
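For anyone who wants to check a pile of pages against that list rather than argue about it, here is a rough sketch in Python. The tag set is copied from the "End Tag: Forbidden" entries in the HTML 4.01 element index linked above; the function name is made up for illustration, and the regex is deliberately naive (it does not skip comments or script blocks), so treat it as a starting point only.

```python
import re

# Elements whose end tag is marked "Forbidden" in the HTML 4.01 element index.
FORBIDDEN_END_TAGS = {
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
}

# Naive end-tag matcher: "</name>" with optional whitespace.
END_TAG = re.compile(r"</\s*([A-Za-z][A-Za-z0-9]*)\s*>")


def forbidden_end_tags(html):
    """Return the (lower-cased) names of end tags that HTML 4.01 forbids."""
    return [
        m.group(1).lower()
        for m in END_TAG.finditer(html)
        if m.group(1).lower() in FORBIDDEN_END_TAGS
    ]


if __name__ == "__main__":
    print(forbidden_end_tags('<p>hi</p><img src="a.png"></img><BR></BR>'))
```

Closing tags that are merely optional, such as the one for a paragraph, are not reported.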
In case you missed it, the article's title asked what was in your HTML toolbox, not your XML toolbox, or XHTML toolbox.
I use Subversion (locally) for my web sites, with hooks to automatically upload changed files on commit.
It saves a ton of time, and probably bandwidth since I can work on a local copy and only upload the changes. (Without having to keep track of which files I've changed.)
So if I were in your shoes, I'd make a local copy, start ripping stuff out little by little until the site doesn't work, rollback the changes until it does, and repeat. Having versioning really helps out if you make a mistake, which is inevitable.
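A hedged sketch of the "only upload what changed" half of such a hook, in Python. It assumes the hook feeds it the text printed by `svnlook changed`, and that each line is a status field in the first four columns followed by the path; the function name is hypothetical and the actual upload step is left out.

```python
def files_to_upload(svnlook_output):
    """Pick the paths worth uploading from `svnlook changed` style output.

    Assumed line format: status in the first four columns, then the path,
    e.g. "U   trunk/index.html". Deleted paths and directories are skipped.
    """
    paths = []
    for line in svnlook_output.splitlines():
        if len(line) < 5:
            continue  # blank or malformed line
        status, path = line[:4].strip(), line[4:]
        if status.startswith("D") or path.endswith("/"):
            continue  # nothing to upload for deletions or directories
        paths.append(path)
    return paths


if __name__ == "__main__":
    sample = "U   trunk/index.html\nD   trunk/old.html\nA   trunk/img/\n"
    print(files_to_upload(sample))
```

Deletions would need their own handling on the server side, which this sketch does not attempt.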
If moderation could change anything, it would be illegal.
But not putting in closing paragraph tags just invites sloppy coding. Always close your tags. Makes it far easier to read, for both yourself and others who may have to read the code, and means you can jump to XHTML without any hassle.
Which you can do now without too much worry.
What a lot of hot air.
It is highly unlikely the poster you are replying to was saying you should close all elements, even those that do not take a closing tag, as that is madness.
What they were probably suggesting is that elements that can be closed *should* be closed, which is entirely sensible. It makes HTML far easier to parse for a human, assuming you have reasonably sensible code layout.
Try FileZilla or Novell's NetDrive. NetDrive makes an FTP server look like a drive letter in Windows, so any app can use FTP for read and write.
and it means there's less work for the future when he or somebody has to update the site to xhtml or whatever comes in the future that demands properly nested and closed tags.
Mass processing of text files including moving and renaming files and following links?
Sounds like a job for some custom PERL code.
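The same kind of job can be sketched in Python instead of Perl. This is only the link-rewriting half, under loud assumptions: links live in quoted href/src attributes, the policy is "lower-case every local path", and the function name is invented for illustration. Renaming the files on disk to match is a separate step not shown here.

```python
import re

# Matches href="..." or src="..." with double or single quotes.
LINK_ATTR = re.compile(r'''(\b(?:href|src)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)

# A leading "scheme:" such as http:, mailto: or javascript:.
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def lowercase_local_links(html):
    """Lower-case the path part of local href/src values in `html`."""

    def fix(match):
        prefix, quote, target = match.groups()
        # Leave absolute URLs, protocol-relative URLs and bare fragments alone.
        if SCHEME.match(target) or target.startswith(("#", "//")):
            return match.group(0)
        # Only the path is lower-cased; any ?query or #fragment keeps its case.
        path, rest = re.match(r"([^?#]*)(.*)", target, re.DOTALL).groups()
        return prefix + quote + path.lower() + rest + quote

    return LINK_ATTR.sub(fix, html)


if __name__ == "__main__":
    print(lowercase_local_links('<a href="About/Index.HTM#Top">x</a>'))
```

Unquoted attribute values and links built by scripts are not touched, so a pass like this still needs a dead-link check afterwards.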
There is no "-1 offended" or "-1 you don't agree with me" mod options for a reason.
What are your criteria for excluding XHTML from the set of valid HTML specifications?
As is XML, therefore XHTML.
The article is about HTML standards. The standard is clear. Closing tags are optional in some cases, forbidden in others. We have enough problems with certain companies breaking standards - we don't need to advocate it here. Sloppy coding is the result of stupidities like breaking standards because you want the code to "look xml-y".
Anyone who can't read HTML 4.0 shouldn't be writing HTML 4.0. It's as simple as that. Don't claim to want to follow the standard, but want to break it because you can't be bothered to learn it.
Here's what the fucktard said in black and white (yes, it's Tuesday):
"Even if he chose html 4.01 instead of xhtml it's still best practice to close all your tags."
Not "close all the tags the standard allows for". All tags, in direct violation of the standard.
When's the last time you saw an image tag pair with enclosed content? A Break tag pair? Paragraph pairs aren't even needed since logically, the end of one paragraph starts another.
If someone can't read HTML, they shouldn't be writing HTML.
Here's the link for XHTML: http://www.w3.org/TR/xhtml1/
And here's how it's described:
Extensible HTML is NOT HTML 4 any more than SGML is HTML. XHTML is a superset of HTML. For example, "All dogs are animals, but not all animals are dogs."
The problem is finding good tools for handling web pages that consist of HTML/Javascript with the occasional bit of JSP/ASP/PHP/PSP or even Java. Not to mention the additional complication of code blocks for (other) templating engines.
I'd say that XHTML is a subset of HTML. All of the markup in XHTML can be expressed in valid HTML4, but the reverse is not true.
XHTML1:
<p>...</p>
<img ... />
<div class="...">...</div>
HTML4:
<p>...
<img ... >
<div class=...>...</div>
The XHTML elements are, of course, case sensitive.
XHTML is a grammatically strict dialect of HTML4 with HTML4 compatible semantics. XHTML grammar rules are HTML4 compatible as well as XML compatible.
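One way to see the "XML compatible" half of that claim is to hand both styles to an XML parser. A minimal sketch in Python, using the standard library's ElementTree; the helper name and the sample fragments are made up for illustration, and well-formedness is of course not the same thing as validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(fragment):
    """Return True if `fragment` parses as a single well-formed XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    xhtml_style = '<div class="a"><p>text</p><img src="a.png" alt="" /></div>'
    html4_style = "<div class=a><p>text<img src=a.png></div>"
    print(is_well_formed_xml(xhtml_style))
    print(is_well_formed_xml(html4_style))
```

The XHTML-style fragment parses; the HTML4-style one, with its unquoted attributes and unclosed tags, is rejected by the XML parser even though an HTML4 parser accepts it.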
Things do get a little more snakey when you start using MathML embedded in the document.
* Perhaps to be clear, I should say that when I refer to XHTML, I mean XHTML 1.0. I don't think XHTML 1.1 has gathered enough steam for me to bother with. XHTML 1.1 breaks backwards compatibility in many ways.
** You fell into a fallacy. e.g. dogs are animals, cats are animals, no dogs are cats.
Unfortunately, the topic is HTML, not XML or XHTML, which are both totally irrelevant to the discussion. Additionally, what you or I say doesn't matter - the spec and the W3C, which maintains it, are the final arbiters of what is HTML, and they say that XHTML != HTML.
What next - argue that SGML is a subset of HTML? Forget it.
Wow... You I bet you really like the smell of your own farts.
JOhn
Campaign for Liberty
Dammit.. I should have hit preview... I'm sure grandparent understands what I meant though. His / Her superior intellect could probably even translate this entire thread into eight different languages.
Re: IE CSS -- That's IE's fault, not CSS's
What about a Table Cell with a width parameter?
http://www.jedit.org/
I love it - all the features listed above, plus a plugin framework that supports sftp, ftp, and a zillion other tools. (Only complaint is no seamless svn checkin/checkout)
Java - so it runs exactly the same on Windows, Mac, Linux - makes platform transitions quite smooth.
Not dinging any other tools - just saying that this is an essential tool in my (expanded) toolkit.
But Herr Heisenberg, how does the electron know when I'm looking?
Re: IE CSS -- That's IE's fault, not CSS's
Yes, but those of us doing real web site development work understand that our results have to work with IE, because a lot of people use it and we have to support anything that a significant number of people use. You can't just blindly code to the standards and hope, you have to pick and choose only the standards that actually work for most people.
Table cell widths work fine, in some situations.
Well, you could try a combination of SmartFTP and Notepad++
:)
http://www.smartftp.com/
http://notepad-plus.sourceforge.net/
SmartFTP allows you to edit files live. It ftps the selected file down, you edit it in your favourite editor (automatically launched from SmartFTP, of course), SmartFTP automatically detects that the file has changed and ftps it back up again. The overall effect is that you can hit save in Notepad++ then refresh the webpage to see the changes. A very convenient way to break a live website.
biopowered.co.uk - catalytically cracking triglycerides for home automotive use since 2008. Just say no to big oil!
Hardly. Each image requires a separate HTTP connection back to the web server which increases overall load on web servers, client machines, and the routers in between due to TCP connection setup/teardown overhead.
And take your "go die in a fire" comments back to 4chan, please.