Domain: hwg.org
Stories and comments across the archive that link to hwg.org.
Comments · 23
-
Re:Project Gutenberg
I'm a little confused about what you are talking about here. You are looking for original source material, but you are also insisting on having extensive bibliographies and footnotes.
If you have ever read a "Featured Article" quality Wikipedia entry, they will almost always have very extensive bibliographies, footnotes, and links to original source documents, so this statement that you are looking for this seems like you are missing something essential here.
Or that you are looking at older books that don't have bibliographies and dismissing them.... when in fact they are the original source documents you claim to be craving here.
If someone started a project to provide that kind of information for Project Gutenberg books, I'd get on board to help. Bonus points if they're also putting them in formats that don't suck (making plain text look good on the screen is a pain in the ass).
If I am reading this correctly, you are looking for people who mark up the Project Gutenberg files into something that isn't just plain ASCII? Check out these websites:
- http://gutenberg.hwg.org/ - HTML Writer's Guild - They have moved to a more XML-based scheme for markup, but the project was originally started by a couple of guys who wanted to take the PG material and format it using HTML. The raw ASCII is available, of course, if you really want to get it.
- http://www.wikisource.org/ - Wikisource - A "sister project" to Wikipedia and sponsored by the Wikimedia Foundation, this project aims primarily to support Wikipedia with original source documents, although most of the "regular" participants simply are fans of old documents. You have all of the tools available on the MediaWiki software for markups (aka the same software used for Wikipedia) and some extensive work has been done with many documents to "pretty them up" and format them to something more than a plain ASCII text page. While some Project Gutenberg pages do exist on this site, it isn't exclusively PG material.
I'm sure I could find other websites that do this, but it isn't exactly a brand new idea, and there are groups of people who agree with you that plain ASCII sucks and needs to be replaced with something more visually appealing. If you want to help either of these groups make some of these classical documents easier to read, volunteers are always wanted.
-
Re:Don't fall for the trap
www.sun.com/software/communitysource/faq.xml
You know, I love how companies are doing this, creating a file extension, and associating that as HTML 4.01 Transitional in their AddType directives. Gentoo did it (and in fact, I just checked, they're STILL doing it), and now Sun is doing it. This document is NOT a valid XML document, nor is it well-formed. In fact, it barely qualifies as HTML (and doesn't even validate against its declared doctype).
Now, for some examples of REAL XML in a browser, go to the Gutenberg XML pages and look at their works. True, valid, well-formed XML, rendered in the browser.
This pseudo "Look ma! I'm using XML" madness needs to end. It's getting tiresome.
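For anyone who wants to check a claim like this themselves, here is a minimal Python sketch of a well-formedness test using the standard library's xml.etree.ElementTree parser. The two sample strings are made up for illustration; they are not taken from the Sun or Gutenberg pages. Note that this only tests well-formedness. Validating against a DTD is a separate step that needs a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as well-formed XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, not copied from any real page.
# The first is typical HTML tag soup: <p> and <br> are never closed.
tag_soup = "<html><body><p>Unclosed paragraph<br></body></html>"
# The second closes every element it opens.
real_xml = "<book><title>Example</title><chapter n='1'>Text</chapter></book>"

print(is_well_formed(tag_soup))
print(is_well_formed(real_xml))
```

A file served with an .xml extension that fails this kind of check is not XML, whatever the server's AddType directives say.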
-
Re:still free
I participated in this when it started up. It's dead in the water, becalmed, caught in the horse latitudes, so far as I can tell.
For example, take a look at the dates attached to the marked-up texts in this list. A shame--folks were mighty excited.
The Project Gutenberg XML mentioned earlier here was also exciting, but I've been off the mailing list a few years, and am having trouble finding its archives now. Anybody have more luck than me? As I recall, one of the unanswered threads that ran through it was what to do in the TEI headers, since TEI was an attractive choice for a mark-up vocabulary. It is not that obvious how to accommodate the Gutenberg boilerplate and metadata appropriately in the header.
-
Re:still free
The HTML Writers Guild is translating Project Gutenberg texts into HTML.
-
Re:Christians using Darwin
Yeah, here it is. Fascinating stuff. Here is a list of his works that are available on Gutenberg.
-
Personal preference
In other words, it is nice to get away from the computer sometimes and just read.
Yes, but sometimes it's nice to have an electronic document to read off of your computer screen so when someone walks past your desk during business hours, you can tell them you're "researching for an upcoming project" while you're actually reading about nanotechnology or something else that interests you.
Personally, I applaud the efforts of this project. Once products like ePaper begin to be mass-produced and available to the public, you will be able to have YOUR way (tactile reading) and I will still be able to have MINE (be able to read the same document off of a computer screen). Well, I actually DO like books, but I'm just trying to say that having a CHOICE is a GREAT thing.
And FURTHERMORE, once all of these books get converted from plain text files into XML files, you'll be able to apply whatever your favorite stylesheet is to it to have your own personalized reading experience. (Examples: If you're older and have vision problems, you can have bigger fonts. Maybe you prefer plain black text on white background; maybe you prefer green text on a black background with a Courier New font. It's up to you!)
And of course, speaking of vision problems, you can also have a text-to-speech program READ the text to blind people (or people who like audio books) as well!
Ain't technology a wonderful thing?!? :)
-
Re:XML please
Michael Hart has repeatedly said that he does not want to get caught up in the fad of the moment with text formatting issues, and that plain old ASCII is one constant that hasn't needed changing. Indeed, you can open up the original Declaration of Independence document with your standard web browser, and you can still read it just fine. I dare you to try and find any other data format that was commonly used 32 years ago that you can still read with current equipment.
With that said, I believe that XML is perhaps going to have the staying power that ASCII text has had for the past many years. And there are many volunteer projects that you can get involved with that do this including:
The HTML Writers Guild - Originally they were trying to convert all of the Gutenberg texts to HTML, which has admittedly been a reasonable standard for a good number of years. They are now moving to a version of XML with some standard headings for titles, copyright info (or lack thereof), chapter headings and so forth. More is on this website.
Project Gutenberg XML - This is a group more dedicated to XML, but it has a very similar purpose.
The point here is that once the data is put into ASCII text format, projects like this can be done, and are being done. If you really feel that you want to help with the effort, please join one of these. You can also take the Project Gutenberg files and do this yourself at any time, but at least this gives you a forum to share your work once you are done.
-
On Beyond ASCII
I understand the support in a lot of the comments here for the plain-vanilla ASCII Project Gutenberg approach to ebooks. Paradoxically, however, a simple ASCII conversion from print to digital form provides less assurance of future survivability and usability of your book than rendering it with the structured XML markup specified by the Open eBook standard (where well-formed XHTML is the least common denominator).
Why? Well, an ASCII text version of a printed book is really more like an analog facsimile than is a version in XML that has been tagged for structural features. Leaving aside issues of non-English characters, illustrations, and unusual typography, ASCII does a relatively poor job of capturing all of the structural conventions that exist in printed books. Books have copyright pages, tables of contents, chapter titles, subtitles, bylines, epigraphs, block quotations, footnotes, running headers and footers, citation lists, etc. ASCII can provide rough format equivalents of some of these, very poor equivalents of others. With an appropriate XML tagset, however, it's a relatively simple matter to tag most of the structural features of a book and then use stylesheets for presentational rendering. That's the whole assumption of the Open eBook specification.
Suppose you're in a world where all printed copies of Huckleberry Finn have been lost. You have two CD-ROMs that somehow you've managed to decode so that you can read the files and interpret their character sets. One of them contains the Project Gutenberg etext of the novel, an ASCII transcription. The other contains an XML encoding tagged according to a DTD from the Text Encoding Initiative, the current best standard for encoding literary (and many other) texts. It has all of the textual content of the PG version, as well as some that's missing (like the table of contents and the copyright page from the transcribed edition, which the PG version unaccountably omits). XML tags mark all the line and page breaks of the original. In addition, there are tags to mark quoted speech, unusual typography, words in foreign languages, and other significant features of the original. The CD-ROM contains the DTD used along with documentation on the tagset.
In this imaginary scenario, even if all of the XML documentation were missing it would be pretty straightforward for 31st-century programmers to strip out the tags and recreate the ASCII transcription. But with the documentation, it's possible to reconstruct something much closer to the original than the plain-vanilla PG version allows. And suppose your 31st-century archaeologist found a trove of TEI-tagged books on CD: with all of the structural tagging and metadata about authorship, publication dates, etc., a 31st-century librarian will be able to plug all of the books into a cataloging system that allows sophisticated searching. If instead you had a trove of plain-ASCII books, the best you could do with the collection would be simple full-text searches.
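To make the two halves of that claim concrete, here is a rough Python sketch using the standard library's xml.etree.ElementTree. The sample document, its element names (book, header, body, chapter, q) and its contents are all invented for illustration; real TEI markup has a far richer header and tagset. The first function strips the tags to get back a plain transcription, and the second pulls out the kind of header metadata a cataloging system could index.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified sample; not real TEI and not a real text.
SAMPLE = (
    "<book>"
    "<header><title>Sample Title</title><author>Sample Author</author></header>"
    "<body><chapter n='1'><p>He said <q>hello</q> and left.</p></chapter></body>"
    "</book>"
)


def plain_text(xml_source):
    """Drop all tags in the body and return only the character data."""
    root = ET.fromstring(xml_source)
    body = root.find("body")
    return "".join(body.itertext())


def metadata(xml_source):
    """Return the header fields as a dict, e.g. for a catalog record."""
    root = ET.fromstring(xml_source)
    header = root.find("header")
    return {child.tag: child.text for child in header}


print(plain_text(SAMPLE))
print(metadata(SAMPLE))
```

Going the other way, from the plain transcription back to the tagged version, is the part that no short script can do, which is the asymmetry the scenario is meant to illustrate.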
Leaving aside the sci-fi scenario, the reality is that our documents, over the next few decades, will move from format to format and be used for purposes that we can only guess at right now. Of course plain ASCII, or even proprietary formats, will be better than no documents at all. But the work involved in converting them will be a lot higher than if they are tagged in a well-documented, structured markup language.
Incidentally, there's already at least one project underway to take Project Gutenberg texts and add minimal XHTML or XML markup to capture structure and make them more readable via stylesheets. The Open eBook specification is just a more sophisticated way of doing the same thing.
-
Gutenberg Texts in XML
For Gutenberg texts in XML -- or to get involved in the process yourself -- see the HTML Writers Guild's "Gutenberg at HWG" project started by XML author Frank Boumphrey, at:
Volunteers are needed!
--Kynn
-
Composite Reply on Web Accessibility
A few composite replies to some of the statements that have been made here:
fleener wrote:
Either the W3C standards will change to somehow radically change the makeup of pages on-the-fly for blind users, or another Jakob Nielsen will rise to power and make a lot of money.
Actually, the W3C standard to change the makeup of pages on the fly exists; it's XSLT -- XSL Transformations. We use it at Reef (formerly Edapta) to do dynamic edaptations of the user interface to meet the needs of various audiences, including people with disabilities. If you want to see the semi-non-public demo pages from last year, drop me a note in email. (I'm not at liberty to get us slashdotted at the moment!)
Argy offered great advice, including:
As to what you're looking for, I'd spend some time browsing your sites using lynx.
If you haven't used Lynx for a long time, and don't want to bother to install it, you can also try Delorie's Lynx Viewer, a web-based lynx simulator script.
GC wrote:
You do not have to change your website at all. Your website does not define the media which will be used to define it. Your website will just send down the Internet pipe what it is requested for. The accessibility concerns are fully dependent on the equipment used to communicate and receive the information at the users end and this is not within your power nor should it be your concern.
I beg to differ here; it's a common fallacy that assistive technology can solve all of the problems of access. In fact, I included this on a list of Common Myths About Web Accessibility because many people seem to think that a screenreader or braille terminal can fix everything.
The problem, however, is a simple "garbage in, garbage out" scenario. Assistive technology needs enough information to be able to cobble together an alternate access method. That information is encoded within the HTML file. If the HTML file is poorly done, then it may prove impossible to get even the minimum information from a page.
If you don't want to simply believe me because I say it's so, then you could do a test yourself -- download a screenreader and try it out on a web page and see how it works. You may be disappointed to find that it's not as easy as you'd hoped -- and then remember that for many people this is their only way to access the web.
A few quick links to screenreader (or screenreader-like) technology:
- IBM Home Page Reader 30 day trial, runs on Windows
- emacspeak download from Sourceforge, runs on Emacs
Enjoy!
--Kynn Bartlett
-
Re:Some Things To Do:
> I think there is a Webmaster's guild somewhere.
There's the HTML Writer's Guild, http://www.hwg.org/ - they claim to have over 120,000 members, although how many are active is another matter.
-
Check out this site on copyright law
There have been numerous LONG discussions of this very issue, specifically relating to HTML/site design/site construction on the HTML-Business list at HWG.org (HWG = HTML Writer's Guild, not Horny White Guys). If anyone is interested, the discussions are in a searchable archive.
While there's lots of angst and chest beating (very entertaining), it boils down to getting a lawyer well-versed in copyright law to help you develop a STRONG contract.
A frequent contributor to this discussion there is Ivan Hoffman, whose web site is a good jumping off spot for solid advice. YMMV, of course.
-
Elements of Programming With Perl
This book was just published in October, so it is not yet very well known. It is written by Andrew Johnson, one of the original members of the Winnipeg Perl Mongers group and a regular contributor to comp.lang.perl.misc. I have seen the book; it looks like it would be extremely useful for someone new to programming and/or Perl, with material of interest to more experienced programmers as well.
Here are some links for more information on the book:
Andrew's home page for the book at Manning Publications
a review by Billy Baron of Delphi Consultants at javamug.org
an online Perl programming course using Elements as its textbook
Disclaimer: I am from Winnipeg, and know the author.
-
Re:The answer is simple - Ignorance.
I agree 100%, schon. The HTML Writers Guild established the AWARE Center to fight this kind of ignorance. It's a hard fight, though, and we could use the help of slashdotters to spread the word.
--Kynn
-
Re:WTF are they supposed to do?
Well, you may think that ALT text is "as accessible as it's going to get", but do you know that most of the sites out there DON'T have proper ALT text?
An exercise I teach in my online course on web accessibility asks the students to turn off their images and surf around the web a bit. Nearly every site they visit is barred to them, even some sites by disability organizations!
You may want to try a similar exercise yourself. I sense from your posts that you don't know much about this issue, and this would be a good way to increase your knowledge base. You could also visit the AWARE Center site.
--Kynn
-
Re:Web Access for the Disabled - Useful Links
Here are two more useful sites:
The W3C's Web Accessibility Initiative creates technical specifications that are guidelines for web page authors, browser programmers, and authoring tool creators. (Warning: Dry and technical in that charming W3C manner.)
The HTML Writers Guild's AWARE Center is all about educating web designers on creating accessible pages. You may want to read the Common Myths about Web Accessibility article or the Selfish Reasons for Accessible Web Design. (Full disclosure: I maintain the AWARE center site and wrote both of the articles cited above.)
--Kynn
-
Use of ORA books as textbooks
"Crutcher asks: Not sure how to phrase this, but, well, what is the status of O'Reilley and marketing books to schools and colleges for use as textbooks. Our textbooks suck, and if there textbook versions of ya'lls books it would rock."
What's interesting is that some places are beginning to use their books. For instance, as a newbie to Perl, I'm taking the HWG.org class starting on Sept 20, for Beginning Programming with Perl, which uses the ORA Learning Perl book that's so popular with slashdotters. From a cursory glance through some of the other courses, there do appear to be some of ORA's excellent books used as texts. There's hope after all!
-
What about Java.
Java applications and applets can indeed be made accessible, quite easily! If you use the IBM Self-Voicing Kit (SVK) with the Sun Java Foundation Classes ("Swing"), it's pretty easy to build Java programs that interface seamlessly with assistive technology.
There is a section on Java Accessibility on the AWARE Center website, at http://aware.hwg.org/tips/. Enjoy!
-
Accessibility of Web Pages
There's a bit of a problem in the way that
articles on this topic have been written --
reporters glossing over the facts in favor of
a more sensational headline, and of course that
makes it harder for the average person to
understand what's going on here.
One thing to keep in mind here is that this is
primarily a story about the federal government
deciding to mandate accessible web authoring
practices on their own pages. In one sense,
this is no different from any other large
company deciding that they will follow a certain
standard level of HTML coding on their own
websites.
In a broader sense, however, it's vitally
important that information that the government
provides can be used by everyone, and not
necessarily exclude one type of person, especially
not on the basis of a disability. This is why
public buildings are wheelchair accessible
and why braille versions of documents are
made available. As required by the ADA, if you
are going to make something available to sighted
people, you also need to make it available to
people who can't see, for example.
Now, the good thing is that the proper use of
HTML (and other web technologies) actually makes
it trivially EASY to provide disabled people with
the same access to information that non-disabled
folks enjoy. The web is a very egalitarian,
platform-independent medium, better than any
we've ever had before on the planet, and if you
make your web page well, nobody should have any
problem with accessing it.
Of course, there's the rub -- the vast majority
of web pages aren't made "well", and I mean that
from a technical, HTML-pedant standpoint. The
biggest "sin" is a lack of alternative text
(ALT attributes) on image-heavy sites, and that
alone makes it very hard for people with
disabilities to use many web sites.
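As a rough illustration of how easy this particular
problem is to detect, here is a minimal Python
sketch that lists the images on a page that carry
no ALT attribute at all. It uses the standard
library's html.parser module; the class name, the
function name, and the sample page are hypothetical,
and an empty alt="" is treated as acceptable since
that is the usual way to mark a purely decorative
image.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> tag that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            self.missing.append(attributes.get("src", "(no src)"))


def images_missing_alt(html_text):
    """Return the src values of images lacking an alt attribute."""
    finder = MissingAltFinder()
    finder.feed(html_text)
    finder.close()
    return finder.missing


# Hypothetical sample page used only for this example.
PAGE = (
    '<p><img src="logo.gif">'
    '<img src="photo.jpg" alt="Staff photo">'
    '<img src="spacer.gif" alt=""></p>'
)

print(images_missing_alt(PAGE))
```

A check like this only finds images with no ALT
text; it cannot tell whether the text that is
there is actually useful to a reader.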
Now, the solution here is NOT to throw away
graphics-heavy, table-laden, multimedia
extravaganza websites. The specifications that
make the web work were designed specifically to
allow for new advances of technology while still
maintaining usability in older browsers. Adding
ALT text and other features that benefit various
users (such as disabled folks, people with older
browsers, and people with the newest tech such as
web-enabled phones, pagers, or PDAs) is simple
and painless, and does not mean you have to give
up your lovely design!
So why don't people do it? Why aren't they using
HTML to its fullest and creating pages that aren't
exclusionary? It's primarily a case of awareness.
Most web designers aren't aware of the problems
nor are they aware of how easily those can be
solved. It's because of that lack of awareness
that the HTML Writers Guild created the AWARE
Center.
The AWARE Center is a special project of the
non-profit HTML Writers Guild, and the letters
stand for Accessible Web Authoring Resources and
Education. The goal of the AWARE Center is to
promote a better understanding among web authors
of the need for accessible web design and the ways
in which this can be accomplished.
You can find out more about accessible web
authoring at the AWARE Center homepage:
http://aware.hwg.org/
The site is a resource for the community and is
open to anyone, HWG member or not. If you have
any questions, you can send me email at
aware@hwg.org.
--Kynn Bartlett
Director, AWARE Center
HTML Writers Guild