Dir is a sensible abbreviation of a word that you’d normally use to describe a list of names? The word you used yourself is "list". However, the function of ls is to list information about one or more files, not to print a directory. For example, ls myfile prints information about "myfile" regardless of whether it's a file or a directory. So, there's a reason for the name too. The original main file commands, like ls, chdir, dsw, pwd, mkdir, had a pattern to them: the first two consonants for a single word (ls), with an abbreviated second word appended if appropriate, such as mkdir for "make directory", chdir for "change directory", and abbreviations for others, such as dsw (delete from switches, long replaced by rm for remove). Over time some of the commands were shortened further, so that chdir became cd, using the first letters of "change directory." A lot of this need for terseness was because the original Unix interface involved teletype machines, a sort of computer-driven steampunk typewriter, and they were relatively slow and painful, especially over 110 baud dialup lines (yes, 110, not 110K).
Having said that, by 6th Edition Unix there was a short "quick reference" document that listed the most common commands, and something like that is clearly needed on Linux, to help people find the most important tools. It doesn't matter how well they're named if you can't find them!
I run a Web site for images (mostly) and text scanned from old books. When Google Books started I thought at first I could just give up, but it turns out that the quality of Google Books is so low that http://www.fromoldbooks.org/ and other sites like it continue to perform a valuable service.
I have had to spend a lot of time researching copyright law. I started out believing Wikipedia, hah! And there are tons of Web sites with myths about copyright, e.g. that anything published before 1923 anywhere in the world is out of copyright in the US. Did you know that the UK copyright act has an exception specifically for works created in a hovercraft? Or that anonymous works have different copyright terms than ones that are credited, but e.g. if the name of a photographer becomes known (or knowable through any public means) after publication, it gets the longer term? And there's no central registry.
We're all getting screwed out of our heritage when a private corporation can control the world's library. To stop this, copyright law must be made simpler, and there must be online searchable registries. Copyright must eventually be harmonized between all countries, since digital information knows no borders. But it must be harmonized in such a way that some currently copyrighted works fall out of copyright, and as few as possible works that are out of copyright are placed back into copyright. The difficulty is that in corrupt regimes like the US, companies can pay politicians for their election campaigns, and hence special interests dominate politics. And I have no idea how to end that corruption, of course.
One problem with all this that I see is that the quality seems awfully spotty, but usually pretty low. Which means one day it'll have to be done again. Google's main goal is to have as much content as possible, so as to drive advertising revenue. Here's a short extract from a Google book (try it, look through the text of them), showing how well Google is preserving our shared heritage:
[[
SAAVEDRA-FAXARDO (OfBooiBa), a SpMishfipolU
ileal and moral writer,' was bora* May 6^ l58e^>atiAlgaMMm
iittbe kiDgddm of 'MiUQia, i^ndslttdiedal^SslanMieoai' lA
1606, he'went toRome tei^sect^taiy to tiie>icahliAA<tes-
par de Bdfgia,-who was appointed Spaaisbaaabaaiadoi^i^e
the pope,-^ and assisted inahe^xionBtav^'of tl6ai.atidi1i6(l8$
Held for the eleeHoti of Ae popett Gregovy JiV. awd^Uas
baaVfn. For these aervices Saaaedra^wtts rewarded wiril
a^ eanorrry in the* ehifrdi of St:>Jatnes, attbeegb be kmi
ifever tikei^. prieitVoniera. iBem^time aiier Ae was #pu
{loiiited agent freer 4be cobrt^of vSpaia^lit^eaiey.aiidt kii
]]
Got that?
The quality of scans from Google Books seems very low to me; much lower than I'd use on my own Web site, http://www.fromoldbooks.org/ - it's not uncommon for pages to be missed, and in one 19th century mechanics textbook I was looking at, the low scan resolution meant that most of the line drawings and diagrams vanished entirely.
It's obvious to me that the Google work will need to be done again by people who care about the content. Note, by the way, that most flat-bed scanners destroy the binding of books, although some people are now using e.g. a Canon 5D full-frame-sensor SLR camera instead. For illustrations, some simple mathematics (and experience) shows 600dpi to be an absolute minimum for a scan of an engraving, with fine steel engravings needing at least 1200dpi (I used 2400) in order to prevent the lines from being aliased into a blotchy gray. This is much higher than the Canon SLR gives, but Betterlight have a 500 megapixel backend to a medium-format professional camera that would give enough resolution for a good digital facsimile, and e.g. the University of Wisconsin uses that sort of equipment. But it's much slower and hence more costly, and the files are huge.
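One way to read the "simple mathematics" is as a sampling argument: every inked line and every gap between lines needs at least one sample, and in practice rather more. The sketch below is only an illustration under stated assumptions. The line densities (150 and 300 lines per inch), the oversampling margin, the 10 inch page height and the 4368 pixel sensor edge are all values I have picked for the example, not figures from the post.

```python
# Rough sampling arithmetic for scanning engravings.
# ASSUMPTIONS (not from the post): the line densities, the margin of 2,
# the 10 inch page and the 4368 px long edge are illustrative values.

def min_scan_dpi(lines_per_inch, margin=2.0):
    """Minimum scan resolution for an engraving.

    Each inked line plus the gap next to it needs at least two samples;
    `margin` is extra oversampling so that line edges are not averaged
    into gray.
    """
    return 2 * lines_per_inch * margin

def camera_dpi(pixels_long_edge, page_long_edge_inches):
    """Effective resolution when one camera frame covers a whole page."""
    return pixels_long_edge / page_long_edge_inches

if __name__ == "__main__":
    print(min_scan_dpi(150))       # coarse engraving (assumed density)
    print(min_scan_dpi(300))       # fine steel engraving (assumed density)
    print(camera_dpi(4368, 10.0))  # one frame over a 10 inch tall page
```

With these assumed numbers the camera comes out well under the engraving requirement, which is the same conclusion as above: a single SLR frame per page is fine for text but not for fine line work.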
Here's a fragment of text from a Google book I've been working with:
ALLEN ^Anthony), an English lawyer and antiquaiy,
was born at Great Hadbam in Hertfordshire, about the end
of the seventeenth century, and was edu<?^ted at ffton;
whence he went to King's college, Cambridge, and took
his bachelor's degree in 1707, and his master's in 1711.
He afterwards studied law, was ciiJI^d.to' the bar, and bjr
the influence of Arthur OnsloW^ speaker of the house of
commons, became a roaster in chancery. His reputation
as a lawyer was inconsiderable, Jbiut he was Esteemed a good
classical scholar, and a man of Wit: and -convivial habits.
The version I have at http://words.fromoldbooks.org/Chalmers-Biography/a/allen-anthony.html (I am still working on these) is based on scanning done at the University of Toronto, combined with four other digitisations, including two apparently independent ones by Google, both of the quality demonstrated here.
It might turn out that it would have been less work to have scanned this 32-volume encyclopedia myself (I have a copy) and do the OCR with commercial software that works 1,000 times better than Google's, but, for reprints, the important thing is the quality of the scanned images, not the OCR - and there too, the Google scans are really sucky.
A teletype such as the popular ASR-33 was a printing device, and had no difficulty with descenders. It worked a bit like a golf-ball typewriter: the letters were embossed (in reverse) on a cylinder, which moved into position and then struck an inked ribbon which then hit the paper. The embossed letters came from an analog process, and could be any shape. Parentheses, for example, and the comma, typically went below the baseline.
They could go at about 10 characters per second, for what it's worth, and I still remember the noise they made :)
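The 10 characters per second figure follows from a 110 baud line if you assume the usual 11-bit framing per character (one start bit, eight data bits including parity, two stop bits). That framing is my assumption here, and the two command strings are just examples to show why shorter names were welcome.

```python
# Rough arithmetic for a 110 baud teletype line.
# ASSUMPTION: 11 bits on the wire per character
# (1 start + 8 data/parity + 2 stop).
BAUD = 110
BITS_PER_CHAR = 11

def chars_per_second(baud=BAUD, bits_per_char=BITS_PER_CHAR):
    """Characters transferred per second at the given line rate."""
    return baud / bits_per_char

def seconds_to_print(text, baud=BAUD, bits_per_char=BITS_PER_CHAR):
    """Time needed to send `text` one character at a time."""
    return len(text) / chars_per_second(baud, bits_per_char)

if __name__ == "__main__":
    print(chars_per_second())              # characters per second
    print(seconds_to_print("chdir /usr"))  # 10 characters
    print(seconds_to_print("cd /usr"))     # 7 characters
```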
A knowledge of programming in general, an understanding of algorithms, of complexity and basic security issues, will put you ahead of a lot of "consultant programmers" I have met.
Recruiters generally think C and C++ are the same thing, and so do HR departments.
If you enjoy what you do, and are good at it, you'll get better and be an asset to any slave-farm^H^H^H^H^Hcorporation.
Contribute to an open source project or two, perhaps.
Having said that, what university is teaching computer science students only three languages, and all procedural? You should know at least one functional, non-procedural language: pure Scheme, OCaml, XSLT or XQuery, or Prolog would all be candidates, even if your university forgot to teach lambda calculus :D
The value of a declarative language isn't that you will get a job programming in LISP (although you might) but rather that it gives you a different toolset, a different way to think about problems that turns out to be useful in a lot of other places.
Many people include the years they were at University as part of their experience in various programming languages.
For example, if in your first year you learned OCAML, and that's 5 years ago, you have been programming in OCAML since 2003.
This means that any good interviewer will need to narrow down exactly what your relevant skills are, of course, so don't exaggerate and never ever tell an outright lie on your CV. In some (usually large) companies (and in some countries more than others) that can be grounds for dismissal without notice even 15 years later. But do remember you can include open source work and your university courses - they are every bit as relevant as the experience of someone who did not go to college but spent a couple of weeks reading "Learn Perl in 21 days" and then decided to re-write the aircraft navigation system ;-)
In my spare time I scan old books, put the pictures online, and sometimes also make XML transcriptions, e.g. of the dictionaries of thieving slang.
I tend to use technologies from work too, but for me that makes work more interesting and more relevant to my life at the same time as making the spare time project move forward.
The site makes money from ads (a little) and I sell the high-res images on stock sites (although I also give them away free on request, or for the cost of shipping and so forth).
I hope the rights to digitization are non-exclusive.
For one thing, Google Books is so appallingly badly done that it can't be used even for OCR: it's rare to have a whole book without a few missing pages, folded pages, badly under- or over-exposed pages etc.
For another, the resolution is too low. For the posted newspaper spread, look at the ad near the bottom left, "Today at your neighborhood theater", and notice that you can't really read it. Given that this was a famous story chosen for a major press release, you'd think they'd take care. If this is their best, we can expect most other issues to get much worse treatment. Pages 18 and 19 are sideways; was this a mistake? Seems so.
If they are doing this for posterity they need to do it well. It might never be done again.
What thought have they given to copyrights for adverts, and to privacy... if you ever posted a classified ad your 'phone number is about to be made public; if you were ever wrongly accused of a crime, it may re-surface... this is different than news posted on the Web today, because no-one thought of newspapers as easily-accessible public archives in the same way as they do for Web pages. Old newspapers were often archived at libraries, sometimes microfilmed, but not readily available.
There should also be a mechanism, where possible, for people to make transcriptions (Distributed Proofreaders comes to mind as a possible model) so that the newspapers can be indexed and made accessible to people who are not sighted, or who can't read 6-point type :D or have low bandwidth.
So it's an interesting start, but no, please don't do this without some hard thinking by someone who isn't just a marketing executive. There are a whole lot of things that don't appear to have been considered and should be considered on such a project.
Once they are considered, there's a lot of fabulous stuff to be uncovered (which is why I have a Web site for scanned images and texts, of course, but at least I scan at as close to archival standards as possible, as such standards evolve).
That's without subsidy. The Canadian government used to have a programme called the One Tonne Challenge, through which you could get $3,000 or so back for changing your heating system or otherwise reducing your carbon footprint, but the Conservatives ended it immediately when they came into power some 3 years ago. There may be some Ontario tax rebate programmes, but they vary too often for me to give a good answer I'm afraid. The Canadian government Web sites on energy tend to be pretty good.
If you live in, say, Texas, you'll get a lot more sunlight; if you live in Moose Factory, Ontario, on James Bay, or (at about the same latitude) the northern part of Cornwall in the UK, you'll get even less sunlight: the days are longer in the summer, but shorter in the winter, and the sun is always lower in the sky.
So the seven years part isn't fixed, it's a very rough guideline I got from going on a course and from reading about it and talking to people who had done it.
An off-grid solar + wind system can easily cost $60,000, but if, like us, you live in a rural area it could quite literally be a life saver. Spending $60,000 with no idea how it would affect your costs would be like gambling all your money away on the stock market.
You need to budget for long-term maintenance. And to do that you need to know about reliability of the equipment, and about how long before it's paid for itself, to work out the average costs including replacement parts.
Currently you can expect a home solar panel installation to pay for itself within 7 years (here in southern Ontario). If you combine it with wind turbines you can get your money back sooner, and if you spend the extra to be able to sell electricity back to the grid, you can get a payback much sooner because Ontario Hydro (the power company here) pays you more than it would charge for the electricity (no distribution fee).
Ideally you want the installation to last for 10 years or more without significant failures, though.
Often "thinner and cheaper" translates to "more easily broken" and "less reliable" - for example, when the units flex in high winds. So my main worry would be about the expected (and achievable) lifetime of the units. Maybe if they gave a five or ten year warranty I'd be OK with it.
When I lived in the Boston area, I tried to get Internet service with Verizon. They took ages, so I contacted their support.
Eventually they got back to me and said that they were unable to sell me service because of "a problem with your name".
I never found out the exact nature of the problem. My best guess was that I have two middle initials, and they are both on my credit card, but their application form only allowed me to enter one middle initial. But I don't know for sure.
I was able to get at least five free wifi signals from the couch in my room, and for anything using significant bandwidth I'd walk to my office at MIT, so in the end I went without.
We [the XSL Working Group at W3C] are working on improving the specification of XSL-FO to ease some of these limitations.
Note, by the way, that TeX's line-breaking algorithm, although it uses dynamic programming to "optimize" a numeric value, is nowhere near as good as the line-breaking algorithms in some of the high-end batch formatting programs.
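For anyone who hasn't seen line breaking done with dynamic programming, here is a minimal sketch of the general idea. It is emphatically not TeX's algorithm: the cost here is just the squared number of unused columns on each line except the last (my simplification), with no stretch, shrink, hyphenation or penalties, and it assumes every word fits within the width.

```python
# Minimal dynamic-programming line breaker (a simplified sketch,
# NOT TeX's actual algorithm or cost function).

def break_lines(words, width):
    """Split `words` into lines of at most `width` columns, minimising
    the sum of squared unused columns on every line but the last."""
    n = len(words)
    inf = float("inf")
    best = [inf] * (n + 1)  # best[i]: minimal cost of setting words[i:]
    nxt = [n] * (n + 1)     # nxt[i]: index of the word starting the next line
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1  # cancels the space counted before the first word
        for j in range(i, n):
            length += len(words[j]) + 1
            if length > width:
                break
            # The last line is allowed to be short at no cost.
            cost = 0 if j == n - 1 else (width - length) ** 2
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                nxt[i] = j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines

if __name__ == "__main__":
    for line in break_lines("aaa bb cc ddddd".split(), 6):
        print(line)
```

A greedy first-fit breaker would put "aaa bb" on the first line and leave "cc" stranded; the dynamic-programming version looks at the whole paragraph and prefers "aaa", "bb cc", "ddddd".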
I run fromoldbooks.org, a Web site devoted to scanned pictures and text from old books -- some more than 500 years old.
I use an Epson Expression 1000XL flatbed scanner (A3+ size, approx 12x17.5", with colour calibration), Linux xsane and gimp, for most of the images, but this does involve damaging the binding of thicker books. I scan wood engravings usually at 2400dpi, but modern screened pictures at only 1200dpi or sometimes even lower. The idea that you only need to scan at twice your print resolution assumes (1) that you know what printer you'll use 10 years from now, and (2) that once you scale down by more than 50% there's no visible difference (false). For colour you will need to do some descreening, which will generally involve something like an 11 to 17 pixel radius gaussian blur followed by a sharpen.
I also use a Canon 450D (Digital Rebel) camera on a tripod, with a 50mm f/1.8 lens (you can get the lens for around $75 to $100 in US or Canada, less if used) and a remote control; use the mirror lockup function of the camera and the remote to minimise camera shake. I point the camera at the open book.
In either case if there are significant amounts of text I then use ABBYY FineReader OCR; the open source OCR programs (and most of the other commercial programs) are a waste of time by comparison, or at least that was true 2 years ago when I was last researching this.
Go and buy a couple of large USB external disk drives, e.g. 500GBytes or more, and also write DVD backups frequently. Use a consistent naming scheme; I use a separate directory (folder) for each book or magazine, and I include the page number in the filename, together with -raw for the original scan and -cleaned for the processed version. I use PNG to save the files because it's lossless, an open standard, and widely supported; I'd suggest avoiding GIF (not enough colours), TIFF (portability problems) or JPEG (lossy).
Obviously if you want to put the magazines on the Web you'll need permission; in my case I am usually digitising out-of-copyright books, although copyright laws have changed since I started, and also my understanding of copyright has changed. E.g. I started out believing Wikipedia :-)
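To get a feel for why big disks matter, here is a rough size estimate for uncompressed scans. It assumes 24-bit colour (3 bytes per pixel) and takes the full 12 x 17.5 inch scanner bed as the worst case; a real book page is smaller and PNG compression will reduce the size on disk, so treat these as upper bounds.

```python
# Rough uncompressed size of a flatbed scan.
# ASSUMPTIONS: 3 bytes per pixel (24-bit colour), full-bed scans,
# and a nominal 500 GB drive (decimal gigabytes).

def raw_scan_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Uncompressed size in bytes of a scan of the given area."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel

def scans_per_disk(disk_bytes, scan_bytes):
    """How many scans of this size fit on a disk (whole scans only)."""
    return disk_bytes // scan_bytes

if __name__ == "__main__":
    full_bed_2400 = raw_scan_bytes(12, 17.5, 2400)
    full_bed_1200 = raw_scan_bytes(12, 17.5, 1200)
    print(full_bed_2400 / 1e9, "GB per full-bed scan at 2400dpi")
    print(full_bed_1200 / 1e9, "GB per full-bed scan at 1200dpi")
    print(scans_per_disk(500 * 10**9, full_bed_2400), "such scans per 500 GB")
```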
It can be a big project, but a lot of fun!
Re: "How will you use XML in years to come?" (on The Future of XML)
For what it's worth, XPath is not a query language for the DOM, and is unrelated to the DOM: they were developed entirely separately, with different constraints, and in fact there is not a perfect match.
XPath is an abstract addressing language for XML documents; XPath 2.0 makes it clear that you can use XPath (and XQuery and XSLT for that matter) over a wide range of things, as long as they can conform sufficiently well to a particular data model (XDM). There are implementations that let you use XPath over relational databases, with neither DOM nor in fact SQL under the covers.
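As a small illustration of XPath-style addressing without a DOM, Python's standard xml.etree.ElementTree has its own tree representation (not the W3C DOM API) and supports a limited subset of XPath 1.0 path expressions. The sample document and the function name below are invented for the example.

```python
# XPath-style addressing over a non-DOM tree.
# ElementTree implements only a subset of XPath 1.0; the sample
# document and element names are made up for illustration.
import xml.etree.ElementTree as ET

SAMPLE = (
    "<library>"
    "<book year='1812'><title>Book A</title></book>"
    "<book year='1606'><title>Book B</title></book>"
    "<book year='1812'><title>Book C</title></book>"
    "</library>"
)

def titles_for_year(xml_text, year):
    """Return the titles of all books with the given year attribute."""
    root = ET.fromstring(xml_text)
    path = "./book[@year='{}']/title".format(year)
    return [title.text for title in root.findall(path)]

if __name__ == "__main__":
    print(titles_for_year(SAMPLE, 1812))
```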
Actually most of the major relational databases now have efficient (non-relational) representations for XML and have implemented XQuery natively (on the underlying storage model not just via SQL). And it's perfectly fine for performance on graph queries. There are also XML-native databases with excellent performance.
You're right that a "traditional" shredding approach to putting SGML or XML into a relational database was a performance nightmare. Times have changed.
Having said all that, claiming that graph-specific software may perform better than more general software does makes sense; the nonsense part was the claim that you can't represent graphs in XML. You can represent them, and it can be efficient, too. Probably faster, today, than most of the RDF-specific products, but that, too, will change over time.
I see no reason why SPARQL could not be implemented in XQuery, either directly or using syntax translation.
You would need to choose an underlying data representation (in XML or at least the XDM) that could be optimised by the database technology you were using. Probably just a "node" element and a "relation" element would do it.
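A minimal sketch of that representation, again using Python's ElementTree rather than a real XML database: only the "node" and "relation" element names come from the suggestion above, while the id, from, to and type attributes, the sample data and the helper functions are my own assumptions for illustration.

```python
# A graph stored as "node" and "relation" elements, queried with
# ElementTree's XPath subset. Attribute names (id/from/to/type) and
# the sample data are assumptions made for this sketch.
import xml.etree.ElementTree as ET

GRAPH = """
<graph>
  <node id="a"/>
  <node id="b"/>
  <node id="c"/>
  <relation from="a" to="b" type="knows"/>
  <relation from="b" to="c" type="knows"/>
  <relation from="a" to="c" type="cites"/>
</graph>
""".strip()

def neighbours(root, node_id, rel_type=None):
    """Ids reachable from node_id in one step, optionally by relation type."""
    found = root.findall("./relation[@from='{}']".format(node_id))
    return [r.get("to") for r in found
            if rel_type is None or r.get("type") == rel_type]

def reachable(root, start):
    """All ids reachable from start by following relations (depth-first)."""
    seen, stack = set(), [start]
    while stack:
        current = stack.pop()
        for target in neighbours(root, current):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

if __name__ == "__main__":
    root = ET.fromstring(GRAPH)
    print(neighbours(root, "a"))
    print(neighbours(root, "a", "cites"))
    print(sorted(reachable(root, "a")))
```

A database with proper indexes on the relation attributes would do the same lookups without scanning every element, which is where the optimisation mentioned above comes in.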
The point of using SPARQL (as I see it) is so that RDF people can think in terms closer to their problems (wanting to explore inferencing and logic) and so that they don't have to understand XML, which seems to frighten or confuse them :-) (see some of the muddled comments to this post).
And the point of using RDF, which I see as a sort of assembly language for the Semantic Web, is that in theory you can interchange it and combine people's RDF, although that seems to me to be useful only when the RDF collections had compatible "ontologies" (vocabularies, as XML people might say).
None of this precludes interchange of RDF using XML, and in fact the RDF specification defines a (spectacularly ugly) XML serialization of RDF. And none of it precludes using XQuery, or at the very least sharing code in an implementation.
Best of all, the Gimp is Free Software. You're guaranteed to be able to get at the source code and change the program. And to the average user, this means nothing. For the average GIMP user? I don't know. Freedom of speech doesn't mean much to most people either. Until they lose it.
Possibly nothing - GIMP has no obligation in that regard :-)
On the other hand, it is scriptable. I find it better at some things than CS2 (not tried 3) - e.g. I can be working on one image, scanning another, and saving a third, all at the same time. The grid preview for rotating I find more useful in GIMP. My primary work environment is Linux or Unix, and the Linux version of PhotoShop isn't very good :-) Whether any of those things (or many more) matter to you when you already have a tool that works for you, I can't say.
Best of all, the Gimp is Free Software. You're guaranteed to be able to get at the source code and change the program.
Gimp has layers, although not (yet) adjustment layers.
No other program will ever be exactly like PhotoShop, so if you judge (as many do) other programs by how like PhotoShop they are, all other programs will fall short. The other programs may still be useful in their own right, though.
Well, I suppose I could host some 10,000 torrents, or generate them on the fly. Hard to test since my ISP uses packet shaping & filtering & NAT to stop bittorrent.
And I still have to pay for bandwidth, just less of it (no, I can't host a server at home that gets a few million hits per month).
Just when I finished downloading 2008.1!
You could always try upgrading with urpmi (see other comments on this post for more details).
In my spare time I scan old books, put the pictures online, and sometimes also make XML transcriptions, e.g. of the dictionaries of thieving slang.
I tend to use technologies from work too, but for me that makes work more interesting and more relevant to my life at the same time as making the spare time project move forward.
The site makes money from ads (a little) and I sell the high-res images on stock sites (although I also give them away free on request, or for the cost of shipping and so forth).
I hope the rights to digitization are non-exclusive.
For one thing, Google books is so appallingly badly done that it can't be used even for OCR: it's rare to have a whole book without a few missing pages, folded pages, badly under or ever-exposed pages etc.
For another, the resolution is too low. For the posted newspaper spread, look at the ad near the bottom left, Today at your neighborhood theater, and notice that you can't really read it. Given that this was a famous story chosen for a major press release, you'd think they'd take care. If this is their best, we can expect most other issues to get much worse treatment. Pages 18 and 19 are sideways, was this a mistake? Seems so.
If they are doing this for posterity they need to do it well. It might never be done again.
What thought have they given to copyrights for adverts, and to privacy... if you ever posted a classified ad your 'phone number is about to be made public; if you were ever wrongly accused of a crime, it may re-surface... this is different than news posted on the Web today, because no-one thought of newspapers as easily-accessible public archives in the same way as they do for Web pages. Old newspapers were often archived at libraries, sometimes microfilmed, but not readily available.
There should also be a mechanism, where possible, for people to make transcriptions (distributed proofreaders comes to mind as a possible model) so that the newspapers can be indexed and made accessible to people who are not sighted, or who can't read 6-point type :D or have low bandwidth.
So it's an interesting start, but no, please don't do this without some hard thinking by someone who isn't just a marketing executive. There's a whole lot of things that don't appear to have been considered and should be considered on such a project.
Once they are considered, there's a lot of fabulous stuff to be uncovered (which is why I have a Web site for scanned images and texts, of course, but at least I scan at as close to archival standards as possible, as such standards evolve).
That's without subsidy. The Canadian government used to have a programme called the One Tonne Challenge, through which you could get $3,000 or so back for changing your heating system or otherwise reducing your carbon footprint, but the Conservatives ended it immediately when they came into power some 3 years ago. There may be some Ontario tax rebate programmes, but they vary too often for me to give a good answer I'm afraid. The Canadian government Web sites on energy tend to be pretty good.
If you live in, say, Texas, you'll get a lot more sunlight; if you live in Moose Factory, Ontario, on James Bay, or (at about the same latitude) the northern part of Cornwall in the UK, you'll get less sunlight: the days are longer in the summer, but shorter in the winter, and the sun is always lower in the sky.
So the seven years part isn't fixed, it's a very rough guideline I got from going on a course and from reading about it and talking to people who had done it.
An off-grid solar + wind system can easily cost $60,000, but if, like us, you live in a rural area it could quite literally be a life saver. Spending $60,000 with no idea how it would affect your costs would be like gambling all your money away on the stock market.
You need to budget for long-term maintenance. And to do that you need to know about reliability of the equipment, and about how long before it's paid for itself, to work out the average costs including replacement parts.
Currently you can expect a home solar panel installation to pay for itself within 7 years (here in southern Ontario). If you combine it with wind turbines you can get your money back sooner, and if you spend the extra to be able to sell electricity back to the grid, you can get a payback much sooner because Ontario Hydro (the power company here) pays you more than it would charge for the electricity (no distribution fee).
Ideally you want the installation to last for 10 years or more without significant failures, though.
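As a rough illustration of the payback arithmetic, here is a minimal Python sketch. The function name and every figure in it are hypothetical, chosen only so that the result lands near the seven-year guideline; real numbers depend on location, equipment and electricity prices, and the sketch ignores inflation and interest.

```python
def simple_payback_years(install_cost, annual_savings, annual_maintenance=0.0):
    """Years until cumulative net savings cover the installation cost.

    Deliberately naive: no inflation, no interest, no change in
    electricity prices over the life of the system.
    """
    net_per_year = annual_savings - annual_maintenance
    if net_per_year <= 0:
        raise ValueError("the system never pays for itself at these rates")
    return install_cost / net_per_year


# Hypothetical figures, for illustration only.
years = simple_payback_years(install_cost=14_000,
                             annual_savings=2_200,
                             annual_maintenance=200)
print(f"payback in about {years:.1f} years")
```

Budgeting for replacement parts amounts to raising `annual_maintenance`, which lengthens the payback period accordingly.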
Often "thinner and cheaper" translates to "more easily broken" and "less reliable" - for example, when the units flex in high winds. So my main worry would be about the expected (and achievable) lifetime of the units. Maybe if they gave a five or ten year warranty I'd be OK with it.
When I lived in the Boston area, I tried to get Internet service with Verizon. They took ages, so I contacted their support.
Eventually they got back to me and said that they were unable to sell me service because of "a problem with your name".
I never found out the exact nature of the problem. My best guess was that I have two middle initials, and they are both on my credit card, but their application form only allowed me to enter one middle initial. But I don't know for sure.
I was able to get at least five free wifi signals from the couch in my room, and for anything using significant bandwidth I'd walk to my office at MIT, so in the end I went without.
We [the XSL Working Group at W3C] are working on improving the specification of XSL-FO to ease some of these limitations.
Note, by the way, that TeX's line-breaking algorithm, although it uses dynamic programming to "optimize" a numeric value, is nowhere near as good as the line-breaking algorithms in some of the high-end batch formatting programs.
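To show what "optimizing a numeric value with dynamic programming" means for line breaking, here is a minimal Python sketch. It is not TeX's actual algorithm, which also deals with stretchable spaces, hyphenation and penalties; it is the simpler textbook version for fixed-width text, minimising the sum of squared trailing space on every line except the last. The function name `break_lines` is made up for this example.

```python
def break_lines(words, width):
    """Split words into lines, minimising total squared trailing space.

    The last line is not penalised. Returns a list of strings.
    """
    if any(len(w) > width for w in words):
        raise ValueError("a word is longer than the line width")
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: minimal cost of setting words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: where the line after words[i] starts
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1                  # cancels the space counted before the first word
        for j in range(i, n):
            length += 1 + len(words[j])
            if length > width:
                break
            cost = 0 if j == n - 1 else (width - length) ** 2
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                nxt[i] = j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


for line in break_lines("aaa bb cc ddddd".split(), 6):
    print(line)
```

A greedy first-fit breaker would put "aaa bb" on the first line and leave "cc" stranded on the second; the dynamic-programming version accepts a slightly short first line to get a better overall result.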
Liam
I run fromoldbooks.org, a Web site devoted to scanned pictures and text from old books -- some more than 500 years old.
I use an Epson Expression 1000XL flatbed scanner (A3+ size, approx 12x17.5" with colour calibration), Linux xsane and gimp, for most of the images, but this does involve damaging the binding of thicker books. I scan wood engravings usually at 2400dpi, but modern screened pictures at only 1200dpi or sometimes even lower. The idea that you only need to scan at twice your print resolution assumes (1) that you know what printer you'll use 10 years from now, and (2) that once you scale down by more than 50% there's no visible difference (false). For colour you will need to do some descreening, which will generally involve something like an 11 to 17 pixel radius gaussian blur followed by a sharpen.
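To get a feel for what these resolutions mean in pixels and bytes, here is a small Python sketch. It assumes a full-bed 12 x 17.5 inch scan in 8-bit RGB and counts uncompressed size only; those are illustrative assumptions, not settings taken from the text, and a saved PNG will normally be smaller.

```python
def scan_size(width_in, height_in, dpi, channels=3, bits_per_channel=8):
    """Return (pixels_wide, pixels_high, uncompressed_bytes) for a scan."""
    px_w = round(width_in * dpi)
    px_h = round(height_in * dpi)
    total_bytes = px_w * px_h * channels * bits_per_channel // 8
    return px_w, px_h, total_bytes


for dpi in (1200, 2400):
    w, h, size = scan_size(12, 17.5, dpi)
    print(f"{dpi} dpi: {w} x {h} pixels, about {size / 1e9:.1f} GB uncompressed")
```

Doubling the resolution quadruples the pixel count, which is worth knowing before choosing 2400dpi for every page.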
I also use a Canon 450D (Digital Rebel) camera on a tripod, with a 50mm f/1.8 lens (you can get the lens for around $75 to $100 in US or Canada, less if used) and a remote control; use the mirror lockup function of the camera and the remote to minimise camera shake. I point the camera at the open book.
In either case if there are significant amounts of text I then use ABBYY FineReader OCR; the open source OCR programs (and most of the other commercial programs) are a waste of time by comparison, or at least that was true 2 years ago when I was last researching this.
Go and buy a couple of large USB external disk drives, e.g. 500GBytes or more, and also write DVD backups frequently. Use a consistent naming scheme; I use a separate directory (folder) for each book or magazine, and I include the page number in the filename, together with -raw for the original scan and -cleaned for the processed version. I use PNG to save the files because it's lossless, an open standard, and widely supported; I'd suggest avoiding GIF (not enough colours), TIFF (portability problems) or JPEG (lossy).
Obviously if you want to put the magazines on the Web you'll need permission; in my case I am usually digitising out-of-copyright books, although copyright laws have changed since I started, and also my understanding of copyright has changed. E.g. I started out believing Wikipedia :-)
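One way such a naming scheme might be automated is sketched below in Python. Only the directory per book, the page number in the filename, the -raw and -cleaned suffixes and the PNG format come from the description above; the zero-padding, the "p" prefix, repeating the book name in the filename and the function name `scan_path` are assumptions made for the example.

```python
from pathlib import Path


def scan_path(root, book, page, stage):
    """Build a path like <root>/<book>/<book>-p0042-raw.png."""
    if stage not in ("raw", "cleaned"):
        raise ValueError("stage must be 'raw' or 'cleaned'")
    # Zero-padded page numbers keep files in page order when sorted by name.
    return Path(root) / book / f"{book}-p{page:04d}-{stage}.png"


print(scan_path("scans", "thieving-slang", 42, "raw"))
print(scan_path("scans", "thieving-slang", 42, "cleaned"))
```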
It can be a big project, but a lot of fun!
For what it's worth, XPath is not a query language for the DOM, and is unrelated to the DOM: they were developed entirely separately, with different constraints, and in fact there is not a perfect match.
XPath is an abstract addressing language for XML documents; XPath 2.0 makes it clear that you can use XPath (and XQuery and XSLT for that matter) over a wide range of things, as long as they can conform sufficiently well to a particular data model (XDM). There are implementations that let you use XPath over relational databases, with neither DOM nor in fact SQL under the covers.
Actually most of the major relational databases now have efficient (non-relational) representations for XML and have implemented XQuery natively (on the underlying storage model not just via SQL). And it's perfectly fine for performance on graph queries. There are also XML-native databases with excellent performance.
You're right that a "traditional" shredding approach to putting SGML or XML into a relational database was a performance nightmare. Times have changed.
Having said all that, the claim that graph-specific software may perform better than more general software does make sense; the nonsense part was the claim that you can't represent graphs in XML. You can represent them, and it can be efficient, too. Probably faster, today, than most of the RDF-specific products, but that, too, will change over time.
I see no reason why SPARQL could not be implemented in XQuery, either directly or using syntax translation.
:-) (see some of the muddled comments to this post).
You would need to choose an underlying data representation (in XML or at least the XDM) that could be optimised by the database technology you were using. Probably just a "node" element and a "relation" element would do it.
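A minimal Python sketch of such a representation follows, using the standard library's ElementTree, which implements only a small subset of XPath 1.0 (a real XQuery engine would express the same lookup as a join). Apart from the "node" and "relation" element names, everything here is assumed for illustration: the "graph" wrapper, the attribute names, the sample data and the `neighbours` helper.

```python
import xml.etree.ElementTree as ET

# A small directed graph: three nodes and three labelled edges.
GRAPH_XML = """
<graph>
  <node id="a" label="Alice"/>
  <node id="b" label="Bob"/>
  <node id="c" label="Carol"/>
  <relation from="a" to="b" type="knows"/>
  <relation from="b" to="c" type="knows"/>
  <relation from="a" to="c" type="knows"/>
</graph>
"""


def neighbours(root, node_id, rel_type):
    """Labels of the nodes reachable from node_id over one rel_type edge."""
    labels = []
    for rel in root.findall(f"relation[@from='{node_id}'][@type='{rel_type}']"):
        target = root.find(f"node[@id='{rel.get('to')}']")
        if target is not None:
            labels.append(target.get("label"))
    return labels


root = ET.fromstring(GRAPH_XML)
print(neighbours(root, "a", "knows"))
```

The graph is not a tree, yet it sits comfortably in XML because the edges are stored as data (id references) rather than as element nesting.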
The point of using SPARQL (as I see it) is so that RDF people can think in terms closer to their problems (wanting to explore inferencing and logic) and so that they don't have to understand XML, which seems to frighten or confuse them.
And the point of using RDF, which I see as a sort of assembly language for the Semantic Web, is that in theory you can interchange it and combine people's RDF, although that seems to me to be useful only when the RDF collections had compatible "ontologies" (vocabularies, as XML people might say).
None of this precludes interchange of RDF using XML, and in fact the RDF specification defines a (spectacularly ugly) XML serialization of RDF. And none of it precludes using XQuery, or at the very least sharing code in an implementation.
Liam
You're aware, I hope, that you can represent RDF (or any other graph model) in XML, making utter nonsense of your claim?
:-)
I do agree that trees don't work well in relational databases though.
16-bit in this context means 16 bits per channel. That is, 16 bits for each of C, M, Y, K and transparency. This gives you 80 bits per pixel.
Gimp already works with 8 bits per channel, for each of R G B and alpha/transparency, giving 32 bits per pixel.
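The arithmetic, as a tiny Python sketch (the helper name `bits_per_pixel` is made up for the example, and "A" stands for the alpha/transparency channel):

```python
def bits_per_pixel(channels, bits_per_channel):
    """Total bits per pixel for the given channel names and bit depth."""
    return len(channels) * bits_per_channel


print(bits_per_pixel("CMYKA", 16))  # C, M, Y, K plus transparency
print(bits_per_pixel("RGBA", 8))    # R, G, B plus alpha
```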
Liam
Possibly nothing - GIMP has no obligation in that regard :-)
:-) Whether any of those things (or many more) matter to you when you already have a tool that works for you, I can't say.
On the other hand, it is scriptable. I find it better at some things than CS2 (not tried 3) - e.g. I can be working on one image, scanning another, and saving a third, all at the same time. The grid preview for rotating I find more useful in GIMP. My primary work environment is Linux or Unix, and the Linux version of PhotoShop isn't very good.
Best of all, the Gimp is Free Software. You're guaranteed to be able to get at the source code and change the program.
Liam
Gimp has layers, although not (yet) adjustment layers.
No other program will ever be exactly like PhotoShop, so if you judge (as many do) other programs by how like PhotoShop they are, all other programs will fall short. The other programs may still be useful in their own right, though.
Best,
Liam
Well, I suppose I could host some 10,000 torrents, or generate them on the fly. Hard to test since my ISP uses packet shaping & filtering & NAT to stop bittorrent.
And I still have to pay for bandwidth, just less of it (no, I can't host a server at home that gets a few million hits per month).
But it's an interesting suggestion, thanks!
Liam