Exactly. This is my problem. I don't want the pretty fonts, I want the pretty HTML. The problem is when you get to referencing things like the Journal of Economic History all over: titles are supposed to be in italics, there are occasional hyperlinks I'd like to leave, headings are kind of awesome... and apart from the insane overformatting which I referenced in another post, Word does a decent job of generating footnotes/endnotes.
Now, getting this pretty stuff from Word is the tricky part.
The people we're dealing with here are social sciences people, specifically economists. I'd be perfectly fine with taking DocBook or TeX documents, but nobody's going to send them. It's not happening. We accept Word documents because we have ALWAYS accepted Word documents, and most contributors probably aren't even aware that something like TeX or DocBook exists, let alone how to use it. And they're not willing to learn it just to send us stuff.
The problem with your suggestion is that the data isn't especially structured in nature. There are about three major inputs here: an Encyclopedia, Book Reviews, and Abstracts for various papers. While we do maintain templates for the various metadata for all these in a structured format, there's not much structure for the rest of the entries besides generic rich text formatting and the occasional table. The trouble is extracting the Useless Formatting (repeated insistence that This is Black 12-Point Times New Roman, using both font tags AND span style="") from the Useful Formatting (italics, please: this is the title of a book or journal, this is a footnote).
Don't get me started on stupid formatting for footnotes, either:
<a name="_ftnref1" href="#_ftn1"><span class="MsoFootnoteReference"><sup><span><!--[if !supportFootnotes]--><span class="MsoFootnoteReference"><sup><font size="3" face="Times New Roman" color="black"><span style="font-size: 12pt; font-family: "Times New Roman"; color: black;">[1]</span></font></sup></span><!--[endif]--></span></sup></span></a>
Most of the cruft I can sorta-kinda-vaguely understand (though I don't like it or agree with it) but why are there TWO <sup>s? And what's with the <span> that has no attributes?
Actually, I'm quite all right. At first I was a trifle worried when I saw that my machine's load was a little high and the story relatively new, but then I realized that it was just running pisg to generate channel statistics for #wikipedia. It's a beefy server on a fast line, really; I don't anticipate any issues if I can hide way down in the comments page instead of in the fine summary...
I don't want to preserve the look. I want to destroy the look and replace it with another one. Every author of every book review or article has his or her own look. I don't want it. I want MY look, the look of all the other pages on the site. On the other hand, I can't go around destroying all the hyperlinks or italics/underlines/etc around titles or anything like that.
Not quite what I'm looking for. Maybe I should clarify: I want to remove the nonessential formatting while keeping certain niceties (in particular, italics for the names of papers they reference, hyperlinks for footnotes, etc.) and convert the rest into something simple and plain with just-the-basics of HTML, so I can then style it to match the other pages on the site. Many of these documents go to collections: encyclopedia articles, book reviews, abstracts of papers. If they don't look consistent, then people do complain. (And my site has enough formatting-consistency issues as it is ;)
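To make that concrete, here's a rough sketch of the kind of filter I mean, in Python with nothing but the standard library. The whitelist (`KEEP`), the `NO_NEST` set, and the names `WordCleaner` and `clean_word_html` are all made up for illustration; it's an assumption about what counts as "useful", not a finished tool, and it has only been thought through against snippets like the footnote mess above.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag -> attributes worth keeping on that tag.
KEEP = {
    "p": (), "i": (), "em": (), "b": (), "strong": (), "u": (),
    "sup": (), "sub": (), "blockquote": (),
    "h1": (), "h2": (), "h3": (), "h4": (),
    "ul": (), "ol": (), "li": (),
    "table": (), "tr": (), "td": (), "th": (),
    "a": ("href", "name"),
}
# Inline tags that gain nothing from being nested in themselves (<sup><sup>).
NO_NEST = {"i", "em", "b", "strong", "u", "sup", "sub", "a"}


class WordCleaner(HTMLParser):
    """Drops every tag and attribute not in KEEP; text content is kept."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # (tag, emitted) for each whitelisted tag still open

    def handle_starttag(self, tag, attrs):
        if tag not in KEEP:
            return  # font, span, o:p and friends vanish; their text stays
        already_open = any(t == tag and emitted for t, emitted in self.stack)
        emit = not (tag in NO_NEST and already_open)
        if emit:
            kept = "".join(
                ' %s="%s"' % (k, escape(v, quote=True))
                for k, v in attrs
                if k in KEEP[tag] and v is not None
            )
            self.out.append("<%s%s>" % (tag, kept))
        self.stack.append((tag, emit))

    def handle_endtag(self, tag):
        if tag not in KEEP:
            return
        # Close the most recently opened matching tag, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                if self.stack[i][1]:
                    self.out.append("</%s>" % tag)
                del self.stack[i]
                return

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    # Comments such as <!--[if !supportFootnotes]--> are dropped by default.


def clean_word_html(source):
    parser = WordCleaner()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

Feed it the Word output, get back bare tags, then let the site stylesheet do the looking-pretty part. Real documents will certainly need a bigger whitelist and some handling for things like line breaks and images.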
Actually, I use emacs. But I really appreciate people who can grok vim (I can't). =D
And you've hit the nail on the head: Book reviews and encyclopedia entries and abstracts (oh my). These things aren't exactly "structured" beyond the basic metadata (title, author, etc.).
As opposed to the ever-so-tedious process of installing, say, SquirrelMail? I mean, it's not like it comes with major operating systems like Fedora or anything...
We at Microsoft are well aware of these outstanding interoperability issues between Linux and Windows. Rest assured that we at Microsoft have made it part of our primary mission to resolve these issues: We can assure you that the next release of our operating system, Windows Vista, will not interoperate with Linux in any way, shape, or form whatsoever.
(not that you'll really hear that out of Microsoft, but...:)
It shouldn't be too hard to get your hands on a few Unix-y command line utilities like ps2pdf and ps2epsi and psmerge and the like. You might also look at 'tracing' tools such as 'potrace' which will take a bitmap file, trace the edges of shapes, and output vector graphics (in any one of several fun formats). They're limited, but useful.
Last I checked, a MOO was a MUD, Object Oriented. Most MOOs are probably based off the LambdaMOO server, which was initially developed at PARC; the original LambdaMOO is available via your favorite telnet or MOO client at lambda.moo.mud.org port 8888.
However, I would find such a system to be extremely unsuitable as a general-purpose database.
The ternary operator by itself with simple variables on all sides is just fine. The problems only really start when you try to nest stuff with it. Particularly other ternary operators.
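A quick illustration of the difference, using Python's conditional expression as a stand-in for `?:` (the language and the function names here are my own picks for the example, not anything from the discussion above):

```python
def bigger(a, b):
    # One condition, plain variables on all sides: reads fine at a glance.
    return a if a > b else b


def sign_nested(n):
    # Two conditions chained in one expression: correct, but you have to
    # stop and work out how the pieces group.
    return "negative" if n < 0 else "zero" if n == 0 else "positive"


def sign_flat(n):
    # Same logic spelled out; longer, but nothing to untangle.
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"
```

Both `sign_` versions return the same thing for every input; only one of them makes you squint.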
A note that some IRC networks (well, Freenode) automatically detect Tor connections and assign them a hostmask of the form whateverwhatever.tor, and it's easy enough to ban or ignore *.tor from there.
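For a bot or client script, the ignore check is about one line of glob matching. A small sketch in Python; the `nick!user@host` layout and the function name `is_ignored` are assumptions for the example, and the only pattern taken from the above is `*.tor`:

```python
from fnmatch import fnmatchcase


def is_ignored(hostmask, patterns=("*.tor",)):
    """Return True if the host part of nick!user@host matches any pattern."""
    # Everything after the last "@" is treated as the host (an assumption).
    host = hostmask.rsplit("@", 1)[-1].lower()
    return any(fnmatchcase(host, pattern) for pattern in patterns)
```

So `is_ignored("nick!user@whateverwhatever.tor")` is true and an ordinary host is not.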
Watch me justify in five letters: speed. I don't have to reach for the mouse. I don't have to accurately position the mouse pointer, or wait for the windows to do all their ultra-shiny tricks. Ctrl-C, alt-TabTabTab, Ctrl-V, and then I can be typing again in a minute. I'll race you if you want.
Speed is, to me, the ultimate end of usability, and encompasses other aspects as well. (I mean, if it's complicated or confusing and hard to learn, you won't be fast when you're using it for quite a while, eh?)
The term in English for "cosmic speed" is, I believe, "escape velocity" - the speed required to escape from the Earth's gravity and go off into the cosmos, I suppose. The alternate term, however, is fascinating - what language is it from?
No, it's just that Jabber.org provides some of the server processes which you want to run for free. You can also run these processes on your own server, if you really want to (of course, having a real domain name is a help here). But this is why Jabber IDs are stuff like foo@bar.org instead of just 'foo'.
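If it helps to see it spelled out, an ID is just an optional user part, the server's domain, and an optional resource after a slash. A minimal sketch in Python; `JID` and `parse_jid` are names invented for this example, and it does no validation at all:

```python
from typing import NamedTuple, Optional


class JID(NamedTuple):
    local: Optional[str]     # the "foo" in foo@bar.org
    domain: str              # the server that hosts the account
    resource: Optional[str]  # e.g. a particular client or device


def parse_jid(text):
    """Split [local@]domain[/resource] into its parts (no validation)."""
    bare, slash, resource = text.partition("/")
    local, at, domain = bare.partition("@")
    if not at:
        # No "@" at all: the whole bare part is the domain.
        local, domain = None, bare
    return JID(local, domain, resource if slash else None)
```

The domain part is what tells everyone else's server where to deliver messages for 'foo', which is why the bare name isn't enough.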
I'm running on a quantum computer right now, and I've not experienced any problems with any instacpqHeIkHBciBhAw 1uU6T1EK22qB9BBhokmNK6Ddv8CzpsgSEm HWn0CQEzPkDZJijN66jc/yy9Z3DBPguo1IqgWpSPMnqXAz4c8W f+2AVHipQWAsqw7QMZ7RO5k6Rr03cSM8d3uM+KdRTBV/q
Bulgaria is not funny. Bulgaria is dead serious. Vulgaria might be amusing, mind you, and Romania "just makes sense". The densely-forested-small-European-nation is quite cliche.
Go take a look at the nearest convenient copy of Much Ado About Nothing, particularly one with some analysis/background/fu in it, and read a little about cuckoldry. It's an archaic term in English these days, but....
It's called XHTML. Maybe you've heard of it.
I agree with the rest though...
My SSH connection to my server still lives; I think my task was accomplished well enough. :)
Time for the big Tux Racer tournament.
:)
On that note, 2003 EL61 has its own article already.
At a guess, to cut power consumption? That's typically the reason. If you don't really plan to use those cycles, after all...
++ATH
NO CARRIER
The real funny part is Missouri.
Watch out for the mansquitoes.