Google To Digitize Much of Harvard's Library
FJCsar writes "According to an e-mail sent today to Harvard students, Google will collaborate with Harvard's libraries on a pilot project to digitize a substantial number of the 15 million volumes held in the University's extensive library system, which is second only to the Library of Congress in the number of volumes it contains. Google will provide online access to the full text of those works that are in the public domain. In related agreements, Google will launch similar projects with Oxford, Stanford, the University of Michigan, and the New York Public Library. As of 9 am on December 14, a FAQ detailing the Harvard pilot program with Google will be available at hul.harvard.edu."
to never leave my apartment.
because it's time to dive into the deep web. Projects like this are the key to unlocking the vast stores of important information that are currently not readily accessible online. Personally I'd like to see a Google-run free access LexisNexis project.
Just how much storage space will all this data consume? It seems like a massive undertaking.
I am ambivalent about this. Will the books be stored as text to enable searching? If so, given that part of a book's character is its font and typesetting, will ALL the flavor of these books really be captured, the way it is when you read the printed copy? Something seems likely to be "lost in translation" here.
Currently hooked on AMP
Ok, so this is just a bit of devil's advocate, but what happens if you just *happen* to have a writing style similar to someone else who was printed before... what if you read something, and unknowingly wrote something in a similar vein in your essay? I assume you could check it yourself, but then that would just introduce extra cost to even write the essay in the first place... or worse, the plagiarists could just "tweak" their papers, ensuring that they're "below the radar" by changing enough style to not be recognizable...
Make sure everyone's vote counts: Verified Voting
Sure, the company needs to get some money to cover the costs of printing, distribution, and other things, plus the associations that sponsor the journal want some money to help hold conferences, but why, oh why, must they price journals so expensively that many colleges can't even afford them?
Sleep is for the weak!
DP probably isn't threatened either - they just shift focus to books that are not in the Harvard collection to avoid duplication of effort.
Are they really going to provide proofread texts? A novel might only take a couple hours to process, but math is going to take hand markup, and some of the more complex critical editions are a bear. Even at only 2 hours a book (and that's not including scanning time), 4 million volumes adds up to 8 million man-hours or a million man-days. At seven bucks an hour that's 56 million dollars. I expect we'll get scans and OCR, but no hand work; there will still be a place for DP. In fact, we'll be better off, with a huge source of scans to work from.
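For anyone who wants to rerun the numbers above with different guesses, here is a rough sketch. The volume count, hours per book, and wage come straight from the comment; the 8-hour man-day is an assumption (it is what makes 8 million hours come out to a million days), and the function name is made up for illustration.

```python
# Back-of-envelope proofreading cost, using the figures from the comment above.
VOLUMES = 4_000_000     # volumes, as assumed in the comment
HOURS_PER_BOOK = 2      # proofreading time per book, not including scanning
HOURLY_WAGE = 7         # dollars per hour
HOURS_PER_DAY = 8       # assumption: one man-day is an 8-hour shift


def proofreading_estimate(volumes, hours_per_book, wage, hours_per_day):
    """Return (man_hours, man_days, total_cost_in_dollars)."""
    man_hours = volumes * hours_per_book
    man_days = man_hours // hours_per_day
    cost = man_hours * wage
    return man_hours, man_days, cost


man_hours, man_days, cost = proofreading_estimate(
    VOLUMES, HOURS_PER_BOOK, HOURLY_WAGE, HOURS_PER_DAY
)
print(f"{man_hours:,} man-hours, {man_days:,} man-days, ${cost:,}")
# prints: 8,000,000 man-hours, 1,000,000 man-days, $56,000,000
```

Doubling the time per book (say, for math or critical editions) doubles every figure, which is why hand proofreading at this scale looks unlikely.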
The uncorrected OCR is very useful for indexing (by Google or others), as the 5% or fewer typos are not enough to interfere with indexing keywords. Uncorrected OCR can also be corrected later.
The page images are tied with the uncorrected OCR so you can see exactly what's there.
For an example, see books at University of Michigan's Making of America (MoA) Exhibit, which has thousands of 19th century books and periodicals available.
Only public-domain books will be scanned. In all or most cases the authors are dead. However, this will revive a great body of work and widen access to many.
One class of author who may be pissed is those who take older works, slap a foreword or introduction on the front, and collect royalties. I've seen this done for many histories. But authors of today's works can count on royalties for themselves, their children, and their grandchildren (if the book is still selling). The copyright term is too long in the U.S., but that's another story . . .
The professor can just wait until the match comes up, and then double-check at that point.
You'd want to do a thorough overview of any potential instance of cheating anyway. A quick run-through would determine whether or not a paper happened to contain an identical sentence clause or three identical paragraphs.
I think the bigger problem would be the second one you described -- that students could plagiarize and then go through each paragraph, changing the wording slightly so as to avoid positive matches. Still, you could argue that this is pretty much what academics is anyway, just with footnotes and a bibliography.
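To make the two cases above concrete, here is a minimal sketch of a "quick run-through": it measures how many of an essay's five-word runs also appear in a source text. This is only one simple way to look for identical passages, not how any real plagiarism checker is known to work; the function names and the sample sentences are invented for illustration.

```python
import re


def shingles(text, n=5):
    """Return the set of n-word runs in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(essay, source, n=5):
    """Fraction of the essay's n-word runs that also occur in the source."""
    essay_shingles = shingles(essay, n)
    if not essay_shingles:
        return 0.0
    return len(essay_shingles & shingles(source, n)) / len(essay_shingles)


# Hypothetical sample texts.
source = "the quick brown fox jumps over the lazy dog near the river bank"
copied = source
reworded = "a fast brown fox leaps over a lazy dog close to the river bank"

print(overlap(copied, source))    # verbatim copy: every run matches
print(overlap(reworded, source))  # light rewording: no five-word run survives
```

The second result illustrates the bigger problem: changing a word or two per phrase is enough to defeat exact matching, so a match is a reason for the professor to look closer, and the absence of one proves nothing.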
--------
Bleah! Heh heh heh... BLEAH BLEAH!!! Ha ha ha ha...
In addition to the other reply to this, there is also the case of journals published by professional organisations, where subscriptions are used to defray the cost of running the organisation. You'll also find individual subscription prices are much cheaper than institutional ones; I'd guess that the institutional subscribers are in some part subsidising the individual subscribers.
Not all conservatives are stupid,
but it is true that most stupid people are conservative.
- Hume
Good quality search engines have lots of qualities that Google lacks.
One solution is to use Google to locate a superset of the target articles and then use a more powerful search engine to winnow the Google result set. For an individual, this approach would mean maintaining a personal index of the articles, but that is a problem of storage space and bandwidth, which are relatively cheap.
The two main problems that Google solves are
One could imagine a plugin for browsers that would add the additional search facilities to a Google search. Until then, Google Hacks will get you started.
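As a rough sketch of the superset-then-winnow idea: take whatever a broad query returned, keep the text locally, and apply a stricter test of your own. The stricter test here is word proximity, which is just one example of an "additional search facility" and is my choice, not something named above. The candidate list is hard-coded, hypothetical data standing in for a downloaded result set; nothing here talks to Google.

```python
import re


def within(text, a, b, max_gap=5):
    """True if words a and b occur within max_gap words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)


def winnow(candidates, a, b, max_gap=5):
    """Keep only the candidates whose text passes the proximity test."""
    return [name for name, text in candidates if within(text, a, b, max_gap)]


# Hypothetical superset: both documents contain both keywords.
candidates = [
    ("doc-1", "Copyright term extension and the public domain in the United States"),
    ("doc-2", "Copyright law is discussed at length in many places; much later, "
              "on an unrelated topic, the author turns to the domain of physics"),
]

print(winnow(candidates, "copyright", "domain", max_gap=6))
# prints: ['doc-1']
```

A plain keyword search would return both documents; the local filter drops the one where the words are unrelated.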
I wasn't saying that the prestige of the journal had anything to do with the medium, but that there is a lot of name recognition.
JSTOR varies in quality from journal to journal--some are actually okay, while others suck. I know that I have gotten PDFs from JSTOR, but I wonder if that is a function of JSTOR or the amount that a person/institution is paying for access.
Most journals that I have dealt with online where I had to pay (because the university wasn't a subscriber) wanted between $15 and $25 for a single article. This is a LOT of money, and sometimes (if you aren't in a hurry), it is easier to contact the author and ask for a reprint--they usually have them, and if they are like many researchers, they are glad to send you a copy, provided you explain what you are doing.
There is a trick to it--the current prestigious journals ARE NOT going to go to a low/no cost format for publishing online until there are one or two major competitors who are seen as valid (peer-review) and prestigious. The prestige factor is huge and rests largely on (as you mention) the peer review process AND who is publishing in the journal. Sorry, but Robert Sternberg doesn't generally publish in just any old journal--he has one or two that he will send a manuscript to, and go from there.
When my thesis advisor (who wrote two chapters for the Handbook of Research Methods in Industrial Psychology) publishes, he typically sends stuff first to the Journal of Occupational Behavior, not DarkSarin's Online Journal of Amateur Psychology or Commoderesloat's Journal of Human Weirdness. Why? Because no one has EVER heard of those journals, and if he puts that on his vita, it won't make any difference to the next folks wanting to hire him for his research ability (not that he's going anywhere--he's a full professor).
But when the next university sees that he has published 10 articles in the Journal of Occupational Behavior (JOB), they say, "Hey, this guy is getting published in one of the top 10 journals in Behavioral Psychology, he's probably pretty good!" They will then probably hire him.
But when that same university interviews me, and I put down that I published 123 articles in DarkSarin's Journal of Computer Gaming Psychology, they are going to say, "Wow, I've never heard of that journal--is it peer reviewed? Is it attached to a professional association (APA, MPA, SIOP, etc)? Has anybody here heard of it? Does anyone who's any good publish in that journal?" If you are REALLY lucky, they MIGHT take the time to look up the answers, but chances are slim if the position is getting very many applicants (and if it isn't, it probably isn't paying very well!).
The long and the short of it is that there is little, if any, financial pressure to offer content online for free, and that is unlikely to change without competition. There is unlikely to be much competition, because few young researchers are going to put their career on the line by publishing in any but the most prestigious journals that they can possibly get an article into. Older researchers are already in the habit of sending articles to certain journals, and so they aren't likely to change either.
There isn't a good, quick, easy solution to this, and anyone who says that there is needs to have their head checked. Sorry.
"We don't know what we are doing, but we are doing it very carefully,..." Wherry, R.J. Personnel Psychology (1995)