Turing Award Winner On The Future of Storage
weileong writes "Ars Technica highlights an interview at ACM Queue with Jim Gray, a winner of the ACM Turing award (among other things), conducted by one of the pioneers of RAID (among other things). Many issues are touched upon, including: "programmers have to start thinking of the disk as a sequential device rather than a random access device." "So disks are not random access any more?" "That's one of the things that more or less everybody is gravitating toward. The idea of a log-structured file system is much more attractive. There are many other architectural changes that we'll have to consider in disks with huge capacity and limited bandwidth."
Actual interview has MUCH detail, definitely worth reading."
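For readers who haven't met the idea, here is a minimal sketch of log-structured storage: every write is appended to the end of a log, and an in-memory index maps each key to the offset of its newest record. The class name `LogStore` and its record format are made up for illustration; nothing here comes from the interview or from a real file system.

```python
import os
import struct


class LogStore:
    """Toy append-only key/value log (hypothetical, for illustration only)."""

    HEADER = struct.Struct("<II")  # key length, value length

    def __init__(self, path):
        self.index = {}  # key -> offset of the latest record for that key
        self.f = open(path, "a+b")

    def put(self, key: bytes, value: bytes) -> None:
        # Writes never seek backwards: the record always goes to the end.
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(self.HEADER.pack(len(key), len(value)) + key + value)
        self.f.flush()
        self.index[key] = offset  # older versions stay in the log as garbage

    def get(self, key: bytes) -> bytes:
        # Reads need one seek, guided by the in-memory index.
        self.f.seek(self.index[key])
        klen, vlen = self.HEADER.unpack(self.f.read(self.HEADER.size))
        self.f.seek(klen, os.SEEK_CUR)  # skip over the stored key
        return self.f.read(vlen)

    def close(self) -> None:
        self.f.close()
```

Overwriting a key just appends a new record, so the disk only ever sees sequential writes. A real log-structured file system also has to reclaim the stale records, which this sketch leaves out.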
dupe dupe dupe
...does anybody else think this sounds familiar?
I must have read an article earlier about this same thing, probably by this same guy. Can anybody confirm that?
Mod me down with all of your hatred and your journey towards the dark side will be complete!
I think we'd all be better off when solid state, non-mechanical disks become commonplace.
Is there any reason other than cost why we can't have 100Gb solid-state drives yet?
Get your own free personal location tracker
you may have seen this somewhere....
This week: You can make a trade-off between latency and throughput!
Next week: Cars that can haul less can be more fuel-efficient!
The week after: Algorithms that use more memory, but are faster to execute!
If I look at the trends of the last decades, while disk sizes increase exponentially, the actual number of top-level objects I store on my systems increases only linearly, and quite slowly. True, I still store individual documents, but I also store AVIs, ISOs, entire photo albums that take gigabytes each.
It's still random access: I can choose and access an object, even individual photos, without scanning through large amounts of unwanted data.
Ceci n'est pas une signature
I love his comments about mailing disks to Europe and Asia..
.-D
The biggest problem I have mailing disks is customs. If you mail a disk to Europe or Asia, you have to pay customs, which about doubles the shipping cost and introduces delays.
Thereby adding a corollary to the old adage "Never underestimate the bandwidth of a vanload of tapes barrelling down the highway"...
"Never underestimate the bottleneck caused by a far-Eastern customs inspector."
A little planning goes a long way...
...does anybody else think this sounds familiar?
I must have read an article earlier about this same thing, probably by this same guy. Can anybody confirm that?
Thanks to my well-developed powers of telepathy, I can tell you that you have read a previous article on the topic by the same author. So I'm happy to confirm that for you.
I can also tell you, thanks to my equally well-honed powers of clairvoyance, that this post will soon be modded up as funny.
(Sheesh. And I thought that some recent "Ask Slashdot" questions were dumb.)
"Accept that some days you are the pigeon, and some days you are the statue." - David Brent, Wernham Hogg
When I was really young, I tried to use a tape drive as a sequential hard-disk drive. I figured since the tape drive was sequential access it would work... let's just say it didn't go real fast. I tried to run an EXE from it, on probably a 386 mind you, and yeah. The laundry got done before the EXE ran! (SIGH) Was I *ever* that young?
Still, I am glad to see that the paradigm is now realizable.
Check out Jim Gray's info page on Microsoft Research. He's done research on many diverse and interesting technologies such as distributed computing and sequential I/O performance. There are some nifty sites he has taken part in creating, such as a browsable photo of Earth, and a map of the Universe.
they are part of Internet 2, Virtual Business Networks (VBNs), and the Next Generation Internet (NGI). Even so, it takes them a long time to copy a gigabyte. Copy a terabyte? It takes them a very, very long time across the networks they have
Is this really true? Wasn't there a recent Slashdot story where researchers transferred a gigabyte of data, in fourteen seconds or so, on Internet 2 from California to the Netherlands?
I suppose that disk access times will be the limiting factor at both ends if you were to read and write the data from/to a disk.
How small a thought it takes to fill a whole life
Frankly the interview was painful every time Dave Patterson said something. How many times does he have to ask questions about the concept of mailing a computer? "We mail computers because transferring over the Internet is too slow for these massive data transfers." "Are they computers?" "Yes." "Do you mail them?" "Yes." "It's like a movie." "Uhh ok." "Is it a whole computer that you mail?" "Yes, it is a computer full of hard drives." "Why don't you just use the Internet?" "Because it is too slow."
...
We have a dozen doing TeraServer work; we have about eight in our lab for video archives, backups, and so on.
That's a good excuse to use on my wife: "No honey, those are my..., uhhm..., video archives."
Wenn ist das Nunstueck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.
Jim Gray? Why the fuck should I believe anything from a man who demonized future Hall of Famer Pete Rose?
And what does Jim Gray know about storage? He's a sports commentator, and a terrible one at that.
I'm not Seth Finkelstein. I still speak the truth.
According to research at an English university, many people store data on magnetic disks.
Does that mean he managed to convince someone he was a computer?
Fatal error: Call to undefined function: message_die() in db/db.php on line 88
My prof talked about this in my networking class. Apparently they tweaked the hell out of the data link layer to do this, so it was not a generic data transfer at all.
"Not knowing when the dawn will come, I open every door." - Emily Dickinson
I've seen this a couple of times before, but Google seems to come up with nothing useful for it. It doesn't help that every crappy musician who has made a tape sells it out of their crappy van or that so many scientists have the old Prussian "van der something" in their names. But perhaps it's crappy musicians and these van der scientists who really control the high-speed data transfer.
I started long ago doing this. I treat my drives as sequential and my code as random access by applying random number functions to all my pointers. I find that even my most complex algorithms now finish almost instantly!
Two quotes from the article (emphasis mine):
Gray, head of Microsoft's Bay Area Research Center, sits down with Queue and tells us (...)
JG: If it is business as usual, then a petabyte store needs 1,000 storage admins. Our chore is to figure out how to waste storage space to save administration.
MS bashers will have a field day on this one...
Wenn ist das Nunstueck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.
Apart from speculating as to whether this attempt at FUD was the real payload of the article, did it really say anything that most of us haven't already noticed? Whether Flash or fast SCSI, we could do with an intermediate layer of backing store, with faster random access than current IDE HDDs. And we are fast heading for removable IDE drives to be a better and cheaper tape replacement. And the Internet has limited bandwidth. I'm sorry, but you don't need a Turing prize to work any of that out.
Panurge has posted for the last time. Thanks for the positive moderations.
Bah, check your history books. It has happened already! (in Soviet Russia, of course ;)
For more info on (very cool) Log-Structured File Systems, check out Mendel's original paper at:
http://citeseer.nj.nec.com/rosenblum91design.html
smd4985
Not to say that large, sequential access hard disks aren't a good idea for archives or corporate data. It'll cut out the need for tape drives. But this will never fly with the home user.
IAALS.
...Gray, head of Microsoft's Bay Area Research Center...
And here he is singing the praises of open source software, MySQL, Linux, PostgreSQL, Oracle, IBM etc! He'll most likely be getting a visit from Ballmer in person, I think. Obviously the brainwashing didn't work on this guy.
So I guess the disk algorithms from Knuth's TAOCP are still useful after all those years?
Last link doesn't lead to 'a map of Universe'.
One final thing that is even more speculative is what my co-workers at Microsoft are doing. They are replacing the file system with an object store, and using schematized storage to organize information. Gordon Bell calls that project MyLifeBits. It is speculative--a shot at implementing Vannevar Bush's memex [http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm]. If they pull it off, it will be a revolution in the way we use storage
I've talked about it before. This guy thinks what Microsoft is doing is revolutionary. Come on, all you people, can't you see the problem with today's file systems? The problem is that the type information is lost!!! We need objects, and we need type information to be stored along with those objects!!! This is the only way lots of problems will go away.
There is no island of "Rand McNally"!
Looks like middle age hasn't been kind to action hero Duke Nukem. In a prerelease press preview, presented by Joe Siegler, the studly hero is bald with a huge beer-gut. "We wanted to flesh out the character of Duke", Siegler said, "we want to make him more a character that his fans can directly relate to".
In the new title, Duke is in a custody dispute with his ex-wife. Apparently, since he lost his job, he's in arrears on his child-support payments. When his (alien) wife kidnaps their kids and leaves for her mothers on Moltar III, it's butt-kicking time!"
This doesn't just affect file storage and virtual memory. It also changes the economics of cache and main memory, and makes deployment of 64-bit CPUs more urgent. It also makes system crashes much less tolerable, because turning the computer off and on doesn't involve long shutdown and boot procedures any more.
You're just looking at the massive downstream capacity you have, which is really only available if nobody else at your CO is using it. Odds are your upstream capacity is significantly more expensive. I have a 128kbps upstream on my $40/mo cable modem, which comes to about $1/GB up.
If you have a dedicated T3 (45Mbps), for example, and pay $7000/mo for it (a reasonable price), you are paying about $1/GB up AND down.
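The arithmetic behind those figures can be checked with a short sketch. It assumes a 30-day month, decimal gigabytes, and a link that runs flat out; the function name `dollars_per_gb` and the `utilization` parameter are illustrative additions, not from the post.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month


def dollars_per_gb(monthly_price, bits_per_second, utilization=1.0):
    """Cost per decimal gigabyte moved in one direction."""
    bytes_per_month = bits_per_second / 8 * SECONDS_PER_MONTH * utilization
    return monthly_price / (bytes_per_month / 1e9)


cable_up = dollars_per_gb(40, 128_000)   # 128 kbps upstream, $40/mo
t3 = dollars_per_gb(7000, 45_000_000)    # 45 Mbps T3, $7000/mo
print(f"cable upstream: ${cable_up:.2f}/GB")   # $0.96/GB
print(f"T3, one direction: ${t3:.2f}/GB")      # $0.48/GB
```

Under these assumptions the cable figure matches the quoted $1/GB. The T3 comes out closer to $0.50/GB per direction when saturated, so the $1/GB figure corresponds to roughly 50% utilization.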
aQazaQa
It's a shame this guy passed away recently. This was the absolute most intelligent and insightful interview/story I've ever read.
So, one could rent a $20K device for $240/year? Those must have been the days...
That can't be right.
I forget what 8 was for.
Chrisd
Co-Editor, Open Sources
Open Source Program Manager, Google, Inc.
This does indeed sound familiar - like the old cassette drive hooked up to my C64. I guess everything old really *is* new again.
Take this choice quote from the article:
My buddies are being killed by supporting all the Linux variants. It is hard to build a product on top of Linux because every other user compiles his own kernel and there are many different species.
Ain't it sweet? I count five lies:
(1) people being killed by supporting (gasp) operating systems... gosh, horror and violence, not nice at all!
(2) all the Linux "variants", are in fact pretty much one standard, LSB, with several skins
(3) "hard to build a product on top of Linux", rather than, hmmm, Windows? Linux is incredibly easy to build for. I suspect the fact that it's very standard helps.
(4) "every other user compiles his kernel"... maybe at Microsoft. I suspect less than 1 in 20 Linux users ever compiled a kernel.
(5) compiling a kernel means you can't support it... WTF? The kernel is incredibly stable, since most changes are in external modules. And I can't remember a single case where a kernel change broke one of my apps.
(6) (sorry, I was not counting well), "many different species"... well, AFAICS the only difference between the Linux distributions is that they have different packaging methods, different timelines as to their versions, and different UI tools for hardware detection, configuration, etc. Nothing at all that makes life hard.
Look: I just installed Xandros, which is Debian with a nice face. On two different types of machine, and it installed without asking a single question about my hardware except whether the mouse was left or right-handed. Check my journal...
Windows never worked this nicely. Where is the support issue?
In the writing industry we call this "to condemn with faint praise".
Yeah, Windows kinda works, I mean, it'll run Office without crashing too often, but it's just killing my buddies to have to maintain Win2K, WinXP, and even some older Win98 machines, not to mention we have a whole cupboard simply filled with driver CDs for every PC we have.
Ceci n'est pas une signature
Only until we finish bombing off all the other parts that AREN'T the USA!
Anyone know what happened to that bloke at Keele who invented a way of cramming 3 terabytes on a credit card? Apparently it would have cost about 35 pounds to manufacture. This was a couple of years ago; why hasn't it happened yet?
Surely something like this is the real future of storage?
Terabyte on a credit card
Electronic Music Made Using Linux http://soundcloud.com/polyp
"Sneaker net" was when you used your sneakers to transport data?
Oh my. How old I feel when someone has to ask what "sneaker net" was. And someone has to answer...
computerlady - a brand new Slash-daughter - alone, but no longer invisible, in the
Damn, timothy, when it says June on the article it just might be a dupe, ya know? But it's nice to know that the future of disk access hasn't changed since then.
-Looking for a job as a materials chemist or multivariat
This is a *MAJOR* breakthrough! Most Turing Test contestants don't even win, but this one can eloquently discuss topics and give complex answers, rather than just turning back the question, Eliza-style.
Can we download a copy of this "Jim Gray" yet?
>programmers have to start thinking of the disk as a sequential device rather than a random access device
This is partially already true for classic UNIX userspace behavior. You pipe the data from the input file(s) through a filter and generate the output, sequentially.
A completely different model from the FS drivers or a SQL database.
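A minimal sketch of that filter style, for illustration: the program consumes its input strictly front to back and emits output as it goes, never seeking. The function name `grep_filter` and the sample data are made up; the `io.StringIO` objects stand in for a file or pipe.

```python
import io


def grep_filter(stream, needle):
    """Yield matching lines one at a time; the input is read once, in order."""
    for line in stream:
        if needle in line:
            yield line


source = io.StringIO("disk\ntape\nram disk\n")  # stands in for an input pipe
sink = io.StringIO()                            # stands in for an output pipe
sink.writelines(grep_filter(source, "disk"))
print(sink.getvalue(), end="")
```

Swapping `source` and `sink` for `sys.stdin` and `sys.stdout` would turn this into an ordinary pipeline stage.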
What current file systems need is metadata in them. That is, the file system itself stores the metadata about the file. Think about the Mac file system, with the metadata contained in the file itself, as the "resource fork". Now imagine a systemized, extensible meta file system that organized files by what the metadata said about them.
Imagine, media files stored in such a way that both random and sequential access was optimized, where the file structure was automagically defragmented and organized behind the scenes.
Imagine a computer that watched what files were used at bootup, and organized them so that the hard drive streamed the bootup data sequentially, straight into memory.
Imagine being able to start PRELOADING applications before you even finish the second of your double clicks on the datafile.
Imagine Database files that were automagically indexed as part of the file system.
Imagine security and encryption being built into the filesystem beyond today's capabilities, where the security and encryption does not rely upon a master controller or centralized security policies, but rather has the ability to follow the file, seamlessly.
I am sure that I haven't even begun to tap the possibilities.
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
The thing is that while the internet2 was supposed to be a research tool, it is actually being used for high bandwidth data transfers from academic units. This happens so much that it actually hinders networking research to an extent.
"Not knowing when the dawn will come, I open every door." - Emily Dickinson
To answer your question directly, cost is probably the biggest reason. But there's more...
...the other answer to your question, by the time solid-state disk prices are low enough, access methods will have shifted enough and memory hierarchy grown enough that the effects simply won't be what you'd expect given the technology, today.
"programmers have to start thinking of the disk as a sequential device rather than a random access device."
"I think we'd all be better off when solid state, non-mechanical disks become commonplace."
Now that you mention it, I don't think so. In reality, there's a lot of time spent dissing the latency (and even bandwidth) of DRAM (any flavor). That's why caching is being elevated to a fine art form. That's why Intel is introducing L3 cache in their new "Extreme" gaming chip.
The reality is that the next level up is never as "good" as the level below, only denser and cheaper. Caching helps get around this, and good old Single Level Store extended the concept by treating the disk as the real (bit) address space and main memory as a cache of that.
Even without Single Level Store, disk paging and caching mechanisms, head elevator algorithms, I/O schedulers and the like already do some of this 'sequentializing' for us, behind the covers.
IMHO, we're getting where he wants to go, through the back door.
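As a rough illustration of the 'sequentializing' an elevator algorithm does, here is a simplified sketch (LOOK-style: the head reverses at the last pending request rather than at the edge of the disk). The function names and the request queue are textbook-style assumptions, not taken from any real I/O scheduler.

```python
def elevator_order(requests, head, direction=1):
    """Order pending block requests as one sweep out and one sweep back.

    requests: iterable of block numbers waiting to be serviced
    head: current head position
    direction: +1 to sweep toward higher blocks first, -1 for lower
    """
    ahead = sorted(r for r in requests if (r - head) * direction >= 0)
    behind = sorted(r for r in requests if (r - head) * direction < 0)
    if direction > 0:
        return ahead + behind[::-1]
    return ahead[::-1] + behind


def seek_distance(order, head):
    """Total head travel when servicing blocks in the given order."""
    total = 0
    for block in order:
        total += abs(block - head)
        head = block
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]  # arrival order
print(seek_distance(queue, head=53))                            # 640
print(seek_distance(elevator_order(queue, head=53), head=53))   # 299
```

Servicing the same eight requests in sweep order cuts head travel by more than half in this example, which is the point: the random-looking request stream reaches the platter as something much closer to sequential.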
The living have better things to do than to continue hating the dead.
With an ever growing collection of digital photos, I've come to the same conclusion as Jim Gray. Hard disks are superior for backups.
I currently have about 100 GB of images and it takes more than 20 4.7 GB DVD-R discs to create a full backup. Although DVD media is still slightly cheaper than new large capacity IDE drives, the added time and hassle factor of burning 20 discs far outweighs any minor cost savings. Moreover, a 3.5" drive in a padded anti-static bag takes up less room in the safe deposit box than 20 DVDs (especially if you have the DVDs in protective jewel cases). And if HD-based backup lets me avoid some future artists' tax on burnable media, so much the better.
A Firewire enclosure and a rotating collection of IDE drives is the way to go.
Two wrongs don't make a right, but three lefts do.
An interesting thought popped up when I read your post:
there is a current trend towards cramming as much storage as possible into something the size of a 3in hard drive.
I wonder why they don't make larger hard drives in the physical sense? A hard drive the size of a washing machine using today's technology would store a phenomenal amount of stuff, but what about something more reasonable, like a hard drive merely twice the physical size of today's? How much more storage could you get just by scaling up the platters? Anyone here good at math? Hard drives today must be up to 200-250GB.
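A back-of-the-envelope answer, as a sketch only: if areal density and platter count stay the same, capacity scales with recording area, which goes with the square of the platter diameter. The function names and the hub diameters below are illustrative assumptions.

```python
import math


def platter_area(outer_diameter, inner_diameter=0.0):
    """Recordable area of one platter surface (an annulus), in squared units."""
    return math.pi / 4 * (outer_diameter**2 - inner_diameter**2)


def scaled_capacity(capacity_gb, scale):
    """Capacity if the platters grow by `scale` in diameter, assuming the
    same areal density and the same number of platters."""
    return capacity_gb * scale**2


print(scaled_capacity(250, 2))                    # 1000
print(platter_area(7, 2) / platter_area(3.5, 1))  # 4.0
```

So doubling the platter diameter of a 250GB drive would give roughly 1TB under these assumptions, more if the taller case also held extra platters. The sketch ignores the practical costs of big platters, such as longer seeks and the difficulty of spinning them fast.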
Electronic Music Made Using Linux http://soundcloud.com/polyp
Like the subject said.
Does that mean Jim Gray proved to someone over a computer terminal that he was human?
You are in a maze of twisty little passages, all alike.
His basic idea is 100% correct, but the reason is all wrong. It *IS* much harder to develop an app for Linux's myriad flavours, not because of the kernel, but because every distro has its own versions of libraries. I work for a company that makes Linux software, and we only support RedHat, and even certain versions of RedHat at that. While our product would probably compile against any number of distros, and even the BSDs, we just don't have the time and manpower required to build, test, debug, package, and maintain 15 different releases for every sub-release or patchlevel we have in the product. With Windows products, at least, (unless you are doing some lower-level stuff) if you build something you can be reasonably assured it will run on Windows 2000, or Windows XP, or Windows 2003. Not the same if you build something with RedHat 9 and try to run it on Debian or Suse, etc. And before you go on about "release a source package", not all companies release everything GPL, and want to keep their IP theirs, since they like to put some money on the table at night. It's definitely not FUD to say it is much more effort to develop and release cross platform binaries in Linux than Windows.
What Gray is talking (mostly) about is what we used to call the "Roadmaster Scenario." When I worked for [a major electronics company], we had a data center in Dallas and a redundant site about 30 miles away in Lewisville. Every Sunday the entire IMS database was archived to mag tape and shipped to the other data center for a second level of redundancy. This begged the question, why not just copy them over the T1 lines (this was 1980) to the other site's tape drives directly? The answer, of course, was that it takes a helluva lot of bandwidth to outrun a Roadmaster full of mag tapes.
"Stop whining!" - Arnold, as Mr. Kimble
> To some extent you can think of Codd's relational algebra as an algebra of punched cards. Every card is a record. Every machine is an operator.
Interesting how the guy literally wrote the book on transactions, yet grossly misrepresents Codd's work, which BTW wasn't simply the relational algebra, but even higher level: the relational model of database management, including the relational calculus.
While the algebra is somewhat procedural, the calculus is set-oriented, and they are fully equivalent. The idea is exactly not looking at records and operators, but describe what you want -- just leave the relational system set the procedures to get that in the most efficient way it can.
Incidentally this has a big impact on all Gray is discussing -- without a fairly simple and powerful data model, so much data is basically a waste. He's thinking too low level, including the object stuff he touts, but we will only find use for so much data the day we get proper relational implementations, and this excludes SQL in general and MySQL in particular.
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
First, DLL hell on Windows, shared libraries on Linux, same headaches. Consider static linking: larger binaries but fewer headaches.
Second, it is trivial and cheap to build packages for RedHat, Debian, and SuSE as you need them, we do this automatically. See, when the OS is free, it costs you little to set-up development systems. If you're tight for hardware, use UML.
Third, there are serious arguments against delivering binary-only packages, and in favour of building from source, and these arguments are not related to the GPL. My company has always had a policy of delivering source code whenever possible, and we've not had any issues with that. In contrast, it allows our customers to get much more out of the product.
Observation: you will notice many, many packages that install and run just fine on a wide mixture of Linux systems. Someone, somewhere, is not killing themselves doing this. Perhaps Microsoft buddies just die easily. Or perhaps the problem is easily solved by the application of tools like autoconf that are unknown in the Windows world.
Ceci n'est pas une signature
The best part of the Turing awards ceremony is when the camera pans to the crowd and we see a bunch of losers wearing Japanimation T-shirts sitting with ugly girlfriends.
So, if the plural of "virus" is "virii", then I guess the plural of "radius" is "radiii".
The thing is, it is, except for the number of i's (radii, not radiii).
I believe that sometime in the future, we'll look back on our spinning disks and chuckle. I think we will eventually get to near-infinite storage, and sequential will be the way to go. There won't be any erasing necessary, you will just write to the next available space, move the pointer to it, and move on. Why did we come up with erasing data anyway? or compression? It was to save space. What if you didn't have to save space because there was no limit on it? Some technology will come along that will offer us near-infinite storage space, and we won't have to worry about random access, erasing data, space management, etc. We'll get there, I just hope it is within my lifetime.
My beliefs do not require that you agree with them.
Oh, please, keep those coming.
I just love your sense of humour. I remember when we switched an ISAM application to Oracle in the mid 1990's, on a Unix box. A single record access by primary key was 20,000 times faster with the ISAM system than under Oracle.
I retested this with later versions of Oracle and found that the performance was worse, not better.
Now, I have a nice server under a desk here, and we reloaded an Oracle 9 database on it, it took something like 8 hours to rebuild. Since we make portable software, just for fun I reloaded the same database under MySQL. Less than 15 minutes.
Oh, but perhaps you were serious. No, you were serious? Jesus! You really were serious! Oh, that's even funnier. (wipes tear from corner of eye).
More, more, more...
Ceci n'est pas une signature
That's because the original is "station wagon" (or "stationwagon"). Another common variant is "a 747 full of...". See e.g. this story
And no, it's certainly not Tanenbaum 1996; it was (IIRC) mentioned in Bentley's "Programming Pearls" CACM column/book in the 1980s. It's unclear that anything original can be attributed to Tanenbaum (okay, that's flamebait, but Tanenbaum irritates me).
Professional Wild-Eyed Visionary
Apple came to a similar conclusion for its server storage solutions (XServe and XRaid)
IDE drives can offer the speed & reliability that are needed even for demanding applications
In the article at the bottom they say "In Memoriam -- Since this interview, Edgar "Ted" Codd, inventor of the relational data model, died of heart failure. He was much loved by his colleagues both for his warmth and for his many contributions to computer science. "
But then in the middle of the article there's this: DP Let's talk about the higher-level storage layers. How did you get started in databases? You were at IBM when the late Ted Codd formulated the relational database model. What was that like?
So which is it? Did he die AFTER the interview or BEFORE it??
E V E R Y T H I N G I W R I T E I S F A L S E
Ooh, you're good. Now can you tell me where my keys are?
Uhhh, I can't use my telepathy to tell you where they are because you don't know yourself. And if you don't know where you left them then how am I supposed to read that info from your mind?
However, I can use my clairvoyance to tell you where they will be. They'll be in the last place that you look.
Glad to be of assistance.
"Accept that some days you are the pigeon, and some days you are the statue." - David Brent, Wernham Hogg
Gray has some strongly-held opinions. Not many folks can convince him of much without damned good reasons. Of course, he also rags on how poor free software is.
It's nice to see Jim Gray has learned to apply himself since he was black-balled by the sporting public after his mean-spirited and inflammatory interview of Pete Rose at the World Series a few years ago.
He didn't come across as that bright at the time, but I guess he wasn't showing his true colors. And now he's won a Turing Award. Utterly amazing...
Ooooh, wait...
-- Experience is a wonderful thing. It enables you to recognize a mistake when you make it again.
I've taken the liberty of removing the referrer tag from your amazon.com link.
If anyone were stupid enough to buy that book (an $89 book? WTF? text is free), you would have made $4.45 in commissions. This abuse of the comments system for personal gain is egregious and in violation of the Terms of Service for Slashdot.
Please delete your account.
I'm not Seth Finkelstein. I still speak the truth.
"The algorithms are simple enough so most implementers can understand them, and they are complicated enough so most people who can't understand them will want somebody else to figure it out for them. It has this nice property of being both elegant and relevant."
Or what happens when the concept of taking complexity (made up of simpler things) and automating it such that it is easy to use and reuse, by the user, is applied here?
No, of course performance is not the only indicator of a product's worth.
Let me list my criteria for, e.g. a database product:
1. accuracy
2. performance
3. ease of administration
4. ease of installation
5. price
Not in any specific order. I've used Oracle databases for about 12 years, and on every single one of these counts, MySQL wins. Every single one, without exception.
Oracle wins on a number of other criteria:
1. profitability
2. complexity
3. need for expensive DBAs
4. consumption of excess time
5. image
6. marketing strength
7. market share
8. number of marketing drones
But as an independent software developer, thank goodness none of these actually help me in my business. MySQL has already killed Oracle's database, and their attempt to escape that trap and move to ERP systems and clustering technologies is just another industry troll.
Sorry, you're talking to someone who has been there, seen it, and speaks from long, painful experience.
Ceci n'est pas une signature
This is a great article. Jim Gray's idea of shipping boxes around really makes sense, but he talks about spending $400 to ship his 300MB a heavy computer; why not use a 10 pound terabyte box? I have to believe it would be cheap to ship a 10 lb archiving appliance like the RocketVault " http://www.intradyn.com/products/specs.php". You could also have this thing do all the work: set it up to get the data over the network and e-mail you when it's done. Then pack it up and ship it. Since everything is self-contained, all you need at the other end is a network connection to restore the data.
And how long would it take you to transfer a terabyte of information to the UK. Total cost here.. the other end has to pay as well.
Your cost is cheap because your ISP does cost averaging. If you pin your connection at maximum usage in/out 24/7, most broadband residential ISPs will send you a nasty letter, and shortly after, simply drop you as a customer. They aren't REALLY selling you bandwidth at that price. OR if you want to look at it, they are, but only if you use it a small amount.
If you need to transfer terabytes of data long distance, quickly, it's cheaper and faster to send computers via fedex than it is to purchase the bandwidth from some network provider.
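To put rough numbers on "faster", here is a sketch of how long a terabyte takes on a saturated link. The link speeds are the ones mentioned elsewhere in this thread (128 kbps cable upstream, T1, T3); a decimal terabyte and zero protocol overhead are simplifying assumptions, and `transfer_days` is an illustrative name.

```python
def transfer_days(num_bytes, bits_per_second):
    """Days needed to push num_bytes through a link running flat out."""
    return num_bytes * 8 / bits_per_second / 86_400


TERABYTE = 10**12  # assumption: decimal terabyte

for name, bps in [("128 kbps cable upstream", 128_000),
                  ("1.544 Mbps T1", 1_544_000),
                  ("45 Mbps T3", 45_000_000)]:
    print(f"{name}: {transfer_days(TERABYTE, bps):.1f} days")
```

That works out to about two years on the cable upstream, about two months on a T1, and about two days on a T3, so a box that arrives overnight beats all three.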
Actually I got about halfway through and decided to skip the rest of it.
- First they ignore you, then they laugh at you, then ???, then profit.
That's why it seems so old - we all saw this months ago!
Maybe /. needs to implement a dupe-catching feature. Whenever an article is submitted, extract the URLs supplied and compare them to the list of previously submitted URLs. Then display the list to the submitter and let him check the links to ensure that it's not a dupe.
Potential problem is links to front pages (i.e. "Yahoo! reports...") instead of deep links. These can be skipped.
For deep URLs that change content but don't change URL, run a comparison between the checksum of the new link and the old one.
Certainly this is not bulletproof, just an additional check that can be run by the submitter and /. editors.
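A rough sketch of that check, for illustration only: pull URLs out of the submission, skip front pages, flag deep links seen before, and compare content checksums for pages that change without changing URL. All names here are hypothetical, the URL regex is deliberately crude, and fetching the page content is left out.

```python
import hashlib
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s\"'<>]+")  # crude, good enough for a sketch


def extract_urls(text):
    return URL_RE.findall(text)


def is_front_page(url):
    """Treat URLs with no path (a site's home page) as too generic to match."""
    return urlsplit(url).path in ("", "/")


def checksum(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def possible_dupes(submission_text, seen):
    """Return deep links in the submission that were submitted before.

    seen: dict mapping previously submitted URL -> checksum of its content
    """
    return [u for u in extract_urls(submission_text)
            if not is_front_page(u) and u in seen]


def content_changed(url, new_content: bytes, seen):
    """True if a previously seen URL now serves different content."""
    return seen[url] != checksum(new_content)
```

A submitter or editor would be shown whatever `possible_dupes` returns and decide by hand, and `content_changed` covers the case of a deep URL whose content rotates.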
This has been around a while, I mentioned it on Slashdot over a month ago. But it's still a great interview.
Da Blog
Why don't you send out a mixed source/binary package:
The binary part can be the core of your program and contain all your IP.
The source part can be an interface layer to the rest of the system (aliases for library calls, or equivalent implementations for missing functions, etc...basically a wrapper layer between the system and the program).
During the installation the source part can be compiled and (statically/dynamically) linked to the binary part. The source package doesn't have to be GPL (since, if it were, linking it to your binary would force the binary to be GPL), but it could still use some other open source license.
That way you can mitigate the disadvantages of a binary distribution without having to use a full source distribution.
Also, if many companies were doing this, it might be a good idea to open source these compatibility layers so that every company that makes something for Linux isn't duplicating the effort. (though this is kind of what libraries are supposed to do....)
Another alternative is to *trust* your customers:
You could have a full source package, but under a proprietary license (not GPL). Just because the source is available doesn't mean that the customers have free rein over your IP, or are even more likely to pirate it: I have the full "source" for several books, but that doesn't cause me to violate the IP of those authors.
I really doubt that PHBs will go for the full-source approach though, as they tend to be paranoid about such things...which is why I suggested the first thing.......first....
If I recall correctly, there is a Jim Gray who writes about computing, and was involved in a recent lawsuit to keep using his own name without infringing the trademark of Jim Gray - the one who writes about sports. The case got quite a bit of press, particularly here on slashdot, as an example of how IP laws were fuxored. Despite this, people still keep confusing the two. Poor guy ought to sue us all for not knowing the difference by now, including me for not remembering the details.
Who is John Cabal?
I had a real problem when Gray indicated that there was no real use for a 200G drive. I dedicate a 120G drive to storing photographs I take, and it's nowhere near big enough. I would like to have LOTS and LOTS more storage space, but I'm perfectly happy with the access times. When the six megapixels per picture I use become twelve megapixels I'll want even more lots and lots of storage space and maybe a doubling of the access times. One of these days I'll be getting into DV video, and the need for storage will go up even more, while the access time needed to do realtime editing is already here.
I guess that, no matter how smart someone is or how dedicated to keeping on top of what's going on, someone will come in out of left field with a use for technology that is completely unexpected ('though I would not have thought that storage of photo or video files would be unexpected).
Information is not Knowledge
Funny, not what I heard... I didn't hear:
"programmers have to start thinking of the disk as a sequential device rather than a random access device."
What I heard was:
"vendors have to get their head out and start thinking of the disk as a random access device rather than a sequential device, and they need to have their equipment quit lying about whether something has actually been committed to stable storage, or is just in a cache."
The free Pointrel Data Repository System I have been working on is optimized for adding data, not changing it. So it fits in somewhat with his model of primarily linear access to data. http://sourceforge.net/projects/pointrel/
A 21st century issue: the irony of technologies of abundance in the hands of those still thinking in terms of scarcity.
The Resource-Fork of a file on an HFS+-Volume on a Partition of a disk does not contain metadata. The resource-fork contains resources such as the icon-image. The metadata lives in the Desktop database.
If I had it all to do over again, I would implement the file-system as a journaled file-system, with the file-system consisting of a series of caches containing the ID# of each file as a 64-bit integer, the location of the file on a volume as a 64-bit integer, the paths to the file through the hierarchy, ownership and permissions, the MIME-Type, the file-size, the file-dates, and the name of the file as up to 255 UTF-8-Octets.
The file would have three forks:
The resource-fork would contain the icon. The metadata-fork would contain all of the metadata like ID3-tags and EXIF. The metadata-fork would use an XML-based language. The metadata and the XML-based language would be extendable. The data-fork would contain the bit-stream of the file.
A 64-bit file-system can handle up to 16 exafiles. Each file could be up to 16 exabytes. Assuming that the minimum file-size is 1 kilobyte, a volume could have up to 16 zettabytes of data.
Unfortunately, HFS+ was not built this way.
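For anyone wanting to check the 64-bit figures above, a short Python sketch of the arithmetic. It assumes binary prefixes (1 kilobyte = 2^10 bytes, "exa" = 2^60, "zetta" = 2^70), which is my reading of the numbers rather than something the post states; the constant names are made up for the example.

```python
# Assumption: binary prefixes throughout.
KIBI = 2 ** 10
EXBI = 2 ** 60
ZEBI = 2 ** 70

FILE_IDS = 2 ** 64         # distinct values of a 64-bit file ID
MAX_FILE_BYTES = 2 ** 64   # largest size a 64-bit file-size field can express
MIN_FILE_BYTES = 1 * KIBI  # assumed minimum file size of 1 kilobyte

files_in_exa = FILE_IDS // EXBI              # number of files, in units of 2^60
max_file_in_exa = MAX_FILE_BYTES // EXBI     # per-file size limit, in units of 2^60 bytes
volume_bytes = FILE_IDS * MIN_FILE_BYTES     # every ID holding a minimum-size file
volume_in_zetta = volume_bytes // ZEBI       # that total, in units of 2^70 bytes

print(files_in_exa, max_file_in_exa, volume_in_zetta)
```

Under those assumptions the volume figure corresponds to every file ID holding a minimum-size file.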
Impeach Bush
That browseable photo of the Earth only seems to take in the US.
This is a copy and paste troll trying to whore karma.