Microsoft Invents Symbolic Links
scromp writes, "Microsoft really does innovate!
See for yourself!
" I can't decide if this is supposed to be a joke or not. I mean, it's funny, but I just can't tell. Perhaps I need a cup of coffee before I try to post stories.Update: 03/03 06:11 by H :To be fair to Microsoft, they did talk about symlinks.
I wouldn't call it innovative, but it's clearly not symbolic links. For one thing, it's not explicit. It all happens under the hood. For another, it has all this database of hashes to enable copy-on-write symantecs. Please try not to bait people so much!!!
;-)
That being said, as a sys-admin, I would want to be able to disable this. While I'm sure the copy-on-write feature uses sufficiently unique hashes of the files to identify changes, there are definitely cases where I *want* to have physically seperate copies, particularly if files are on different partitions (maybe this is only done at a partition level, who knows?).
I did like their other "innovations": IPv6 (gee, I can get that for Linux can't I?), text-to-speech (yup, also for Linux), a statistics based trouble-shooting tool (of course, the stats would have to be setup before Win2000 was deployed, so you can imagine how accurate they'll be), etc. I mean who do they think they're kidding? Not only does Linux have similar facilities to most of their "innovations", but ALL of these innovations are available elsewhere as add ons to Windows! Oh, wait, I forgot, innovation is when Microsoft takes other's technology and bundles it with Windows.....
sigs are a waste of space
What actually happens is that when an object is stored in the system - the system checks to see whether it is identical to another object, and if so - just stores a reference to the other object. This is achieved using a "signature" or in non-M$ language - a hash.
It is actaully a reasonably good idea although I can't see how it could have taken so long to implement it. Now it is quite possible that this has been done before, but it certainly isn't just symbolic links!
--
Associated Press
2000-03-02
Today, in an unprecedented show of candor, top Microsoft spokesmen admitted that Windows 2000 ("W2K") is largely redundant.
"Yes, it's true. We have so much duplicated crap in W2K that we're developing new technologies to deal with the sheer volume of bloat", said David Spiker, in a Redmond, WA press conference this morning.
"I mean, honestly. We've incorporated two thirds of RedHat Linux and thirty percent of FreeBSD. They're the same thing, really, so why store two separate copies?"
Other industry figures weren't too impressed with Microsoft's (Symbol: MSFT) new direction.
Said Richard Stallman, a leading freeware author, "Come on. We've had gzip for years. They can just compress everything." ("Gzip" is a cryptic program that runs on older "Unix" systems, which are similar to Microsoft's innovative "DOS" operating system).
At least some insiders, though, cheered the move. An unnamed employee of Sun Microsystems said that "it's about time they squeeze the crap out of that pig", soundly endorsing Microsoft's creativity and initiative. "I mean, really, their competitors don't have a tenth of the [features] to squeeze, let alone a reason to come up with new [systems] to squeeze it."
Side note to Dave: That's what you get for leaving us. :)
Dewey, what part of this looks like authorities should be involved?
Now we know why Windows needs to be rebooted every time that a significant event occurs: the Single Instance Store collapses all solutions into a single answer of "REBOOT" to fulfil their goal of massive saving of storage space, so when the trouble-shooting tool from their Decision Theory and Adaptive Systems Group uses its advanced statistical model to deduce that the most probable solution, naturally it returns the same result every time.
What hope does Tux have against such ingenuity!
;-)
"The question of whether machines can think is no more interesting than [] whether submarines can swim" - Dijkstra
Problem is, from the sketchy details in the article, this looks like a hack to fix a problem that traces all the way back to the broken disk storage model inherited from DOS - a problem that doesn't even exist under UNIX.
The UNIX model creates a single storage tree with the ability to mount new devices (disk partitions or NFS exported directories) at any point in the heirarchy. It's a model designed from the ground up around the concept of a network. The main advantage of this model is that it's simple to install a single copy of an application in a standard location on a network server (/usr/local or /opt or whatever) and simply mount this location on all the individual workstations. This way, you have one copy of the software (the "batch of bits") that can be used by any number of hosts.
With DOS, everything was assumed to be local. All your libraries are on the C:/WINDOWS/SYSTEM directory. Everyone has their own copy of the Registry that points to where things are located. If you want to install a piece of software, in most cases you have to install it locally, or at least all the DLL files are local. Hence, if you have 1,000 people on the network using MS Word, you end up with 1,000 identical copies of MS Word instaled! If you want to upgrade software for all those users, you have to do it 1,000 times.
With a storage model like that, it's no wonder MS is looking for ways to save space on identical copies of software. If it's static data (like installed software, then backups are less of an issue. I would think that the backup would grab a copy of the copy and write it out as if it was a unique file.
Still, it's nothing more than a patch to a fundamentally broken architecture!
Your Servant, B. Baggins
It actually makes sense to install this on a machine that has a lot of user drive space.
The software will make symbolic links automaticly, so when lusers all save
that "Wassup" commerical to the user
drive the machine doesn't run out of
space.
It's not just symbolic links, it's automatic
symbolic links.
Shaun Nelson - Bastard Operator (From Hell / For Hire)
I feel a certain ``sameness" in this discussion about how SIS duplicates UNIX symlinks -- a ``sameness" that recurs with every development that MS announces as a ``brand-new innovation."
Let's look at another much-heralded inovation from MS's past -- multitasking -- & compare how its reception mirrors the reception to SIS:
1) MS announces -- with much enthusiasm & pride -- a new development. In 2000 it was SIS; in 1992, it was multitasking under Windows.
2) Based on their enthusiasm & the amount of pride, many knowledgeable computer users expect it to be the same -- or better -- as an existing useful feature under other OS's. In 2000 SIS is compared to UNIX symlinks; in 1992 people expected preemptive, multi-user multitasking.
3) After some examination, it is discovered that the MS innovation is not as useful as first thought. SIS replaces duplicate files with a pointer, & is not actually a symlink; windows multitasking is co-operative -- a second program or process can't get its share of the processor until the first one decides it's finished.
4) The real, but marginal, added good of this innovation is soon countered by the flaws or bugs it introduces. SIS can frustrate a user making backup copies, & can reduce CPU performance as it checks for duplicates; a locked program in a co-operative multi-tasking system can lock the OS just as tight as under a single-tasking OS like DOS -- forcing a reboot.
5) Users have no way around these flaws. ``Every time I move a 40MB file to my Y2K box it slows down to a crawl because its confirming this file is not a duplicate. Byte by byte." -- ``I lost two hours of work because a program GPFed & forced me to reboot the entire system."
6) Expectations of software written by a certain company are once again lowered.
Okay, I admit the last is speculative. But a lot less than it might seem.
Geoff
I think I see a trend here. Maybe for them it really would be easier to muzzle the entire internet than to produce p
I think if you actually read the whole article, this is an innovation. The program checks for files that are duplicates, then replaces the duplicates with a symbolic link to a single file. This happens automagically, so the user never notices, or knows. It's the second part of this that may cause problems.
Let's say, at the end of the day, I copy a folder which contains files I have been working on to a backup folder on the same hard drive. The program deletes all of my copies, replacing them with symbolic links. The next day after a few hours work, I realize I need to revert to one of my backup files, but it's been changed to a symbolic link to the file I'm using now. Presto!
I think this has potential, and I think it could be a good idea, but the gods live in the details.
See, the lunch loosing point is how the reporter is basically worshiping the ground these guys walk on. As I wilt into the carpet, I realize Lucovsky must have mentioned to his colleague how nervous I was about approaching them. After all, who the hell am I to be talking with these guys? They are developers' developers -- two of the visionaries behind the operating system that began as NT OS/2 and has evolved into Windows 2000. Cutler refuses virtually all interviews with the press, but he and Lucovsky are willing to talk to me, a program manager from down the hall. They probably find my nervousness amusing.
Is this supposed to be a reporter. Or did they just grab some office lakey is told them to interview them while they recieved a blow job.
This just confirms(paraphrased): "Those who don't understand unix are doomed to reinvent it, poorly" This is just hilarious.
Actually, this is more than symlinks, in that it is done by a background agent on the user's behalf. Thus, they take a useful feature and make it extremely dangerous.
People make duplicates of files for good reasons(it's a bad idea to assume they are clueless).
They may want to have a fair copy of a document before it is mangled by the commitee process. They may wish to check out a document to create a version fork. So, I see a great html file, and decide I want to use it as a template for my own document, throwing out the contents. I go home, and overnight the system replaces my copy with a link. Then I edit the file and blam -- I just changed somebody's web page. Yuck.
Of course there may be safeguards, and maybe the users should use a document management system. The problem is that document management systems are sometimes overkill; where they are used the problem being solved doesn't exist. As far as safeguards are concerned, they complicate a simple process to solve a problem that in the end is not very important at all.
The irony is that they are optimizing the wrong thing. Disk space for user file storage is cheap. Even if you have enough users to drop 20-30K$ on a hardware RAID, it is still cheap relative to the time to administer the system and cheaper still with respect to user time.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
My first thought was that they might have done something slightly different from a familiar old symlink by applying "copy on write" semantics. While I couldn't find anything in the article that unambiguously states whether they're doing this, I did find:
>The first piece searches for duplicate files, computes a signature for each file and stores these signatures in a database. It then compares the signatures in the database and merges duplicate files.
This is actually kinda nice. It's going out and looking for copies, instead of relying on you to create the links explicitly. In order for this to be truly transparent you'd have to use COW, of course. There's also a concern about how much CPU time and bus bandwidth will get used finding and tracking these duplicates. Lastly, let's not forget that it would be trivial to create a daemon to do the same thing on any UNIX.
In short, it's an interesting idea, but perhaps still a bad one. In any case, I don't think it's "just a reinvention of the symlink" even though MS has tried to take credit for a lot of rather ancient computing ideas.
Slashdot - News for Herds. Stuff that Splatters.
>Symlinks are great, but not ALL duplicate files should become them.
I strongly suspect that the real innovation here is using the ancient virtual-memory trick of "copy on write" to files. In your example, foo.conf and foo.conf.old would be links to the same data at first, but the moment you write foo.conf the link gets broken and a fresh copy of the data is created automagically so your updates to foo.conf don't affect foo.conf.old. Problem solved.
For reasons described in my earlier post I'm not sure this is a great idea, and it certainly isn't likely to "free up as much as 80 to 90 percent of the space on a server, allowing users to store as much as five to 10 times the information" as MS claims, but I'd be amazed if even MS would allow the problem you describe to occur. It's too obvious even for then. In fact, this may provide the answer to the "why did it take them so long" question. Plain old symlinks would be easy to add (been there, done that, on an MS platform) but adding the COW behavior could be tricky.
Slashdot - News for Herds. Stuff that Splatters.
If this is combined wit copy on write it is a good idea. If it isn't, it is obvious that it creates the possibility of things being changed identically when only a local copy should be changed.
Another obvious issue is that it clearly isn't as flexible as the Unix combination of copies, hard links and symbolic links. I can choose whether to make a copy which will thenceforth be a separate file, create a hard link which will be a separate name for the same data, or create a symbolic link which is a reference to another file which may or may not exist. This is typical of the differences between Windows and Unix. The Windows approach is powerful, but does not leave as much flexibility in the hands of the user or programmer. Unix assumes that if you know what you are doing, you can make the choice for yourself.
The net will not be what we demand, but what we make it. Build it well.
Well, the article sortof suggests that they set up the links automatically, which would be an innovation perhaps.
This article isn't terribly technical, so I can't be sure, but as I read it, this has one major flaw: If it automatically detects duplicate files and symbolically links them, it will ruin your ability to backup a file by creating a copy of it somewhere.
Symlinks are great, but not ALL duplicate files should become them. If I have file foo.conf, and I back it up to foo.conf.old, but don't change it right away [Or maybe that doesn't matter, if this "feature" is constantly running], their SIS program will symlink foo.conf and foo.conf.old to be the same file. Then when I change foo.conf, and hose my system, I can't restore it by using foo.conf.old, because that file was changed when I changed foo.conf!
Can anyone give more details about how this works?
So your app uses MSVB500.dll? Then your target machine will get an extra copy of it. Have 5 different applications (from 5 different vendors) which each use MSVB500.dll, then you will have 5 copies of MSVB500.dll. This sounds like the road to true bloat but at least you end up with a more robust application because other peoples' installations of common dlls won't break your app. With this automatic symbolic linking, much of the hard drive space which could have been wasted by redundant dlls can now be reclaimed. This could be a real good thing.
.lnk files are worthless unless you're doing 1 of 2 things:
1) launching an executable
2) launching something that is a registered filetype (i e, C:\>foo.lnk where foo.lnk points to foo.txt)
This is extremely limited.......
for example, you can't make a shortcut to a library if that library isn't in your path.
just for kicks, make a text file in win32, then make a shortcut to it.
then:
c:\>type foo.txt
you get a bunch of high ascii garbage with the path of the target mixed in there.
compare
$echo this way works >foo.txt
$ln -s foo.txt foo.realsymlink
$cat foo.realsymlink
OR, drop the -s and then try moving the target file around. Try that with your shortcut.lnk
The behavior described in the article is neither a Unix symlink, nor a Unix hardlink. It is something I have never heard of in Unix, an automated symlink. Frankly (and I am a Unix weenie), this looks like a true innovation.
In Unix filesystems, each file has an "inode" number unique to the filesystem. The directory entries all point to inodes. Thus, two different directory entries can have the same inode, and thus the same bits are accessible from multiple places. Note, for example, that the vi and ex programs are hardlinks to the same executable--the editor simply reads the name it was called with to determine whether it should behave as vi or as ex.
Hardlinks do not really exist to save space, they exist to link two directory entries at the hip. If one file (inode) has two links (filenames), then grabbing it by one filename and editing it will cause changes which will be visible when you pick it up by the other filename. Note that, because of this, Unix hardlinks are manual. The filesystem doesn't spontaneously create hardlinks; it takes a user process to do this.
Microsoft's scheme is implicitly handled by the filesystem code.
This implies that this is happening automagically, without user interferance. At worst, this means that the SIS is creating hardlinks on the fly. I doubt this because it would create Mothra-sized bugs as two files get "married" as links and never "divorced". Think about it: users often copy a file byte for byte (causing SIS to link them together), and then edit one and use the other as an unchanging backup.
My guess is that SIS is linking files on the fly when it recognizes them as equal, and then unlinking them (copy-on-edit) as a file is edited to be different from its linkmates.
This is simply Microsoft eliminating redundancy in its filesystem. Compression algorithms eliminate redundancy all the time--that's how they save bytes.
Some Unix flavors do a similar thing in core. When loading up a program, the bits of the binary can be stored once in memory no matter how many invocations the program currently has. If eight people are running Emacs, memory is storing eight Emacs data segments, but only one copy of the Emacs binary.
This is something one could implement in Linux filesystem code. Each inode would need its own checksum, and there would have to be a one-or-more-to-one relationship between inodes and hardware representations--that is, two different inodes would be able to share the same sectors.
When a file got edited, the FS would determine whether the sectors were shared with one or more other inodes--if so, you have to "divorce" by copying the sectors elsewhere and pointing the inode to the new sectors.
When the edit finished, the FS would recalculate the checksum, then look for all other inodes with the same checksum. For any matches, do a byte-for-byte diff to make sure--if so, then point the inode at the same sectors as the old inode and mark the new inode's sectors for reaping.
The tradeoff is between filesystem space and write performance (read performance is probably unchanged). It takes better minds than mine to determine under what circumstances the tradeoff is worth it.
--The basis of all love is respect
I think you need to put a little more emphasis on that.
nine separate groups within Microsoft Research contributed more than 15 innovations to Windows 2000
Wow! OVER FIFTEEN separate innovations, that's amazing isn't it! How can one company come up with such a staggering number of innovations?! Hooray!
I mean, I thought I came up with an innovation the other day, but actually it wasn't. Innovations are hard, oh boy!
Indeed. And with 63,000 "issues", that's more than four thousand bugs per innovation, folks.
--
This comment was brought to you by And Clover.
In Win2K an application has a choice of using the system dlls, which are protected and can't be written over except by a service pack, or it's own private version of a DLL. So if your app requires a specific version of msvcrt.dll, you can install it in the application directory and it will use that copy instead of the system copy.
For a complete explanation of this: Check out this article