Slashdot Mirror


Microsoft Invents Symbolic Links

scromp writes, "Microsoft really does innovate! See for yourself!" I can't decide if this is supposed to be a joke or not. I mean, it's funny, but I just can't tell. Perhaps I need a cup of coffee before I try to post stories. Update: 03/03 06:11 by H: To be fair to Microsoft, they did talk about symlinks.

33 of 712 comments

  1. Still not new... by mosch · · Score: 3

    I still don't think this is an 'innovation'. A coworker of mine regularly makes fully standards compliant CD-ROMs with 2 gigs on them (it uses basically the same exact method for getting the space, except using cross-linking and such tricks). The basic difference is that they do it on a live filesystem, something which I really don't think I'd like to see due to the increased risk of data loss with a single sector failure.
    ----------------------------

  2. Isn't this just called "compression"? by Ami+Ganguli · · Score: 3

    I don't really know how existing compressed filesystems are implemented, but since storing pointers to existing copies of data is an old compression technique, I always assumed this was part of it.

    Now I suppose it's possible that existing compression only takes advantage of redundancy within a file. In that case extending this to the whole file system might be considered innovative - but I would ask why they didn't do that in the first place.

    --
    It is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail. - Abraham Maslow
  3. Poor Implementation of a Good Idea by jd · · Score: 3
    IMHO, comparing entire files is stupid. You don't do that for RCS! Why? Because most of the time anyone would even WANT this feature, they'll have large numbers of SIMILAR files, NOT identical ones!

    What would I do, then? I'd take a tree-based system, such as ReiserFS, and turn it into a graph-based one. Instead of automatically setting up file-level symlinks, to entire files, I'd have block-level symlinks, chaining file components.

    If a component in one file changes, just create a fresh copy, move the links for that file over, and save the revisions to it.

    By doing this, I could actually see people saving 80%-90% on disk space, as there are lots of files on many computers with identical segments.

    However, the only way that I could see Microsoft claiming that kind of saving, for the system they are describing, is if Windows 2000 has large numbers of identical files in it. Oh, it does? That would explain the 50 million lines of code.
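
    For anyone who wants to poke at the block-level idea above, here is a minimal Python sketch. It is an illustration only: `BlockStore`, its method names and the 4-byte block size are all made up for the example, and it says nothing about how ReiserFS or Microsoft's Single Instance Store actually work. Each file is kept as a list of block digests, and a block that already exists is never stored twice.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose so the example is easy to follow


class BlockStore:
    """Hypothetical content-addressed store: identical blocks are kept once."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes
        self.files = {}   # file name -> list of digests, in order

    def write(self, name, data):
        digests = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # store only if unseen
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


store = BlockStore()
store.write("a.txt", b"AAAABBBBCCCC")
store.write("b.txt", b"AAAAXXXXCCCC")  # differs only in the middle block
```

    The two files hold 24 bytes between them, but `store.stored_bytes()` comes to 16, because the first and last blocks are shared. The sketch never frees unreferenced blocks, and fixed-size blocks stop lining up as soon as a byte is inserted near the start of a file, so any real saving would depend heavily on the data.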

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  4. Re:These are neither Unix symlinks nor Unix hardli by Sneakums · · Score: 3

    A shortcut is not equivalent to a symlink. A symlink is handled by Unix at the OS level, when path traversal is done. Handling of ".." is done by the shell.

    In Windows, shortcuts are handled at the shell level, and they interact badly with path-name traversal.

  5. Microsoft Invents *Automatic* Symbolic Links by EricWright · · Score: 3
    From the article:

    "The Single Instance Store recognizes that there's duplication, coalesces the extra copies and stores the bits once instead of several times," Bolosky said. "So if you have 10 files with the same exact bits, instead of storing this data 10 times, it stores it once. It frees up a lot of space, and you realize performance improvements on the server."

    The point being that W2K will automatically notice if multiple files are bit-for-bit copies of each other, and store the file once with symbolic links in other folders. It's the automatic part that makes this an M$ innovation.

    BTW, I am *not* advocating M$ at all, just pointing out a yet-another-misconception in the Slashdot title...

    Eric

  6. Symlinks aren't all they created by Shotgun · · Score: 3

    Read the article. They also invented text-to-speech. I guess those programs I got with my first 8-bit SoundBlaster card were stolen from Microsoft by a future CreativeLabs employee with a time machine. He stole the programs from Windows 2005 and travelled back to 1993 where he relabeled the program and gave it away with overpriced sound cards.

    --
    Aah, change is good. -- Rafiki
    Yeah, but it ain't easy. -- Simba
  7. Re:These are neither Unix symlinks nor Unix hardli by MikeBabcock · · Score: 3

    I hate to be the guy to burst your ego (nah, I don't really -- but it sounds polite), but you're wrong.

    Shortcuts are files that contain data about a file they want pointed to.

    Hard links are actually pointing to the equivalent of a FAT entry for the file in question.

    File starting at sector 301 = "blah"
    /home/myfiles/blah.txt is a link to 301
    /home/yourfiles/blah.txt can be a link to 301

    ... they don't (actually) link to "each other" but to the same space on the drive ... when the primary changes, the other does too.

    ... read up on linking.
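
    A quick way to see this for yourself is a small Python sketch, assuming a filesystem that supports hard links (any ordinary Unix one does). The function name `hardlink_demo` and the file names are invented for the example. It creates a file, gives it a second name with `os.link`, and checks that both names report the same inode and that a change made through one name is visible through the other.

```python
import os
import tempfile


def hardlink_demo():
    """Create a file, hard-link it, and show both names share one inode."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "blah.txt")
        second = os.path.join(d, "blah_link.txt")
        with open(first, "w") as f:
            f.write("blah")
        os.link(first, second)  # second directory entry for the same inode
        same_inode = os.stat(first).st_ino == os.stat(second).st_ino
        # Append in place through the first name...
        with open(first, "a") as f:
            f.write(" changed")
        # ...and read the change back through the second name.
        with open(second) as f:
            seen = f.read()
        link_count = os.stat(first).st_nlink
        return same_inode, seen, link_count
```

    Neither name is the "primary"; the link count simply says how many directory entries point at the data.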

    --
    - Michael T. Babcock (Yes, I blog)
  8. Single Instance Store saves space on Slashdot! by Speare · · Score: 3

    Tongue in cheek:
    If SlashDot were running on Windows 2000, all seven hundred copies of the following article would be coalesced into a single copy:

    • Misleading headline on Slashdot! (Score:2, Informative)
      by Various (dont.spam.me.I.cant.run.spam.filters.myself@somedomain.com) on Thu 02 Mar 08:53AM EST (#69)
      (User Info) http://winblows.sucks/

      It seems to me that this is another example of the Slashdot Editors getting carried away again; I mean, clearly they didn't read the original article or check their facts.

      The original article states that this is an automatic process, and finds identical file copies as candidates for symbolic linking plus copy-on-write.

    Now, all we need is a semantic copy detector.

    (Single Instance Store saves space on Slashdot! =anagram> Cheapness: so overloading tactless nastiness.)
    --
    [ .sig file not found ]
  9. Reinventing UNIX (the real kicker) by Carnage4Life · · Score: 3

    Altogether, nine separate groups within Microsoft Research contributed more than 15 innovations to Windows 2000, including everything from the computer code that identifies bugs and security attacks to the underlying technology that enables computer applications to encrypt and decrypt confidential information.
    ...
    Typically, Microsoft centers its research on innovations that will be ready for development three to seven years in the future.


    So let me see this means that each group took three to seven years coming up with 1.5 innovations that already exist on Unix systems. No wonder people call them Microsloth.

  10. Give the Devil his due by Wellspring · · Score: 3

    My read of the press release is that the links are created dynamically and automatically. Keep in mind, this may be marketing-garbled mush, but it sounds like they are using a daemon to dynamically assign symlinks wherever duplication is found.

    They claim this will save 80%-90% hard drive space. I'm very skeptical of that, even if it is all they are claiming it is.

    Is there a patent? Mayhaps someone can write a filesystem which implements this. I'm really doubtful that this is anything that will more than marginally affect effective hard drive capacities, and at some cost in overhead, but it might be worth playing with on a UNIX.

  11. Wow. by Yaruar · · Score: 3
    They've managed to come up with more than 15 innovations in code millions of lines long and years in the development.

    I'm impressed. At this rate I reckon they must come up with one innovation roughly every 50000 man hours of coding.

    Makes you wonder how any of these small companies do it. ;-)

    --
    Working for the (other) man
    1. Re:Wow. by Bob+Ince · · Score: 5
      They've managed to come up with more than 15 innovations

      I think you need to put a little more emphasis on that.

      nine separate groups within Microsoft Research contributed more than 15 innovations to Windows 2000

      Wow! OVER FIFTEEN separate innovations, that's amazing isn't it! How can one company come up with such a staggering number of innovations?! Hooray!

      I mean, I thought I came up with an innovation the other day, but actually it wasn't. Innovations are hard, oh boy!

      I'm impressed. At this rate I reckon they must come up with one innovation roughly every 50000 man hours of coding.

      Indeed. And with 63,000 "issues", that's more than four thousand bugs per innovation, folks.


      --
      This comment was brought to you by And Clover.
  12. This isn't symbolic links... by X · · Score: 4

    I wouldn't call it innovative, but it's clearly not symbolic links. For one thing, it's not explicit. It all happens under the hood. For another, it has all this database of hashes to enable copy-on-write semantics. Please try not to bait people so much!!!

    That being said, as a sys-admin, I would want to be able to disable this. While I'm sure the copy-on-write feature uses sufficiently unique hashes of the files to identify changes, there are definitely cases where I *want* to have physically separate copies, particularly if files are on different partitions (maybe this is only done at a partition level, who knows?).

    I did like their other "innovations": IPv6 (gee, I can get that for Linux can't I?), text-to-speech (yup, also for Linux), a statistics based trouble-shooting tool (of course, the stats would have to be setup before Win2000 was deployed, so you can imagine how accurate they'll be), etc. I mean who do they think they're kidding? Not only does Linux have similar facilities to most of their "innovations", but ALL of these innovations are available elsewhere as add ons to Windows! Oh, wait, I forgot, innovation is when Microsoft takes other's technology and bundles it with Windows..... ;-)

    --
    sigs are a waste of space
  13. Inaccurate reporting (again...) by Sanity · · Score: 4
    Yet again we see a sensationalised report in Slashdot.org. It took me 30 seconds of reading the article to discover that what the M$ guys had done was not just re-invent symbolic links; clearly "Scromp" was so keen to post this that he didn't even bother reading the article.

    What actually happens is that when an object is stored in the system - the system checks to see whether it is identical to another object, and if so - just stores a reference to the other object. This is achieved using a "signature" or in non-M$ language - a hash.

    It is actually a reasonably good idea, although I can't see how it could have taken so long to implement it. Now it is quite possible that this has been done before, but it certainly isn't just symbolic links!
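
    The "signature" step is easy to sketch in Python. What follows is a guess at the general technique, not Microsoft's code: the choice of SHA-256 and the names `signature`, `find_duplicates` and `demo` are assumptions for the example. It hashes every regular file under a directory and reports the groups whose hashes match, which are the candidates a system like this would store once.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def signature(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group regular files under root by signature; return groups of 2+."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            groups[signature(path)].append(os.path.relpath(path, root))
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def demo():
    """Build a tiny tree with one duplicated file and scan it."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "alice"))
        os.mkdir(os.path.join(root, "bob"))
        for rel, data in [("alice/memo.doc", b"same bits"),
                          ("bob/memo.doc", b"same bits"),
                          ("bob/notes.txt", b"different")]:
            with open(os.path.join(root, *rel.split("/")), "wb") as f:
                f.write(data)
        return find_duplicates(root)
```

    `demo()` reports the two `memo.doc` files as one group and leaves `notes.txt` alone. This only finds candidates; deciding what to do with them, and what happens when one copy is later edited, is the hard part.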

    --

  14. In related news... by Just+Some+Guy · · Score: 4

    Associated Press

    2000-03-02

    Today, in an unprecedented show of candor, top Microsoft spokesmen admitted that Windows 2000 ("W2K") is largely redundant.

    "Yes, it's true. We have so much duplicated crap in W2K that we're developing new technologies to deal with the sheer volume of bloat", said David Spiker, in a Redmond, WA press conference this morning.

    "I mean, honestly. We've incorporated two thirds of RedHat Linux and thirty percent of FreeBSD. They're the same thing, really, so why store two separate copies?"

    Other industry figures weren't too impressed with Microsoft's (Symbol: MSFT) new direction.

    Said Richard Stallman, a leading freeware author, "Come on. We've had gzip for years. They can just compress everything." ("Gzip" is a cryptic program that runs on older "Unix" systems, which are similar to Microsoft's innovative "DOS" operating system).

    At least some insiders, though, cheered the move. An unnamed employee of Sun Microsystems said that "it's about time they squeeze the crap out of that pig", soundly endorsing Microsoft's creativity and initiative. "I mean, really, their competitors don't have a tenth of the [features] to squeeze, let alone a reason to come up with new [systems] to squeeze it."


    Side note to Dave: That's what you get for leaving us. :)

    --
    Dewey, what part of this looks like authorities should be involved?
  15. So,that's why you always have to reboot Windows! by Morgaine · · Score: 4

    Now we know why Windows needs to be rebooted every time that a significant event occurs: the Single Instance Store collapses all solutions into a single answer of "REBOOT" to fulfil their goal of massive saving of storage space, so when the trouble-shooting tool from their Decision Theory and Adaptive Systems Group uses its advanced statistical model to deduce the most probable solution, naturally it returns the same result every time.

    What hope does Tux have against such ingenuity!

    ;-)

    --
    "The question of whether machines can think is no more interesting than [] whether submarines can swim" - Dijkstra
  16. Broken Disk Storage Model by Bilbo · · Score: 4
    You're right that this "innovation" is a lot more than a reimplementation of symbolic links. In fact, in the world of Windows, it might even be considered a "new innovation".

    Problem is, from the sketchy details in the article, this looks like a hack to fix a problem that traces all the way back to the broken disk storage model inherited from DOS - a problem that doesn't even exist under UNIX.

    The UNIX model creates a single storage tree with the ability to mount new devices (disk partitions or NFS exported directories) at any point in the hierarchy. It's a model designed from the ground up around the concept of a network. The main advantage of this model is that it's simple to install a single copy of an application in a standard location on a network server (/usr/local or /opt or whatever) and simply mount this location on all the individual workstations. This way, you have one copy of the software (the "batch of bits") that can be used by any number of hosts.

    With DOS, everything was assumed to be local. All your libraries are on the C:/WINDOWS/SYSTEM directory. Everyone has their own copy of the Registry that points to where things are located. If you want to install a piece of software, in most cases you have to install it locally, or at least all the DLL files are local. Hence, if you have 1,000 people on the network using MS Word, you end up with 1,000 identical copies of MS Word installed! If you want to upgrade software for all those users, you have to do it 1,000 times.

    With a storage model like that, it's no wonder MS is looking for ways to save space on identical copies of software. If it's static data (like installed software), then backups are less of an issue. I would think that the backup would grab a copy of the copy and write it out as if it was a unique file.

    Still, it's nothing more than a patch to a fundamentally broken architecture!

    --
    Your Servant, B. Baggins
  17. Great for Servers. by Bastard+Operator+Fro · · Score: 4

    It actually makes sense to install this on a machine that has a lot of user drive space.

    The software will make symbolic links automatically, so when lusers all save that "Wassup" commercial to the user drive the machine doesn't run out of space.

    It's not just symbolic links, it's automatic symbolic links.

    --
    Shaun Nelson - Bastard Operator (From Hell / For Hire)
  18. A History of How Microsoft Innovates by llywrch · · Score: 4

    I feel a certain ``sameness" in this discussion about how SIS duplicates UNIX symlinks -- a ``sameness" that recurs with every development that MS announces as a ``brand-new innovation."

    Let's look at another much-heralded innovation from MS's past -- multitasking -- & compare how its reception mirrors the reception to SIS:

    1) MS announces -- with much enthusiasm & pride -- a new development. In 2000 it was SIS; in 1992, it was multitasking under Windows.

    2) Based on their enthusiasm & the amount of pride, many knowledgeable computer users expect it to be the same -- or better -- as an existing useful feature under other OS's. In 2000 SIS is compared to UNIX symlinks; in 1992 people expected preemptive, multi-user multitasking.

    3) After some examination, it is discovered that the MS innovation is not as useful as first thought. SIS replaces duplicate files with a pointer, & is not actually a symlink; windows multitasking is co-operative -- a second program or process can't get its share of the processor until the first one decides it's finished.

    4) The real, but marginal, added good of this innovation is soon countered by the flaws or bugs it introduces. SIS can frustrate a user making backup copies, & can reduce CPU performance as it checks for duplicates; a locked program in a co-operative multi-tasking system can lock the OS just as tight as under a single-tasking OS like DOS -- forcing a reboot.

    5) Users have no way around these flaws. ``Every time I move a 40MB file to my Y2K box it slows down to a crawl because it's confirming this file is not a duplicate. Byte by byte." -- ``I lost two hours of work because a program GPFed & forced me to reboot the entire system."

    6) Expectations of software written by a certain company are once again lowered.

    Okay, I admit the last is speculative. But a lot less than it might seem.

    Geoff

    --
    I think I see a trend here. Maybe for them it really would be easier to muzzle the entire internet than to produce p
  19. RTFA: Obvious but Innovative by A+Big+Gnu+Thrush · · Score: 4

    I think if you actually read the whole article, this is an innovation. The program checks for files that are duplicates, then replaces the duplicates with a symbolic link to a single file. This happens automagically, so the user never notices, or knows. It's the second part of this that may cause problems.

    Let's say, at the end of the day, I copy a folder which contains files I have been working on to a backup folder on the same hard drive. The program deletes all of my copies, replacing them with symbolic links. The next day after a few hours work, I realize I need to revert to one of my backup files, but it's been changed to a symbolic link to the file I'm using now. Presto!

    I think this has potential, and I think it could be a good idea, but the gods live in the details.

  20. Re:A new low... by MindStalker · · Score: 4

    See, the lunch-losing point is how the reporter is basically worshiping the ground these guys walk on. As I wilt into the carpet, I realize Lucovsky must have mentioned to his colleague how nervous I was about approaching them. After all, who the hell am I to be talking with these guys? They are developers' developers -- two of the visionaries behind the operating system that began as NT OS/2 and has evolved into Windows 2000. Cutler refuses virtually all interviews with the press, but he and Lucovsky are willing to talk to me, a program manager from down the hall. They probably find my nervousness amusing.

    Is this supposed to be a reporter? Or did they just grab some office lackey and tell him to interview them while they received a blow job?

  21. Reinventing UNIX by hph · · Score: 4

    This just confirms (paraphrased): "Those who don't understand unix are doomed to reinvent it, poorly." This is just hilarious.

  22. This defeats the way users work. by hey! · · Score: 4

    Actually, this is more than symlinks, in that it is done by a background agent on the user's behalf. Thus, they take a useful feature and make it extremely dangerous.

    People make duplicates of files for good reasons (it's a bad idea to assume they are clueless).

    They may want to have a fair copy of a document before it is mangled by the committee process. They may wish to check out a document to create a version fork. So, I see a great html file, and decide I want to use it as a template for my own document, throwing out the contents. I go home, and overnight the system replaces my copy with a link. Then I edit the file and blam -- I just changed somebody's web page. Yuck.

    Of course there may be safeguards, and maybe the users should use a document management system. The problem is that document management systems are sometimes overkill; where they are used the problem being solved doesn't exist. As far as safeguards are concerned, they complicate a simple process to solve a problem that in the end is not very important at all.

    The irony is that they are optimizing the wrong thing. Disk space for user file storage is cheap. Even if you have enough users to drop 20-30K$ on a hardware RAID, it is still cheap relative to the time to administer the system and cheaper still with respect to user time.

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
  23. Well, there is something new here by Salamander · · Score: 4

    My first thought was that they might have done something slightly different from a familiar old symlink by applying "copy on write" semantics. While I couldn't find anything in the article that unambiguously states whether they're doing this, I did find:

    >The first piece searches for duplicate files, computes a signature for each file and stores these signatures in a database. It then compares the signatures in the database and merges duplicate files.

    This is actually kinda nice. It's going out and looking for copies, instead of relying on you to create the links explicitly. In order for this to be truly transparent you'd have to use COW, of course. There's also a concern about how much CPU time and bus bandwidth will get used finding and tracking these duplicates. Lastly, let's not forget that it would be trivial to create a daemon to do the same thing on any UNIX.

    In short, it's an interesting idea, but perhaps still a bad one. In any case, I don't think it's "just a reinvention of the symlink" even though MS has tried to take credit for a lot of rather ancient computing ideas.

    --
    Slashdot - News for Herds. Stuff that Splatters.
  24. Re:End of Backups? by Salamander · · Score: 4

    >Symlinks are great, but not ALL duplicate files should become them.

    I strongly suspect that the real innovation here is using the ancient virtual-memory trick of "copy on write" to files. In your example, foo.conf and foo.conf.old would be links to the same data at first, but the moment you write foo.conf the link gets broken and a fresh copy of the data is created automagically so your updates to foo.conf don't affect foo.conf.old. Problem solved.

    For reasons described in my earlier post I'm not sure this is a great idea, and it certainly isn't likely to "free up as much as 80 to 90 percent of the space on a server, allowing users to store as much as five to 10 times the information" as MS claims, but I'd be amazed if even MS would allow the problem you describe to occur. It's too obvious even for them. In fact, this may provide the answer to the "why did it take them so long" question. Plain old symlinks would be easy to add (been there, done that, on an MS platform) but adding the COW behavior could be tricky.
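
    To make the foo.conf / foo.conf.old case concrete, here is a toy in-memory model in Python. It is only a sketch of copy-on-write as I understand the idea; `CowStore` and its methods are invented names, writes replace the whole file for simplicity, and nothing here is taken from the actual SIS implementation.

```python
class CowStore:
    """Toy model: names point at shared blobs; a write breaks the sharing."""

    def __init__(self):
        self.blobs = {}   # blob id -> bytes
        self.names = {}   # file name -> blob id
        self.refs = {}    # blob id -> number of names pointing at it
        self._next_id = 0

    def _new_blob(self, data):
        blob_id = self._next_id
        self._next_id += 1
        self.blobs[blob_id] = bytes(data)
        self.refs[blob_id] = 1
        return blob_id

    def create(self, name, data):
        self.names[name] = self._new_blob(data)

    def link(self, existing, new_name):
        """What a dedup pass would do: share one blob between two names."""
        blob_id = self.names[existing]
        self.names[new_name] = blob_id
        self.refs[blob_id] += 1

    def write(self, name, data):
        blob_id = self.names[name]
        if self.refs[blob_id] > 1:
            # Shared: give this name its own blob so the other names
            # keep seeing the old contents.
            self.refs[blob_id] -= 1
            self.names[name] = self._new_blob(data)
        else:
            self.blobs[blob_id] = bytes(data)  # sole owner: overwrite in place

    def read(self, name):
        return self.blobs[self.names[name]]


store = CowStore()
store.create("foo.conf", b"setting=1")
store.link("foo.conf", "foo.conf.old")  # both names now share one blob
store.write("foo.conf", b"setting=2")   # breaks the link for foo.conf only
```

    After the write, foo.conf.old still reads back `setting=1`, which is exactly the behavior a backup copy needs. A real filesystem would have to do the same thing for partial writes, memory-mapped files and crashes halfway through the copy, which is presumably where the tricky part lives.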

    --
    Slashdot - News for Herds. Stuff that Splatters.
  25. Not a bad idea by dsplat · · Score: 4
    As several people have pointed out, this appears to be an automatic process rather than a user/programmer selected choice between cp and ln. This paragraph from the article makes that pretty clear:

    "The Single Instance Store recognizes that there's duplication, coalesces the extra copies and stores the bits once instead of several times," Bolosky said. "So if you have 10 files with the same exact bits, instead of storing this data 10 times, it stores it once. It frees up a lot of space, and you realize performance improvements on the server."


    If this is combined with copy on write it is a good idea. If it isn't, it is obvious that it creates the possibility of things being changed identically when only a local copy should be changed.

    Another obvious issue is that it clearly isn't as flexible as the Unix combination of copies, hard links and symbolic links. I can choose whether to make a copy which will thenceforth be a separate file, create a hard link which will be a separate name for the same data, or create a symbolic link which is a reference to another file which may or may not exist. This is typical of the differences between Windows and Unix. The Windows approach is powerful, but does not leave as much flexibility in the hands of the user or programmer. Unix assumes that if you know what you are doing, you can make the choice for yourself.
    --
    The net will not be what we demand, but what we make it. Build it well.
  26. Well, sortof by zyklone · · Score: 5

    Well, the article sortof suggests that they set up the links automatically, which would be an innovation perhaps.

    1. Re:Well, sortof by JordanH · · Score: 5
      • Well, the article sortof suggests that they set up the links automatically, which would be an innovation perhaps.

      It's more than suggested. From the quoted article:

      • "The Single Instance Store recognizes that there's duplication, coalesces the extra copies and stores the bits once instead of several times," Bolosky said. "So if you have 10 files with the same exact bits, instead of storing this data 10 times, it stores it once. It frees up a lot of space, and you realize performance improvements on the server."

      I would assume that, in this architecture, if you change one of the instances, it creates a new copy and does not modify the original file. This would allow you to have only one copy of a configuration file that needs to be copied to each user's directory for possible customization and only have the extra copies made if someone did, in fact, customize. Can anyone out there who really knows about Single Instance Store care to comment?

      This is significantly different than symbolic links. It involves symlinks, but it's more than that. I'm not sure if it's really a big win, but it is different.

      It seems to me that its usefulness is an artifact of how difficult it is to manage Registry entries. Every application has Registry entries that point to application (configs, executables, dlls, etc.) files. You could always create user's common files with Registry entries pointing to a common copy and then modify the Registry entry to point to a local copy when you went to customize the user's environment. Of course, having the file system "recognize" this is in some ways more convenient. It also leads to redundancy of functionality (both the Registry and the Single Instance Store have a way of consolidating common files).

      It seems that the Single Instance Store could be considerable overhead. For example, if I change a configuration file and get a local copy (if that's how it works, if that's not how it works I don't understand how Single Instance Store can be useful at all), does the Single Instance Store scan all of the files in the system to check if there's an exact copy of my new file somewhere? Does it have to make this scan continually for all changed files? Is it "lazy" about creating Single Instances in the case of transient files that are being opened/closed and updated frequently that just happen to be identical for a short period of time?

      What would be really cool is if similar files were maintained as diffs from a single base file that were automatically applied on reading. Would this be an attribute of a log-structured file system? Or do current log-structured file system designs not include provisions for consolidating multiple files like this?


      -Jordan Henderson

  27. End of Backups? by SwiftOne · · Score: 5

    This article isn't terribly technical, so I can't be sure, but as I read it, this has one major flaw: If it automatically detects duplicate files and symbolically links them, it will ruin your ability to backup a file by creating a copy of it somewhere.

    Symlinks are great, but not ALL duplicate files should become them. If I have file foo.conf, and I back it up to foo.conf.old, but don't change it right away [Or maybe that doesn't matter, if this "feature" is constantly running], their SIS program will symlink foo.conf and foo.conf.old to be the same file. Then when I change foo.conf, and hose my system, I can't restore it by using foo.conf.old, because that file was changed when I changed foo.conf!

    Can anyone give more details about how this works?

  28. Re:How to prevent DLL Hell by funkman · · Score: 5
    DLL Hell is when you need different versions of the same dll on the same machine because some versions of some dlls break other applications. I believe W2K gets around this by requiring new software to install all of the dlls it needs in its own application directory.

    So your app uses MSVB500.dll? Then your target machine will get an extra copy of it. Have 5 different applications (from 5 different vendors) which each use MSVB500.dll, then you will have 5 copies of MSVB500.dll. This sounds like the road to true bloat but at least you end up with a more robust application because other peoples' installations of common dlls won't break your app. With this automatic symbolic linking, much of the hard drive space which could have been wasted by redundant dlls can now be reclaimed. This could be a real good thing.

  29. shortcuts? you gotta be kidding by mjackso1 · · Score: 5

    .lnk files are worthless unless you're doing 1 of 2 things:
    1) launching an executable
    2) launching something that is a registered filetype (i e, C:\>foo.lnk where foo.lnk points to foo.txt)

    This is extremely limited.......
    for example, you can't make a shortcut to a library if that library isn't in your path.

    just for kicks, make a text file in win32, then make a shortcut to it.
    then:
    c:\>type foo.lnk
    you get a bunch of high ascii garbage with the path of the target mixed in there.

    compare
    $ echo this way works > foo.txt
    $ ln -s foo.txt foo.realsymlink
    $ cat foo.realsymlink

    OR, drop the -s and then try moving the target file around. Try that with your shortcut.lnk
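
    The "move the target around" experiment can be scripted too. This is a small Python sketch, assuming a Unix-like system where `os.symlink` and `os.link` both work without special privileges; the function name `rename_demo` and the extra file names are invented for the example.

```python
import os
import tempfile


def rename_demo():
    """Rename a file and see which kind of link still reaches its data."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "foo.txt")
        soft = os.path.join(d, "foo.realsymlink")
        hard = os.path.join(d, "foo.hardlink")
        with open(target, "w") as f:
            f.write("this way works\n")
        os.symlink("foo.txt", soft)  # stores the *name* foo.txt
        os.link(target, hard)        # second name for the same inode
        os.rename(target, os.path.join(d, "moved.txt"))
        # os.path.exists follows the symlink, and foo.txt is gone now.
        symlink_ok = os.path.exists(soft)
        with open(hard) as f:
            hard_contents = f.read()
        return symlink_ok, hard_contents
```

    The symlink dangles after the rename because it only remembers a path, while the hard link still reads the original contents because it points at the data itself.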

  30. These are neither Unix symlinks nor Unix hardlinks by remande · · Score: 5
    Slashdot has it wrong today. Microsoft has had the equivalent to Unix symlinks for a long time--they're called "shortcuts". Like a Unix symlink, a Windows shortcut is a small file that does nothing but point to another path where the real file is.

    The behavior described in the article is neither a Unix symlink, nor a Unix hardlink. It is something I have never heard of in Unix, an automated symlink. Frankly (and I am a Unix weenie), this looks like a true innovation.

    In Unix filesystems, each file has an "inode" number unique to the filesystem. The directory entries all point to inodes. Thus, two different directory entries can have the same inode, and thus the same bits are accessible from multiple places. Note, for example, that the vi and ex programs are hardlinks to the same executable--the editor simply reads the name it was called with to determine whether it should behave as vi or as ex.

    Hardlinks do not really exist to save space, they exist to link two directory entries at the hip. If one file (inode) has two links (filenames), then grabbing it by one filename and editing it will cause changes which will be visible when you pick it up by the other filename. Note that, because of this, Unix hardlinks are manual. The filesystem doesn't spontaneously create hardlinks; it takes a user process to do this.

    Microsoft's scheme is implicitly handled by the filesystem code.

    The Single Instance Store recognizes that there's duplication, coalesces the extra copies and stores the bits once instead of several times

    This implies that this is happening automagically, without user interference. At worst, this means that the SIS is creating hardlinks on the fly. I doubt this because it would create Mothra-sized bugs as two files get "married" as links and never "divorced". Think about it: users often copy a file byte for byte (causing SIS to link them together), and then edit one and use the other as an unchanging backup.

    My guess is that SIS is linking files on the fly when it recognizes them as equal, and then unlinking them (copy-on-edit) as a file is edited to be different from its linkmates.

    This is simply Microsoft eliminating redundancy in its filesystem. Compression algorithms eliminate redundancy all the time--that's how they save bytes.

    Some Unix flavors do a similar thing in core. When loading up a program, the bits of the binary can be stored once in memory no matter how many invocations the program currently has. If eight people are running Emacs, memory is storing eight Emacs data segments, but only one copy of the Emacs binary.

    This is something one could implement in Linux filesystem code. Each inode would need its own checksum, and there would have to be a one-or-more-to-one relationship between inodes and hardware representations--that is, two different inodes would be able to share the same sectors.

    When a file got edited, the FS would determine whether the sectors were shared with one or more other inodes--if so, you have to "divorce" by copying the sectors elsewhere and pointing the inode to the new sectors.

    When the edit finished, the FS would recalculate the checksum, then look for all other inodes with the same checksum. For any matches, do a byte-for-byte diff to make sure--if so, then point the inode at the same sectors as the old inode and mark the new inode's sectors for reaping.

    The tradeoff is between filesystem space and write performance (read performance is probably unchanged). It takes better minds than mine to determine under what circumstances the tradeoff is worth it.
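
    The "checksum first, then byte-for-byte diff" merge step above can be sketched in a few lines of Python. This is a toy model and nothing more: inodes are just dictionary keys, file contents are byte strings held in memory, and the function name `merge_pass`, the `checksum` parameter and the use of the first inode's number as the shared extent id are all assumptions made for the example.

```python
import zlib
from collections import defaultdict


def merge_pass(inodes, checksum=zlib.crc32):
    """Map each inode to an extent id; identical contents share an extent.

    inodes: dict of inode number -> bytes (the file contents).
    checksum: cheap function used only to pick merge candidates.
    """
    by_checksum = defaultdict(list)
    for ino, data in inodes.items():
        by_checksum[checksum(data)].append(ino)

    extent_of = {}
    for candidates in by_checksum.values():
        owners = []  # inodes that hold the first copy of some content
        for ino in sorted(candidates):
            for owner in owners:
                # Same checksum is not proof; compare the actual bytes.
                if inodes[ino] == inodes[owner]:
                    extent_of[ino] = extent_of[owner]
                    break
            else:
                extent_of[ino] = ino  # nothing matched; keeps its own extent
                owners.append(ino)
    return extent_of
```

    Calling `merge_pass({1: b"hello", 2: b"hello", 3: b"world"})` puts inodes 1 and 2 on the same extent and leaves 3 alone. Passing a deliberately weak checksum such as `len` gives the same answer, since all three collide on length 5 and the byte comparison sorts them out, which is why the diff step cannot be skipped.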

    --

    --The basis of all love is respect

  31. Not Really by hyrax · · Score: 5
    Win2K doesn't require new software to install all of its own DLLs.

    In Win2K an application has a choice of using the system dlls, which are protected and can't be written over except by a service pack, or it's own private version of a DLL. So if your app requires a specific version of msvcrt.dll, you can install it in the application directory and it will use that copy instead of the system copy.

    For a complete explanation of this: Check out this article