Slashdot Mirror


Microsoft Invents Symbolic Links

scromp writes, "Microsoft really does innovate! See for yourself! " I can't decide if this is supposed to be a joke or not. I mean, it's funny, but I just can't tell. Perhaps I need a cup of coffee before I try to post stories.Update: 03/03 06:11 by H :To be fair to Microsoft, they did talk about symlinks.

8 of 712 comments (clear)

  1. Well, sortof by zyklone · · Score: 5

    Well, the article sortof suggests that they set up the links automatically, which would be an innovation perhaps.

    1. Re:Well, sortof by JordanH · · Score: 5
      • Well, the article sortof suggests that they set up the links automatically, which would be an innovation perhaps.

      It's more than suggested. From the quoted article:

      • "The Single Instance Store recognizes that there's duplication, coalesces the extra copies and stores the bits once instead of several times," Bolosky said. "So if you have 10 files with the same exact bits, instead of storing this data 10 times, it stores it once. It frees up a lot of space, and you realize performance improvements on the server."

      I would assume that, in this architecture, if you change one of the instances, it creates a new copy and does not modify the original file. This would allow you to have only one copy of a configuration file that needs to be copied to each user's directory for possible customization and only have the extra copies made if someone did, in fact, customize. Can anyone out there who really knows about Single Instance Store care to comment?

      This is significantly different than symbolic links. It involves symlinks, but it's more than that. I'm not sure if it's really a big win, but it is different.

      It seems to me that it's usefulness is an artifact of how difficult it is to manage Registry entries. Every application has Registry entries that point to application (configs, executables, dlls, etc.) files. You could always create user's common files with Registry entries pointing to a common copy and then modify the Registry entry to point to a local copy when you went to customize the user's environment. Of course, having the file system "recognize" this is in some ways more convenient. It also leads to redundancy of functionality (both the Registry and the Single Instance Store have a way of consolidating common files).

      It seems that the Single Instance Store could be considerable overhead. For example, if I change a configuration file and get a local copy (if that's how it works, if that's not how it works I don't understand how Single Instance Store can be useful at all), does the Single Instance Store scan all of the files in the system to check if there's an exact copy of my new file somewhere? Does it have to make this scan continually for all changed files? Is it "lazy" about creating Single Instances in the case of transient files that are being opened/closed and updated frequently that just happen to be identical for a short period of time?

      What would be really cool is if similar files were maintained as diffs from a single base file that were automatically applied on reading. Would this be an attribute of a log-structured file system? Or do current log-structured file system designs not include provisions for consolidating multiple files like this?


      -Jordan Henderson

  2. End of Backups? by SwiftOne · · Score: 5

    This article isn't terribly technical, so I can't be sure, but as I read it, this has one major flaw: If it automatically detects duplicate files and symbolically links them, it will ruin your ability to backup a file by creating a copy of it somewhere.

    Symlinks are great, but not ALL duplicate files should become them. If I have file foo.conf, and I back it up to foo.conf.old, but don't change it right away [Or maybe that doesn't matter, if this "feature" is constantly running], their SIS program will symlink foo.conf and foo.conf.old to be the same file. Then when I change foo.conf, and hose my system, I can't restore it by using foo.conf.old, because that file was changed when I changed foo.conf!

    Can anyone give more details about how this works?

  3. Re:How to prevent DLL Hell by funkman · · Score: 5
    DLL Hell is when you have need different versions of the same dll on the same machine because some version of some dlls break other applications. I believe W2K gets around this by required new software to install all of the dlls they need in its own application directory.

    So your app uses MSVB500.dll? Then your target machine will get an extra copy of it. Have 5 different applications (from 5 different vendors) which each use MSVB500.dll, then you will have 5 copies of MSVB500.dll. This sounds like the road to true bloat but at least you end up with a more robust application because other peoples' installations of common dlls won't break your app. With this automatic symbolic linking, much of the hard drive space which could have been wasted by redundant dlls can now be reclaimed. This could be a real good thing.

  4. shortcuts? you gotta be kidding by mjackso1 · · Score: 5

    .lnk files are worthless unless you're doing 1 of 2 things:
    1) launching an executable
    2) launching something that is a registered filetype (i e, C:\>foo.lnk where foo.lnk points to foo.txt)

    This is extremely limited.......
    for example, you can't make a shortcut to a library if that library isn't in your path.

    just for kicks, make a text file in win32, then make a shortcut to it.
    then:
    c:\>type foo.txt
    you get a bunch of high ascii garbage with the path of the target mixed in there.

    compare
    $echo this way works >foo.txt
    $ln -s foo.txt foo.realsymlink
    $cat foo.realsymlink

    OR, drop the -s and then try moving the target file around. Try that with your shortcut.lnk

  5. These are neither Unix symlinks nor Unix hardlinks by remande · · Score: 5
    Slashdot has it wrong today. Microsoft has had the equivalent to Unix symlinks for a long time--they're called "shortcuts". Like a Unix symlink, a Windows shortcut is a small file that does nothing but point to another path where the real file is.

    The behavior described in the article is neither a Unix symlink, nor a Unix hardlink. It is something I have never heard of in Unix, an automated symlink. Frankly (and I am a Unix weenie), this looks like a true innovation.

    In Unix filesystems, each file has an "inode" number unique to the filesystem. The directory entries all point to inodes. Thus, two different directory entries can have the same inode, and thus the same bits are accessible from multiple places. Note, for example, that the vi and ex programs are hardlinks to the same executable--the editor simply reads the name it was called with to determine whether it should behave as vi or as ex.

    Hardlinks do not really exist to save space, they exist to link two directory entries at the hip. If one file (inode) has two links (filenames), then grabbing it by one filename and editing it will cause changes which will be visible when you pick it up by the other filename. Note that, because of this, Unix hardlinks are manual. The filesystem doesn't spontaneously create hardlinks; it takes a user process to do this.

    Microsoft's scheme is implicitly handled by the filesystem code.

    The Single Instance Store recognizes that there's duplication, coalesces the extra copies and stores the bits once instead of several times

    This implies that this is happening automagically, without user interferance. At worst, this means that the SIS is creating hardlinks on the fly. I doubt this because it would create Mothra-sized bugs as two files get "married" as links and never "divorced". Think about it: users often copy a file byte for byte (causing SIS to link them together), and then edit one and use the other as an unchanging backup.

    My guess is that SIS is linking files on the fly when it recognizes them as equal, and then unlinking them (copy-on-edit) as a file is edited to be different from its linkmates.

    This is simply Microsoft eliminating redundancy in its filesystem. Compression algorithms eliminate redundancy all the time--that's how they save bytes.

    Some Unix flavors do a similar thing in core. When loading up a program, the bits of the binary can be stored once in memory no matter how many invocations the program currently has. If eight people are running Emacs, memory is storing eight Emacs data segments, but only one copy of the Emacs binary.

    This is something one could implement in Linux filesystem code. Each inode would need its own checksum, and there would have to be a one-or-more-to-one relationship between inodes and hardware representations--that is, two different inodes would be able to share the same sectors.

    When a file got edited, the FS would determine whether the sectors were shared with one or more other inodes--if so, you have to "divorce" by copying the sectors elsewhere and pointing the inode to the new sectors.

    When the edit finished, the FS would recalculate the checksum, then look for all other inodes with the same checksum. For any matches, do a byte-for-byte diff to make sure--if so, then point the inode at the same sectors as the old inode and mark the new inode's sectors for reaping.

    The tradeoff is between filesystem space and write performance (read performance is probably unchanged). It takes better minds than mine to determine under what circumstances the tradeoff is worth it.

    --

    --The basis of all love is respect

  6. Re:Wow. by Bob+Ince · · Score: 5
    They've managed to come up with more than 15 innovations

    I think you need to put a little more emphasis on that.

    nine separate groups within Microsoft Research contributed more than 15 innovations to Windows 2000

    Wow! OVER FIFTEEN separate innovations, that's amazing isn't it! How can one company come up with such a staggering number of innovations?! Hooray!

    I mean, I thought I came up with an innovation the other day, but actually it wasn't. Innovations are hard, oh boy!

    I'm impressed. At this rate I reckon they must come up with one innovation roughhly every 50000 man hours of coding.

    Indeed. And with 63,000 "issues", that's more than four thousand bugs per innovation, folks.


    --
    This comment was brought to you by And Clover.
  7. Not Really by hyrax · · Score: 5
    Win2K doesn't require new software to install all of its own DLLs.

    In Win2K an application has a choice of using the system dlls, which are protected and can't be written over except by a service pack, or it's own private version of a DLL. So if your app requires a specific version of msvcrt.dll, you can install it in the application directory and it will use that copy instead of the system copy.

    For a complete explanation of this: Check out this article