Measuring Fragmentation in HFS+

Posted by pudge on Wednesday May 19, 2004 @05:03AM from the bring-out-the-big-guns dept.

keyblob8K writes "Amit Singh takes a look at fragmentation in HFS+. The author provides numbers from his experiments on several HFS+ disks, and more interestingly he also provides the program he developed for this purpose. From his own limited testing, Apple's filesystem seems pretty solid in the fragmentation avoidance department. I gave hfsdebug a whirl on my 8-month-old iMac and the disk seems to be in good shape. I don't have much idea about ext2/3 or reiser, but I know that my NTFS disks are way more fragmented than this after a similar amount of use."

HFS+ defrag source by revscat · 2004-05-19 05:06 · Score: 5, Informative

As mentioned in the article, HFS+ does defragging on the fly when files are opened if they are less than 20MB. The source code for this is available here, as is a discussion about it that contains input from some Darwin developers.
1. Re:HFS+ defrag source by ericdano · 2004-05-19 05:25 · Score: 4, Informative
  
  Intech's Speedtools is a good set of utilities and includes a good defragmenter. For a complete defrag, something like Drive 10 or TechTool 4 works better.
  
  Good luck
  
  --
  It's either on the beat or off the beat, it's that easy.
  I moderate therefore I rule!
  --
2. Re:HFS+ defrag source by ahknight · 2004-05-19 05:27 · Score: 4, Informative
  
  As stated in the article, this is a feature of the HFS+ code in Panther. The filesystem cannot have a defrag feature as the filesystem is just a specification. The implementation of that specification, however, can do most anything to it. :)
3. Re:HFS+ defrag source by Anonymous Coward · 2004-05-19 06:23 · Score: 4, Informative
  
  You just made his point. The DRIVER does the defragging. HFS+ is a specification for how the files are laid out and written to the disk, such that a driver that understands this specification can read it. Linux has HFS+ drivers, but I doubt they defrag on the fly. Supposedly (though I don't know), Mac OS versions prior to 10.3 didn't defrag either.
  
  So therefore it might be a part of the operating system's filesystem. That's the system that deals with files. But that's not what was asked. What was asked was whether it was an inherent feature of HFS+, and that's not possible, since HFS+ doesn't tell the OS what to do when a file is opened, only how the stuff is stored on the disk.
  
  Perhaps you didn't understand the dual nature of the word filesystem: it can be the subsystem of the OS that handles files, or it can be the physical representation of the data on the hard drive. If you assume it's only the first, your explanation makes sense. If you assume the second one (which would be the usage intended and understood by most people, given that the question and response were about HFS+ (physical filesystem) compared to Panther (OS filesystem)), then you'd be wrong.
  
  And I've been trolled, but who cares.
4. Re:HFS+ defrag source by Daniel_Staal · 2004-05-19 06:28 · Score: 5, Informative
  I believe the actual sequence is this:
  
  Get request for file
  
  Open File
  
  Buffer file to memory
  
  Answer request for file
  
  If needed, defragment file
  
  In other words, it defragments after the file has been returned to the program needing it, as a background process. The buffer to memory is a pre-existing optimization, so the only real trade-off is that background processor usage goes up. If you aren't doing major work at the time, you'll never notice. (And if you are doing major work, you probably are using files larger than 20MB in size anyway.)
  
  Files larger than 20MB just aren't defragmented, unless you have another tool to do it.
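
The sequence above can be sketched roughly as follows. This is only an illustration under stated assumptions: the 20MB limit is the one figure taken from the discussion, while the `File` class, the "more than one extent" test, and the `relocate` step are hypothetical stand-ins rather than anything from Apple's source.

```python
from dataclasses import dataclass

MAX_DEFRAG_SIZE = 20 * 1024 * 1024  # the 20MB limit mentioned in the thread


@dataclass
class File:
    """Hypothetical in-memory stand-in for an on-disk file."""
    name: str
    size: int          # size in bytes
    extents: int       # number of contiguous runs on disk
    data: bytes = b""


def needs_defrag(f: File) -> bool:
    # Assumption: "fragmented" simply means more than one extent.
    return f.extents > 1 and f.size < MAX_DEFRAG_SIZE


def relocate(f: File) -> None:
    # Stand-in for rewriting the file into a single contiguous run.
    f.extents = 1


def open_and_serve(f: File) -> bytes:
    buffered = bytes(f.data)   # buffer the file to memory
    # A real system would hand `buffered` to the caller first and do the
    # next step in the background; this sketch runs it inline for brevity.
    if needs_defrag(f):
        relocate(f)
    return buffered
```

With this sketch, a small file with five extents comes back with one extent after being opened, while a 30MB file is left exactly as it was.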
  --
  'Sensible' is a curse word.
5. Re:HFS+ defrag source by shamino0 · 2004-05-19 08:18 · Score: 5, Informative
  
  HFS+ was one of the major features of the OS 8.1 update. OS 8.0 and earlier can't "see" HFS+ volumes; they see a tiny disk with a SimpleText file titled "where have all my files gone?" which, if I remember correctly, gives a brief explanation that the disk is HFS+ and requires 8.1 or higher to view. :)
  And the person who came up with this idea was a genius. This is far far better than what most other operating systems do (refuse to mount the volume.)
  If I boot MS-DOS on a machine that has FAT-32 or NTFS volumes, I simply don't find any volume. I can't tell the difference between an unsupported file system and an unformatted partition. If the file system would create a FAT-compatible read-only stub (like HFS+ does), it would be much better for the user. Instead of thinking you have a corrupt drive, you'd know that there is a file system that your OS can't read.
My stats by Twirlip+of+the+Mists · 2004-05-19 05:13 · Score: 4, Informative

I throw these out there for no real reason but the common interest.

I've got a G4 with an 80 GB root drive which I use all day, every day. Well, almost. It's never had anything done to it, filesystem-maintenance-wise, since I last did an OS upgrade last fall, about eight months ago.
Out of 319507 non-zero data forks total, 317386 (99.34 %) have no fragmentation.
Not too shabby, methinks.

--

I write in my journal
Re:NTFS is not so bad by MemoryDragon · 2004-05-19 05:18 · Score: 5, Informative

NTFS does not fragment that badly as long as you don't hit the 90% full mark on your disk; once you reach that, you will see files becoming fragmented in no time. NTFS uses the open space for write access and then probably relocates the files over time; once the disk hits 90% full, the open-space usage algorithm does not seem to work anymore.
Re:Huh? by Ann+Elk · 2004-05-19 05:39 · Score: 4, Informative

My own experience, using a small tool I wrote to analyze NTFS fragmentation:

NTFS is pretty good at avoiding fragmentation when creating new files if the size of the file is set before it is written. In other words, if the file is created, the EOF set, and then the file data is written, NTFS does a good job of finding a set of contiguous clusters for the file data.

NTFS does a poor job of avoiding fragmentation for files written sequentially. Consider a file retrieved with wget. An empty file is created, then the contents are written sequentially as it is read from the net. Odds are, the file data will be scattered all over the disk.

Here's a concrete example. Today, I downloaded Andrew Morton's 2.6.6-mm4.tar.bz2 patch set. (Yes, I run WinXP on my Toshiba laptop -- deal with it.) Anyway, the file is less than 2.5MB, but it is allocated in 19 separate fragments. I copied it to another file, and that file is unfragmented. Since the copy command sets EOF before writing the data, NTFS can try to allocate a contiguous run of clusters.

Note - This was done on uncompressed NTFS. My feeling is that compressed NTFS is even worse about fragmentation, but I don't have any numbers to back that up.
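
The two write patterns described above might look like this in a program. It is a hedged sketch: the function names are made up for illustration, and whether the filesystem actually reserves a contiguous run after the size is set is entirely up to the filesystem (some will simply create a sparse file).

```python
import os
import tempfile


def write_preallocated(path: str, data: bytes) -> None:
    """Set the file's final size (EOF) first, then write the data."""
    with open(path, "wb") as f:
        f.truncate(len(data))   # extend the empty file to its final size
        f.seek(0)               # position is still 0, but be explicit
        f.write(data)


def write_sequential(path: str, data: bytes, chunk: int = 4096) -> None:
    """Grow the file chunk by chunk, the way a downloader typically would."""
    with open(path, "wb") as f:
        for i in range(0, len(data), chunk):
            f.write(data[i:i + chunk])


if __name__ == "__main__":
    payload = os.urandom(1_000_000)
    with tempfile.TemporaryDirectory() as d:
        write_preallocated(os.path.join(d, "prealloc.bin"), payload)
        write_sequential(os.path.join(d, "sequential.bin"), payload)
```

Both functions produce identical file contents; the only difference is that the first tells the filesystem the final size up front, which is the case the comment says NTFS handles well.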
Re:Bzzt! Nope. Close, though! by Steveftoth · 2004-05-19 05:42 · Score: 4, Informative

Jaguar (10.2) has journaling support as well, but you had to enable it as it was not a default option.

Even in 10.3 it's optional, not required, but it's the new default for new disks. Probably because Apple decided that their code was solid enough to put into production. After testing it on 10.2 I agree with them.
Re:Big frag issues under EXT2 too by 42forty-two42 · 2004-05-19 05:45 · Score: 4, Informative

Manually run e2fsck; it'll tell you how fragmented the filesystem is, as in:

$ e2fsck -f -n knoppix.img
knoppix.img: 453/7680 files (3.1% non-contiguous), 12180/30720 blocks
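
If you want that number in a script, the summary line e2fsck prints can be picked apart with a regular expression. This is a minimal sketch that assumes the summary keeps the format shown above; the function name and the returned dictionary keys are made up for illustration.

```python
import re

# Matches e.g. "453/7680 files (3.1% non-contiguous), 12180/30720 blocks"
SUMMARY = re.compile(
    r"(?P<files_used>\d+)/(?P<files_total>\d+) files "
    r"\((?P<pct>[\d.]+)% non-contiguous\), "
    r"(?P<blocks_used>\d+)/(?P<blocks_total>\d+) blocks"
)


def parse_summary(line: str) -> dict:
    """Extract file, block, and non-contiguous figures from a summary line."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError("no e2fsck summary found in line")
    return {
        "files_used": int(m.group("files_used")),
        "files_total": int(m.group("files_total")),
        "noncontiguous_pct": float(m.group("pct")),
        "blocks_used": int(m.group("blocks_used")),
        "blocks_total": int(m.group("blocks_total")),
    }
```

Note that the percentage is relative to the files in use (453 here), not to the total inode count.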
Apple updated their stand recently by djupedal · 2004-05-19 05:50 · Score: 4, Informative

http://docs.info.apple.com/article.html?artnum=25668

Mac OS X: About Disk Optimization

Do I need to optimize?

You probably won't need to optimize at all if you use Mac OS X. Here's why:
Re:NTFS is not so bad by 222 · 2004-05-19 05:52 · Score: 4, Informative

For proof, check out this. This drive was defragged about a week ago, and although it does go through heavy use, the current low disk space causes massive fragmentation.
Re:File allocation Table by AKAImBatman · 2004-05-19 06:25 · Score: 4, Informative

There are a couple things that you have to consider. For one, if part of the disk corrupts, how will you identify a header? Or for that matter, how would you identify the header space vs. file space in a non-corrupted file system?

You're probably thinking "just store the size of the file". This is perfectly valid, but it does have certain implications. You see, in Comp-Sci, we refer to a list like this as a "linked list". The concept is basically that each item in the list has information (i.e. a "link") that helps identify the next item in the list. Such a data structure has a worst case access time of O(n). Or in other words, if your item is at the end of the list, and you have 2000 files, you'll have to check through all two thousand headers before finding your file.

Popular file systems circumvent this by using what's called a Tree structure. A tree is similar to a linked list, but allows for multiple links that point to children of the node. A node that has no children is referred to as a "leaf node". In a file system the directories and files are nodes of a tree, with files being leaf nodes. This configuration gives us two performance characteristics that we must calculate for:

1. The maximum number of children in a node.
2. The maximum depth of the tree.

Let's call them "c" for children and "d" for depth. Our performance formula is now O(c*d) and is irrespective of the number of items in the data structure. Let's make up an example to run this calculation against:

Path: /usr/local/bin/mybinary

Nodes:
/ (34)
/usr (10)
/usr/local (9)
/usr/local/bin (72)

Longest path: /usr/X11R6/include/X11

Plugging in the above numbers (72 for c, 4 for d), we get a worst case of 72*4 = 288 operations. Thus our worst case is much better than the linked list. And if we calculate the real case to access /usr/local/bin/mybinary, we get 34+10+9+72 = 125 operations.
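
The same arithmetic can be written out as a small sketch. It assumes, as the example does, that finding an entry means scanning every child of each directory on the path; the `children` table simply restates the made-up counts above, and the function names are invented for illustration.

```python
# Children per directory along the example path (the made-up counts above).
children = {"/": 34, "/usr": 10, "/usr/local": 9, "/usr/local/bin": 72}


def worst_case(max_children: int, max_depth: int) -> int:
    """Upper bound: scan the largest directory at every level."""
    return max_children * max_depth


def path_cost(path: str, child_counts: dict) -> int:
    """Sum the entries scanned in each directory on the way to `path`."""
    total = 0
    current = "/"
    for part in (p for p in path.split("/") if p):
        total += child_counts[current]              # scan the current directory
        current = current.rstrip("/") + "/" + part  # descend one level
    return total


if __name__ == "__main__":
    print(worst_case(max(children.values()), 4))
    print(path_cost("/usr/local/bin/mybinary", children))
```

The flat-list alternative would be a single scan over every header on the disk, which is why the list's cost grows with the number of files while the tree's bound does not.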

Hope this helps. :-)

--
Javascript + Nintendo DSi = DSiCade
Re:How to determine fragmentation... by pantherace · 2004-05-19 06:52 · Score: 4, Informative

Well, all modern operating systems handle it transparently for any program, except certain tools such as the defragmenter, which either look at the disk directly or use a lower-level call.
NTFS is horrible. On a system installed less than a week ago, with a few programs (nwn, firefox, avg, itunes, aa, nvdvd, windows updates, and a couple more), it has 9.3GB used, and it is reported that it has "Total Fragmentation: 22%, File Fragmentation: 45%".
So yes, there are various methods of calculating file fragmentation. Two I can think of: (# of files with fragments)/(total number of files), which is 0 for a totally defragmented hd (and gives nice percentages), and (# of file fragments)/(total number of files), which is 1 for a perfectly defragmented hd. There are also variations on those, and I haven't been able to find what calculations Windows and e2fstools use, so I can't tell.
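
Those two candidate metrics can be sketched like this. The input is assumed to be a list with one fragment count per file (1 meaning contiguous); the function names are made up, and neither is claimed to be what Windows or the ext2 tools actually compute.

```python
def fragmented_file_ratio(fragments_per_file):
    """(# of files with more than one fragment) / (total number of files).

    0.0 when no file is fragmented.
    """
    counts = list(fragments_per_file)
    return sum(1 for n in counts if n > 1) / len(counts)


def average_fragments(fragments_per_file):
    """(# of file fragments) / (total number of files).

    1.0 when every file is stored in a single piece.
    """
    counts = list(fragments_per_file)
    return sum(counts) / len(counts)


if __name__ == "__main__":
    disk = [1, 3, 1, 5]  # hypothetical fragment counts for four files
    print(fragmented_file_ratio(disk), average_fragments(disk))
```

The first metric ignores how badly each file is split up, while the second is dominated by a few heavily fragmented files, which may be part of why different tools report such different percentages.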
File types and fragmentation by Artifakt · 2004-05-19 08:29 · Score: 4, Informative

There are so many comments already posted to this topic that seem to not grasp the following point, that I think the best way to deal with it is to start a completely new thread. I'm sorry if it seems more than a little obvious to some of you:

There are fundamentally only a few types of files when it comes to fragmentation.

1. There are files that simply never change size, and once written don't get overwritten. (Type 1). Most programs are actually type 1, if you use sufficiently small values of never :-), such as until you would need to perform disk maintenance anyway for lots of other reasons in any 'reasonable' file system. A typical media file is probably Type 1 in 99%+ of cases.

2. There are files that will often shorten or lengthen in use, for example a word processor document in .txt format, while it is still being edited by its creator. (type 2). (That same document may behave as effectively Type 1 once it is finished, only to revert to type 2 when a second edition is created from it.)

Of type 2, there are files of type 2a: files that may get either longer or shorter with use, on a (relatively) random basis. A relatively simple case is a .doc file, which may become longer for obvious reasons like more text, but may also become longer for less obvious reasons (such as the hidden characters created when you make some text italic or underlined). (These are reasons that are not obvious to most end users, and often not predictable in detail even to people who understand them better.) The default configuration for a Windows swap file is type 2a. It is likely to be hard for an automated system to predict the final size of Type 2a files, as that would imply a software system of near human level intelligence to detect patterns that are not obvious and invariant to a normal human mind. It may be possible to predict in some cases only because many users are unlikely to make certain mistakes (i.e. cutting and pasting an entire second copy of a text file into itself is unusual, while duplicating a single sentence or word isn't).

Then there are files of type 2b: files that get longer or shorter only for predictable reasons (such as a Windows .bmp, which will only get larger or smaller if the user changes the color depth or size of the image, and not if he just draws something else on the existing one). A good portion of users (not all by any means) will learn what to expect for these files, which suggests a well-written defragger could theoretically also auto-predict the consequences of the changes a user is making.

3. Then there are type 3 files, which only get longer. These too have predictable and unpredictable subtypes. Most log files for example, are set up to keep getting longer on a predictable basis when their associated program is run (type 3b). Anything that has been compressed (i.e. .zip) is hopefully a 3b, but only until it is run, then the contents may be of any type. A typical Microsoft patch is a 3a (it will somehow always end up longer overall, but you never know just what parts will vary or why).

4. Type 4 would be files that always get smaller, but there are no known examples of this type :-).

These types are basic in any system, as they are implied by fundamental physical constraints. However, many defrag programs use other types instead of starting from this model, often with poor results.
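
As a rough illustration of the four top-level types, here is a sketch that classifies a file from a recorded history of its sizes. Everything in it is an assumption made for illustration: the enum names, the idea of sampling sizes over time, and the omission of the a/b (unpredictable/predictable) subtypes, which can't be told apart from sizes alone.

```python
from enum import Enum


class FileType(Enum):
    FIXED = 1      # type 1: never changes size
    VARIABLE = 2   # type 2: both grows and shrinks
    GROWING = 3    # type 3: only gets longer
    SHRINKING = 4  # type 4: only gets smaller (no known examples)


def classify(sizes):
    """Classify a file from a chronological list of observed sizes in bytes."""
    deltas = [later - earlier for earlier, later in zip(sizes, sizes[1:])]
    grew = any(d > 0 for d in deltas)
    shrank = any(d < 0 for d in deltas)
    if grew and shrank:
        return FileType.VARIABLE
    if grew:
        return FileType.GROWING
    if shrank:
        return FileType.SHRINKING
    return FileType.FIXED
```

A log file sampled as `[100, 250, 400]` would come out as GROWING, and a document under edit sampled as `[100, 180, 150]` as VARIABLE.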

In analyzing what happens with various defrag methods, such as reserving space for predicted expansion or defragging in the background/on the fly, the reader should try these various types (at least 1 through 3), and see what will happen when that method is used on each type. Then consider how many files of each type will be involved in the overall process, and how often.

For example, Some versions of Microsoft Windows (tm) FAT32 defragger move files that have been accessed more than a certain number of times (typically f

--
Who is John Cabal?