tytso · Slashdot Mirror

Re:ZFS L2ARC on Optimizing Linux Systems For Solid State Disks · 2009-02-21 09:35 · Score: 1

I'm familiar with the L2ARC idea. I think time will tell whether adding an extra layer of cache between memory and a commodity SATA hard drive really makes sense. For laptop use, where we care about the power and shock resistance attributes of SSD's, it makes sense to pay a price premium for SSD's. However, it's not clear that SSD's will indeed become cheap enough, and even if they do, historically the cache hierarchy has had 3 orders of magnitude between main memory and disks. Over the last 3 decades, there have been other technologies that were cheaper per gigabyte than main memory but faster at a given price level than hard drives, and for one reason or another, they have fallen into what Dr. Steve Hetzler, an IBM Fellow from the IBM Almaden Research Center, has called "the dead zone".

I first heard this argument at the December 2008 IDEMA Symposium, where I was giving a talk as the new CTO of the Linux Foundation, and his presentation was well worth the effort I made to head out to the Bay Area to give the talk.

It turns out that Dr. Steve Hetzler is apparently going to be giving the same talk in three days at the Santa Clara Valley Chapter of the IEEE Magnetics Society, which will be held at the Western Digital facility in San Jose on February 24th. A brief talk description and map to the facility can be found here. It's an extremely interesting, entertaining, and thought-provoking talk, and some folks who have seen the slides of Dr. Hetzler's talk have taken extreme exception to them. However, he makes some very powerful arguments from both the supply side (specifically, the capital cost of the silicon fabs needed to replace even 10% of the HDD market is a very large number) and the demand side. For those of you who are in the Bay Area and who are interested in storage issues, I'd strongly encourage you to listen to his talk and make your own judgements. The web site states that no RSVP's are required, and I don't think you have to be an IEEE member to attend.

Re:take a look at zfs on Optimizing Linux Systems For Solid State Disks · 2009-02-21 09:11 · Score: 1

Seems to me that Sun's zfs filesystem is ready to use the ssd storage. The copy-on-write strategy would seem to avoid the hot spots as zfs picks new blocks from the free pool rather than rewriting the same block.

Actually, given the X25-M's lack of TRIM support, using a log-structured filesystem, a write-anywhere filesystem, or a copy-on-write type system is a really bad use of the X25-M, since the X25-M will think the entire disk is in use. The X25-M is implemented to optimize for filesystems that reuse blocks as much as possible, since it is internally doing the equivalent of a log-structured filesystem to do wear leveling. TRIM support will obviously help, but for ZFS, the X25-M is probably not a good choice. A cheaper flash drive which doesn't try to be smart about wear leveling would be better for ZFS.

Re:What is different about SSD's? on Optimizing Linux Systems For Solid State Disks · 2009-02-21 09:03 · Score: 4, Informative

Because of this, I imagine that the author would like Linux devs to better support SSD's by getting non-flash file systems to support SSD better than they are today.

Heh. The author is a Linux dev; I'm the ext4 maintainer, and if you read my actual blog posting, you'll see that I gave some practical things that can be done to support SSD's today just by better tuning the parameters given to tools like fdisk, pvcreate, mke2fs, etc., and I talked about some of the things I'm thinking about to make ext4 better at supporting SSD's than it is today.

Re:Color me stupid on Political and Technical Implications of GitTorrent · 2008-12-04 09:49 · Score: 2, Informative

A group of developers can start a private project without central hosting using git already, today. Look at the man page for "git-bundle". Git commits can already be exchanged via e-mail.
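For readers who have not used git-bundle, here is a minimal Python sketch of the hand-off it allows. The file name `project.bundle` and the directory `project` are hypothetical; only the `git bundle create` and `git clone` invocations are real git commands, and the bundle file itself can travel by e-mail, USB stick, or any other channel.

```python
import subprocess


def bundle_commands(bundle_path, clone_dir):
    """Return the git invocations for a hosting-free hand-off.

    Both paths are placeholders chosen for illustration.
    """
    return [
        # Sender: pack every ref and its history into a single file.
        ["git", "bundle", "create", bundle_path, "--all"],
        # Receiver: a bundle file can be cloned as if it were a remote.
        ["git", "clone", bundle_path, clone_dir],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

The sender would run the first command inside an existing repository; the receiver runs the second wherever the bundle file was saved.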

Re:EXT appropriate for desktop? on On the State of Linux File Systems · 2008-11-30 09:46 · Score: 1

If you look at the numbers, the majority of the files on a Linux desktop are not "small files" (by which I mean files substantially smaller than a blocksize). Given that this is the case, why optimize for them?

As far as whether or not the defaults of ext3 are "acceptable" --- it's open source! You can change the defaults if you want, or a distribution can change the defaults if they want. I suppose I could add a tuning knob to /etc/mke2fs.conf so you can change the defaults for your system. Regardless, I think it's rather silly to choose an open source filesystem based on whether you like the defaults. After all, Homo Sapiens is a thinking animal; it has the ability to think for itself, and if it doesn't care about the safety of the files stored on the filesystems (or he/she knows that it is protected in other ways, such as RAID-6 with hot spares, PLUS regular full and incremental backups) he/she can use different filesystem tuning parameters. Or the defaults can be changed --- if you want to distribute a fork of e2fsprogs called "fast and loose with your data progs", there is absolutely nothing in the GPL which stops you from doing that.

Re:A metaphysical question on On the State of Linux File Systems · 2008-11-29 13:02 · Score: 1

The ext2/ext3/ext4 filesystems do a periodic check of the filesystem for correctness out of paranoia; because PC class disks, well, have the reliability of PC class disks, and that's what most people use these days on Linux systems. Other filesystems, such as reiserfs and xfs, are subject to the same kind of potential random filesystem corruption caused by hardware errors that ext3 is; in fact, in some cases their filesystem formats are more brittle than ext2/3/4 against random hardware failures in that a single bad block that corrupts the root node of a reiserfs filesystem, for example, can be catastrophic. It's just that their filesystem checkers don't require doing a periodic check based on time and/or the number of mounts.

If you want to configure ext3 filesystems to have the same happy-go-lucky attitude as reiserfs towards assuming hard drives never fail, you can do so; it's just a matter of using the tune2fs program; check out the man page for the -c and -i options to tune2fs. Then you won't do a filesystem check at reboot time.

What I do recommend, especially if you are using LVM anyway, is periodically (say, once a month, Sundays at 2am, or at some other low utilization period), have a cron script run that takes a snapshot of your filesystem, runs e2fsck on the snapshot, and if it has errors, sends e-mail to the system administrator advising them that it is time to schedule downtime to have the filesystem corruption repaired. This has the best of both worlds; you can now do much more frequent checks to make sure the filesystem is consistent, and you don't have to take the system down for long periods of time to do the test, since you can run e2fsck on the snapshot while keeping the system live.
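The snapshot-and-check routine described above could be sketched in Python roughly as follows. Everything named here is an assumption for illustration: the volume group and logical volume names, the snapshot name `fsck-snap`, the 1G snapshot size, and the `notify` hook (which just prints; a real setup would send mail). The sketch runs `e2fsck -f -y` against the throwaway snapshot, so any repairs never touch the live filesystem, and treats a non-zero exit status as "the origin needs attention".

```python
import subprocess


def snapshot_check_commands(vg, lv, snap="fsck-snap", size="1G"):
    """Build the three commands: create snapshot, check it, remove it."""
    origin = f"/dev/{vg}/{lv}"
    snapshot = f"/dev/{vg}/{snap}"
    return {
        "create": ["lvcreate", "--snapshot", "--size", size,
                   "--name", snap, origin],
        # -f forces a full check; -y answers yes, on the snapshot only.
        "check": ["e2fsck", "-f", "-y", snapshot],
        "remove": ["lvremove", "--force", snapshot],
    }


def check_filesystem(vg, lv, run=subprocess.run, notify=print):
    """Return True if e2fsck found nothing wrong on the snapshot."""
    cmds = snapshot_check_commands(vg, lv)
    run(cmds["create"], check=True)
    try:
        # e2fsck exits 0 only when no errors were found.
        clean = run(cmds["check"]).returncode == 0
    finally:
        # Always drop the snapshot, even if the check itself blew up.
        run(cmds["remove"], check=True)
    if not clean:
        notify(f"e2fsck found problems on a snapshot of /dev/{vg}/{lv}; "
               "time to schedule downtime for a repair")
    return clean
```

A script like this would be called from a cron job during a low-utilization period; the `run` parameter exists only so the flow can be exercised without touching real volumes.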

Re:still doing fs on top of RAID :-( on On the State of Linux File Systems · 2008-11-29 09:34 · Score: 2, Insightful

Linux developers are aware of this issue; this is one of the things which is addressed by btrfs.

Re:Problems: IO priority, large #s of files. on On the State of Linux File Systems · 2008-11-29 09:28 · Score: 4, Interesting

NFS semantics require that the data be stably written on disk before the client's RPC request can be acknowledged. This can cause some very nasty performance problems. One of the things that can help is to use a second hard drive to store an external journal. Since the journal is only written during normal operation (you only need to read it when you recover after a system crash), and the writes are contiguous on disk, this eliminates nearly all of the seek delays associated with the journal. If you use data journalling, so that data blocks are written to the journal, the fact that no seeks are required means that the data can be written onto stable storage very quickly, and thus will accelerate your NFS clients. If you want things to go _really_ fast, use a battery-backed NVRAM for your external journal device.
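As a rough illustration of the external-journal setup, the Python sketch below only builds the command lines; it does not run them. The device names are placeholders, the 4096-byte block size is an assumption (the journal device has to be created with the same block size as the filesystem using it), and the filesystem is assumed to be unmounted while its journal is swapped.

```python
def external_journal_commands(fs_dev, journal_dev, block_size=4096):
    """Build commands that move an ext3/ext4 journal to a second device.

    fs_dev and journal_dev are hypothetical device paths.
    """
    return [
        # Format the second device as a dedicated journal device.
        ["mke2fs", "-O", "journal_dev", "-b", str(block_size), journal_dev],
        # Drop the existing internal journal from the filesystem.
        ["tune2fs", "-O", "^has_journal", fs_dev],
        # Attach the external journal.
        ["tune2fs", "-J", f"device={journal_dev}", fs_dev],
    ]


for cmd in external_journal_commands("/dev/sda1", "/dev/sdb1"):
    print(" ".join(cmd))
```

Data journalling is then selected at mount time with the `data=journal` mount option.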

Re:The article is incorrect with respect to ext4.. on On the State of Linux File Systems · 2008-11-29 09:04 · Score: 5, Informative

Oh, by the way... forgot to mention. If you are looking for benchmarks, there are some very good ones done by Steven Pratt, who does this sort of thing for a living at IBM. They were intended to be in support of the btrfs filesystem, which is why the URL is http://btrfs.boxacle.net/. The benchmarks were done in a scrupulously fair way; the exact hardware and software configurations used are given, multiple workloads are described, and the filesystems are measured multiple times against multiple workloads. One interesting thing from these benchmarks is that sometimes one filesystem will do better at one workload and at one setting, but then be disastrously worse at another workload and/or configuration. This is why, if you want to do a fair comparison of filesystems, it is extremely difficult to really do things right. You have to do multiple benchmarks, multiple workloads, multiple hardware configurations, because if you only pick one filesystem benchmark result, you can almost always make your filesystem come out the winner. As a result, many benchmarking attempts are very misleading, because they are often done by a filesystem developer who, consciously or unconsciously, wants their filesystem to come out on top, and there are many ways of manipulating the choice of benchmark or benchmark configuration in order to make sure this happens.

As it happens, Steven's day job as a performance and tuning expert is to do this sort of benchmarking, but he is not a filesystem developer himself. And it should also be noted that although some of the BTRFS numbers shown in his benchmarks are not very good, btrfs is a filesystem under development, which hasn't been tuned yet. There's a reason why I try to stress the fact that it takes a long time and a lot of hard work to make a reliable, high performance filesystem. Support from a good performance/benchmarking team really helps.

Re:what fs out there... on On the State of Linux File Systems · 2008-11-29 08:55 · Score: 5, Informative

Ext4 supports up to 128 megabytes per extent, assuming you are using a 4k blocksize. On architectures where you can use a 16k page size, ext4 would be able to support 2^15 * 16k == 512 megs per extent. Given that you can store 341 extent descriptors in a 4k block, and 1,365 extent descriptors in a 16k block, this is plenty...
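The figures above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only: the 2^15 blocks-per-extent limit comes from the comment itself, while the 12-byte descriptor size is an assumption chosen because it reproduces the quoted counts; the calculation also ignores any per-block header, which would reduce each count slightly.

```python
MAX_EXTENT_BLOCKS = 2 ** 15   # blocks per extent, as quoted above
EXTENT_ENTRY_BYTES = 12       # assumed size of one extent descriptor


def extent_limits(block_size):
    """Return (max bytes per extent, descriptors that fit in one block)."""
    max_extent_bytes = MAX_EXTENT_BLOCKS * block_size
    descriptors = block_size // EXTENT_ENTRY_BYTES
    return max_extent_bytes, descriptors


for bs in (4096, 16384):
    size, count = extent_limits(bs)
    print(f"{bs // 1024}k blocks: {size // 2**20} MiB per extent, "
          f"{count} descriptors per block")
# prints:
# 4k blocks: 128 MiB per extent, 341 descriptors per block
# 16k blocks: 512 MiB per extent, 1365 descriptors per block
```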

The article is incorrect with respect to ext4... on On the State of Linux File Systems · 2008-11-29 08:49 · Score: 5, Informative

The article states that ext4 was a Bull project; and that is not correct.

Bull is one of the companies involved with the ext4 development, but by no means were its developers the primary contributors. A number of the key ext4 advancements, especially the extents work, were pioneered by the Clusterfs folks, who used it in production for their Lustre filesystem (Lustre is a cluster filesystem that used ext3 with enhancements which they supported commercially as an open source product); a number of their enhancements went on to become adopted as part of ext4. I was the e2fsprogs maintainer, and especially in the last year, as the most experienced upstream kernel developer, I have been responsible for patch quality assurance and pushing the patches upstream. Eric Sandeen from Red Hat did a lot of work making sure everything was put together well for a distribution to use (there are lots of miscellaneous pieces for full filesystem support by a distribution, such as grub support, etc.). Mingming Cao from IBM did a lot of coordination work, and was responsible for putting together some of the OLS ext4 papers. Kawai-san from Hitachi supplied a number of critical patches to make sure we handled disk errors robustly; some folks from Fujitsu have been working on the online defragmentation support. Aneesh Kumar from IBM wrote the 128->256 inode migration code, as well as doing a lot of the fixups on the delayed allocation code in the kernel. Val Henson from Red Hat has been working on the 64-bit support for e2fsprogs in the kernel. So there were a lot of people, from a lot of different companies, all helping out. And that is one of the huge strengths of ext4: that we have a large developer base, from many different companies. I believe that this wide base of developer support is one of the reasons why ext3 was more successful than, say, JFS or XFS, which had a much smaller base of developers who were primarily from a single employer.

Re:What is the news ? on OpenSolaris Indiana Released · 2008-05-05 15:06 · Score: 2, Informative

ext3 has a 4TB partition size limitation, 1TB filesize limit, and requires a fsck every X mounts.

Actually, this is incorrect. Ext3 can support up to 16TB (there were some bugs for kernels older than 2.6.18 for really big filesystems, but even back then 8TB was no problem). The filesize limit is 2TB, and with ext4 the limits for the filesystem and individual files will be 1024 Petabytes, or 1 Exabyte.
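One way to see where numbers like these come from is to multiply an address width by a unit size. The sketch below is an interpretation, not something stated in the comment: it assumes 4k blocks, 32-bit block numbers for ext3 (with a signed 31-bit range as one plausible reading of the older 8TB figure), a 32-bit count of 512-byte sectors for the ext3 file size limit, and 48-bit block numbers for ext4.

```python
TIB = 2 ** 40
PIB = 2 ** 50
EIB = 2 ** 60


def capacity(address_bits, unit_bytes):
    """Bytes addressable with address_bits worth of unit_bytes-sized units."""
    return (2 ** address_bits) * unit_bytes


ext3_fs = capacity(32, 4096)          # 32-bit block numbers, 4k blocks
ext3_fs_signed = capacity(31, 4096)   # same, if one bit is lost to sign
ext3_file = capacity(32, 512)         # 32-bit count of 512-byte sectors
ext4_fs = capacity(48, 4096)          # 48-bit block numbers, 4k blocks

print(ext3_fs // TIB, "TiB ext3 filesystem")         # 16
print(ext3_fs_signed // TIB, "TiB with signed math")  # 8
print(ext3_file // TIB, "TiB ext3 file")              # 2
print(ext4_fs // PIB, "PiB ext4 filesystem")          # 1024
```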

As far as requiring an fsck every X mounts, that's basically due to paranoia because PC class hardware is, well, PC class hardware. You can disable that if you wish, and if you have an enterprise grade RAID array with a robust controller and disk-level checksums, it would probably make sense to turn it off. On a typical PC class-whatever-is-the-cheapest-parts-that-fell-off-the-board-from-Taiwan white box, where companies are competing for price much more than reliability or quality, a periodic fsck is a good idea. But it's always something you can turn off if you wish.

Re:"Titanium" or "Itanium"? on Donald Knuth Rips On Unit Tests and More · 2008-04-26 13:10 · Score: 1

The joke works much better if you say "Itanic". I used to say that a lot but I try not to if there are people from Intel in the room; they can get a little touchy when you rip on the Itanium. :-)

Titanium makes people think of the metal that expensive geek watches tend to be made out of, so I find it's less likely people will get the joke.

This was the smallest part of the interview... on Patch the Linux Kernel Without Reboots · 2008-04-24 10:46 · Score: 3, Informative

Funny thing... this was the smallest part of my oh, hour and twenty minute interview with the reporter. The reason for the call was to hear about what was up with the 2.6.25 release; she probably spent more time talking with me about KVM and Xen; and I mentioned ksplice just as an aside, as an example of lots of really interesting and exciting work that doesn't necessarily happen as part of a mainline kernel release. I spent maybe 2-3 minutes tops talking to her about ksplice --- and that's what she ends up writing about and getting slashdotted!

Re:mirror on Why OpenSolaris Failed To Build a Community · 2008-04-24 08:16 · Score: 3, Interesting

How's it get up to 80? Lots and lots of apache daemons. :-)

I've never seen the load on a Linux machine rise above like 6, and by then it's unresponsive to anything. I was disk-bound, because wp-cache wasn't enabled even though it should have been, so it didn't take me that long to recover once I managed to shut down the apache server. Then it was just a matter of setting up a firewall rule to only allow access from my home IP address, restarting the server, figuring out that I needed to enable the wp-cache plugin, then removing the firewall rule, and praying.... :-)

But yeah, I was pretty impressed that my 1 GHz Pentium III with only 512 megs of memory running 2.6.16 linux was able to not only survive, but recover from a slashdotting without needing to reboot. If only I had checked earlier to make sure that wp-cache really was enabled! But as the old saying goes, "no one expects the Spanish Inquisition!"

Re:As a member of the community... on Why OpenSolaris Failed To Build a Community · 2008-04-24 08:05 · Score: 4, Insightful

Please let me make this clear. I was not disparaging Open Solaris as an operating system. And I was quoting Jon Plocher, a Sun Engineer working on Open Solaris, when he admitted that Sun didn't get the community they were hoping for. So you can call it a failure in terms of Sun being unable to get the results that it had hoped for when it released (most of) Solaris under an Open Source license. Other people who were major Solaris fans, and who were excited with whatever scraps Sun might throw from the table, might be mightily pleased with what Sun did. But nevertheless, it is interesting that Sun hasn't achieved what they hoped to accomplish with Open Solaris after three years.

The reason why I found Jon Plocher's candidate statement for the Open Solaris Governing Board so interesting was that it was the first time I had seen someone from inside Sun comment about what Sun had been hoping to achieve by releasing Solaris under an Open Source license in a way that didn't appear influenced by Sun's marketing/spin machine. I don't believe Sun's officially stated reasons (that show up on the CEO's blog, for example) because after three years their words have not been matched by their deeds.

So for me, it's more about correcting the marketing spin. If Sun salescritters want to pay analysts to create Total Cost of Ownership white papers which compare the cost of the most expensive get-someone-on-the-phone 24x7 Red Hat support with a support-by-email Solaris support subscription, I might mock their desperation.

Similarly, if Jonathan Schwartz wants to talk about how wonderful it will be that Open Solaris is Open Source, and how they will reap the benefits of having Open Source developers, but three years later still have processes that result in 0.6 patches/day being accepted into Open Solaris, then I think it's only fair to point out the chasm between his words and his company's actions.

Re:mirror on Why OpenSolaris Failed To Build a Community · 2008-04-24 07:05 · Score: 4, Funny

Not so ancient Chinese saying: "It is not enough to install wp-cache2 and activate the plugin; you must go to options->wp-cache and press the "enable" button to REALLY enable wp-cache."

Doh!

(Once I actually really enabled wp-cache, my server seems to have been able to keep up, for now...)

Re:mirror on Why OpenSolaris Failed To Build a Community · 2008-04-24 06:35 · Score: 5, Informative

Yeah, sorry about that. Thunk.org is a rather ancient machine (> 5 years old) living in a colo facility, and this is how I figured out I had been slashdotted. (The two uptime commands were about two minutes apart):

14:21:06 up 121 days, 16:47, 2 users, load average: 40.47, 12.41, 4.55
14:23:05 up 121 days, 16:49, 2 users, load average: 81.43, 36.97, 14.52

Fortunately I'm still mirroring my blog onto my old Livejournal account; please read it there for now! The two articles that you want are this one: What Sun was trying to do with Open Solaris and this one: Organic vs. Non-organic Open Source, if you can't get through to thunk.org.

Re:Getting a broad-based education at a tech schoo on For CS Majors, How Important Is the "Where?" · 2008-04-16 08:30 · Score: 2, Interesting

Whoops, I screwed that up. Shows you how long since I've been at MIT.... At MIT A's are worth 5 points, and so I had a 5.0 GPA.

Getting a broad-based education at a tech school on For CS Majors, How Important Is the "Where?" · 2008-04-16 08:26 · Score: 2, Insightful

I went to school at MIT, and yeah, I had a 4.0 (A's are worth 4 points at MIT) GPA --- but I also had a minor in economics, and took classes such as Law for the IT Manager from the MIT Sloan School. I also was an officer at the MIT Gilbert and Sullivan Players, the MIT Student Information Processing Bureau (the MIT computer club), the MIT Lecture Series Committee (which shows 35mm movies to subsidize lectures by people like Leonard Nimoy, Dr. Ruth, Jacques Cousteau, etc.) and the MIT Episcopal Chaplaincy.

What I found that was important --- studying with lots of smart people really challenges you, and makes you put in the extra effort so you can minor in student activities _and_ still hold down a good GPA. Learning computer science architectural lessons from older systems like Multics is very valuable; much more so than learning the syntax of C or Java. Learning how to schedule workers for the refreshment committees, disassembling and cleaning a soda machine, and figuring profit margins on soda and popcorn, does teach you many valuable lessons in the real world. So does taking classes in economics and law; just as much so as learning how to build a computer using a breadboard, wires, and 74xx TTL chips.

The important thing to remember is that you can get a very broad based education at a technical school, but you have to reach out for it. I would be very dubious about a school (liberal arts or not) that concentrated more on math theory than CS architecture. Learning from the past mistakes and successes of real-life operating systems is valuable. I'm not so convinced about learning about type theory and type functions. Most good technical schools will have classes in IP law, negotiating, economics, and those are very much good things to learn. In particular, if you don't know how to read a balance sheet and a profit and loss statement before you leave college, do take the time to find out. It's useful in so many different contexts....

AT&T's new motto: "Be Evil" on AT&T Silences Criticism in New Terms of Service · 2007-09-29 01:59 · Score: 1

Is anyone really surprised? This is the same company which is against Network Neutrality. The simple answer is not to buy or patronize any AT&T or Southwest Bell services if at all possible.

Props to Joshua Davis on Hans Reiser Interview from Prison · 2007-06-28 03:21 · Score: 3, Interesting

If you look at Joshua Davis' past articles on Hans (here and here), you'll see that he has been quite sympathetic to Hans' plight. Yet this particular article is much more ambivalent. I suspect the explanation for why this most recent story seems a bit confusing, and the author somewhat ambivalent, is that his sympathies and opinions about Hans' guilt or innocence have shifted over time.

I was contacted by the author in late March to give background information on the technical facts in the article, and he has never claimed that he was a technical person or in possession of a geek badge. My input into the story was solely on things like "what is a b-tree", and to eliminate the really embarrassing technical errors and misconceptions that the author might have had. At one point I believe that Joshua Davis wanted to put a spin on the "geek tragedy" that Reiser4 was this ground-breaking filesystem with great ideas that was languishing because its author/architect was languishing in jail. So I was given entire paragraphs of technical detail where I had to say, "no that's wrong," and "no, not quite", etc., etc. As far as whether or not Reiser4 was a great, ground-breaking filesystem, I tried very hard to give both sides of the story --- that some people would say it was great, and other people would say that Hans had a tendency to fudge benchmarks --- and I made it very clear that some people might consider that my views were biased, due to my past and continuing work on the ext2/3/4 filesystem, and that the author should definitely contact other people and get their opinions. So I disclosed all, which in my opinion was the only responsible thing to do, and I tried to be very, very careful about labelling what was fact and what was opinion.

(I'm of the opinion that if you want better technical understanding by journalists, then if someone approaches you requesting background information and promises that you won't be quoted, you should spend time educating them about technical details, since that's the only way we can improve technical accuracy in reporting. Another interesting thing which I learned is that while Wired writes about subjects that are of interest to geeks, they do not assume that their articles will be written by geeks, and they pitch their articles to be understandable by the general public; also, most of their writers are not geeks themselves. All not surprising if you think about it a little, and especially if you reflect that the intersection of strong technical clue and strong writing skills is pretty rare.)

In the end, the story was about as good as you might expect. The facts of the story are confusing, as there were and there are no clear heroes, and several suspicious and deeply flawed human beings that could possibly be villains but about whom we can't really say for sure. There are no obvious technical errors in the story, except for one that I noticed, where the word registry is misused and should have been replaced with "data structure" instead: "It contains a single registry -- known as a balanced tree -- to organize every piece of data in the operating system". A lot of the details about reiserfs and reiser4 were ultimately cut out, as being not very relevant to the storyline that Joshua ultimately chose to tell.

I have to say that having spent several hours talking to Joshua Davis, and talking to his editor who spent a lot of time doing fact checking on the technical details and background, that both he and his editor have my respect as seekers of truth. He went into this with a point of view that I believe was very, very sympathetic to Hans, and it would have been very easy to turn this into a stock storybook story with the police cast as the cardboard, clueless villains, and Hans the hero languishing in jail, the victim of said clueless Keystone Kops. But he didn't do that. He

Overreaction! on FBI Seeks To Restrict University Student Freedoms · 2007-06-24 10:41 · Score: 1

Reading the guidelines which the FBI is purported to have given various University administrators, it is very clearly talking about ways of protecting classified information. In practice, most students do not have access to classified information, and so this would never even be an issue. For example, at MIT there is simply no classified research which takes place on the MIT Campus (MIT Lincoln Labs is not part of the MIT Campus), and I suspect this is true at most universities, simply because the requirements for keeping classified information and maintaining clearances for university employees and students are probably too obtrusive and too difficult for a university environment, all aside from the philosophical issues of whether or not doing classified research can be reconciled with an open academic environment.

So I think the blog "article" is making much ado over nothing. The document referenced by the article very clearly was not intended to be applied to all students, only where classified information is concerned. It's identical to the sort of briefings that would be given to people in industry who hold clearances, and for people who hold certain levels of clearance, it is required that they get briefed on this sort of thing once a year. But if you don't like that sort of thing, the answer is very simple; don't apply for or hold a security clearance.

Unfortunately SoftMaker doesn't support PowerPoint on SoftMaker Rolls Out Office Suite for BSD, Linux, and Others · 2006-12-18 12:10 · Score: 2, Informative

Unfortunately, SoftMaker is only a Word and Excel replacement, and for many users, the level of Word and Excel support in OpenOffice, Abiword, or Gnumeric is probably more than they need. Sure, SoftMaker may have better support for the really complicated Word and Excel formats (see their comparison page for some examples), but how many people really come across 3-d graphics in everyday life?

The bigger problem for most people is PowerPoint slide decks, especially the ones generated by marketing departments that have sound and animation. This is where the shortcomings of OpenOffice hit me the hardest --- and unfortunately, SoftMaker doesn't have a solution. So is it worth it to pay USD $70 for a Word and Excel replacement which is more complete than what is currently available in the OSS world? Not for me. I'd much rather spend $40 for a copy of Crossover Office from Codeweavers and then use an old copy of Office 97 or Office 2000 that I have lying around (or which you can no doubt buy on Ebay for relatively small change).

Re:Version control on Fedora Project to Help Revitalize RPM · 2006-12-15 09:09 · Score: 1

Mercurial is hardly a fringe VCS. Some of the projects using Mercurial include Xen, ALSA, e2fsprogs, and OpenSolaris.

User: tytso

Comments · 115