Salamander · Slashdot Mirror

Re:Solaris Journaled UFS? on Reiser On ReiserFS's Future And More · 2001-05-23 01:47 · Score: 2

The most famous example of a file system that grew into journaling was Solaris UFS

Perhaps I wasn't clear enough. I wasn't saying that you couldn't add journaling to an existing filesystem. What I was saying was that you couldn't add it *as a plugin*, i.e. by leaving the original code untouched and adding a separate journaling facility. You have to change the original code, at least to the point of making it "journal-friendly".

UFS is a good example of this. They implemented a mostly-separate logging component called ufs_log, but the original code is still littered with flag checks and callouts to use it. It could probably be more modular than it is, which might be the source of SGI's criticism, but it could never be so modular as to be a plugin.

Standards, limits of extension interfaces on Reiser On ReiserFS's Future And More · 2001-05-22 23:21 · Score: 2

I really hope that when Hans and Co. think about how to extend filesystems they'll put their weight behind some generic facility and not just pull some new ReiserFS-specific interface out of their collective ass. Stackable vnodes, which are supposed to address this same need, are ancient history. NT has had filter drivers - not just for filesystems but for just about anything - since day one, and they have proven invaluable as a way to implement new functionality without writing a whole new filesystem. Erez Zadok has done some really cool work on a system called FiST for this sort of thing. All of these should serve as sources of inspiration at the very least, and of code in some cases.

On a different note, journaling is not really something that can be added as a plugin, extension, or filter. Even if you have a generic journaling interface or layer separate from the "core" filesystem code, you have to make sure that that core code is "journaling-friendly". This involves thinking about ordering and atomicity, plus callouts and hooks between the main FS and the journaling code. It's pretty pervasive, and not something that can be done "stealthily" by adding a plugin or filter to a filesystem that was never designed (or, as in ext3, redesigned) with journaling in mind. Encryption, compression, virtual namespaces, all sorts of other things can be added as plugins or filters, but not journaling.
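To make the point about atomicity boundaries concrete, here is a toy write-ahead journal. Everything in it (the Journal class, create_file, the record format) is a hypothetical illustration, not the API of ext3 or any other real filesystem. What it shows is that the journal only knows what the core code tells it: the begin/commit calls inside create_file are exactly the kind of callouts and hooks that have to be added to the main FS.

```python
class Journal:
    """Toy write-ahead journal (hypothetical, for illustration only)."""

    def __init__(self):
        # Durable log records: ("update", txid, block, data) or ("commit", txid)
        self.records = []
        self.next_txid = 0

    def begin(self) -> int:
        txid = self.next_txid
        self.next_txid += 1
        return txid

    def log(self, txid, block, data):
        self.records.append(("update", txid, block, data))

    def commit(self, txid):
        self.records.append(("commit", txid))

    def replay(self, disk: dict):
        """Crash recovery: apply only updates from transactions that committed."""
        committed = {r[1] for r in self.records if r[0] == "commit"}
        for r in self.records:
            if r[0] == "update" and r[1] in committed:
                disk[r[2]] = r[3]


def create_file(journal, inode_no, dir_block, name, crash_before_commit=False):
    # The core FS code must bracket the inode and directory updates in ONE
    # transaction; a journal layer bolted on underneath could not know that
    # these two block writes belong together.
    txid = journal.begin()
    journal.log(txid, f"inode{inode_no}", "allocated")
    journal.log(txid, dir_block, {name: inode_no})  # simplified: whole dir block
    if crash_before_commit:
        return  # simulated crash: the commit record never reaches the log
    journal.commit(txid)
```

Usage: after create_file(j, 7, "dir0", "a.txt") and a second call that crashes before commit, j.replay({}) recovers only the first file's inode and directory entry, never a directory entry without its inode.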

Wow, what a load of crap on Smart Routers · 2001-05-20 10:59 · Score: 2

This might be the first company I've seen outdo StarBridge in the "blatantly obvious BS" category. Slashdot has yet again fulfilled one of its major roles in my life: letting me know about companies I should *avoid* investing in.

Re:This is a Very Bad Idea. on Hiring Open Source Developers for Closed Source Work? · 2001-05-20 00:59 · Score: 2

If you are in charge of hiring programmers, you should hire the best programmers for the job.

Close, but not quite how I'd put it. I'd say you should hire the programmers who return the most value for the money for the job. Those might not be the best programmers. If they're the best programmer in the world but you can't trust them not to steal your time or intellectual property for application to open-source projects, pass. If they're likely to spend too much time evangelizing or installing their favorite software on already-working systems, pass. If their conditions of employment (express or implied) include attendance at five open-source conferences a year, a dedicated server to host open-source projects and mailing lists, and 50% of their time devoted to open-source development, all unrelated to your business, pass. If the open-source hero-worship culture has inflated their ego (and salary demands) beyond all reason, pass.

If, on the other hand, none of the above concerns apply, the open-source person might well be ideal for the job. Open-source work is no less "real" than for-pay work. There's really no such thing as an "open-source" programmer so much as a programmer who does open-source work, and such programmers should be evaluated according to their individual job-related value just like anyone else.

Weighing in late on Is Law Copyrighted? · 2001-05-13 22:35 · Score: 2

There are a lot of arguments out there based on how ideas can't be considered property. I'd like to suggest an alternative: if it's property, it's subject to eminent domain. It is *clearly* in the people's best interest for the text of a law to be completely open and available to all. If the government is entitled to seize individual citizens' physical property to build a highway, it's sure as hell entitled to seize this little piece of intellectual property as well.

The issue is not that the people who wrote these documents never had ownership, it's that they can and should be stripped of ownership for the public good.

Re:My perspective on XP on Go Extreme, Programmatically Speaking · 2001-05-10 10:00 · Score: 2

"Specification written in the form of executable code" sounds a lot more like an acceptance test than a unit test. It probably doesn't matter, though; IMO even writing toward the acceptance tests is a mistake (though perhaps not as bad as writing toward unit tests). The tests are *not* the requirements. Obviously the former should reflect the latter, but it's no more possible to write tests that perfectly match requirements than it is to write production code that perfectly matches requirements, and if we could do that then we wouldn't need acceptance tests in the first place. There *will* be differences between the requirements and the acceptance tests. Sometimes this will take the form of code that does not in fact meet requirements passing too-lenient tests, and sometimes it will take the form of "false alarms" from tests that misconstrue requirements or are themselves buggy. If you've spent any significant amount of time actually doing development, you've probably had a bellyful of both scenarios. I know I have.
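A small hypothetical illustration of the "too-lenient test" case, using the Gregorian leap-year rule as a stand-in requirement (the requirement choice, test cases and function names are invented for this example). The buggy implementation passes every test in the suite and still violates the requirement.

```python
# Requirement (Gregorian calendar): a year is a leap year if it is divisible
# by 4, except century years, which must also be divisible by 400.

def is_leap_buggy(year: int) -> bool:
    # Passes the test suite below, but ignores the century rule.
    return year % 4 == 0


def is_leap_correct(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# A too-lenient acceptance suite: it never exercises a century year.
LENIENT_CASES = {2023: False, 2024: True, 1996: True}


def passes_lenient_tests(fn) -> bool:
    return all(fn(year) == expected for year, expected in LENIENT_CASES.items())
```

Both functions pass passes_lenient_tests, yet they disagree on 1900. Only the requirement, not the tests, can say which one is right.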

The way to avoid these problems is to keep in mind that the *requirements*, not the tests, are authoritative. One of the worst mistakes I've seen made on a project is allowing tests to dictate changes to requirements, but when you treat tests as a specification or target in their own right that becomes almost inevitable.

Re:My perspective on XP on Go Extreme, Programmatically Speaking · 2001-05-10 05:11 · Score: 2

Congratulations. You're the first XP proponent I've met who was able to make even one valid point. I stand corrected wrt acceptance tests being part of the XP canon.

That doesn't change the fact that it's a bad idea to let the unit tests drive the code instead of vice versa, though, for all the reasons I mentioned in my original post.

Re:My perspective on XP on Go Extreme, Programmatically Speaking · 2001-05-10 00:22 · Score: 2

developers write unit tests first and then write code to make them work. It works well in practice

I've seen it done a bunch of times and, in my experience, no, it doesn't. It often seems to, though, until the product gets into the field.

XP, with its emphasis on constant refactoring, even makes the problem worse. Good unit tests are targeted toward the actual structure of the production code. If you wrote the unit tests first, then refactored the production code ten times, you wasted a lot of time writing unit tests that had nothing to do with the structure of the production code. *Acceptance* tests are the ones that are implementation-independent, and XP has precious little to say about them.

Re:My perspective on XP on Go Extreme, Programmatically Speaking · 2001-05-09 22:28 · Score: 2

I did like the idea of generating a test case and coding to it.

That's a bad idea, actually. If the code is written to pass tests instead of to meet requirements, then *at best* it will only meet requirements to the extent that the tests *accurately and completely* express those requirements. Any bug in the tests will end up being reflected in the production code. Production code should be written directly to requirements, period.

That doesn't mean that requiring tests isn't a Very Good Thing, though. In an ideal world, there would be two full sets of tests for production code. Unit tests should be written by people intimately familiar with the production code, to probe weak spots in the structure of that code. Acceptance tests should be written by people who do not know the structure of the production code, directly to requirements. This helps to ensure that production-code programmers' blind spots don't skew the whole testing process, and also allows using the same acceptance tests for a totally different implementation without loss of efficacy.

Note that unit tests should still be based on the production code, not the other way around, and the production code should be written to the requirements, not to the acceptance tests. The first question when a test failure occurs should always be "Is this a failure against the test, or against the actual requirements?" If the answers are different, the next question should be "How can the test be improved to more accurately reflect requirements?"

Re:AAAARRRRRGHHHH!!!!! on Go Extreme, Programmatically Speaking · 2001-05-09 22:11 · Score: 2

Is anybody else sick and fscking tired of reading about Extreme Programming on SlashDot.

Yes, I am. I'm pretty well known for being critical of XP, but even if I thought it was great stuff I'd still be tired of it being pushed so relentlessly here by biased editors. There are other methodologies out there worth reading about, and at least half the audience is incapable of making an informed judgement about the benefits of XP anyway. It's almost enough to make me wonder whether the people pushing XP so hard here have some sort of vested interest.

Re:The problem I've always seen with Freenet... on SQL Over FreeNet · 2001-05-08 10:43 · Score: 2

I hope that in a not so distant future we can say that data dropping from Freenet is more of a theoretical problem than a real one.

I'm sorry, but Freenet has a long way to go protocol-wise before loss of data ceases to be an issue. I'll try to explain why I say that, but it might take a while so please bear with me.

First, I should point out that I also work in the area of data distribution, and have been doing so since long before Freenet existed. In fact, a system I designed about four years ago had a lot in common with Freenet in terms of caching, though many other aspects were quite different. It's because of that similarity that I've kept a close eye on Freenet.

That brings me to my next point. Ian is probably pretty pissed at me right now, so I feel compelled to point out that I think Freenet is great. It's just unsuitable for some purposes, and IMO has received a disproportionate share of attention relative to many other projects that are also great. Ian has shown far more talent for making promises than for delivering on them, and that's a risk for people like me who work in the same general area. I've seen whole technical areas poisoned and neglected for years because of the disillusionment among investors and would-be deployers who got burned when one early project's hype outran its reality. I don't want to see it again, and Ian seems to be doing his best to ensure that it does happen. That simply pisses me off.

OK, on with the show. Why is it unlikely that the issue of data loss will go away for Freenet? First, it's important to note that anything short of a guarantee that data will stay in the system is worthless. People who might otherwise run serious applications on top of a storage system will not find it acceptable if there's *any* realistic chance that data will be dropped during "normal" operation, or even in the face of common failures. Rant and rave and wallow in denial all you want, that's just the way it is.

So, if you want to provide a *guarantee* that data will remain in the system between the time it's inserted and the time it's superseded by new data or explicitly removed, you have two choices. One is to give it one or more authoritative "home" locations, and treat all other possible locations as caches of what's in those home locations. That's just pretty much impossible to reconcile with Freenet's anonymity and non-censorability goals.

The only other option is to have each node that caches data know - at the very least - how many other copies there are, so that it doesn't throw away the last one. In practical terms, you pretty much need to ensure that at least two copies remain in the system, to guard against simple failures. Maintaining this information - and it needs to be both accurate and timely - is possible but quite difficult. It becomes more difficult as nodes become less reliable, and it becomes even more difficult if you want the system to run efficiently without getting bogged down by coordination traffic. That anonymity thing also tends to get in the way a bit. It's quite possibly doable; I can almost see the algorithms and protocols in my head because I've worked on similar ones myself, but they're quite different from the ones Freenet currently uses. Even then, the complexity of the result might well exceed the bounds of maintainability (that wall's a lot closer than people think, in distributed systems) and/or the performance of the result might not be acceptable.
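A minimal sketch of the "know how many copies exist" approach, assuming a single, perfectly accurate global copy count (the hypothetical ReplicaDirectory below) and a floor of two copies. None of this is Freenet code. The dict standing in for the directory is precisely the part that is hard to build across unreliable, anonymous nodes.

```python
from dataclasses import dataclass, field

MIN_COPIES = 2  # assumed floor: never let the system drop below two copies


@dataclass
class ReplicaDirectory:
    """Hypothetical global copy count per key (assumed accurate and timely)."""
    copies: dict = field(default_factory=dict)

    def add(self, key):
        self.copies[key] = self.copies.get(key, 0) + 1

    def remove(self, key):
        self.copies[key] -= 1


@dataclass
class Node:
    capacity: int
    directory: ReplicaDirectory
    store: list = field(default_factory=list)  # least recently used first

    def insert(self, key) -> bool:
        """Cache key, evicting only items that have spare copies elsewhere."""
        if key in self.store:
            self.store.remove(key)
            self.store.append(key)  # refresh recency; copy count unchanged
            return True
        if len(self.store) >= self.capacity:
            victim = next((k for k in self.store
                           if self.directory.copies[k] > MIN_COPIES), None)
            if victim is None:
                return False  # everything is at the floor: refuse, don't lose data
            self.store.remove(victim)
            self.directory.remove(victim)
        self.store.append(key)
        self.directory.add(key)
        return True
```

Note what the sketch glosses over: two nodes that both read a count of three could evict concurrently and leave one copy. Preventing that is where the coordination traffic comes in.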

There, at long last, you have it. For all of the reasons described above, I think loss of data will always be a problem in Freenet and derivative systems - a real problem, not a theoretical one, and one that makes it unusable for some purposes. To overcome that, Freenet would have to change so much that it would be unrecognizable. I don't even think it's a valid goal for Freenet. Freenet should continue to be developed for the niche toward which it has always been targeted, and for which it is quite well suited. Other solutions should be found for other problems, and none should create credibility risks for the entire field by claiming to be all things to all people.

Re:this is getting silly on SQL Over FreeNet · 2001-05-08 09:49 · Score: 2

Just because the someone doesn't know what causes a pattern does not make it random

Good thing nobody ever said it was actually random, then, just *practically* random. Why don't you explain to us, Mr. Wizard, how the actual policy is *usefully* different from random cache replacement, with respect to the presence or absence of requested data from the perspective of a requestor? That would actually be useful and productive, unlike this stupid hairsplitting.

I don't know why I bother...

Then don't. You have a lot of work to do to catch up with people who were writing useful code while you were working the PR machine, you don't need to be wasting time arguing about minutiae with them here.

Re:again... no on SQL Over FreeNet · 2001-05-08 02:55 · Score: 2

It is not at random. As far as the retriever is concerned the availability of data will be proportional to its popularity.

In other words, as far as the receiver is concerned the availability of data is proportional to a quantity that the receiver has no way to calculate. Well done, Einstein. Without having any way to gauge popularity, without knowing about intermediate nodes' topology or cache sizes, the receiver is no more able to calculate the probability of data being available than if the cache-replacement strategy were truly random, so it's "practically random" as I originally stated.

There's no point splitting hairs here. We all know it's not really random, but there's just no reason to care. It doesn't make FreeNet any more suitable for storage of data whose continued presence needs to be assured. Anyone who thinks it matters whether it's really random or not is "practically braindead".

Re:Stenography will never be very powerful... on The Rise of Steganography · 2001-05-08 02:42 · Score: 2

Stenography is just another form of encryption, and a weak one at that. The primary reason is simple - it is security through obscurity.

The two are very closely related, as I pointed out in another post. One way in which they are related is that both - along with every other non-physical form of concealment - are at some level "security through obscurity". To recover the message you need to know a secret, whether that's a cryptographic key or a steganographic pattern or a location or an algorithm. They're all equivalent.

Don't believe me? The SDMI "challenge", such as it was, was cracked almost immediately by a simple signal analysis.

The existence of weak stego says nothing about stego in general, just as the existence of weak crypto says nothing about crypto in general.

Re:Whatever. on The Rise of Steganography · 2001-05-08 02:36 · Score: 2

Further, you'd better have a good stash of source materials, rather than just some ol' picture you got off the net - otherwise, it would be easy to use an image search tool to find the original source image, diff the two, and get out the "secret" bits.

After the point you made about pictures of giraffes being pretty conspicuous, it's pretty amazing that you'd fall prey to this error. Much of the power of steganography lies in the idea that an eavesdropper doesn't even know where to look (or might not even know there's anything to look at) and can't afford to look everywhere. There's actually an obvious equivalence between stego and crypto, which is that you could consider the "where to look" information to be a sort of key. This might not please mathematicians who have staked their reputations on application of a particular kind of analysis, but both stego and crypto are ultimately about creating too many possibilities for an analyst to explore. Working through N zillion possible locations or arrangements of data and working through N zillion possible keys aren't that different.
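A minimal sketch of the "where to look is a key" equivalence, using least-significant-bit embedding in a plain byte buffer that stands in for image pixel data. The function names are hypothetical, and Python's random.Random is not cryptographically secure; a real system would derive positions from a keyed CSPRNG. Plain LSB embedding is also statistically detectable, so this illustrates the key idea, not strong stego.

```python
import random


def _positions(key: str, n_cover: int, n_bits: int) -> list:
    # The secret key seeds a PRNG that chooses WHERE the bits are hidden.
    rng = random.Random(key)
    return rng.sample(range(n_cover), n_bits)


def embed(cover: bytes, message: bytes, key: str) -> bytes:
    bits = [(byte >> i) & 1 for byte in message for i in range(8)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for message")
    out = bytearray(cover)
    for pos, bit in zip(_positions(key, len(cover), len(bits)), bits):
        out[pos] = (out[pos] & 0xFE) | bit  # overwrite the least significant bit
    return bytes(out)


def extract(stego: bytes, length: int, key: str) -> bytes:
    positions = _positions(key, len(stego), length * 8)
    bits = [stego[p] & 1 for p in positions]
    return bytes(sum(bits[i * 8 + j] << j for j in range(8)) for i in range(length))
```

Without the key, an analyst faces the same problem as with a cipher: too many candidate position sets to try.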

Re:hardly at random on SQL Over FreeNet · 2001-05-07 20:03 · Score: 2

Freenet drops data based on local popularity, this is hardly "at random".

Yes, that is correct, and it is why I said *practically* at random. As far as the storer or retriever is concerned, the difference between random and LRU cache-replacement strategies at intermediate nodes is unnoticeable.
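A generic LRU cache shows why the replacement policy looks random from the outside: whether a key survives depends entirely on the access history at that node, including other requestors' traffic that the storer cannot see. This is a textbook LRU built on collections.OrderedDict, not Freenet's actual replacement code.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used key when capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used
```

With capacity 2, a key stored by one user is evicted after two other users insert unrelated keys, but survives if anyone happened to read it in between. Neither outcome is predictable by the original storer.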

Truly, you have a dizzying intellect. on SQL Over FreeNet · 2001-05-07 11:22 · Score: 5

I love it. Forget all that ACID crap, especially the D part, let's put a database interface on top of a storage system that is designed to drop data practically at random. Brilliant. I can't imagine why nobody thought of this before.

Right message, wrong messenger on Linus Responds To Mundie · 2001-05-03 22:24 · Score: 2

If I have been able to see further, it was only because I stood on the shoulders of giants.

That's one of my favorite quotes, and coming from anyone else I would applaud it...but not from Linus. Why not? Because Linus has one of the worst "my farts smell better" attitudes I've ever seen. It's well known that one of the *worst* ways to get an idea incorporated into the Linux kernel is to say that it's been tried and found successful in some other OS. Linus, and the other senior Linux developers, seem to loathe the idea that someone else thought of something before they did, or - heaven forbid! - better than they did. The spiffy new Linux way of doing things - union mounts, kiobufs - is always assumed to be better than anyone else's way of doing the same things just because it came from Linux people.

Getting back to the topic, people need to read some of the exchanges between Linus and Andrew S. Tanenbaum of MINIX fame. Does that look like proper acknowledgement of a debt owed to another for inspiration or ideas? No, Linus has one of the worst records out there of failing to thank the giants on whose shoulders he stands. For him of all people to throw that quote in someone else's face is the very height of hypocrisy.

Re:IBM makes lots from patents on Software Patents vs. Free Software · 2001-05-03 22:11 · Score: 2

I've heard that Bucky Fuller used to get patents on stuff, then explicitly place the patented ideas in the public domain. Seemed cool to me.

Re:sorry to say this on Coder on the Cross · 2001-05-01 23:04 · Score: 3

You're right that the programmer should have brought up the issue of conflicting commitments *when the "drop everything" order was given* instead of later, but other than that I think you're totally off-base. Saying "drop everything" without meaning it is a major error *by the manager*. Blaming the programmer for it, dismissing the identification of the manager's own role in the misunderstanding as a "smart ass remark", jumping to conclusions about the programmer's motives, slamming the door on a dialog that could clear things up productively - those are all just plain unreasonable. You sound like a manager yourself, the sort who accepts no personal accountability for what happens within their group because hey, it's the programmers doing the actual hands-on work so it's their responsibility, right? BS.

When an unclear order is given, both parties have a responsibility to seek a clearer understanding. But this wasn't an unclear order. It was *crystal* clear - just wrong. The manager conveyed a clear meaning that was not what he actually intended. Weaseling around with "'drop everything' can mean different things" is like "depends on what the definition of 'is' is".

Re:okay okay.... I'm not informed... on XFS 1.0 is Released · 2001-05-01 02:25 · Score: 3

Someone once told me SGI has a smart disk controller backed up with a battery, so in the event of a blackout, the controller would keep for some hours the data still not written on the disk, flushing it on the disk on the next power up.

Interesting. I dunno about the SGI product, but the EMC Symmetrix takes a different approach. It has enough reserve power so that if it detects loss of external power it will immediately flush its cache to special areas on disk. Then, the first thing it does when it comes back up is slurp all that data back into cache - which not only ensures data stability but preloads the cache for you as well. Cool. I've heard that in a simulated blackout in a big data center everything would get eerily quiet *except* for the Symmetrix, which would actually get extra-loud as it does the flush.

Disclaimer: I work for EMC. I don't speak for them, they don't speak for me, yadda yadda yadda.

Re:okay okay.... I'm not informed... on XFS 1.0 is Released · 2001-05-01 02:16 · Score: 3

Wouldn't you just be mirroring if you wrote user data to the log?

Not quite. The log/journal is structurally different than the main data areas, with different synchronization and performance characteristics. Writing once to the log and once to the main data area is quite different than writing twice to the main data area.

However, an observation very similar to yours is behind log-structured filesystems. In other words, if you're going to write all the data to the log in a highly robust etc. way, why not just make the log the authoritative copy of the data? There's a whole lot of gunk that has to be worked out after that, such as how you find data and how you reclaim log space, but it all flows pretty cleanly from that initial idea. The result is pretty nifty for some kinds of workloads, but in general changing OS structures and their effects on I/O patterns have sort of left log-structured filesystems behind.
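A toy key/value sketch of the log-structured idea, with hypothetical names: every write is appended to the log, which is the authoritative copy; an in-memory index answers "how do you find data" (loosely analogous to the inode map in Rosenblum and Ousterhout's design); and a cleaner answers "how do you reclaim log space". Real implementations clean segment by segment; copying the whole log at once is a simplification.

```python
class LogStructuredStore:
    """Append-only log plus an index from key to its newest record."""

    def __init__(self):
        self.log = []    # (key, value) records, append-only
        self.index = {}  # key -> position of newest record in self.log

    def write(self, key, value):
        # Never overwrite in place: append, then repoint the index.
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def read(self, key):
        pos = self.index.get(key)
        return None if pos is None else self.log[pos][1]

    def clean(self):
        """Reclaim space held by superseded records by copying live ones."""
        new_log, new_index = [], {}
        for key, pos in self.index.items():
            new_index[key] = len(new_log)
            new_log.append(self.log[pos])
        self.log, self.index = new_log, new_index
```

Writing "a" twice leaves a dead record in the log until clean() runs. That dead space, and the I/O needed to reclaim it, is much of the "gunk" that has to be worked out.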

If you're interested in exploring further, the seminal papers in this area are The Design and Implementation of a Log-Structured File System by Rosenblum and Ousterhout, and (IMO even better) An Implementation of a Log-Structured File System for UNIX by Seltzer et al. Enjoy!

Re:Real issue is HARD DRIVE CACHEs. on XFS 1.0 is Released · 2001-05-01 02:06 · Score: 4

The kernel can only ask the hard drive to flush the data to disk. The disk need not comply, despite returning a "yes I did" result.

That's an important issue. I'll try to provide a couple of answers.

how can the consumer really know what the drive decides to do?

Well, there are at least two ways:

Turn off write caching.
Set the "Force Unit Access" (FUA) bit on the Write command, if it's a SCSI/FC disk.

SCSI gives you other options as well. For example, if you're using tagged command queuing, you can set FUA only on the last command of a sequence (e.g. a transaction). That way, you can allow the disk or storage subsystem to do appropriate reordering, combining, etc. and you'll still be sure that by the time that last command completes all the commands logically ahead of it (as specified by the tags) have completed as well. It's tres cool, and it's one of SCSI's biggest benefits compared to IDE.
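A minimal sketch of what "set FUA only on the last command" means at the CDB level: building 10-byte WRITE(10) command descriptor blocks, where FUA is bit 3 of byte 1. It only constructs the bytes; actually issuing them (for example through a SCSI passthrough interface) is not shown, and the tag-based ordering the paragraph relies on is not modeled here. The helper names are hypothetical.

```python
import struct

WRITE_10 = 0x2A  # WRITE(10) opcode
FUA_BIT = 0x08   # bit 3 of CDB byte 1


def write10_cdb(lba: int, blocks: int, fua: bool = False) -> bytes:
    """Build a WRITE(10) CDB: opcode, flags, LBA, group, length, control."""
    if not (0 <= lba < 2**32 and 0 <= blocks < 2**16):
        raise ValueError("LBA or transfer length out of range for WRITE(10)")
    flags = FUA_BIT if fua else 0
    # Big-endian, no padding: 1 + 1 + 4 + 1 + 2 + 1 = 10 bytes
    return struct.pack(">BBIBHB", WRITE_10, flags, lba, 0, blocks, 0)


def transaction_cdbs(extents):
    """CDBs for a sequence of (lba, blocks) writes, FUA set only on the last."""
    last = len(extents) - 1
    return [write10_cdb(lba, n, fua=(i == last))
            for i, (lba, n) in enumerate(extents)]
```

For a three-write transaction, only the third CDB carries the FUA bit, leaving the device free to reorder and combine the first two.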

Tagged command queuing also comes in handy if you have to force write caching off - which BTW is common and not particularly difficult on either SCSI or IDE drives. Since you're now forced to deal with full rotational latency, the importance of overlapping unrelated operations (by putting them on different queues) becomes even greater.

This stuff is not document on the box the hard drive comes in nor on the mfg web site.

Tsk tsk, that's a shame. It's pretty common knowledge among storage types, but still far from universal. Go look on comp.arch.storage and you'll see a recurring pattern of people finding this out for the first time and sparking a brief flurry of posts by asking about it.

The problem with having the drive notify the host that a write has been fully destaged is that target-initiated communication (aside from reconnecting to service an earlier request) is poorly supported even in SCSI. Hell, it's even hard to talk about it without tripping over the "initiator" (host) vs. "target" (disk) terminology. Most devices lack the capability to make requests in that direction, and most host adapters (not to mention drivers) lack support for receiving them. AEN (Asynchronous Event Notification) was the least-implemented feature in SCSI.

There's also a performance issue. Certainly you don't want to be generating interrupts by having the disk call back for *every* request, but only for selected requests of particular interest. So now you need to add a flag to the CDB to indicate that a callback is required. You need to go through the whole nasty SCSI standards process to determine where the flag goes, how requests are identified in the callback, etc. Then you need every OS, driver, adapter, controller, etc. to add support for propagating the flag and handling the callback. Ugh.

It's a great idea, really it is. It's The Right Way(tm). But it's just never going to happen in the IDE world, and it's almost as unlikely in the SCSI/FC world. 1394 seems a little more amenable to this, but I have no idea whether it's actually done (I doubt it) because even though I know they exist I've never actually seen a 1394 drive close up.

I hope all this helps shed some light on the subject.

Re:okay okay.... I'm not informed... on XFS 1.0 is Released · 2001-05-01 01:21 · Score: 5

The difference is that Reiser is NOT a journaling filesystem (well, not any more that, say, NT or BSD UFS filesystems are), since it only journals the meta data

So does XFS. From one of SGI's own presentations:

5.6. Supporting Fast Crash Recovery
...To avoid these problems, XFS uses a write ahead logging scheme that enables atomic updates of the file system. This scheme is very similar to the one described very thoroughly in [Hisgen93].
XFS logs all structural updates to the file system metadata. This includes inodes, directory blocks, free extent tree blocks, inode allocation tree blocks, file extent map blocks, AG header blocks, and the superblock. *XFS does not log user data.*

[emphasis added]

This is *normal* for a journaling filesystem. Very very few actually log or otherwise protect file data, because of the cost. Maintaining a metadata-only log is already a significant performance limiter, and journaling data as well would just be prohibitively expensive. Most users wouldn't even want it, if they had to pay the performance cost.
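A back-of-the-envelope sketch of why data journaling is so expensive: every journaled byte is written twice (once to the log, once to its home location), so journaling user data roughly doubles write volume for data-heavy workloads. The model and the example sizes (1 MiB of data and 16 KiB of metadata per batch) are illustrative assumptions, not XFS measurements, and it ignores log headers, commit blocks and the latency cost of synchronous log writes.

```python
def bytes_written(data_bytes: int, metadata_bytes: int, journal_data: bool) -> int:
    """Rough device write volume for one batch of updates.

    Journaled bytes are written twice (log + home location);
    unjournaled user data is written once.
    """
    journaled = metadata_bytes + (data_bytes if journal_data else 0)
    unjournaled = 0 if journal_data else data_bytes
    return 2 * journaled + unjournaled
```

With the assumed sizes, metadata-only journaling writes 1,081,344 bytes, while journaling data as well writes 2,129,920 bytes, nearly twice as much.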

Re:What about making it a little less bloated? on Next Generation C++ In The Works · 2001-04-30 20:28 · Score: 2

Maybe we would have gotten here sooner if you were civil

We would have gotten here even sooner if you hadn't been so uncivil as to sleaze all around the subject (and several others) instead of simply accepting that maybe the point about hidden costs was a valid one.

User: Salamander

Comments · 1,170