Slashdot Mirror


Linux Clustering

SPK writes "A colleague and I recently discussed how New Riders's most highly regarded book -- Paul DuBois's MySQL -- corresponds to O'Reilly's worst dud: MySQL & mSQL. Charles Bookman's Linux Clustering does nothing to improve New Riders's reputation. The book is divided into eleven chapters, unevenly distributed among three sections: an overview of clustering for Linux, building clusters, and maintaining clusters. Four appendices provide brief information about online clustering resources, options for RedHat's 'Kickstart,' options for DHCP, and information on 'Condor ClassAd Machine Attributes.'" To find out why Krause was so displeased with this book, read on below for his review.

Linux Clustering: Building and Maintaining Linux Clusters
author: Charles Bookman
pages: xv + 265
publisher: New Riders
rating: 2/10
reviewer: Steve Krause
ISBN: 1578702747
summary: A guide to clustering software, networking, and journaling filesystems

Bookman emphasizes a central piece of wisdom that no system administrator should ignore: redundancy. In the case of high availability clusters, parts redundancy is the name of the game, but one should not forget the human component; no administrator should be caught with only a cell phone -- keep a pager just in case. However, in a post-modern turn that might seem brilliant if it were applied in a work of fiction rather than a technical book, the author seems to apply the concept of redundancy to the text itself.

That the book began not as a book but rather as a collection of talks or presentations, or some other smaller format, is evidenced by the repetition of information between chapters and sections. Such nearly poetic repetitions also occur within sentences and paragraphs (e.g. "nightly backups each night" on page 25).

An editor never looked at Linux Clustering; the book had two "technical reviewers" but their contributions seemingly didn't include fixing mangled syntax and strained style. On page 14 in the second paragraph a large segment of a sentence from the previous page is pasted into another sentence, resulting in a nonsensical block of text. The number of hyphenation, syntax, word choice, and subject-verb agreement errors is atrocious and makes the book difficult to read.

Some of the misinformation in the text appears to be unintentional (but ignorance is no excuse for a UNIX systems administrator); some is due to the fact that the author deals only with old (2.2) kernels (though the book came out 18 months after the 2.4 kernel release), old versions of journaling filesystems, and old distributions; and yet other misinformation is the result of misplaced attempts at humor (such as stating that GNU stands for the Gateway Naming Utility; one can only hope that this was intended to be funny). Other jokes often misfire, but do point to the intended audience (consider, for example, the section heading "Space: The Final Frontier").

In the Introduction, the author indicates that the book should be read by "Linux enthusiasts and users who want to get a Linux cluster up and running with the least amount of fuss." The organization of the book will not, however, aid this enterprise, for there is little "how to" information provided, but rather a great deal of background information on compiling kernels, various types of journaling file systems, and RedHat's Kickstart (perhaps inappropriate considering that the book specifically states that basic information will not be covered). Another section or two deal with basic networking and security. Various types of clusters are discussed, as are a few of the types of clustering software (e.g. Condor and Mosix) available.

The book, however, is clearly intended for administrators of clustering systems, with a special emphasis on high-availability and load-balancing clusters. Parallel computing and the types of applications end users would wish to run receive far too little discussion.

Almost all technical books regurgitate the contents of freely available FAQs and HOWTOs to some degree, yet the good ones summarize the relevant points, make dry documentation more accessible, and give the reader some new insights. Because Bookman's Linux Clustering suffers from heinous spelling, grammar, and style errors; deals primarily with outdated software; contributes little new to the discussion; and doesn't speak to non-admins, I can only recommend that those interested in Linux clustering stick to online FAQs and HOWTOs.

You can purchase Linux Clustering: Building and Maintaining Linux Clusters from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

39 of 162 comments

  1. MySQL & mSQL by Gortbusters.org · · Score: 4, Funny

    Sometimes I still wake up screaming from reading that!

    --
    --------
    Free your mind.
  2. No good books? by bluethundr · · Score: 5, Interesting

    Why is it so tough to find a decent book on this topic? Even O'Reilly failed here.

    --
    Quod scripsi, scripsi.
    1. Re:No good books? by Gortbusters.org · · Score: 4, Interesting

      Perhaps they're not big sellers. I mean, you're not going to sell as many cluster books as you will PHP Cookbooks.

      My friend worked at a lab in Princeton modeling the inside of a reactor. He worked with a 32 node linux cluster and did all the graphics modeling using a modified version of Unreal Tournament.

      --
      --------
      Free your mind.
    2. Re:No good books? by HowlinMad · · Score: 4, Informative

      Redundancy is the key you are missing here. If one CPU goes down in a cluster, the performance starts to suffer, but you are still up. If your one really beastly CPU goes down, you are down. That is the point of a cluster. So while it may be "more" expensive, you are paying for the uptime.

    3. Re:No good books? by ComputerSlicer23 · · Score: 3, Insightful
      Hmmm, that's why most serious clusters are built out of state of the art Dual Processor boards loaded with highend Xeon chips. Clusters aren't a "gimmick". There are some people who want to build one that have no use for it. How's that any different from the car junkie who soups up his 1970's muscle car? It's just a gimmick. There's really no need to put a 500HP engine in it, no need to get it new paint, new tires, or a turbo, or a new dual exhaust system. It's just fun for them.

      However, there are good uses for modification of vehicles, like say air bags. I don't call air bags, gimmicks, just because I think that guys who put dual exhaust systems on a 20 year old car seem like they are wasting money to me.

      However, in terms of redundancy, you're far, far better off with 10 P3 500's than with one P4 5GHz machine. One of the PIII's is having problems, shut it off, run the diagnostics. The P4 has problems, you shut it off, you are in deep shit.

      If I had my choice, I'd rather have a cluster of 5-10 well built, redundant machines than one machine 10 times as fast for any problem that can easily be distributed (think websites, DNS, mail servers). No, I don't want to use 10, 3 year old Dell workstations to serve up my enterprise website, but I wouldn't have any objections to 10 Dell Servers that were bleeding edge 3 years ago assuming it uses parts that are still commonly available.

      Kirby

    4. Re:No good books? by malfunct · · Score: 2, Informative

      It depends on why you are using the clustering, I use NT clusters at work all the time to increase reliability of the system. It doesn't help performance one bit in my situation (and isn't meant to) but if one node of the cluster goes down all the services start up on the other node and there is next to no downtime.

      --

      "You can now flame me, I am full of love,"

    5. Re:No good books? by afidel · · Score: 2, Informative

      What kind of beastly machine goes down because of a single bad cpu? Only Intel based machines and none of those are all that beastly anyways. Big Iron does not go down to a single cpu failure. Where clusters rule is when you can chunk datasets into the ram space of a cheap rackmount box (2-4GB) and you have low enough interprocess communications that using commodity ethernet won't totally pooch your performance. If you meet those two requirements then the incredibly low cost per MIPS of x86 hardware will make it basically a no-brainer to go with a cluster.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    6. Re:No good books? by Usquebaugh · · Score: 4, Informative

      Bollocks,

      granularity, I'm tired of explaining this on /. but here goes.

      There are two main types of cluster, High Performance and High Availability.

      HPC tries to increase the power of the cluster by spreading jobs out over the whole cluster. HPC breaks the work down into blocks and farms these blocks out to the nodes. In the worst case a single node failure could cause the whole cluster to fail.

      HAC tries to increase the uptime of the cluster by running the same job on more than one node. If a node fails then the job on the mirror node takes over. It's worth noting that no Linux cluster has 100% HA.

      If a single node is going to fail 5% of the time what is the up time of a 100 node cluster?

      I work in the commercial not the scientific world. HPCs bore me. HACs could be a godsend.

      Imagine if you will a cluster that automatically deals with node addition/subtraction. I have 1,000 users connected to this cluster using Xterms. I need more power, add more nodes. If any nodes fail the user never loses any uptime as their work is switched to a mirrored node or nodes.

      Centralised computing rocks.
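
The question posed in the comment above (a node that fails 5% of the time, in a 100 node cluster) can be worked through with a little arithmetic. What follows is only a rough sketch, assuming that node failures are independent and that "fails 5% of the time" means each node is up 95% of the time; the function names are made up for illustration. In the HPC worst case the job needs every node, so the chance that all are up at once is 0.95 multiplied by itself 100 times, roughly 0.6%. In the HA case a job mirrored on two nodes is only lost when both are down together, which gives 99.75%.

```python
def all_nodes_up(p_up: float, nodes: int) -> float:
    """Probability that every node is up at once, assuming independent failures."""
    return p_up ** nodes


def mirrored_job_up(p_up: float, mirrors: int) -> float:
    """Probability that at least one of `mirrors` copies of a job is up."""
    return 1.0 - (1.0 - p_up) ** mirrors


if __name__ == "__main__":
    p = 0.95  # assumption: each node is up 95% of the time
    print(f"all 100 nodes up at once: {all_nodes_up(p, 100):.4f}")
    print(f"job mirrored on 2 nodes:  {mirrored_job_up(p, 2):.4f}")
```

Real clusters rarely fail this independently (shared switches, power, and storage tie nodes together), so these figures are best read as an illustration of why the two cluster types behave so differently, not as a prediction.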

  3. Paul DuBois, MySQL by MattRog · · Score: 5, Funny

    Not to mention he knows *nothing* of relational theory:
    http://www.pgro.uk7.net/qu092902.htm

    --

    Thanks,
    --
    Matt
    1. Re:Paul DuBois, MySQL by ppanon · · Score: 5, Interesting

      Not to mention he knows *nothing* of relational theory:

      It seems that a large majority of MySQL users also know little or nothing about relational theory. The MySQL core developers fought a long time against including support for foreign key constraints. Thus not knowing anything about relational theory may not be a drawback in writing a book about MySQL; it matches the target audience, even if it fails to educate.

      --
      Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire
    2. Re:Paul DuBois, MySQL by someguy42 · · Score: 2, Insightful

      I feel a flame-war coming on. This comment is functionally equivalent to saying "Real men use VI" or "Real men use EMACS." It's only intended to start a "my (insert topic here) is bigger than yours" conversation. Real men know better than to assume that their personal preference is, by definition, going to be better than the next guy's preference.

      --
      The probability that someone is watching you is directly proportional to the stupidity of your actions.
  4. O'Reilly's worst dud was also about Linux clusters by Frater+219 · · Score: 5, Interesting
    Here's O'Reilly's worst dud.

    How bad was it? It came with a CD-ROM that was supposed to automate the process of setting up a Beowulf cluster. None of the software on the CD-ROM worked. Running the install script printed out a message telling you to go to a Web site and download the newest beta version of the software. No such software was available ... ever. O'Reilly shortly withdrew the book ... and, reportedly at least, fired the editor who approved it for publication.

    Want more details? Here you go. Waiting for this book, and then discovering slowly just how awful it was, set back a clustering project at my workplace by several months, by the way.

  5. Re:Would you prefer RIAA math? by Artemis+P.+Fonswick · · Score: 3, Funny

    ...the book is divided into eleven chapters, unevenly distributed among three sections...

    --


    Kudos to you, my good man.
  6. O'Reilly's Worst Failure by s.d. · · Score: 3, Informative

    In my mind, this is simple -- I have never read a worse O'Reilly book than Building Linux Clusters. There is a reason that ORA pulled this book out of print after only 6 months, and hasn't even bothered to try to fix it and reprint a new edition. It was basically a commercial for the company the author ran, it read as if it hadn't been edited (spelling and grammar mistakes everywhere, and the included pictures didn't match what the text referred to), and the code included was so buggy it wouldn't work at all without a lot of fixing.

    This was the first book on Linux Clustering I read, and I was hugely disappointed.

  7. O'Reily's Msql and Mysql by mcc · · Score: 2, Insightful

    Uh.. what was wrong with MySQL and msql? I mean, it wasn't the most incredibly intensely mind-opening technical book i've ever read, but it wasn't useless, either. Far as i could tell the first couple chapters introduced you to SQL pretty well (which isn't exactly difficult, but they didn't really flub it), which you would never read more than once, and the rest was just various bits of random somewhat-disorganized reference material, sample sql, and sample database code in a few languages. It wasn't really any more useful than it would have been to have a printed and bound book that just contained the mysql manual, the dbi perldoc, and the manpage for the c database library.. and now that i'm used to mysql i just use the online manual.. and i will probably never dig my copy of the book out of the bottom of my closet ever again.. but i don't really think i'm -sorry- i bought it.

    And of course, it's been a long time since i first read the book, but i don't remember it being unpleasant. Why all the disdain?

    1. Re:O'Reily's Msql and Mysql by srw · · Score: 2, Insightful

      I agree. In fact, I often still use it as a reference. Granted, I'm not a university-trained RDBMS expert. Perhaps that wasn't the target audience. As far as I'm concerned, they could have left out mSql entirely (shudder... my first big DB project ever ran on mSql. I eventually converted it to Mysql.) but I guess it makes sense to include it as Mysql grew out of it.

    2. Re:O'Reily's Msql and Mysql by irix · · Score: 4, Informative

      I own maybe 30 O'Reilly Titles and "MySQL and mSQL" is easily the worst one of the bunch. Here's my take as to why.

      I came from an Oracle background (i.e. already understood SQL, relational databases, etc.) and I was interested in 2 things; how to administer a MySQL database and how to do simple access from PHP/Perl.

      Now go and pick up that book and try to find that information. The description of the MySQL security model is muddled and confusing. No good details on how to do backup and restore. The examples for using PHP/Perl are horrible. The book has several chapters of filler.

      A year or two later I bought the New Riders title mentioned in the writeup. It is a massive improvement over "MySQL and mSQL" - read them side by side and you'll see.

      One thing that book taught me - just because it is a publisher you trust, don't assume the book will be good. Read it or read a review first!

      --

      Do you even know anything about perl? -- AC Replying to Tom Christiansen post.
  8. Re:Oh please! by MagikSlinger · · Score: 2, Insightful
    One of the major complints about this book is the lack of editting. Thes is just like those people that complain about mispelings on the front page of /.


    His complaint was the lack of editing made it almost unreadable. I.e., re-reading a sentence several times to figure out what it said.

    Good grammar and spelling aren't "windowdressing". They are essential for easy reading.

    --
    The bitter lessons of a veteran coder: http://bitterprogrammer.blogspot.com
  9. So...what books DO you all recommend? by SailFly · · Score: 5, Interesting

    For somebody who wants to learn about Linux clusters. I've played with Mosix and was impressed. What are good books and sources to learn about Linux Clusters?

    1. Re:So...what books DO you all recommend? by jpcampbell · · Score: 3, Informative

      I think the best thing to do would be download some of the open source tools that are out there and build a small one.

      Some of the more popular ones used in the bio-sciences are:

      http://www.platform.com/ (demo only, but most powerful)
      http://www.openpbs.com/ (open source)
      http://gridengine.sunsource.net/ (open source)

      All these run on linux.

      jpc

  10. Other stinkers from O'Reilly by talexb · · Score: 3, Informative

    I have to add that 'Apache: The Definitive Guide (Second Edition)' was pretty horrible as well. Like the MySQL book, it was heavy on re-hashing available information and light on useful information like a dash of theory or a hint of how the authors used it to solve a particular problem.

    And I hate it when O'Reilly comes out with a bad book, because generally their books are great.

  11. Easy way to get a cluster up and running? by kperrier · · Score: 2, Funny

    Why, buy one prebuilt, of course....

    1. Re:Easy way to get a cluster up and running? by DeathPenguin · · Score: 2, Informative

      (Shameless plug)

      Head over to LinuxNetworx now for a LinuxBIOS-ready Evolocity II cluster!

      Note: I'm not an employee of LinuxNetworx, but they still kick ass.

  12. OK, so who's got a GOOD book on this topic? by Elias+Israel · · Score: 5, Insightful

    I hate to turn this into an Ask Slashdot, but truth is I could really use a good book on Linux clustering, especially if it covers:

    1. Clustering (not just replicating) MySQL databases.
    2. Network attached storage.
    3. Load balancing and failover.
    4. Probably six other things I'm not thinking of right now.

    Anyone got any suggestions?

    1. Re:OK, so who's got a GOOD book on this topic? by ashpool7 · · Score: 5, Informative

      I wrote up a paper for my employer a while ago about most of those topics. The sad truth is that a comprehensive guide is not available, and most of the solutions are proprietary. However, there are a few bright lights.

      Eddie: Load Balancing Software
      http://eddie.sourceforge.net/

      Linux Virtual Server Project: Clustering Tools
      http://www.linuxvirtualserver.org/

      OpenAFS: Efficient Distributed Storage
      http://www.openafs.org/

      Load-balancing and failover are tough nuts. You can do some stupid things like Round Robin DNS or Rotary NAT, but to be actual balancing, you need a balancer box. You can either make your own (using proprietary software or the stuff above) or buy a piece of hardware to do the job for you. I've heard Cisco makes some good ones.

      NAS units usually operate using CIFS, AFP, or NFS, all of which are pretty lame options for a modern cluster. SANs are pretty cool, but you need some big-ass hardware to support them. Personally, I'm working on an OpenAFS cluster, which is pretty easy if you look into the capabilities of the software. Coda is another option, which I'm not using because it doesn't play as well with Windows.

      As for clustering MySQL: If you read the Slashdot interview log they had a couple days ago, you'd see that the setup here is a master writer that replicates to a couple of reader databases. This is about as effective as it gets with MySQL. If you need higher power, I've read that commercial versions of Postgres support clustering/synchronization. More powerful than that and you're into Oracle territory.
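
The "master writer, several readers" arrangement described in the comment above can be sketched in a few lines. This is only an illustration of the routing idea, not the API of any MySQL driver or of Slashdot's actual setup: the class name, the host names, and the rule of judging a statement by its first word are all assumptions made for the example.

```python
from itertools import cycle


class ReadWriteRouter:
    """Send writes to the single master and rotate reads across the replicas."""

    # Statements whose first word is in this set are treated as writes.
    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, master: str, replicas: list[str]) -> None:
        self.master = master
        # With no replicas configured, reads fall back to the master.
        self._readers = cycle(replicas or [master])

    def pick(self, sql: str) -> str:
        """Return the name of the host that should run this statement."""
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._readers)


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    router = ReadWriteRouter("db-master", ["db-read-1", "db-read-2"])
    for statement in ("INSERT INTO t VALUES (1)", "SELECT 1", "SELECT 2"):
        print(statement, "->", router.pick(statement))
```

A real deployment also has to cope with replication lag, since a read sent to a replica straight after a write may not see that write yet, which is part of why this counts as replication and not true clustering.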

  13. Re:Oh please! by teaserX · · Score: 2, Insightful

    I realize that your post was meant to be a joke, but I have to point out that spelling errors and grammar mistakes do nothing to instill confidence in the reader. It seems unlikely that any of the information in the book was verified, or the sources checked, if no one could bother to run the text past a spell checking program.

    --
    We really need your help
    http://www.gofundme.com/help-sherry
  14. Unevenly distributed? by sczimme · · Score: 4, Funny


    From the review:

    The book is divided into eleven chapters, unevenly distributed among three sections:

    That's good news: I would hate to read a fractional chapter.

    --
    I want to drag this out as long as possible. Bring me my protractor.
  15. I agree by Anonymous Coward · · Score: 3, Informative

    that book is very poorly written. Beowulf Cluster Computing with Linux is a much better book.

  16. MySQL & mSQL: Worst. Book. Ever. by Traderdot · · Score: 2, Informative
    Here's another vote for MySQL & mSQL as O'Reilly's worst book ever. That book singlehandedly ruined their previously stellar reputation for me.

    Here are the books I've found most helpful on MySQL (and using MySQL with other things):

    MySQL- Paul DuBois
    MySQL and Perl for the Web- Paul DuBois
    PHP & MySQL- Welling & Thompson

  17. New Riders by Anonymous Coward · · Score: 4, Informative

    There *ARE* some good New Riders books - it's just that they tend to deal with digital art:

    "Digital Texturing and Painting"
    "Digital Lighting & Rendering"

    New Riders' focus is more on the artist / animator / illustrator side of things - and at that they excel (the above two are those I'm most familiar with, and they are excellent).

    I'm sure they'll gradually improve their hardcore technical books, but it's stupid to dismiss "all" their books as being bad. Just like O'Reilly has a reputation in some circles for being overly dry and out of date - *some* people find their books useful.

  18. Re: mysql replication and high availability howto by ubiquitin · · Score: 3, Informative

    O'Reilly's Linux Hacks has one of the best explanations I've seen for setting up mysql replication. Load balancing and failover are topics in their own right, but the Linux High Availability HOWTO is a good place to start. In general, the ibiblio site has been a helpful source.

    --
    http://tinyurl.com/4ny52
  19. Is this true? by doc_traig · · Score: 3, Insightful

    Actually, it would be nice to slow down the pounceposters who stab us with one-liners as soon as an article hits the front page in order to grab that funny karma goodness...

    --
    So long, michael. Don't let the door hit you...
  20. Don't assume... by xonker · · Score: 2, Insightful

    the book had two "technical reviewers" but their contributions seemingly didn't include fixing mangled syntax and strained style.

    When you see a book in print, don't assume that the suggestions of the technical editors/reviewers have been heeded. The author basically has final say over the content of the book -- meaning that a tech reviewer/editor can be completely ignored no matter how much they complain about the content of the book or how much it doesn't address what it should.

    And, the tech reviewers/editors are explicitly asked not to try to fix grammar and so forth -- that's supposed to be the job of a different editor.

    Also... I'm surprised to see a review of this book popping up now, it was published about a year ago.

  21. Re:MySQL & mSQL: Worst. Book. Ever. by elmegil · · Score: 2, Informative

    The newer edition, Managing and Using MySQL goes a long way to correcting the sins of the first book. I haven't read these others to compare directly, but I find the new edition a lot more clear and useful than its precursor.

    --
    7 November 2006: The day Americans realized corruption and incompetence weren't addressing 11 September 2001
  22. Imagine... by Daverd · · Score: 3, Funny

    Imagine a Beowulf clust-- oh wait.

  23. "O'Reilly's worst dud: MySQL & mSQL" by presroi · · Score: 3, Interesting

    The topic reads: "O'Reilly's worst dud: MySQL & mSQL"

    MySQL&mSQL was my first O'Reilly book, back in my old days in school. I spent many nights reading it and many classes trying out the things I read.

    I still like it although it has become completely outdated now (at least my edition).

    Maybe I should have a look at the /. archive. What was so bad about this book?

  24. Re:O'Reilly's worst dud was also about Linux clust by Cyno · · Score: 3, Insightful

    That's why I don't buy many books anymore. I can get most of the relevant information for any current topic/projects from the internet. I think the most innovative thing created in the last few years was tabbed browsing, I love Galeon.

  25. Linux Magazine June Issue is Clustering by Anonymous Coward · · Score: 2, Informative

    the june issue is entirely clustering.

  26. There is no good linux cluster book... by dargaud · · Score: 2, Interesting
    When setting up my 24 processor cluster, I did read a lot of book reviews, but no one was satisfied by the 11 books I found. This is probably because clustering is a very dynamic medium, where patches are experimental, software is used only by a few groups, and once stability is reached, no one wants to touch anything anymore !!!

    So I read online, whatever I found that was up to date and settled on the satisfying OpenMosix and... it works ! :-)

    --
    Non-Linux Penguins ?