Slashdot Mirror


Text Processing in Python

Ursus Maximus writes "If you have read an introductory book or two about Python programming, but you are far from being an expert, then you will benefit a lot from reading this book. If you are a competent programmer in any other language, you will benefit from this book. If you are an expert Python programmer, you will also benefit from this book." Ursus Maximus's review continues below.

Text Processing in Python
  author: David Mertz
  pages: 520
  publisher: Addison Wesley
  rating: 10
  reviewer: Ursus Maximus
  ISBN: 0321112547
  summary: How to use Python to process text.

As you probably know, there are many good introductory texts about Python. This is not one of them: it is an advanced book, though not an inaccessible one. David Mertz has a unique style and focus that we have become familiar with from his series of articles on IBM developerWorks. Dr. Mertz is more interested in facilitating our learning process than in lecturing us, and rather than fill his pages with impressive examples designed to illustrate his expertise, he gently guides us by offering subtle yet important examples of code and analysis that make us think for ourselves.

He has a special talent for programming in the functional style, and this is a great introduction to that style of Python programming. Thus, this is also a good guide to using the newer features introduced into Python in the last few revisions, which often facilitate the functional style of programming.
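A minimal sketch of what "functional style" can look like for a text task, not taken from the book: a word-frequency count built from small composed functions and a lazy generator. The names `compose`, `words`, and `top_words` are hypothetical, chosen only for this illustration.

```python
from collections import Counter
from functools import reduce
import re


def compose(*funcs):
    """Return a function that applies funcs left to right (hypothetical helper)."""
    return lambda value: reduce(lambda acc, f: f(acc), funcs, value)


def words(text):
    """Lazily yield the lowercase words found in text."""
    return (m.group(0).lower() for m in re.finditer(r"[A-Za-z']+", text))


def top_words(n):
    """Return a function that reports the n most common items of an iterable."""
    return lambda items: Counter(items).most_common(n)


# Build the pipeline once, then apply it to any string.
pipeline = compose(words, top_words(2))
result = pipeline("The cat and the hat and the bat")
print(result)  # [('the', 3), ('and', 2)]
```

Each stage is a plain function with no shared state, so stages can be swapped or reused without touching the rest of the pipeline.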

The text includes, in an appendix, a 40-page tutorial covering the basic Python language. This tutorial is, like the book, unique in its approach and is worthwhile even for experienced Pythonistas, as it sheds light on some of the underlying ideas behind the syntax and semantics, and it also illustrates the functional style of programming, which is sometimes quite useful when doing text processing. And, despite its many other virtues, this is a book about text processing.

Chapter 1 covers the Python basics, but with a particular eye towards those features most critical and useful for text processing. Chapter 2 covers the basic string operations as found in the string module and the newer built-in string methods. Chapter 3 is about Regular Expressions, and, although I am shy about regexes because of their relative complexity, I am very glad to have read this chapter and will no longer be intimidated when regexes are the correct approach to take! Chapter 4 is on Parsers and State Machines, which are important for processing nested text, as in everyday HTML, XML and the like. This chapter is not as esoteric as its title may sound to relative newbies (like myself), as it does offer useful ideas and principles for dealing with HTML. How much more useful can a topic be than that? It is true that a deep understanding of this subject may be beyond me and other relative duffers, but this chapter has much to offer those like me and I am sure much more to offer professionals.
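To give a flavor of the state-machine idea, here is a minimal sketch, not code from the book: a two-state machine that keeps only the text outside `<...>` tags, compared with a one-line regular expression. The function name `strip_tags` and the state names are assumptions made for this illustration, and the sketch deliberately ignores comments, scripts, and `>` characters inside attribute values; real HTML is better handled by a proper parser such as the standard library's `html.parser`.

```python
import re


def strip_tags(markup):
    """Return the text outside <...> tags using a two-state machine."""
    TEXT, TAG = "text", "tag"  # hypothetical state names
    state = TEXT
    out = []
    for ch in markup:
        if state == TEXT:
            if ch == "<":
                state = TAG      # entering a tag: stop collecting characters
            else:
                out.append(ch)   # ordinary text: keep it
        elif ch == ">":
            state = TEXT         # leaving a tag: resume collecting
    return "".join(out)


sample = "<p>Hello, <b>world</b>!</p>"
stripped = strip_tags(sample)
print(stripped)  # Hello, world!

# For this simple input a regular expression gives the same result.
stripped_re = re.sub(r"<[^>]*>", "", sample)
print(stripped_re == stripped)  # True
```

The explicit machine is longer than the regex, but it is easier to extend with further states (for example, one for quoted attribute values) when the input gets messier.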

Chapter 5 is on Internet tools and techniques, and this is a good example of how text processing touches every important area of computer programming. We manipulate text for email, newsgroups, CGI programs, HTML and many other aspects of net programming. A good summary of XML programming is included, as well as useful synopses of other Python internet modules, from a text processing point of view.
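As a small illustration of email treated as structured text, the sketch below parses a message with the standard library's `email` package. It is not an example from the book; the addresses and message content are invented for the illustration.

```python
from email import message_from_string

# A made-up message: headers, a blank line, then the body.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Text processing\n"
    "\n"
    "Python handles this message as structured text.\n"
)

msg = message_from_string(raw)

sender = msg["From"]       # header lookup is case-insensitive
subject = msg["Subject"]
body = msg.get_payload()   # a plain string for a non-multipart message

print(sender)
print(subject)
print(body.strip())
```

The point is that the parsing rules for headers and body live in the library, so the program deals with named fields rather than raw lines.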

Appendix A is the aforementioned selective and short review of Python basics. Appendix B is a ten-page data compression primer that is quite educational. Appendix C offers the same good service for Unicode, and Appendix D covers the author's own software, a state machine for adding markup to text, which is backed up by his extensive web site that has a lot of free software to support those doing extensive text processing. Lastly, Appendix E is a glossary of technical terms from the book. This is very much an educational book, and would be suitable for classroom work at the university level, beyond the introductory programming level; in fact, as part of a curriculum to teach programming using Python at the university level, this would be an excellent text for the second course.

One of the highlights of the book is that each chapter is concluded with a problem and discussion section. These are of the highest quality I have encountered in computer texts. Rather than overwhelming the reader with a large number of problems, the author has obviously given a lifetime of thought in coming up with a few key problems that are meant to stimulate thought, creativity, and ultimately understanding and growth in the reader. I will be coming back to the problems often, as they cannot be absorbed quickly anyway; they require thought. These would be most useful in a classroom environment; but as they are accompanied by excellent discussion material, and backed up by the author's web site, the individual reader will be well served also.

The book is more than the sum of its parts. It will be a most useful reference source for when I am doing various text related tasks for some time to come, and it was also a delightful and educational quick read in the here and now. It also amply illustrates the centrality of text processing in all areas of computer science, and I am confident that the book will be useful and educational for all programmers, whatever their area of expertise.

To sum it all up, this book is educational. It is also beautifully bound and printed, and excellently written. I rate it five stars, my highest rating, and heartily recommend its purchase.

You can purchase Text Processing in Python from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

10 of 215 comments

  1. Do they ever NOT recommend the book? by SpaceRook · · Score: 3, Insightful

    Maybe it would be useful to review some BAD books. First, it would steer people away from them. Second, it would provide good examples of where a lot of tech writing goes wrong. Finally, it's just fun to read someone bash the sh!t out of something.

    1. Re:Do they ever NOT recommend the book? by donutz · · Score: 2, Insightful

      Maybe it would be useful to review some BAD books. First, it would steer people away from them. Second, it would provide good examples of where a lot of tech writing goes wrong. Finally, it's just fun to read someone bash the sh!t out of something.

      Why are you so focused on negativity? With the nightly news pushing out stories left and right about what's wrong with the world, can't we at least keep our Slashdot book reviews a good positive example of what's right with the world?

      Speaking of positive reviews, you might benefit from a book like this: The Power of Positive Thinking.

    2. Re:Do they ever NOT recommend the book? by MikeFM · · Score: 2, Insightful

      I recommend never buying a book that is for Idiots, Dummies, or Stupid. IMO these books suck and leave their readers little smarter for having read them.

      I have seen several series of Learn Visually books and I think they are much better in most cases. That's what I will usually give newbies to learn from.

      --
      At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
  2. Re:So how does Python compare to perl? by merlyn · · Score: 2, Insightful

    Careful. You're going to get an "emacs vs vi" debate going.

    Perl and Python have different coding philosophies. Some find Perl's flexibility better. Others find Python's rigidity better. Use what you like.

    As far as core language support and add-on modules, they both cover similar areas.

  3. Re:Great Intro by orthogonal · · Score: 4, Insightful

    You know, if someone goes to the trouble of reviewing a book, what's wrong with having an affiliate link to purchase the book?

    In all seriousness (unlike my original post), it's a conflict of interest: the reviewer who gets compensated when readers of the review purchase the book has a great incentive not to pan the book, even if it deserves panning, because a bad review means fewer buyers means less pay-off to the "affiliate" linker.

    "Affiliate" programs also drive up the cost of the books (or Rolexes), both because the affiliate must be paid off, and to cover the administrative costs of the affiliate program.

    It also means a slightly slower response time when I click the link, as the server, besides displaying the page, has to access a database to credit the affiliate -- and possibly track me all the way to purchase to see if the affiliate is to be compensated. In the case where compensation only comes on purchase, it means another layer of tracking, and probably a web site that wants to send me cookies to identify which affiliate should get paid if I do decide to purchase. Cookies, of course, lead to individualized customer profiles and possibly higher prices when and if the tracking software decides I'll be willing to pay more than the average Joe, based on that customer profile.

    So we have conflict of interest, slightly higher costs, and customer and referrer tracking. None of these things benefit me as a customer, and I prefer to avoid them.

  4. Yeah, actually they do by Anonymous Coward · · Score: 2, Insightful

    You need to learn the Slashdot Book Rating System.

    Anything above a "9" is a good book.

    A "9" is an average book. Read it only if you are particularly interested in the subject.

    Anything below a "9" is a bad book. Avoid like the plague.

  5. Re:So how does Python compare to perl? by Cro Magnon · · Score: 2, Insightful

    You can go back to your Python program in 6 months and still understand it.

    --
    Slow down, cowboy! It has been 4 hours since you last posted. You must wait another few hours.
  6. Python copied Perl's RE by abe ferlman · · Score: 2, Insightful

    This happened a couple years ago. This is no longer a reason to prefer Perl.

    I haven't succumbed to Ruby for the same reason most Java-heads haven't succumbed to Python yet. I am not a Java-head because I like my programming languages free as in liberty.

    --
    microsoftword.mp3 - it doesn't care that they're not words...
  7. Re:Perl is executable line noise, ... by mickwd · · Score: 2, Insightful

    Actually, your situation sounds very similar to mine. I was/am a very experienced C programmer and a pretty good shell script programmer, with a decent knowledge of stuff like awk, sed, etc.

    Somehow I'd never got round to learning Perl, even though I thought I'd love it, as it seemed to combine the best parts of what I already knew in one single language, which many people raved about.

    Anyway, I bought a couple of the O'Reilly Perl books, and immediately started to think "Whoa, this is just too much functionality and complexity mashed together".

    Perl programmers like to say "there's more than one way to do it". Well that's OK if you're writing code - you only have to know and understand one of the many ways to implement a given task. However, if you're trying to read other people's code, you need to understand all of those different ways.

    I started hearing a lot about Python at about the same time I started trying to learn Perl. After finding out a little about it, I bought (another) book, and since then I've not looked back. I no longer feel the need to learn Perl any better - for me, Python is a better alternative.

  8. Re:What about trusty old C? by stm2 · · Score: 2, Insightful

    With the high speed of current computers, coding speed is more important than running/execution speed (unless you are programming a real-time data gathering device).
    Even if the program is slow, you could leave it running overnight; that costs less than an average programmer's hourly fee.
    5-10 years ago, I would have said: Learn C, but now Python has enough advantages to be considered a "serious" language.
    If you are fine with C, stick with it, but I think trying Python won't hurt.

    --
    DNA in your Linux: DNALinux