Slashdot Mirror


Text Processing in Python

Ursus Maximus writes "If you have read an introductory book or two about Python programming, but you are far from being an expert, then you will benefit a lot from reading this book. If you are a competent programmer in any other language, you will benefit from this book. If you are an expert Python programmer, you will also benefit from this book." Ursus Maximus's review continues below.

Text Processing in Python
author: David Mertz
pages: 520
publisher: Addison Wesley
rating: 10
reviewer: Ursus Maximus
ISBN: 0321112547
summary: How to use Python to process text.

As you probably know, there are many good introductory texts about Python. This is not one of them: it is an advanced book, though not an inaccessible one. David Mertz has a unique style and focus that we have become familiar with from his series of articles on IBM developerWorks. Dr. Mertz is more interested in facilitating our learning process than in lecturing us, and rather than fill his pages with impressive examples designed to illustrate his expertise, he gently guides us by offering subtle yet important examples of code and analysis that make us think for ourselves.

He has a special talent for programming in the functional style, and this is a great introduction to that style of Python programming. Thus, this is also a good guide to using the newer features introduced into Python in the last few revisions, which often facilitate the functional style of programming.

The text includes, in an appendix, a 40-page tutorial covering the basic Python language. This tutorial is, like the book, unique in its approach and is worthwhile even for experienced Pythonistas, as it sheds light on some of the underlying ideas behind the syntax and semantics, and it also illustrates the functional style of programming, which is sometimes quite useful when doing text processing. And, despite its many other virtues, this is a book about text processing.

Chapter 1 covers the Python basics, but with a particular eye towards those features most critical and useful for text processing. Chapter 2 covers the basic string operations as found in the string module and the newer built-in string methods. Chapter 3 is about regular expressions, and, although I am shy about regexes because of their relative complexity, I am very glad to have read this chapter and will no longer be intimidated when regexes are the correct approach to take! Chapter 4 is on parsers and state machines, which are important for processing nested text, as in everyday HTML, XML and the like. This chapter is not as esoteric as its title may sound to relative newbies (like me), as it does offer useful ideas and principles for dealing with HTML. How much more useful can a topic be than that? It is true that a deep understanding of this subject may be beyond me and other relative duffers, but this chapter has much to offer those like me and, I am sure, much more to offer professionals.
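As a rough illustration of the three techniques these chapters cover (string methods, regular expressions, and a state machine), here is a minimal sketch. It is not taken from the book: the sample line, the sample markup, and the helper name strip_tags are all invented for this example, and the tag stripper deliberately ignores comments, entities, and quoted ">" characters.

```python
import re

# Invented sample input, not from the book.
line = "Date: 2003-07-15  Subject: Text Processing in Python"

# String methods: often enough for simple field extraction.
subject = line.split("Subject:", 1)[1].strip()

# Regular expression: pull a structured date out of the same line.
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", line)
year, month, day = match.groups()


def strip_tags(text):
    """Hypothetical helper: a two-state machine that drops <...> tags.

    It does not handle comments, entities, or '>' inside attribute values.
    """
    out = []
    state = "TEXT"
    for ch in text:
        if state == "TEXT":
            if ch == "<":
                state = "TAG"      # entering a tag: stop copying characters
            else:
                out.append(ch)
        elif ch == ">":
            state = "TEXT"         # leaving a tag: resume copying
    return "".join(out)


plain = strip_tags("<p>Hello, <b>world</b>!</p>")
print(subject, (year, month, day), plain)
```

The state machine earns its keep only when the input has structure that a single pattern handles awkwardly; for flat fields like the date above, a string method or a regex is usually the simpler choice.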

Chapter 5 is on Internet tools and techniques, and this is a good example of how text processing touches every important area of computer programming. We manipulate text for email, newsgroups, CGI programs, HTML and many other aspects of net programming. A good summary of XML programming is included, as well as useful synopses of other Python internet modules, from a text processing point of view.
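To give a flavor of email handling as a text processing task, here is a small sketch using the standard library's email package. It is not an example from the book; the message text and the address are invented, and the code assumes a simple single-part plain-text message.

```python
from email import message_from_string

# Invented sample message: headers, a blank line, then the body.
raw = (
    "From: reviewer@example.com\n"
    "Subject: Text Processing in Python\n"
    "\n"
    "A short plain-text body.\n"
)

msg = message_from_string(raw)

# Headers are looked up by name; the lookup is case-insensitive.
subject = msg["Subject"]

# For a non-multipart message the payload is the body as a string.
body = msg.get_payload()

print(subject)
print(body.strip())
```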

Appendix A is the aforementioned selective and short review of Python basics. Appendix B is a ten-page data compression primer that is quite educational. Appendix C offers the same good service for Unicode, and Appendix D covers the author's own software, a state machine for adding markup to text, which is backed up by his extensive web site that has a lot of free software to support those doing extensive text processing. Lastly, Appendix E is a glossary of technical terms from the book. This is very much an educational book, and would be suitable for classroom work at the university level, beyond the introductory programming level; in fact, as part of a curriculum to teach programming using Python at the university level, this would be an excellent text for the second course.

One of the highlights of the book is that each chapter is concluded with a problem and discussion section. These are of the highest quality I have encountered in computer texts. Rather than overwhelming the reader with a large number of problems, the author has obviously given a lifetime of thought in coming up with a few key problems that are meant to stimulate thought, creativity, and ultimately understanding and growth in the reader. I will be coming back to the problems often, as they cannot be absorbed quickly anyway; they require thought. These would be most useful in a classroom environment; but as they are accompanied by excellent discussion material, and backed up by the author's web site, the individual reader will be well served also.

The book is more than the sum of its parts. It will be a most useful reference source for when I am doing various text related tasks for some time to come, and it was also a delightful and educational quick read in the here and now. It also amply illustrates the centrality of text processing in all areas of computer science, and I am confident that the book will be useful and educational for all programmers, whatever their area of expertise.

To sum it all up, this book is educational. It is also beautifully bound and printed, and excellently written. I rate it five stars, my highest rating, and heartily recommend its purchase.

You can purchase Text Processing in Python from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

6 of 215 comments

  1. FUCK! by Anonymous Coward · Score: -1, Offtopic

    Jed Clampett is dead

  2. Nr. Mansfield, Ohio by Anonymous Coward · Score: -1, Offtopic

    The Coyne case (or "Army helicopter incident") stands out as perhaps the most credible (in the "high strangeness" category) of the 1973 wave. An Army Reserve helicopter crew of four men encountered a gray, metallic-looking, cigar-shaped object, with unusual lights and maneuvers, as they were airborne between Columbus and Cleveland, Ohio.
    On October 18, 1973, at approximately 10:30 PM a UH-1H helicopter of the United States Army Reserve left Port Columbus, Ohio, for its home base of Cleveland Hopkins airport, ninety-six nautical miles to the north-northeast. In command, in the right-front seat, was Captain Lawrence J. Coyne, thirty-six, with nineteen years of flying experience. At the controls, in the left-front seat, sat First Lieutenant Arrigo Jezzi, twenty-six, a chemical engineer. Behind Jezzi sat Sergeant John Healey, thirty-five, a Cleveland policeman who was the flight medic, and behind Coyne was the Crew Chief, Sergeant Robert Yanacsek, twenty-three, a computer technician. The helicopter was cruising at 2,500 feet above sea level at an indicated airspeed of ninety knots, above mixed hills, woods, and rolling farmland, averaging 1,200 feet in elevation. The night was totally clear, calm, and starry. The last quarter moon was just rising.
    About ten miles south of Mansfield, Healey noticed a single red light off to the west, flying south. It seemed brighter than a standard aircraft port-wing light, but it was not considered relevant traffic, and he does not recall mentioning it. An estimated two minutes later, at approximately 11:02 PM, Yanacsek noted a single red light on the south-east horizon. He assumed it was either a radio-tower beacon or an aircraft port-wing light - most likely an aircraft, since it was not flashing - and he watched it "for a long time, a minute to ninety seconds" before calling it to Coyne's attention. Coyne, smoking, relaxing, glanced over, noted the light, assumed it was distant traffic, and told Yanacsek casually to "keep an eye on it."
    After an estimated additional thirty seconds, Yanacsek announced that the light had turned toward the helicopter and appeared to be on a converging flight path. Coyne verified Yanacsek's assessment, grabbed the controls from Jezzi, and put the UH-1H into a powered descent of approximately 500 feet per minute. Almost simultaneously, Coyne established radio contact with Mansfield control tower, ten miles to the northwest. Coyne thought the flight was an Air National Guard F-100 from Mansfield. After an initial acknowledgment ("This is Mansfield Tower, go ahead Army 1-5-triple-4"), radio contact failed. Jezzi then attempted transmission on both UHF and VHF frequencies without success. Although the channel and keying tones were both heard, there was no response from Mansfield; and a subsequent check by Coyne revealed that Mansfield had no tape of even the initial transmission, and the last F-100 had landed at 10:47 PM.
    The red light continued its radial bearing and increased greatly in intensity. Coyne increased his rate of descent to 2,000 feet per minute and his airspeed to 100 knots. The last altitude he noted was 1,700 feet. Just as a collision appeared imminent, the unknown light halted in its westward course and assumed a hovering relationship above and in front of the helicopter. "It wasn't cruising, it was stopped. For maybe ten to twelve seconds - just stopped," Yanacsek reported. Coyne, Healey, and Yanacsek agree that a cigar-shaped, slightly domed object subtended an angle of nearly the width of the front windshield. A featureless, gray, metallic-looking structure was precisely delineated against the background stars. Yanacsek reported "a suggestion of windows" along the top dome section. The red light emanated from the bow, a white light became visible at a slightly indented stern, and then, from aft/below, a green "pyramid-shaped" beam equated to a directional spotlight became visible. The green beam passed upward over the helicopter nose, swung up through the windshield, continued upward an

  3. This is so exciting...... by Brushfireb · Score: -1, Offtopic

    This is so exciting.... That I am giving up porn. Forever.

  4. Python is a very nice lang. by zymano · Score: -1, Offtopic

    How do we get more people using it, and how can we get it more accepted for e-commerce, like Java?

  5. Re:I can think of one person... by Anonymous Coward · Score: -1, Offtopic

    Irregardless, I should point out - because I do not think you will - that your website has a banner that reads "Oppose Imperialist War on Iraq", which is just as, if not more, ridiculous a claim today than it was 4 months ago, and a bodycount link to iraqbodycount.net, which exaggerates the number of Iraqis who have perished defending their freedom from the grasping Baathist hoods. You know, I don't care if you're a leftist - plenty of useless, illogical people who disagree with reality in the world and I am used to them - but I think people should know when their hard-earned money supports un-American blowhards whose rhetoric is hurting people.

  6. SlashdotChaseManager Message #3 by SlashdotChaseManager · Score: -1, Offtopic

    Hello remaining 17 participants,

    As you may have noticed, two of my previous messages have been deleted from the system. I can only conclude that the contents of those messages were deemed too dangerous for the Slashdot community at large. As a result, the Slashdot Chase has been closed until further notice.

    Thank you for your participation, and love,
    /. Chase Manager