Slashdot Mirror


Perl 6 Grammars and Regular Expressions

An anonymous reader writes "Perl 6 is finally coming within reach. This article gives you a tour of the grammars and regular expressions of the Perl 6 language, comparing them with the currently available Parse::RecDescent module for Perl 5. Find out what will be new with Perl 6 regular expressions and how to make use of the new, powerful incarnation of the Perl scripting language."

36 of 202 comments

  1. Perl goodness by Zorilla · · Score: 5, Funny

    HXGF*&#$()#P*&ULJKDFHV)(&*#$utrhk:jlhdsf(p*&#$OJDF >KLJDFP)(*$#&pyu:

    Crap, I think I just accidentally programmed a web browser in Perl

    --

    It would be cool if it didn't suck.
    1. Re:Perl goodness by b12arr0 · · Score: 5, Funny

      Uh, that's not a web browser, it's clearly a web server.

    2. Re:Perl goodness by Zorilla · · Score: 4, Funny

      It curls fries, too, you know.

      --

      It would be cool if it didn't suck.
    3. Re:Perl goodness by jandrese · · Score: 4, Insightful
      There are two things about regular expressions:
      1. Perl chose a keystroke-efficient syntax that makes them unreadable to anybody who doesn't know how to read them. It also made them very compact and easy to write for anybody who does know how to read them. They look very intimidating, but underneath they are usually easier to understand than the C-like perl code surrounding them.
      2. They are amazingly useful. Seriously, if you have never learned about Regular Expressions you owe yourself a lesson in how they work and what they do. I've seen people spend days working on stuff that can be written (more efficiently!) in a regular expression in a matter of minutes. Pattern matching is the sort of thing that every general purpose language should have; it is a shame that the basic Regular Expression libraries that come with most Unixes are such pieces of crap. Who wants to deal with the arcane invocation method, the extremely limited syntax, or syntactic sugar like "[[:digit:]]{2}:[[:space:]][[:space:]]*[[:alpha:]] *" when you could write "\d{2}:\s+\w*"?
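      A quick Perl 5 check of the two patterns quoted above (the sample strings are made up) suggests the shorthand is close but not an exact drop-in: \s+ matches essentially the same runs of whitespace as [[:space:]][[:space:]]*, but \w* also accepts digits, underscores, or nothing at all, whereas [[:alpha:]] requires exactly one letter. Perl accepts POSIX bracket classes inside a character class, so both forms compile as written.

```perl
use strict;
use warnings;

# The two patterns quoted above, unchanged.
my $posix = qr/[[:digit:]]{2}:[[:space:]][[:space:]]*[[:alpha:]] */;
my $short = qr/\d{2}:\s+\w*/;

# Hypothetical inputs chosen to show where the patterns agree and differ.
my @samples = ('12:  abc', '07:x', '99: _');

my %result;    # sample => [posix matched?, shorthand matched?]
for my $s (@samples) {
    $result{$s} = [ ($s =~ $posix ? 1 : 0), ($s =~ $short ? 1 : 0) ];
    printf "%-10s posix=%d short=%d\n", "'$s'", @{ $result{$s} };
}
```

      The last sample is the interesting one: '_' is a word character but not a letter, so only the shorthand accepts it.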
      --

      I read the internet for the articles.
    4. Re:Perl goodness by Anonymous Coward · · Score: 5, Interesting
      A web browser? That's:
      perl -MHTML::Strip -MIO::All -e 'print HTML::Strip->new->parse(io($ARGV[0])->scalar )'
      A web server? That's:
      perl -MIO::All -e 'io(":8080")->fork->accept->(sub { $_[0] < io(-x $1 ? "./$1 |" : $1) if /^GET \/(.*) / })'
    5. Re:Perl goodness by ajs · · Score: 3, Insightful

      Perl chose a keystroke-efficient syntax that makes [regular expressions] unreadable

      No, it most certainly did not. Regular expressions as they exist in Perl today are a direct descendant of POSIX regular expressions which derive from the original work done by Ken Thompson (which resulted in the grep program, which stands for "global regular expression print"). That syntax further dates back to the giants in the field of computational theory, and was specialized only slightly for text matching.

      grep, awk, sed, ed, vi, emacs, and dozens of other programs and languages for Unix used this notation before Perl came along and adopted it, so let's not pretend that this syntax is somehow Perl's doing.

      The extended regular expression syntax of today IS perl's doing and in almost all cases it has been a process of making regular expressions both more powerful and more readable, culminating in Perl6's rule syntax which is highly readable by comparison.

    6. Re:Perl goodness by ajs · · Score: 3, Informative
      Larry Wall could have chosen a different syntax

      There really aren't many choices. The current regular expression syntax is the only form I've seen tried, with only minor variation.

      as he has done somewhat with the Perl 6 expressions

      Perl 6 regular expressions have almost exactly the same syntax as Perl 5. The parts that are new are not regular expressions. Cosmetic differences (like [] vs <[]>) are fairly ignorable syntactically. It would be like saying that Perl 5 will use // as the comment character instead of # (not that it will, just an example).

      All of the inline comments and whitespace are part of Perl 5 extended expressions, though the word-matching on whitespace is new to Perl 6.
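      As a sketch of the Perl 5 /x extended syntax mentioned above: under /x, unescaped whitespace in the pattern is ignored and # starts a comment. The HH:MM time pattern below is a hypothetical example, written once compactly and once with /x; both accept the same made-up inputs.

```perl
use strict;
use warnings;

# Hypothetical example: a 24-hour HH:MM time, written compactly...
my $compact_re = qr/^([01]\d|2[0-3]):([0-5]\d)$/;

# ...and the same pattern with /x: whitespace is ignored, # begins a comment.
my $extended_re = qr/
    ^
    ( [01]\d | 2[0-3] )   # hours 00-23
    :
    ( [0-5]\d )           # minutes 00-59
    $
/x;

my @inputs = ('09:30', '23:59', '24:00', '7:05');
my (@compact, @extended);
for my $in (@inputs) {
    push @compact,  ($in =~ $compact_re  ? 1 : 0);
    push @extended, ($in =~ $extended_re ? 1 : 0);
}
print "compact:  @compact\nextended: @extended\n";
```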

      POSIX on the other hand ignored most of that historic syntax and instead chose their horribly bloated keyword syntax.

      That's not really part of the regular expression syntax. Having [[:digit:]] as an alias for Perl's \d is hardly a different syntax so much as sugar. The fundamentals of POSIX regular expressions are the fundamentals of all modern regex syntaxes:
      • alphanumerics are literals
      • backslash is a character escape
      • parens are used for grouping
      • *, +, ? and {} are repeat count specifications
      These are the fundamentals of Perl regular expressions, POSIX, and all of the other modern regular expression engines, and they differ only slightly from the basic regular expressions that Unix started with.
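      To make the four fundamentals above concrete, here are a few Perl 5 checks, roughly one per bullet; the patterns and strings are invented for illustration, and the same patterns would read much the same in most other modern engines.

```perl
use strict;
use warnings;

# [pattern, string, expected match?]; hypothetical cases, one per fundamental.
my @cases = (
    [ qr/cat/,        'concatenate', 1 ],  # alphanumerics match literally
    [ qr/^a\.b$/,     'a.b',         1 ],  # backslash escapes the '.'
    [ qr/^a\.b$/,     'axb',         0 ],  # ...so it no longer matches any char
    [ qr/^(ab)+$/,    'ababab',      1 ],  # parens group 'ab'; + repeats the group
    [ qr/^ab*c$/,     'ac',          1 ],  # * allows zero repetitions
    [ qr/^x{2,3}y?$/, 'xxx',         1 ],  # {2,3} bounds the count; y? is optional
    [ qr/^x{2,3}y?$/, 'xxxxy',       0 ],  # four x's exceed {2,3}
);

my $failures = 0;
for my $c (@cases) {
    my ($re, $str, $want) = @$c;
    my $got = $str =~ $re ? 1 : 0;
    printf "%-8s %-22s %s\n", ($got == $want ? 'ok' : 'MISMATCH'), "$re", $str;
    $failures++ if $got != $want;
}
```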

      I've often thought that the ease with which regular expressions can be accessed within Perl was a blessing and a curse. So many people like yourself seem to think that Perl championed regular expressions, when in fact it just followed AWK's lead in integration between C and Ken Thompson's regular expression implementation (which in turn inspired the version that was written from scratch by Henry Spencer and used by Larry for Perl).

      If you have a new syntax in mind, I suggest introducing it and seeing how it does. Modern regular expressions are an incremental improvement on classical set notations, and have served us well to date, but I'm sure someday someone will see a better way.
  2. Grammar by dprust · · Score: 4, Insightful

    It is good to see PERL focussing on what makes it great. There is no other language, IMHO, that handles text input as well as PERL does. Adding this level of processing just makes it even more powerful.

    1. Re:Grammar by Mr.+Muskrat · · Score: 5, Informative

      Perl never was an acronym. It's a backronym.

      Perl is the language and perl is the interpreter. Remember, "only perl can parse Perl" and it's easy to remember.

  3. hrm... by Anonymous Coward · · Score: 5, Funny

    "...zztop-wants-a-perl-necklace dept."

    i do not think that means what you think it means.

  4. Re:Big problem by WWWWolf · · Score: 3, Informative

    The idea of :p5 is not just that you can take Perl 5 code and modify it to make it work.

    The idea is that if you don't bother to write a zillion-rule grammar to match whatever you're trying to match, you can still use the P5-style regular expressions you know and love. It's another case of Not Swatting A Fly With The Nuke.

  5. Why would a satisfied Perl5 user migrate? by winkydink · · Score: 4, Insightful
    What does Perl6 offer a satisfied Perl5 user? Is it faster? Smaller?

    To this user, the last several releases (5.x) have looked more like opportunities for continuing royalty streams for perl authors (new versions of old books) than significant releases.

    --

    "I'd rather be a lightning rod than a seismometer." -Ken Kesey

    1. Re:Why would a satisfied Perl5 user migrate? by Speare · · Score: 3, Informative

      From what I've seen, it's more amenable to modular libraries and structured design. As for basic scripting where you may not even use a "package" statement, you probably won't care.

      --
      [ .sig file not found ]
    2. Re:Why would a satisfied Perl5 user migrate? by WWWWolf · · Score: 5, Interesting

      Yeah, Perl 5 hasn't changed that much over time. But it has been around for a while. Perl 6 is just different.

      From what I have seen from the announcements, the Perl 6 syntax looks far cleaner, probably more consistent and less ugly. Some of the new tricks look genuinely handy. For example, if it seems like type checking would be a good idea, you can have it if you want it, even at compile time!

      Especially the regular expressions side seems pretty interesting, as noted in this article. Regular expressions have always been a poor but effective replacement for grammar-based parsing, and now finally Perl is going to have both integrated. There's probably going to be less whining about line noise.

      And then there's something that I find especially interesting, though it hasn't been explained in detail yet: Complete tuning of the object system. In case you haven't noticed, Perl 5's object system is a complete and utter mess that looks and smells like it has been added as an afterthought, and rest assured it's going to be changed radically for the better in Perl 6. I'm definitely waiting eagerly to see what Perl 6's take is going to look like - I sure hope it's something like Ruby, only it smells like a camel =)

    3. Re:Why would a satisfied Perl5 user migrate? by jonadab · · Score: 3, Interesting

      > What does Perl6 offer a satisfied Perl5 user? Is it faster? Smaller?

      It features better support for key paradigms, including object-oriented
      programming (finally, a real object model), functional programming (we're
      getting continuations), and even some improvements for contextual programming.
      In other words, Perl6 will be a substitute not just for Perl5 but also for
      Scheme and Smalltalk.

      Also, the whole Parrot thingydoo is going to allow software written in one
      language to seamlessly use libraries written in another language, without all
      the ugly messing around you have to do to accomplish that in Perl5. You'll
      be able to construct a complete data structure in Perl code and pass it to
      a library written in Python, for example.

      Read the Apocalypse articles twice. The first time you'll recoil in utter
      horror. (I did.) Then read them again, and you'll be very excited. I am.
      The bummer is that we're still a while off from the release of 6.0.

      --
      Cut that out, or I will ship you to Norilsk in a box.
  6. Re:Big problem by Speare · · Score: 5, Informative
    Um, ALL PERL CODE IS TREATED AS PERL5 CODE unless you use a specific Perl 6 keyword in your script. Perl 6 interpreters will not require you to modify your Perl 5 scripts AT ALL.

    Therefore, only Perl 6 scripts that want to use Perl 5 regular expression syntax need the :p5 modifier.

    Don't get your knickers in a bunch.

    --
    [ .sig file not found ]
  7. aw hell by The+Unabageler · · Score: 5, Funny

    I'm going to have to rewrite my sig.

    --
    perl -e '$_="\007/4`\cp%2,".chr(127);s/./"\"\\c$&\""/gees; print'
  8. Re:Ok, start the flame wars under this post by TheFlyingGoat · · Score: 4, Insightful

    Those of us that use Perl as more than just system duct-tape know it's a programming language. Perl 6 will make that even more clear by being based on OO fundamentals rather than being a procedural language with OO tacked on top of it. This is just another debate that makes the OS community look like a bunch of freaks and zealots... just like the GNU/Linux thing. Get over it and start focusing on what the software does, not how to classify/name it.

    --
    You have enemies? Good. That means you've stood up for something, sometime in your life. --Winston Churchill
  9. wow, looks like boost::spirit by Sebastopol · · Score: 3, Interesting

    I'm surprised by the regex grammar. It looks a lot like how I use boost::spirit::rule for parsing regex in C++:

    Perl6---

    # note this is just a language example, not an accurate name matcher
    grammar Names
    {
        rule name :w { };
        rule singlename { + };
    };

    C++::boost::spirit---
    // rule for parsing a token string
    rule split = *(*space_p >>
                   (+graph_p)[append(tok)] >>
                   *space_p);

    msg "Parsing input\n"; // 1. Parse declarations
    while (!header_ok && getline(input, line) && input.good())
    {
        tok.clear();
        parse(line.c_str(), split);

    There are even grammar classes in Spirit.

    I sure hope perl6 is faster! ;-)

    --
    https://www.accountkiller.com/removal-requested
    1. Re:wow, looks like boost::spirit by TimToady · · Score: 3, Informative

      The intent is that grammars default to recursive descent, but that it be possible to ask for various kinds of optimizations via pragma. The grammar for parsing Perl 6 itself will be a hybrid between top-down and bottom-up techniques to maximize both speed and flexibility.
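      For a feel of recursive-descent rules without Perl 6: Perl 5.10 and later can already define named, mutually recursive subpatterns inside a single regex with (?(DEFINE)...) and call them with (?&name). This is a minimal sketch under that assumption; the list/item grammar is hypothetical, and the result is an ordinary backtracking regex, not the Perl 6 grammar engine or any of the optimizations described above.

```perl
use strict;
use warnings;

# Hypothetical grammar: a list is '(' items ')', an item is a word or a
# nested list, and items are separated by whitespace.
my $list_re = qr{
    \A (?&list) \z
    (?(DEFINE)
        (?<list>  \( \s* (?: (?&item) (?: \s+ (?&item) )* )? \s* \) )
        (?<item>  \w+ | (?&list) )
    )
}x;

my %tests = (
    '(a b (c d) e)' => 1,
    '((x))'         => 1,
    '()'            => 1,
    '(a (b c)'      => 0,   # unbalanced
    'a b'           => 0,   # not wrapped in parens
);

my %got;
for my $s (sort keys %tests) {
    $got{$s} = $s =~ $list_re ? 1 : 0;
    print "$s => $got{$s}\n";
}
```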

    2. Re:wow, looks like boost::spirit by ajs · · Score: 3, Interesting

      Perl 6 will probably not be faster than boost, but keep in mind that you also gain the power of a fully dynamic programming language in Perl 6's rules. Rules act as closures and can also contain Perl 6 code. Hypothetical variables are really going to blow people's minds (I know they took me a while to grasp, and when I did, I just sat around saying "wow" for a while :-)...)

  10. Re:Ok, start the flame wars under this post by Anonymous Coward · · Score: 3, Interesting

    I've got a fully multithreaded perl script running under Win32. It wasn't too bad to write, but some parts sucked. One thing that sucked was that Win32 doesn't support alarm() calls, so you have to manually poll sockets (and I hate that shit) or use vec(), and that's just insane (how many people understand how vec() works anyway???). The other big thing that sucked was the crappy mechanism for sharing complex data structures between threads. All's hunky-dory if you're just sharing a scalar variable, but don't try a HoH (hash of hashes) or anything like that, 'cause you'll find out that there's no autovivification, and you have to manually create the data structure in every thread that needs to access it, and that right there leads to lots of possibilities for "interesting" bugs to show up (it's not very fun to code or pretty to look at or understand once it's done).
    Anyway, none of that is really here nor there; I just wanted to point out that Perl "scripting" isn't just CGI forms and stuff like that. You can really do complex applications in Perl. It's a full-featured language, portable (more so than even Java; just look at the list of architectures in the Configure.sh script), and able to handle most tasks that don't require a tiny memory footprint or direct CPU register manipulation.

  11. Adoption by base_chakra · · Score: 4, Interesting
    Years ago, Eric Raymond wrote:
    "Perl XS is acknowledged to be a nasty mess. My guess is the Perl guys would drop it like a hot rock for our [Python's] stuff -- that would be as clear a win for them as co-opting Perl-style regexps was for us." [emphasis added]
    Maybe I misinterpreted ESR's intended message, but it would be disappointing if hypercompetition prevented Perl's already-influential regex extensions from exerting a positive influence on other platforms. Raymond seems to imply that the Python team only grudgingly included support for Perl-style regex. I understand that development teams in similar niches each want to make a big splash in the industry; hopefully Python's great increase in popularity has softened the survivalist attitude that seems to characterize this Raymond quote from Python-Dev. Evolving regex can benefit everyone.

    Note to those ready to mod me Troll/Flamebait: I'm not trying to pick on Python, I just happened to be acquainted with this candid quote.
    1. Re:Adoption by Black+Perl · · Score: 4, Insightful

      Yes, you did misinterpret the message. Eric Raymond is a former Perl programmer and now a Python programmer. He was saying that Python's native-code-binding facility is superior to Perl's XS, and it would benefit Perl to adopt it. He mentions that Python benefited from adopting Perl's regex syntax. Nowhere does he say or imply it was "grudgingly" done.

      By the way, not long after he wrote that, Perl coders started using the Inline:: modules, such as Inline::C, instead of XS; they are very easy to use. I do not know if this was an adoption of Python's technique, but I don't think so.

      --
      bp
    2. Re:Adoption by kavau · · Score: 3, Informative
      I don't know the context of the quote, but to me it reads more like this: "Python benefited greatly from adopting Perl technology in the past. I hope the Perl guys will be as open-minded as we are."

      Not much hypercompetition there, if you ask me. But then, it might as well be me who misunderstood the quote.

  12. Complex grammars in Perl by imnoteddy · · Score: 4, Insightful
    I can understand a desire for adding grammars that are more powerful than regular expressions in Perl 6 but it opens up a whole new can of worms.

    The grammars appear to describe what are called "context-free languages", generated by context-free grammars (CFGs). Some CFGs are ambiguous in the sense that a given "sentence" can be derived in more than one way (it has more than one parse tree). Traditional tools such as yacc/bison tell you where there is ambiguity in your rules; even then it isn't always easy to remove the ambiguity (trust me on this). If the Perl 6 system doesn't help the programmer debug the grammar, he/she will not be happy when the parsing doesn't work as expected.

    In addition, the article ends the description of features with "And much more...". It appears that Perl 6 grammars are more powerful than CFGs. If they can simulate a Turing machine...
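    To make the ambiguity point concrete, here is a small Perl sketch using a hypothetical ambiguous grammar, E -> E '-' E | NUMBER. It brute-forces every parse tree of 8 - 4 - 2 and finds two: (8 - 4) - 2 = 2 and 8 - (4 - 2) = 6. This is not how yacc/bison work; for this grammar they would report a shift/reduce conflict, which a declaration such as %left '-' resolves in favor of the left-associative reading.

```perl
use strict;
use warnings;

# Hypothetical ambiguous grammar:  E -> E '-' E | NUMBER
# Each choice of which '-' is applied last gives a different parse tree,
# so we recurse over every split point and evaluate each tree.
sub all_values {
    my @tok = @_;                          # e.g. (8, '-', 4, '-', 2)
    return ($tok[0]) if @tok == 1;         # E -> NUMBER
    my @values;
    for (my $i = 1; $i < @tok; $i += 2) {  # positions of the '-' tokens
        my @left  = all_values(@tok[0 .. $i - 1]);
        my @right = all_values(@tok[$i + 1 .. $#tok]);
        for my $l (@left) {
            for my $r (@right) {
                push @values, $l - $r;     # E -> E '-' E
            }
        }
    }
    return @values;
}

my @vals = all_values(8, '-', 4, '-', 2);
print scalar(@vals), " parse trees, values: @vals\n";
```

    With n subtractions the number of trees follows the Catalan numbers (1, 2, 5, 14, ...), which is why an ambiguous rule set can quietly produce very different results.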

    --
    No electrons were harmed creating this post, though some may have been subjected to electrical and/or magnetic fields.
  13. Re:Big problem by Zaak · · Score: 5, Informative
    Meaning that it is not backward compatible without modifying your source code.

    Thus spake Larry Wall in Apocalypse 5:
    ...we took several large steps in Perl 5 to enhance regex capabilities. We took one large step forwards with the /x option, which allowed whitespace between regex tokens. But we also took several large steps sideways with the (?...) extension syntax. I call them steps sideways, but they were simultaneously steps forward in terms of functionality and steps backwards in terms of readability. At the time, I rationalized it all in the name of backward compatibility, and perhaps that approach was correct for that time and place. It's not correct now, since the Perl 6 approach is to break everything that needs breaking all at once.

    And unfortunately, there's a lot of regex culture that needs breaking.

    And from Apocalypse 1:
    It would be rather bad to suddenly give working code a brand new set of semantics. The answer, I believe, is that it has to be impossible by definition to accidentally feed Perl 5 code to Perl 6. That is, Perl 6 must assume it is being fed Perl 5 code until it knows otherwise.

    In other words, it is backwards compatible, it isn't backwards compatible, and when you install Perl 6, you are installing both.

    TTFN
  14. You're a language bigot by Mr.+Muskrat · · Score: 3, Interesting
    Sounds to me like you prefer PHP and therefore spent more time perfecting your understanding of it. If you know and understand a language (any language) your work will require less time and will (surprise, surprise) be easier for you.

    (I'm a recovering language bigot so I can say this. :-P )

  15. Pet Project by orlyonok · · Score: 3, Funny

    I'm seriously studying the possibility of tackling a worthy coding project, the rewriting of the entire LINUX kernel in a language very much but not unlike C, and was considering doing it in C-INTERCAL, but after seeing things like this http://ozonehouse.com/mark/blog/code/PeriodicTable.html , I changed my mind and will use PERL 6 instead.

    --
    And I have prayed unto You, O Lord U**X in the time of the Will of Linux.
  16. Re:An anecdote and an opinion. by adamruck · · Score: 4, Insightful

    I think you went about things the wrong way. Why would you ever look at the nitty-gritty syntax rules first when trying to learn a language? First do some simple examples to get the general feel of the language. Then learn the nitty-gritty stuff as required.

    "IMO, "the right job" for perl is about 2% of all programming tasks out there."

    76 percent of all statistics... You get the point. You really don't have any valid point here; every language is designed to do certain things, and people will use it for those things and more. Trying to say what's the best language out there is stupid. Trying to say what percent of projects perl should be used on is also stupid.

    "It can accomplish this, but not without the reader having to go through the mental gyrations of what could be best called linguistic decompression."

    Have you tried to program in a logic language lately? Have you tried to program in a functional language lately? Have you tried to program in anything but your standard imperative/OO language lately? There are tons of styles of languages, and each one requires its own linguistic decompression. Which one feels more natural is a matter of opinion.

    --
    Selling software wont make you money, selling a service will.
  17. Re:An anecdote and an opinion. by MrBoombasticfantasti · · Score: 3, Interesting
    I tried to absorb the syntax docs one afternoon, but it gave me nightmares. [...] Ever since I've been haunted by perverse unreadability of it all.

    When I started to learn Perl (coming from a C background) I had quite a different experience. I really felt I had "come home", or something like that. Sure, you can write obscure code, but that's no different from C. But you don't have to, it can be very clear.

    I'll give credit to the fact that perl is compact, terse, to the point and has a reputation for string manipulation.

    I just love it for the short development times, and the fact that you can really use it for just about any environment. Want to do CGI? Sure! Just GUI? No problem! Connect to about every database there is, that's no biggie.

    And what about CPAN? That's a part of Perl too! You get all that ready-to-run code for just about any problem domain.

    IMO, "the right job" for perl is about 2% of all programming tasks out there.

    Maybe you are right, but somehow I get a lot of those 2% jobs... ;-)

    Now you have to excuse me, I have some perl coding to do! ;-)

    --
    !ERR: Signature not found.
  18. I can't remember all that! by warrax_666 · · Score: 4, Funny

    ... so when I need a webserver, I just

    $ cat /dev/urandom | perl

    It usually works in 3 tries or less.

    --
    HAND.
  19. Let me guess.... the usual Perl backlash by Anonymous Coward · · Score: 5, Interesting
    I get sick of the 'standard' backlash every time a Perl article is posted. Why do people have such a problem with Perl? It's an excellent, high-level general purpose programming language with a huge range of extension modules available. I have personally used Perl for many projects, as have TicketMaster, ValueClick, Morgan Stanley and Ryanair, and I've also learnt a lot about software engineering and computing through Perl.

    Yes, it does include a lot of symbols, but there is payback to learning them, and really most programs won't use much beyond $ % # () [] {}. Unlike some languages, Perl is not what I would describe as a 'bondage' language. If you want to program sloppy, you can program sloppy. That's fine by Perl. And this generosity is what gives Perl its bad reputation. This is funny since I and most knowledgeable Perl programmers can write perfectly clear and maintainable code. The way we do this is no secret--it's just by commenting appropriately, using meaningful identifier names and following the Perl style guidelines.

    People can mock Perl all they like, but it is still a widely used powerful programming language and I am more productive in it than any other language. As a parting comment, a Cisco employee once told me (off the record of course!) that "Cisco would fall apart without Perl".

  20. Re:An anecdote and an opinion. by runderwo · · Score: 3, Insightful
    Perl is a language, so it follows that it is a communication medium. By that it should be able to communicate something to a party outside just the author and the perl interpreter.
    Perl source does communicate, with people who know Perl. That's like saying English is a useless language because it is constructed ad-hoc and because the complainer has never been bothered to learn it. The fact that some people find English difficult makes English no less useful to people who most easily express or comprehend ideas in it.
    IMO, "the right job" for perl is about 2% of all programming tasks out there.
    Nice statistic. Where's your breakdown of all programming tasks, and the reasoning for the other 98% why Perl is not the right tool for the job?
    This is evident by the fact that even though perl was the prominent CGI language of the mid-nineties, it lost the overwhelming majority of that interest with alarming speed.
    That has nothing to do with Perl the language, and everything to do with the shift towards languages which are designed to execute within a web server process without forking. mod_perl fills this hole, but as a general purpose language it is not as tightly integrated with a web server environment as something like PHP or ASP.
  21. Re:Big problem by ajs · · Score: 3, Informative
    As others have pointed out, Perl 6 interpreters (at least the default one that is Parrot-based) will hand your code off to Ponie or something like it by default. You will have to start your program with the module keyword or the use 6 statement to force Perl 6 behavior, or use a special binary (e.g. something like /usr/bin/perl6).

    The :p5 modifier is not there for backward compatibility so much as to allow the programmer to choose the model of regular expression to use. There are trade-offs. Here are two Perl 5 regular expressions:
    m{[a-z][A-Z]+}
    m{^(?:\w+\d|\S+(?:\'s)?)$}
    which are written in Perl 6:
    m{<[a-z]><[A-Z]>+}
    m{^[\w+\d|\S+[\'s]?]$ }
    Note that Perl 5 syntax is actually a bit nicer for the first one, so you can continue to use Perl 5 syntax there. In the second case, the new bracket-operator is very handy for enclosing sub-expressions that don't have to be remembered in the positional variables (the same as the Perl 5 (?:...) operator). You can even mix them:
    $r1 = rx:p5{[a-z][A-Z]+};
    $r2 = rx{[\w+\d|\S+[\'s]?]};
    $r3 = rx{^[<$r1>|<$r2>]$};
    Perl 6 is about making the things that you're going to need to do the most often much easier and much more supportable in very large projects. Relax and enjoy it, it's going to be a great ride.
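    For comparison, here is a Perl 5 sketch of the same composition, assuming the patterns from the post above: qr// objects interpolated into a larger pattern play roughly the role of Perl 6's <$r1>, and (?:...) stands in for the Perl 6 [...] grouping. The sample strings are made up.

```perl
use strict;
use warnings;

# Perl 5 counterparts of $r1/$r2/$r3 above; (?:...) is the non-capturing
# group that Perl 6 writes as [...].
my $r1 = qr/[a-z][A-Z]+/;
my $r2 = qr/(?:\w+\d|\S+(?:'s)?)/;
my $r3 = qr/^(?:$r1|$r2)$/;   # interpolating qr// objects composes them

my %match;
for my $s ('aBC', "Bob's", 'x7', 'two words') {
    $match{$s} = $s =~ $r3 ? 1 : 0;
    print "$s: $match{$s}\n";
}
```

    The Perl 5 version works, but every grouping needs the (?:...) noise, which is exactly the readability cost the Perl 6 bracket syntax is meant to remove.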
  22. Re:so many for so long by Anonymous Coward · · Score: 3, Funny

    And once you get used to RUBY, you'll never go BACK to PERL. You'll ALSO stop spelling random words that aren't acronyms all in caps, like Perl, Ruby, Java and Unix.