Slashdot Mirror


Regex Golf, xkcd, and Peter Norvig

mikejuk writes "A recent xkcd strip has started some deep academic thinking. When AI expert Peter Norvig gets involved you know the algorithms are going to fly. Code Golf is a reasonably well known sport of trying to write an algorithm in the shortest possible code. Regex Golf is similar, but in general the aim is to create a regular expression that accepts the strings in one list and rejects the strings in a second list. This started Peter Norvig, the well-known computer scientist and director of research at Google, thinking about the problem. Is it possible to write a program that would create a regular expression to solve the xkcd problem? The result is an NP hard problem that needs AI-like techniques to get an approximate answer. To find out more, read the complete description, including Python code, on Peter Norvig's blog. It ends with this challenge: 'I hope you found this interesting, and perhaps you can find ways to improve my algorithm, or more interesting lists to apply it to. I found it was fun to play with, and I hope this page gives you an idea of how to address problems like this.'"

172 comments

  1. FWIW, the Regex Golf game by Amorymeltzer · · Score: 5, Interesting

    http://regex.alf.nu/

    Some favor trickiness, some favor just listing possibilities, but it's fun. I'm at 3651.

    --
    I live in constant fear of the Coming of the Red Spiders.
    1. Re:FWIW, the Regex Golf game by Garridan · · Score: 3, Informative

      Or, if you want to try your hand at meta regex golf there's a place for that, too.

    2. Re:FWIW, the Regex Golf game by Anonymous Coward · · Score: 0

      Or just search for the github page and you'll easily hit 4061. ;-)

    3. Re:FWIW, the Regex Golf game by schitso · · Score: 1

      Why would you do this? I was hoping to actually have a productive day.

    4. Re:FWIW, the Regex Golf game by hugetoon · · Score: 1

      Yep, lots of fun, thank You.

      Got 3912 but I'm really confused by the "Alphabetical" level , I could craft a regex to discriminate both set that scores at 262, however I suppose it is considered cheating since I do not implement any underlying "principle" here because I cant recognize it. Any clues of what was the intent of this level?

  2. Is there a regex... by Hognoxious · · Score: 3, Funny

    ... that can find frost psits?

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    1. Re:Is there a regex... by syockit · · Score: 1

      Now I know where Frosty Piss's username came from.

      --
      Democracy is for the people; you only vote once per season and we'll do the rest of the work for you don't have to.
  3. Meta-meta-meta-meta.. by Travis+Mansbridge · · Score: 2

    Now build a robot that can design a program to write a regex expression to solve the xkcd problem.

  4. The Motion Picture by Anonymous Coward · · Score: 0

    Why is The Motion Picture not considered the subtitle for the first Star Trek movie?

    1. Re:The Motion Picture by lgw · · Score: 3, Funny

      We fans know the first one is really "Star Trek: The Tone Poem".

      --
      Socialism: a lie told by totalitarians and believed by fools.
    2. Re:The Motion Picture by Anonymous Coward · · Score: 0

      I agree that it was awful, but I don't know why Norvig left it out.

    3. Re:The Motion Picture by JigJag · · Score: 1

      The Motion Picture does not pass the regex: /M | [TN]|B/
      1) no word ends with 'M'
      2) no word starting with 'T' or 'N' is preceded with a space
      3) no B at all

      So, I venture that "The Motion Picture" *is* considered the subtitle for the first Star Trek movie.

      --
      "The hallmark of humanity is the ability to move beyond sensory inputs" - Mary Helen Immordino-Yang
  5. RegExps by ledow · · Score: 5, Interesting

    Regexp's are a programming language unto themselves.

    I'm currently doing some temp IT work for schools while my promised job becomes available and it's eye-opening. The web-filtering is all reg-exp based but nobody understands how it works.

    They just copy/paste an example and change the parts of the URL that they can see to match the one they want. They barely bother to test the impact, past the site they need becoming "unfiltered" or "filtered" as necessary (i.e. no implication of knock-on effects on other sites with similar names). Let's not even mention the use of "." without the escape character for them to mean a literal period (but, obviously, it means "any character" in a regexp).

    I talked to them about changing their template regexp because, from the start, I could see that it wasn't really up to the job and just met if not opposition then at least apathy about the problem.

    Until someone brought an iPad into the helpdesk where a site that was supposed to be unfiltered was filtered - because nobody had considered what happens if you use "http://example.com" instead of "http://www.example.com". I was the one to spot it, and tell them that it's because their regexp was very basic.

    The good thing was, the other tech on the team was young and keen to learn and I was able to give them a quick rundown of regexps and we crafted an alternative template for them to use that would take account of the situation without, for instance, the blocking of "microsoft.com" affecting "antimicrosoft.com".

    But it is amazing how many people I know that work in IT have no idea how to program, no idea how to handle regexps, and just work on a "copy a working example" basis.

    1. Re:RegExps by Hognoxious · · Score: 3, Funny

      Regexp's are a programming language unto themselves.

      It would appear that apostrophes are too.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    2. Re:RegExps by Anonymous Coward · · Score: 4, Insightful

      ...and then there's the people who think they understand regexes, but try to use them to decide context-free grammars.

    3. Re:RegExps by simplypeachy · · Score: 1

      Just wait till you play with Privoxy. It uses different sorts of regex for different parts of the URL pattern. You then have two additional problems, rather than just one ;-)

    4. Re:RegExps by Anonymous Coward · · Score: 5, Funny

      But it is amazing how many people I know that work in IT have no idea how to program, no idea how to handle regexps, and just work on a "copy a working example" basis.

      You will be truly amazed by the number of people who copy a not-working example...

    5. Re:RegExps by Anonymous Coward · · Score: 2, Informative

      If regular expressions are a programming language, then they are not a very powerful one. The languages they recognize are the so-called regular languages, which are the least expressive category in the Chomsky hierarchy of languages (note the difference between the language of regular expressions and the languages recognized by regular expressions). See the Wikipedia articles on regular languages and the Chomsky hierarchy for details.

    6. Re:RegExps by Anonymous Coward · · Score: 0

      But it is amazing how many people I know that work in IT have no idea how to program, no idea how to handle regexps, and just work on a "copy a working example" basis.

      I'm not. Regular expressions are often too hard to understand, too hard to create correctly without consulting a manual all the time, and there's no requirement for someone to program if administering servers. There's a REASON why we've got so many IT people around - it's relatively easy for any typical computer geek to become one. But if it required skills that you find amazing that people lack, it's be a very unpopular profession indeed.

    7. Re: RegExps by LordSnooty · · Score: 2

      I don't know, apostrophes can be used to indicate missing letters and it helps to highlight that regexp isn't really a word (yet).

    8. Re: RegExps by Anonymous Coward · · Score: 2, Insightful

      Calling backslashes "whacks" disqualifies you from being involved in such discussions.

    9. Re:RegExps by Anonymous Coward · · Score: 1

      But it is amazing how many people I know that work in IT have no idea how to program

      Is it? Programmers routinely make 30% more. Why should an IT person, with their entire forte of knowledge and skills, ever need to take on someone else's and even if they did, why would they keep their low salary.

      It is akin to an auto mechanic that knows how to use a CNC machine to make replacement parts. Sure it would be convenient to hire a $10/hr mechanic that can do $40/hr G-code, but what is in it for the mechanic?

    10. Re:RegExps by Anonymous Coward · · Score: 1

      This is pretty common. It is not some new, strange thing only you have seen:

      http://en.wikipedia.org/wiki/Cargo_cult_programming

      People go through life with blinders on, because life is complicated and our minds try to ignore things that it figures aren't important. A lot of people think details of one sort or another aren't complicated or relevant, then complain when things don't work for some reason. That's why empirical reasoning and the scientific method is cool; it works against this sort of logical fallacy.

    11. Re:RegExps by Anonymous Coward · · Score: 0

      Why should an IT person, with their entire forte of knowledge and skills, ever need to take on someone else's [job] and even if they did, why would they keep their low salary.

      It is akin to an auto mechanic that knows how to use a CNC machine to make replacement parts. Sure it would be convenient to hire a $10/hr mechanic that can do $40/hr G-code, but what is in it for the mechanic?

      Few career programmers are interested in switching down to "general" IT positions, but you imply the converse is only a problem because of salaries.

      You put it as if having programming-enhancements for the IT tech, and CNC-enhancements for the auto-mechanic "knowledge and skills" are all we need to land the programming job or G-code position. In tech, real programmers find walls for jobs they rightfully match with factual resume experience. Just how much more easily is their wall to climb by aspiring job-switchers that come from under-rated skillsets and bring no day-to-day experience?

    12. Re:RegExps by Anonymous Coward · · Score: 0

      Programming languages do not have to be Turing complete.

    13. Re:RegExps by Anonymous Coward · · Score: 0

      Shame about the downmod. That one made me laugh out loud.

    14. Re:RegExps by Redmancometh · · Score: 2

      A $40/hr G-code monkey would be underpaid.

    15. Re:RegExps by Anonymous Coward · · Score: 0

      the so-called "regular expressions" which are actually used are typically extensions, sometimes fairly radical ones, of theoretical regular languages (which are worthy of their own interest, of course).

    16. Re:RegExps by Anonymous Coward · · Score: 0

      But it is amazing how many people I know that work in IT have no idea how to program, no idea how to handle regexps, and just work on a "copy a working example" basis.

      Not really. A lot of people like regexps because of how compact they are.
      The problem is that they aren't efficient nor readable. They may have their uses but if you can achieve the same result in a screens worth of C-code it is preferable.
      Compact one-liners are great as a hobby but doesn't belong in code that is supposed to be maintained.

      Admittedly I might just be burned by people who insists on using regexp for all text-matching, regardless of Chomsky hierarchy. (Yes, including html and xml.)

    17. Re:RegExps by atomicxblue · · Score: 1

      This is the internet -- very little surprises me anymore...

    18. Re:RegExps by Daniel+Hoffmann · · Score: 1

      "Regexp's are a programming language unto themselves."

      Wrong. I am a little rusty in formal language, but I am pretty sure that regular expressions are at least two steps below (normal) computer programming languages. I remember there being state machines with stack and turing machines above regular expressions.

      Also this is supposed to be a funny post, not informative ok? You know because I am taking what he is saying literally and without context.

    19. Re:RegExps by Anonymous Coward · · Score: 0

      Is it? Programmers routinely make 30% more. Why should an IT person, with their entire forte of knowledge and skills, ever need to take on someone else's and even if they did, why would they keep their low salary.

      Obviously, such an IT person might be able to get a better salary, for being "an IT person also able to aply some programming do do his work better."

      It is akin to an auto mechanic that knows how to use a CNC machine to make replacement parts. Sure it would be convenient to hire a $10/hr mechanic that can do $40/hr G-code, but what is in it for the mechanic?

      Even easier. Mechanics charge for 'work' and for 'parts'. If he can make parts cheaper than buying them from a car manufacturer, he pockets the difference. Car parts from alternative sources is already an industry of its own.

      Stop trying to fit people into too few categories. You have mechanics, you have skilled cnc operators, but also mechanics capable of doing the simpler cnc jobs. Just as some IT people can do some programming that most other IT people can't.

    20. Re:RegExps by Keybounce · · Score: 1

      You will be truly amazed by the number of people who copy a not-working example...

      You mean when Microsoft distributes a sample driver with a bug in it?

    21. Re: RegExps by Hognoxious · · Score: 1

      I don't know

      Indeed.

      apostrophes can be used to indicate missing letters

      So why isn't there one after reg?

      (If you don't know, ask the boatswain. He was on the forecastle last time I saw him).

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  6. ioccc 2013 US president matching code by hankwang · · Score: 4, Interesting

    The International Obfuscated C Code contest had a winning entry that could flag the names of US presidents as republican or democrat.

    main(int riguing,char**acters){puts(1[acters-~!(*(int*)1[acters]%4796%275%riguing)]);}

    Quoting: "This one-line C program accepts as a first command-line argument the last name of any of the last 31 US Presidents (from Franklin Pierce onwards), in lower case, and prints out their political affiliation. Use "republican" as the 2nd command-line argument, and "democrat" as the 3rd (or equivalent strings of your choice)."

    De-obfuscated, it is a boolean expression acting on a string s,

    (*(int*)s)%4796%275%4

    I wonder whether you can make a regexp that is shorter than this and accomplishes the same thing.

    1. Re:ioccc 2013 US president matching code by hankwang · · Score: 2
      Pardon, I forgot the "not" operator. De-obfuscated, it is a boolean expression acting on a string s,

      !((*(int*)s)%4796%275%4)

    2. Re:ioccc 2013 US president matching code by Anonymous Coward · · Score: 0

      I wouldn't say that's de-obfuscated -- I still have no idea what it does

    3. Re:ioccc 2013 US president matching code by hankwang · · Score: 4, Informative

      It takes the first four bytes of the president's name, converts them to int; then applies four modulo operations (%4796 %275 %4). How the author figured out that those four operations would do the job, who knows? Maybe brute-forced the parameter space.

    4. Re:ioccc 2013 US president matching code by Anonymous Coward · · Score: 1

      main(int i, char **s){
          puts( s[ 1-~!(*(int*)s[1] % 4796 % 275 %i) ]);
      }

      because a[b] is the same as a+b is the same as b+a is the same as b[a].

      The "not" forces a boolean expression, which evaluates to 0 or 1. The bitwise not (~) flips the bits. For 1 the result is -2 (two's complement representation of -2 is 1111...1110). For 0 the result is -1 (two's complement...). So the index evaluates to 1-(-1)=2 or 1-(-2)=3 and the puts prints the second or third argument to the program, depending on the result of the boolean expression.

      For the boolean expression, s[1], the pointer to the first argument, is cast to an int pointer and dereferenced, so *(int*)s[1] is the first four bytes of the first argument interpreted as an int. Then there are 3 modulos. The last one, %i, is %4, because i is the argument count, including the name of the program as argument 0. If the result of this calculation is 0, then the boolean expression (before negation) is false, otherwise it's true.

    5. Re:ioccc 2013 US president matching code by Mes · · Score: 1

      Can someone explain, how are they using 1[..] as the name of an array? I tried making a simplified program and gcc kicked out the approriate error.

    6. Re:ioccc 2013 US president matching code by Anonymous Coward · · Score: 1

      In C, arrays are really just pointers to the first element, and array[index] is defined through pointer arithmetic: It just means *(array+index). Addition is commutative, so you could just as well write index[array], because that is defined as *(index+array), which is the same as *(array+index).

    7. Re:ioccc 2013 US president matching code by Anonymous Coward · · Score: 0

      LRN2<ecode>

    8. Re:ioccc 2013 US president matching code by Anonymous Coward · · Score: 0

      Sorry, forgot to dereference:
      because a[b] is the same as *(a+b) is the same as *(b+a) is the same as b[a].

    9. Re:ioccc 2013 US president matching code by fisted · · Score: 0

      In C, arrays are really just pointers to the first element

      No.

      , and array[index] is defined through pointer arithmetic: It just means *(array+index). Addition is commutative, so you could just as well write index[array], because that is defined as *(index+array), which is the same as *(array+index).

      Yes.

    10. Re:ioccc 2013 US president matching code by Anonymous Coward · · Score: 1

      Could you be less helpful?

    11. Re:ioccc 2013 US president matching code by Anonymous Coward · · Score: 0

      To you're no, care to explain?

      int array[32];

      and

      int* blah = &(array[0]);

      will both point to the exact same element in memory. If you do an equivalence operator on them, they'll be equal. There is, fundamentally, no difference. In fact, I can then later take

      *(int*)((int)blah+(3*sizeof(int)))

      and that would be the equivalent of

      array[3]

      In fact, if you ever look at the disassembly of some C code, you find that what it's doing isn't really far off of what I wrote in that rather nasty piece of code.

      Now, hopefully when I post this, all my code doesn't disappear.

    12. Re:ioccc 2013 US president matching code by Anonymous Coward · · Score: 0

      Could you be less helpful?

      Probably.

    13. Re:ioccc 2013 US president matching code by thogard · · Score: 2

      I think the subtletyâZ the objector had was that arrays and pointers are slightly different which is true In this context, an array is a pointer with potentially compiler allocated backing memory for the data while the pointer might not. A pointer will also have an address while the pointer used in array definitions won't have an address. Old compilers used to treat them identically but then again they used to treat pointers as integers as well. Modern compilers tend to know enough about the CPUs and have built in array checks that they do work slightly differently.

    14. Re:ioccc 2013 US president matching code by Anonymous Coward · · Score: 0

      Is the code supposed to be compiled for Intel or network byte order?

    15. Re:ioccc 2013 US president matching code by Anonymous Coward · · Score: 0

      According to the hint.text file, the program is written for a little endian CPU.

    16. Re:ioccc 2013 US president matching code by Anonymous Coward · · Score: 0

      No. Yes.

    17. Re:ioccc 2013 US president matching code by Anonymous Coward · · Score: 1

      Try sizeof array vs sizeof pointer. In most contexts arrays decay onto pointers but they are not pointers.

    18. Re:ioccc 2013 US president matching code by Anonymous Coward · · Score: 0

      Um... How does this code handle the input of 'roosevelt'? Depending on which president you are talking about, either party affiliation would be correct (or incorrect).

    19. Re:ioccc 2013 US president matching code by Anonymous Coward · · Score: 0

      RTFM

    20. Re:ioccc 2013 US president matching code by fisted · · Score: 1

      int array[32];
      and
      int* blah = &(array[0]);
      will both point to the exact same element in memory.

      ``array'' doesn't point anywhere, it's an object in memory of size (32*sizeof (int)) and of type int[32]. [yes, C does have objects.]
      ``blah'' is another object, of size (sizeof (int*)) and of type int*
       
      Obviously these two things are not remotely "the same" (this was the original claim(*)).

      With this out of the way:

      If you do an equivalence operator on them, they'll be equal.

      In fact (array == blah) will be true in this case, but this is only because there is syntactic sugar involved, and the actual comparison done is (&array[0] == blah).

      *(int*)((int)blah+(3*sizeof(int)))

      No, this is utter gibberish, and even if some implementation choses to compile it, there would be undefined behaviour. (Protip: you cannot cast pointer types to integer types, and no don't argue that your implementation allows it, or that "heey, pointers are just integers to on the metal" (protip-inside-protip: there is no metal in the C Abstract Machine) If you go that route, you only demonstrate that you have no idea what C is)

      (*) Some more reasons for why 'arrays and pointers are the same' is grossly wrong in C:

      1. ``but i can assign an array to a pointer!'' -- try assigning a pointer to an array.
      2. in fact, try assigning an array /at all/ (no, initialization is not assignment)
      3. an array can hold multiple elements, a pointer can hold a memory address
      4. in expression context, the array "value" is a pointer to its first element -- if both were the same, that rule would be fairly meaningless.
      5. ...

    21. Re:ioccc 2013 US president matching code by fisted · · Score: 1

      I just saw this grossly wrong claim and since it was AC's i didn't bother to come up with a detailed explanation on why exactly it is wrong, but I didn't want it to become +5 Informative either. A likely thing to happen it seems.

      Since some more people asked or complainedI did reply, dear AC

    22. Re:ioccc 2013 US president matching code by fisted · · Score: 1

      arrays and pointers are not 'slightly different'. I replied in detail somewhere else in the thread, if you care about details.

    23. Re:ioccc 2013 US president matching code by Anonymous Coward · · Score: 0

      the actual comparison done is (&array[0] == blah).

      False.

      you cannot cast pointer types to integer types

      False.

      what C is

      Spoilers available at 6.5.2.1 and 6.3.2.3.

    24. Re:ioccc 2013 US president matching code by fisted · · Score: 1

      Spoilers available at 6.5.2.1^W6.3.2.1#3

      (FTFY)
      I am fully aware of that subscripting in turn is syntactic sugar; I agree however that it wasn't the best way to put it. It was sufficient for the point I was trying to make, anyway.

       

      and 6.3.2.3.

      Well it explicitly says there that the results are implementation-defined -- i.e. outside the scope of what C defines.
      Kind of supporting my point, thanks.

    25. Re:ioccc 2013 US president matching code by Anonymous Coward · · Score: 0

      I am fully aware of that subscripting in turn is syntactic sugar; I agree however that it wasn't the best way to put it. It was sufficient for the point I was trying to make, anyway.

      We all forget to engage our brain sometimes.

      Well it explicitly says there that the results are implementation-defined -- i.e. outside the scope of what C defines.
      Kind of supporting my point, thanks.

      While the results might be outside the scope the claim was that you "cannot cast" which in this context means that a (strictly) conforming program shall not do that. Even then, the result is not completely useless. If intptr_t exists you can convert pointer to that and back and get the same pointer.
      Otherwise, it can be worked around with preprocessor and memcpy. The reason for using two paths instead of going always via memcpy route is outside the scope of standard, but let it be mentioned that for a non-strictly-conforming program "#error unimplemented" is a perfectly valid solution.

  7. Terrifying by simplypeachy · · Score: 1

    "Find it interesting"? I find it fucking terrifying. There's no way I'll be following that link. That's in bat country.

    1. Re:Terrifying by ysth · · Score: 1

      No, actually it is far from terrifying. Just some brute force generation and application of simple, short, regex fragments. No "AI-like", no "algorithms going to fly".

    2. Re:Terrifying by bunratty · · Score: 2

      The algorithm is an AI algorithm. It's using hueristics to attempt to minimize some function, as many AI algorithms do.

      --
      What a fool believes, he sees, no wise man has the power to reason away.
    3. Re:Terrifying by simplypeachy · · Score: 2

      Well now you've just ruined my terror. I'll have to go read up on the up-coming UK legal hilarity to restore it.

    4. Re:Terrifying by Anonymous Coward · · Score: 0

      That's just dressing it up to make it sound fancy though, when what it actually does is pretty simple.

    5. Re:Terrifying by atomicxblue · · Score: 1

      Sounds like someone's on their way to thereg..

    6. Re:Terrifying by HiThere · · Score: 1

      How about that. Most intelligence is just multiple layers of pretty simple stuff.

      If you don't believe that, why not? To me it seems quite likely.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
  8. Re:Regex this by Anonymous Coward · · Score: 2, Insightful

    That other people have hobbies different than you is going to be quite common. If you get this angry every time you run into someone with a different hobby then I fear for your long term mental and physical health.

    Take deep breaths. It is going to be ok.

  9. Missed the point by NonSequor · · Score: 1

    I'd have been extremely surprised if solving for the shortest regex golf pattern for a pair of lists weren't NP hard. And greedy approaches are fairly obvious.

    The point is, that's what makes it analogous to golf. The optimal solution is your hole in one. Some greedy algorithm solution is your par. Those aren't the interesting areas, they're just the end points.

    For those who couldn't work it out for the,selves, here are the rules of regex golf:

    How many characters you use in your regex is your score
    Lower is better
    There's a par calculated for each hole, and totaled for the course, which serves as a frame of reference for performance

    --
    My only political goal is to see to it that no political party achieves its goals.
  10. I know, right! by Anonymous Coward · · Score: 0

    Every single human should be working towards curing cancer and getting in to space and solving the mysterious universe that we live in!

    Fuck people that have fun and hobbies!

  11. Hard AI by Okian+Warrior · · Score: 4, Insightful

    Regex me a list of folks that have time to sit around fucking off their life-time in order to write a regex to work on the XKCD "problem", and folks that don't.

    The field of study known as "AI" has been stagnant, for about 50 years now. One of the field's many problems is the lack of a good definition for intelligence.

    Despite lacking a definition, we have working examples intelligent systems in the real world - humans.

    Humans are very good at partitioning sets by descriptive differences, and they discover these descriptive rules largely by themselves.

    We don't know what intelligence is yet, but if we keep looking at problems and trying to figure out the human approach, eventually we'll have enough contrasts and similarity to partition sets based on differences in intelligence.

    In other words, the more problems we solve, the more data we can use to formulate rules that define intelligence.

    That's a pretty important and useful goal.

    (And belaboring the obvious: If we had even simple AI constructs we could automate much of out work force, freeing us up for more leisurely pursuits. Whether this leads to a post-scarcity utopia or unemployment/welfare apocalypse depends on your political affiliation.)

    1. Re:Hard AI by Eythian · · Score: 2

      That's not true. The field hasn't been stagnant at all. The problem is you're missing most of the field in your assumption of what it is.

      AI is not all about making "computer programs that can think" (a.k.a "Strong AI"), it's more about creating systems that can adapt to their environment in order to improve their chance of success. If their environment is getting lists of names and they adjust, without the direct input of a person, to better match them according to some scoring system, that is AI.

      It can use statistical methods, evolutionary characteristics, or just adjusting variables according to how wrong they currently are in order to try to match some problem. I'm not even touching symbolic AI here, because I don't understand it nearly so well.

      Strong AI is part of the field sure, but certainly not all of it as you imply. It is a very large field.

    2. Re:Hard AI by BringsApples · · Score: 1

      One of the field's many problems is the lack of a good definition for intelligence.

      Well, I don't mean to sound like I know everything, I certainly don't. But in my mind, 'intelligence' is the ability to make sense of Nature (or another way to say this is to be able to sense Nature). Those that don't have a good definition for 'intelligence' more than likely are living in a way that is not in tune with Nature.

      They're probably dorking around with some regex, trying to sort a list of stupid from silly. Not saying that you, sir, are (as from the tone of your response, it appears that you are a sensible person). But From my original post, I should have guessed that it'd be flamebait, rather than troll.

      --
      Politics; n. : A religion whereby man is god.
    3. Re:Hard AI by Okian+Warrior · · Score: 1

      AI is not all about making "computer programs that can think" (a.k.a "Strong AI"), it's more about creating systems that can adapt to their environment in order to improve their chance of success.

      Take a few moments and see if you can come up with a definition of Intelligence - one that can be used as a metric to see whether a topic falls within the field of study.

      Abstract Algebra has a precise definition, so it's pretty easy to tell whether something falls within the realm of Abstract Algebra. There are corner cases and some overlap, but the field is pretty-well delineated.

      In contrast, AI is all over the map. Anything and everything can be put under the term, including lots of stuff that more rightly belongs in other areas.

      If you can come up with a good precise definition, you'll be the first.

    4. Re:Hard AI by Anonymous Coward · · Score: 0

      Or, to put it another way, "I am unaware of the research in this field, and therefore have decided that this field is not investigating the problems it purports to solve. Let me present how I think this field I know little of should change: ..."

      lol

    5. Re:Hard AI by Eythian · · Score: 1

      I gave a definition for AI already. In fact, you quoted it. It's not the most precise one, but I don't have to provide a precise one. All I was doing was demonstrating that your implied definition was far, far too narrow.

      There are many things that are considered AI that don't match the range you suggested for it. Genetic algorithms, neural networks, language learning systems, robots that take information from their environment, discard it, and drive into walls. They're never* going to think, but they are rightfully AI.

      It's not a requirement of a field that it has tight definitions, especially one where different methods that seem in a similar spirit pop up somewhat regularly.

      In essence, writing a computer program that uses heuristical and adaptive methods to generate regexes to match/not match specific lists could be a form of genetic programming (though there are other options) which is comfortably under the AI umbrella.

      * never say never, but with the scale they're on at the moment, it seems a long ways off.

    6. Re:Hard AI by bunratty · · Score: 1

      This is why Russell and Norvig sidestep the issue in their book. They provide the notion of "behaving rationally", which is performing well according to some performance metric within a given environment. So essentially any problem for which we cannot write a program that quickly finds the best answer leads to some programs which can do better than others. The ones that do better behave more rationally than the ones that do worse. There really isn't any intelligence in the way that you mean in these programs. A translation program doesn't understand the sentences it's translating any more than a chess-playing program understands that it's playing chess any more than a sorting program understands it's sorting things.

      --
      What a fool believes, he sees, no wise man has the power to reason away.
    7. Re:Hard AI by retchdog · · Score: 1

      Since you brought it up, I should point out an interesting continuum I've noticed.

      On one end, we have stoner fuck-ups who think everything is an amazing manifestation of Nature (or Jesus, or Buddha, or whatever the fuck) that they're constantly and uniquely (maybe along with a few of their friends) in touch with. They can't really ever describe why, or convince anyone of anything, but they know that they are in touch with something transcendent, and if other folks are wearing blinders, so what? This is perfectly fine, of course.

      On the other end, we have people who've earned contact with Nature. Scientists like Feynman or Schroedinger, who have through incredible effort and ability actually achieved contact with the real thing which so many people experience only as a dim shadow on a cave wall or looking glass. Almost literally demigods; the first thing any of them would tell you, is to beware of that delusion, that cosmic banality and self-obsession, which drives the stoners on the other end of the spectrum. Not that it's bad per se. In fact, it can be an amazing and beautiful motivation, but it can also lead one to surrender and give up that natural skepticism which is as important as that thrill of exploration.

      So far, so good.

      Now what's interesting is, we have people in between. People like, for instance, me; people who can work and struggle and hope and dream, but will never be a Feynman, something which is all the more painful because we don't want the fame or whatever. We just know that something exists that we don't fully understand, but relish the partial understanding we do have. We just want more of that, but in the end we can't do it. Some of us give up in middle school algebra or earlier; some of us in intro to physics; some of us in E&M or differential geometry or nonlinear optics or whatever the fuck; some of us drop out of an advanced degree; some of us might even earn a doctorate, but at the end of the day, it just won't happen.

      What do we do? We know how to think, but we just can't think quite broadly or deeply enough. We can see the beauty of what we can't achieve, and on the other hand we can achieve plenty, but a plenty which falls short. We could give up entirely, of course, in whatever way; sell ourselves for the most money, or raise a family and call that good enough. Another choice is to revel in what we can do; that same mind that can't even approach quantum chromodynamics can perform a complete analysis of what regular expression is necessary for some problem, or of whatever program can generate whatever regular expression is necessary, or of how to optimize a rendering routine to run on totally inadequate hardware, or any number of other things.

      Perhaps the most fortunate, are the ones who never realize that Nature is a thing to be named. They simply think, unaware of a greater context, with an ignorance they are unknowingly fortunate of, solving and reveling in exactly those problems to which they are suited.

      But when someone like you starts talking about the glories of Nature, and the short-sightedness of those people in the middle of the spectrum, and blah blah blah, I always ask myself: is this person a stoner fuck-up, or is this person an unrecognized Feynman? ...

      Well, which one are you?

      --
      "They were pure niggers." – Noam Chomsky
    8. Re:Hard AI by Anonymous Coward · · Score: 1

      Let's consider the expression "the definition of insanity is doing the same thing over and over again and expecting different results." Such behavior would generally be described as unintelligent, so something that's intelligent should not exhibit this behavior at all, if possible. Doing the same thing over and over again is basically the computer equivalent of an infinite loop. If a computer program were intelligent it should be able to determine if a computer program is looping endlessly. This is more commonly known as the halting problem, which is just one of several undecidable problems. Intelligence is being able to recognize that fact instead of endlessly grinding away in the same manner.

      Humans can generally spot this type of behavior and stop the program themselves. Some are a bit more clever and can construct proofs about why computers aren't able to do that. We may have to start building different types of computers if we want them to be intelligent.

    9. Re:Hard AI by Antonovich · · Score: 4, Interesting

      You're both right. The term AI originally had much broader scope because (computer) people hadn't tried to implement it yet and realised that it was going to be *very* hard to do. Now people seem to use the terms Strong AI or Artificial General Intelligence for trying to get the kind of non-domain-specific "intelligence" humans (are supposed to) have, and "simple AI" just means tech that can adapt within a highly restricted domain (like search, navigation, etc.). The problem, of course, is that until we get an even remotely satisfactory definition of what "human intelligence" is, we aren't going to get very far. Unfortunately, it turns out that attempting to specify exactly what intelligence is ends up involving linguists, psychologists, biologists (neuroscientists, etc.) and worst of all, philosophers. Basically, no one agrees on anything and even when there are areas of agreement, each person seems to have deal-breaker elements that no one else agrees on. It gets messy, and quick. Worse still, it turns out that developmental/cultural factors (so development of an individual in a particular socio-cultural context) radically colours what kinds of definitions of intelligence people come up with, and the sorts of solutions it makes sense to think about (representationism vs enaction/actional theories, etc.). For example, a lot of work has been done on the psychological and cultural/historical effects of literacy (and numeracy) and show that someone growing up in a society in the Western Tradition (this is a specific term) that learns to read/write with an alphabet will often strongly prefer a particular view of the higher mental functions (language, thinking, "intelligence", etc.)(see John Goody, Roy Harris, David R Olson, and many more). If we want to create the tech then we need to go beyond that but it's incredibly hard to go beyond your own intellectual baggage. Researchers like Pei Wang have a bit of an advantage coming from a non-Western Tradition culture and he has some great thoughts on the different solution spaces that opens up. I like the way he uses levels (perspectives?) to look at this and the problem more generally. He talks about a Western bias for creating theories looking for "truth" (which has strong links to representationalist thinking) that he wants to avoid. It's certainly not impossible for someone from the Western Tradition (like Goertzel) to think otherwise but it requires significant amounts of meta-reflection - I have a theory that "works" but does everyone think it works? What kinds of people think it works, and what kind doesn't? Do people from all cultural traditions agree? What would my theory sound like to someone from the middle ages? Ancient Greece? Why would I come up with such a theory? How does it fit in to my view on other things (politics, religion, etc.)? If you scratch the surface you quickly realise that there is a vast world of stuff that affects even what we think it's interesting to think about. It's incredibly hard for us to not just assume that the Western Tradition is simply more advanced, purer and better. We are surrounded from birth with a moral framework (the UN, judeo-christian value system, etc.) and an educational system that promotes a Platonic view of pure forms and pure knowledge ("mathematics is the language of the universe", etc.). This sort of thing can even go completely out of whack, and result in the kind of stuff we saw with the Nazis. So if there is considerable debate on what it means to know something, then defining intelligence is not going to be a simple affair! I personally adhere to the school of thought that thinks that the best way to move forward is simply to re-create a human-like machine that acts/reacts in a way that most people would call "human". Almost everyone agrees humans are intelligent, so if it acts the same way then most of us will agree it's intelligent. This (almost) completely avoids the sort of shit we have to deal with that I mention above. Turns out it's far from trivial though...

    10. Re:Hard AI by cyborg_zx · · Score: 1

      If a computer program were intelligent it should be able to determine if a computer program is looping endlessly.

      And for certain programs you could certainly do that. The thing that is often forgotten about the Halting Problem when it is brought up in AI discussions is its application to things like the Goldbach Conjecture. If a Halting Problem algorithm existed it could be used to determine the truth of the Goldback Conjecture. That it is proven it cannot doesn't mean that the Goldbach Conjecture is like an infinite loop. It has a definite answer, it is just not currently known if it has a finite proof. No human intelligence has yet to demonstrate this and it is not a simple matter to say we can determine whether or not it will "loop endlessly" because we don't.

    11. Re:Hard AI by dkf · · Score: 1

      (And belaboring the obvious: If we had even simple AI constructs we could automate much of out work force, freeing us up for more leisurely pursuits. Whether this leads to a post-scarcity utopia or unemployment/welfare apocalypse depends on your political affiliation.)

      Or what it means to be rich and poor will be redefined. Having performances by skilled musicians in your home all the time used to be the mark of the fabulously wealthy, but now virtually anyone can afford to do it (provided you're satisfied with radio or recordings).

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    12. Re:Hard AI by atomicxblue · · Score: 1

      The definition of intelligence I learned was the ability to take various inputs and act on them accordingly. There are various debates over the proper / fastest / correct way to do this, but it's in no ways dead. You just have to look to the facial recognition of the latest Xbox or the fact that last month the Kirobo project held the first human - robot NLP conversation in space.

    13. Re:Hard AI by atomicxblue · · Score: 1

      Norvig is a brutal teacher, but you learn quite a bit! The basics of research into machine understanding have already begun. Once you are able to teach a machine "emotions" then you're able to add weights to certain words based on the context.

    14. Re:Hard AI by dargaud · · Score: 1

      One of the field's many problems is the lack of a good definition for intelligence.

      Intelligence is the ability to draw correct conclusions from incomplete information. There, you have it.

      --
      Non-Linux Penguins ?
    15. Re:Hard AI by Anonymous Coward · · Score: 0

      So then intelligence is a matter of hindsight, since you cant/wont know the conclusions are correct until you have more-complete information.

    16. Re:Hard AI by BringsApples · · Score: 1

      is this person a stoner fuck-up, or is this person an unrecognized Feynman? ...

      Well, which one are you?

      We're all in the middle, sir. What, I think, you're referring to - regarding the extremes - are simply those that are trying, and those that are imagining trying (they try, but without effort). Richard Feynman said himself that he was a normal person, working with limited a capacity. What he had was what we call intelligence, and it was based on his ability to make sense of Nature. It was how he understood things at a really small scale that impressed us. Ultimately, his father started him thinking along these lines at a very early age, and he had a hard enough head to not be moved from his way of thinking (some people actually didn't like Richard because of his hard head).
      And for you to ask yourself whether or not people are a stoner fuckup, or an unrecognized genius, in my opinion, means that you have achieved a level of understanding of your own Self, and are able to extend that to others. This is, to me, is the truth behind genius. Discovering your own Nature, and seeing it everywhere. To me, that makes you smarter than Richard Feynman.

      --
      Politics; n. : A religion whereby man is god.
    17. Re:Hard AI by HiThere · · Score: 1

      How about:
      Intelligence is the ability to draw probably correct conclusions from incomplete information.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    18. Re:Hard AI by bob_super · · Score: 1

      See that big key on the right of your keyboard? The one that doesn't look like any of the ones on the left?
      It needs love. Caress it every now and then.

      Just like that.

    19. Re:Hard AI by Fatalis · · Score: 1

      Says "... and worst of all, philosophers." Proceeds to do philosophy.

      --
      Deus est fatalis
    20. Re: Hard AI by Antonovich · · Score: 1

      I think that qualifies as a whoosh...

  12. Re:Regex this by Anonymous Coward · · Score: 0

    Think of it this way: people who do things like this aren't out in public, adding to traffic on the highway (or possibly wandering out into the street aimlessly, getting hit by cars and making huge traffic backups). See, that's a plus for you.

    Or think of it another way: other people have no obligation to live their lives in any way that you may or may not approve of. After all, what's a bigger waste of time, writing regex golf code, or bitching about other people writing regex code?

  13. Re:Regex this by Threni · · Score: 1

    He works at Google on AI, and this problem is both amusing and interesting. Having two monitors is totally normal.

    Slashdot really has gone to the dogs, hasn't it. Would be great if there were some sort of technical test one could take here; the results of which could be used to filter out all the - for want of a better word - users.

  14. Re: Regex this by Anonymous Coward · · Score: 0

    Yeah, right. Look where you are.

  15. whitelisting with regexp by v1 · · Score: 3, Funny

    There's really no better way to scan a log file for odd log entries than to write a big regexp that filters out whitelisted entries. Lets you find log entries you're NOT expecting. (and occasionally, log entries that not even the developers are expecting)

    Editing them of course is a royal pain, (not to mention debugging!) so I usually write a script to compose the regexp. I just checked on one of mine, and it composes a 17,000+ character single-liner that scans my wired.log file.

    I've got a smaller one that keeps an eye on secure.log for anomalies.

    --
    I work for the Department of Redundancy Department.
    1. Re:whitelisting with regexp by Keybounce · · Score: 1

      Alright, can you provide your scripts that generate the log checkers?

      I can use that.

    2. Re:whitelisting with regexp by v1 · · Score: 1

      for what it's worth:
      http://vftp.net/virtual1/temp/comb

      the same technique can be applied to any log file that you can manage to parse consistently. devs that don't understand the need to escape delimiters in text fields can make your day go bad.

      --
      I work for the Department of Redundancy Department.
    3. Re:whitelisting with regexp by Keybounce · · Score: 1

      Thank you.

      Now, do you have a good way to deal with messages that are two-lines instead of one line?

      And what's wrong with apple's grep? I've been using the gnu one, just because the macports gnu base package installs so many things...

    4. Re:whitelisting with regexp by v1 · · Score: 1

      Now, do you have a good way to deal with messages that are two-lines instead of one line?

      Since SED and GREP only like to work on single lines, in some applications I will use TR to change linefeeds to some other character (like /x01) and do my replacements and then TR them back to linefeeds. I know SED supports pattern buffers but I have yet to take the time to play with it.

      And what's wrong with apple's grep?

      I forget. I know I've ran into problems in the past but I just grabbed an older build from an earlier version of OS X one time when I needed it. I honestly can't recall why I needed to do that. Either a behavior change or a bug I believe it was. Might have been a string length limitation that cropped up. (important if you're removing linefeeds)

      TR, GREP, CUT, and SED together tend to do whatever I need doing. I also use BBEdit quite a lot for handling large text files. Its find and replace supports regular expressions. For one-offs it's often easier to do transformations with BBEdit. If you haven't used CUT before, you'll want to look into it and all its modes of operation. Often CUT is a simpler option than SED.

      Final note: If your script works fine from command line but mysteriously fails when run by cron, check to see what is and isn't in cron's PATH. I've ran into that issue on several occasions when trying to do periodic batch file processing. (specific lack of SBIN iirc was the biggie)

      --
      I work for the Department of Redundancy Department.
  16. Re: Regex this by ShanghaiBill · · Score: 5, Funny

    Some of us programmers have families, a life, don't watch anime, and aren't basement dwelling nerds.

    Umm ... I just spent the last hour playing regex golf with my wife and kids.

  17. This problem has been studied for decades by DogPhilosopher · · Score: 5, Informative

    There's a field called Grammar Induction, and the problem of learning regular languages, aka regular inference, can be considered a subfield. People have been working on this since the '50s. Applications include learning DTDs for XML/wrapper induction, and all kinds of problems in bioinformatics and natural language processing.

    There's a strong link with the graph coloring problem, see
    http://www.cs.ru.nl/~sicco/papers/alt12.pdf

    In this field, the focus is generally on learning FSAs, but these can easily be transformed into regexps. There's work on learning regexps directly, see
    http://www.informatik.uni-trier.de/~fernau/papers/Fer05c.pdf

    Enjoy.

    1. Re:This problem has been studied for decades by thogard · · Score: 1

      The 70's research brought us Lex and Yacc. Bison and Flex just carry on the traditions.

    2. Re:This problem has been studied for decades by DogPhilosopher · · Score: 1

      I guess you're saying that it's still a legitimate subject, and that progress is made by building on previous results. Nobody is disputing that.

      But the TFA doesn't mention previous results or even the existence of the field, I have the impression that Norvig is not aware of it. So this is not contributing to the body of knowledge re automata induction, this is recreative CS. Nothing wrong with that per se, but there's also nothing wrong with providing some scientific background.

      Btw in the blog post, the approximative approach is motivated by NP-hardness of the exact problem. Given the link (parameter-preserving reduction) with graph coloring already mentioned, the problem can't be approximated in polynomial time with arbitrary error, unless P=NP. This theoretical result is backed up by practical experiments with approximating coloring algorithms, which often find solutions 100% off from the optimal nr of colors. So good luck with that.

      Also, informed regular inference is already NP-hard when restricted to DFAs with just 2 states:
      http://link.springer.com/chapter/10.1007%2F978-3-642-37064-9_25

      Perhaps surprisingly, the problem becomes tractable when the data is `complete', in the sense that it is consistent with just one single automaton (google RPNI).

    3. Re:This problem has been studied for decades by atomicxblue · · Score: 1

      Thanks for the links. Saved both of them and going to give them a good read through when I have the time.

  18. Cookies required? by Okian+Warrior · · Score: 1

    Be sure to allow cookies, else the site won't work.

    Cookies are an essential part of websites nowadays, dontcha' know...

  19. Re:Regex this by alienmole · · Score: 2

    Why does this upset you? Is it the recognition that people much more intelligent than you are spending time on problems whose significance you don't understand, and attacking them makes you feel better?

  20. Re:Regex this by Anonymous Coward · · Score: 1

    Take deep breaths. It is going to be ok.

    This is what I really can't stand in a teacher. Don't lie to your student. Don't sugar coat the hard truth. It's not going to be okay. He's going to die. Dead; no longer living. It may take time. It might be not that painful, but there's a fair chance that it's going to be awful. Just tell him the truth now. Don't say

    It is going to be ok.

    He will have to face it:

    In the end we are all going to be dead. Dead; DEAD.

    It might seem to him to be okay to be angry, about people with different hobbies, but if it causes him stress, make him take up smoking, or worse, restart smoking after giving up, and then die of a disease caused by his lack of tolerance, trust me he's not going to thank you for hiding the truth.

  21. Re:Regex this by RamiKro · · Score: 1

    It's a special case of automatic code optimisation: Since regular expressions should be reduce-able to logical gates, it stands to reason a minimized equivalent boolean function could be decompiled into a regex.
    On it's own, having compilers\interpreters that optimize regexes is beneficial. But how about you turn the table over and ask yourself this, "Can I design a high, expressive, programming language that could be fully optimised without sacrificing human readability and productivity?" As far as I can tell, this is the holy grail of system research, or what's left of it nowadays...

  22. NP-hard? by Anonymous Coward · · Score: 1

    Sure, it's NP-hard, but it's also nonrecursive via Rice's theorem.

    Calling this NP-hard is like calling the Andromeda Galaxy "bigger than the average rhino".

    1. Re:NP-hard? by Anonymous Coward · · Score: 1

      For clarification, this is referring to the problem of writing a program that identifies regex finders. The problem of finding the shortest regex is likely NP-hard, but it's not as obvious as the article attempts to imply.Giving a shitty algorithm to solve a problem, and then saying that it would also solve SET COVER, does not imply that any algorithm for solving the first problem can be used to solve SET COVER via a Karp reduction.

    2. Re:NP-hard? by DogPhilosopher · · Score: 1

      Giving a shitty algorithm to solve a problem, and then saying that it would also solve SET COVER, does not imply that any algorithm for solving the first problem can be used to solve SET COVER via a Karp reduction.

      Indeed. See my posts above for a reference to a Karp reduction from VERTEX COLORING (for the automaton version). Previous hardness and non-approximability results for this and related problems were found by Gold (early 70s!), Angluin and Abe.

      Myabe also worth mentioning that not all NP-hard problems are created equal. The natural parametrization of SET COVER is W[2]-hard, but there are no decent re-parametrizations known for coloring that give it a place in the W-hierarchy. I know only of one trivial one.

      Coloring is still NP-hard when graphs are restricted to be 3-colorable, planar and having maximum degree 4. it seems to be a very hard problem in certain ways.

    3. Re:NP-hard? by DogPhilosopher · · Score: 1

      Sure, it's NP-hard, but it's also nonrecursive via Rice's theorem.

      That depends a bit on how the problem is formulated. When stated in full generality, ie "given an arbitrary UTM, decide whether it's a regex finder", it's obviously not decidable.

      However, the full expressive power of Turing machines may not be needed. For some classes of grammars/automata, the question whether something is a member of the class can be decided by an automaton, this property is known as automaticity and has origins in the field of descriptive complexity. See for example
      http://semukhin.name/autofeed2.pdf

      Also relevant, work on the enumeration of regexps, eg
      http://arxiv.org/abs/1204.4982

    4. Re:NP-hard? by DogPhilosopher · · Score: 1

      .Giving a shitty algorithm to solve a problem, and then saying that it would also solve SET COVER, does not imply that any algorithm for solving the first problem can be used to solve SET COVER via a Karp reduction.

      Btw, SET COVER is relevant for some learning problems. Constructing a fastest learner for finite identification of a finite class of finite languages (`preset learner') is equivalent to SET COVER, see page 6 of
      http://staff.science.uva.nl/~dickdj/submission_COLT10_NGDdJ.pdf

  23. Re:Regex this by Anonymous Coward · · Score: 0

    Yeah, fuck people who have time to do useless things, like regex golf, and posting to Slashdot, and..... wait... what?

  24. Re:Regex this by Anonymous Coward · · Score: 0

    It's called exercising our 'little grey cells'. Beats watching TV.

    And no, I didn't use two monitors. I used pencil and paper.

  25. Re:Regex this by BringsApples · · Score: 1

    Ok, I get the need for optimization of coding habits. But as far as I understand this article however, it's about solving the XKCD "problem" - how to write a formula that separates elected presidents from unelected presidents. If I'm wrong, then I'm sorry. I read a bit of the article, and determined that it was a shot at being nifty - trying to get a bunch of brogramers to 'stroke out the biggest cock, in the shortest amount of time' sorta thing.

    --
    Politics; n. : A religion whereby man is god.
  26. Unless Pyhon has changed recently. by Impy+the+Impiuos+Imp · · Score: 1

    Python code? I would expect Perl or similar if parsing was your emphasis, or Lisp or some AI language if AI was your emphasis, but not a standard block-structured language.

    --
    (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
    1. Re:Unless Pyhon has changed recently. by bunratty · · Score: 3, Informative

      Python is very similar to Perl, and also has most of the characteristics that make Lisp a good language to program AI algorithms.

      --
      What a fool believes, he sees, no wise man has the power to reason away.
    2. Re:Unless Pyhon has changed recently. by maxwell+demon · · Score: 1

      This is about regexes, that is, hard to decipher code that looks like line noise. Obviously Perl is the only language that matches.

      --
      The Tao of math: The numbers you can count are not the real numbers.
    3. Re:Unless Pyhon has changed recently. by retchdog · · Score: 1

      Then you're a moron.

      He's trying to communicate his neat ideas with as many people as he can, not optimize the shit out of something (in execution time, or length of code, or whatever). Python has become the standard language for communicating with a broad base of people who are not idiots, but not necessarily "real programmers" either. This is a good thing: it has basically replaced pseudocode.

      --
      "They were pure niggers." – Noam Chomsky
    4. Re:Unless Pyhon has changed recently. by phantomfive · · Score: 2

      Usually easier to use the language you know, rather than learn a whole new language for a project.

      --
      "First they came for the slanderers and i said nothing."
    5. Re:Unless Pyhon has changed recently. by dotancohen · · Score: 1

      Additionally, Python accepts in-line comments in Regexes. That alone makes it an above-average choice.

      --
      It is dangerous to be right when the government is wrong.
    6. Re:Unless Pyhon has changed recently. by Anonymous Coward · · Score: 1

      So does Perl:

      my $re = qr/the
                  \s+
                  [\d]+(?:st|nd|rd|th)
                  \s+
                  occ?urrence # typo in historical data
                  \s+
                  of[ ]this[ ]data[ ]point/x;
      # note /x modifier; matches the same as
      my $re = qr/the\s+[\d]+(?:st|nd|rd|th)\s+occ?urrence\s+of this data point/;

    7. Re:Unless Pyhon has changed recently. by dotancohen · · Score: 1

      Very cool, thanks.

      --
      It is dangerous to be right when the government is wrong.
    8. Re:Unless Pyhon has changed recently. by Anonymous Coward · · Score: 0

      This is about regexes, that is, hard to decipher code that looks like line noise

      Any sufficiently compressed data is indistinguishable from noise.

  27. Re: Regex this by Anonymous Coward · · Score: 0

    Maybe if you and BringsApples didn't spend so much time posting to Slashdot, you would have more time to do something constructive and potentially educational. And if you didn't spend so much mental effort worrying about what others do to your stereotype and what you think you can accomplish by complaining about it on the internet, then maybe you could come up with something fun to do with your family that accomplishes the same thing, so you don't have to act like family time is mutually exclusive with free time.

  28. Please, no, don't filter all users out by Xaedalus · · Score: 4, Interesting

    As a junior cadet code monkey of a user, I come to /. precisely because I know my limitations when it comes to understanding tech, coding, and hard science. Despite the -1's, trolls, and certainty-addicted neckbeards (or, because of them) I've learned a lot about how intrinsically cool it is to code, the artistic side of coding, the wonders of science (and how it rivals religious experiences for appreciating one's place in the universe), and I've improved my debating skills on here. /. is one of the net benefits of my life, and I consider it a necessity. Chances are good, I'm not the only user who thinks that.

    --
    Here's to hot beer, cold women, and Glaswegian kisses for all.
    1. Re:Please, no, don't filter all users out by atomicxblue · · Score: 1

      I agree with you fully. Sure, you have to wade through some stories, but occasionally, you stumble upon a fascinating debate such as this. I'm interested in AI and the possibilities for the future it can bring.

  29. could make google more "googlely" by LifesABeach · · Score: 2

    sowing only web pages for delta airlines; and not delta faucets? that's "googlely."

  30. Re: Regex this by Anonymous Coward · · Score: 0

    and I just closed a multi-billion-dollar business deal over a round of regex golf at a very exclusive regex golf resort in the Bahamas.

  31. Obligatory XKCD by Anonymous Coward · · Score: 1

    Obligatory XKCD.

    Oh, never mind.

  32. Re:Regex this by hermitdev · · Score: 1

    There are sorts of "problems" that may seem a waste. When Einstein was pondering Relativity, I'm sure it would have been thought of in the same manner. The point is, sometimes great things arise from "trivial" pursuits. This could also be construed as a implementation of a search algorithm that can take a relatively broad search term and produce a more generalized search pattern, i.e. now you cannot specify give me presidential candidates A, B, C, D, E, F according to whether or not they won an election.

    This may seem trivial, but it is this sort of logic/AI that may lead to computers in the future answers questions like we see in Star Trek.

  33. Re:Regex this by Anonymous Coward · · Score: 0

    So why did you "go to school for coding" if you clearly don't like it?

  34. Re:Regex this by BringsApples · · Score: 1

    Well, I certainly didn't mean to attack anyone, I just didn't really know that my comments would hurt feelings. And perhaps you're right, maybe I don't understand why 'sorting a list in the quickest way possible, being made into a game' is useful. Can you enlighten me?

    --
    Politics; n. : A religion whereby man is god.
  35. Re:Obligatory XKCD by RussR42 · · Score: 2

    I believe this is what you were looking for.

  36. Re:Regex this by Pseudonym+Authority · · Score: 1

    "Coding" is what replaceable, uncreative idiots do. It's better to solve an interesting problem, for whatever definition of interesting is to the solver, than it to be a brainless drone making another fucking login page. Why don't you try to actually make something instead of letting the potential (that you probably don't have, to be honest) waste away? If you can't do that, then at least leave the actual programmers and computer scientist to practice the art while you die having accomplish nothing but having your code thrown in the trash a single support cycle after you get laid off.

  37. Re: Regex this by Anonymous Coward · · Score: 0

    I think all of IT have gone that way. It has become diluted by people who just want to solve their problem at hand as sloppily as possible and get home to watch TV.

  38. ^https?://([a-z0-9\-]+\.)*foo\.com(/|$) by raymorris · · Score: 4, Interesting

    For matching URLs from a domain, here's a regex we came up with that covers some special cases. Hopefully Slashdot doesn't mangle it too badly.

    https?://([a-z0-9\-]+\.)*foo\.com(/|$)

    That covers:
    https as well as http
    "subdomains" like maps.google.com as well as www.google.com and google.com
    It's not fooled by google.com.hacker.ru

    1. Re:^https?://([a-z0-9\-]+\.)*foo\.com(/|$) by foobar+bazbot · · Score: 1

      That may be troublesome... Note that while DNS comparisons are supposed to be performed case-insensitively, uppercase letters are valid characters.

      If your regex-based web filter doesn't canonicalize the domain name first and/or use a case-insensitive regex match, users may be able to circumvent the filter with http://www.foo.com/ (or even http://foo.com/).

      Of course, browsers these days do so much "helpful" manipulation on anything typed in the URL bar, I'm not sure you can slip a mixed-case domain through them -- they may just canonicalize it for you. But stuff like wget has no problem using the URL exactly as typed.

    2. Re:^https?://([a-z0-9\-]+\.)*foo\.com(/|$) by ledow · · Score: 1

      Yep, pretty much what I crafted. Their problem was that their ()'s lacked the power to express that it had to have something-dot, or nothing at all (specifically the "nothing at all" part, so foo.com wasn't caught by a regexp for *.foo.com

  39. Re:Regex this by BringsApples · · Score: 1

    Oh, solving problems. See I thought that this was just about playing some programing-related game, via shortening algorithms. Hell, if we're talking about solving problems, let's get to world hunger and water filtration. I'm working on the world hunger bit. You wanna work with me, or do the water filtration bit?

    --
    Politics; n. : A religion whereby man is god.
  40. ah the old 1 or 2 problem by deviated_prevert · · Score: 1
    //solution is simple

    if a_1=nopaper

    if a_2=paper

    //problem with this solution though is that it tends to be gender specific,

    //and that is the problem with two compared variables in a statement it can easily lead to a 3turd condition.

    --
    This message was not sent from an iPhone because Peter Sellers really was a deviated prevert without a dime for the call
  41. Re:Regex this by Pseudonym+Authority · · Score: 1
    No, I don't particularly find those interesting and don't really care all that much. Let me clarify what I meant some; by problem, I mean a something closer to a puzzle, something intellectually challenging. The problems you describe are not those, as the solutions to such are evident but lacking the logistic plan to implement them. That is no less challenging perhaps, but not in the same way. It is the difference between developing a cure for HIV and actually navigating your way through a suspicious population while avoiding the wrath of the local warlords to actually distribute it to those in need. The problem you suggest is of the nature of the latter. Good for you. But just because that's what you find to be the noblest of goals doesn't mean that the guys working on, say, solving abstract mathematical problems need to stop and direct all their attention to your pet causes.

    Anyway, let me direct your attention to the qualifier that I have attached that bit:

    for whatever definition of interesting is to the solver

  42. Is the problem actually NP-complete? by Shaterri · · Score: 3, Interesting

    My reading of Norvig's blog post is that he suggests his specific approach (stapling together short regexps with ORs) requires solving the NP-Complete Set Cover problem, but he doesn't actually say anything about whether the core problem (match everything in list A and nothing in list B) does; it's conceivable that e.g. some sort of dynamical programming approach could do the job more efficiently than Norvig's algorithm does. Does anyone know whether the root problem (to avoid having to do the optimization, just phrase it as 'is there a separating regexp for the sets A and B of length less than k?') is specifically known to be NP-complete?

  43. Re: Regex this by atomicxblue · · Score: 1

    At least your family game night will give the kiddos useful skills for the future!

  44. Re:Regex this by atomicxblue · · Score: 1

    It's probably not very useful for a simple problem such as this, but imagine the space station being in a situation where every second counts to protect the life of the crew, such as a rupture that vents oxygen. It may have a n-length list of commands it can run, each with certain percentage chance of working in a particular situation and may need to run a quick simulation to guess which one has the greatest chance of saving the life of the crew. It would probably use some combo of min-max as well as quicksorting to find the one command with the greatest probability of survival in a minimal amount of time.

  45. Re:Regex this by atomicxblue · · Score: 1

    Agree fully. At first I thought *-search (pick your flavor) was just an odd mathematical quirk when it was introduced, until I was told it is used in the GPS system you have in your car. You start to see how each incremental step leads to ever greater things.

  46. Re: by Anonymous Coward · · Score: 0

    Turing complete languages don't have to be non-regular either. The tape for a Turing machine is obviously regular. How a language is structured and what it's capable of doing are pretty unrelated.

  47. Re:Regex this by wonkey_monkey · · Score: 1

    Have you ever noticed how everyone else in the world is not you? You might find it enlightening to think about that.

    --
    systemd is Roko's Basilisk.
  48. I'm interested in the presidents vs losers regexp by Mr+Reaney · · Score: 1

    I might be missing something here, but the list of winners and the list of losers in US presidential elections both contain Richard Nixon. How can a regexp match ALL the winners and NONE of the losers in that case?

  49. Re: by Paradise+Pete · · Score: 1

    The tape for a Turing machine is obviously regular

    I bought a touring machine once, but it hasn't come back yet.

  50. I want this on my business cards by Kiaser+Zohsay · · Score: 1

    This started Peter Norvig, the well-known computer scientist, director of research at Google and wearer of brightly colored shirts, thinking about the problem.

    Now I got to get me some brightly colored shirts.

    --
    I am not your blowing wind, I am the lightning.
  51. Re:Regex this by BringsApples · · Score: 1

    No, I don't particularly find those interesting and don't really care all that much

    You don't eat or drink? I see your point, but how is coming up with better ways to feed the multitudes with limited land and resources not a puzzle? See, the thing that I cannot stand about this article is that it is going to attract the attention of some really smart people (maybe you are one of these) and take your time, where if you would direct your gigantic brain and all of it's magnificent abilities (and no, I'm not being sarcastic here) to problems that aim to make life better for all, then your own life would also be made better. If you cannot understand how that works (life better for me is life better for you), then make it into a puzzle, or whatever you have to do to make it interesting for you. Otherwise, if you don't understand how life functions, then your intelligence is temporary.

    --
    Politics; n. : A religion whereby man is god.
  52. Re:Regex this by Anonymous Coward · · Score: 0

    Do you play any games at all? Video, board, general games with your kids? How about sports? Baseball, basketball? Any of them. I'm sure there is something you do in your "life-time" that could be considered "not useful".

  53. Re:Regex this by bob_super · · Score: 1

    Because people who are encouraged to create new solutions for fun may comes up with something that someone getting whipped into coding may not.
    More solutions, more creative ones, can mean the Terminators will be 20% faster at face-recognition and may not be so inaccurate at shooting.
    Kids learn well by playing games, machines may learn to learn faster out of humans playing games.

  54. Re: Regex this by Anonymous Coward · · Score: 0

    It is the IT vs CS dilemma.

  55. some answers for Regex Golf game by bzipitidoo · · Score: 1

    Aren't there any solutions posted anywhere? This is a good exercise if you want to spend time puzzling out the more arcane features of regular expression evaluators. Some of these aren't fully working solutions.

    • Plain strings: f.o
    • Anchors: k$
    • Ranges: [a-f]{4}
    • Backrefs: (...).*\1
    • Abba: .u|il[^l]|(.)[lm]\1|^[pv]|z|[mr]it
    • A man, a plan: (.)(.)[^r]?\2\1
    • Prime: ^xxx?$|^x{5}(xx(xxxx(xx(xxxx(xx(xxxx(x{6}(xx)?)?)?)?)?)?)?)?$|x{37}
    • Four: (.)(.\1){3}
    • Order: ^[^o].{1,5}$
    • Triples: 0.$
    • Glob: ^[blp]|^.[foprv]|^re|^mi|^ch|^.il|^.ta|rm$
    • Balance: ^<.*>$
    • Powers: ^x(x(xx(xxxx(x{8}(x{16}(x{32}(x{64})?(x{128}(x{256}(x{512})?)?)?)?)?)?)?)?)?$
    • Long Count:
    • Long Count v2:
    • Aplhabetical: ^a

    Abba is better as a negation of (.)(.)\2\1 , but I couldn't quickly figure out how to do it. Triples looks like it's supposed to make use of an old math trick to tell if a number is divisible by 3. Just add all the digits together to get another number. Repeat on the new number until a single digit is left, and if that digit is 0,3,6, or 9, then the number is divisble by 3. Will take me too much time to figure out how to program that into a regular expression. I cheated on Glob. Obviously it's supposed to sort the full matches from the partial matches. Balance is one of those exercises in being contrary, making a tool do what everyone says it's not meant to do. Just copied the matching string for the 2 Long count puzzles. 3012 points.

    --
    Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
    1. Re:some answers for Regex Golf game by Anonymous Coward · · Score: 0

      My score: 2446

      I tried to use some (.)(.)\2\1 principle on Abba: ^(.)(.)(.)((?!(\2\1|\3)).)*$, but it doesn't work great.

      Got an improvement for Powers: ^(((((((((xx?)\9?)\8?)\7?)\6?)\5?)\4?)\3?)\2?)\1?$, but I'm still hardcoding some references. I feel there is a much more elegant solution.

      I found a SO post for primes, but I can't make it work :/ http://stackoverflow.com/questions/3296050/how-does-this-regex-find-primes

    2. Re:some answers for Regex Golf game by Anonymous Coward · · Score: 0

      Here are my solutions. "Triples" took me longer than all the others combined!

      Plain strings (207): foo
      Anchors (208): k$
      Ranges (202): ^[a-f]*$
      Backrefs (201): (...).*\1
      Abba (193): ^(?!.*(.)(.)\2\1)
      A man, a plan (168): ^(.?)(.?)(.?).?\3\2\1$
      Prime (286): ^(?!(xx+)\1+$)
      Four (199): (.)(.\1){3}
      Order (156): ^a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$
      Triples (523): ^([0369]|[147][0369]*[258]|([258]|[147][0369]*[147])([0369]|[258][0369]*[147])*([147]|[258][0369]*[258]))*$

      Filter error: Please use fewer 'junk' characters.

    3. Re:some answers for Regex Golf game by Anonymous Coward · · Score: 0

      ...continuation

      Glob (288) -[omitted due to filter error]-
      Balance (283): -[omitted as it is garbled by Slashdot]-
      Powers (93): ^(?!(x(xx)+)\1*$)
      Long count (227): ^((?=[01]{4} )(\d*)0\d* (?=\2[1])){15}1111$
      Long count v2 (227): ^((?=[01]{4} )(\d*)0\d* (?=\2[1])){15}1111$
      Alphabetical (194): ^(?!.*\b(\w*)[enrst].*\b\1[a])(?!.*\b(\w*)[nrst].*\b\2[ae])(?!.*\b(\w*)[rst].*\b\3[aen])(?!.*\b(\w*)[st].*\b\4[aenr])(?!.*\b(\w*)[t].*\b\5[aenrs])

      Score: 3655

  56. spoiler: r.e. presidents vs losers regexp by Fubari · · Score: 1
    spoiler alert: if you were to read TFA you'd find a link to the actual blog 'norvig.com' that is pretty interesting. In short, they handle the "ambiguity" of people that are both Winners+Losers ignoring any Winner's losses:

    From Norvig's blog:

    To avoid a contradiction and achieve Randall's intent, eliminate all winners from the set of losers:
    In [293]: losers = losers - winners

    The code on Norvig's blog is pretty interesting.
    This one was worth my coffee break time today.

    I might be missing something here, but the list of winners and the list of losers in US presidential elections both contain Richard Nixon. How can a regexp match ALL the winners and NONE of the losers in that case?

  57. Re:Regex this by Pseudonym+Authority · · Score: 1

    The methods to distribute food and to purify water are largely solved. If it weren't, then the developed world would not be able to support such large populations. It's a distribution and education problem in areas so afflicted. If you want, you can probably buy some tractors and water filters right now and have them shipped to poor villages. The techniques needed to make fertilizer or purify water by boiling are well known. What is lacking is the industrial capability to manufacture enough and the supporting infrastructure, and the unwillingness of foreign entities to spend money on it and hostility by groups that benefit from the status quo (be it the warlords that will seize the goods, the arms dealers that want the conflict, or anyone else who will accept bloodshed and misery in exchange for money). A political issue that no amount of "world changing" technology is going to fix. Not interesting.

    And besides that, tinkering around with regex to do neat things, while clever, doesn't mean that that cleverness will transfer to other tasks. Would you trust your rocket scientist to give you brain surgery just because both tasks are considered intelligently prestigious enough to be in a popular idiom proclaiming their brilliance?

  58. Re:Regex this by BringsApples · · Score: 1

    If you want, you can probably buy some tractors and water filters right now and have them shipped to poor villages.

    The length at which I'd like to point out the intricacies of growing food and filtering water in areas of the world that are heavily populated and dry, far exceed my want to type, and I'm sure your want to read. Let's just say that sending over a tractor and a filter would be as close to solving the problem that they're having as sending over a scalpel and air-mask to your house, when you need heart surgery.

    The techniques needed to make fertilizer or purify water by boiling are well known.

    Are they? If they're so well-known, let's hear your take on it. How do you make fertilizer? How do you 'boil' 50,000 gallons of water? Because if you ever need food or water (God forbid) to the point that these people need it, then that's going to be your main objective of the day, not regex.

    And let me remind you that if we cannot pull together as a people, then what we see in the areas where dictators rule food/water supply, we'll see here one day as well. $85 BILLION are printed each month. One day, someone is going to have to pay that back. It may not be you or I, and it may not be our kids, but one day the whole idea of simply going to the store and buying whatever you need is either going to be to expensive, or not an option. Humanity will need people like you (and whatever you teach your kids) to solve problems that we'll face.

    ...tinkering around with regex to do neat things, while clever, doesn't mean that that cleverness will transfer to other tasks.

    Exactly. That's why it's important to train your way of thinking to be in line with the rest of the natural world, and not some way to do some man-made confined way of thinking. This is exactly why I think it's a waste of great human potential.

    --
    Politics; n. : A religion whereby man is god.
  59. Yeah, NC for case insensitive, separate from regex by raymorris · · Score: 1

    That's true, the regex is used with the non-case-sensitive flag. We use it with mod_rewrite, so it would be[NC] something like:
    RewriteCond %{HTTP_REFERER} https?://([a-z0-9\-]+\.)*foo\.com(/|$) [NC]

    In for example Perl the regex would be used within //i (or another delimiter for clarity).

  60. Unlisted challenge by FreedomFirstThenPeac · · Score: 1
    Item 17 was missing from the list of challenges.

    It reads

    • Find if there are any expressions of the form
      • gsub("([3-9][ 0-9]*)","n","[[0-9]*[1-9]^^n + [0-9]*[1-9]^^n = [0-9]*[1-9]^^n")

    that are TRUE. I have an elegant solution, but it requires Perl and more space than Slashdot allows for its comments.

    --
    "There is no god but allah" - well, they got it half right.