Slashdot Mirror


Google Open-Sources SyntaxNet Natural-Language Understanding Library, Parsey McParseface Training Model

Google announced on Thursday that it is open sourcing its new language parsing model called SyntaxNet. It's a piece of natural-language understanding software, Google says, that you can use automatically parse sentences, as part of its TensorFlow open source machine learning library. The company also announced that it is releasing something called Parsey McParseface (Google has a sense of humor), which is a pre-trained model for parsing English-language text. Nate Swanner of The Next Web attempts to explain it: Combining machine learning and search techniques, Parsey McParseface is 94 percent accurate, according to Google. It also leans on SyntaxNet's neural-network framework for analyzing the linguistic structure of a sentence or statement, which parses the functional role of each word in a sentence. If you're confused, here's the short version: Parsey and SyntaxNet are basically like five-year-old humans who are learning the nuances of language. In Google's simple example above, 'saw' is the root word (verb) for the sentence, while 'Alice' and 'Bob' are subjects (nouns). Parsey's scope can get a bit broader, too.

56 comments

  1. what's the point? by Anonymous Coward · · Score: 0

    We need useful AI, not stupid AI that acts like a five year old. Why is this useful to anyone at all?

    1. Re:what's the point? by Anonymous Coward · · Score: 0

      Bob fucks Alice in the ass.

    2. Re: what's the point? by Anonymous Coward · · Score: 0

      Bob fails because he's the Giver and Alice isn't the Goatse Man.

    3. Re:what's the point? by Anonymous Coward · · Score: 0

      Don't worry, scrote. There are plenty of 'tards out there living really kick-ass lives. My first wife was 'tarded. She's a pilot now.

    4. Re:what's the point? by Anonymous Coward · · Score: 1

      Knowledge workers process natural language inquiries and recall from the 0.01% of human knowledge they have managed to memorize the relevant details to solve the problem, identify where to look for more information, and/or refer the inquiring individual to the correct resource they need to solve the problem themselves.

      A computer capable of parsing natural language inquiries can construct an appropriate query of all publicly accessible digitized human knowledge and analyze the contents of that knowledge to identify what information is relevant to the inquiry.

      Processing the relevant information into a solution will still require a capability to generate useful models from that understanding that allow it to identify optimal solutions. Preprocessing and structuring the data for analysis is one of the largest obstacles preventing forward progress towards that goal.

      If the importance of Natural Language Processing is lost on you, that's a reflection of your own ignorance more than it's a reflection of the value of the achievement.

      I suppose you thought the important use-case to measure this achievement was beating the Turing test? That's the five-year-old you're referencing, right?

      What if this entire post was written by a SyntaxNet-based AI? When the comments sections of websites are flooded with public-opinion-shaping bots from the NRA, Brady Campaign, and presidential campaigns, will you care then?

    5. Re: what's the point? by Anonymous Coward · · Score: 0

      Nice summary.

  2. Fail by Anonymous Coward · · Score: 1

    It's a piece of natural-language understanding software, Google says, that you can use automatically parse sentences, as part of its TensorFlow open source machine learning library.

    YOU CAN USE AUTOMATICALLY PARSE SENTENCES

    1. Re:Fail by Anonymous Coward · · Score: 0

      YOU CAN USE AUTOMATICALLY PARSE SENTENCES

      The real question is whether SyntaxNet can parse that sentence!

  3. SubjectsSuck by aardvarkjoe · · Score: 1

    So, can Parsey McParseface make sense of what manish posts? Because I generally can't. I assume that the example sentence from the summary probably came from the article, but for some reason the "editor" didn't think to read his summary to make sure that it actually made sense out of context.

    --

    How can we continue to believe in a just universe and freedom to eat crackers if we have no ale?
    1. Re:SubjectsSuck by HiThere · · Score: 1

      The claim was parse, not make sense of. And it's not clear that it can parse all sentences. Some sentences can't be unambiguously parsed even when you know the context and each included word.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    2. Re:SubjectsSuck by Aighearach · · Score: 2

      It all read just fine to me. The only mistake I noticed was that

      natural-language understanding software

      should have been

      natural-language-understanding software

      since it is the software doing the understanding, not the language. The quote itself is clear and concise. If you didn't understand it that probably just means you lack the technical vocabulary to even make use of the tool.

    3. Re: SubjectsSuck by Anonymous Coward · · Score: 0

      Slashdot summaries are written by these bots. Manish isn't a human: it's a fucking manish mcmanface.

  4. What we really need... by Anonymous Coward · · Score: 0

    What we really need is software to automatically proofread and edit slashdot submissions.

  5. Great by Anonymous Coward · · Score: 0

    Another memememe mcmemeface to suffer through

    1. Re:Great by Anonymous Coward · · Score: 0

      Drew Carey was a man ahead of his time--his nemesis was Mimi.

  6. google tries to be humorous by sittingnut · · Score: 0

    "Parsey McParseface (Google has a sense of humor)"
    more like dour corp peons at google try hard, very hard, to appear humorous.
    even tay had better humor

    1. Re: google tries to be humorous by Anonymous Coward · · Score: 0

      More like someone at Google decided latching onto a recent meme was a strategic move to synergise more brand awareness in a key demographic.

      Google is a marketing company. They don't take a dump unless it's in an effort to push their brand.

    2. Re:google tries to be humorous by Anonymous Coward · · Score: 0

      If you're familiar with Boaty McBoatface, then it becomes kinda funny.

    3. Re:google tries to be humorous by sittingnut · · Score: 1

      that is precisely why it is not funny. google peons are just parroting something funny to try to be funny.

  7. Prase this, McParseface by mythosaz · · Score: 1

    James while John had had had had had had had had had had had a better effect on the teacher.

    1. Re:Prase this, McParseface by mythosaz · · Score: 4, Interesting

      ...and while McParseface is at it, he can chew on:

      "Wouldn't the sentence 'I want to put a hyphen between the words Fish and And and And and Chips in my Fish-And-Chips sign' have been clearer if quotation marks had been placed before Fish, and between Fish and and, and and and And, and And and and, and and and And, and And and and, and and and Chips, as well as after Chips?"

    2. Re:Prase this, McParseface by Aighearach · · Score: 1

      It should be easy enough to set it up to parse that sort of thing as "blah blah blah" and leave it at that. ;)

      I'd also want anything more than triple negated to equal "blah blah blah."

    3. Re:Prase this, McParseface by Anonymous Coward · · Score: 0

      Likely an automated parser can make sense of this more quickly and accurately than you or I can. It's just a matter of keeping track of the recursion. Something like "Time flies like an arrow; fruit flies like a banana." would be harder as it requires semantic knowledge rather than just syntax.

    4. Re:Prase this, McParseface by phantomfive · · Score: 1

      Stanford parser seems to come up with something reasonable, but I have no idea what that sentence means.

      --
      "First they came for the slanderers and i said nothing."
    5. Re:Prase this, McParseface by roman_mir · · Score: 0

      Most people wouldn't bother to try and parse your sentence. I think a program would actually do a better job at it.

    6. Re:Prase this, McParseface by ChunderDownunder · · Score: 1

      That's quite a stutter. :)

    7. Re:Prase this, McParseface by Anonymous Coward · · Score: 0

      That's not a grammatical sentence. Interpunction is part of grammar, and some interpunction is not optional.

    8. Re:Prase this, McParseface by Anonymous Coward · · Score: 0

      No, it wouldn't have been clearer, as these words are already marked by being capitalized. What would have been clearer would have been to place commata after the first And and after Chips.

    9. Re:Prase this, McParseface by Anonymous Coward · · Score: 0

      ...and while McParseface is at it, he can chew on:

      "Wouldn't the sentence 'I want to put a hyphen between the words Fish and And and And and Chips in my Fish-And-Chips sign' have been clearer if quotation marks had been placed before Fish, and between Fish and and, and and and And, and And and and, and and and And, and And and and, and and and Chips, as well as after Chips?"

      I'm pretty sure it can parse that better than the average Slashdot user.

  8. WELCOME OVERLORDS! HAW HAW HAW!! by MobileTatsu-NJG · · Score: 1

    The company also announced that it is releasing something called Parsey McParseface (Google has a sense of humor)..

    If by 'sense of humor' you mean 'a repeat of something that was humorous a while ago under a different context'.

    --

    "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

    1. Re:WELCOME OVERLORDS! HAW HAW HAW!! by Aighearach · · Score: 1

      You parsed it wrong. "Sense of humor" here does not indicate that the words are funny; it indicates that the words are goofy or foolish, and that Google was willing to let a thing be named that way.

      I recommend checking a dictionary. There are about a dozen meanings of the word humor, and probably half of them cover this particular usage. One advantage of a computer parser is that it is unlikely to reject a valid statement merely because it didn't consider all of the known patterns.

    2. Re:WELCOME OVERLORDS! HAW HAW HAW!! by Anonymous Coward · · Score: 0

      I think the point was that repetition of a joke de-humorizes it.

    3. Re:WELCOME OVERLORDS! HAW HAW HAW!! by BronsCon · · Score: 1

      'a repeat of something that was humorous a while ago under a different context'

      like your sig?

      I'm here all night, try the veal.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
  9. Time flies like an arrow. by jeffb+(2.718) · · Score: 3, Interesting

    Fruit flies like a banana.

    1. Re:Time flies like an arrow. by ChunderDownunder · · Score: 1

      What's a "time fly"?

    2. Re:Time flies like an arrow. by Anonymous Coward · · Score: 0

      I assume you are joking, but if not: "time flies" is noun-verb, unlike "fruit flies", which is a compound noun (and plural).

    3. Re:Time flies like an arrow. by Anonymous Coward · · Score: 0

      Perhaps something like this.

    4. Re:Time flies like an arrow. by cellocgw · · Score: 1

      What's a "time fly"?

      My guess is it's what happens when you cross Dr. Who with The Fly.

      --
      https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
  10. lame by Anonymous Coward · · Score: 0

    > (Google has a sense of humor)

    No. They really don't.

    Since google basically owns the internet, it stands to reason that the company's intelligence level equals that of the hive. XXXsey McXXXface was *never* funny, in any context.

  11. Buffalo buffalo buffalo by raymorris · · Score: 1

    Bison from Buffalo, New York, are known to bully other Buffalo bison, who in turn bully (buffalo) other New York bison. In other words:

    Buffalo buffalo buffalo buffalo buffalo buffalo buffalo.

  12. Permutaton of all parsable sentences? by thinkwaitfast · · Score: 1
    How large is the permutation of all parsable sentences?

    A concise version of the Library of Babel expressing every idea in a language?

    1. Re:Permutaton of all parsable sentences? by jeffb+(2.718) · · Score: 1

      The set of all parsable sentences is trivially unbounded, at least in English.

      A sentence can go on, {and on,}* and on.
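
      The `{and on,}*` pattern above can be sketched in a few lines of Python. This is only an illustration: the helper name `sentence` and the regular expression are made up for this example, and the point is simply that every repetition count yields a distinct, longer sentence that still matches the pattern.

```python
import re


def sentence(n: int) -> str:
    """Build the sentence with n copies of the starred 'and on,' group."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return "A sentence can go on, " + "and on, " * n + "and on."


# The same pattern written as a regular expression (assumed spelling of the
# informal notation used in the comment).
PATTERN = re.compile(r"A sentence can go on, (?:and on, )*and on\.")

for n in range(3):
    s = sentence(n)
    # Each n gives a different sentence, and all of them match the pattern.
    print(len(s.split()), s)
    assert PATTERN.fullmatch(s)
```

      Since `n` can be any non-negative integer, there is no largest sentence in this family, which is all that "unbounded" needs.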

    2. Re:Permutaton of all parsable sentences? by thinkwaitfast · · Score: 1
      I once started to write some software to analyze books and find all the sentence structures in a book, but got too lazy and quit. I also could not find any data sets.

      While the set of all parsable sentences is unbounded, the set limited to human understanding is not.

    3. Re:Permutaton of all parsable sentences? by jeffb+(2.718) · · Score: 1

      I can't see why they would be. More rigorously, I don't think you can establish a bound on the length of sentences that are humanly understandable. The sentences generated by my little example are all humanly understandable, for example, even though they're of unbounded length.

  13. Shite by Hognoxious · · Score: 1

    Parsey McParseface (Google has a sense of humor)

    Not really, because Xy McXface is not funny for any value of X.

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    1. Re:Shite by Anonymous Coward · · Score: 0

      Not really, because Xy McXface is not funny for any value of X.

      A-MEN. I've got three-year-olds in this house who come up with better jokes than this infantile drivel. It's a tired formula that was never funny at all.

      I weep for the future of comedy.

    2. Re:Shite by Tablizer · · Score: 1

      ^ Grumpy McGrumpface

  14. Rule-based still easily best by Jezral · · Score: 2

    94% syntax is definitely good, for a machine learning parser. Now if you were to come to the land of rule-based parsers, 94% is the norm.

    Google loves machine learning, and it's easy to see why. That's how they made their whole stack. They have the huge amounts of data to train on, and the hardware to do so. It's so seductive to just throw a mathematical model at huge amounts of data and let it run for a few weeks.

    Rule-based systems don't need any data to work with - they just need a computational linguist to spend a year writing down the few thousand rules. But the end result is vastly better, fully debuggable, easily updatable, understandable, and domain independent. That last bit is really important. A system trained for legalese won't work on newspapers, but a rule-based system usually works equally well for all domains.

    In 2006, VISL had a rule-based parser doing 96% syntax for Spanish (PDF) - our other parsers are also in that range, and naturally improved since then. Google is hopelessly behind the state of the art.

    1. Re:Rule-based still easily best by TFlan91 · · Score: 1

      You kinda alluded to the reason yourself...

      > Rule-based systems don't need any data to work with - they just need a computational linguist to spend a year writing down the few thousand rules

      which seems much more expensive than

      > ... just throw a mathematical model at huge amounts of data and let it run for a few weeks.

      but can now yield nearly equal results. "Machine Learning" sounds cooler than a bunch of if statements too

    2. Re:Rule-based still easily best by Jezral · · Score: 2

      which seems much more expensive than

      It'd seem that way, but it's really not if you factor in the whole chain.

      Machine learning needs high quality annotated treebanks to train from. Creating those treebanks takes many many years. It is newsworthy when a new treebank of a mere 50k words is published. Add to that the fact that each treebank likely uses different annotations, and you need to adjust your machine learner for that, or add a filter. Plus each treebank is for a specific domain, so your finished parser is domain-specific. If you want to work with other kinds of text, you need to produce a treebank for that domain and then train on it.

      Thus, the bulk work is in annotation and mathematical models. Google skipped the step of creating a treebank, and instead used available ones. There aren't any usable treebanks for smaller languages, making the whole machine learning endeavor useless for all but the large languages.

      Rule-based parsers are the opposite of that. You can put the same amount of man hours into creating rules as you otherwise would a treebank plus mathematical model, but you can do so on any old laptop with almost zero data to work from. You just need to know the language. A parser produced in this way is not domain specific, but can be easily specialized for a domain if needed. And a rule-based parser can be used as a bootstrap engine for creating high quality treebanks, because the rules are upwards of 99% accurate, meaning humans only need to put a fraction of work on top of it.

      And as I wrote, rules are debuggable. You can figure out exactly why a word was misanalyzed, and fix it. Machine learning can't do that. The edit-compile-test loop of machine learning is in weeks or hours - with rules it's in minutes or seconds.

    3. Re:Rule-based still easily best by Anonymous Coward · · Score: 0

      Google is hopelessly behind the state of the art.

      Who said they're giving away their best stuff?

    4. Re:Rule-based still easily best by Jezral · · Score: 1

      Who said they're giving away their best stuff?

      The nature of machine learning does. All they're giving away is an algorithm and a system trained using that algorithm. Linguistic machine learning is a field where even a 0.5% improvement takes years to get and is worth a paper. So even if they aren't giving away their top algorithm, their best one can't be much better.

    5. Re:Rule-based still easily best by mcswell · · Score: 1

      I have not read the original article, so take my comments with some grains of salt.

      But speaking as one who once wrote a syntactic grammar for a parser of English (still in use by a large manufacturer 30 years later, albeit in modified form), the problem with rule-based grammars that lack any statistical weights is that they come up with an unbelievably large number of parses for many real-world sentences. The problem is then to find which of those parses is the correct one, and that's what statistical weights (or some other kind of heuristic) allow you to do. Naturally the weights aren't perfect; as I see it, they're substituting for a real semantic and pragmatic analysis that would allow you to rule out nonsensical parses (e.g. the parse of "time flies like an arrow" that would give a semantic representation corresponding to the normal sense of "fruit flies like a banana"). The weights are lexically driven, which means they tend to be more domain-specific than the actual rules of a rule-based grammar.
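
      As a rough illustration of the parse blow-up described above, here is a minimal sketch that counts the parses a small unweighted grammar assigns to a sentence, using a CYK-style chart. The grammar, the lexicon, and the function name `count_parses` are all invented for this example; they are not taken from any parser mentioned in this thread.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # prepositional phrase attaches to the verb phrase
    ("NP", "NP", "PP"),   # ... or to a noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]

# Hypothetical toy lexicon: word -> possible categories.
LEXICAL = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "park": ["N"],
    "with": ["P"], "in": ["P"],
}


def count_parses(words):
    """Count distinct parse trees rooted in S that cover all the words."""
    n = len(words)
    chart = defaultdict(int)  # (start, end, symbol) -> number of derivations
    for i, word in enumerate(words):
        for symbol in LEXICAL[word]:
            chart[i, i + 1, symbol] += 1
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span
            for mid in range(start + 1, end):
                for parent, left, right in BINARY:
                    chart[start, end, parent] += (
                        chart[start, mid, left] * chart[mid, end, right]
                    )
    return chart[0, n, "S"]


print(count_parses("i saw the man with the telescope".split()))              # 2
print(count_parses("i saw the man with the telescope in the park".split()))  # 5
```

      Each extra prepositional phrase multiplies the attachment choices, so without weights or some other heuristic the grammar alone cannot say which of the parses is the intended one.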

      It's therefore hard to compare a rule-based parser with a statistical parser learned from a treebank, unless you specify whether the rule-based parser is being graded for whether the "correct" parse appears *among* its many parses, or it's being graded for having the correct parse as its one-best parse. There's also a ceiling for measured performance, which has to do with the inter-annotator agreement in the treebank that you're using for testing (and by the fact that some sentences are ambiguous even in context).

      Apropos of this, the VISL article has a caveat at the end: "...our evaluation was less rigid than optimally desirable, since it did not use multiple annotators and manual revision was performed on top of an automatic analysis, potentially creating a parser-friendly bias in ambiguous cases." So the 96% accuracy claim is suspect, not to mention that a comparison of the Google system is already difficult because Spanish =/= English. (Spanish has more morphology on verbs, it's pro-drop, it has relatively free word order compared to English,...)

      So I don't believe you can say that "Google is hopelessly behind the state of the art."

    6. Re:Rule-based still easily best by Jezral · · Score: 1

      ...the problem with rule-based grammars that lack any statistical weights is that they come up with an unbelievably large number of parses for many real-world sentences.

      Generative grammars suffer from that problem and scale very poorly, and may indeed be impractical to use for real world text. Our constraint grammars and finite-state analysers do not have that problem. With CG, we inject all the possible ambiguity into the very first analysis phase, then use contextual constraints to whittle them down, where context is the whole sentence or even multiple sentences. This means performance scales linearly with number of rules.
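
      The "inject all ambiguity, then whittle it down" idea can be sketched roughly as follows. This is a toy under stated assumptions, not VISL's system or real CG syntax: the lexicon, the two rules, and every name here are hypothetical, and the only convention borrowed from constraint grammar is that a rule may never remove a word's last remaining reading.

```python
# Hypothetical toy lexicon: every word starts with all of its possible tags.
LEXICON = {
    "alice": {"NOUN"},
    "saw": {"NOUN", "VERB"},
    "the": {"DET"},
}


def after_determiner(cohorts, i):
    """Tags to remove from word i: no VERB right after an unambiguous DET."""
    if i > 0 and cohorts[i - 1] == {"DET"}:
        return {"VERB"}
    return set()


def before_determiner(cohorts, i):
    """Tags to remove from word i: no NOUN between a sure NOUN and a sure DET."""
    if (0 < i < len(cohorts) - 1
            and cohorts[i - 1] == {"NOUN"}
            and cohorts[i + 1] == {"DET"}):
        return {"NOUN"}
    return set()


RULES = [after_determiner, before_determiner]


def disambiguate(words):
    """Apply the removal rules until nothing changes; keep at least one tag."""
    cohorts = [set(LEXICON[w.lower()]) for w in words]
    changed = True
    while changed:
        changed = False
        for i in range(len(cohorts)):
            for rule in RULES:
                remaining = cohorts[i] - rule(cohorts, i)
                if remaining and remaining != cohorts[i]:
                    cohorts[i] = remaining
                    changed = True
    return cohorts


print(disambiguate("Alice saw the saw".split()))
```

      Here the first "saw" ends up as a verb and the second as a noun, and each decision can be traced back to the single rule that made it.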

      So the 96% accuracy claim is suspect, not to mention that a comparison of the Google system is already difficult because Spanish =/= English. (Spanish has more morphology on verbs, it's pro-drop, it has relatively free word order compared to English,...)

      The paper is for Spanish, because that's what I could find. Our other parsers, including English, are also at the 96% or better stage, but because it's mindbogglingly boring to do a formal evaluation, we don't have up-to-date numbers.

      So I don't believe you can say that "Google is hopelessly behind the state of the art."

      Given that we had 96% in 2006, 10 years ago, and Google only now has reached 94% (90% for other domains), I feel confident in saying Google is very far behind.

  15. Lawyers For Horsey McHorseface Will Be In Touch by CycleFreak · · Score: 1

    A two-year-old gelding destined to race in Australia has been saddled with the name Horsey McHorseface. (pun intended by editors)

    http://www.bbc.com/news/world-...

  16. What does 94% mean by Anonymous Coward · · Score: 0

    94% of all sentences are parsed correctly, or 94% of all words?
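
    The two readings of the question give very different numbers, which a small sketch can show. The data below is made up for illustration, the function name `attachment_scores` is hypothetical, and nothing here says which measure the 94% figure refers to. Each sentence is a list of head indices, one per word, with 0 meaning the root.

```python
def attachment_scores(gold, predicted):
    """Return (per-word accuracy, per-sentence exact-match accuracy)."""
    words = correct_words = correct_sentences = 0
    for gold_heads, pred_heads in zip(gold, predicted):
        if len(gold_heads) != len(pred_heads):
            raise ValueError("gold and predicted sentences differ in length")
        hits = sum(g == p for g, p in zip(gold_heads, pred_heads))
        words += len(gold_heads)
        correct_words += hits
        # A sentence only counts as correct if every word's head is right.
        correct_sentences += hits == len(gold_heads)
    return correct_words / words, correct_sentences / len(gold)


# Made-up example: "Alice saw Bob" plus one five-word sentence.
gold = [[2, 0, 2], [2, 0, 5, 5, 2]]
predicted = [[2, 0, 2], [2, 0, 5, 2, 2]]  # one wrong head in sentence two

per_word, per_sentence = attachment_scores(gold, predicted)
print(per_word, per_sentence)  # 0.875 0.5
```

    The gap widens with sentence length: if each word were right 94% of the time and errors were independent (a simplifying assumption), a 20-word sentence would be entirely correct only about 29% of the time.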