Slashdot Mirror


Google Trains AI To Write Wikipedia Articles (theregister.co.uk)

The Register: A team within Google Brain -- the web giant's crack machine-learning research lab -- has taught software to generate Wikipedia-style articles by summarizing information on web pages... to varying degrees of success. As we all know, the internet is a never-ending pile of articles, social media posts, memes, joy, hate, and blogs. It's impossible to read and keep up with everything. Using AI to tell pictures of dogs and cats apart is cute and all, but if such computers could condense information down into useful snippets, that would really be handy. It's not easy, though. A paper, out last month and just accepted for this year's International Conference on Learning Representations (ICLR) in April, describes just how difficult text summarization really is. A few companies have had a crack at it. Salesforce trained a recurrent neural network with reinforcement learning to take information and retell it in a nutshell, and the results weren't bad.

26 of 59 comments

  1. Obligatory by darkain · · Score: 5, Funny

    Obligatory XKCD Reference: https://xkcd.com/810/

    1. Re:Obligatory by shanen · · Score: 1

      I think you should have said more to earn the click-through, but he's a sharp cookie (and even answers his email in helpful and constructive ways), so you got my click. But you wouldn't have gotten my mod point, if'n I ever got one to give.

      In his ever insightful way, he implicitly hit on all three of the applications in my initial (and longer) comment on this story.

      --
      Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.
  2. Turf Wars by Frosty+Piss · · Score: 3, Interesting

    It might be fun to watch the Google Wikipedia AI Bot get into "turf wars" with existing Wikipedia Bots...

    --
    If you want news from today, you have to come back tomorrow.
  3. Yeah, but how about the meta pages by russotto · · Score: 2

    Can this bot win edit wars, get Wikipedia administrators to side with it, drive n00bs off its pages? Without that, it's not very useful on Wikipedia itself.

  4. Can't do causal and counterfactual reasoning by Visarga · · Score: 3, Insightful

    Such models have no common sense yet - can't tell if "the use of the umbrella causes the rain or the other way around". They can't think like us, they just copy text and try to hit all the sub-topics with naturally sounding language based on the source material. It's more similar to Google translator than a human Wikipedia editor.

    1. Re:Can't do causal and counterfactual reasoning by AmiMoJo · · Score: 1

      Sounds perfect for Wikipedia. Research and logic are not allowed, all that matters is finding a reliable source that says something and summarising it. The only real skills required are writing summaries and defending the reliability of your source on the talk page.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    2. Re:Can't do causal and counterfactual reasoning by sg_oneill · · Score: 1

      Such models have no common sense yet - can't tell if "the use of the umbrella causes the rain or the other way around". They can't think like us, they just copy text and try to hit all the sub-topics with naturally sounding language based on the source material. It's more similar to Google translator than a human Wikipedia editor.

      Hmm. Don't be so sure. There is a certain sense that embodiment, being in a body, is a necessary part of familiar intelligence. Humans are to some extent the way we are because we are lugging a very needy meat machine around, but it's also a meat machine that provides us with lots of useful "common sense", that the rain didn't come from the umbrella, for instance. But even without it, it's basic induction that because it also rains when there isn't an umbrella, the umbrella isn't the important correlation. And that's something AI should in theory handle trivially.

      The only thing it won't handle so obviously is the embodiment parts. What it actually means to be human. What the qualia of hunger is, what it feels like to really get unhinged over sexual attraction, the joy of sleep, and the misery of sickness. I mean it could fake it, but it's going to struggle with it. Then again, I read stories about AIs being better at reading human emotions than humans, and I guess we're in unknown country with this science.
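
      The umbrella induction above can be sketched in a few lines. This is only an illustration under loud assumptions: the observations are made up, and the helper name `is_necessary_for` is hypothetical. It checks one narrow thing, whether the effect ever shows up without the candidate cause, which rules out the umbrella as a necessary condition for rain. It does not establish causal direction in general.

```python
def is_necessary_for(observations, cause, effect):
    """Return False if the effect is ever observed without the candidate cause."""
    return not any(obs[effect] and not obs[cause] for obs in observations)


# Made-up observations, purely for illustration.
observations = [
    {"umbrella": True, "rain": True},    # raining, umbrella out
    {"umbrella": False, "rain": True},   # raining, nobody has an umbrella
    {"umbrella": False, "rain": False},  # dry day, no umbrella
]

# Rain happens without an umbrella, so the umbrella cannot be what rain depends on.
umbrella_needed_for_rain = is_necessary_for(observations, "umbrella", "rain")

# Nobody in this toy data opens an umbrella on a dry day, so rain is not ruled out.
rain_needed_for_umbrella = is_necessary_for(observations, "rain", "umbrella")
```

      A single counterexample is enough to discard the "umbrella causes rain" reading, which is the cheap part. Picking the right explanation among the candidates that survive is where the common sense comes in.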

      --
      Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
    3. Re:Can't do causal and counterfactual reasoning by HiThere · · Score: 1

      It's not that direct, and I see no reason that a recurrent network couldn't learn "common sense". (Well, at least as well as people can.) But if you need to include all the information involved in acquiring common sense, then you've radically increased the data requirements, including lots of time-series data sets, etc. The cheap way to do this is probably to embody it in a body with lots of sensors. Now one source of this information would be a fleet of automated cars...

      The problem is that this depends on storage capacities continuing to increase at the same rate they have been, and processor capabilities doing the same. (OTOH, for neural nets you don't need more capable CPUs, as distributed nets work quite well, and solve a lot of the cooling problems.)

      So it's doable. Whether it will be done is another question. It's generally cheaper to use the lowest amount of computing power that's needed to do the job. (Generally. Cellphones are an excellent counterexample, however.)

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
  5. Giant crack machine by robotvoice · · Score: 2

    Great. Just what we need. A trained monkey that summarizes the summarizers.

    According to the article, "The generated sentences are taken from the earlier extraction phase and aren’t built from scratch, which explains why the structure is pretty repetitive and stiff."

    Mohammad Saleh, co-author of the paper and a software engineer in Google AI’s team, told The Register: “The extraction phase is a bottleneck that determines which parts of the input will be fed to the abstraction stage. Ideally, we would like to pass all the input from reference documents.” “Designing models and hardware that can support longer input sequences is currently an active area of research that can alleviate these limitations.” We are still a very long way off from effective text summarization or generation. And while the Google Brain project is rather interesting, it would probably be unwise to use a system like this to automatically generate Wikipedia entries. For now, anyway.

    Also, since it relies on the popularity of the first ten websites on the internet for any particular topic, if those sites aren’t particularly credible, the resulting handiwork probably won’t be very accurate either.

    My faux outrage is entirely synthetic.
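
    For anyone wondering what the "extraction phase" bottleneck in the quotes above looks like in practice, here is a minimal sketch. It is not the paper's code: the function names, the scoring formula, and the `budget` parameter are all assumptions for illustration. It ranks source paragraphs against the topic with a simple tf-idf sum and keeps only the top few, which is the point where anything the ranker misses never reaches the abstraction stage.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_paragraphs(paragraphs, query, budget):
    """Return the `budget` paragraphs that score highest against the query.

    Scoring is a plain tf-idf sum over the query terms (a stand-in chosen
    for this sketch, not necessarily what the paper uses).
    """
    tokenized = [tokenize(p) for p in paragraphs]
    n = len(paragraphs)
    # Document frequency: how many paragraphs contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))

    def score(tokens):
        tf = Counter(tokens)
        return sum(
            tf[t] * math.log((1 + n) / (1 + df[t]))
            for t in query_terms
            if t in tf
        )

    ranked = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)
    # Emit the selected paragraphs in their original order.
    return [paragraphs[i] for i in sorted(ranked[:budget])]


paragraphs = [
    "Cats are small carnivorous mammals.",
    "The stock market fell on Tuesday.",
    "Domestic cats are kept as pets; cats purr.",
    "Share this on Facebook.",
]
selected = extract_paragraphs(paragraphs, "cats", budget=2)
```

    Whatever falls outside `budget` is simply gone, so a bad ranker or bad sources at this step cap the quality of everything downstream.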

    1. Re:Giant crack machine by bursch-X · · Score: 2

      A trained monkey that summarizes the summarizers

      Wikipedia editors summarized.

      Also, since it relies on the popularity of the first ten websites on the internet for any particular topic, if those sites aren’t particularly credible, the resulting handiwork probably won’t be very accurate either.

      And since Google essentially has quite some influence on which sites go there... Here we go — Google's very own reality distortion field.

      --
      There are two rules for success:
      1. Never tell everything you know.
  6. What else did you expect from the EVIL monster? by shanen · · Score: 1

    Insofar as the google understands that knowledge is power, I'm only surprised that they decided to show their hand. Maybe Wikipedia is less naive and less harmless than I thought? The google perceives an actual threat to their Gawd Profit?

    I've actually been considering this branch of technology in terms of specific applications, such as (1) Writing assistance to help people tell their hidden stories (in interesting ways, of course), (2) Email analysis for asymmetric celebrity email systems (as a dual of the spam problem), or (3) Aggregation of EPR (Earned Public Reputation) for such places as Wikipedia (and Slashdot), just to focus on the three that I keep banging my head on. Automatic digesting of knowledge for such purposes as Wikipedia articles is actually a more pragmatic branch with high relevance to the corporate cancers as they seek the elimination of their most serious cost: Paying the salaries of all those pesky human beings. Much better to harvest their knowledge and flush them away.

    I better confess that yes, I do have a personal axe to grind, having been flushed by one of those large cancers. I still think I have some mental capacity to work, and even a desire to do so, but they can always spot my age with one glance at my resume. Won't even give me a chance to underbid the young whippersnappers. If it ain't the question of how well I've kept up with the technologies, then it's the belief the youngsters have fewer bad habits to unlearn plus the life expectation thing...

    Ha ha! Last laugh's on them. The google can't harvest my abundant mistakes to learn anything from them. They were all erased before I turned in my last corporate computer. (Actually, most of my corporate career was fixing OTHER people's mistakes. With my terrible attitude, I was a natural for the work.)

    Almost forgot one more laugh. The problem faced by the corporate cancers is fundamentally unsolvable. There is NO largest profit that will solve their desperate need for the infinitely large profit number. There's only the threat of slumping next quarter.

    --
    Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.
  7. The fourth paragraph by raymorris · · Score: 1, Insightful

    Most articles I find on the net follow a pretty consistent pattern, using one of two variations on that pattern:

    How To Foo a Fizz

    Fizz is very popular these days blah blah. First paragraph says nothing useful at all.

    Fizz is good for blah blah. Second paragraph also pointless.

    Sometimes it helps to Foo your Fizz. Some people like to Foo it because blah blah blah.

    You can Foo your Fizz by:
    Clicking the tiny menu at the bottom
    Choose Preferences
    Select "Foo"

    Now your Fizz is Foo and blah blah blah.

    Share this on Facebook. On Twitter. On Google Plus. MySpace. Yourspace. Farmers only. Black people meet. Stupid people meet.

    Pretty standard pattern. The first two or three paragraphs are pointless. Sometimes they forget to actually tell you how to do it, and ONLY have the fluff. That's really annoying.

    The "human interest" version is similar:

    How to Close a Resume Cover Letter

    Debbie Wood, a mother of two from Englewood, Colorado was driving home in her blue Mustang when she stopped for some fries. After eating them, with ketchup, she got a call saying she was fired. Blah blah blah.

    Blah blah blah about Debbie.

    Debbie worked at Poor Writing Inc for six years, starting out as an eraser. Blah blah blah.

    Debbie wrote "I'm looking forward to hearing from you" at the end of her cover letter. It worked great.

    Debbie now works at blah blah blah. She enjoys blah at her blah job blah blah blah.

    Share this on Facebook. On Twitter. On Google Plus. MySpace. Yourspace. Farmers only. Black people meet. Stupid people meet.

    Pretty much the entire useful part of the story is the fourth paragraph.

    1. Re:The fourth paragraph by Tablizer · · Score: 1

      I want my foo fizzed too, where do I sign up?

    2. Re:The fourth paragraph by mrbester · · Score: 1

      This was about fooing your fizz, not fizzing your foo. Doing the latter results in the unintentional bar side effect and you'll need to baz it back and start again.

      --
      "Wait. Something's happening. It's opening up! My God, it's full of apricots!"
    3. Re:The fourth paragraph by tender-matser · · Score: 1

      I wish they would flag bot-written pages accordingly, so I could filter them out in searches without having to fetch them first.

      I've tried to build an interface where a flag is drawn on a map wherever there's a Wikipedia article geolocated there (no matter what language it's in).

      Either with the old Wikimedia query interface or SPARQL, there's no way to get rid of the flurry of bot-written pages, which are simply the same ridiculously inaccurate geonames data, formatted into an article just to bump up the number of non-stub pages, or to prove something (what? that some jerk is able to write a crap-flooding script? great!)

      Consider this. Worthless garbage -- complete with made-up meteo data and a dozen references. They have no article about the great inventor but have one about a small mountain in the opposite corner of the earth (that I'm probably the only guy that had the curiosity to climb in this decade).

      Of course, I could just filter the worst offenders (e.g. the Cebuano and Southern Min Wikipedias), but this garbage has started to infect other Wikipedias too (e.g. the Swedish one).

  8. Try it on some famous works by Michael+Woodhams · · Score: 4, Funny

    This is just crying out to be applied to some famous texts to amuse us with what it comes up with.

    The Hunting of the Snark. Fox in Socks. We're Going on a Bear Hunt. Ulysses. 50 Shades of Grey. Titus Andronicus. Sonnet 130. Harry Potter and the Portrait of What Looked Like a Large Pile of Ash. The Magna Carta. Genesis. Terms and Conditions for iTunes.

    --
    Quattuor res in hoc mundo sanctae sunt: libri, liberi, libertas et liberalitas.
    1. Re:Try it on some famous works by Knuckles · · Score: 1

      Personally I am looking forward to a concise rendition of Heidegger's Being and Time.

      --
      "When I first heard Daydream Nation it quite frankly scared the living shit out of me." -- Matthew Stearns
    2. Re:Try it on some famous works by puddingebola · · Score: 1

      Jabberwocky also.

  9. ??? RSS Feeds. :P by wolfheart111 · · Score: 1

    And then just spin your article via word AI https://wordai.com/... Oh, that's what the penguin was for. ::) Grrrr

    --
    [($)]
  10. Wikipedia is more about deleting info... by Anonymous Coward · · Score: 1

    than saving it. They are anti-information. I gave up after I created a page for my uncle, who had a platinum record and five gold records, and it was deleted as not being notable.

  11. exactly - far less than optimal by johnjones · · Score: 1

    most of the information within a wikipedia page is spread around on little-visited websites in terrible formatting; it actually takes someone who understands or wants to understand "the subject" to do a half-decent job

    the fact it simply takes the summary of the summarizers basically makes it pointless... go back to your lisp machines, "researchers"

    1. Re:exactly - far less than optimal by HiThere · · Score: 1

      It makes this particular pre-alpha version pointless. No argument there. But this particular version doesn't even handle things that could easily be handled, like capitalizing sentences. This is clear evidence that it's a pre-alpha version.

      All this tells us is that this is another area they're looking into. It gives essentially no grounds for judging how well the first release will work.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
  12. AI to create Wikipedia entries? by kenwd0elq · · Score: 1

    A Google AI could hardly do worse than a large number of Wikipedia entries.

  13. Google Trains by dunkelfalke · · Score: 2

    I didn't realise there is a Google Trains subsidiary. But even so, why does it have an AI and why would this AI edit Wikipedia?

    --
    "It's such a fine line between stupid and clever" -- David St. Hubbins, Spinal Tap
  14. Done before by elgatozorbas · · Score: 1

    This guy was already doing that in 2014?

  15. cool. Google notes instead of Cliff notes. by WindBourne · · Score: 1

    Seriously, if this is done right, it might actually give us 'cliff notes' on businesses, their web sites, ideally, even connect the subsidiaries.
    And perhaps they will show the dots WRT family connections.

    --
    I prefer the "u" in honour as it seems to be missing these days.