Slashdot Mirror


Google Open Sources Its Image-Captioning AI (zdnet.com)

An anonymous Slashdot reader quotes ZDNet: Google has open-sourced a model for its machine-learning system, called Show and Tell, which can view an image and generate accurate and original captions... The image-captioning system is available for use with TensorFlow, Google's open machine-learning framework, and boasts a 93.9 percent accuracy rate on the ImageNet classification task, inching up from previous iterations.

The code includes an improved vision model, allowing the image-captioning system to recognize different objects in images and hence generate better descriptions. An improved image model meanwhile aids the captioning system's powers of description, so that it not only identifies a dog, grass and frisbee in an image, but describes the color of grass and more contextual detail.

40 comments

  1. It's Google by Anonymous Coward · · Score: 0

    So I'm sure it spies on you.

    1. Re:It's Google by Fwipp · · Score: 2

      If only the source were open so we could find out....

    2. Re:It's Google by Anonymous Coward · · Score: 0

      It's a neural network. It doesn't spy on anything. If you pipe the output to a network adapter that's on you.

  2. And in version 2... by Anonymous Coward · · Score: 0

    With just a little tweaking... an automatic meme generation program!

    I, for one, welcome our new meme-generating AI overlords.

    1. Re: And in version 2... by Anonymous Coward · · Score: 0

      Exactly what I have in mind, sir.

  4. Finally! by nospam007 · · Score: 1

    Finally I can build that automatic nemesis recognition missile.

  5. good going google by Anonymous Coward · · Score: 0

    now bots will be able to break captchas... FUCK!

    1. Re: good going google by Anonymous Coward · · Score: 0

      Except that's not what it does. A captcha would probably be described as "letters" or "a word".

      If you're worried about potential abuse, think of a picture:

      "Man wearing burkha near airport"

  6. So how can we try this on our own? by Anonymous Coward · · Score: 0

    News items about CoolNeuralNetworkThingX are often vague about where the code is located and how to use it, either because the story only exists for company prestige/hype value, or because the work was just left as a paper.

    1. Re:So how can we try this on our own? by Fwipp · · Score: 1
  7. would be cool if it stayed on MY machine by ecloud · · Score: 1

    When I dreamed of having an intelligent computer a decade or two ago, I never dreamed that it could only be accomplished by sending queries to some big corporate-controlled cluster and getting responses back. I don't want to use Siri or Echo, because of this spying which is so far inherent to AI, and because Amazon and Google exist mainly to sell us stuff, to exploit us and get us to buy more of something. When open-source AI is capable of doing something useful, then I will run it on my own machine.

    But can we ever expect an AI to get anything done without communicating? A lower standard: can we expect it to communicate to the extent necessary to get something done, but still respect our privacy? To have a positive answer requires an AI with ethics. It's probably more work for the AI to understand what is necessary to respect the user's privacy (like a good friend would do) than to answer the questions we ask of it.

    1. Re:would be cool if it stayed on MY machine by Anonymous Coward · · Score: 2, Informative

      It does stay on your machine. The Google Cloud Compute API doesn't even have image captioning as a service right now. If you want to test this, you're going to have to get a nice NVIDIA GPU and compile their TensorFlow code by following the README.md on GitHub.

      The reality is this isn't a useful product for robotics because the output of the network is a natural language caption. If you wanted to use this model for robotics, you would chop off the classifier and use the pre-trained Inception v3 model for whatever your needs were.

    2. Re:would be cool if it stayed on MY machine by yes-but-no · · Score: 1

      I don't want to use Siri or Echo, because of this spying which is so far inherent to AI, and because Amazon and Google exist mainly to sell us stuff, to exploit us and get us to buy more of something.

      Did anyone ever put a gun to your head and make you buy/say/act against your wishes? You get exploited when you are dumb; as simple as that. Increase your awareness.. don't blame your opponent or whine about them being too strong.

    3. Re:would be cool if it stayed on MY machine by Anonymous Coward · · Score: 0

      Gotta say... that's a very simplistic view. Surely you can see how competitive pressures come into play?
      The only reason I can see for your attitude is if you are justifying working for a tech company that takes advantage of people.

  8. Training by Anonymous Coward · · Score: 0

    If it gets trained with imagery from GTA it should soon be able to recognise cop cars, swat teams, police helicopters and hookers with 99% accuracy.

  9. Please select all images with street signs by Anonymous Coward · · Score: 0

    Thank you! Now, select all images with a house. Click verify once there are none left. Yes, and the new ones too. ......Thanks a lot. May we ask you for one more? This time, select all images with a store front. Oh, and these too, of course.

    Multiple correct solutions required, by the way.

    Finally, please prove you're not a robot by reposting a youtube video of a non-robot solving a captcha on your Google+. Please sign in to verify your age. ....Oh, sorry, *that* video was blocked on copyright grounds.

  10. Drawception by Anonymous Coward · · Score: 0

    Maybe we can finally have an AI that can properly play Drawception and not shove in memes/OCs/injokes/Undertale/MLP/Steven/Pokemon that ruin the game.

  11. 6.1 percent inaccurate by Anonymous Coward · · Score: 0

    Hope they fixed this bug: http://blogs.wsj.com/digits/2015/07/01/google-mistakenly-tags-black-people-as-gorillas-showing-limits-of-algorithms/

    1. Re:6.1 percent inaccurate by K.+S.+Kyosuke · · Score: 2, Funny

      They fixed it. The new version tags gorillas as black people.

      --
      Ezekiel 23:20
  12. Wish I could spend serious time on this by Camembert · · Score: 1

    With the advances in machine learning and the easy availability of tools like this, it would be so very satisfying to put serious time and energy into studying these interesting topics. However, like probably several others here, with a mortgage and in my case twin kids coming, it is near impossible to break away from the day job...

    1. Re:Wish I could spend serious time on this by Anonymous Coward · · Score: 5, Informative

      If you've got $1200 you've got enough money to play in the arena. If you want to do "DeepMind" level work: you need a substantially larger farm of GPUs.

      If you don't feel a need to replicate the latest flashy advances: there's still plenty of opportunity to make really interesting contributions with an NVIDIA GTX 960 training networks on MNIST 28x28x1 Resolution Images.

      Time requirement is mostly reading in 15-30 minute chunks. It took me a year to read enough to feel fluent.

      Start here:
      http://www.dspguide.com/ch26.htm
      Then read these:
      https://en.wikipedia.org/wiki/Artificial_neuron
      https://en.wikipedia.org/wiki/Artificial_neural_network
      https://en.wikipedia.org/wiki/Multilayer_perceptron
      https://en.wikipedia.org/wiki/Softmax_function
      http://stats.stackexchange.com/questions/126238/what-are-the-advantages-of-relu-over-sigmoid-function-in-deep-neural-network
      http://image.slidesharecdn.com/cnn-toupload-final-151117124948-lva1-app6892/95/convolutional-neural-networks-cnn-44-638.jpg?cb=1455889178
      (TLDR: Using the sigmoid/tanh for your transfer function suffers from something called "vanishing gradients", where the derivative (used for "backpropagation") approaches zero as the neuron's input grows large in magnitude, which happens as the weights of the network become large. Restricted Boltzmann Machines (RBMs) use an alternative to backpropagation known as "contrastive divergence", and so it was popular to stack these to form "deep belief networks" (just a multi-layer RBM trained one layer at a time). The ReLU transfer function has grown popular because it solves this problem more easily, which means you can safely ignore RBMs and DBNs from your reading, at least initially.)
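
      To make the vanishing-gradient point concrete, here is a minimal sketch in plain Python. It is a toy, not anything from the linked material: the helper names are made up for illustration, every layer is assumed to see the same input of 2.0, and the weights are fixed at 1 so only the transfer-function derivatives are multiplied together.

```python
import math


def sigmoid(x):
    """Logistic sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; never larger than 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def chained_gradient(grad_fn, pre_activation, depth):
    """Multiply `depth` identical local derivatives, as the chain rule would
    through a stack of layers (weights fixed at 1 to keep the toy simple)."""
    gradient = 1.0
    for _ in range(depth):
        gradient *= grad_fn(pre_activation)
    return gradient


# Print the surviving gradient for a few depths: sigmoid first, ReLU second.
for depth in (1, 5, 10):
    print(depth,
          chained_gradient(sigmoid_grad, 2.0, depth),
          chained_gradient(relu_grad, 2.0, depth))
```

      The sigmoid column shrinks by roughly a factor of ten per layer, while the ReLU column stays at 1 for any positive input.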

      Then read these:
      https://en.wikipedia.org/wiki/Support_vector_machine
      https://en.wikipedia.org/wiki/Convolutional_neural_network (Will explain what "Pooling Layers" are)
      https://www.reddit.com/r/MachineLearning/comments/3klqdh/q_whats_the_difference_between_crossentropy_and/

      Difference between "regression" and "classification":
      A regression network outputs the activation of the output neurons directly, while a classifier network uses the softmax function to ensure that the output neurons' activations sum to one.
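
      A rough sketch of that difference, assuming three made-up output activations: the regression case would use `raw_outputs` as-is, while the classifier case passes them through softmax first. Subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math


def softmax(logits):
    """Turn raw output activations into values that are positive and sum to 1."""
    largest = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


raw_outputs = [2.0, 1.0, 0.1]   # what a regression network would report directly
probs = softmax(raw_outputs)    # what a classifier network would report
print(probs, sum(probs))
```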

      The most important thing to understand: it is trivial to train a neural network to perform well on its own training data (that's what backpropagation DOES). What is difficult is collecting enough data (preferably labeled) that you can hold out a significant portion for validation (helps prevent overtraining), and another set of holdout data for TESTING. Your goal is to teach the network to generalize to the general case; techniques that push it in that direction are called "regularization". The test holdout set is for verifying that you didn't overtrain on the validation data via "hill climbing".
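
      The holdout idea can be sketched like this. The 70/15/15 proportions, the function name and the fixed seed are illustrative assumptions, not a recommendation from any of the links.

```python
import random


def split_dataset(samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle once, then carve off a test set and a validation set.

    The test set is only touched at the very end; the validation set is what
    you look at while tuning.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```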
      Cool trick: https://en.wikipedia.org/wiki/Dropout_(neural_networks)
      http://fastml.com/regularizing-neural-networks-with-dropout-and-with-dropconnect/
      https://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_topologies (Neural Networks meet Evolutionary Algorithms)
      https://people.cs.uct.ac.za/~gnitschke/projects/papers/2009-Niche%20Particle%20Swarm%20Optimization%20for%20Neural%20Network%20Ensembles.pdf

      Other things to know: learning rate is how quickly the network adjusts its weights (how quickly you jump around during stochastic gradient descent). Bigger steps = faster approach of local minima, but you tend to "overshoot" the high-performing valleys and get stuck on the low-performing surface. This is why it's generally a good idea to "anneal" your learning rate over time.
      http://sebastianruder.com/optimizing-gradient-descent/
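
      A toy illustration of annealing, under heavy assumptions: plain (not stochastic) gradient descent on f(x) = x^2, with a made-up 1/(1 + decay * step) schedule. The starting rate of 1.0 is deliberately too large, so the constant-rate run overshoots the minimum forever while the annealed run settles.

```python
def minimize(grad_fn, x0, initial_lr, decay, steps):
    """Gradient descent whose step size shrinks as 1 / (1 + decay * step)."""
    x = x0
    for step in range(steps):
        lr = initial_lr / (1.0 + decay * step)  # decay=0.0 means a constant rate
        x -= lr * grad_fn(x)
    return x


def grad(x):
    """Gradient of the toy loss f(x) = x**2, whose minimum is at x = 0."""
    return 2.0 * x


constant = minimize(grad, 5.0, initial_lr=1.0, decay=0.0, steps=50)
annealed = minimize(grad, 5.0, initial_lr=1.0, decay=0.1, steps=50)
print(constant, annealed)
```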

      Other cool things to learn about:
      Autoencoders and "Transfer Learning", i.e. you can get most of the value of having Google's enormous GPU farms by simply downloading their pretrained Inception models, then using them as pretrained features for other experiments.

      Caffe vs. TensorFlow vs. Keras vs. Torch? I vote: TensorFlow.
      https://www.tensorflow.org/versions/r0.9/tutorials/mnist/beginners/index.html

      Good luck!

    2. Re:Wish I could spend serious time on this by Camembert · · Score: 1

      Wow! Thanks so much for your friendly and ultra helpful reply. This will really help get me started. There is so much cynicism here on /. - it is wonderful to read your very informative reply. Thanks again!

    3. Re:Wish I could spend serious time on this by yes-but-no · · Score: 1

      it's all about desire ordering..D1, D2, D3 .. you can always break away from foo if your desire for bar is higher.. if you can give up D3 for D2 and D2 for D1; you can realize any D1. [no one forced anyone to have a mortgage or even kids..or raise them in expensive places/life-style.. or made one sit in a cubicle to pick up a pay-check].. when a person lacks courage or passion for D1, he/she starts blaming the environment or says there's too much cynicism around.

    4. Re:Wish I could spend serious time on this by Camembert · · Score: 1

      Not blaming anything.
      No one indeed forced me to have kids, but it was something -let's call it D1- that we found very important, more important than my other personal interests.
      Also, having your own house paid off is actually a good way to keep poverty at bay when old.
      So, not blaming anything, not even unhappy with my job, and my own family priorities are more important. ML is a personal interest that I hope to develop.

    5. Re:Wish I could spend serious time on this by yes-but-no · · Score: 1

      sorry then why do you say something is going to be "so very satisfying?" I assume so-very means it falls in the top say 5 desires of a person. I like many things in life..but I won't call them 'so very satisfying' ..in that case I will start throwing away stuff which is less important and focus on my top few..in fact life taught me I can't even have D2 if I wanted D1

    6. Re:Wish I could spend serious time on this by Anonymous Coward · · Score: 0

      I'm glad to share my passion.

      I forgot one important link:
      https://en.wikipedia.org/wiki/Hyperparameter_optimization

      Eventually you're going to feel like your models have plateaued and you will want better performance. Hyperparameter optimization is an attempt to apply a methodical process to improving performance by modifying "hyperparameters" such as the number of hidden neurons in an MLP, or the learning rate.
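
      The simplest version of this is a grid search, sketched below. Note that `validation_score` is a hypothetical stand-in: in practice it would train a model with the given settings and return its score on the validation set. Here it is a made-up formula that peaks at 64 hidden units and a learning rate of 0.01, so the search has something to find.

```python
import itertools


def validation_score(hidden_units, learning_rate):
    """Hypothetical placeholder for 'train a model, return validation accuracy'."""
    return 1.0 - abs(hidden_units - 64) / 256.0 - abs(learning_rate - 0.01) * 10.0


def grid_search(score_fn, grid):
    """Try every combination in `grid` and keep the best-scoring one."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


grid = {
    "hidden_units": [16, 32, 64, 128],
    "learning_rate": [0.1, 0.01, 0.001],
}
best_params, best_score = grid_search(validation_score, grid)
print(best_params, best_score)
```

      Random search and smarter methods follow the same shape; only the way candidate settings are proposed changes.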

      This process only yields so much additional performance which can be squeezed from the model, after which point: you have to return to the drawing board and devise a new scheme to get better performance from the same training data.

      If you're an engineer rather than a machine learning scientist: one of the easiest ways to get better performance without going back to the drawing board is getting more/higher quality training data. This is why competitions usually have a separate tier for models which were trained using augmented data sets.

    7. Re: Wish I could spend serious time on this by Camembert · · Score: 1

      I guess I am an "and" person, not an "or" person. It would be great and satisfying to be able to spend good time learning this tech in depth. But my priority is my family, that is the main source of my happiness.

    8. Re: Wish I could spend serious time on this by yes-but-no · · Score: 1

      nice word play there. having 'and' in dreams without action is as worthless as not having it at all. just thinking i'm an emperor doesn't make one so.

  13. Son of a bitch! by Gravis+Zero · · Score: 2

    The nerve of this infernal program is so obscene it must be untenable! It captioned my dick pic as "YAUPFAN (Yet Another Unimpressive Penis From A Narcissist)"! Kudos for having it create its own acronyms but I won't stand for a machine generated insult and neither should you! Though if you have a standing desk, it's cool, I totally get it. ;)

    --
    Anons need not reply. Questions end with a question mark.
  14. According to Google's calculations... by Anonymous Coward · · Score: 0

    ... I must be their slave and breed candy. But that's completely inaccurate. But I've seen worse real life scenarios, trust me.

  15. License? by Anonymous Coward · · Score: 0

    So what's the license? I couldn't find it on GitHub.

    1. Re: License? by Anonymous Coward · · Score: 1

      If you look at the source, it says Apache License.

  16. Re:It's Google NIGGERS GAY ANUS OPEN FOR FUCK by Anonymous Coward · · Score: 0

    To anyone sick of this spamming - it's not hard to stop, just lobby our overlords to block all posts from APK. This will reduce the amount of posts considerably, but it will double the signal to noise ratio.

  17. Re:I open sourced my ballsack and brought AI to li by Anonymous Coward · · Score: 0

    now my ball-sack clones are flying around the room and fight each other for the proud honor of landing on my head, spreading out to relax, and each testicle drops down like an icicle over each of my eyes. /goodfellas //"What can I say? It turned me on."

    You are a dickhead APK, and you can't see for nuts.

  18. The obligatory xkcd by Anonymous Coward · · Score: 0

    https://xkcd.com/1444/

    captcha: infamous

  19. How did it know? by Impy+the+Impiuos+Imp · · Score: 0

    Average Slashdotter: Lemme try it on a selfie!

    Computer returns analysis title: "300 Lb. Fatass Masturbating"

    Slashdotter: Eerie!

    --
    (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
  20. Gosh, that's something else. by dabeshu · · Score: 1

    Still not as impressive as the one that invented toothpaste and made art.