Slashdot Mirror


Google Open Sources Its Image-Captioning AI (zdnet.com)

An anonymous Slashdot reader quotes ZDNet: Google has open-sourced a model for its machine-learning system, called Show and Tell, which can view an image and generate accurate and original captions... The image-captioning system is available for use with TensorFlow, Google's open machine-learning framework, and boasts a 93.9 percent accuracy rate on the ImageNet classification task, inching up from previous iterations.

The code includes an improved vision model, allowing the image-captioning system to recognize different objects in images and hence generate better descriptions. An improved image model meanwhile aids the captioning system's powers of description, so that it not only identifies a dog, grass and frisbee in an image, but describes the color of grass and more contextual detail.

18 of 40 comments

  1. Re:It's Google by Fwipp · · Score: 2

    If only the source were open so we could find out....

  2. Finally! by nospam007 · · Score: 1

    Finally I can build that automatic nemesis-recognition missile.

  3. would be cool if it stayed on MY machine by ecloud · · Score: 1

    When I dreamed of having an intelligent computer a decade or two ago, I never dreamed that it could only be accomplished by sending queries to some big corporate-controlled cluster and getting responses back. I don't want to use Siri or Echo, because of this spying which is so far inherent to AI, and because Amazon and Google exist mainly to sell us stuff, to exploit us and get us to buy more of something. When open-source AI is capable of doing something useful, then I will run it on my own machine.

    But can we ever expect an AI to get anything done without communicating? A lower standard: can we expect it to communicate to the extent necessary to get something done, but still respect our privacy? To have a positive answer requires an AI with ethics. It's probably more work for the AI to understand what is necessary to respect the user's privacy (like a good friend would do) than to answer the questions we ask of it.

    1. Re:would be cool if it stayed on MY machine by Anonymous Coward · · Score: 2, Informative

      It does stay on your machine. The Google Cloud Compute API doesn't even have image captioning as a service right now. If you want to test this, you're going to have to get a nice NVIDIA GPU and compile their TensorFlow code by following the README.md on GitHub.

      The reality is this isn't a useful product for robotics because the output of the network is a natural language caption. If you wanted to use this model for robotics, you would chop off the classifier and use the pre-trained Inception v3 model for whatever your needs were.

    2. Re:would be cool if it stayed on MY machine by yes-but-no · · Score: 1

      I don't want to use Siri or Echo, because of this spying which is so far inherent to AI, and because Amazon and Google exist mainly to sell us stuff, to exploit us and get us to buy more of something.

      Did anyone ever put a gun to your head and make you buy/say/act against your wishes? You get exploited when you are dumb; it's as simple as that. Increase your awareness. Don't blame your opponent or whine about them being too strong.

  4. Re:So how can we try this on our own? by Fwipp · · Score: 1
  5. Wish I could spend serious time on this by Camembert · · Score: 1

    With the advances in machine learning and the easy availability of tools like this, it would be so very satisfying to put serious time and energy into studying these interesting topics. However, like probably several others here, with a mortgage and in my case twin kids coming, it is near impossible to break away from the day job...

    1. Re:Wish I could spend serious time on this by Anonymous Coward · · Score: 5, Informative

      If you've got $1200 you've got enough money to play in the arena. If you want to do "DeepMind" level work: you need a substantially larger farm of GPUs.

      If you don't feel a need to replicate the latest flashy advances: there's still plenty of opportunity to make really interesting contributions with an NVIDIA GTX 960 training networks on MNIST 28x28x1 Resolution Images.

      The time requirement is mostly reading in 15-30 minute chunks. It took me a year to read enough to feel fluent.

      Start here:
      http://www.dspguide.com/ch26.htm
      Then read these:
      https://en.wikipedia.org/wiki/Artificial_neuron
      https://en.wikipedia.org/wiki/Artificial_neural_network
      https://en.wikipedia.org/wiki/Multilayer_perceptron
      https://en.wikipedia.org/wiki/Softmax_function
      http://stats.stackexchange.com/questions/126238/what-are-the-advantages-of-relu-over-sigmoid-function-in-deep-neural-network
      http://image.slidesharecdn.com/cnn-toupload-final-151117124948-lva1-app6892/95/convolutional-neural-networks-cnn-44-638.jpg?cb=1455889178
      (TL;DR: Using sigmoid/tanh as your transfer function suffers from something called "vanishing gradients", where the derivative (used for "backpropagation") approaches zero as a neuron's input grows large in magnitude, which tends to happen as the weights of the network become large. Restricted Boltzmann Machines (RBMs) are trained with an alternative to backpropagation known as "contrastive divergence", so it was popular to stack these to form "deep belief networks" (just a multi-layer RBM trained one layer at a time). The ReLU transfer function has grown popular because it largely sidesteps this problem in a simpler way, which means you can safely skip RBMs and DBNs in your reading, at least initially.)
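      A minimal sketch of the vanishing-gradient point, using only the Python standard library. The helper names (`sigmoid_grad`, `relu_grad`, `chained_gradient`) are made up for illustration, and the "network" is simplified to a chain of identical neurons with all weights assumed to be 1, so only the activation derivatives are multiplied together.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid; peaks at 0.25 when x == 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def chained_gradient(grad_fn, pre_activation: float, depth: int) -> float:
    """Product of `depth` identical local derivatives.

    Simplifying assumption: every layer sees the same pre-activation
    and every weight is 1, so backpropagation reduces to this product.
    """
    return grad_fn(pre_activation) ** depth


# Compare how the backpropagated signal shrinks through 10 layers.
for x in (0.5, 2.0, 5.0):
    print(x, chained_gradient(sigmoid_grad, x, 10), chained_gradient(relu_grad, x, 10))
```

      Even in the best case for the sigmoid (input 0, derivative 0.25), ten layers multiply the gradient by 0.25 ten times over, while the ReLU derivative stays at 1 for any positive input.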

      Then read these:
      https://en.wikipedia.org/wiki/Support_vector_machine
      https://en.wikipedia.org/wiki/Convolutional_neural_network (Will explain what "Pooling Layers" are)
      https://www.reddit.com/r/MachineLearning/comments/3klqdh/q_whats_the_difference_between_crossentropy_and/

      Difference between "regression" and "classification":
      A regression network outputs the activation of the output neurons directly, while a classifier network uses the softmax function to ensure that the output neurons' activations sum to one.
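      As a rough illustration of that difference, here is a small softmax written with the standard library only. The input values are arbitrary example activations, not output from any real model, and subtracting the maximum is a common numerical-stability habit rather than something the comment above prescribes.

```python
import math


def softmax(logits):
    """Turn raw output activations into values that sum to 1."""
    largest = max(logits)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


raw = [2.0, 1.0, 0.1]    # what a "regression" head would output directly
probs = softmax(raw)     # what a "classification" head would output
print(raw, probs, sum(probs))
```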

      The most important thing to understand: it is trivial to train a neural network to perform well on its own training data (that's what backpropagation DOES). What is difficult is collecting enough data (preferably labeled) that you can hold out a significant portion for validation (to guard against overtraining), and another set of holdout data for TESTING. Your goal is to teach the network to generalize to the general case. Techniques that push the network in this direction are called "regularization". The test holdout set is for verifying that you didn't overtrain against the validation data via "hill climbing".
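      The holdout idea can be sketched as a three-way split. The function name `three_way_split` and the 15%/15% fractions are assumptions chosen for illustration; the right proportions depend on how much data you have.

```python
import random


def three_way_split(examples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle once, then carve out disjoint test, validation and training sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n = len(shuffled)
    n_test = round(n * test_fraction)
    n_val = round(n * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Stand-in "dataset": 100 example ids instead of real labeled images.
train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

      Train on `train`, tune against `val`, and only look at `test` once at the end, so it stays an honest estimate.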
      Cool trick: https://en.wikipedia.org/wiki/Dropout_(neural_networks)
      http://fastml.com/regularizing-neural-networks-with-dropout-and-with-dropconnect/
      https://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_topologies (Neural Networks meet Evolutionary Algorithms)
      https://people.cs.uct.ac.za/~gnitschke/projects/papers/2009-Niche%20Particle%20Swarm%20Optimization%20for%20Neural%20Network%20Ensembles.pdf
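      For the dropout trick linked above, a minimal sketch of the commonly used "inverted dropout" variant (an assumption on my part; the linked pages may describe the original formulation, which rescales at test time instead). The function name and arguments are illustrative.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Randomly zero activations during training and rescale the survivors.

    Rescaling by 1 / keep_prob keeps the expected activation unchanged,
    so nothing special is needed when training is False.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0, 2.0, 3.0, 4.0]
print(dropout(hidden, 0.5, rng))                   # some units zeroed, rest doubled
print(dropout(hidden, 0.5, rng, training=False))   # unchanged at test time
```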

      Other things to know: the learning rate is how quickly the network adjusts its weights (how big a step you take during stochastic gradient descent). Bigger steps = faster approach to local minima, but you tend to "overshoot" the high-performing valleys and get stuck on the low-performing surface. This is why it's generally a good idea to "anneal" (gradually reduce) your learning rate over time.
      http://sebastianruder.com/optimizing-gradient-descent/
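      A toy sketch of overshooting versus annealing. It uses plain (not stochastic) gradient descent on the one-dimensional loss f(w) = w^2, and the schedule lr / (1 + decay * step) is just one simple choice picked for illustration; all names here are hypothetical.

```python
def constant_lr(lr):
    """Schedule that always returns the same learning rate."""
    return lambda step: lr


def annealed_lr(initial_lr, decay):
    """Schedule that shrinks the learning rate as training progresses."""
    return lambda step: initial_lr / (1.0 + decay * step)


def gradient_descent(w0, steps, lr_schedule):
    """Minimize f(w) = w**2, whose gradient is 2 * w."""
    w = w0
    for step in range(steps):
        grad = 2.0 * w
        w -= lr_schedule(step) * grad
    return w


# With a step size this large, the fixed schedule jumps across the valley
# forever, while the annealed schedule settles at the minimum (w = 0).
fixed = gradient_descent(5.0, 20, constant_lr(1.0))
annealed = gradient_descent(5.0, 20, annealed_lr(1.0, 0.5))
print(fixed, annealed)
```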

      Other cool things to learn about:
      Autoencoders and "Transfer Learning", i.e. you can get most of the value of having Google's enormous GPU farms by simply downloading their pretrained Inception models, then using them as pretrained feature extractors for other experiments.

      Caffe vs. TensorFlow vs. Keras vs. Torch? I vote: TensorFlow.
      https://www.tensorflow.org/versions/r0.9/tutorials/mnist/beginners/index.html

      Good luck!

    2. Re:Wish I could spend serious time on this by Camembert · · Score: 1

      Wow! Thanks so much for your friendly and ultra helpful reply. This will really help me get started. There is so much cynicism here on /. - it is wonderful to read your very informative reply. Thanks again!

    3. Re:Wish I could spend serious time on this by yes-but-no · · Score: 1

      It's all about desire ordering: D1, D2, D3... You can always break away from foo if your desire for bar is higher. If you can give up D3 for D2 and D2 for D1, you can realize any D1. [No one forced anyone to have a mortgage or even kids, or to raise them in expensive places/lifestyles, or made one sit in a cubicle to pick up a paycheck.] When a person lacks courage or passion for D1, he/she starts blaming the environment or says there is too much cynicism around.

    4. Re:Wish I could spend serious time on this by Camembert · · Score: 1

      Not blaming anything.
      No one indeed forced me to have kids, but it was something -let's call it D1- that we found very important, more important than my other personal interests.
      Also, having your own house paid off is actually a good way to keep poverty at bay when you're old.
      So, not blaming anything, not even unhappy with my job, and my own family priorities are more important. ML is a personal interest that I hope to develop.

    5. Re:Wish I could spend serious time on this by yes-but-no · · Score: 1

      Sorry, then why do you say something is going to be "so very satisfying"? I assume "so very" means it falls in, say, the top 5 desires of a person. I like many things in life, but I won't call them "so very satisfying". In that case I would start throwing away stuff which is less important and focus on my top few. In fact, life taught me I can't even have D2 if I want D1.

    6. Re: Wish I could spend serious time on this by Camembert · · Score: 1

      I guess I am an "and" person, not an "or" person. It would be great and satisfying to be able to spend good time learning this tech in depth. But my priority is my family; that is the main source of my happiness.

    7. Re: Wish I could spend serious time on this by yes-but-no · · Score: 1

      Nice wordplay there. Having "and" in dreams without action is as worthless as not having it at all. Just thinking I'm an emperor doesn't make one so.

  6. Son of a bitch! by Gravis+Zero · · Score: 2

    The nerve of this infernal program is so obscene it must be untenable! It captioned my dick pic as "YAUPFAN (Yet Another Unimpressive Penis From A Narcissist)"! Kudos for having it create its own acronyms, but I won't stand for a machine-generated insult and neither should you! Though if you have a standing desk, it's cool, I totally get it. ;)

    --
    Anons need not reply. Questions end with a question mark.
  7. Re:6.1 percent inaccurate by K.+S.+Kyosuke · · Score: 2, Funny

    They fixed it. The new version tags gorillas as black people.

    --
    Ezekiel 23:20
  8. Re: License? by Anonymous Coward · · Score: 1

    If you look at the source, it says Apache License.

  9. Gosh, that's something else. by dabeshu · · Score: 1

    Still not as impressive as the one that invented toothpaste and made art.