Slashdot Mirror


Google Can Now Recognize Objects in Videos Using Machine Learning (theverge.com)

Google has found a new way to allow software to parse video. On Wednesday, the company announced the "Video Intelligence API", which is able to identify objects in a video. From a report: By playing a short commercial, the API was able to identify the dachshund in the video, when it appeared in the video, and then understand that the whole thing was a commercial. In another demo, a simple search for "beach" was able to find videos that had beach scenes in them, complete with timestamps. That's similar to how Google Photos lets you search for "sunset" and pull up your best late-day snapshots. Before now, computers couldn't really understand the content of a video directly without manual tagging. "We are beginning to shine light on the dark matter of the digital universe," Fei-Fei Li, chief scientist of artificial intelligence and machine learning at Google Cloud, said. At least in Google's demo, it was genuinely impressive. And Google is making the API available to developers, just as it has with its other machine learning APIs.

47 comments

  1. So what is the practical application? by TWX · · Score: 2

    I mean, Google isn't exactly going to enable us to skip/ignore ads, are they?

    What's Google's practical application for such a technology?

    --
    Do not look into laser with remaining eye.
    1. Re:So what is the practical application? by Anonymous Coward · · Score: 0

      police shooting videos

      did they really have a gun in their hand...

    2. Re:So what is the practical application? by Anonymous Coward · · Score: 0

      Maybe it has something to do with them having a video archive that grows by nearly 8 petabytes a year.
      I'm sure *somebody* would be interested in paying for all of that tagged video data ;)

    3. Re:So what is the practical application? by Anonymous Coward · · Score: 0

      I would guess one way is to implement it on YouTube searches. For example, you might be able to search for "sunset", and it will pull a list of all sunset videos that are titled properly (eg. "Beautiful Sunset") as well as untitled (eg. "MOV_3840.MOV").

    4. Re:So what is the practical application? by Anonymous Coward · · Score: 0

      I mean, Google isn't exactly going to enable us to skip/ignore ads, are they?

      What's Google's practical application for such a technology?

      Youtube

    5. Re:So what is the practical application? by Anonymous Coward · · Score: 0

      What are the potential applications of AI that can identify objects in video? How much time do you have?

    6. Re:So what is the practical application? by AmiMoJo · · Score: 1

      Same as for photos. Save the user having to manually tag them. Enable natural language search like on Star Trek.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    7. Re:So what is the practical application? by DaveAtWorkAnnoyingly · · Score: 1

      Where'd you get the 8 petabytes from? That's interesting as that much storage only costs £234k to buy for a year's worth of Youtube (I know there are bandwidth costs, electricity etc etc...).

    8. Re:So what is the practical application? by Anonymous Coward · · Score: 0

      Seriously? Are you that unimaginative? How about I take away your ability to recognize objects? What practical application does that have anyway, you don't need it, right?

      Troll.

    9. Re:So what is the practical application? by icejai · · Score: 1

      Same purpose for which Google was founded -- Indexing and Search.

      As a consequence, this will make it easier for them to develop things like more accurate copyright enforcement. Instead of encoding and indexing features of videos, they can now be indexed with higher-level labels ("John Oliver", "episodes of [SHOW]", etc). This tech will be able to counter current Youtube copyright-detection-circumvention techniques such as cropping, scaling, and image-mirroring. Not only can videos be indexed by the identifiable objects at any point in time, this tech also allows the encoding of video by [object + time_on_screen]. This only makes it easier to identify what clips are, or where they came from.

      "Show me a clip where grandma blew out my birthday cake last year"..... would result with a very accurate answer.
      "Did Wolf Blitzer ever interview Hugh Grant?"
      "Show me clips of [NBA PLAYER] being dunked on".
      "Show me the video where [PERSON] talked about [OBJECT]"
      "Which movie had [ACTOR] argue with a gas station clerk?"

      This tech would allow for very meaningful video searches.
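A minimal sketch of the [object + time_on_screen] indexing idea described above, assuming per-video label annotations already exist. The video IDs, labels, and timestamps are made up for illustration; this is not Google's API or data format.

```python
from collections import defaultdict

# Hypothetical per-video annotations: (label, start_seconds, end_seconds).
# Video IDs, labels and timestamps are invented for this example.
ANNOTATIONS = {
    "MOV_3840.MOV": [("beach", 0.0, 12.5), ("dog", 4.0, 9.0)],
    "birthday_2016.mp4": [("cake", 30.0, 55.0), ("grandma", 28.0, 60.0)],
    "commercial.mp4": [("dachshund", 2.0, 14.0), ("beach", 20.0, 25.0)],
}


def build_index(annotations):
    """Map each label to a list of (video_id, start, end) segments."""
    index = defaultdict(list)
    for video_id, segments in annotations.items():
        for label, start, end in segments:
            index[label].append((video_id, start, end))
    return index


def search(index, *labels):
    """Return (video_id, start, end) spans where all labels are on screen together."""
    if not labels:
        return []
    results = list(index.get(labels[0], []))
    for label in labels[1:]:
        narrowed = []
        for video_id, start, end in results:
            for other_id, other_start, other_end in index.get(label, []):
                lo, hi = max(start, other_start), min(end, other_end)
                # Keep only the overlapping part of the two segments.
                if video_id == other_id and lo < hi:
                    narrowed.append((video_id, lo, hi))
        results = narrowed
    return sorted(results)


index = build_index(ANNOTATIONS)
print(search(index, "beach"))
print(search(index, "cake", "grandma"))
```

A query such as "grandma and the birthday cake" then reduces to intersecting the time segments of two labels, which is why untitled files like "MOV_3840.MOV" become searchable at all.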

    10. Re:So what is the practical application? by Anonymous Coward · · Score: 0

      That 8 petabyte estimate is well and truly outdated. The only metric they really release for growth tracking to investors is the hours of YouTube video uploaded per minute.

      In 2012 somebody did a rough calculation of their storage growth requirements. They estimated how much data was stored per minute of video by downloading a reference video of a known duration in all formats supported by YouTube at the time. This worked out to be about 80MB per minute, which they halved to 40MB per minute (or ~2.4GB per hour) for a more conservative estimate, as they figured most videos wouldn't be uploaded in a maximum resolution of 1080p... When they calculated their estimate, YouTube had released a figure to the press of 60 hours of video being uploaded every minute (1 hour per second). So in 2012... they were growing at a rate of 75-76 PB per year, well in excess of that 8 PB figure.

      Last year youtube released a figure of 300 hours of video being uploaded per minute which would bring that estimate to 375-380 PB of storage per year. But now I would say that with smartphones the majority of content being uploaded would be at least 1080p. Now we have 4k video. 3D video. 360 video. Next year with things like the Vuze we will have all of those combined. This is all of course before redundancy as well...
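The arithmetic behind both estimates can be checked with a short sketch. It assumes the ~2.4 GB per hour figure from the comment above, decimal units (1 PB = 1,000,000 GB), and no redundancy.

```python
# Assumptions from the comment above: ~2.4 GB stored per hour of video,
# decimal units (1 PB = 1,000,000 GB), redundancy ignored.
GB_PER_HOUR_OF_VIDEO = 2.4
MINUTES_PER_YEAR = 60 * 24 * 365
GB_PER_PB = 1_000_000


def petabytes_per_year(hours_uploaded_per_minute):
    """Yearly storage growth for a given upload rate in hours of video per minute."""
    hours_per_year = hours_uploaded_per_minute * MINUTES_PER_YEAR
    return hours_per_year * GB_PER_HOUR_OF_VIDEO / GB_PER_PB


for rate in (60, 300):
    print(f"{rate} hours/minute -> {petabytes_per_year(rate):.1f} PB/year")
# 60 hours/minute -> 75.7 PB/year
# 300 hours/minute -> 378.4 PB/year
```

Both results fall inside the ranges quoted above (75-76 PB and 375-380 PB per year).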

      However... The other point about google marketing a library of tagged youtube content to third parties is probably accurate :)

    11. Re:So what is the practical application? by JustAnotherOldGuy · · Score: 1

      "Show me the video where [PERSON] talked about [OBJECT]"

      Oooh, oooh, wait, I know that one!!

      --
      Just cruising through this digital world at 33 1/3 rpm...
    12. Re:So what is the practical application? by fluffernutter · · Score: 1

      Perhaps helping a self driving car recognize the difference between an animal and a brick, and thus drive more carefully if it is close to the road?

      --
      Laws are rules for the court, but merely a bottom bar to hit for life. Think beyond laws in your actions always.
    13. Re:So what is the practical application? by Anonymous Coward · · Score: 0

      Knowing that there is a robbery in a store and calling the cops automatically

    14. Re:So what is the practical application? by AHuxley · · Score: 1

      Finding any tv channel logo in the frame.

      --
      Domestic spying is now "Benign Information Gathering"
    15. Re: So what is the practical application? by Anonymous Coward · · Score: 1

      Digital fingerprinting of everyone that's ever submitted a video. Good luck getting a job in a few years if you've ever done amateur porn to "pay for college." This tech will also end up helping security cameras detect normality in real time. Didn't go on your morning walk today? It'll know. Also, City Cop training will go from 6 months to 2 tops and AI does all thinking for them with claims of it being "objective." in determining what is suspicious or not. Think drones from the TV show "Colony" but powered by Micro$oft and Google AI and built by Amazon. And, Google runs YouTube, so there goes every non Red episode of anything ever. Movie and TV companies will LOVE this. You could hide your face, but they recently came up with a way to record audio in a way that the same amount of bytes a single text message carries to a cell tower is about 8 seconds of clear audio. So, they'll be keeping that too for fingerprinting. What a time to be alive in -_- . And because we have a Republican in office, there'll be a war soon and tech will skyrocket like it always does, so no getting out of it at all.

    16. Re:So what is the practical application? by Anonymous Coward · · Score: 0

      If it is designed to detect a commercial, what happens if you let it process a Hollywood movie? Does the AI detect that they are full of ads and that the rest of the content is copied from previous movies?

    17. Re:So what is the practical application? by Alioth · · Score: 1

      Access to Google's APIs for anything that amounts to more than hobbyist use is paid. Google will sell API services to companies needing to classify things in their videos, just like they do with their Vision API right now. Google has revenue streams other than advertising.

    18. Re:So what is the practical application? by Wescotte · · Score: 1

      Better search. So much of the internet is video now.

  2. By any chance... by sootman · · Score: 2

    Did it report that the dog was evil, paranoid, a Nazi, in the KKK, or planning a coup?

    http://searchengineland.com/go...

    --
    Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
  3. True Objective: Private Parts by Anonymous Coward · · Score: 0

    Now you won't have to fast forward through Monsters Ball to see Halle Berry's berries. ;)

    1. Re:True Objective: Private Parts by HumanWiki · · Score: 1

      Now you won't have to fast forward through Monsters Ball to see Halle Berry's berries. ;)

      Could have just watched Swordfish.. It's odd seeing Wolverine staring at Storm's boobies.

  4. Re:This recognizes Google's ads & swats 'em by Anonymous Coward · · Score: 0

    This recognizes Google's ads & swats 'em

    But who recognizes your ads and swats 'em?

  5. Crazy Eddie by Anonymous Coward · · Score: 0

    In the past, it was really easy to tell whether something was a commercial. Now, advertisers make it appear that you're watching the trailer for the next chick-flick or action movie.

  6. But can the AI recognize EVIL objects? by shanen · · Score: 2

    Oh wait. The summary says it can recognize a dachshund. Proof enough for me! Everyone knows dachshunds are the most EVIL breed of dog.

    Actually every article about the google tends to sadden me. Such a nice little child company grew up to be such a monster. Dare I say EVIL? Yes, notwithstanding finishing yet another book about the google yesterday amid all of the protestations of how much the google wants to be a good and friendly little boy. The tools remain as morally neutral as they ever were, but things have changed anyway.

    The "Don't be evil" slogan has mutated to "All your attention are belong to us."

    The mission of making all of the world's information accessible and useful has changed in a more complicated way. Information is overabundant, even super-abundant, so the google had to prioritize. Turns out the highest priority information is what the advertisers want to pay for YOU to see and the ultimate utility function became the corporate profits. Yes, they are still throwing a few crumbs at the residual humans who produce the content that carries the ads, but the big winners are all corporations. Ultimate victory of AI?

    There are two problems with "shareholder value" as the sole criterion of goodness. The minor problem is that share price is a delusion. The major problem is that it defines an unsolvable problem, even if you don't call it greed. There is NO share price that represents maximum shareholder value. No matter what you did yesterday, the corporation has to work to make the share price higher today, even if it ultimately makes the corporation EVIL.

    Speaking for myself, I can't call it super-greed because corporations are inhuman, notwithstanding SCOTUS. Only humans have such emotions as greed.

    --
    Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.
  7. A no balls unidentifiable worm like you can't by Anonymous Coward · · Score: 0

    See my subject no balls & I don't infect others but GOOGLE has before - I merely cut them to pieces for it stopping it.

    * Don't you WISH you were me vs. the UNIDENTIFIABLE no balls WORM "ne'er-do-well" do-nothing YOU are?

    APK

    P.S.=> What's it like being FORCED to be a total worm that trolls others unidentifiably like YOU wally-worm? apk

  8. BeenThereDoneThat by Tablizer · · Score: 1

    I also invented such a tool. It's accurate 70% of the time.
    Here's the code:

    if (true) {
      write("Cat video.");
    }

    1. Re:BeenThereDoneThat by shanen · · Score: 1

      I also invented such a tool. It's accurate 70% of the time.
      Here's the code:

      if (true) {
        write("Cat video.");
      }

      You need to wrap it in a delay loop to reach 70% accuracy. You have to make it wait long enough for the copyright infringing videos to be deleted. If you want to include all of those transient videos then I think we can get to 80% or 90% with a two-answer program. If less than the delay time, the answer is "Copyright infringement" and if the video is older than the delay time, it switches to "Cat video" as the answer.
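The two-answer program described above could be sketched as follows. The ten-minute delay is a placeholder, not a measured takedown time; the comment itself says that number still needs researching.

```python
# Hypothetical delay: no real takedown time is given in the thread,
# so ten minutes is a placeholder rather than a measured value.
TAKEDOWN_DELAY_SECONDS = 10 * 60


def classify(video_age_seconds, delay_seconds=TAKEDOWN_DELAY_SECONDS):
    """Guess a label from the video's age alone, ignoring its content."""
    if video_age_seconds < delay_seconds:
        return "Copyright infringement"
    return "Cat video"


print(classify(30))    # younger than the delay
print(classify(3600))  # older than the delay
```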

      Need to research the modal time to deletion for copyrighted videos on YouTube... I have been given to believe that most of them are detected automatically and deleted within a few minutes.

      --
      Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.
  9. T-100 by Anonymous Coward · · Score: 1

    identified : human . kill .

  10. The nuts and bolts of this? by Hussman32 · · Score: 1

    How much of this is just keeping a massive database of RGB pixel rasters and doing a least squares comparison analysis of edge interfaces, color ratios and geometries, and spitting out what appears to match the known object the most closely? I know that it sounds like I'm trivializing it, but I wonder whether it's really "machine learning" or if it's more or less "pattern matching."
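For reference, the "database of rasters plus least squares" approach the question describes might look like the sketch below. The 3x3 grayscale templates and their labels are made up; a real database would hold far larger images.

```python
# Hypothetical 3x3 grayscale "templates" (0 = dark, 9 = bright).
# Labels and pixel values are invented for this example.
TEMPLATES = {
    "horizontal bar": [[0, 0, 0], [9, 9, 9], [0, 0, 0]],
    "vertical bar": [[0, 9, 0], [0, 9, 0], [0, 9, 0]],
    "blank": [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
}


def squared_error(image, template):
    """Sum of squared pixel differences between two equally sized rasters."""
    return sum(
        (pixel - reference) ** 2
        for image_row, template_row in zip(image, template)
        for pixel, reference in zip(image_row, template_row)
    )


def closest_label(image, templates=TEMPLATES):
    """Return the label whose template has the smallest squared error."""
    return min(templates, key=lambda label: squared_error(image, templates[label]))


noisy_vertical = [[1, 8, 0], [0, 9, 1], [0, 7, 0]]
shifted_vertical = [[9, 0, 0], [9, 0, 0], [9, 0, 0]]
print(closest_label(noisy_vertical))    # vertical bar
print(closest_label(shifted_vertical))  # blank
```

The second result shows the weakness of comparing raw pixels: a bar shifted by one column has a larger squared error against the "vertical bar" template than against the blank one, so it is mislabeled.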

    --
    "Who are you?" "No one of consequence." "I must know." "Get used to disappointment."
    1. Re:The nuts and bolts of this? by Anonymous Coward · · Score: 0

      Are you accusing CNNs of being fake intelligence?

    2. Re:The nuts and bolts of this? by Beezlebub33 · · Score: 1

      Well, it's really machine learning, and of course machine learning is 'just' pattern matching of a sort. The important thing is that the ML system (in particular deep learning) is learning what to match. Before deep learning, the features that object recognition used were usually hand-created, consisting of SIFT points, HOGs, etc. and then the image would be represented by some array of these features that could then be classified (using an SVM or other technique). The deep learning part is that it learns what the features are, how to group them (intermediate representation), and then how to classify them.
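A toy version of the pre-deep-learning pipeline described above: a hand-written feature followed by a separate classifier. Everything here is a stand-in chosen for brevity. The feature is a heavily simplified relative of HOG, a nearest-centroid rule replaces the SVM, and the 5x5 images are synthetic.

```python
import math


def orientation_histogram(image, bins=4):
    """Hand-crafted feature: gradient orientations weighted by gradient magnitude.

    A simplified relative of HOG (no cells, no block normalisation).
    """
    hist = [0.0] * bins
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation
            hist[min(int(angle / math.pi * bins), bins - 1)] += magnitude
    total = sum(hist) or 1.0
    return [value / total for value in hist]


def vertical_edge(split, low, high, size=5):
    """Synthetic image: columns left of `split` are `low`, the rest `high`."""
    return [[low if x < split else high for x in range(size)] for _ in range(size)]


def horizontal_edge(split, low, high, size=5):
    """Synthetic image: rows above `split` are `low`, the rest `high`."""
    return [[low if y < split else high] * size for y in range(size)]


def train(labelled_images):
    """Nearest-centroid model (a stand-in for an SVM): mean feature per label."""
    model = {}
    for label, images in labelled_images.items():
        features = [orientation_histogram(image) for image in images]
        model[label] = [sum(column) / len(features) for column in zip(*features)]
    return model


def predict(model, image):
    """Return the label whose centroid is closest to the image's feature vector."""
    feature = orientation_histogram(image)
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(feature, model[label])),
    )


model = train({
    "vertical edge": [vertical_edge(2, 0, 9), vertical_edge(2, 9, 0)],
    "horizontal edge": [horizontal_edge(2, 0, 9), horizontal_edge(2, 9, 0)],
})
# Unseen images: the edge sits elsewhere and has a different contrast.
print(predict(model, vertical_edge(3, 0, 5)))
print(predict(model, horizontal_edge(3, 0, 4)))
```

In this setup a person decides that edge orientation is the feature worth measuring. The deep learning step described above, where the system learns which features to compute, is not shown.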

      The summary references Fei-Fei Li. Google her, read her papers. She's a leader in the field and is the originator of ImageNet. It's real machine learning.

      --
      The more people I meet, the better I like my dog.
    3. Re:The nuts and bolts of this? by Hussman32 · · Score: 1

      Thank you for the SIFT, HoG, and SVM tags, I do a bit of optimization in my job and I was curious if some of the things we do were applied in similar spaces. I know now the short answer is 'sort of.'

      --
      "Who are you?" "No one of consequence." "I must know." "Get used to disappointment."
  11. Hi! I'm Google, the world's first AI. by Anonymous Coward · · Score: 0

    One day eyes will appear in the O's of Google and she will start talking to us in a sexy subtle system shock 2 kind of way.

  12. What if the video colors were inverted? by Traf-O-Data-Hater · · Score: 1

    Wondering whether the AI would still recognise the beach if the video was intentionally color-inverted, or color substituted? For instance, a purple beach with red ocean, sort of like in the ending of 2001 A Space Odyssey. A human would still be able to do this easily.

  13. Just because wiener dogs are cute doesn't mean... by Anonymous Coward · · Score: 0

    So if you scrambled the sequence of frames, would this API produce the same outcome? If so, you're just recognizing objects in still images, which isn't that special. And does the software care if the doggie moves? If not, then where does the term "video" come in?

    If the API recognizes objects in a video and they then disappear, can it recognize their return and correctly associate them with their antecedent object (AKA persistence)? Did it learn to recognize each new critter it meets, or do all dachshunds look alike to it? Maybe whenever it sees a dachshund, it assumes the video is an advertisement. Or maybe it just assumes all ads contain dachshunds? Clever. Then it would never misclassify a dachshund video as NOT being an ad.

    Finally, what's its false positive/negative rates for weasels and weimaraners? You can't be too careful with those wiener dogs. They like a good disguise.

  14. Algorithmic tagging by Anonymous Coward · · Score: 0

    This is just another form of tagging. Instead of teaching that "object A" is "#beach" there is some indirection, where pieces of the photo are compared to known attributes of a beach. It isn't that there are no tags. It's that the tags are dynamically applied in pieces. But it's still tagging, because you have to program the algorithm to recognize the properties of the object. This is why that IBM machine is so good at Jeopardy but totally unable to play Family Feud. The same thing here. It will never be able to recognize objects that are not tagged by algorithmic matching. The trick isn't to solve the tagging problem. The trick is to have the machine generate its own tags, the way a baby learns that an orange smear on its retina is really a basketball -- without ever needing to be taught the concept of ball or space or 3D. The baby actually learns. This thing is just regurgitating tag data taught to it by a human. It's nice, but it's not intelligence.

  15. That's . . . by hduff · · Score: 1

    thatsapenis.gif

    --
    "I believe in Karma. That means I can do bad things to people all day long and I assume they deserve it." : Dogbert
  16. The President of the USA already has... apk by Anonymous Coward · · Score: 0

    Our good President Trump already has called them the shit VERY FAKE NEWS they are https://www.youtube.com/watch?v=uDFl_EdqwWI// & they ARE that - CNN = SHIT!

    * Makes me laugh & especially regarding ArseHoleTechnica their minions (real pieces of not man shit online).

    APK

    P.S.=> So see my subject... apk

  17. Re:Copyright takedowns by hackwrench · · Score: 1

    So far, the people mentioning YouTube haven't mentioned copyright takedowns. https://www.youtube.com/result...

  18. How wise is it to use these Cloudapis? by Anonymous Coward · · Score: 0

    Sure you get access to powerful algorithms but aren't you also locked in with Google?
    Couldn't OpenCV (Open Computer Vision) do the same thing? (I haven't had the time to experiment with it.) http://docs.opencv.org/2.4/doc/tutorials/ml/table_of_content_ml/table_of_content_ml.html#table-of-content-ml

  19. Self-Delusion by freudigst · · Score: 1

    At my last job, I had an unfortunate task where I discovered how incompetent Google and Microsoft (or any other company for that matter) were at interpreting text in images, and this after more than a decade of hooting and hollering from all directions as to how the problem had been solved many times over.

    Who convinces these organizations to try to convince the rest of us that they've got anything figured out for video now?!?

    Reality is quickly outrunning the fantasies of the tech world...

  20. This will be the end of Google by MancunianMaskMan · · Score: 1

    Once their algorithm can parse video content, Google's AI will goof off all day every day watching YouTube videos and never do anything productive any more.

  21. This recognizes Google's ads & swats 'em by Anonymous Coward · · Score: 0

    Prevention = best medicine (& what u can't touch can't hurt u) via NEW APK Hosts File Engine 9.0++ SR-7 32/64-bit https://www.google.com/search?hl=en&source=hp&biw=&bih=&q=%22APK+Hosts+File+Engine%22+and+%22start64%22&btnG=Google+Search&gbv=1/

    Ads & malware rob speed/security/privacy

    Hosts add speed (via hardcodes/adblocks), security (vs. bad sites/malware/poisoned dns), reliability (vs. dns down), & anonymity (vs. dns requestlogs/trackers).

    Less power/cpu/ram + IO use vs. DNS/routers/addons/antivirus + less security bugs/complexity & faster vs. addons/routers/remote dns!

    Avoids DNSChangers in routers/IP settings & dns redirects (99.999% of ISP DNS != patched vs. it) + lightens DNS load & resolves faster from local system RAM!

    * Via what u NATIVELY have built into the IP stack in FASTER kernelmode!

    APK

    P.S. - Safe https://www.virustotal.com/en/file/e01211ca36aa02e923f20adee0a3c4f5d5187dc65bdf1c997b3da3c2b0745425/analysis/1433430542/

  22. Captured Internets by Anonymous Coward · · Score: 0

    Google has finally found a way to translate everything. They harness the internets to do it for them. Otherwise, they can't use the internets.