Slashdot Mirror


An AI System For Editing Music in Videos (mit.edu)

Amateur and professional musicians alike may spend hours poring over YouTube clips to figure out exactly how to play certain parts of their favorite songs. But what if there were a way to play a video and isolate only the instrument you wanted to hear? MIT News: That's the outcome of a new AI project out of MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL): a deep-learning system that can look at a video of a musical performance, and isolate the sounds of specific instruments and make them louder or softer. The system, which is "self-supervised," doesn't require any human annotations on what the instruments are or what they sound like. Trained on over 60 hours of videos, the "PixelPlayer" system can view a never-before-seen musical performance, identify specific instruments at pixel level, and extract the sounds that are associated with those instruments. For example, it can take a video of a tuba and a trumpet playing the "Super Mario Brothers" theme song, and separate out the soundwaves associated with each instrument. The researchers say that the ability to change the volume of individual instruments means that in the future, systems like this could potentially help engineers improve the audio quality of old concert footage. You could even imagine producers taking specific instrument parts and previewing what they would sound like with other instruments (e.g. an electric guitar swapped in for an acoustic one).

31 comments

  1. It's 'poring', not 'pouring' by imidan · · Score: 4, Informative

    Amateur and professional musicians alike may spend hours pouring over YouTube clips

    Really? What substance do they pour over the clips? And to what end? Do they pour a liquid, like coffee? Or a fluid-like solid, like sand?

    I'm sorry. Pouring vs poring is one that really bugs me, for some reason.

    1. Re: It's 'poring', not 'pouring' by Anonymous Coward · · Score: 0

      Agreed. I'm going to stick my penis into your butt now. You will be poring over in pain!

    2. Re: It's 'poring', not 'pouring' by Anonymous Coward · · Score: 0

      Oh dear, his pore butt.

    3. Re: It's 'poring', not 'pouring' by Anonymous Coward · · Score: 0

      He can't help his gluteus acne.

    4. Re:It's 'poring', not 'pouring' by drinkypoo · · Score: 2

      What's especially pathetic is that Trump is being mocked world-wide for making the same error in a tweet recently, and slipshod.org here just copied his embarrassing mistake, bigly. Sad!

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    5. Re:It's 'poring', not 'pouring' by Hognoxious · · Score: 1

      Is it so exponentially annoying that it literally makes your blood boil?

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    6. Re: It's 'poring', not 'pouring' by Anonymous Coward · · Score: 0

      That's where you get the lube...

  2. This is so powerful ... by CaptainDork · · Score: 2

    ... it can isolate first chair clarinet farting.

    --
    It little behooves the best of us to comment on the rest of us.
  3. There is nothing more important... by Arzaboa · · Score: 2

    ...than being able to re-edit video game sound tracks.

    --
    "A trumpet says what?" - H. Stern

    1. Re:There is nothing more important... by 110010001000 · · Score: 1

      You don't understand. This AI can be applied to do many other things. We aren't quite sure what, but trust us, this is important stuff. MIT!

    2. Re:There is nothing more important... by drinkypoo · · Score: 1

      Which is all this is going to be good at, because every time you pluck a string or play any musical instrument really, every note is subtly different... Unless you are talking about digital playback of samples, which is identical every time. You can hear the difference between a drum machine and a drummer as a consequence.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  4. Whoa! by 110010001000 · · Score: 2

    It can do image recognition, frequency filtering, and change the volume too? AI is amazing. I am so glad I live in 2018! One of these days these AI experts are gonna make AI do something really useful, and then watch out! The sky is the limit!

    1. Re:Whoa! by Anonymous Coward · · Score: 0

      Personally, I'm really looking forward to the day robots can detect sarcasm.

    2. Re:Whoa! by Anonymous Coward · · Score: 0

      Actually, the day after will be fun!

    3. Re: Whoa! by Anonymous Coward · · Score: 0

      The point is all those processes were manual. Now they are much less manual.

  5. Bizarre focus of the tech by king+neckbeard · · Score: 1

    This sounds like the system is combining object recognition of musical instruments with audio processing in order to separate the instruments. But that's not the hard part. A middle school band student can provide a list of the instruments in a video. The hard part is going to be separating the instruments in a way that sounds good.

    As for previewing the same line on a different instrument, that's what MIDI is for. The issue would be the quality of the samples used.

    --
    This is my signature. There are many like it, but this one is mine.
    1. Re:Bizarre focus of the tech by 110010001000 · · Score: 1

      WRONG. That is just regular computer stuff. This is AI. Even better: AI from MIT!

  6. 60 hours? by Anonymous Coward · · Score: 0

    That is a hell of an AI that can do anything intelligent with only 60 hours of training videos

    1. Re: 60 hours? by Anonymous Coward · · Score: 0

      Also, why would you train an audio AI on videos?

    2. Re: 60 hours? by AHuxley · · Score: 1

      To test if some math can show the sale of such new music based on past music sales. On video and while doing music in real time.

      Have a new band play in front of the AI to see if they can make a music video that will sell based on past sales.
      Eye contact, movement, voice, dancing, walking, running, clothing. To smile, not to smile. The lyrics. The energy and skill put into performing.
      The ability to perform to the style that is selling at that point in time.
      The artist who has decades of new sales. Talent for the future sales. Not just that year.
      The AI can pick up on all that per frame and with the music.
      A group of humans looking at 100 performances to select a very few to invest in.
      An AI can give a feel for talent, presentation, skill, look, the ability to sell over all the performances.
      No risk in selecting on talent and trying to work with appearance later. To find out that cant be done.
      The group that looks amazing but cant be trusted work live on stage due to a lack of skill. No good to find out later they cant be educated to get the quality.
      An AI can detect all that and give its results every time. Skill, the wanted look for the video and a sound that will sell. Decades of sales on a new sound that could have been missed by humans.

      --
      Domestic spying is now "Benign Information Gathering"
    3. Re: 60 hours? by Hognoxious · · Score: 2

      Do you write your posts in Mandarin using a brick, then OCR them and translate them manually into Welsh then dictate them to someone from New Guinea who types them in for you?

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    4. Re: 60 hours? by drinkypoo · · Score: 1

      Why would you use a band? Use AI and biofeedback, play music at people digitally and genetically evolve it. Much easier than involving excess humans in the process. Even Trent Reznor doesn't like humans because they make mistakes, imagine how much it will slow down an AI :)

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    5. Re: 60 hours? by AHuxley · · Score: 1

      That would be neat. An AI watching both the audience and band wanting to get a contract. The enthusiasm, the reaction of the audience in terms of something new and unexpected.

      --
      Domestic spying is now "Benign Information Gathering"
  7. Challenge by Anonymous Coward · · Score: 0

    I wonder if they can remove Yoko Ono's banshee cries from this live performance of Johnny B Goode

    1. Re:Challenge by q_e_t · · Score: 2

      Can I remove all the sound from modern pop 'music'?

  8. Identifying a sound at "pixel level" by ayesnymous · · Score: 3, Insightful

    What does that mean?

  9. Mmmmh by nospam007 · · Score: 2

    I'm sure they can be fooled by playback, when the musicians only fake the playing.

  10. There is by Anonymous Coward · · Score: 0

    It's called training your ear to hear, and it's a necessary musical skill for a musician. This may still be useful, but man, most engineers are so left-brained it's a miracle they don't tip over.

  11. Better Use... by corezz · · Score: 1

    would be to use it to remove annoying background irrelevant heavy metal music that people add to their favorite movie clips so you can actually watch and hear the real in-movie clip without the annoying distraction.

    Also, on Twitch (and YT) many clips get muted by the service because of copyright claims. Having this feature could strip out the offending sound but leave the streamer's voice (and room sounds) unscathed. Everyone wins.

  12. NSA is going to love this... by Anonymous Coward · · Score: 0

    If your fart noise can no longer stay hidden under layers of surrounding noises, imagine what this system could do for spies? Consider the surveillance feed of any noisy/busy place where people meet and converse. Now you can isolate and ease drop in on each conversation with clarity far exceeding what a normal surveillance system could ordinarily do.

    1. Re:NSA is going to love this... by Anonymous Coward · · Score: 0

      'ease drop'? Huh?