Slashdot Mirror


RockBox + Refurbished MP3 Players = Crowdsourced Audio Capture

An anonymous reader writes "Looking for an inexpensive means to capture audio from a dynamically moving crowd, I sampled many MP3 players' recording capabilities. Ultimately the best bang-for-the-buck was refurbished SanDisk Sansa Clip+ devices ($26/ea) loaded with (open source) RockBox firmware. The most massively multi-track event was a thorium conference in Chicago where many attendees wore a Clip+. Volunteers worked the room with cameras, and audio capture was decoupled from video capture. It looked like this. Despite having (higher quality) ZOOM H1n and wireless mics, I've continued to use the RockBox-ified Clip+ devices ... even if the H1n is running, the Clip+ serves as backup. There's no worry about interference or staying within wireless mic range. The devices have 4GB capacity, and RockBox allows WAV capture. They'll run at least 5 hours before the battery is depleted (with lots of storage left over). I would suggest sticking with 44kHz (mono) capture, as 48kHz is unreliable. To get an idea of their sound quality, here is a 10-person dinner conversation (about thorium molten salt nuclear reactors) in a very busy restaurant. I don't know how else I could have isolated everyone's dialog for so little money. (And I would NOT recommend Clip+ with factory firmware... they only support 22kHz and levels are too high for clipping on people's collars.)" This video incorporating much of that captured audio is worth watching for its content as well as the interesting repurposing.
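
The storage claim in the summary can be checked with quick arithmetic. What follows is a rough sketch, not something from the submitter: it assumes 16-bit samples (the post does not state a bit depth), reads "44kHz" as 44.1 kHz, and takes "4GB" as 4 x 10^9 bytes. The function name is illustrative.

```python
def wav_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of uncompressed PCM audio data (ignores the small WAV header)."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * seconds

# Assumptions: 44.1 kHz, 16-bit, mono; 5 hours is the battery figure from the post.
FIVE_HOURS = 5 * 3600
used = wav_bytes(44_100, 16, 1, FIVE_HOURS)
capacity = 4 * 10**9  # "4GB" taken as decimal gigabytes

print(f"5 h of audio: {used / 1e9:.2f} GB")
print(f"share of card: {used / capacity:.0%}")
```

Under those assumptions a full battery run uses roughly 1.6 GB, well under half the card, which is consistent with "lots of storage left over".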

15 of 66 comments

  1. huh? by swell · · Score: 4, Insightful

    Sorry, I have no idea what TFA is about. Please help.

    --
    ...omphaloskepsis often...
    1. Re:huh? by PopeRatzo · · Score: 2

      I also don't understand what's supposed to be wrong with the firmware of the Clip+.

      The audio recording functions are pretty poorly designed.

      Rockbox allows much more control over recording.

      --
      You are welcome on my lawn.
  2. Lots of work? by mpoulton · · Score: 3, Interesting

    Maybe I'm misunderstanding the process here, but this seems like it would create a HUGE amount of editing work. Are you manually switching which recorder's audio is used as different people speak? In other words, editing the video using as many simultaneous audio tracks as there are recorders, syncing them, and using the best one at any given instant during the video? That seems like it would add huge amounts of editing time.

    --
    I am a geek attorney, but not your geek attorney unless you've already retained me. This is not legal advice.
    1. Re:Lots of work? by Anonymous Coward · · Score: 5, Insightful

      That depends. There are applications that can align audio automatically (PluralEyes, for example: http://www.singularsoftware.com/pluraleyes.html), so all you would need to do is name each track after the person it relates to and adjust the levels as needed. All video creation requires a "huge" amount of editing work.

    2. Re:Lots of work? by Anonymous Coward · · Score: 5, Insightful

      All video creation requires a "huge" amount of editing work.

      Exactly. Having dedicated audio sources for all speakers is great to have, and some increased editing time is worth it if your product is going to be higher quality.

      It sucks to have to struggle to hear what's going on in a video, and live events can be terribly chaotic. Having well-planned audio capture is critical to reducing your stress. This is a clever use of cheap tech, and I may have to give it a shot with my old 2GB Clip floating around in my tech bins. If only there were a way to pipe a proper lav into it...

    3. Re:Lots of work? by bertok · · Score: 3, Insightful

      Just altering the levels provides a lot of isolation (as seen in the video clips), but I have to wonder if there's an audio equivalent of "image stacking" or Photosynth, that would correlate all of the audio streams, build a "model" of the audio-scape, and allow noise to be cancelled out. Or more accurately, allow a voice to be extracted with a higher specificity than just 100% of one source.

      I'm sensing that we're on the cusp of affordable setups where instead of just a few microphones, rooms could be set up with hundreds of microphones recording in parallel, with analysis done to track and extract individual sound sources moving in 3D. I suspect that a modern GPU already has the compute power, or will soon. This would allow individual speakers to be isolated even if they weren't set up with little clip-on recorders ahead of time.

    4. Re:Lots of work? by Anonymous Coward · · Score: 3, Informative

      I hate to post links to commercial products in a technical discussion, but 3D capture of sounds (as in "you can focus in real time on any point of a room and listen to whatever happens there") already exists:

      http://www.mhacoustics.com/mh_acoustics/Eigenmike_microphone_array.html

      See also "microphone arrays" on Google. There has been plenty of research in past decades, with more to come. https://en.wikipedia.org/wiki/Microphone_array

    5. Re:Lots of work? by bertok · · Score: 4, Informative

      I've seen this MIT project before, but just like that product you linked, they all seem to be about "regular" arrays or arrangements.

      I'm thinking more along the lines of ad-hoc arrangements of microphones, which is more like what Photosynth does -- it arranges arbitrary photos together to make a 3D scene, instead of taking specific, precisely aligned photos.

      One interesting bit about the MIT project is that they have 1,020 microphones -- a world record -- generating 50MB/sec of data. A quick back-of-the-envelope calculation shows that this is consistent with roughly 44.1 kHz at 8 bits per sample. If you think about it, this amount of data is peanuts to a modern PC. Just one high-end GPU might have 200GB/sec of memory bandwidth and over 2 teraflops of processing power! This translates to about 38,000 operations per sound sample, in real time, at 32-bit precision. That should be enough to track moving sound sources, figure out what's an echo and what isn't, correlate sounds across multiple microphones, perform Doppler-shift analysis, etc...
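
      The back-of-the-envelope figures above can be reproduced roughly as follows. This is only a sketch: it assumes one byte per sample at 44.1 kHz as stated, and it reads "50MB/sec" as 50 MiB/s and "2 teraflops" as 2 x 10^12 operations per second. Those two readings are assumptions, chosen because they land near the quoted 38,000.

```python
MICS = 1020
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 1          # 8 bits per sample, as stated in the comment

# Raw data rate if every microphone streams 44.1 kHz, 8-bit audio.
raw_rate = MICS * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE   # bytes per second

# Operations available per sample, using the quoted 50 MB/s (read as MiB/s).
quoted_rate = 50 * 1024 * 1024                        # bytes per second (assumed MiB)
GPU_OPS_PER_SEC = 2 * 10**12                          # "2 teraflops" (assumed exact)
ops_per_sample = GPU_OPS_PER_SEC / (quoted_rate / BYTES_PER_SAMPLE)

print(f"raw rate: {raw_rate / 1e6:.1f} MB/s")
print(f"ops per sample: {ops_per_sample:,.0f}")
```

      The raw rate comes out near 45 MB/s, the same ballpark as the quoted 50MB/sec, and the per-sample budget lands just above 38,000.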

      Going to higher numbers of microphones ought to be easy, and could allow some fantastic applications, as well as some scary ones. There would be enough redundancy in the data to build a 3D scene with tracking of both moving sound sources and moving microphones. It may even be possible to determine room geometry, and the movement of large objects could be tracked based on their interaction with the sound field.

      One application I can think of would be for capturing sound during movie filming. Often, studios have to discard the recorded sound and re-dub everything because of background noises, but this kind of technology would allow the director to perform arbitrary filtering after the fact, comparable to the light-field cameras that allow "refocusing" after an image has been captured. An actor's voice could be picked out and made louder, everything with a source "behind the camera" could be edited out, and surround sound effects could be generated from any scene setup.

    6. Re:Lots of work? by gordm · · Score: 3, Informative

      I have used PluralEyes, but I find it not much harder to sync manually. Make 3 loud clapping sounds once the recorders are all running, then manually sync to that in the timeline. The vast majority of the audio can't be put in sync manually because the audio is so different from each perspective (for 5 hours), compared to the 3 seconds where identical clapping can be heard. Ideally the devices are all activated and running (then you clap 3x) before the event starts, and deployed as needed, as opposed to starting them as they are deployed to collars.
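
      The clap trick can also be automated with a cross-correlation over the few seconds that contain the claps. Below is a minimal sketch on synthetic data, not what PluralEyes or the poster actually does. The function names are hypothetical, and real recordings would first need the clap region cut out and converted to lists of samples.

```python
def best_lag(ref, other, max_lag):
    """Return the shift (in samples) of `other` relative to `ref` that
    maximizes their cross-correlation, searching -max_lag..+max_lag."""
    def score(lag):
        total = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                total += r * other[j]
        return total
    return max(range(-max_lag, max_lag + 1), key=score)

def clap_track(length, first_clap, gap=50):
    """Synthetic track: silence with three unit-amplitude 'claps'."""
    track = [0.0] * length
    for k in range(3):
        track[first_clap + k * gap] = 1.0
    return track

ref = clap_track(400, first_clap=100)
late = clap_track(400, first_clap=137)   # same claps, 37 samples later in this file

lag = best_lag(ref, late, max_lag=60)
print(lag)
```

      Trimming `lag` samples from the start of the second track lines its claps up with the reference. This brute-force search is only practical for short excerpts, which is all the clap method needs.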

  3. tl:dr Recipe for recording the audio of multiple i by buro9 · · Score: 5, Informative

    tl:dr Recipe for recording the audio of multiple individuals in a large crowd.

    Ingredients:

    Sandisk Sansa Clip+ MP3 Player - http://www.sandisk.co.uk/products/sansa-music-and-video-players/sandisk-sansa-clipplus-mp3-player
    Rockbox - http://www.rockbox.org/

    Instructions:

    Install Rockbox (open source firmware for MP3 players) on the Sansa Clip+. Configure to record on the Sansa Clip+ microphone in .wav format. Give a Sansa Clip+ to every person you want to record the audio for. Have every person start recording at roughly the same time, leave for 5 hours.

    Gather all the Sansa Clip+ units at the end of the session and extract the .wav files. 10 participants = a 10-track-equivalent audio recording of the session.

    Mix and fade between the tracks to isolate the audio of single conversations between participants.
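
    The "mix and fade" step amounts to moving gain from one track to another as the speaker changes. Here is a minimal sketch of a linear crossfade between two tracks that are already synced. The function name and the placeholder samples are illustrative, and a real editor would do this on the timeline.

```python
def crossfade_mix(track_a, track_b, fade_start, fade_len):
    """Mix two equal-length, already-synced mono tracks, moving linearly
    from 100% track_a to 100% track_b over `fade_len` samples (> 0)."""
    mixed = []
    for i, (a, b) in enumerate(zip(track_a, track_b)):
        # Gain for track_b: 0 before the fade, 1 after, linear in between.
        t = min(max((i - fade_start) / fade_len, 0.0), 1.0)
        mixed.append((1.0 - t) * a + t * b)
    return mixed

speaker_1 = [1.0] * 8   # placeholder samples, not real audio
speaker_2 = [0.0] * 8
out = crossfade_mix(speaker_1, speaker_2, fade_start=2, fade_len=4)
print(out)
```

    With ten recorders the same idea extends to one gain curve per track, raised while that wearer is the one talking.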

    He has basically created a relatively inexpensive and reliable way to get this audio, much like using multiple GoPro cameras to record sports action beats out using professional equipment (and in some ways has become professional equipment). He's arguing that the Sansa Clip+, together with the Rockbox open source firmware, is a better solution than using professional radio mics and then having recording equipment receive those signals and store them on disk for editing later.

    I've no idea how "crowdsourced" fits into this though, nor how this is anything more than an advert, even though the solution is a little interesting. It's useful enough and potentially cheap enough that you might imagine giving everyone at a TED event one of these, as the conversations caught off-record might be even more valuable than the sessions.

  4. Clock Drift by Anonymous Coward · · Score: 2, Insightful

    Interesting idea, but it sounds like a pain in the ass to deal with in post-production. Each recorder is running off its own crystal for timing, with each crystal being ever so slightly different. This is why the professional approach is to route all mic signals to one recorder, or, if you need more channel capacity, to sync recorders to the same master clock.

    It's a neat hack, with some usefulness if you cherry pick recordings and edit the best parts together without mixing/overlapping sources together.
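
    To put a number on the drift concern, here is a rough sketch. The ppm mismatch values are hypothetical (the thread gives no measured tolerance for the Clip+ crystal), and the function name is illustrative.

```python
def drift_seconds(duration_s, ppm_difference):
    """Time offset accumulated between two free-running recorders whose
    clocks differ by `ppm_difference` parts per million."""
    return duration_s * ppm_difference / 1_000_000

FIVE_HOURS = 5 * 3600
for ppm in (10, 50, 100):      # hypothetical crystal mismatches
    ms = drift_seconds(FIVE_HOURS, ppm) * 1000
    print(f"{ppm:>3} ppm -> {ms:.0f} ms after 5 hours")
```

    Even at the low end of these assumed values, two tracks synced at the start would be audibly out of step by the end of a 5-hour session if overlapped, which is why cutting between sources is safer than mixing them.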

  5. Another use for the Rockbox recorder by StealthSock · · Score: 5, Interesting

    My ears got plugged up while swimming and I could barely hear the next day. Rockbox's recorder function outputs the microphone to headphones even when it is not recording. That $30 Clip+ worked reasonably well as a makeshift hearing aid, as long as I was facing the person I was trying to hear.

  6. Re:Nuice but causes problems. by Urza9814 · · Score: 3, Insightful

    and isolating people at a dinner party is not hard, 11 people? 11 wireless microphones into a field mixer and then into the camera. OR do it old Skool. Camera guy + audio guy with a boom and a shotgun microphone on it, Two would be better (two audio guys on mic booms) A pair of ME55's in a dead cat are magical.

    ...I think you just proved the utility in this. First, hundreds or even thousands of dollars of professional equipment and techs vs. a couple of $25 devices. Not to mention needing to clear a couple of feet around the table for the people carrying your boom mics, plus all the wires to your equipment and all of that set up somewhere...

    Sure, in most cases your professionals are still going to be using their professional-quality equipment, because the techs and equipment are already paid for and probably cheaper than the editors anyway, and the space constraints aren't there in a studio. But there are CERTAINLY plenty of situations where repurposing a handful of cheap MP3 players will come out ahead.

  7. Re:tl:dr Recipe for recording the audio of multipl by michrech · · Score: 2

    I don't think the article was meant to mean the approach to audio/video capture they took was "better" than using professional body-pack mics and professional recording gear. I think the point was how such could be accomplished when funds aren't available for the professional gear...

    After having watched a bit of the video they linked, I'd say it did rather well.

    --
    bork bork bork!
  8. Nice hack. by andrew_mike · · Score: 2

    This might make smartphone videos worth a toss. The audio's pretty terrible on those. Demux the video, mux it with the audio, and you'd be good. Not perfect, but good enough for YouTube.
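
    One way to do the demux/mux step is with ffmpeg. The sketch below only builds the command line; the file names are placeholders, it assumes ffmpeg is installed, and it does not handle any sync offset between the phone video and the recorder audio.

```python
import shlex

def mux_command(video_path, audio_path, out_path):
    """Build an ffmpeg command that keeps the video stream from
    `video_path` and replaces its audio with `audio_path`."""
    return [
        "ffmpeg",
        "-i", video_path,      # input 0: phone video (its audio is ignored)
        "-i", audio_path,      # input 1: recorder WAV
        "-map", "0:v:0",       # take video from input 0
        "-map", "1:a:0",       # take audio from input 1
        "-c:v", "copy",        # do not re-encode the video
        "-c:a", "aac",         # encode WAV to AAC for an MP4 container
        "-shortest",           # stop at the end of the shorter input
        out_path,
    ]

cmd = mux_command("phone.mp4", "clip_plus.wav", "combined.mp4")
print(shlex.join(cmd))
```

    Passing `cmd` to `subprocess.run(cmd, check=True)` would run it. The audio would still need to be trimmed to line up with the video first.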

    BTW, if anyone wants to experiment with this, Newegg's selling some refurbed Clip+ players for $26 here.

    --
    Being a smartass is a much better thing than being the alternative.