Slashdot Mirror


Google Releases DIY Open Source Raspberry Pi Voice Kit Hardware (betanews.com)

BrianFagioli writes: Google has decided to take artificial intelligence to the maker community with a new initiative called AIY. This initiative will introduce open source AI projects to the public that makers can leverage in a simple way. Today, Google announces the first-ever AIY project. Called "Voice Kit," it is designed to work with a Raspberry Pi 3 Model B to create a voice-based virtual assistant. Billy Rutledge, Director of AIY Projects for Google, explains, "The first open source reference project is the Voice Kit: instructions to build a Voice User Interface (VUI) that can use cloud services (like the new Google Assistant SDK or Cloud Speech API) or run completely on-device. This project extends the functionality of the most popular single board computer used for digital making -- the Raspberry Pi. The included Voice Hardware Accessory on Top (HAT) contains hardware for audio capture and playback: easy-to-use connectors for the dual mic daughter board and speaker, GPIO pins to connect low-voltage components like micro-servos and sensors, and an optional barrel connector for dedicated power supply. It was designed and tested with the Raspberry Pi 3 Model B."

31 comments

  1. Custom Names by Anonymous Coward · · Score: 0

    Now I can fulfill my dream of saying "Hey, Asshole" and having a computer assistant respond in the voice of Joe Pesci. Waiting for the Christopher Walken/ William Shatner mods.

  2. Kinda disappointed by mewsenews · · Score: 3, Interesting

    The theme is to DIY an AI project but all you are DIYing is a box that sends audio to google's servers to interpret and send back. I saw on HackADay that they may have promised you can do it all on-device but nobody has confirmed that. The whole thing seems like they are trying to convince techies/makers that it's a good idea to have an always-on microphone in their home and the tech press is parroting it. The Google Home and Amazon Alexa products are creepy as f**k

    1. Re:Kinda disappointed by ArylAkamov · · Score: 1

      I can't help but agree. I will be very excited if there is support for an on-device-only setup. I have a fuckton of plans I've been considering making with Jasper, but holding off in case something better came along.

      Only if an internet connection is optional though. That is a complete deal breaker for me.

    2. Re:Kinda disappointed by Anonymous Coward · · Score: 0

      It is open source. Follow the links to the GitHub page (sorry, I already closed it). There's no voice analysis library, it's all expected to be cloud based (and it costs money after the first 60 seconds each month!). But sure, you can do all the voice analysis on-device, just like you can with any random microphone on any computer.

      mewsenews shouldn't be modded as a troll. This is open source designed to run off a paid service. We need to coin a negative term for that.

    3. Re:Kinda disappointed by Anonymous Coward · · Score: 0

      Do you have any idea how massive the processing power and storage requirements are to do something like this? Look up Lucida, formerly Sirius. Especially https://github.com/claritylab/lucida/issues/127

      You're not running all that on a pi, or any other sbc/soc system. A client maybe, but not the whole thing. Your only real option is to have another well powered system on your network to do the heavy lifting, and use the pi/whatever as an interface.

    4. Re:Kinda disappointed by sheramil · · Score: 1

      ...I have a fuckton of plans I've been considering making with Jasper, but holding off in case something better came along.

      Would one of those projects be "make the LED blink"?

    5. Re:Kinda disappointed by ArylAkamov · · Score: 1

      One of them is an interface for a project car that will tell bad jokes, tell me about OBD-II error codes, and report blown fuses, complete with an annoying personality.
      I might have been obsessed with MechWarrior as a kid. I have most of the boards etched and sensors attached, but Jasper is being a pain in my ass.

    6. Re:Kinda disappointed by FatdogHaiku · · Score: 2
      --
      You have the right to remain sentient. If you give up the right to remain sentient, you will be elected to public office
    7. Re:Kinda disappointed by mewsenews · · Score: 1

      Follow-up comment:

      I was not trying to troll. This project is right up my alley and I've been using raspberry pi's recently as octoprint servers for my 3D printer. Turning one of my spares into a voice recognition box is interesting to me, which is why I was disappointed that it seemed to be a black box device sending data to "the cloud" in the worst traditions of IoT devices.

      A comment defending me mentioned this github:

      https://github.com/google/aiyp...

      It's clearly not the entire source code for the raspbian distro they are distributing, but it does suggest that there is an "Embedded Assistant API" that might run locally on the pi without internet connectivity.

      I'm still reserving judgement until we hear feedback from the brave souls who build this kit.

    8. Re:Kinda disappointed by Anonymous Coward · · Score: 0

      Did you watch the video? You have to press the button to activate the mic. It's not always on.

    9. Re:Kinda disappointed by Anonymous Coward · · Score: 0

      I for one am LEET HAXORS cuz I can fold a fucking cardboard box... YOU should FEAR me11!

  3. Someone Update the Poll by mentil · · Score: 3, Interesting

    One of the better aspects of this is that one can make a less-creepy digital assistant. For example, have it require a button press before it activates the microphone (I'm unsure if any existing ones already claim to do this, but one can ensure that their DIY device actually does this.) The source code presumably contains a URL the data is sent to; one could change this to send the audio data anywhere (without messing with routing/host files), your own computer running audio-processing software if you'd like. I'm still not sure I see the use-case for such a device, though. Quicker Trivial Pursuit fact-checking?
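
    A minimal sketch of the press-to-talk idea above, assuming nothing about the kit's real code: `PushToTalkGate` and its `send` callback are hypothetical names, and the callback stands in for whatever destination you choose (Google's endpoint or your own machine). Audio frames are dropped unless the button is currently held.

```python
from typing import Callable


class PushToTalkGate:
    """Forward audio frames only while the (hypothetical) button is held."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        # `send` is whatever ships audio onward; it is an assumption here,
        # not part of any real AIY or Google API.
        self._send = send
        self._pressed = False
        self.dropped = 0  # frames discarded while the button was up

    def press(self) -> None:
        self._pressed = True

    def release(self) -> None:
        self._pressed = False

    def feed(self, frame: bytes) -> bool:
        """Return True if the frame was forwarded, False if it was dropped."""
        if not self._pressed:
            self.dropped += 1
            return False
        self._send(frame)
        return True


if __name__ == "__main__":
    forwarded = []
    gate = PushToTalkGate(forwarded.append)
    gate.feed(b"ignored")   # button not pressed: dropped
    gate.press()
    gate.feed(b"hello")     # forwarded
    gate.release()
    print(len(forwarded), gate.dropped)
```

    On real hardware the `press` and `release` calls would be driven by a GPIO button handler, and the gate would sit between the microphone reader and the network code.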

    --
    Corruption is convincing someone that the selfless ideal is the same as their selfish ideal.
  4. Re-read the summary by raymorris · · Score: 2

    Quoting the summary:

    can
    A) use cloud services (like the new Google Assistant SDK or Cloud Speech API)
    Or
    B) run completely on-device

    1. Re:Re-read the summary by mentil · · Score: 2

      quoting the GP:

      they may have promised you can do it all on-device but nobody has confirmed that

      As in, we'll believe it when we see it.

      --
      Corruption is convincing someone that the selfless ideal is the same as their selfish ideal.
  5. a.k.a. google-spybot by Anonymous Coward · · Score: 0

    Yes,
    Turn your IOT device into a spy-bot for google. NOT.

    FU Google. (And the same to Siri and Cortana: F' all of you)

    It's bad enough as it is.

    Things are really going to shit when stuff like this is blatantly done. They must think that we are all complete and total idiots. (Well the sheeple actually are, but that's another topic).

  6. nah, i already know what's in a whopper by Hugh+Jorgen · · Score: 0

    I don't need google to read The Whopper Wiki page to me.

  7. Online only. by Gravis+Zero · · Score: 1

    I was very interested in the possibility that they had made an offline version of the assistant but after looking at the code, it's all linked to google servers and there is no actual offline functionality. I think what they meant is that you can use the "voice hat" offline with your own code which is true but then you get no voice assistant functionality.

    --
    Anons need not reply. Questions end with a question mark.
  8. Just A Data Mining Tool by Anonymous Coward · · Score: 1

    I've read the article and DIY guide on the site (wow, that's a crappy web page. Not a website, it's just two pages which both take a couple seconds to load. Was this an intern's first attempt at web development? The content-void, link-less 'intro' reminds me of the annoying punch-the-monkey ads. Plus it's at a withgoogle.com address. I'd almost say it's a phishing attack, the withgoogle.com home page even redirects to google.com... WTF Google?). This is expected to connect to Google Assistant. You're not making your own assistant, you're not even parsing language, you're running the audio straight to the Google Assistant API (account and dev key required) to get a match/not-matched event response back. Can't we have truth in advertising laws again? Please? The Cloud Speech API is apparently a Google service too (not a standard, open API?). You get 60 seconds free a month and after that you have to pay (they give you a notice but don't cut you off?). Yeah, nice way to hit people with unexpected costs.

    The VoiceKit hardware is mainly just a dual-mic circuit board and speaker. It probably has some voice isolation SW/HW in it, but that's all it is (you can already buy boards/kits like this too).

    Technically since the hardware is a run-of-the-mill mic and speaker, you can do whatever you want with the audio signal, but there's certainly no voice analysis software included to help you with any of that.

    I've seen better documented Kickstarter vaporware projects than this actual product. Google should be ashamed of itself.

  9. So many AI initiatives! by sheramil · · Score: 1

    They remind me of literary magazines. First issue is announced in a blaze of publicity, second issue is lame in comparison and there's rarely a third. Someone with a lot more spare time than me could put together a huge chart of AI initiatives, with startup money available on the Y-axis and how long it lasted on the X.

  10. The thing with diy by Anonymous Coward · · Score: 0

    Is the ability to hook in voice sub circuit hacks, such as routing your own wake up command to ok, google.
    The rPi is for hackers, calling out google is not ok.

  11. Speech to text? by crow · · Score: 1

    How about a do-it-yourself system with local speech to text? Are there good libraries for that? What about something that is at least good enough to set up an activation and then pass things through to Google? I would love to have voice activation for my computer, provided I can control how and what it does.

    1. Re:Speech to text? by coofercat · · Score: 1

      The problem with speech-to-text is that it's awfully poor unless coupled with some sort of natural language processing. People just don't speak like they write, and so you need to do some post-processing to end up with anything resembling a proper sentence. All that processing is hard, and it's not a static problem either - it needs to learn at least gradually.

      Without wanting to be a G-Ad, the Google services do all this stuff for you - you just provide them with audio. You don't have to send them 'live' audio if you don't want to either, although of course if you're saving it to disk and then uploading it, you'll have to wait for the speech to stop before you can start sending it, and then you won't get a response until it's all uploaded and processed. But in theory at least, you could do some sort of privacy-enhancing DSP on the audio before Google get their hands on it.

      Now, back on topic... This seems like a potentially clever move by Google. It means that any number of new 'assistants' might spring up from all kinds of small companies. New shapes and sizes and new voices, and maybe new ways to interact (as others have noted, perhaps 'press to talk'). The key thing is that they'll all essentially be using "google's gateway to the Internet" instead of Amazon (or Microsoft).

      One form-factor I'd imagine could be useful would be attached to a desk IP phone. You press the button to talk to your PA, but actually it's a digital PA who fields the easy stuff or otherwise passes it on to a VA in somewhere like India or China. It'll have to be attached to your IP phone so it can "hold my calls unless they're important", so it'll answer your calls and listen to what the call is about and then determine 'importance'. All this would go really well on a mobile phone, but the integration isn't easy and the mobile nature of it means it'll have problems when your connection is spotty. All that is solved nicely when you're at your desk.

    2. Re:Speech to text? by DuckDodgers · · Score: 1

      The Mycroft project is doing something similar to this, with a Raspberry Pi that collects audio and sends it to Google servers for speech-to-text. Then they plan to keep the de-identified data and the Google text response, and use the mass amount of sample data to hone their own open source speech-to-text system https://openstt.org/

    3. Re:Speech to text? by Anonymous Coward · · Score: 0

      There are various (Linux-based) projects already doing that, 2 of which I use on a RPi (running Armbian). I should add that these services need no network access; they run on-board:

      pico2wave, which sounds surprisingly good, but only has one voice and about 7 language options, and is just a command line tool which converts a text string to a WAV file, which you then have to play and delete if no longer used.

      spd-say is a command line tool that sounds a little more robot-like, but is fairly configurable with different voices and pitches etc.

      There are others. If you have a Linux box, it's simple to apt-get a number and play around with them on the command line. A websearch will also show you other OSS and commercial projects, some cross-platform. And I think there's a Java one in the making too, apparently still needs some work.
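
      A rough sketch of how the two tools mentioned above might be driven from a script. The helper names are made up for illustration; the flags used (`-l` and `-w` for pico2wave, `-r` and `-p` for spd-say) are the usual ones, but check the man pages on your system. Playback through `aplay` is an assumption about the audio setup.

```python
import os
import shutil
import subprocess
import tempfile
from typing import List


def pico2wave_cmd(text: str, wav_path: str, lang: str = "en-US") -> List[str]:
    """Build the pico2wave command that renders `text` into a WAV file."""
    return ["pico2wave", "-l", lang, "-w", wav_path, text]


def spd_say_cmd(text: str, rate: int = 0, pitch: int = 0) -> List[str]:
    """Build the spd-say command; rate and pitch range from -100 to 100."""
    return ["spd-say", "-r", str(rate), "-p", str(pitch), text]


def speak_with_pico(text: str) -> bool:
    """Render, play, then delete the WAV file. Returns False if tools are missing."""
    if not (shutil.which("pico2wave") and shutil.which("aplay")):
        return False
    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        subprocess.run(pico2wave_cmd(text, wav_path), check=True)
        subprocess.run(["aplay", "-q", wav_path], check=True)
    finally:
        os.remove(wav_path)  # the WAV is only a temporary artifact
    return True


if __name__ == "__main__":
    print(pico2wave_cmd("Hello from the Pi", "/tmp/hello.wav"))
    print(spd_say_cmd("Hello from the Pi", rate=-20))
```

      Everything here runs locally; no network access is involved.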

    4. Re:Speech to text? by crow · · Score: 1

      No, that's backwards. You're describing text-to-speech, not speech-to-text. I'm talking about walking up to my computer and saying something like, "Computer, launch Thunderbird." I'm not necessarily looking for a vocal response (though that might be nice for some things).
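
      To make the "Computer, launch Thunderbird" case concrete, here is a small sketch of the step that comes after recognition. It assumes some speech-to-text engine has already produced a transcript string; the wake word, the `KNOWN_APPS` table and `parse_command` are all hypothetical names, and only whitelisted programs can be launched.

```python
import re
from typing import Dict, List, Optional

WAKE_WORD = "computer"

# Whitelist mapping spoken names to the command to run (illustrative only).
KNOWN_APPS: Dict[str, List[str]] = {
    "thunderbird": ["thunderbird"],
    "firefox": ["firefox"],
}


def parse_command(transcript: str) -> Optional[List[str]]:
    """Turn '<wake word> launch <app>' into a command, or None if no match."""
    # Lowercase and strip punctuation so "Computer, launch X." still matches.
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    if len(words) < 3 or words[0] != WAKE_WORD or words[1] != "launch":
        return None
    return KNOWN_APPS.get(" ".join(words[2:]))


if __name__ == "__main__":
    print(parse_command("Computer, launch Thunderbird."))
    print(parse_command("launch Thunderbird"))
```

      The returned list could be handed to `subprocess.Popen`; keeping a whitelist means a misheard phrase cannot run arbitrary commands.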

  12. Open APIs are not 'Open Source' by hughbar · · Score: 3, Informative
    Rinse and repeat. Generously, you are being offered the 'opportunity' to connect your Raspberry Pi to Google infrastructure, benefiting them and making your dwelling another listening outpost:

    connect it to the Google Assistant. Along with everything the Google Assistant already does, you can add your own question and answer pairs.

    I'm investigating this for myself at the moment and I believe that the most agnostic one is currently Mycroft: https://mycroft.ai/about-mycro... but this still needs to be 'paired' with: https://home.mycroft.ai/. So it's a question of degree and of whom you trust/want to support.

    There's a niche for a full-stack open source one, I believe built from Sphinx etc.: http://cmusphinx.sourceforge.n... OK, I'm thinking like Stallman, but it's important not to get sucked into Google, Amazon and Facebook with the false lure of 'open source' NOT, as Wayne and Garth would say.

    --
    On y va, qui mal y pense!
    1. Re:Open APIs are not 'Open Source' by DuckDodgers · · Score: 1

      I just posted this up-thread, but I'll repeat it. The Mycroft project uses the same Google APIs for speech-to-text. Their plan is to collect the user audio and the Google text responses, and then use a giant collection of that to develop and test a free software speech-to-text system https://openstt.org/

      I think it's a wonderful idea. On the other hand, if they're making any progress at all it must be behind closed doors because their public site and github repository are very pretty voids.

    2. Re:Open APIs are not 'Open Source' by hughbar · · Score: 1

      Hi, thanks for annotating this, you've helped me (and others, I hope). A great many of my non-tech friends don't understand the implications of being 'digitally married' to some big corporate statistical AI. Another thing I want to look at is a platform cooperative for this kind of work; it looks like the work you mention may provide a good basis, too.

      --
      On y va, qui mal y pense!
    3. Re:Open APIs are not 'Open Source' by G00F · · Score: 1

      There are lots of open source projects that attempt this.

      I've tested a lot of programs from Mr house to Jasper and many others that all allow or use open source STT (Speech To Text) and TTS (Text To Speech).

      The Google STT was the best, followed by other cloud-based solutions like AT&T and Wit.ai. The best open source ones, pocketsphinx or Julius, leave much to be desired.

      For TTS, open source fares better, with MaryTTS being IMO the best, but very slow. It's also Java and eats RAM. But espeak can work well enough.

      Sadly the end result is a gimmick that barely works in ideal conditions. Granted, it was 6+ months ago that I last peeked, so things may have improved.

      --
      The spirit of resistance to government is so valuable on certain occasions that I wish it to be always kept alive
  13. How about more open option by Anonymous Coward · · Score: 0

    https://mycroft.ai/

  14. solve real problems by Anonymous Coward · · Score: 0

    This is intended to "solve real problems" like being a fucking lazy bum