GSoC Update 3 - HTML5 Speech API

From MozillaWiki

Six weeks have passed since coding started. Progress on the project looks good, though I would have liked to finish a lot more by now. I lost about 10 days to examinations, but last week was quite productive.

Things accomplished since last time:

  • Got audio recording to work on Mac. (It turns out the issue I was running into was fixed in a newer version of PortAudio: http://www.portaudio.com/trac/ticket/88)
  • Figured out how to send audio and receive results. This turned out to be easier than expected: a simple HTTP POST with the audio data is enough to get recognition results. For example, this request
curl -H "Content-Type: audio/x-flac; rate=16000" -F"myfile=@untitle.flac" "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"

gives me the result in JSON:

{
   "status":0,
   "id":"b4bbd509bedafc435393b59ce374447d-1",
   "hypotheses":
   [
       {
           "utterance":"this is a audio recording",
           "confidence":0.7447412
       }
   ]
}
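The same round trip can be sketched in Python using only the standard library. One helper builds a POST to the endpoint from the curl command above. The other picks the most confident entry from "hypotheses" in a response shaped like the sample above.

This is a sketch under stated assumptions, not the project's code:

  • The helper names are hypothetical.
  • The audio is sent as the raw request body, not as the multipart form field that curl's -F produces.
  • Treating a non-zero "status" as "no usable result" is a guess based only on the single sample above.
  • The request is built but never sent.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and query parameters copied from the curl command above.
RECOGNIZE_URL = "https://www.google.com/speech-api/v1/recognize"


def build_recognize_request(flac_bytes, lang="en-US", rate=16000):
    """Build (but do not send) a POST carrying FLAC audio.

    Assumption: the audio goes in the raw request body instead of a
    multipart form field; the Content-Type mirrors the curl example.
    """
    query = urllib.parse.urlencode({"xjerr": 1, "client": "chromium", "lang": lang})
    return urllib.request.Request(
        f"{RECOGNIZE_URL}?{query}",
        data=flac_bytes,
        headers={"Content-Type": f"audio/x-flac; rate={rate}"},
        method="POST",
    )


def best_hypothesis(response_text):
    """Return (utterance, confidence) of the most confident hypothesis, or None."""
    result = json.loads(response_text)
    # Assumption: status 0 means success (the only value seen in the sample).
    if result.get("status") != 0 or not result.get("hypotheses"):
        return None
    top = max(result["hypotheses"], key=lambda h: h.get("confidence", 0.0))
    return top["utterance"], top.get("confidence")


if __name__ == "__main__":
    # The sample response from above, on one line.
    sample = (
        '{"status":0,"id":"b4bbd509bedafc435393b59ce374447d-1",'
        '"hypotheses":[{"utterance":"this is a audio recording",'
        '"confidence":0.7447412}]}'
    )
    print(best_hypothesis(sample))
    req = build_recognize_request(b"fLaC")
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the parsing and request logic testable without network access. Passing the request to urllib.request.urlopen would actually send it.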

I'm now working on getting the same request to work using XMLHttpRequest.
Lots more to be done this week:

  • A UI to ask for the user's permission to use speech recognition.
  • Integrating endpointing, speechrecognizer and everything else.
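Endpointing means deciding where speech starts and stops in the recorded audio, so that only that stretch is sent for recognition. This update does not say how the project's endpointer works. The sketch below is therefore a generic energy-threshold endpointer, not the project's implementation.

The function names, the threshold and the "hangover" count are all hypothetical. The frame length of 160 samples corresponds to 10 ms at the 16000 Hz rate used in the curl example.

```python
import math


def frame_rms(samples, frame_len):
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_endpoints(samples, frame_len=160, threshold=0.02, hangover=5):
    """Return (start_sample, end_sample) of detected speech, or None.

    Hypothetical energy-based endpointer: speech starts at the first frame
    whose RMS exceeds `threshold` and ends once `hangover` consecutive
    frames fall back below it. Samples are floats in [-1.0, 1.0].
    """
    energies = frame_rms(samples, frame_len)
    start = None
    quiet = 0  # length of the current run of below-threshold frames
    for idx, energy in enumerate(energies):
        if start is None:
            if energy > threshold:
                start = idx
        elif energy > threshold:
            quiet = 0
        else:
            quiet += 1
            if quiet >= hangover:
                end = idx - hangover + 1  # first frame of the quiet run
                return start * frame_len, end * frame_len
    if start is None:
        return None  # no frame ever crossed the threshold
    return start * frame_len, len(energies) * frame_len  # speech ran to the end


if __name__ == "__main__":
    # 0.1 s of silence, 0.1 s of a constant "loud" signal, 0.1 s of silence.
    audio = [0.0] * 1600 + [0.5] * 1600 + [0.0] * 1600
    print(find_endpoints(audio))
```

The hangover keeps short pauses between words from ending the utterance early. Real endpointers usually adapt the threshold to the background noise level rather than fixing it.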