Audio Data API

1,051 bytes removed, 00:16, 17 August 2010
merging review copy with this, this is main copy now.
===== Abstract =====
The HTML5 specification introduces the &lt;audio&gt; and &lt;video&gt; media elements, and with them the opportunity to dramatically change the way we integrate media on the web. The current HTML5 media API provides ways to play and get limited information about audio and video, but gives no way to programmatically access or create such media. We present a new Mozilla extension to this API, which allows web developers to read and write raw audio data.
===== Authors =====
* Thomas Saunders
* Ted Mielczarek
* Felipe Gomes
===== Status =====
'''This is a work in progress.''' This document reflects the current thinking of its authors, and is not an official specification. The original goal of this specification was to experiment with web audio data on the way to creating a more stable recommendation. The authors hoped that this work, and the ideas it generated, would eventually find their way into Mozilla and other HTML5 compatible browsers. Both of these goals are within reach now, with work ramping up in [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 this Mozilla bug], and the announcement of an official [http://www.w3.org/2005/Incubator/audio/ W3C Audio Incubator Group] chaired by one of the authors.

The continuing work on this specification and API can be tracked here, and in [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 the bug]. Comments, feedback, and collaboration are all welcome. You can reach the authors on irc in the [irc://irc.mozilla.org/audio #audio channel] on irc.mozilla.org.
== API Tutorial ==
This API extends the HTMLMediaElement and HTMLAudioElement (e.g., affecting &lt;video&gt; and &lt;audio&gt;), and implements the following basic API for reading and writing raw audio data:
===== Reading Audio =====
Audio data is made available via an event-based API. As the audio is played, and therefore decoded, sample data is passed to content scripts in a framebuffer for processing after becoming available to the audio layer (hence the name, '''MozAudioAvailable'''). These samples may or may not have been played yet at the time of the event. The audio samples returned in the event are raw, and have not been adjusted for mute/volume settings on the media element. Playing, pausing, and seeking the audio also affect the streaming of this raw audio data.

Users of this API can register two callbacks on the &lt;audio&gt; or &lt;video&gt; element in order to consume this data:
<pre>&lt;audio src="song.ogg" onloadedmetadata="audioInfo();"&gt;&lt;/audio&gt;</pre>
The '''LoadedMetadata''' event is a standard part of HTML5. It now indicates that a media element (audio or video) has useful metadata loaded, which can be accessed using three new attributes:
* mozChannels
* mozSampleRate
* mozFrameBufferLength
Prior to the '''LoadedMetadata''' event, accessing these attributes will cause an exception to be thrown, indicating that they are not known, or there is no audio. These attributes indicate the '''number of channels''', audio '''sample rate per second''', and the '''default size of the framebuffer''' that will be used in '''MozAudioAvailable''' events. This event is fired once as the media resource is first loaded, and is useful for interpreting or writing the audio data.

The '''MozAudioAvailable''' event provides two pieces of data. The first is a framebuffer (i.e., an array) containing decoded audio sample data (i.e., floats). The second is the time for these samples measured from the start in seconds. Web developers consume this event by registering an event listener in script like so:
<pre>
&lt;audio id="audio" src="song.ogg"&gt;&lt;/audio&gt;
&lt;script&gt;
  var audio = document.getElementById("audio");
  audio.addEventListener('MozAudioAvailable', someFunction, false);
&lt;/script&gt;
</pre>
An audio element can also be created with script outside the DOM:
<pre>
var audio = new Audio();
audio.src = "song.ogg";
audio.addEventListener('MozAudioAvailable', someFunction, false);
audio.play();
</pre>
The following is an example of how both events might be used:
samples;
function audioInfo(event) {
  var audio = document.getElementById('audio');
  // After the loadedmetadata event, the following media element attributes are known:
  channels = audio.mozChannels;
  rate = audio.mozSampleRate;
  frameBufferLength = audio.mozFrameBufferLength;
}
function audioAvailable(event) {
  var samples = event.frameBuffer;
  var time = event.time;
for (var i = 0; i < frameBufferLength; i++) {
</head>
<body>
<audio id="audio-element" src="song.ogg"
controls="true"
onloadedmetadata="loadedMetadata(event);"
style="width: 512px;">
</audio>
var canvas = document.getElementById('fft'),
ctx = canvas.getContext('2d'),
channels,
rate,
frameBufferLength,
fft;
function loadedMetadata(event) {
  channels = audio.mozChannels;
  rate = audio.mozSampleRate;
  frameBufferLength = audio.mozFrameBufferLength;
  fft = new FFT(frameBufferLength / channels, rate);
}
function audioAvailable(event) {
  var fb = event.frameBuffer,
      t = event.time, /* unused, but it's there */
signal = new Float32Array(fb.length / channels),
magnitude;
}
}
 
var audio = document.getElementById('audio-element');
audio.addEventListener('MozAudioAvailable', audioAvailable, false);
// FFT from dsp.js, see below
if ( bufferSize !== buffer.length ) {
throw "Supplied buffer is not the same size as defined FFT. FFT Size: " + bufferSize + " Buffer Size: " + buffer.length;
}
It is also possible to set up an &lt;audio&gt; element for raw writing from script (i.e., without a ''src'' attribute). Content scripts can specify the audio stream's characteristics, then write audio samples using the following methods:
<code>mozSetup(channels, sampleRate, volume)</code>
<pre>
// Create a new audio element
var audioOutput = new Audio();
// Set up audio element with 2 channel, 44.1KHz audio stream, volume set to full.
audioOutput.mozSetup(2, 44100, 1);
</pre>
<pre>
// Get current position of the underlying audio stream, measured in samples.
var currentSampleOffset = audioOutput.mozCurrentSampleOffset();
</pre>
Since the '''MozAudioAvailable''' event and the '''mozWriteAudio()''' method both use '''Float32Array''', it is possible to take the output of one audio stream and pass it directly (or process first and then pass) to a second:
<pre>
<audio id="a1"
src="song.ogg"
onloadedmetadata="loadedMetadata(event);" controls="controls">
</audio>
<script>
buffer = [];
function loadedMetadata(event) {
// Mute a1 audio.
a1.volume = 0;
// Setup a2 to be identical to a1, and play through there.
a2.mozSetup(a1.mozChannels, a1.mozSampleRate, 1);
}
function audioAvailable(event) {
// Write the current framebuffer
var frameBuffer = event.frameBuffer;
writeAudio(frameBuffer);
}
a1.addEventListener('MozAudioAvailable', audioAvailable, false);
function writeAudio(audio) {
</pre>
Audio data written using the '''mozWriteAudio()''' method needs to be written at a regular interval in equal portions, in order to keep a little ahead of the current sample offset (the current sample offset of the hardware can be obtained with '''mozCurrentSampleOffset()'''), where "a little" means something on the order of 500ms of samples. For example, if working with 2 channels at 44100 samples per second, a writing interval of 100ms, and a pre-buffer equal to 500ms, one would write an array of (2 * 44100 / 10) = 8820 samples on each interval, and keep the total number of samples written at about (currentSampleOffset + 2 * 44100 / 2) = currentSampleOffset + 44100.
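The arithmetic above can be sketched as two small helpers. This is only an illustration: the names <code>planWrites</code> and <code>samplesNeeded</code> are hypothetical and not part of the API, and the write position is assumed to be tracked by the caller.

```javascript
// Hypothetical helper (not part of the API): how many samples to write per
// interval, and how many samples of pre-buffer to keep ahead of the hardware.
function planWrites(channels, sampleRate, intervalMs, prebufferMs) {
  return {
    samplesPerWrite: Math.round(channels * sampleRate * intervalMs / 1000),
    prebufferSamples: Math.round(channels * sampleRate * prebufferMs / 1000)
  };
}

// Number of samples still needed to reach the pre-buffer target, given the
// hardware position (as mozCurrentSampleOffset() would report it) and the
// total number of samples the caller has written so far.
function samplesNeeded(plan, currentSampleOffset, currentWritePosition) {
  var target = currentSampleOffset + plan.prebufferSamples;
  return Math.max(0, target - currentWritePosition);
}

// 2 channels, 44100 samples per second, 100ms interval, 500ms pre-buffer.
var plan = planWrites(2, 44100, 100, 500);
// plan.samplesPerWrite is 8820 and plan.prebufferSamples is 44100
```

A timer callback could call <code>samplesNeeded()</code> on each tick and write only when the result is greater than zero.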
===== Complete Example: Creating a Web Based Tone Generator =====
var audio = new Audio();
audio.mozSetup(1, sampleRate, 1);
var currentWritePosition = 0;
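The full tone generator is not reproduced here. As a minimal sketch of the sample generation such an example needs, assuming a mono stream and a sine wave, the following fills a '''Float32Array''' starting from a given write position. The function name <code>sineSamples</code> and its parameters are hypothetical.

```javascript
// Hypothetical helper: generate `count` mono sine samples at `frequency` Hz,
// continuing the waveform from sample index `startSample` so that
// consecutive blocks join without a phase jump.
function sineSamples(frequency, sampleRate, startSample, count) {
  var samples = new Float32Array(count);
  var k = 2 * Math.PI * frequency / sampleRate; // radians per sample
  for (var i = 0; i < count; i++) {
    samples[i] = Math.sin(k * (startSample + i));
  }
  return samples;
}

// 100ms of a 440Hz tone at 44100 samples per second.
var block = sineSamples(440, 44100, 0, 4410);
```

Passing the running write position as <code>startSample</code> keeps successive blocks in phase.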
== DOM Implementation ==
===== nsIDOMNotifyAudioAvailableEvent =====
Audio data is made available via the following event:
* '''Event''': AudioAvailableEvent
* '''Event handler''': onmozaudioavailable
The '''AudioAvailableEvent''' is defined as follows:
<pre>
interface nsIDOMNotifyAudioAvailableEvent : nsIDOMEvent
{
  // frameBuffer is really a Float32Array
  readonly attribute jsval frameBuffer;
  readonly attribute float time;
};
</pre>
The '''frameBuffer''' attribute contains a typed array ('''Float32Array''') with the raw audio data (32-bit float values) obtained from decoding the audio (e.g., the raw data being sent to the audio hardware vs. encoded audio). This is of the form <nowiki>[channel1, channel2, ..., channelN, channel1, channel2, ..., channelN, ...]</nowiki>. All audio frames are normalized to a length of channels * 1024 by default, but could be any power of 2 between 512 and 32768 if the user has set a different length using the '''mozFrameBufferLength''' attribute.

The '''time''' attribute contains a float representing the time in seconds since the start.
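Because the framebuffer interleaves channels, scripts that process one channel at a time need to split it first. A minimal sketch, assuming the interleaved layout described above; the helper name <code>deinterleave</code> is hypothetical and not part of the API:

```javascript
// Hypothetical helper: split an interleaved framebuffer of the form
// [ch1, ch2, ..., chN, ch1, ch2, ..., chN, ...] into one Float32Array
// per channel.
function deinterleave(frameBuffer, channels) {
  var framesPerChannel = frameBuffer.length / channels;
  var out = [];
  for (var c = 0; c < channels; c++) {
    out.push(new Float32Array(framesPerChannel));
  }
  for (var i = 0; i < framesPerChannel; i++) {
    for (var ch = 0; ch < channels; ch++) {
      out[ch][i] = frameBuffer[i * channels + ch];
    }
  }
  return out;
}

// Two channels, two samples each.
var interleaved = new Float32Array([0.5, -0.5, 0.25, -0.25]);
var parts = deinterleave(interleaved, 2);
// parts[0] holds the first channel, parts[1] the second
```

In an event handler this would be called as <code>deinterleave(event.frameBuffer, audio.mozChannels)</code>.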
===== nsIDOMHTMLMediaElement additions =====
Audio metadata is made available via three new attributes on the HTMLMediaElement. By default these attributes throw if accessed before the '''LoadedMetadata''' event occurs. Users who need this info before the audio starts playing should not use '''autoplay''', since the audio might start before a loadedmetadata handler has run.

The three new attributes are defined as follows:
<pre>
readonly attribute unsigned long mozChannels;
readonly attribute unsigned long mozSampleRate;
         attribute unsigned long mozFrameBufferLength;
</pre>
The '''mozChannels''' attribute contains the number of channels in the audio resource (e.g., 2). The '''mozSampleRate''' attribute contains the number of samples per second that will be played, for example 44100. Both are read-only.

The '''mozFrameBufferLength''' attribute indicates the number of samples that will be returned in the framebuffer of each '''MozAudioAvailable''' event. This number is a total for all channels, and by default is set to be the number of channels * 1024 (e.g., 2 channels * 1024 samples = 2048 total).

The '''mozFrameBufferLength''' attribute can also be set to a new value, if users want lower latency, or larger amounts of data, etc. The size given '''must''' be a power of 2 between 512 and 32768. The following are all valid lengths:
* 512
* 1024
* 2048
* 4096
* 8192
* 16384
* 32768
Using any other size will result in an exception being thrown. The best time to set a new length is after the '''loadedmetadata''' event fires, when the audio info is known, but before the audio has started or '''MozAudioAvailable''' events have begun firing.
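Since an invalid length throws, a script may want to check a candidate value before assigning it. The following is a sketch under the rules stated above; the helper names are hypothetical, and <code>secondsPerEvent</code> simply divides the per-channel sample count by the sample rate to estimate how much audio each event covers.

```javascript
// Hypothetical helper: true if n is a power of 2 between 512 and 32768.
function isValidFrameBufferLength(n) {
  return typeof n === 'number' &&
         n % 1 === 0 &&
         n >= 512 && n <= 32768 &&
         (n & (n - 1)) === 0; // a power of 2 has exactly one bit set
}

// Hypothetical helper: approximate seconds of audio in each event, given
// that the framebuffer length is a total across all channels.
function secondsPerEvent(frameBufferLength, channels, sampleRate) {
  return frameBufferLength / channels / sampleRate;
}

var ok = isValidFrameBufferLength(2048);     // true
var notOk = isValidFrameBufferLength(1000);  // false
var seconds = secondsPerEvent(2048, 2, 44100);
```

With 2 channels at 44100 samples per second, the default length of 2048 corresponds to roughly 23ms of audio per event.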
===== nsIDOMHTMLAudioElement additions =====
The HTMLAudioElement has also been extended to allow write access. Audio writing is achieved by adding three new methods:
<pre>
void mozSetup(in long channels, in long rate);
unsigned long mozWriteAudio(array); // array is Array() or Float32Array()
unsigned long long mozCurrentSampleOffset();
</pre>
The '''mozSetup()''' method allows an &lt;audio&gt; element to be set up for writing from script. This method '''must''' be called before '''mozWriteAudio''' or '''mozCurrentSampleOffset''' can be called, since an audio stream has to be created for the media element. It takes two arguments:
# '''channels''' - the number of audio channels (e.g., 2)
# '''rate''' - the audio's sample rate (e.g., 44100 samples per second)
The choices made for '''channel''' and '''rate''' are significant, because they determine the amount of data you must pass to '''mozWriteAudio()'''. That is, you must pass either an array with 0 elements--similar to flushing the audio stream--or enough data for each channel specified in '''mozSetup()'''.
The '''mozSetup()''' method, if called more than once, will recreate a new audio stream (destroying an existing one if present) with each call. Thus it is safe to call this more than once, but unnecessary.
The '''mozWriteAudio()''' method can be called after '''mozSetup()'''. It allows audio data to be written directly from script. It takes one argument, '''array'''. This is a JS Array (i.e., new Array()) or a typed float array (i.e., new Float32Array()) containing the audio data (floats) you wish to write. It must be 0 or N elements in length, where N % channels == 0, otherwise an exception is thrown.
The '''mozWriteAudio()''' method returns the number of samples that were just written, which may or may not be the same as the number in '''array'''. Only the number of samples that can be written without blocking the audio hardware will be written. It is the responsibility of the caller to deal with any samples that don't get written in the first pass (e.g., buffer and write in the next call).
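One way to handle partial writes is to keep the unwritten tail and retry it before writing anything new. The sketch below assumes samples are passed as '''Float32Array''' (it relies on <code>subarray()</code>); <code>BufferedWriter</code> is a hypothetical wrapper, and <code>MockOutput</code> is a stand-in for an audio element used only so the example runs outside a browser.

```javascript
// Stand-in for an audio element, for illustration only: accepts at most
// `capacity` samples per mozWriteAudio() call and records what it received.
function MockOutput(capacity) {
  this.capacity = capacity;
  this.received = [];
}
MockOutput.prototype.mozWriteAudio = function (array) {
  var n = Math.min(this.capacity, array.length);
  for (var i = 0; i < n; i++) {
    this.received.push(array[i]);
  }
  return n;
};

// Hypothetical wrapper: remembers samples that were not written and
// flushes them first on the next call.
function BufferedWriter(output) {
  this.output = output; // any object exposing mozWriteAudio(array)
  this.pending = null;  // tail of the previous write, if any
}

// Returns true if `samples` were accepted (written or held as pending),
// false if older pending samples still block; retry the same block later.
BufferedWriter.prototype.write = function (samples) {
  if (this.pending) {
    var flushed = this.output.mozWriteAudio(this.pending);
    if (flushed < this.pending.length) {
      this.pending = this.pending.subarray(flushed);
      return false;
    }
    this.pending = null;
  }
  var written = this.output.mozWriteAudio(samples);
  if (written < samples.length) {
    this.pending = samples.subarray(written);
  }
  return true;
};

var out = new MockOutput(3);
var writer = new BufferedWriter(out);
var first = writer.write(new Float32Array([1, 2, 3, 4, 5])); // 3 written, 2 held
var second = writer.write(new Float32Array([6, 7]));         // flushes 4, 5 then writes 6, 7
```

With a real element, <code>new BufferedWriter(audioOutput)</code> would be used after '''mozSetup()''' has been called.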
The '''mozCurrentSampleOffset()''' method can be called after '''mozSetup()'''. It returns the current position (measured in samples) of the audio stream. This is useful when determining how much data to write with '''mozWriteAudio()'''.
All of '''mozWriteAudio()''', '''mozCurrentSampleOffset()''', and '''mozSetup()''' will throw exceptions if called out of order. '''mozSetup()''' will also throw if a ''src'' attribute has previously been set on the audio element (i.e., you can't do both at the same time).
===== Security =====
Similar to the &lt;canvas&gt; element and its '''getImageData''' method, the '''MozAudioAvailable''' event's '''frameBuffer''' attribute protects against information leakage between origins.

The '''MozAudioAvailable''' event's '''frameBuffer''' attribute will throw if the origin of the audio resource does not match the document's origin. NOTE: this will affect users who have the security.fileuri.strict_origin_policy set, and are working locally with file:/// URIs.
===== Compatibility with Audio Backends =====
The current MozAudioAvailable implementation integrates with Mozilla's decoder abstract base classes, and therefore, any audio decoder which uses these base classes automatically dispatches MozAudioAvailable events. At the time of writing, this includes the Ogg and WebM decoders but '''not''' the Wave decoder.
== Additional Resources ==
A series of blog posts document the evolution and implementation of this API: http://vocamus.net/dave/?cat=25. Another overview by Al MacDonald is available [http://weblog.bocoup.com/web-audio-all-aboard here].
=== Bug ===
The work on this API is available in Mozilla [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 bug 490705].
=== Obtaining Code and Builds ===
'''Latest Try Server Builds:'''

http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/david.humphrey@senecac.on.ca-ecf5c7f4e806/
=== JavaScript Audio Libraries ===
* We have started work on a JavaScript library to make building audio web apps easier. Details are [[Audio Data API JS Library|here]].
* [http://github.com/bfirsh/dynamicaudio.js dynamicaudio.js] - An interface for writing audio with a Flash fall back for older browsers. ''NOTE:'' not necessarily up-to-date with this version of the API.
=== Working Audio Data Demos ===
A number of working demos have been created, including:
 
* Writing Audio from JavaScript, Digital Signal Processing
** API Example: [http://code.bocoup.com/audio-data-api/examples/inverted-waveform-cancellation Inverted Waveform Cancellation]
** API Example: [http://code.bocoup.com/audio-data-api/examples/stereo-splitting-and-panning Stereo Splitting and Panning]
** API Example: [http://code.bocoup.com/audio-data-api/examples/mid-side-microphone-decoder/ Mid-Side Microphone Decoder]
** API Example: [http://code.bocoup.com/audio-data-api/examples/ambient-extraction-mixer/ Ambient Extraction Mixer]
** API Example: [http://code.bocoup.com/audio-data-api/examples/worker-thread-audio-processing/ Worker Thread Audio Processing]
 
* Beat Detection (also showing use of WebGL for 3D visualizations)
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor1HD.html (video [http://vimeo.com/11345262 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor2HD.html (video of older version [http://vimeo.com/11345685 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor3HD.html (video [http://www.youtube.com/watch?v=OxoFcyKYwr0&fmt=22 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor4HD.html (video [http://www.youtube.com/watch?v=dym4DqpJuDk&fmt=22 here])
'''NOTE:''' ''If you try to run demos created with the original API using a build that implements the new API, you may encounter [https://bugzilla.mozilla.org/show_bug.cgi?id=560212 bug 560212]. We are aware of this, as is Mozilla, and it is being investigated.''
==== Demos Needing to be Updated to New API ====
* FFT visualization (calculated with js)
** http://weare.buildingsky.net/processing/dsp.js/examples/fft.html
 
* Beat Detection (also showing use of WebGL for 3D visualizations)
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor1HD-13a.html (video [http://vimeo.com/11345262 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor2HD-13a.html (video of older version [http://vimeo.com/11345685 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor3HD-13a.html (video [http://www.youtube.com/watch?v=OxoFcyKYwr0&fmt=22 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor4HD.html (video [http://www.youtube.com/watch?v=dym4DqpJuDk&fmt=22 here])
* Writing Audio from JavaScript, Digital Signal Processing
** Reverb effect http://code.almeros.com/code-examples/reverb-firefox-audio-api/ (video [http://vimeo.com/13386796 here])
** Csound shaker instrument ported to JavaScript via Processing.js http://scotland.proximity.on.ca/dxr/tmp/audio/shaker/instruments/shaker.htm
 
** http://weare.buildingsky.net/processing/dft.js/audio.new.html (video [http://vimeo.com/8525101 here])
** JS Multi-Oscillator Synthesizer http://weare.buildingsky.net/processing/dsp.js/examples/synthesizer.html (video [http://vimeo.com/11411533 here])
** JS IIR Filter http://weare.buildingsky.net/processing/dsp.js/examples/filter.html (video [http://vimeo.com/11335434 here])
** Biquad filter http://www.ricardmarxer.com/audioapi/biquad/ (demo by Ricard Marxer)
** Interactive Audio Application, Bloom http://code.bocoup.com/bloop/color/bloop.html (video [http://vimeo.com/11346141 here] and [http://vimeo.com/11345133 here])
* http://news.slashdot.org/story/10/05/26/1936224/Breakthroughs-In-HTML-Audio-Via-Manipulation-With-JavaScript
* http://ajaxian.com/archives/amazing-audio-api-javascript-demos
* http://www.webmonkey.com/2010/08/sampleplayer-makes-your-browser-sing-sans-flash/