TensorAudio

public class TensorAudio

Defines a ring buffer and some utility functions to prepare the input audio samples.

It maintains a ring buffer to hold input audio data. Clients feed input audio data via the load methods and access the aggregated audio samples via the getTensorBuffer method.
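
To make the ring buffer idea concrete, here is a minimal sketch of a fixed-size float ring buffer in which new samples overwrite the oldest ones. The class name FloatRingBuffer and its methods are hypothetical and are not part of the TensorAudio API; the library's internal implementation may differ (for example in how a partially filled buffer is represented).

```java
// Hypothetical sketch of a ring buffer; not the library's implementation.
final class FloatRingBuffer {
    private final float[] data;
    private int next = 0; // index where the next sample will be written

    FloatRingBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        data = new float[capacity];
    }

    /** Appends samples, overwriting the oldest ones once the buffer is full. */
    void load(float[] src) {
        for (float sample : src) {
            data[next] = sample;
            next = (next + 1) % data.length;
        }
    }

    /** Returns a copy of the contents ordered from oldest to newest. */
    float[] snapshot() {
        float[] out = new float[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = data[(next + i) % data.length];
        }
        return out;
    }
}
```

In this sketch, slots that have not been written yet hold 0, so a snapshot taken before the buffer fills starts with zeros.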

Note that this class can only handle input audio in Float (in AudioFormat.ENCODING_PCM_FLOAT) or Short (in AudioFormat.ENCODING_PCM_16BIT). Internally it converts and stores all the audio samples in PCM Float encoding.
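
The conversion from 16-bit PCM to PCM float can be pictured as a per-sample scaling into the range [-1, 1]. The sketch below assumes a divisor of 32768; the exact scaling constant used by the library is not stated on this page, and the class name PcmConvert is hypothetical.

```java
// Hypothetical helper; the divisor 32768 is an assumed convention.
final class PcmConvert {
    private PcmConvert() {}

    /** Scales 16-bit PCM samples to floats in the range [-1, 1). */
    static float[] toFloat(short[] src) {
        float[] out = new float[src.length];
        for (int i = 0; i < src.length; i++) {
            // short values span [-32768, 32767], so the result stays within [-1, 1)
            out[i] = src[i] / 32768f;
        }
        return out;
    }
}
```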

Typical usage in Kotlin

val tensor = TensorAudio.create(format, modelInputLength)
tensor.load(newData)
interpreter.run(tensor.getTensorBuffer(), outputBuffer)

Another sample usage with AudioRecord

val tensor = TensorAudio.create(format, modelInputLength)
Timer().scheduleAtFixedRate(delay, period) {
    tensor.load(audioRecord)
    interpreter.run(tensor.getTensorBuffer(), outputBuffer)
}

Nested Classes

| class | TensorAudio.TensorAudioFormat | Wraps a few constants describing the format of the incoming audio samples, namely number of channels and the sample rate. |

Public Methods

| static TensorAudio | create(AudioFormat format, int sampleCounts) | Creates a TensorAudio instance with a ring buffer whose size is sampleCounts * format.getChannelCount(). |
| static TensorAudio | create(TensorAudio.TensorAudioFormat format, int sampleCounts) | Creates a TensorAudio instance with a ring buffer whose size is sampleCounts * format.getChannels(). |
| TensorAudio.TensorAudioFormat | getFormat() | |
| TensorBuffer | getTensorBuffer() | Returns a float TensorBuffer holding all the available audio samples in AudioFormat.ENCODING_PCM_FLOAT, i.e. values are in the range of [-1, 1]. |
| void | load(short[] src) | Converts the input audio samples src to ENCODING_PCM_FLOAT, then stores them in the ring buffer. |
| void | load(float[] src, int offsetInFloat, int sizeInFloat) | Stores the input audio samples src in the ring buffer. |
| void | load(short[] src, int offsetInShort, int sizeInShort) | Converts the input audio samples src to ENCODING_PCM_FLOAT, then stores them in the ring buffer. |
| int | load(AudioRecord record) | Loads latest data from the AudioRecord in a non-blocking way. |
| void | load(float[] src) | Stores the input audio samples src in the ring buffer. |

Inherited Methods

From class java.lang.Object

| boolean | equals(Object arg0) |
| final Class<?> | getClass() |
| int | hashCode() |
| final void | notify() |
| final void | notifyAll() |
| String | toString() |
| final void | wait(long arg0, int arg1) |
| final void | wait(long arg0) |
| final void | wait() |

Public Methods

public static TensorAudio create (AudioFormat format, int sampleCounts)

Creates a TensorAudio instance with a ring buffer whose size is sampleCounts * format.getChannelCount().

Parameters

| format | the AudioFormat required by the TFLite model. It defines the number of channels and sample rate. |
| sampleCounts | the number of samples to be fed into the model |
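
As a worked example of the sizing rule, the sketch below computes the number of float slots for a given sample count and channel count. The helper name BufferSizing and the figures used in the usage note are illustrative assumptions, not values taken from the library.

```java
// Hypothetical helper illustrating size = sampleCounts * channelCount.
final class BufferSizing {
    private BufferSizing() {}

    static int ringBufferSize(int sampleCounts, int channelCount) {
        if (sampleCounts <= 0 || channelCount <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(sampleCounts, channelCount);
    }
}
```

For instance, a model that consumes 15600 samples needs 15600 floats for mono input and 31200 floats for stereo input under this rule.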

public static TensorAudio create (TensorAudio.TensorAudioFormat format, int sampleCounts)

Creates a TensorAudio instance with a ring buffer whose size is sampleCounts * format.getChannels().

Parameters

| format | the expected TensorAudio.TensorAudioFormat of audio data loaded into this class. |
| sampleCounts | the number of samples to be fed into the model |

public TensorAudio.TensorAudioFormat getFormat ()

public TensorBuffer getTensorBuffer ()

Returns a float TensorBuffer holding all the available audio samples in AudioFormat.ENCODING_PCM_FLOAT, i.e. values are in the range of [-1, 1].

public void load (short[] src)

Converts the input audio samples src to ENCODING_PCM_FLOAT, then stores them in the ring buffer.

Parameters

| src | input audio samples in AudioFormat.ENCODING_PCM_16BIT. For multi-channel input, the array is interleaved. |
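
To illustrate what an interleaved multi-channel array looks like, the sketch below merges two per-channel arrays into one stereo array in which the samples alternate by channel. The class name Interleave and the left-then-right channel order are assumptions made for this example.

```java
// Hypothetical helper; assumes left-then-right channel order.
final class Interleave {
    private Interleave() {}

    /** Produces [L0, R0, L1, R1, ...] from two equally long channel arrays. */
    static short[] stereo(short[] left, short[] right) {
        if (left.length != right.length) {
            throw new IllegalArgumentException("channels must have equal length");
        }
        short[] out = new short[left.length * 2];
        for (int i = 0; i < left.length; i++) {
            out[2 * i] = left[i];
            out[2 * i + 1] = right[i];
        }
        return out;
    }
}
```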

public void load (float[] src, int offsetInFloat, int sizeInFloat)

Stores the input audio samples src in the ring buffer.

Parameters

| src | input audio samples in AudioFormat.ENCODING_PCM_FLOAT. For multi-channel input, the array is interleaved. |
| offsetInFloat | starting position in the src array |
| sizeInFloat | the number of float values to be copied |

Throws

| IllegalArgumentException | for incompatible audio format or incorrect input size |
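
The kind of argument checking that could lead to this exception can be sketched as follows. The specific checks (the range must lie inside src, and the size must be a whole number of interleaved frames) are assumptions for illustration; this page does not spell out the library's actual conditions. The class name LoadRangeCheck is hypothetical.

```java
// Hypothetical validation sketch; the library's real checks may differ.
final class LoadRangeCheck {
    private LoadRangeCheck() {}

    /** Returns the validated sub-range of src that would be copied. */
    static float[] slice(float[] src, int offsetInFloat, int sizeInFloat, int channels) {
        if (channels <= 0) {
            throw new IllegalArgumentException("channels must be positive");
        }
        if (offsetInFloat < 0 || sizeInFloat < 0 || offsetInFloat > src.length - sizeInFloat) {
            throw new IllegalArgumentException("range is outside the source array");
        }
        if (sizeInFloat % channels != 0) {
            // interleaved data should contain whole frames (one value per channel)
            throw new IllegalArgumentException("size is not a multiple of the channel count");
        }
        return java.util.Arrays.copyOfRange(src, offsetInFloat, offsetInFloat + sizeInFloat);
    }
}
```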

public void load (short[] src, int offsetInShort, int sizeInShort)

Converts the input audio samples src to ENCODING_PCM_FLOAT, then stores them in the ring buffer.

Parameters

| src | input audio samples in AudioFormat.ENCODING_PCM_16BIT. For multi-channel input, the array is interleaved. |
| offsetInShort | starting position in the src array |
| sizeInShort | the number of short values to be copied |

Throws

| IllegalArgumentException | if the source array can't be copied |

public int load (AudioRecord record)

Loads the latest data from the AudioRecord in a non-blocking way. Only ENCODING_PCM_16BIT and ENCODING_PCM_FLOAT are supported.

Parameters

| record | an instance of AudioRecord |

Returns

  • number of captured audio values whose size is channelCount * sampleCount. If there was no new data in the AudioRecord or an error occurred, this method will return 0.

Throws

| IllegalArgumentException | for unsupported audio encoding format |
| IllegalStateException | if reading from AudioRecord failed |

public void load (float[] src)

Stores the input audio samples src in the ring buffer.

Parameters

| src | input audio samples in AudioFormat.ENCODING_PCM_FLOAT. For multi-channel input, the array is interleaved. |