speech-commands/training/soft-fft/README.md
Before you can train your model that uses spectrogram from the browser's WebAudio as input features, you need to download the speech-commands data set v0.01 or data set v0.02.
The node.js training package comes with a command line tool that will assist your training. Here are the steps:
yarn
yarn start
Following are command supported by the CLI:
Commands:
help [command...] Provides help for a given command.
exit Exits application.
create_model [labels...] create the audio model
load_dataset all <dir> Load all the data from the root directory by the labels
load_dataset <dir> <label> Load the dataset from the directory with the label
dataset size Show the size of the dataset
train [epoch] train all audio dataset
save_model <filename> save the audio model
local@piyu~$ create up down left right
_________________________________________________________________
Layer (type) Output shape Param #
=================================================================
conv2d_Conv2D1 (Conv2D) [null,95,39,8] 72
_________________________________________________________________
max_pooling2d_MaxPooling2D1 [null,47,19,8] 0
_________________________________________________________________
conv2d_Conv2D2 (Conv2D) [null,44,18,32] 2080
_________________________________________________________________
max_pooling2d_MaxPooling2D2 [null,22,9,32] 0
_________________________________________________________________
conv2d_Conv2D3 (Conv2D) [null,19,8,32] 8224
_________________________________________________________________
max_pooling2d_MaxPooling2D3 [null,9,4,32] 0
_________________________________________________________________
conv2d_Conv2D4 (Conv2D) [null,6,3,32] 8224
_________________________________________________________________
max_pooling2d_MaxPooling2D4 [null,5,1,32] 0
_________________________________________________________________
flatten_Flatten1 (Flatten) [null,160] 0
_________________________________________________________________
dense_Dense1 (Dense) [null,2000] 322000
_________________________________________________________________
dropout_Dropout1 (Dropout) [null,2000] 0
_________________________________________________________________
dense_Dense2 (Dense) [null,4] 8004
=================================================================
Total params: 348604
Trainable params: 348604
Non-trainable params: 0
local@piyu~$ load_dataset all /tmp/audio/data
✔ finished loading label: up (0)
✔ finished loading label: left (2)
✔ finished loading label: down (1)
✔ finished loading label: right (3)
You can also load data per label using 'load' command. For example loading data for the 'up' label.
local@piyu~$ load_dataset /tmp/audio/data/up up
local@piyu~$ dataset size
dataset size = xs: 8534,98,40,1 ys: 8534,4
local@piyu~$ train 5
✔ epoch: 0, loss: 1.35054, accuracy: 0.34792, validation accuracy: 0.42740
✔ epoch: 1, loss: 1.23458, accuracy: 0.45339, validation accuracy: 0.50351
✔ epoch: 2, loss: 1.06478, accuracy: 0.55833, validation accuracy: 0.62529
✔ epoch: 3, loss: 0.88953, accuracy: 0.63073, validation accuracy: 0.68735
✔ epoch: 4, loss: 0.78241, accuracy: 0.67799, validation accuracy: 0.73770
7 Save the trained model.
local@piyu~$ save_model /tmp/audio_model
✔ /tmp/audio_model saved.