examples/recurrent_addition/README.md
This example illustrates how to use an RNN in tiny-dnn to perform sequential two-number addition, i.e.,

`y[t+1] = x[t] + x[t+1]`

For example, given the input sequence x = (0.1, 0.4), the targets are y = (0.1, 0.5).
Make sure cmake is set to build the examples:

```bash
cd build && cmake -DBUILD_EXAMPLES=Yes -DBUILD_TESTS=Yes -DCMAKE_BUILD_TYPE=Release .. && make -j4
```

To execute (assuming you are in `./build`):

```bash
./examples/example_recurrent_addition
```
Basically, we need to define a `recurrent_layer` that manages the state transitions of a given recurrent cell such as `gru`, `lstm`, or `rnn`:
```cpp
template <typename N>
void construct_net(N &nn,
                   const std::string rnn_type,
                   tiny_dnn::core::backend_t backend_type) {
  using fc        = tiny_dnn::fully_connected_layer;
  using tanh      = tiny_dnn::tanh_layer;
  using recurrent = tiny_dnn::recurrent_layer;

  const int hidden_size = 100;  // recurrent state size
  const int seq_len     = 2;    // amount of numbers to add

  if (rnn_type == "rnn") {
    nn << recurrent(tiny_dnn::rnn(1, hidden_size), seq_len);
  } else if (rnn_type == "gru") {
    nn << recurrent(tiny_dnn::gru(1, hidden_size), seq_len);
  } else if (rnn_type == "lstm") {
    nn << recurrent(tiny_dnn::lstm(1, hidden_size), seq_len);
  }
  nn << tanh() << fc(hidden_size, 1);
}
```
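As a usage sketch (the network type and `tiny_dnn::core::default_engine()` are part of tiny-dnn's public API; picking `"gru"` here is just an example, not what the demo necessarily does):

```cpp
// Build a sequential network and let construct_net append the layers.
tiny_dnn::network<tiny_dnn::sequential> nn;
construct_net(nn, "gru", tiny_dnn::core::default_engine());
```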
Some notes on the code above:

- `seq_len` is set to 2, the number of inputs to add.
- `hidden_size` is the size of the recurrent state.
- The `fc` layer projects the output of the `recurrent_layer` to a single "desired" number.

Now we need a dataset. Note that `recurrent_layer` requires data to be in the form (n sequences, sequence length, batch_size). This means that for 1+2, 3+4, 5+6, 7+8, with a batch_size of 2, the data would be ordered as 1, 3, 2, 4, 5, 7, 6, 8.
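To make the layout concrete, here is that ordering written out as tiny-dnn input vectors (illustration only; `tiny_dnn::vec_t` is the library's per-sample type):

```cpp
// Interleaved layout for the sequences (1,2), (3,4), (5,6), (7,8) with
// batch_size = 2: two sequences are batched per chunk, and each chunk
// is stored time-major.
std::vector<tiny_dnn::vec_t> input = {
  {1}, {3},  // chunk 0, t = 0: first element of sequences (1,2) and (3,4)
  {2}, {4},  // chunk 0, t = 1: second element of both sequences
  {5}, {7},  // chunk 1, t = 0
  {6}, {8},  // chunk 1, t = 1
};
```

This interleaving is done in `gen_dataset`: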
```cpp
// Fill input with n_samples random numbers in [-1, 1)
for (int i = 0; i < n_samples; i++) {
  input.push_back({tiny_dnn::uniform_rand<float_t>(-1, 1)});
}
// Fill output in order n_seqs * seq_len * batch_size
output.resize(input.size());
int n_seqs = n_samples / (seq_len * batch_size);
for (int n = 0; n < n_seqs; n++) {
  int seq_start = n * batch_size * seq_len;
  for (int s = 0; s < seq_len; s++) {
    int item0_start = seq_start + (s - 1) * batch_size;  // x[t]
    int item1_start = seq_start + s * batch_size;        // x[t+1]
    for (int b = 0; b < batch_size; b++) {
      // y[0] = x[0]
      // y[t+1] = x[t] + x[t+1]
      output[item1_start + b].push_back(
        (s == 0 ? input[item1_start + b][0]
                : input[item1_start + b][0] + input[item0_start + b][0]));
    }
  }
}
```
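Worked example: with the interleaved input 1, 3, 2, 4, 5, 7, 6, 8 from above, this loop produces the targets 1, 3, 3, 7, 5, 7, 11, 15; that is, y = (1, 3) for the sequence (1, 2), y = (3, 7) for (3, 4), and so on, stored in the same interleaved order as the input.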
Note that since we are adding two numbers, there is no x[-1] with which to compute y[0] = x[-1] + x[0]. Thus we start each output sequence with y[0] = x[0].
Next, we define the training loop. As can be seen inside the `train` function, `batch_size` is no longer used directly; instead:

```cpp
int total_length = n_minibatch * seq_len;
```

This is because recurrent layers need seq_len * batch_size samples at each step, and these must also be processed independently by the rest of the (non-recurrent) layers.
Finally, we use MSE regression to train the model:

```cpp
// training
nn.fit<tiny_dnn::mse>(optimizer, dataset[0], dataset[1], total_length,
                      n_train_epochs, on_enumerate_minibatch,
                      on_enumerate_epoch);
```
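The two callbacks passed to `fit` are not shown above. A minimal sketch of what they could look like (`tiny_dnn::timer` is the library's timing helper; the bodies here are illustrative, not the example's actual ones):

```cpp
tiny_dnn::timer t;
int epoch = 1;
// Called once per epoch: report progress and restart the timer.
auto on_enumerate_epoch = [&]() {
  std::cout << "Epoch " << epoch++ << " finished, "
            << t.elapsed() << "s elapsed." << std::endl;
  t.restart();
};
// Called once per minibatch: a no-op in this sketch.
auto on_enumerate_minibatch = [&]() {};
```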
After training for some epochs, the code enters "demo mode", where you can continuously input two numbers and the recurrent model will perform an approximate addition:
```
Input numbers between -1 and 1.
Input number 1: 0.1
Input number 2: 0.4
Sum: 0.514308
Input number 1: 0.6
Input number 2: -0.9
Sum: -0.299533
Input number 1: 1.0
Input number 2: 1.0
Sum: 1.91505   # performance is worse at the extremes
Input number 1: 0
Input number 2: 0
Sum: 0.00183523
```
In the demo section, it is important to note that

```cpp
layer->seq_len(1);
layer->bptt_max(2);
```

allows the RNN to be fed one input at a time while still remembering up to `bptt_max` previous steps.
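A rough sketch of what the demo loop does under this one-step configuration (assuming `nn` is the trained network; `predict` is tiny-dnn's standard inference call, and the real example's I/O handling, as well as any resetting of the recurrent state between pairs, may differ):

```cpp
tiny_dnn::float_t a, b;
while (std::cin >> a >> b) {
  // Feed x[0] one step at a time; the hidden state is carried over.
  nn.predict(tiny_dnn::vec_t{a});
  // Feed x[1]; the output approximates a + b.
  tiny_dnn::vec_t out = nn.predict(tiny_dnn::vec_t{b});
  std::cout << "Sum: " << out[0] << std::endl;
}
```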