docs/developer_guides/Writing-Custom-Activations.md
This section describes how to extend tiny-dnn with custom activation
functions. Activations are implemented as separate layers in tiny-dnn, and
all of the built-in activation classes inherit from the `activation_layer`
class. To add a new activation function, create a class that inherits from
`activation_layer`.
Let's define a custom activation layer. `activation_layer` already provides
five constructors, which let you set the layer's dimensions through
constructor arguments.
```cpp
class my_activation_layer : public activation_layer {
 public:
  using activation_layer::activation_layer;
};
```
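With this minimal definition, the layer can be constructed in any of the forms `activation_layer` accepts. A sketch, assuming the overload set declared in `tiny_dnn/activations/activation_layer.h` (sizes here are arbitrary):

```cpp
my_activation_layer a0;                    // shape deduced from the previous layer
my_activation_layer a1(64);                // flat input of 64 values
my_activation_layer a2(8, 8, 3);           // width x height x channels
my_activation_layer a3(shape3d(8, 8, 3));  // the same, as an explicit shape3d

fully_connected_layer fc(256, 64);
my_activation_layer a4(fc);                // copy the output shape of a previous layer
```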
If your activation layer has its own state, say a scalar `alpha`, then you
need to write your own constructors. `activation_layer` already has a member
named `in_shape_` of type `shape3d` that stores the dimensions of the layer.
```cpp
class my_activation_layer : public activation_layer {
 public:
  // shape3d(0, 0, 0) lets the shape be filled in later from the previous layer
  my_activation_layer(const shape3d &in_shape = shape3d(0, 0, 0),
                      const float_t alpha = 1.0)
    : activation_layer(in_shape), alpha_(alpha) {}

  my_activation_layer(const size_t in_dim, const float_t alpha = 1.0)
    : activation_layer(in_dim), alpha_(alpha) {}

  // todo ...

  float_t get_alpha() const { return alpha_; }

 private:
  float_t alpha_;
};
```
The `activation_layer` class has four virtual methods that every child
class must override. Together they define the behaviour of our custom
activation.
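For reference, here are the four signatures as they are used in the snippets below:

```cpp
std::string layer_type() const override;
void forward_activation(const vec_t &x, vec_t &y) override;
void backward_activation(const vec_t &x,
                         const vec_t &y,
                         vec_t &dx,
                         const vec_t &dy) override;
std::pair<float_t, float_t> scale() const override;
```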
`layer_type`: This method returns a string naming the current activation function.
```cpp
std::string layer_type() const override {
  return "my-custom-activation";
}
```
`forward_activation`: This method contains the main logic of our activation function. It takes two vectors by reference and fills the second one by applying the activation function to the first.
For example, let our activation function be simply multiplication by a scalar. The implementation would look like:
```cpp
void forward_activation(const vec_t &x, vec_t &y) override {
  for (size_t j = 0; j < x.size(); j++) {
    y[j] = alpha_ * x[j];
  }
}
```
We could just as easily have accepted `float_t` arguments and applied the
activation to a single element, but that function would then be called once
for each neuron of the layer, and calling a virtual function inside a tight
loop hurts performance. Hence the method operates on whole vectors.
In practice, each `vec_t` here represents a single flattened tensor out of
the current minibatch.
`backward_activation`: This method implements the backward gradient flow of our activation. It receives the gradients of the outputs, along with the corresponding output and input vectors, and fills the gradients of the inputs in place.
For example, the `backward_activation` method for our activation function
would look like:
```cpp
void backward_activation(const vec_t &x,
                         const vec_t &y,
                         vec_t &dx,
                         const vec_t &dy) override {
  for (size_t j = 0; j < x.size(); j++) {
    // dx = dy * (gradient of my activation); here dy/dx = alpha_
    dx[j] = dy[j] * alpha_;
  }
}
```
`scale`: This method returns a pair of `float_t` values denoting the range of target values for learning.
```cpp
std::pair<float_t, float_t> scale() const override {
  return std::make_pair(float_t(0.1), float_t(0.9));
}
```
That's it! Your new activation is now ready as a layer of the network. You can use it as easily as:
```cpp
network<sequential> net;
net << fully_connected_layer(256, 64) << my_activation_layer(64);
// specifying the input dimensions is optional when the activation layer is
// not the first layer of the network
```
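For a complete picture, here is a minimal end-to-end sketch. Layer sizes and input values are arbitrary, and the default-constructed activation is assumed to pick up its shape from the previous layer, as the built-in activations do:

```cpp
#include "tiny_dnn/tiny_dnn.h"
using namespace tiny_dnn;

int main() {
  network<sequential> net;
  net << fully_connected_layer(256, 64) << my_activation_layer()
      << fully_connected_layer(64, 10) << my_activation_layer();

  vec_t in(256, 0.5);           // a dummy input vector
  vec_t out = net.predict(in);  // forward pass through both activations
  return 0;
}
```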
Note: the information below is optional; if you only want to do some rough, temporary prototyping, you can skip the rest of this section.
If you wish to serialize your activation layer, you must add these lines to your class definition:

```cpp
#ifndef CNN_NO_SERIALIZATION
  friend struct serialization_buddy;
#endif
```
Register your layer in `tiny_dnn/util/serialization_layer_list.h`, just as the other layers are listed there.
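Following the pattern of the entries already in that file, the registration would look roughly like this (copy the exact macro name from the existing entries, as it may differ in your version of tiny-dnn):

```cpp
// in tiny_dnn/util/serialization_layer_list.h, next to the existing entries
CNN_REGISTER_LAYER(my_activation_layer, my_activation);
```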
Next, specialize the following two structs; both of these go in the `cereal` namespace.
```cpp
template <>
struct LoadAndConstruct<tiny_dnn::my_activation_layer> {
  template <class Archive>
  static void load_and_construct(
    Archive &ar, cereal::construct<tiny_dnn::my_activation_layer> &construct) {
    tiny_dnn::shape3d in_shape;
    tiny_dnn::float_t alpha;
    ar(cereal::make_nvp("in_size", in_shape));
    ar(cereal::make_nvp("alpha", alpha));
    construct(in_shape, alpha);
  }
};
```
```cpp
template <class Archive>
struct specialize<Archive,
                  tiny_dnn::my_activation_layer,
                  cereal::specialization::non_member_serialize> {};
```
Next, add a matching `serialize` method inside the `serialization_buddy` struct:

```cpp
template <class Archive>
static inline void serialize(Archive &ar, tiny_dnn::my_activation_layer &layer) {
  layer.serialize_prolog(ar);
  // serialization_buddy is a friend of the layer, so it can reach alpha_ directly
  ar(cereal::make_nvp("in_size", layer.in_shape()[0]));
  ar(cereal::make_nvp("alpha", layer.alpha_));
}
```
and add a non-member `serialize` function to `serialization_functions.h` as well:

```cpp
template <class Archive>
void serialize(Archive &ar, tiny_dnn::my_activation_layer &layer) {
  serialization_buddy::serialize(ar, layer);
}
```
Now your layer will be represented in the JSON structure of the network whenever it is serialized by tiny-dnn's serialization helpers.
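For example, assuming the usual `network::save` and `network::load` helpers with their `content_type` and `file_format` arguments, the network built earlier can be round-tripped through JSON:

```cpp
// write the model and weights out as JSON ...
net.save("my_net.json", content_type::weights_and_model, file_format::json);

// ... and read them back into a fresh network
network<sequential> net2;
net2.load("my_net.json", content_type::weights_and_model, file_format::json);
```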