docs/python_docs/python/tutorials/getting-started/crash-course/4-components.md
Data and models are not the only components that you need to train a deep learning model. In this notebook, you will learn about the common components involved in training deep learning models. Here is a list of components necessary for training models in MXNet.
from mxnet import np, npx,gluon
import mxnet as mx
from mxnet.gluon import nn
npx.set_np()
device = mx.cpu()
In a previous notebook, you used net.initialize() to initialize the network
before a forward pass. Now, you will learn about initialization in a little more
detail.
First, define and initialize the sequential network from earlier.
After you initialize it, print the parameters using collect_params() method.
net = nn.Sequential()
net.add(nn.Dense(5, in_units=3, activation="relu"),
nn.Dense(25, activation="relu"),
nn.Dense(2)
)
net
net.initialize()
params = net.collect_params()
for key, value in params.items():
print(key, value)
Next, you will print shape and params after the first forward pass.
x = np.random.uniform(-1, 1, (10, 3))
net(x) # Forward computation
params = net.collect_params()
for key, value in params.items():
print(key, value)
MXNet makes it easy to initialize by providing many common initializers. A subset that you will be using in the following sections include:
For more information, see Initializers
When you use net.intialize(), MXNet, by default, initializes the weight matrices uniformly
by drawing random values with a uniform-distribution between −0.07 and 0.07 and
updates the bias parameters by setting them all to 0.
To initialize your network using different built-in types, you have to use the
init keyword argument in the initialize() method. Here is an example using
constant and normal initialization.
from mxnet import init
# Constant init initializes the weights to be a constant value for all the params
net.initialize(init=init.Constant(3), device=device)
print(net[0].weight.data()[0])
If you use Normal to initialize your weights then you will use a normal
distribution with a mean of zero and standard deviation of sigma. If you have
already initialized the weight but want to reinitialize the weight, set the
force_reinit flag to True.
net.initialize(init=init.Normal(sigma=0.2), force_reinit=True, device=device)
print(net[0].weight.data()[0])
Till now you have seen how to create an algorithm and how to initialize it using mxnet APIs; additionally you have learned the basics of using mxnet. When you start training the ML algorithm, how do you actually teach the algorithm to learn or train?
There are three main components for training an algorithm.
You have already learned about autograd in the previous notebook. In this notebook, you will learn more about loss functions and optimizers.
Loss functions are used to train neural networks and help the algorithm learn from the data. The loss function computes the difference between the output from the neural network and ground truth. This output is used to update the neural network weights during training. Next, you will look at a simple example.
Suppose you have a neural network net and the data is stored in a variable
data. The data consists of 5 total records (rows) and two features (columns)
and the output from the neural network after the first epoch is given by the
variable nn_output.
net = gluon.nn.Dense(1)
net.initialize()
nn_input = np.array([[1.2, 0.56],
[3.0, 0.72],
[0.89, 0.9],
[0.89, 2.3],
[0.99, 0.52]])
nn_output = net(nn_input)
nn_output
The ground truth value of the data is stored in groundtruth_label is
groundtruth_label = np.array([[0.0083],
[0.00382],
[0.02061],
[0.00495],
[0.00639]]).reshape(5, 1)
For this problem, you will use the L2 Loss. L2Loss, also called Mean Squared Error, is a regression loss function that computes the squared distances between the target values and the output of the neural network. It is defined as:
$$L = \frac{1}{2N}\sum_i{|label_i − pred_i|)^2}$$
The L2 loss function creates larger gradients for loss values which are farther apart due to the square operator and it also smooths the loss function space.
def L2Loss(output_values, true_values):
return np.mean((output_values - true_values) ** 2, axis=1) / 2
L2Loss(nn_output, groundtruth_label)
Now, you can do the same thing using the mxnet API
from mxnet.gluon import nn, loss as gloss
loss = gloss.L2Loss()
loss(nn_output, groundtruth_label)
A network can improve by iteratively updating its weights to minimise the loss. Some tasks use a combination of multiple loss functions, but often you will just use one. MXNet Gluon provides a number of the most commonly used loss functions. The choice of your loss function will depend on your network and task. Some common tasks and loss function pairs include:
regression: L1Loss, L2Loss
classification: SigmoidBinaryCrossEntropyLoss, SoftmaxCrossEntropyLoss
embeddings: HingeLoss
You can also create custom loss functions using Loss Blocks.
You can inherit the base Loss class and write your own forward method. The
backward propagation will be automatically computed by autograd. However, that
only holds true if you can build your loss from existing mxnet operators.
from mxnet.gluon.loss import Loss
class custom_L1_loss(Loss):
def __init__(self, weight=None, batch_axis=0, **kwargs):
super(custom_L1_loss, self).__init__(weight, batch_axis, **kwargs)
def forward(self, pred, label):
l = np.abs(label - pred)
l = l.reshape(len(l),)
return l
L1 = custom_L1_loss()
L1(nn_output, groundtruth_label)
l1=gloss.L1Loss()
l1(nn_output, groundtruth_label)
The loss function determines how much to change the parameters based on how far the
model is from the groundtruth. Optimizer determines how the model
weights or parameters are updated based on the loss function. In Gluon, this
optimization step is performed by the gluon.Trainer.
Here is a basic example of how to call the gluon.Trainer method.
from mxnet import optimizer
trainer = gluon.Trainer(net.collect_params(),
optimizer="Adam",
optimizer_params={
"learning_rate":0.1,
"wd":0.001
})
When creating a Gluon Trainer, you must provide the trainer object with
trainer.step is called. For more information, see
optimizerscurr_weight = net.weight.data()
print(curr_weight)
batch_size = len(nn_input)
trainer.step(batch_size, ignore_stale_grad=True)
print(net.weight.data())
print(curr_weight - net.weight.grad() * 1 / 5)
MXNet includes a metrics API that you can use to evaluate how your model is
performing. This is typically used during training to monitor performance on the
validation set. MXNet includes many commonly used metrics, a few are listed below:
Now, you will define two arrays for a dummy binary classification example.
# Vector of likelihoods for all the classes
pred = np.array([[0.1, 0.9], [0.05, 0.95], [0.83, 0.17], [0.63, 0.37]])
labels = np.array([1, 1, 0, 1])
Before you can calculate the accuracy of your model, the metric (accuracy) should be instantiated before the training loop
from mxnet.gluon.metric import Accuracy
acc = Accuracy()
To run and calculate the updated accuracy for each batch or epoch, you can call
the update() method. This method uses labels and predictions which can be
either class indexes or a vector of likelihoods for all of the classes.
acc.update(labels=labels, preds=pred)
In addition to built-in metrics, if you want to create a custom metric, you can
use the following skeleton code. This code inherits from the EvalMetric base
class.
def MyCustomMetric(EvalMetric):
def __init__(self):
super().init()
def update(self, labels, preds):
pass
Here is an example using the Precision metric. First, define the two values
labels and preds.
labels = np.array([0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1])
preds = np.array([0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0])
Next, define the custom metric class precision and instantiate it
from mxnet.gluon.metric import EvalMetric
class precision(EvalMetric):
def __init__(self):
super().__init__(name="Precision")
def update(self,labels, preds):
tp_labels = (labels == 1)
true_positives = sum(preds[tp_labels] == 1)
fp_labels = (labels == 0)
false_positives = sum(preds[fp_labels] == 1)
return true_positives / (true_positives + false_positives)
p = precision()
And finally, call the update method to return the results of precision for your data
p.update(np.array(labels), np.array(preds))
Now that you have learned all the components required to train a neural network, you will see how to load your data using the Gluon API in Step 5: Gluon Datasets and DataLoader