Submit an Existing Notebook to AzureML

examples/run_notebook_on_azureml.ipynb

<i>Copyright (c) Recommenders contributors.</i>

<i>Licensed under the MIT License.</i>

Introduction to Azure Machine Learning

The Azure Machine Learning service (AzureML) provides a cloud-based environment you can use to prep data, train, test, deploy, manage, and track machine learning models. By using Azure Machine Learning service, you can start training on your local machine and then scale out to the cloud. With many available compute targets, like Azure Machine Learning Compute and Azure Databricks, and with advanced hyperparameter tuning services, you can build better models faster by using the power of the cloud.

Data scientists and AI developers use the main Azure Machine Learning Python SDK to build and run machine learning workflows with the Azure Machine Learning service. You can interact with the service in any Python environment, including Jupyter Notebooks or your favorite Python IDE. The Azure Machine Learning SDK allows you the choice of using local or cloud compute resources, while managing and maintaining the complete data science workflow from the cloud.

This notebook provides a scaffold for submitting an existing notebook directly to AzureML compute targets. You don't have to rewrite it as a training script; just replace the file name and submit the notebook as is.

Advantages of using AzureML:

  • Manage cloud resources for monitoring, logging, and organizing your machine learning experiments.
  • Train models either locally or by using cloud resources, including GPU-accelerated model training.
  • Scale out easily as the dataset grows, just by creating and pointing to a new compute target.

Prerequisites

  • Azure Subscription

Setup environment

Install azureml.contrib.notebook package

We need NotebookRunConfig from azureml.contrib.notebook to run this notebook. Since azureml.contrib.notebook contains experimental components, it is not included in the generated .yaml file by default.

python
#!pip install "azureml.contrib.notebook>=1.0.21.1"
python
import os
import azureml.core
from azureml.core import Workspace
from os import path, makedirs
import shutil
from azureml.core import Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.widgets import RunDetails
from azureml.core.runconfig import RunConfiguration
from azureml.contrib.notebook import NotebookRunConfig, PapermillExecutionHandler
from azureml.core.runconfig import DEFAULT_CPU_IMAGE
from azureml.core.conda_dependencies import CondaDependencies
from tempfile import TemporaryDirectory

print("azureml.core version: {}".format(azureml.core.VERSION))

Connect to an AzureML workspace

An AzureML Workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, deployment, inferencing, and the monitoring of deployed models.

The function below will get or create an AzureML Workspace and save the configuration to aml_config/config.json.

By default, it uses the provided input parameters or environment variables for the Workspace configuration values. Otherwise, it uses an existing configuration file (either at ./aml_config/config.json or at a path specified by the config_path parameter).

Lastly, if the workspace does not exist, one will be created for you. See this tutorial to locate information such as subscription id.

python
ws = Workspace.create(
    name="<WORKSPACE_NAME>",
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group="<RESOURCE_GROUP>",
    location="<WORKSPACE_REGION>",
    exist_ok=True,
)
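
The lookup order described above (explicit arguments, then environment variables, then a configuration file) can be sketched in plain Python as follows. This is a minimal illustration, not the SDK's implementation: the helper `resolve_workspace_settings` and the environment variable names are hypothetical, and the config file is assumed to hold `subscription_id`, `resource_group`, and `workspace_name` keys.

```python
import json
import os
from pathlib import Path

# Hypothetical environment variable names, chosen for this sketch only.
ENV_VARS = {
    "subscription_id": "AZUREML_SUBSCRIPTION_ID",
    "resource_group": "AZUREML_RESOURCE_GROUP",
    "workspace_name": "AZUREML_WORKSPACE_NAME",
}


def resolve_workspace_settings(config_path="./aml_config/config.json", **explicit):
    """Resolve each setting from arguments, then environment variables, then the config file."""
    file_values = {}
    config_file = Path(config_path)
    if config_file.is_file():
        file_values = json.loads(config_file.read_text())

    settings = {}
    for key, env_name in ENV_VARS.items():
        # The first non-empty source wins.
        value = explicit.get(key) or os.environ.get(env_name) or file_values.get(key)
        if value is None:
            raise ValueError(f"No value found for '{key}'")
        settings[key] = value
    return settings


settings = resolve_workspace_settings(
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)
print(settings)
```

Raising an error when no source provides a value keeps a misconfigured workspace from being discovered only at submission time.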

AzureML offers several kinds of compute target: run-based AzureML Compute, which requires minimal setup; persistent compute targets (CPU or GPU clusters); and remote virtual machines. For more details, see How to set up training targets.

According to the Azure Pricing calculator, running this notebook on the example VM size STANDARD_D2_V2 costs a few dollars, which is well covered by the credit included with a new Azure subscription. For billing and pricing questions, please contact Azure support.

Note:

  • See list of all available VM sizes here.
  • As with other Azure services, there are limits on certain resources (e.g. AmlCompute quota) associated with the Azure Machine Learning service. Please read these instructions on the default limits and how to request more quota.

Create or Attach Azure Machine Learning Compute

We create a CPU cluster as our remote compute target. If a cluster with the same name already exists in your workspace, the script loads it instead. See Set up compute targets for model training to learn more about setting up compute targets in different locations. You can also create GPU machines when larger machines are needed to train the model.

Note:

  • The 10m and 20m datasets require more capacity than STANDARD_D2_V2, such as STANDARD_NC6 or STANDARD_NC12. See list of all available VM sizes here.

Learn more about Azure Machine Learning Compute

<details> <summary>Click to learn more about compute types</summary>

Azure Machine Learning Compute is managed compute infrastructure that allows the user to easily create single to multi-node compute of the appropriate VM Family. It is created within your workspace region and is a resource that can be used by other users in your workspace. It autoscales by default to the max_nodes, when a job is submitted, and executes in a containerized environment packaging the dependencies as specified by the user.

Since it is managed compute, job scheduling and cluster management are handled internally by Azure Machine Learning service.

You can provision a persistent AmlCompute resource by defining just two parameters, thanks to smart defaults. By default, it autoscales from 0 nodes and provisions dedicated VMs to run your job in a container. This is useful when you want to continuously re-use the same target, debug it between jobs, or simply share the resource with other users of your workspace.

In addition to vm_size and max_nodes, you can specify:

  • min_nodes: Minimum nodes (default 0 nodes) to downscale to while running a job on AmlCompute
  • vm_priority: Choose between 'dedicated' (default) and 'lowpriority' VMs when provisioning AmlCompute. Low Priority VMs use Azure's excess capacity and are thus cheaper but risk your run being pre-empted
  • idle_seconds_before_scaledown: Idle time (default 120 seconds) to wait after run completion before auto-scaling to min_nodes
  • vnet_resourcegroup_name: Resource group of the existing VNet within which AmlCompute should be provisioned
  • vnet_name: Name of VNet
  • subnet_name: Name of SubNet within the VNet
</details>

---
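
As a rough sketch of how the optional settings listed above fit together, the snippet below collects them as keyword arguments for `AmlCompute.provisioning_configuration`. The values are illustrative assumptions, not recommendations, and the VNet options are left out because they depend on an existing network. The SDK call is only attempted when `azureml.core` is installed.

```python
# Illustrative values only; adjust them to your own workload and budget.
provisioning_kwargs = {
    "vm_size": "STANDARD_D2_V2",
    "min_nodes": 0,                        # scale down to zero nodes when idle
    "max_nodes": 2,                        # upper bound for autoscaling
    "vm_priority": "lowpriority",          # cheaper, but runs may be pre-empted
    "idle_seconds_before_scaledown": 300,  # wait 5 minutes before scaling down
}

try:
    from azureml.core.compute import AmlCompute
except ImportError:
    AmlCompute = None  # SDK not installed; the settings above can still be inspected

if AmlCompute is not None:
    # Builds a local configuration object; nothing is provisioned at this point.
    compute_config = AmlCompute.provisioning_configuration(**provisioning_kwargs)
```

Keeping the settings in a plain dictionary makes it easy to review or log them before any cluster is created.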
python
# Remote compute (cluster) configuration. To reduce cost, use a smaller VM size and fewer nodes.
VM_SIZE = 'STANDARD_D2_V2'
# Cluster nodes
MIN_NODES = 0
MAX_NODES = 2

CLUSTER_NAME = 'cpucluster'

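from azureml.core.compute_target import ComputeTargetException

try:
    # Reuse the cluster if one with this name already exists in the workspace
    compute_target = ComputeTarget(workspace=ws, name=CLUSTER_NAME)
    print("Found existing compute target")
except ComputeTargetException: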
    print("Creating a new compute target...")
    # Specify the configuration for the new cluster
    compute_config = AmlCompute.provisioning_configuration(
        vm_size=VM_SIZE,
        min_nodes=MIN_NODES,
        max_nodes=MAX_NODES
    )
    # Create the cluster with the specified name and configuration
    compute_target = ComputeTarget.create(ws, CLUSTER_NAME, compute_config)
    # Wait for the cluster to complete, show the output log
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

This example uses a persistent compute target, which requires creating the compute resource the first time it is used. Alternatively, you can use a run-based compute target or remote virtual machines.

python
NOTEBOOK_NAME = 'sar_movielens.ipynb' # use different notebook file name if trying to run other notebooks
experiment_name = path.splitext(NOTEBOOK_NAME)[0]  # drop the ".ipynb" extension (str.strip removes characters, not a suffix)
tmp_dir = TemporaryDirectory()
notebook_path = path.join(tmp_dir.name, NOTEBOOK_NAME)
source_path = path.join('00_quick_start/', NOTEBOOK_NAME)
shutil.copy(source_path, notebook_path)
        
exp = Experiment(workspace=ws, name=experiment_name)

run_config = RunConfiguration()
run_config.target = CLUSTER_NAME  # alternative compute targets can be used here
run_config.environment.docker.enabled = True
run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE
run_config.environment.python.user_managed_dependencies = False
run_config.auto_prepare_environment = True
run_config.environment.python.conda_dependencies = CondaDependencies.create(python_version='3.6.2')
run_config.environment.python.conda_dependencies.add_pip_package('scikit-learn')

cfg = NotebookRunConfig(
    source_directory='../',
    notebook='notebooks/00_quick_start/' + NOTEBOOK_NAME,
    output_notebook='outputs/out.ipynb',
    parameters={"MOVIELENS_DATA_SIZE": "100k", "TOP_K": 10},
    run_config=run_config,
)
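
The `parameters` dictionary only takes effect if the target notebook is set up for papermill, which injects values after the code cell tagged `parameters`. The following sketch checks for such a cell using only the notebook's JSON structure. The helper name `find_parameters_cell` and the in-memory notebook are made up for illustration.

```python
def find_parameters_cell(notebook):
    """Return the index of the first code cell tagged 'parameters', or None."""
    for index, cell in enumerate(notebook.get("cells", [])):
        tags = cell.get("metadata", {}).get("tags", [])
        if cell.get("cell_type") == "code" and "parameters" in tags:
            return index
    return None


# Tiny in-memory notebook used only for illustration.
example_notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": ["MOVIELENS_DATA_SIZE = '100k'\n", "TOP_K = 10"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

print(find_parameters_cell(example_notebook))  # → 1
```

For a notebook on disk, the same function can be applied to the dictionary returned by `json.load` on the `.ipynb` file.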

exp.submit() uploads the source_directory folder and runs the designated notebook on the compute target.

python
run = exp.submit(cfg)
run

The cell above should output a table summarizing the run, including a "Link to Azure Portal". Clicking that link opens the experiment's run details tab. You can find the executed notebook, out.ipynb, under the Outputs tab.

Jupyter widget

You can use a Jupyter widget to monitor the progress of the run. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes.

python
RunDetails(run).show()
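
Outside Jupyter, where the widget is not available, a simple polling loop can serve the same purpose. The sketch below assumes only that the run object exposes `get_status()` and that "Completed", "Failed", and "Canceled" are the terminal states. The helper `wait_for_run` is hypothetical, and the `sleep` argument exists so the loop can be exercised without real waiting.

```python
import time

# Assumed terminal states; check the SDK documentation for the full list.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_run(run, poll_seconds=15, timeout_seconds=3600, sleep=time.sleep):
    """Poll run.get_status() until a terminal state is reached or the timeout expires."""
    waited = 0
    while True:
        status = run.get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Run still in state '{status}' after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

With the `run` object from the submission cell, `final_status = wait_for_run(run)` blocks until the job finishes. The SDK's own `run.wait_for_completion(show_output=True)` is an alternative when streaming logs is preferred.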

You can view logged metrics with run.get_metrics()

python
# Run this after the run is complete; otherwise, metrics will be empty.
metrics = run.get_metrics()
print(metrics)
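
Since `run.get_metrics()` returns a dictionary, a small formatter can make the output easier to read and flag the empty case mentioned above. This is only a sketch: `format_metrics` is a hypothetical helper, and the metric names and values in the example are invented placeholders, not results from this notebook.

```python
def format_metrics(metrics):
    """Return one 'name: value' line per metric, sorted by name."""
    if not metrics:
        return "(no metrics logged yet; is the run complete?)"
    lines = []
    for name in sorted(metrics):
        value = metrics[name]
        if isinstance(value, float):
            value = f"{value:.4f}"  # fixed precision for readability
        lines.append(f"{name}: {value}")
    return "\n".join(lines)


# Placeholder names and values, for illustration only.
example_metrics = {"metric_b": 10, "metric_a": 0.5}
print(format_metrics(example_metrics))
```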

Deprovision compute resource

To avoid unnecessary charges, if you created a compute target that doesn't scale down to 0 nodes, make sure it is deprovisioned after use.

python
# delete() deprovisions and deletes the AmlCompute target.
# Do not run the line below before the experiment completes.

# compute_target.delete()

# Deletion takes a few minutes. You can check progress in Azure Portal / Computing tab.
python
# clean up temporary directory
tmp_dir.cleanup()
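
If the temporary directory is only needed within one block of code, a context manager removes it automatically, even when an error occurs before the cleanup call. The sketch below uses stand-in file names rather than the notebook files from this example.

```python
import shutil
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmp_name:
    # Stand-in for the notebook copy; the file names are illustrative.
    source = Path(tmp_name) / "source.ipynb"
    source.write_text("{}")
    copied = Path(shutil.copy(source, Path(tmp_name) / "copy.ipynb"))
    existed_inside = copied.exists()

# The directory and its contents are removed when the block exits.
removed_after = not Path(tmp_name).exists()
print(existed_inside, removed_after)
```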