Back to Mujoco

Least Squares

python/least_squares.ipynb

3.8.070.9 KB
Original Source

<h1><center>Least Squares

</center></h1>

This notebook describes a utility function included in the MuJoCo Python library performing box-bounded nonlinear least squares optimization. We provide some theoretical background, describe our implementation and show example usage. <a href="https://colab.research.google.com/github/google-deepmind/mujoco/blob/main/python/least_squares.ipynb"></a>

<!-- Copyright 2022 DeepMind Technologies Limited Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

All imports

python
# Install mujoco.
!pip install mujoco
from google.colab import files
import distutils.util
import os
import subprocess
if subprocess.run('nvidia-smi').returncode:
  raise RuntimeError(
      'Cannot communicate with GPU. '
      'Make sure you are using a GPU Colab runtime. '
      'Go to the Runtime menu and select Choose runtime type.')

# Add an ICD config so that glvnd can pick up the Nvidia EGL driver.
# This is usually installed as part of an Nvidia driver package, but the Colab
# kernel doesn't install its driver via APT, and as a result the ICD is missing.
# (https://github.com/NVIDIA/libglvnd/blob/master/src/EGL/icd_enumeration.md)
NVIDIA_ICD_CONFIG_PATH = '/usr/share/glvnd/egl_vendor.d/10_nvidia.json'
if not os.path.exists(NVIDIA_ICD_CONFIG_PATH):
  with open(NVIDIA_ICD_CONFIG_PATH, 'w') as f:
    f.write("""{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path" : "libEGL_nvidia.so.0"
    }
}
""")
#
# Configure MuJoCo to use the EGL rendering backend (requires GPU)
print('Setting environment variable to use GPU rendering:')
%env MUJOCO_GL=egl

try:
  print('Checking that the installation succeeded:')
  import mujoco
  from mujoco import minimize
  from mujoco import rollout
  mujoco.MjModel.from_xml_string('<mujoco/>')
except Exception as e:
  raise e from RuntimeError(
      'Something went wrong during installation. Check the shell output above '
      'for more information.\n'
      'If using a hosted Colab runtime, make sure you enable GPU acceleration '
      'by going to the Runtime menu and selecting "Choose runtime type".')
print('MuJoCo installation successful.')

print('Installing mediapy:')
!command -v ffmpeg >/dev/null || (apt update && apt install -y ffmpeg)
!pip install -q mediapy
#
from IPython.display import clear_output
clear_output()

# Other imports.
import mediapy as media
import time
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from typing import Tuple, Optional, Union

Background

We begin with a quick primer. If you know this stuff, feel free to skip this section.

Newton's method

Nonlinear optimization (unconstrained, for now) means finding the value of a vector decision variable $x \in \mathbb R^n$, which minimizes the objective $f(x): \mathbb R^n \rightarrow \mathbb R$, here assumed to be a smooth function. This is often written as $$x^* = \arg \min_x f(x).$$

Second-order optimization, a.k.a Newton's method, is an iterative procedure that finds a sequence of candidates $x_k$ which reduce the value of $f(x_k)$, until convergence to the minimum. At every iteration we measure the local 1st and 2nd derivatives of $f(\cdot)$, called the gradient vector $g = \nabla f(x_k)$ and the Hessian matrix $H = \nabla^2 f(x_k)$. With these we have a local quadratic approximation of $f(x_k)$: $$ f(x_k + \delta x) \approx f(x_k) + \delta x^T g + \frac{1}{2} \delta x^T H\delta x $$ By differentiating with respect to $\delta x$, equating to 0 and solving, we hope to jump to the minimium of the quadratic, giving us the next candidate: $$ x_{k+1} = x_k + \delta x = x_k - H^{-1}g $$

Levenberg-Marquardt regularization

Raw Newton's method, in its form above, is not a good idea. There is no guarantee that the candidates $x_{k}$ will converge or indeed that the objective will be reduced at all. This because

  1. The local quadratic approximation can be very bad (valid only for a small neighborhood).
  2. Even if the approximation is good, $H$ could be negative-definite (or indefinite) rather than positive-definite. In this case we would be jumping to the maximum (or saddle-point) rather than the minimum of the quadratic, taking the situation from bad to worse.

Levenberg-Marquardt (LM) regularization solves both these problems by introducing a regularization parameter $\mu \ge 0$: $$ \delta x_{LM} = - (H + \mu I)^{-1}g $$ For $\mu=0$ we get the classic Newton step. For $\mu$ so large that $\mu I \gg H$, we get $\delta x \approx -\tfrac{1}{\mu}g$: a small gradient step, which is guaranteed to reduce the objective for a large enough $\mu$. Run the cell below for a visualization of LM regularization. For the simple 2D quadratic and the initial guess $$ f(x) = \frac{1}{2}x^T \begin{pmatrix} 2 & 0.9\ 0.9 & 1 \end{pmatrix} x, \qquad x_0 = \begin{pmatrix} -0.9\ 0.5 \end{pmatrix}, $$ We plot the trajectory of $x_1(\mu)$, varying the parameter logarithmically: $\mu = 10^{-4} \ldots 10^2$

python
#@title LM curve visualization

# Problem setup.
H = np.array([[2, 0.9], [0.9, 1]])
x0 = np.array((-0.9, 0.5))

# Logarithmically spaced mu regularizer values.
lm_values = 10**np.linspace(-4, 2, 40)

# Sequence of next candidates x1(mu).
g0 = H.dot(x0)
dx = np.asarray([-np.linalg.inv(H + mu*np.eye(2)).dot(g0) for mu in lm_values])
x1 = x0 + dx

# Grid of function values, contour plot.
res = 400
X,Y = np.meshgrid(np.linspace(-1, 1, res),
                  np.linspace(-1, 1, res))
Z = 0.5 * (H[0,0]*X**2 + H[1,1]*Y**2 + 2*H[1,0]*X*Y)
fig = plt.figure(figsize=(10,10))
contours = np.linspace(0, np.sqrt(Z.max()), 20)**2
plt.contour(X, Y, Z, contours, linewidths=0.5)

# Draw x0, actual minimum, x1(mu).
point_spec = {'markersize':10, 'markeredgewidth':2, 'fillstyle':'none',
              'marker':'o', 'linestyle':'none'}
curve_spec = {'markersize':1.5, 'linewidth':1, 'color':'red'}
plt.plot(x0[0], x0[1], color='grey', **point_spec)
plt.plot(0, 0, color='pink', **point_spec)
plt.plot(x1[:,0], x1[:,1], '-o', **curve_spec)

# Finalize figure.
plt.title('Levenberg–Marquardt curve: $x_1(\mu),\; \mu = 10^{-4} \ldots 10^2$')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(['$x_0$', 'minimum', 'LM trajectory'])
plt.rc('font', family='serif')

In the plot above we can see the "LM curve", starting at the minimum of the quadratic for small $\mu$ and then, as $\mu$ grows, curving towards $x_0$, approaching it along the gradient. Powell's dog-leg trajectory can be considered a piecewise-linear approximation of the LM curve.

Importantly, note how the points clump at the ends of the curve. The range of $\mu$ values where the curve is "valuable", enabling meaningful search, is rather narrow, especially considering that we are varying it by 6 orders of magnitude. This implies that careful control of $\mu$ is critical for efficiency of convergence.

Box constraints

Additional stabillity and robustness can be achieved by limiting the search space. An important type of constraint is the box constraint: $$ \begin{aligned} x^* &= \arg \min_x f(x)\ \textrm{s.t.} &\quad l \preccurlyeq x \preccurlyeq u \end{aligned} $$ Where $l$ and $u$ are respectively lower and upper bound vectors and the inequalities $l \preccurlyeq x \preccurlyeq u$ are read elementwise. Solving for the minimizing $\delta x$ is now a bit more involved than solving a linear system, and the optimization method is now referred to as Sequential Quadratic Programming (SQP). Fortunately MuJoCo has an efficient box-constrained QP solver: the mju_boxQP function.

Least Squares and the Gauss-Newton Hessian

Additional simplification is achieved by a structured objective: $f(x) = \frac{1}{2} r(x)^T\cdot r(x)$, which is called the Least Squares objective. The vector function $r(x): \mathbb R^n \rightarrow \mathbb R^m$, is called the residual, and the optimization problem is referred to as Nonlinear Least Squares. For readers with a statistics background, the Least Squares problem lends itself to reinterpertation as Maximum Likelihood Estimation of a Gaussian posterior $x^* = \arg \max_x p(x)$ with $p(x) \sim e^{-\frac{1}{2}r(x)^Tr(x)}$. To see why the least-squares structure is beneficial, consider the gradient and Hessian of the objective. Letting $J$ be the $n \times m$ Jacobian matrix of the residual $J = \nabla r(x)$ we have $$ \begin{align} g &= J^Tr\ H &= J^TJ + \nabla J \cdot r \end{align} $$ The Gauss-Newton approximation of the Hessian $H$ involves dropping the second term: $H \approx H_{GN}= J^TJ$. This approximation has two obvious benefits:

  1. The approximate Hessian $H_{GN}$, being a matrix square, cannot be negative-definite and is at least semidefinite. For any $\mu \gt 0$, the LM-regularized matrix $H_{GN} + \mu I$ is guaranteed to be Symmetric Positive Definite (SPD).
  2. In order to compute $H_{GN}$ we only require the Jacobian $J$. In other words, we get the goodness of 2nd-order optimization with only 1st-order derivatives.

This completes the background section and allows us to accurately describe the function explored in the rest of the notebook.

Least Norm Generalization

The Least Squares problem described above can be generalized as follows. Instead of the quadratic norm $\frac{1}{2} r(x)^T\cdot r(x)$, we can use some other smooth, convex norm $f(x) = n(r(x))$. Letting $\nabla n = \frac{\partial n}{\partial r}$ and $\nabla^2 n = \frac{\partial^2 n}{\partial r^2}$ be respectively the gradient and Hessian of $n$ with respect to $r$, we have $$ \begin{align} g &= J^T\cdot\nabla n\ H_{GN} &= J^T\cdot \nabla^2 n \cdot J \end{align} $$ It is easy to verify that for the quadratic norm, these expression reduce to the ones above as $\nabla (\frac{1}{2}r^T\cdot r) = r$ and $\nabla^2 (\frac{1}{2}r^T\cdot r) = I_m$.

This completes the background section and allows us to accurately describe the function explored in the rest of the notebook.

minimize.least_squares

The minimize.least_squares function in the mujoco Python library solves the Box-Constrained Nonlinear Least Squares problem. It uses the Gauss-Newton Hessian approximation, and takes Levenberg-Marquardt search steps, while adjusting the LM parameter $\mu$.

Motivation and assumptions

The minimize.least_squares() function is motivated by two problems:

  • The System Identification (sysID) problem is the main use case. SysID involves finding the parameters of a simulated model to better match the behaviour of the real system. In the notation of the previous section, this problem can be cast as a Least Squares problem with the decision variable $x$ corresponding to the model parameters one wishes to identify and the residual $r$ corresponding to the difference between measured and simulated sensor values. A full sysID tutorial with examples will be made available soon.
  • The Inverse Kinematics (IK) problem. In this case the decision variables are joint angles, while the residual is the pose difference between an end-effector and some desired pose. The IK problem was less central to our design choices, yet is solved very efficiently with our code.

This motivation led us to the following assumptions and design choices.

  1. The dimension of the decision variable is small: $\dim(x) \lesssim 100$. While Cholesky factorization (the main cost in the QP solver) scales cubically in $\dim(x)$, it is very fast at these sizes.
  2. The dimension of the residual vector $r$ (corresponding to the number of measured sensor values, in the sysID case) can be much larger.
  3. Since each evaluation of the residual involves rolling out the physics to obtain simulated sensor values, computing $r(x)$ is the most expensive part of the optimization.
  4. Analytic Jacobians $J = \frac{\partial r}{\partial x}$ are usually not available and must be obtained with finite-differencing.
  5. Due to the sematics of $x$, box-bounds are usually sufficient (for example, masses and friction coefficients cannot be negative, joint angles should not exceed their limits).
  6. The implementation should be efficient yet readable. The least_squares function takes up ~250 lines of code.

Let's look at the function's docstring and then discuss some implementation notes.

python
print(minimize.least_squares.__doc__)

Implementation notes

  1. The residual funciton must be vectorized: besides taking a column vector $x$ and returning the residual $r(x)$, it must accept an $n\times k$ matrix $X$, returning an $m\times k$ matrix $R$. The vectorized format is used by the internal finite-difference implementation and can be exploited to speed up the minimization by using multi-threading inside the residual function implementation.
  2. Bounds must be None or fully specified for all dimensions of $x$.
  3. The jacobian callback can be supplied by the user and is finite-differenced otherwise. Note that this callback is not made available in the case the user knows the analytic Jacobian (this is very rare in the sysID context), but in case the user wants to implement their own multi-threaded fin-diff callback.
  4. Automatic forward/backward differencing, chosen to avoid crossing the bounds, with optional central differencing. The fin-diff epsilon eps is scaled by the size of the bounds, if provided.
  5. The termination criterion is based on small step size $||\delta x|| < \textrm{tol}$.
  6. We use the simple yet affective $\mu$-search strategy described in Bazaraa et-al.. Backtracking $\mu$-increases are careful, attempting to find the smallest $\mu$ where sufficient reduction is found. $\mu$-decreases are aggressive, allowing fast quadratic convergence to a local minimum.
  7. The user may optionally provide a norm different than the quadratic norm (the default), this is covered in more detail below.

Toy examples

python
# @title Plotting utility
def plot_2D(residual, name, plot_range, true_minimum, trace=None, bounds=None):
  n = 400
  x_range, y_range = plot_range
  x_grid = np.linspace(x_range[0], x_range[1], n)
  y_grid = np.linspace(y_range[0], y_range[1], n)
  X, Y = np.meshgrid(x_grid, y_grid)

  R = residual(np.stack((X, Y)))
  Z = 0.5 * np.sum(R**2, axis=0)

  fig = plt.figure(figsize=(10, 10))
  cntr_levels = np.linspace(0, np.log1p(Z.max()), 30)
  plt.contour(X, Y, np.log1p(Z), cntr_levels, linewidths=0.5)
  plt.title(name)
  plt.xlabel('x')
  plt.ylabel('y')
  plt.rc('font', family='serif')

  # Draw bounds.
  hbounds = None
  if bounds is not None:
    lower, upper = bounds
    width = upper[0] - lower[0]
    height = upper[1] - lower[1]
    rect = Rectangle(lower, width, height, edgecolor='blue', facecolor='none',
                     fill=False, lw=1)
    hbounds = fig.axes[0].add_patch(rect)

  # Draw global minimum, initial point
  point_spec = {'markersize': 10, 'markeredgewidth': 2, 'fillstyle': 'none',
                'marker': 'o', 'linestyle': 'none'}
  curve_spec = {'marker': 'o', 'markersize': 2, 'linewidth': 1.5,
                'color': 'red', 'alpha':0.5}
  final_spec = {'marker': 'o', 'markersize': 4, 'color': 'green',
                'linestyle': 'none'}
  hmin = plt.plot(true_minimum[0], true_minimum[1], color='pink', **point_spec)

  def add_curve(t):
    x0 = plt.plot(t[0, 0], t[0, 1], color='grey', **point_spec)
    traj = plt.plot(t[:, 0], t[:, 1], **curve_spec)
    final = plt.plot(t[-1, 0], t[-1, 1], **final_spec)
    return x0, traj, final

  if trace is not None:
    if isinstance(trace, list):
      for t in trace:
        x0, traj, final = add_curve(t)
    else:
      x0, traj, final = add_curve(trace)

  handles = [x0[0], traj[0], hmin[0], final[0]]
  labels = ['$x_0$', 'optimization trace', 'true minimum', 'solution']
  if hbounds is not None:
    handles.append(hbounds)
    labels.append('box bounds')
  plt.legend(handles, labels)

  plt.show()

Rosenbrock function

The classic Rosenbrock test function can be written in least-squares form:

$f(x,y)=r^Tr,$ with the residual $r(x,y)=\begin{pmatrix} 1-x \ 10\cdot(y-x^2) \end{pmatrix}$.

Note that here in the 2D example section, we use $x, y$ to denote the two coordinates of our decision variable. Let's see minimize.least_squares find the minimum:

python
# Minimize Rosenbrock function.
def rosenbrock(x):
  return np.stack([1 - x[0, :], 10 * (x[1, :] - x[0, :] ** 2)])

x0 = np.array((0.0, 0.0))
x, rb_trace = minimize.least_squares(x0, rosenbrock);
python
#@title Visualize Rosenbrock solution

plot_range = (np.array((-0.5, 1.5)), np.array((-1.5, 2.)))
minimum = (1, 1)
points = np.asarray([t.candidate for t in rb_trace])
plot_2D(rosenbrock, 'Rosenbrock Function', plot_range, minimum, trace=points)

Beale function

A somewhat more elaborate example is the Beale function:

$f(x,y)=r^Tr,$ with the residual $r(x,y)=\begin{pmatrix} 1.5-x+xy \ 2.25-x+xy^2 \ 2.625-x+xy^3 \end{pmatrix}$.

python
#@title Minimize, visualize Beale
def beale(x):
  return np.stack((1.5-x[0, :]+x[0, :]*x[1, :],
                   2.25-x[0, :]+x[0, :]*x[1, :]*x[1, :],
                   2.625-x[0, :]+x[0, :]*x[1, :]*x[1, :]*x[1, :]))

x0 = np.array((-3.0, -3.0))
x, bl_trace = minimize.least_squares(x0, beale)

# Visualize solution.
plot_range = (np.array((-4., 4.)), np.array((-4., 4.)))
minimum = (3., 0.5)
points = np.asarray([t.candidate for t in bl_trace])
plot_2D(beale, 'Beale Function', plot_range, minimum, trace=points)

Box bounds

Let's see solutions for these problems with box bounds:

python
#@title Rosenbrock with bounds

# Choose bounds.
lower = np.array([-.3, -1.])
upper = np.array([0.9, 1.9])
bounds = [lower, upper]

# Make some initial points, minimize, save taces.
num_points = 4
px = np.linspace(lower[0]+.1, upper[0]-.1, num_points)
py = np.linspace(lower[1]+.1, upper[1]-.1, num_points)
traces = []
for i in range(num_points):
  for j in range(num_points):
    x0 = np.array((px[j], py[num_points-i-1]))
    _, trace = minimize.least_squares(x0, rosenbrock, bounds, verbose=0)
    traces.append(np.asarray([t.candidate for t in trace]))

# Plot.
plot_range = (np.array((-0.5, 1.5)), np.array((-1.5, 2.5)))
minimum = (1, 1)
plot_2D(rosenbrock, 'Rosenbrock Function', plot_range, minimum, traces, bounds)
python
#@title Beale with bounds

# Choose bounds.
lower = np.array([-2, -1.3])
upper = np.array([1.5, 3.])
bounds = [lower, upper]

# Make some initial points, minimize, save taces.
num_points = 5
p0 = np.linspace(lower[0]+.4, upper[0]-.4, num_points)
p1 = np.linspace(lower[1]+.4, upper[1]-.4, num_points)
traces = []

for i in range(num_points):
  for j in range(num_points):
    x0 = np.array((p0[j], p1[i]))
    x, trace = minimize.least_squares(x0, beale, bounds, verbose=0)
    traces.append(np.asarray([t.candidate for t in trace]))

# Plot.
plot_range = (np.array((-3., 3.5)), np.array((-3., 4.)))
minimum = (3., 0.5)
plot_2D(beale, 'Beale Function', plot_range, minimum, traces, bounds)

n-dimensional Rosenbrock

An $n$-dimensional generalization of the 2D rosenbrock function is $$ f(x) = \sum_{i=1}^{n-1} \left[ 100 \cdot (x_{i+1} - x_{i}^{2})^{2} + \left(1 - x_{i}\right)^{2}\right], $$ which can also be written in Least Squares form and solved by our optimizer:

python
n = 20

def rosenbrock_n(x):
  res0 = [1 - x[i, :] for i in range(n - 1)]
  res1 = [10 * (x[i, :] - x[i + 1, :] ** 2) for i in range(n - 1)]
  return np.asarray(res0 + res1)

x0 = np.zeros(n)
x, _ = minimize.least_squares(x0, rosenbrock_n);

# Expected solution is a vector of 1's
assert np.linalg.norm(x-1) < 1e-8

Inverse Kinematics

Forward kinematics means computing the Cartesian poses of articulated bodies, given joint angles. This computation is easy and well defined. Inverse Kinematics (IK) tries to do the opposite: Given a desired Cartesian pose of an articulated body (usually called the end effector), find the joint angles which put the end effector at this pose. The IK problem can easily be over or under determined:

  • Over-determined: The end effector cannot reach the desired pose at all, or can match either position or orientation, but not both.
  • Under-detemined: The articulated chain has more than the minimum required number of degrees-of-freedom (dofs), so after reaching the target there is still a free subspace (imagine rotating your eblow while your hand and shoulder are fixed).

We adress both of these situations in our implementation below, but first mention features which are currently not implemented, but could be added in the future.

  1. We currently support only one end-effector and target pose. Adding mutiple targets would be straightforward.
  2. Our implementation currently does not support quaternions in the kinematic chain (ball and free joints). Adding this is possible and would require a small modification to the least_squares function (explicit support for non-Cartesian tangent spaces).
  3. We only give examples with simple box bounds, which in the IK context means joint range limits. In order to take into account constraints like collision avoidance, one would need to make use of MuJoCo's constraint Jacobians. Please let us know if you'd like to see an example of this.

7-dof arm

Below, we copy the MuJoCo Menagerie's model of Franka's Panda, with the following modifications:

  1. Intergrated the scene elements into a single XML.
  2. Added a "target" mocap body and site with 3 colored, non-colliding geoms that overlap with the site frame.
python
#@title Panda XML
panda = '''
<mujoco model="panda scene">
  <compiler angle="radian" autolimits="true"/>

  <option integrator="implicitfast"/>

  <statistic center="0.3 0 0.4" extent="1"  meansize=".15"/>

  <visual>
    <headlight diffuse="0.6 0.6 0.6" ambient="0.3 0.3 0.3" specular="0 0 0"/>
    <rgba haze="0.15 0.25 0.35 1"/>
    <global azimuth="120" elevation="-20"/>
  </visual>

  <default>
    <geom condim="1"/>
    <default class="panda">
      <material specular="0.5" shininess="0.25"/>
      <joint armature="0.1" damping="1" axis="0 0 1" range="-2.8973 2.8973"/>
      <general dyntype="none" biastype="affine" ctrlrange="-2.8973 2.8973" forcerange="-87 87"/>
      <default class="finger">
        <joint axis="0 1 0" type="slide" range="0 0.04"/>
      </default>

      <default class="visual">
        <geom type="mesh" contype="0" conaffinity="0" group="2"/>
        <default class="target">
          <geom type="box" size=".015"/>
        </default>
      </default>
      <default class="collision">
        <geom type="mesh" group="3"/>
        <default class="fingertip_pad_collision_1">
          <geom type="box" size="0.0085 0.004 0.0085" pos="0 0.0055 0.0445"/>
        </default>
        <default class="fingertip_pad_collision_2">
          <geom type="box" size="0.003 0.002 0.003" pos="0.0055 0.002 0.05"/>
        </default>
        <default class="fingertip_pad_collision_3">
          <geom type="box" size="0.003 0.002 0.003" pos="-0.0055 0.002 0.05"/>
        </default>
        <default class="fingertip_pad_collision_4">
          <geom type="box" size="0.003 0.002 0.0035" pos="0.0055 0.002 0.0395"/>
        </default>
        <default class="fingertip_pad_collision_5">
          <geom type="box" size="0.003 0.002 0.0035" pos="-0.0055 0.002 0.0395"/>
        </default>
      </default>
    </default>
  </default>

  <asset>
    <texture type="skybox" builtin="gradient" rgb1="0.3 0.5 0.7" rgb2="0 0 0" width="512" height="3072"/>
    <texture type="2d" name="groundplane" builtin="checker" mark="edge" rgb1="0.2 0.3 0.4" rgb2="0.1 0.2 0.3"
      markrgb="0.8 0.8 0.8" width="300" height="300"/>
    <material name="groundplane" texture="groundplane" texuniform="true" texrepeat="5 5" reflectance="0.2"/>
  </asset>

  <asset>
    <material class="panda" name="white" rgba="1 1 1 1"/>
    <material class="panda" name="off_white" rgba="0.901961 0.921569 0.929412 1"/>
    <material class="panda" name="black" rgba="0.25 0.25 0.25 1"/>
    <material class="panda" name="green" rgba="0 1 0 1"/>
    <material class="panda" name="light_blue" rgba="0.039216 0.541176 0.780392 1"/>

    <!-- Collision meshes -->
    <mesh name="link0_c" file="link0.stl"/>
    <mesh name="link1_c" file="link1.stl"/>
    <mesh name="link2_c" file="link2.stl"/>
    <mesh name="link3_c" file="link3.stl"/>
    <mesh name="link4_c" file="link4.stl"/>
    <mesh name="link5_c0" file="link5_collision_0.obj"/>
    <mesh name="link5_c1" file="link5_collision_1.obj"/>
    <mesh name="link5_c2" file="link5_collision_2.obj"/>
    <mesh name="link6_c" file="link6.stl"/>
    <mesh name="link7_c" file="link7.stl"/>
    <mesh name="hand_c" file="hand.stl"/>

    <!-- Visual meshes -->
    <mesh file="link0_0.obj"/>
    <mesh file="link0_1.obj"/>
    <mesh file="link0_2.obj"/>
    <mesh file="link0_3.obj"/>
    <mesh file="link0_4.obj"/>
    <mesh file="link0_5.obj"/>
    <mesh file="link0_7.obj"/>
    <mesh file="link0_8.obj"/>
    <mesh file="link0_9.obj"/>
    <mesh file="link0_10.obj"/>
    <mesh file="link0_11.obj"/>
    <mesh file="link1.obj"/>
    <mesh file="link2.obj"/>
    <mesh file="link3_0.obj"/>
    <mesh file="link3_1.obj"/>
    <mesh file="link3_2.obj"/>
    <mesh file="link3_3.obj"/>
    <mesh file="link4_0.obj"/>
    <mesh file="link4_1.obj"/>
    <mesh file="link4_2.obj"/>
    <mesh file="link4_3.obj"/>
    <mesh file="link5_0.obj"/>
    <mesh file="link5_1.obj"/>
    <mesh file="link5_2.obj"/>
    <mesh file="link6_0.obj"/>
    <mesh file="link6_1.obj"/>
    <mesh file="link6_2.obj"/>
    <mesh file="link6_3.obj"/>
    <mesh file="link6_4.obj"/>
    <mesh file="link6_5.obj"/>
    <mesh file="link6_6.obj"/>
    <mesh file="link6_7.obj"/>
    <mesh file="link6_8.obj"/>
    <mesh file="link6_9.obj"/>
    <mesh file="link6_10.obj"/>
    <mesh file="link6_11.obj"/>
    <mesh file="link6_12.obj"/>
    <mesh file="link6_13.obj"/>
    <mesh file="link6_14.obj"/>
    <mesh file="link6_15.obj"/>
    <mesh file="link6_16.obj"/>
    <mesh file="link7_0.obj"/>
    <mesh file="link7_1.obj"/>
    <mesh file="link7_2.obj"/>
    <mesh file="link7_3.obj"/>
    <mesh file="link7_4.obj"/>
    <mesh file="link7_5.obj"/>
    <mesh file="link7_6.obj"/>
    <mesh file="link7_7.obj"/>
    <mesh file="hand_0.obj"/>
    <mesh file="hand_1.obj"/>
    <mesh file="hand_2.obj"/>
    <mesh file="hand_3.obj"/>
    <mesh file="hand_4.obj"/>
    <mesh file="finger_0.obj"/>
    <mesh file="finger_1.obj"/>
  </asset>

  <worldbody>
    <light pos="0 0 1.5" dir="0 0 -1" directional="true"/>
    <geom name="floor" size="0 0 0.05" type="plane" material="groundplane"/>

    <light name="top" pos="0 0 2" mode="trackcom"/>
    <body name="link0" childclass="panda">
      <inertial mass="0.629769" pos="-0.041018 -0.00014 0.049974"
        fullinertia="0.00315 0.00388 0.004285 8.2904e-7 0.00015 8.2299e-6"/>
      <geom mesh="link0_0" material="off_white" class="visual"/>
      <geom mesh="link0_1" material="black" class="visual"/>
      <geom mesh="link0_2" material="off_white" class="visual"/>
      <geom mesh="link0_3" material="black" class="visual"/>
      <geom mesh="link0_4" material="off_white" class="visual"/>
      <geom mesh="link0_5" material="black" class="visual"/>
      <geom mesh="link0_7" material="white" class="visual"/>
      <geom mesh="link0_8" material="white" class="visual"/>
      <geom mesh="link0_9" material="black" class="visual"/>
      <geom mesh="link0_10" material="off_white" class="visual"/>
      <geom mesh="link0_11" material="white" class="visual"/>
      <geom mesh="link0_c" class="collision"/>
      <body name="link1" pos="0 0 0.333">
        <inertial mass="4.970684" pos="0.003875 0.002081 -0.04762"
          fullinertia="0.70337 0.70661 0.0091170 -0.00013900 0.0067720 0.019169"/>
        <joint name="joint1"/>
        <geom material="white" mesh="link1" class="visual"/>
        <geom mesh="link1_c" class="collision"/>
        <body name="link2" quat="1 -1 0 0">
          <inertial mass="0.646926" pos="-0.003141 -0.02872 0.003495"
            fullinertia="0.0079620 2.8110e-2 2.5995e-2 -3.925e-3 1.0254e-2 7.04e-4"/>
          <joint name="joint2" range="-1.7628 1.7628"/>
          <geom material="white" mesh="link2" class="visual"/>
          <geom mesh="link2_c" class="collision"/>
          <body name="link3" pos="0 -0.316 0" quat="1 1 0 0">
            <joint name="joint3"/>
            <inertial mass="3.228604" pos="2.7518e-2 3.9252e-2 -6.6502e-2"
              fullinertia="3.7242e-2 3.6155e-2 1.083e-2 -4.761e-3 -1.1396e-2 -1.2805e-2"/>
            <geom mesh="link3_0" material="white" class="visual"/>
            <geom mesh="link3_1" material="white" class="visual"/>
            <geom mesh="link3_2" material="white" class="visual"/>
            <geom mesh="link3_3" material="black" class="visual"/>
            <geom mesh="link3_c" class="collision"/>
            <body name="link4" pos="0.0825 0 0" quat="1 1 0 0">
              <inertial mass="3.587895" pos="-5.317e-2 1.04419e-1 2.7454e-2"
                fullinertia="2.5853e-2 1.9552e-2 2.8323e-2 7.796e-3 -1.332e-3 8.641e-3"/>
              <joint name="joint4" range="-3.0718 -0.0698"/>
              <geom mesh="link4_0" material="white" class="visual"/>
              <geom mesh="link4_1" material="white" class="visual"/>
              <geom mesh="link4_2" material="black" class="visual"/>
              <geom mesh="link4_3" material="white" class="visual"/>
              <geom mesh="link4_c" class="collision"/>
              <body name="link5" pos="-0.0825 0.384 0" quat="1 -1 0 0">
                <inertial mass="1.225946" pos="-1.1953e-2 4.1065e-2 -3.8437e-2"
                  fullinertia="3.5549e-2 2.9474e-2 8.627e-3 -2.117e-3 -4.037e-3 2.29e-4"/>
                <joint name="joint5"/>
                <geom mesh="link5_0" material="black" class="visual"/>
                <geom mesh="link5_1" material="white" class="visual"/>
                <geom mesh="link5_2" material="white" class="visual"/>
                <geom mesh="link5_c0" class="collision"/>
                <geom mesh="link5_c1" class="collision"/>
                <geom mesh="link5_c2" class="collision"/>
                <body name="link6" quat="1 1 0 0">
                  <inertial mass="1.666555" pos="6.0149e-2 -1.4117e-2 -1.0517e-2"
                    fullinertia="1.964e-3 4.354e-3 5.433e-3 1.09e-4 -1.158e-3 3.41e-4"/>
                  <joint name="joint6" range="-0.0175 3.7525"/>
                  <geom mesh="link6_0" material="off_white" class="visual"/>
                  <geom mesh="link6_1" material="white" class="visual"/>
                  <geom mesh="link6_2" material="black" class="visual"/>
                  <geom mesh="link6_3" material="white" class="visual"/>
                  <geom mesh="link6_4" material="white" class="visual"/>
                  <geom mesh="link6_5" material="white" class="visual"/>
                  <geom mesh="link6_6" material="white" class="visual"/>
                  <geom mesh="link6_7" material="light_blue" class="visual"/>
                  <geom mesh="link6_8" material="light_blue" class="visual"/>
                  <geom mesh="link6_9" material="black" class="visual"/>
                  <geom mesh="link6_10" material="black" class="visual"/>
                  <geom mesh="link6_11" material="white" class="visual"/>
                  <geom mesh="link6_12" material="green" class="visual"/>
                  <geom mesh="link6_13" material="white" class="visual"/>
                  <geom mesh="link6_14" material="black" class="visual"/>
                  <geom mesh="link6_15" material="black" class="visual"/>
                  <geom mesh="link6_16" material="white" class="visual"/>
                  <geom mesh="link6_c" class="collision"/>
                  <body name="link7" pos="0.088 0 0" quat="1 1 0 0">
                    <inertial mass="7.35522e-01" pos="1.0517e-2 -4.252e-3 6.1597e-2"
                      fullinertia="1.2516e-2 1.0027e-2 4.815e-3 -4.28e-4 -1.196e-3 -7.41e-4"/>
                    <joint name="joint7"/>
                    <geom mesh="link7_0" material="white" class="visual"/>
                    <geom mesh="link7_1" material="black" class="visual"/>
                    <geom mesh="link7_2" material="black" class="visual"/>
                    <geom mesh="link7_3" material="black" class="visual"/>
                    <geom mesh="link7_4" material="black" class="visual"/>
                    <geom mesh="link7_5" material="black" class="visual"/>
                    <geom mesh="link7_6" material="black" class="visual"/>
                    <geom mesh="link7_7" material="white" class="visual"/>
                    <geom mesh="link7_c" class="collision"/>
                    <body name="hand" pos="0 0 0.107" quat="0.9238795 0 0 -0.3826834">
                      <inertial mass="0.73" pos="-0.01 0 0.03" diaginertia="0.001 0.0025 0.0017"/>
                      <geom mesh="hand_0" material="off_white" class="visual"/>
                      <geom mesh="hand_1" material="black" class="visual"/>
                      <geom mesh="hand_2" material="black" class="visual"/>
                      <geom mesh="hand_3" material="white" class="visual"/>
                      <geom mesh="hand_4" material="off_white" class="visual"/>
                      <geom mesh="hand_c" class="collision"/>
                      <site name="effector" pos="0 0 0.08"/>
                    </body>
                  </body>
                </body>
              </body>
            </body>
          </body>
        </body>
      </body>
    </body>

    <body name="target" pos="0 .6 .4" quat=".5 -.2 1 0" mocap="true">
      <site name="target" group="2"/>
      <geom class="target" fromto=".07 0 0 .15 0 0" rgba="1 0 0 1"/>
      <geom class="target" fromto="0 .07 0 0 .15 0" rgba="0 1 0 1"/>
      <geom class="target" fromto="0 0 .07 0 0 .15" rgba="0 0 1 1"/>
    </body>
  </worldbody>

  <actuator>
    <general class="panda" name="actuator1" joint="joint1" gainprm="4500" biasprm="0 -4500 -450"/>
    <general class="panda" name="actuator2" joint="joint2" gainprm="4500" biasprm="0 -4500 -450"
      ctrlrange="-1.7628 1.7628"/>
    <general class="panda" name="actuator3" joint="joint3" gainprm="3500" biasprm="0 -3500 -350"/>
    <general class="panda" name="actuator4" joint="joint4" gainprm="3500" biasprm="0 -3500 -350"
      ctrlrange="-3.0718 -0.0698"/>
    <general class="panda" name="actuator5" joint="joint5" gainprm="2000" biasprm="0 -2000 -200" forcerange="-12 12"/>
    <general class="panda" name="actuator6" joint="joint6" gainprm="2000" biasprm="0 -2000 -200" forcerange="-12 12"
      ctrlrange="-0.0175 3.7525"/>
    <general class="panda" name="actuator7" joint="joint7" gainprm="2000" biasprm="0 -2000 -200" forcerange="-12 12"/>
  </actuator>

  <keyframe>
    <key name="home" qpos="0 0 0 -1.57079 0 1.57079 -0.7853" ctrl="0 0 0 -1.57079 0 1.57079 -0.7853"/>
  </keyframe>
</mujoco>
'''
python
#@title Get Panda assets
!git clone https://github.com/google-deepmind/mujoco_menagerie
from os import walk
from os.path import join
assets_path = 'mujoco_menagerie/franka_emika_panda/assets/'
asset_names = next(walk(assets_path), (None, None, []))[2]
assets = {}
for name in asset_names:
  with open(join(assets_path, name), 'rb') as f:
    assets[name] = f.read()

python
#@title Load model, render {vertical-output: true}
model = mujoco.MjModel.from_xml_string(panda, assets)
data = mujoco.MjData(model)

# Reset the state to the "home" keyframe.
key = mujoco.mj_name2id(model, mujoco.mjtObj.mjOBJ_KEY, 'home')
mujoco.mj_resetDataKeyframe(model, data, key)
mujoco.mj_forward(model, data)

# If a renderer exists, close it.
if 'renderer' in locals():
  renderer.close()

# Make a Renderer and a camera.
renderer = mujoco.Renderer(model, height=360, width=480)
camera = mujoco.MjvCamera()
mujoco.mjv_defaultFreeCamera(model, camera)
camera.distance = 1.7
camera.elevation = -15
camera.azimuth = -130
camera.lookat = (0, 0, .3)

# Visualize site frames and labels
voption = mujoco.MjvOption()
voption.frame = mujoco.mjtFrame.mjFRAME_SITE
voption.label = mujoco.mjtLabel.mjLABEL_SITE
renderer.update_scene(data, camera, voption)
voption.label = mujoco.mjtLabel.mjLABEL_NONE

media.show_image(renderer.render())

Note how the target site frame has boxes along the axes, to make it visually distinct from the end-effector frame.

Orientation weighting

A common feature of the IK problem is that matching poses requires matching both positions and orientations. However, these residuals have different units. Position differences are in units of length (say, Meters), while orientation differences (log-map of a quaternion difference) are in unitless Radians. The user needs to provide a conversion factor denoting how much they "care" about matching orientation vs. position. Since this factor has units of length we call it radius, which can be thought of as the radius of a sphere around the effector frame on the surface of which angular errors are measured.

IK regularization

Our Panda arm has 7 degrees-of-freedom (dofs), making the IK problem under-determined, since the IK problem has 6 constraints (3 translation + 3 rotation). In order to enforce a single solution, we introduce a fixed regularizer attracting the configuration to some base pose.

Problem definition

We are now in a position to see the problem definition:

Bounds, initial guess

python
# Bounds at the joint limits.
bounds = [model.jnt_range[:, 0], model.jnt_range[:, 1]]

# Inital guess is the 'home' keyframe.
x0 = model.key('home').qpos

IK residual

python
def ik(x, pos=None, quat=None, radius=0.04, reg=1e-3, reg_target=None):
  """Residual for inverse kinematics.

  Args:
    x: joint angles.
    pos: target position for the end effector.
    quat: target orientation for the end effector.
    radius: scaling of the 3D cross.

  Returns:
    The residual of the Inverse Kinematics task.
  """

  # Move the mocap body to the target
  id = model.body('target').mocapid
  data.mocap_pos[id] = model.body('target').pos if pos is None else pos
  data.mocap_quat[id] = model.body('target').quat if quat is None else quat

  # Set qpos, compute forward kinematics.
  res = []
  for i in range(x.shape[1]):
    data.qpos = x[:, i]
    mujoco.mj_kinematics(model, data)

    # Position residual.
    res_pos = data.site('effector').xpos - data.site('target').xpos

    # Effector quat, use mju_mat2quat.
    effector_quat = np.empty(4)
    mujoco.mju_mat2Quat(effector_quat, data.site('effector').xmat)

    # Target quat, exploit the fact that the site is aligned with the body.
    target_quat = data.body('target').xquat

    # Orientation residual: quaternion difference.
    res_quat = np.empty(3)
    mujoco.mju_subQuat(res_quat, target_quat, effector_quat)
    res_quat *= radius

    # Regularization residual.
    reg_target = model.key('home').qpos if reg_target is None else reg_target
    res_reg = reg * (x[:, i] - reg_target)

    res_i = np.hstack((res_pos, res_quat, res_reg))
    res.append(np.atleast_2d(res_i).T)

  return np.hstack(res)

Analytic Jacobian

The IK problem is special in another way. Unlike the general case, here analytic Jacobians are computable by MuJoCo. Let's write down this Jacobian and then compare solution timing using finite differencing with timing using the analytic Jacobian.

python
def ik_jac(x, res, pos=None, quat=None, radius=.04, reg=1e-3):
  """Analytic Jacobian of inverse kinematics residual

  Args:
    x: joint angles.
    pos: target position for the end effector.
    quat: target orientation for the end effector.
    radius: scaling of the 3D cross.

  Returns:
    The Jacobian of the Inverse Kinematics task.
  """
  # least_squares() passes the value of the residual at x which is sometimes
  # useful, but we don't need it here.
  del res

  # Call mj_kinematics and mj_comPos (required for Jacobians).
  mujoco.mj_kinematics(model, data)
  mujoco.mj_comPos(model, data)

  # Get end-effector site Jacobian.
  jac_pos = np.empty((3, model.nv))
  jac_quat = np.empty((3, model.nv))
  mujoco.mj_jacSite(model, data, jac_pos, jac_quat, data.site('effector').id)

  # Get Deffector, the 3x3 mju_subquat Jacobian
  effector_quat = np.empty(4)
  mujoco.mju_mat2Quat(effector_quat, data.site('effector').xmat)
  target_quat = data.body('target').xquat
  Deffector = np.empty((3, 3))
  mujoco.mjd_subQuat(target_quat, effector_quat, None, Deffector)

  # Rotate into target frame, multiply by subQuat Jacobian, scale by radius.
  target_mat = data.site('target').xmat.reshape(3, 3)
  mat = radius * Deffector.T @ target_mat.T
  jac_quat = mat @ jac_quat

  # Regularization Jacobian.
  jac_reg = reg * np.eye(model.nv)

  return np.vstack((jac_pos, jac_quat, jac_reg))

Let's compare the performance of the function with finite-differenced and analytic Jacobians:

python
print('Finite-differenced Jacobian:')
x_fd, _ = minimize.least_squares(x0, ik, bounds, verbose=1);
print('Analytic Jacobian:')
x_analytic, _ = minimize.least_squares(x0, ik, bounds, jacobian=ik_jac,
                                       verbose=1, check_derivatives=True);

# Assert that we got a nearly identical solution
assert np.linalg.norm(x_fd - x_analytic) < 1e-5

Nice speed-up! This will become more pronounced the harder the specific IK problem. We'll do a more comprehensive timing comparison a few cells down (this specific configuration happens to be solved rather slowly).

Note that we passed check_derivatives=True to ask the function to verify that our analytic Jacobian is correct, by making a comparison to the internal finite-difference function at the first timestep.

Visualizing solutions

Let's see what the solution to our problem looks like.

python
#@title Basic solution {vertical-output: true}

x, _ = minimize.least_squares(x0, ik, bounds, jacobian=ik_jac,
                              verbose=0);

# Update and visualize
data.qpos = x
mujoco.mj_kinematics(model, data)
mujoco.mj_camlight(model, data)
camera.distance = 1
camera.lookat = data.site('effector').xpos
renderer.update_scene(data, camera, voption)
media.show_image(renderer.render())

In order to get a better intuition for the effects of the radius argument, let's give the IK solver a more difficult target pose, and vary the radius:

python
#@title Small `radius` {vertical-output: true}

pos = np.array((.2, .8, .5))
quat = np.array((1, 1, 0, 0))
radius = 0.04

ik_target = lambda x: ik(x, pos=pos, quat=quat, radius=radius)
jac_target = lambda x, r: ik_jac(x, r, pos=pos, quat=quat,
                                 radius=radius)

x, _ = minimize.least_squares(x0, ik_target, bounds,
                              jacobian=jac_target,
                              verbose=0);

# Update and visualize
data.qpos = x
mujoco.mj_kinematics(model, data)
mujoco.mj_camlight(model, data)
camera.distance = 1
camera.lookat = data.site('effector').xpos
camera.azimuth = -150
camera.elevation = -20
renderer.update_scene(data, camera, voption)
media.show_image(renderer.render())

We can see that the positions match well, but the orientations are very wrong.

python
#@title Large `radius` {vertical-output: true}

pos = np.array((.2, .8, .5))
quat = np.array((1, 1, 0, 0))
radius = 0.5

ik_target = lambda x: ik(x, pos=pos, quat=quat, radius=radius)
jac_target = lambda x, r: ik_jac(x, r, pos=pos, quat=quat,
                                 radius=radius)

x, _ = minimize.least_squares(x0, ik_target, bounds,
                              jacobian=jac_target,
                              verbose=0);

# Update and visualize
data.qpos = x
mujoco.mj_kinematics(model, data)
mujoco.mj_camlight(model, data)
camera.distance = 1
camera.lookat = data.site('effector').xpos
camera.azimuth = -150
camera.elevation = -20
renderer.update_scene(data, camera, voption)
media.show_image(renderer.render())

Now the orientations are a better match, but the position error is much larger.

Smooth motion

It is often the case that one wants to smoothly track a given end-effector trajectory. Let's invent some smooth traget trajectory:

python
# @title Continuous target trajectory {vertical-output: true}
def pose(time):
  pos = (0.4 * np.sin(time),
         0.4 * np.cos(time),
         0.4 + 0.2 * np.sin(3 * time))
  quat = np.array((1.0, np.sin(2 * time), np.sin(time), 0))
  quat /= np.linalg.norm(quat)
  return pos, quat

# Number of frames
n_frame = 400

# Reset the camera, make the arm point straight up.
camera.distance = 1.5
camera.elevation = -15
camera.azimuth = -130
camera.lookat = (0, 0, 0.3)
mujoco.mj_resetData(model, data)

frames = []
for t in np.linspace(0, 2 * np.pi, n_frame):
  # Move the target
  pos, quat = pose(t)
  id = model.body('target').mocapid
  data.mocap_pos[id] = pos
  data.mocap_quat[id] = quat

  mujoco.mj_kinematics(model, data)
  mujoco.mj_camlight(model, data)
  renderer.update_scene(data, camera, voption)

  frames.append(renderer.render())

media.show_video(frames)

Let's see what the arm motion looks like if we solve the IK problem independently for each frame, always starting at our x0 initial guess. While we're doing this, we'll also time our solution, with and without analytic derivatives, to get a better sense of how long the optimization takes.

python
#@title Full IK for every frame {vertical-output: true}

frames = []
time_fd = []
time_analytic = []
for t in np.linspace(0, 2 * np.pi, n_frame):
  # Get target pose
  pos, quat = pose(t)

  # Define IK problem
  ik_target = lambda x: ik(x, pos=pos, quat=quat)
  jac_target = lambda x, r: ik_jac(x, r, pos=pos, quat=quat)

  # Solve while timing, fin-diff Jacobian
  t_start = time.time()
  x, _ = minimize.least_squares(x0, ik_target, bounds,
                                verbose=0);
  time_fd.append(1000*(time.time() - t_start))

  # Solve while timing, analytic Jacobian
  t_start = time.time()
  x, _ = minimize.least_squares(x0, ik_target, bounds,
                                jacobian=jac_target,
                                verbose=0);
  time_analytic.append(1000*(time.time() - t_start))

  mujoco.mj_kinematics(model, data)
  mujoco.mj_camlight(model, data)
  renderer.update_scene(data, camera, voption)

  frames.append(renderer.render())

media.show_video(frames, loop=False)

mean_analytic = np.asarray(time_analytic).mean()
mean_fd = np.asarray(time_fd).mean()
prfx = 'Mean solution time using '
print(prfx + f'fin-diff Jacobian: {mean_fd:3.1f}ms')
print(prfx + f'analytic Jacobian: {mean_analytic:3.1f}ms')

Yikes! Solutions are being found, but what's with all the "glitches"? The IK problem often has multiple local minima and since we are solving from scratch each time, we end up in a different one, somewhat arbitrarily.

A very easy way to mitigate this is to initialize ("warmstart") the solver with the previous solution:

python
#@title Warmstart with previous solution {vertical-output: true}

frames = []
x = x0
for t in np.linspace(0, 2 * np.pi, n_frame):
  # Get target pose
  pos, quat = pose(t)

  # Define IK problem
  ik_target = lambda x: ik(x, pos=pos, quat=quat)
  jac_target = lambda x, r: ik_jac(x, r, pos=pos, quat=quat)

  x, _ = minimize.least_squares(x, ik_target, bounds,
                                jacobian=jac_target,
                                verbose=0);

  mujoco.mj_kinematics(model, data)
  mujoco.mj_camlight(model, data)
  renderer.update_scene(data, camera, voption)

  frames.append(renderer.render())

media.show_video(frames, loop=False)

This looks much better, but there are still a few "glitches". An esy way to mitigate those is to modify the regularization target to also be the previous solution. We are effectively telling the solver not to stray too far.

python
#@title Warmstart with and regularize to previous solution {vertical-output: true}

frames = []
x = x0
for t in np.linspace(0, 2 * np.pi, n_frame):
  # Get target pose
  pos, quat = pose(t)

  x_prev = x.copy()

  # Define IK problem
  ik_target = lambda x: ik(x, pos=pos, quat=quat,
                           reg_target=x_prev, reg=.1)
  jac_target = lambda x, r: ik_jac(x, r, pos=pos, quat=quat)

  x, _ = minimize.least_squares(x, ik_target, bounds,
                                jacobian=jac_target,
                                verbose=0);

  mujoco.mj_kinematics(model, data)
  mujoco.mj_camlight(model, data)
  renderer.update_scene(data, camera, voption)

  frames.append(renderer.render())

media.show_video(frames, loop=False)

Non-quadratic norms

We'll now use least_squares to solve a trajectory optimization (control) problem. While this is not its intended purpose, it demonstrates the power and general usability of the function.

After using regular Least Squares, we'll define a custom non-quadratic norm and solve again.

Below, we copy MuJoCo's standard humanoid model, with the following modifications:

  1. Added a "target" mocap body with a pink spherical site.
  2. Changed the color of the right hand to pink.
  3. Replaced the torque actuators with position actuators.
  4. Added sensors corresponding to the residual.
python
#@title Humanoid XML
xml = """
<mujoco model="Humanoid">
  <option timestep="0.005"/>

  <visual>
    <map force="0.1" zfar="30"/>
    <rgba haze="0.15 0.25 0.35 1"/>
    <global offwidth="2560" offheight="1440" elevation="-20" azimuth="120"/>
  </visual>

  <statistic center="0 0 0.7"/>

  <asset>
    <texture type="skybox" builtin="gradient" rgb1=".3 .5 .7" rgb2="0 0 0" width="32" height="512"/>
    <texture name="body" type="cube" builtin="flat" mark="cross" width="128" height="128" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" random="0.01"/>
    <material name="body" texture="body" texuniform="true" rgba="0.8 0.6 .4 1"/>
    <texture name="grid" type="2d" builtin="checker" width="512" height="512" rgb1=".1 .2 .3" rgb2=".2 .3 .4"/>
    <material name="grid" texture="grid" texrepeat="1 1" texuniform="true" reflectance=".2"/>
  </asset>

  <default>
    <position inheritrange="0.95"/>
    <default class="body">

      <!-- geoms -->
      <geom type="capsule" condim="1" friction=".7" solimp="0.9 .99 .003" solref=".015 1" material="body" group="1"/>
      <default class="thigh">
        <geom size=".06"/>
      </default>
      <default class="shin">
        <geom fromto="0 0 0 0 0 -.3"  size=".049"/>
      </default>
      <default class="foot">
        <geom size=".027"/>
        <default class="foot1">
          <geom fromto="-.07 -.01 0 .14 -.03 0"/>
        </default>
        <default class="foot2">
          <geom fromto="-.07 .01 0 .14  .03 0"/>
        </default>
      </default>
      <default class="arm_upper">
        <geom size=".04"/>
      </default>
      <default class="arm_lower">
        <geom size=".031"/>
      </default>
      <default class="hand">
        <geom type="sphere" size=".04"/>
      </default>

      <!-- joints -->
      <joint type="hinge" damping=".2" stiffness="1" armature=".01" limited="true" solimplimit="0 .99 .01"/>
      <default class="joint_big">
        <joint damping="5" stiffness="10"/>
        <default class="hip_x">
          <joint range="-30 10"/>
        </default>
        <default class="hip_z">
          <joint range="-60 35"/>
        </default>
        <default class="hip_y">
          <joint axis="0 1 0" range="-150 20"/>
        </default>
        <default class="joint_big_stiff">
          <joint stiffness="20"/>
        </default>
      </default>
      <default class="knee">
        <joint pos="0 0 .02" axis="0 -1 0" range="-160 2"/>
      </default>
      <default class="ankle">
        <joint range="-50 50"/>
        <default class="ankle_y">
          <joint pos="0 0 .08" axis="0 1 0" stiffness="6"/>
        </default>
        <default class="ankle_x">
          <joint pos="0 0 .04" stiffness="3"/>
        </default>
      </default>
      <default class="shoulder">
        <joint range="-85 60"/>
      </default>
      <default class="elbow">
        <joint range="-100 50" stiffness="0"/>
      </default>
    </default>
  </default>

  <worldbody>
    <body name="target" pos=".2 -.2 1" mocap="true">
      <site name="target" size=".05" rgba="1 0 1 .4"/>
    </body>
    <geom name="floor" size="0 0 .05" type="plane" material="grid" condim="3"/>
    <light name="spotlight" mode="targetbodycom" target="torso" diffuse=".8 .8 .8" specular="0.3 0.3 0.3" pos="0 -6 4" cutoff="30"/>
    <body name="torso" pos="0 0 1.282" childclass="body">
      <light name="top" pos="0 0 2" mode="trackcom"/>
      <camera name="back" pos="-3 0 1" xyaxes="0 -1 0 1 0 2" mode="trackcom"/>
      <camera name="side" pos="0 -3 1" xyaxes="1 0 0 0 1 2" mode="trackcom"/>
      <freejoint name="root"/>
      <geom name="torso" fromto="0 -.07 0 0 .07 0" size=".07"/>
      <geom name="waist_upper" fromto="-.01 -.06 -.12 -.01 .06 -.12" size=".06"/>
      <body name="head" pos="0 0 .19">
        <geom name="head" type="sphere" size=".09"/>
        <camera name="egocentric" pos=".09 0 0" xyaxes="0 -1 0 .1 0 1" fovy="80"/>
      </body>
      <body name="waist_lower" pos="-.01 0 -.26">
        <geom name="waist_lower" fromto="0 -.06 0 0 .06 0" size=".06"/>
        <joint name="abdomen_z" pos="0 0 .065" axis="0 0 1" range="-45 45" class="joint_big_stiff"/>
        <joint name="abdomen_y" pos="0 0 .065" axis="0 1 0" range="-75 30" class="joint_big"/>
        <body name="pelvis" pos="0 0 -.165">
          <joint name="abdomen_x" pos="0 0 .1" axis="1 0 0" range="-35 35" class="joint_big"/>
          <geom name="butt" fromto="-.02 -.07 0 -.02 .07 0" size=".09"/>
          <body name="thigh_right" pos="0 -.1 -.04">
            <joint name="hip_x_right" axis="1 0 0" class="hip_x"/>
            <joint name="hip_z_right" axis="0 0 1" class="hip_z"/>
            <joint name="hip_y_right" class="hip_y"/>
            <geom name="thigh_right" fromto="0 0 0 0 .01 -.34" class="thigh"/>
            <body name="shin_right" pos="0 .01 -.4">
              <joint name="knee_right" class="knee"/>
              <geom name="shin_right" class="shin"/>
              <body name="foot_right" pos="0 0 -.39">
                <joint name="ankle_y_right" class="ankle_y"/>
                <joint name="ankle_x_right" class="ankle_x" axis="1 0 .5"/>
                <geom name="foot1_right" class="foot1"/>
                <geom name="foot2_right" class="foot2"/>
              </body>
            </body>
          </body>
          <body name="thigh_left" pos="0 .1 -.04">
            <joint name="hip_x_left" axis="-1 0 0" class="hip_x"/>
            <joint name="hip_z_left" axis="0 0 -1" class="hip_z"/>
            <joint name="hip_y_left" class="hip_y"/>
            <geom name="thigh_left" fromto="0 0 0 0 -.01 -.34" class="thigh"/>
            <body name="shin_left" pos="0 -.01 -.4">
              <joint name="knee_left" class="knee"/>
              <geom name="shin_left" fromto="0 0 0 0 0 -.3" class="shin"/>
              <body name="foot_left" pos="0 0 -.39">
                <joint name="ankle_y_left" class="ankle_y"/>
                <joint name="ankle_x_left" class="ankle_x" axis="-1 0 -.5"/>
                <geom name="foot1_left" class="foot1"/>
                <geom name="foot2_left" class="foot2"/>
              </body>
            </body>
          </body>
        </body>
      </body>
      <body name="upper_arm_right" pos="0 -.17 .06">
        <joint name="shoulder1_right" axis="2 1 1"  class="shoulder"/>
        <joint name="shoulder2_right" axis="0 -1 1" class="shoulder"/>
        <geom name="upper_arm_right" fromto="0 0 0 .16 -.16 -.16" class="arm_upper"/>
        <body name="lower_arm_right" pos=".18 -.18 -.18">
          <joint name="elbow_right" axis="0 -1 1" class="elbow"/>
          <geom name="lower_arm_right" fromto=".01 .01 .01 .17 .17 .17" class="arm_lower"/>
          <body name="hand_right" pos=".18 .18 .18">
            <geom name="hand_right" zaxis="1 1 1" class="hand" rgba="1 0 1 1"/>
          </body>
        </body>
      </body>
      <body name="upper_arm_left" pos="0 .17 .06">
        <joint name="shoulder1_left" axis="-2 1 -1" class="shoulder"/>
        <joint name="shoulder2_left" axis="0 -1 -1"  class="shoulder"/>
        <geom name="upper_arm_left" fromto="0 0 0 .16 .16 -.16" class="arm_upper"/>
        <body name="lower_arm_left" pos=".18 .18 -.18">
          <joint name="elbow_left" axis="0 -1 -1" class="elbow"/>
          <geom name="lower_arm_left" fromto=".01 -.01 .01 .17 -.17 .17" class="arm_lower"/>
          <body name="hand_left" pos=".18 -.18 .18">
            <geom name="hand_left" zaxis="1 -1 1" class="hand"/>
          </body>
        </body>
      </body>
    </body>
  </worldbody>

  <contact>
    <exclude body1="waist_lower" body2="thigh_right"/>
    <exclude body1="waist_lower" body2="thigh_left"/>
  </contact>

  <tendon>
    <fixed name="hamstring_right" limited="true" range="-0.3 2">
      <joint joint="hip_y_right" coef=".5"/>
      <joint joint="knee_right" coef="-.5"/>
    </fixed>
    <fixed name="hamstring_left" limited="true" range="-0.3 2">
      <joint joint="hip_y_left" coef=".5"/>
      <joint joint="knee_left" coef="-.5"/>
    </fixed>
  </tendon>

  <actuator>
    <position name="abdomen_z"       kp="40"  joint="abdomen_z"/>
    <position name="abdomen_y"       kp="40"  joint="abdomen_y"/>
    <position name="abdomen_x"       kp="40"  joint="abdomen_x"/>
    <position name="hip_x_right"     kp="40"  joint="hip_x_right"/>
    <position name="hip_z_right"     kp="40"  joint="hip_z_right"/>
    <position name="hip_y_right"     kp="120" joint="hip_y_right"/>
    <position name="knee_right"      kp="80"  joint="knee_right"/>
    <position name="ankle_y_right"   kp="20"  joint="ankle_y_right"/>
    <position name="ankle_x_right"   kp="20"  joint="ankle_x_right"/>
    <position name="hip_x_left"      kp="40"  joint="hip_x_left"/>
    <position name="hip_z_left"      kp="40"  joint="hip_z_left"/>
    <position name="hip_y_left"      kp="120" joint="hip_y_left"/>
    <position name="knee_left"       kp="80"  joint="knee_left"/>
    <position name="ankle_y_left"    kp="20"  joint="ankle_y_left"/>
    <position name="ankle_x_left"    kp="20"  joint="ankle_x_left"/>
    <position name="shoulder1_right" kp="20"  joint="shoulder1_right"/>
    <position name="shoulder2_right" kp="20"  joint="shoulder2_right"/>
    <position name="elbow_right"     kp="40"  joint="elbow_right"/>
    <position name="shoulder1_left"  kp="20"  joint="shoulder1_left"/>
    <position name="shoulder2_left"  kp="20"  joint="shoulder2_left"/>
    <position name="elbow_left"      kp="40"  joint="elbow_left"/>
  </actuator>

  <sensor>
    <framepos objtype="geom" objname="hand_right" reftype="xbody" refname="target"/>
    <actuatorfrc actuator="abdomen_z"/>
    <actuatorfrc actuator="abdomen_y"/>
    <actuatorfrc actuator="abdomen_x"/>
    <actuatorfrc actuator="hip_x_right"/>
    <actuatorfrc actuator="hip_z_right"/>
    <actuatorfrc actuator="hip_y_right"/>
    <actuatorfrc actuator="knee_right"/>
    <actuatorfrc actuator="ankle_y_right"/>
    <actuatorfrc actuator="ankle_x_right"/>
    <actuatorfrc actuator="hip_x_left"/>
    <actuatorfrc actuator="hip_z_left"/>
    <actuatorfrc actuator="hip_y_left"/>
    <actuatorfrc actuator="knee_left"/>
    <actuatorfrc actuator="ankle_y_left"/>
    <actuatorfrc actuator="ankle_x_left"/>
    <actuatorfrc actuator="shoulder1_right"/>
    <actuatorfrc actuator="shoulder2_right"/>
    <actuatorfrc actuator="elbow_right"/>
    <actuatorfrc actuator="shoulder1_left"/>
    <actuatorfrc actuator="shoulder2_left"/>
    <actuatorfrc actuator="elbow_left"/>
  </sensor>

  <keyframe>
    <!--
    The values below are split into rows for readibility:
      torso position
      torso orientation
      spinal
      right leg
      left leg
      arms
    -->
    <key name="squat"
         qpos="0 0 0.596
               0.988015 0 0.154359 0
               0 0.4 0
               -0.25 -0.5 -2.5 -2.65 -0.8 0.56
               -0.25 -0.5 -2.5 -2.65 -0.8 0.56
               0 0 0 0 0 0"/>
    <key name="stand_on_left_leg"
         qpos="0 0 1.21948
               0.971588 -0.179973 0.135318 -0.0729076
               -0.0516 -0.202 0.23
               -0.24 -0.007 -0.34 -1.76 -0.466 -0.0415
               -0.08 -0.01 -0.37 -0.685 -0.35 -0.09
               0.109 -0.067 -0.7 -0.05 0.12 0.16"/>
    <key name="prone"
         qpos="0.4 0 0.0757706
               0.7325 0 0.680767 0
               0 0.0729 0
               0.0077 0.0019 -0.026 -0.351 -0.27 0
               0.0077 0.0019 -0.026 -0.351 -0.27 0
               0.56 -0.62 -1.752
               0.56 -0.62 -1.752"/>
    <key name="supine"
         qpos="-0.4 0 0.08122
               0.722788 0 -0.69107 0
               0 -0.25 0
               0.0182 0.0142 0.3 0.042 -0.44 -0.02
               0.0182 0.0142 0.3 0.042 -0.44 -0.02
               0.186 -0.73 -1.73
               0.186 -0.73 -1.73"/>
  </keyframe>
</mujoco>
"""

Let's load the model and render the initial state for our control problem, chosen to be the "squat" keyframe.

python
# Load model, make data
model = mujoco.MjModel.from_xml_string(xml)
data = mujoco.MjData(model)

# Set the state to the "squat" keyframe, call mj_forward.
key = model.key('squat').id
mujoco.mj_resetDataKeyframe(model, data, key)
mujoco.mj_forward(model, data)

# If a renderer exists, close it.
if 'renderer' in locals():
  renderer.close()

# Make a Renderer and a camera.
renderer = mujoco.Renderer(model, height=480, width=640)
camera = mujoco.MjvCamera()
mujoco.mjv_defaultFreeCamera(model, camera)
camera.distance = 3
camera.elevation = -10

# Point the camera at the humanoid, render.
camera.lookat = data.body('torso').subtree_com
renderer.update_scene(data, camera)
media.show_image(renderer.render())

Problem definition

We define the following optimal control problem defintion:

  • A trajectory is rolled out from $t=0\ldots T$, starting at the "squat" keyframe shown above.
  • Controls $u_t$ are applied during the rollout which are a linear interpolation of first control $u_0$ and the last one $u_T$. These two vectors are our decision variable $x = \begin{pmatrix} u_0 & u_T \end{pmatrix}$.
  • The residual is a concatenation, over all time steps, of:
    • The vector from the right hand to the target.
    • The torques applied by the actuators, scaled by some factor (they are numerically much larger than the hand-target distances).

Our residual uses mujoco.rollout to evaluate parallel trajectories:

python
def reach(ctrl0T, target, T, torque_scale, traj=None):
  """Residual for target-reaching task.

  Args:
    ctrl0T: contatenation of the first and last control vectors.
    target: target to which the right hand should reach.
    T: final time for the rollout.
    torque_scale: coefficient by which to scale the torques.
    traj: optional list of positions to be recorded.

  Returns:
    The residual of the target-reaching task.
  """
  # Extract the initial and final ctrl vectors, transpose to row vectors
  ctrl0 = ctrl0T[:model.nu, :].T
  ctrlT = ctrl0T[model.nu:, :].T

  # Move the mocap body to the target
  mocapid = model.body('target').mocapid
  data.mocap_pos[mocapid] = target

  # Append the mocap targets to the controls
  nbatch  = ctrl0.shape[0]
  mocap = np.tile(data.mocap_pos[mocapid], (nbatch, 1))
  ctrl0 = np.hstack((ctrl0, mocap))
  ctrlT = np.hstack((ctrlT, mocap))

  # Define control spec (ctrl + mocap_pos)
  mjtState = mujoco.mjtState
  control_spec = mjtState.mjSTATE_CTRL | mjtState.mjSTATE_MOCAP_POS

  # Interpolate and stack the control sequences
  nstep = int(np.round(T / model.opt.timestep))
  control = np.stack(np.linspace(ctrl0, ctrlT, nstep), axis=1)

  # Reset to the "squat" keyframe, get the initial state
  key = model.key('squat').id
  mujoco.mj_resetDataKeyframe(model, data, key)
  spec = mjtState.mjSTATE_FULLPHYSICS
  nstate = mujoco.mj_stateSize(model, spec)
  state = np.empty(nstate)
  mujoco.mj_getState(model, data, state, spec)

  # Perform rollouts (sensors.shape == nbatch, nstep, nsensordata)
  states, sensors = rollout.rollout(model, data, state, control,
                                    control_spec=control_spec)

  # If requested, extract qpos into traj
  if traj is not None:
    assert states.shape[0] == 1
    # Skip the first element in state (mjData.time)
    traj.extend(np.split(states[0, :, 1:model.nq+1], nstep))

  # Scale torque sensors
  sensors[:, :, 3:] *= torque_scale

  # Reshape to stack the sensor values, transpose to column vectors
  sensors = sensors.reshape((sensors.shape[0], -1)).T

  # The normalizer keeps objective values similar when changing T or timestep.
  normalizer = 100 * model.opt.timestep / T
  return normalizer * sensors

Now let's give the rest of the problem definition:

  1. The trajectory is integrated for 0.7s.
  2. Torques are scaled by 0.003.
  3. Since our decision variable is two copies of mjData.ctrl, the bounds are two concatenated copies of mjData.actuator_ctrlrange.
  4. Our initial guess $x_0 = \begin{pmatrix} u_0 & u_T \end{pmatrix} = \begin{pmatrix} q_\textrm{squat} & q_\textrm{stand} \end{pmatrix}$ is the joint angles at the squatting position, followed by the angles at the default (standing position). We can use angles to initialize our controls because the position actuators have angle semantics.
python
T = 0.7               # Rollout length (seconds)
torque_scale = 0.003  # Scaling for the torques

# Bounds are the stacked control bounds.
lower = np.atleast_2d(model.actuator_ctrlrange[:,0]).T
upper = np.atleast_2d(model.actuator_ctrlrange[:,1]).T
bounds = [np.vstack((lower, lower)), np.vstack((upper, upper))]

# Initial guess is midpoint of the bounds
x0 = 0.5 * (bounds[1] + bounds[0])

Let's define a utility function for rendering frames and visualize the initial guess:

python
def render_solution(x, target):
  # Ask reach to save positions to traj.
  traj = []
  reach(x, target, T, torque_scale, traj=traj);

  frames = []
  counter = 0
  print('Rendering frames:', flush=True, end='')
  for qpos in traj:
    # Set positions, call mj_forward to update kinematics.
    data.qpos = qpos
    mujoco.mj_forward(model, data)

    # Render and save frames.
    camera.lookat = data.body('torso').subtree_com
    renderer.update_scene(data, camera)
    pixels = renderer.render()
    frames.append(pixels)
    counter += 1
    if counter % 10 == 0:
      print(f' {counter}', flush=True, end='')
  return frames

# Use default target.
target = data.mocap_pos[model.body('target').mocapid]

# Visualize the initial guess.
media.show_video(render_solution(x0, target))

Solutions to the reach task

Let's solve once for some target and look at the optimization printout:

python
target = (.4, -.3, 1.2)

reach_target = lambda x: reach(x, target, T, torque_scale, traj=None)

r0 = reach_target(x0)
print(f'The decision variable x has size {x0.size}')
print(f'The residual r(x) has size {r0.size}\n')

x, _ = minimize.least_squares(x0, reach_target, bounds);

Let's see what this solution looks like:

python
media.show_video(render_solution(x, target))

Let's rerun this for several target values and make a video of all of them:

python
targets = [(0.4, 0., 0.), (0.2, -1., 0.5), (-1., -.3, 1.), (0., -.2, 2.2)]

frames = []
for target in targets:
  res_target = lambda x: reach(x, target, T, torque_scale)
  print(f'Optimizing for target at {target}', flush=True)
  x, trace = minimize.least_squares(x0, res_target, bounds,
                                    verbose=minimize.Verbosity.FINAL)
  frames += render_solution(x, target)
  print('\n')

print('Making video', flush=True)
media.show_video(frames)

Non-quadratic norms

As explained in the background section, Least Squares can be generalized to norms other than the quadratic. We are now in a position to show how to define a non-quadratic norm, which is important in the estimation and system-identification contexts, where long-tailed, disturbance-rejecting distributions are proportional to the exponent of a non-quadratic function.

Let's say that we wish the task residual i.e., the vector from the hand to the target, to be evaluated with the "Smooth L2" function $c(r)$ which, for a given smoothing radius $d \gt 0$ is $$ c(r) = \sqrt{r^T\cdot r + d^2 } - d $$ This function is quadratic in a $d$-sized neighborhood of the origin, and then grows linearly thereafter, like the L2 norm. The first and second derivatives are $$ \begin{align} s&=\sqrt{r^T\cdot r + d^2 }\ g &= \tfrac{\partial c}{\partial r} = \frac{r}{s} \ H &=\tfrac{\partial^2 c}{\partial r^2} = \frac{I_{n_r} - g\cdot g^T}{s} \end{align} $$ There is no particularly good reason to use this norm for this optimization task, it is merely an example.

Let's read the documentation of the minimize.Norm class:

python
print(minimize.Norm.__doc__)

Our sensors are 3 r_pos residual values for the hand-to-object vector followed by 21 r_torque actuator torques, for a total of ns = 24 sensors. These are concatenated for the entire trajectory, leading to a residual of size 24*N, where N is the number of timesteps in a trajectory. After reshaping and slicing appropriately, the norm implementation looks as follows:

python
class SmoothL2(minimize.Norm):
  def __init__(self):
    self.n = model.nsensordata  # Equals 24.
    self.d = 0.1  # The smoothing radius (length).

  def value(self, r):
    rr = r.reshape((self.n, -1), order='F')
    r_pos = rr[:3, :]
    s = np.sqrt(np.sum(r_pos**2, axis=0) + self.d**2)
    y_pos = (s - self.d).sum()
    r_torque = rr[3:, :]
    y_torque = 0.5 * (r_torque.T**2).sum()
    return y_pos + y_torque

  def grad_hess(self, r, proj):
    rr = r.reshape((self.n, -1), order='F')
    r_pos = rr[:3, :]
    s = np.sqrt(np.sum(r_pos**2, axis=0) + self.d**2)
    g_pos = r_pos / s
    g_torque = rr[3:, :]
    g = np.vstack((g_pos, g_torque))
    grad = proj.T @ g.reshape((-1, 1), order='F')
    h_proj = proj.copy()  # norm Hessian * projection matrix
    for i in range(g_pos.shape[1]):
      h_i = (np.eye(3) - g_pos[:, i:i+1] @ g_pos[:, i:i+1].T) / s[i]
      j = self.n*i
      h_proj[j:j+3, :] = h_i @ proj[j:j+3, :]
    hess = proj.T @ h_proj
    return grad, hess

Before running the optimization, let's ask least_squares to check our norm implementation. We'll do this with a short trajectory simulation time T, to avoid creating huge matrices.

python
target = (.4, -.3, 1.2)

T_short = 0.02

reach_target = lambda x: reach(x, target, T=T_short,
                               torque_scale=torque_scale, traj=None)

x, _ = minimize.least_squares(x0, reach_target, bounds, norm=SmoothL2(),
                              max_iter=1, check_derivatives=True);

Now that we are confident in our implemetation, we can see what the solution looks like:

python
target = (.4, -.3, 1.2)

reach_target = lambda x: reach(x, target, T, torque_scale, traj=None)

x, _ = minimize.least_squares(x0, reach_target, bounds, norm=SmoothL2());

media.show_video(render_solution(x, target))