Advanced plotting examples

Vaex uses matplotlib for creating plots, which allows for great flexibility. To avoid repetitive "boilerplate" code, Vaex tries to cover many use cases where you want to plot one or more panels using a simple declarative style.

The following examples make use of the example dataset, which is the result of a numerical simulation of how a galaxy like our own Milky Way was formed (source). The data contains the 3D position, velocity, angular momentum, energy and iron content for each star particle in the simulation.

Let us start by loading the data:

python

import vaex
import numpy as np
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')

python

df = vaex.example()
df.head()

A single plot

The simplest case is a single heatmap created by two axes, specified by the first two arguments:

python

df.viz.heatmap('x', 'y', title='Face on galaxy', limits='99%')
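
The limits='99%' argument used above asks for a plot range that covers most of the data while ignoring extreme outliers. A minimal numpy-only sketch of that idea follows, assuming that a percentage limit means the range, centred on the median, that contains roughly that fraction of the values. The helper central_limits is hypothetical and is not part of Vaex, which may compute its limits differently (for instance with approximate percentiles).

```python
import numpy as np


def central_limits(values, percentage=99.0):
    """Return (low, high) such that about `percentage` percent of `values`
    lie in between, with equal tails cut off on both sides.

    Hypothetical helper for illustration only, not a Vaex function.
    """
    tail = (100.0 - percentage) / 2.0
    low, high = np.percentile(values, [tail, 100.0 - tail])
    return float(low), float(high)


# Toy data: the integers 0..1000, so half a percent is cut from each end
demo_low, demo_high = central_limits(np.arange(1001), percentage=99.0)
```

Passing an explicit range such as limits=[-10, 10] avoids this computation altogether, at the cost of having to know a sensible range in advance.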

Multiple plots of the same type

The first argument can be a list of axes pairs. This produces multiple plots:

python

df.viz.heatmap([["x", "y"], ["x", "z"]], title="Face on and edge on", figsize=(10, 4), limits='99%');

Multiple plots, same axes, different statistics

If the what argument is a list, it will by default create multiple subplots:

python

df.viz.heatmap("x", "y", what=["count(*)", "mean(vx)", "correlation(vy,vz)"], 
               title="Different statistics", 
               figsize=(10, 5), limits='99%');
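
To make concrete what a statistic on a grid means, here is a small numpy-only sketch that computes the equivalent of "count(*)" and "mean(...)" per bin. It is an illustration under our own assumptions, not how Vaex computes these statistics internally; the function name binned_count_and_mean and its arguments are made up for this example.

```python
import numpy as np


def binned_count_and_mean(x, y, values, limits, shape):
    """Count points and average `values` in each cell of a 2d grid.

    limits is [(xmin, xmax), (ymin, ymax)] and shape is the number of bins
    per axis. Hypothetical helper for illustration, not a Vaex function.
    """
    bins = [np.linspace(low, high, shape + 1) for low, high in limits]
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    sums, _, _ = np.histogram2d(x, y, bins=bins, weights=values)
    # Empty bins give 0/0, which we deliberately leave as NaN
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return counts, means


# Three toy points: two fall in the first cell, one in the last
demo_counts, demo_means = binned_count_and_mean(
    x=[0.5, 0.5, 1.5],
    y=[0.5, 0.5, 1.5],
    values=[1.0, 3.0, 10.0],
    limits=[(0, 2), (0, 2)],
    shape=2,
)
```

Each entry of the what list yields one such grid, and each grid is drawn as its own subplot.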

Multiple plots, different axes, different statistics

One can specify multiple axes pairs as the first argument, as well as a list of what arguments. The resulting figure will have a number of subplots where the different axes combinations form the rows, and the different what statistics form the columns:

python

df.viz.heatmap([["x", "y"], ["x", "z"], ["y", "z"]],
               what=["count(*)", "mean(vx)", "correlation(vx,vy)", "correlation(vx,vz)"],
               title="Different statistics and plots", 
               figsize=(14,12), 
               limits='99%');

One can also specify the layout of the figure via the visual argument, which can be used to swap the row and column ordering of the subplots:

python

df.viz.heatmap([["x", "y"], ["x", "z"], ["y", "z"]],
               what=["count(*)", "mean(vx)", "correlation(vx,vy)", "correlation(vx,vz)"],
               visual=dict(row="what", column="subspace"),
               title="Different statistics and plots", 
               figsize=(14,12), 
               limits='99%');

Slices in a 3rd dimension

If a 3rd axis (z) is given, you can "slice" through the data, displaying the z slices as rows. Note that here the rows are wrapped, which can be changed with the wrap_columns argument:

python

df.viz.heatmap("Lz", "E", z="FeH:-3,-1,8", 
               visual=dict(row="z"), 
               figsize=(12, 8), 
               f="log", 
               wrap_columns=3, 
               limits='99%');
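
The z string in the call above appears to follow the pattern expression:min,max,shape, so "FeH:-3,-1,8" would mean 8 slices in FeH between -3 and -1. Under that assumption, the sketch below works out where the slice boundaries fall. The helper parse_z_spec is hypothetical and only mimics this reading of the string; it is not a Vaex function.

```python
import numpy as np


def parse_z_spec(spec):
    """Split an 'expression:min,max,shape' string into its parts.

    Assumed format, for illustration only.
    """
    expression, _, rest = spec.rpartition(":")
    zmin, zmax, shape = rest.split(",")
    return expression, float(zmin), float(zmax), int(shape)


z_expression, z_min, z_max, z_shape = parse_z_spec("FeH:-3,-1,8")

# 8 slices need 9 edges; each slice is (z_max - z_min) / z_shape wide
z_edges = np.linspace(z_min, z_max, z_shape + 1)
```

With wrap_columns=3, those 8 panels would be laid out three per row.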

Many plots with wrapping

If one attempts to create a figure with many subplots, they will be nicely wrapped. Here we create heatmaps of all combinations of columns in the example dataset, sorted by their mutual information:

python

# Get all column pairs
pairs = df.combinations(exclude=['id'])
# Calculate the mutual information for each pair, sorted by the largest value
mi, pairs_sorted = df.mutual_information(pairs, sort=True)

# Create the figure
df.viz.heatmap(pairs_sorted, f='log', colorbar=False, figsize=(14, 20), limits='99%', wrap_columns=5);
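
Mutual information measures how much knowing one column tells you about another; it is zero for independent columns and grows as they become more dependent. As a rough sketch of the quantity used for the sorting above, the function below evaluates the textbook definition on a 2d histogram of counts. It is not Vaex's implementation, and the name mutual_information_from_counts is made up for this example.

```python
import numpy as np


def mutual_information_from_counts(counts):
    """Mutual information (in nats) of a 2d histogram of counts.

    Uses I = sum p(x, y) * log(p(x, y) / (p(x) * p(y))), skipping empty bins.
    Hypothetical helper for illustration, not a Vaex function.
    """
    counts = np.asarray(counts, dtype=float)
    p_xy = counts / counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal over the second axis
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal over the first axis
    nonzero = p_xy > 0
    ratio = p_xy[nonzero] / (p_x * p_y)[nonzero]
    return float(np.sum(p_xy[nonzero] * np.log(ratio)))


# Perfectly dependent toy histogram: all counts on the diagonal
mi_dependent = mutual_information_from_counts(np.eye(4))

# Independent toy histogram: counts factorise into row and column parts
mi_independent = mutual_information_from_counts(np.outer([1, 2], [3, 4]))
```

Sorting by this value puts the most strongly related column pairs first, which is why the most structured panels appear at the top of the figure.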

Plotting selections

If the selection argument is used, then only the selection is plotted:

python

df.viz.heatmap("x", "y", selection="sqrt(x**2+y**2) < 5", limits=[-10, 10]);

If a list of selections is specified (False or None indicates no selection), then by default every selection forms a different "layer" of the resulting figure:

python

df.viz.heatmap("x", "y", 
               selection=[None, "sqrt(x**2+y**2) < 5", "(sqrt(x**2+y**2) < 7) & (x < 0)"], 
               limits=[-10, 10]);

Overplotting a vector field on a heatmap

Astronomers argue that galaxies such as our own Milky Way were formed from many pre-galactic clumps that have merged and mixed together. One way to try to find the original pre-galactic fragments is to inspect the 2-dimensional distribution of their energy (𝐸) and angular momentum (𝐿𝑧). So let us make such a plot:

python

df.viz.heatmap('Lz', 'E', f='log', figsize=(9, 6));

Now, to show that the stars in each clump on the figure above are indeed moving coherently in space, we can overplot their velocity vectors on a positional heatmap.

First, let's select the stars that belong to one of the clumps:

python

# specify ranges of angular momentum (Lz) and energy (E)
limits_Lz_E_clump = (1181.770, 1291.92), (-70850.91, -68491.16)

# Use the rectangle selection method
df.select_rectangle("Lz", "E", limits_Lz_E_clump, name="stream")

# Check how many stars we have selected
print(f'Selection contains {df.count(selection="stream")} "stars".')

We can also overplot the selected region, to convince ourselves that we have chosen a good region:

python

df.viz.heatmap("Lz", "E", selection=[None, "stream"], f="log", figsize=(9, 6));

Now let us plot the 𝑣𝑦 and 𝑣𝑧 velocity vectors on top of a 𝑦−𝑧 plot. To start, let's compute a grid of mean 𝑣𝑦 and 𝑣𝑧 velocities. Notice that we are limiting the range of the 𝑦 and 𝑧 positions used for binning to between -20 and 20, and the grid resolution is 32x32 bins:

python

limits = [-20, 20]
shape_vector = 32
mean_vy = df.mean("vy", binby=["y", "z"], limits=limits, shape=shape_vector, selection='stream')
mean_vz = df.mean("vz", binby=["y", "z"], limits=limits, shape=shape_vector, selection='stream')

Next, let us create a meshgrid to hold the centres of the bins:

python

# create 2d arrays which hold the centers of the bins
centers = np.linspace(*limits, shape_vector, endpoint=False) + (limits[1] - limits[0])/2./shape_vector
z, y = np.meshgrid(centers, centers)
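
To see what this computes, the same steps can be run on a deliberately tiny grid. The names prefixed with demo_ are ours and are only used here, so that the variables from the cell above are left untouched.

```python
import numpy as np

demo_limits = [-20, 20]
demo_shape = 4  # deliberately small; the cell above uses 32

# Left edges of the bins, shifted by half a bin width to get the centers
demo_bin_width = (demo_limits[1] - demo_limits[0]) / demo_shape
demo_centers = np.linspace(*demo_limits, demo_shape, endpoint=False) + demo_bin_width / 2
# demo_centers is [-15., -5., 5., 15.]

# With numpy's default indexing, the first output varies along axis 1
# and the second along axis 0
demo_z, demo_y = np.meshgrid(demo_centers, demo_centers)
```

The order z, y matters: a grid binned with binby=["y", "z"] has y along its first axis and z along its second, so the array that varies along axis 0 must be the one holding the y centers.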

To keep the plot "clean", we also do not want to visualize the velocities in bins with low number counts:

python

# we don't want to show bins with low number of counts
counts = df.count(binby=["y", "z"], limits=limits, shape=shape_vector, selection='stream')
mask = counts.flatten() > 10

Finally, we can plot a background density map of 𝑦 vs 𝑧, and then use plt.quiver to overplot the velocity vectors:

python

df.viz.heatmap("y", "z", limits=limits, f="log1p", figsize=(10, 9), selection=[None, "stream"], shape=128)

# overplot the mean velocity vectors
plt.quiver(y.flatten()[mask], 
           z.flatten()[mask], 
           mean_vy.flatten()[mask], 
           mean_vz.flatten()[mask], 
           color="white", 
           alpha=0.75);

We indeed see that the stars we selected move together, and form a stream!

Plotting a healpix map

Healpix is made available via the healpy package. Vaex does not need special support for healpix, but some helper functions are introduced to make working with healpix easier.

Make sure you have healpy installed. If you do not, you can install it with one of these commands:

!pip install healpy  # if you prefer pip
!conda install -c conda-forge healpy  # if you are using the conda package manager

To understand this better, we will start from the beginning. If we want to make a density sky plot, we need to pass healpy a 1d numpy array in which each value represents the density at one location on the sphere. The array size is set by the healpix level, and the offset into the array determines the location.

This example uses a simulated Gaia dataset. The Gaia data includes the healpix index encoded in the source_id column. By dividing source_id by 34359738368 you get a healpix index at level 12, and dividing it further will take you to lower levels.
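
The arithmetic can be sketched in plain Python. The snippet assumes integer division and relies on two standard healpix properties: a map at a given level has 12 * 4**level pixels, and going down one level merges four pixels into one. The function names and the source_id value are made up for illustration.

```python
GAIA_HEALPIX_FACTOR = 34359738368  # 2**35, the divisor quoted above
GAIA_HEALPIX_LEVEL = 12            # level of the index stored in source_id


def healpix_index(source_id, level):
    """Healpix index at `level` for a Gaia-style source_id.

    Hypothetical helper for illustration only.
    """
    if not 0 <= level <= GAIA_HEALPIX_LEVEL:
        raise ValueError("level must be between 0 and 12")
    # Each step down in level divides the index by a further factor of 4
    return source_id // (GAIA_HEALPIX_FACTOR * 4 ** (GAIA_HEALPIX_LEVEL - level))


def number_of_pixels(level):
    """Number of healpix pixels on the whole sphere at `level`."""
    return 12 * 4 ** level


# A made-up source_id that sits in the very last level-12 pixel
example_source_id = (number_of_pixels(12) - 1) * GAIA_HEALPIX_FACTOR + 7
example_level12 = healpix_index(example_source_id, 12)
example_level2 = healpix_index(example_source_id, 2)
```

The last level-12 pixel maps onto the last level-2 pixel, so an array of length number_of_pixels(level) is always large enough to hold the counts for that level.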

Let us start by fetching the dataset (Note: the dataset is ~700MB on disk).

python

import healpy as hp

python

df = vaex.datasets.tgas(full=True)
df.head()

Let's plot a healpix figure of level 2. We can start by counting the number of stars in each healpix region:

python

level = 2
factor = 34359738368 * (4**(12-level))
nmax = hp.nside2npix(2**level)
counts = df.count(binby="source_id/" + str(factor), limits=[0, nmax], shape=nmax)
counts

Using the healpy package, we can plot this in a Mollweide projection:

python

hp.mollview(counts, nest=True);

To avoid typing the above code all over again, we can use the df.healpix_count method instead:

python

counts = df.healpix_count(healpix_level=6)
hp.mollview(counts, nest=True)

Instead of using healpy, we can use Vaex's df.viz.healpix_heatmap method:

python

df.viz.healpix_heatmap(f="log1p", healpix_level=6, figsize=(10,8), healpix_output="ecliptic")