docs/source/guides/advanced_plotting.ipynb
If you want to try out this notebook with a live Python kernel, use mybinder:
<a class="reference external image-reference" href="https://mybinder.org/v2/gh/vaexio/vaex/latest?filepath=docs%2Fsource%2Fexample_advanced_plotting.ipynb"></a>
Vaex uses matplotlib for creating plots, which allows for great flexibility. To avoid repetitive "boilerplate" code, Vaex tries to cover many use cases where you want to plot one or more panels using a simple declarative style.
The following examples use the example dataset, which is the result of a numerical simulation of how a galaxy like our own Milky Way was formed (source). The data contains the 3D position, velocity, angular momentum, energy and iron content for each star particle in the simulation.
Let us start by loading the data:
import vaex
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
df = vaex.example()
df.head()
The simplest case is a single heatmap created by two axes, specified by the first two arguments:
df.viz.heatmap('x', 'y', title='Face on galaxy', limits='99%')
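As a rough mental model of `limits='99%'`, the NumPy sketch below computes a range that contains the central 99% of a column's values. It assumes the excluded 1% is split evenly between the two tails, which may differ in detail from how Vaex computes it. `central_range` is a hypothetical helper, not part of the Vaex API, and the data is synthetic.

```python
import numpy as np

def central_range(values, percentage=99.0):
    """Return (low, high) enclosing the central `percentage` of `values`.

    Hypothetical helper for illustration; not part of the Vaex API.
    """
    tail = (100.0 - percentage) / 2.0
    low, high = np.percentile(values, [tail, 100.0 - tail])
    return float(low), float(high)

rng = np.random.default_rng(42)
x = rng.normal(size=100_000)  # synthetic stand-in for a column such as df.x

low, high = central_range(x, 99.0)
inside = np.mean((x >= low) & (x <= high))  # fraction of values inside the range
```

Clipping the axes this way keeps a handful of extreme outliers from stretching the plotted range.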
The first argument can be a list of axes pairs. This produces multiple plots:
df.viz.heatmap([["x", "y"], ["x", "z"]], title="Face on and edge on", figsize=(10, 4), limits='99%');
If the what argument is a list, it will by default create multiple subplots:
df.viz.heatmap("x", "y", what=["count(*)", "mean(vx)", "correlation(vy,vz)"],
title="Different statistics",
figsize=(10, 5), limits='99%');
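To make the statistics in `what` more concrete, here is a minimal NumPy sketch of two of them, `count(*)` and `mean(vx)`, evaluated on a regular 2D grid. It uses synthetic data and `np.histogram2d`; it illustrates the idea of binned statistics and is not how Vaex computes them internally.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)
y = rng.normal(size=n)
vx = 2.0 * x + rng.normal(size=n)  # synthetic velocity, correlated with x

bins = 16
extent = [[-3, 3], [-3, 3]]

# count(*): number of rows that fall in each (x, y) bin
counts, xedges, yedges = np.histogram2d(x, y, bins=bins, range=extent)

# mean(vx): sum of vx per bin divided by the number of rows per bin
sum_vx, _, _ = np.histogram2d(x, y, bins=bins, range=extent, weights=vx)
with np.errstate(invalid="ignore", divide="ignore"):
    mean_vx = sum_vx / counts  # NaN where a bin is empty
```

Each resulting 2D array corresponds to one panel: the grid is the same, only the aggregated quantity changes.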
One can specify multiple axes pairs as the first argument, as well as a list of what arguments. The resulting figure will have a number of subplots where the different axes combinations form the rows, and the different what statistics form the columns:
df.viz.heatmap([["x", "y"], ["x", "z"], ["y", "z"]],
what=["count(*)", "mean(vx)", "correlation(vx,vy)", "correlation(vx,vz)"],
title="Different statistics and plots",
figsize=(14,12),
limits='99%');
One can also specify the layout of the figure via the visual argument, which can be used to swap the row and column ordering of the subplots:
df.viz.heatmap([["x", "y"], ["x", "z"], ["y", "z"]],
what=["count(*)", "mean(vx)", "correlation(vx,vy)", "correlation(vx,vz)"],
visual=dict(row="what", column="subspace"),
title="Different statistics and plots",
figsize=(14,12),
limits='99%');
If a 3rd axis (z) is given, you can "slice" through the data, displaying the z slices as rows. Note that here the rows are wrapped, which can be changed with the wrap_columns argument:
df.viz.heatmap("Lz", "E", z="FeH:-3,-1,8",
visual=dict(row="z"),
figsize=(12, 8),
f="log",
wrap_columns=3,
limits='99%');
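The slicing and wrapping arithmetic can be sketched in plain Python. This assumes that the string `"FeH:-3,-1,8"` is read as `expression:min,max,number_of_slices` with equal-width slices. `parse_slices` and `grid_shape` are hypothetical helpers written for this illustration, not Vaex functions.

```python
import math

def parse_slices(spec):
    """Split 'expr:min,max,n' into the expression and n equal-width (low, high) intervals."""
    expression, _, rest = spec.partition(":")
    low, high, n = rest.split(",")
    low, high, n = float(low), float(high), int(n)
    width = (high - low) / n
    edges = [low + i * width for i in range(n + 1)]
    return expression, list(zip(edges[:-1], edges[1:]))

def grid_shape(n_panels, wrap_columns):
    """Number of (rows, columns) needed when wrapping after `wrap_columns` panels."""
    columns = min(n_panels, wrap_columns)
    rows = math.ceil(n_panels / wrap_columns)
    return rows, columns

expression, intervals = parse_slices("FeH:-3,-1,8")
rows, columns = grid_shape(len(intervals), wrap_columns=3)
```

Under these assumptions, eight slices wrapped after three panels need a 3 by 3 grid, with the last cell left empty.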
If you attempt to create a figure with many subplots, they will be wrapped nicely. Here we create heatmaps of all combinations of columns in the example dataset, sorted by their mutual information:
# Get all column pairs
pairs = df.combinations(exclude=['id'])
# Calculate the mutual information for each pair, sorted by the largest value
mi, pairs_sorted = df.mutual_information(pairs, sort=True)
# Create the figure
df.viz.heatmap(pairs_sorted, f='log', colorbar=False, figsize=(14, 20), limits='99%', wrap_columns=5);
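For readers unfamiliar with mutual information, the following sketch estimates it from a 2D histogram with NumPy. It is a simple plug-in estimator on synthetic data, shown only to illustrate the quantity used for sorting; the `mutual_information` function here is a stand-alone illustration and may differ from Vaex's implementation in binning and units.

```python
import numpy as np

def mutual_information(x, y, bins=32):
    """Estimate mutual information (in nats) from a 2D histogram. Illustrative helper."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = counts / counts.sum()            # joint probability per bin
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal over y, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal over x, shape (1, bins)
    nonzero = p_xy > 0                      # skip empty bins to avoid log(0)
    ratio = p_xy[nonzero] / (p_x @ p_y)[nonzero]
    return float(np.sum(p_xy[nonzero] * np.log(ratio)))

rng = np.random.default_rng(1)
a = rng.normal(size=20_000)
b = a + 0.1 * rng.normal(size=20_000)  # strongly dependent on a
c = rng.normal(size=20_000)            # independent of a

mi_dependent = mutual_information(a, b)
mi_independent = mutual_information(a, c)
```

Pairs with a high value show strong structure in their joint distribution, which is why sorting by it tends to put the most interesting panels first.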
If the selection argument is used, then only the selection is plotted:
df.viz.heatmap("x", "y", selection="sqrt(x**2+y**2) < 5", limits=[-10, 10]);
If a list of selections is specified (False or None indicates no selection), then every selection by default forms a different "layer" of the figure produced:
df.viz.heatmap("x", "y",
selection=[None, "sqrt(x**2+y**2) < 5", "(sqrt(x**2+y**2) < 7) & (x < 0)"],
limits=[-10, 10]);
Astronomers argue that galaxies such as our own Milky Way were formed from many pre-galactic clumps that have merged and mixed together. One way to try to find the original pre-galactic fragments is to inspect the 2-dimensional distribution of their energy ($E$) and angular momentum ($L_z$). So let us make such a plot:
df.viz.heatmap('Lz', 'E', f='log', figsize=(9, 6));
Now, to show that the stars in each clump on the figure above are indeed moving coherently in space, we can overplot their velocity vectors on a positional heatmap.
First, let's select the stars that belong to one of the clumps:
# specify ranges of angular momentum (Lz) and energy (E)
limits_Lz_E_clump = (1181.770, 1291.92), (-70850.91, -68491.16)
# Use the rectangle selection method
df.select_rectangle("Lz", "E", limits_Lz_E_clump, name="stream")
# Check how many stars we have selected
print(f'Selection contains {df.count(selection="stream")} "stars".')
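Conceptually, a rectangle selection is a boolean mask built from one range test per axis. The NumPy sketch below shows this on synthetic data, reusing the limits from above. Treating both bounds as inclusive is an assumption made for this illustration and may not match Vaex exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
Lz = rng.uniform(-3000, 3000, size=10_000)       # synthetic angular momentum
E = rng.uniform(-150_000, -50_000, size=10_000)  # synthetic energy

(lz_min, lz_max), (e_min, e_max) = (1181.770, 1291.92), (-70850.91, -68491.16)

# A rectangle selection is the logical AND of one range test per axis
in_rectangle = (Lz >= lz_min) & (Lz <= lz_max) & (E >= e_min) & (E <= e_max)
n_selected = int(in_rectangle.sum())
```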
We can also overplot the selected region, to convince ourselves that we have chosen a good region:
df.viz.heatmap("Lz", "E", selection=[None, "stream"], f="log", figsize=(9, 6));
Now let us plot the $v_y$ and $v_z$ velocity vectors on top of a $y$-$z$ plot. To start, let's compute a grid of mean $v_y$ and $v_z$ velocities. Notice that we are limiting the range of $y$ and $z$ to between -20 and 20, and the grid resolution is 32x32 bins:
limits = [-20, 20]
shape_vector = 32
mean_vy = df.mean("vy", binby=["y", "z"], limits=limits, shape=shape_vector, selection='stream')
mean_vz = df.mean("vz", binby=["y", "z"], limits=limits, shape=shape_vector, selection='stream')
Next, let us create a meshgrid to hold the centres of the bins:
# create 2d arrays which hold the centers of the bins
centers = np.linspace(*limits, shape_vector, endpoint=False) + (limits[1] - limits[0])/2./shape_vector
z, y = np.meshgrid(centers, centers)
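The bin-center expression can be checked in isolation. The sketch below repeats it with the same `limits` and `shape_vector`, compares it with the midpoints of consecutive bin edges, and confirms which array varies along which axis of the meshgrid.

```python
import numpy as np

limits = [-20, 20]
shape_vector = 32

# Same idea as above: left bin edges plus half a bin width
bin_width = (limits[1] - limits[0]) / shape_vector
centers = np.linspace(*limits, shape_vector, endpoint=False) + bin_width / 2

# Equivalent formulation: midpoints of consecutive bin edges
edges = np.linspace(limits[0], limits[1], shape_vector + 1)
midpoints = 0.5 * (edges[:-1] + edges[1:])

# With the default indexing, y varies along axis 0 and z along axis 1,
# which matches the axis order of a grid binned by ["y", "z"]
z, y = np.meshgrid(centers, centers)
```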
To keep the plot "clean", we also do not want to visualize the velocities in bins with low number counts:
# we don't want to show bins with low number of counts
counts = df.count(binby=["y", "z"], limits=limits, shape=shape_vector, selection='stream')
mask = counts.flatten() > 10
Finally, we can plot a background density map of $y$ vs $z$, and then use plt.quiver to overplot the velocity vectors:
df.viz.heatmap("y", "z", limits=limits, f="log1p", figsize=(10, 9), selection=[None, "stream"], shape=128)
# overplot the mean velocity vectors
plt.quiver(y.flatten()[mask],
z.flatten()[mask],
mean_vy.flatten()[mask],
mean_vz.flatten()[mask],
color="white",
alpha=0.75);
We indeed see that the stars we selected move together, and form a stream!
Healpix is made available via the healpy package. Vaex does not need special support for healpix, but some helper functions are introduced to make working with healpix easier.
Make sure you have healpy installed. If you do not, you can install it with one of these commands:
!pip install healpy # if you prefer pip
!conda install -c conda-forge healpy # if you are using the conda package manager
To understand this better, we will start from the beginning. If we want to make a density sky plot, we would like to pass to healpy a 1d numpy array where each value represents the density at a location on the sphere, where the location is determined by the array size (the healpix level) and the offset (the location).
This example uses a simulated Gaia dataset. The Gaia data includes the healpix index encoded in the source_id column. By dividing source_id by 34359738368 you get a healpix index of level 12, and dividing it further will take you to lower levels.
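The division described above can be written out as a small function. This is a sketch in plain Python: `healpix_index` is a hypothetical helper, the example id is made up, and the factor of 4 per level follows from each healpix pixel splitting into four children at the next level.

```python
GAIA_LEVEL12_DIVISOR = 34359738368  # equals 2**35

def healpix_index(source_id, level):
    """Nested healpix index at `level` (0..12) from a Gaia-style source_id.

    Hypothetical helper for illustration; not part of Vaex or healpy.
    """
    if not 0 <= level <= 12:
        raise ValueError("level must be between 0 and 12")
    index_level12 = source_id // GAIA_LEVEL12_DIVISOR
    # Each step to a lower level merges 4 child pixels into 1 parent pixel
    return index_level12 // 4 ** (12 - level)

# Made-up id for illustration, not a real Gaia source
example_id = 1234567 * GAIA_LEVEL12_DIVISOR + 42
index_12 = healpix_index(example_id, 12)
index_11 = healpix_index(example_id, 11)
index_2 = healpix_index(example_id, 2)
```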
Let us start by fetching the dataset (Note: the dataset is ~700MB on disk).
import healpy as hp
df = vaex.datasets.tgas(full=True)
df.head()
Let's plot a healpix figure of level 2. We can start by counting the number of stars in each healpix region:
level = 2
factor = 34359738368 * (4**(12-level))
nmax = hp.nside2npix(2**level)
counts = df.count(binby="source_id/" + str(factor), limits=[0, nmax], shape=nmax)
counts
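The same counting can be reproduced without Vaex or healpy, which may help clarify what the array contains. The sketch below uses synthetic ids packed like `source_id`, relies on the fact that a healpix map has 12 * nside**2 pixels, and counts with `np.bincount` on integer-divided ids instead of binning a floating-point expression.

```python
import numpy as np

level = 2
nside = 2 ** level
npix = 12 * nside ** 2  # number of pixels of a healpix map with this nside
factor = 34359738368 * 4 ** (12 - level)

rng = np.random.default_rng(3)
# Synthetic ids: random level-12 pixel indices packed the same way as source_id
pixel12 = rng.integers(0, 12 * 4 ** 12, size=100_000)
source_id = pixel12 * 34359738368 + rng.integers(0, 34359738368, size=100_000)

pixel = source_id // factor                  # healpix index at the requested level
counts = np.bincount(pixel, minlength=npix)  # number of ids per pixel
```

The result is a 1d array with one entry per pixel, which is the form healpy expects for plotting.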
Using the healpy package, we can plot this in a Mollweide projection:
hp.mollview(counts, nest=True);
To avoid typing the above code all over again, we can use the df.healpix_count method instead:
counts = df.healpix_count(healpix_level=6)
hp.mollview(counts, nest=True)
Instead of using healpy, we can use Vaex's df.viz.healpix_heatmap method:
df.viz.healpix_heatmap(f="log1p", healpix_level=6, figsize=(10,8), healpix_output="ecliptic")