dev-docs/RFCs/v8.0/binary-support-improvements-rfc.md
This proposal is intended to be an intermediate step towards full Arrow (or generic binary data frame) support in v8.x.
The purpose is to introduce necessary user-facing API changes in the 8.0 release, that will be compatible with the eventual Arrow support.
During a layer update, there are three main bottle necks in terms of performance:
For performance-sensitive applications that deal with a large amount of data or high-frequency updates, it is typical to transfer binary format between the server and the client.
Before v7, once the data is downloaded, the application is still required to construct a JavaScript array from the binary, adding memor & CPU time tolls on the client side. In v7.x (Binary Data RFC), we added two ways to use binary data that completely circumvent this step, hence eliminating the performance hit in step 1.
index-based accessor:
// binaryData is packed on the server, in the format of [x, y, r, g, b, a]
const DATA = {src: binaryData, length: binaryData.length / 6};
new ScatterplotLayer({
data: DATA,
getPosition: (object, {index, data, target}) => {
target[0] = data.src[index * 6];
target[1] = data.src[index * 6 + 1];
target[2] = 0;
return target;
},
getColor: (object, {index, data, target}) => {
target[0] = data.src[index * 6 + 2] * 255;
target[1] = data.src[index * 6 + 3] * 255;
target[2] = data.src[index * 6 + 4] * 255;
target[3] = data.src[index * 6 + 5] * 255;
return target;
},
...
})
This method involves writing an accessor that reads relevant information out of the binary blob at each object index. During attribute update, an internal buffer is allocated, and the accessor is called numInstance times to populate it. Hence the performance is comparable to using plain JavaScript array data at step 2 and 3.
supplying external buffers directly:
// binaryData is packed on the server, in the format of [x, y, r, g, b, a]
const DATA = {
length: binaryData.length / 6,
attributes: {
instancePositions: {value: binaryData, size: 2, stride: 4 * 6, offset: 0},
instanceColors: {value: binaryData, size: 4, stride: 4 * 6, offset: 4 * 2, normalized: true}
}
};
new PointCloudLayer({
data: DATA,
getNormal: [0, 0, 0]
});
This method skips step 2 entirely, and is possibly the most performant we'll ever get. However, it requires the application (BE and FE) to have knowledge of the internal implementation of a layer, including attribute names, array types and layouts, which is not documented and prone to breakage between minor releases.
AttributeManager, by applying a pre-defined transform to the logical attribute(s). The Deck attribute is mapped 1:1 with a WebGL buffer.For example, for scatterplot positions, the logical attribute is a Float64Array in the format of x0, y0, z0, x1, y1, z1, .... The Deck attribute is an interleved Float32Array in the format of x0, y0, z0, x0Low, y0Low, z0Low, .... Two shader attributes are created from the Deck attribute: instancePositions and instancePositions64Low.
Example2: for polygon positions, the logical attribute is an array that flattens all polygon vertices. The Deck attribute include a normalized positions array (vertices may be added to close loops) and an indices array from triangulation.
Moving into v8.x, we want to make binary data a first-class citizen, in the following ways:
Allow data.attributes use an accessor name as the key that map to a loaders.gl-compatible "attribute descriptor" object.
We shall explain this in the binary data developer guide as follows:
Each key-value pair in data.attributes maps from an accessor prop name (e.g. getPosition) to one of the following formats:
Buffer instancebuffer (Buffer)value (TypedArray)size (Number) - the number of elements per vertex attribute.offset (Number) - offset of the first vertex attribute into the buffer, in bytesstride (Number) - the offset between the beginning of consecutive vertex attributes, in bytesThe value array represents a flat buffer that contains the value for each object that would otherwise be returned by a function accessor. For example, getPosition: d => [d.x, d.y, d.z] is equivalent to getPosition: {value: new Float64Array([x0, y0, z0, x1, y1, z1, ...])}. This buffer can be constructed by something like data.flatMap(d => [d.x, d.y, d.z]).
Additionally, data.startIndices must be specified if the attributes contain variable-width data, for example paths and polygons. startIndices is an array that contains the index of the first vertex in each object. For most layers, it is assumed to be [0, 1, 2, 3, ...] as in one vertex per object.
// EXAMPLE 1 - PointCloudLayer
/*
binaryData is packed on the server:
original data: [
{x, y, r, g, b}, // d0
{x, y, r, g, b}, // d1
...
]
binary data: {
positionsAndColors: [d0x, d0y, d0r, d0g, d0b, d0a, d1x, d1y, ...]
}
*/
new PointCloudLayer({
data: {
length: binaryData.length / 6,
attributes: {
getPosition: {value: binaryData, size: 2, stride: 24, offset: 0},
getColor: {value: binaryData, size: 4, stride: 24, offset: 8, normalized: true},
getNormal: {value: [0, 0, 0], constant: true}
}
}
});
// EXAMPLE 2 - PathLayer
/*
binaryData is packed on the server:
original data: [
{path: [[x, y], [x, y], [x, y]]}, // p0
{path: [[x, y], [x, y]]}, // p1
...
]
binary data: {
positions: [p00x, p00y, p01x, p01y, p02x, p02y, p10x, p10y, ...],
colors: [p00r, p00g, p00b, p01r, p01g, p01b, p02r, p02g, p02b, p10r, p10g, p10b, ...],
startingIndices: [0, 3, 5, ...]
}
*/
new PathLayer({
data: {
length: binaryData.positions.length / 2,
startIndices: binaryData.startingIndices,
attributes: {
getPath: {value: binaryData.positions, size: 2},
getColor: {value: binaryData.colors, size: 3}
}
}
});
Implementation notes:
data.attributes, the AttributeManager will pass it to the Attribute instance as a "logical attribute".The advantages of this new API include:
data.flatMap(accessor), where accessor is a function that already works in the traditional use case. If tesselation/transform is required, the Attribute class will do it under the hood, hiding the implementation detail from applications.Relevant RFCs for full data frame support:
Currenly, composite layers support index-based accessors, but not external attribute buffers. They will not support this new feature initially.
If the user opts to use an interleved buffer, it should not be uploaded to the GPU once for each attribute. This may require a global data manager (similar to TypedArrayManager) that creates, tracks and destroys GL resources.
This can be implemented as a performance improvement after the initial release.