dev-docs/RFCs/v6.0/gpu-screengrid-aggregation-rfc.md
Given a set of points, we need to render a grid on the screen where the color of each cell is a linear function of the number of points that fall into it. We first project the points to the screen and then aggregate them by the cell they fall into.

In the existing `ScreenGridLayer`, projection and aggregation are performed on the CPU: each point is processed sequentially, and the resulting data is set as an attribute to render the grid cells with the correct color.

The current solution can be optimized by performing projection and aggregation on the GPU. Multiple points can be processed in parallel, and the aggregation result can stay in GPU memory to be consumed by the rendering cycle of the grid cells. This approach is faster than performing the work sequentially on the CPU and then transferring the data to the GPU.
A new class `GPUGridAggregator` will be added that takes points data and grid dimensions as input and performs aggregation. Aggregation can be requested on the GPU or the CPU; when requested on the GPU and the current browser supports all required WebGL features, aggregation is performed on the GPU, otherwise it falls back to the CPU. The result of the aggregation is a set of WebGL `Buffer` objects, which can be set as attributes or uniforms (uniform buffer objects).
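The GPU/CPU decision could be driven by a capability check along the following lines (a minimal sketch; the function name is hypothetical and a real implementation may probe additional capabilities). In WebGL2, rendering to a float (R32F) render target requires the `EXT_color_buffer_float` extension:

```javascript
// Hypothetical helper: decide whether GPU aggregation can run.
// Rendering to a float (R32F) render target in WebGL2 requires the
// EXT_color_buffer_float extension; without it, fall back to the CPU path.
function canUseGPUAggregation(gl) {
  if (!gl || typeof gl.getExtension !== 'function') {
    return false;
  }
  return Boolean(gl.getExtension('EXT_color_buffer_float'));
}
```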
The `GPUGridAggregator` constructor takes a WebGL context and constructs an object.
The `run` method performs aggregation either on the CPU or the GPU, based on the provided options and the browser's WebGL capabilities.

Performing aggregation on the CPU:
```js
const opts = {
  positions, // (lng, lat) array
  weights,
  cellSize,
  viewport,
  useGPU: false
};
const sa = new GPUGridAggregator(gl);
const result = sa.run(opts);
// result contains Buffer objects with aggregated data
```
Performing aggregation on the GPU and reading back the results:
```js
const opts = {
  positions, // (lng, lat) array
  weights,
  cellSize,
  viewport,
  useGPU: true
};
const sa = new GPUGridAggregator(gl);
const result = sa.run(opts);
// result contains Buffer objects with aggregated data
```
This matches the current implementation: each point is first projected using the current viewport; then, depending on which cell the point belongs to, the corresponding counter and weight values are updated. This data is used to create the `Buffer` objects that are returned.
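For reference, the CPU path boils down to a loop like the following (an illustrative sketch, not the actual deck.gl code: it assumes positions have already been projected to pixel coordinates, and all names are hypothetical):

```javascript
// Simplified CPU aggregation: bin pre-projected pixel positions into a
// grid and accumulate a count and a weight per cell.
function aggregateOnCPU({pixelPositions, weights, cellSize, width, height}) {
  const gridWidth = Math.ceil(width / cellSize);
  const gridHeight = Math.ceil(height / cellSize);
  // Two values per cell: [count, totalWeight]
  const cells = new Float32Array(gridWidth * gridHeight * 2);
  for (let i = 0; i < pixelPositions.length; i += 2) {
    const col = Math.floor(pixelPositions[i] / cellSize);
    const row = Math.floor(pixelPositions[i + 1] / cellSize);
    if (col < 0 || col >= gridWidth || row < 0 || row >= gridHeight) {
      continue; // point falls outside the viewport
    }
    const cell = (row * gridWidth + col) * 2;
    cells[cell] += 1;                  // count
    cells[cell + 1] += weights[i / 2]; // accumulated weight
  }
  return {cells, gridWidth, gridHeight};
}
```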
Aggregation is implemented by rendering all points to a framebuffer using custom shaders and additive blending. The resulting textures are then asynchronously read into the provided `Buffer` objects (without forcing a CPU and GPU sync) and returned.
A float texture with R32F format and size equal to the grid is created. Each cell is represented by one pixel in this texture. The blend equation is set to `gl.FUNC_ADD`, so every time a pixel at a given location is rendered, the existing pixel color and the color currently being rendered are added and stored at that pixel. The provided `positions` and `weights` are set as attributes, and grid parameters such as `cellSize` and `gridSize` are set as uniforms. The projection matrix is retrieved from the `viewport` object and also set as a uniform. In the vertex shader we first perform the viewport projection on the point position, then convert it to grid coordinates. Finally the position is moved to the pixel center to guarantee that the correct pixel is rendered.
```glsl
vec2 windowPos = project_position(positions);
windowPos = project_to_pixel(windowPos);

// Transform (0,0):windowSize -> (0,0):gridSize
vec2 pos = floor(windowPos / cellSize);

// Transform (0,0):gridSize -> (-1,-1):(1,1)
pos = (pos * 2. / gridSize) - vec2(1., 1.);

// Move to the pixel center; pixel size in clip space is 2./gridSize,
// so the half-pixel offset is 1./gridSize
vec2 offset = 1.0 / gridSize;
pos = pos + offset;

gl_Position = vec4(pos, 0.0, 1.0);
```
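The same cell-to-clip-space mapping can be written out in plain JavaScript (a hypothetical helper, useful for unit-testing the shader math):

```javascript
// Mirror of the vertex-shader math: map a pixel position to the
// clip-space center of the grid cell that pixel falls into.
// Helper name is illustrative, not part of the actual API.
function cellClipCenter(windowPos, cellSize, gridSize) {
  // (0,0):windowSize -> integer cell coordinates (0,0):gridSize
  const cellX = Math.floor(windowPos[0] / cellSize);
  const cellY = Math.floor(windowPos[1] / cellSize);
  // (0,0):gridSize -> clip space (-1,-1):(1,1), then shift by half a
  // pixel ((2 / gridSize) * 0.5 = 1 / gridSize) to land on the center
  return [
    (cellX * 2) / gridSize[0] - 1 + 1 / gridSize[0],
    (cellY * 2) / gridSize[1] - 1 + 1 / gridSize[1]
  ];
}
```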
In the fragment shader we output `1.0` to the red channel and the weight of the point (passed as a varying from the vertex shader) to the green channel.
```glsl
gl_FragColor = vec4(1., vWeights, 0., 0.);
```
Since the blend equation is `gl.FUNC_ADD`, after rendering all points each pixel's red channel contains the total number of points, and its green channel contains the total weight of all points accumulated into the corresponding grid cell.
To find the maximum weight of any cell, as well as the total point count and total weight, we perform another rendering step. In this step a single point is rendered for each grid cell; the per-cell values are obtained by sampling the render-target texture from the step above and passed to the fragment shader. The render-target format is set to an RGBA32F texture, and `blendEquationSeparate` is used to set `gl.FUNC_ADD` for the RGB channels and `gl.MAX` for the alpha channel. At the end of this rendering step the red channel contains the total point count, the green channel the total weight, and the alpha channel the maximum weight. From these values we can also calculate the average weight per grid cell.
The vertex shader maps every vertex to the same location and passes the texture coordinates down to the fragment shader.
```glsl
attribute vec2 positions;
attribute vec2 texCoords;
varying vec2 vTextureCoord;

void main(void) {
  // Map every vertex to the same pixel
  gl_Position = vec4(-1.0, -1.0, 0.0, 1.0);
  vTextureCoord = texCoords;
}
```
The fragment shader obtains each grid cell's values by sampling the texture generated in the first step. It writes the cell's count to the red channel and the cell's weight to both the green and alpha channels; with `gl.FUNC_ADD` on the RGB channels and `gl.MAX` on alpha, blending then accumulates the total count, total weight and maximum weight.
```glsl
varying vec2 vTextureCoord;
uniform sampler2D uSampler;

void main(void) {
  vec4 textureColor = texture2D(uSampler, vec2(vTextureCoord.s, vTextureCoord.t));
  // After blending -> Red: total count, Green: total weight, Alpha: maximum weight
  gl_FragColor = vec4(textureColor.r, textureColor.g, 0., textureColor.g);
}
```
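The CPU equivalent of this second pass is a simple reduction over the per-cell `[count, weight]` pairs (an illustrative sketch; names are hypothetical):

```javascript
// Reduce per-cell [count, weight] pairs to the totals the second render
// pass produces: total count, total weight and maximum per-cell weight.
function reduceCells(cells) {
  let totalCount = 0;
  let totalWeight = 0;
  let maxWeight = 0; // corresponds to gl.MAX blending on the alpha channel
  for (let i = 0; i < cells.length; i += 2) {
    totalCount += cells[i];      // gl.FUNC_ADD on the red channel
    totalWeight += cells[i + 1]; // gl.FUNC_ADD on the green channel
    maxWeight = Math.max(maxWeight, cells[i + 1]);
  }
  return {totalCount, totalWeight, maxWeight};
}
```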
The `GPUGridAggregator` class returns aggregation data in `Buffer` objects:

- `countsBuffer`: contains the count and weight of each grid cell; it can be set as an attribute on the layer's vertex shader.
- `maxCountBuffer`: contains the total count, total weight and maximum weight values. These values don't change per grid cell, so this `Buffer` can be set as a Uniform Buffer Object.

In the layer's vertex shader, this data can be used to determine the color of a cell, representing the amount of data aggregated into it.
The following unit tests will be written to verify the aggregation results, including the count and weight for each cell, the maximum weight across all cells, and the total number of points aggregated:

- A known set of input points is aggregated on the GPU and the results are verified.
- A large random set of input points is aggregated once on the CPU and once on the GPU, and the results are compared for equality.
A new layer, `GPUScreenGridLayer`, will be added, along with a new render test case taking the same set of parameters as the existing `ScreenGridLayer`. The output of the new layer will be compared against the existing core layer's golden image.
100 thousand random points:

- CPU 100K: 8.26 iterations/s
- GPU 100K: 32.3 iterations/s

1 million random points:

- CPU 1M: 1.03 iterations/s
- GPU 1M: 4.76 iterations/s
1 million points:

- GPU: 50-53 FPS
- CPU: 2-3 FPS

5 million points:

- GPU: 30-38 FPS
- CPU: 0-1 FPS

10 million points:

- GPU: 15-20 FPS
- CPU: UI is completely blocked and doesn't respond
The numbers above show that GPU aggregation is roughly 4x faster in the standalone aggregation benchmark and around 25 times faster in end-to-end rendering.
Existing GPGPU frameworks such as gpu.js and deeplearnjs.org provide interfaces to perform basic vector operations on arrays and matrices. Such operations are performed on the GPU using float textures and shaders. The aggregation problem this RFC addresses requires customization of the entire GPU pipeline (operations in camera or screen space, multiple render passes, custom blending, and custom vertex and fragment shaders), which is not addressable by the above-mentioned frameworks.
Picking information is not available. CPU aggregation maintains a list of all aggregated points in a JS object, accessible by instance index; this data can't be saved with GPU aggregation. Alternatively, with additional texture reads (based on the current mouse position, or [lng, lat]) we can provide the aggregated value and count as picking information, i.e. when the user hovers over a grid cell, the picking information can only contain the aggregated value and count of that cell. To match the existing picking information, a CPU aggregation would have to be performed; we could provide this information on demand, for example when the user clicks.
`GPUGridAggregator` returns its results in the form of Buffers: one `Buffer` is used as an attribute to provide vertex data, and another is used as a Uniform Buffer Object (UBO) to provide uniforms. UBOs are supported only in Shading Language version 3.00, so layer shaders would have to be upgraded to this version. Once that is done, existing shader modules (such as `picking` and `projection`) cannot be used, due to incompatibility between the 1.00 and 3.00 Shading Language versions (https://github.com/visgl/deck.gl/issues/1592). This is not a limitation if we make cell color not depend on maxCount/maxWeight values (by using `colorRange` and `colorDomain` instead of `minColor` and `maxColor`).
When this approach is extended to the Grid and Hexagon layers, props that require first sorting all bins by an aggregated value (such as percentile-based filtering) can't be supported. For GPU aggregation we would have to fix these values to 0 and 100 respectively.
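For illustration, here is the kind of CPU-side work such percentile props imply: all bin values must be sorted before the lower and upper bounds can be resolved, which has no cheap counterpart in the blending-based GPU pipeline (a hypothetical sketch, not the actual implementation):

```javascript
// Resolve a [lower, upper] value range from percentile bounds.
// Requires a full sort of all bin values, which is why such props
// cannot be supported by additive-blending GPU aggregation alone.
function percentileRange(binValues, lowerPercentile, upperPercentile) {
  const sorted = [...binValues].sort((a, b) => a - b);
  const lowerIndex = Math.floor(((sorted.length - 1) * lowerPercentile) / 100);
  const upperIndex = Math.ceil(((sorted.length - 1) * upperPercentile) / 100);
  // Bins whose value falls outside this range would be filtered out
  return [sorted[lowerIndex], sorted[upperIndex]];
}
```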