docs/sql-ref-geospatial-types.md
Spark SQL supports GEOMETRY and GEOGRAPHY types for spatial data, as defined in the Open Geospatial Consortium (OGC) Simple Feature Access specification. At runtime, values are represented as Well-Known Binary (WKB) and are associated with a Spatial Reference Identifier (SRID) that defines the coordinate system. How values are persisted is determined by each data source.
| Type | Coordinate system | Typical use and notes |
|---|---|---|
| GEOMETRY | Cartesian (planar) | Projected or local coordinates; planar calculations. Represents points, lines, polygons in a flat coordinate system. Suitable for Web Mercator (SRID 3857), UTM, or local grids (e.g. engineering/CAD). Default SRID in Spark is 4326. |
| GEOGRAPHY | Geographic (latitude/longitude) | Earth-based data; distances and areas on the sphere/ellipsoid. Coordinates in longitude and latitude (degrees). Edge interpolation is always SPHERICAL. Default SRID is 4326 (WGS 84). |
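Choose GEOMETRY when the data is in projected or local coordinates (for example Web Mercator, UTM, or a local grid) and planar calculations are appropriate.
Choose GEOGRAPHY when the data is Earth-based longitude/latitude and distances or areas should be computed on the sphere/ellipsoid.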
Using the wrong type can give misleading results: for example, the shortest path between London and New York on a sphere crosses Canada, whereas a planar GEOMETRY may suggest a path that does not.
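To make the London to New York example concrete, here is a minimal Python sketch (standard library only, independent of Spark) that compares the midpoint of the great-circle path on a unit sphere with the midpoint obtained by treating longitude and latitude as planar coordinates. The city coordinates are approximate, and the helper names (`to_unit_vector`, `great_circle_midpoint`, `planar_midpoint`) are illustrative rather than part of any Spark API.

```python
import math

def to_unit_vector(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to a 3D unit vector."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def great_circle_midpoint(p, q):
    """Midpoint of the shortest path on a sphere; p and q are (lon, lat) in degrees."""
    a, b = to_unit_vector(*p), to_unit_vector(*q)
    # The normalized sum of two unit vectors bisects the arc between them.
    x, y, z = a[0] + b[0], a[1] + b[1], a[2] + b[2]
    norm = math.sqrt(x * x + y * y + z * z)
    return (math.degrees(math.atan2(y, x)), math.degrees(math.asin(z / norm)))

def planar_midpoint(p, q):
    """Midpoint when (lon, lat) are treated as flat Cartesian coordinates."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

# Approximate (longitude, latitude) pairs, assumed for illustration.
LONDON = (-0.1278, 51.5074)
NEW_YORK = (-74.0060, 40.7128)

spherical_mid = great_circle_midpoint(LONDON, NEW_YORK)
planar_mid = planar_midpoint(LONDON, NEW_YORK)
print("spherical midpoint (lon, lat):", spherical_mid)
print("planar midpoint (lon, lat):   ", planar_mid)
```

The spherical midpoint lies several degrees of latitude north of the planar one, which is why the two models disagree about where the path runs.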
In SQL you must specify the type with an SRID or ANY:
- GEOMETRY(srid), for example GEOMETRY(4326) or GEOMETRY(3857)
- GEOGRAPHY(srid), for example GEOGRAPHY(4326)
- GEOMETRY(ANY)
- GEOGRAPHY(ANY)

Unparameterized GEOMETRY or GEOGRAPHY (without (srid) or (ANY)) is not supported in SQL.
-- Fixed SRID: all values must use the given SRID (e.g. WGS 84)
CREATE TABLE points (
  id BIGINT,
  pt GEOMETRY(4326)
);
CREATE TABLE locations (
  id BIGINT,
  loc GEOGRAPHY(4326)
);
-- Mixed SRID: each row can have a different SRID
CREATE TABLE mixed_geoms (
  id BIGINT,
  geom GEOMETRY(ANY)
);
Values are created from Well-Known Binary (WKB) using built-in functions. WKB is a standard binary encoding for spatial shapes (points, lines, polygons, etc.). See Well-known binary for the format.
From WKB (binary):
- ST_GeomFromWKB(wkb): returns GEOMETRY with default SRID 0.
- ST_GeomFromWKB(wkb, srid): returns GEOMETRY with the given SRID.
- ST_GeogFromWKB(wkb): returns GEOGRAPHY with SRID 4326.

Example (point in WKB, then use in a table):
-- Point (1, 2) in WKB (little-endian point, 2D)
SELECT ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040');
SELECT ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040', 4326);
SELECT ST_GeogFromWKB(X'0101000000000000000000F03F0000000000000040');
INSERT INTO points (id, pt)
VALUES (1, ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040', 4326));
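The hex literal used in the statements above can be derived by hand. The following Python sketch (standard library only, not a Spark API) packs a 2D point as little-endian WKB: one byte-order byte, a 4-byte geometry type code (1 for Point), then two 8-byte doubles for x and y. The function names `encode_point_wkb` and `decode_point_wkb` are hypothetical helpers introduced for illustration, and the decoder handles only plain 2D points.

```python
import struct

def encode_point_wkb(x, y):
    """Encode a 2D point as little-endian WKB."""
    # '<' = little-endian with no padding; B = byte-order flag (1 = little-endian),
    # I = uint32 geometry type (1 = Point), d d = float64 x and y.
    return struct.pack("<BIdd", 1, 1, x, y)

def decode_point_wkb(wkb):
    """Decode a 2D point from WKB, honoring the byte-order flag."""
    if wkb[0] == 1:
        order = "<"   # little-endian
    elif wkb[0] == 0:
        order = ">"   # big-endian
    else:
        raise ValueError("invalid WKB byte-order flag: %d" % wkb[0])
    geom_type, x, y = struct.unpack(order + "Idd", wkb[1:21])
    if geom_type != 1:
        raise ValueError("this sketch only handles 2D Point (type 1)")
    return x, y

wkb = encode_point_wkb(1.0, 2.0)
hex_literal = wkb.hex().upper()
print(hex_literal)                 # hex form suitable for an X'...' literal
print(decode_point_wkb(wkb))       # round trip back to coordinates
```

The printed hex string should match the literal passed to ST_GeomFromWKB and ST_GeogFromWKB in the SQL example, so the same approach can be used to build test points with other coordinates.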
When parsing WKB, Spark applies the following rules. Violations result in a parse error.
- For a Point, NaN coordinate values are allowed and represent an empty point (POINT EMPTY in Well-Known Text). LineString and Polygon (and points inside them) do not allow NaN in coordinate values.
- When parsing as GEOGRAPHY (for example with ST_GeogFromWKB), longitude must be in [-180, 180] (inclusive) and latitude in [-90, 90] (inclusive). GEOMETRY does not enforce these bounds.

Spark SQL provides scalar functions for working with GEOMETRY and GEOGRAPHY values. They are grouped under st_funcs in the Built-in Functions API.
| Function | Description |
|---|---|
| ST_AsBinary(geo) | Returns the GEOMETRY or GEOGRAPHY value as WKB (BINARY). |
| ST_GeomFromWKB(wkb) | Parses WKB and returns a GEOMETRY with default SRID 0. |
| ST_GeomFromWKB(wkb, srid) | Parses WKB and returns a GEOMETRY with the given SRID. |
| ST_GeogFromWKB(wkb) | Parses WKB and returns a GEOGRAPHY with SRID 4326. |
| ST_Srid(geo) | Returns the SRID of the GEOMETRY or GEOGRAPHY value (NULL if input is NULL). |
| ST_SetSrid(geo, srid) | Returns a new GEOMETRY or GEOGRAPHY with the given SRID. |
Examples:
SELECT hex(ST_AsBinary(ST_GeogFromWKB(X'0101000000000000000000F03F0000000000000040')));
-- 0101000000000000000000F03F0000000000000040
SELECT ST_Srid(ST_GeogFromWKB(X'0101000000000000000000F03F0000000000000040'));
-- 4326
SELECT ST_Srid(ST_SetSrid(ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040'), 3857));
-- 3857
- Fixed SRID (GEOMETRY(srid) or GEOGRAPHY(srid)): all values must use the column's SRID (use ST_SetSrid to set the value's SRID to match the column).
- Mixed SRID (GEOMETRY(ANY) or GEOGRAPHY(ANY)): values can have different SRIDs. Only valid SRIDs are allowed.

For the full list of supported data types and API usage in Scala, Java, Python, and SQL, see Data Types.