sql/connect/common/src/main/protobuf/README.md
This directory contains the `.proto` files that define the Spark Connect protocol.
After modifying any `.proto` file here, regenerate the Python stubs under
`python/pyspark/sql/connect/proto/` using one of the two methods below.
## Method 1: Docker

This method does not require any local tool installation and produces a
reproducible environment. Build the image:

```bash
docker build -t connect-cg dev/spark-test-image/connect-gen-protos/
```

Then, from the root of the Spark repository, run:

```bash
docker run --cpus 1 -it --rm -v "$(pwd)":/spark connect-cg
```

The container mounts the repository at `/spark`, runs `dev/connect-gen-protos.sh`
inside the container, and writes the generated files to
`python/pyspark/sql/connect/proto/` in your local checkout.
## Method 2: Local tools

Install the required tools:

- `buf` — protobuf code generator

Install the required Python packages. Check `dev/requirements.txt` for the latest
pinned versions of `mypy`, `mypy-protobuf`, and `black`, then run:

```bash
pip install 'mypy==<version>' 'mypy-protobuf==<version>' 'black==<version>'
```

For example, based on the current `dev/requirements.txt`:

```bash
pip install 'mypy==1.19.1' 'mypy-protobuf==3.3.0' 'black==26.3.1'
```
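After installing, one quick way to confirm the installed versions against the pins in `dev/requirements.txt` is a short standard-library check (a sketch, not part of the generation script; the `tool_versions` helper is hypothetical):

```python
# Sketch: report the installed version of each required Python tool so it can
# be compared against the pins in dev/requirements.txt.
from importlib.metadata import PackageNotFoundError, version


def tool_versions(packages=("mypy", "mypy-protobuf", "black")):
    """Return a {package: version} map, marking packages that are missing."""
    report = {}
    for pkg in packages:
        try:
            report[pkg] = version(pkg)
        except PackageNotFoundError:
            report[pkg] = "not installed"
    return report


print(tool_versions())
```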
From the root of the Spark repository, run:

```bash
./dev/connect-gen-protos.sh
```
The generated Python files will be written to `python/pyspark/sql/connect/proto/`.
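For reference, the generated files follow the usual protoc naming convention: each `foo.proto` becomes `foo_pb2.py`, and the `mypy-protobuf` plugin adds `foo_pb2.pyi` type stubs alongside it (protos that define a gRPC service additionally get a `foo_pb2_grpc.py` module). A minimal sketch of the mapping, with `generated_names` as a hypothetical helper:

```python
# Sketch of the protoc/buf output naming convention: foo.proto yields
# foo_pb2.py (message classes) and, with mypy-protobuf, foo_pb2.pyi (type stubs).
def generated_names(proto_file: str) -> tuple[str, str]:
    stem = proto_file.removesuffix(".proto")
    return f"{stem}_pb2.py", f"{stem}_pb2.pyi"


print(generated_names("base.proto"))  # -> ('base_pb2.py', 'base_pb2.pyi')
```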
You can also generate into a custom output directory by passing a path:

```bash
./dev/connect-gen-protos.sh /tmp/my-proto-output