python/docs/source/development/testing.rst
.. Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
In order to run PySpark tests, you should build Spark itself first via Maven or SBT. For example,
.. code-block:: bash
build/mvn -DskipTests clean package
.. code-block:: bash
build/sbt -Phive clean package
After that, the PySpark test cases can be run via using python/run-tests. For example,
.. code-block:: bash
python/run-tests --python-executable=python3
Note that you may set OBJC_DISABLE_INITIALIZE_FORK_SAFETY environment variable to YES if you are running tests on Mac OS.
Please see the guidance on how to |building_spark|,
run tests for a module, or individual tests <https://spark.apache.org/developer-tools.html>.
You can run a specific test via using python/run-tests, for example, as below:
.. code-block:: bash
python/run-tests --testnames pyspark.sql.tests.test_arrow
Please refer to Testing PySpark <https://spark.apache.org/developer-tools.html>_ for more details.
breakpoint() Support in PySpark TestsTo debug a certain test, you can add breakpoint() in the test code, and run the test with
python/run-tests as usual. The script will stop at the breakpoint() line and open an
interactive pdb debugging session.
You can run the full PySpark tests by using GitHub Actions in your own forked GitHub
repository with a few clicks. Please refer to
Running tests in your forked repository using GitHub Actions <https://spark.apache.org/developer-tools.html>_ for more details.
Running Tests for Python Client
In order to test the changes in Protobuf definitions, for example, at
`spark/sql/connect/common/src/main/protobuf/spark/connect <https://github.com/apache/spark/tree/master/sql/connect/common/src/main/protobuf/spark/connect>`_,
you should regenerate Python Protobuf client first by running ``dev/connect-gen-protos.sh``.
Running PySpark Shell with Python Client
The command below starts Spark Connect server automatically locally, and creates a Spark Connect client connected to the server.
.. code-block:: bash
bin/pyspark --remote "local[*]"