docs/supported_apis/defaulting_to_pandas.rst
Modin does not yet support distributed execution for every method in the pandas API. Methods that are not yet implemented run in a mode called "default to pandas". This lets users keep using Modin even when their workloads contain functions that Modin has not implemented. Here is a diagram of how we convert to pandas and perform the operation:
.. image:: /img/convert_to_pandas.png
   :align: center
We first convert to a pandas DataFrame, then perform the operation. There is a performance penalty for going from a partitioned Modin DataFrame to pandas because of the communication cost and single-threaded nature of pandas. Once the pandas operation has completed, we convert the DataFrame back into a partitioned Modin DataFrame. This way, operations performed after something defaults to pandas will be optimized with Modin.
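The round trip described above can be sketched with plain pandas. This is only an illustration of the idea, not Modin's internal implementation: the helper names ``to_partitions`` and ``default_to_pandas`` are hypothetical, and a list of row blocks stands in for Modin's partitioned DataFrame.

```python
import pandas as pd


def to_partitions(df, n_partitions):
    """Split a DataFrame into contiguous row blocks (a stand-in for Modin partitions)."""
    step = max(1, -(-len(df) // n_partitions))  # ceiling division
    return [df.iloc[i:i + step] for i in range(0, len(df), step)]


def default_to_pandas(partitions, pandas_op):
    """Hypothetical helper: gather, run the operation in pandas, re-partition."""
    full = pd.concat(partitions)   # 1. convert to one pandas DataFrame
    result = pandas_op(full)       # 2. run the single-threaded pandas operation
    return to_partitions(result, len(partitions))  # 3. convert back to partitions


frame = pd.DataFrame({"a": [3, 1, 2, 4], "b": [10.0, 20.0, 30.0, 40.0]})
parts = to_partitions(frame, 2)

# Pretend sort_values is not implemented in a distributed way.
new_parts = default_to_pandas(parts, lambda df: df.sort_values("a"))
result = pd.concat(new_parts)
```

Steps 1 and 3 are where the communication cost comes from, and step 2 is where the work runs on a single thread. Because the result is partitioned again, later operations can run on the partitions in parallel.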
The exact methods we have implemented are listed in the respective subsections:
- :doc:`DataFrame </supported_apis/dataframe_supported>`
- :doc:`Series </supported_apis/series_supported>`
- :doc:`utilities </supported_apis/utilities_supported>`
- :doc:`I/O </supported_apis/io_supported>`

We have taken a community-driven approach to implementing new methods. We did a `study on pandas usage`_ to learn what the most-used APIs are. Modin currently supports 93%
of the pandas API based on our study of pandas usage, and we are actively expanding the
API.
To request implementation, file an issue at https://github.com/modin-project/modin/issues
or send an email to [email protected].
.. _study on pandas usage: https://github.com/modin-project/study_kaggle_usage