.. _api-guide-for-users-from-other-data-libs:

Ray Data is a data loading and preprocessing library for ML. It shares certain similarities with other ETL data processing libraries, but also has its own focus. This guide provides API mappings for users who come from those data libraries, so you can quickly map what you may already know to Ray Data APIs.

.. note::

.. _api-guide-for-pandas-users:

.. list-table:: Pandas DataFrame vs. Ray Data APIs
   :header-rows: 1

   * - Ray Data API
   * - :meth:`ds.show() <ray.data.Dataset.show>`, :meth:`ds.take() <ray.data.Dataset.take>`, or :meth:`ds.take_batch() <ray.data.Dataset.take_batch>`
   * - :meth:`ds.schema() <ray.data.Dataset.schema>`
   * - :meth:`ds.count() <ray.data.Dataset.count>`
   * - :meth:`ds.limit() <ray.data.Dataset.limit>`
   * - :meth:`ds.iter_rows() <ray.data.Dataset.iter_rows>`
   * - :meth:`ds.drop_columns() <ray.data.Dataset.drop_columns>`
   * - :meth:`ds.map_batches() <ray.data.Dataset.map_batches>` or :meth:`ds.map() <ray.data.Dataset.map>`
   * - :meth:`ds.groupby() <ray.data.Dataset.groupby>`
   * - :meth:`ds.groupby().map_groups() <ray.data.grouped_data.GroupedData.map_groups>`
   * - :meth:`ds.random_sample() <ray.data.Dataset.random_sample>`
   * - :meth:`ds.sort() <ray.data.Dataset.sort>`
   * - :meth:`ds.union() <ray.data.Dataset.union>`
   * - :meth:`ds.aggregate() <ray.data.Dataset.aggregate>`
   * - :meth:`ds.min() <ray.data.Dataset.min>`
   * - :meth:`ds.max() <ray.data.Dataset.max>`
   * - :meth:`ds.sum() <ray.data.Dataset.sum>`
   * - :meth:`ds.mean() <ray.data.Dataset.mean>`
   * - :meth:`ds.std() <ray.data.Dataset.std>`

.. _api-guide-for-pyarrow-users:

.. list-table:: PyArrow Table vs. Ray Data APIs
   :header-rows: 1

   * - PyArrow Table API
     - Ray Data API
   * - pa.Table.schema
     - :meth:`ds.schema() <ray.data.Dataset.schema>`
   * - pa.Table.num_rows
     - :meth:`ds.count() <ray.data.Dataset.count>`
   * - pa.Table.filter()
     - :meth:`ds.filter() <ray.data.Dataset.filter>`
   * - pa.Table.drop()
     - :meth:`ds.drop_columns() <ray.data.Dataset.drop_columns>`
   * - pa.Table.add_column()
     - :meth:`ds.with_column() <ray.data.Dataset.with_column>`
   * - pa.Table.groupby()
     - :meth:`ds.groupby() <ray.data.Dataset.groupby>`
   * - pa.Table.sort_by()
     - :meth:`ds.sort() <ray.data.Dataset.sort>`

For more details, see :ref:`Migrating from PyTorch to Ray Data <migrate_pytorch>`.