.. Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

.. _api.dataframe:

=========
DataFrame
=========

.. currentmodule:: pyspark.pandas

Constructor
-----------
.. autosummary::
   :toctree: api/

   DataFrame

Attributes and underlying data
------------------------------

.. autosummary::
   :toctree: api/

   DataFrame.index
   DataFrame.info
   DataFrame.columns
   DataFrame.empty

.. autosummary::
   :toctree: api/

   DataFrame.dtypes
   DataFrame.shape
   DataFrame.axes
   DataFrame.ndim
   DataFrame.size
   DataFrame.select_dtypes
   DataFrame.values

Conversion
----------
.. autosummary::
   :toctree: api/

   DataFrame.copy
   DataFrame.isna
   DataFrame.astype
   DataFrame.isnull
   DataFrame.notna
   DataFrame.notnull
   DataFrame.bool

Indexing, iteration
-------------------
.. autosummary::
   :toctree: api/

   DataFrame.at
   DataFrame.iat
   DataFrame.head
   DataFrame.idxmax
   DataFrame.idxmin
   DataFrame.loc
   DataFrame.iloc
   DataFrame.insert
   DataFrame.items
   DataFrame.iterrows
   DataFrame.itertuples
   DataFrame.keys
   DataFrame.pop
   DataFrame.tail
   DataFrame.xs
   DataFrame.get
   DataFrame.where
   DataFrame.mask
   DataFrame.query

Binary operator functions
-------------------------
.. autosummary::
   :toctree: api/

   DataFrame.add
   DataFrame.radd
   DataFrame.div
   DataFrame.rdiv
   DataFrame.truediv
   DataFrame.rtruediv
   DataFrame.mul
   DataFrame.rmul
   DataFrame.sub
   DataFrame.rsub
   DataFrame.pow
   DataFrame.rpow
   DataFrame.mod
   DataFrame.rmod
   DataFrame.floordiv
   DataFrame.rfloordiv
   DataFrame.lt
   DataFrame.gt
   DataFrame.le
   DataFrame.ge
   DataFrame.ne
   DataFrame.eq
   DataFrame.dot
   DataFrame.combine_first

Function application, GroupBy & Window
--------------------------------------
.. autosummary::
   :toctree: api/

   DataFrame.apply
   DataFrame.applymap
   DataFrame.map
   DataFrame.pipe
   DataFrame.agg
   DataFrame.aggregate
   DataFrame.groupby
   DataFrame.rolling
   DataFrame.expanding
   DataFrame.transform

.. _api.dataframe.stats:

Computations / Descriptive Stats
--------------------------------
.. autosummary::
   :toctree: api/

   DataFrame.abs
   DataFrame.all
   DataFrame.any
   DataFrame.clip
   DataFrame.corr
   DataFrame.corrwith
   DataFrame.count
   DataFrame.cov
   DataFrame.describe
   DataFrame.ewm
   DataFrame.kurt
   DataFrame.kurtosis
   DataFrame.max
   DataFrame.mean
   DataFrame.min
   DataFrame.median
   DataFrame.mode
   DataFrame.pct_change
   DataFrame.prod
   DataFrame.product
   DataFrame.quantile
   DataFrame.rank
   DataFrame.nunique
   DataFrame.sem
   DataFrame.skew
   DataFrame.sum
   DataFrame.std
   DataFrame.var
   DataFrame.cummin
   DataFrame.cummax
   DataFrame.cumsum
   DataFrame.cumprod
   DataFrame.round
   DataFrame.diff
   DataFrame.eval

Reindexing / Selection / Label manipulation
-------------------------------------------
.. autosummary::
   :toctree: api/

   DataFrame.add_prefix
   DataFrame.add_suffix
   DataFrame.align
   DataFrame.at_time
   DataFrame.between_time
   DataFrame.drop
   DataFrame.droplevel
   DataFrame.drop_duplicates
   DataFrame.duplicated
   DataFrame.equals
   DataFrame.filter
   DataFrame.first
   DataFrame.head
   DataFrame.last
   DataFrame.reindex
   DataFrame.reindex_like
   DataFrame.rename
   DataFrame.rename_axis
   DataFrame.reset_index
   DataFrame.set_index
   DataFrame.swapaxes
   DataFrame.swaplevel
   DataFrame.take
   DataFrame.isin
   DataFrame.sample
   DataFrame.truncate

.. _api.dataframe.missing:

Missing data handling
---------------------
.. autosummary::
   :toctree: api/

   DataFrame.backfill
   DataFrame.dropna
   DataFrame.fillna
   DataFrame.replace
   DataFrame.bfill
   DataFrame.ffill
   DataFrame.interpolate
   DataFrame.pad

Reshaping, sorting, transposing
-------------------------------
.. autosummary::
   :toctree: api/

   DataFrame.pivot_table
   DataFrame.pivot
   DataFrame.sort_index
   DataFrame.sort_values
   DataFrame.nlargest
   DataFrame.nsmallest
   DataFrame.stack
   DataFrame.unstack
   DataFrame.melt
   DataFrame.explode
   DataFrame.squeeze
   DataFrame.T
   DataFrame.transpose

Combining / joining / merging
-----------------------------
.. autosummary::
   :toctree: api/

   DataFrame.assign
   DataFrame.merge
   DataFrame.join
   DataFrame.update

.. autosummary::
   :toctree: api/

   DataFrame.resample
   DataFrame.shift
   DataFrame.first_valid_index
   DataFrame.last_valid_index

Serialization / IO / Conversion
-------------------------------
.. autosummary::
   :toctree: api/

   DataFrame.from_dict
   DataFrame.from_records
   DataFrame.to_table
   DataFrame.to_delta
   DataFrame.to_parquet
   DataFrame.to_csv
   DataFrame.to_orc
   DataFrame.to_pandas
   DataFrame.to_html
   DataFrame.to_numpy
   DataFrame.to_spark
   DataFrame.to_string
   DataFrame.to_feather
   DataFrame.to_stata
   DataFrame.to_json
   DataFrame.to_dict
   DataFrame.to_excel
   DataFrame.to_hdf
   DataFrame.to_clipboard
   DataFrame.to_markdown
   DataFrame.to_records
   DataFrame.to_latex
   DataFrame.style

``DataFrame.spark`` provides features that exist in Spark but not in pandas.
These can be accessed through ``DataFrame.spark.<function/property>``.

.. autosummary::
   :toctree: api/
   :template: autosummary/accessor_method.rst

   DataFrame.spark.frame
   DataFrame.spark.cache
   DataFrame.spark.persist
   DataFrame.spark.hint
   DataFrame.spark.to_table
   DataFrame.spark.to_spark_io
   DataFrame.spark.apply
   DataFrame.spark.repartition
   DataFrame.spark.coalesce

.. _api.dataframe.plot:

Plotting
--------
``DataFrame.plot`` is both a callable method and a namespace attribute for
specific plotting methods of the form ``DataFrame.plot.<kind>``.

.. autosummary::
   :toctree: api/
   :template: autosummary/accessor_method.rst

   DataFrame.plot.area
   DataFrame.plot.bar
   DataFrame.plot.barh
   DataFrame.plot.box
   DataFrame.plot.density
   DataFrame.plot.hist
   DataFrame.plot.kde
   DataFrame.plot.line
   DataFrame.plot.pie
   DataFrame.plot.scatter

.. autosummary::
   :toctree: api/

   DataFrame.hist
   DataFrame.boxplot
   DataFrame.kde
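
The "callable method and namespace attribute" behavior of ``DataFrame.plot`` can be illustrated with a small plain-Python sketch. This is not the pyspark.pandas implementation: ``ToyFrame`` and ``ToyPlotAccessor`` are hypothetical names, and the "plots" are tuples so that the example runs without Spark or a plotting backend. It only shows how one attribute can be called directly with a ``kind`` argument and also expose one method per kind.

```python
class ToyPlotAccessor:
    """Toy stand-in for a plot accessor (hypothetical, not the pyspark.pandas class)."""

    def __init__(self, data):
        self._data = data

    def __call__(self, kind="line", **kwargs):
        # Calling the accessor dispatches on `kind` to the method of the same name.
        method = getattr(self, kind, None)
        if method is None or kind.startswith("_"):
            raise ValueError(f"unknown plot kind: {kind!r}")
        return method(**kwargs)

    def line(self, **kwargs):
        # A real accessor would draw a figure; the sketch returns a description.
        return ("line", self._data, kwargs)

    def bar(self, **kwargs):
        return ("bar", self._data, kwargs)


class ToyFrame:
    """Toy container that exposes the accessor as the `plot` attribute."""

    def __init__(self, data):
        self._data = data

    @property
    def plot(self):
        return ToyPlotAccessor(self._data)


frame = ToyFrame([1, 2, 3])

# Both spellings reach the same method.
called = frame.plot(kind="bar", stacked=True)
namespaced = frame.plot.bar(stacked=True)
```

In this sketch ``frame.plot(kind="bar")`` and ``frame.plot.bar()`` return the same value, which mirrors the ``DataFrame.plot`` and ``DataFrame.plot.<kind>`` forms described above.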

Pandas-on-Spark specific
------------------------
``DataFrame.pandas_on_spark`` provides features that exist only in pandas API on Spark.
These can be accessed through ``DataFrame.pandas_on_spark.<function/property>``.

.. autosummary::
   :toctree: api/
   :template: autosummary/accessor_method.rst

   DataFrame.pandas_on_spark.apply_batch
   DataFrame.pandas_on_spark.transform_batch
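
As a rough mental model of the two batch methods listed above, the following plain-Python sketch applies a function to one batch of rows at a time. It assumes the usual description of these methods: the function receives each internal batch separately, ``apply_batch`` may return a result of any length, and ``transform_batch`` is expected to return a result of the same length as its input. The names ``toy_apply_batch`` and ``toy_transform_batch``, the list-of-dicts rows, and the fixed ``batch_size`` are all hypothetical; the real methods operate on pandas DataFrames and Spark decides the batch boundaries.

```python
def iter_batches(rows, batch_size):
    """Yield consecutive slices of `rows` with at most `batch_size` items each."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def toy_apply_batch(rows, func, batch_size=2):
    """Call `func` once per batch; the output length may differ from the input."""
    out = []
    for batch in iter_batches(rows, batch_size):
        out.extend(func(batch))
    return out


def toy_transform_batch(rows, func, batch_size=2):
    """Call `func` once per batch; each batch must keep its length."""
    out = []
    for batch in iter_batches(rows, batch_size):
        result = func(batch)
        if len(result) != len(batch):
            # Raising here is a choice made by this sketch, not library behavior.
            raise ValueError("transform function changed the batch length")
        out.extend(result)
    return out


rows = [{"a": 1}, {"a": 2}, {"a": 3}]

# Filtering changes the number of rows, so it fits the apply-style method.
evens = toy_apply_batch(rows, lambda batch: [r for r in batch if r["a"] % 2 == 0])

# A row-by-row mapping keeps the length, so it fits the transform-style method.
plus_one = toy_transform_batch(rows, lambda batch: [{"a": r["a"] + 1} for r in batch])
```

Because the function sees one batch at a time, anything that depends on the whole frame (a global mean, for example) would be computed per batch in this model.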