pandas-on-Spark Series that corresponds to pandas Series logically. |
The index (axis labels) Column of the Series. |
Return the dtype object of the underlying data. |
Return the dtype object of the underlying data. |
Return an int representing the number of array dimensions. |
Return name of the Series. |
Return a tuple of the shape of the underlying data. |
Return a list of the row axis labels. |
Return an int representing the number of elements in this object. |
Return True if the current object is empty. |
Return the transpose, which is by definition self. |
Return True if it has any missing values. |
Return a Numpy representation of the DataFrame or the Series. |
Cast a pandas-on-Spark object to a specified dtype. |
Make a copy of this object’s indices and data. |
Return the bool of a single element in the current object. |
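The attributes above can be read directly from a Series. The following is a minimal sketch rather than an official example: the data, index labels, and variable names are made up, and the fallback to plain pandas when PySpark is not installed is an assumption of this sketch only.

```python
try:
    import pyspark.pandas as ps  # pandas API on Spark; needs a Spark runtime
except ImportError:
    import pandas as ps  # assumption: plain pandas stands in for this sketch

# Illustrative data; None becomes a missing value in a float Series.
s = ps.Series([1.0, 2.0, None], index=["a", "b", "c"], name="x")

name = s.name            # name of the Series
dtype = str(s.dtype)     # dtype of the underlying data
ndim = s.ndim            # number of dimensions, 1 for a Series
shape = s.shape          # one-element tuple
size = s.size            # number of elements, missing values included
is_empty = s.empty       # whether the Series has no elements
has_missing = s.hasnans  # whether any value is missing

# Casting to an integer dtype after the missing value has been filled.
as_int = s.fillna(0).astype("int64")
```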
Indexing, iteration¶
Access a single value for a row/column label pair. |
Access a single value for a row/column pair by integer position. |
Access a group of rows and columns by label(s) or a boolean Series. |
Purely integer-location based indexing for selection by position. |
Return alias for index. |
Return item and drop from series. |
This is an alias of |
Lazily iterate over (index, value) tuples. |
Return the first element of the underlying data as a Python scalar. |
Return cross-section from the Series. |
Get item from object for given key (DataFrame column, Panel slice, etc.). |
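As a rough illustration of the indexers listed above, the sketch below selects by label and by position. The Series contents are hypothetical, and the plain pandas fallback is only there so the snippet can run without a Spark installation.

```python
try:
    import pyspark.pandas as ps  # pandas API on Spark; needs a Spark runtime
except ImportError:
    import pandas as ps  # assumption: plain pandas stands in for this sketch

s = ps.Series([10, 20, 30], index=["a", "b", "c"])

by_label = s.at["b"]                  # single value by label
subset = s.loc[["a", "c"]].to_list()  # label-based selection
first_two = s.iloc[:2].to_list()      # position-based selection
labels = s.keys().to_list()           # keys() is an alias for the index
pairs = list(s.items())               # (index, value) tuples
```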
Binary operator functions¶
Return Addition of series and other, element-wise (binary operator +). |
Return Floating division of series and other, element-wise (binary operator /). |
Return Multiplication of series and other, element-wise (binary operator *). |
Return Reverse Addition of series and other, element-wise (binary operator +). |
Return Reverse Floating division of series and other, element-wise (binary operator /). |
Return Reverse Multiplication of series and other, element-wise (binary operator *). |
Return Reverse Subtraction of series and other, element-wise (binary operator -). |
Return Reverse Floating division of series and other, element-wise (binary operator /). |
Return Subtraction of series and other, element-wise (binary operator -). |
Return Floating division of series and other, element-wise (binary operator /). |
Return Exponential power of series and other, element-wise (binary operator **). |
Return Reverse Exponential power of series and other, element-wise (binary operator **). |
Return Modulo of series and other, element-wise (binary operator %). |
Return Reverse Modulo of series and other, element-wise (binary operator %). |
Return Integer division of series and other, element-wise (binary operator //). |
Return Reverse Integer division of series and other, element-wise (binary operator //). |
Return Integer division and modulo of series and other, element-wise (binary operator divmod). |
Return Integer division and modulo of series and other, element-wise (binary operator rdivmod). |
Combine Series values, choosing the calling Series’s values first. |
Compare if the current value is less than the other. |
Compare if the current value is greater than the other. |
Compare if the current value is less than or equal to the other. |
Compare if the current value is greater than or equal to the other. |
Compare if the current value is not equal to the other. |
Compare if the current value is equal to the other. |
Return the product of the values. |
Compute the dot product between the Series and the columns of other. |
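Each method in this group mirrors a Python operator, and the reverse variants swap the operands. A small sketch with scalar operands, assuming made-up data and a plain pandas fallback when PySpark is unavailable:

```python
try:
    import pyspark.pandas as ps  # pandas API on Spark; needs a Spark runtime
except ImportError:
    import pandas as ps  # assumption: plain pandas stands in for this sketch

s = ps.Series([1, 2, 3])

added = s.add(10)          # same as s + 10
reverse_sub = s.rsub(10)   # same as 10 - s
true_div = s.truediv(2)    # same as s / 2
floor_div = s.floordiv(2)  # same as s // 2
modulo = s.mod(2)          # same as s % 2
power = s.pow(2)           # same as s ** 2
greater = s.gt(1)          # same as s > 1
equal = s.eq(2)            # same as s == 2
```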
Function application, GroupBy & Window¶
Invoke function on values of Series. |
Aggregate using one or more operations over the specified axis. |
Aggregate using one or more operations over the specified axis. |
Call |
Map values of Series according to input correspondence. |
Group DataFrame or Series using one or more columns. |
Provide rolling transformations. |
Provide expanding transformations. |
Apply func(self, *args, **kwargs). |
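The function-application methods can be sketched as follows. This is an illustration under assumptions, not a reference implementation: the lambdas, the mapping dictionary, and the values are invented, and plain pandas is used as a stand-in if PySpark cannot be imported.

```python
try:
    import pyspark.pandas as ps  # pandas API on Spark; needs a Spark runtime
except ImportError:
    import pandas as ps  # assumption: plain pandas stands in for this sketch

s = ps.Series([1, 2, 3, 4])

squared = s.apply(lambda x: x ** 2)                 # call a function on each value
labelled = s.map({1: "a", 2: "b", 3: "c", 4: "d"})  # value-to-value lookup
summary = s.agg(["min", "max"])                     # several aggregations at once
shifted = s.pipe(lambda ser, k: ser + k, 10)        # func(self, *args, **kwargs)
running = s.expanding(1).sum()                      # expanding-window sum, min_periods=1
```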
Computations / Descriptive Stats¶
Return a Series/DataFrame with absolute numeric value of each element. |
Return whether all elements are True. |
Return whether any element is True. |
Return boolean Series equivalent to left <= series <= right. |
Trim values at input threshold(s). |
Compute correlation with other Series, excluding missing values. |
Count non-NA cells for each column. |
Return cumulative maximum over a DataFrame or Series axis. |
Return cumulative minimum over a DataFrame or Series axis. |
Return cumulative sum over a DataFrame or Series axis. |
Return cumulative product over a DataFrame or Series axis. |
Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding |
Subset rows or columns of dataframe according to labels in the specified index. |
Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). |
Return the mean absolute deviation of values. |
Return the maximum of the values. |
Return the mean of the values. |
Return the minimum of the values. |
Return the mode(s) of the dataset. |
Return the largest n elements. |
Return the smallest n elements. |
Percentage change between the current and a prior element. |
Return the product of the values. |
Return number of unique elements in the object. |
Return boolean if values in the object are unique. |
Return value at the given quantile. |
Compute numerical data ranks (1 through n) along axis. |
Return unbiased standard error of the mean over requested axis. |
Return unbiased skew normalized by N-1. |
Return sample standard deviation. |
Return the sum of the values. |
Return the median of the values for the requested axis. |
Return unbiased variance. |
Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). |
Return unique values of Series object. |
Return a Series containing counts of unique values. |
Round each value in a Series to the given number of decimals. |
First discrete difference of element. |
Return boolean if values in the object are monotonically increasing. |
Return boolean if values in the object are monotonically increasing. |
Return boolean if values in the object are monotonically decreasing. |
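A few of the descriptive statistics above, shown on a tiny hypothetical Series. The sketch assumes a working Spark runtime, or falls back to plain pandas, which exposes the same calls for this subset.

```python
try:
    import pyspark.pandas as ps  # pandas API on Spark; needs a Spark runtime
except ImportError:
    import pandas as ps  # assumption: plain pandas stands in for this sketch

s = ps.Series([3, 1, 4, 1, 5])

total = s.sum()
average = s.mean()
smallest, largest = s.min(), s.max()
distinct = s.nunique()       # number of unique values
all_unique = s.is_unique     # False here: the value 1 appears twice
running = s.cumsum()         # cumulative sum
clipped = s.clip(2, 4)       # trim values to the range [2, 4]
in_range = s.between(2, 4)   # 2 <= value <= 4, both ends inclusive
top_two = s.nlargest(2)      # the two largest values
```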
Reindexing / Selection / Label manipulation¶
Align two objects on their axes with the specified join method. |
Return Series with specified index labels removed. |
Return Series with requested index level(s) removed. |
Return Series with duplicate values removed. |
Compare if the current value is equal to the other. |
Prefix labels with string prefix. |
Suffix labels with string suffix. |
Select first periods of time series data based on a date offset. |
Return the first n rows. |
Return the row label of the maximum value. |
Return the row label of the minimum value. |
Check whether values are contained in Series or Index. |
Select final periods of time series data based on a date offset. |
Alter Series name. |
Set the name of the axis for the index or columns. |
Conform Series to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. |
Return a Series with matching indices as other object. |
Generate a new DataFrame or Series with the index reset. |
Return a random sample of items from an axis of object. |
Swap levels i and j in a MultiIndex. |
Interchange axes and swap values axes appropriately. |
Return the elements in the given positional indices along an axis. |
Return the last n rows. |
Replace values where the condition is False. |
Replace values where the condition is True. |
Truncate a Series or DataFrame before and after some index value. |
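The selection and label-manipulation methods above might be used as in this sketch. The labels, values, and the new name "w" are hypothetical, and plain pandas is substituted when PySpark is not importable.

```python
try:
    import pyspark.pandas as ps  # pandas API on Spark; needs a Spark runtime
except ImportError:
    import pandas as ps  # assumption: plain pandas stands in for this sketch

s = ps.Series([10, 20, 30, 20], index=["a", "b", "c", "d"], name="v")

first_two = s.head(2)        # first n rows
without_a = s.drop("a")      # remove an index label
members = s.isin([20])       # membership test per value
kept = s.where(s > 15, 0)    # replace values where the condition is False
masked = s.mask(s > 15, 0)   # replace values where the condition is True
top_label = s.idxmax()       # row label of the maximum value
renamed = s.rename("w")      # change the Series name
```

Note that `where` and `mask` are complements: the same condition keeps the values in one and replaces them in the other.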
Missing data handling¶
Synonym for DataFrame.fillna() or Series.fillna() with |
Synonym for DataFrame.fillna() or Series.fillna() with |
Detect existing (non-missing) values. |
Detect existing (non-missing) values. |
Detect existing (non-missing) values. |
Detect existing (non-missing) values. |
Synonym for DataFrame.fillna() or Series.fillna() with |
Return a new Series with missing values removed. |
Fill NA/NaN values. |
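A short sketch of detecting, filling, and dropping missing values. The data is made up for illustration, and the pandas fallback in the import is an assumption of this sketch.

```python
try:
    import pyspark.pandas as ps  # pandas API on Spark; needs a Spark runtime
except ImportError:
    import pandas as ps  # assumption: plain pandas stands in for this sketch

s = ps.Series([1.0, None, 3.0])

missing = s.isna()      # True where a value is missing
present = s.notna()     # True where a value is present
filled = s.fillna(0.0)  # replace missing values with a constant
forward = s.ffill()     # propagate the last valid value forward
dropped = s.dropna()    # remove missing values
```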
Reshaping, sorting, transposing¶
Return the integer indices that would sort the Series values. |
Return int position of the smallest value in the Series. |
Return int position of the largest value in the Series. |
Sort object by labels (along an axis). |
Sort by the values. |
Unstack, a.k.a. |
Transform each element of a list-like to a row. |
Repeat elements of a Series. |
Squeeze 1 dimensional axis objects into scalars. |
Encode the object as an enumerated type or categorical variable. |
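Sorting by value, sorting by label, and locating extremes by position can be sketched like this. The Series is hypothetical, and plain pandas is used if PySpark is not installed.

```python
try:
    import pyspark.pandas as ps  # pandas API on Spark; needs a Spark runtime
except ImportError:
    import pandas as ps  # assumption: plain pandas stands in for this sketch

s = ps.Series([30, 10, 20], index=["c", "a", "b"])

ascending = s.sort_values()                  # order by value
descending = s.sort_values(ascending=False)  # largest value first
by_label = s.sort_index()                    # order by index label
pos_min = s.argmin()                         # integer position of the smallest value
pos_max = s.argmax()                         # integer position of the largest value
```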
Combining / joining / merging¶
Concatenate two or more Series. |
Compare to another Series and show the differences. |
Replace values given in to_replace with value. |
Modify Series in place using non-NA values from passed Series. |
Pandas API on Spark provides dtype-specific methods under various accessors.
These are separate namespaces within Series that only apply to specific data types.
Data Type |
Accessor |
Datetime |
String |
Categorical |
Date Time Handling¶
Series.dt can be used to access the values of the series as
datetimelike and return several properties.
These can be accessed like Series.dt.<property>.
Datetime Properties¶
Returns a Series of python objects (namely, the date part of Timestamps without timezone information). |
The year of the datetime. |
The month of the timestamp as January = 1, December = 12. |
The days of the datetime. |
The hours of the datetime. |
The minutes of the datetime. |
The seconds of the datetime. |
The microseconds of the datetime. |
The week ordinal of the year. |
The week ordinal of the year. |
The day of the week with Monday=0, Sunday=6. |
The day of the week with Monday=0, Sunday=6. |
The ordinal day of the year. |
The quarter of the date. |
Indicates whether the date is the first day of the month. |
Indicates whether the date is the last day of the month. |
Indicator for whether the date is the first day of a quarter. |
Indicator for whether the date is the last day of a quarter. |
Indicate whether the date is the first day of a year. |
Indicate whether the date is the last day of the year. |
Boolean indicator if the date belongs to a leap year. |
The number of days in the month. |
The number of days in the month. |
Datetime Methods¶
Convert times to midnight. |
Convert to a string Series using specified date_format. |
Perform round operation on the data to the specified freq. |
Perform floor operation on the data to the specified freq. |
Perform ceil operation on the data to the specified freq. |
Return the month names of the series with specified locale. |
Return the day names of the series with specified locale. |
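The datetime properties and methods are reached through the `dt` accessor. The sketch below uses two arbitrary timestamps chosen for illustration and falls back to plain pandas when PySpark is missing; it is not taken from the library's own examples.

```python
from datetime import datetime

try:
    import pyspark.pandas as ps  # pandas API on Spark; needs a Spark runtime
except ImportError:
    import pandas as ps  # assumption: plain pandas stands in for this sketch

s = ps.Series([datetime(2024, 2, 29, 13, 45), datetime(2023, 12, 31, 8, 0)])

years = s.dt.year
months = s.dt.month
hours = s.dt.hour
weekdays = s.dt.dayofweek            # Monday=0, Sunday=6
leap = s.dt.is_leap_year
month_end = s.dt.is_month_end
as_text = s.dt.strftime("%Y-%m-%d")  # format as strings
midnight = s.dt.normalize()          # times set to midnight
```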
String Handling¶
Series.str can be used to access the values of the series as
strings and apply several methods to them. These can be accessed
like Series.str.<function/property>.
Convert strings in the Series to be capitalized. |
Not supported. |
Filling left and right side of strings in the Series/Index with an additional character. |
Test if pattern or regex is contained within a string of a Series. |
Count occurrences of pattern in each string of the Series. |
Not supported. |
Not supported. |
Test if the end of each string element matches a pattern. |
Not supported. |
Not supported. |
Return the lowest index in each string in the Series where the substring is fully contained between [start:end]. |
Find all occurrences of pattern or regular expression in the Series. |
Extract element from each string or string list/tuple in the Series at the specified position. |
Not supported. |
Return the lowest index in each string where the substring is fully contained between [start:end]. |
Check whether all characters in each string are alphanumeric. |
Check whether all characters in each string are alphabetic. |
Check whether all characters in each string are digits. |
Check whether all characters in each string are whitespaces. |
Check whether all characters in each string are lowercase. |
Check whether all characters in each string are uppercase. |
Check whether all characters in each string are titlecase. |
Check whether all characters in each string are numeric. |
Check whether all characters in each string are decimals. |
Join lists contained as elements in the Series with passed delimiter. |
Computes the length of each element in the Series. |
Filling right side of strings in the Series with an additional character. |
Convert strings in the Series/Index to all lowercase. |
Remove leading characters. |
Determine if each string matches a regular expression. |
Return the Unicode normal form for the strings in the Series. |
Pad strings in the Series up to width. |
Not supported. |
Duplicate each string in the Series. |
Replace occurrences of pattern/regex in the Series with some other string. |
Return the highest index in each string in the Series where the substring is fully contained between [start:end]. |
Return the highest index in each string where the substring is fully contained between [start:end]. |
Filling left side of strings in the Series with an additional character. |
Not supported. |
Split strings around given separator/delimiter. |
Remove trailing characters. |
Slice substrings from each element in the Series. |
Slice substrings from each element in the Series. |
Split strings around given separator/delimiter. |
Test if the start of each string element matches a pattern. |
Remove leading and trailing characters. |
Convert strings in the Series/Index to be swapcased. |
Convert strings in the Series to be titlecase. |
Map all characters in the string through the given mapping table. |
Convert strings in the Series/Index to all uppercase. |
Wrap long strings in the Series to be formatted in paragraphs with length less than a given width. |
Pad strings in the Series by prepending ‘0’ characters. |
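A handful of the supported string methods, sketched on invented strings. The import falls back to plain pandas if PySpark is absent; that fallback is a convenience of this sketch, not library behaviour.

```python
try:
    import pyspark.pandas as ps  # pandas API on Spark; needs a Spark runtime
except ImportError:
    import pandas as ps  # assumption: plain pandas stands in for this sketch

s = ps.Series(["spark", "Pandas API", "  koalas "])

upper = s.str.upper()                         # all uppercase
lengths = s.str.len()                         # length of each element
stripped = s.str.strip()                      # remove leading and trailing whitespace
has_api = s.str.contains("API")               # pattern test per element
starts = s.str.startswith("s")                # prefix test per element
swapped = s.str.replace("a", "@")             # replace occurrences of a pattern
padded = ps.Series(["7", "42"]).str.zfill(3)  # prepend '0' up to width 3
```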
Categorical accessor¶
Categorical-dtype specific methods and attributes are available under
The categories of this categorical. |
Whether the categories have an ordered relationship. |
Return Series of codes as well as the index. |
Rename categories. |
Reorder categories as specified in new_categories. |
Add new categories. |
Remove the specified categories. |
Remove categories which are not used. |
Set the categories to the specified new_categories. |
Set the Categorical to be ordered. |
Set the Categorical to be unordered. |
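The categorical attributes and methods above are reached through the `cat` accessor. A minimal sketch, assuming a made-up three-element Series and a plain pandas fallback when PySpark is not available:

```python
try:
    import pyspark.pandas as ps  # pandas API on Spark; needs a Spark runtime
except ImportError:
    import pandas as ps  # assumption: plain pandas stands in for this sketch

s = ps.Series(["low", "high", "low"], dtype="category")

categories = list(s.cat.categories)       # inferred categories, in sorted order
codes = s.cat.codes                       # integer code per element
ordered = s.cat.ordered                   # False unless an order was set
extended = s.cat.add_categories(["mid"])  # add a category; values are unchanged
as_ordered = s.cat.as_ordered()           # mark the categorical as ordered
```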
Series.plot is both a callable method and a namespace attribute for
specific plotting methods of the form Series.plot.<kind>.
alias of |
Draw a stacked area plot. |
Vertical bar plot. |
Make a horizontal bar plot. |
Make a box plot of the Series columns. |
Generate Kernel Density Estimate plot using Gaussian kernels. |
Draw one histogram of the DataFrame’s columns. |
Plot DataFrame/Series as lines. |
Generate a pie plot. |
Generate Kernel Density Estimate plot using Gaussian kernels. |
Draw one histogram of the DataFrame’s columns. |
Serialization / IO / Conversion¶
Return a pandas Series. |
A NumPy ndarray representing the values in this DataFrame or Series. |
Return a list of the values. |
Render a string representation of the Series. |
Convert Series to {label -> value} dict or dict-like object. |
Copy object to the system clipboard. |
Render an object to a LaTeX tabular environment table. |
Print Series or DataFrame in Markdown-friendly format. |
Convert the object to a JSON string. |
Write object to a comma-separated values (csv) file. |
Write object to an Excel sheet. |
Convert Series to DataFrame. |
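Some of the conversion methods can be sketched as below. The Series is hypothetical, and plain pandas is substituted if PySpark cannot be imported.

```python
try:
    import pyspark.pandas as ps  # pandas API on Spark; needs a Spark runtime
except ImportError:
    import pandas as ps  # assumption: plain pandas stands in for this sketch

s = ps.Series([1, 2], index=["a", "b"], name="v")

as_list = s.to_list()    # plain Python list of the values
as_dict = s.to_dict()    # {label -> value}
as_array = s.to_numpy()  # NumPy ndarray
as_frame = s.to_frame()  # one-column DataFrame named after the Series
```

In pandas API on Spark, `to_list`, `to_dict`, and `to_numpy` load all the data into the driver's memory, so they are best kept for results that are known to be small.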
Pandas-on-Spark specific¶
Series.pandas_on_spark provides pandas-on-Spark specific features that exist only in pandas API on Spark.
These can be accessed by Series.pandas_on_spark.<function/property>.
Transform the data with the function that takes pandas Series and outputs pandas Series. |