pyspark.pandas.Series.diff#
- Series.diff(periods=1)[source]#
First discrete difference of element.
Calculates the difference of a Series element compared with another element in the DataFrame (default is the element in the same column of the previous row).
Note
the current implementation of diff uses Spark���s Window without specifying partition specification. This leads to moveing all data into a single partition in a single machine and could cause serious performance degradation. Avoid this method with very large datasets.
- Parameters
- periodsint, default 1
Periods to shift for calculating difference, accepts negative values.
- Returns
- diffedSeries
Examples
>>> df = ps.DataFrame({'a': [1, 2, 3, 4, 5, 6], ... 'b': [1, 1, 2, 3, 5, 8], ... 'c': [1, 4, 9, 16, 25, 36]}, columns=['a', 'b', 'c']) >>> df a b c 0 1 1 1 1 2 1 4 2 3 2 9 3 4 3 16 4 5 5 25 5 6 8 36
>>> df.b.diff() 0 NaN 1 0.0 2 1.0 3 1.0 4 2.0 5 3.0 Name: b, dtype: float64
Difference with previous value
>>> df.c.diff(periods=3) 0 NaN 1 NaN 2 NaN 3 15.0 4 21.0 5 27.0 Name: c, dtype: float64
Difference with following value
>>> df.c.diff(periods=-1) 0 -3.0 1 -5.0 2 -7.0 3 -9.0 4 -11.0 5 NaN Name: c, dtype: float64