pyspark.sql.streaming.StatefulProcessor.handleInputRows#

abstract StatefulProcessor.handleInputRows(key, rows)[source]#

Function that will allow users to interact with input data rows along with the grouping key. It should take parameters (key, Iterator[pandas.DataFrame]) and return another Iterator[pandas.DataFrame]. For each group, all columns are passed together as pandas.DataFrame to the function, and the returned pandas.DataFrame across all invocations are combined as a DataFrame. Note that the function should not make a guess of the number of elements in the iterator. To process all data, the handleInputRows function needs to iterate all elements and process them. On the other hand, the handleInputRows function is not strictly required to iterate through all elements in the iterator if it intends to read a part of data.

Parameters
keyAny

grouping key.

rowsiterable of pandas.DataFrame

iterator of input rows associated with grouping key