pyspark.sql.DataFrame.mergeInto

DataFrame.mergeInto(table, condition)

Merges a set of updates, insertions, and deletions from this DataFrame (the source) into a target table.

New in version 4.0.0.

Parameters

table : str
    Target table name to merge into.
condition : Column
    The condition that determines whether a row in the target table matches one in the source DataFrame.

Returns

MergeIntoWriter
    A writer used to specify how the source DataFrame is merged into the target table.

Notes

This method does not support streaming queries.

Examples

>>> from pyspark.sql.functions import expr
>>> source = spark.createDataFrame(
...     [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["id", "name"])
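>>> # condition must be a Column, so alias the source and compare ids explicitly.
>>> (source.alias("source").mergeInto("target", expr("source.id = target.id"))
...     .whenMatched().update({"name": source.name})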
...     .whenNotMatched().insertAll()
...     .whenNotMatchedBySource().delete()
...     .merge())
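
The effect of the three clauses above can be modelled in plain Python. This is only an illustrative sketch of the merge semantics, not how Spark executes the operation: the helper `merge_into`, the sample `target` rows, and the assumption that `id` is unique in both tables are all hypothetical and do not come from the PySpark API.

```python
# Pure-Python model of the merge in the example above; no Spark required.
# merge_into and the target rows are hypothetical, for illustration only.

def merge_into(target, source, key="id"):
    """Return the rows the target would hold after the merge.

    Assumes `key` is unique within each table.
    """
    source_by_key = {row[key]: row for row in source}
    target_keys = {row[key] for row in target}
    result = []
    for row in target:
        match = source_by_key.get(row[key])
        if match is not None:
            # whenMatched().update({"name": source.name})
            result.append({**row, "name": match["name"]})
        # Otherwise whenNotMatchedBySource().delete() drops the target row.
    for row in source:
        if row[key] not in target_keys:
            # whenNotMatched().insertAll()
            result.append(dict(row))
    return result


source = [
    {"id": 14, "name": "Tom"},
    {"id": 23, "name": "Alice"},
    {"id": 16, "name": "Bob"},
]
# Hypothetical contents of the target table before the merge.
target = [
    {"id": 14, "name": "Thomas"},
    {"id": 99, "name": "Zed"},
]

merged = merge_into(target, source)
for row in merged:
    print(row)
```

With these sample rows, id 14 is matched and renamed, ids 23 and 16 are inserted, and id 99 is deleted because no source row matches it.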