FB_init

Saturday, November 17, 2018

PySpark: Concatenate two DataFrame columns using UDF


Problem Statement:
  Using PySpark, you have a DataFrame with two columns that each hold a vector of floats, and you want to create a new column containing the concatenation of the two.

This is how you can do it:
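
A minimal sketch of the UDF approach, assuming both columns are of type array<double> (the problem statement does not say whether the vectors are plain arrays or ML vectors). The names concat_vectors and with_concatenated_column are made up for illustration, and returning null when either side is null is an assumed design choice.

```python
def concat_vectors(left, right):
    """Return the elements of `left` followed by those of `right` as floats."""
    # Assumption: a null in either input column yields a null result.
    if left is None or right is None:
        return None
    return [float(x) for x in left] + [float(x) for x in right]


def with_concatenated_column(df, left_col, right_col, out_col):
    """Add `out_col` to `df`, holding the concatenation of two array columns."""
    # Imported here so the pure helper above can be tried without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, DoubleType

    # Wrap the plain Python function as a UDF that returns array<double>.
    concat_udf = F.udf(concat_vectors, ArrayType(DoubleType()))
    return df.withColumn(out_col, concat_udf(F.col(left_col), F.col(right_col)))


# Hypothetical usage, given an existing SparkSession named `spark`:
# df = spark.createDataFrame([([1.0, 2.0], [3.0])], ["a", "b"])
# with_concatenated_column(df, "a", "b", "ab").show()
```

Keeping the concatenation logic in a plain function makes it easy to check on ordinary Python lists before wrapping it as a UDF.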

1 comment:

Anonymous said...

Hi Gustavo,

Thank you for your blog; it has helped me with many of my problems.
When concatenating numerical columns in PySpark, I use functions.concat() from pyspark.sql.

Wouldn't that perhaps be more efficient than a user-defined function? There is also functions.concat_ws() for text.

Thanks in advance,

Ferran