Saturday, November 17, 2018

PySpark: Concatenate two DataFrame columns using UDF


Problem Statement:
  Using PySpark, you have a DataFrame with two columns that hold vectors of floats, and you want to create a new column containing the concatenation of those two vectors.

This is how you can do it:

import numpy as np
import pyspark.sql.functions as f
import pyspark.sql.types as t
# ...
def udf_concat_vec(a, b):
    # a and b are of type SparseVector
    return np.concatenate((a.toArray(), b.toArray())).tolist()

my_udf_concat_vec = f.UserDefinedFunction(udf_concat_vec, t.ArrayType(t.FloatType()))
df2 = df.withColumn("togetherAB", my_udf_concat_vec('columnA', 'columnB'))
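Outside of Spark, the per-row work the UDF does is just NumPy concatenation. A minimal sketch of that core step, using hand-picked dense arrays standing in for what SparseVector.toArray() would return (the values here are illustrative, not from any real DataFrame):

```python
import numpy as np

# Dense arrays, as SparseVector.toArray() would produce (example values)
a = np.array([1.0, 0.0, 3.0])
b = np.array([0.0, 5.0])

# The same operation the UDF performs for each row
together = np.concatenate((a, b)).tolist()
print(together)  # [1.0, 0.0, 3.0, 0.0, 5.0]
```

The .tolist() call matters: the UDF must return a plain Python list for Spark to map it onto the declared ArrayType, since a NumPy array is not a type Spark's serializer accepts.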

1 comment:

Anonymous said...

Hi Gustavo,

Thank you for your blog, it has helped in many of my problems.
When concatenating numerical columns in pyspark I use:

functions.concat() from pyspark.sql.

Wouldn't that perhaps be more efficient than a user-defined function? There is also functions.concat_ws() for text.

Thanks in advance,

Ferran