PySpark: Concatenate two DataFrame columns using UDF
Problem Statement:
Using PySpark, you have a DataFrame with two columns that each hold a vector of floats, and you want to create a new column containing the concatenation of those two vectors.
This is how you can do it:
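A minimal sketch of one way to do it, assuming both columns are of type array<double> (if they hold pyspark.ml vectors instead, the return type and conversion would need to change). The names concat_vectors and add_concat_column are hypothetical helpers chosen for this illustration.

```python
from typing import List, Optional


def concat_vectors(a: Optional[List[float]],
                   b: Optional[List[float]]) -> Optional[List[float]]:
    """Return the elements of a followed by those of b.

    Returns None if either input is missing, so a null in either
    source column produces a null in the new column.
    """
    if a is None or b is None:
        return None
    return list(a) + list(b)


def add_concat_column(df, left: str, right: str, out: str):
    """Return df with a new column `out` = `left` concatenated with `right`.

    Assumes `left` and `right` are array<double> columns.
    """
    # Imported here so the pure-Python helper above works without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, DoubleType

    # Wrap the plain Python function as a UDF that returns array<double>.
    concat_udf = udf(concat_vectors, ArrayType(DoubleType()))
    return df.withColumn(out, concat_udf(df[left], df[right]))
```

With a DataFrame df that has array columns named, say, v1 and v2, calling add_concat_column(df, "v1", "v2", "v1_v2") would return a new DataFrame with the extra column. Keeping the concatenation logic in a plain function makes it easy to check without starting a Spark session.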
1 comment:
Anonymous said...
Hi Gustavo,
Thank you for your blog, it has helped me with many of my problems.
When concatenating numerical columns in PySpark I use functions.concat() from pyspark.sql.
Wouldn't that perhaps be more efficient than a user-defined function? There is also functions.concat_ws() for text.
Thanks in advance,
Ferran