python - Can we use a Pandas function in a Spark DataFrame column? If so, how?
I have a pandas DataFrame called "pd_df".

I want to modify one of its columns, like this:

```python
import pandas as pd

pd_df['notification_dt'] = pd.to_datetime(pd_df['notification_dt'], format="%y-%m-%d")
```

It works.
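For reference, a minimal reproducible version of this step (the sample date strings are made up; "%y" matches two-digit years):

```python
import pandas as pd

# Hypothetical sample data mirroring the question's column name.
pd_df = pd.DataFrame({'notification_dt': ['21-03-15', '21-04-01']})
pd_df['notification_dt'] = pd.to_datetime(pd_df['notification_dt'], format="%y-%m-%d")
print(pd_df['notification_dt'].dtype)  # datetime64[ns]
```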
On the same database, I created a Spark DataFrame called "spark_df", and I want to apply the same function (pd.to_datetime) to its column to perform the same operation. I did this:

```python
from pyspark.sql.functions import UserDefinedFunction
from pyspark.sql.types import TimestampType

udf = UserDefinedFunction(lambda x: pd.to_datetime(x, format="%y-%m-%d"), TimestampType())
spark_df2 = spark_df.withColumn("notification_dt1", udf(spark_df["notification_dt"]))
```

As far as I can tell, it should work. However, on

```python
spark_df2.show()
```

I encounter an error after a minute or so.
So, I got it fixed.

```python
udf = UserDefinedFunction(lambda x: pd.to_datetime(x, format="%y-%m-%d"), TimestampType())
```

should be

```python
udf = UserDefinedFunction(lambda x: str(pd.to_datetime(x, format="%y-%m-%d")), TimestampType())
```

It was failing to convert the result (a pandas Timestamp object) to TimestampType().
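A side note: wrapping the result in str() works around the serialization failure, but two other variants may be cleaner. A UDF that returns an actual Python datetime matches TimestampType() directly, and on PySpark 3.x with PyArrow installed, a vectorized pandas_udf lets pd.to_datetime operate on whole column chunks at once. The following is only a sketch under those version assumptions, reusing the question's spark_df and column names:

```python
import pandas as pd
from pyspark.sql.functions import udf, pandas_udf, to_timestamp
from pyspark.sql.types import TimestampType

# Variant 1: return a plain Python datetime so it maps cleanly to TimestampType().
ts_udf = udf(lambda x: pd.to_datetime(x, format="%y-%m-%d").to_pydatetime(),
             TimestampType())

# Variant 2 (PySpark 3.x + PyArrow): a vectorized pandas UDF; pd.to_datetime
# receives an entire pandas Series and returns a datetime64 Series.
@pandas_udf(TimestampType())
def to_ts(s: pd.Series) -> pd.Series:
    return pd.to_datetime(s, format="%y-%m-%d")

spark_df2 = spark_df.withColumn("notification_dt1", to_ts("notification_dt"))

# Variant 3: if no pandas function is actually required, Spark's built-in
# to_timestamp avoids Python UDFs entirely ("yy-MM-dd" ~ "%y-%m-%d").
spark_df3 = spark_df.withColumn("notification_dt1",
                                to_timestamp("notification_dt", "yy-MM-dd"))
```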