python - Can we use a Pandas function in a Spark DataFrame column? If so, how?


I have a pandas DataFrame called "pd_df".

I want to modify one of its columns, like this:

    import pandas as pd
    pd_df['notification_dt'] = pd.to_datetime(pd_df['notification_dt'], format="%y-%m-%d")

It works.
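As a minimal, self-contained sketch of that pandas step (the sample values below are made up for illustration; the real data is not shown in the question):

```python
import pandas as pd

# Hypothetical sample data mirroring the question's column
pd_df = pd.DataFrame({"notification_dt": ["17-03-01", "17-12-25"]})

# Parse the strings into datetime64 values with the same format string
pd_df["notification_dt"] = pd.to_datetime(pd_df["notification_dt"], format="%y-%m-%d")

print(pd_df["notification_dt"].dtype)  # datetime64[ns]
```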

On the same data, I created a Spark DataFrame called "spark_df".

I want to apply the same function (pd.to_datetime) to its column to perform the same operation. I did this:

    from pyspark.sql.functions import UserDefinedFunction
    from pyspark.sql.types import TimestampType

    udf = UserDefinedFunction(lambda x: pd.to_datetime(x, format="%y-%m-%d"), TimestampType())
    spark_df2 = spark_df.withColumn("notification_dt1", udf(spark_df["notification_dt"]))

It should work, as far as I can tell. But on

    spark_df.show()

I encounter the following error after a minute or so (an error screenshot was attached to the original post; it is not reproduced here):

So, I got it fixed.

    udf = UserDefinedFunction(lambda x: pd.to_datetime(x, format="%y-%m-%d"), TimestampType())

should be

    udf = UserDefinedFunction(lambda x: str(pd.to_datetime(x, format="%y-%m-%d")), TimestampType())

It was failing to convert the result to TimestampType().
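The difference between the two lambdas can be seen with pandas alone, no Spark needed. According to the fix above, the problem was the return value's type: pd.to_datetime returns a pandas Timestamp object, while the str() wrapper returns a plain string (the sample value here is hypothetical):

```python
import pandas as pd

# What the original UDF's lambda returns: a pandas Timestamp object
raw = pd.to_datetime("17-03-01", format="%y-%m-%d")
print(type(raw).__name__)  # Timestamp

# What the fixed UDF's lambda returns: a plain Python string
fixed = str(pd.to_datetime("17-03-01", format="%y-%m-%d"))
print(fixed)  # 2017-03-01 00:00:00
```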


