python 3.x - Can not merge type <class 'pyspark.sql.types.StructType'> and <class 'pyspark.sql.types.DateType'>

哈客cc lv.2

Posted: 2022-03-02 21:38:21

Related tags: # node.js

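I have an AWS Glue job that reads from several S3 buckets and, after a few transformations, should write a CSV file at the end.

Since I have no experience with Spark, I convert the data to something more familiar (pandas). The following example reproduces what I do: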

import pandas as pd
import numpy as np
data = {'product': ['aaa','bbb', 'ccc'],
        'price': ['210','null','50.0'],
        'some_boolean':['true','false', 'true'],
        'other': ['2021-11-25','null','2022-02-16'],
       }
pd_df_opp = pd.DataFrame(data)
pd_df_opp['price'] = np.where(pd_df_opp['price']=='null', 0, pd_df_opp['price'])
pd_df_opp['price'] = pd_df_opp['price'].astype(float).astype(int)
pd_df_opp['some_boolean'] = np.where(pd_df_opp['some_boolean']=='true', True, False)

The column 'other' is meant to be a date type, but it contains some 'null' strings, so I convert it as follows:

pd_df_opp['other'] = pd.to_datetime(pd_df_opp['other'], errors='coerce').dt.date
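To see what this conversion leaves in the column, the small sketch below repeats it on the same three sample values and lists the Python type of each cell. The names `other`, `converted` and `cell_types` are only illustrative. The unparseable 'null' entry becomes `pd.NaT` rather than `None`, so the column ends up holding two different kinds of object, which may be what Spark's schema inference cannot reconcile.

```python
import datetime

import pandas as pd

# Same sample values as the 'other' column above.
other = pd.Series(['2021-11-25', 'null', '2022-02-16'])

# errors='coerce' turns the unparseable 'null' string into NaT;
# .dt.date then yields datetime.date objects, with NaT kept for the gap.
converted = pd.to_datetime(other, errors='coerce').dt.date

# List the Python type name of each cell to expose the mix.
cell_types = [type(v).__name__ for v in converted]
print(cell_types)
```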

The error is triggered when I run the following:

final_pd_df_opp = sqlContext.createDataFrame(pd_df_opp)

For now, to make the script work, I replace 'null' with a fake date (1970-01-01), and then I have no problem:

pd_df_opp['other'] = np.where(pd_df_opp['other']=='null', '1970-01-01', pd_df_opp['other'])
pd_df_opp['other'] = pd.to_datetime(pd_df_opp['other'], errors='coerce').dt.date

Is there a way to get a real null value instead? Or do I need to keep the fake date and remember it when doing the data analysis?
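One possible direction, offered as a sketch rather than a confirmed fix: represent missing dates as plain `None` and give Spark an explicit schema with a nullable `DateType` column, so that nothing has to be inferred. The helpers `parse_date_or_none` and `to_spark_df` are hypothetical names introduced here, the example uses only two of the columns, and `spark` is assumed to be an existing `SparkSession` (or the `sqlContext` from the job).

```python
import datetime
from typing import Optional


def parse_date_or_none(value) -> Optional[datetime.date]:
    """Return a datetime.date for 'YYYY-MM-DD' strings, else None."""
    try:
        return datetime.datetime.strptime(value, '%Y-%m-%d').date()
    except (TypeError, ValueError):
        # 'null', empty strings and non-string inputs all end up as None.
        return None


# Sample values taken from the example data above.
products = ['aaa', 'bbb', 'ccc']
others = ['2021-11-25', 'null', '2022-02-16']

# Plain Python rows: missing dates are None, not NaT and not a fake date.
rows = [(p, parse_date_or_none(o)) for p, o in zip(products, others)]


def to_spark_df(spark, rows):
    """Hypothetical helper: build a DataFrame with an explicit schema.

    `spark` is assumed to be an existing SparkSession or SQLContext.
    """
    # Imported lazily so the parsing part runs without PySpark installed.
    from pyspark.sql.types import DateType, StringType, StructField, StructType

    schema = StructType([
        StructField('product', StringType(), True),
        StructField('other', DateType(), True),  # nullable date column
    ])
    return spark.createDataFrame(rows, schema=schema)
```

In the job itself this would be called as `to_spark_df(sqlContext, rows)`. With the schema spelled out, the `None` entries should arrive as nulls in the `other` column, which would make the 1970-01-01 placeholder unnecessary.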
