返回

python-比较具有特殊字符的多列并合并数据帧

发布时间:2022-07-04 21:52:33 270
# node.js
df1_ID    Col1_df1      Col2_df1    Col3_df1
ABC-001   a.102_103i    k159*       Test1
DEF-002   a.36-89E      k188        Test2
GHI-003   ab.23<

df2:

df2_ID1         df2_ID2    Count    Count_A  Count_B    To_Check
                ABC-001    10       0        10         FIRSTLINE:a.102_103i:ANYTHING:EXTRA
DEF-002;GHI-003            20       2        18         SECONDLINE:ab.23<

结果(DF3):

df1_ID  Col1_df1    Col2_df1    Col3_df1    df2_ID1 df2_ID2 Count   Count_A Count_B To_Check
ABC-001 a.102_103i  k159*       Test1               ABC-001 10      0       10      FIRSTLINE:a.102_103i:ANYTHING:EXTRA:k159*
DEF-002 a.36-89     k188        Test2       ABC-001;DEF-002 15      3       12      THIRDLINE:a.105:a.36-89D:ANYTHING:k188:EXTRA
GHI-003 ab.23<

我想检查Col1_df1Col2_df1存在于df2第列,共列To_Check. 如果值存在于Col1_df1Col2_df1 AND df1_ID存在于df2_ID1或在df2_ID2,然后合并该行df2df1. 如果不匹配,则应为空。

这个问题的延伸:

Vlookup函数/合并熊猫,但不完全匹配

但在这个问题上,我们只是在处理字符串。在我的数据中,我们也有特殊的字符。

在尝试查找df2中存在的值时,此语法似乎也不起作用:

df1 = df1.assign(result=df1['Col1_df1'].isin(df2['To_Check']))

还写了另一种语法,但也不起作用:

output = open("output.csv", "a")
with open("df1.csv", "r") as df1:
    first_line = df2.readline()
    output.write(first_line)
    with open("df2.csv", "r") as df2:
        second_first = df2.readline()
        output.write(second_first)
        for line_df1 in df1:
            df1_names = [x for x in line_df1.split(',')]
            for line_df2 in df2:
                df2_names = [x for x in line_df2.split(',')]
                check1 = any(df1_names[1] in string for string in df2_names[6])
                print(check1)

这个check1总是False尽管该值存在。

提前谢谢你的帮助。

*更新

data_1={'df1_ID':['ABC-001','DEF-002','GHI-003']
      ,'Col1_df1':['a.102_103i','a.36-89E','ab.23<<X']
      ,'Col2_df1':['k159*','k188','e542m']
      ,'Col3_df1':['Test1','Test2','Test3']}

data_2={'df2_ID1':['','DEF-002;GHI-003','ABC-001;DEF-002']
      ,'df2_ID2':['ABC-001','','']
      ,'Count':['10','20','15']
      ,'Count_A':['0','2','3']
        ,'To_Check':['FIRSTLINE:a.102_103i:ANYTHING:EXTRA','SECONDLINE:ab.23<<X:ANYTHING:EXTRA','THIRDLINE:a.105:a.36-89D:ANYTHING:k188:EXTRA']}
特别声明:以上内容(图片及文字)均为互联网收集或者用户上传发布,本站仅提供信息存储服务!如有侵权或有涉及法律问题请联系我们。
举报
评论区(1)
按点赞数排序
用户头像