python-比较具有特殊字符的多列并合并数据帧
发布时间:2022-07-04 21:52:33 270
相关标签: # node.js
df1_ID Col1_df1 Col2_df1 Col3_df1
ABC-001 a.102_103i k159* Test1
DEF-002 a.36-89E k188 Test2
GHI-003 ab.23<
df2:
df2_ID1 df2_ID2 Count Count_A Count_B To_Check
ABC-001 10 0 10 FIRSTLINE:a.102_103i:ANYTHING:EXTRA
DEF-002;GHI-003 20 2 18 SECONDLINE:ab.23<
结果(DF3):
df1_ID Col1_df1 Col2_df1 Col3_df1 df2_ID1 df2_ID2 Count Count_A Count_B To_Check
ABC-001 a.102_103i k159* Test1 ABC-001 10 0 10 FIRSTLINE:a.102_103i:ANYTHING:EXTRA:k159*
DEF-002 a.36-89 k188 Test2 ABC-001;DEF-002 15 3 12 THIRDLINE:a.105:a.36-89D:ANYTHING:k188:EXTRA
GHI-003 ab.23<
我想检查Col1_df1
和Col2_df1
存在于df2
第列,共列To_Check
. 如果值存在于Col1_df1
和Col2_df1
AND
df1_ID
存在于df2_ID1
或在df2_ID2
,然后合并该行df2
到df1
. 如果不匹配,则应为空。
这个问题的延伸:
但在这个问题上,我们只是在处理字符串。在我的数据中,我们也有特殊的字符。
在尝试查找df2中存在的值时,此语法似乎也不起作用:
df1 = df1.assign(result=df1['Col1_df1'].isin(df2['To_Check']))
还写了另一种语法,但也不起作用:
output = open("output.csv", "a")
with open("df1.csv", "r") as df1:
first_line = df2.readline()
output.write(first_line)
with open("df2.csv", "r") as df2:
second_first = df2.readline()
output.write(second_first)
for line_df1 in df1:
df1_names = [x for x in line_df1.split(',')]
for line_df2 in df2:
df2_names = [x for x in line_df2.split(',')]
check1 = any(df1_names[1] in string for string in df2_names[6])
print(check1)
这个check1
总是False
尽管该值存在。
提前谢谢你的帮助。
*更新
data_1={'df1_ID':['ABC-001','DEF-002','GHI-003']
,'Col1_df1':['a.102_103i','a.36-89E','ab.23<<X']
,'Col2_df1':['k159*','k188','e542m']
,'Col3_df1':['Test1','Test2','Test3']}
data_2={'df2_ID1':['','DEF-002;GHI-003','ABC-001;DEF-002']
,'df2_ID2':['ABC-001','','']
,'Count':['10','20','15']
,'Count_A':['0','2','3']
,'To_Check':['FIRSTLINE:a.102_103i:ANYTHING:EXTRA','SECONDLINE:ab.23<<X:ANYTHING:EXTRA','THIRDLINE:a.105:a.36-89D:ANYTHING:k188:EXTRA']}
特别声明:以上内容(图片及文字)均为互联网收集或者用户上传发布,本站仅提供信息存储服务!如有侵权或有涉及法律问题请联系我们。
举报