两个DataFrame:
import pandas as pddf1 = pd.DataFrame({ 'Langs': ['C','Py','Java'], 'Values': [1,2,3] })df2 = pd.DataFrame({ 'Langs': ['C语言','Python','CJavaPy'], 'Names': ['coder1','coder2','coder3'] })
预期结果:
Langs Values Names0 C 1 coder11 Py 2 coder22 Java 3 coder3
或
Langs Values Names0 C 1 coder11 C 1 coder32 Java 3 coder33 Py 2 coder24 Py 2 coder3
1、使用thefuzz包和extractOne函数
安装thefuzz:
pip install thefuzz
pip install python-Levenshtein
实现代码:
import pandas as pdfrom thefuzz import processdf1 = pd.DataFrame({ 'Langs': ['C','Py','Java'], 'Values': [1,2,3] })df2 = pd.DataFrame({ 'Langs': ['C语言','Python','CJavaPy'], 'Names': ['coder1','coder2','coder3'] })langs = lambda x: process.extractOne(x, df2["Langs"])[2] df1['Names'] = df2.loc[df1["Langs"].map(langs).values, 'Names'].valuesprint(df1)
2、使用merge()和groupby()
import pandas as pdfrom thefuzz import processdf1 = pd.DataFrame({ 'Langs': ['C','Py','Java'], 'Values': [1,2,3] })df2 = pd.DataFrame({ 'Langs_y': ['C语言','Python','CJavaPy'], 'Names': ['coder1','coder2','coder3'] })df1['join'] = 1df2['join'] = 1dfFull = df1.merge(df2, on='join',).drop('join', axis=1)df2.drop('join', axis=1, inplace=True)dfFull['match'] = dfFull.apply(lambda x: x.Langs_y.find(x.Langs), axis=1).ge(0)dfFull = dfFull[dfFull['match']==True]dfResult = dfFull.groupby(["Langs", "Names"]).max().reset_index()[['Langs', 'Values' , 'Names' ]]print(dfResult)