开发手册 欢迎您!
软件开发者资料库

Python pandas 通过字段部分匹配合并连接两个DataFrame

本文主要介绍Python中,使用pandas通过字段的部分匹配,来连接合并两个 DataFrame的方法,以及相关的示例代码。

两个DataFrame:

import pandas as pddf1 = pd.DataFrame({                    'Langs': ['C','Py','Java'],                     'Values': [1,2,3]                  })df2 = pd.DataFrame({                    'Langs': ['C语言','Python','CJavaPy'],                     'Names': ['coder1','coder2','coder3']                  })

预期结果:

  Langs  Values   Names0     C       1  coder11    Py       2  coder22  Java       3  coder3

  Langs  Values   Names0     C       1  coder11     C       1  coder32  Java       3  coder33    Py       2  coder24    Py       2  coder3

1、使用thefuzz包和extractOne函数

安装thefuzz:

pip install thefuzz
pip install python-Levenshtein

实现代码:

import pandas as pdfrom thefuzz import processdf1 = pd.DataFrame({                    'Langs': ['C','Py','Java'],                     'Values': [1,2,3]                  })df2 = pd.DataFrame({                    'Langs': ['C语言','Python','CJavaPy'],                     'Names': ['coder1','coder2','coder3']                  })langs = lambda x: process.extractOne(x, df2["Langs"])[2] df1['Names'] = df2.loc[df1["Langs"].map(langs).values, 'Names'].valuesprint(df1)

2、使用merge()和groupby()

import pandas as pdfrom thefuzz import processdf1 = pd.DataFrame({                    'Langs': ['C','Py','Java'],                     'Values': [1,2,3]                  })df2 = pd.DataFrame({                    'Langs_y': ['C语言','Python','CJavaPy'],                     'Names': ['coder1','coder2','coder3']                  })df1['join'] = 1df2['join'] = 1dfFull = df1.merge(df2, on='join',).drop('join', axis=1)df2.drop('join', axis=1, inplace=True)dfFull['match'] = dfFull.apply(lambda x: x.Langs_y.find(x.Langs), axis=1).ge(0)dfFull = dfFull[dfFull['match']==True]dfResult = dfFull.groupby(["Langs", "Names"]).max().reset_index()[['Langs', 'Values' , 'Names' ]]print(dfResult)