开发手册 欢迎您!
软件开发者资料库

Python Pandas 实现两个DataFrame连接(INNER (LEFT RIGHT FULL) OUTER) JOIN

本文主要介绍Python Pandas DataFrame实现两个DataFrame之间连接,类似关系数据中(INNER(LEFT RIGHT FULL) OUTER) JOIN,以及相关内联接、外联接、左联接、右联接、全联接等示例代码。

示例数据:

np.random.seed(0)left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)})    right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': np.random.randn(4)})

left

  key     value0   A  1.7640521   B  0.4001572   C  0.9787383   D  2.240893

right

  key     value0   B  1.8675581   D -0.9772782   E  0.9500883   F -0.151357

1、实现内连接(INNER JOIN)

left.merge(right, on='key')# 或者
# left.merge(right, on='key', how='inner')

  key   value_x   value_y0   B  0.400157  1.8675581   D  2.240893 -0.977278

httpswwwwonherocom

2、实现左连接( LEFT JOIN)

left.merge(right, on='key', how='left')

  key   value_x   value_y0   A  1.764052       NaN1   B  0.400157  1.8675582   C  0.978738       NaN3   D  2.240893 -0.977278

httpswwwwonherocom

3、实现右连接(RIGHT JOIN)

left.merge(right, on='key', how='right')

  key   value_x   value_y0   B  0.400157  1.8675581   D  2.240893 -0.9772782   E       NaN  0.9500883   F       NaN -0.151357

httpswwwwonherocom

4、实现全连接(FULL OUTER JOIN)

left.merge(right, on='key', how='outer')

  key   value_x   value_y0   A  1.764052       NaN1   B  0.400157  1.8675582   C  0.978738       NaN3   D  2.240893 -0.9772784   E       NaN  0.9500885   F       NaN -0.151357

httpswwwwonherocom

5、排除左连接

先执行左外连接,然后只过滤(不包括)来自左的行。

(left.merge(right, on='key', how='left', indicator=True)     .query('_merge == "left_only"')     .drop('_merge', 1))

  key   value_x  value_y0   A  1.764052      NaN2   C  0.978738      NaN

left.merge(right, on='key', how='left', indicator=True)

  key   value_x   value_y     _merge0   A  1.764052       NaN  left_only1   B  0.400157  1.867558       both2   C  0.978738       NaN  left_only3   D  2.240893 -0.977278       both

httpswwwwonherocom

6、排除右连接

执行右外连接,然后只过滤(不包括)来自右的行。

(left.merge(right, on='key', how='right', indicator=True)     .query('_merge == "right_only"')     .drop('_merge', 1))

  key  value_x   value_y2   E      NaN  0.9500883   F      NaN -0.151357

httpswwwwonherocom

7、排除全连接

(left.merge(right, on='key', how='outer', indicator=True)     .query('_merge != "both"')     .drop('_merge', 1))

  key   value_x   value_y0   A  1.764052       NaN2   C  0.978738       NaN4   E       NaN  0.9500885   F       NaN -0.151357

httpswwwwonherocom

8、键列的不同名称

如果键列的名称不同(例如,left用keyLeft和right用keyRight代替),key那么将必须指定left_on和right_on作为参数,而不是on:

示例数据:

left2 = left.rename({'key':'keyLeft'}, axis=1)right2 = right.rename({'key':'keyRight'}, axis=1)

left2

  keyLeft     value0       A  1.7640521       B  0.4001572       C  0.9787383       D  2.240893

right2

  keyRight     value0        B  1.8675581        D -0.9772782        E  0.9500883        F -0.151357

例如,

left2.merge(right2, left_on='keyLeft', right_on='keyRight', how='inner')

  keyLeft   value_x keyRight   value_y0       B  0.400157        B  1.8675581       D  2.240893        D -0.977278

9、避免在输出中重复键列

在keyLeftfromleft和keyRightfrom上进行合并时right,如果只希望在输出中使用keyLeft或keyRight(但不同时使用)中的任何一个,则可以通过将索引设置,

示例数据:

left2 = left.rename({'key':'keyLeft'}, axis=1)right2 = right.rename({'key':'keyRight'}, axis=1)

left2

  keyLeft     value0       A  1.7640521       B  0.4001572       C  0.9787383       D  2.240893

right2

  keyRight     value0        B  1.8675581        D -0.9772782        E  0.9500883        F -0.151357

例如,

left3 = left2.set_index('keyLeft')left3.merge(right2, left_index=True, right_on='keyRight')

    value_x keyRight   value_y0  0.400157        B  1.8675581  2.240893        D -0.977278

10、仅合并其中一个的单个列 DataFrames

right3 = right.assign(newcol=np.arange(len(right)))

right3

  key     value  newcol0   B  1.867558       01   D -0.977278       12   E  0.950088       23   F -0.151357       3

如果只需要合并“new_val”(不包含任何其他列),通常可以在合并之前仅将其子集作为子集:

left.merge(right3[['key', 'newcol']], on='key')

  key     value  newcol0   B  0.400157       01   D  2.240893       1

如果要进行左外部联接,则性能更高的解决方案将涉及map:

# left['newcol'] = left['key'].map(right3.set_index('key')['newcol']))left.assign(newcol=left['key'].map(right3.set_index('key')['newcol']))

  key     value  newcol0   A  1.764052     NaN1   B  0.400157     0.02   C  0.978738     NaN3   D  2.240893     1.0

11、合并多列

要加入对多列,指定列表on(或left_on和right_on)。

left.merge(right, on=['key1', 'key2'] ...)

列名称不同

left.merge(right, left_on=['lkey1', 'lkey2'], right_on=['rkey1', 'rkey2'])