Updated April 3, 2023
Introduction to Python Pandas Join
The process of join could be denoted as a way of merging the columns of two dataframes as per buisness needs. Basically the pandas dataset have a very large set of SQL like functionality. this makes pandas dataframe very structured and very much closely related to SQL tables. In pandas the joins can be achieved by two ways one is using the join() method and other is using the merge() method. Specifically to denote both join() and merge are very closely related and almost can be used interchangeably used to attain the joining needs in python.
Python Pandas Join Methods with Examples
Python pandas join methods with example are given below:
1. Join() in Pandas
The join method is used to join two columns of a dataframes either on its index or by the one which acts as key column.
Syntax:
DataFrame.join(self, other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
Parameter | Description |
Self | Denoted is join has to happen on the same dataframe |
Orther | Mentions the orther dataframe which needs to be joined |
On | Specifies the key on which join has to happen |
How | Mentions the type of join |
left_suffix | Suffix to use from left frame’s overlapping columns. |
right_suffix | Suffix to use from right frame’s overlapping columns. |
sort | Sort the output |
Example #1
Code:
import pandas as pd
df1 = pd.DataFrame({'A':['K0','K1','K4','K7'],
'B':[45,23,45,2]})
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
'A': ['1', '2', '4', '23', '2', '78'],
'B': ['4', '41', '32', '23', '74', '5']})
print(df1)
print("")
print(df2)
print("")
print(df1.set_index('A').join(df2.set_index('key'),lsuffix='_caller', rsuffix='_other'))
print(df2.set_index('key').join(df1.set_index('A'),lsuffix='_caller', rsuffix='_other'))
Output:
Code Explanation: Here the two dataframes are declared namely DF1 and DF2. DF1 is made of two columns and whereas DF2 is made of three columns. the first column in both the dataframes is acting as key for it. here keys are of the range ‘K*’. There is some point of mutuality in the keys of both the dataframes. so a join method is used to join the the dataframes. the join method works as like it takes a key column from first dataframe and a key column from the second dataframe and makes a join there. Here in this example the join is performed on both ways were the first dataframe is pulled with values of second dataframe and similarly the second dataframe is also pulled with values from second dataframe. the resulting joined data is printed on the console for both the instances.
Example #2 – (left join, Right join)
Code:
import pandas as pd
df1 = pd.DataFrame({'DF1_KEY':['K0','K1','K4','K7','K9'],
'B':[45,23,45,56,5]})
df2 = pd.DataFrame({'DF2_key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
'A': ['1', '2', '4', '23', '2', '78'],
'B': ['4', '41', '32', '23', '74', '5']})
print(df1)
print("")
print(df2)
print(" LEFT JOIN: ")
print(df1.join(df2,how='left', lsuffix='_caller', rsuffix='_other'))
print(" RIGHT JOIN: ")
print(df1.join(df2,how='right', lsuffix='_caller', rsuffix='_other'))
Output:
Code Explanation: Here the two dataframes are left joined and right joined separately and then printed on to the console. The kind of join to happen is considered using the type of join mentioned in the ‘how’ parameter of the function.
2. merge() in Pandas
The Merge method in pandas can be used to attain all database oriented joins like left join , right join , inner join etc.
Merge Method | SQL Equivalent | Description |
left | LEFT OUTER JOIN | ALL RECORDS OF LEFT AND MATCHING RECORDS IN RIGHT |
right | RIGHT OUTER JOIN | ALL RECORDS OF RIGHT AND MATCHING RECORDS IN LEFT |
outer | FULL OUTER JOIN | ALL RECORDS FROM LEFT AND RIGHT |
inner | INNER JOIN | ONLY MACTHING RECORDS IN LEFT AND RIGHT |
Syntax:
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=True)
Parameter | Description |
Left | Dataframe on left end |
Right | Dataframe on Right end |
How | Mentions whether it needs to be a left join , right join , inner join or outer join. |
left_on | Keys from the left dataframe |
right_on | Keys from the Right dataframe |
left_index | Joining index from the left |
right_index | Joining index from the Right |
sort | Sort the output |
Example #1 – (Inner Join)
Code:
import pandas as pd
left_df = pd.DataFrame({'key':['K0','K1','K4','K7'],
'B':[45,23,45,2]})
right_df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
'A': ['1', '2', '4', '23', '2', '78'],
'B': ['4', '41', '32', '23', '74', '5']})
print(left_df)
print("")
print(right_df)
print("")
print(pd.merge(left_df,right_df,on=['key','key']))
print("")
print(pd.merge(right_df,left_df,on=['key','key']))
Output:
Code Explanation: Here the dataframes used for the join() method example is used again here, the dataframes are joined on a specific key using the merge method. here a inner join happens which means the matching rows from both the dataframes are alone been displayed. here join is achieved by two means where the datasets are interchanged on their left , right position and printed accordingly.
Example #2 – (Left Join)
Code:
import pandas as pd
left_df = pd.DataFrame({'key':['K0','K1','K4','K7'],
'B':[45,23,45,2]})
right_df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
'A': ['1', '2', '4', '23', '2', '78'],
'B': ['4', '41', '32', '23', '74', '5']})
print(left_df)
print("")
print(right_df)
print("")
print(pd.merge(left_df,right_df,on=['key','key'],how='left'))
Output:
Code Explanation: In this instance the left join is been performed and printed on to the console. the left join is achieved by setting the ‘how’ Parameter of the merge method as ‘left’ . the outcome of the merge operation is printed on to the console.
Example #3 – (Right Join)
Code:
import pandas as pd
left_df = pd.DataFrame({'key':['K0','K1','K4','K7'],
'B':[45,23,45,2]})
right_df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
'A': ['1', '2', '4', '23', '2', '78'],
'B': ['4', '41', '32', '23', '74', '5']})
print(left_df)
print("")
print(right_df)
print("")
print(pd.merge(left_df,right_df,on=['key','key'],how='right'))
Output:
Code Explanation: In this instance the Right join is been performed and printed on to the console. the Right join is achieved by setting the ‘how’ Parameter of the merge method as ‘right’ . the outcome of the merge operation is printed on to the console.
Example #4 – (Outer Join)
Code:
import pandas as pd
left_df = pd.DataFrame({'key':['K0','K1','K4','K7'],
'B':[45,23,45,2]})
right_df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
'A': ['1', '2', '4', '23', '2', '78'],
'B': ['4', '41', '32', '23', '74', '5']})
print(left_df)
print("")
print(right_df)
print("")
print(pd.merge(left_df,right_df,on=['key','key'],how='outer'))
Output:
Code Explanation: In this instance the Outer join is been performed and printed on to the console. the Outer join is achieved by setting the ‘how’ Parameter of the merge method as ‘outer’ . the outcome of the merge operation is printed on to the console.
Recommended Articles
We hope that this EDUCBA information on “Python Pandas Join” was beneficial to you. You can view EDUCBA’s recommended articles for more information.