What is the difference between merge and join in python?

View Discussion

Improve Article

Save Article

  • Read
  • Discuss
  • View Discussion

    Improve Article

    Save Article

    Pandas provide various facilities for easily combining Series or DataFrame with various kinds of set logic for the indexes and relational algebra functionality in the case of join / merge-type operations. 

    Both join and merge can be used to combines two dataframes but the join method combines two dataframes on the basis of their indexes whereas the merge method is more versatile and allows us to specify columns beside the index to join on for both dataframes.

    Let’s first create two dataframes to show the effect of the two methods.

    Python3

    import pandas as pd

    left = pd.DataFrame([['a', 1], ['b', 2]], list('XY'), list('PQ'))

    right = pd.DataFrame([['c', 3], ['d', 4]], list('XY'), list('PR'))

    Output:

    What is the difference between merge and join in python?

    Now let’s see the effect of the two methods on the dataframes one by one.

    join

    The join method takes two dataframes and joins them on their indexes (technically, you can pick the column to join on for the left dataframe). If there are overlapping columns, the join will want you to add a suffix to the overlapping column name from the left dataframe. Our two dataframes do have an overlapping column name P.

    Example :

    Python3

    joined_df = left.join(right, lsuffix='_')

    print(joined_df)

    Output :

    What is the difference between merge and join in python?

    Notice the index is preserved and we have four columns.We can also separately specify a specific column of the left dataframe with a parameter on to use as the join key, but it will still use the index from the right.

    Example :

    Python3

    joined_df2 = left.reset_index().join(right, on='index', lsuffix='_')

    print(joined_df2)

    Output :

    What is the difference between merge and join in python?

    merge

    At a basic level, merge more or less does the same thing as join. Both methods are used to combine two dataframes together, but merge is more versatile, it requires specifying the columns as a merge key. We can specify the overlapping columns with parameter on, or can separately specify it with left_on and right_on parameters.

    Example :

    Python3

    merged_df = left.merge(right, on='P', how='outer')

    print(merged_df)

    Output :

    What is the difference between merge and join in python?

    Here, notice that the merge method destroyed the index. 

    We can explicitly specify that we are merging on the basis of index with the left_index or right_index parameter.

    Example :

    Python3

    merged_df = left.merge(right, left_index=True,

                           right_index=True, suffixes=['_', ''])

    print(merged_df)

    Output :

    What is the difference between merge and join in python?


    Which is better merge or join?

    The join method works best when we are joining dataframes on their indexes (though you can specify another column to join on for the left dataframe). The merge method is more versatile and allows us to specify columns besides the index to join on for both dataframes.

    What is the difference between join and merge and concat in pandas?

    For pandas. DataFrame , both join and merge operates on columns and rename the common columns using the given suffix. In terms of row-wise alignment, merge provides more flexible control. Different from join and merge , concat can operate on columns or rows, depending on the given axis, and no renaming is performed.

    Is join faster than merge pandas?

    merge) Of course, they are equivalent.

    What is merge in Python?

    The merge() method updates the content of two DataFrame by merging them together, using the specified method(s).