top of page

Combine Data in R: Merge Data

how to merge data in R

Introduction

Merging datasets in R is a fundamental task in data analysis, allowing you to combine information from different sources based on common variables.

For example, these three tables related based on the common variable “Product code” can be combined by merging them to create a unified dataset for analysis as shown in the resulting dataset.

an example of merged dataset

Note: These are the different types of merges you can use.

  • Inner Join: Retains only rows with matching keys in both datasets.

  • Left Join: Retains all rows from the left dataset and matched rows from the right dataset.

  • Right Join: Retains all rows from the right dataset and matched rows from the left dataset.

  • Full Join: Retains all rows from both datasets, filling in missing values with NA.

How to merge data in R?

In this case, I want to merge data from these two excel worksheets, namely Sales and Inventory from the workbook named “Online Retail Sales Data”.

Here’s a concise guide on how to perform the merges using the dplyr package.

First, I’ll need to load the necessary packages, in this case the dplyr package (for data manipulation) and readxl package (for reading excel files).

loading dplyr and readxl packages in R

Next, I will set the working directory (file path to the location of any files I need to read in R, in this case the excel dataset).

Then read the sales and inventory datasets as follows.

setting working directory and reading data in r

Now, let’s merge the two excel files using the inner join.

merging two datasets in r

Previewing the merged data using the View () function we’ve.

a preview of merged data in r

(The above view shows data from both the Sales and Inventory worksheets merged to create unified data for analysis)

Note: The following would apply when merging data using other merge types.

  • Left Join: left_join (sales, inventory, by = "Product code")

  • Right Join: right_join (sales, inventory, by = "Product code")

  • Full Join: full_join (sales, inventory, by = "Product code")

Note: When merging data on multiple columns, you can specify them as a vector as shown below.

Merged_data = inner_join (X, Y, by = c(“Variable A”, “Variable B”))

Conclusion

This guide provides a foundation for merging datasets in R effectively using the dplyr approach.

Some key considerations when merging data in R.

  • Ensure that the key variables have the same data type across datasets.

  • Handle missing values appropriately depending on your analysis needs.

If you like the work we do and would like to work with us, drop us an email on our contacts page and we’ll reach out!

Thank you for reading!

Blog.png
bottom of page