I have following dataframe in R, with duplicated records I need to set as TRUE/FALSE in a new column:
library(tidyverse)
test_data <- data.frame(
origin_country = c('US', 'US', 'BR', 'CN'),
destin_country = c('DE', 'DE', 'UK', 'IT'),
year = c(2020, 2020, 2019, 2019),
item = c('wheat', 'wheat', 'wheat', 'rice'),
value = c(2000, 2000, 3000, 2500))
The result should look like this:
expected_answer <- data.frame(
origin_country = c('US', 'US', 'BR', 'CN'),
destin_country = c('DE', 'DE', 'UK', 'IT'),
year = c(2020, 2020, 2019, 2019),
item = c(wheat, wheat, wheat, rice),
value = c(2000, 2000, 3000, 2500),
duplicated_record = c(TRUE, TRUE, FALSE, FALSE))
I tried mutate with a case_when, but I get all values TRUE, while the last two values should be FALSE
test_data <- mutate(test_data, duplicated_record = case_when(
test_data$origin_country == test_data$origin_country & test_data$destin_country == test_data$destin_country & test_data$year == test_data$year & test_data$item == test_data$item ~ TRUE,
TRUE ~ FALSE
))
view(test_data)
Comparing across rows proved beyond my reach, any ideas? Thanks in advance!