0

I have following dataframe in R, with duplicated records I need to set as TRUE/FALSE in a new column:

library(tidyverse)
test_data <- data.frame(
  origin_country = c('US', 'US', 'BR', 'CN'),
  destin_country = c('DE', 'DE', 'UK', 'IT'),
  year = c(2020, 2020, 2019, 2019),
  item = c('wheat', 'wheat', 'wheat', 'rice'),
  value = c(2000, 2000, 3000, 2500))

The result should look like this:

expected_answer <- data.frame(
      origin_country = c('US', 'US', 'BR', 'CN'),
      destin_country = c('DE', 'DE', 'UK', 'IT'),
      year = c(2020, 2020, 2019, 2019),
      item = c(wheat, wheat, wheat, rice),
      value = c(2000, 2000, 3000, 2500),
      duplicated_record = c(TRUE, TRUE, FALSE, FALSE))

I tried mutate with a case_when, but I get all values TRUE, while the last two values should be FALSE

test_data <- mutate(test_data, duplicated_record = case_when(
  test_data$origin_country == test_data$origin_country & test_data$destin_country == test_data$destin_country & test_data$year == test_data$year & test_data$item == test_data$item ~ TRUE,
  TRUE ~ FALSE
))
view(test_data)

Comparing across rows proved beyond my reach, any ideas? Thanks in advance!

0 Answers0