How can I transform a column of characters written as
c("0 y", "0 m", "23 d", "0 y", "0 m", "8 d")
into numeric values
c(0, 0, 23, 0, 0, 8)
Assuming y and m are always 0:
Oy.date.diff <- c("0 y, 0 m, 12 d", "0 y, 0 m, 13 d", "0 y, 0 m, 12 d", "0 y, 0 m, 15 d")
as.numeric(gsub(" d", "", gsub("0 y, 0 m, ", "", Oy.date.diff)))
# [1] 12 13 12 15
Note that R does not allow syntactic variable (or column) names to begin with a digit, so the first character of Oy.date.diff is the uppercase letter O rather than a zero.
You could use the lubridate package to parse each string into a period object, then use the extractor functions to pull out whichever component(s) you are interested in. Using dcarlson's example data:
library(lubridate)
day(period(c("0 y, 0 m, 12 d", "0 y, 0 m, 13 d", "0 y, 0 m, 12 d", "0 y, 0 m, 15 d") ))
# [1] 12 13 12 15
We can use sub to capture the digits before the space followed by 'd':
as.integer(sub(".*\\s(\\d+) d", "\\1", v1))
#[1] 12 13 12 15 12
Or with regmatches/regexpr, which returns the matches as character strings:
regmatches(v1, regexpr("(\\d+)(?= d$)", v1, perl = TRUE))
#[1] "12" "13" "12" "15" "12"
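One thing to keep in mind with regmatches is that it silently drops elements that do not match, so the result can be shorter than the input column. The following is a minimal sketch of one way to keep the result aligned with the input; the helper name extract_days and the sample strings are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper: return the day count as an integer,
# with NA wherever no "<digits> d" is found at the end of the string.
extract_days <- function(x) {
  m <- regexpr("\\d+(?= d$)", x, perl = TRUE)
  hit <- !is.na(m) & m != -1          # positions that actually matched
  out <- rep(NA_integer_, length(x))  # start with all NA
  out[hit] <- as.integer(regmatches(x, m))
  out
}

days <- extract_days(c("0 y, 0 m, 12 d", "no days here", "0 y, 0 m, 8 d"))
days
# [1] 12 NA  8
```

Because the output has the same length as the input, it can be assigned straight back to a data frame column.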
If we need to convert everything to days (approximating a year as 365 days and a month as 30 days), then
library(dplyr)
library(tidyr)
tibble(col1 = v1) %>%
  tidyr::extract(col1, into = c('year', 'month', 'day'),
                 "^(\\d+) y, (\\d+) m, (\\d+) d$", convert = TRUE) %>%
  transmute(days = year * 365 + month * 30 + day)
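If you would rather avoid the extra packages, the same conversion can probably be done in base R. This is only a sketch: it uses utils::strcapture (available since R 3.4.0), the same 365-day year and 30-day month approximation, and a two-element sample vector chosen for illustration. The names parts and total_days are assumptions, not from the answer above.

```r
# Illustrative sample input in the "<y> y, <m> m, <d> d" format
v1 <- c("0 y, 0 m, 12 d", "1 y, 2 m, 12 d")

# Split each string into integer year / month / day columns
parts <- utils::strcapture(
  "^(\\d+) y, (\\d+) m, (\\d+) d$", v1,
  proto = data.frame(year = integer(), month = integer(), day = integer())
)

# Approximate total days: 1 year = 365 days, 1 month = 30 days
total_days <- with(parts, year * 365L + month * 30L + day)
total_days
# [1]  12 437
```

Strings that do not match the pattern come back as NA rows, so malformed entries are easy to spot.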
Data used for v1 in the examples above:
v1 <- c("0 y, 0 m, 12 d", "0 y, 0 m, 13 d", "0 y, 0 m, 12 d",
        "0 y, 0 m, 15 d", "1 y, 2 m, 12 d")
You can try this capturing regex with gsub, which captures the whole number before a " d" and doesn't make any assumptions about the rest of the string. The leading .* has to be lazy (.*?); a greedy .* would consume all but the last digit and return "2" "3" "2" "5":
x <- c("0 y, 0 m, 12 d", "0 y, 0 m, 13 d", "0 y, 0 m, 12 d", "0 y, 0 m, 15 d")
gsub("^.*?(\\d+) d.*$", "\\1", x, perl = TRUE)
#> [1] "12" "13" "12" "15"
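To see how a greedy and a lazy quantifier behave on this kind of string, here is a small side-by-side sketch. The sample vector and the names greedy and lazy are chosen for illustration; perl = TRUE is set on both calls so that the only difference is the quantifier.

```r
x <- c("0 y, 0 m, 12 d", "0 y, 0 m, 8 d")

# Greedy: the leading .* takes as much as it can,
# leaving only one digit for the capture group.
greedy <- sub("^.*(\\d+) d.*$", "\\1", x, perl = TRUE)
greedy
# [1] "2" "8"

# Lazy: .*? takes as little as it can,
# so the capture group receives the whole number.
lazy <- sub("^.*?(\\d+) d.*$", "\\1", x, perl = TRUE)
lazy
# [1] "12" "8"

# Convert to numbers once the full value is captured
as.integer(lazy)
# [1] 12  8
```

Single-digit values come out the same either way, which is why the greedy version can look correct on some inputs.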