I need to know how to remove all characters from a value after the initial letter D and the first one or two digits that follow it. I am not sure how to start.

I have a data frame with a column of type character.

  • The column is called "Eircode"

The postal codes go from D01 to D24 (these are Dublin postal codes).

The values are entered like so. What you see in red is what needs to be removed.

I need to be able to remove the characters after the last digit.

My dataframe is called "MainSchools"

So if the "Eircode" is D03P820, I need to have it as D03 after my change.

I would prefer to do this with the tidyverse if possible.

mrwahl
    Welcome to Stack Overflow! Can you please read and incorporate elements from [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Especially the aspects of using `dput()` for the input. – wibeasley Nov 17 '21 at 15:09

2 Answers

You may use `sub` here:

df <- data.frame(Eircode=c("D15P820", "K78YD27", "D03P820"),
                 stringsAsFactors=FALSE)
df$Eircode <- sub("^(D(?:0[1-9]|1[0-9]|2[0-4])).*$", "\\1", df$Eircode)
df

  Eircode
1     D15
2 K78YD27
3     D03

The regex pattern used above matches and captures Dublin postal codes as follows:

D           match D
(?:
    0[1-9]  followed by 01-09
    |       OR
    1[0-9]  10-19
    |       OR
    2[0-4]  20-24
)

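The outer parentheses capture the matched prefix, and .*$ consumes the rest of the value. We then use \\1 (the first capture group) as the replacement in sub, leaving behind only the 3-character Dublin postal code. Values that do not match the pattern, such as K78YD27, are left unchanged.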
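
To make the range limits concrete, here is a minimal sketch that runs the same pattern over a few boundary values. The sample codes are made up for illustration, and `perl = TRUE` is an addition here (an assumption, so that the `(?: )` group is handled by the PCRE engine); it is not part of the answer's original call.

```r
# Same pattern as in the answer above.
pattern <- "^(D(?:0[1-9]|1[0-9]|2[0-4])).*$"

# Hypothetical test values: two valid Dublin codes, two out-of-range
# D codes, and one non-Dublin code.
codes <- c("D01X123", "D24AB12", "D00AB12", "D25AB12", "K78YD27")

# Keep only the captured prefix; non-matching values pass through unchanged.
trimmed <- sub(pattern, "\\1", codes, perl = TRUE)
print(trimmed)
```

Only the first two values are shortened (to D01 and D24); D00AB12, D25AB12 and K78YD27 come back untouched because none of the three alternatives matches them.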

Tim Biegeleisen
  • I tried the above code and it worked exactly as expected. Thank you. I also have values like `D6WHP92`, which do not have the D06 format. How can I tailor the above to suit this requirement as well? – mrwahl Nov 17 '21 at 15:32
  • Maybe use: `^(D(?:[0-9]|0[1-9]|1[0-9]|2[0-4])).*$` ... assuming that you _do_ want to target the value `D6WHP92` – Tim Biegeleisen Nov 17 '21 at 15:39
  • For values such as `A94FC44` or `K32VK33`, essentially anything that is not D plus the number, I know I can sub them manually, but what quick function can I use to simply delete any values that are not in my D01 / D1 format? – mrwahl Nov 17 '21 at 15:51
  • Use the above regex pattern with `grepl`. You should open a new question at this point. – Tim Biegeleisen Nov 17 '21 at 15:55
  • `MainSchools$Eircode – mrwahl Nov 17 '21 at 16:27
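
The `grepl` suggestion in the comments above could look roughly like this in base R. This is a sketch under assumptions: the sample rows are hypothetical (the real `MainSchools` data is not shown), and the pattern is a reordered variant, not the one from the comments, with the two-digit alternatives listed first so the result does not depend on how the regex engine resolves the alternation.

```r
# Hypothetical sample data standing in for the asker's MainSchools.
MainSchools <- data.frame(
  Eircode = c("D03P820", "A94FC44", "D6WHP92", "K32VK33", "D15P820"),
  stringsAsFactors = FALSE
)

# D followed by 10-19, 20-24, or a single digit 1-9 with an optional
# leading zero (covers both D06 and D6 style values).
dublin_prefix <- "^D(1[0-9]|2[0-4]|0?[1-9])"

# grepl() returns TRUE for rows whose Eircode starts with a Dublin prefix.
is_dublin <- grepl(dublin_prefix, MainSchools$Eircode)

# Keep only the matching rows; drop = FALSE keeps the result a data frame.
DublinSchools <- MainSchools[is_dublin, , drop = FALSE]
print(DublinSchools)
```

With this sample, the `A94FC44` and `K32VK33` rows are dropped. Note that accepting a single digit also lets an out-of-range value such as `D25...` through (it matches as `D2`), so tighten the pattern if such values can occur.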

I like to use the stringr package for such operations.

library(dplyr)
library(stringr)

# str_extract() returns the first match per value as a character vector
df %>% mutate(Eircode = str_extract(Eircode, '^[A-Z][0-9]{2}'))

output with the data from @Tim Biegeleisen:

  Eircode
1     D15
2     K78
3     D03
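
If only Dublin codes should be kept and trimmed, as the question describes, the same stringr approach can be narrowed. The following is a sketch, not a definitive solution: the sample rows are hypothetical, and the `dublin_prefix` pattern is an assumption about which formats should count (D01 to D24, plus single-digit forms such as D6).

```r
library(dplyr)
library(stringr)

# Hypothetical sample data; the real MainSchools data frame is not shown.
MainSchools <- data.frame(
  Eircode = c("D03P820", "A94FC44", "D6WHP92", "K32VK33", "D15P820"),
  stringsAsFactors = FALSE
)

# Two-digit alternatives come first so that "D15" is not cut short to "D1".
dublin_prefix <- "^D(?:1[0-9]|2[0-4]|0?[1-9])"

DublinSchools <- MainSchools %>%
  filter(str_detect(Eircode, dublin_prefix)) %>%          # keep Dublin codes only
  mutate(Eircode = str_extract(Eircode, dublin_prefix))   # trim to the prefix

print(DublinSchools)
```

For this sample, the remaining Eircode values are D03, D6 and D15.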
GuedesBF