Ignore last "/" in R regex

Question

Given the string "http://compras.dados.gov.br/materiais/v1/materiais.html?pdm=08275/", I need a regex filter that ignores the last character if it is a "/".

I tried the following regex, "(http:////)?compras\\.dados\\.gov\\.br.*\\?.*(?<!//)", as tested at regexr.com/4om61, but it doesn't work when I run it in R:

regex_exp_R   <- "(http:////)?compras\\.dados\\.gov\\.br.*\\?.*(?<!//)"
grep(regex_exp_R, "http://compras.dados.gov.br/materiais/v1/materiais.html?pdm=08275/", perl = T, value = T)

I need this to work with pure regex and the grep function, without using any R string package. Thank you.

Simplified case: After everyone's helpful contributions, one last issue remains. Because I will use the regex as an input to another function, the solution must work with pure regex and grep.

The remaining point is a very basic one: given the strings "a1bc/" or "a1bc", the regex must return "a1bc". Building on suggestions I received, I tried

grep(".*[^//]" ,"a1bc/", perl = T, value = T), but still get "a1bc/" instead of "a1bc". Any hints? Thank you.

Use this as regex: `(?:http://)?compras\.dados\.gov\.br.*\?[^/]*` There is no need to use lookbehind here. — anubhava, Nov 12 '19 at 16:47
`gsub('/$', '', x)` will make a copy of `x` without the `/` at the end (if there is one for the given element of `x`) — IceCreamToucan, Nov 12 '19 at 16:53
I am not completely clear on what you are looking for. What do you mean by "ignore"? Do you want the string returned without the last `/`, or do you want the `/` to be an optional element of your search pattern? — Andrew, Nov 12 '19 at 16:55
Dear Andrew, I want the string returned without the last "/". Thank you — Fabio Correa, Nov 12 '19 at 17:01
@anubhava, yes, it did not work. Simplifying the problem and building on your proposed solution, when I try grep(".*[^//]" ,"abc/", perl = T, value = T), I get "abc/" instead of "abc". Thank you. — Fabio Correa, Nov 12 '19 at 17:14
With `grep()`, even if you correctly match part of the string, it will return the original string regardless. E.g., `grep("a", "abc", value = T)` — Andrew, Nov 12 '19 at 17:16
@FabioCorrea: Check this: https://stackoverflow.com/a/23901600/548225 — anubhava, Nov 12 '19 at 17:20
I edited the original question for a simplified last issue, after all contributions. Thank you. — Fabio Correa, Nov 12 '19 at 19:08
@FabioCorrea, it cannot work with only `grep()` because `grep()` is not designed to return partial matches. It is designed to return an index--`grep("c", letters)`--but it can return the value of the original string instead of the index--`grep("c", letters, value = T)`. I would suggest using another base function such as `gsub` (on its own, or with `grep`). Read the value header in `?grep` — Andrew, Nov 12 '19 at 19:30

Andrew · Answer 1 · 2019-11-12T17:20:48.497

If you want to return the string without the last /, you can do this in several ways. Below are a couple of options using base R:

Using a back-reference in gsub() (sub() would work too here):

gsub("(.*?)/*$", "\\1", x)
[1] "http://compras.dados.gov.br/materiais/v1/materiais.html?pdm=08275"

# or, adapting your original pattern ("http://" needs no extra slashes in R)
gsub("((http://)?compras\\.dados\\.gov\\.br.*\\?.*?)/*$", "\\1", x)
[1] "http://compras.dados.gov.br/materiais/v1/materiais.html?pdm=08275"

By position, using ifelse() and substr() (this will probably be a little faster if scaling matters):

ifelse(substr(x, nchar(x), nchar(x)) == "/", substr(x, 1, nchar(x)-1), x)
[1] "http://compras.dados.gov.br/materiais/v1/materiais.html?pdm=08275"

Data:

x <- "http://compras.dados.gov.br/materiais/v1/materiais.html?pdm=08275/"
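As a rough check that the regex and positional approaches agree, the sketch below runs both over a small vector. The extra test strings "a1bc/" and "a1bc" come from the question's simplified case; the variable names `by_regex` and `by_position` are made up for illustration, and `sub("/$", "", x)` stands in for the back-reference version.

```r
# Inputs: the URL from the question plus the two simplified cases.
x <- c("http://compras.dados.gov.br/materiais/v1/materiais.html?pdm=08275/",
       "a1bc/", "a1bc")

# Regex approach: drop a single trailing slash, if present.
by_regex <- sub("/$", "", x)

# Positional approach: compare the last character, then trim by length.
last_char <- substr(x, nchar(x), nchar(x))
by_position <- ifelse(last_char == "/", substr(x, 1, nchar(x) - 1), x)

print(by_regex)
print(identical(by_regex, by_position))
```

Both approaches are vectorized, so they work on a whole column of URLs without an explicit loop.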

score 0 · Answer 2 · answered Dec 07 '19 at 09:15

Use sub to remove a trailing /:

x <- c("a1bc/", "a2bc")
sub("/$", "", x)

This changes nothing on a string that does not end in /.

As others have pointed out, grep does not modify strings. It returns a numeric vector of indices of the matched strings or a vector of the (unmodified) matched items. It's usually used to subset a character vector.
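To make the distinction concrete, here is a small sketch of what grep returns compared with sub. The third element "zzz" and the variable names are added only for illustration; they are not part of the original question.

```r
x <- c("a1bc/", "a2bc", "zzz")

# grep() reports which elements match: indices by default ...
idx <- grep("bc", x)

# ... or the matching elements themselves, unmodified, with value = TRUE.
hits <- grep("bc", x, value = TRUE)

# To change the strings, pass the selected elements through sub().
cleaned <- sub("/$", "", hits)

print(idx)      # 1 2
print(hits)     # "a1bc/" "a2bc"
print(cleaned)  # "a1bc"  "a2bc"
```

So grep can select the elements to keep, but a second base function such as sub is needed to strip the trailing slash.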

score 0 · Answer 3 · answered Dec 07 '19 at 10:07

You can use a negative look-behind at the end to ensure it doesn't end with the character you don't want (in this case, a /). The regex would then be:

.+(?<!\/)

You can view it here with your three input examples: https://regex101.com/r/XB9f7K/1/. If you only want it to match URLs, replace the .+ part at the beginning with your URL regex.
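In R, a look-behind needs perl = TRUE, and grep would still return the whole element even when the pattern matches only part of it. One possible way to pull out just the matched text in base R is regexpr() with regmatches(), sketched below. The input vector and the names `m` and `matched` are assumptions for illustration.

```r
x <- c("a1bc/", "a1bc",
       "http://compras.dados.gov.br/materiais/v1/materiais.html?pdm=08275/")

# Locate the first match in each element; look-behind requires perl = TRUE.
m <- regexpr(".+(?<!/)", x, perl = TRUE)

# Extract only the matched substrings, not the whole elements.
matched <- regmatches(x, m)

print(matched)
```

Note that regmatches() drops elements with no match (for example a string consisting only of "/"), so the result can be shorter than the input.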

ThomasIsCoding · score -1 · Answer 4 · answered Nov 12 '19 at 20:09

How about trying gsub("(.*?)/+$","\\1",s)?
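The /+$ in this pattern differs from /$ in that it removes every trailing slash rather than just one. The sketch below shows the difference using sub() without a capture group, which should give the same result as the gsub() call above. The vector `s` and the result names are made up for illustration.

```r
s <- c("a1bc", "a1bc/", "a1bc///")

# "/$" removes at most one trailing slash.
one_slash <- sub("/$", "", s)

# "/+$" removes the whole run of trailing slashes.
all_slashes <- sub("/+$", "", s)

print(one_slash)    # "a1bc" "a1bc" "a1bc//"
print(all_slashes)  # "a1bc" "a1bc" "a1bc"
```

Which one is appropriate depends on whether repeated trailing slashes can occur in the input and should be collapsed.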