64

I am trying to use a regex for replacing text in a file (replace a full url with just protocol/domain/):

:%s/\(https\?:\/\/.*?\/\).*/\1/gc

Unfortunately, .*? does not match the string, even when I try to escape the ? quantifier. How should a non-greedy quantifier be written in Vim?

200_success
guido

3 Answers

93

Vim's regex has special syntax for non-greedy versions of operators (it's kind of annoying, but you just have to memorize them): http://vimregex.com/#Non-Greedy

The non-greedy version of * is \{-}. So, simply replace .* with .\{-}:

:%s/\(https\?:\/\/.\{-}\/\).*/\1/gc
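
For comparison, here is a minimal Python sketch of the same greedy versus non-greedy difference. It is not Vim: Python's `re` module spells the lazy star `.*?` (the syntax the question tried), where Vim spells it `.\{-}`. The sample URL and the variable names are only illustrative.

```python
import re

# Illustrative sample URL (taken from the example list further down the page).
url = "http://academy.mises.org/courses/econgd/"

# Greedy: .* extends to the LAST slash, so the group swallows the whole URL.
greedy = re.sub(r"(https?://.*/).*", r"\1", url)

# Lazy: .*? stops at the FIRST slash after the scheme (Vim: .\{-}).
lazy = re.sub(r"(https?://.*?/).*", r"\1", url)

print(greedy)
print(lazy)
```

The greedy version leaves the URL unchanged, while the lazy version keeps only the protocol and domain, which is what the Vim command above does with `.\{-}`.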
Doorknob
  • 9
    I can see now why some people prefer emacs. – tejasvi88 Feb 06 '21 at 04:59
  • 2
    What's the non-greedy version of \+ – minseong Mar 27 '21 at 12:33
  • This is really inexcusable. What ever happened to perldo? – James Bowery Oct 04 '21 at 17:30
  • 1
    @theonlygusti \{-1,} is the non-greedy version of \+ (in general, \{-n,} matches n or more, as few as possible) – yaccob Aug 25 '22 at 14:23
  • The non-greedy version of \+ is \{-1,} – Cyrille Pontvieux Sep 22 '23 at 07:16
  • The comparison of Vim & Emacs is only possible when it is not understood that Vim is a modal editor, and therein lies its power @tejasvi88. Vim & Emacs are compared on their support for various customisations & automations upon which they compete very evenly, but at the end of the day, as a colleague once said, "Emacs is chords all the way down". Emacs is not modal so it will never compete against Vim with respect to Vim's most powerful feature. Once you understand this you choose to learn Vim's rare quirks with delight and start to see amazing congruence amongst its various mnemonics. – NeilG Sep 29 '23 at 02:03
  • Before someone else says it, you can customise Emacs to implement modal editing, as these Emacs advocates recommend. In the long term I avoid customisations for various reasons but also if I'm on a box I haven't customised I'm still competent on the vanilla Vim install. In general if it's designed for an approach, rather than customised to it, it's better. – NeilG Sep 29 '23 at 02:18
22

I prefer always breaking the problem into two steps:

/\v(https?):\/\/(.{-})\/.*        <-- Search
:%s,,Protocol:\1 - Domain:\2,g    <-- Substitution

This uses very magic ("\v") to avoid many backslashes, leaves the pattern in the substitution empty so that it reuses the last search, and changes the substitution delimiter to a comma. All these changes make the command more readable.
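
As a rough analogue outside Vim, the "define the pattern once, then reuse it for the replacement" idea can be sketched in Python. This is only a comparison under the assumption that Python's `.*?` stands in for Vim's `.{-}`; the sample line and the names `pattern`, `line` and `result` are hypothetical.

```python
import re

# Step 1: the search pattern, with the protocol and domain as capture groups.
pattern = re.compile(r"(https?)://(.*?)/.*")

# Illustrative input line.
line = "https://developers.google.com/edu/python"

# Step 2: reuse the compiled pattern for the substitution.
result = pattern.sub(r"Protocol:\1 - Domain:\2", line)

print(result)
```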

SergioAraujo
  • 1
    I like the very magic switch and the converse, the no magic switch (\V). If you know you're ramping up for a complex regex or one with edge cases like this prefixing with very magic simplifies, but on the other hand if you know you're mostly seeking for a simple literal match using \V can also save keystrokes. – NeilG Sep 29 '23 at 02:39
3

You can also use [^/]+/ to prevent greediness. [^/] means "match anything except /", and + repeats that one or more times.

:%s!\v^(https?)\://([^/]+)/.*$!Protocol:\1 \t Domain:\2!g

If I have / in the regex, I will use ! as a separator so that I don't have to escape /.

Example

Let's suppose you have the following urls:

http://academy.mises.org/courses/econgd/
http://academy.mises.org/moodle/course/view.php?id=172
http://acmsel.safaribooksonline.com/book/-/9781449358204?bookview=overview
http://acmsel.safaribooksonline.com/home
http://acordes.lacuerda.net/bebo__cigala/lagrimas_negras-2.shtml
http://acordes.lacuerda.net/jose_antonio_labordeta/albada.shtml
http://anarchitext.wordpress.com/category/new-middle-east/
https://courses.edx.org/courses/course-v1%3ADelftX%2BFP101x%2B3T2015/wiki/DelftX.FP101x.3T2015/resources-and-links/
https://cseweb.ucsd.edu/classes/wi11/cse230/lectures.html
https://developer.mozilla.org/en-US/docs/CSS
https://developers.google.com/edu/python
https://developers.google.com/structured-data/testing-tool/

Applying the substitution, you would get:

Protocol:http    Domain:academy.mises.org
Protocol:http    Domain:academy.mises.org
Protocol:http    Domain:acmsel.safaribooksonline.com
Protocol:http    Domain:acmsel.safaribooksonline.com
Protocol:http    Domain:acordes.lacuerda.net
Protocol:http    Domain:acordes.lacuerda.net
Protocol:http    Domain:anarchitext.wordpress.com
Protocol:https   Domain:courses.edx.org
Protocol:https   Domain:cseweb.ucsd.edu
Protocol:https   Domain:developer.mozilla.org
Protocol:https   Domain:developers.google.com
Protocol:https   Domain:developers.google.com
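
The negated character class works the same way in other regex engines. Here is a small Python sketch, offered only as a comparison and not as the Vim command itself, applied to three of the URLs above. The variable names are illustrative, and `re.MULTILINE` is used so that `^` and `$` match on every line, as they do in a Vim buffer.

```python
import re

# Three of the sample URLs from the list above, one per line.
urls = """\
http://academy.mises.org/courses/econgd/
http://acmsel.safaribooksonline.com/home
https://developers.google.com/edu/python"""

# [^/]+ cannot cross a slash, so it stops at the end of the domain
# without needing a non-greedy quantifier.
pattern = re.compile(r"^(https?)://([^/]+)/.*$", re.MULTILINE)

# The replacement contains a real tab character between the two fields.
summary = pattern.sub("Protocol:\\1 \t Domain:\\2", urls)

print(summary)
```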
Martin Tournoij
Samir Sadek