I have some lines of code from which I am trying to remove the leading text. They look like this:

Line 1: myApp.name;
Line 2: myApp.version
Line 3: myApp.defaults, myApp.numbers;

I am trying to find a regex that will remove everything up to (but excluding) the first myApp.

I have tried various regular expressions, but they all seem to fail when it comes to line 3 (because myApp appears twice).

The closest I have come so far is:

.*?myApp

Pretty simple, but it matches at both occurrences of myApp in Line 3, whereas I'd like it to match only the first.

There are a few hundred lines; otherwise I'd have cleaned them all up manually by now.

Can somebody help me? Thanks.

keldar

2 Answers

You need to add the anchor ^, which matches the start of a line:

^.*?(myApp)

Use the above regex and replace the match with $1 or \1 (whichever your tool uses for backreferences), so that the string myApp is kept in the final result.

Pattern explanation:

  • ^ Start of a line.
  • .*?(myApp) The shortest possible match up to the first myApp. The string myApp is captured and stored in group 1.
  • All matched characters are replaced with the contents of group 1.
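
As a rough illustration of the steps above, here is a minimal Python sketch. The question does not say which tool is being used, so Python's `re` module and the variable names (`text`, `pattern`, `cleaned`) are assumptions made for the example. In Python, `^` only matches at the start of every line when `re.MULTILINE` is set.

```python
import re

# Sample input taken from the question.
text = (
    "Line 1: myApp.name;\n"
    "Line 2: myApp.version\n"
    "Line 3: myApp.defaults, myApp.numbers;"
)

# re.MULTILINE lets ^ match at the start of each line,
# not only at the start of the whole string.
pattern = re.compile(r"^.*?(myApp)", re.MULTILINE)

# \1 puts the captured "myApp" back, so only the leading text is dropped.
cleaned = pattern.sub(r"\1", text)
print(cleaned)
```

Other tools spell the backreference differently (for example `$1`), so the replacement string may need adjusting.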
Avinash Raj
  • Thanks! Can you explain why the caret makes all the difference? I need to brush up on my regex skills. – keldar Sep 13 '14 at 17:22
  • I understand it signifies the beginning of a line, but cannot understand how it helps. – keldar Sep 13 '14 at 17:23
Your regular expression works in Perl if you add the ^ to ensure that you only match the beginnings of lines:

cat /tmp/test.txt  | perl -pe 's/^.*?myApp/myApp/g'
myApp.name;
myApp.version
myApp.defaults, myApp.numbers;

If you wanted to get fancy, you could put the "myApp" into a lookahead using the (?=...) syntax. A lookahead checks that myApp follows without making it part of the match, so it doesn't have to be put back in by the replacement.

cat /tmp/test.txt  | perl -pe 's/^.*?(?=myApp)//g'
myApp.name;
myApp.version
myApp.defaults, myApp.numbers;
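
The same lookahead idea can be sketched outside Perl. The following is a minimal Python version, assuming the lines are processed one at a time; `strip_prefix` is a hypothetical helper name, not something from the question.

```python
import re

def strip_prefix(line: str) -> str:
    """Remove everything before the first "myApp" on a single line.

    (?=myApp) is a lookahead: it requires "myApp" to follow but does not
    consume it, so replacing the match with "" leaves "myApp" in place.
    """
    return re.sub(r"^.*?(?=myApp)", "", line)

lines = [
    "Line 1: myApp.name;",
    "Line 2: myApp.version",
    "Line 3: myApp.defaults, myApp.numbers;",
]
for line in lines:
    print(strip_prefix(line))
```

A line that contains no myApp at all is returned unchanged, because the pattern then fails to match.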
Stephen Ostermiller
  • Thanks - why does the caret make all the difference? That was all that was missing from mine and I am struggling to see how it helps. – keldar Sep 13 '14 at 17:29
  • It means to match from the beginning of the string. Without it, there are two matches in the third line: `Line 3: myApp` and `.defaults, myApp`, both of which get removed. Only one of them starts the line, so the caret helps. – Stephen Ostermiller Sep 13 '14 at 17:52
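
The effect described in the comment above can be checked directly. This is a small Python sketch, with variable names chosen only for illustration, that lists what each pattern matches on Line 3 and what a global replace then produces.

```python
import re

line = "Line 3: myApp.defaults, myApp.numbers;"

# Without the anchor, a global search finds two separate matches.
unanchored_matches = re.findall(r".*?myApp", line)
# With the anchor, only the match that starts the line qualifies.
anchored_matches = re.findall(r"^.*?myApp", line)

print(unanchored_matches)  # ['Line 3: myApp', '.defaults, myApp']
print(anchored_matches)    # ['Line 3: myApp']

# Replacing every match with "myApp" shows the damage done without ^.
unanchored_result = re.sub(r".*?myApp", "myApp", line)
anchored_result = re.sub(r"^.*?myApp", "myApp", line)

print(unanchored_result)  # myAppmyApp.numbers;
print(anchored_result)    # myApp.defaults, myApp.numbers;
```

After the first match ends, the regex engine resumes searching in the middle of the line, where `^` can no longer match, so the second myApp is left alone.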