51

I have a problem with replaceAll on a multiline string:

String regex = "\\s*/\\*.*\\*/";
String testWorks = " /** this should be replaced **/ just text";
String testIllegal = " /** this should be replaced \n **/ just text";

testWorks.replaceAll(regex, "x"); 
testIllegal.replaceAll(regex, "x"); 

The above works for testWorks, but not for testIllegal. Why is that, and how can I work around it? I need to replace something like a comment /* ... */ that spans multiple lines.

mtk
Robert
  • And what about this string: `"String s = \"/*\"; /* comment */"` – Bart Kiers Nov 11 '10 at 12:28
  • Well, the point is that the regex should match only at the beginning of the string. Now it looks like this: (?s)^\\s*/\\*.*\\*/ I am not sure, though, whether to make it reluctant: (?s)^\\s*/\\*.*?\\*/ – Robert Nov 11 '10 at 12:41

3 Answers

89

You need to use the Pattern.DOTALL flag so that the dot also matches newlines, e.g.

Pattern.compile(regex, Pattern.DOTALL).matcher(testIllegal).replaceAll("x")

Alternatively, specify the flag in the pattern itself using (?s), e.g.

String regex = "(?s)\\s*/\\*.*\\*/";
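For reference, here is a minimal, self-contained sketch that puts both variants side by side using the strings from the question. The class and method names (DotAllDemo, replaceWithFlag, replaceWithInlineFlag) are made up for illustration. Note that replaceAll returns a new string, so the result has to be captured or printed.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // The pattern from the question, unchanged.
    static final String REGEX = "\\s*/\\*.*\\*/";

    // Variant 1: pass Pattern.DOTALL when compiling the pattern.
    static String replaceWithFlag(String input) {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(input).replaceAll("x");
    }

    // Variant 2: embed the same flag in the pattern text with (?s).
    static String replaceWithInlineFlag(String input) {
        return input.replaceAll("(?s)" + REGEX, "x");
    }

    public static void main(String[] args) {
        String testIllegal = " /** this should be replaced \n **/ just text";

        // Without DOTALL the dot stops at the newline, so nothing is replaced.
        System.out.println(testIllegal.replaceAll(REGEX, "x").equals(testIllegal)); // prints: true

        System.out.println(replaceWithFlag(testIllegal));       // prints: x just text
        System.out.println(replaceWithInlineFlag(testIllegal)); // prints: x just text
    }
}
```

Both variants should behave the same; the inline (?s) form is handy when only String.replaceAll is available and no Pattern object is being compiled.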
rogerdpack
mikej
  • This is the best solution because it does not change the regex string itself; you just specify a flag. I did not know that, thanks! – Robert Nov 11 '10 at 12:31
  • If you have multiple "multi-line" comments, this method will remove the text between those comments as well. Use the method posted by Boris instead. – lepe Nov 29 '11 at 03:58
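The greediness issue raised in the comments can be illustrated with a small sketch. It compares the greedy pattern with the reluctant variant (.*?) that Robert mentions under the question; this is not necessarily the method from the other answer referred to above. The class name ReluctantDemo, the method names, and the sample input are assumptions made for the example.

```java
public class ReluctantDemo {
    // Greedy: .* runs to the LAST "*/" in the input.
    static final String GREEDY = "(?s)\\s*/\\*.*\\*/";

    // Reluctant: .*? stops at the FIRST "*/" after each "/*".
    static final String RELUCTANT = "(?s)\\s*/\\*.*?\\*/";

    static String replaceGreedy(String input) {
        return input.replaceAll(GREEDY, "x");
    }

    static String replaceReluctant(String input) {
        return input.replaceAll(RELUCTANT, "x");
    }

    public static void main(String[] args) {
        // Hypothetical input with two comments; the second spans two lines.
        String input = "/* one */ keep this /* two\n */ tail";

        // The greedy match swallows "keep this" together with both comments.
        System.out.println(replaceGreedy(input));    // prints: x tail

        // The reluctant match replaces each comment separately.
        // The leading \s* also consumes the space before the second comment.
        System.out.println(replaceReluctant(input)); // prints: x keep thisx tail
    }
}
```

As Bart Kiers points out under the question, neither variant knows about string literals, so a "/*" inside a quoted string would still be treated as the start of a comment.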
14

Add Pattern.DOTALL to the Pattern.compile call, or (?s) to the pattern.

This would work:

String regex = "(?s)\\s*/\\*.*\\*/";

See Match multiline text using regular expression

rogerdpack
tchrist
7

By default, the metacharacter . matches any character except a line terminator such as \n. That is why your regex does not work in the multi-line case.

To fix this, replace . with [\d\D], which matches any character, including newlines.
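A minimal sketch of this substitution, applied to the pattern and test string from the question, might look like the following. The class name CharClassDemo and the method name replace are made up for the example.

```java
public class CharClassDemo {
    // Same pattern as in the question, with . swapped for [\d\D]
    // (a digit or a non-digit, i.e. any character at all, newlines included).
    static final String REGEX = "\\s*/\\*[\\d\\D]*\\*/";

    static String replace(String input) {
        return input.replaceAll(REGEX, "x");
    }

    public static void main(String[] args) {
        String testIllegal = " /** this should be replaced \n **/ just text";
        System.out.println(replace(testIllegal)); // prints: x just text
    }
}
```

No flag is needed here, because the character class does not depend on how the dot treats line terminators.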

Code In Action

codaddict
  • Swapping in `[\d\D]` for `.` (which normally means `[^\n]`, at least in `Pattern.UNIX_LINES` mode) strikes me as inappropriate because it is not obvious what it is doing, because it is 6 chars for 1, and because there are other ways of doing this. – tchrist Nov 11 '10 at 12:25