62

My program can accept data that uses any of the newline sequences \n, \r\n or \r (i.e. Unix, PC or old Mac style).

What is the best way to construct a regular expression that will match whichever line-ending convention the data uses?

Alternatively, I could use universal_newline support on input, but now I'm interested to see what the regex would be.

hippietrail
Alan

2 Answers

96

The regex I use when I want to be precise is "\r\n?|\n". It matches \r\n as a single newline, and also matches a lone \r or a lone \n.
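As a rough illustration of the precise pattern, here is a minimal Python sketch. The names `PRECISE_NEWLINE` and `split_precise` are made up for this example; only the pattern itself comes from the answer.

```python
import re

# Pattern from the answer: \r optionally followed by \n, or a lone \n.
# Each newline sequence is matched exactly once, so empty lines survive.
PRECISE_NEWLINE = re.compile(r"\r\n?|\n")

def split_precise(text):
    """Split text on \\r\\n, \\r or \\n, keeping empty lines (hypothetical helper)."""
    return PRECISE_NEWLINE.split(text)

print(split_precise("unix\npc\r\nmac\rlast"))  # ['unix', 'pc', 'mac', 'last']
print(split_precise("a\n\nb"))                 # ['a', '', 'b']
```

Because \r\n is tried before a lone \r is accepted, a PC-style line ending is not counted as two separate newlines.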

When I'm not concerned about consistency or empty lines, I use "[\r\n]+", which treats any run of \r and \n characters as a single break. I imagine it makes my programs somewhere in the order of 0.2% faster.
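To show what "not concerned about empty lines" means in practice, here is a small Python sketch. `LOOSE_NEWLINE` and `split_loose` are hypothetical names chosen for the example.

```python
import re

# Pattern from the answer: one or more \r or \n characters in a row.
# A run such as "\n\n" or "\r\n\r\n" is consumed as one separator,
# so blank lines between content lines disappear.
LOOSE_NEWLINE = re.compile(r"[\r\n]+")

def split_loose(text):
    """Split on runs of \\r and \\n, collapsing blank lines (hypothetical helper)."""
    return LOOSE_NEWLINE.split(text)

print(split_loose("a\n\nb\r\nc\rd"))  # ['a', 'b', 'c', 'd']
```

Note that a trailing newline still produces a final empty string, for example `split_loose("a\n")` gives `['a', '']`.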

too much php
10

The pattern can be simplified to \r?\n for a small performance gain, since you probably don't have to deal with the old Mac style (OS 9 has been unsupported since February 2002).
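The trade-off of the simplified pattern can be seen in a short Python sketch. `SIMPLE_NEWLINE` and `split_simple` are hypothetical names; the point is only that a lone \r is no longer treated as a line break.

```python
import re

# Simplified pattern from the answer: an optional \r followed by \n.
# Handles Unix (\n) and PC (\r\n) endings, but not old Mac (\r alone).
SIMPLE_NEWLINE = re.compile(r"\r?\n")

def split_simple(text):
    """Split on \\n or \\r\\n only (hypothetical helper)."""
    return SIMPLE_NEWLINE.split(text)

print(split_simple("unix\npc\r\nlast"))  # ['unix', 'pc', 'last']
print(split_simple("mac\rlast"))         # ['mac\rlast'], the lone \r is left in place
```

If the input might still contain old Mac line endings, the fuller "\r\n?|\n" pattern is the safer choice.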

Diego V