0

I would like to ask a slight variation on a previous question (Remove new lines from string):

I would like to remove new lines from a string, except when the text on the new line is preceded by spaces ("   "), which would indicate the start of a new paragraph. Here is an example:

   This is the first bit of text. It continues for a while until
there is a line break. It then continues for a bit longer but 
the line break also continues alongside it. 
   What you see here is the second paragraph. You'll notice that 
there is a space to mark the beginning of the paragraph. However
When joining these lines, a computer may not realize that the next
paragraph exists in this way. 
Josh

3 Answers

3
$result = preg_replace('/\n++(?! )/', ' ', $subject);

does exactly this.

Explanation:

\n++  # Match one or more newlines; don't backtrack
(?! ) # only if it's impossible to match a space afterwards
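To illustrate the explanation above, here is a small sketch comparing the possessive `\n++` with a plain greedy `\n+`. The sample string and the variable names `$possessive` and `$greedy` are made up for this illustration. The difference shows up when a run of several newlines is followed by a space: the greedy version can backtrack, give up the last newline, and then match the earlier ones (the lookahead now sees a newline rather than a space), while the possessive version either takes the whole run or nothing.

```php
<?php
// Made-up sample: a wrapped line, then a blank line before an indented paragraph.
$subject = "   First paragraph, line one\nline two\n\n   Second paragraph\nline two";

// Possessive quantifier: a run of newlines is matched as a whole or not at all,
// so the "\n\n" in front of the indented paragraph is left untouched.
$possessive = preg_replace('/\n++(?! )/', ' ', $subject);

// Greedy quantifier: after the lookahead fails, the engine backtracks to a
// single "\n", sees another "\n" (not a space) ahead, and replaces it.
$greedy = preg_replace('/\n+(?! )/', ' ', $subject);

var_dump($possessive);
var_dump($greedy);
```

With this input, the possessive pattern keeps both newlines before the second paragraph, whereas the greedy one turns the first of them into a space.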
Tim Pietzcker
  • Thanks but this gives me strange characters and does not put the new paragraph on a new line:   This is the first bit of text. It continues for a while until there is a line break. It then continues for a bit longer but the line break also continues alongside it.   What you see here is the second paragraph. You'll notice that there is a space to mark the beginning of the paragraph. However When joining these lines, a computer may not realize that the next paragraph exists in this way. – Josh Sep 15 '13 at 15:35
  • +1 I like it, but perhaps you would care to explain the difference between `\n++` and `\n+` – pguardiario Sep 15 '13 at 15:35
  • @Josh: You're seeing [BOMs (Byte Order Marks)](http://en.wikipedia.org/wiki/Byte_order_mark). You seem to be having problems reading the file in its correct encoding (UTF-8), and you have to solve that first. – Tim Pietzcker Sep 15 '13 at 17:51
0

Split the text on the end-of-line character (use the PHP_EOL constant) and trim each line.

// Split the text into lines, trim each one, and join them again.
$lines = explode(PHP_EOL, $text);
$lines = array_map('trim', $lines);
$text = implode(PHP_EOL, $lines);

// Or, if you are not familiar with array_map, simply use foreach:
$lines = array();
foreach (explode(PHP_EOL, $text) as $line) {
    $lines[] = trim($line);
}
$text = implode(PHP_EOL, $lines);
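The code above trims every line and joins the lines with line breaks again, so the indentation that marks a paragraph is lost and the wrapped lines stay separate. A possible line-by-line variant that keeps the paragraph markers is sketched below. The helper name `join_wrapped_lines` is hypothetical, and the sketch assumes that a paragraph always starts with at least one space and that empty lines can be dropped.

```php
<?php
// Hypothetical helper, not part of the answer above: joins wrapped lines
// but starts a new output line whenever an input line begins with a space.
function join_wrapped_lines(string $text): string
{
    $paragraphs = [];
    // Split on \r\n, \r, or \n so the result does not depend on PHP_EOL.
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        if ($line === '') {
            continue; // assumption: empty lines carry no meaning here
        }
        if ($line[0] === ' ' || $paragraphs === []) {
            // Leading space: begin a new paragraph, keeping its indentation.
            $paragraphs[] = rtrim($line);
        } else {
            // Continuation line: append it to the current paragraph.
            $paragraphs[count($paragraphs) - 1] .= ' ' . trim($line);
        }
    }
    return implode("\n", $paragraphs);
}

$text = "   First paragraph starts here \nand continues here.\n   Second paragraph\nalso wraps.";
echo join_wrapped_lines($text), PHP_EOL;
```

For the sample input this prints two lines, one per paragraph, each still starting with its three spaces.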
Michal Hatak
  • Thanks, this joins the lines, but it does not keep the new paragraph on a new line. How would one do this? – Josh Sep 15 '13 at 17:03
0

It looks like there are little-endian BOMs embedded in your file.

Either strip them out or rewrite the file without them.

You may be able to strip them out with a regex such as \xFF\xFE.

Here is part of your text in a hex editor.

 00 61 00 6C 00 6F 00 6E 00 67 00 73 00 69 00 64 
 00 65 00 20 00 69 00 74 00 2E 00 20 00 0D 00 20 
 00 FF FE 20 00 FF FE 20 00 FF FE 57 00 68 00 61 
 00 74 00 20 00 79 00 6F 00 75 00 


 alongside it. 
    What you
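
The byte-level stripping suggested in this answer could look roughly like the sketch below. The helper name `strip_utf16le_boms` and the sample bytes are made up for illustration; the sample mimics the dump by placing an FF FE pair between UTF-16LE characters. One caveat: removing every FF FE pair blindly could, in principle, also hit a pair that straddles two legitimate UTF-16 characters, so this is only reasonable when the text is known to be mostly ASCII, as it is here.

```php
<?php
// Hypothetical helper: removes every FF FE byte pair from a raw byte string.
// Without the /u modifier, PCRE treats \xFF and \xFE as single bytes.
function strip_utf16le_boms(string $bytes): string
{
    return preg_replace('/\xFF\xFE/', '', $bytes);
}

// "it" and "Wh" encoded as UTF-16LE with a stray BOM in between.
$raw = "i\x00t\x00\xFF\xFEW\x00h\x00";
$clean = strip_utf16le_boms($raw);

echo bin2hex($clean), PHP_EOL;
```

After stripping, the remaining bytes are plain UTF-16LE and still need to be read or converted with the correct encoding before applying any newline handling.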