2

I want to remove newlines from some HTML (with PHP), except inside <pre> tags, where whitespace is obviously important.

Jonny Barnes
    This is essentially html minification, which is the subject of another post: http://stackoverflow.com/questions/728260/html-minification. – David Andres Sep 13 '09 at 20:37

3 Answers

10

It may be 3 years later, but... The following code strips line breaks next to tags and collapses runs of whitespace, as long as they are outside of pre tags. Cheers!

function sanitize_output($buffer)
{
    $search = array(
        '/\>[^\S ]+/s', //strip whitespaces after tags, except space
        '/[^\S ]+\</s', //strip whitespaces before tags, except space
        '/(\s)+/s'  // shorten multiple whitespace sequences
        );
    $replace = array(
        '>',
        '<',
        '\\1'
        );

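    // Split on <pre> and </pre> tags, keeping the tags in the result.
    // -1 means "no limit" (passing null is deprecated as of PHP 8.1).
    $blocks = preg_split('/(<\/?pre[^>]*>)/', $buffer, -1, PREG_SPLIT_DELIM_CAPTURE);
    $buffer = '';
    foreach($blocks as $i => $block)
    {
      if($i % 4 == 2)
        $buffer .= $block; // contents of a <pre> block: keep untouched
      else 
        $buffer .= preg_replace($search, $replace, $block);
    }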

    return $buffer;
}

ob_start("sanitize_output");
smdrager
1

If the HTML is well formed, you can rely on the fact that <pre> tags aren't allowed to be nested. Make two passes: first, split the input into blocks of pre tags and everything else. You can use a regular expression for this task. Then strip new lines from each non-pre block, and finally join them all back together.

Note that most HTML isn't well formed, so this approach won't be usable everywhere.
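
A minimal sketch of the two passes described above, assuming well-formed HTML with no nested <pre> tags. The function name strip_newlines_outside_pre and the regular expression are illustrative choices, not part of the original answer:

```php
<?php
// Hypothetical helper: removes \r and \n everywhere except inside <pre>...</pre>.
// Assumes every <pre> block is properly closed and never nested.
function strip_newlines_outside_pre(string $html): string
{
    // Pass 1: split into pre blocks and everything else. The whole
    // <pre>...</pre> block is captured, so pre blocks land at the odd indexes.
    $parts = preg_split(
        '/(<pre\b[^>]*>.*?<\/pre>)/is',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    // preg_split returns false on a PCRE error; fall back to the input.
    if ($parts === false) {
        return $html;
    }

    // Pass 2: strip new lines from the non-pre parts only.
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = str_replace(array("\r", "\n"), '', $part);
        }
    }

    // Join everything back together.
    return implode('', $parts);
}

// Made-up input for illustration.
$input = "<p>\nhello\n</p>\n<pre class=\"code\">\na\nb\n</pre>\n<p>bye</p>\n";
echo strip_newlines_outside_pre($input);
```

With the made-up input above, the new lines inside the pre block survive while those around the paragraphs are removed. Capturing the whole block, tags included, means nothing has to be re-added when joining.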

troelskn
1

Split the content up. This is easily done with...

$blocks = preg_split('/<(|\/)pre>/', $html);

Just be careful, because the $blocks elements won't contain the pre opening and closing tags. I feel that assuming the HTML is valid is acceptable, and therefore you can expect the pre blocks to be every other element in the array (1, 3, 5, ...). This is easily tested with $i % 2 == 1.
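
To illustrate the odd/even layout, here is a small sketch that runs the same split on two made-up input strings (both are assumptions for illustration only). The second one shows a limitation: the pattern only matches a bare <pre>, so a tag with attributes is not split on.

```php
<?php
// Same pattern as above: matches only a bare <pre> or </pre>.
$pattern = '/<(|\/)pre>/';

// Made-up input with one pre block.
$blocks = preg_split($pattern, "before<pre>code</pre>after");
// $blocks[0] is "before", $blocks[1] is "code" (inside pre), $blocks[2] is "after"

// Caveat: an opening tag with attributes is not matched, so only </pre>
// splits the string and the odd/even assumption no longer holds.
$blocks2 = preg_split($pattern, '<pre class="x">code</pre>after');
// $blocks2[0] is '<pre class="x">code', $blocks2[1] is "after"

print_r($blocks);
print_r($blocks2);
```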

Example "complete" script (modify as you need to)...

<?php
//our example HTML - could just as easily be read in from a file
$html = <<<EOF
<html>
  <head>
    <title>test</title>
  </head>
  <body>
    <h1>Title</h1>
    <p>
      This is an article about...
    </p>
    <pre>
      line one
      line two
      line three
    </pre>
    <div style="float: right;">
      random
    </div>
    </body>
</html>
EOF;

//break it all apart...
$blocks = preg_split('/<(|\/)pre>/', $html);

//and put it all back together again
$html = ""; //reuse as our buffer
foreach($blocks as $i => $block)
{
  if($i % 2 == 1)
    $html .= "\n<pre>$block</pre>\n"; //break out <pre>...</pre> with \n's
  else 
    $html .= str_replace(array("\n", "\r"), "", $block, $c);
}

echo $html;
?>
Sam Bisbee