3

I am looking for a regular expression in PHP that removes all style attributes from "p" and "span" tags but leaves style attributes on "td" and other tags untouched.

I have this now: (this finds all style="..." stuff)

$pattern = '[style=("[^"]*")]';
$content = '<td style="blabla">
               <p style="blabla">
                    text <span style="blabla">more text</span>
               </p>
            </td>';

To be used in preg_replace()

$newcontent = preg_replace($pattern, '', $content);

But this removes the style in the td too and I don't want that.

So in the end after replacement I want to have

<td style="blabla">
    <p>text <span>more text</span>
    </p>
</td>
Julesezaar
  • Could there be any other attributes, such as IDs and Classes *before* the `style`? If not, you could use [` – Kaspar Lee Oct 18 '16 at 14:38
  • 1
    Another way is to go with DOMElement::removeAttribute, and then remove your DOM attributes [http://php.net/manual/en/domelement.removeattribute.php] In your case search for p or span with attributes – Denis Solakovic Oct 18 '16 at 14:41
  • I checked this \ with your example and it works perfectly – Armin.G Oct 18 '16 at 14:44
  • the content could be anything. The user copy / pastes styled text with sometimes tables in it into an article. And I want to remove only text styling. Not table styling – Julesezaar Oct 18 '16 at 14:46
  • 1
    Use a parser and remove the `style` attribute, or you could remove all attributes from the `p`s and `span`s. Don't use regex. – chris85 Oct 18 '16 at 14:48
  • @Armin.G , that removes the whole p and span tag. I need to remove the style attribute only – Julesezaar Oct 18 '16 at 14:48
  • 1
  • ok, it shows that using preg_replace() is a wrong way, (maybe), but there is always true ways in programming. – Armin.G Oct 18 '16 at 15:25

6 Answers

2

Here is a pretty simple example using DOMDocument rather than a regular expression.

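// Parse the fragment without adding <html>/<body> wrappers or a doctype.
$doc = new DOMDocument();
$doc->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
// Strip the style attribute from every <p> and <span>; other tags are left alone.
foreach (['p', 'span'] as $tag) {
    foreach ($doc->getElementsByTagName($tag) as $node) {
        $node->removeAttribute('style');
    }
}
$result = $doc->saveHTML();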

You may not need the LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD options, depending on your actual content, but those were needed for this specific example.
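
As a rough sketch of how this approach could be wrapped for reuse: the helper name `stripTextStyles` and the sample markup below are illustrative assumptions, not part of the answer. The sketch also collects libxml warnings internally, on the assumption that pasted markup is often not valid HTML.

```php
<?php
// Hypothetical helper: removes style attributes from <p> and <span> only.
function stripTextStyles($html)
{
    // Collect parser warnings instead of emitting them; restore the old setting afterwards.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    foreach (['p', 'span'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $node->removeAttribute('style');
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc->saveHTML();
}

// Assumed sample input with a single root element.
$sample = '<table><tr><td style="width:50%"><p style="color:red">text <span style="font-size:9px">more text</span></p></td></tr></table>';
$clean = stripTextStyles($sample);
echo $clean;
```

The sample uses a single root element because LIBXML_HTML_NOIMPLIED is known to behave oddly when the fragment has several top-level nodes.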

Don't Panic
  • 3
    this worked ;) It's not a regular expression but it did solve my problem. – Julesezaar Oct 18 '16 at 15:12
  • This solution is working but if you consider performance it is not that good. – krasipenkov Oct 18 '16 at 15:15
  • 3
    Indeed there are multiple ways to solve this problem, as indicated in the possible duplicate link. The best one for the situation depends on multiple factors, including whether you are more concerned with whether it works reliably/consistently or just quickly. – Don't Panic Oct 18 '16 at 15:23
  • 2
    Why'd this get downvoted? This is the correct approach here. – chris85 Oct 18 '16 at 15:27
  • At localhost this works fine but it does give me an error on the live site.: syntax error, unexpected '[' – Julesezaar Oct 18 '16 at 15:27
  • 1
    @Julesezaar Your live site is probably running an older PHP version. Change `[` to `array(` and `]` to `)` in the `foreach`. – chris85 Oct 18 '16 at 15:28
  • @chris85 I assume the downvote came from someone offended at my insinuation that the dom parser solution would work more reliably. – Don't Panic Oct 18 '16 at 15:41
  • And changing to the old `array()` syntax is fine for a short-term fix, but really the solution is to use a supported PHP version. If the "new" `[]` syntax causes an error, then you're using PHP 5.3, which has been [EOL for over two years](http://php.net/eol.php). – Don't Panic Oct 18 '16 at 15:46
1

The following worked for me:

$text = preg_replace('/style="[^"]*"/', '', $text);
Skully
Isma'el
0

It is well known that you shouldn't parse, traverse or modify XML or HTML with regex, and ideally you should use an HTML/XML parser.

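However, if you don't want to use a parser, you can use a simple regex like this, which strips every attribute (not just style) from the opening p and span tags:

<(p|span)\b[^>]*>

With a replacement string:

<\1>

Working demo

// \b keeps the pattern from matching other tags that start with "p", such as <pre>.
$re = '/<(p|span)\b[^>]*>/';
$str = '<td style="blabla">
               <p style="blabla">
                    text <span style="blabla">more text</span>
               </p>
            </td>';
$subst = '<\\1>';

// Every attribute on <p> and <span> is dropped; the <td> is left untouched.
$result = preg_replace($re, $subst, $str);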
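
If other attributes on p and span should survive and only style should go, one possible refinement is a two-step replacement with preg_replace_callback: first match each opening p or span tag, then remove the style attribute inside that match only. This is a sketch rather than a robust HTML handler. The helper name `stripStyleFromTextTags` and the `id="intro"` attribute are made up for illustration, and the pattern assumes quoted attribute values that contain no `>` character.

```php
<?php
// Hypothetical helper: strips only the style attribute from <p> and <span> tags.
function stripStyleFromTextTags($html)
{
    return preg_replace_callback(
        // Step 1: match each opening <p ...> or <span ...> tag (\b avoids <pre>, <param>, ...).
        '/<(?:p|span)\b[^>]*>/i',
        function ($match) {
            // Step 2: inside that tag only, drop a quoted style attribute.
            return preg_replace('/\s+style\s*=\s*("[^"]*"|\'[^\']*\')/i', '', $match[0]);
        },
        $html
    );
}

$str = '<td style="blabla"><p id="intro" style="blabla">text <span style="blabla">more text</span></p></td>';
$result = stripStyleFromTextTags($str);
echo $result; // <td style="blabla"><p id="intro">text <span>more text</span></p></td>
```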
Federico Piazza
0

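To come back to my own old question: the trick is to work with capture groups. Groups go between ( ) and can be referenced afterwards in the replacement as '$1' and so on. This example has three groups, (group1)(group2)(group3), available as $1, $2 and $3. The replacement keeps $1 and $3 and drops $2, which holds the style value.

$txt = '<td style="abc">
           <p id="test" style="abc"></p>
        </td>';
// [^>]* and [^"]* keep each group inside a single tag and a single attribute value.
$txt = preg_replace('/(<p\s[^>]*style=")([^"]*)(")/', '$1$3', $txt);

The result is

<td style="abc">
    <p id="test" style=""></p>
</td>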
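
The same capture-group idea can be pushed a little further so that the whole style attribute disappears instead of being left empty, and so that span is covered as well. The snippet below is only a sketch under the assumption that style values are double-quoted; the variable names `$pattern` and `$clean` are illustrative.

```php
<?php
$txt = '<td style="abc"><p id="test" style="abc">text <span style="abc">more</span></p></td>';

// Group 1 captures the tag name plus any attributes that come before style.
// The style attribute itself is matched but not captured, so it is dropped.
$pattern = '/(<(?:p|span)\b[^>]*?)\s+style="[^"]*"/i';
$clean = preg_replace($pattern, '$1', $txt);

echo $clean; // <td style="abc"><p id="test">text <span>more</span></p></td>
```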
Julesezaar
-1

Hi, I got a similar thing working in JavaScript. I used the console to output my results in the snippet because they are HTML.

var str = "<td style=\"blabla\"><p style=\"blabla\">text <span style=\"blabla\">more text</span></p></td>";
// Group 1 keeps the tag name and any attributes before style; the style attribute itself is dropped.
str = str.replace(/(<(?:p|span)\b[^>]*?)\s+style="[^"]*"/g, "$1");
console.log(str);

The RegEx is (<(?:p|span)\b[^>]*?)\s+style="[^"]*". Replacing each match with the first group removes the style attribute from <p> and <span> tags only. I have done only a small amount of testing and have tried to make the RegEx as specific as possible to avoid removing the wrong things. I hope this helps.

Mr Lister
milo.farrell
-2

This is the pattern:

<(p|span)\s+[\w"=\(\);'}{ ]*style=("[^"]*")

It will work even for HTML like this:

<span data="data" style="abc"></span>
<p id="test" onclick="test({'name'});" style="abv"></p>
<td style="abvc"></td>
<span></span>
<tr style="abc"></tr>
<p style="bcd"></p>
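
The answer gives the pattern but no replacement. One possible way to apply it, shown here only as a sketch, is to wrap everything before `style=` in a capture group and replace each match with that group. That regrouping is an assumption on my part, not something the answer states. Note that the whitespace in front of the removed attribute is left behind, so a tag can end up as `<p >`.

```php
<?php
$html = <<<'HTML'
<span data="data" style="abc"></span>
<p id="test" onclick="test({'name'});" style="abv"></p>
<td style="abvc"></td>
<span></span>
<tr style="abc"></tr>
<p style="bcd"></p>
HTML;

// Same character class as the pattern above, with the part before style= captured as group 1.
$pattern = '/(<(?:p|span)\s+[\w"=\(\);\'}{ ]*)style="[^"]*"/';
$clean = preg_replace($pattern, '$1', $html);

echo $clean;
```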
krasipenkov