2

Does anyone know of a good regular expression to remove event handlers from HTML?

For example, the string:
"<h1 onmouseover="top.location='http://www.google.com'">Large Text</h1>" becomes "<h1>Large Text</h1>".
So HTML tags are preserved but events like onmouseover, onmouseout, onclick, etc. are removed.

Thanks in Advance!

James Cal
    -1 (X)HTML is not a regular language. If you're doing this as some sort of "sanitization", it's especially unsafe - there may be some edge cases which are parsed as JavaScript by certain tag soup parsers; an obvious candidate is IE's conditional comments. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – tc. Oct 02 '10 at 02:30

2 Answers

5

How about:

data.replace(/ on\w+="[^"]*"/g, '');
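As a rough illustration of what that expression does, here is a small sketch. The two sample strings are made up for the demo (the first mirrors the question's example); only the regex itself comes from the answer. It also shows that the pattern does not care whether the match sits inside a tag.

```javascript
// Hypothetical sample inputs; only the regex is taken from the answer above.
const pattern = / on\w+="[^"]*"/g;

const tag = '<h1 onmouseover="top.location=\'http://www.google.com\'">Large Text</h1>';
const prose = '<p>Write it as onclick="something" in the tag.</p>';

// Inside a tag the handler attribute is removed as intended.
const cleanedTag = tag.replace(pattern, '');
// Ordinary text that merely looks like a handler is removed as well.
const cleanedProse = prose.replace(pattern, '');

console.log(cleanedTag);   // <h1>Large Text</h1>
console.log(cleanedProse); // <p>Write it as in the tag.</p>
```

Note that `replace` returns a new string, so the result has to be assigned somewhere for the change to take effect.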

Edit from the comments:

This is intended to be run on your markup as a one-time thing. If you're trying to remove events dynamically during the execution of the page, that's a slightly different story. A JavaScript library like jQuery makes it extremely easy, though (note that this call only detaches handlers that were bound through jQuery; it does not strip inline on* attributes):

$('*').unbind();

Edit:

Restricting this to matches inside tags is a lot harder. I'm not confident it can be done with a single regular expression. However, this should get you by if no one can come up with one:

var matched;

do
{
    matched = false;
    data = data.replace(/(<[^>]+)( on\w+="[^"]*")+/g,
        function(match, goodPart)
        { 
            matched = true;
            return goodPart;
        });
} while(matched);
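To see why the loop is needed, here is the same code wrapped in a function with a pass counter. The names `stripByLooping` and `passes` are hypothetical and added only for this sketch. Each pass removes only the last run of adjacent handlers in a tag, so a tag with handlers separated by other attributes takes more than one pass.

```javascript
// stripByLooping and the pass counter are hypothetical, added for illustration.
function stripByLooping(data) {
    let passes = 0;
    let matched;
    do {
        matched = false;
        // The greedy [^>]+ swallows everything up to the last handler run,
        // so each pass strips one run from the end of every matching tag.
        data = data.replace(/(<[^>]+)( on\w+="[^"]*")+/g, function (match, goodPart) {
            matched = true;
            return goodPart;
        });
        passes++;
    } while (matched);
    return { html: data, passes: passes };
}

const out = stripByLooping('<a onclick="a()" href="#" onmouseout="b()">x</a>');
console.log(out.html);   // <a href="#">x</a>
console.log(out.passes); // 3 (two passes that strip, one that finds nothing)
```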

Edit:

I give up on writing a single regex for this. There must be some way to check the context of a match without actually capturing the beginning of the tag in your match, but my RegEx-fu is not strong enough. This is the most elegant solution I'm going to come up with:

data = data.replace(/<[^>]+/g, function(match)
{
    return match.replace(/ on\w+="[^"]*"/g, '');
});
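
A quick way to check what this version does and does not catch is to run it over a few inputs. The wrapper name `stripInlineHandlers` and the sample strings are assumptions made for this sketch; the two regexes are unchanged from the code above. The last two samples show gaps raised in the comments on this thread: single-quoted handlers and javascript: URLs pass through untouched, so this should not be treated as sanitization.

```javascript
// stripInlineHandlers is a hypothetical wrapper name; both regexes are the answer's own.
function stripInlineHandlers(html) {
    // Outer pass: isolate each tag body (from "<" up to, but not including, ">").
    return html.replace(/<[^>]+/g, function (tag) {
        // Inner pass: drop double-quoted on* attributes inside that tag only.
        return tag.replace(/ on\w+="[^"]*"/g, '');
    });
}

const samples = [
    '<a onclick="a()" href="#" onmouseout="b()">x</a>',   // both handlers removed
    '<p>Write it as onclick="something" in the tag.</p>', // text outside tags is kept
    "<b onclick='a()'>single quotes</b>",                 // NOT removed: single-quoted value
    '<a href="javascript:a()">link</a>'                   // NOT removed: javascript: URL
];
const results = samples.map(stripInlineHandlers);
results.forEach(function (r) { console.log(r); });
```

The outer pattern also assumes that no attribute value contains a literal ">", since the tag body is cut off at the first one.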
Ian Henry
  • Very good answer. One note for James: it won't remove event handlers that were attached unobtrusively (from script), and it won't remove click handlers triggered through href='javascript:function()'. – sushil bharwani Oct 02 '10 at 01:10
  • Thank you for answering, Ian. I am just replacing raw HTML, so the regex looks good. However, is there a way to make it match only when the string is inside a tag? Currently the regex would turn "onclick events can be written as onclick="something" " into "onclick events can be written as ". Any ideas? Thanks – James Cal Oct 02 '10 at 01:34
  • I appreciate the effort! I think your final attempt will work perfectly for me. Thank you :) – James Cal Oct 02 '10 at 05:51
0

Here's a pure JS way to do it:

function clean(html) {
    function stripHTML(){
        html = html.slice(0, strip) + html.slice(j);
        j = strip;
        strip = false;
    }
    function isValidTagChar(str) {
        return str.match(/[a-z?\\\/!]/i);
    }
    var strip = false; //keeps track of index to strip from
    var lastQuote = false; //keeps track of whether or not we're inside quotes and what type of quotes
    for(var i=0; i<html.length; i++){
        if(html[i] === "<" && html[i+1] && isValidTagChar(html[i+1])) {
            i++;
            //Enter element
            for(var j=i; j<html.length; j++){
                if(!lastQuote && html[j] === ">"){
                    if(strip) {
                        stripHTML();
                    }
                    i = j;
                    break;
                }
                if(lastQuote === html[j]){
                    lastQuote = false;
                    continue;
                }
                if(!lastQuote && html[j-1] === "=" && (html[j] === "'" || html[j] === '"')){
                    lastQuote = html[j];
                }
                //Find on statements
                if(!lastQuote && html[j-2] === " " && html[j-1] === "o" && html[j] === "n"){
                    strip = j-2;
                }
                if(strip && html[j] === " " && !lastQuote){
                    stripHTML();
                }
            }
        }
    }
    return html;
}
winhowes