I need a regular expression that selects "h1" and everything after it on the same line, so that I can replace the match with nothing. It also needs to work for the @import lines.

I need to change this:

    <link href='http://fonts.googleapis.com/css?family=Special+Elite' rel='stylesheet' type='text/css'>
    h1 { font-family: 'Special Elite', arial, serif; }
    @import url(http://fonts.googleapis.com/css?family=Special+Elite);
    <link href='http://fonts.googleapis.com/css?family=Quattrocento+Sans' rel='stylesheet' type='text/css'>
    h1 { font-family: 'Quattrocento Sans', arial, serif; }
    @import url(http://fonts.googleapis.com/css?family=Quattrocento+Sans);
    <link href='http://fonts.googleapis.com/css?family=Smythe' rel='stylesheet' type='text/css'>
    h1 { font-family: 'Smythe', arial, serif; }
    @import url(http://fonts.googleapis.com/css?family=Smythe);

To this:

    <link href='http://fonts.googleapis.com/css?family=Special+Elite' rel='stylesheet' type='text/css'>
    <link href='http://fonts.googleapis.com/css?family=Quattrocento+Sans' rel='stylesheet' type='text/css'>
    <link href='http://fonts.googleapis.com/css?family=Smythe' rel='stylesheet' type='text/css'>
ThomasReggi
  • in which programming/scripting language? – drudge Apr 26 '11 at 17:06
  • Please see http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – rerun Apr 26 '11 at 17:07
  • I'm just using search and replace in my text editor. – ThomasReggi Apr 26 '11 at 17:10
  • @rerun: -1 to you for mindless parroting. Regexes are just fine for most specific HTML; they are just tricky on general HTML. If he has specific cases, there is nothing wrong with it. – tchrist Apr 26 '11 at 18:19

1 Answer

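This pattern should match the lines you want to keep:

    (<link.*css'>)

And this one should match the lines you want to delete:

    (h1 \{.*\})|(@import.*;)

The braces are escaped because some regex engines treat an unescaped `{` as the start of a quantifier. Replacing the match with nothing leaves an empty line behind; if your editor's regex mode supports it, appending `\n` to the pattern should remove the line break as well.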
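
If you would rather script the cleanup than run it in an editor, the delete pattern can be applied to the whole file at once. The following is a minimal Python sketch under a few assumptions: the function name `strip_font_rules` and the shortened sample input are illustrative, every rule sits on its own line as in the question, and the braces are escaped so the same pattern can be reused in engines that treat `{` specially.

```python
import re

# Shortened sample taken from the question (two fonts instead of three).
text = """\
<link href='http://fonts.googleapis.com/css?family=Special+Elite' rel='stylesheet' type='text/css'>
h1 { font-family: 'Special Elite', arial, serif; }
@import url(http://fonts.googleapis.com/css?family=Special+Elite);
<link href='http://fonts.googleapis.com/css?family=Smythe' rel='stylesheet' type='text/css'>
h1 { font-family: 'Smythe', arial, serif; }
@import url(http://fonts.googleapis.com/css?family=Smythe);
"""

# Match a whole "h1 { ... }" or "@import ...;" line, including optional
# leading/trailing spaces and the newline, so no blank line is left behind.
DELETE_LINE = re.compile(
    r"^[ \t]*(?:h1 \{.*\}|@import.*;)[ \t]*\n?",
    re.MULTILINE,
)


def strip_font_rules(source: str) -> str:
    """Return source with the h1 and @import lines removed (hypothetical helper)."""
    return DELETE_LINE.sub("", source)


cleaned = strip_font_rules(text)
print(cleaned)
```

Only the `<link ...>` lines should remain in `cleaned`. The pattern is deliberately narrow: it is meant for this specific list of generated snippets, not for arbitrary HTML or CSS.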
drudge
  • I don't understand what the big deal is. I just have a 300-line document with a list of HTML data. Why can't we just pretend this is a string? – ThomasReggi Apr 26 '11 at 17:17
  • @Thomas: HTML is not a Regular language, so using **Regular** Expressions to match it is highly susceptible to breaking. – drudge Apr 26 '11 at 17:20
  • No no no! The patterns used in modern text-processing **ARE NOT REGULAR** so they certainly can be used on stuff like this. It’s just nobody stopped calling them regular expressions once they became non-textbook-regular, like with `(.*)\1`, for example. It’s just tricky in the general case is all. It is usually fairly easy in the specific case, so it is just fine to use them. – tchrist Apr 26 '11 at 18:18