Select line of text starting with something via Regular Expressions

Question

I need a way to select "h1" and everything after it on the same line, so that I can replace it with nothing using regular expressions. I also need the same thing to work for lines starting with @import.

I need to change this:

    <link href='http://fonts.googleapis.com/css?family=Special+Elite' rel='stylesheet' type='text/css'>
    h1 { font-family: 'Special Elite', arial, serif; }
    @import url(http://fonts.googleapis.com/css?family=Special+Elite);
    <link href='http://fonts.googleapis.com/css?family=Quattrocento+Sans' rel='stylesheet' type='text/css'>
    h1 { font-family: 'Quattrocento Sans', arial, serif; }
    @import url(http://fonts.googleapis.com/css?family=Quattrocento+Sans);
    <link href='http://fonts.googleapis.com/css?family=Smythe' rel='stylesheet' type='text/css'>
    h1 { font-family: 'Smythe', arial, serif; }
    @import url(http://fonts.googleapis.com/css?family=Smythe);

To this:

    <link href='http://fonts.googleapis.com/css?family=Special+Elite' rel='stylesheet' type='text/css'>
    <link href='http://fonts.googleapis.com/css?family=Quattrocento+Sans' rel='stylesheet' type='text/css'>
    <link href='http://fonts.googleapis.com/css?family=Smythe' rel='stylesheet' type='text/css'>

Please see http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — rerun, Apr 26 '11 at 17:07
@rerun: -1 to you for mindless parroting. Regexes are just fine for most specific HTML; they are just tricky on general HTML. If he has specific cases, there is nothing wrong with it. — tchrist, Apr 26 '11 at 18:19

drudge · Accepted Answer · 2011-04-26T17:25:34.160


This one should match on the lines you want to keep:

(<link.*css'>)
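A minimal Python sketch of the "keep" approach, assuming the data is available as one string (the names `text`, `KEEP` and `keep_link_lines` are hypothetical): split the text into lines and keep only those in which the pattern above finds a match.

```python
import re

# Sample data from the question.
text = """<link href='http://fonts.googleapis.com/css?family=Special+Elite' rel='stylesheet' type='text/css'>
h1 { font-family: 'Special Elite', arial, serif; }
@import url(http://fonts.googleapis.com/css?family=Special+Elite);
<link href='http://fonts.googleapis.com/css?family=Quattrocento+Sans' rel='stylesheet' type='text/css'>
h1 { font-family: 'Quattrocento Sans', arial, serif; }
@import url(http://fonts.googleapis.com/css?family=Quattrocento+Sans);
<link href='http://fonts.googleapis.com/css?family=Smythe' rel='stylesheet' type='text/css'>
h1 { font-family: 'Smythe', arial, serif; }
@import url(http://fonts.googleapis.com/css?family=Smythe);"""

# The "keep" pattern from the answer.
KEEP = re.compile(r"(<link.*css'>)")

def keep_link_lines(text: str) -> str:
    """Return only the lines that contain a <link ... css'> tag, joined by newlines."""
    return "\n".join(line for line in text.splitlines() if KEEP.search(line))

print(keep_link_lines(text))
```

Filtering whole lines this way avoids having to worry about the leftover newlines that a pure search-and-replace would leave behind.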

And this one should match on the lines you want to delete:

(h1 \{.*\})|(@import.*;)
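A minimal Python sketch of the "delete" approach, assuming the data is one string (the names `text`, `DELETE` and `strip_h1_and_import` are hypothetical). It is an adaptation of the pattern above rather than the answer's exact expression: `re.MULTILINE` lets `^` anchor each line, `[ \t]*` tolerates leading indentation, the braces are escaped for portability across regex flavors, and an optional trailing `\n` is consumed so the deleted lines do not leave blank lines behind.

```python
import re

# Sample data from the question.
text = """<link href='http://fonts.googleapis.com/css?family=Special+Elite' rel='stylesheet' type='text/css'>
h1 { font-family: 'Special Elite', arial, serif; }
@import url(http://fonts.googleapis.com/css?family=Special+Elite);
<link href='http://fonts.googleapis.com/css?family=Quattrocento+Sans' rel='stylesheet' type='text/css'>
h1 { font-family: 'Quattrocento Sans', arial, serif; }
@import url(http://fonts.googleapis.com/css?family=Quattrocento+Sans);
<link href='http://fonts.googleapis.com/css?family=Smythe' rel='stylesheet' type='text/css'>
h1 { font-family: 'Smythe', arial, serif; }
@import url(http://fonts.googleapis.com/css?family=Smythe);"""

# Match a whole "h1 { ... }" or "@import ...;" line, plus its newline if present.
# '.' does not match '\n', so each match stays within a single line.
DELETE = re.compile(r"^[ \t]*(?:h1 \{.*\}|@import.*;)\n?", re.MULTILINE)

def strip_h1_and_import(text: str) -> str:
    """Remove the h1 rule lines and @import lines, keeping everything else."""
    return DELETE.sub("", text)

print(strip_h1_and_import(text))
```

In a text editor with regex search-and-replace, the same idea applies: include the line break in the match and replace with nothing.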

edited Apr 26 '11 at 17:25

answered Apr 26 '11 at 17:12



I don't understand what the big deal is; I just have a 300-line document with a list of HTML data. Why can't we just treat it as a string? – ThomasReggi Apr 26 '11 at 17:17
@Thomas: HTML is not a Regular language, so using **Regular** Expressions to match it is highly susceptible to breaking. – drudge Apr 26 '11 at 17:20
No no no! The patterns used in modern text processing **ARE NOT REGULAR**, so they certainly can be used on stuff like this. It’s just that nobody stopped calling them regular expressions once they became non-textbook-regular, as with `(.*)\1`, for example. It’s just tricky in the general case, is all. It is usually fairly easy in the specific case, so it is just fine to use them. – tchrist Apr 26 '11 at 18:18
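A small illustration of the backreference point, assuming Python's `re` module as the engine (the names `DOUBLED` and `is_doubled` are hypothetical). The pattern `(.*)\1` matches exactly the strings of the form ww (some string written twice); that set is not a regular language, so no textbook finite automaton recognizes it, yet a backtracking engine handles it directly.

```python
import re

# (.*) captures some prefix; \1 requires the same text to appear again immediately.
DOUBLED = re.compile(r"(.*)\1")

def is_doubled(s: str) -> bool:
    """True if the whole string is some substring repeated exactly twice."""
    return DOUBLED.fullmatch(s) is not None

print(is_doubled("abcabc"))  # True
print(is_doubled("abcab"))   # False
```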
