
The following code does not work. I am trying to retrieve the TR strings from an HTML table. Is there an issue with this code, or is there another solution?

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static List<string> GetTR(string Tr)
{
    List<string> trContents = new List<string>();

    string regexTR = @"<(tr|TR)[^<]+>((\s*?.*?)*?)<\/(tr|TR)>";

    MatchCollection tr_Matches = Regex.Matches(Tr, regexTR, RegexOptions.Singleline);
    foreach (Match match in tr_Matches)
    {
        trContents.Add(match.Value);
    }

    return trContents;
}
```

Here is a sample input string:

```csharp
"<TR><TD noWrap align=left>abcd</TD><TD noWrap align=left>SPORT</TD><TD align=left>5AT</TD></TR>"
```
abatishchev
Kannan
Required reading: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 - or, in summary: **don't use regex to parse HTML** – Marc Gravell Jan 28 '11 at 15:23
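To see why the question's pattern returns nothing for the sample, here is a minimal check. The class name `OriginalPatternCheck` and its members are hypothetical. The culprit is `[^<]+`, which must consume at least one character between `<TR` and `>`. In `<TR><TD ...` the only candidate is the `>` itself, and the character after it is `<`, so the opening tag can never match and the rest of the pattern is never reached.

```csharp
using System.Text.RegularExpressions;

public static class OriginalPatternCheck
{
    // The pattern from the question, unchanged.
    public const string OriginalPattern = @"<(tr|TR)[^<]+>((\s*?.*?)*?)<\/(tr|TR)>";

    // Only the opening-tag part, with [^<]+ relaxed to [^<]* (zero or more characters).
    public const string RelaxedOpeningTag = @"<(tr|TR)[^<]*>";

    // The sample row from the question.
    public const string Sample =
        "<TR><TD noWrap align=left>abcd</TD><TD noWrap align=left>SPORT</TD><TD align=left>5AT</TD></TR>";

    // Counts matches the same way GetTR does (RegexOptions.Singleline).
    public static int CountMatches(string pattern, string input) =>
        Regex.Matches(input, pattern, RegexOptions.Singleline).Count;
}
```

`CountMatches(OriginalPattern, Sample)` returns 0, while `CountMatches(RelaxedOpeningTag, Sample)` returns 1 (the bare `<TR>`), which isolates the problem to the `+` in `[^<]+`.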

4 Answers


Parsing HTML with regular expressions is asking for trouble.

Do the job properly using something like HTML Agility Pack.

carla
LukeH
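A minimal sketch of the HTML Agility Pack approach, assuming the `HtmlAgilityPack` NuGet package is referenced. The class `AgilityPackRows` and method `GetRows` are hypothetical names; `HtmlDocument.LoadHtml`, `DocumentNode.SelectNodes` and `HtmlNode.OuterHtml` are the library's own members. The parser builds a node tree, so the XPath query `//tr` finds rows regardless of attributes or line breaks inside them.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class AgilityPackRows
{
    // Returns the outer HTML of every <tr> element in the input.
    public static List<string> GetRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<string>();

        // Agility Pack normalises element names to lowercase, so "//tr" also finds <TR>.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//tr");

        // With default settings SelectNodes returns null, not an empty collection, when nothing matches.
        if (nodes == null)
        {
            return rows;
        }

        foreach (HtmlNode node in nodes)
        {
            rows.Add(node.OuterHtml);
        }
        return rows;
    }
}
```

Returning `OuterHtml` keeps the same contract as the question's `GetTR` (whole `<tr>...</tr>` strings); `node.InnerText` or `node.SelectNodes("td")` would be the natural next step if only the cell values are needed.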

I think this regular expression would be more appropriate:

```
<(tr|TR)[^>]*>.*<\/\1>
```
ChaosPandion
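In the `<(tr|TR)[^>]*>.*<\/\1>` pattern, `\1` makes the closing tag reuse the exact case of the opening one, and `[^>]*` accepts a bare `<TR>` as well as one with attributes. One caveat worth checking: `.*` is greedy, so when the input holds several rows on one line the match runs from the first `<TR>` to the last `</TR>`. The sketch below compares it with a lazy `.*?` variant; the lazy variant and the class name `GreedyVsLazy` are additions for illustration, not part of the answer.

```csharp
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // The pattern from the answer: greedy .* between the tags.
    public const string Greedy = @"<(tr|TR)[^>]*>.*<\/\1>";

    // Same pattern with a lazy .*? so each match stops at the nearest closing tag.
    public const string Lazy = @"<(tr|TR)[^>]*>.*?<\/\1>";

    // Two rows on one line, in the style of the question's sample.
    public const string TwoRows =
        "<TR><TD align=left>abcd</TD></TR><TR><TD align=left>SPORT</TD></TR>";

    public static int CountMatches(string pattern, string input) =>
        Regex.Matches(input, pattern).Count;
}
```

On `TwoRows`, the greedy pattern yields one match spanning both rows, while the lazy one yields two separate rows.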

This regex matches your input string:

```
<(tr|TR)+>((\s*?.*?)*?)<\/(tr|TR)>
```

I removed `[^<]`; I'm not sure why you need it. Also, try adding a non-greedy match.

However, it is better to go with something like HTML Agility Pack (if you want to keep your sanity). :)

Mrchief
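Putting those suggestions together, here is a hedged sketch of the question's `GetTR` with the mandatory `[^<]+` gone and a non-greedy body. A few choices are assumptions beyond the answer: `RegexOptions.IgnoreCase` replaces `(tr|TR)` (so `<Tr>` also matches), `[^>]*` keeps rows that carry attributes, and `\b` stops `<track>` from being read as `<tr`. The class name `TableRows` is hypothetical. Like any regex approach, it will still mis-handle nested tables, because the first inner `</tr>` ends the outer match.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TableRows
{
    // <tr\b[^>]*>  opening tag, with or without attributes (\b rejects <track>)
    // .*?          non-greedy body, so each match ends at the nearest </tr>
    // </tr>        closing tag
    // IgnoreCase covers TR, tr and Tr; Singleline lets a row span line breaks.
    private static readonly Regex RowPattern = new Regex(
        @"<tr\b[^>]*>.*?</tr>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Same contract as the question's GetTR: each matched row as a string.
    public static List<string> GetTR(string html)
    {
        var rows = new List<string>();
        foreach (Match match in RowPattern.Matches(html))
        {
            rows.Add(match.Value);
        }
        return rows;
    }
}
```

With the question's sample, `TableRows.GetTR` returns a single string equal to the whole input.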
```
(<(tr|TR)[^<]*>)(.+)((<\(tr|TR)[^<]*>)
```
R. Martinho Fernandes
Senad Meškin