1

I use C# and need to parse HTML and read a tag's attributes into key/value pairs. For example, given the following HTML snippet:

<DIV myAttribute style="BORDER-BOTTOM: medium none; BACKGROUND-COLOR: transparent; BORDER-TOP: medium none" id=my_ID anotherAttribNamedDIV class="someclass">

Please note that the attributes can be:
1. key="value" pairs, e.g. class="someclass"
2. key=value pairs, e.g. id=my_ID (no quotes around the value)
3. plain attributes, e.g. myAttribute, which have no value

I need to store them in a dictionary as key/value pairs, as follows:
key=myAttribute value=""
key=style value="BORDER-BOTTOM: medium none; BACKGROUND-COLOR: transparent; BORDER-TOP: medium none"
key=id value="my_ID"
key=anotherAttribNamedDIV value=""
key=class value="someclass"

I am looking for a regular expression to do this.

Deduplicator
MPV
  • You can't parse [X]HTML with regex. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Homam Apr 11 '11 at 14:50
  • Don't use capitals for your html tags. – David Apr 11 '11 at 17:26
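
For a single start tag that has already been isolated, like the one in the question, a regular expression can still cover the three attribute forms listed there. The following is only a sketch under that assumption, not a general HTML parser: it does not handle comments, scripts, or nested markup. The names TagAttributes and Parse are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TagAttributes // hypothetical helper, not part of any library
{
    // One attribute: a name, optionally followed by "=" and a
    // double-quoted, single-quoted, or unquoted value.
    private static readonly Regex AttributePattern = new Regex(
        @"(?<name>[^\s""'<>/=]+)" +
        @"(?:\s*=\s*(?:""(?<value>[^""]*)""|'(?<value>[^']*)'|(?<value>[^\s""'=<>`]+)))?");

    public static Dictionary<string, string> Parse(string startTag)
    {
        // HTML attribute names are case-insensitive, so compare keys that way.
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        // Skip "<" and the tag name so that "DIV" is not read as an attribute.
        Match tagName = Regex.Match(startTag, @"^\s*<\s*[^\s/>]+");
        if (!tagName.Success)
        {
            return result;
        }

        foreach (Match m in AttributePattern.Matches(startTag, tagName.Length))
        {
            string name = m.Groups["name"].Value;
            // An unmatched group yields "", which is the value wanted
            // for plain attributes such as myAttribute.
            string value = m.Groups["value"].Value;

            // Keep the first occurrence if a name is repeated.
            if (!result.ContainsKey(name))
            {
                result.Add(name, value);
            }
        }
        return result;
    }
}

static class Program
{
    static void Main()
    {
        string tag = @"<DIV myAttribute style=""BORDER-BOTTOM: medium none; BACKGROUND-COLOR: transparent; BORDER-TOP: medium none"" id=my_ID anotherAttribNamedDIV class=""someclass"">";

        foreach (KeyValuePair<string, string> pair in TagAttributes.Parse(tag))
        {
            Console.WriteLine("key=" + pair.Key + " value=\"" + pair.Value + "\"");
        }
    }
}
```

The pattern uses the same group name, value, in all three alternatives, which .NET regular expressions allow. For anything beyond one isolated tag, a real HTML parser is the safer choice.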

2 Answers

11

You can do this with the HtmlAgilityPack:

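using HtmlAgilityPack;

// Verbatim string: each doubled quote ("") stands for one literal quote.
string myDiv = @"<DIV myAttribute style=""BORDER-BOTTOM: medium none; BACKGROUND-COLOR: transparent; BORDER-TOP: medium none"" id=my_ID anotherAttribNamedDIV class=""someclass""></DIV>";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(myDiv);
// HtmlAgilityPack exposes element names in lower case, so "div" matches <DIV>.
HtmlNode node = doc.DocumentNode.SelectSingleNode("div");

// Literal1 is assumed to be an ASP.NET Literal control on the page.
Literal1.Text = "";

// Attributes without a value (e.g. myAttribute) come back with an empty Value.
foreach (HtmlAttribute attr in node.Attributes)
{
    Literal1.Text += attr.Name + ": " + attr.Value + "<br />";
}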
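
To get the dictionary the question asks for, the same loop can fill a Dictionary<string, string> instead of a Literal. This is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; AttributeReader and ReadAttributes are hypothetical names chosen for illustration. HtmlAgilityPack reports attribute names in lower case through Name, so the sketch uses a case-insensitive comparer to let a lookup such as "myAttribute" still succeed.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be installed)

static class AttributeReader // hypothetical helper, not part of HtmlAgilityPack
{
    // Returns the attributes of the first node matching xpath,
    // or an empty dictionary if nothing matches.
    public static Dictionary<string, string> ReadAttributes(string html, string xpath)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        if (node == null)
        {
            return result;
        }

        foreach (HtmlAttribute attr in node.Attributes)
        {
            // Keep the first occurrence if the markup repeats a name;
            // fall back to "" in case a value is missing.
            if (!result.ContainsKey(attr.Name))
            {
                result.Add(attr.Name, attr.Value ?? "");
            }
        }
        return result;
    }
}
```

A call might look like AttributeReader.ReadAttributes(myDiv, "//div"), after which result["id"] would hold the id value and result["myAttribute"] an empty string.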
Martin Liversage
MikeM
-1
HtmlDocument docHtml = new HtmlWeb().Load(url);
lennon310
puru