0

I have HTML like this:

<div class="rgz">
  <div class="xyz">
  </div>
  <div class="ckh">
  </div>
</div>

The ckh div won't appear every time. Can someone suggest a regex to get the data of the rgz div? The data inside ckh is not needed, but that div won't always be present. Thanks in advance.

Anish Joseph
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Amber May 08 '11 at 19:27
  • Don't use regex; it is not good practice here. Which server-side language are you using? PHP or something else? – diEcho May 08 '11 at 19:28
  • Yes, I am using PHP to get this data – Anish Joseph May 08 '11 at 19:32

2 Answers

1

Regex is probably not your best option here.

A JavaScript framework such as jQuery will let you use CSS selectors to get to the element you require, with something like:

$('.rgz').children().last().html()
David
  • Actually, I am crawling data from a site using curl. It has nothing to do with jQuery or JavaScript – Anish Joseph May 08 '11 at 19:47
  • I still don't think regex is the best option. It will bite you eventually when the data you are scraping changes. You should maybe look at a PHP DOM parser. A quick Google search found me this: http://simplehtmldom.sourceforge.net/ but I am not a PHP guy, so I can't vouch for it – David May 08 '11 at 19:49
1

@diEcho and @Dve are correct: you should learn to use something like the native DOMDocument class rather than regex. Your code will be easier to read and maintain, and it will handle malformed HTML much better.

Here is some sample code which may or may not do what you want:

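// $page_url is assumed to hold the URL (or file path) of the page to scrape.
$contents = '';
libxml_use_internal_errors(true);  // keep malformed HTML from raising warnings
$doc = new DOMDocument();
$doc->loadHTMLFile($page_url);     // load() expects XML; loadHTMLFile() parses HTML
$nodes = $doc->getElementsByTagName('div');
foreach ($nodes as $node)
{
   // getAttribute() returns an empty string when the attribute is missing
   if ($node->getAttribute('class') == 'rgz') {
      // nodeValue is the combined text of the div and all of its descendants
      $contents .= $node->nodeValue;
   }
}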
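
For the exact case in the question, where the ckh div may or may not be there, something along these lines might work. It is only a sketch: extractRgzText() and the sample text are made-up names for illustration, and it assumes you already have the page HTML in a string (for example from curl). It finds each rgz div with DOMXPath, removes any ckh div inside it, and returns the remaining text.

```php
<?php
// Sample markup mirroring the question; the text inside the divs is invented.
$html = '<div class="rgz"><div class="xyz">keep me</div><div class="ckh">skip me</div></div>';

/**
 * Hypothetical helper (not part of any library): returns the text of every
 * div with class "rgz", ignoring any div with class "ckh" inside it.
 */
function extractRgzText(string $html): array
{
    libxml_use_internal_errors(true);   // tolerate malformed HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match on a class token, so class="rgz other" is found as well.
    $rgzQuery = '//div[contains(concat(" ", normalize-space(@class), " "), " rgz ")]';
    $ckhQuery = './/div[contains(concat(" ", normalize-space(@class), " "), " ckh ")]';

    $results = [];
    foreach ($xpath->query($rgzQuery) as $rgz) {
        // Copy the matches to an array first, then remove them from the tree.
        // If no ckh div is present, this loop simply does nothing.
        foreach (iterator_to_array($xpath->query($ckhQuery, $rgz)) as $ckh) {
            $ckh->parentNode->removeChild($ckh);
        }
        $results[] = trim($rgz->textContent);
    }
    return $results;
}

print_r(extractRgzText($html));
```

Because the ckh removal is a loop over whatever matches, the same code handles pages with and without that div, which is the part that is awkward to express in a regex.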
jisaacstone