<a data-track='' _sp= class=s-item__link href=get_this_href>...</a>

In the link above, data-track contains some JSON data, _sp can contain letters, numbers, and a period (.), and the class is s-item__link.

I need to extract get_this_href; I can take it from there.

This is the regex I tried, but I'm stuck from here:

<a\b(?=[^>]* class="[^"]*(?<=[" ])s-item__link[" ])(?=[^>]* href="([^"]*))

Here is an example: https://regex101.com/r/rVPeUI/1

$link = ""; // URL being scraped
$html = file_get_html($link); // file_get_html() comes from simple_html_dom.php
// find() is also part of simple_html_dom.php; each li.s-item element is an $item.

foreach ($html->find('li.s-item') as $item) {
    // $item contains a fair number of nested divs with spans and links.
}
letsCode
  • what is `$html`? What is `find()`? What is `$item`? Is this a string, an object instance? IMO, you shouldn't be using regex for this, the answer you've got below is perfect. – Eaten by a Grue Oct 16 '20 at 02:59
  • updated with explanation – letsCode Oct 16 '20 at 03:05
  • I've heard of simple_html_dom.php but can't say I've ever used it. looks to me like you could easily use domdocument and xpath to do this (as suggested by @Wasif) and abandon your 3rd party library. it's only a slight modification to your code. – Eaten by a Grue Oct 16 '20 at 03:11
  • Does this answer your question? [Grabbing the href attribute of an A element](https://stackoverflow.com/questions/3820666/grabbing-the-href-attribute-of-an-a-element) – Eaten by a Grue Oct 16 '20 at 03:17

1 Answer


Rather than using regex, it's better to use DOMDocument to parse HTML:

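// $html must be a string of HTML markup here, not a simple_html_dom object.
libxml_use_internal_errors(true); // suppress warnings from imperfect real-world markup
$doc = new DOMDocument();
$doc->loadHTML($html); // loadHTML() is an instance method; the static call fails on PHP 8
$xpath = new DOMXPath($doc);
// Matches only when the class attribute is exactly "s-item__link".
$query = "//a[@class='s-item__link']";
$entries = $xpath->query($query);
foreach ($entries as $entry) {
  echo "HREF " . $entry->getAttribute("href") . PHP_EOL;
}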
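
If the links need to be collected per list item, as in the question's loop over li.s-item, the same approach can be scoped to each item by passing a context node to query(). The following is a minimal sketch, not a tested scraper: the sample markup, the example.com URLs, and the helper name extractItemLinks() are made up for illustration, and the class test is written so it still matches when an element carries more than one class.

```php
<?php
// Hypothetical sample markup modelled on the question; a real page will differ.
$html = <<<'HTML'
<ul>
  <li class="s-item"><a data-track='{"id":1}' _sp="p1.m2" class="s-item__link" href="https://example.com/item/1">One</a></li>
  <li class="s-item"><a class="s-item__link other" href="https://example.com/item/2">Two</a></li>
  <li class="s-item"><a class="unrelated" href="https://example.com/skip">Skip</a></li>
</ul>
HTML;

// Hypothetical helper: returns the href of every a.s-item__link inside an li.s-item.
function extractItemLinks(string $html): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Token-wise class match: true when the class list contains the given name.
    $itemQuery = "//li[contains(concat(' ', normalize-space(@class), ' '), ' s-item ')]";
    $linkQuery = ".//a[contains(concat(' ', normalize-space(@class), ' '), ' s-item__link ')]";

    $hrefs = [];
    foreach ($xpath->query($itemQuery) as $item) {
        // The second argument makes the relative query search inside this <li> only.
        foreach ($xpath->query($linkQuery, $item) as $link) {
            $hrefs[] = $link->getAttribute('href');
        }
    }
    return $hrefs;
}

print_r(extractItemLinks($html));
```

On the sample markup this should collect the two item URLs and skip the link whose class is unrelated. If you would rather stay with simple_html_dom, something like $item->find('a.s-item__link', 0)->href inside the existing foreach is probably the closest equivalent.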
Wasif
  • Thank you for your response. This doesn't play well with the current code I have. I am looping through a UL tag. I will update my question with this code. – letsCode Oct 16 '20 at 02:55