
I'm a newbie. I'm trying to extract the full name from either of the lines below, without the "Obituary for" prefix:

<h2>Obituary for John Doe</h2>
<h1>James Michael Lee</h1>

My regex is this.

(<h1>(.+?)<\/h1>|<h2>Obituary\sfor\s(.+?)<\/h2>)

What I'm getting is still "Obituary for John Doe". How do I remove the "Obituary for"?
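A minimal PHP sketch, assuming the input is matched with preg_match_all (the variable names are illustrative). In the pattern above, group 1 wraps the entire alternation, so it still contains the tags and the "Obituary for" text. The bare names are already captured, in group 2 for the <h1> branch and in group 3 for the <h2> branch.

```php
<?php
$html = "<h2>Obituary for John Doe</h2>\n<h1>James Michael Lee</h1>";
$pattern = '/(<h1>(.+?)<\/h1>|<h2>Obituary\sfor\s(.+?)<\/h2>)/';

// PREG_SET_ORDER groups the captures per match instead of per group.
preg_match_all($pattern, $html, $m, PREG_SET_ORDER);

$names = [];
foreach ($m as $set) {
    // Group 1 is the whole alternation (tags and prefix included).
    // Group 3 is set for the <h2> branch; otherwise use group 2 (<h1> branch).
    $names[] = (isset($set[3]) && $set[3] !== '') ? $set[3] : $set[2];
}
print_r($names);
```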

mewiben39

4 Answers


Many roads lead to Rome; you could probably do something like this:

<h(?:1>|2>Obituary\sfor\s)\K[^><]+

See this demo at regex101. With preg_match_all, the matches will be in $out[0].

\K resets the beginning of the reported match, so the tag and the "Obituary for " prefix are consumed but not returned. See the SO Regex FAQ for more.
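A minimal sketch of how this pattern could be used with preg_match_all; the input string and the `~` delimiter are assumptions chosen for illustration.

```php
<?php
$html = "<h2>Obituary for John Doe</h2>\n<h1>James Michael Lee</h1>";

// "<h" then either "1>" or "2>Obituary for "; \K drops that part from the
// reported match, so $out[0] holds only the text up to the next tag.
$pattern = '~<h(?:1>|2>Obituary\sfor\s)\K[^><]+~';
preg_match_all($pattern, $html, $out);

print_r($out[0]);
```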

bobble bubble

Could you do something like this, using a simple regex and then cleaning up the names with string functions?

<?php

/**
 * Extracts names from <h1>/<h2> header tags.
 * @example "<h2>Obituary for John Doe</h2><h1>James Michael Lee</h1>" -> ["John Doe", "James Michael Lee"]
 * @param string $html
 * @return string[] list of full names
 */
function extractFullNames($html) {
    $regex = '/<h[1-2]>(.*?)<\/h[1-2]>/';
    preg_match_all($regex, $html, $matches);
    $names = $matches[1];                 // inner text of each header
    $names = array_map('trim', $names);
    $names = array_map('strip_tags', $names);
    // Normalize capitalization (note: this also turns e.g. "McDonald" into "Mcdonald").
    $names = array_map('strtolower', $names);
    $names = array_map('ucwords', $names);
    $names = array_map('removeObituary', $names);
    return $names;
}

/**
 * Removes "Obituary for " (case-insensitive) if present.
 * @example "Obituary For John Doe" -> "John Doe"
 * @param string $name
 * @return string name without the "Obituary for " prefix
 */
function removeObituary($name) {
    return str_ireplace("Obituary for ", "", $name);
}

// Test case
$html = '<h2>Obituary for John Doe</h2><h1>James Michael Lee</h1>';
$names = extractFullNames($html);
$expected = ['John Doe', 'James Michael Lee'];

echo "Expected: " . implode(', ', $expected) . "\n";
echo "Actual: " . implode(', ', $names) . "\n";
PCDSandwichMan

I'd probably do something like:

/^(?:\s*<[^>]*?>)?(?:.*\s+for\s+)?([^<]*)/

and extract $1 (the first capture group). Since the pattern is anchored with ^, apply it to each line separately.
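A minimal per-line sketch in PHP. It assumes the leading whitespace in the pattern is written as `\s*`: with a bare `\s`, a line that starts directly with `<h1>` skips the optional tag group, and `([^<]*)` then captures an empty string at position 0. The `$lines` array is illustrative.

```php
<?php
// Assumption: leading whitespace is optional (\s*), so the opening tag is consumed
// even when the line starts directly with "<h1>" or "<h2>".
$pattern = '/^(?:\s*<[^>]*?>)?(?:.*\s+for\s+)?([^<]*)/';
$lines = ["<h2>Obituary for John Doe</h2>", "<h1>James Michael Lee</h1>"];

$names = [];
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $m)) {
        $names[] = $m[1]; // first capture group: text after the tag and any "... for "
    }
}
print_r($names);
```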

chaos

Use

<h\d+>(?:Obituary\s+for\s+)?\K[^<>]+

See regex proof.
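A minimal sketch of this pattern with preg_match_all; the input string is an assumption for illustration. The optional non-capturing group swallows "Obituary for " when present, and \K keeps it out of the reported match.

```php
<?php
$html = "<h2>Obituary for John Doe</h2>\n<h1>James Michael Lee</h1>";

// <h\d+> matches any numbered heading tag; the prefix group is optional,
// so <h1> lines match without it.
preg_match_all('/<h\d+>(?:Obituary\s+for\s+)?\K[^<>]+/', $html, $m);

print_r($m[0]);
```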


Ryszard Czech