Isolating a numeric id from a URL link

Question

I was browsing Stack Overflow and found a great regex HERE. There may be other ways to isolate a YouTube video ID, but I chose to work with regex for learning purposes. With Input1 (shown below), the regex ignores everything after the & character. This wipes out the video ID and gives an incorrect or empty result. Why is the regex dropping everything after the &?

Error:

Input1: http://www.youtube.com/watch?feature&v=317a815FLWQ

Result1: http://www.youtube.com/watch?feature

Normal:

Input2: http://www.youtube.com/watch?v=spDj54kf-vY&feature=g-vrec

Result2: http://www.youtube.com/watch?v=spDj54kf-vY

Regex Code (With original comments)

// Wrapper added so the snippet runs on its own; the function name is illustrative.
function linkifyYouTubeURLs($text) {
    $text = preg_replace('~
        # Match non-linked youtube URL in the wild. (Rev:20111012)
        https?://         # Required scheme. Either http or https.
        (?:[0-9A-Z-]+\.)? # Optional subdomain.
        (?:               # Group host alternatives.
          youtu\.be/      # Either youtu.be,
        | youtube\.com    # or youtube.com followed by
          \S*             # Allow anything up to VIDEO_ID,
          [^\w\-\s]       # but char before ID is non-ID char.
        )                 # End host alternatives.
        ([\w\-]{11})      # $1: VIDEO_ID is exactly 11 chars.
        (?=[^\w\-]|$)     # Assert next char is non-ID or EOS.
        (?!               # Assert URL is not pre-linked.
          [?=&+%\w]*      # Allow URL (query) remainder.
          (?:             # Group pre-linked alternatives.
            [\'"][^<>]*>  # Either inside a start tag,
          | </a>          # or inside <a> element text contents.
          )               # End recognized pre-linked alts.
        )                 # End negative lookahead assertion.
        [?=&+%\w-]*       # Consume any URL (query) remainder.
        ~ix',
        '<a href="http://www.youtube.com/watch?v=$1">YouTube link: $1</a>',
        $text);
    return $text;
}

possible duplicate of [How to retrieve query string value from URL value stored in a variable?](http://stackoverflow.com/questions/3061978/how-to-retrieve-query-string-value-from-url-value-stored-in-a-variable) — Juan Mendes, Jan 27 '13 at 00:09
If you call `parse_url`, you just need to call `parse_str` on the query part; that will give you the value of the v query string parameter — Juan Mendes, Jan 27 '13 at 00:15
@JuanMendes I am referring to the use of `regex`; I never talked about `parse_url` in my question. It may have answered my question if I was trying to figure it out through `parse_url`. — techAddict82, Jan 27 '13 at 00:34
@Dagon yes `parse_url` is the right way but back to my initial question to @JuanMendes why is it duplicate? — techAddict82, Jan 27 '13 at 00:41
Because you shouldn't do what you're doing with a ginormous RegExp that would only work for this case. You should just use `parse_url` and take a break with all the time you'll save. I don't think it's worth trying to figure out why your particular RegExp doesn't work, because it's not likely that this question/answer will be useful to others, and that's at the heart of SO. A better venue could be http://programmers.stackexchange.com/ — Juan Mendes, Jan 30 '13 at 02:30

thordarson · Accepted Answer · 2013-01-27T00:32:56.703

6

Forget regex; use parse_url. It splits a URL into its components and returns them as an array like this:

Array
(
    [scheme] => http
    [host] => hostname
    [user] => username
    [pass] => password
    [path] => /path
    [query] => arg=value
    [fragment] => anchor
)
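For reference, an array of that shape can be produced with a call like the one sketched below. The input URL is an assumption chosen to match the components listed above; the answer itself does not show which URL was used.

```php
<?php
// Assumed input URL, chosen so that its components match the array above.
$url = 'http://username:password@hostname/path?arg=value#anchor';

// parse_url() returns an associative array of the components that are present.
$parts = parse_url($url);
print_r($parts);

// The query string is what gets passed to parse_str() in the next step.
echo $parts['query'], "\n";
```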

Then use parse_str on the query part of the URL to extract the variables.

EDIT

Here's a better demo:

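$url = "http://www.youtube.com/watch?feature&v=317a815FLWQ";

// Split the URL into its components and take the query string.
$parsed_url = parse_url($url);
$query = $parsed_url['query'];

// Parse the query string into an associative array of parameters.
$parsed_query = array();
parse_str($query, $parsed_query);

var_dump($parsed_query);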

Outputs:

array(2) {
  ["feature"]=>
  string(0) ""
  ["v"]=>
  string(11) "317a815FLWQ"
}

EDIT 2

Another example that would extract the ID from your second link given in the comments:

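$url = "http://www.youtube.com/sandalsResorts#p/c/54B8C800269D7C1B/2/PPS-8DMrAn4";

// Here the ID lives in the fragment (the part after #), not in the query string.
$parsed_url = parse_url($url);
$fragment = $parsed_url['fragment'];

// The video ID is the last slash-separated segment of the fragment.
$fragment_parts = explode('/', $fragment);
$video_id = array_pop($fragment_parts);

print($video_id);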

Outputs:

PPS-8DMrAn4

However, if you're asking for links from your users, you need to be very specific with them. The link in the second example isn't a video link, but if you want to be forgiving of your users' input, you could run the link through both code snippets and check whether either one yields an ID.
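One way to combine the two snippets is sketched below. The function name `extract_video_id` is hypothetical and not part of the original answer, and the fallback order (query parameter `v` first, then the last fragment segment) is an assumption about what "run the link through both" would look like in practice.

```php
<?php
// Hypothetical helper, not from the original answer: tries the "v" query
// parameter first, then falls back to the last segment of the fragment.
function extract_video_id(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false) {
        return null; // parse_url() gives false for seriously malformed URLs
    }

    // Case 1: a watch?v=... style link.
    if (isset($parts['query'])) {
        $query = array();
        parse_str($parts['query'], $query);
        if (isset($query['v']) && is_string($query['v']) && $query['v'] !== '') {
            return $query['v'];
        }
    }

    // Case 2: a channel-style link with the ID at the end of the fragment.
    if (isset($parts['fragment'])) {
        $segments = explode('/', $parts['fragment']);
        $last = array_pop($segments);
        if ($last !== '') {
            return $last;
        }
    }

    return null; // nothing that looks like an ID was found
}

echo extract_video_id('http://www.youtube.com/watch?feature&v=317a815FLWQ'), "\n";
echo extract_video_id('http://www.youtube.com/sandalsResorts#p/c/54B8C800269D7C1B/2/PPS-8DMrAn4'), "\n";
```

This sketch does not check that the result looks like a real video ID (for example, that it is 11 characters long); whether to add such a check depends on how strict you want to be with user input.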

edited Jan 27 '13 at 00:32

answered Jan 27 '13 at 00:07


This would actually still require regex on the query bit to get the video id – Michel Feldheim Jan 27 '13 at 00:11
@MichelFeldheim Not at all, see my edit. – thordarson Jan 27 '13 at 00:12
@thordarson My error is with this example link: `http://www.youtube.com/watch?feature&v=317a815FLWQ`. Not the one shown in your answer – techAddict82 Jan 27 '13 at 00:15
@codexMachine23 Sorry, I updated the answer and the code uses your link now. This works reliably with any correctly formed link and any number of parameters. – thordarson Jan 27 '13 at 00:16
@thordarson thanks. But as you mention, `correctly formed links` are not always used. The regex handles almost all cases. Will your code work with this link `http://www.youtube.com/sandalsResorts#p/c/54B8C800269D7C1B/2/PPS-8DMrAn4`? – techAddict82 Jan 27 '13 at 00:22
In that link's case you'd need to parse the fragment part of parse_url, but that's not part of the question, is it? And using native PHP methods nearly always trumps the use of regex. – thordarson Jan 27 '13 at 00:24
I added a code snippet that extracts your other link. – thordarson Jan 27 '13 at 00:35
