
I was browsing through Stack Overflow and found a great regex here. There may be other methods for isolating a YouTube video ID, but I chose to work with regex for learning purposes. With Input1 (shown below), the regex ignores everything after the `&` character. This wipes out the video ID and therefore gives an incorrect or empty result. Why is the regex clearing everything after `&`?

Error:

Input1: http://www.youtube.com/watch?feature&v=317a815FLWQ

Result1: http://www.youtube.com/watch?feature

Normal:

Input2: http://www.youtube.com/watch?v=spDj54kf-vY&feature=g-vrec

Result2: http://www.youtube.com/watch?v=spDj54kf-vY

Regex Code (With original comments)

```php
// The snippet ends in `return $text;`, so it was the body of a function;
// the wrapper is reconstructed here and its name is assumed.
function linkifyYouTubeURLs($text) {
    $text = preg_replace('~
        # Match non-linked youtube URL in the wild. (Rev:20111012)
        https?://         # Required scheme. Either http or https.
        (?:[0-9A-Z-]+\.)? # Optional subdomain.
        (?:               # Group host alternatives.
          youtu\.be/      # Either youtu.be,
        | youtube\.com    # or youtube.com followed by
          \S*             # Allow anything up to VIDEO_ID,
          [^\w\-\s]       # but char before ID is non-ID char.
        )                 # End host alternatives.
        ([\w\-]{11})      # $1: VIDEO_ID is exactly 11 chars.
        (?=[^\w\-]|$)     # Assert next char is non-ID or EOS.
        (?!               # Assert URL is not pre-linked.
          [?=&+%\w]*      # Allow URL (query) remainder.
          (?:             # Group pre-linked alternatives.
            [\'"][^<>]*>  # Either inside a start tag,
          | </a>          # or inside <a> element text contents.
          )               # End recognized pre-linked alts.
        )                 # End negative lookahead assertion.
        [?=&+%\w-]*       # Consume any URL (query) remainder.
        ~ix',
        '<a href="http://www.youtube.com/watch?v=$1">YouTube link: $1</a>',
        $text);
    return $text;
}
```
techAddict82
  • possible duplicate of [How to retrieve query string value from URL value stored in a variable?](http://stackoverflow.com/questions/3061978/how-to-retrieve-query-string-value-from-url-value-stored-in-a-variable) – Juan Mendes Jan 27 '13 at 00:09
  • @JuanMendes Please explain how this is a duplicate? – techAddict82 Jan 27 '13 at 00:12
  • If you call `parse_url`, you just need to call `parse_str` on the query part; that will give you the value for the `v` query string parameter. – Juan Mendes Jan 27 '13 at 00:15
  • @JuanMendes I am referring to the use of `regex`; I never talked about `parse_url` in my question. It may have answered my question if I were trying to figure it out through `parse_url`. – techAddict82 Jan 27 '13 at 00:34
  • @Dagon Yes, `parse_url` is the right way, but back to my initial question to @JuanMendes: why is it a duplicate? – techAddict82 Jan 27 '13 at 00:41
  • Because you shouldn't do what you're doing with a ginormous RegExp that would only work for this case. You should just use `parse_url` and take a break with all the time you'll save. I don't think it's worth trying to figure out why your particular RegExp doesn't work, because it's not likely that this question/answer will be useful to others, and that's at the heart of SO. A better venue could be http://programmers.stackexchange.com/ – Juan Mendes Jan 30 '13 at 02:30

1 Answer


Forget regex; use `parse_url`, which splits a URL into its components:

```
Array
(
    [scheme] => http
    [host] => hostname
    [user] => username
    [pass] => password
    [path] => /path
    [query] => arg=value
    [fragment] => anchor
)
```

Then use `parse_str` on the query part of the URL to extract the variables.
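A minimal sketch of both steps. The answer does not show the URL that produced the array above; the one used here is an assumption, taken from the sample in the PHP manual's `parse_url()` page, which yields the same keys and values. Note that `parse_str()` is given its second argument: calling it without one was deprecated in PHP 7.2 and is no longer allowed in PHP 8.

```php
<?php
// Assumed input: the sample URL from the PHP manual's parse_url() page.
$url = 'http://username:password@hostname/path?arg=value#anchor';

// Step 1: split the URL into scheme, host, user, pass, path, query, fragment.
$parts = parse_url($url);
print_r($parts);

// Step 2: turn the query string "arg=value" into an associative array.
// parse_str() writes its result into the second argument.
$queryVars = [];
parse_str($parts['query'], $queryVars);
print_r($queryVars); // $queryVars now holds ['arg' => 'value']
```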

EDIT

Here's a better demo:

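```php
$url = "http://www.youtube.com/watch?feature&v=317a815FLWQ";

// Split the URL; 'query' holds everything between '?' and '#'.
$parsed_url = parse_url($url);
$query = $parsed_url['query'];

// parse_str() fills $parsed_query with the query-string variables.
$parsed_query = array();
parse_str($query, $parsed_query);

var_dump($parsed_query);
```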

Outputs:

```
array(2) {
  ["feature"]=>
  string(0) ""
  ["v"]=>
  string(11) "317a815FLWQ"
}
```
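The same two calls can be wrapped in a small function and run against both inputs from the question. This is a sketch; the function name `videoIdFromQuery` is hypothetical. Because `v` is looked up by name, its position in the query string (first, as in Input2, or after `feature`, as in Input1) does not matter.

```php
<?php
// Hypothetical helper: returns the 'v' query parameter, or null if absent.
function videoIdFromQuery(string $url): ?string
{
    // PHP_URL_QUERY asks parse_url() for the query string only;
    // it returns null when the URL has no query part.
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }

    $params = [];
    parse_str($query, $params);

    // Guard against a missing 'v' or an array value such as v[]=...
    return isset($params['v']) && is_string($params['v']) ? $params['v'] : null;
}

foreach ([
    'http://www.youtube.com/watch?feature&v=317a815FLWQ',       // Input1
    'http://www.youtube.com/watch?v=spDj54kf-vY&feature=g-vrec', // Input2
] as $url) {
    var_dump(videoIdFromQuery($url));
}
```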

EDIT 2

Another example that would extract the ID from your second link given in the comments:

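```php
$url = "http://www.youtube.com/sandalsResorts#p/c/54B8C800269D7C1B/2/PPS-8DMrAn4";

// Here the ID lives in the fragment (the part after '#'), not in the query.
$parsed_url = parse_url($url);
$fragment = $parsed_url['fragment'];

// Take the last '/'-separated segment of the fragment.
$fragment_parts = explode('/', $fragment);
$video_id = array_pop($fragment_parts);

print($video_id);
```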

Outputs:

```
PPS-8DMrAn4
```

However, if you're asking for links from your users, you need to be very specific with them. The link in the second example isn't a video link, but if you want to be forgiving of your users' input, you could run the link through both code snippets and check whether you got an ID.
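Running a link through both snippets could look roughly like the sketch below. The function name `extractYouTubeId` is hypothetical, and the order (query first, then fragment) is an assumed design choice. To decide whether a candidate really is an ID, it reuses the assumption stated in the question's regex that a video ID is exactly 11 characters from `[\w-]`.

```php
<?php
// Hypothetical helper: try the 'v' query parameter first, then fall back to
// the last '/'-separated segment of the fragment. Returns null if neither
// candidate looks like an 11-character video ID.
function extractYouTubeId(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false) {
        return null; // seriously malformed URL
    }

    $candidates = [];

    // 1. Query string, e.g. watch?feature&v=317a815FLWQ
    if (isset($parts['query'])) {
        $params = [];
        parse_str($parts['query'], $params);
        if (isset($params['v']) && is_string($params['v'])) {
            $candidates[] = $params['v'];
        }
    }

    // 2. Fragment, e.g. #p/c/54B8C800269D7C1B/2/PPS-8DMrAn4
    if (isset($parts['fragment'])) {
        $fragmentParts = explode('/', $parts['fragment']);
        $candidates[] = end($fragmentParts);
    }

    // Accept the first candidate with the ID shape the question's regex assumes.
    foreach ($candidates as $candidate) {
        if (preg_match('/\A[\w-]{11}\z/', $candidate) === 1) {
            return $candidate;
        }
    }
    return null;
}

echo extractYouTubeId('http://www.youtube.com/sandalsResorts#p/c/54B8C800269D7C1B/2/PPS-8DMrAn4'), "\n";
```

Checking the shape of the result, rather than trusting whichever part of the URL happened to be present, is what lets the function return `null` instead of a wrong string for links that contain no video at all.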

thordarson
  • This would actually still require regex on the query bit to get the video id – Michel Feldheim Jan 27 '13 at 00:11
  • @MichelFeldheim Not at all, see my edit. – thordarson Jan 27 '13 at 00:12
  • @thordarson My error is with this example link: `http://www.youtube.com/watch?feature&v=317a815FLWQ`. Not the one shown in your answer – techAddict82 Jan 27 '13 at 00:15
  • @codexMachine23 Sorry, I updated the answer and the code uses your link now. This works reliably with any correctly formed link and any number of parameters. – thordarson Jan 27 '13 at 00:16
  • @thordarson Thanks. But as you mention, `correctly formed links` are not always used. The regex handles almost all cases. Will your code work with this link: `http://www.youtube.com/sandalsResorts#p/c/54B8C800269D7C1B/2/PPS-8DMrAn4`? – techAddict82 Jan 27 '13 at 00:22
  • In that link's case you'd need to parse the fragment part of the `parse_url` result, but that's not part of the question, is it? And using native PHP methods nearly always trumps the use of regex. – thordarson Jan 27 '13 at 00:24
  • I added a code snippet that extracts your other link. – thordarson Jan 27 '13 at 00:35