3

The current expression validates a web address (HTTP). How do I change it so that an empty string also matches?

(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?
Mark Biek
Peter Morris
  • It didn't occur to me from your question that you were matching lines in a text file... I thought you were likely parsing the html of an http-response for links within and couldn't figure out the context of your 'empty string' goal until I read the answer you selected. Think different, eh? – Hardryv Nov 11 '11 at 13:22
  • in case it's helpful to anyone browsing in as I did, the best match string I've architected for URLs buried within HTML is "((http)s?:\/\/)([\w\.\-_]*(\/)?)*(#[\w\.\-_])?" -- I tested it against multiple popular sites with many links each, and it will also encompass the end-of-URL page-class-search tag – Hardryv Nov 11 '11 at 14:14

3 Answers

6

If you want to modify the expression to match either an entirely empty string or a full URL, you will need to use the anchor metacharacters ^ and $ (which match the beginning and end of the input respectively, or of each line when the engine runs in multiline mode).

^(|https?:\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)$

As dirkgently pointed out, you can simplify your match for the protocol a little, so I've included that for you too.
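
For anyone who wants to try the anchored pattern, here is a minimal sketch in Python (my choice of language, the question doesn't name one). The names URL_OR_EMPTY and is_url_or_empty are made up for illustration. One thing to be aware of: in Python, $ also matches just before a trailing newline, so swap in re.fullmatch if that matters for your input.

```python
import re

# The anchored pattern: an empty alternative OR a full URL.
# URL_OR_EMPTY and is_url_or_empty are hypothetical names for this sketch.
URL_OR_EMPTY = re.compile(
    r"^(|https?:\/\/[\w\-_]+(\.[\w\-_]+)+"
    r"([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)$"
)


def is_url_or_empty(text: str) -> bool:
    """Return True if text is empty or consists of exactly one URL."""
    return URL_OR_EMPTY.match(text) is not None
```

With this, is_url_or_empty("") and is_url_or_empty("http://example.com") are both accepted, while a URL embedded in other text is rejected because of the anchors.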

Though, if you are using this expression from within a program or script, it may be simpler for you to use the language's own means of checking if the input is empty.

// in no particular language...
if input.length > 0 then
    if input matches <regex> then
        input is a URL
    else
        input is invalid
else
    input is empty
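
The same idea as a runnable sketch, assuming Python (the pseudocode above is deliberately language-neutral). The names URL and classify are hypothetical; re.fullmatch is used so that the URL pattern itself needs no anchors and no empty alternative.

```python
import re

# The URL pattern on its own, without anchors or an empty alternative.
# URL and classify are hypothetical names for this sketch.
URL = re.compile(
    r"https?:\/\/[\w\-_]+(\.[\w\-_]+)+"
    r"([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?"
)


def classify(text: str) -> str:
    """Classify input as 'empty', 'url' or 'invalid'."""
    if len(text) > 0:
        # fullmatch requires the whole input to be a URL.
        if URL.fullmatch(text):
            return "url"
        return "invalid"
    return "empty"
```

Keeping the emptiness check outside the regex also lets the caller tell "empty" apart from "valid URL", which the single combined pattern cannot do.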
Alex Barrett
  • Accepted as the answer because you were the only person to mention the ^ and $ required, without which simply adding the ? made any pattern match. Thanks! – Peter Morris Feb 27 '09 at 04:49
0

Put the whole expression in parentheses and mark it as optional (“?” quantifier, zero or one repetition):

((http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)?
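
A caveat worth checking for yourself: without anchors, an optional group can always match the empty string, so an unanchored search succeeds on any input. Here is a small sketch, assuming Python, with the optional-group pattern written using a plain & in the character classes. The names OPTIONAL_URL, found_unanchored and matches_whole are made up for illustration.

```python
import re

# The whole URL expression wrapped in an optional group, with no anchors.
# OPTIONAL_URL and both helper names are hypothetical.
OPTIONAL_URL = re.compile(
    r"((http|https):\/\/[\w\-_]+(\.[\w\-_]+)+"
    r"([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)?"
)


def found_unanchored(text: str) -> bool:
    """Unanchored search: succeeds whenever any substring matches."""
    return OPTIONAL_URL.search(text) is not None


def matches_whole(text: str) -> bool:
    """Whole-input match: behaves as if the pattern were wrapped in ^...$."""
    return OPTIONAL_URL.fullmatch(text) is not None
```

Here found_unanchored("not a url") succeeds via an empty match at position 0, while matches_whole("not a url") fails. So the optional group does the job only when the engine is also made to match the entire input.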
Gumbo
0

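Use (Expr)? where Expr is your URL matcher; the parentheses make the ? apply to the whole expression rather than only its last element. It is the same thing I would do for http and https: https?. The ? is known as a quantifier -- you can look it up. From Wikipedia: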

? The question mark indicates there is zero or one of the preceding element.

dirkgently