
I've always been bad at using regular expressions, but I think I now have a legitimate need for one, and I'm not sure whether it can be done with them.

I want a regex that, when executed, returns all components of a given URL. It's not for validating the format: the precondition is that a valid URL is passed (usually it will be location.href).

Desired components are:

  • scheme
  • domain
  • port (optional)
  • path
  • query string

Bonus:

  • query string arguments separately
  • fragment
  • user / password

Examples:

/regex/.exec('http://www.stackoverflow.com/questions/1/regex-for-getting-url-components-in-javascript') --> ["http://www.stackoverflow.com/questions/1/regex-for-getting-url-components-in-javascript", "http", "www.stackoverflow.com", undefined, "questions/1/regex-for-getting-url-components-in-javascript"]

/regex/.exec('https://localhost:8080/?a=1&b=2') --> ["https://localhost:8080/?a=1&b=2", "https", "localhost", "8080", "", "a=1&b=2"]


EDIT:

To clarify, what I need is a small piece of code that creates an object representing a URL. I must then be able to modify components such as parameters, scheme, etc., and get the result back as a string. AFAIK, I can't do this with the native location object, but I may be wrong.

The code must be especially small, as it has to be loaded synchronously in the page header. It may even end up being copied into every page instead of being included as an external file. So, at least initially, I'd prefer not to rely on external dependencies.

sinuhepop
  • I cannot believe there is not a library in javascript somewhere that parses a URL... You'd be much better off using such a thing than trying to do this with a regex yourself – arco444 Jun 16 '15 at 13:04
  • The location object already does this for you. Reason you need a regular expression? – epascarello Jun 16 '15 at 13:04
  • @epascarello — The `location` object only holds the URL of the current page. It's useless if you want to work with any other URL. – Quentin Jun 16 '15 at 13:08
  • @arco444: In my very particular case, I need to get data from the current URL (sometimes another one) and modify some part of it. Unfortunately this must be done in a synchronously loaded script, so I don't want to depend on libraries if it could be done with a single regex. – sinuhepop Jun 16 '15 at 13:09
  • "Unfortunately this must be done in a synchronous loaded javascript, so I don't want do depend on libraries" — Nothing about "library" implies "asynchronous" – Quentin Jun 16 '15 at 13:10
  • @Quentin that is wrong, you can create a link and get all of that information. – epascarello Jun 16 '15 at 13:10
  • @epascarello — Really? Hmm. (pokes at docs). I can find [url utils](https://developer.mozilla.org/en-US/docs/Web/API/URLUtils/hash) but that seems to be supported only by Firefox, which makes it a little impractical. – Quentin Jun 16 '15 at 13:13
  • http://stackoverflow.com/questions/8498592/extract-root-domain-name-from-string/8498668#8498668 take a look at this might help you – Kaushik Jun 16 '15 at 13:13
  • Have you tried anything yourself? I suggest you read [how to ask](http://stackoverflow.com/help/how-to-ask). – Anonymous Jun 16 '15 at 13:19
  • @Quentin I posted an answer below that shows it, but it really should not be an answer because it is not a regular expression, which is what the OP asked for. – epascarello Jun 16 '15 at 13:21
  • And looks like a dupe of http://stackoverflow.com/questions/27745/getting-parts-of-a-url-regex – epascarello Jun 16 '15 at 13:23
  • @epascarello: You're right. I looked for it but didn't find that question. I was thinking about deleting this one, but after the answer you posted I think it's worth keeping to help anyone who needs it specifically for JavaScript. – sinuhepop Jun 16 '15 at 15:39

3 Answers


The basic idea: get the parts without a regular expression by letting the browser parse the URL through an anchor element.

function parseUrl(url) {
  // Let the browser parse the URL by assigning it to an anchor element
  var a = document.createElement("a");
  a.href = url;

  // Copy the parsed components onto a plain object
  var obj = {};
  var parts = ['protocol', 'hostname', 'host', 'pathname', 'port', 'search', 'hash', 'href'];

  parts.forEach(function (val) {
    obj[val] = a[val];
  });

  return obj;
}

console.log(parseUrl("http://localhost:8080/foo?bar=1"));
console.log(parseUrl("http://www.example.com#test"));
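The question's bonus item, query string arguments separately, is not covered by the anchor's properties: `search` is a single string. A minimal sketch of a hypothetical `parseQuery` helper that splits it, assuming `&`-separated `key=value` pairs, `+` used for spaces (form encoding), and that a repeated key keeps only its last value:

```javascript
// Hypothetical helper (not part of the answer above): split a search string
// such as "?bar=1&baz=a%20b" into an object of decoded key/value pairs.
// Assumption: a repeated key overwrites the earlier value.
function parseQuery(search) {
  var params = {};
  // Drop a leading "?" if present
  var query = search.charAt(0) === "?" ? search.slice(1) : search;
  if (!query) {
    return params;
  }
  query.split("&").forEach(function (pair) {
    if (!pair) {
      return; // skip empty segments such as in "a=1&&b=2"
    }
    var idx = pair.indexOf("=");
    // A pair without "=" is treated as a key with an empty value
    var key = idx === -1 ? pair : pair.slice(0, idx);
    var value = idx === -1 ? "" : pair.slice(idx + 1);
    // "+" stands for a space in form-encoded query strings;
    // decodeURIComponent throws on malformed "%" sequences
    params[decodeURIComponent(key.replace(/\+/g, " "))] =
      decodeURIComponent(value.replace(/\+/g, " "));
  });
  return params;
}

console.log(JSON.stringify(parseQuery("?bar=1&baz=a%20b"))); // {"bar":"1","baz":"a b"}
```

It can be fed the `search` property of the object returned by `parseUrl`, or `location.search` directly.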

Based on your edit, you can always create an object and override its toString() method to output the changes. (The code below is probably not perfect; I wrote it quickly.)

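    // Constructor for the returned object; named so it does not shadow the native URL API
    function ParsedUrl() {}

    // Rebuild the string from the (possibly modified) parts.
    // Uses hostname + port instead of host so that changing either one is reflected;
    // the copied host and href properties are not kept in sync.
    ParsedUrl.prototype.toString = function () {
      var port = this.port ? ":" + this.port : "";
      return this.protocol + "//" + this.hostname + port + this.pathname + this.search + this.hash;
    };

    function parseUrl(url) {
      // Let the browser parse the URL by assigning it to an anchor element
      var a = document.createElement("a");
      a.href = url;

      var obj = new ParsedUrl();
      var parts = ['protocol', 'hostname', 'host', 'pathname', 'port', 'search', 'hash', 'href'];

      parts.forEach(function (val) {
        obj[val] = a[val];
      });

      return obj;
    }

    var parsed = parseUrl("http://localhost:8080/foo?bar=1");
    parsed.hash = "#newHash";
    parsed.search = "?boom=new";
    console.log(parsed.toString()); // http://localhost:8080/foo?boom=new#newHash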
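To change individual parameters rather than overwrite `search` wholesale, the new value can be built from an object. A minimal sketch with a hypothetical `buildQuery` helper, assuming one value per key (arrays and nested objects are not handled):

```javascript
// Hypothetical helper: serialize an object of parameters back into a
// search string ("?key=value&..."), percent-encoding keys and values.
function buildQuery(params) {
  var pairs = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + "=" + encodeURIComponent(params[key]);
  });
  // An empty object yields an empty string rather than a lone "?"
  return pairs.length ? "?" + pairs.join("&") : "";
}

console.log(buildQuery({ boom: "new", q: "a b" })); // ?boom=new&q=a%20b
```

The result can be assigned to the `search` property before calling `toString()`.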
epascarello
  • As you said, it's not exactly what I was asking for, but I didn't know about this option. It will be part of my solution. Thanks. – sinuhepop Jun 16 '15 at 15:23

Modern browsers support the URL API exactly for this purpose:

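var url = new URL('http://www.stackoverflow.com/questions/1/regex-for-getting-url-components-in-javascript');
// JSON.stringify(url) outputs only the href (URL.prototype.toJSON), so list the parts explicitly
var parts = { protocol: url.protocol, host: url.host, pathname: url.pathname, search: url.search, hash: url.hash };
document.write("<pre>" + JSON.stringify(parts, null, 3) + "</pre>");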

Look here for polyfills.

If you need something small and regex-based, go with Steven Levithan's parseUri.
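The URL API also covers the modify-and-serialize part of the question's edit: its properties are writable, and query arguments are exposed separately through `searchParams`. A short sketch, assuming an environment that provides the WHATWG `URL` (modern browsers, Node.js 10+):

```javascript
// Assumes the WHATWG URL API is available (modern browsers, Node.js 10+).
var url = new URL("https://localhost:8080/?a=1&b=2");

// Read components directly from the parsed object
console.log(url.protocol, url.hostname, url.port, url.pathname); // https: localhost 8080 /

// Query string arguments are available separately through searchParams
console.log(url.searchParams.get("a")); // 1

// Modify components in place; the serialized URL reflects every change
url.protocol = "http:";
url.port = "9000";
url.searchParams.set("b", "3");
url.searchParams.append("c", "x y");
url.hash = "section";

console.log(url.toString()); // http://localhost:9000/?a=1&b=3&c=x+y#section
```

Note that once `searchParams` is modified, the whole query is re-serialized in form encoding, which is why the space becomes `+`.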

georg
  • Interesting. I didn't know about the URL object, but not all browsers support it. Your link contains a couple of regexes that are mostly what I was asking for. – sinuhepop Jun 16 '15 at 15:21

Got it, I think:

// Without the g flag, exec() keeps no lastIndex state between calls,
// so the same regex object can be reused for both URLs.
var regex = /([a-zA-Z]+?)\:\/\/(.+?)\:?([\d]+?)?\/(.+?)?\?(.*)/i;
var url = 'http://www.example.com/path/morePath/PathHere/?version=1#hashHash';
var url2 = 'http://localhost:8080/?test=1';
console.log(regex.exec(url));
console.log(regex.exec(url2));
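The regex above only matches URLs that contain a `?`, and the fragment ends up inside the last group. As an alternative sketch, the generic regex from RFC 3986 (Appendix B) splits any URL into scheme, authority, path, query and fragment without validating it; a second regex, which is a hypothetical addition and not from the RFC, then splits the authority into user, password, host and port. `splitUrl` is an illustrative name, and unlike the question's examples the path keeps its leading `/`:

```javascript
// Generic URI regex from RFC 3986, Appendix B. Groups:
// 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
var URI_RE = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical second step (not from the RFC): split the authority into
// user, password, host and port. Bracketed IPv6 hosts are kept whole.
var AUTHORITY_RE = /^(?:([^:@]*)(?::([^@]*))?@)?(\[[^\]]*\]|[^:]*)(?::(\d*))?$/;

// Illustrative helper name; absent parts come back as undefined.
function splitUrl(str) {
  var m = URI_RE.exec(str); // always matches, since every group is optional
  var auth = AUTHORITY_RE.exec(m[4] || "") || [];
  return {
    scheme: m[2],
    user: auth[1],
    password: auth[2],
    host: auth[3],
    port: auth[4],
    path: m[5],
    query: m[7],
    fragment: m[9]
  };
}

// JSON.stringify omits the undefined parts (user, password, fragment)
console.log(JSON.stringify(splitUrl("https://localhost:8080/?a=1&b=2")));
// {"scheme":"https","host":"localhost","port":"8080","path":"/","query":"a=1&b=2"}
```

The query string can then be split into separate arguments on `&` and `=`.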
Jamie Barker