How to extract domain name from url?

Question

How do I extract the domain name from a URL using bash? For example, http://example.com/ should become example.com. It must work for any TLD, not just .com.

Dup: http://stackoverflow.com/questions/827024/how-do-i-extract-the-domain-out-of-an-url — Dennis Williamson, Mar 23 '10 at 07:04

score 94 · Answer 1 · edited Jul 05 '18 at 12:58

A simple way to extract the domain name with AWK is to split on both "/" and ":" and print the fourth field:

echo http://example.com/index.php | awk -F'[/:]' '{print $4}'

OUTPUT: example.com

:-)
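The comments on this answer point out two gaps: splitting only on "/" keeps the port, and an input without http:// or https:// produces an empty result. The following is a minimal sketch that covers both, assuming the input is either `scheme://host...` or a bare `host...`. The function name `host_from_url` is made up for illustration, and credentials such as `user:pw@` are not handled here.

```bash
# host_from_url is a hypothetical helper name, not part of the answer above.
host_from_url() {
    printf '%s\n' "$1" | awk -F'[/:]' '
        # With a scheme the fields are: scheme, "", "", host, ...
        /^[a-zA-Z][a-zA-Z0-9+.-]*:\/\// { print $4; next }
        # Without a scheme the host is simply the first field.
        { print $1 }'
}

host_from_url "http://example.com/index.php"
host_from_url "http://example.com:3030/index.php"
host_from_url "example.com/index.php/test"
```

All three calls should print example.com, because the port and the path always land in later fields.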

answered Jul 08 '12 at 18:50 by Soj · edited Jul 05 '18 at 12:58 by sorin

Nice, this is so much better than the answers provided in https://stackoverflow.com/questions/6174220/parse-url-in-shell-script! – bk138 Dec 06 '14 at 01:19
`echo http://example.com:3030/index.php | awk -F/ '{print $3}'` outputs `example.com:3030` :-( – Ben Burns Mar 24 '15 at 09:16
You could split on `:` again to get it, but it's not flexible enough to accept URLs both with and without a port. – chovy Dec 29 '15 at 03:30
`| awk -F/ '{print $3}' | awk -F: '{print $1}'` – Andrew Mackenzie Mar 15 '16 at 12:34
What if I need this: http(s)://example.com? I tried printing $1$3, which gives http:example.com (missing the '//' after http). Any idea? – 3AK Jun 16 '16 at 05:14
I got it by using `echo http://www.example.com/somedir/someotherdir/index.html | cut -d'/' -f1,2,3`, which gives `http://www.example.com` – 3AK Jun 16 '16 at 05:44
To handle URLs with and without ports: `awk -F[/:] '{print $4}'` – Michael Oct 06 '17 at 14:35
@Michael What if I also want to remove www but not any other subdomain (e.g., www.example.com -> example.com but home.example.com -> home.example.com)? – d-b Jun 13 '18 at 06:15
On macOS it makes sense to do this: `echo http://example.com/index.php | awk -F/ '{print $3}' | awk -F: '{print $1}'` – derFunk Aug 21 '18 at 15:46
In case the URL contains `&`, wrap it in quotes when passing it as a parameter. – Vishrant Oct 09 '20 at 01:46
This does not work without http or https. For example, example.com/index.php/test would return blank. – MaXi32 Jul 31 '21 at 11:06

score 29 · Answer 2 · edited Jul 02 '20 at 13:21

$ URI="http://user:pw@example.com:80/"
$ echo $URI | sed -e 's/[^/]*\/\/\([^@]*@\)\?\([^:/]*\).*/\2/'
example.com

see http://en.wikipedia.org/wiki/URI_scheme

answered Mar 24 '10 at 09:52 by user300653 · edited Jul 02 '20 at 13:21 by Flimm

This works with or without a port and with deep paths, and it still uses bash, although it doesn't work on Mac. – chovy Dec 29 '15 at 03:34
7 years later, this is still my go-to answer. – mwoodman Oct 19 '17 at 17:14
I use your suggestion with a little extra to strip out any subdomains that might be in the URL: `echo http://www.mail.example.com:3030/index.php | sed -e "s/[^/]*\/\/\([^@]*@\)\?\([^:/]*\).*/\2/" | awk -F. '{print $(NF-1) "." $NF}'` So I basically split your output at the dots, take the last and second-to-last fields, and join them back with a dot. – sakumatto Nov 01 '17 at 14:33
**This is the best answer!** I used this for a ping command that allows full URLs: https://unix.stackexchange.com/a/428990/20661 stripping only the `www.` subdomain – rubo77 Mar 08 '18 at 10:52
For those who want to get the port: `sed -e "s/[^/]*\/\/\([^@]*@\)\?\([^:/]*\)\(:\([0-9]\{1,5\}\)\)\?.*/\4/"` – wheeler Apr 26 '18 at 23:38
@sakumatto Works fine, but how would it support "https://example.com.uk", for example? – sanNeck Apr 15 '21 at 17:11
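The expressions in this answer and its comments rely on `\?` in a basic regular expression, which is a GNU extension and a likely reason for the "doesn't work on Mac" report. As a hedged sketch, the same idea can be written with extended regular expressions (`sed -E`), which both GNU and BSD sed accept, returning host and port together. The helper name `host_and_port` and the "host port" output format are assumptions made for this example.

```bash
# host_and_port is a hypothetical helper; it prints "host port" on one line.
# The port part is empty when the URL does not carry one.
host_and_port() {
    printf '%s\n' "$1" |
        sed -E -e 's_^([a-zA-Z][a-zA-Z0-9+.-]*://)?([^/@]*@)?([^/:?#]+)(:([0-9]+))?.*_\3 \5_'
}

host_and_port "http://user:pw@example.com:80/"
host_and_port "http://www.mail.example.com:3030/index.php"
```

Group 3 captures the host and group 5 the digits after the colon; the optional scheme and `user:pw@` groups are matched but discarded.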

score 28 · Answer 3 · edited Mar 29 '10 at 19:42

basename "http://example.com"

Now of course, this won't work with a URI like this: http://www.example.com/index.html but you could do the following:

basename $(dirname "http://www.example.com/index.html")

Or for more complex URIs:

echo "http://www.example.com/somedir/someotherdir/index.html" | cut -d'/' -f3

-d means "delimiter" and -f means "field"; in the above example, the third field delimited by the forward slash '/' is www.example.com.
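To see why the host is field 3 rather than field 1 or 2, it can help to print every field with its index. This is only an illustration; `list_fields` is a hypothetical helper name, and awk is used here because its field numbering matches that of cut.

```bash
# list_fields is a hypothetical helper used only for this illustration.
# It prints each "/"-delimited field with the 1-based index cut would use.
list_fields() {
    printf '%s\n' "$1" | awk -F/ '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

list_fields "http://www.example.com/somedir/someotherdir/index.html"
```

The "//" after the scheme produces an empty second field, which is why `http:` is field 1 and the host only appears in field 3.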

answered Mar 29 '10 at 19:34 by musashiXXX · edited Mar 29 '10 at 19:42

I like cut -d'/' -f3 for its simplicity. – Jamie Kitson Mar 14 '12 at 13:40
Fails if you add a port: `echo "http://www.example.com:8080/somedir/someotherdir/index.html" | cut -d'/' -f3` – chovy Dec 29 '15 at 03:31
I got `http://www.example.com` by running `echo http://www.example.com/somedir/someotherdir/index.html | cut -d'/' -f1,2,3` – 3AK Jun 16 '16 at 05:49
`basename $(dirname` does not work if the URL ends with the domain: `basename $(dirname "http://www.example.com/")` shows just `http:` – rubo77 Mar 08 '18 at 10:37

score 15 · Answer 4 · answered May 10 '16 at 14:02

echo "$URL" | cut -d'/' -f3 | cut -d':' -f1

Works for URLs:

http://host.example.com
http://host.example.com/hi/there
http://host.example.com:2345/hi/there
http://host.example.com:2345

answered May 10 '16 at 14:02 by keyoxy

I found this more useful, as it returns the URL unchanged when it doesn't contain 'http://', i.e. `abc.com` is retained as `abc.com` – Udayraj Deshmukh Nov 05 '18 at 08:16
This is in fact the most intuitive, concise and effective method of all the answers here! – Robert Aug 15 '21 at 14:22
This extracts `host.example.com` rather than the domain name (`example.com`) asked for. – Lucas Apr 05 '22 at 19:19

score 10 · Answer 5 · answered May 24 '17 at 08:23

sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_'

e.g.

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://example.com'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'https://example.com'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://example.com:1234/some/path'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path#fragment'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path#fragment?params=true'
example.com

answered May 24 '17 at 08:23 by Armand

Boom! `HOST=$(sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< "$MYURL")` is fine in Bash – 4Z4T4R May 26 '17 at 17:58
I would like to strip www from the domain. How should I change the command in this case? – Ceylan B. Apr 25 '19 at 08:22
Thanks for this, very handy. To capture the path from the URL I extended this slightly: `sed -E -e 's_.*://([^/@]*@)?([^/:]+)(.*)_\2_' <<< 'http://example.com'`. This lets you grab the path from the URL: `sed -E -e 's_.*://([^/@]*@)?([^/:]+)(.*)_\3_' <<< 'http://example.com/path/to/something'` – Max Barrass May 05 '22 at 03:53
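Regarding the question above about dropping `www`: one possible approach, sketched here under the assumption that only a single leading `www.` label should go, is to post-process the sed result with a shell prefix removal. The function name `host_without_www` is hypothetical.

```bash
# host_without_www is a hypothetical helper built on the sed expression above.
host_without_www() {
    local host
    host=$(printf '%s\n' "$1" | sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_')
    # Remove one leading "www." label; other subdomains are left untouched.
    printf '%s\n' "${host#www.}"
}

host_without_www "https://www.example.com:8443/path"
host_without_www "http://home.example.com/"
```

The first call should print example.com and the second home.example.com, since `${host#www.}` only strips a literal `www.` prefix.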

score 7 · Answer 6 · edited Mar 23 '10 at 03:54

#!/usr/bin/perl -w
use strict;

my $url = $ARGV[0];

if($url =~ /([^:]*:\/\/)?([^\/]+\.[^\/]+)/g) {
  print $2;
}

Usage:

./test.pl 'https://example.com'
example.com

./test.pl 'https://www.example.com/'
www.example.com

./test.pl 'example.org/'
example.org

 ./test.pl 'example.org'
example.org

./test.pl 'example'  -> no output

And if you just want the domain and not the full host + domain use this instead:

#!/usr/bin/perl -w
use strict;

my $url = $ARGV[0];
if($url =~ /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+\.[^\/]+)/g) {
  print $3;
}

answered Mar 23 '10 at 03:47 by Dark Castle · edited Mar 23 '10 at 03:54

Of course the last one doesn't know about "www.example.co.uk" http://search.cpan.org/~nmelnick/Domain-PublicSuffix-0.04/lib/Domain/PublicSuffix.pm – Dennis Williamson Mar 23 '10 at 07:03
True, and if there is an API for it obviously I'd go with that anyway. Seems like the complete solution would actually have to know all valid country codes and check to see if the last post-dot region was a country code... – Dark Castle Mar 23 '10 at 13:56
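To make the "www.example.co.uk" problem concrete, here is a rough bash sketch of the idea discussed in these comments: check the host against a list of multi-label suffixes before falling back to "last two labels". The function name `registrable_domain` and the four hardcoded suffixes are assumptions for illustration only; a real solution would need the complete Public Suffix List rather than this tiny sample.

```bash
# registrable_domain and its tiny suffix list are assumptions for illustration;
# a real implementation should load the full Public Suffix List instead.
registrable_domain() {
    local host="$1" suffix rest
    for suffix in co.uk org.uk ac.th com.au; do
        case "$host" in
            *".$suffix")
                rest=${host%".$suffix"}      # labels in front of the suffix
                printf '%s.%s\n' "${rest##*.}" "$suffix"
                return 0
                ;;
        esac
    done
    # Fallback: keep the last two dot-separated labels.
    printf '%s\n' "$host" | awk -F. '{ if (NF >= 2) print $(NF-1) "." $NF; else print $0 }'
}

registrable_domain "www.example.co.uk"
registrable_domain "home.example.com"
```

With this sample list the calls should print example.co.uk and example.com; any suffix missing from the list silently falls back to the two-label rule, which is exactly the weakness described above.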

score 6 · Answer 7 · answered Mar 23 '10 at 10:31

Instead of using a regex to do this, you can use Python's urlparse (in Python 3 it lives in urllib.parse; the Python 2 module was named urlparse):

URL=http://www.example.com

python3 -c "from urllib.parse import urlparse
url = urlparse('$URL')
print(url.netloc)"

You can either use it like this or put it in a small script. However, this still expects a valid scheme identifier, and judging by your comment your input doesn't necessarily provide one. You can specify a default scheme, but urlparse expects the netloc to start with '//':

url = urlparse('//www.example.com/index.html', 'http')

So you will have to prepend those manually, i.e.:

python3 -c "from urllib.parse import urlparse
if '$URL'.find('://') == -1:
    url = urlparse('//$URL', 'http')
else:
    url = urlparse('$URL')
print(url.netloc)"

score 4 · Answer 8 · edited Mar 23 '10 at 03:29

There is very little information on how you obtain those URLs; please show more next time (for example, whether the URLs contain parameters). Meanwhile, here is some simple string manipulation for your sample URL.

E.g.

$ s="http://example.com/index.php"
$ echo "${s%/*}"  # strip from the last "/" onwards
http://example.com
$ s=${s%/*}
$ echo "${s#http://}"  # strip the leading http://
example.com

Other ways, using sed (GNU):

$ echo "$s" | sed 's/http:\/\///;s|\/.*||'
example.com

Using awk:

$ echo "$s" | awk '{gsub("http://|/.*","")}1'
example.com

answered Mar 23 '10 at 02:43 by ghostdog74 · edited Mar 23 '10 at 03:29

Your method doesn't work! `echo http://example.com/index.php | sed -r 's/http:\/\/|\///g'` gives the output example.comindex.php and NOT example.com on Cygwin. Please post a method that works. – Ben Smith Mar 23 '10 at 03:11
My method doesn't work because your sample URL is different, and you did not say what type of URLs you want to parse. Next time, please write your question clearly, provide input examples, and describe the output you want. – ghostdog74 Mar 23 '10 at 03:31
The 2nd line seems to be incorrect. I copy-pasted the first 2 lines into my Ubuntu shell and got _http://example.com/index.php*_ – jpeltoniemi Jun 25 '12 at 16:58

score 3 · Answer 9 · answered Mar 24 '10 at 09:26

The following will output "example.com":

URI="http://user@example.com/foo/bar/baz/?lala=foo" 
ruby -ruri -e "p URI.parse('$URI').host"

For more info on what you can do with Ruby's URI class you'd have to consult the docs.

answered Mar 24 '10 at 09:26 by Michael Kohl

score 1 · Answer 10 · answered Dec 29 '15 at 03:45

Here's the Node.js way; it works with or without ports and with deep paths:

//get-hostname.js
'use strict';

const url = require('url');
const parts = url.parse(process.argv[2]);

console.log(parts.hostname);

Can be called like:

node get-hostname.js http://foo.example.com:8080/test/1/2/3.html
//foo.example.com

Docs: https://nodejs.org/api/url.html

score 1 · Answer 11 · answered Oct 24 '16 at 13:31

One solution that covers more cases is based on sed regexps:

echo http://example.com/index.php | sed -e 's#^https://\|^http://##' -e 's#:.*##' -e 's#/.*##'

That would work for URLs like: http://example.com/index.php, http://example.com:4040/index.php, https://example.com/index.php
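Note that `\|` inside a basic regular expression is a GNU extension, so the first expression may not match with BSD/macOS sed. A possible portable variant, shown here only as a sketch, uses extended regular expressions via `sed -E` and folds the last two expressions into one. The function name `strip_to_host` is hypothetical.

```bash
# strip_to_host is a hypothetical name for this sketch.
strip_to_host() {
    # 1st expression: drop a leading http:// or https://
    # 2nd expression: drop everything from the first ":" or "/" onwards
    printf '%s\n' "$1" | sed -E -e 's#^https?://##' -e 's#[:/].*##'
}

strip_to_host "http://example.com/index.php"
strip_to_host "http://example.com:4040/index.php"
strip_to_host "https://example.com/index.php"
```

Each of the three calls should print example.com.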

score 0 · Answer 12 · answered Apr 22 '10 at 00:28

With Ruby you can use the Domainatrix library / gem

http://www.pauldix.net/2009/12/parse-domains-from-urls-easily-with-domainatrix.html

require 'rubygems'
require 'domainatrix'
s = 'http://www.champa.kku.ac.th/dir1/dir2/file?option1&option2'
url = Domainatrix.parse(s)
url.domain
=> "kku"

great tool! :-)

score 0 · Answer 13 · answered Apr 02 '22 at 04:26

Pure Bash implementation without any sub-shell or sub-process:

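shopt -s extglob # required for the +(...) pattern used below

# Extract host from a URL
#   $1: URL
function extractHost {
    local s="$1"
    s="${s/#*:\/\/}" # Parameter Expansion & Pattern Matching: strip the scheme
    echo -n "${s/%+(:*|\/*)}" # strip the port and/or path from the end
}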

E.g. extractHost "docker://1.2.3.4:1234/a/v/c" will output 1.2.3.4
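As an alternative sketch, the same result can be reached with the plain prefix and suffix removal operators, which do not depend on extended globbing. The function name `extract_host_posix` is made up for this example; IPv6 literals such as `[::1]` are not handled, which is a known limitation of splitting on ":".

```bash
# extract_host_posix is a hypothetical name; it relies only on the
# "#", "##" and "%%" parameter expansion operators.
extract_host_posix() {
    local s="$1"
    s="${s#*://}"       # drop the scheme, if any
    s="${s%%[/?#]*}"    # drop path, query and fragment
    s="${s##*@}"        # drop user:password@, if any
    s="${s%%:*}"        # drop the port, if any
    printf '%s\n' "$s"
}

extract_host_posix "docker://1.2.3.4:1234/a/v/c"
extract_host_posix "http://user:pw@example.com:80/"
```

The calls should print 1.2.3.4 and example.com. Unlike `echo -n`, `printf '%s\n'` appends a newline; drop the `\n` if the bare value is needed.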
