51

How do I extract the domain name from a URL using bash? For example, http://example.com/ should become example.com. It must work for any TLD, not just .com.

jww
Ben Smith

13 Answers

94

You can use a simple AWK one-liner to extract the domain name:

echo http://example.com/index.php | awk -F'[/:]' '{print $4}'

OUTPUT: example.com

:-)

sorin
Soj
  • Nice, this is so much better than the answers provided in https://stackoverflow.com/questions/6174220/parse-url-in-shell-script ! – bk138 Dec 06 '14 at 01:19
  • 9
    `echo http://example.com:3030/index.php | awk -F/ '{print $3}'` `example.com:3030` :-( – Ben Burns Mar 24 '15 at 09:16
  • You could split on `:` again to get it, but it's not flexible enough to accept URLs both with and without a port. – chovy Dec 29 '15 at 03:30
  • | awk -F/ '{print $3}' | awk -F: '{print $1}' – Andrew Mackenzie Mar 15 '16 at 12:34
  • What if I need `http(s)://example.com`? I tried printing `$1$3`, but it gives `http:example.com` (missing the `//` after `http:`). Any idea? – 3AK Jun 16 '16 at 05:14
  • 3
    I got it by using `echo http://www.example.com/somedir/someotherdir/index.html | cut -d'/' -f1,2,3`, which gives `http://www.example.com` – 3AK Jun 16 '16 at 05:44
  • 5
    To handle urls with and without ports: `awk -F[/:] '{print $4}'` – Michael Oct 06 '17 at 14:35
  • @Michael If I also want to remove www but not any other subdomain (e.g., www.example.com -> example.com but home.example.com -> home.example.com)? – d-b Jun 13 '18 at 06:15
  • On MacOS it makes sense to do this: `echo http://example.com/index.php | awk -F/ '{print $3}' | awk -F: '{print $1}'` – derFunk Aug 21 '18 at 15:46
  • If the URL contains `&`, wrap it in quotes when passing it as a parameter. – Vishrant Oct 09 '20 at 01:46
  • This does not work without http or https. For example, `example.com/index.php/test` would return blank. – MaXi32 Jul 31 '21 at 11:06
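The comments above raise three edge cases: a port, credentials, and a missing scheme. Here is a minimal sketch that handles all three with plain shell parameter expansion. The function name `url_host` is made up for illustration, and the sketch assumes the host is not an IPv6 literal such as `[::1]` and that `://` does not appear later in a scheme-less URL.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answers above): print the host part of a URL.
# Assumption: no IPv6 literal in the host position.
url_host() {
    local s=$1
    s=${s#*://}       # drop the scheme, if there is one
    s=${s%%[/?#]*}    # drop the path, query and fragment
    s=${s##*@}        # drop "user:password@", if present
    s=${s%%:*}        # drop the port, if present
    printf '%s\n' "$s"
}
```

Called as `url_host 'http://example.com:3030/index.php'` or `url_host 'example.com/index.php/test'`, it should print `example.com` in both cases.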
29
$ URI="http://user:pw@example.com:80/"
$ echo $URI | sed -e 's/[^/]*\/\/\([^@]*@\)\?\([^:/]*\).*/\2/'
example.com

see http://en.wikipedia.org/wiki/URI_scheme

Flimm
user300653
  • 3
    This works with or without a port and with deep paths, and it still uses bash, although it doesn't work on Mac. – chovy Dec 29 '15 at 03:34
  • 7 years later, this is still my go-to answer. – mwoodman Oct 19 '17 at 17:14
  • 2
    I use your suggestion with a little extra to strip out any subdomains that might be in the url ->> `echo http://www.mail.example.com:3030/index.php | sed -e "s/[^/]*\/\/\([^@]*@\)\?\([^:/]*\).*/\2/" | awk -F. '{print $(NF-1) "." $NF}'` so I basically cut your output at the dot and take the last & second to last column and patch them back with the dot. – sakumatto Nov 01 '17 at 14:33
  • **This is the best answer!** I used this for a ping command that allows full URLs: https://unix.stackexchange.com/a/428990/20661 stripping only the `www.` subdomain – rubo77 Mar 08 '18 at 10:52
  • For those who want to get the port: `sed -e "s/[^/]*\/\/\([^@]*@\)\?\([^:/]*\)\(:\([0-9]\{1,5\}\)\)\?.*/\4/"` – wheeler Apr 26 '18 at 23:38
  • @sakumatto works fine, but how would it be to support "https://example.com.uk" for example? – sanNeck Apr 15 '21 at 17:11
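Several comments ask how to drop only a leading `www.` while keeping every other subdomain. One possible sketch uses the `#` prefix-removal expansion on the host that any of the commands above produce. The name `strip_www` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove a single leading "www." label and nothing else.
strip_www() {
    local host=$1
    printf '%s\n' "${host#www.}"   # "#" strips the shortest matching prefix
}
```

With this, `strip_www www.example.com` should print `example.com`, while `strip_www home.example.com` is left as `home.example.com`.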
28
basename "http://example.com"

Now of course, this won't work with a URI like this: http://www.example.com/index.html but you could do the following:

basename $(dirname "http://www.example.com/index.html")

Or for more complex URIs:

echo "http://www.example.com/somedir/someotherdir/index.html" | cut -d'/' -f3

-d means "delimiter" and -f means "field"; in the above example, the third field delimited by the forward slash '/' is www.example.com.

musashiXXX
  • 5
    I like cut -d'/' -f3 for its simplicity. – Jamie Kitson Mar 14 '12 at 13:40
  • 1
    fails if you add a port: `echo "http://www.example.com:8080/somedir/someotherdir/index.html" | cut -d'/' -f3` – chovy Dec 29 '15 at 03:31
  • Got `http://www.example.com` by running `echo http://www.example.com/somedir/someotherdir/index.html | cut -d'/' -f1,2,3` – 3AK Jun 16 '16 at 05:49
  • `basename $(dirname` does not work if the URL ends with the domain: `basename $(dirname "http://www.example.com/")` will show just `http:` – rubo77 Mar 08 '18 at 10:37
15
echo "$URL" | cut -d'/' -f3 | cut -d':' -f1

Works for URLs:

http://host.example.com
http://host.example.com/hi/there
http://host.example.com:2345/hi/there
http://host.example.com:2345
keyoxy
  • 1
    I found this more useful as it would return the url as it is when it doesn't contain 'http://' i.e. `abc.com` will be retained as `abc.com` – Udayraj Deshmukh Nov 05 '18 at 08:16
  • This is in fact the most intuitive, concise and effective method of all the answers here! – Robert Aug 15 '21 at 14:22
  • This extracts `host.example.com` rather than the domain name (`example.com`) asked for. – Lucas Apr 05 '22 at 19:19
10
sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_'

e.g.

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://example.com'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'https://example.com'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://example.com:1234/some/path'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path#fragment'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path#fragment?params=true'
example.com
Armand
  • Boom! `HOST=$(sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< "$MYURL")` is fine in Bash – 4Z4T4R May 26 '17 at 17:58
  • I would like to crop www from domain. In this case, how should I change the command properly? – Ceylan B. Apr 25 '19 at 08:22
  • Thanks for this, very handy. To capture the path from the URL as well, I extended the pattern slightly with a third group: `sed -E -e 's_.*://([^/@]*@)?([^/:]+)(.*)_\2_' <<< 'http://example.com'` still prints the host, and `sed -E -e 's_.*://([^/@]*@)?([^/:]+)(.*)_\3_' <<< 'http://example.com/path/to/something'` grabs the path. – Max Barrass May 05 '22 at 03:53
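If you would rather not spawn sed at all, the same extended regular expression can be tried with bash's `[[ ... =~ ... ]]` operator. This is only a sketch: the function name `url_host_re` is made up, and like the sed pattern it assumes the URL contains `://`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the sed -E pattern from this answer, applied with bash's =~.
url_host_re() {
    local re='^.*://([^/@]*@)?([^/:]+)'
    if [[ $1 =~ $re ]]; then
        # Group 1 is the optional "user:pass@" part, group 2 is the host.
        printf '%s\n' "${BASH_REMATCH[2]}"
    else
        return 1    # no "://" found: print nothing and report failure
    fi
}
```

For example, `url_host_re 'http://user:pass@example.com:1234/some/path#fragment'` should print `example.com`. Keeping the regex in a variable avoids quoting surprises on the right-hand side of `=~`.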
7
#!/usr/bin/perl -w
use strict;

my $url = $ARGV[0];

if($url =~ /([^:]*:\/\/)?([^\/]+\.[^\/]+)/g) {
  print $2;
}

Usage:

./test.pl 'https://example.com'
example.com

./test.pl 'https://www.example.com/'
www.example.com

./test.pl 'example.org/'
example.org

 ./test.pl 'example.org'
example.org

./test.pl 'example'  -> no output

And if you just want the domain and not the full host + domain use this instead:

#!/usr/bin/perl -w
use strict;

my $url = $ARGV[0];
if($url =~ /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+\.[^\/]+)/g) {
  print $3;
}
Dark Castle
  • Of course the last one doesn't know about "www.example.co.uk" http://search.cpan.org/~nmelnick/Domain-PublicSuffix-0.04/lib/Domain/PublicSuffix.pm – Dennis Williamson Mar 23 '10 at 07:03
  • True, and if there is an API for it obviously I'd go with that anyway. Seems like the complete solution would actually have to know all valid country codes and check to see if the last post-dot region was a country code... – Dark Castle Mar 23 '10 at 13:56
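To illustrate the point made in these comments, here is a rough bash sketch that checks the host against a list of known two-label suffixes before falling back to "last two labels". The function name `registrable_domain` is hypothetical, and the hard-coded list is a tiny sample for illustration only. A real solution would have to consult the full Public Suffix List.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: reduce a host name to "one label + suffix".
# The suffix list is a small illustrative sample, NOT the real Public Suffix List.
registrable_domain() {
    local host=$1 suffix rest
    for suffix in co.uk org.uk ac.uk com.au co.jp ac.th; do
        if [[ $host == *."$suffix" ]]; then
            rest=${host%."$suffix"}               # e.g. "www.example"
            printf '%s\n' "${rest##*.}.$suffix"   # keep only the last label of it
            return
        fi
    done
    case $host in
        *.*.*)  # three or more labels: keep only the last two
            rest=${host%.*.*}
            printf '%s\n' "${host#"$rest".}" ;;
        *)      # already two labels or fewer
            printf '%s\n' "$host" ;;
    esac
}
```

Under these assumptions, `registrable_domain www.example.co.uk` should print `example.co.uk`, and `registrable_domain www.mail.example.com` should print `example.com`.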
6

Instead of using regex to do this you can use Python's urlparse:

URL=http://www.example.com

python3 -c "from urllib.parse import urlparse
url = urlparse('$URL')
print(url.netloc)"

You could either use it like this or put it in a small script. However, this still expects a valid scheme identifier, and judging by your comment, your input doesn't necessarily provide one. You can specify a default scheme, but urlparse expects the netloc to start with '//':

url = urlparse('//www.example.com/index.html','http')

So you will have to prepend those manually, i.e.:

python3 -c "from urllib.parse import urlparse
if '$URL'.find('://') == -1:
    url = urlparse('//$URL', 'http')
else:
    url = urlparse('$URL')
print(url.netloc)"
Garns
4

There is very little info on how you get those URLs; please show more next time (for example, whether the URLs contain parameters). Meanwhile, here is some simple string manipulation for your sample URL.

E.g.:

$ s="http://example.com/index.php"
$ echo ${s%/*}  # get rid of the last "/" onwards
http://example.com
$ s=${s%/*}
$ echo ${s#http://} # get rid of http://
example.com

Other ways, using sed (GNU):

$ echo $s | sed 's/http:\/\///;s|\/.*||'
example.com

Using awk:

$ echo $s| awk '{gsub("http://|/.*","")}1'
example.com
ghostdog74
  • Your method doesn't work! echo http://example.com/index.php | sed -r 's/http:\/\/|\///g' gives output example.comindex.php and NOT example.com on cygwin. please post a method that works – Ben Smith Mar 23 '10 at 03:11
  • 3
    my method doesn't work because your sample url is different !! and you did not provide more info on what type of urls you want to parse !!. you should write your question clearly providing input examples and describe what output you want next time! – ghostdog74 Mar 23 '10 at 03:31
  • 2nd line seems to be incorrect. I copypasted the 2 first lines to my ubuntu shell and got _http://example.com/index.php*_ – jpeltoniemi Jun 25 '12 at 16:58
3

The following will output "example.com":

URI="http://user@example.com/foo/bar/baz/?lala=foo" 
ruby -ruri -e "p URI.parse('$URI').host"

For more info on what you can do with Ruby's URI class you'd have to consult the docs.

Michael Kohl
1

Here's the Node.js way; it works with or without ports and with deep paths:

//get-hostname.js
'use strict';

const url = require('url');
const parts = url.parse(process.argv[2]);

console.log(parts.hostname);

Can be called like:

node get-hostname.js http://foo.example.com:8080/test/1/2/3.html
//foo.example.com

Docs: https://nodejs.org/api/url.html

chovy
1

A solution that covers more cases can be based on sed regexps:

echo http://example.com/index.php | sed -e 's#^https://\|^http://##' -e 's#:.*##' -e 's#/.*##'

That would work for URLs like: http://example.com/index.php, http://example.com:4040/index.php, https://example.com/index.php
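Note that `\|` alternation in a basic regular expression is a GNU extension, so the command above may not behave the same with other sed implementations. A possible variant using extended regexes (`-E`) is sketched below. The wrapper name `url_host_sed` is made up, and the sketch does not handle credentials such as `user:pw@`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the same idea as above, written with extended regexes (-E).
url_host_sed() {
    # 1) strip a leading http:// or https://
    # 2) strip everything from the first ":" or "/" onwards
    printf '%s\n' "$1" | sed -E -e 's#^https?://##' -e 's#[:/].*##'
}
```

For instance, `url_host_sed http://example.com:4040/index.php` should print `example.com`.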

0

With Ruby you can use the Domainatrix library / gem

http://www.pauldix.net/2009/12/parse-domains-from-urls-easily-with-domainatrix.html

require 'rubygems'
require 'domainatrix'
s = 'http://www.champa.kku.ac.th/dir1/dir2/file?option1&option2'
url = Domainatrix.parse(s)
url.domain
=> "kku"

great tool! :-)

Tilo
0

Pure Bash implementation without any sub-shell or sub-process:

shopt -s extglob # the +(...) pattern below needs extended globbing

# Extract host from a URL
#   $1: URL
function extractHost {
    local s="$1"
    s="${s/#*:\/\/}" # Parameter Expansion & Pattern Matching
    echo -n "${s/%+(:*|\/*)}"
}

E.g. extractHost "docker://1.2.3.4:1234/a/v/c" will output 1.2.3.4

vbem