I need to build a regular expression that matches the following pattern:

x.x.x.x

The above represents a host name, where each x is a group of letters and/or digits, and there must be exactly 3 dots.

I tried a few things, but they failed in some cases.

kenorb
Anand
  • You could post what you tried. – M A Oct 29 '14 at 10:01
  • Why would you want to limit the number of dots in a hostname to three? That applies to IP addresses only; hostnames can be in the form of i.dont.like.google.so.i.will.search.using.bing.com, for instance... Consider checking for three-dotted digit-only strings, or any-number-of-dots hostname strings. – Adam Adamaszek Oct 29 '14 at 10:20
  • There are going to be Unicode characters in host names in the future, and host names are not limited to only 4 parts. The host part of a URL can contain an IP address (v4 or v6) or a domain name. – nhahtdh Oct 29 '14 at 10:40

3 Answers

1

You can try with this one:

String regex = "[a-z0-9]+[\\.]{1}[a-z0-9]+[\\.]{1}[a-z0-9]+[\\.]{1}[a-z0-9]+";

Explained:

  • [a-z0-9]+ matches a group of lowercase letters and/or digits with a minimal length of 1.
  • [\\.]{1} matches exactly one . symbol. The {1} denotes a length of 1, but it is redundant: you can use [.] or \\. as well.
  • the above two rules are repeated so that the pattern describes four groups separated by three dots
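As a minimal sketch of how the pattern above behaves, the following compiles it once and checks a few inputs. The class name `FourLabelMatcher`, the method `isFourLabelHost`, and the sample strings are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative assumptions.
final class FourLabelMatcher {
    // The pattern from the answer: four groups of [a-z0-9]+ separated by literal dots.
    private static final Pattern FOUR_LABELS =
            Pattern.compile("[a-z0-9]+\\.[a-z0-9]+\\.[a-z0-9]+\\.[a-z0-9]+");

    // matches() requires the whole input to fit the pattern,
    // so inputs with more or fewer than three dots are rejected.
    static boolean isFourLabelHost(String input) {
        return FOUR_LABELS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isFourLabelHost("a1.b2.c3.d4")); // true
        System.out.println(isFourLabelHost("a.b.c"));       // false: only two dots
        System.out.println(isFourLabelHost("a.b.c.d.e"));   // false: four dots
        System.out.println(isFourLabelHost("A.b.c.d"));     // false: uppercase is not in [a-z0-9]
    }
}
```

Note that this character class is lowercase-only, so mixed-case input would need either `[a-zA-Z0-9]` or the `Pattern.CASE_INSENSITIVE` flag.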

As @Calvin Scherle mentioned, you can shorten the regex to:

String regex = "\\w+\\.\\w+\\.\\w+\\.\\w+";

Explained:

  • \w+ will match every group of word characters (letters, digits, and the underscore) with a minimal length of 1
  • \. will match the . symbol
  • In a Java string literal, \ is the escape character and has to be escaped itself, and thus we'll have \\w+ and \\.
  • the above two rules are repeated so that the pattern describes four groups separated by three dots
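A small sketch of what the `\w` version accepts, assuming Java's default (non-Unicode) definition of `\w` as `[a-zA-Z_0-9]`. It also shows an equivalent compact form that uses a repeated non-capturing group, which is a swapped-in variation rather than something from the answer. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative assumptions.
final class WordLabelMatcher {
    // The shortened pattern from the answer, written out in full.
    private static final Pattern LONG_FORM =
            Pattern.compile("\\w+\\.\\w+\\.\\w+\\.\\w+");

    // Same language, expressed as one label followed by exactly three ".label" repetitions.
    private static final Pattern COMPACT =
            Pattern.compile("\\w+(?:\\.\\w+){3}");

    static boolean matchesLong(String input) {
        return LONG_FORM.matcher(input).matches();
    }

    static boolean matchesCompact(String input) {
        return COMPACT.matcher(input).matches();
    }

    public static void main(String[] args) {
        // The underscore is a word character, so it is accepted.
        System.out.println(matchesLong("my_host.example.co.uk"));    // true
        // The hyphen is not a word character, so this is rejected.
        System.out.println(matchesLong("my-host.example.co.uk"));    // false
        // Both forms agree on these inputs.
        System.out.println(matchesCompact("my_host.example.co.uk")); // true
        System.out.println(matchesCompact("a.b.c"));                 // false
    }
}
```

If hyphens should be allowed, an explicit character class such as `[a-zA-Z0-9-]` is needed in place of `\w`.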
Konstantin Yovkov
  • Why wouldn't just `\w+\.\w+\.\w+\.\w+` work? – Calvin Scherle Oct 29 '14 at 10:06
  • It will, you're right. – Konstantin Yovkov Oct 29 '14 at 10:07
  • Follow up question, why do you escape the dot with two backslashes within the character class, instead of just one? – Calvin Scherle Oct 29 '14 at 10:09
  • Do you know that `[\\.]{1}` is the equivalent of `\\.` or `[.]`? There is no point in escaping the dot twice (with backslash and []) and explicitly using `{1}`. – Pshemo Oct 29 '14 at 10:10
  • @CalvinScherle, because in a Java string, the \ is a metacharacter and has to be escaped. – Konstantin Yovkov Oct 29 '14 at 10:10
  • Ah, so in this case would you need `\\w+\\.\\w+\\.\\w+\\.\\w+` or is it not necessary outside of a character class? I have never used regex within Java before. – Calvin Scherle Oct 29 '14 at 10:11
  • Yes, that's right. Edited the question accordingly. @Pshemo, I knew that, but it's good to point out that `{1}` denotes a *length* – Konstantin Yovkov Oct 29 '14 at 10:13
  • Okay great, thanks for all the clarifications! – Calvin Scherle Oct 29 '14 at 10:13
  • "*but it's good to point that {1} denotes a length*" I am not sure if I agree. If you wanted to write a regex which finds concatenated `foo` and `bar` strings, would you write it as `(foo){1}(bar){1}` or as `foobar`? Being explicit is not always the preferred approach; in this case `[.]` is enough (it is simple and says everything we need). Actually, I don't see a reason to use `{1}` anywhere. Even in a situation where we would like to make it a possessive quantifier, atomic groups would probably be a better solution. – Pshemo Oct 29 '14 at 10:29
  • The symbols '_' and '-' are acceptable in a host name. String pattern = "[a-z0-9_\\-]+\\.[a-z0-9_\\-]+\\.[a-z0-9_\\-]+\\.[a-z0-9]{2,6}"; – Vitalii Pro Oct 29 '14 at 10:34
1

You could test with String.matches("[a-zA-Z0-9]+\\.[a-zA-Z0-9]+\\.[a-zA-Z0-9]+\\.[a-zA-Z0-9]+")

It will match four groups of alphanumeric characters separated by exactly 3 dots.

  • [a-zA-Z0-9] is a character class matching any single alphanumeric character.
  • + means "look for one or more occurrences"
  • \\. means the literal . character
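One detail worth illustrating: `String.matches` only succeeds when the entire string fits the pattern, whereas `Matcher.find` succeeds if any substring fits. The sketch below contrasts the two with the pattern from this answer; the class name, method names, and sample inputs are assumptions made for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative assumptions.
final class MatchesVersusFind {
    private static final String REGEX =
            "[a-zA-Z0-9]+\\.[a-zA-Z0-9]+\\.[a-zA-Z0-9]+\\.[a-zA-Z0-9]+";

    // Whole-string check: the input must consist of exactly four groups and three dots.
    static boolean wholeStringMatches(String input) {
        return input.matches(REGEX);
    }

    // Substring check: true as soon as four dot-separated groups occur anywhere in the input.
    static boolean containsMatch(String input) {
        return Pattern.compile(REGEX).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeStringMatches("www.Example.COM.org")); // true
        System.out.println(wholeStringMatches("a.b.c.d.e"));           // false: four dots
        System.out.println(containsMatch("a.b.c.d.e"));                // true: "a.b.c.d" is found inside
    }
}
```

For the "only 3 dots" requirement in the question, the whole-string check is the one that enforces the limit.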
Mateon1
1

Try this:

String regex = "[a-zA-Z0-9]+\\.[a-zA-Z0-9]+\\.[a-zA-Z0-9]+\\.[a-zA-Z0-9]+";
Kshitij Kulshrestha