238

I need to split a string based on the delimiters - and . (dash and dot). Below is my desired output.

AA.BB-CC-DD.zip ->

AA
BB
CC
DD
zip 

But the following code does not work:

private void getId(String pdfName){
    String[]tokens = pdfName.split("-\\.");
}
weston
Thang Pham
  • Based on what you said, it looks like it is working fine. What is your desired output? – Jeff May 13 '11 at 14:59
  • 3
    @Jeff: He showed his desired output (`AA` / `BB` / `CC` ...) – T.J. Crowder May 13 '11 at 15:02
  • 2
    Are you sure? I interpreted that as his current output, not his desired output. Maybe it's time to stand up and walk around a little bit. – Jeff May 13 '11 at 15:04
  • 1
    @Jeff: Sorry for the confusion, I updated my post to clear up the misunderstanding. – Thang Pham May 13 '11 at 15:05
  • Regex will degrade your performance. I would recommend writing a method that goes character by character and splits the string as needed. You can optimize this further to get log(n) performance. – Princesh Feb 16 '13 at 17:55

14 Answers

352

I think you need to include the regex OR operator:

String[]tokens = pdfName.split("-|\\.");

What you have matches only a dash immediately followed by a dot (-.), not a dash or a dot on its own (- or .).
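A minimal sketch of the difference between the two patterns, using the sample file name from the question. The class and method names (`DashOrDotSplit`, `splitOnDashOrDot`, `splitOnDashThenDot`) are illustrative only and not part of the original post:

```java
import java.util.Arrays;

class DashOrDotSplit {
    // Alternation: splits at every single dash OR single dot.
    static String[] splitOnDashOrDot(String name) {
        return name.split("-|\\.");
    }

    // The pattern from the question: matches only a dash immediately followed by a dot.
    static String[] splitOnDashThenDot(String name) {
        return name.split("-\\.");
    }

    public static void main(String[] args) {
        String pdfName = "AA.BB-CC-DD.zip";
        System.out.println(Arrays.toString(splitOnDashOrDot(pdfName)));   // [AA, BB, CC, DD, zip]
        System.out.println(Arrays.toString(splitOnDashThenDot(pdfName))); // [AA.BB-CC-DD.zip]
    }
}
```

Because the sample name never contains the sequence `-.`, the question's pattern finds no match and `split` returns the whole string as a single element.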

ahmednabil88
Richard H
  • 11
    Why do we require two backslashes? – pjain Feb 21 '16 at 13:16
  • 10
    The `.` character in regex means any character other than new line. http://www.tutorialspoint.com/java/java_regular_expressions.htm In this case, however, they wanted the actual character `.`. The two backslashes indicate that you are referring to `.`. The backslash is an escape character. – Monkeygrinder Feb 21 '16 at 19:25
  • 5
    for normal cases it would be `.split("match1|match2")`, (eg. `split("https|http")`), \\ is to escape the special char `.` in above case – prayagupa Sep 14 '18 at 22:17
  • or generally, you can use `pdfName.split("\\W");` as below @Peter Knego answer – ahmednabil88 Apr 10 '19 at 21:08
  • 4
    use `[-.]` instead of `-|\\.` – Saeed Jul 04 '19 at 06:01
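To illustrate the double-backslash point raised in the comments, here is a small sketch. It assumes nothing beyond the standard library; the class and method names are made up for the example. `Pattern.quote` is shown as one alternative way to treat a delimiter literally:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class EscapingDemo {
    // "\\." in Java source is the two-character string \. which the regex engine reads as a literal dot.
    static String[] splitOnLiteralDot(String s) {
        return s.split("\\.");
    }

    // Pattern.quote wraps the text so that every character in it is matched literally.
    static String[] splitOnQuotedDot(String s) {
        return s.split(Pattern.quote("."));
    }

    // An unescaped dot matches any character, so every character becomes a delimiter.
    static String[] splitOnAnyChar(String s) {
        return s.split(".");
    }

    public static void main(String[] args) {
        System.out.println("\\.".length());                              // 2
        System.out.println(Arrays.toString(splitOnLiteralDot("a.b")));   // [a, b]
        System.out.println(Arrays.toString(splitOnQuotedDot("a.b")));    // [a, b]
        System.out.println(splitOnAnyChar("a.b").length);                // 0
    }
}
```

The last call returns an empty array because every piece between delimiters is an empty string, and `split` discards trailing empty strings.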
66

Try this regex: "[-.]+". The + after the character class treats consecutive delimiter characters as one. Remove the + if you do not want this.
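A short sketch of what the + changes, assuming a hypothetical input with doubled delimiters (the input string and the names `ConsecutiveDelimiters`, `splitEach`, `splitRuns` are invented for illustration):

```java
import java.util.Arrays;

class ConsecutiveDelimiters {
    // Without +, each delimiter character splits on its own, leaving empty strings between neighbours.
    static String[] splitEach(String s) {
        return s.split("[-.]");
    }

    // With +, a run of delimiter characters counts as a single delimiter.
    static String[] splitRuns(String s) {
        return s.split("[-.]+");
    }

    public static void main(String[] args) {
        String input = "AA..BB--CC";
        System.out.println(Arrays.toString(splitEach(input))); // [AA, , BB, , CC]
        System.out.println(Arrays.toString(splitRuns(input))); // [AA, BB, CC]
    }
}
```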

Peter Knego
  • 9
    @Lurkers: The only reason Peter didn't have to escape that `-` was that it's the *first* thing inside the `[]`, otherwise there would need to be a backslash in front of it (and of course, to put a backslash in front of it, we need *two* because this is a string literal). – T.J. Crowder May 13 '11 at 18:32
  • 2
    I think this answer is better than the accepted one, because when you use the logical operator |, the problem is that one of your delimiters can be a part of your result 'tokens'. This will not happen with Peter Knego's [-.]+ – Jack' Jan 03 '18 at 16:05
30

You can use the regex "\W". This matches any non-word character. The required line would be:

String[] tokens=pdfName.split("\\W");
Varun Gangal
  • It doesn't work for me: `String s = "id(INT), name(STRING),"`. Using `\\W` here creates an array of length 6, whereas it should be only 4. – user3527975 Mar 02 '15 at 03:25
  • 2
    This will also break when the input contains Unicode characters. It's best to only include the actual delimiters, instead of a "grab all" with `\W`. – nhahtdh Oct 07 '15 at 07:23
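The two caveats from the comments above can be reproduced with a small sketch. The first input is the one quoted in the comment; the second (`café-bar`, written with a Unicode escape) is an invented example, and the class and method names are illustrative:

```java
import java.util.Arrays;

class NonWordSplit {
    // \W matches one non-word character, so adjacent delimiters leave empty strings behind.
    static String[] splitOnNonWord(String s) {
        return s.split("\\W");
    }

    // \W+ treats a run of non-word characters as a single delimiter.
    static String[] splitOnNonWordRuns(String s) {
        return s.split("\\W+");
    }

    // Listing only the intended delimiters leaves every other character alone.
    static String[] splitOnDashOrDot(String s) {
        return s.split("[-.]");
    }

    public static void main(String[] args) {
        String s = "id(INT), name(STRING),";
        System.out.println(splitOnNonWord(s).length);                   // 6
        System.out.println(Arrays.toString(splitOnNonWordRuns(s)));     // [id, INT, name, STRING]

        // By default \w covers only [a-zA-Z_0-9], so the accented letter counts as a delimiter.
        String accented = "caf\u00e9-bar";
        System.out.println(splitOnNonWord(accented).length);            // 3
        System.out.println(splitOnDashOrDot(accented).length);          // 2
    }
}
```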
15

Using Guava you could do this:

Iterable<String> tokens = Splitter.on(CharMatcher.anyOf("-.")).split(pdfName);
ColinD
14

The string you give split is the string form of a regular expression, so:

private void getId(String pdfName){
    String[]tokens = pdfName.split("[\\-.]");
}

That means to split on any character in the [] (we have to escape - with a backslash because it's special inside []; and of course we have to escape the backslash because this is a string). (Conversely, . is normally special but isn't special inside [].)
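As a rough illustration of why the position of the hyphen inside [] matters, the sketch below compares a range with two ways of writing a literal hyphen. The input string and the names (`HyphenInClass`, `splitOnRange`, and so on) are invented for the example:

```java
import java.util.Arrays;

class HyphenInClass {
    // Between two characters, the hyphen forms a range: a, b, or c.
    static String[] splitOnRange(String s) {
        return s.split("[a-c]");
    }

    // Escaped, the hyphen is literal: a, '-', or c.
    static String[] splitOnEscapedHyphen(String s) {
        return s.split("[a\\-c]");
    }

    // Placed first in the class, the hyphen is also literal and needs no escape.
    static String[] splitOnLeadingHyphen(String s) {
        return s.split("[-ac]");
    }

    public static void main(String[] args) {
        String input = "1a2b3-4";
        System.out.println(Arrays.toString(splitOnRange(input)));         // [1, 2, 3-4]
        System.out.println(Arrays.toString(splitOnEscapedHyphen(input))); // [1, 2b3, 4]
        System.out.println(Arrays.toString(splitOnLeadingHyphen(input))); // [1, 2b3, 4]
    }
}
```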

T.J. Crowder
  • You don't need to escape the hyphen in this case, because `[-.]` couldn't possibly be interpreted as a range. – Alan Moore May 13 '11 at 15:40
  • 1
    @Alan: Because it's the very first thing in the class, that's quite true. But I always do, it's too easy to go back later and add something in front of it without thinking. Escaping it costs nothing, so... – T.J. Crowder May 13 '11 at 18:31
  • do you know how to escape the brackets? I have String "[200] Engineering" that I want to split into "200" , "Engineering" – scottysseus Jul 30 '13 at 21:03
  • 3
    Oh wow I got it...I had to use two backslashes instead of one. `String[] strings = codes.get(x).split("\\[|\\]| ");` – scottysseus Jul 30 '13 at 21:05
7

For two character sequences as delimiters, "AND" and "OR", this should work. The \b word boundaries keep the pattern from matching inside words (for example, the OR in YORK). Don't forget to trim the results.

 String text ="ISTANBUL AND NEW YORK AND PARIS OR TOKYO AND MOSCOW";
 String[] cities = text.split("\\b(?:AND|OR)\\b"); 

Result : cities = {"ISTANBUL ", " NEW YORK ", " PARIS ", " TOKYO ", " MOSCOW"}
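A hedged sketch of a variant that also consumes the whitespace around each keyword, so no separate trim is needed. It assumes the keywords are always surrounded by whitespace; the class and method names are illustrative. For comparison, it also shows that a bare `AND|OR` pattern matches inside words, such as the `OR` in `YORK`:

```java
import java.util.Arrays;

class WordDelimiterSplit {
    // Requires whitespace on both sides of the keyword and removes it with the keyword.
    static String[] splitOnKeywords(String text) {
        return text.split("\\s+(?:AND|OR)\\s+");
    }

    // Bare alternation: also matches the letters "OR" or "AND" inside longer words.
    static String[] splitNaively(String text) {
        return text.split("AND|OR");
    }

    public static void main(String[] args) {
        String text = "ISTANBUL AND NEW YORK AND PARIS OR TOKYO AND MOSCOW";
        System.out.println(Arrays.toString(splitOnKeywords(text))); // [ISTANBUL, NEW YORK, PARIS, TOKYO, MOSCOW]
        System.out.println(splitNaively(text).length);              // 6, because YORK is cut in two
    }
}
```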

ÖMER TAŞCI
4

I'd use Apache Commons:

import org.apache.commons.lang3.StringUtils;

private void getId(String pdfName){
    String[] tokens = StringUtils.split(pdfName, "-.");
}

It'll split on any of the specified separator characters, as opposed to StringUtils.splitByWholeSeparator(str, separator), which uses the complete string as a single separator.

Edd
4
String[] token=s.split("[.-]");
TylerH
Nitish
  • 12
    Please help fighting the misunderstanding that StackOverflow is a free code-writing service, by augmenting your code-only answer with some explanation. – Yunnosch Jun 25 '19 at 17:48
4

pdfName.split("[.-]+");

  • [.-] -> any one of the . or - can be used as delimiter

  • + sign signifies that if the aforementioned delimiters occur consecutively we should treat it as one.

Trying
2

It's better to use something like this:

s.split("[\\s\\-\\.\\'\\?\\,\\_\\@]+");

I have added a few other characters as a sample. This is the safest way to write it, because of the way . and ' are treated.

Pritam Banerjee
1

You can also pass a regular expression as the argument to the split() method; see the example below.

private void getId(String pdfName){
String[]tokens = pdfName.split("-|\\.");
}
bummi
1

Try this code (note that this is JavaScript, not Java):

var string = 'AA.BB-CC-DD.zip';
// Split on either a dash or a dot.
var array = string.split(/[-.]/);
Cody Gray
Reaper
  • 2
    Please help fighting the misunderstanding that StackOverflow is a free code-writing service, by augmenting your code-only answer with some explanation. – Yunnosch Jun 25 '19 at 17:49
0
s.trim().split("[\\W]+") 

should work.

pleft
sss
  • 2
    First, no, it does not work - maybe you can try it before posting? Then [this answer](https://stackoverflow.com/questions/5993779/use-string-split-with-multiple-delimiters#answer-13928086) is the same as yours - but working. Finally you should check your formatting (_should work._). – Arount Oct 11 '17 at 23:19
  • 2
    Please help fighting the misunderstanding that StackOverflow is a free code-writing service, by augmenting your code-only answer with some explanation. – Yunnosch Jun 25 '19 at 17:49
-1

If you know the string will always be in the same format, first split it at the first . and store the string at index 0 in a variable. Then split the string at index 1 on - and store indexes 0, 1 and 2. Finally, split index 2 of that array on . and you will have obtained all of the relevant fields.

Refer to the following snippet:

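// "." is a regex metacharacter, so it must be escaped.
// The limit of 2 splits only at the first dot.
String[] tmp = pdfName.split("\\.", 2);
String val1 = tmp[0];
tmp = tmp[1].split("-");
String val2 = tmp[0];
...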
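One possible way to complete the steps described above, as a sketch only. It assumes the name always has exactly the shape `AA.BB-CC-DD.zip` and does no validation; the class name `FixedFormatSplit` and the method `parse` are hypothetical:

```java
import java.util.Arrays;

class FixedFormatSplit {
    // Assumes the input always looks like "AA.BB-CC-DD.zip"; other shapes may throw
    // ArrayIndexOutOfBoundsException.
    static String[] parse(String pdfName) {
        String[] head = pdfName.split("\\.", 2);  // ["AA", "BB-CC-DD.zip"]
        String[] middle = head[1].split("-");     // ["BB", "CC", "DD.zip"]
        String[] tail = middle[2].split("\\.");   // ["DD", "zip"]
        return new String[] { head[0], middle[0], middle[1], tail[0], tail[1] };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("AA.BB-CC-DD.zip"))); // [AA, BB, CC, DD, zip]
    }
}
```

Compared with a single `split("[-.]")`, this spells out the expected structure step by step, at the cost of failing on any name that deviates from it.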
UrsinusTheStrong
isometrik