I'm trying to build a regular expression that splits a paragraph into sentences separated by a period (.). This should work:

String str[] = text.split("\\.");

However, I need a minimum of robustness, for example checking that the period is followed by a space and an uppercase letter. So here's my next attempt:

String text="The pen is on the table. The table has a pen upon it.";
String arr[] = text.split("\\. [A-Z]");

for (String s: arr)
    System.out.println(s);

Output:
The pen is on the table
he table has a pen upon it.

Unfortunately, the first character after the period is lost, because it is part of the matched delimiter. Is there a way to fix this?

Zabuzard
Francesco Marchioni

1 Answer

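You can use a lookahead, which checks what comes next in the string without consuming it. The uppercase letter is then not part of the delimiter, so it stays in the result.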

String[] arr = text.split("\\. (?=[A-Z])");
// arr: { "The pen is on the table", "The table has a pen upon it." }

If you want to keep the periods as well, you can also use a lookbehind:

String[] arr = text.split("(?<=\\.) (?=[A-Z])");
// arr: { "The pen is on the table.", "The table has a pen upon it." }
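
For reference, here is a minimal runnable sketch that puts both variants side by side. The class name `SentenceSplitDemo` and the two helper method names are made up for illustration; only `String.split` and the two patterns come from the answer itself.

```java
import java.util.Arrays;

public class SentenceSplitDemo {

    // Hypothetical helper: the delimiter is ". " and the lookahead only
    // checks for an uppercase letter, so that letter is not consumed.
    static String[] splitDroppingPeriod(String text) {
        return text.split("\\. (?=[A-Z])");
    }

    // Hypothetical helper: the delimiter is the space alone. The lookbehind
    // requires a period before it, the lookahead an uppercase letter after it.
    static String[] splitKeepingPeriod(String text) {
        return text.split("(?<=\\.) (?=[A-Z])");
    }

    public static void main(String[] args) {
        String text = "The pen is on the table. The table has a pen upon it.";

        // prints [The pen is on the table, The table has a pen upon it.]
        System.out.println(Arrays.toString(splitDroppingPeriod(text)));

        // prints [The pen is on the table., The table has a pen upon it.]
        System.out.println(Arrays.toString(splitKeepingPeriod(text)));
    }
}
```

Note that this is still a heuristic: an abbreviation followed by a capitalized word (for example "Dr. Smith") would also be split.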
khelwood