0

I'm trying to split a paragraph into sentences. At the moment I'm splitting on a period (.), which works fine, but I can't get it to split correctly when a sentence may end in any of ., ? or !

So far my code is:

String[] sentences = everything.split("(?<=[a-z])\\.\\s+");

Thanks

dave
magna_nz

2 Answers

2

If you don't want to remove ., ! or ? from the results, move the punctuation into the lookbehind so that only the whitespace is consumed by the split:

    String[] sentences = everything.split("(?<=[a-z][!?.])\\s+"); 
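To see the effect of the lookbehind, here is a minimal sketch that wraps the same pattern in a small runnable class. The class name `KeepPunctuationSplit`, the helper `splitSentences`, and the sample text are made up for illustration; only the regex comes from the answer.

```java
import java.util.Arrays;

public class KeepPunctuationSplit {

    // Splits on whitespace that is preceded by a lowercase letter followed by '.', '!' or '?'.
    // A lookbehind matches without consuming, so the punctuation stays attached to its sentence.
    static String[] splitSentences(String text) {
        return text.split("(?<=[a-z][!?.])\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not from the question.
        String everything = "It works. Does it work? Yes it does!";
        // prints [It works., Does it work?, Yes it does!]
        System.out.println(Arrays.toString(splitSentences(everything)));
    }
}
```

Note that `[a-z]` only accepts a lowercase letter before the punctuation, so a sentence ending in a digit or an uppercase letter (for example "It costs 5.") would not be split at that point. Widening the class, for instance to `[a-zA-Z0-9]`, is one possible adjustment if that matters for your text.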

0

Use a character class, and you don't need the look behind - use a word boundary instead:

String[] sentences = everything.split("\\b[.!?]\\s+");

"[.!?]" means "either ., ! or ?". The word boundary \b requires that a word character precede the end-of-sentence character. Because the punctuation is part of the match here, it is removed from the resulting strings.
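As a quick check of how the word-boundary version behaves, here is a small sketch under the same assumptions as the question (a `String` holding the paragraph). The class name `DropPunctuationSplit`, the helper `splitSentences`, and the sample text are hypothetical.

```java
import java.util.Arrays;

public class DropPunctuationSplit {

    // Splits on '.', '!' or '?' that directly follows a word character and is followed by whitespace.
    // The punctuation is part of the delimiter, so it does not appear in the results.
    static String[] splitSentences(String text) {
        return text.split("\\b[.!?]\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not from the question.
        String everything = "It works. Does it work? Yes it does!";
        // prints [It works, Does it work, Yes it does!]
        System.out.println(Arrays.toString(splitSentences(everything)));
    }
}
```

The last sentence keeps its "!" because the pattern requires whitespace after the punctuation, and there is none at the end of the string.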

Bohemian