I have a text that I split into sentences and words. Next I need to split it into punctuation tokens (, . ? ! ...), and this is where I'm stuck. Which regex should I use?
This is my code, which splits the text into sentences and words:
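```java
import java.util.Arrays;

// ReadFromFile() is my own helper that returns the whole file as a String
String s = ReadFromFile();
// Sentences: split on . ! or ? followed by any whitespace
String[] sentences = s.split("[.!?]\\s*");
String[][] words = new String[sentences.length][];
for (int i = 0; i < sentences.length; ++i)
{
    // Words: split each sentence on runs of punctuation or whitespace
    words[i] = sentences[i].split("[\\p{Punct}\\s]+");
}
System.out.println(Arrays.deepToString(words));
```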
So I have a separate array of sentences and an array of words, but I can't work out how to get the punctuation tokens.
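Since the code already relies on split(), one quick option is to invert the character class and split on runs of non-punctuation characters, so that only the punctuation is left between the "delimiters". This is a minimal sketch that assumes the whole text is already in a String; the class and method names are illustrative, and the sample text is hard-coded in place of ReadFromFile().

```java
import java.util.Arrays;

public class PunctuationSplit {
    // Splits on runs of NON-punctuation characters, so whatever remains
    // between the matches is the punctuation itself.
    public static String[] punctuationBySplit(String text) {
        String[] parts = text.split("[^\\p{Punct}]+");
        // When the text starts with a non-punctuation character, the first
        // match is at index 0 and split() returns an empty leading element.
        if (parts.length > 0 && parts[0].isEmpty()) {
            return Arrays.copyOfRange(parts, 1, parts.length);
        }
        return parts;
    }

    public static void main(String[] args) {
        String s = "Arithmetic operators are used in mathematical expressions in the same way "
                + "that they are used in algebra. The following table lists the arithmetic "
                + "operators: Assume integer variable A holds 10 and variable B holds 20, then:";
        System.out.println(String.join(" ", punctuationBySplit(s)));
    }
}
```

For the sample text this prints . : , : but note the caveat of this approach: adjacent marks such as ?! or ... come back as a single token, because they are never separated by a non-punctuation run.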
Input data
Arithmetic operators are used in mathematical expressions in the same way that they are used in algebra. The following table lists the arithmetic operators: Assume integer variable A holds 10 and variable B holds 20, then:
Expected result
. : , :
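A more direct approach is to match the punctuation instead of splitting around it: compile \p{Punct} with Pattern and collect every Matcher.find() hit, so each mark becomes its own token. The sketch runs on the whole text s rather than on the sentences array, because the sentence split has already consumed the . ! ? terminators. The names are illustrative; also note that by default \p{Punct} matches ASCII punctuation only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PunctuationTokens {
    // Matches exactly one punctuation character, so "?!" yields two tokens.
    private static final Pattern PUNCT = Pattern.compile("\\p{Punct}");

    public static List<String> punctuationTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = PUNCT.matcher(text);
        // Walk the text left to right, collecting every match in order
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "Arithmetic operators are used in mathematical expressions in the same way "
                + "that they are used in algebra. The following table lists the arithmetic "
                + "operators: Assume integer variable A holds 10 and variable B holds 20, then:";
        System.out.println(String.join(" ", punctuationTokens(s)));
    }
}
```

For the input above this prints . : , : which matches the expected result. If "..." should be a single token rather than three, one option is to put it first in an alternation, e.g. Pattern.compile("\\.\\.\\.|\\p{Punct}"), since the alternatives are tried left to right at each position.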