12

I have been using

explode(".",$mystring)

to split a paragraph into sentences. However this doen't cover sentences that have been concluded with different punctuation such as ! ? : ;

Is there a way of using an array as a delimiter instead of a single character? Alternatively, is there another neat way of splitting on various punctuation marks?

I tried

explode(("." || "?" || "!"),$mystring)

hopefully but it didn't work...

hippietrail
Chris Headleand
  • Use a regular expression to match the pattern, store the value in a variable, and pass that variable as a parameter to explode – sree May 08 '12 at 07:11
  • Take a look at http://stackoverflow.com/questions/5032210/php-sentence-boundaries-detection – Boby May 08 '12 at 07:13

8 Answers

20

You can use preg_split() combined with a PCRE lookbehind assertion to split the string after each occurrence of ., ;, :, ?, ! and so on, while keeping the actual punctuation intact:

Code:

$subject = 'abc sdfs.    def ghi; this is an.email@addre.ss! asdasdasd? abc xyz';
// split on whitespace between sentences preceded by a punctuation mark
$result = preg_split('/(?<=[.?!;:])\s+/', $subject, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);

Result:

Array
(
    [0] => abc sdfs.
    [1] => def ghi;
    [2] => this is an.email@addre.ss!
    [3] => asdasdasd?
    [4] => abc xyz
)

You can also add a blacklist of abbreviations (Mr., Mrs., Dr., ...) that should not be treated as the end of a sentence by inserting a negative lookbehind assertion:

$subject = 'abc sdfs.   Dr. Foo said he is not a sentence; asdasdasd? abc xyz';
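// split on whitespace that follows a punctuation mark, unless the mark belongs to a listed abbreviation
// (the dots are escaped so they match a literal "." rather than any character)
$result = preg_split('/(?<!Mr\.|Mrs\.|Dr\.)(?<=[.?!;:])\s+/', $subject, -1, PREG_SPLIT_NO_EMPTY);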
print_r($result);

Result:

Array
(
    [0] => abc sdfs.
    [1] => Dr. Foo said he is not a sentence;
    [2] => asdasdasd?
    [3] => abc xyz
)
Kaii
  • This helped me a lot. How about if sentence ends with number, such as: "This is my test 40. But also new one." – FosAvance Apr 07 '21 at 16:42
6

You can do:

preg_split('/\.|\?|!/',$mystring);

or (simpler):

preg_split('/[.?!]/',$mystring);
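
To address the "array as a delimiter" part of the question more directly, the character class can be built from an array. This is only a sketch: `split_on_any()` is a hypothetical helper name (not a PHP built-in), and it assumes a non-empty array of single-character delimiters.

```php
<?php
// Hypothetical helper: split $text on any of the single-character delimiters.
// Assumes $delimiters is a non-empty array of one-character strings.
function split_on_any(array $delimiters, string $text): array
{
    // Escape each delimiter so it is taken literally inside the character class.
    $escaped = array_map(function ($d) {
        return preg_quote($d, '/');
    }, $delimiters);
    $pattern = '/[' . implode('', $escaped) . ']/';

    // Split, trim whitespace around each piece, and drop pieces that end up empty.
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
    $parts = array_filter(array_map('trim', $parts), function ($s) {
        return $s !== '';
    });
    return array_values($parts);
}

print_r(split_on_any(['.', '?', '!'], 'Hello there. How are you? Fine!'));
```

As with the patterns above, the punctuation itself is discarded from the result.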
rdlowrey
codaddict
2

Assuming that you actually want to keep the punctuation marks in the end result, have you tried:

 $mystring = str_replace("?","?---",str_replace(".",".---",str_replace("!","!---",$mystring)));
 $tmp = explode("---",$mystring);

This would leave your punctuation marks intact.
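
The same idea can be written with a single str_replace() call, since it accepts arrays for both the search and the replacement values. A minimal sketch, assuming the text never contains a NUL byte, which is used here as the marker instead of "---" so that dashes already present in the text cannot cause a false split:

```php
<?php
$mystring = 'Is it done? Yes. Great!';

// Assumption: "\x00" (NUL byte) never occurs in the text, so it is a safe marker.
$marker = "\x00";
$marked = str_replace(
    ['.', '?', '!'],
    ['.' . $marker, '?' . $marker, '!' . $marker],
    $mystring
);

// Split on the marker, trim whitespace, and drop the empty trailing piece.
$sentences = array_values(array_filter(array_map('trim', explode($marker, $marked)), 'strlen'));
print_r($sentences);
```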

1
preg_split('/\s+|[.?!]/',$string);

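One possible problem is an email address: it contains dots, so it would be split partway through. Note also that the \s+ alternative makes this pattern split on every run of whitespace, so it yields words rather than whole sentences.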

Darren Burgess
0

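Use preg_split and give it a regex like /[.?!]/ to split on. Inside a character class the characters need no escaping, and a | would be matched as a literal pipe rather than acting as "or".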

Ansari
0
$mylist = preg_split("/[.?!:;]/", $mystring);
Kaii
Michael Manoochehri
0

You can't have multiple delimiters for explode. That's what preg_split() is for. But even then, it splits at the delimiter, so the sentences are returned without their punctuation marks. You can take preg_split a step further and flag it with PREG_SPLIT_DELIM_CAPTURE so that the delimiters are returned in their own elements (this requires a capturing group in the pattern), then loop over the returned array to join each sentence with the punctuation mark that follows it. Or just use preg_match_all():

preg_match_all('~.*?[?.!]~s', $string, $sentences);
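
For comparison, the PREG_SPLIT_DELIM_CAPTURE route could look roughly like the sketch below. The sample string and variable names are illustrative only. One difference worth knowing: the preg_match_all() pattern above only matches text that ends in ., ? or !, so trailing text without closing punctuation is dropped, whereas this loop keeps it.

```php
<?php
$string = 'First one. Second one? Third one! trailing text';

// The capturing group makes preg_split return the punctuation as separate elements.
$parts = preg_split('/([.?!]+)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

// Re-attach each punctuation element to the text element before it.
$sentences = [];
foreach ($parts as $part) {
    if (preg_match('/^[.?!]+$/', $part) && $sentences) {
        $sentences[count($sentences) - 1] .= $part;
    } else {
        $sentences[] = trim($part);
    }
}
print_r($sentences);
```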
jankal
phpzag
0

You can try preg_split:

$sentences = preg_split("/[.?!:;]+/", $mystring);

Please note that this removes the punctuation. If you would also like to strip the whitespace that follows each punctuation mark:

$sentences = preg_split("/[.?!:;]+\s*/", $mystring);
Kaii
DJ Tarazona