I'm working on a PHP project (using CodeIgniter) on text summarization, and for that I need to extract sentences from the content of a rich text box (the content includes HTML tags). Is there a proper method or CodeIgniter library for extracting sentences from content that contains HTML tags?

Subhashi

2 Answers

This technique is called web scraping.

Have a look at this

Nullpointer

The PHP function strip_tags() should help you. It returns the string with HTML and PHP tags removed. If you just need to count sentences, you could use count(explode(". ", $text)), where the delimiter ". " (a period followed by a space) is a typical end of a sentence.

Plain, simple, and limited, but it doesn't require any libraries.
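
As a rough illustration of the approach above, here is a minimal sketch. The function name extractSentences() and the sample HTML are made up for this example. It also assumes that padding each tag with a space is acceptable, since otherwise strip_tags() would join "One.</p><p>Two." into "One.Two." and the ". " delimiter would never match.

```php
<?php
// Hypothetical helper (not part of PHP or CodeIgniter): strips tags from
// rich-text HTML and splits the remaining text on ". ".
function extractSentences(string $html): array
{
    // Pad every tag with a leading space so that adjacent block elements
    // do not run together once the tags are removed (an assumption of
    // this sketch; it can also add spaces inside words with inline tags).
    $spaced = str_replace('<', ' <', $html);

    // Remove HTML and PHP tags.
    $text = strip_tags($spaced);

    // Collapse runs of whitespace into single spaces and trim the ends.
    $text = trim(preg_replace('/\s+/', ' ', $text));

    if ($text === '') {
        return [];
    }

    // Split on a period followed by a space, as suggested in the answer.
    return explode('. ', $text);
}

$html = '<p>PHP is popular. It powers many sites.</p><p>Version 5.5 is out.</p>';
$sentences = extractSentences($html);
```

Note that explode() drops the delimiter, so every sentence except the last loses its trailing period, and "!" or "?" are not treated as sentence ends.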

RobF
Tom
  • Thanks. explode(". ", $text) can be used. It also needs a small modification to check whether a "." marks the end of a sentence or something else, such as the "." in a fractional number. – Subhashi Jan 24 '14 at 13:54
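
One possible way to make the check mentioned in the comment, sketched here under assumptions: the function name splitSentences() is hypothetical, and the rule "a sentence ends at '.', '!' or '?' followed by whitespace" is a simplification chosen for this example. A "." inside a number such as 3.14 is not followed by whitespace, so it is left alone. Abbreviations such as "Dr. Smith" would still be split incorrectly.

```php
<?php
// Hypothetical helper: splits plain text into sentences with a regular
// expression instead of explode().
function splitSentences(string $text): array
{
    // Split on whitespace that directly follows ".", "!" or "?".
    // The lookbehind keeps the punctuation attached to its sentence.
    // PREG_SPLIT_NO_EMPTY discards empty pieces.
    return preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

$parts = splitSentences('Pi is about 3.14. Is that exact? No.');
```

The input is expected to be plain text, for example the result of strip_tags().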