I am trying to build a platform that can identify quality content among the many articles found on the internet about a particular "topic". For example, the algorithm should be able to recommend the top 10 articles about that topic. What algorithms and methods can be used to evaluate writing quality and produce a score, alongside other parameters such as "source credibility"? I have already taken care of crawling and collecting the content; I am looking for algorithms and methods to evaluate it. Please suggest resources (courses, research papers, etc.) on this.

Payas Pandey
  • Are you trying to synthesize a new article from the corpus, or ranking the articles in it? – Emre Jun 07 '17 at 08:07
  • So you want to retrieve semantically related documents for a given topic, right? – Abhishek Verma Jun 07 '17 at 11:39
  • @Emre, Priority is to rank the articles among the corpus. Then, if it is possible I would like to synthesize a new article which can work as a summary of say top 10 articles for a topic/query. – Payas Pandey Jun 09 '17 at 03:51
  • @AbhishekVerma, I want to "predict", say, the top 10 articles for a topic on the basis of their content quality (how well each is written, whether it is off topic, how it is structured, and other language-related "features" that make an article a good read). I know features like "source credibility" and "newness of the article" will matter, but I have sorted those out myself. – Payas Pandey Jun 09 '17 at 03:58
  • This problem is called https://en.wikipedia.org/wiki/Multi-document_summarization – Emre Jun 09 '17 at 04:03
  • @Emre, does it cover "rating" the articles on the basis of their content? If not, please help me out, because that is my priority over summarization. – Payas Pandey Jun 09 '17 at 04:35
  • @PayasPandey, can you relate your problem to this answer? – Abhishek Verma Jun 09 '17 at 04:38
  • No, that's what you do after you've ranked the articles. A simple algorithm would be to search Google and take everything on the first few pages, filtering out irrelevant results. – Emre Jun 09 '17 at 04:44

1 Answer

"Source credibility" of Internet articles is probably best estimated with the PageRank algorithm, which scores a page by the number and importance of the pages that link to it.

Determining writing quality algorithmically might be intractable. However, PageRank could serve as a proxy: if many other pages link to an article, it is effectively treated as an authority on the topic and can be assumed to be well written (or at least very useful).
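To make the idea concrete, here is a minimal sketch of PageRank by power iteration in plain Python. It assumes your crawler can give you, for each article, the list of articles it links to. The `links` graph, the example URLs, the damping factor of 0.85, and the fixed 50 iterations are all illustrative assumptions, not values taken from your data.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration.

    links: dict mapping each page to the list of pages it links to
           (hypothetical structure; adapt it to your crawler's output).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph between crawled articles (made-up URLs).
links = {
    "a.example/intro": ["c.example/guide"],
    "b.example/post": ["c.example/guide", "a.example/intro"],
    "c.example/guide": ["a.example/intro"],
    "d.example/blog": ["c.example/guide"],
}

ranks = pagerank(links)
top = sorted(ranks, key=ranks.get, reverse=True)
for url in top:
    print(url, round(ranks[url], 4))
```

In this toy graph the article that receives the most incoming links ends up first. For a real crawl you would probably restrict the graph to pages about the topic and use a library implementation rather than this sketch.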

Brian Spiering