Questions tagged [text-mining]

Refers to a subset of data mining concerned with extracting information from text by recognizing patterns. The goal of text mining is often to classify a given document automatically into one of several categories and to improve that classification over time, which makes it an example of machine learning. Spam filters for email are one example of this type of text mining.
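
The document classification task described above can be illustrated with a tiny spam filter. This is a minimal sketch, assuming a multinomial naive Bayes model with add-one smoothing; the class name, the tokenizer, and the four training messages are hypothetical and chosen only for illustration, and a real filter would need far more training data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token counts
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocabulary = set()

    def train(self, text, label):
        """Update the counts with one labeled document (can be called repeatedly)."""
        tokens = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocabulary.update(tokens)

    def predict(self, text):
        """Return the label with the highest log-posterior score."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior: share of training documents carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_tokens = sum(counts.values())
            for token in tokens:
                # Smoothed log likelihood of the token under this label.
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training messages, for illustration only.
classifier = NaiveBayesTextClassifier()
classifier.train("win a free prize now", "spam")
classifier.train("cheap pills free offer", "spam")
classifier.train("meeting agenda for monday", "ham")
classifier.train("please review the attached report", "ham")

print(classifier.predict("free prize offer"))       # spam
print(classifier.predict("monday report meeting"))  # ham
```

Because `train` only updates counts, newly labeled messages can be added at any time, which is one simple way to improve the classifier as more mail arrives.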

Text mining is the process of deriving high-quality information from unstructured text. Possible applications of text mining include:

  • Comments in survey responses
  • Customer messages, emails, complaints, etc.
  • Investigating competitors by crawling their websites


573 questions
6
votes
3 answers

Fraud detection using text mining

I would like to find different pattern recognition algorithms to detect different types of fraud. I have 1 million unstructured text documents about clients' information, with metadata about the client name, viewers, and location in the cloud. Here…
rockmerockme
  • 69
  • 1
  • 2
5
votes
2 answers

Are there any annotators or Named Entity Recognition tools for license plate numbers?

Most vehicle license/number plate extractors I've found involve reading a plate from an image (OCR) but I'm interested in something that could tag instances of license plates in a body of text. Are there any such annotators out there?
howdyjessie
  • 123
  • 5
3
votes
2 answers

Sentiment Analysis of comments to understand support on a topic

I have downloaded comments from a website which asked people whether they supported or opposed the implementation of a certain political policy related to immigration. I would like to get any resources or ideas on how to extract aggregate…
ved
  • 131
  • 2
3
votes
3 answers

Regular Expressions in Word

I've used optical character recognition (OCR) on a historic directory, and am trying to clean up the text with Microsoft Word. Specifically, I need some help writing a Regular Expression to combine two lines together. For example something that…
user10322
  • 39
  • 1
3
votes
1 answer

How to approach automated text writing?

What are the tools, practices and algorithms used in automated text writing? For example, let's assume that I have access to the APIs of Wikipedia, Wikinews and similar websites, and I would like to produce an article about "Data Science with Python". I believe…
Damian Melniczuk
  • 619
  • 4
  • 19
3
votes
1 answer

Reducing a page of content to a short paragraph

I remember years ago, Yahoo detailed how they were able to reduce a webpage down to a short paragraph of text succinctly summarising the content in sentences, as opposed to a list of keywords. What is this called? Is there any open / free code to do…
user3791372
  • 398
  • 2
  • 14
2
votes
2 answers

Finding useful noun 2-grams?

Q: How can I find noun 2-grams in the English language (e.g., "roller coaster", "test tube")? Better yet, how can I find them with proportions? Ultimate goal: Generate distinct single images for each English letter-pair (e.g., "RC" -> "roller…
lowndrul
  • 121
  • 3
1
vote
1 answer

SUMMARIST: Automated Text Summarization

There is a text summarization project called SUMMARIST. Apparently it is able to perform abstractive text summarization. I want to give it a try but unfortunately the demo links on the website do not work. Does anybody have any information regarding…
Pasmod Turing
  • 463
  • 2
  • 6
1
vote
1 answer

String similarity algorithms for string containment (rather than string equality)

I saw that there are a lot of string similarity algorithms to determine whether two strings are the same. I have a slightly different problem - I get two strings "a" and "b" and I need a similarity algorithm to determine whether "b" contains "a" ("b" is likely to…
1
vote
1 answer

Commercial Text Summarization Tools

I'm looking for commercial text summarization tools (APIs, libraries, ...) which are able to perform any of the following tasks: Extractive Multi-Document Summarization (generic or query-based); Extractive Single-Document Summarization (generic or…
Pasmod Turing
  • 463
  • 2
  • 6
1
vote
1 answer

Guidelines for vocabulary sizes for BoW

I am currently trying to build a vocabulary for BoW-vector generation out of a set of 200k scientific abstracts. I already do some basic filtering of tokens, like lowercasing, stop-word removal, stemming, not taking tokens with size < 2, leaving…
Wolfone
  • 113
  • 4
1
vote
0 answers

How to do an automated SWOT analysis with Text Mining

Does somebody have an idea or approach for doing text mining for a SWOT analysis (e.g. sentiment analysis)? I need to assign categories (Strengths, Weaknesses, Opportunities, Risks) to words in a document and then rank them.
1
vote
2 answers

Extracting popular keyword terms or topics from blog posts based on post usage

I have a dataset of posts from a blog, and for each post I have the number of views. I want to extract the topics (or phrases) that characterize the posts with more views. I am planning to divide all posts into two sets based on number of views (one set with low…
0
votes
1 answer

I want to extract the name from a CV

I want to extract the name from a CV. I need a high level of accuracy, more than 95%. I have started with the assumption that the name is highly likely to be found in the top 10% of lines or, if not there, in some section similar to Personal details. Can you please …
-3
votes
1 answer

German Gunning fog index function

I would like to analyse some text, and most of my reviews are in German. Does anyone know if Python has a good Gunning fog index function for the German language? I couldn't find anything.
Nika
  • 29
  • 2