Questions tagged [text-mining]

Refers to a subset of data mining concerned with extracting information from data in the form of text by recognizing patterns. The goal of text mining is often to classify a given document into one of a number of categories automatically, and to improve this performance dynamically, making it an example of machine learning. One example of this type of text mining is the spam filter used for email.

Text mining is a process of deriving high-quality information from unstructured (textual) information. Possible applications for text mining are:

Comments in survey responses
Customer messages, emails, complaints etc.
Investigating competitors by crawling their web sites

Fraud detection using text mining

I would like to find different pattern recognition algorithms to detect different types of fraud. I have 1 million unstructured text documents about the clients' information with metadata about the client name, viewers, location in the cloud. Here…

text-mining

asked Apr 29 '15 at 07:27

rockmerockme

votes

2 answers

Are there any annotators or Named Entity Recognition for license plate numbers?

Most vehicle license/number plate extractors I've found involve reading a plate from an image (OCR) but I'm interested in something that could tag instances of license plates in a body of text. Are there any such annotators out there?

text-mining

asked Jul 24 '14 at 00:01

howdyjessie
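For the license plate question above, one simple starting point is a pattern-based annotator. The sketch below is only an illustration: the plate format (2 to 3 uppercase letters, an optional space or hyphen, then 3 to 4 digits) is a hypothetical assumption, and the names `PLATE_PATTERN` and `tag_plates` are invented for this example. Real plate formats differ by country and region, so the pattern would need to be replaced accordingly.

```python
import re

# Hypothetical plate format (an assumption, not a real standard):
# 2-3 uppercase letters, an optional space or hyphen, then 3-4 digits.
PLATE_PATTERN = re.compile(r"\b[A-Z]{2,3}[- ]?\d{3,4}\b")


def tag_plates(text):
    """Return (start, end, matched_text) for each plate-like span in text."""
    return [(m.start(), m.end(), m.group()) for m in PLATE_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "The car ABC-1234 hit XY 987 near I-95."
    for start, end, plate in tag_plates(sample):
        print(start, end, plate)
```

A pure regular expression will also match strings that merely look like plates (product codes, flight numbers), so the spans it returns are probably best treated as candidates for a later filtering step.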

votes

2 answers

Sentiment Analysis of comments to understand support on a topic

I have downloaded comments from a website which asked people whether they supported or opposed the implementation of a certain political policy related to immigration. I would like to get any resources or ideas on how to extract aggregate…

text-mining

asked Jan 02 '16 at 04:45

ved

votes

3 answers

Regular Expressions in Word

I've used optical character recognition (OCR) on a historic directory, and am trying to clean up the text with Microsoft Word. Specifically, I need some help writing a Regular Expression to combine two lines together. For example something that…

text-mining

asked Jun 24 '15 at 20:05

user10322

votes

1 answer

How to approach automated text writing?

What are the tools, practices and algorithms used in automated text writing? For example, let's assume that I have access to the APIs of Wikipedia/Wikinews and similar websites, and I would like to produce an article about "Data Science with Python". I believe…

text-mining

asked May 05 '15 at 07:12

Damian Melniczuk

votes

1 answer

Reducing a page of content to a short paragraph

I remember years ago, Yahoo detailed how they were able to reduce a webpage down to a short paragraph of text succinctly summarising the content in sentences, as opposed to a list of keywords. What is this called? Is there any open / free code to do…

text-mining

asked Dec 26 '16 at 15:30

user3791372

votes

2 answers

Finding useful noun 2-grams?

Q: How can I find noun 2-grams in the English language (e.g., "roller coaster", "test tube")? Better yet, how can I find them with proportions? Ultimate goal: Generate distinct single images for each English letter-pair (e.g., "RC" -> "roller…

text-mining

asked Sep 25 '22 at 08:46

lowndrul

vote

1 answer

SUMMARIST: Automated Text Summarization

There is a text summarization project called SUMMARIST. Apparently it is able to perform abstractive text summarization. I want to give it a try but unfortunately the demo links on the website do not work. Does anybody have any information regarding…

text-mining

asked Aug 06 '14 at 09:02

Pasmod Turing

vote

1 answer

String similarity algorithms for string containment (rather than string equality)

I saw that there are a lot of string similarity algorithms to determine whether two strings are the same. I have a slightly different problem - I get two strings "a" and "b" and I need a similarity algorithm to determine whether "b" contains "a" ("b" is likely to…

text-mining

asked Dec 13 '20 at 17:17

Gilad Deutsch
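For the string containment question above, one possible approach is to slide a window of length len(a) over b and keep the best similarity score. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the function name `containment_score` is made up for this example, and the fixed window length is a simplifying assumption (it will underrate matches where the contained copy of "a" has insertions or deletions).

```python
from difflib import SequenceMatcher


def containment_score(a, b):
    """Best similarity between a and any window of b with length len(a).

    Returns a float in [0, 1]; 1.0 means a occurs verbatim in b.
    """
    if not a:
        return 1.0  # the empty string is trivially contained
    if len(b) <= len(a):
        # b is too short to hold a window; compare the strings directly.
        return SequenceMatcher(None, a, b).ratio()
    best = 0.0
    for start in range(len(b) - len(a) + 1):
        window = b[start:start + len(a)]
        best = max(best, SequenceMatcher(None, a, window).ratio())
    return best


if __name__ == "__main__":
    print(containment_score("hello", "say hello world"))
    print(containment_score("hello", "say hallo world"))
```

This is quadratic or worse in the length of the strings, so for long inputs a dedicated approximate substring matching method would likely be a better fit.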

vote

1 answer

Commercial Text Summarization Tools

I'm looking for commercial text summarization tools (APIs, Libraries,...) which are able to perform any of the following tasks: Extractive Multi-Document Summarization (Generic or query-based) Extractive Single-Document Summarization (Generic or…

text-mining

asked Jul 09 '14 at 11:05

Pasmod Turing

vote

1 answer

Guidelines for vocabulary sizes for BoW

I am currently trying to get a vocabulary for BoW-vector generation out of a set of 200k scientific abstracts. I do some basic filtering of tokens already like lowercasing, stop-word-removal, stemming, not taking tokens with size < 2, leaving…

text-mining

asked Jan 18 '19 at 14:09

Wolfone
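For the bag-of-words vocabulary question above, a common way to bound vocabulary size is to filter tokens by document frequency. The sketch below assumes the abstracts are already tokenized and preprocessed; the function name `build_vocabulary` and the default thresholds (`min_df`, `max_df_ratio`, `max_size`) are illustrative assumptions, not recommended values.

```python
from collections import Counter


def build_vocabulary(tokenized_docs, min_df=2, max_df_ratio=0.5, max_size=None):
    """Select vocabulary terms by document frequency.

    tokenized_docs: list of token lists, one per document.
    min_df:         drop terms that appear in fewer documents than this.
    max_df_ratio:   drop terms that appear in more than this share of documents.
    max_size:       optionally keep only the most frequent remaining terms.
    """
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document

    kept = [
        (term, df)
        for term, df in doc_freq.items()
        if df >= min_df and df / n_docs <= max_df_ratio
    ]
    # Most frequent first; ties broken alphabetically for a stable order.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    if max_size is not None:
        kept = kept[:max_size]
    return [term for term, _ in kept]


if __name__ == "__main__":
    docs = [
        ["gene", "cell", "model"],
        ["gene", "protein"],
        ["model", "gene", "rare"],
        ["cell", "protein"],
    ]
    print(build_vocabulary(docs))
```

Sweeping `min_df` and `max_size` while watching downstream task performance is one way to pick a size empirically instead of relying on a fixed rule.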

vote

0 answers

How to do an automated SWOT analysis with Text Mining

Does anybody have an idea or approach for how to do text mining for a SWOT analysis (e.g. sentiment analysis)? I need to assign categories (Strengths, Weaknesses, Opportunities, Risks) to words in a document and then rank them.

text-mining

asked May 27 '18 at 23:41

user1614738

vote

2 answers

Extracting popular keyword terms or topics from blog posts based on post usage

I have a dataset of posts from a blog, and for each post I have the number of views. I want to extract the topics (or phrases) that made the posts with more views. I am planning to divide all posts into two sets based on number of views (one set with low…

text-mining

asked Dec 25 '16 at 19:11

user3550351

votes

1 answer

I want to extract name from CV

I want to extract the name from a CV. I need a high level of accuracy, more than 95%. I have started with the assumption that it is highly likely to be found in the top 10% of lines or, if not there, in some section similar to Personal details. Can you please…

text-mining

asked Jun 30 '15 at 03:22

Vipin Jain

-3

votes

1 answer

German Gunning fog index function

I would like to analyse some text, and most of my reviews are in German. Does anyone know if Python has a good Gunning fog index function for the German language? I couldn't find anything. Best regards

text-mining

asked Apr 06 '18 at 13:36

Nika
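For the Gunning fog question above, the index is usually given as 0.4 × (average words per sentence + percentage of words with three or more syllables), so it can be computed by hand without a dedicated library. The sketch below is a rough approximation under stated assumptions: syllables are estimated by counting vowel groups (including ä, ö, ü), sentences are split on ., ! and ?, and the function names `count_syllables` and `gunning_fog` are made up for this example. The index was calibrated for English, so scores for German text are best compared only with each other.

```python
import re

# Assumption: one syllable per run of vowels, with German umlauts included.
VOWEL_GROUP = re.compile(r"[aeiouyäöü]+", re.IGNORECASE)


def count_syllables(word):
    """Rough syllable estimate: number of vowel groups, at least 1."""
    return max(1, len(VOWEL_GROUP.findall(word)))


def gunning_fog(text):
    """Gunning fog index with a vowel-group syllable heuristic."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters, Unicode-aware
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_length = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_length + percent_complex)


if __name__ == "__main__":
    print(gunning_fog("Das ist gut. Es ist kurz."))
    print(gunning_fog("Die Versicherung zahlt."))
```

German compounds tend to have many syllables, so the three-syllable threshold will flag far more words as complex than it would in English text.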