Questions tagged [data-mining]

Data mining uses methods from artificial intelligence in a database context to discover previously unknown patterns. As such, the methods are usually unsupervised. It is closely related but not identical to machine learning. Key tasks of data-mining are cluster analysis, outlier detection and mining of association rules.

1184 questions
47
votes
5 answers

Lift measure in data mining

I searched many websites to find out what exactly lift does. The results I found were all about using it in applications, not about the measure itself. I know about the support and confidence functions. From Wikipedia, in data mining, lift is a measure of the…
Nickool
  • 625
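
For the lift question above, a minimal sketch may help. It assumes the usual textbook definition, lift(A ⇒ B) = supp(A ∪ B) / (supp(A) · supp(B)), where supp is the fraction of transactions that contain an itemset. The function names and the basket data are hypothetical, made up for illustration only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    baskets = [set(t) for t in transactions]
    return sum(items <= basket for basket in baskets) / len(baskets)


def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent => consequent (assumed standard definition).

    Assumes both sides occur at least once, so the denominator is non-zero.
    """
    supp_both = support(transactions, set(antecedent) | set(consequent))
    supp_a = support(transactions, antecedent)
    supp_c = support(transactions, consequent)
    return supp_both / (supp_a * supp_c)


# Hypothetical toy baskets, not taken from the question.
baskets = [
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "milk"},
]
rule_lift = lift(baskets, {"bread"}, {"milk"})  # 0.5 / (0.75 * 0.75), about 0.89
```

Under this definition, a lift of 1 means the two sides co-occur as often as expected if they were independent, a value above 1 suggests they occur together more often than that, and a value below 1 (as in this toy data) suggests less often.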
16
votes
10 answers

A survey of data-mining software tools

Although I was trained as an engineer, I find that I'm becoming more interested in data mining. Right now I'm trying to investigate the field further. In particular, I would like to understand the different categories of software tools that exist…
10
votes
1 answer

How can I find correlations between crashes and system environments?

In my free time, I am working on a small web based system which collects crash reports (but not other, non-crashing bug reports) that are sent from Delphi Windows applications. For troubleshooting, users would love to have a data-mining feature to…
mjn
9
votes
3 answers

Data mining: how to tell whether the extracted pattern is meaningful?

I am sure that everyone who's trying to find patterns in historical stock market data or betting history would like to know about this. Given a huge set of data, and thousands of random variables that may or may not affect it, it makes sense to ask…
Graviton
  • 1,015
6
votes
1 answer

What are subjective interestingness measures?

I came across the term "subjective interestingness measures" in my book, where the author says that: Subjective interestingness measures are based on user belief in the data. These measures find patterns interesting if they are unexpected …
Ant's
  • 539
5
votes
1 answer

Detect statistical anomalies

I've read these SO questions and this is not a duplicate of…
4
votes
2 answers

How is the support in the Apriori algorithm calculated in the case of duplicates?

I am trying to implement the Apriori algorithm. However, I am unsure what to do when the same item appears more than once in one basket. Say I have 2 transactions, T1 = {A,A,C} and T2 = {A,X}. What is the support of A? Is it 3 or 2?
koool
  • 143
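
A small sketch of the support count for the question above, assuming the usual set-based formulation of frequent itemset mining, in which a transaction is treated as a set and a repeated item within one basket counts once. The function name is hypothetical; T1 and T2 are the transactions from the question.

```python
def support_count(transactions, itemset):
    """Number of transactions containing every item in `itemset`.

    Each transaction is converted to a set first, so duplicates inside
    a single basket do not increase the count (an assumption of the
    set-based formulation, not the only possible convention).
    """
    items = set(itemset)
    return sum(items <= set(t) for t in transactions)


T1 = ["A", "A", "C"]
T2 = ["A", "X"]

count_a = support_count([T1, T2], ["A"])   # A occurs in both baskets
relative_a = count_a / 2                   # support as a fraction of transactions
```

Under that assumption the support count of A is 2, not 3. Counting occurrences rather than baskets would be a different, quantity-aware variant of the problem.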
3
votes
0 answers

Can a different indexing approach improve the relevance and efficiency of search engine results?

I am creating a search engine, with the corpus consisting of websites crawled through a webcrawler (Apache Nutch). I need the query searches to be both fast and relevant. So far, I have been trying to develop a Latent Semantic Indexing system, but I…
3
votes
1 answer

Is there data science reading for general but educated public in Russian?

Can someone point me to a well written reading (preferably, up to 2 pages) in Russian on the subject of "what is data science?" The audience is educated people, but not specialists in statistics. The focus must be on applications, not theory and…
Aksakal
  • 61,310
3
votes
1 answer

Change and anomaly detection

I recently wrote a program that graphed data points so that a user could scroll through them and find "interesting" parts of the data. Now I am looking at ways to make it even simpler by making a table of values deemed to be interesting. These are…
Mat
  • 133
2
votes
1 answer

What is discriminative pattern mining?

I understand that pattern mining is finding frequent patterns in a given dataset. So, practically it is unsupervised learning. But what is discriminative pattern mining? Does it concern two (or more) labeled sets as in supervised learning? Please…
2
votes
0 answers

Choosing between Latent Dirichlet Allocation and Association Rule Mining

Data Mining and Statistical Analysis has a general discussion on stats vs data mining. If I may narrow down the question a bit: are there any general demarcations that allow you to decide which approach is better suited, LDA or ARM?…
Tathagata
  • 471
2
votes
0 answers

What data mining methods can I use for a large (800x100000) matrix that don't require a super computer?

What data mining methods can I use for a large (800x100000) matrix that don't require a super computer? When I try to do data mining on such a data set, even basic PCA in sklearn runs into Memory Error (and I have up to 20GB I think). How can…
mavavilj
  • 4,109
2
votes
2 answers

Main challenges in data-mining

I am an occasional user of data-mining methods and I simply wonder what the main challenges and research directions are for method designers. Thanks for your expertise, Peuhp
peuhp
  • 1
2
votes
2 answers

Which of these are the best Data Science courses for a recent graduate economist?

I have recently graduated from college in my country with a bachelor's degree in economics. In my country (Paraguay, a Latin American country) there is a big lack of data scientists/statisticians. I am now tempted to enroll at the National…
JavierAl
  • 21
  • 1