Data mining uses methods from artificial intelligence in a database context to discover previously unknown patterns. As such, the methods are usually unsupervised. It is closely related to, but not identical with, machine learning. Key tasks of data mining are cluster analysis, outlier detection, and mining of association rules.
Questions tagged [data-mining]
1184 questions
47
votes
5 answers
Lift measure in data mining
I searched many websites to find out what exactly lift does. The results I found were all about using it in applications, not about the measure itself.
I know about the support and confidence function. From Wikipedia, in data mining, lift is a measure of the…
Nickool
- 625
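For the lift question above, a minimal sketch may help. It assumes the usual definition, lift(A → B) = support(A ∪ B) / (support(A) · support(B)), which is the same as confidence(A → B) / support(B). The helper names and the toy baskets are hypothetical and only serve as illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent.

    Assumes both sides occur at least once; otherwise this divides by zero.
    """
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / (support(transactions, antecedent) * support(transactions, consequent))


# Hypothetical toy data: four shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread"},
    {"milk"},
]

# support(bread) = 3/4, support(butter) = 2/4, support(both) = 2/4
bread_butter_lift = lift(baskets, {"bread"}, {"butter"})
print(bread_butter_lift)
```

A lift above 1 indicates that the two sides co-occur more often than they would if they were independent, a lift of about 1 indicates independence, and a lift below 1 indicates that they tend to exclude each other.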
16
votes
10 answers
A survey of data-mining software tools
Although I was trained as an engineer, I find that I'm becoming more interested in data mining. Right now I'm trying to investigate the field further. In particular, I would like to understand the different categories of software tools that exist…
John Berryman
- 379
10
votes
1 answer
How can I find correlations between crashes and system environments?
In my free time, I am working on a small web based system which collects crash reports (but not other, non-crashing bug reports) that are sent from Delphi Windows applications.
For troubleshooting, users would love to have a data-mining feature to…
mjn
9
votes
3 answers
Data Mining-- how to tell whether the pattern extracted is meaningful?
I am sure that everyone who's trying to find patterns in historical stock market data or betting history would like to know about this. Given a huge set of data, and thousands of random variables that may or may not affect it, it makes sense to ask…
Graviton
- 1,015
6
votes
1 answer
What are subjective interestingness measures?
I came across the term subjective interestingness measures in my book, where the author says that:
Subjective interestingness measures are based on user belief in the
data. These measures find patterns interesting if they are unexpected
…
Ant's
- 539
5
votes
1 answer
Detect statistical anomalies
I've read these SO questions and this is not a duplicate of…
Marek Sebera
- 151
4
votes
2 answers
How is the support in the Apriori algorithm calculated in the case of duplicates?
I am trying to implement the Apriori algorithm. However, I am unsure what to do when the same item appears more than once in one basket. Say I have 2 transactions:
T1 = {A,A,C}
T2 = {A,X}
What is the support of A? Is it 3 or 2?
koool
- 143
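For the Apriori question above, here is a small sketch under the standard set-based definition, in which the support count of an itemset is the number of transactions that contain it, so a repeated item within one basket is counted once. The function name `support_count` is hypothetical, and other conventions (for example, multiset or weighted variants) would give different numbers.

```python
def support_count(transactions, itemset):
    """Number of transactions containing every item in `itemset`.

    Each transaction is converted to a set first, so duplicates
    inside a single basket do not increase the count.
    """
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


# The two transactions from the question, kept as lists so the
# duplicate "A" in T1 is actually present in the input.
t1 = ["A", "A", "C"]
t2 = ["A", "X"]

count_a = support_count([t1, t2], {"A"})
print(count_a)  # 2: A occurs in both transactions
```

Under this definition the answer is 2 rather than 3, because support measures in how many baskets an item appears, not how many times it appears in total.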
3
votes
0 answers
Can a different indexing approach improve the relevance and efficiency of search engine results?
I am creating a search engine, with the corpus consisting of websites crawled through a webcrawler (Apache Nutch). I need the query searches to be both fast and relevant. So far, I have been trying to develop a Latent Semantic Indexing system, but I…
Sidd Singal
- 171
3
votes
1 answer
Is there data science reading for general but educated public in Russian?
Can someone point me to a well written reading (preferably, up to 2 pages) in Russian on the subject of "what is data science?"
The audience is educated people, but not specialists in statistics. The focus must be on applications, not theory and…
Aksakal
- 61,310
3
votes
1 answer
Change and anomaly detection
I recently wrote a program that graphed data points so that a user could scroll through them and find "interesting" parts of the data.
Now I am looking at ways to make it even simpler by making a table of values deemed to be interesting. These are…
Mat
- 133
2
votes
1 answer
What is discriminative pattern mining?
I understand that pattern mining is finding frequent patterns in a given dataset. So, practically it is unsupervised learning. But what is discriminative pattern mining? Does it concern two (or more) labeled sets as in supervised learning? Please…
Prometheus
- 898
2
votes
0 answers
Choosing between Latent Dirichlet Allocation and Association Rule Mining
Data Mining and Statistical Analysis has a general discussion on stats vs data mining. If I may narrow down the question a bit: are there any general demarcations that allow you to decide which approach is better suited, LDA or ARM?…
Tathagata
- 471
2
votes
0 answers
What data mining methods can I use for a large (800x100000) matrix that don't require a super computer?
What data mining methods can I use for a large (800x100000) matrix that don't require a super computer?
When I try to do data mining on such a data set, even basic PCA in sklearn runs into a MemoryError (and I have up to 20 GB of memory, I think).
How can…
mavavilj
- 4,109
2
votes
2 answers
Main challenges in data-mining
I am an occasional user of data-mining methods and I simply wonder what the main challenges and research directions are for method designers.
Thanks for your expertise,
Peuhp
peuhp
- 1
2
votes
2 answers
Which of these are the best Data Science courses for a recent graduate economist?
I have recently graduated from college in my country with a bachelor's degree in economics. In my country (Paraguay, a Latin American country) there is a big lack of data scientists/statisticians. I am now tempted to enroll at the National…
JavierAl
- 21
- 1