I am an occasional user of data-mining methods, and I am wondering what the main challenges and research directions are for the people who design those methods.
Thanks for your expertise,
Peuhp
This is a bit reductionist, but for me the end goal of data mining is to build a robust classifier. Most problems suffer from information overload, so you have to be able to capture the relevant information and simplify it so that humans can interpret it. Having computers that think like humans is still a long way off, but that is the trend to follow.
As such, I believe that the major challenges are in the field of pattern recognition.
I think that one very important application of data mining is genetic exploration. With microarrays there are huge challenges in the design of the experiment, the data collection, and the transformations of the data (generally subtracting baseline noise and scaling). But in the area of inference methods, the biggest problem is the multiplicity issue with hypothesis tests. Simultaneously testing thousands of hypotheses leads to a lot of false alarms, because at a 5% significance level roughly 5% of the true null hypotheses will be rejected by chance alone. Adjusting by controlling the false discovery rate is one step in the right direction. Terry Speed, Brad Efron and others have been working hard on the microarray problem. Efron even has a recent monograph out titled "Large-Scale Inference", and an annual conference on this topic was started last year. An empirical Bayes method is the approach that Efron recommends and illustrates in his monograph.
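To make the multiplicity issue concrete, here is a minimal sketch of false discovery rate control using the Benjamini-Hochberg step-up procedure. That is one standard way to control the FDR; it is not the empirical Bayes approach from Efron's monograph. The function names, the simulated z-scores, and the parameters (10,000 tests, 500 real effects, an effect size of 3) are all assumptions made up for illustration, not real microarray data.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a z-score under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k such that the k-th smallest
    p-value is <= (k / m) * q, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


def simulate(m=10000, n_signal=500, effect=3.0, seed=0):
    """Hypothetical experiment: the first n_signal tests carry a real effect."""
    rng = random.Random(seed)
    z = [rng.gauss(effect, 1.0) for _ in range(n_signal)]
    z += [rng.gauss(0.0, 1.0) for _ in range(m - n_signal)]
    p = [two_sided_p(v) for v in z]

    # Naive approach: reject every test with p <= 0.05.
    naive_false = sum(pv <= 0.05 for pv in p[n_signal:])

    # FDR-controlled approach.
    bh = benjamini_hochberg(p, q=0.05)
    bh_false = sum(bh[n_signal:])
    bh_true = sum(bh[:n_signal])
    return naive_false, bh_false, bh_true


if __name__ == "__main__":
    naive_false, bh_false, bh_true = simulate()
    print("false alarms, naive p <= 0.05:", naive_false)
    print("false alarms, Benjamini-Hochberg:", bh_false)
    print("true effects found, Benjamini-Hochberg:", bh_true)
```

With 9,500 true nulls in this simulation, the naive threshold is expected to flag several hundred of them by chance. The step-up rule instead aims to keep the expected proportion of false alarms among the rejected hypotheses at or below q.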