16

Although I was trained as an engineer, I find that I'm becoming more interested in data mining. Right now I'm trying to investigate the field further. In particular, I would like to understand the different categories of software tools that exist and which tools are notable in each category and why. (Note that I didn't say the "best" tools, just the notable ones lest we start a flame war.) Especially make note of the tools that are open-source and freely available - although don't take this to mean that I'm only interested in open-source and free.

10 Answers

7

Have a look at

  • Weka (Java, strong in classification)
  • Orange (Python scripting, mostly classification)
  • GNU R (R language, somewhat vector table oriented, see the Machine Learning taskview, and Rattle UI)
  • ELKI (Java, strong on clustering and outlier detection, index structure support for speedups, algorithm list)
  • Mahout (Java, belongs to Hadoop, if you have a cluster and huge data sets)

and the UCI Machine Learning Repository for data sets.

chl
  • 53,725
  • 1
    you could add Red-R to the list (kind of a clone of Orange in R): http://www.red-r.org/ – Amro Aug 22 '10 at 19:10
  • I've downloaded R and I am playing with it now. – John Berryman Aug 23 '10 at 11:52
  • @Amro Thanks! However, it is not available on the Mac platform, unless I'm mistaken? – chl Aug 23 '10 at 18:37
  • I'm not a Mac user, but I think the Linux build could work for you (you need to manually install all Python dependencies): http://www.red-r.org/forum/topic.php?id=22#post-76 – Amro Aug 24 '10 at 13:03
  • @Amro I'll give it a try; in the past I tested RAnalyticFlow (http://j.mp/bYF8xs) but was not convinced: I am basically a CLI user :-) – chl Aug 25 '10 at 19:52
7

This is probably the most comprehensive list you'll find: mloss.org

  • It is focused on machine learning though, which can be seen as a field related to data mining, just like AI. The two terms are commonly used synonymously, as "prediction" is one of the key challenges in data mining, but there is more to data mining than "learning". – Has QUIT--Anony-Mousse Feb 09 '12 at 13:29
3

Rattle is a data mining GUI that provides a front end to a wide range of R packages.

Jeromy Anglim
  • 44,984
3

Have a look at KNIME.

It is very easy to learn, with lots of scope for further progress, and it integrates nicely with Weka and R.

radek
  • 1,397
2

There is ELKI, an open-source university project somewhat comparable to WEKA, but much stronger when it comes to clustering and outlier detection. WEKA actually isn't really data-mining, but machine learning software.

2

From the popularity perspective, this paper (2008) surveys the top 10 algorithms in data mining.

ebony1
  • 2,203
2

RapidMiner (Java) [open source]

Amro
  • 488
1

There is Red-R, which has a nice GUI and a visual programming interface. It uses R to perform the various data analyses.

Stat-R
  • 681
  • 6
  • 17
1

Rexer Analytics does a toolkit survey every year. KDnuggets has software descriptions by industry as well as intent.

dimitriy
  • 35,430
0

SQL Server Data Mining (SSDM) hasn't been updated in a long time, but it's still quite competitive if you're mining large relational databases and cubes. I'm slowly but systematically slogging my way through tests of as many mining tools as I can and SQL Server's Windows interface is the most productive and stable I've found to date (particularly when it comes to enterprise databases, some of which have surprisingly sloppy interfaces) despite its age. I'd prefer a modern Windows Presentation Foundation (WPF) interface but this is the next-best thing.

I wrote a whole series of detailed amateur tutorials on it titled A Rickety Stairway to SQL Server Data Mining, back when I was trying to acquire some basic mining skills. Despite my inexperience they are still useful in helping identify some of the "gotchas" in advance.