In my free time, I am working on a small web-based system that collects crash reports (only crashes, not other bug reports) sent from Delphi Windows applications.
For troubleshooting, users would love a data-mining feature that finds relationships between hardware or operating system versions and specific crashes.
As an example of how this should work:
- for every crash there is a report in the database, which stores a fingerprint (hash code) of the stack trace (call stack) at the moment of the crash; this fingerprint is used to identify duplicates
- the algorithm checks whether all duplicates of a crash report also share other attributes, for example a missing operating system service pack
- the analysis result lists all properties that these duplicate reports have in common
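The three steps above can be sketched roughly as follows. This is a minimal illustration, not an existing implementation: it assumes each report is a dict with a hypothetical `stack` key (a list of frame strings whose addresses and offsets have already been stripped) and a hypothetical `attributes` key (a set of `name=value` strings). The frame and attribute names in the sample data are made up.

```python
import hashlib
from collections import defaultdict


def stack_fingerprint(frames):
    """Hash a call stack (list of frame strings) into a stable ID.

    Assumes addresses/offsets were already removed from each frame, so the
    same crash in different sessions produces the same fingerprint."""
    joined = "\n".join(frames)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def common_attributes(reports):
    """Group reports by stack fingerprint and return, per group,
    (number of reports, set of attributes shared by every report)."""
    groups = defaultdict(list)
    for report in reports:
        groups[stack_fingerprint(report["stack"])].append(report["attributes"])
    result = {}
    for fingerprint, attr_sets in groups.items():
        # Strict intersection: an attribute must appear in ALL duplicates.
        shared = set.intersection(*attr_sets)
        result[fingerprint] = (len(attr_sets), shared)
    return result


# Hypothetical sample data: two duplicates of one crash, one unrelated crash.
reports = [
    {"stack": ["app.exe!TForm1.Button1Click", "vcl!TControl.Click"],
     "attributes": {"os=Windows 7", "sp=none", "ram=4GB"}},
    {"stack": ["app.exe!TForm1.Button1Click", "vcl!TControl.Click"],
     "attributes": {"os=Windows 7", "sp=none", "ram=8GB"}},
    {"stack": ["app.exe!TMainForm.FormCreate"],
     "attributes": {"os=Windows 10", "sp=n/a", "ram=8GB"}},
]

for fp, (count, shared) in common_attributes(reports).items():
    print(fp[:8], count, sorted(shared))
```

Because the intersection is strict, a single report with incomplete data removes an attribute from the result, and attributes that nearly every machine has will always show up as "common".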
Let's assume these automatic crash reports contain all the key information, such as the names of all processes running at the time of the crash, file names, version information of loaded DLLs, etc.
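To compare reports attribute by attribute, it can help to flatten each nested report into a set of `name=value` strings. The sketch below assumes a hypothetical report layout (`os_version`, `service_pack`, `processes`, `dlls`); the real field names depend on what the Delphi crash reporter actually sends.

```python
def flatten_report(report):
    """Turn a nested crash report (hypothetical layout) into a flat set of
    'name=value' strings that can be compared with set operations."""
    items = {f"os={report['os_version']}"}
    # A missing service pack is recorded explicitly so it can be correlated.
    items.add(f"service_pack={report.get('service_pack') or 'none'}")
    for name in report.get("processes", []):
        # Windows file names are case-insensitive, so normalize to lower case.
        items.add(f"process={name.lower()}")
    for dll, version in report.get("dlls", {}).items():
        items.add(f"dll:{dll.lower()}={version}")
    return items


sample = {
    "os_version": "6.1.7600",
    "service_pack": None,
    "processes": ["explorer.exe", "AVGuard.exe"],
    "dlls": {"COMCTL32.DLL": "6.10.7600.16385"},
}
print(sorted(flatten_report(sample)))
```

Keeping the DLL version in the value means two machines with different versions of the same DLL produce different attributes, which is what a version-related crash would need to show up.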
How can I find correlations between repeated crashes and the environment? Are there specific algorithms or statistical methods that would help?
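One possible statistical refinement, borrowed from association rule mining, is to rank attributes by *lift*: how much more often an attribute occurs among the duplicates of one crash than among all reports. This filters out attributes that are common everywhere. The sketch below is an illustration under stated assumptions, not a complete method: reports are plain sets of attribute strings, the crash group is assumed to be a subset of all reports, and `min_support` is an arbitrary example threshold.

```python
from collections import Counter


def attribute_lift(all_reports, group_reports, min_support=0.8):
    """Rank attributes of one crash group by lift = P(attr | group) / P(attr).

    all_reports / group_reports: lists of attribute sets; group_reports is
    assumed to be a subset of all_reports. Only attributes present in at
    least min_support of the group are returned, highest lift first."""
    total = Counter(a for attrs in all_reports for a in attrs)
    in_group = Counter(a for attrs in group_reports for a in attrs)
    n_all, n_group = len(all_reports), len(group_reports)
    ranked = []
    for attr, count in in_group.items():
        support = count / n_group        # share of the group with this attribute
        if support < min_support:
            continue
        baseline = total[attr] / n_all   # share of all reports with it (> 0)
        ranked.append((attr, support, support / baseline))
    ranked.sort(key=lambda t: t[2], reverse=True)
    return ranked


all_reports = [{"sp=none", "os=7"}, {"sp=none", "os=7"},
               {"sp=2", "os=7"}, {"sp=2", "os=7"}]
crash_group = all_reports[:2]
print(attribute_lift(all_reports, crash_group))
```

In this toy data "os=7" is present in every report (lift 1.0), while "sp=none" is twice as frequent in the crash group as overall (lift 2.0). With small groups lift can be noisy; a significance test on the 2×2 contingency table (for example Fisher's exact test) is one way to check whether a high lift is more than chance.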