I've read these SO questions, and this is not a duplicate of them:
- https://stackoverflow.com/questions/1306785/best-way-to-statistically-detect-anomalies-in-data
- https://stackoverflow.com/questions/3531374/statistical-calculations
- https://stackoverflow.com/questions/2221984/algorithms-for-spotting-anomalies-spikes-in-traffic-data
I've collected statistical data about people.
The data hierarchy looks like this:
- Region
    - Street
        - Building number
            - Entrance number
                - [Statistical package]
    - Street
[Statistical package] contains (in this example):
- Floor (storey) number
- UUID (identifying the flat)
- Religion
- Presence of a toilet
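To make the layout concrete, here is a minimal Python sketch of one record as I picture it. All field names (`region`, `flat_uuid`, `has_toilet`, and so on) and the sample values are my own hypothetical choices for illustration, not an existing schema, and only the four example attributes are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatRecord:
    # Hierarchy levels locating the record (hypothetical field names)
    region: str
    street: str
    building: str
    entrance: str
    # Statistical package attributes (only the four from this example)
    floor: int
    flat_uuid: str
    religion: str
    has_toilet: bool


# One made-up record, purely for illustration
rec = StatRecord("R1", "Main St", "12", "A", 3, "uuid-0001", "Y", True)
```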
What algorithm or procedure, or what statistical programming framework, should I use to discover anomalies like the ones listed below?
(This includes which underlying technology is best: SQL or a document-oriented DB, an interpreted or a compiled language, and so on.)
1-a :: Only one floor (out of all floors in the building) has no toilets
1-b :: One flat (UUID) has no toilet although all other flats in the entrance/building have at least one
2-a :: There is one flat claiming Religion X although the rest of the Region has only Religions Y and Z
2-b :: There is one building claiming Religion X although the rest of the Region has only Religions Y and Z
But these are only examples using a limited number of Statistical package attributes. I need to find many types of anomalies across around 15 attributes in every Statistical package.
Note: this question is not about how to find the anomalies in the examples provided. Those examples are just illustrative; I'm looking for a general solution/algorithm.
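To show what I mean by "general", here is a naive Python sketch of the shape of check I have in mind: group the records at some hierarchy level and flag attribute values that are rare within their group. The function name `rare_values`, the thresholds `max_share` and `min_group_size`, and the sample data are all my own assumptions for illustration. It only covers flat-level cases such as 1-b and 2-a; cases such as 1-a and 2-b would presumably need the records aggregated per floor or per building first.

```python
from collections import Counter, defaultdict


def rare_values(records, group_keys, attribute, max_share=0.05, min_group_size=10):
    """Yield (group, value, count, group_size) for values rare within their group."""
    # Count how often each attribute value occurs inside each group
    groups = defaultdict(Counter)
    for rec in records:
        group = tuple(rec[k] for k in group_keys)
        groups[group][rec[attribute]] += 1

    for group, counts in groups.items():
        total = sum(counts.values())
        if total < min_group_size:
            continue  # too few records to call anything "rare"
        for value, count in counts.items():
            if count / total <= max_share:
                yield group, value, count, total


# Made-up data: one Region with Religions Y and Z, plus a single flat claiming X
records = (
    [{"region": "R1", "religion": "Y"} for _ in range(12)]
    + [{"region": "R1", "religion": "Z"} for _ in range(11)]
    + [{"region": "R1", "religion": "X"}]
)

found = list(rare_values(records, ["region"], "religion"))
print(found)
```

The same call with different `group_keys` and `attribute` arguments would cover other levels and attributes, for example grouping by building and checking a toilet flag, but I don't know whether a fixed share threshold like this is statistically sound, which is part of what I'm asking.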
Thanks in advance for any response.