9

I am currently working on a disease diagnosis system. It is a prototype based on my dissertation papers "S-Approximation: A New Approach to Algebraic Approximation" and "S-approximation Spaces: A Three-way Decision Approach".

Up to now, I have used randomly generated datasets, most of them toy examples that I generated myself. However, it would be great if I could access some disease and symptom datasets, so that I can test my system with real data.

I have searched the Internet for months, but the further I went, the less I found.

To cut a long story short: are there any freely available datasets in which every disease x is associated with a set of symptoms like {a,b,c,d,e,f,g}?
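
To make the target format concrete, here is a minimal sketch in Python of the structure I have in mind: a mapping from each disease to its set of symptoms. The disease names `x` and `y`, the symptom labels, and the helper `diseases_matching` are hypothetical placeholders for illustration only; they are not taken from any real dataset, and the lookup is a plain subset test, not the S-approximation method itself.

```python
# Hypothetical toy data: each disease maps to a set of symptoms.
disease_symptoms = {
    "x": {"a", "b", "c", "d", "e", "f", "g"},
    "y": {"a", "c", "h"},
}


def diseases_matching(observed, table):
    """Return, in sorted order, the diseases whose symptom set
    contains every observed symptom (a simple subset test)."""
    return sorted(d for d, symptoms in table.items() if observed <= symptoms)


print(diseases_matching({"a", "c"}, disease_symptoms))  # ['x', 'y']
print(diseases_matching({"a", "h"}, disease_symptoms))  # ['y']
```

Any dataset that can be reshaped into such a disease-to-symptom-set table would be enough for my tests.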

Orophile
Ali Shakiba
  • No real answer, but in case you have not found it yet: Disease Database is not open data, but I found its FAQ pretty helpful for estimating the (im)probable availability of such lists. Maybe going via ICD-10-related databases would yield something? – ojdo Aug 29 '15 at 13:07
  • Thanks for the comment. It is a good database; however, I might need to design a crawler to harvest the data from their site. Before that, I shall check their site for legal issues. – Ali Shakiba Aug 29 '15 at 18:48
  • I believe DiseaseDatabase.com is just based on UMLS, which is semi-open, depending on your definition. You do need to get a UMLS license, but you are allowed to use it for profit, from what I recall. – Mark Silverberg Aug 29 '15 at 23:22
  • If you could restrict your definition of symptom to only symptoms necessitating some sort of medical utilization, you could use MEPS: http://www.asdfree.com/search/label/medical%20expenditure%20panel%20survey%20%28meps%29 – Anthony Damico Aug 30 '15 at 13:12
  • @AliShakiba did you end up finding anything? – Avision Mar 19 '17 at 23:06
  • @Avision No, unfortunately. – Ali Shakiba Mar 21 '17 at 08:32

2 Answers

5

OpenDDX (Open disease x symptom data) is a project "about creating an open, reliable, global database that associates symptoms with diseases, for the good of everyone."
http://openddx.net/

Project on GitHub:
https://github.com/openddx

The oddx-arch repo's wiki explains this thoroughly:
https://github.com/openddx/oddx-arch/wiki/OpenDDX:-an-open,-distributed-database-of-disease-symptom-observations

Concept document (Google Docs):
https://docs.google.com/document/d/1T05Ao-7uVtOW0rjn_qZray9DK5VtJiIoZ7aU7Q6tA8k/edit#heading=h.xlsnjbxwmwpx

To clarify: I know they are attempting it, but I am not sure what they have accomplished or whether it is a solution here.

albert
  • Thanks for the post. I've checked them all again. However, they are just proposals, and nothing is out there yet. – Ali Shakiba Aug 29 '15 at 18:46
1

It's incomplete, but NC DETECT is a well-regarded disease surveillance system for emergency department and ambulance data. Their case definitions, both ICD-based and text/chief-complaint-based, are online. It would take a bit of work to make a clean symptom database in the format you want, but it might be a meaningful and useful subset. I work with that data fairly regularly. www.ncdetect.org