How could you develop a data analysis system that handles discrepancies in medical records?

I loved this talk by John Wilbanks, but I started wondering how you can use massive, open systems for looking at health data when tiny variations in what is reported, or how it is reported, could seriously alter the outputs.

I'm not a mathematician but would appreciate any pointers.


    Oct 22 2012: The variation could be accounted for as anomalies. The system could not be perfect because of the human element you mentioned. But the benefit of the massive data, and of the work that could be achieved with it, would be beyond measure.

    In the USA, HIPAA and "personal privacy" would be the biggest hurdle to this idea. This could be easily resolved with a concerted effort by the medical community to obtain patients' approval for the use of their anonymized data.

