TED Conversations

This conversation is closed.

How could you develop a data analysis system that handles discrepancies in medical records?

I loved this talk by John Wilbanks but I started wondering how you can use massive, open systems for looking at health data, when tiny variations in what/how it's reported could seriously alter the outputs.

I'm not a mathematician but would appreciate any pointers.


Showing single comment thread. View the full conversation.

  • Oct 22 2012: We would have to break the data down as a whole, and then account for variables we don't know about (corrupt data) that may affect any decision we reach based on the provided data.

    If we begin combining data on a global scale, we would have a large enough pool of (valid data) to identify and exclude (corrupt data), using a system of (red flags) tied to known discrepancies in data reporting.
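
    A minimal sketch of what a (red flag) system might look like. The field names, the two rules, and the plausibility thresholds are all assumptions made up for illustration, not clinical guidance:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Record:
    source: str
    age_years: float
    systolic_bp: float  # mmHg


# Hypothetical red-flag rules. Each returns True when a record looks suspect.
# The thresholds are illustrative assumptions only.
RED_FLAGS: Dict[str, Callable[[Record], bool]] = {
    "impossible_age": lambda r: not (0 <= r.age_years <= 130),
    "implausible_bp": lambda r: not (40 <= r.systolic_bp <= 300),
}


def flags_for(record: Record) -> List[str]:
    """Return the names of every red flag the record trips."""
    return [name for name, rule in RED_FLAGS.items() if rule(record)]


def split_by_flags(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate records with no red flags from records with at least one."""
    clean, flagged = [], []
    for r in records:
        (flagged if flags_for(r) else clean).append(r)
    return clean, flagged


# Made-up sample records, not real data.
records = [
    Record("clinic_a", 42, 120),
    Record("clinic_b", 420, 118),  # age looks like a typo
    Record("hospital_c", 67, 12),  # blood pressure looks like a dropped digit
]
clean, flagged = split_by_flags(records)
```

    Flagged records are set aside rather than deleted, so they can still be reviewed or corrected later.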

    If we combine data through a system that allows for source citing...or a system that labels data by source...we could pool data by validity based on (known variables) to increase accuracy.

    Data reporting would have to be regulated. A general "template" would have to exist. This would ensure that reports coming from agencies such as hospitals, clinics, and pharmacies would format data in a commonly understood way.
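
    As a rough illustration, a "template" could be as simple as a list of required fields and their expected types. The field names below are hypothetical, chosen only to show the idea:

```python
from datetime import date
from typing import List

# Hypothetical template: field name -> expected Python type.
REPORT_TEMPLATE = {
    "source_type": str,   # e.g. "hospital", "clinic", "pharmacy"
    "report_type": str,
    "report_date": date,
    "author": str,
}


def template_errors(report: dict) -> List[str]:
    """List every way the report fails to match the template."""
    errors = []
    for field, expected in REPORT_TEMPLATE.items():
        if field not in report:
            errors.append(f"missing field: {field}")
        elif not isinstance(report[field], expected):
            errors.append(
                f"wrong type for {field}: expected {expected.__name__}"
            )
    return errors


good = {
    "source_type": "hospital",
    "report_type": "incident report",
    "report_date": date(2012, 10, 10),
    "author": "Dr. Mark Broyles",
}
# The date is a free-text string and two fields are missing.
bad = {"source_type": "clinic", "report_date": "10/10/2012"}

print(template_errors(good))
print(template_errors(bad))
```

    A check like this only tells us a report is well formed. It says nothing about whether the contents are true.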

    We would have to define (valid data) before we can form solid assumptions relating to any information.

    (Valid Data) is subjective as (Valid Here) could be (invalid there).

    A report from an agency can be correctly formatted and inaccurate at the same time. Therefore, not only do we need to ensure the data is complete, but we also need to consider the accuracy of what the source presents.

    Could the accurately formatted report contain inaccurate information? Of course!

    So then how do we combat reports that look great but may contain inaccurate information?

    We would have to consider the impact of each "data string" and how much weight it should have when formulating a "peak" or "assessment of all available information" for any given statistical question.

    REPORT 1 - HOSPITAL INCIDENT REPORT - 10/10/2012 - WRITTEN BY DR. Mark Broyles - Contains information regarding injury, treatment, and projected outcome of injury.

    Each piece of data in our string is a variable: (Hospital - Incident Report - 10/10/2012 - Written by a Dr. - Dr. is Mark Broyles - Injury information - Treatment information - Projected outcome (opinion)).

    We would then have to account for the impact of each variable.
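
    One possible way to account for that impact is a weighted average, where each report's weight is built from its variables. This is only a sketch: the weights, the categories, and the "projected recovery days" figures are invented for illustration, and real weights would have to be estimated and justified.

```python
from typing import List, Tuple

# Hypothetical weights in [0, 1], one table per variable.
SOURCE_WEIGHT = {"hospital": 1.0, "clinic": 0.8, "pharmacy": 0.6}
KIND_WEIGHT = {"measurement": 1.0, "opinion": 0.5}

# A report here is (source type, kind of statement, reported value).
Report = Tuple[str, str, float]


def report_weight(source_type: str, kind: str) -> float:
    """Combine the per-variable weights into one weight for the report."""
    return SOURCE_WEIGHT[source_type] * KIND_WEIGHT[kind]


def weighted_assessment(reports: List[Report]) -> float:
    """Weighted mean of the reported values."""
    total_weight = sum(report_weight(s, k) for s, k, _ in reports)
    if total_weight == 0:
        raise ValueError("no report carries any weight")
    weighted_sum = sum(report_weight(s, k) * v for s, k, v in reports)
    return weighted_sum / total_weight


# Made-up values: projected recovery time in days.
reports = [
    ("hospital", "opinion", 30.0),
    ("clinic", "measurement", 40.0),
    ("pharmacy", "opinion", 60.0),
]

simple_mean = sum(v for _, _, v in reports) / len(reports)
print(simple_mean, weighted_assessment(reports))
```

    With these numbers the weighted result sits below the simple mean, because the pharmacy's opinion carries the least weight.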
