This conversation is closed.

How could you develop a data analysis system that handles discrepancies in medical records?

I loved this talk by John Wilbanks but I started wondering how you can use massive, open systems for looking at health data, when tiny variations in what/how it's reported could seriously alter the outputs.

I'm not a mathematician but would appreciate any pointers.

  • Oct 23 2012: This all has to start with security and privacy, because before you can even get to the point of analyzing discrepancies in medical records, you have to get those multiple medical records. How you get medical records from disparate sources securely, reliably, and in a truly scalable way that respects patient privacy is the hard part.

    The easy part is defining a common data standard like HL7, CCRs, and so on.

    Also relatively easy, but harder than the common data standard, is an algorithm to detect and report on discrepancies.

    But I think the hardest part is doing the security and privacy in a highly scalable way.
  • Oct 23 2012: Why would you want to do this? The variation between acts committed by staff with different approaches to the work is huge. Reducing human interactions to a set of numbers is fraught with issues. One simple and universal measure of health is blood pressure. These days, the staff attach a cuff, which may be the wrong size, attached over clothing, badly positioned, or faulty in operation because it leaks slightly through poor connectors, etc.

    The staff member may be using a mercury sphygmomanometer or not listening accurately to the various Korotkoff sounds (five of them) or they may be using palpation to assess the systolic blood pressure. All of this is before you plumb in the patient variation which may be because of body type, disease process, drug therapies, lifestyle or a combination of any or all of the foregoing factors.

    The reduction of patient data to numerical values is, in my opinion, an attempt at coding the impossible. We are gradually losing our clinical skills as more expert systems become available. Machines do not think for us because they can only present us with algorithms based upon accurately input data. People can evaluate data based upon any number of variables and still keep the original objective in mind.

    Expert systems, despite being programmed by experts, are a poor simulacrum of the way that experts really manage the huge numbers of different patient presentations. As of 2012, I have yet to use an expert system that replicates the way that clinical experts think about the health of people who present to them. I would suggest that we need less data about health, not more. I want to see clinicians thinking about people as a unique set of circumstances, not reducing them to a one-size-fits-all philosophy of treatment.
  • Oct 22 2012: We would have to break the data down as a whole...and then incorporate variables we don’t know (corrupt data) that may impact any decision we come to based on the provided data.

    If we begin combining data on a global scale we would have a good pool of (valid data) to exclude (corrupt data) based on a system of (red flags) associated with known discrepancies in data reporting.

    If we combine data through a system that allows for source citing...or a system that labels data by source...we could pool data by validity based on (known variables) to increase accuracy.

    Data reporting would have to be regulated. A general "template" would have to exist. This would ensure that reports coming from agencies such as hospitals, clinics, and pharmacies would format data in a commonly understood way.

    We would have to define (valid data) before we can form solid assumptions relating to any information.

    (Valid Data) is subjective as (Valid Here) could be (invalid there).

    Reporting from an agency can be formatted and inaccurate at the same time. Therefore, not only do we need to ensure the data is complete....but we also need to consider the accuracy of the source's presentation.

    Could the accurately formatted report contain inaccurate information? Of course!

    So then how do we combat reports that look great....but may contain inaccurate information?

    We would have to consider the impact of each "data string" and how much weight it should have when formulating a "peak" or "assessment of all available information" for any given statistical question.

    REPORT 1 - HOSPITAL INCIDENT REPORT - 10/10/2012 - WRITTEN BY DR. Mark Broyles - Contains information regarding injury, treatment, and projected outcome of injury.

    Each piece of data in our string would be treated separately: (Hospital - Incident Report - 10/10/2012 - Written by Dr - Dr. is Mark Broyles - Injury information - Treatment information - projected outcome (opinion))

    We would then have to account for the impact of each variable.
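
One way to read "how much weight it should have" is as a weighted average, where each piece of data counts in proportion to how much its source is trusted. The sketch below is only an illustration of that idea: the weights in `SOURCE_WEIGHTS`, the `OPINION_DISCOUNT` factor, and the field names are invented for the example, and real values would have to be estimated from data about each source.

```python
# Hypothetical reliability weights per source type (assumed, not measured).
SOURCE_WEIGHTS = {"hospital": 1.0, "clinic": 0.8, "self_reported": 0.4}

# Hypothetical discount for data that is an opinion, such as a projected outcome.
OPINION_DISCOUNT = 0.5


def observation_weight(obs):
    """Return the weight of one observation from its source type and kind."""
    weight = SOURCE_WEIGHTS.get(obs["source_type"], 0.1)  # unknown sources count little
    if obs.get("is_opinion", False):
        weight *= OPINION_DISCOUNT
    return weight


def weighted_estimate(observations):
    """Combine observations into a single weighted average."""
    total_weight = sum(observation_weight(o) for o in observations)
    if total_weight == 0:
        raise ValueError("no observations with positive weight")
    weighted_sum = sum(observation_weight(o) * o["value"] for o in observations)
    return weighted_sum / total_weight


# Made-up example: three reports of recovery time in days for the same injury.
observations = [
    {"source_type": "hospital", "is_opinion": True, "value": 30},  # projected outcome
    {"source_type": "clinic", "value": 40},
    {"source_type": "self_reported", "value": 60},
]
estimate = weighted_estimate(observations)
print(round(estimate, 1))
```

In this example the hospital's projection is halved in weight because it is an opinion, so the combined estimate comes out at roughly 41.8 days rather than the plain average of about 43.3.
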
  • Oct 22 2012: You have pointed to a very serious problem for researchers trying to use large quantities of medical data.

    Medical data can have many different types of variances. Thermometers and blood pressure instruments can vary one from another. A thermometer might produce slightly different results when the room is warmer or cooler. Some people have a 'normal' temperature that is a bit above or below the average; if such a person is only measured when she/he is ill, the 'normal' temperature will remain unknown.

    I am not familiar with the actual techniques for using this type of data, but presumably there are statistical methods for determining significant variances, and there are formulas or algorithms for compensating.

    I question whether such a collection of data can produce definitive conclusions. First, there is the fact that the voluntary contributors are not necessarily representative of any group, certainly not the population as a whole. Second, there are variances in the measurements due to instruments and operators. Third, there are variances in the people being measured. Fourth, there are misdiagnoses, which could lead to comparing oranges to apples. Diagnosis is still more art than science. The data might be used for limited purposes, such as finding areas that need further research.
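
As one example of the kind of statistical method the comment above presumes exists, a common way to flag unusual readings is the modified z-score, which uses the median and the median absolute deviation (MAD) so that a few extreme values do not distort the baseline. The sketch below is a minimal illustration; the function name `flag_outliers` and the sample temperatures are made up, and the threshold of 3.5 is a widely quoted rule of thumb rather than anything specific to medical data.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread estimate that resists extreme values.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against, so flag nothing
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Made-up body temperatures in degrees Celsius from several readings.
temperatures = [36.6, 36.8, 37.0, 36.7, 36.9, 41.5, 36.8]
outliers = flag_outliers(temperatures)
print(outliers)  # [5]
```

A flag like this only says a reading is unusual relative to the others. It cannot tell whether the cause is a faulty instrument, an operator error, or a genuinely ill patient, and it does nothing about unrepresentative contributors or misdiagnoses.
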
  • Oct 22 2012: The variation could be accounted for as anomalies. The system could not be perfect because of the human element you mentioned. But the benefit of the massive data and the work that would be achieved would be beyond measure.

    In the USA, HIPAA and "personal privacy" would be the biggest hurdle to this idea. This could be easily resolved with a concerted effort by the medical community to obtain patient approval for the use of the anonymous data.