Defending Imperfect Data

“There’s something wrong with your data.  We’re not that bad.”

“Your data’s too old.”

“I don’t agree with your methodology.”

“These reports are useless because they don’t include data from our main competitor.”


These statements are typical of the objections hurled by various people within healthcare organizations, especially when the implications may be unpleasant.  How should the people responsible for generating these reports respond?


The first thing is to acknowledge the partial validity of their charges.  No data set, analytic approach, or methodology is perfect.  The overriding principle for minimizing negative reactions is to be careful about how you present the results.  You should portray them as indicators and starting points for discussion.  Acknowledge the possible limitations right from the start, and invite others to understand both the validity of the basic approach and its limitations.

Data enthusiasts can sometimes get so wrapped up in their analyses that they come across as if they believe the results are absolute truth.  “After all, it’s data.”

Here are some specific ideas:

Data Accuracy – To minimize the impact of imperfect data, you should work diligently to bring it to the highest possible level of accuracy.  Then invite others to review your data cleansing process and the data accuracy reports that quantify the possible errors.  Sometimes these accuracy reports point to systematic errors which, ideally, can be corrected before the final reports are generated.  It’s unlikely that error rates in the 1% to 2% range would materially affect results.

Age of Data – There is always a trade-off between timeliness and accuracy.  When I was Executive Vice President at Georgia Hospital Association, I led a data project that included a time lag of nearly 18 months.  This delay was necessary because of the age-of-data requirements of antitrust laws and the multiple data sets that had to close before the analysis could be completed.  Not surprisingly, we received some complaints about the time lag.  We worked hard to get the participating hospitals’ technical and financial staff to agree to compress the schedule, but they firmly believed that doing so would not allow the incoming data to “stabilize” enough for the results to be useful.

Methodology – People can have legitimate concerns about underlying methodology and logic.  If that’s the case, you should hold open discussions and see if you can reach a resolution that satisfies all parties.  This may not always be possible, but if agreement can’t be reached, that should signal to everyone how “firmly” to apply whatever conclusions the reports suggest.  The more disagreement there is, the less forcefully the conclusions should be applied.

Missing Participants – Comparative data reports are always most useful when all relevant parties participate.  If a main competing organization is missing, certainly that diminishes the reports’ value, but useful information can still be gleaned. 

Another Thought – Remind everyone of the value of a series of reports over time.  Even if the data is not perfect, as long as the data flaws remain constant from report to report, you can still track performance trends, which is very helpful!
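
To make the trend point concrete, here is a minimal sketch in Python.  The figures are made up purely for illustration, and the names (period_changes, CONSTANT_BIAS) are hypothetical.  It assumes the simplest kind of flaw, one that shifts every report by the same fixed amount; real data flaws may not behave this neatly.

```python
# Illustrative sketch only: all numbers below are invented, not real results.
# Assumption: the data flaw adds the same fixed amount to every report.

def period_changes(values):
    """Return the change between each consecutive pair of report values."""
    return [round(later - earlier, 6)
            for earlier, later in zip(values, values[1:])]

# Hypothetical "true" values of some measure over four reporting periods.
true_rates = [12.0, 11.5, 11.1, 10.4]

# Hypothetical constant flaw: every report overstates the measure by 0.8.
CONSTANT_BIAS = 0.8
reported_rates = [rate + CONSTANT_BIAS for rate in true_rates]

true_trend = period_changes(true_rates)
reported_trend = period_changes(reported_rates)

# The reported levels are all wrong, but the period-to-period changes match.
print("Reported levels:", reported_rates)
print("Change per period (true):    ", true_trend)
print("Change per period (reported):", reported_trend)
```

Each reported level is off, yet the changes from one period to the next are identical in both series.  That is why a series of imperfect reports can still show whether performance is improving or slipping, provided the flaw really does stay constant.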


The key to maximizing support for your data results is to be humble and to temper the “fervor” with which you present your conclusions in light of possible data issues.  Reports and approaches that are fairly bulletproof can be used with a high degree of confidence.  To the degree that data issues exist, results should be handled more delicately.