Leaving Data on the Battlefield

November 11, 2019
Dr. Kate Cordell

Every data point is sacred. Every data point is great. If a data point gets wasted, decisions deteriorate!

My team works to evaluate whether mental health programs make an impact on people’s lives. I lead a team of data sheros, and one thing I drill into my gals is that no soldier should ever be left on the battlefield. One common mistake that new coders, and even experienced ones, make is not tracking every data point. Like soldiers on a battlefield, no data point should go unaccounted for. We may start with a dataset of 753,859 individuals who received health services, and we should end our analysis understanding why every single individual was included or excluded, for all the varied reasons that color our complex data world. We may lose some soldiers to null values in joining criteria and others to uninterpretable codes. Some have faulty spelling, and others may have... well, there is no telling; anything goes. I see elegant code with queries nested four deep, and while it may seem quite fancy, it’s actually quite chancy, because each layer can quietly drop records without saying which ones or why!
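To make this concrete, below is a minimal Python sketch of a linking step that refuses to drop anyone silently. It is an illustration, not the team's actual code: the records, the field names `client_id` and `service_code`, the `programs` lookup, and the `link_clients` helper are all hypothetical. The point is only that every record leaves the step either kept or excluded with a stated reason.

```python
from collections import Counter

# Hypothetical client records; field names are illustrative assumptions.
clients = [
    {"client_id": "A1", "service_code": "MH01"},
    {"client_id": None, "service_code": "MH02"},   # null joining key
    {"client_id": "A3", "service_code": "??"},     # uninterpretable code
    {"client_id": "A4", "service_code": "MH01"},
]

# Hypothetical lookup from valid service codes to program names.
programs = {"MH01": "Counseling", "MH02": "Crisis Line"}


def link_clients(records, lookup):
    """Link each record to a program, logging a reason for every exclusion."""
    kept, excluded = [], []
    for rec in records:
        if rec["client_id"] is None:
            excluded.append((rec, "missing client_id"))
        elif rec["service_code"] not in lookup:
            excluded.append((rec, "uninterpretable service_code"))
        else:
            kept.append({**rec, "program": lookup[rec["service_code"]]})
    return kept, excluded


kept, excluded = link_clients(clients, programs)
reasons = Counter(reason for _, reason in excluded)

# Every record is accounted for: either kept or excluded with a reason.
assert len(kept) + len(excluded) == len(clients)
print(len(kept), dict(reasons))
```

An ordinary inner join would have returned the same two kept rows, but the other two would simply vanish; here they come back with labels that can be counted and reported.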

By the time we have linked clients to services, services to programs, programs to agencies, and agencies to organizations, we expect to have lost some soldiers in battle, but we need to know where and why. We need to be able to account for every data point. Perhaps we lost 1,543 to missing identifiers and 6,754 to birth dates out of range. At the end of the analysis, we may build our dashboard on only 567,093 individuals, but understanding the data we exclude is as important as understanding the data we keep. In essence, we have gone from analyzing a population to analyzing a sample, and we need to know whether our sample is biased or representative.
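The bookkeeping can be sketched as an attrition ledger: apply each exclusion rule in order, record how many individuals it removes, and check that the starting count minus all exclusions equals the final sample. This is a minimal Python sketch under stated assumptions. The toy records, the `apply_with_ledger` and `birth_date_in_range` helpers, and the birth-date window (January 1, 1900 through the date of this post) are hypothetical choices for illustration, not rules from any real analysis.

```python
from datetime import date

# Hypothetical records; fields and cutoffs are assumptions for illustration.
records = [
    {"id": "A1", "birth_date": date(1980, 5, 1)},
    {"id": None, "birth_date": date(1975, 2, 3)},   # missing identifier
    {"id": "A3", "birth_date": date(1890, 1, 1)},   # too early (assumed cutoff)
    {"id": "A4", "birth_date": date(2001, 7, 9)},
    {"id": "A5", "birth_date": date(2030, 1, 1)},   # in the future
]


def birth_date_in_range(rec, earliest=date(1900, 1, 1), latest=date(2019, 11, 11)):
    """Hypothetical plausibility window for birth dates."""
    return earliest <= rec["birth_date"] <= latest


# Each step: (reason a record is dropped, predicate a record must pass to stay).
steps = [
    ("missing identifier", lambda r: r["id"] is not None),
    ("birth date out of range", birth_date_in_range),
]


def apply_with_ledger(rows, steps):
    """Apply filters in order and record how many rows each step drops."""
    ledger = [("starting population", len(rows))]
    for reason, keep in steps:
        passed = [r for r in rows if keep(r)]
        ledger.append((reason, -(len(rows) - len(passed))))
        rows = passed
    ledger.append(("analytic sample", len(rows)))
    return rows, ledger


sample, ledger = apply_with_ledger(records, steps)
for label, n in ledger:
    print(f"{label:<25}{n:>8,}")

# Reconciliation: start minus every exclusion must equal the final sample.
start = ledger[0][1]
dropped = sum(-n for _, n in ledger[1:-1])
assert start - dropped == len(sample)
```

Because the rules run in order, a record that fails several checks is counted only under the first one it fails. Reordering the steps changes the per-reason counts but never the reconciled total, so it is worth fixing the order and reporting it alongside the dashboard.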

So, one common mistake I have seen is forgetting about the soldiers left behind. On this Veterans Day, let’s celebrate the importance of all soldiers, in our data and in our country.