Correlation, Causation and Cash Collection

In most of our work, we’re not trying to prove complex statistical hypotheses. Usually, we just want to know what happened: which stores had too much overtime, which salespeople met their quota, which products drove the most profit. While none of these numbers are perfect (as discussed in point three of 7 Ways to Make Data Work for You – Ditch Your Decimals), they’re mostly accurate enough. After all, some calculations, like figuring out store sales, are fairly concrete. You count the cash, add the credit card charges and reconcile your deposits. But not everything is so straightforward, and we sometimes have to warn clients that yes, we can produce the report they want, but it may not tell them much. Or as one CIO client put it, “That report is vaporware.”

Take a collection agency as an example. Every month, their phone agents make thousands of calls to try to collect bad debt. Their system tracks, by agent and debtor, how many call attempts are made, how many messages are left and how many parties are spoken to. The agency wants to use these numbers to figure out how effective each agent is. Can they?

What they’re trying to do is establish a link between agent calls and cash received. The problem is that cash can arrive in two ways. First, the debtor can make a payment while on the phone with the collector, which doesn’t happen often. Second, the debtor can make a payment sometime in the days after receiving the collector’s call. So the agency uses a report to match the timing of calls with cash received and then “credit” the appropriate agent. But as quickly becomes obvious, the reason behind any given payment is not so easy to decipher. Why did the debtor send the check? Was he or she going to pay anyway? Did multiple agents leave messages over a period of time? Did the cumulative effect of the calls make any difference?
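To make the matching problem concrete, here is a minimal sketch of a timing-based crediting rule. It is not the agency’s actual report: the record layout (`Call`, `Payment`), the 14-day window and the “last touch” and “first touch” rules are all hypothetical assumptions chosen for illustration. The point is that when three agents called the same debtor, the rule alone decides who gets the cash, and the data cannot tell you which rule is right.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Call:
    debtor_id: str
    agent_id: str
    call_date: date


@dataclass
class Payment:
    debtor_id: str
    amount: float
    paid_on: date


def credit_agents(calls, payments, window_days=14, rule="last"):
    """Credit each payment to ONE agent who called that debtor within
    `window_days` before the payment (hypothetical rule, for illustration).
    rule="last" picks the most recent call; rule="first" picks the earliest."""
    if rule not in ("last", "first"):
        raise ValueError("rule must be 'last' or 'first'")
    window = timedelta(days=window_days)
    pick = max if rule == "last" else min
    credited = {}
    for p in payments:
        candidates = [
            c for c in calls
            if c.debtor_id == p.debtor_id
            and p.paid_on - window <= c.call_date <= p.paid_on
        ]
        if not candidates:
            continue  # cash arrived with no call in the window: nobody is credited
        chosen = pick(candidates, key=lambda c: c.call_date)
        credited[chosen.agent_id] = credited.get(chosen.agent_id, 0.0) + p.amount
    return credited


# Made-up data: three agents called the same debtor before a single payment.
calls = [
    Call("D1", "alice", date(2024, 3, 1)),
    Call("D1", "bob", date(2024, 3, 8)),
    Call("D1", "carol", date(2024, 3, 12)),
]
payments = [Payment("D1", 250.0, date(2024, 3, 13))]

print(credit_agents(calls, payments, rule="last"))   # {'carol': 250.0}
print(credit_agents(calls, payments, rule="first"))  # {'alice': 250.0}
```

Same calls, same payment, two different “most effective” agents, and Bob gets nothing under either rule. Neither answer says whether any of the calls caused the check to be sent.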

The point here is that when companies try to link individual transactions to people, they often mistake correlation for causation. They end up saying this data point hooks to that data point, then proceed as if all the numbers were as certain and accurate as a count of products sold in a store.

In this type of scenario, it’s often better to look at patterns instead of individual transactions. In the case of the collection agency, it’s better to approach things from the debtor perspective than from the agent perspective. How many calls, on average, did debtors receive in the month before making a payment? Compare that number to the average number of calls to people who don’t make payments. The agency can then drill deeper: is there some connection between the timing of the calls and the receipt of cash? And they need to keep asking whether they are seeing real trends or just statistical noise.
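A minimal sketch of that debtor-level comparison follows, assuming a hypothetical table with one row per debtor: the number of calls received in the prior month and whether the debtor paid. The data is invented for illustration. To address the “real trend or noise” question, it uses a permutation test (our choice of technique, not something the source prescribes): shuffle the paid/unpaid labels many times and see how often a gap at least as large as the observed one shows up by chance.

```python
import random
from statistics import mean

# Hypothetical illustrative data: (calls received in the prior month, paid?)
debtors = [
    (6, True), (4, True), (5, True), (3, True), (7, True),
    (2, False), (5, False), (1, False), (3, False), (4, False), (2, False),
]


def mean_calls_gap(rows):
    """Average calls to payers minus average calls to non-payers."""
    payers = [calls for calls, paid in rows if paid]
    non_payers = [calls for calls, paid in rows if not paid]
    return mean(payers) - mean(non_payers)


def permutation_p_value(rows, n_shuffles=10_000, seed=0):
    """One-sided permutation test: the share of random label shuffles whose
    gap is at least as large as the observed gap. A large share suggests the
    observed gap could easily be noise."""
    rng = random.Random(seed)
    observed = mean_calls_gap(rows)
    calls = [c for c, _ in rows]
    labels = [paid for _, paid in rows]  # shuffling keeps the payer count fixed
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(labels)
        if mean_calls_gap(list(zip(calls, labels))) >= observed:
            hits += 1
    return observed, hits / n_shuffles


gap, p = permutation_p_value(debtors)
print(f"payers averaged {gap:.2f} more calls; permutation p = {p:.3f}")
```

Even a small p-value only says the gap is unlikely to be pure chance. It is still a correlation: it does not say the calls caused the payments, which is exactly the distinction the agent-level report blurs.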

This kind of thinking is often a stretch for people coming from a back-office background, where numbers are more concrete and they can ascribe meaning to them more accurately. They’re used to counting, whether it’s finance or inventory. So when they come up against more amorphous information, the temptation is to keep on counting. To people in other fields, such as marketing, this inability to establish causation is more obvious. Marketers already know it is impossible to say that this advertisement caused this sale.

Do you ever see people use data to make connections that the data can’t justify?
