
Making meaning out of workforce data…when is a difference not a difference?

Author: A.J. (Tony) Cotton, AM

In an earlier post the question was asked: how do you know that a difference between two numbers is a real difference?

In this case the word ‘real’ has a number of meanings, but the two important ones are: 

  • Is it likely that this difference in scores has occurred purely by chance and has nothing to do with what we are measuring? (This is interesting, but alone it is not enough.)
  • Does this difference have any practical meaning? If I act on this information, can I reasonably expect it to result in an effect? (This is where the statistical finding can start to inform decision making.)

Let’s have a look at the first of these.

The term ‘statistically significant’ is often used but poorly understood when applied to HR or workforce data. This is particularly the case when the discussion turns to the attitude and opinion data that comes from staff surveys.

Mostly, problems arise because statistical significance is the basis on which we decide whether a difference is ‘real’ or not. For some HR and line managers, a statistically significant result is what gives them permission to make a decision. The term refers to whether the difference we are seeing could have occurred purely by chance: significance testing lets us judge whether our results are better than chance or not.

There are three elements that determine whether or not a difference is statistically significant:

  • the amount of data you have, 
  • your risk tolerance, and 
  • the actual size of the difference (or the effect) that you are investigating.

First, the amount of data you have is important. Typically, statistics are numbers calculated from a sample, and we use them to estimate what the figure would be for a larger group. So, for example, we conduct a staff survey and get a good response rate of 60%. We then calculate statistics from this data; we might look at average employee engagement among respondents as an estimate of engagement levels in the entire workforce. The more data you have, the closer your estimate will come to reality. If, for example, you get a 100% response rate to the survey, you are no longer making an estimate, you are simply describing your workforce. So, we might be able to say that 44% of the workforce are happy with their benefits compared with 43% in the previous year. However, if we wanted to estimate the level of satisfaction with benefits in the workforce next year, that’s a different story.

For a statistical calculation, more means more in an absolute sense, so having 50 responses to your survey is much better than only having 20. However, a complication arises when you’re looking at survey results for a small group of, let’s say, 25 employees. In this case, 20 responses represents an 80% response rate, which allows you to describe that group with great confidence but makes statistical calculations difficult. Interestingly, the reverse is also true. I did some work once where we had over 1,000 responses to a survey, but it represented less than 2% of the workforce. So I could calculate a lot of statistics, but I couldn’t have any confidence that the conclusions represented the views of the entire workforce (there are ways around this that weren’t available to me at the time).
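To make the sample-size point concrete, here is a minimal sketch in Python. The numbers are hypothetical (the 44% benefits figure from above) and it uses the standard textbook formula for the margin of error around a sample proportion; it is an illustration of the idea, not a description of any particular survey tool.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical: 44% of respondents say they are happy with their benefits.
p_hat = 0.44
for n in (20, 50, 300, 3000):
    moe = margin_of_error(p_hat, n)
    print(f"n={n:5d}: 44% +/- {moe:.1%}")

# With only a few dozen responses the interval easily spans both 43% and 44%,
# so a one-point change could be nothing more than sampling noise.
```

With 20 responses the estimate is 44% plus or minus roughly 22 percentage points; with 3,000 it is plus or minus under 2 points. Same survey question, very different ability to call a one-point shift a ‘real’ difference.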

The second element is your risk tolerance. This is perhaps the least well understood part of the statistical significance equation, and I think part of the reason is that it gets the least attention when statistics is taught. When someone declares that a result is statistically significant, they will almost certainly have applied a somewhat arbitrary cut-off in their decision making: the result they have found has a less than 5% (sometimes less than 1%) likelihood of having occurred by chance. To put it another way, they think there is a one in twenty chance they are wrong. This is the accepted (and quite arbitrary) standard for assessing the risk that the result is wrong.

Now, for a scientist who is looking to prove a fundamental law of nature, a 5% or 1% chance of being wrong seems a pretty reasonable limit. But for a manager who is looking for guidance on which HR strategies to implement, a 10% (one in ten) chance of being wrong may be an acceptable assessment of risk. Using this cut-off, nine times out of ten the manager will make the “right” decision. The arbitrary 5% cut-off for statistical significance has been used for many years and has become, sometimes unthinkingly, the standard for management research. To overcome this, some academics now report what are called ‘p-values’. So, rather than just saying that a result is ‘significant’, you might also see something like “p = 0.0753” reported alongside a statistic. This simply means that there is a 7.53% chance of being wrong (still a pretty good average!). The p-value adds a little more rigour to the statement of statistical significance.
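Here is a small sketch of what that looks like in practice, assuming hypothetical engagement scores for two business units and using a standard two-sample t-test from scipy. The point is not the particular test but that the p-value is a single number you can then hold against whatever risk tolerance you choose.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical engagement scores (1-5 scale) for two business units.
unit_a = rng.normal(3.6, 0.8, 60)
unit_b = rng.normal(3.4, 0.8, 60)

t_stat, p_value = stats.ttest_ind(unit_a, unit_b)
print(f"p = {p_value:.4f}")

# The same p-value reads differently under different risk tolerances.
for alpha in (0.01, 0.05, 0.10):
    verdict = "significant" if p_value < alpha else "not significant"
    print(f"at a {alpha:.0%} risk tolerance: {verdict}")
```

The same result can be ‘not significant’ at the 1% standard and perfectly acceptable evidence at a 10% tolerance; the statistic doesn’t change, only the risk you are prepared to carry.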

For the manager, the ‘so what’ of all this is that it is worthwhile asking ‘how significant?’ or asking to see the p-value when you are told that a particular statistic is significant. Not only will this impress (or scare) your data analyst, it will give you the opportunity to exercise your own risk tolerance rather than rely on an arbitrary research standard or statistical habit developed for other purposes. Managers make decisions about risk all the time. You can decide that you are happy to act on a result with p = 0.104 (roughly a one in ten chance of being wrong); it is the manager’s call.

The final piece of the ‘significance’ puzzle is the size of the effect that you expect to see. HR and workforce data is often quite complex, and the effect of one variable on another is not likely to be large. Let me give you an example. In some parts of the management literature, strong claims are made about the impact that employee engagement has on absence due to sickness, and about how, if managers drive up employee engagement, they will reduce employee absence.

I've done a lot of work trying to understand this, and what I've found is that there is a relationship between engagement and sickness absence, but (statistically speaking) it is not very strong. That makes sense when you think about all of the other things that affect absence from work due to sickness (base level of health, family support, workplace policies on attending when unwell, etc.). Fortunately, I was working with a rather large data set, so the relationship was statistically significant; in a smaller data set this might not have been the case. This doesn't mean the relationship only exists in large data sets and not in small ones. It just means that I could detect it in this data set because the amount of data I had overcame the relative insensitivity of the (pretty standard) statistical test I was using.
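A minimal simulation illustrates the point. The numbers here are made up: I assume a weak underlying link between engagement and sickness absence (a correlation of roughly 0.1) and then run a standard Pearson correlation test at different sample sizes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_relationship(n, effect=0.1):
    """Test a weak engagement-absence link at a given sample size."""
    engagement = rng.normal(size=n)
    # Absence is driven mostly by other factors, with a small engagement effect.
    absence = -effect * engagement + rng.normal(size=n)
    r, p = stats.pearsonr(engagement, absence)
    return r, p

for n in (50, 500, 5000):
    r, p = simulate_relationship(n)
    print(f"n={n:5d}: r = {r:+.2f}, p = {p:.3f}")

# The data-generating process is identical in every run; only the amount of
# data changes whether the standard test can detect the weak relationship.
```

With 50 cases the weak relationship usually fails the conventional significance test; with 5,000 it sails through. The effect hasn't changed, only our ability to see it.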

So, what does all this mean for a manager? Just because a result isn't statistically significant doesn't mean the effect doesn't exist; the test might not be sensitive enough, or you might not have enough data to detect the effect. That doesn't mean we ignore the results of statistical tests (a very likely cause of a non-significant result is that the effect simply doesn't exist), but nor does it mean we should abandon what appears to be a sensible line of thought purely because of a test result. Instead, we might look for other evidence of the relationship.
Which brings me to the other aspect of a ‘real’ difference: does the difference have any practical meaning?

More on that next time…

 

Thanks for taking the time to read this post.

 

Photo credit:

Photo by kevin dooley - Creative Commons Attribution License  https://www.flickr.com/photos/12836528@N00
