Is there room for both the p-value and confidence interval?

Post date: Jun 20, 2015

In epidemiological circles, there’s a general consensus that the p-value is overly relied upon, and that language about statistical significance should be minimized in favor of language about statistical precision and magnitude of effects. In fact, several pure epidemiology journals preclude reporting p-values under certain circumstances, such as when inferring causality. This age-old issue is therefore probably most relevant to the bulk of the peer-reviewed health outcomes literature: clinically-oriented journals. Recognizing this, Grimes and Schulz (Lancet. 2002 Jan 5;359(9300):57-61.) advocated abandoning the p-value, concluding that "testing null hypotheses at arbitrary p values of 0·05 has no basis in medicine and should be discouraged."

The arbitrary cutoff of the p-value is essentially an attempt to make objective something that is really subjective. Why not a 4% or 6% alpha level? I once asked a statistician that question, and he summed it up as: people like things divisible by five. Sounds as plausible a reason as any! So what’s a researcher supposed to do if he or she can’t, or shouldn’t, report the p-value? Just present the estimates and their precision, and let the readers judge the merit. For example, consider these two statements of findings.

  1. We observed an OR of 1.5 (p=0.05), although it did not meet statistical significance and was a borderline effect.
  2. We observed an OR of 1.5 (95% CI: 1.0, 2.1).

Personally, I would much rather see the second statement. No hand-waving at what the p-value means. How do we get away from this hand-waving? I’m not entirely sure, but the peer review process is probably a good starting point. So if you're asked to do a peer review, think about the statistical language and whether it is appropriate.
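For concreteness, here's a minimal sketch (in Python, with made-up counts) of how both reporting styles come out of the same 2x2 table: the Wald-type CI and the p-value are both built from the log-OR and its standard error, so they carry overlapping information.

```python
from math import log, exp, sqrt, erf

def or_ci_p(a, b, c, d, z_crit=1.96):
    """OR, Wald 95% CI, and two-sided p-value from a 2x2 table:
    a/b = exposed cases/controls, c/d = unexposed cases/controls."""
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1/a + 1/b + 1/c + 1/d)            # Woolf SE of log(OR)
    lo = exp(log(odds_ratio) - z_crit * se)
    hi = exp(log(odds_ratio) + z_crit * se)
    z = log(odds_ratio) / se
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal p
    return odds_ratio, lo, hi, p

# Hypothetical counts, chosen purely for illustration
or_, lo, hi, p = or_ci_p(30, 70, 20, 80)
print(f"OR = {or_:.2f}, 95% CI: {lo:.2f}, {hi:.2f}, p = {p:.3f}")
```

Either reporting style can be produced from the last line, but the CI version shows the reader the whole plausible range rather than a single threshold verdict.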

But then you may raise the question: "Should we let hypothesis testing go altogether?" Well, not really. I use p-values to help me make decisions about certain statistical procedures, but I don't overly depend on them. For example, say I'm deciding which confounders to include in an analysis. I may decide I'll only include a confounder that is associated with the exposure or outcome at the p=0.20 level. BUT, if I have a strong theoretical basis for including it, or if it meaningfully changes the exposure/outcome estimate, I'll include it regardless of the p-value. And some tests will ONLY give you a p-value; for example, if you're fitting a Cox proportional hazards model (aka survival analysis) and checking whether the proportional hazards assumption is violated. Therefore this discussion is apropos to causal inference more than anything.
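That confounder-selection rule can be sketched as a simple decision function. The name, thresholds, and inputs here are illustrative conventions I've made up for the example, not a standard API; the p=0.20 and ~10% change-in-estimate cutoffs are common rules of thumb, not laws.

```python
def include_confounder(p_value, or_crude, or_adjusted,
                       p_threshold=0.20, change_threshold=0.10,
                       theory_says_include=False):
    """Keep a covariate if theory demands it, if adjusting for it shifts the
    exposure-outcome OR by more than ~10%, or if its association with the
    exposure or outcome has p below 0.20. Thresholds are conventions."""
    change = abs(or_adjusted - or_crude) / or_crude
    return theory_says_include or change > change_threshold or p_value < p_threshold

# p = 0.35 alone would drop this covariate, but a 25% shift in the OR keeps it
print(include_confounder(0.35, or_crude=2.0, or_adjusted=2.5))   # True
print(include_confounder(0.35, or_crude=2.0, or_adjusted=2.05))  # False
```

The point of the sketch is that the p-value is only one input to the decision, and it can be overridden in either direction.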

Another example: let's say you conducted a study and found the association of smoking with lung cancer had a p-value of 0.30. Would you conclude they are unrelated and refute all the previous evidence? Probably not. You would want to get at WHY this happened. Perhaps your sample was too small, or biased, or you mismeasured smoking or lung cancer. The precision around your estimates is one piece of evidence. Or suppose you conducted a study and found the association of MMR vaccine with autism had a p-value of 0.001. Sounds pretty strong, right? Well, again, what are the possible reasons this occurred? Merely stating the p-value and treating it as gospel is what most epidemiologists advocate moving away from.

As implied earlier, there are times you have to use a p-value: some tests only give you a p-value by which to judge the "significance" of the result. In that case it's about probability: how likely you are to see your result, or one more extreme, if the null hypothesis were true. But when you have the option to use a confidence interval as your measure of uncertainty, it just makes so much more sense. It lets the reviewer of the information know so much more. Consider these other examples:

  1. OR = 3.5, 95% CI: 2.7, 4.1
  2. OR = 3.5, 95% CI: 1.1, 7.2
  3. OR = 3.5, p = 0.02

All report the same OR of 3.5, a more than 3-fold increase in “risk” in this made-up example. But the added information from the CI is useful: the first CI is tight around the estimate of 3.5, while the second is wide, with the potential “risk” extending all the way from barely elevated to over 7 times the risk. People are generally more comfortable with estimates that have a tighter CI (it usually means less variability and often comes with a larger sample size). But what is the p-value of 0.02 telling me? OK, the OR is "significant," but I have no idea of the variance around the estimate, and I don't know whether the OR is significant because it's a true association or because I have a huge sample size.
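In fact, the asymmetry goes one way: assuming a Wald-type CI on the log-odds scale, you can back an approximate p-value out of a reported OR and CI (a standard back-calculation), but you cannot recover the CI from a p-value alone. A sketch, using the illustrative numbers above; since they are made up rather than internally consistent, the recovered p-values are only rough.

```python
from math import log, sqrt, erf

def p_from_or_ci(odds_ratio, lo, hi, z_crit=1.96):
    """Approximate two-sided p-value from an OR and its 95% CI, by
    recovering the SE of log(OR) from the CI width on the log scale."""
    se = (log(hi) - log(lo)) / (2 * z_crit)
    z = log(odds_ratio) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

print(p_from_or_ci(3.5, 2.7, 4.1))  # narrow CI -> vanishingly small p
print(p_from_or_ci(3.5, 1.1, 7.2))  # wide CI -> much larger p, ~0.009 here
```

Examples 1 and 2 above report the same OR yet imply very different p-values, which is exactly why the CI is the more informative summary.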

P-values provide us with some kind of measure by which to judge associations, but we have definitely become highly dependent on them, and I don't think that's going to change: it's much easier to teach someone "p-value < 0.05 = good" than to get into what a CI is, how to interpret it, and so on. I would summarize by saying the p-value can be used for a superficial look at whether something is related, but you should dig further to try to explain the association.