Understanding basic statistics
From Pubdrug
Introduction
A basic knowledge of statistics is required to interpret the clinical trial review pages on PubDrug. Most of the statistics reported deal with p-values, confidence intervals, and risk reduction. You will also see hazard ratios and numbers needed to treat. These terms are explained in this article. With this basic understanding you can apply clinical trial reviews to everyday practice, keeping in mind that a treatment effect may be "statistically significant" without being "clinically significant," that is, without having a noticeable impact on the care of your patients.
P-value
The p-value is the probability of obtaining a test statistic at least as extreme as the observed one, with the assumption that the null hypothesis (in common pharmacy literature, that the effect of treatment with a drug is not different from placebo or another drug) is true. The p-value can only reject or fail to reject the null hypothesis, not prove it to be true or false. The p-value is also not to be used to determine the strength of evidence for the alternative hypothesis. A "statistically significant" p-value, traditionally reported in medical literature as p<0.05, simply means that there is statistical evidence that there is a difference between two groups. It does not mean that the difference is especially large or especially clinically relevant.
The significance level of a test (α) is chosen before the data are analyzed; it is not determined by the p-value of the test. Authors who report "highly" statistically significant data are misinterpreting the meaning of the p-value.
"Naked" p-values are p-values reported without effect sizes alongside them, and are to be interpreted with extreme caution. Small p-values do not necessarily denote "strong" statistical significance, as the p-value may be small as a result of a large sample size or low variability. In addition, one can eventually reject the null hypothesis with a large enough sample size, obtaining statistical significance with a trivially small difference between groups.
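The effect of sample size on the p-value can be sketched with a short calculation. The example below is illustrative only: the event counts are hypothetical, and the pooled two-proportion z-test is an assumed choice of test, not one named in this article. The same half-percentage-point difference (10.0% vs 10.5%) is tested in a small trial and in a very large one.

```python
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical event rates of 10.0% vs 10.5% in both scenarios.
small_trial = two_proportion_p_value(100, 1_000, 105, 1_000)
large_trial = two_proportion_p_value(100_000, 1_000_000, 105_000, 1_000_000)

print(f"n = 1,000 per group:     p = {small_trial:.3f}")
print(f"n = 1,000,000 per group: p = {large_trial:.3g}")
```

The difference between groups is identical in both runs; only the sample size changes, and with it whether the result crosses p<0.05. This is why the effect size should be read alongside the p-value.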
Risk and risk reduction
Relative risk and relative risk reduction
In a clinical drug trial, the relative risk (RR) is the ratio of the rate of a medical event (such as myocardial infarction or hip fracture), or of developing or continuing to have a disease or infection (such as diabetes mellitus or pneumonia), in a group of individuals using a therapy to the rate of the same event or disease in a control group (usually taking a placebo). Mathematically, this can be represented as:
$$\mathrm{RR} = \frac{\text{event rate in treatment group}}{\text{event rate in control group}}$$
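The more useful statistic derived from relative risk is relative risk reduction (RRR). Relative risk reduction is simply the percentage by which risk is reduced with a treatment relative to the rate of the event in the control group. It is represented mathematically by:
$$\mathrm{RRR} = 1 - \mathrm{RR} = \frac{\text{event rate in control group} - \text{event rate in treatment group}}{\text{event rate in control group}}$$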
Example
A new (fictitious) drug called Glucogone is put to the test in a clinical trial. The primary endpoint of the trial is mortality from diabetes. Of the 1000 people who took Glucogone for five years, 70 people died from diabetes-related causes, while 210 out of 1000 people taking placebo died in this manner.
| Treatment | Subjects | Events | Event Rate | RR death | RRR in death |
|---|---|---|---|---|---|
| placebo | 1000 | 210 | 0.21 or 21% | 0.07/0.21 = 0.33 or 33% | 1-0.33 = 0.67 or 67% |
| Glucogone | 1000 | 70 | 0.07 or 7% |
So, in this example, we see that the event rates are used directly to calculate relative risk. In this case, there was a 21% event rate of death from a diabetes-related cause in the placebo group, and a 7% event rate in the Glucogone group. The relative risk of a patient dying while using Glucogone, therefore, is just 33% compared to placebo. Using the formula for relative risk reduction, we see that by using Glucogone, there is a 67% relative risk reduction in death compared to placebo.
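The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch using the Glucogone figures above; the function name `relative_risk` is made up for illustration.

```python
def relative_risk(events_treated, n_treated, events_control, n_control):
    """Ratio of the event rate in the treatment group to that in the control group."""
    treated_rate = events_treated / n_treated
    control_rate = events_control / n_control
    return treated_rate / control_rate

# Glucogone: 70 deaths out of 1000; placebo: 210 deaths out of 1000.
rr = relative_risk(70, 1000, 210, 1000)
rrr = 1 - rr

print(f"RR = {rr:.2f}, RRR = {rrr:.2f}")
```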
Absolute risk reduction
Calculating absolute risk reduction (ARR) uses many of the same numbers used above in relative risk but instead measures the increase or decrease in rate of events in an intervention group compared to (but not relative to) a control or placebo group. It is represented mathematically by this equation:
$$\mathrm{ARR} = \text{event rate in control group} - \text{event rate in treatment group}$$
Example
Using the same example from above, we can calculate the absolute risk reduction:
| Treatment | Subjects | Events | Event Rate | RRR in death | ARR in death |
|---|---|---|---|---|---|
| placebo | 1000 | 210 | 0.21 or 21% | 0.67 or 67% | 0.21-0.07 = 0.14 or 14% |
| Glucogone | 1000 | 70 | 0.07 or 7% |
The drug reduced the rate of death from diabetes-related causes by 14 percentage points (21%-7%). This is compared to the relative risk reduction, which was 67% (1-(7%/21%)). The relative risk reduction can also be calculated from the absolute risk reduction by dividing the ARR by the event rate in the control group (14%/21% = 67%).
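As a quick check of the relationship between the two measures, the sketch below computes the ARR from the example's event counts and then recovers the RRR by dividing by the control event rate. The variable names are illustrative only.

```python
# Event rates from the Glucogone example.
control_rate = 210 / 1000   # placebo
treated_rate = 70 / 1000    # Glucogone

# Absolute risk reduction: a plain difference of event rates.
arr = control_rate - treated_rate

# Relative risk reduction recovered from the ARR.
rrr_from_arr = arr / control_rate

print(f"ARR = {arr:.2f}, RRR = {rrr_from_arr:.2f}")
```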
Number needed to treat
The number needed to treat (NNT) is the number of patients who must be given an intervention (treatment) in order to decrease the number of events seen by one. It is calculated as the inverse of the ARR:
$$\mathrm{NNT} = \frac{1}{\mathrm{ARR}}$$
Example
Using the same example from above, we can calculate the number needed to treat:
| Treatment | Subjects | Events | Event Rate | ARR in death | NNT |
|---|---|---|---|---|---|
| placebo | 1000 | 210 | 0.21 or 21% | 0.21-0.07 = 0.14 or 14% | 1/0.14 = 7.14 or 7 |
| Glucogone | 1000 | 70 | 0.07 or 7% |
We would need to treat approximately seven patients with Glucogone for five years (the duration of the study as described above) to see one fewer death from diabetes-related causes.
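The same calculation as a minimal Python sketch, with a hypothetical helper name. The ARR must be entered as a proportion (0.14), not as a percentage (14).

```python
def number_needed_to_treat(control_rate, treated_rate):
    """Inverse of the absolute risk reduction; rates are proportions, not percentages."""
    arr = control_rate - treated_rate
    return 1 / arr

nnt = number_needed_to_treat(0.21, 0.07)

# Reported to the nearest whole patient, as in the table above.
print(f"NNT = {nnt:.2f}, or about {round(nnt)} patients")
```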
Number needed to harm
The number needed to harm (NNH) is the required number of patients to be given an intervention (treatment) to cause harm to one patient. It is rare that a journal article will calculate NNH, but it can be calculated as the inverse of the absolute risk increase (ARI), which is simply the absolute difference between the rate of an adverse event in the intervention group and the rate of the same adverse event in the control group.
The goal of any treatment is as high an NNH as possible. If a drug's NNH is lower than its NNT, especially for a serious side effect, the risks of taking the drug outweigh the benefits, and the drug is generally not used. There are some examples of drugs that have a very low NNH but are used because the NNT is also low, such as warfarin and flecainide. Some chemotherapy medications may have a lower NNH than NNT but are still used because no other treatments are available.
An example of a drug that was removed from the market due to a low NNH is encainide. A 1991 study[1] in the New England Journal of Medicine found that patients taking the antiarrhythmics encainide or flecainide were more likely to die or have a cardiac arrest than patients taking placebo, reflecting the proarrhythmic effects of these drugs. Data from the study are presented below:
| Treatment | Subjects | Death/cardiac arrest | Event rate | ARI | NNH |
|---|---|---|---|---|---|
| placebo | 743 | 26 | 3.50% | 8.34%-3.50% = 4.85% | 1/0.0485 = 20.6 or 21 |
| flecainide or encainide | 755 | 63 | 8.34% |
The NNH due to death or cardiac arrest specifically from arrhythmia was slightly higher at 28.
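The NNH for death or cardiac arrest can be recomputed from the raw counts in the table. This is a small sketch with illustrative variable names; rounding up to the next whole patient matches the value of 21 in the table.

```python
import math

# Death or cardiac arrest counts from the table above.
placebo_rate = 26 / 743
drug_rate = 63 / 755

# Absolute risk increase: event rate on the drug minus event rate on placebo.
ari = drug_rate - placebo_rate
nnh = 1 / ari

print(f"ARI = {ari:.2%}, NNH = {nnh:.1f}, or {math.ceil(nnh)} patients")
```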
Example
Using our example above, let's say sexual dysfunction was seen in 7 subjects on Glucogone and 2 subjects using placebo:
| Treatment | Subjects | Sexual dysfunction | Event Rate | ARI | NNH |
|---|---|---|---|---|---|
| placebo | 1000 | 2 | 0.2% | 0.7%-0.2% = 0.5% | 1/0.005 = 200 |
| Glucogone | 1000 | 7 | 0.7% |
Thus, we would need to treat 200 patients with Glucogone for five years to see one additional case of sexual dysfunction.
Confidence intervals
In statistics, a confidence interval (CI) is a range of values, calculated from the study data, that is likely to contain the true population value (such as a mean). Most often, there will be a number in percentage form before the abbreviation "CI"; this is the confidence level. If the experiment were repeated many times, that percentage of the resulting intervals would be expected to contain the true value. The confidence level is calculated using the formula (1-α), where α is the significance level, so α = 0.05 corresponds to a 95% CI. For example:
Mean 0.53 (95% CI 0.45 to 0.62)
Though the mean observed in this experiment was 0.53, we can be 95% confident that the true mean lies between 0.45 and 0.62.
For ratio measures such as relative risk and hazard ratio, the general rule is that if the confidence interval does not include the number 1 in its range, the measurement is statistically significant. For difference measures, such as an absolute risk reduction reported as a percentage, the reference value is 0 instead: if the confidence interval does not include 0%, the measurement is statistically significant.
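To make the rule concrete, the sketch below computes a 95% CI for the relative risk in the Glucogone example and checks whether it includes 1. The method is an assumption on our part, not something taken from the trial reviews: it uses the common large-sample approximation on the log scale, with the standard error of ln(RR) estimated from the event counts. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def relative_risk_ci(events_treated, n_treated, events_control, n_control, alpha=0.05):
    """Relative risk with a (1 - alpha) confidence interval (log method)."""
    rr = (events_treated / n_treated) / (events_control / n_control)
    # Standard error of ln(RR) under the large-sample log approximation.
    se_log_rr = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    # Critical value: about 1.96 when alpha = 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

rr, lower, upper = relative_risk_ci(70, 1000, 210, 1000)

print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print("statistically significant" if not (lower <= 1 <= upper) else "not significant")
```

Because relative risk is a ratio, the interval is compared against 1. The same check on an absolute risk reduction would compare its interval against 0.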
Hazard ratios
The hazard ratio (HR) is the ratio of the hazard (the rate at which events occur at a given moment) in one group to the hazard in another group, typically a group given an experimental treatment compared with a control group. The hazard ratio gives information about the odds of a patient healing faster or living longer, but does not give information about how much faster the patient will heal or how much longer the patient will live.
A common assumption is that the hazard ratio is constant over time, which may not be the case. In clinical trials, the hazard ratio indicates the relative chance of the outcome at any given point in time. For example, a hazard ratio for death of 0.5 would mean that, at any given point in time, patients taking the medicine are half as likely to die as those taking the placebo. It does not mean that the drug doubled life expectancy from the time it was started.

