Fragility Index Calculator
Calculates the number of patients required to lose statistical significance
RESULTS
A fragility index of 8 indicates that if 8 patients in the experimental group were "converted" from NOT having the primary endpoint to HAVING the primary endpoint, the study would lose statistical significance (p ≥ 0.05). The higher the fragility index, the more robust the results of a study are. See What Is an "Acceptable" Fragility Index? below for help interpreting this value.
| | Original | Change | After conversion |
|---|---|---|---|
| Control group with outcome (N) | 50 | | 50 |
| Control group without outcome (N) | 450 | | 450 |
| Experimental group with outcome (N) | 25 | + 8 | 33 |
| Experimental group without outcome (N) | 475 | - 8 | 467 |
| P value | 0.004 | | 0.066 |
About This Calculator
The fragility index is a measure of the robustness (or fragility) of the results of a clinical trial. It is the number of patients whose outcome would have to change to convert a trial from statistically significant to not significant (p ≥ 0.05). The larger the fragility index, the more robust a trial's data are. The fragility index is intended to be used in conjunction with the P value, 95% confidence interval, and measures of benefit or risk (relative risk reduction, absolute risk reduction, etc.).
How Is the Fragility Index Calculated?
The fragility index is calculated by converting one patient at a time in the group with the fewest events (control or experimental) from a "non-event" to an "event" outcome and recalculating a two-sided Fisher's exact test after each conversion until the P value meets or exceeds 0.05. The number of conversions required is the fragility index. In essence, the calculation describes how many patients in the group with the fewest events would have needed a different outcome to make the study's results not statistically significant.
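The procedure described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the calculator's actual source code. The function names, the `alpha` parameter, the tie-breaking rule when both groups have the same number of events, and the convention used for the two-sided P value (summing every table that is no more probable than the observed one) are assumptions made for this example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows are groups; columns are events and non-events.
    """
    group1 = a + b
    events = a + c
    total = a + b + c + d
    non_events = total - events

    def weight(k):
        # Number of tables with k events in group 1 and the observed margins.
        return comb(events, k) * comb(non_events, group1 - k)

    lowest = max(0, group1 - non_events)
    highest = min(events, group1)
    observed = weight(a)
    # Assumed convention: sum every table no more probable than the observed one.
    # Integer weights keep the tie comparison exact.
    extreme = sum(w for w in map(weight, range(lowest, highest + 1)) if w <= observed)
    return extreme / comb(total, group1)


def fragility_index(ctrl_events, ctrl_non, exp_events, exp_non, alpha=0.05):
    """Count non-event -> event conversions needed until P >= alpha.

    Returns 0 if the trial is already non-significant, or None if the
    group runs out of non-events before significance is lost.
    """
    # Convert patients in the group with fewer events; ties go to the
    # experimental group (an arbitrary choice for this sketch).
    convert_exp = exp_events <= ctrl_events
    conversions = 0
    while fisher_exact_two_sided(ctrl_events, ctrl_non, exp_events, exp_non) < alpha:
        if convert_exp:
            if exp_non == 0:
                return None
            exp_events, exp_non = exp_events + 1, exp_non - 1
        else:
            if ctrl_non == 0:
                return None
            ctrl_events, ctrl_non = ctrl_events + 1, ctrl_non - 1
        conversions += 1
    return conversions


if __name__ == "__main__":
    print(fragility_index(50, 450, 25, 475))
```

Called with the counts from the results table (50/450 in the control group, 25/475 in the experimental group), `fragility_index(50, 450, 25, 475)` should return 8, in agreement with the calculator output shown earlier. A trial that is already non-significant by Fisher's exact test returns 0.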
What Is an "Acceptable" Fragility Index?
There is no specific fragility index that is accepted as being a "good" (robust) or "bad" (fragile) value. In the manuscript introducing the concept, a convenience sample of 399 randomized controlled trials with statistically significant results from five high-impact journals was analyzed [1]. The study reported the following results, which describe the landscape of the fragility index among clinical trials that are generally considered high-quality or "robust" studies:
- The median fragility index was 8 (IQR 3 to 18)
- 25% of trials had a fragility index ≤ 3
- 10% of trials had a fragility index of zero (see Fragility Index of Zero for an explanation)
- When loss to follow-up was reported, 52.9% of trials had a fragility index smaller than the number of patients lost to follow-up
At least two other publications have summarized the fragility index within other areas of medicine:
Figure: Median Fragility Index in RCTs (error bars indicate the interquartile range [IQR] of each publication)
- Walsh 2014 [1]: Included 399 randomized controlled trials (RCTs) from five high-impact journals
- Evaniew 2015 [2]: Included 40 orthopedic spine surgery RCTs with a variety of different endpoints
- Ridgeon 2016 [3]: Included 56 critical care RCTs with an endpoint of mortality
Aside from comparing a study's fragility index to the fragility index of other trials, another method of interpreting a fragility index is to compare the value against the number of patients lost to follow-up. As a general rule of thumb, if the number of patients lost to follow-up is greater than the fragility index, the study should be considered less robust. The three previously mentioned publications also reported the percent of trials in which the number lost to follow-up exceeded the fragility index:
Figure: Trials with Higher Fragility Index than Number Lost to Follow-Up (Walsh 2014 [1], Evaniew 2015 [2], Ridgeon 2016 [3])
Fragility Index of Zero
By definition, the fragility index is calculated using a Fisher's exact test [1]. Other methods, such as a chi-squared test, are commonly used to compare a dichotomous (binary) outcome in clinical trials. Particularly for small trials, the P value from a Fisher's exact test can differ from that of a chi-squared test. When a Fisher's exact test produces a non-significant P value for the original data (before any patient is "converted" from a non-event to an event), the fragility index is reported as 0, which emphasizes the lack of robustness of the trial data.
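To illustrate how the two tests can disagree, the sketch below compares an uncorrected Pearson chi-squared test with a two-sided Fisher's exact test on a small, entirely hypothetical trial (6 of 10 control patients and 1 of 10 experimental patients with the outcome). The function names and the choice of an uncorrected chi-squared test are assumptions for this example; a study may use a different variant.

```python
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    group1, events, total = a + b, a + c, a + b + c + d
    non_events = total - events

    def weight(k):
        # Number of tables with k events in group 1 and the observed margins.
        return comb(events, k) * comb(non_events, group1 - k)

    ks = range(max(0, group1 - non_events), min(events, group1) + 1)
    observed = weight(a)
    # Assumed convention: sum every table no more probable than the observed one.
    return sum(w for w in map(weight, ks) if w <= observed) / comb(total, group1)


def chi_squared_p(a, b, c, d):
    """Pearson chi-squared P value (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-squared distribution with 1 df: erfc(sqrt(x / 2)).
    return erfc(sqrt(statistic / 2))


if __name__ == "__main__":
    # Hypothetical small trial: control 6 events / 4 non-events,
    # experimental 1 event / 9 non-events.
    table = (6, 4, 1, 9)
    print(f"chi-squared P = {chi_squared_p(*table):.3f}")
    print(f"Fisher exact P = {fisher_exact_two_sided(*table):.3f}")
```

For this hypothetical table the chi-squared P value falls below 0.05 (roughly 0.02) while the Fisher's exact P value is about 0.057, so a trial reported as significant by chi-squared testing would have a fragility index of 0.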
Limitations of the Fragility Index
Because of how a fragility index is calculated, there are several inherent limitations to its use:
- Use of Fisher's exact test for statistical significance - In the manuscript describing the fragility index [1], the authors chose a Fisher's exact test for the calculation procedure. In general, this test tends to be more conservative (more prone to a type II error) than other methods, such as a chi-squared test. This difference in statistical test can lead to a fragility index of zero even though the original study reported statistically significant results.
- Only appropriate for dichotomous (binary) outcomes - The fragility index cannot be applied to an outcome that is a continuous or ordinal variable. For example, a primary endpoint using the modified Rankin scale (an ordinal scale from 0 to 6 measuring disability or death) is not appropriate for a Fisher's exact test on a 2×2 table (and thus not appropriate for a fragility index calculation). Some studies do, however, dichotomize such an endpoint. For example, an endpoint of "good outcome" (modified Rankin scale of 0 to 3) versus "bad outcome" (modified Rankin scale of 4 to 6) converts the scale to a binary variable, in which case a fragility index can be calculated.
- Possibly not appropriate for time-to-event outcomes - Although many time-to-event outcomes are dichotomous (eg, mortality), the fragility index does not account for when events occur. Particularly in longer studies with variable follow-up periods and treatment durations, an analysis that accounts for time (such as a Kaplan-Meier analysis or Cox proportional hazards model) is more appropriate than a simple binary outcome analysis.
- No standard fragility index "cut-off" - As discussed above (What Is an "Acceptable" Fragility Index?), there is no specific cut-off or lower limit of the fragility index to classify a study as "fragile" or "robust". To a large extent, interpreting the fragility index alongside the trial's results is inherently subjective.
References and Additional Reading
1. Walsh M, Srinathan SK, McAuley DF, et al. The statistical significance of randomized controlled trial results is frequently fragile: a case for a Fragility Index. J Clin Epidemiol. 2014 Jun;67(6):622-8. doi: 10.1016/j.jclinepi.2013.10.019. PMID 24508144
2. Evaniew N, Files C, Smith C, et al. The fragility of statistically significant findings from randomized trials in spine surgery: a systematic survey. Spine J. 2015 Oct 1;15(10):2188-97. doi: 10.1016/j.spinee.2015.06.004. PMID 26072464
3. Ridgeon EE, Young PJ, Bellomo R, et al. The Fragility Index in Multicenter Randomized Controlled Critical Care Trials. Crit Care Med. 2016 Jul;44(7):1278-84. doi: 10.1097/CCM.0000000000001670. PMID 26963326