Fragility Index Calculator

Calculates the number of patients whose outcomes would need to change for a trial to lose statistical significance

### Study Data

The fragility index measures the robustness (or fragility) of the results of a clinical trial. It is the number of patients whose outcome would have to change from a "non-event" to an "event" to turn a statistically significant result into a non-significant one (P ≥ 0.05). The larger the fragility index, the more robust a trial's data are. The fragility index is intended to be used in conjunction with the P value, the 95% confidence interval, and measures describing benefit or risk (relative risk reduction, absolute risk reduction, etc.).

#### How Is the Fragility Index Calculated?

The fragility index is calculated by converting one patient at a time from a "non-event" to an "event" outcome in the group (control or experimental) with the fewest events, and recalculating a two-sided Fisher's exact test after each conversion until the P value meets or exceeds 0.05. The number of conversions needed is the fragility index. In essence, the calculation describes how many patients in that group would have had to have a different outcome to make a study's results not statistically significant.
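The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function names `fisher_exact_two_sided` and `fragility_index` are hypothetical, and the two-sided P value is assumed to be the sum of the probabilities of all tables that are no more likely than the observed one (a common convention for Fisher's exact test).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are (events, non-events).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in group 1, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed table
    # (the small tolerance guards against floating-point ties)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fragility_index(events_a, total_a, events_b, total_b, alpha=0.05):
    """Count non-event -> event conversions needed to reach P >= alpha."""
    # Always modify the group with the fewest events
    if events_a > events_b:
        events_a, total_a, events_b, total_b = (
            events_b, total_b, events_a, total_a)
    flips = 0
    while events_a < total_a:
        p = fisher_exact_two_sided(events_a, total_a - events_a,
                                   events_b, total_b - events_b)
        if p >= alpha:
            break
        events_a += 1  # convert one patient from non-event to event
        flips += 1
    return flips


# Hypothetical trial: 1/20 events in one group versus 9/20 in the other
print(fragility_index(1, 20, 9, 20))  # 2
```

In this made-up example the original P value is about 0.008; after one conversion (2/20 vs 9/20) it is about 0.031, and after two (3/20 vs 9/20) it is about 0.082, so the fragility index is 2.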

#### What Is an "Acceptable" Fragility Index?

There is no specific fragility index that is accepted as being a "good" (robust) or "bad" (fragile) value. In the manuscript introducing the concept, a convenience sample of 399 randomized controlled trials with statistically significant results from five high-impact journals was analyzed.¹ This study found the following results, which describe the landscape of the fragility index among clinical trials that are generally considered high-quality or "robust" studies:

• The median fragility index was 8 (IQR 3 to 18)
• 25% of trials had a fragility index ≤ 3
• 10% of trials had a fragility index of zero (see Fragility Index of Zero for an explanation)
• When loss to follow-up was reported, 52.9% of trials had a larger fragility index compared to the number of patients lost to follow-up

At least two other publications have summarized the fragility index within other areas of medicine:

#### Median Fragility Index in RCTs

[Figure: bar chart of the median fragility index reported in each publication; error bars indicate the interquartile range (IQR).]

• Walsh 2014¹: Included 399 randomized controlled trials (RCTs) from five high-impact journals
• Evaniew 2015²: Included 40 orthopedic spine surgery RCTs with a variety of different endpoints
• Ridgeon 2016³: Included 56 critical care RCTs with an endpoint of mortality

Aside from comparing a study's fragility index to the fragility index of other trials, another method of interpreting a fragility index is to compare the value against the number of patients lost to follow-up. As a general rule of thumb, if the number of patients lost to follow-up is greater than the fragility index, the study should be considered less robust. The three previously mentioned publications also reported the percent of trials in which the number lost to follow-up exceeded the fragility index:

#### Trials with Higher Fragility Index than Number Lost to Follow-Up

[Figure: bar chart comparing, for each of the three publications listed above, the fragility index with the number of patients lost to follow-up.]
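The rule of thumb described above, comparing loss to follow-up with the fragility index, amounts to a single comparison. The sketch below is purely illustrative; the function name `lost_exceeds_fragility` and the numbers are hypothetical.

```python
def lost_exceeds_fragility(lost_to_follow_up: int, fragility_index: int) -> bool:
    """Return True when more patients were lost than the fragility index.

    By the rule of thumb, True suggests the result is less robust: the
    unknown outcomes of the missing patients could have been enough to
    change the conclusion.
    """
    return lost_to_follow_up > fragility_index


# Hypothetical trial: fragility index of 3, 10 patients lost to follow-up
print(lost_exceeds_fragility(10, 3))  # True
```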

#### Fragility Index of Zero

By definition, the fragility index is calculated using a Fisher's exact test.¹ Other methods, such as a chi-squared test, are commonly used to compare a dichotomous (binary) outcome in clinical trials. Particularly for small trials, the P value from a Fisher's exact test can differ from that of a chi-squared test. When a trial reported a statistically significant result using another test, but a Fisher's exact test on the same data produces a non-significant P value (without "converting" any patient from a non-event to an event), the fragility index is reported as 0, which clearly emphasizes the lack of robustness of the trial data.
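A small numerical sketch can show how the two tests disagree. It assumes a Pearson chi-squared test without continuity correction (1 degree of freedom) and a two-sided Fisher's exact test that sums all tables no more likely than the observed one; the function names and the 3/20 versus 9/20 data are hypothetical.

```python
from math import comb, erfc, sqrt


def fisher_p(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    probs = [comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(max(0, col1 - row2), min(row1, col1) + 1)]
    p_obs = comb(row1, a) * comb(row2, col1 - a) / denom
    # Sum the tables that are no more likely than the observed one
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def chi_squared_p(a, b, c, d):
    """Pearson chi-squared P value (no continuity correction, 1 df)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail equals erfc(sqrt(stat / 2))
    return erfc(sqrt(stat / 2))


# Hypothetical small trial: 3/20 events versus 9/20 events
table = (3, 17, 9, 11)
print(chi_squared_p(*table) < 0.05)  # True: significant by chi-squared
print(fisher_p(*table) < 0.05)       # False: not significant by Fisher
```

Here the chi-squared P value is roughly 0.04 while the Fisher's exact P value is roughly 0.08, so a trial reporting the former as significant would have a fragility index of 0.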

#### Limitations of the Fragility Index

Because of how a fragility index is calculated, there are several inherent limitations to its use:

1. Use of Fisher's exact test for statistical significance - In the manuscript describing the fragility index,¹ the authors chose a Fisher's exact test for the calculation procedure. In general, this test tends to be more conservative (more prone to a type II error) than other methods, such as a chi-squared test. This difference in statistical test can lead to a fragility index of zero even though the original study showed statistically significant results.
2. Only appropriate for dichotomous (binary) outcomes - The fragility index cannot be applied to an outcome that is a continuous or ordinal variable. For example, a primary endpoint using the full modified Rankin scale (an ordinal scale from 0 to 6 measuring disability or death) is not appropriate for a Fisher's exact test on a 2x2 table (and thus not appropriate for a fragility index calculation). That said, some studies do dichotomize such an endpoint. For example, an endpoint of "good outcome" (modified Rankin scale of 0 to 3) versus "bad outcome" (modified Rankin scale of 4 to 6) converts the scale to a binary variable, in which case a fragility index can be calculated.
3. Possibly not appropriate for time-to-event outcomes - Although many time-to-event outcomes are dichotomous (eg, mortality), the fragility index does not account for when events occur. Particularly in longer studies with variable follow-up periods and treatment durations, an analysis accounting for time (such as a Kaplan-Meier curve or Cox proportional hazards model) is more appropriate than a simple binary outcome analysis.
4. No standard fragility index "cut-off" - As discussed above (What Is an "Acceptable" Fragility Index?), there is no specific cut-off or lower limit of the fragility index to classify a study as "fragile" or "robust". To a large extent, the interpretation of the fragility index and the interpretation of the index in conjunction with the trial's results are inherently subjective.
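As an illustration of the second limitation, the sketch below shows how modified Rankin scale scores might be collapsed into the binary "good" (0 to 3) versus "bad" (4 to 6) outcome used in the example above, yielding the event counts a fragility index calculation needs. The function name `dichotomize_mrs`, the `good_max` parameter, and the scores are hypothetical.

```python
def dichotomize_mrs(scores, good_max=3):
    """Split modified Rankin scale scores into (good, bad) counts.

    A score at or below good_max counts as a good outcome; anything
    higher counts as a bad outcome. The threshold of 3 follows the
    example in the text and is an assumption, not a standard.
    """
    good = sum(1 for s in scores if s <= good_max)
    return good, len(scores) - good


# Hypothetical scores for six patients in one treatment group
good, bad = dichotomize_mrs([0, 1, 3, 4, 6, 2])
print(good, bad)  # 4 2
```

The resulting counts for each group form the 2x2 table on which a Fisher's exact test, and therefore a fragility index, can be computed.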

