Appraising multiple-armed RCTs
The RCT design most familiar to most people is probably the standard two-armed, parallel-design, individually randomized trial. The two arms are generally a treatment arm and a control arm (an alternative treatment or a placebo). But RCTs can have more than two arms (multiple-armed RCTs). One example is a three-armed RCT comparing a treatment arm with an inactive control/placebo arm and an alternative active treatment arm. Essentially, multiple-armed RCTs can be appraised using the checklist for the standard two-armed trial. However, some additional issues should be considered.
Does the study present an analysis of the differences between each pair of arms, or does it present an overall analysis of the difference between all the groups (for example an ANOVA test)?
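An "among-group" statistical assessment can be difficult to interpret, especially if you are only interested in one comparison: a significant overall result tells you that the groups differ somewhere, but not which arms differ, so you cannot attribute the result to your group of interest.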
The way you combine data can also affect the results, so you need to watch out for selective data combining.
Why is the RCT looking at more than two arms?
Are the different arms examining related clinical question(s)? Example 1: Two different doses of treatment A versus control B, where the related questions are: should we use treatment A and at which dose? Example 2: Treatment A versus previous gold-standard treatment B versus inactive control C, where the question is should we use treatment A as the new first-line treatment?
Alternatively, are the different arms looking at separate questions that are examined in one trial for efficiency or logistical reasons? For example, new treatment A versus new treatment B versus standard treatment (control) C, where the separate questions are: is new treatment A better than standard treatment C, and is new treatment B better than standard treatment C?
Does the RCT apply any multiplicity correction factor?
It has been suggested that increasing the number of analyses on a particular data set can, in certain cases, increase the chance of a type I error (i.e., identifying a result as significant when the difference is in fact due to chance). For example, if you are interested in a particular outcome in a two-armed trial of treatments A versus B, you have only one comparison of means (A v B); in a three-armed trial of treatments A, B and C, however, you have three different two-way comparisons of means (A v B; A v C; B v C) on the same data set. As the number of arms increases, the number of comparisons also increases: in a four-armed trial of A, B, C and D, you have six different two-way comparisons of means (A v B; A v C; A v D; B v C; B v D; C v D). In general, a trial with k arms allows k(k - 1)/2 two-way comparisons. To compensate for this, some studies employ a Bonferroni or similar correction factor. However, there has been some debate about whether an adjustment is required, depending on the study design, and if so what it should be. There is concern, for example, that applying a Bonferroni or similar correction can increase the likelihood of a type II error (i.e., failing to identify a real difference as significant).
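The arithmetic behind this concern can be sketched in a few lines of Python. This is an illustration only, not a method taken from any particular trial: the function names are hypothetical, the significance level of 0.05 is an assumed convention, and the familywise error formula assumes the comparisons are independent, which is not strictly true when arms of the same trial share data. The figures should therefore be read as a rough guide rather than exact error rates.

```python
from itertools import combinations


def pairwise_comparisons(arms):
    """List every two-way comparison between the trial arms."""
    return list(combinations(arms, 2))


def familywise_error_rate(n_comparisons, alpha=0.05):
    """Chance of at least one false positive across all comparisons.

    ASSUMPTION: the comparisons are treated as independent tests.
    Comparisons within one trial share data, so this is approximate.
    """
    return 1 - (1 - alpha) ** n_comparisons


def bonferroni_threshold(n_comparisons, alpha=0.05):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_comparisons


# Two-, three- and four-armed trials, as in the text above.
for arms in (["A", "B"], ["A", "B", "C"], ["A", "B", "C", "D"]):
    pairs = pairwise_comparisons(arms)
    m = len(pairs)
    labels = "; ".join(f"{x} v {y}" for x, y in pairs)
    print(f"{len(arms)} arms: {m} comparison(s) ({labels})")
    print(f"  uncorrected chance of a type I error: {familywise_error_rate(m):.3f}")
    print(f"  Bonferroni per-comparison threshold:  {bonferroni_threshold(m):.4f}")
```

Under these assumptions, the uncorrected chance of at least one type I error rises from 5% with one comparison to roughly 14% with three and 26% with six, while the Bonferroni threshold for each individual comparison falls from 0.05 to about 0.017 and 0.008. The stricter threshold is also why the correction can make a real difference harder to detect.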
Whatever approach the study takes, it should clearly describe what comparisons and statistical tests it examined and the basis for these. It should comment on the possible interpretations of the result, so that you can decide on the validity of the analysis and then interpret the results.