From the Performance Assessment of the Health and Social Service System, Finnish Institute for Health and Welfare, Helsinki, Finland
Background: The value of randomized controlled trials is dependent on the applicability of their findings to clinical decision-making. The aim of this study is to determine a definition and principles for the applicability of evidence from randomized controlled trials and systematic reviews.
Methods: This narrative review searched studies from PubMed and Web of Science databases using Cochrane Collaboration’s Qualitative Evidence Syntheses guidance. Empirical studies were excluded. Based on the included studies, a definition for the concept and propositions for principles of applicability were formulated.
Results: A definition and 11 propositions are presented, 6 of the propositions having additional sub-propositions. Low risk of bias, ability to answer specific questions, documentation of how the randomized controlled trials actually turned out, reporting of favourable and adverse outcomes, and systematic comparison of randomized controlled trials and clinical data were considered important. Biomedical randomized controlled trials have the widest applicability, while heterogeneity in study characteristics, the presence of human perception and behaviour, environmental and equity factors, and health economic issues lessen applicability. Obtaining applicable evidence is a gradual process. Methodological and substance expertise is necessary for assessing applicability.
Discussion: A definition of applicability and requirements for applicable evidence from randomized controlled trials to real-world contexts are presented. Propositions are suggested for any assessment of applicability of findings from randomized controlled trials, systematic reviews and meta-analyses.
Key words: applicability; generalizability; external validity; transferability; randomized controlled trial; systematic review; meta-analysis; benchmarking controlled trial.
Accepted May 4, 2021; Epub ahead of print May 11, 2021
J Rehabil Med 2021; 53: jrm00202
Correspondence address: Antti Malmivaara, Performance Assessment of the Health and Social Service System, Finnish Institute for Health and Welfare, Mannerheimintie 166, 00270 Helsinki, Finland. E-mail: antti.malmivaara@thl.fi
Doi: 10.2340/16501977-2843
Clinicians’ need for knowledge about a specific patient (or group of patients) is the underlying principle for applicability. Consequently, randomized controlled trials and systematic reviews should document all essential factors needed for clinical decision-making. Documentation of the study protocol (inclusion and exclusion criteria of patients, description of the content of interventions, and the outcome measures) is not sufficient. The documentation must also cover what actually happened in the randomized controlled trial, i.e. the characteristics of the patients, the adherence to the index and control interventions, and the extent of co-interventions. Clinical registers whose documentation is uniform with that of randomized controlled trials increase the applicability of the research findings to clinical practice. The broadest applicability of findings comes from randomized controlled trials that assess the effectiveness of a single biological intervention for a well-defined disease using a valid biological outcome measure. Heterogeneity in study characteristics (patients, interventions and outcomes), and the presence of human perception (diagnosis, interventions and outcomes based on patient perception), behaviour, and environmental and equity factors, lessen the applicability of evidence. Randomized controlled trials must also report probabilities for favourable and adverse outcomes in order to increase the applicability of evidence.
The pivotal question in using the evidence from randomized controlled trials (RCTs) in clinical medicine is contextual: To whom and under what circumstances do the results of this study apply? (1).
The Agency for Healthcare Research and Quality (AHRQ) Effective Health Care Program prefers to use the term “applicability” rather than “generalizability”, and defines it as “the extent to which the effects observed in published studies are likely to reflect the expected results when a specific intervention is applied to the population of interest under ‘real-world’ conditions” (2).
The international guidelines for reporting intervention studies aim for a uniform and transparent reporting that allows assessment of internal and external validity of study results (3–5). These guidelines have been widely endorsed by the leading general medical and specialty journals, and following these is mandatory for researchers submitting papers. Consequently, the definitions and principles of applicability (external validity, generalizability) in these guidelines influence how questions related to applicability are reported.
The Consolidated Standards of Reporting Trials (CONSORT) statement includes guidelines for reporting parallel group randomized trials, and defines generalizability as “external validity, applicability of the trial findings”; “external validity”, also called generalizability or applicability, is “the extent to which the results of a study can be generalized to other circumstances” (3). The CONSORT statement presents the principles for each major item of reporting, but does not address the question of how the reporting could optimize the generalizability of evidence from RCTs to clinical practice.
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement does not include a definition of applicability (generalizability, external validity) (4). The issue of how the reporting could enhance the applicability of evidence from systematic reviews and meta-analyses to clinical practice is not addressed.
The STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) statement provides guidelines for reporting observational studies, and defines generalizability as “external validity” (5). The question of how the reporting could enhance the applicability of evidence from observational studies to clinical practice is not addressed.
The 3 international guidelines listed above do not have a universal definition of applicability (generalizability, external validity) and do not comprehensively describe principles of how to increase the applicability of research evidence to clinical practice. It seems that common principles for applying the evidence of effectiveness from RCTs to clinical practice are lacking.
The preliminary aims of this paper were to search for studies that have pursued a definition and/or principles for applicability (generalizability) of evidence from RCTs and systematic reviews to clinical practice, and to describe the principles that they present. The primary aim was to pursue a definition for the concept of applicability in effectiveness research, and to present principles for how to apply research evidence to clinical practice. The ultimate aim was to facilitate better patient care through more valid interpretations of the applicability of evidence from RCTs.
Studies on conceptual issues related to applicability (generalizability) were searched in a narrative review, and the definitions and principles of how to assess and increase applicability of evidence from RCTs were extracted. The Cochrane Collaboration’s Qualitative Evidence Syntheses guidance was used with the intention to continue the search and extraction of information to the point where no additional information in relation to the aims of the paper was found (6, 7). PubMed and Web of Science databases were used without time or language limitations. Relevant papers were identified using the following key words in different combinations: conceptual, causal inference; applicability, external validity; generalizability; and transferability, transportability. When relevant papers were found, similar papers and papers that referred to the included paper were assessed for whether they should be added to the review. The review process aimed to find all relevant scientific publications related to the definition and principles of applicability of RCTs. Information from a recent book Clinical Research Transformed was also included (8). Empirical studies statistically assessing the concordance between findings from RCTs and findings from real-world data were excluded, as the focus was on conceptual issues forming the basis for all empirical operationalizations.
Based on studies found in the narrative review, the primary aim of definition of the concept and principles of applicability for RCTs was pursued. The principles are presented in the form of propositions and sub-propositions.
Conceptual studies on applicability (generalizability)
High internal validity of a study indicates that the risk of biased findings is low, i.e. that the findings probably represent “the truth” within the specific context of the study (9). If the internal validity of a study is low, it is probable that the study findings are false. The core issue in applicability is that the study findings would also represent “the truth” within a specific clinical context. Consequently, there is a rationale for applying the results of a study only if the risk of bias is low (9). Also, it has been proposed that internal and external validity should be considered as a joint measure, the target validity, expressing an effect estimate with respect to a specific population (10). N-of-1 trials gather evidence of effectiveness from the individual patient to whom the evidence will be applied (11).
As a prerequisite for enabling the assessment of applicability (generalizability) it is suggested that documentation of each RCT (and systematic review) is performed at 2 levels: what the study was designed to be, and what it actually turned out to be (8). The latter level, what the study actually turned out to be, denotes that RCTs should document and report patient selection, patient characteristics, interventions, parameters that modify treatment effect, adherence to all interventions, and the outcome measures (1, 12).
The appraisal of transferability of RCT data to real-world circumstances is suggested to be based on a comparable description of both the source (RCT) and the target (clinical practice) domains (13). There should be sufficient documentation of what actually happens in the real-world context (12, 14–16).
Measured comparability between population data-sets and randomized trials will enhance the range of policy-relevant research questions that can be answered (17). Statistical methods may be used to improve the applicability of a randomized trial to a target population (18–22). Propensity scores can be used to quantify the difference between the trial participants and the target population (23, 24).
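The use of propensity scores mentioned above can be illustrated with a minimal sketch. The covariates, sample sizes and distributions below are hypothetical, chosen only for illustration; the principle is that a logistic model of trial membership yields, for each individual, a fitted probability of belonging to the trial, and the gap between trial participants and the target population in these scores quantifies how dissimilar the two groups are.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariates (e.g. age, disease severity) for trial
# participants and for the clinical target population.
trial = rng.normal(loc=[55.0, 4.0], scale=[8.0, 1.0], size=(200, 2))
target = rng.normal(loc=[62.0, 5.0], scale=[12.0, 1.5], size=(200, 2))

X = np.vstack([trial, target])
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize covariates
X = np.hstack([np.ones((X.shape[0], 1)), X])  # intercept column
y = np.r_[np.ones(len(trial)), np.zeros(len(target))]  # 1 = in trial

# Fit logistic regression by gradient ascent on the log-likelihood;
# the fitted probability of trial membership is the propensity score.
w = np.zeros(X.shape[1])
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w += 0.05 * X.T @ (y - p) / len(y)

scores = 1.0 / (1.0 + np.exp(-X @ w))
# The larger the gap in mean scores, the less the trial participants
# resemble the target population.
gap = scores[y == 1].mean() - scores[y == 0].mean()
print(round(float(gap), 3))
```

In practice a dedicated statistical package would be used for the fitting, and the scores would feed into weighting or matching procedures; the sketch only shows how the membership probabilities arise.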
Differences in adherence to the intervention between the RCT and the target population should be taken into consideration (25). Methods for transporting evidence of the effectiveness of compound treatments to clinical practice have been proposed (26). Transportability of evidence may also depend on differences in the mechanisms that determine the outcome in the study and the target populations (27).
RCTs aim to assess the probabilities of change that the intervention causes in outcomes (including adverse effects) when it is used instead of another intervention (or lack of intervention) (8). When the outcomes are dichotomous, a Cox proportional hazards model or a newer regression model, such as the Hanley-Miettinen regression model, can be used in the analyses (28). When the outcome is continuous, it is suggested that the minimally clinically significant changes (or differences) in outcomes, and threshold values for good and poor outcomes, be determined, and the outcomes dichotomized correspondingly, in order to determine the respective probabilities using a logistic regression model (8).
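The dichotomization step above can be sketched briefly. The outcome, arm sizes and threshold below are invented for illustration; the point is simply that once a clinically important threshold is fixed, each arm yields a probability of a good outcome, which can be reported alongside the between-group mean difference (in a full analysis a logistic regression model would additionally adjust for covariates).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical continuous outcome (e.g. improvement on a 0-100 scale)
# in the control and treated arms of an RCT.
control = rng.normal(loc=10.0, scale=8.0, size=150)
treated = rng.normal(loc=15.0, scale=8.0, size=150)

# Assumed example threshold for a clinically important ("good") outcome.
GOOD_OUTCOME_THRESHOLD = 12.0

# Probability of a good outcome in each arm, and the risk difference,
# reported alongside the between-group difference in means.
p_control = (control >= GOOD_OUTCOME_THRESHOLD).mean()
p_treated = (treated >= GOOD_OUTCOME_THRESHOLD).mean()
risk_difference = p_treated - p_control

print(f"P(good | control) = {p_control:.2f}")
print(f"P(good | treated) = {p_treated:.2f}")
print(f"risk difference   = {risk_difference:.2f}")
```

Reporting such probabilities, rather than mean differences alone, gives the clinician a figure that maps directly onto the individual patient's chance of benefit.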
Double-blind RCTs are indicated if the question concerns the biological (or physical) effectiveness of an intervention (the intervention effect per se, without the placebo effect) (14, 29). If the study question concerns the effectiveness of an intervention in the non-blinded circumstances of everyday healthcare, blinding of the patient or the therapist is not indicated (14, 29). Assessing the effectiveness of a clinical pathway or a feature of the healthcare system calls for a cluster randomized RCT or, more commonly, an observational effectiveness study, the benchmarking controlled trial (BCT) (30).
Definition of applicability
Definition of applicability (generalizability): the extent to which the magnitude of effectiveness of an intervention for a specific patient (or specific group of patients) in clinical practice is similar to the magnitude of effectiveness in the results of a RCT or a systematic review of RCTs.
Propositions (principles) for applicability
All propositions relate to clinical interventions (directed towards patients) and most of the propositions also relate to interventions directed towards healthcare system features (in order to improve patient outcomes). The main references on which the propositions are based are presented after each proposition. All propositions are considered important by the author, and those without references are based on the thoughts of the author. The propositions are listed below, and a synopsis of the propositions is shown in Table I.
Table I. A synopsis of the propositions for enhanced applicability (generalizability) of findings from randomized controlled trials (RCTs)
Table II. The benchmarking method for assessment of applicability of evidence from randomized controlled trials (RCTs). The method can also be used for benchmarking controlled trials (BCTs), and for RCTs or BCTs in systematic reviews and meta-analyses. (30, 32, 33)
The aim of this paper was to determine conceptual issues (principles) relevant to the applicability of evidence from RCTs. Conceptual principles form the basis for empirical operationalizations, i.e. for studies statistically assessing the concordance between findings from RCTs and findings from real-world data. Thus, the principles presented in this paper should be considered when planning, undertaking and reporting empirical studies on the applicability of results from RCTs.
The definition of applicability presented in this paper considers the clinical context as the starting point, from which to look retrospectively at previously published RCTs. Consequently, it is not possible to make inferences prospectively from RCTs to clinical situations unless the details of the real-world context are explicitly described.
In this paper the definition is better conveyed by the term “applicability” than by the term “generalizability”. Applicability must always be judged at the (ad hoc) individual patient level, but the available research evidence can also be considered generalizable to a defined group of patients (to which the individual patient belongs). This thinking opposes attempts to grade generalizability of evidence from RCTs or systematic reviews to clinical medicine without specifying the clinical situation. Even if an illness does not currently exist in a certain country, the results may be generalizable once the illness does occur. And, if an intervention is found effective in 1 country, it may indicate a need to also implement it in another country. Lack of feasibility should not be considered as lack of applicability.
RCTs are suggested to provide case-specific evidence of the effectiveness of interventions for use by clinicians (34). The aim is to provide estimates of effectiveness from clinical research individualized to each particular patient (8, 34). It has also been suggested that evidence that is considered potentially generalizable represents only a working hypothesis to be evaluated within each clinical context (35).
A prerequisite for the appropriate assessment of applicability is that the data from each RCT are documented regarding what the study was designed to be and what it actually turned out to be (8). Documentation of the study object (the RCT) should be comprehensive, both with regard to the study plan (inclusion and exclusion criteria of patients, description of the index and control interventions, and outcome measures) and with regard to how the experiment actually turned out (the characteristics of the patients, including disability, quality of life, and behavioural, environmental and equity factors; the adherence to the index and control interventions; the percentages of cross-overs; and the magnitude of co-interventions) (12, 15). The description needed for the assessment of applicability of evidence of effectiveness from RCTs, BCTs, and systematic reviews and meta-analyses is shown in Table II.
Data are needed both from the RCT, and from the source where the knowledge will be utilized. These sources have been called primary and target contexts (36), but it is suggested in this paper that the primary context is the clinical context where the knowledge is needed, and the corresponding RCTs remain the source contexts. Researchers seem to have a consensus that an appropriate documentation of both of these contexts is necessary for the assessment of applicability (13).
Medical records often lack data on essential parameters that modify the treatment effect. For example, data on disease severity assessed on the scales used in RCTs are not uniformly recorded in clinical practice. In order to optimize the assessment of applicability from effectiveness research to clinical medicine, there must be similar documentation of patient characteristics (including selection), adherence to interventions (including those interventions that were not intended to be included in the study), and outcome measurements. Although such registries are a major challenge to produce, there is a strong need for disease-specific clinical registries that are planned and built on the same design principles, and that include documentation similar to that of the RCTs from which the evidence of effectiveness is gathered.
Currently, as patient-profile-specific research data are not available from RCTs, clinicians need to gauge the magnitude of effectiveness based on their competence and on their judgement of the effect-modifying influence of the features of a particular patient-profile. Patient-profile-specific effectiveness estimates would decrease the need for the clinical judgement of applicability of findings (8).
Heterogeneity in study characteristics, and the presence of human perception, behaviour, and environmental and equity factors, lessen the applicability of evidence. However, the clinical relevance of studies in this category may be high, and the choice of study questions should not be based primarily on the degree of applicability of findings, but on the clinicians’ and societies’ need for evidence on effectiveness.
This study has several limitations. The literature search and appraisal of studies was not systematic, and there is no flow-chart describing the number of excluded studies. The aim of this narrative review was to search for studies until no further ideas relating to definition or propositions for applicability emerged, using the principles of the Cochrane Collaboration’s Qualitative Evidence Syntheses guidance. Some relevant studies may have escaped notice. However, the primary aim of presenting a definition and propositions for applicability has been achieved, and these are open for scientific discussion. Some of the propositions question the current thinking, and some present new ideas on the applicability of evidence from RCTs. All of the propositions provide a conceptual basis on which to build operationalization on applicability. Further conceptual research is needed.
The starting point for defining and assessing applicability (generalizability) has to be from the point of view of a clinician needing knowledge for a specified clinical situation. RCTs must report appropriately what the study was designed to be and what it actually turned out to be. To optimize the applicability of evidence from RCTs, the essential data in the RCTs and in the clinical practice have to be reported in a similar way, and there are statistical methods to increase the comparability of the data. In addition to reporting the between-group differences in outcomes, the RCTs must report probabilities for favourable and adverse outcomes, and continuous outcomes must be dichotomized according to clinical importance. The concept and principles of applicability (generalizability) cover preventive, curative, palliative and rehabilitative interventions. Scientific and clinical discussion is needed regarding the definition and principles of applicability of evidence from effectiveness research.
The author developed the idea for this study, gathered evidence from the scientific literature and wrote the paper.
The author has no conflicts of interest to declare.