From the ¹Pain and Rehabilitation Centre, and Department of Medical and Health Sciences, Linköping University, Linköping, Sweden, ²Department of Hygiene and Epidemiology, School of Medicine, University of Ioannina, University Campus, Ioannina, Greece and ³Department of Epidemiology and Biostatistics, Imperial College London, London, UK
Objective: To evaluate the strength of the evidence for multimodal/multidisciplinary rehabilitation programmes (MMRPs) for common pain outcomes.
Data sources: PubMed, PsycINFO, PEDro and Cochrane Library were searched from inception to August 2017.
Study selection: Meta-analyses of randomized controlled trials or controlled clinical trials and qualitative systematic reviews of randomized controlled trials and non-randomized controlled trials were considered eligible.
Data extraction: Two independent reviewers abstracted data and evaluated the methodological quality of the reviews. The strength of the evidence was graded using several criteria.
Data synthesis: Twelve meta-analyses, including 134 associations, and 24 qualitative systematic reviews were selected. None of the associations in meta-analyses and qualitative systematic reviews were supported by either strong or highly suggestive evidence. In meta-analyses, only 8 (6%) associations that were significant at p-value ≤ 0.05 were supported by suggestive evidence, whereas 44 (33%) associations were supported by weak evidence. Moderate evidence was found only in 4 (17%) qualitative systematic reviews, while 14 (58%) qualitative systematic reviews had limited evidence.
Conclusion: There is no evidence that MMRPs are effective for prevalent clinical pain conditions. The majority of the evidence remains ambiguous and susceptible to biases due to the small sample size of participants and the limited number of studies included.
Key words: systematic review; umbrella review; meta-analysis; multimodal pain treatment; multidisciplinary treatment; pain.
Accepted Jun 7, 2018; Epub ahead of print Aug 8, 2018
J Rehabil Med 2018: 50; 00–00
Correspondence address: Elena Dragioti, Pain and Rehabilitation Centre, and Department of Medical and Health Sciences, Linköping University, SE-581 85 Linköping, Sweden. E-mail: email@example.com
This study evaluated the published literature regarding multimodal/multidisciplinary rehabilitation programmes (MMRPs) for pain outcomes. The study reviewed the evidence on a large scale, examining 134 associations derived from 12 meta-analyses (including 462 primary studies) and 24 qualitative systematic reviews (including 243 primary studies). The results suggest that there is a lack of robust evidence about the effectiveness of the programmes investigated; most of the published studies displayed uncertainty in effect sizes due to large heterogeneity, small sample sizes, evidence of small-study effects, excess of significant findings, or any combination of the above. Some weak evidence, especially for short-term outcomes, may be genuine, but no firm conclusions can be drawn. This study highlights the necessity for larger, better-conducted, randomized controlled trials of the effectiveness of MMRP, with a standardized formula of treatment modalities, outcome measures, pain population, pain assessments, and length of treatments.
Pain conditions, such as low back pain (LBP), neck pain (NP), spinal pain (SP), whiplash-associated disorders (WAD), widespread pain (WSP), and fibromyalgia (FMS), are highly prevalent and frequently persistent chronic conditions, which cause significant disability, distress, impaired quality of life, and work absenteeism (1–10). The prevalence of these conditions ranges from 10% to 60%, with a high variation depending on age, sex, population setting (i.e. inpatients, outpatients) and duration of pain (i.e. subacute, chronic) (11–15). A new data analysis from the 2012 National Health Interview Survey (NHIS) found that 55.7% of American adults (~126 million individuals) reported having pain (16). Moreover, the socioeconomic burden of these conditions in developed countries is enormous, due to both direct and indirect costs (10–12). Thus, effective treatments are of the utmost importance.
Over recent decades, multimodal/multidisciplinary rehabilitation programmes (MMRPs) have been studied as a promising strategy for treatment of pain (10, 17, 18). MMRPs comprise a lengthy, biopsychosocial treatment framework, which generally contains a synchronized combination of physical, educational or psychological treatments provided by a team of different professionals (5, 7, 18, 19). Several systematic reviews (SRs) and meta-analyses (MAs) support the effectiveness of MMRPs for LBP (4, 5, 8, 10, 19–23), NP (including WAD) (6, 9, 24, 25) and WSP (including FMS) (2, 26, 27). In support of these data, it has been stated that, among all pain treatments, MMRPs provide a high evidential basis for efficacy, cost-effectiveness, and lack of induced complications (28). Nonetheless, there is growing concern that these results may be influenced (29) by an array of flaws, such as the presence of between-study heterogeneity, publication bias, and selective reporting of positive results (30–35). Biases in the reported findings in SRs and MAs are not unusual in the medical literature (30–35). An up-to-date umbrella review of 247 psychotherapy MAs (including pain outcomes) found that only a small fraction (7%) were supported by strong evidence and were free from biases (35).
Although empirical studies are available, no systematic umbrella review on this topic has been performed to date. Umbrella reviews systematically evaluate the evidence on an entire topic across various SRs and MAs on multiple outcomes (36) and appraise the strength of the evidence, offering better recognition of the uncertainties, biases and knowledge gaps (37). The aim of this study was to examine whether, in patients with prevalent clinical conditions, such as LBP, NP, SP, WAD, and FMS (Population), MMRPs (Intervention), compared with any other active or inactive control (Control), improve pain, disability or any other reported outcome (Outcomes). To this end, an umbrella review of SRs and MAs that evaluated the effectiveness of MMRPs for the above-mentioned pain conditions was performed to plot the evidence over time, in addition to presenting areas for further research.
PubMed, PsycINFO, Physiotherapy Evidence Database (PEDro) and Cochrane Database of Systematic Reviews (CDSR) were searched from inception to 31 August 2017 for SRs or MAs investigating the effectiveness of an MMRP for LBP, NP, SP, WAD and WSP including FMS (see Table SI1 for search strings). The reference lists in the relevant SRs and MAs were also hand-searched for additional articles missed by the electronic search. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) recommendations for reporting SRs and MAs were followed. The protocol for this umbrella review has been published on PROSPERO (registration no: CRD42017076309).
Two independent investigators (ED, BL) screened the titles, the abstracts of the identified records, and the full-texts of the potentially eligible articles. In cases of discrepancy, a third investigator (BG) was consulted until agreement was reached.
Qualitative SRs and MAs that tested MMRPs vs any control (e.g. treatment as usual, waiting list) or other treatment (e.g. physiotherapy, surgery) were eligible for inclusion. Reviews that used an MMRP as a control group (e.g. physiotherapy vs MMRP) were also included. If a review tested multiple treatments, this was considered eligible only in the case that separate results or analyses of MMRPs were presented. The actual definition adopted by the initial authors was used to classify whether a review examined an MMRP. In cases of absence of a clear definition, MMRP was defined as a treatment approach that includes at least 2 distinct treatment components (e.g. at least one physical and at least one educational or other psychological therapy) (7). No restrictions were set regarding the baseline characteristics (e.g. clinical setting, age or sex) and the duration of pain (e.g. acute, subacute or chronic) of the populations studied. In the case of multiple publications concerning a certain SR or MA from the same research group only the most recent or most prominent publication was used. A clear description of other exclusion criteria is provided in the Supplementary Methods and Results1.
For all eligible reviews the following data were recorded: first author, publication year, country, type of review, examined interventions, pain condition treated, whether a definition of MMRP components was given, number of included studies, total sample size, outcomes, and main findings. For each primary study included in the MAs the following data were also recorded: first author, year of publication, study design, sample size, effect size (ES) (i.e. mean difference (MD); standardized mean difference (SMD); risk ratio (RR); odds ratio (OR)), and 95% confidence intervals (95% CI). One investigator (ED) extracted the data, which were confirmed independently by another investigator (EE). Discrepancies were resolved by discussion with a third investigator (BG).
Two independent investigators (ED, EE) assessed the methodological quality of the selected reviews using the Assessment of Multiple Systematic Reviews (AMSTAR) checklist. The AMSTAR is an 11-item instrument with values ranging from 0 to 11 related to essential features of the methodological rigor across SRs and MAs; higher scores indicate higher quality (for details see Table SII1). The AMSTAR scores can also be ordered as high (8–11), medium (4–7) and low quality (0–3) (38).
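The banding of AMSTAR totals described above can be illustrated with a short sketch. The function name `amstar_band` is hypothetical and is not part of the AMSTAR instrument; only the cut-offs (0–3, 4–7, 8–11) are taken from the text.

```python
def amstar_band(score: int) -> str:
    """Map an AMSTAR total (0-11) to the quality band quoted in the text.

    Hypothetical helper for illustration only.
    """
    if not 0 <= score <= 11:
        raise ValueError("AMSTAR totals range from 0 to 11")
    if score >= 8:
        return "high"
    if score >= 4:
        return "medium"
    return "low"
```

For instance, a review scoring 7 would fall into the medium band under these cut-offs.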
The primary analysis in this umbrella review focused on quantitative synthesis only for SRs with quantitative synthesis or MAs of RCTs and CCTs. To this end, both fixed-effect and random-effects models were fitted to estimate the summary effect sizes (ES) and the 95% CI in each association (39). A fixed-effect model estimates a single effect that is assumed to be common in every primary study, while a random-effects model estimates the mean of a distribution of effects (40). The direction of associations presented in the original MAs was not altered, so that the results could be compared with the original results. However, to harmonize all the continuous outcomes, whenever MDs were reported they were transformed into SMDs using a standard formula (40).
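The difference between the two models can be made concrete with a minimal sketch. It assumes inverse-variance weighting and the DerSimonian-Laird estimator of the between-study variance; the text does not state which estimator was used in Stata, so this is an illustrative choice rather than the authors' procedure. The function names and the pooled-standard-deviation conversion from MD to SMD are likewise assumptions made for the example.

```python
import math

def md_to_smd(md, n1, sd1, n2, sd2):
    """Convert a mean difference to a standardized mean difference
    by dividing by the pooled standard deviation (assumed formula)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return md / math.sqrt(pooled_var)

def fixed_effect(effects, ses):
    """Inverse-variance fixed-effect summary: returns (estimate, SE)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)

def dl_tau2(effects, ses):
    """DerSimonian-Laird between-study variance (truncated at zero)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled, _ = fixed_effect(effects, ses)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    c = total - sum(w ** 2 for w in weights) / total
    if c <= 0:
        return 0.0
    return max(0.0, (q - (len(effects) - 1)) / c)

def random_effects(effects, ses):
    """Random-effects summary: returns (estimate, SE, tau2)."""
    tau2 = dl_tau2(effects, ses)
    weights = [1.0 / (se ** 2 + tau2) for se in ses]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total), tau2

def ci95(estimate, se):
    """Normal-approximation 95% confidence interval."""
    return estimate - 1.96 * se, estimate + 1.96 * se
```

With heterogeneous inputs the random-effects standard error is wider than the fixed-effect one; when the estimated between-study variance is zero the two summaries are identical.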
Between-study heterogeneity was appraised with Cochran’s Q statistic (41) and measured with the I² metric (i.e. low, moderate, large, very large for values of <25%, 25–49%, 50–74%, and >75%, respectively) (42). When heterogeneity is not present (I² = 0), random-effects and fixed-effect estimates coincide. The 95% prediction intervals (PIs) in the random-effects modelling were also estimated to provide an additional account of the unexplained heterogeneity and to predict an interval for future ES estimates (43).
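The heterogeneity measures above can be sketched as follows. The sketch assumes the usual definition I² = max(0, (Q − df)/Q) × 100 and the prediction-interval form ES ± t × sqrt(τ² + SE²) with k − 2 degrees of freedom; these are standard forms but are not spelled out in the text. The function names are hypothetical, the small table of t critical values covers only 1 to 10 degrees of freedom, and treating a value of exactly 75% as "very large" is an assumption, since the quoted bands leave that value unassigned.

```python
import math

# Two-sided 95% critical values of Student's t for df = 1..10.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def cochran_q(effects, ses):
    """Cochran's Q: weighted squared deviations from the fixed-effect mean."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))

def i_squared(q, k):
    """I-squared in percent for k studies."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100.0

def heterogeneity_band(i2):
    """Bands quoted in the text; exactly 75% is assigned to 'very large'
    here as an assumption."""
    if i2 < 25:
        return "low"
    if i2 < 50:
        return "moderate"
    if i2 < 75:
        return "large"
    return "very large"

def prediction_interval(estimate, se, tau2, k, t_crit=None):
    """95% prediction interval; needs at least 3 studies.

    Pass t_crit explicitly when k - 2 exceeds the small lookup table.
    """
    if k < 3:
        return None
    if t_crit is None:
        t_crit = T_975[k - 2]  # raises KeyError beyond df = 10
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return estimate - half_width, estimate + half_width
```

An association whose prediction interval contains zero (for SMDs) would count as "including the null value" in the sense used in the results.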
Egger’s regression asymmetry test was performed to assess small-study effects bias (44). Briefly, small-study effects refer to the phenomenon that smaller studies often show larger treatment effects than do large ones (44, 45). A p-value ≤ 0.10 in the Egger test, together with a summary random-effects ES larger than the ES of the largest study in each association, was taken as evidence of small-study effects.
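A minimal sketch of the regression behind the Egger test is given below, assuming the classical form in which the standardized effect (ES/SE) is regressed on precision (1/SE) by ordinary least squares and the intercept measures asymmetry. The function names are hypothetical. The p-value is left to the reader's statistics package (it comes from a t distribution with k − 2 degrees of freedom), and comparing absolute effect sizes in the decision rule is an assumption, since the text only says "larger".

```python
import math

def egger_regression(effects, ses):
    """OLS of ES/SE on 1/SE; returns (intercept, se_intercept, t_stat, df).

    Returns None for fewer than 3 studies or identical precisions.
    """
    k = len(effects)
    if k < 3:
        return None
    x = [1.0 / se for se in ses]                  # precision
    z = [y / se for y, se in zip(effects, ses)]   # standardized effect
    mean_x = sum(x) / k
    mean_z = sum(z) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        return None
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    ssr = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    resid_var = ssr / (k - 2)
    se_int = math.sqrt(resid_var * (1.0 / k + mean_x ** 2 / sxx))
    t_stat = intercept / se_int if se_int > 0 else math.nan
    return intercept, se_int, t_stat, k - 2

def small_study_effects(egger_p, random_es, largest_study_es):
    """Decision rule from the text: Egger p <= 0.10 and the random-effects
    summary larger than the largest study's ES (absolute values assumed)."""
    return egger_p <= 0.10 and abs(random_es) > abs(largest_study_es)
```

A positive intercept in this parameterization indicates that less precise studies tend to report larger effects.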
Excess of significant findings was assessed using the excess of significant findings test developed by Ioannidis & Trikalinos (46). This test examines whether the observed number of studies (O) with statistically significant results (p-value < 0.05) is larger than the expected number of studies (E) (31, 35, 46). The E was taken as the sum of the statistical power estimates for each study in the MA, and the power of each study was calculated with an algorithm using a non-central t distribution (47). Since the true ES of a meta-analysis is not known, this umbrella review assumed as the plausible true effect the ES of the largest study (48). Excess of significance bias was claimed at a p-value ≤ 0.10 with O > E (32, 35, 46).
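The comparison of O and E can be sketched as below. Two simplifications are made and should be read as assumptions: study power is computed with a normal approximation rather than the non-central t algorithm used in the review, and the "largest study" is identified as the one with the smallest standard error. The chi-square form of the statistic, A = (O − E)²/E + (O − E)²/(n − E) with 1 degree of freedom, is the commonly described version of the test; function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def study_power(true_effect, se, alpha=0.05):
    """Two-sided power of a z-test for one study (normal approximation)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(true_effect) / se
    return (1 - _STD_NORMAL.cdf(z_crit - shift)) + _STD_NORMAL.cdf(-z_crit - shift)

def excess_significance(effects, ses, alpha=0.05):
    """Return (observed, expected, p_value) for one meta-analysis."""
    n = len(effects)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Plausible true effect: ES of the most precise study (proxy for largest).
    largest = min(range(n), key=lambda i: ses[i])
    plausible_effect = effects[largest]
    observed = sum(1 for y, se in zip(effects, ses) if abs(y) / se > z_crit)
    expected = sum(study_power(plausible_effect, se, alpha) for se in ses)
    if not 0 < expected < n:
        return observed, expected, math.nan
    diff_sq = (observed - expected) ** 2
    a = diff_sq / expected + diff_sq / (n - expected)
    # Survival function of a chi-square with 1 df evaluated at a.
    p_value = math.erfc(math.sqrt(a / 2))
    return observed, expected, p_value

def has_excess_significance(observed, expected, p_value):
    """Rule from the text: p <= 0.10 together with O > E."""
    return observed > expected and p_value <= 0.10
```

When E exceeds O the rule returns False regardless of the p-value, which matches the statement in the results that an excess was not pertinent for those associations.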
Whenever the primary study data for a MA was unavailable, only the summary ESs or any other information (e.g. heterogeneity or publication bias assessment) reported by the original authors were considered. In this case, further assessments of various statistical tests (e.g. 95% PI, ES of the largest study, small-study effects or excess of significant findings) were not feasible.
The secondary analysis in this umbrella review focused on descriptive analysis for qualitative SRs and MAs excluded from the quantitative synthesis. For this analysis, studied outcomes were categorized into 5 outcome areas: (1) pain, (2) physical functioning (including disability and work status), (3) emotional functioning, (4) global measures (e.g. quality of life), and (5) other (e.g. adverse events) (49).
All analyses were performed using Stata version 12 (College Station, TX, USA) (50).
The credibility of the evidence of each association provided in MAs was assessed using a number of criteria previously applied in various medical fields (31, 32, 34, 35, 51). In brief, associations that presented nominally significant random-effects summary estimates (i.e. p-value ≤ 0.05) were regarded as strong, highly suggestive, suggestive, or weak evidence (Table I). The strength of evidence of each qualitative SR or MA not included in the quantitative synthesis was also appraised in 1 of the following 4 categories: strong evidence, moderate evidence, limited evidence, and no evidence, based on the modified criteria of van Tulder et al. (Table I) (52).
Table I. Criteria of the credibility of the evidence for selected meta-analyses and qualitative systematic reviews
The primary search yielded a total of 9,896 articles, which provided 89 potentially eligible articles (Fig. 1). Of these, 36 met the inclusion criteria (1–9, 17, 19–22, 24–27, 53–69), of which 13 were qualitative SRs and 23 were MAs (Table SIII1). The reasons for exclusion of the 53 articles (Supplementary references 1–531) are summarized in Table SIV1. Of the 23 eligible MAs, only 12 (including 134 associations) were finally selected for quantitative synthesis (Fig. 1) (2–4, 6, 8, 17, 21–23, 54, 55, 59). The remaining 11 MAs were excluded because 5 were duplicate publications from the same research group, 4 were updated versions from the same research group, and 2 Cochrane reviews did not provide a quantitative synthesis of data (Table SIII1). Primary study data were available for all MAs, with the exception of the meta-analysis by Hoffman et al. (59).
Table SIII1 presents the descriptive characteristics of the 36 selected SRs and MAs. All reviews were published between 1994 and 2017. Definition of the contents of MMRP was given in 21 reviews (58.3%).
Fig. 1. Flowchart of the literature search and evaluation process of published meta-analyses and systematic reviews.
The median AMSTAR quality assessment score of all 36 reviews was 7 (interquartile range (IQR) = 6–9; Table SV1). Fifteen reached the “high-quality” level (≥8/11 of the AMSTAR checklist), while 2 reviews met the “low-quality” level (0–3/11). The level of agreement on AMSTAR scores between the 2 independent investigators was high (90%).
Table SVI1 presents the pain conditions, outcomes, characteristics and summary estimates of the 134 associations. These associations provided evidence for 4 pain conditions; namely, LBP, NP, SP and FMS, and included a total of 462 primary studies, of which only 2 were CCTs. The median number of primary studies per meta-analysis was 2 (IQR = 2−4). The median number of participants was 347 (IQR = 167−457) and the total number of participants was >1,000 in only 11 (8.2%) associations. The median length of the MMRPs was 5 weeks (IQR = 3−8). The examined outcomes are visualized in Fig. 2. A further description of the meta-analytic associations is provided in the Supplementary Methods and Results1.
Fig. 3 and Table SVI1 provide summary estimates for all 134 associations. In the fixed-effect models, 71 (52.9%) associations reported ESs that were significant at p-value <0.05 (Fig. 3), of which only 4 favoured the control group. However, in 2 of those 4 MAs, the comparator was an MMRP. In the random-effect models, 52 (38.8%) associations reported ESs that were significant at p-value <0.05 (Fig. 3); all favouring the MMRPs. In 2 associations, the MMRP was also treated as a control group. Only 15 (11.2%) associations were significant at p-values < 0.001 under random-effects modelling. Of note, in 6 (4.5%) associations it was not possible to use fixed-effect models due to unavailability of the primary data. The results of the largest study in each meta-analysis are provided in the Supplementary Methods and Results1.
In 57 (42.5%) associations the estimates of the PIs included the null value, while in 76 (56.7%) the PIs could not be estimated due to an inadequate number of included RCTs (PIs required at least 3 primary studies included in each MA to be estimated; Fig. 3). In 38 (28.4%) associations the ES of the largest study in each meta-analysis had a nominally statistically significant result. In 2 (1.5%) associations, considering the short-term outcomes of depression and disability for chronic LBP, the result was in the reverse direction (4).
Fig. 3. Summary estimates and evaluation of biases in 134 associations in meta-analyses for neck pain, spinal pain, low back pain, and fibromyalgia. Notes: PI = prediction interval; ES = effect size.
Statistically significant between-study heterogeneity (p-value ≤ 0.10) was found in 59 (44.0%) associations (Table SVI1; Fig. 3). There was large heterogeneity (I² = 50–75%) in 43 (32.1%) associations and very large heterogeneity (I² > 75%) in 19 (14.2%) associations of 5 outcomes for chronic and subacute LBP. A further description of the associations with high heterogeneity is provided in the Supplementary Methods and Results1.
Small-study effects bias was found in 9 (6.7%) associations of 6 outcomes for chronic and subacute LBP (i.e. short-term episode of LBP, disability, quality of life, and coping, medium-term pain, disability and depression, and medium and long-term disability/functional status and long-term return to work) (4, 6, 8, 23). Hence, the evidence of small-study effects was limited. On the other hand, in 76 (56.7%) associations, the small-study effects could not be estimated; the Egger’s test can be employed only for MAs including at least 3 primary RCTs (Fig. 3).
An excess of significant findings (p ≤ 0.10) was observed in 27 (20.1%) associations (Fig. 3), of 6 outcomes for chronic and subacute LBP and chronic SP. In 54 (40.3%) associations E was larger than O, indicating that an excess of significant findings was not pertinent (Table SVI1; Fig. 3). The test could not be performed in only 6 associations (59). Thus, we did not detect consequential evidence of an excess of significant findings. A further description of the associations with an excess of significant findings is provided in the Supplementary Methods and Results1.
The assessment of the 134 associations is presented in Table II. None (0.0%) of these associations had either convincing or highly suggestive evidence in favour of the MMRP. Only 8 (6.0%) associations had > 350 participants and significant summary associations (p-value > 10⁻⁶ but < 0.001) under random-effects modelling and they were classified as having suggestive evidence. Five of those associations with suggestive evidence showed beneficial effects in the short-term, 2 in the medium-term and one in the long-term. Forty-four (32.8%) were supported by weak evidence reporting nominally statistically significant random-effects associations at p-value ≤ 0.05. Thirty-eight of these displayed beneficial effects both in the short- and the long-term, whereas only 6 showed beneficial effects in the medium-term. Finally, 82 (61.2%) associations had non-significant evidence under random-effects modelling (p-value > 0.05; Table SVII1).
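The lower tiers of the grading described in this paragraph can be sketched as a simple rule. Only the thresholds stated in the text are used (more than 350 participants, p-value between 10⁻⁶ and 0.001 for suggestive evidence; p-value ≤ 0.05 for weak evidence). The full criteria for convincing and highly suggestive evidence are defined in Table I and are not reproduced here, so the sketch deliberately returns a placeholder for those cases. The function name is hypothetical.

```python
def grade_lower_tiers(p_random: float, n_participants: int) -> str:
    """Partial grading based only on thresholds quoted in the text.

    Higher tiers depend on Table I criteria not reproduced in this sketch.
    """
    if p_random > 0.05:
        return "non-significant"
    if n_participants > 350 and p_random <= 1e-6:
        return "check Table I criteria for higher tiers"
    if n_participants > 350 and p_random < 0.001:
        return "suggestive"
    return "weak"
```

Under this rule an association with p = 0.0004 and 400 participants would be graded suggestive, while the same p-value with 200 participants would remain weak.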
Table III presents descriptive characteristics with the summary of the evidence of the 24 reviews excluded from the quantitative synthesis. These reviews included a total of 243 primary studies (median = 7; IQR 3−12). A detailed descriptive analysis of qualitative SRs is provided in the Supplementary Methods and Results1.
None of these reviews was supported by strong evidence. The criteria for moderate evidence were met by 4 (16.7%) reviews, limited evidence by 14 (58.3%) reviews, and no evidence by 6 (25.0%) reviews (Table III). Meta-analyses were not performed due to the high heterogeneity in 3 reviews and the limited number of included studies in 8 reviews. All duplicate and update MAs showed agreement on the grading of evidence observed in quantitative synthesis (Tables SII1).
Table III. Descriptive characteristics with the summary of the evidence of the 24 qualitative systematic reviews and meta-analyses not included in quantitative synthesis
A subgroup analysis was also performed to examine whether the credibility of the evidence differed between newer (i.e. MAs published after 2010) and older (i.e. MAs published before 2010) MAs. This analysis showed that the newer MAs provided significantly more associations with both suggestive and weak evidence compared with older MAs (7 vs 1 for the associations with suggestive evidence and 33 vs 11 for the associations with weak evidence; both p < 0.0001).
A sensitivity analysis with respect to the length of the MMRP was possible for only 35 associations, because the remaining associations did not include studies with both short (≤ 5 weeks) and long (> 5 weeks) MMRPs (Table SVIII1). The sensitivity analysis restricted to short MMRPs indicated that the outcomes of short-term return to work and medium-term pain showed the strongest evidence of association (highly suggestive evidence and suggestive evidence, respectively) in patients with chronic LBP (CLBP). The sensitivity analysis restricted to long MMRPs indicated that the outcomes of medium- and long-term disability and long-term pain showed the strongest evidence of association (both weak evidence) in patients with CLBP.
This study appraised the strength of the evidence across published SRs and MAs of MMRPs for prevalent clinical pain conditions. Primary analysis found that, among 134 associations, less than half produced significant results at p-value ≤ 0.05 under random-effects modelling. The proportion of significant results fell to approximately 11% when a stricter threshold was applied (p-value < 0.001). In addition, none of the statistically significant results presented either convincing or highly suggestive evidence. Only a small number were supported by suggestive evidence. These pertained to MMRP associations solely for LBP and mainly for short-term outcomes. However, only one of those associations, regarding the long-term effects on work absenteeism, was supported by both statistically significant results and absence of biases (4, 5). The remaining associations with statistically significant results were supported by weak evidence, of which the vast majority showed both short-term and long-term beneficial effects. These results were further confirmed by secondary analysis of the 24 qualitative SRs or duplicate MAs not included in the quantitative synthesis. Likewise, none of these reviews was supported by strong evidence. Moderate evidence was found in only 4 reviews, while more than half (14 of 24) had limited evidence. However, the MAs published after 2010 showed more associations with both suggestive and weak evidence, compared with older MAs published before 2010. Sensitivity analysis indicated that short MMRPs provided stronger evidence of association (highly suggestive evidence and suggestive evidence) than long MMRPs (weak evidence) in patients with CLBP.
This study pinpoints concerns about the robustness of the empirical evidence regarding the effectiveness of MMRPs. Some of the evidence, although limited, may reveal probable associations between MMRPs and the outcomes of pain and disability. The possibility that MMRPs increase the odds of return to work sounds promising and should be tested in future large RCTs. Furthermore, these results highlight that MMRPs may have more favourable effects on short-term outcomes compared with medium- and long-term outcomes; assumptions that require further assessment, e.g. with respect to methods for maintaining gains after MMRPs. Consequently, stakeholders, such as clinicians, researchers, and health policymakers, should be aware that findings stemming from few MAs with restricted numbers of RCTs must be used with caution. Indeed, there is ongoing discussion regarding meaningful clinical interpretation of the results of the published MAs and their reported outcomes (70). Health policymakers and expert panels should be aware that the evidence is limited, and adjust for the cost-effectiveness of these treatments. Concerns regarding the economic burden of MMRPs have been described repeatedly in the literature (4, 5, 71). However, adjusting for costs may not be as simple as that; the implementation of larger RCTs may not be practical due to cost barriers. On the other hand, the consideration of such costs should be balanced against healthcare costs and societal costs, e.g. within the social insurance system and in the workplace.
The method used to grade the evidence presents some difficulties in comparing the current results directly with previous research. However, the method used here generally complies with a current SR on behalf of the American College of Physicians Clinical Practice Guideline (72). In that review, adopting the criteria of the Agency for Healthcare Research and Quality, the authors found low-to-moderate evidence for MMRPs on LBP (72). Similarly, the majority of reviewed SRs and MAs used in this study (some also based on the GRADE approach) conclude that it is possible that MMRP may have benefits; however, there is no convincing evidence (4–7, 9, 17, 18, 21, 26, 57, 61, 62, 66–68). Only a meta-analysis of Hauser et al. (2) reported strong evidence on short-term effects on key symptoms of FMS; a finding not supported by our evaluation. In particular, this finding failed to achieve strong evidence, principally because of the small sample size of the participants (< 350) and because the PIs under random-effects modelling included the null value. Additional SRs from other medical fields using GRADE have also produced similar results, e.g. a review of stroke rehabilitation resulted in a weak recommendation regarding acupuncture (73). One may argue that we used a low threshold of the sample size to evaluate the evidence compared with other studies (32, 34, 35, 74). The threshold of above 1,000 cases is used mainly in genetic association studies (51, 74), but there are other fields that, by definition, cannot recruit such sample sizes. In the literature, lower sample sizes (e.g. ≥ 200) for the assessment of the quality of evidence have also been proposed (75).
At first glance, the failure of both SRs and MAs to reach the criteria of strong evidence might be discouraging; however, cautious examination of the results may reveal some optimistic inferences. More than 60% of the published associations displayed non-significant effects. This may indicate that data dredging, also known as “p-value hacking” (76), is less common in the MMRP literature. In a previously published umbrella review of psychotherapy treatments, 80% of the significant effects were in favour of the psychotherapy, while a p-value below the 0.001 threshold was found in 65% of associations (35). By the same logic, the finding that the majority of associations encompassed a low risk of biased results may indicate that publication bias favouring positive results, selection bias or outcome reporting bias are less likely to occur in the MMRP field. However, a large body of work advises that there are a number of diverse possible reasons for heterogeneity, small-study effects or excess of significant findings, and the presence of such biases cannot be ruled out based only on negative assessments (31, 32, 34, 43, 44, 46, 77). It is also possible that, due to the small number of included studies per MA, the applicability of such statistical tests is limited.
It is important to note that the amount of substantial heterogeneity was high, a not unexpected finding, considering the great variability of MMRP components and reported outcomes (7, 18). Similar figures have been reported previously in the psychotherapy field (35, 78) or other medical areas (32, 79). A SR of Cochrane reviews of physiotherapy and occupational therapy, for instance, found that in 52% of these reviews no meta-analysis was performed, mainly due to heterogeneity obstacles (30). In addition, calculation of the 95% prediction intervals, which indicate the possible future treatment effect in an individual study setting (43, 80), revealed that the null value was excluded in only 1 meta-analysis. This may indicate that unexplained sources of heterogeneity remain.
To the best of our knowledge, this umbrella review is the first and the largest comprehensive summary of the published literature regarding MMRPs for common clinically important pain conditions. In addition, this is the first study to assess the existing evidence by applying standardized methodology and state-of-the-art approaches based on rigorous criteria to appraise the results from both MAs and SRs (51). The only published overview of SRs in this field only critically summarized the available evidence (18). Furthermore, the methodological quality of the selected MAs and SRs was assessed in the current study with the AMSTAR tool, which has good reliability, construct validity, and feasibility (38).
This study has a number of limitations. As with any umbrella review, no firm conclusions can be reached about the sources of heterogeneity and the other possible biases, i.e. small-study effects or excess of significant findings. Our statistical tests can only offer an indication of their existence and cannot explain their aetiology effectively (44, 46, 77). However, such an examination was outside the aims of the current study. One may argue that different lengths of MMRPs may be one of the explanations of the heterogeneity of studies. A previous SR concludes that, in the literature, the relationship between dose of MMRP and outcome effect is limited (29). In addition, the sensitivity analysis did not reveal a common pattern in terms of the credibility of the evidence. The current study also did not evaluate the homogeneity of MAs and SRs in terms of PICO and the limitations in the PICO description. Therefore, this study was limited in providing evidence at a “micro level” perspective in terms of variation within the pain conditions (e.g. definitions), characteristics of patient populations (e.g. co-morbidities), behavioural factors (e.g. smoking), environmental factors (e.g. working status), equity-related factors (e.g. income), treatment characteristics (e.g. education and competence of staff), country-specific factors (e.g. health and social care system), and in the outcome measures. Thus, we cannot assume that absence of statistical heterogeneity also means absence of clinical heterogeneity in published MAs. Only when thorough data on the PICO of the original studies are available can a clear decision be made as to whether a MA is justified. Another limitation lies in the fact that some overlap (27 out of 462; 6%), in terms of primary RCTs, mostly in the case of quantitative synthesis, could not be avoided; however, the final set of primary RCTs in each MA was considerably different, thus providing dissimilar summary estimates.
A further weakness, which is a common problem in umbrella reviews, is that the results of this study are derived only from published SRs and MAs and, therefore, could have missed some information derived from single RCTs not included in these reviews or from unpublished data. The quality of primary studies included in the SRs and MAs was also not examined, although this is one of the central aims of the original SRs and MAs. Finally, albeit that the methodological quality of the included qualitative SRs and MAs was satisfactory, we did not contact the original authors to elucidate whether particular methodological issues were actually examined; hence, errors may have been introduced.
Future studies of MMRPs should focus on some major methodological issues that appear to challenge the reported evidence. Many RCTs report on several outcomes, which are seldom divided into primary and secondary outcomes; for example, the RCTs in one Swedish SR (not included here) reported an average of 9 outcomes (81). MMRP is a complex treatment with broad goals and, as a result, it is highly unlikely that changes in 9 outcomes are independent of each other. The question arises as to how to determine whether positive results are obtained in an RCT of MMRP; evaluating a single outcome at a time, as done here and in most RCTs, SRs and MAs, may not be the most accurate process, since the treatment was not designed to target only a single outcome. Moreover, small changes in 9 outcomes may be more important for the patient than one prominent change in 1 out of 9 outcomes.
This study suggests that, although the exact components of MMRPs are difficult to grasp even in RCTs, a standardized protocol of MMRP components and outcomes, which could be applied to any MMRP study, might be more usable for making concrete comparisons in future effectiveness studies. Two topical SRs found that the components of the MMRP were described only in general terms, and that the outcome domains were measured inconsistently across studies (7, 49); these characteristics of MMRP studies were also noted in our evaluation. A further concern applies to the question of whether the patient groups included in different RCTs are indeed comparable; they may have chronic LBP, but the presence of comorbidities and long-term sick leave may be unequal among these patients. Hence, there is a lack of a taxonomy of chronic pain patients applicable in clinical settings and in research. The present study also recommends that, notwithstanding the costs, there is a need for more, larger, and better-conducted RCTs on the effectiveness of MMRPs. An in-depth examination of possible reasons for heterogeneity, including the length of the MMRPs and the homogeneity of PICOs, in future MAs may lead to a better understanding of the variations between studies. Finally, data regarding adverse events, and more studies in other pain groups, are also necessary.
The results of this study indicate an absence of strong empirical evidence for MMRPs for common pain conditions. At the same time, the available evidence, although limited, did not show a high risk of biased results. Nonetheless, it cannot be ruled out that such biases may be hidden by the small number of studies and small sample sizes. The use of a standardized protocol for treatment modalities, outcome measures, and length of MMRPs may facilitate comparisons of MMRP effectiveness across future studies. Larger and more rigorous RCTs are, therefore, required.
Conflicts of interest statement: The authors have no conflicts of interest to declare. BG received a research grant from AFA Insurance; AFA Insurance is a commercial funder, which is owned by Sweden’s labour market parties: the Confederation of Swedish Enterprise, the Swedish Trade Union Confederation (LO) and The Council for Negotiation and Co-operation (PTK). It insures employees within the private sector, municipalities and county councils. AFA Insurance does not seek to generate a profit, which implies that no dividends are paid to the shareholders. AFA Insurance had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.