From the ¹Department of Physical Medicine and Rehabilitation (Physiatry), Centre Hospitalier Universitaire de Québec – Université Laval, ²Center for Interdisciplinary Research in Rehabilitation and Social Integration, and ³Department of Rehabilitation, Faculty of Medicine, Université Laval, Quebec City, Canada
Objective: To determine the diagnostic validity of high-resolution ultrasound and orthopaedic special tests in diagnosing long head of the biceps tendon pathologies in patients with shoulder pain.
Design: Systematic review with meta-analysis tools.
Data sources: MEDLINE, CINAHL and EMBASE.
Data extraction: Included studies had to report on the diagnostic validity of orthopaedic special tests or high-resolution ultrasound (HRUS) compared with a reference standard for diagnosing long head of the biceps tendon target conditions (superior labrum anterior and posterior lesions, long head of the biceps tendon tendinopathy, dislocation, effusion or rupture). Risk of bias was assessed using the Quality Assessment Tool for Diagnostic Accuracy Studies (QUADAS-2) tool.
Results: Of the 30 included studies, 8 focused on high-resolution ultrasound and 22 on orthopaedic special tests. High-resolution ultrasound proved highly specific for the diagnosis of long head of the biceps tendon pathologies. Pooled positive (LR+) and negative (LR–) likelihood ratios were 38.00 and 0.24 for dislocation, respectively, and 35.50 and 0.30 for complete rupture, respectively. The accuracy of orthopaedic special tests varied greatly across studies. The only test of value was Yergason’s manoeuvre in confirming proximal long head of the biceps tendon pathologies except superior labrum anterior and posterior lesion (high specificity): the summary LR+ and LR– were 2.56 and 0.70, respectively.
Conclusion: High-resolution ultrasound is reliable to confirm suspected long head of the biceps tendon pathologies. There is insufficient evidence to recommend individual orthopaedic special tests.
Key words: shoulder; biceps tendon; glenoid labrum; imaging; diagnostic ultrasound.
Accepted May 3, 2019; Epub ahead of print May 16, 2019
J Rehabil Med 2019; 00: 00–00
Correspondence address: Valérie Bélanger, Centre Hospitalier Universitaire de Québec – Université Laval, Hôpital de l’Enfant-Jésus, 1401, 18e Rue, Quebec City, Canada, G1J 1Z4. E-mail: valerie.belanger.20@ulaval.ca
People with shoulder pain seek medical attention in order to relieve their symptoms and improve their quality of life. However, given the complexity of the shoulder girdle, making the right diagnosis can be challenging. Clinicians and other healthcare practitioners base their approach on the findings of current medical history, as well as physical and ultrasound examinations. Once a structure is identified as a potential pain-generator, a specific therapy can be used. The biceps tendon is one such structure. The aim of this study is to assess the accuracy of physical and ultrasound examinations in diagnosing biceps tendon pathologies. This will help to guide clinical decision-making and may prevent delay in seeking specific treatment approaches.
Shoulder pain is common in the general population (1), and pathology of the long head of the biceps tendon (LHBT) can be a primary source of shoulder pain, either in isolation or in association with other shoulder pathologies, such as rotator cuff diseases (2, 3). The most commonly described LHBT pathologies include superior labrum anterior and posterior (SLAP) lesions, tendinosis, dislocation and rupture (4). In the clinical setting, orthopaedic special tests (OSTs) and, more recently, high-resolution ultrasound (HRUS) are used for ruling in or out shoulder disorders, such as LHBT pathologies. While numerous OSTs have been proposed to identify the different LHBT pathologies, HRUS can be used to detect LHBT tendinopathy, dislocation, rupture and intra-articular peritendinous effusion. In rare cases, HRUS can directly diagnose insertional pathology, such as SLAP lesions (5). Eight systematic reviews have been published on the diagnostic accuracy of OSTs for a wide spectrum of shoulder disorders, including LHBT pathologies, most of which were SLAP lesions. The conclusions were that OSTs are neither very specific nor sensitive in diagnosing SLAP lesions (6–13). However, new high-quality diagnostic accuracy studies for OSTs have been conducted in the past few years, and could therefore change the conclusions of these previous systematic reviews. In addition, no systematic review has focused on the accuracy of HRUS in diagnosing LHBT pathologies. To our knowledge, no systematic review has specifically addressed the diagnosis of LHBT pathologies in clinical practice, including the accuracy of both OSTs and HRUS examinations. A better picture of the current accuracy of clinicians in assessing the LHBT will enable a better selection of diagnostic tools for the clinical evaluation of shoulder pain.
The aim of this study was to determine the diagnostic accuracy of: (i) diagnostic HRUS for detecting LHBT tendinopathy, dislocation, rupture (partial or complete) and bicipital recess effusion; and (ii) OSTs for detecting any pathology of the LHBT in patients with shoulder pain. The study determined the accuracy of each OST related to LHBT, for detecting the specific clinical entity for which they were designed (Appendix I) (14).
Appendix I. Description of orthopaedic special tests (OSTs)
Included studies were prospective, either delayed cross-sectional or diagnostic case-control studies, which included patients recruited in primary, secondary or tertiary care settings. There was no limit to sample sizes or prevalence in the included studies; however, 100% prevalence studies were eliminated because they do not allow calculation of specificity.
Any patients with shoulder pain were considered, with no limit on diagnosis or age group. However, studies including exclusively rheumatological or neurological populations were not considered, since these disorders encompass a diverse group of musculoskeletal conditions that differ from those found in the general population.
OSTs (Appendix I) and HRUS were the index tests. HRUS methods for examining the LHBT had to be congruent with accepted standards (15, 16).
SLAP lesions, tendinopathy, dislocation, rupture and effusion (bicipital recess) of the LHBT were considered.
HRUS had to be compared with surgery (open or arthroscopy), magnetic resonance (MR) imaging or MR arthrography. OSTs had to be compared with surgery, HRUS, MR imaging or MR arthrography.
MEDLINE, CINAHL and EMBASE databases were searched for eligible articles from their inception dates to July 2018. Articles had to be written in French or English. The full search strategy is described in Appendix II. The reference lists for every article found in the original electronic search were screened to identify further eligible articles.
Appendix II. Search strategies for MEDLINE and CINAHL
Selection of studies. Two review authors independently selected the studies. In case of disagreement, a third author was involved to reach consensus. Articles were selected if they met the selection criteria for population, index test, reference standard, and reported on the diagnostic accuracy of individual index tests for diagnosing a specific LHBT pathology (SLAP lesions, LHBT tendinopathy, dislocation, rupture or effusion). We started with a review of titles, proceeded to abstracts where titles indicated possibly relevant studies, and selected eligible studies after reading their full text.
Data extraction and management. Data were extracted by 2 independent authors. If any disagreement occurred during this step, a third reviewer intervened to reach mutual agreement. The extraction decision was based on the possibility of drawing a 2 × 2 table. If the tables were not included in the article, data allowing reconstruction was necessary. If there was any discrepancy between text and tables, articles were removed from analysis unless original authors could be contacted to resolve the issue.
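As an illustration of why a reconstructible 2 × 2 table was required, the following minimal Python sketch shows how Sn and Sp follow from its four cells, and why a study in which every participant has the target condition cannot yield Sp. The function name and the counts are hypothetical and are not taken from any included study.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (Sn, Sp) from the four cells of a 2 x 2 table.

    tp/fn: participants with the target condition (reference standard positive)
           who had a positive/negative index test.
    fp/tn: participants without the target condition who had a
           positive/negative index test.
    """
    with_condition = tp + fn
    without_condition = fp + tn
    if with_condition == 0:
        raise ValueError("no participant has the target condition: Sn is undefined")
    if without_condition == 0:
        # This is the 100% prevalence situation: Sp cannot be calculated.
        raise ValueError("100% prevalence: Sp is undefined")
    return tp / with_condition, tn / without_condition


# Hypothetical counts, for illustration only.
sn, sp = sensitivity_specificity(tp=16, fp=2, fn=4, tn=78)
print(f"Sn = {sn:.3f}, Sp = {sp:.3f}")
```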
Quality assessment. The risk of bias of each study was assessed using the Quality Assessment Tool for Diagnostic Accuracy Studies (QUADAS-2) by the same 2 independent authors who selected the studies and extracted the data (17). This tool is designed to appraise selection bias and information bias by assessing 4 key domains: patient selection; index test; reference standard; and flow of patients through the study and timing of the index test(s) and reference standard. For each domain, the risk of bias is rated as “high”, “low” or “unclear”, based on the authors’ judgement. Authors of reviews are encouraged to tailor QUADAS-2 to their review by developing review-specific guidance on how to assess each signalling question (17). In that respect, after consensus among authors, specific criteria were used for each section (Table I). Gwet’s first-order agreement coefficient (Gwet’s AC1) was used to calculate interobserver agreement (18).
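For readers unfamiliar with Gwet’s AC1, the sketch below computes it for 2 raters using the standard two-rater formula: observed agreement corrected by a chance-agreement term based on the average category prevalence. It is a hedged illustration only; the function name and the ratings are hypothetical and do not reproduce the review’s data or the software actually used.

```python
from collections import Counter


def gwet_ac1(ratings_a, ratings_b, categories):
    """Gwet's AC1 for two raters who each rate the same items once."""
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("both raters must rate the same non-empty set of items")
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are required")

    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Average prevalence of each category over the two raters.
    counts = Counter(ratings_a) + Counter(ratings_b)
    prevalence = [counts[c] / (2 * n) for c in categories]

    # Chance agreement as defined for AC1.
    chance = sum(p * (1 - p) for p in prevalence) / (q - 1)
    return (observed - chance) / (1 - chance)


# Hypothetical risk-of-bias judgements for 10 items (illustration only).
rater_a = ["low", "low", "high", "unclear", "low", "high", "low", "low", "unclear", "low"]
rater_b = ["low", "low", "high", "low", "low", "high", "low", "low", "unclear", "low"]
ac1 = gwet_ac1(rater_a, rater_b, categories=["high", "low", "unclear"])
print(f"Gwet's AC1 = {ac1:.2f}")
```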
Table I. Quality assessment tool for diagnostic accuracy studies (QUADAS-2) items’ specifications developed by authors of the review
A systematic review should not culminate in meta-analysis if there are differences between the studies in terms of the participants they recruit and the test that they evaluate (19). In that respect, data were combined where studies measured the accuracy of the same index test for the diagnosis of the same LHBT pathology: (i) according to the same reference standard; and (ii) according to all reference standards. Meta-analysis tools were used when a minimum of 4 primary studies were identified (Table II) (20). Where a limited number of studies prevented the use of meta-analysis tools, only sensitivity (Sn) and specificity (Sp) estimates are presented from each study, together with forest plots.
Table II. Possible combinations of index test/reference standard/target condition for meta-analyses
Meta-analyses were conducted using the approach developed by Rutter & Gatsonis with version 3.3.3 of the R statistical software (http://www.r-project.org/) (21). The HSROC package was used to calculate overall pooled estimates of the included diagnostic studies, taking into account between-study and within-study variability. This routine, based on Bayesian statistics, estimates the overall sensitivity (Sn) and specificity (Sp) for a group of studies and produces a receiver operating characteristic (ROC) curve with a credible interval and a 95% prediction region. The classical confidence interval (CI) presumes that differences in Sn and Sp between studies are caused only by statistical instability related to sampling or measurement errors, so that all estimates would vary around a single value of Sn and a single value of Sp. In reality, for the same technique, Sn and Sp may vary over time, with different populations, with different operators or with any other relevant condition that changes the nature of the test. Across different conditions, Sn and Sp could fluctuate within a range of values that reflects a change in reality rather than statistical instability. The credible intervals delimit how Sn and Sp could fluctuate for reasons other than sampling or measurement errors. In this context, the CI adds to the credible interval the uncertainty caused by sampling and measurement errors; the credible intervals are therefore narrower than the CI. The prediction region is defined by pairing the CI with the credible interval. Heterogeneity was explored graphically using forest plots. Positive (LR+) and negative (LR–) likelihood ratios were calculated from the overall Sn and Sp. However, confidence and credible intervals could not be calculated for likelihood ratios.
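The calculation of likelihood ratios from overall Sn and Sp can be sketched in Python as follows, using the standard definitions LR+ = Sn / (1 – Sp) and LR– = (1 – Sn) / Sp. The input values are the pooled point estimates reported in the Results; the function name is illustrative, and this sketch is not the code used for the review, which was carried out in R.

```python
def likelihood_ratios(sn, sp):
    """Return (LR+, LR-) from sensitivity and specificity."""
    if not (0 <= sn <= 1 and 0 <= sp <= 1):
        raise ValueError("Sn and Sp must lie between 0 and 1")
    if sp in (0, 1):
        # LR+ is undefined when Sp = 1, and LR- is undefined when Sp = 0.
        raise ValueError("likelihood ratios are undefined for Sp of 0 or 1")
    return sn / (1 - sp), (1 - sn) / sp


# Pooled point estimates (Sn, Sp) as reported in the Results.
pooled = {
    "HRUS, dislocation": (0.76, 0.98),
    "HRUS, complete rupture": (0.71, 0.98),
    "Yergason's manoeuvre, proximal LHBT pathology except SLAP": (0.41, 0.84),
}

for label, (sn, sp) in pooled.items():
    lr_pos, lr_neg = likelihood_ratios(sn, sp)
    print(f"{label}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Rounded to 2 decimals, these calculations give the likelihood ratios reported in the abstract (38.00 and 0.24; 35.50 and 0.30; 2.56 and 0.70).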
Studies with cells containing zero in the 2 × 2 table lead to statistical model instabilities. A continuity correction, consisting of a small positive number (0.5 as suggested in the literature) was then added to the observed frequency (20).
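A minimal Python sketch of such a continuity correction is given below. It assumes the common convention of adding 0.5 to all 4 cells whenever any cell is zero; the text above does not specify whether the correction was applied to every cell or only to the empty one, so this is an assumption, as are the function name and the counts.

```python
def apply_continuity_correction(tp, fp, fn, tn, correction=0.5):
    """Add a small constant to every cell if any cell of the 2 x 2 table is zero.

    Assumption: all four cells are corrected (a common convention), which is
    not necessarily the exact rule applied in the review.
    """
    cells = (tp, fp, fn, tn)
    if any(cell == 0 for cell in cells):
        return tuple(cell + correction for cell in cells)
    return tuple(float(cell) for cell in cells)


# Hypothetical study with no false positives (illustration only).
tp, fp, fn, tn = apply_continuity_correction(tp=12, fp=0, fn=3, tn=40)
corrected_sp = tn / (fp + tn)  # no longer exactly 1, so LR+ becomes computable
print((tp, fp, fn, tn), f"corrected Sp = {corrected_sp:.3f}")
```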
For SLAP lesions, because the degenerative fraying of the SLAP I lesion is often considered a normal variant and asymptomatic, type II–IV and type I–IV lesions studies were isolated (22). The type II–IV group comprised studies either designed to assess the diagnosis of SLAP II–IV lesions or where only SLAP II–IV lesions were ascertained by the reference standard.
Searches resulted in 777 citations (duplicates removed). Twenty-eight articles were accepted for the review after full-text screening. Fourteen additional articles were identified by scrutiny of the reference lists of reviews and primary studies. Of the 42 eligible studies, 30 were included in the analysis of the review (8 for HRUS, 22 for OSTs; Fig. 1, Table III).
Fig. 1. Flow diagram of the bibliographic search. HRUS: high-resolution ultrasound; OSTs: orthopaedic special tests.
Table III. Summary of included studies
For the risk of bias assessment, inter-rater agreement was excellent (Gwet’s AC1 of 0.85). The overall assessment of the studies shows some risk of bias in 3 of the 4 categories (Fig. 2). For patient selection, 53% of all studies were assessed as low risk. Nine studies were judged at high risk because of restricted population (n = 5) (23–27), inappropriate exclusions (n = 3) (28–30) and case-control study design (n = 1) (31). In addition, 3 of them did not enrol patients in a consecutive manner (26, 27, 30). For index test, apart from studies with inadequate test description (n = 1) (23) and unknown blinding to the reference standard (n = 2) (26, 32), all were assessed as low risk of bias. For reference standard, 33% of studies included had a low risk of bias. All studies judged as high risk had a blinding issue (n = 14) (23, 25, 29, 31–41). For flow and timing, 27% of the eligible studies were deemed to have low risk. All studies considered to have high risk had inadequate interval between index test and reference standard (n = 8) (22, 25, 26, 32–35, 42). Moreover, for 3 of them, the reference standard was not the same for all patients.
Fig. 2. Methodological quality graph for accuracy studies: (A) all, (B) high-resolution ultrasound (HRUS), and (C) orthopaedic special tests (OSTs). Graphs show the percentage and number of studies with a high (red), low (green) and unclear (yellow) risk of bias for the 4 items.
Few studies compared the same index test with the same reference standard for the same target condition. Therefore, meta-analyses could be considered only for the following combinations: diagnosis of (i) LHBT dislocation with HRUS, (ii) LHBT complete rupture with HRUS, (iii) SLAP I–IV lesions with the Speed test, (iv) SLAP II–IV lesions with the active compression test, the anterior slide test and the crank test, and (v) any pathology of proximal LHBT except SLAP lesion with the Speed test and Yergason’s manoeuvre.
Tendinopathy. Three studies evaluated HRUS for diagnosing LHBT tendinopathy, either with surgery or MRI as reference standard (33, 34, 43). While Sn estimates ranged from 0.22 to 1.00, Sp varied from 0.88 to 1.00 (Fig. S1).
Dislocation. Seven studies assessed the accuracy of HRUS for diagnosing LHBT dislocation, comparing with surgery or MRI (23, 24, 32, 33, 42–44). Sn varied from 0.33 to 1.00, while Sp was in the high end of the spectrum, ranging from 0.96 to 1.00 (Fig. S1). Data from the 7 studies were pooled (Table IV, Fig. 3). Point estimates for Sn and Sp are 0.76 (95% CI 0.15–1.00) and 0.98 (95% CI 0.65–1.00), respectively. Results indicate a quite high Sp but more fluctuating Sn.
Effusion. One study evaluated HRUS accuracy in diagnosing LHBT effusion compared with MRI (43). The Sn and Sp estimates were 0.79 and 0.73, respectively (Fig. S1).
Partial rupture. Two studies investigated HRUS accuracy for the diagnosis of LHBT partial tear, and comparison was made with surgery (32, 34). Sn ranged from 0.27 to 1.00 and Sp was 1.00 for both studies (Fig. S1).
Complete rupture. Five studies evaluated HRUS in diagnosing complete LHBT rupture, compared with surgery or MRI (24, 32–34, 42). Sn and Sp ranged from 0.64 to 1.00 and 0.87 to 1.00, respectively (Fig. S1). Data from the 5 studies were pooled (Table IV, Fig. 3): Sn and Sp are 0.71 (95% CI 0.11–1.00) and 0.98 (95% CI 0.61–1.00), respectively. The results indicate a quite high Sp, but more fluctuating Sn.
Table IV. Overall accuracy of high-resolution ultrasound in characterization of long head of the biceps tendon pathology
Fig. 3. Hierarchical summary receiver operating characteristic (ROC) curve examining the diagnostic value of high-resolution ultrasound (HRUS) for characterization of long head of the biceps tendon (LHBT) (A) dislocation and (B) complete rupture. The 95% prediction region is defined by the blue dotted-curve, while the red dot-dashed-curve marks the boundary of the 95% credible interval of the pooled estimates. Prediction region is defined by pairing the confidence interval with the credible interval.
SLAP I–IV lesions. Accuracy for diagnosing SLAP I–IV lesions was assessed for 10 OSTs (Fig. S2). For each test, Sn and Sp, respectively, were (single study) or ranged (several studies) as follows: from 0.60 to 0.91 and from 0.13 to 0.85 for the active compression test (35, 37, 39), from 0.10 to 0.48 and from 0.81 to 0.82 for anterior slide test (37, 39), 0.55 and 0.53 for biceps load II test (35), from 0.13 to 0.39 and from 0.67 to 0.83 for crank test (36, 39), from 0.58 to 0.89 and from 0.31 to 0.98 for dynamic labral shear test (35, 37, 41), 0.27 and 0.75 for labral tension test (35), 0.48 and 0.52 for palpation test (36), 0.82 and 0.86 for passive compression test (45), from 0.09 to 0.47 and from 0.56 to 0.74 for Speed test (35–37, 39), and 0.23 and 0.57 for uppercut test (37). Data were pooled from studies assessing the Speed test (Table V, Fig. 4). The results indicate a widely variable performance. The pooled point estimates for Sn and Sp of the Speed test are 0.36 (95% CI 0.00–0.82) and 0.71 (95% CI 0.23–1.00), respectively.
SLAP II–IV lesions. Accuracy for diagnosing SLAP II–IV lesions was assessed for 8 OSTs (Fig. S3). The Sn and Sp for each test were, respectively, from 0.47 to 0.65 and from 0.38 to 0.92 for the active compression test (22, 25, 27, 31, 38–40), from 0.04 to 0.70 and from 0.69 to 0.98 for anterior slide test (22, 27, 31, 38–40), from 0.29 to 0.90 and from 0.78 to 0.97 for biceps load II test (31, 46), from 0.09 to 0.83 and from 0.42 to 1.00 for crank test (22, 26, 27, 39), from 0.25 to 0.26 and from 0.65 to 0.80 for palpation test (27, 31), 0.89 and 0.82 for passive compression test (45), 0.52 and 0.94 for passive distraction test (40), and from 0.04 to 0.48 and from 0.65 to 1.00 for Speed test (27, 31, 39).
Data were pooled from studies assessing the active compression test, the anterior slide test and the crank test (Table V, Fig. 4). The results indicate a widely variable performance for the 3 tests. The pooled Sn and Sp for the active compression test are 0.59 (95% CI 0.19–0.96) and 0.57 (95% CI 0.18–0.96), respectively, for the anterior slide test 0.21 (95% CI 0.00–0.79) and 0.88 (95% CI 0.35–1.00), respectively, and for the crank test 0.49 (95% CI 0.02–1.00) and 0.70 (95% CI 0.06–1.00), respectively.
Tendinopathy. Accuracy for diagnosing LHBT tendinopathy was assessed for 3 OSTs, and HRUS was the reference standard. The Sn and Sp estimates from each study are shown in forest plots (Fig. S4). The Sn and Sp for each test were, respectively, from 0.57 to 0.85 and from 0.49 to 0.72 for the palpation test (30, 47), from 0.47 to 0.83 and from 0.36 to 0.75 for Speed test (47–49), and from 0.32 to 0.86 and from 0.74 to 0.82 for Yergason’s manoeuvre (47–49).
Table V. Overall orthopaedic special tests’ accuracy in characterization of long head of the biceps tendon (LHBT) pathology
Fig. 4. Hierarchical summary receiver operating characteristic (ROC) curves examining the diagnostic value of: (A) the Speed test for characterization of superior labrum anterior and posterior (SLAP) I–IV lesions, (B) the active compression test for characterization of SLAP II–IV lesions, (C) the anterior slide test for characterization of SLAP II–IV lesions, (D) the crank test for characterization of SLAP II–IV lesions, (E) the Speed test for characterization of any long head of the biceps tendon (LHBT) pathology except SLAP lesion, and (F) Yergason’s manoeuvre for characterization of any LHBT pathology except SLAP lesion. The 95% prediction region is defined by the blue dotted-curve, while the red dot-dashed-curve marks the boundary of the 95% credible interval of the pooled estimates. Prediction region is defined by pairing the confidence interval with the credible interval.
Any proximal tendon pathology except SLAP lesion. Accuracy for diagnosing any LHBT pathology except for SLAP lesion was assessed for 5 OSTs. Target conditions included tendinopathy, dislocation, effusion, and rupture. Reference standard varied across studies, including either surgery or HRUS. Sn and Sp estimates from each study are shown in forest plots (Fig. S5). Sn and Sp for each test were, respectively, 0.01 and 1.00 for Heuter’s sign (49), from 0.53 to 0.85 and from 0.49 to 0.72 for palpation test (29, 30, 47), from 0.47 to 0.93 and from 0.27 to 0.81 for Speed test (28, 29, 37, 47–50), 0.72 and 0.78 for uppercut test (37), and from 0.32 to 0.86 and from 0.78 to 0.88 for Yergason’s manoeuvre (37, 47–49, 51).
Data from studies assessing Speed test and Yergason’s manoeuvre were pooled (Table V, Fig. 4). The results indicate a widely variable performance for the 2 tests, except for Yergason’s manoeuvre Sp. Sn and Sp for the Speed test are 0.65 (95% CI 0.17–1.00) and 0.61 (95% CI 0.15–1.00) and for Yergason’s manoeuvre 0.41 (95% CI 0.14–0.72) and 0.84 (95% CI 0.65–1.00).
We identified 30 studies evaluating the accuracy of HRUS or OSTs in diagnosing LHBT pathologies (Table III). The 8 primary studies on HRUS diagnostic accuracy comprised 5 different combinations of target condition/index test. At most, 6 of the studies examined the same combination. The 22 studies assessing OSTs presented 26 such combinations, and no more than 7 research studies tested the same combination. This lack of consistency across studies and the relatively few studies on the subject are a major barrier to the assessment of these clinical tools.
For a diagnostic test to be useful, it must be able to revise the pre-test probability of a patient having a disease sufficiently to guide clinical decisions. HRUS for the diagnosis of dislocation and complete rupture had LR+ of at least 35.5 and LR– of at most 0.30, indicating a large increase in the post-test probability of dislocation and complete rupture when diagnostic ultrasound is positive, and a moderate decrease in the probability of these diseases when it is negative (23). It should be noted that estimates of Sn of HRUS for diagnosing dislocation and complete rupture had wide confidence intervals (0.15–1.00 and 0.11–1.00), hence their calculated LR– may overstate the evidence. Confidence intervals were narrower for Sp (0.65–1.00 and 0.61–1.00), thus LR+ are probably informative.
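The way a likelihood ratio revises the pre-test probability can be sketched with the odds form of Bayes’ theorem: post-test odds = pre-test odds × LR. In the Python illustration below, the likelihood ratios are those reported in this review for HRUS in dislocation, whereas the 10% pre-test probability is a purely hypothetical value chosen for the example, as is the function name.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability into a post-test probability via odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio cannot be negative")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical pre-test probability of LHBT dislocation (assumption, not a review result).
pre_test = 0.10

# LR+ = 38.00 and LR- = 0.24 are the pooled values reported for HRUS in dislocation.
after_positive = post_test_probability(pre_test, 38.00)
after_negative = post_test_probability(pre_test, 0.24)
print(f"after positive HRUS: {after_positive:.2f}; after negative HRUS: {after_negative:.3f}")
```

Under this assumed pre-test probability, a positive examination raises the probability of dislocation substantially, whereas a negative examination lowers it only modestly, which is consistent with the interpretation given above.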
The LR+ and LR– of OSTs provided less compelling evidence. The only test of value was Yergason’s manoeuvre in diagnosing proximal LHBT pathology except SLAP lesion. Its LR+ was 2.56, indicating a slight increase in the probability of the disease. As its Sp confidence interval was 0.65–1.00, we can assume that it is of reasonable value. The LR– of OSTs varied between 0.57 and 0.90, all indicating no change in the post-test probability of the disease. The current review separated SLAP I–IV and II–IV lesions as 2 target conditions in order to investigate whether the accuracy of each OST changes when SLAP I lesions are considered normal variants. When explored graphically with forest plots, there is no apparent difference between the OSTs’ accuracies in diagnosing SLAP I–IV and SLAP II–IV lesions.
Eight systematic reviews were identified, of which 4 included a meta-analysis that evaluated the diagnostic accuracy of OSTs for diagnosing SLAP lesions. The 4 systematic reviews that did not include a meta-analysis (6, 7, 9, 12) highlighted that OSTs have a wide range of diagnostic accuracy values, with no particular single test appearing to have strong statistical support. This is in line with our conclusions for the accuracy of OSTs.
Hanchard et al. (9) conducted a Cochrane systematic review on shoulder impingements and local lesions of tendons and labrum that may accompany impingement. Their review comprised several individual studies that were included in our analysis for the accuracy of OSTs. For these analyses, Sn and Sp were obtained in agreement with Hanchard et al.’s study. For these same combinations of index test/target condition, 8 new studies issued after completion of their review were identified and included (22, 24–30). In addition, we classified the target conditions slightly differently. In the current review, we grouped together studies examining the diagnosis of SLAP II–IV and SLAP II lesions (our SLAP II–IV group) while Hanchard et al. kept them separated.
Four previous meta-analyses (8, 10, 11, 13) have reported pooled accuracy estimates for the active compression test, anterior slide test, crank test and Speed test in diagnosing SLAP lesions. Hegedus et al. (10) and Gismervik et al. (8) reviewed the literature on the accuracy of OSTs of the shoulder. For SLAP lesions, there were some discrepancies between the values obtained by these authors and our estimates for the active compression test and Speed test. These discrepancies may arise from the fact that we separated SLAP I–IV from II–IV studies. Our higher Sp for active compression test could suggest that it has a better profile for confirming a SLAP II–IV than a SLAP I–IV lesion. In addition, Gismervik et al. incorporated Holtby & Razmjou’s study (31) when combining data for the Speed test, while we did not. It should be noted that Holtby & Razmjou’s study was not included in our analysis for the combination Speed test/SLAP I–IV lesions because this study evaluates Speed test’s accuracy in diagnosing not only SLAP lesions, but any proximal LHBT pathology including SLAP lesions.
Meserve et al. (11) conducted a meta-analysis examining the accuracy of OSTs for assessing SLAP lesions (active compression test, anterior slide test, crank test, and Speed test). They found that the anterior slide test was statistically inferior to the 3 other tests; this can be appreciated when looking at their ROC curves. In our review, the curve for the anterior slide test resembles the 3 others. This inconsistency may be explained by the 3 studies included in our analysis that were published after their review (22, 32, 33). After reviewing the literature on the same research question, Walton et al. (13) performed a meta-analysis for the OSTs that had been evaluated at least 3 times in the literature. They provided estimates of the pooled LR+ for, among others, the active compression test (1.07), crank test (1.51), and Speed test (1.12). Our pooled LR+ estimates were 1.37 for the active compression test, 1.63 for the crank test, and 1.24 for the Speed test. Our values are slightly higher for the active compression test because we included 3 studies that were published after their work (26, 34, 35). Also, for the Speed test, they incorporated Holtby & Razmjou’s (31) as well as Bennet’s (36) studies in their analysis, which evaluate not only SLAP lesions, but any LHBT pathology.
Strengths. First, this systematic review was based on a rigorous search of the literature, which resulted in the inclusion of 30 articles. Secondly, a recommended appraisal tool was used to determine the risk of bias of included studies. In addition, the statistics presented in the included studies were double-checked by back-calculating 2 × 2 tables. Where we observed a discrepancy between text and tables, or when the values presented contained arithmetical errors, the study was excluded. Finally, judicious use was made of meta-analysis tools: they were used only when a minimum of 4 primary studies were identified, as suggested by Sotiriadis et al. (20).
Weaknesses. In our protocol design, we chose to exclude studies not published in English or French, which may have led to selection bias. One study in Persian and 1 in Turkish could have been eligible. We also recognize the possibility of information bias in the studies included. More specifically, as appraised with the QUADAS-2 instrument, there is a possibility of misclassification due to spontaneous recovery or progression of disease. Of the 30 included studies, 9 had an inadequate interval between index test and reference standard. In the same vein, misclassification in the primary studies due to an inaccurate reference standard is another possibility to consider. It was “unclear” if the reference standard was likely to correctly classify the target condition in 8 of the 30 included studies. For instance, in order to assess the accuracy of OSTs in diagnosing tendinopathy, HRUS was the reference standard in the only individual studies identified in the literature (Fig. S4). Since the role of ultrasound in the diagnosis of biceps tendinopathy is still poorly understood, this area of uncertainty would need to be addressed before a more definitive conclusion can be drawn (2).
From the findings of this systematic review, HRUS had variable Sn and thus would be of lower interest as a screening test. Nevertheless, it can be considered a highly specific clinical tool for the diagnosis of dislocation, rupture and tendinopathy of the LHBT; it can be useful in ruling in disease. Besides its effectiveness, HRUS has several advantages over other imaging modalities: it has no contraindications, it offers high spatial resolution, and it allows dynamic assessment and correlation of findings with patients’ symptoms. Furthermore, it has been shown to be cost-effective in specific situations, such as in the context of rotator cuff disease (37), and proved to be a reliable method for the measurement of the LHBT in healthy shoulders (38).
With regard to OSTs, the evidence was more limited because of the variability of test accuracy across different study settings. A promising screening test (high Sn) for SLAP II–IV lesions is the passive compression test, but the test has been evaluated only by its originators. No other test demonstrated high Sn. For ruling in specific diagnoses, several tests seem to be valuable. The anterior slide test and biceps load II test had high Sp for diagnosing SLAP I–IV lesions. The passive compression test and passive distraction test were highly specific for SLAP II–IV lesions, but only the tests’ originators assessed their accuracies. For LHBT tendinopathy, Yergason’s manoeuvre proved highly specific. For proximal LHBT pathology except SLAP lesions, Heuter’s sign (one study) and Yergason’s manoeuvre had high Sp.
Since no single clinical finding, from either OSTs or HRUS, is accurate enough to confirm diagnosis and guide subsequent clinical decisions, it is appealing for clinicians and researchers to improve diagnostic accuracy by clustering clinical information. Furthermore, combining clinical findings more closely reflects how clinicians make decisions in practice. Combining the more sensitive clinical information with the more specific data could be quite helpful in improving our ability to diagnose LHBT pathology. Future research on the subject should focus on the development of such clusters.
In order to rule in LHBT pathology, HRUS has proven its diagnostic efficacy. However, evidence is lacking to recommend its use for the purpose of ruling out pathology. There is insufficient evidence to recommend individual OSTs. In the future, rigour in diagnostic test accuracy research is of paramount importance. Researchers should minimize bias by using prospective cohort-type study designs, index tests performed in accordance with their original description, adequate reference standards and an adequate interval between index test and reference standard. Finally, investigators should consider improving accuracy by clustering OSTs with or without HRUS and information about current or past medical history (39).
The authors have no conflicts of interest to declare.