¹College of Biomedical and Life Sciences, School of Healthcare Sciences, Cardiff University, Cardiff, UK, ²Institute for Health Services Research in Dermatology and Nursing (IVDP), University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany, ³Big Health Ltd, London, UK/San Francisco, USA, ⁴Sleep and Circadian Neuroscience Institute, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, ⁵Division of Nursing, Midwifery and Social Work, University of Manchester, Manchester, ⁶Centre for Trials Research, Cardiff University, Cardiff, UK and ⁷International Alliance of Dermatology Patient Organizations, Ottawa, Canada
Because existing evidence relies on data from patient-reported outcome measures of quality of life, the true impact of skin conditions on patients’ lives may be underestimated. This study systematically reviewed all dermatology-specific (used across skin conditions) patient-reported outcome measures and made evidence-based recommendations for their use. The study protocol is registered on PROSPERO (CRD42018108829). PubMed, PsycInfo and CINAHL were searched from inception to 25 June 2018. The Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) criteria were used to assess the measurement properties of the instruments and the methodological quality of the studies. A total of 12,925 abstracts were identified. No patient-reported outcome measures were assigned to category A (ready for use without further validation), 31 were assigned to category B (recommended for use, but only with further validation) and 5 to category C (not recommended for use). There is no gold-standard dermatology-specific patient-reported outcome measure that can be recommended or used without caution. A new measure that can comprehensively capture the impact of dermatological conditions on the patient’s life is needed.
Key words: patient-reported outcome measures; measurement properties; dermatology-specific; burden; quality of life.
Accepted Jul 14, 2021; Epub ahead of print Jul 15, 2021
Acta Derm Venereol 2021; 101: adv00559.
doi: 10.2340/00015555-3884
Corr: Rachael Pattinson, School of Healthcare Sciences, College of Biomedical and Life Sciences, Cardiff University, Floor 12, Eastgate House, Cardiff, CF24 0AB, UK. E-mail: pattinsonr@cardiff.ac.uk
This is the first study to systematically evaluate all published dermatology-specific (for use across skin conditions) patient-reported outcome measures against the gold-standard Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) criteria and to make evidence-based recommendations for their use. The study found that no dermatology-specific patient-reported outcome measure can be unequivocally recommended for use. These results call into question the validity of the data collected using these patient-reported outcome measures, which has implications for clinical decision-making and research.
Dermatological conditions are reported to cause substantial pain, disfigurement, disability and stigma, and to impose a psychological, social and financial burden (1, 2). Our qualitative research with people with dermatological conditions resulted in the first conceptual framework of the impact of these conditions on patients’ lives (unpublished data). Impact was defined as a multifaceted construct manifesting across physical, psychological, social, financial and daily functioning.
The measurement of impact is particularly pertinent to dermatology, where the goal of treatment is often to improve the patient’s quality of life (QoL) rather than to prolong it. The true impact of dermatological conditions on patients’ lives is probably underestimated, because most of the evidence derives from data collected using QoL patient-reported outcome measures (PROMs), which have several limitations. First, these PROMs are typically used to assess the impact of an intervention on the patient’s life, not the impact of the skin condition itself. Secondly, no individual dermatology QoL PROM adequately addresses all of the relevant domains or aspects thereof. For example, items on psychological functioning have focussed largely on emotions and, to a lesser extent, coping behaviour, and typically ignore cognitive impact. Cognitions are known to predict outcomes for a range of long-term conditions (1). In dermatology, beliefs about psoriasis are better predictors of outcomes than clinician-assessed disease severity (2–4) and are closely linked with medication adherence (5). Thirdly, work on cumulative life course impairment (CLCI) and major life decisions has established that skin conditions have a cumulative impact over time (4, 5). Because recall bias increases with longer recall periods, it is generally recommended that PROMs be administered repeatedly to capture impact over time. However, dermatology QoL PROMs typically do not contain items that can be used to track CLCI over time. Finally, the measurement properties of most PROMs used in dermatology have not been evaluated against the “gold-standard” Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) criteria (9). This is important because PROMs must meet pre-defined criteria across a range of measurement properties (including validity, reliability and responsiveness) for the data they produce to be meaningful (6–8).
Without knowledge of their measurement properties, we cannot judge the quality of a measure nor have confidence in the data it produces.
To fully understand the impact of skin conditions on patients’ lives there is a need to develop a measure specifically designed to capture this. This systematic review is the first step in the development of the Patient-Reported Impact of Dermatological Diseases (PRIDD) measure: a dermatology-specific PROM of the impact of dermatological conditions on the patient’s life for use with adults worldwide. PRIDD will have discriminative and evaluative applications for use in research and clinical practice. This review aims to: (i) identify all dermatology-specific PROMs (see Table I for information on levels of measurement) and assess their suitability for use as a measure of impact, (ii) evaluate their measurement properties according to the COSMIN criteria, and (iii) make evidence-based recommendations for their use.
Table I. Levels of measurement
The review protocol was registered on PROSPERO (CRD42018108829), an international database of prospectively registered systematic reviews with a health outcome. Ethics approval was not required. A comprehensive search strategy (Table SI) was designed to identify published evidence on the development and validation of dermatology PROMs. It comprised 3 blocks of search terms: (i) dermatological conditions; (ii) life impact; and (iii) a validated, highly sensitive search filter for measurement properties (10). PubMed, PsycInfo and CINAHL were searched from inception to 25 June 2018, with results limited to journal articles and human subjects. No language restriction was applied, and non-English papers were translated. Because of the volume of articles retrieved, the current study focused on dermatology-specific PROMs; disease-specific PROMs will be reported in a separate manuscript.
The systematic review was conducted according to the COSMIN methodology, which is the gold-standard critical appraisal tool for systematic reviews of PROMs (9). According to COSMIN, all PROMs in a review should be assessed against both the construct of interest and the target population of the review (9). The aim of this review was to identify measures of impact and to establish the quality of the dermatology-specific PROMs currently in use. Accordingly, the target population was patients with dermatological conditions in general (as opposed to disease-specific samples, e.g. atopic dermatitis), and the construct of interest was the construct that each PROM was designed to assess.
Three reviewers independently screened titles and abstracts. To satisfy the inclusion criteria, the title and/or abstract had to include at least one term from each of the 3 search strategy blocks. Three reviewers independently assessed and ranked the selected articles using criteria adapted from Kitchen et al. (11) (Table II). Only articles ranked 1a were included. To determine inter-rater reliability, each reviewer screened and rank-ordered 10% of the other reviewers’ samples. The results were compared and any discrepancies were resolved through discussion.
An electronic data extraction form compliant with the COSMIN guidance (12) was used. The key data extracted were: summary data of included studies; the characteristics of included studies; the measurement properties of the studied PROM(s); and information on the interpretability and feasibility of included PROMs.
Table II. Ranking criteria for articles adapted from Kitchen et al. (11)
Methodological quality of included studies
The COSMIN risk of bias checklist (9, 12) was used by 6 independent reviewers to evaluate the methodological quality of included studies.
Quality of measurement properties
Measurement properties from the COSMIN checklist were evaluated against predefined criteria by 6 independent reviewers (9). Criterion validity was not assessed, as no gold-standard exists for the constructs evaluated (e.g. QoL) (13). Interpretability and feasibility data were collected where available.
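To illustrate what rating a result against “predefined criteria” involves, the Python sketch below rates three measurement properties as sufficient (+), insufficient (-) or indeterminate (?). The thresholds (CFI or TLI > 0.95, RMSEA < 0.06, SRMR < 0.08, Cronbach’s alpha ≥ 0.70, ICC or weighted kappa ≥ 0.70) are our assumed reading of the published COSMIN criteria for good measurement properties under classical test theory; they are not stated in this article, and the function names are hypothetical. It is a simplified illustration, not the procedure the reviewers applied.

```python
from typing import Optional, Sequence

# ASSUMED thresholds, based on our reading of the COSMIN criteria for good
# measurement properties (classical test theory); check the COSMIN manual.
CFI_TLI_MIN = 0.95   # CFI or TLI must exceed this value
RMSEA_MAX = 0.06     # RMSEA must be below this value
SRMR_MAX = 0.08      # SRMR must be below this value
ALPHA_MIN = 0.70     # Cronbach's alpha for each unidimensional (sub)scale
ICC_MIN = 0.70       # ICC or weighted kappa


def rate_structural_validity(cfi: Optional[float] = None,
                             tli: Optional[float] = None,
                             rmsea: Optional[float] = None,
                             srmr: Optional[float] = None) -> str:
    """Return '+' if any reported fit index meets its threshold, '-' if none
    does, and '?' if no fit index was reported (hypothetical helper)."""
    checks = []
    if cfi is not None:
        checks.append(cfi > CFI_TLI_MIN)
    if tli is not None:
        checks.append(tli > CFI_TLI_MIN)
    if rmsea is not None:
        checks.append(rmsea < RMSEA_MAX)
    if srmr is not None:
        checks.append(srmr < SRMR_MAX)
    if not checks:
        return "?"
    return "+" if any(checks) else "-"


def rate_internal_consistency(alphas: Sequence[float],
                              structural_validity_sufficient: bool) -> str:
    """Alpha is only interpretable for (sub)scales whose structure has been
    confirmed; otherwise the result is indeterminate (hypothetical helper)."""
    if not structural_validity_sufficient or not alphas:
        return "?"
    return "+" if all(a >= ALPHA_MIN for a in alphas) else "-"


def rate_reliability(icc: Optional[float]) -> str:
    """Rate test-retest reliability from an ICC or weighted kappa."""
    if icc is None:
        return "?"
    return "+" if icc >= ICC_MIN else "-"
```

Under these assumed thresholds, a CFI of 0.935 on its own is rated “-”, which is in line with the two-factor DLQI-Chinese solution reported in the Results, and alpha computed on hypothesized rather than confirmed subscales is rated “?”, as described for the original Skindex.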
Best-evidence synthesis
For each PROM and each measurement property, the evidence on the methodological quality of the studies and on the quality of their results was pooled and summarized. The summary was rated against the criteria for good measurement properties and then graded using a modified Grades of Recommendation, Assessment, Development and Evaluation (GRADE) approach to form a best-evidence synthesis (9, 12). The quality of the evidence was graded as high, moderate, low or very low, according to the COSMIN procedures (9). All versions of a PROM were considered separately (9, 12).
The GRADE approach specifies 5 factors to determine the quality of evidence: risk of bias (quality of the studies), inconsistency (of the results of the studies), indirectness (evidence comes from different populations, interventions or outcomes than the ones of interest in the review), imprecision (wide confidence intervals), and publication bias (9). The fifth factor, publication bias, is not included in the COSMIN methodology, since there are no registries for studies on measurement properties. Thus, a modified GRADE approach specifying 4 factors was used to downgrade the evidence.
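The downgrading logic of the modified GRADE approach can be sketched as follows. The evidence starts at “high” and drops one level per downgrading step, never falling below “very low”. The maximum number of steps per factor (up to 3 for risk of bias, up to 2 for inconsistency, imprecision and indirectness) is an assumption based on our reading of the COSMIN manual and is not stated in this article. The function name `grade_evidence` is hypothetical.

```python
LEVELS = ["high", "moderate", "low", "very low"]

# ASSUMED maximum downgrade per factor in the COSMIN-modified GRADE approach.
# Publication bias is deliberately absent, as described in the text.
MAX_DOWNGRADE = {
    "risk_of_bias": 3,
    "inconsistency": 2,
    "imprecision": 2,
    "indirectness": 2,
}


def grade_evidence(**downgrades: int) -> str:
    """Start at 'high' and move down one level per downgrading step.

    Keyword arguments name a factor and the number of levels to downgrade,
    e.g. grade_evidence(risk_of_bias=1, imprecision=1).
    """
    total = 0
    for factor, steps in downgrades.items():
        if factor not in MAX_DOWNGRADE:
            raise ValueError(f"unknown or excluded GRADE factor: {factor}")
        if not 0 <= steps <= MAX_DOWNGRADE[factor]:
            raise ValueError(f"{factor}: downgrade must be 0..{MAX_DOWNGRADE[factor]}")
        total += steps
    # Clamp at the lowest level ('very low').
    return LEVELS[min(total, len(LEVELS) - 1)]
```

Passing `publication_bias` raises an error, mirroring the decision to use only 4 factors.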
Generating recommendations for use of dermatology-specific patient-reported outcome measures
The primary outcome assessed was recommendation for use. Each PROM was assigned to 1 of 3 standardized “recommendation for use” categories according to the COSMIN criteria (9), analogous to a traffic-light system in which green means good to go, amber means proceed with caution and red means do not proceed:
A: PROM can be recommended for use (has evidence for sufficient content validity [any level] and at least low-quality evidence for sufficient internal consistency).
B: PROM has potential to be recommended for use, but requires further validation (cannot be categorized into A or C).
C: PROM should not be recommended for use (has high-quality evidence of an insufficient measurement property).
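The three categories above amount to a simple decision rule, sketched below in Python. The input format (a mapping from measurement property to a pair of rating and evidence level) and the name `recommend` are hypothetical. Checking category C before category A is also an assumption, because the criteria as listed do not state which takes precedence if a PROM could satisfy both.

```python
from typing import Dict, Tuple

LEVEL_RANK = {"very low": 0, "low": 1, "moderate": 2, "high": 3}

# Hypothetical input: property -> (rating, evidence level), where rating is
# "sufficient", "insufficient", "inconsistent" or "indeterminate".
Synthesis = Dict[str, Tuple[str, str]]


def recommend(synthesis: Synthesis) -> str:
    """Assign COSMIN-style category A, B or C from a best-evidence synthesis."""
    # Category C: high-quality evidence that any property is insufficient.
    # (Evaluating C first is an assumption about precedence.)
    if any(rating == "insufficient" and level == "high"
           for rating, level in synthesis.values()):
        return "C"

    # Category A: sufficient content validity (any level of evidence) AND
    # at least low-quality evidence for sufficient internal consistency.
    content = synthesis.get("content validity")
    internal = synthesis.get("internal consistency")
    content_ok = content is not None and content[0] == "sufficient"
    internal_ok = (internal is not None
                   and internal[0] == "sufficient"
                   and LEVEL_RANK[internal[1]] >= LEVEL_RANK["low"])
    if content_ok and internal_ok:
        return "A"

    # Category B: everything that is neither A nor C.
    return "B"
```

For example, a PROM with inconsistent content validity and moderate-quality evidence for sufficient internal consistency falls into category B, because category A requires content validity to be rated sufficient.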
The secondary outcome was to establish whether a dermatology-specific PROM capable of measuring impact exists, by evaluating the domains measured by each PROM.
The search identified 12,925 abstracts. An additional 3 articles (14–16) were identified through reference lists and expert input. Fig. 1 details the full article selection process. Of the 53 dermatology-specific PROM articles identified, data were extracted from 52. One article (17) was excluded because the psychometric testing for 2 separate PROMs was combined. Two studies examined more than one PROM (18, 19). Six articles that met the inclusion criteria were not included in the COSMIN analysis, although data were extracted from them (17, 20–25): 2 because only interpretability information was reported (22, 24), and 4 because they were review articles that did not provide sufficient information on the methodological quality of included studies, although they did include information on interpretability and feasibility (20, 21, 23, 25). In all, 36 PROMs (Tables SII and SIII), reported in 46 articles (Table SIV), were included in the COSMIN analysis.
Fig. 1. Flow diagram of the screening and selection process.
Identification of an impact measure
A comparison of the PROMs at domain level is shown in Table III. Domains were derived from the subscales reported by the developers or from structural validity analyses (i.e. factor analysis). Most of the instruments (83%) measured QoL. The most common domains were symptoms, emotional/psychological functioning, physical functioning, social functioning and daily activities. None of the PROMs covered other life domains, such as financial impact and life course impairment; therefore, none should be considered a comprehensive measure of the impact of living with a skin disease.
Table III. Comparison of patient-reported outcome measures (PROMs) at domain level
Methodological quality of included studies and quality of measurement properties
Table IV shows the methodological quality of studies and the quality of the results for the PROM content validity studies. For most instruments, evidence for content validity was based only on development and pilot-testing studies (n = 22), as a content validity study was conducted only for the Spanish version of Skindex-29 (S29-S). The majority (86%) of the development studies were of very low methodological quality. Only the Patient Benefit Index (PBI; low quality), Turkish Quality of Life Instrument (TQL; low quality) and Skindex-29-Spanish (high quality) were rated as having adequate methodological quality. The most common reason for downgrading the overall quality of evidence to very low was that a cognitive interview was not conducted or was of poor methodological quality.
Table IV. Methodological quality and quality assessment of results per development and content validity study per patient-reported outcome measure (PROM)
The methodological quality of the studies on measurement properties (Table V) and the quality of the measurement properties (Table VI) are also presented per study. No PROM was tested for all measurement properties; the number of measurement properties tested per PROM ranged from 1 to 6. Internal consistency was the most frequently tested property (n = 37) and measurement invariance the least (n = 2); measurement error was not tested at all.
Table V. Methodological quality of each study per measurement property
Table VI. Quality assessment of measurement properties per study according to predefined criteria proposed by Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) (12)
Quality of PROMs: best-evidence synthesis and recommendations
The results of the best-evidence synthesis per PROM (Table VII) are presented below by recommendation category (A–C). The results presented are an overview; a more detailed account is given in Table SV. For each measurement property of each PROM, the best-evidence synthesis expresses the overall evidence as a single result, combining the level of evidence (high, moderate, low, very low) with the quality of the measurement property (sufficient, insufficient, indeterminate, inconsistent).
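The pooling step that produces these overall ratings can be sketched as follows: per-study ratings (+, -, ?) are combined into one of the four summary ratings and then paired with a level of evidence. This is a deliberately strict simplification with hypothetical function names. Indeterminate studies are ignored when informative ones exist, and any disagreement between informative studies is labelled inconsistent. The COSMIN guidance leaves reviewers more room for judgement, for example when most results agree.

```python
from typing import Iterable


def pool_ratings(study_ratings: Iterable[str]) -> str:
    """Combine per-study ratings ('+', '-', '?') into one overall rating."""
    informative = [r for r in study_ratings if r in ("+", "-")]
    if not informative:
        return "indeterminate"   # only '?' results (or none at all)
    if all(r == "+" for r in informative):
        return "sufficient"
    if all(r == "-" for r in informative):
        return "insufficient"
    return "inconsistent"        # informative studies disagree


def describe(level: str, rating: str, measurement_property: str) -> str:
    """Express a synthesis cell as a single result, as in Table VII."""
    return f"{level}-quality evidence for {rating} {measurement_property}"
```

For instance, `describe("high", pool_ratings(["+", "+", "?"]), "internal consistency")` yields “high-quality evidence for sufficient internal consistency”.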
Table VII. Best-evidence synthesis and recommendations
Category A (green traffic light). No PROMs met the requirements for use as an ideal dermatology-specific PROM.
Category C (red traffic light). Five PROMs (18, 26–31) had high-quality evidence for insufficient measurement properties and are not recommended for use: the Sinhala version of the Dermatology Life Quality Index (DLQI-S), the Short-form of the Questionnaire on Experience with Skin Complaints (SF-QES), the Serbian and Spanish versions of Skindex-29 and the Chinese version of Skindex-16.
Category B (amber traffic light). A total of 31 PROMs can be recommended for use pending further validation. Where no PROMs are categorized as A (as here), COSMIN recommends that the PROM in category B with the best evidence for content validity be provisionally recommended for use until further evidence is available (9). Based on the evidence for content validity across instruments in the best-evidence synthesis, the Patient Benefit Index (PBI) and the Turkish Quality of Life (TQL) instrument are provisionally recommended for use as measures of patient needs and treatment benefits, and of QoL, respectively. However, although both have content validity studies of acceptable methodological quality, their results were inconsistent. Below, we report only on PROMs that require further context or for which additional information not shown in Tables IV–VII is available.
Dermatology Life Quality Index and translations
Dermatology Life Quality Index (DLQI) (36) is a QoL PROM for patients with skin disease. Minimally important difference (MID) thresholds range from 2.34 to 5.7 (24, 37). One study found a ceiling effect in 11% of patients (22).
A number of studies assessed the structural validity of the DLQI, although most were of low methodological quality (38) or did not report statistics corresponding to the COSMIN criteria (16). One study of good methodological quality (defined as a very good or adequate rating on the COSMIN Risk of Bias checklist) suggests that the DLQI is unidimensional (39), although the relevant statistics were not reported. Sufficient internal consistency, construct validity and responsiveness were supported by high-quality evidence. There was some evidence of indeterminate reliability.
The measurement properties of the DLQI-Chinese have been evaluated using Rasch analysis (41, 42) and classical test theory (CTT) (43, 44). The results of the Rasch analyses do not directly correspond to the criteria for good structural validity, internal consistency and measurement invariance and, therefore, were not included in the best-evidence synthesis. There is high-quality evidence of a unidimensional structure and internal consistency. One study of adequate methodological quality found 2 factors (43), though these violated criteria for good measurement properties (Comparative Fit Index [CFI] 0.935). Another study of very good methodological quality found evidence of unidimensionality that met the criteria for good measurement properties (44). There was sufficient construct validity.
Dermatology-specific Quality of Life
Dermatology-specific Quality of Life (DSQL) (49) is a QoL PROM for patients with skin disease. Two studies found floor effects for the daily activities (25.2%), social functioning (27.6%) and work/school (41.2% and 53.8%) subscales (49, 50). There was high-quality evidence for sufficient internal consistency and construct validity and for indeterminate structural validity, but only low-quality evidence for sufficient reliability.
Freiburg Life Quality Assessment
The Freiburg Life Quality Assessment (FLQA) is a set of core generic items and additional disease-specific items used to assess QoL in dermatology patients. The FLQA-d (14) is a variant of the FLQA for use with patients with long-term skin conditions. There was high-quality evidence for insufficient internal consistency and for sufficient construct validity and responsiveness, and low-quality evidence for indeterminate reliability.
Patient Benefit Index
The Patient Benefit Index (PBI) (51) is a measure of patient needs and treatment benefits for dermatology patients. The developers found a “major floor effect”. Although the PBI showed low-quality evidence of inconsistent content validity, the development study was methodologically adequate overall, and both the rating against the criteria for good content validity and the reviewers’ rating were sufficient overall. There was moderate-quality evidence of sufficient responsiveness, and low-quality evidence of indeterminate internal consistency and reliability.
Skindex
Skindex (54) is a QoL PROM for patients with skin disease. High-quality evidence for indeterminate structural validity was found. Internal consistency was indeterminate because it was tested using hypothesized subscales, rather than those identified by the factor analysis. Low-quality evidence for insufficient construct validity was found.
Skindex-29 and translations
Skindex-29 (55) is a revised version of Skindex. There was high-quality evidence for sufficient internal consistency and construct validity, but also indeterminate structural validity.
Skindex-29-Chinese (18) had high-quality evidence for sufficient construct validity. Moderate evidence suggested insufficient structural validity. No floor or ceiling effects were observed.
Skindex-16 and translations
Skindex-16 (57) is a revised version of Skindex-29. There was moderate evidence for sufficient structural validity and internal consistency and low-quality evidence for sufficient construct validity.
Turkish Quality of Life instrument for skin disease
Turkish Quality of Life (TQL) instrument (62) is a Turkish-language QoL PROM for patients with skin disease. TQL has low-quality evidence for inconsistent content validity. The cognitive interview (n = 40) was methodologically adequate, although the results were inconsistent. Moderate-quality evidence was found for sufficient internal consistency and construct validity, and for indeterminate structural validity.
To our knowledge, this is the first study to systematically evaluate published dermatology-specific PROMs in accordance with the COSMIN guidelines. A total of 36 dermatology-specific PROMs were identified and the majority measured QoL. Examination of the instruments at the domain level revealed that no single PROM could comprehensively assess the impact of living with a skin condition according to our conceptual framework, indicating that the development of a new PRIDD measure is warranted.
Based on their reported measurement properties, no PROM met the COSMIN requirements to be recommended for unqualified use, 31 showed potential to be recommended for use but require further validation, and 5 are not recommended for use. Of those with potential to be recommended for use, only the PBI and TQL can be provisionally recommended in accordance with the COSMIN guidance, as they have the best evidence for content validity (9).
The use of PROMs of poor or unknown quality is wasteful and unethical, in part because measures that are not valid or reliable can produce misleading results (64). Although, in some situations, an imperfect PROM (beyond accepted levels of measurement error) may be better than no PROM, it is useful to recognize the limitations of the measure so that conclusions drawn can be tempered accordingly (7). This is pertinent in dermatology, where PROMs are used in research, including clinical trials, and in clinical practice to make individual treatment decisions. This review highlighted the paucity of high-quality evidence for dermatology-specific PROMs. These findings concur with another recent COSMIN systematic review of dermatology-specific QoL instruments used in the context of eczema (65). Of the 135 measurement properties evaluated, only 26 had evidence of both adequate methodological quality and sufficient psychometric properties. No PROM performed well across all measurement properties; evidence for measurement invariance and interpretability was lacking, and evidence on measurement error was absent.
Content validity is considered to be the most important measurement property (12). Because PROMs aim to capture information directly from patients, adequate patient input is necessary to establish content validity. However, all original PROM development studies identified were of low or very low methodological quality, and only 1 PROM, the Spanish version of Skindex-29, underwent an additional content validity study. Together, this indicates a lack of adequate patient input into the initial development of these PROMs. Future PROM development and validation work should focus on improving the methodological quality of studies, establishing content validity and addressing gaps in known measurement properties.
Four of the 5 PROMs not recommended for use were translated versions of other PROMs, potentially indicating an issue with current practice in cross-cultural translation. Our findings cannot be generalized to all translated PROMs in dermatology, as we did not find published development and validation studies for some known translations. It does seem, however, that there are issues in the translation of PROMs in this area. Measurement invariance (or cross-cultural validity) testing was lacking for translated PROMs in every recommendation category. Measurement invariance is core to the process of validation, as it provides evidence of “construct equivalence”, the assumption that items in the translated version measure the same construct in the same way as in the original version (65–68). Evidence of construct equivalence is therefore required to synthesize and compare data across language versions, with obvious implications for research. We believe there is a need to standardize cross-cultural translation studies of PROMs in terms of both methods (e.g. forward- and back-translation procedures) and the measurement properties tested.
Strengths and weaknesses
Given that no PROMs could be unreservedly recommended for use, it could be argued that the COSMIN criteria are too strict. However, the COSMIN criteria were developed with a range of experts, including PROM developers, psychometricians, statisticians, qualitative researchers and clinicians (69, 70). In their systematic review of dermatology-specific QoL instruments, Gabes et al. concluded that the COSMIN guidance was “less strict and slightly more sympathetic to candidate PROMs” (p. 72) (64) than the previously recommended OMERACT approach (71, 72). Use of COSMIN is a strength of this review, as it: (i) reduces bias in the evaluation of measurement properties; (ii) allows comparisons between PROMs; (iii) enables standardized recommendations; and (iv) highlights issues in the field, including poor methodological quality and reporting. However, COSMIN conflates inadequate reporting of studies with poor inherent methodological quality, which reduces the validity of the best-evidence synthesis. In addition, because the COSMIN tools make limited reference to Rasch-relevant statistics, they do not adequately evaluate the methodological quality of studies conducted with Rasch analysis, which is considered superior to the CTT framework. This inadequate evaluation of item response theory/Rasch studies further reduces the validity of the best-evidence synthesis.
A further strength of this review is that the search strategy was developed by a multidisciplinary team with expertise in dermatology, psychology and measurement instrument development and included a COSMIN-validated search filter. Three databases recommended by a subject librarian were searched. However, reference lists of included studies were not searched, which may explain why some translated PROMs were not found. Finally, at least 3 independent reviewers were involved in screening, data extraction and analysis, 2 of whom were involved at every step to ensure consistency.
Conclusion
This study found that no dermatology-specific PROMs could be unreservedly recommended for use according to the COSMIN standards. The single most common reason for poor quality assessment was the lack of patient input to the initial development of the measure. No measure of impact across skin conditions exists in dermatology and, therefore, we argue that the new measure PRIDD, developed with substantial patient input, is warranted.
The authors wish to acknowledge the input of those serving on the GRIDD project’s Scientific Advisory Board: Professor Andrew Finlay (Cardiff University, UK), Professor Arnon D. Cohen (Clalit Health Services, Israel), Professor Ncoza Dlova (Nelson R Mandela School of Medicine, South Africa), Dr Toshiya Ebata (The Jikei University School of Medicine, Japan), Dr Cristina Echeverría (ECHO Psoriasis, Argentina), Dr Alice Gottlieb (Icahn School of Medicine at Mt Sinai, USA), Dr Luigi Naldi (Ospedale San Bortolo di Vicenza, Italy), Dr Lone Skov (University of Copenhagen, Denmark) and Marc Yale (International Pemphigus Pemphigoid Foundation, USA).
The authors thank the Scientific Communication Team of the IVDP, in particular Merle Twesten and Mario Gehoff, for copy editing.
The project was funded by the International Alliance of Dermatology Patient Organizations (IADPO) as part of its Global Research on the Impact of Dermatological Diseases (GRIDD) research project. The views expressed in this article are those of the authors and not necessarily those of IADPO. The funders had no role in the design and conduct of the study; the collection, management, analysis, and interpretation of the data; or the preparation and approval of the manuscript.
Conflicts of interest. CB, MA, RP and NTS received grants from IADPO during the conduct of this study. MA created 2 measures (Patient Benefit Index and Freiburg Life Quality Assessment) that are included in this review but was not involved in the COSMIN analysis of any measure. The remaining authors have no conflicts of interest to declare.