R. Benjamin Aldridge1, Matteo Zanotto2, Lucia Ballerini2, Robert B. Fisher2 and Jonathan L. Rees1
1Department of Dermatology, University of Edinburgh and 2School of Informatics, University of Edinburgh, Edinburgh, UK
The “ABCD” mnemonic to assist non-experts’ diagnosis of melanoma is widely promoted; however, there are good reasons to be sceptical about public education strategies based on analytical, rule-based approaches such as ABCD (i.e. Asymmetry, Border Irregularity, Colour Uniformity and Diameter). Evidence suggests that accurate diagnosis of skin lesions is achieved predominantly through non-analytical pattern recognition (via training examples) and not by rule-based algorithms. If the ABCD criteria are to function as a useful public education tool, they must be usable reliably by untrained novices, with low inter-observer and intra-diagnosis variation but maximal inter-diagnosis differences. The three subjective properties (the ABCs of the ABCD) were investigated experimentally: 33 laypersons scored 40 randomly selected lesions (10 lesions × 4 diagnoses: benign naevi, dysplastic naevi, melanomas, seborrhoeic keratoses) for the three properties on visual analogue scales. The results (n = 3,960) suggest that novices cannot use the ABCs reliably to discern benign from malignant lesions. Key words: ABCD; non-analytical reasoning; pattern recognition; skin cancer; melanoma; dermatology diagnosis.
(Accepted November 18, 2010.)
Acta Derm Venereol 2011; 91: XX–XX.
Jonathan Rees, Department of Dermatology, University of Edinburgh, Level 1 Lauriston Building, Lauriston Place, Edinburgh, EH3 9HA, UK. E-mail: jonathan.rees@ed.ac.uk
The “ABCD” rule to aid in the diagnosis of early malignant melanoma has recently celebrated its 25th anniversary (1). This mnemonic was introduced in 1985 to aid non-experts’ macroscopic diagnosis of pigmented lesions, and over the years it has been promoted widely in an attempt to facilitate the earlier detection of melanomas (2). In the 25 years since its inception, the use of the ABCD has moved from assisting non-expert physicians’ diagnosis, through educating the public in intensive clinician-led programmes, to its present place at the heart of most general public education strategies, with the criteria described on the public websites of the American Academy of Dermatology (AAD) (3), British Association of Dermatologists (4), Australasian College of Dermatologists (5) and European Academy of Dermatology and Venereology (EADV) (6). Although the fundamental four-part ABCD mnemonic has been widely adopted, it is surprising that there has been little apparent validation of its utility as a general public education strategy.
Given that lay individuals first identify the majority of melanomas and are also responsible for the largest proportion of the delays in diagnosis, reliable and accurate information is essential to assist them in early detection (7–10). The importance of accurate patient information has recently been further highlighted because, thus far, the numerous strategies initiated to improve screening and the early detection of melanoma have not substantially reduced the proportion of tumours with prognostically unfavourable thickness (11). There is now mounting evidence leading us to question the use of the ABCD rule as a general public education strategy (12, 13). The majority of studies cited as evidence for the mnemonic’s adoption have involved either clinician-performed assessments (14–18) or intensive lay education (19, 20) and, as we detail below, there are methodological limitations in extrapolating their findings to general public health promotion, relating to the roles of experience and of prior examples.
The role of experience
Since the late 1980s the cognitive processes involved in dermatological diagnosis have been under investigation (21–25), most notably in the laboratory of Geoff Norman and colleagues. At the risk of some simplification, the processes involved in diagnosis can be viewed either as explicit and based on conscious analytical reasoning or, alternatively, as implicit and holistic, hidden from the conscious view of the diagnostician. Dermatologists appear to use predominantly the latter, non-analytical reasoning, derived from experience of prior examples, to identify skin lesions. This overall pattern recognition cannot be unlearnt and thus has a carry-over effect on any attempt at analytical rule application (26–30). In addition, study designs in which experts are asked to explain their diagnoses inherently over-emphasize algorithmic reasoning by promoting intentional rather than incidental analytical processing (31). It has even been suggested that experienced clinicians make a diagnosis intuitively, then alter their analytical assessment to fit their preconceptions about the relationship between these features (such as the ABCD) and the diagnosis (such as melanoma) (31, 32). Certainly, the only prospective study of dermatologists’ recognition patterns confirmed that, whilst analytical pattern recognition (the ABCD rules) could be used to predict malignancy, it was not actually how dermatologists arrived at the correct diagnosis; instead the experts seemed to use overall or differential pattern recognition (the “ugly duckling” sign (33)) (23).
The role of prior examples
The exact number of prior examples required to significantly alter analytical assessments is unknown, but we do know that these carry-over effects apply not only to seasoned clinicians; novices have been shown to exhibit this bias after only a few prior examples (27, 28, 34). We do not fully understand how novices naturally assess skin lesions visually, but unless this differs markedly from the rest of human visual perception, it is unlikely to rely on analytical methods. If, as is likely, overall pattern recognition plays even a small role, the biasing effect of prior examples needs to be controlled for when assessing the analytical ABCD criteria. In the few studies in which intensive ABCD education has been shown to benefit lay individuals, overall pattern recognition was not controlled for (19, 20). It is therefore unclear how much of the positive effect can be attributed to the novices’ ability to discriminate the true analytical ABCD criteria, rather than to their ability to use overall pattern recognition “learnt” from the example lesions used to demonstrate the analytical criteria during their patient education. Thus far, the only randomized controlled trial testing the ABCD in lay hands showed that it decreased diagnostic accuracy and was not as effective as a pattern recognition education strategy (12).
In this context any screening or diagnostic tool would ideally meet three criteria: inter-observer variability should be minimal; variation within a diagnostic class should be small; and differences between diagnostic classes should be large. In the present study we set out to examine these three criteria experimentally, assessing the three subjective analytical properties (the ABCs) of the ABCD rule.
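As a minimal sketch of how these three criteria might be quantified (our illustration, not an analysis reported in this study), the Python below assumes VAS scores stored as a dictionary mapping each lesion to the list of scores it received from the observers, plus a lesion-to-class mapping. The names `iqr` and `criteria_summary` and the toy data are hypothetical. A reliable tool would give small values for the first two quantities and a large value for the third.

```python
from statistics import median, quantiles

def iqr(values):
    """Interquartile range by linear interpolation (equivalent to R's type 7)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

def criteria_summary(scores, lesion_class):
    """Quantify the three desirable properties of a rating tool (illustrative).

    scores: dict mapping lesion id -> list of VAS scores, one per observer
    lesion_class: dict mapping lesion id -> diagnostic class label
    """
    # 1. Inter-observer variability: IQR of the scores given to each lesion.
    inter_observer = {lid: iqr(s) for lid, s in scores.items()}

    # Collect per-lesion medians and pooled scores for each class.
    lesion_medians, pooled = {}, {}
    for lid, s in scores.items():
        c = lesion_class[lid]
        lesion_medians.setdefault(c, []).append(median(s))
        pooled.setdefault(c, []).extend(s)

    # 2. Variation within a diagnostic class: range of its lesion medians.
    intra_class = {c: max(m) - min(m) for c, m in lesion_medians.items()}

    # 3. Inter-diagnosis difference: spread of the pooled class medians.
    class_medians = [median(s) for s in pooled.values()]
    inter_diagnosis = max(class_medians) - min(class_medians)
    return inter_observer, intra_class, inter_diagnosis

# Hypothetical toy data: four observers, two lesions per class.
scores = {
    "n1": [1, 2, 2, 3], "n2": [2, 3, 3, 4],  # benign naevi
    "m1": [6, 7, 7, 8], "m2": [7, 8, 8, 9],  # melanomas
}
lesion_class = {"n1": "naevus", "n2": "naevus",
                "m1": "melanoma", "m2": "melanoma"}
print(criteria_summary(scores, lesion_class))
# ({'n1': 0.5, 'n2': 0.5, 'm1': 0.5, 'm2': 0.5}, {'naevus': 1.0, 'melanoma': 1.0}, 5.0)
```

In this deliberately well-behaved toy example all three criteria are met; the study asks whether real lay scores behave the same way.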
METHODS
Study image selection
Forty digital images of pigmented skin lesions were selected randomly from 878 relevant images in the University of Edinburgh Department of Dermatology’s image database. The Department’s image library (comprising over 3,500 images) has been prospectively collected for ongoing dermato-informatics research investigating semi-automated diagnostic systems and non-expert education. The 40 images selected for this experiment were stratified so that there were 10 images from each of the following four diagnostic classes: benign naevi, dysplastic naevi (defined as lesions with histological atypia), melanomas and seborrhoeic keratoses. All the images had been collected using the same controlled, fixed-distance photographic set-up: Canon (Canon UK Ltd, Reigate, Surrey, UK) EOS 350D 8.1 MP cameras, a Sigma (Sigma Imaging UK Ltd, Welwyn Garden City, Hertfordshire, UK) 70 mm f2.8 macro lens and a Sigma EM-140 DG Ring Flash, at a distance of 50 cm. Each lesion was cropped from the original digital photograph to the same resolution (600 × 450 pixels) and displayed at a 1:1 ratio (approximately equivalent to seven times magnification from the original 50 cm capturing distance) in the custom-built experimental software.
Software design and implementation
A purpose-built programme was created to allow the three subjective criteria of the ABCD rule (Asymmetry, Border Irregularity and Colour Uniformity) to be tested remotely over the internet, to limit any investigator-related interaction bias. Diameter was not considered suitable for assessment: because the images were magnified, testing it would have required placing a 6 mm reference scale next to each lesion and asking each participant which was longer, the lesion or the scale, which would not have been instructive. The programme was written in JavaScript and PHP. After entering their demographic details, the participants were given a set of instructions stating how to use the ABC criteria and the software. The instructions were based on the verbal descriptors of the ABC criteria taken from the website of the AAD (3). After confirming that they had read and understood the instructions, the subjects assessed each of the 40 images in turn for asymmetry, border irregularity and colour uniformity on a 10-point visual analogue scale (VAS). A screen shot of the software is shown in Fig. 1. At any stage the subjects could click on a “Help” button to redisplay the instructions and the verbal descriptors of the ABC.
Fig. 1. A screen “snapshot” of the purpose-built software used to record the 33 subjects’ assessments of the three ABC properties. The subjects scored each of the three properties on the 10-point visual analogue scales that were displayed to the right of the image, by moving the slider to the desired level.
To minimize any lead bias or fatigue effect, the software was programmed to randomize the order in which the subjects undertook the 40-lesion assessment, so that it was different for each participant. In addition to the verbal descriptions of the three properties assessed, visual anchor points were integrated into the software at the mid-point and at both ends of the rating scale to further reduce inter-rater differences. These anchor points were taken from the ABCDE patient guidelines on the SkinCancerNet “Melanoma: What it Looks Like” webpage produced in conjunction with the AAD (35). As we were interested in the lay public’s ability to discriminate the three properties analytically, and not in their ability to use innate non-analytical reasoning to match or distinguish from real-life examples, only the caricatured diagrams from the SkinCancerNet website were used as the high-end anchor points, rather than example pictures of melanomas (Fig. 2). The mid- and low-end anchor points were computer-generated to complete the VAS.
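The per-participant randomization of lesion order can be sketched as follows. The study software was written in JavaScript and PHP; this Python version is only an illustration, and seeding each generator with a participant identifier is our assumption, chosen so that each order is reproducible yet differs between participants. The function name `presentation_order` is hypothetical.

```python
import random

N_LESIONS = 40

def presentation_order(participant_id, n_lesions=N_LESIONS):
    """Return a shuffled list of lesion indices for one participant.

    Seeding with the participant id is an illustrative assumption: it makes
    each order reproducible while (almost surely) differing between people.
    """
    rng = random.Random(participant_id)  # independent generator per participant
    order = list(range(n_lesions))
    rng.shuffle(order)  # in-place Fisher-Yates shuffle
    return order

# One order per participant (33 participants, as in this study).
orders = {pid: presentation_order(pid) for pid in range(33)}
print(orders[0][:5])  # first five lesions shown to participant 0
```

Because every lesion appears at every position with equal probability across participants, any drift in scoring over the session is spread across lesions rather than concentrated on those shown last.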
Fig. 2. A screen “snapshot” taken from the SkinCancerNet website (35), demonstrating the caricatured images that we used as the anchor points for the visual analogue scales in our software. The pictures on the right were used as they demonstrate the analytical criteria of the ABC, but without facilitating any non-analytical pattern recognition that could have developed if the “real-life” images had been used.
Subjects
An open e-mail request containing the URL link to the study was distributed to MSc students of the University of Edinburgh’s School of Informatics, inviting them to undertake the study personally and to forward the invitation to non-medical acquaintances who might also be willing to participate. A total of 33 lay subjects agreed to participate without remuneration: 21 males and 12 females (64% male). Mean age was 34 years (range 17–62 years). None of the subjects had a personal history of skin cancer.
Statistics
The subjects’ responses were recorded automatically by the programme and then exported into “R for MacOS” for graphing and statistical analysis (36).
Ethics
NHS Lothian research ethics committee granted permission for the collection and use of the images, and additional permission for the use of students in this research was granted through the University of Edinburgh’s “Committee for the use of student volunteers”.
RESULTS
The full results of all 3,960 analytical VAS scores attributed in the study (33 subjects × 40 lesions × 3 “ABC” properties) are displayed graphically in Fig. 3a–c, with each property presented in a separate plot. At first glance these plots may seem complicated, but they have the virtue that every data-point from the experiment is presented and the variability in scoring can be assessed intuitively. Each horizontal coloured bar represents an individual subject’s score for a specific lesion, with each of the 40 lesions displayed in its own column across the x-axis. The 40 lesion columns are coloured according to their diagnostic class (green = the 10 benign naevi, orange = the 10 dysplastic naevi, red = the 10 melanomas, blue = the 10 seborrhoeic keratoses). The overall median score for each diagnostic class is indicated by the large horizontal black bar straddling each of the four coloured series of 10 columns.
Fig. 3. The full results of all 3,960 scores, split according to the ABC property assessed into three plots (a: asymmetry; b: border irregularity; c: colour uniformity). Each horizontal bar represents an individual’s score. The 40 lesions assessed are displayed in columns across the x-axis, grouped by their diagnostic classes (green = benign naevi, orange = dysplastic naevi, red = melanomas, blue = seborrhoeic keratoses). The median score for each diagnostic class is indicated by the large horizontal black bar.
The inter-person variability in assessing each of the three ABC properties for any single lesion is represented by the vertical spread of a single column along the y-axis. Within a specific diagnostic class, the variability in scores is shown by the differences in vertical spread between the 10 uniformly coloured columns. The variability between the four diagnostic classes can be appreciated from the differences in the overall distributions of the four coloured groups, reinforced by the differences in their median scores indicated by the horizontal black bars. The results are further summarized in Table I.
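The layout of the data behind Fig. 3 can be made explicit with one record per score. This sketch uses hypothetical field and function names and leaves the scores empty; it simply reproduces the design arithmetic of 33 subjects × 40 lesions × 3 properties = 3,960 records, with lesions stratified 10 per diagnostic class, so that each coloured column in a panel of Fig. 3 corresponds to the 33 records for one lesion and one property.

```python
from itertools import product

CLASSES = ["benign naevus", "dysplastic naevus", "melanoma",
           "seborrhoeic keratosis"]
PROPERTIES = ["asymmetry", "border irregularity", "colour uniformity"]
N_SUBJECTS = 33
LESIONS_PER_CLASS = 10

# Stratified design: lesions 0-9 are class 0, 10-19 class 1, and so on.
LESION_CLASS = {les: CLASSES[les // LESIONS_PER_CLASS]
                for les in range(len(CLASSES) * LESIONS_PER_CLASS)}

def empty_records():
    """One record per (subject, lesion, property) triple; score left as None."""
    return [
        {"subject": s, "lesion": les, "class": LESION_CLASS[les],
         "property": p, "score": None}
        for s, les, p in product(range(N_SUBJECTS), LESION_CLASS, PROPERTIES)
    ]

records = empty_records()
print(len(records))  # 33 * 40 * 3 = 3960

# One column of Fig. 3a: the 33 asymmetry records for lesion 0.
column = [r for r in records
          if r["lesion"] == 0 and r["property"] == "asymmetry"]
print(len(column))  # 33
```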
Table I. Medians, interquartile ranges and 90th percentiles of the ‘ABC’ visual analogue scale (VAS) scores for each diagnostic class
Lesion class | Benign naevi | Dysplastic naevi | Melanomas | Seborrhoeic keratoses
Asymmetry VAS scores (0 = symmetrical, 10 = asymmetrical)
Median | 3.66 | 4.55 | 4.83 | 4.77
IQR | 4.96 | 4.88 | 5.94 | 4.07
90th percentile | 8.93 | 8.97 | 9.52 | 8.09
Border irregularity VAS scores (0 = regular, 10 = irregular)
Median | 2.37 | 3.77 | 3.68 | 2.72
IQR | 4.61 | 5.05 | 6.46 | 4.08
90th percentile | 8.27 | 8.93 | 9.78 | 7.52
Colour uniformity VAS scores (0 = single uniform colour, 10 = multiple or non-uniform colour distribution)
Median | 3.92 | 4.83 | 5.63 | 4.92
IQR | 5.00 | 4.88 | 5.59 | 4.23
90th percentile | 8.26 | 9.00 | 9.59 | 8.34
VAS: visual analogue scale; IQR: interquartile range.
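The summary statistics in Table I can be computed for each class and property as below. This is a sketch under one stated assumption: the quantile definition used for Table I is not given, so the code uses linear interpolation between order statistics (Python's `statistics.quantiles` with `method="inclusive"`, equivalent to R's default type 7). The function name `summarize` and the toy data are illustrative only.

```python
from statistics import median, quantiles

def summarize(scores):
    """Median, IQR and 90th percentile of one class's VAS scores.

    Quantiles use linear interpolation between order statistics
    (method="inclusive", R's type 7); whether Table I used the same
    definition is an assumption.
    """
    quartiles = quantiles(scores, n=4, method="inclusive")
    deciles = quantiles(scores, n=10, method="inclusive")
    return {
        "median": median(scores),
        "IQR": quartiles[2] - quartiles[0],
        "P90": deciles[8],  # 9th of the 9 cut points dividing 10 equal groups
    }

toy = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(summarize(toy))  # {'median': 5.0, 'IQR': 5.0, 'P90': 9.0}
```

Applied to the pooled 330 scores (33 subjects × 10 lesions) of one class and one property, this would produce one column of one block of Table I.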
Whilst it is possible to appreciate the small, albeit significant (Kruskal–Wallis test, p < 0.0001), difference between the four diagnostic groups’ scores, far more striking is the substantial spread in the scores attributed to the same lesion by the 33 subjects, and the further variation in scoring between the 10 lesions within the same diagnostic class, for all three of the subjective ABC properties. Additional data analysis demonstrates that a similar substantial variation exists within the 10 scores that each individual attributed to the lesions within the same diagnostic class (data not shown).
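For readers unfamiliar with the Kruskal–Wallis test, the sketch below computes the H statistic from pooled ranks (tied values receive their average rank, with the usual tie correction) and a p-value from the chi-squared approximation with k − 1 = 3 degrees of freedom, matching the four diagnostic classes. It is a from-scratch illustration on hypothetical data with hypothetical function names, not the analysis reported above, which was run in R.

```python
import math

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis(*groups):
    """Return (H, p) for four independent groups (chi-squared approximation)."""
    if len(groups) != 4:
        raise ValueError("the closed-form p-value below assumes 3 degrees of freedom")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum of this group
        total += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    h = max(h, 0.0)  # guard against tiny negative rounding error
    # Chi-squared survival function with 3 degrees of freedom (closed form).
    p = math.erfc(math.sqrt(h / 2)) + math.sqrt(2 * h / math.pi) * math.exp(-h / 2)
    return h, p

h, p = kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])
print(round(h, 3))  # 10.385
```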
The inter-person and intra-class variations are better appreciated through specific examples (see Fig. 4, an enlarged view of the highlighted section of Fig. 3b). In Fig. 4, lesion 3 (blue arrow/highlights), which had the largest inter-person variation (interquartile range (IQR) = 4.86) of the 10 lesions in the melanoma class, received “border irregularity” scores from the 33 subjects ranging from 0.7 to 10, with a median of 6.6. Lesion 7 (cyan arrow/highlights), which had the least inter-person variation (IQR = 1.93), received scores ranging from 0 to 5.5, with a median of 1.3.
Fig. 4. An enlarged display of the highlighted section of Fig. 3b, showing all the border irregularity visual analogue scale (VAS) scores for the 10 melanoma lesions. The lesions with the highest variation (lesion 3, interquartile range (IQR)=4.86) and lowest variation (lesion 7, IQR=1.93) are further highlighted, in blue and cyan, respectively, to demonstrate the large spread of scores attributed to lesions within the same diagnostic class. For lesion 3 it can be seen that the range of scores attributed by the 33 subjects was 0.7–10, with a median of 6.6, and for lesion 7 the range was 0–5.5, with a median of 1.3.
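The per-lesion figures quoted above (IQR, range and median of one lesion's scores) and the selection of the most and least variable lesion within a class can be sketched as follows. The function names are hypothetical, quartiles use linear interpolation (R's default type 7, an assumption), and the scores are made up for five observers rather than taken from the study.

```python
from statistics import median, quantiles

def lesion_spread(scores):
    """IQR, range and median of the scores one lesion received."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return {"IQR": q3 - q1, "range": (min(scores), max(scores)),
            "median": median(scores)}

def most_and_least_variable(class_scores):
    """Return the ids of the lesions with the largest and smallest IQR."""
    spread = {lid: lesion_spread(s)["IQR"] for lid, s in class_scores.items()}
    return max(spread, key=spread.get), min(spread, key=spread.get)

# Hypothetical border irregularity scores from five observers, one class.
melanoma_scores = {
    "lesion A": [0.5, 3.0, 6.5, 8.0, 10.0],
    "lesion B": [1.0, 1.2, 1.3, 1.5, 5.0],
    "lesion C": [2.0, 3.5, 4.0, 5.0, 7.0],
}
print(most_and_least_variable(melanoma_scores))  # ('lesion A', 'lesion B')
print(lesion_spread(melanoma_scores["lesion A"]))
```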
DISCUSSION
Our principal motivation for the current work was the observation that the original rationale and justification for the ABCD approach had shifted from the primary target group of physicians to the lay public, with little supporting evidence to justify this transfer. In the absence of empirical evidence of effectiveness, there are at least two theoretical reasons to be suspicious of this approach. First, there is an increasing body of evidence that experts are not necessarily able to state explicitly the basis of their own expertise in a way that is simply transferable (31, 32). Second, previous testing of the ABCD had not controlled for prior exposure (14–20), meaning that earlier accounts of its utility may have reflected prior experience and knowledge rather than the implementation and use of the criteria themselves (26–30).
Other relevant considerations are that, whereas experts may be able to use the criteria on appropriate subclasses of lesions (i.e. melanocytic naevi and melanomas), available evidence suggests that distinguishing primary melanocytic lesions from mimics (such as seborrhoeic keratoses) requires considerable expertise (37–39). Finally, the only large-scale randomized controlled trial (RCT) in this area comparing the ABCD approach with approaches based on pattern recognition provided little support for the use of the ABCD criteria (12).
The approach we took was an experimental one attempting to delineate the characteristics of the ABCD rules on a test series of lesions. Our rationale was that for the ABCD system to be capable of guiding diagnosis, it would have to have certain statistical properties: different diagnostic groups needed to score differently, and variance between persons for the same lesion and within diagnostic groups needed to be small. Within the limits of our test situation and the images randomly chosen, it is self-evident that these criteria were not met. Different people judged the same lesion very differently, and although the medians of different diagnostic groups differed, the overlap was considerable. Looking at Fig. 3, it is difficult to imagine being able to choose any criteria based on ABCs that would usefully discriminate suspicious from banal lesions.
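One way to make this concrete is to treat a single ABC score as a screening test: flag a lesion as suspicious when its score is at or above a cut-off t, then compute sensitivity (the proportion of malignant lesions flagged) and specificity (the proportion of benign lesions not flagged) for each t. The sketch below does this on made-up, heavily overlapping scores; it illustrates the argument and is not an analysis performed in this study. The function name `sweep_thresholds` is hypothetical.

```python
def sweep_thresholds(malignant, benign, thresholds):
    """Sensitivity and specificity of the rule 'suspicious if score >= t'."""
    results = []
    for t in thresholds:
        sens = sum(s >= t for s in malignant) / len(malignant)
        spec = sum(s < t for s in benign) / len(benign)
        results.append((t, sens, spec))
    return results

# Hypothetical, heavily overlapping scores on a 0-10 scale.
malignant = [1.0, 3.0, 4.5, 6.0, 7.5, 9.0]
benign = [0.5, 2.5, 4.0, 5.5, 7.0, 8.5]
for t, sens, spec in sweep_thresholds(malignant, benign, range(0, 11, 2)):
    print(f"t={t:>2}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

With overlap of this kind, raising t only trades sensitivity for specificity: here the best cut-off (t = 6) gives sensitivity 0.50 and specificity 0.67, barely better than chance.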
There are limitations to our approach. First, our subjects were not chosen at random, and were highly educated, computer literate and probably above average at abstract and analytical reasoning. We do not consider that this is a reason to doubt the generalizability of our conclusions; if anything, such subjects might be expected to apply analytical criteria more consistently than the general public. Second, we included not just primary melanocytic lesions but also mimics, such as seborrhoeic keratoses. The justification for this is simply that non-experts and many physicians are not able to distinguish reliably between these classes of lesions. In practice the ABCD criteria are applied (incorrectly) to various diagnostic classes, and we needed to account for this. Third, as the subjects undertook the experimental task remotely over the internet, we were unable to assess the “effort” they applied to their scoring. However, because we randomized the order in which each subject assessed the 40 lesions, any fatigue effect would have been minimized. Indeed, close inspection of the data demonstrates that, whilst there is substantial overlap in scoring between (and within) the diagnostic groups, the subjects’ scoring was not random; individual lesions each had (to varying degrees) distinct scoring patterns.
We cannot say whether the results might have been different if subjects had undergone intense education in the use of the ABCD approach. In practice, however, the promulgation of the ABCD criteria via websites and patient leaflets provides little opportunity for such intense education. We also suggest that previous studies of the ABCD approach have been methodologically compromised by a failure to control for prior exposure during the teaching phase. Training persons in the use of the ABCD inevitably means exposure to test images: during this exposure, however minimal, pattern recognition skills will develop, and it is a mistake to believe that any change in performance between pre- and post-test is due to the ABCD system rather than to other learning. To draw any other conclusion requires a degree of experimental control that has been lacking in prior work.
We also acknowledge the multiple modifications to the basic ABCD mnemonic that have been suggested over the years to “improve” its functionality (40–47), and accept that it is now commonplace to use the ABCDE criteria (although we note that a wide variety of “E”s have been suggested: evolving, enlarging, elevated, erythema, expert). However, given the further evidence that untrained novices cannot use the analytical ABC criteria effectively, should the public education message not be simplified to include only “evolving” (i.e. change)? Such a simplification has previously been suggested by Weinstock (47, 48), and change has independently been found to be the most important predictor of melanoma among patient-observed features (13).
Finally, given the changing epidemiology of malignant melanoma and the importance of early presentation by patients and early diagnosis by physicians, our criticisms of the ABCD approach are not meant to disparage attempts to develop alternative strategies. In this respect we note the work of Grob and co-workers (12), who have used approaches based on fostering pattern recognition skills for laypersons. Our own work has also suggested that the use of images and a structured database may enhance diagnostic skills of laypersons, although in the context of malignant melanoma, such systems need far more testing before being promoted as being clinically useful (49, 50).
ACKNOWLEDGEMENTS
The work was supported by The Wellcome Trust (reference 083928/Z/07/Z) and the Foundation for Skin Research (Edinburgh). We are also grateful for the advice and assistance given by Karen Roberston and Yvonne Bisset (Department of Dermatology, University of Edinburgh) regarding the photographic capture and preparation of the digital images. We also recognize the contribution of Nikolaos Laskaris (School of Informatics, University of Edinburgh), who undertook some of the preliminary programming as part of his MSc thesis (51).
The authors declare no conflicts of interest.
REFERENCES