Reporting of study results in JRM should follow CONSORT (Consolidated Standards of Reporting Trials) for clinical trials, with its extension to non-pharmacological treatments, and STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) for observational studies [1-3]. Both CONSORT and STROBE include statistical guidelines that authors in JRM should adhere to when applicable.
Study size
Explain how the study size was arrived at. If a formal calculation was conducted when the study was planned, the authors should present details on outcome measures and quantities that were used in the calculation and the resulting target sample size per study group. Since confidence intervals for the outcome measures reflect the precision that was ultimately obtained in the study, there is no need for post hoc justifications for study size or retrospective power calculations.
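As an illustration only, the Python sketch below shows how a formal a priori sample size calculation for a two-group comparison of a continuous outcome could be carried out with statsmodels; the minimal important difference, SD, significance level and power are hypothetical placeholders that each study must justify and report.

# A minimal sketch, assuming a two-sided two-sample t-test; all numbers
# below are hypothetical placeholders, not values recommended by JRM.
from statsmodels.stats.power import TTestIndPower

minimal_important_difference = 5.0  # hypothetical clinically relevant difference
assumed_sd = 10.0                   # hypothetical SD of the outcome
standardized_effect = minimal_important_difference / assumed_sd  # Cohen's d

n_per_group = TTestIndPower().solve_power(
    effect_size=standardized_effect,
    alpha=0.05,                     # two-sided significance level
    power=0.80,                     # target power
    alternative="two-sided",
)
print(f"Target sample size per group: {n_per_group:.0f}")  # about 64 here

Reporting these input quantities alongside the resulting target sample size per group gives the reader everything needed to reproduce the calculation.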
Statistical methods
Describe the statistical methods in enough detail to enable a knowledgeable reader with access to the original data to verify the reported results. References for advanced or specialized statistical methods should, when possible, be to standard works (with pages stated) rather than to the papers in which the designs or methods were originally reported. Specify any general-purpose computer programs used, including the version number.
Specify which statistical methods were used for group comparisons of primary and secondary outcome measures, and which were used to adjust for differences between the groups compared. Specify whether adjusted analyses were planned or suggested by the data, and describe how the adjustment variables were selected and how continuous covariates were handled. When the outcome measures are rating scales, methods appropriate for ordinal data are strongly recommended, such as nonparametric methods for direct group comparisons in treatment evaluations, or Rasch analysis for the development of new scales and the evaluation of existing ones (see Guidelines for Rasch).
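For direct group comparisons of ordinal rating-scale data, a minimal sketch of one suitable nonparametric method, the Mann-Whitney U test, is given below; the scores are fabricated for illustration, and other ordinal-appropriate methods may be equally valid.

# A minimal sketch of a nonparametric two-group comparison for ordinal
# rating-scale data; the scores below are fabricated for illustration.
from scipy.stats import mannwhitneyu

treatment_scores = [3, 4, 4, 5, 2, 4, 3, 5]  # hypothetical ordinal ratings
control_scores = [2, 3, 2, 4, 1, 3, 2, 3]

statistic, p_value = mannwhitneyu(treatment_scores, control_scores,
                                  alternative="two-sided")
print(f"Mann-Whitney U = {statistic:.1f}, p = {p_value:.2f}")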
Describe any methods used to examine subgroups and test for interactions, and specify whether the subgroup analyses were pre-specified or decided post hoc. Explain how repeated measurements from one participant were handled, how clustering of data by care providers or centers was addressed, and how missing data were handled. Describe any actions undertaken to assess the adequacy of the chosen methods, the model fit, and the robustness of the results to alternative analysis strategies or alternative statistical methods. Describe how the inflated risk of false-positive findings was addressed if multiple outcomes were evaluated or interim analyses were undertaken.
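As one possible way of addressing multiplicity, the sketch below applies the Holm procedure to a set of hypothetical p-values using statsmodels; the appropriate procedure in any given study depends on the pre-specified analysis plan.

# A minimal sketch of a multiplicity adjustment with the Holm procedure;
# the raw p-values are fabricated, one per hypothetical outcome.
from statsmodels.stats.multitest import multipletests

raw_p_values = [0.012, 0.049, 0.210, 0.003]
reject, adjusted_p, _, _ = multipletests(raw_p_values, alpha=0.05, method="holm")
for raw, adj, rej in zip(raw_p_values, adjusted_p, reject):
    print(f"raw p = {raw:.3f}, Holm-adjusted p = {adj:.3f}, reject H0: {rej}")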
Participation
A diagram, such as the CONSORT flow diagram, showing the participant flow increases transparency and is therefore recommended not only for clinical trials but also for observational studies. Report participation rates and give reasons for nonparticipation at each stage of the study.
Background and baseline data
Present background and baseline characteristics (e.g. demographic, social and clinical data) of the participants in each study group in a table. Give enough detail that the comparability of the groups, and the risk of confounded results in direct group comparisons, can be judged. Statistical tests of background or baseline characteristics can mislead the reader, since they provide no guidance about the magnitude of the confounding that could arise from group imbalances. P-values from such tests should therefore not be presented, unless differences in background or baseline characteristics across study groups are of interest per se (which may sometimes be the case in observational studies, but never in randomized clinical trials).
Summarize both the average and the variability of the background and baseline variables in each study group using appropriate statistical measures. Indicate the number of participants with missing data for each variable. Mean and standard deviation (SD) should only be used to summarize quantitative variables, and only if they have a symmetrical distribution. Please do not use “average” but rather “mean” or “median”, and please always write SD as “mean (SD)” and not “mean±SD”. Do not use inferential measures such as standard errors and confidence intervals to describe variability. Use median and percentiles to summarize rating scales and quantitative variables that have an asymmetrical distribution. Presenting both median and mean is sometimes useful to indicate the skewness of the distribution. Appropriate percentiles for describing variability are the 2.5th and 97.5th percentiles for larger groups (>50 individuals) or the 25th and 75th percentiles (quartiles) for medium-sized groups (20–50 individuals). Range (min and max) can be used instead of percentiles for small groups (<20 individuals).
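The sketch below illustrates these descriptive summaries on fabricated data: median with quartiles for a medium-sized group, and the recommended “mean (SD)” format for a symmetrically distributed quantitative variable.

# A minimal sketch of the recommended descriptive summaries; the data are
# fabricated and the percentile choice follows the group-size rules above.
import numpy as np

scores = np.array([12, 15, 14, 22, 18, 16, 30, 13, 17, 19, 21, 14,
                   16, 25, 15, 18, 20, 13, 17, 24, 16, 19, 22, 14])  # n = 24

median = np.median(scores)
q1, q3 = np.percentile(scores, [25, 75])  # quartiles, for groups of 20-50
print(f"median (Q1-Q3): {median:.1f} ({q1:.1f}-{q3:.1f})")

# For a symmetrically distributed quantitative variable, report mean (SD):
print(f"mean (SD): {scores.mean():.1f} ({scores.std(ddof=1):.1f})")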
Outcomes
Present the average outcome, using the mean, median or proportion of events as appropriate, for all primary and secondary outcome measures in each study group, together with confidence intervals (usually at the 95% confidence level) reflecting the statistical precision. Report estimated effect sizes (e.g. mean or median differences as appropriate) together with confidence intervals. Effect sizes based on mean values and standardized by SD are usually not feasible for rating scales; non-parametric analogues or more clinically relevant measures of the treatment effect should be considered. For binary outcomes, presentation of both absolute and relative effect sizes is recommended (e.g. both risk differences and relative risks). Always present unadjusted effect sizes, and supplement them with adjusted estimates if these are warranted by group imbalances. P-values can be provided in addition to confidence intervals but should not replace them. The actual p-value should always be reported, at least if it lies in the range 0.001–0.30. Reporting p-values below 0.001 as “p<0.001” and p-values above 0.30 as “p>0.30” may, however, increase readability in tables. Present p-values with one or at most two significant digits. Never report (or refer to) results simply as “significant” or “non-significant”.
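For a binary outcome, the sketch below illustrates reporting of both the absolute effect (risk difference) and the relative effect (relative risk) with simple Wald-type 95% confidence intervals; the counts are fabricated, and the Wald intervals are only one of several acceptable interval methods.

# A minimal sketch of absolute and relative effect sizes for a binary
# outcome with Wald-type 95% CIs; all counts below are fabricated.
import math

events_t, n_t = 30, 100  # hypothetical events / participants, treatment group
events_c, n_c = 45, 100  # hypothetical events / participants, control group
p_t, p_c = events_t / n_t, events_c / n_c
z = 1.96                 # normal quantile for a 95% confidence level

# Absolute effect: risk difference
rd = p_t - p_c
se_rd = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
print(f"Risk difference: {rd:.2f} "
      f"(95% CI {rd - z * se_rd:.2f} to {rd + z * se_rd:.2f})")

# Relative effect: relative risk, with the CI computed on the log scale
rr = p_t / p_c
se_log_rr = math.sqrt((1 - p_t) / events_t + (1 - p_c) / events_c)
lower = math.exp(math.log(rr) - z * se_log_rr)
upper = math.exp(math.log(rr) + z * se_log_rr)
print(f"Relative risk: {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")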
Interpretation
Rather than being labelled “significant” or “non-significant”, results should be interpreted in the context of the type of study, the consistency of the results and other available evidence. It should be acknowledged that p-values of the order of 0.05 (whether above or below), or null effects just outside the 95% confidence interval, need not provide strong evidence against the tested null hypothesis, whereas it is reasonable to say that p-values of the order of 0.001 do [4]. Bias and confounding should, however, always be considered.
Absence of evidence, i.e. a finding where the null effect is clearly included in the confidence interval and the corresponding p-value is high, is not necessarily evidence of absence [5]. Reported confidence intervals indicate the magnitude of the effect sizes, group differences or treatment effects that can be ruled out with high certainty if there is no bias or confounding. Important group differences can seldom be ruled out with certainty in small studies.
References
[1] Moher D, Hopewell S, Schulz KF, Montori V, Gotzsche PC, Devereaux PJ, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ 2010; 340: c869.
[2] Boutron I, Moher D, Altman DG, Schulz KF, Ravaud P. Extending the CONSORT statement to randomized trials of nonpharmacologic treatment: explanation and elaboration. Ann Intern Med 2008; 148: 295–309.
[3] Vandenbroucke JP, von Elm E, Altman DG, Gotzsche PC, Mulrow CD, Pocock SJ, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. Ann Intern Med 2007; 147: W163–194.
[4] Sterne JA, Davey Smith G. Sifting the evidence-what's wrong with significance tests? BMJ 2001; 322: 226–231.
[5] Altman DG, Bland JM. Absence of evidence is not evidence of absence. BMJ 1995; 311: 485.