
Where is the next generation of medical educators?

In reply: We thank Hart and Pearce for supporting the views raised in our editorial, noting the unmet demand for medical education expertise.

We also thank Kandiah for his response, and agree that medical graduates should be “clinically competent, reliable, keen to learn and show compassion to patients and colleagues”. We believe this outcome is best achieved by strong collaborations among “skilled clinicians and excellent mentors” and medical educators, many of whom are also practising clinicians. Clinicians provide critical input to ensure the validity and authenticity of what is taught and assessed, and are an essential element of the “triad” of patient, student and clinician in clinical learning.1 Collaboration between clinicians and medical educators comes easily because the two roles are so often held by the same people.

The question of proof in medical education is the subject of much activity and, as Kandiah notes, there is an increasing output of scholarship in medical education. Moreover, the quality and rigour of this output are improving, with a growing evidence base for medical educational practice.2 Generating new knowledge and applying it to medical student education is a key goal of an increasingly professionalised medical education community.

Medical education research is confounded by multiple factors, not the least being the powerful and uncontrolled effects of the diverse clinical environments in which students learn and practise as graduates.3 These make causal pathways difficult to unpick. While researching the effect of medical educators may be desirable, we believe that researching the effects of medical education interventions is more fruitful. For example, if one were to substitute medical educators with radiologists, how could one “prove” that radiologists have improved the health of the Australian population? Yet we are convinced that radiology does play an important role, based on multiple individual studies showing contributory evidence for this claim.

We welcome opportunities to work with health services and the community to examine the long-term performance of our students and their impact on the health system. Collaboratively defining and answering specific questions is likely to be much more productive than making artificial distinctions between clinicians and educators.

Neuropsychology beyond psychometry

WHEN ONCE ASKED about the qualities of a good clinician, I replied that, as well as a fundamental interest in the human condition and the skills to fully appreciate the meaning of people’s stories, good clinicians should be good storytellers themselves.

Why good storytellers? History-taking requires much more than a few words jotted down or typed out — it needs compassion and understanding, informed by knowledge, skill and experience: to understand the patient’s story for a diagnosis, to refer the patient to colleagues and, most importantly, to tell the patient what is going on and what comes next.

New Zealander Dr Jenni Ogden, one of the world’s foremost clinical neuropsychologists, is well worth listening to. Her compassion, care, experience, and supreme interest in the human condition and its stories are evident. Her technical exposition, and choice of references, covering the fundamentals of cognitive dysfunction and its impact, hits the mark.

The book takes the reader chapter by chapter, and case by case, through the most important aspects of clinical neuropsychology. All chapters, many of them topical, reward the reader. In “Just a few knocks on the head”, two 16-year-old New Zealand boys, aspiring to play elite level rugby union, have their lives affected deeply by repeated concussions — a subject currently much debated. But Dr Ogden leavens the story by taking two young men from opposite sides of the tracks and interweaving their experiences, with an unexpected turn of events: there is good news at the end.

The poignant “The long goodbye” tells of Sophie, a young wife and mother looking after her own mother with Alzheimer disease, but then realising she has the same symptoms. Dr Ogden does not spare our feelings as she sets out the disease processes and abnormal neurological functioning, alas still incurable, but helps us to empathise with the intergenerational pain of this family and its future.

The book shows what good clinical neuropsychology is all about. Discussing Sophie’s psychometric test results, Dr Ogden states: “Sophie’s psychologist made the mistake of thinking that an average score is a normal score, whereas she should have compared the scores with an estimate of Sophie’s premorbid abilities.” This “salient lesson” shows that, through clinically relevant information gathering, diagnoses and differential diagnoses, and through suggesting investigations and possible treatment, good clinical neuropsychologists go beyond mere psychometric practice.

Dr Ogden brings great depth to understanding cognitive disability in this book. Anyone with even a passing interest in the brain and mind (meaning, any reader of the MJA) will benefit from her book — it is great value for money.

Risk factors for recurrent Mycobacterium ulcerans disease after exclusive surgical treatment in an Australian cohort

Mycobacterium ulcerans causes necrotising lesions of skin and soft tissue. The major disease burden is found in tropical climates, mainly in Africa, but cases have been reported from 33 countries worldwide.1 It is endemic in both the temperate south-eastern region and tropical areas of north-eastern Australia, where cases have recently been increasing.2

Traditionally, wide surgical excision of lesions was the recommended treatment for M. ulcerans disease, as antibiotics were felt to be ineffective.3,4 However, recurrences are common with surgical treatment alone, occurring in 16%–30% of cases,5–8 and patients often require multiple operations, resulting in significant morbidity, time in hospital6,9 and cost to achieve cure.10 Recently, antibiotics have been shown to be highly effective in sterilising lesions and preventing recurrences when used alone11–13 or combined with surgery.5 The World Health Organization now recommends combined antibiotic treatment for 8 weeks as first-line therapy for all M. ulcerans lesions, with surgery reserved to remove necrotic tissue, cover large skin defects and correct deformities.14

Nevertheless, especially in resource-rich settings where surgical services are readily available, exclusive surgical treatment still has a role for patients unable or unwilling to take antibiotics and those preferring the more rapid healing of small lesions that surgical excision and direct closure enables, compared with the often prolonged healing of lesions treated with antibiotics alone.11,13 In assessing a patient’s suitability for exclusive surgical treatment, it is important to understand factors that increase the risk of recurrence. Previous studies have reported such risk factors,5–7 but these analyses were univariable and did not control for other potentially confounding factors that may have influenced outcomes. Using data from an Australian observational cohort of patients with M. ulcerans infection from Victoria’s Bellarine Peninsula, we performed a multivariable analysis to further describe risk factors for recurrence after exclusive surgical treatment.

Methods

Data on all patients with confirmed M. ulcerans disease managed at Barwon Health were collected prospectively from 1 January 1998 to 31 December 2011. All patients who received exclusive surgery without prior antibiotics were included in the study. Patients were selected for surgery by the treating clinician’s choice rather than by specified criteria. The study was approved by the Barwon Health Human Research Ethics Committee.

Definitions

An M. ulcerans case was defined as the presence of a lesion clinically suggestive of M. ulcerans plus any of: a culture of M. ulcerans from the lesion; a positive polymerase chain reaction (PCR) test result from a swab or biopsy of the lesion; or a necrotic granulomatous ulcer with the presence of acid-fast bacilli consistent with acute M. ulcerans infection on histopathological examination of an excised lesion.

The position of a lesion was defined as distal if on or below the elbow or knee. Exclusive surgical treatment was surgical excision alone, without adjunctive antibiotics. A major excision involved use of a split skin graft or vascularised skin and tissue flap to cover the defect. Positive margins were defined as the presence of granulomatous inflammation or necrotic tissue extending to one or more surgical excision margins on histopathological examination. Immunosuppression was defined as current treatment at any dose with immunosuppressive medication (eg, prednisolone) or presence of an active malignancy.

Treatment failure was defined as disease recurrence within 12 months of follow-up. Recurrence was defined as a new lesion meeting the M. ulcerans case definition that appeared in the wound, locally or on another part of the body. If a patient had recurrent lesions that were treated with surgery alone, each was included as a further treatment episode.

Statistical analysis

Data were collected using EpiInfo 6 (Centers for Disease Control and Prevention) and analysed using Stata 12 (StataCorp). Outcome data were censored at the time of disease recurrence, up to 12 months of follow-up from surgical treatment or until 31 October 2012.

A random-effects Poisson regression model designed to account for correlation between treatment episodes in a single patient was used to assess rates of and associations with treatment failure. Crude rate ratios for all identified variables were determined by performing univariable analyses.

An initial multivariable analysis was performed using the a priori variables of sex and age. All variables showing strong evidence of an association with treatment failure in the crude analysis (P ≤ 0.10) were then included (labelled major effect variables). The variable “duration of symptoms before diagnosis” was strongly associated with treatment failure on univariable analysis but, due to missing data, was not included in the multivariable model. All remaining variables were assessed but not included in the multivariable model as they showed evidence of multicollinearity with the major effect variables. P values were determined by the likelihood ratio test. A multivariable Poisson regression model including only first episodes of treatment was also performed to test whether associations persisted when multiple episodes in individual patients were excluded.
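For readers wanting to reproduce this kind of analysis, a minimal sketch follows. The authors used Stata 12; this Python version substitutes generalised estimating equations with an exchangeable working correlation for the random-effects Poisson model, and the records and column names are invented for illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# One row per treatment episode; patients 1 and 2 contribute two episodes
# each (all values hypothetical).
episodes = pd.DataFrame({
    "patient_id":       [1, 1, 2, 2, 3, 4, 5, 6, 7],
    "followup_years":   [1.0, 0.5, 1.0, 0.3, 1.0, 1.0, 1.0, 0.8, 1.0],
    "failure":          [1, 0, 0, 1, 0, 1, 0, 0, 1],
    "positive_margins": [1, 0, 1, 0, 0, 1, 0, 1, 0],
    "immunosuppressed": [0, 0, 0, 1, 0, 1, 0, 0, 0],
})

# Poisson regression with follow-up time as the exposure yields rate ratios;
# clustering on patient_id accounts for repeated episodes in one patient.
model = smf.gee(
    "failure ~ positive_margins + immunosuppressed",
    groups="patient_id",
    data=episodes,
    family=sm.families.Poisson(),
    exposure=np.asarray(episodes["followup_years"]),
    cov_struct=sm.cov_struct.Exchangeable(),
)
print(np.exp(model.fit().params))  # exponentiated coefficients = rate ratios
```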

Results

Of 192 patients with M. ulcerans infection treated at Barwon Health during the study period, 50 (26%) had exclusive surgical treatment of an initial lesion. Baseline characteristics of patients and lesions are shown in Box 1. The median age of patients was 65.0 years (interquartile range [IQR], 45.5–77.7 years). Four patients had immunosuppression: two were taking prednisolone for polymyalgia rheumatica or eczema, and two had cancer (prostate and oesophagus). Where it was known for a patient’s first lesion, the median duration of symptoms before diagnosis was 46 days (IQR, 26–90 days). No patients were lost to follow-up.

There were 58 treatment episodes: 45 patients had one treatment episode and four patients had two episodes. One patient (who was initially treated in 2002, before use of antibiotics for recurrences increased) had five surgical treatment episodes, each followed by a recurrence. Thirty-seven treatment episodes involved surgical excision and direct closure, 15 included a split skin graft, and six included a vascularised tissue flap.

There were 20 recurrences in 16 patients. The incidence rate was 41.8 (95% CI, 25.6–68.2) per 100 person-years for first recurrences over 38.3 years’ follow-up, and 48.1 (95% CI, 31.0–74.6) per 100 person-years for all recurrences over 41.6 years’ follow-up. The Kaplan–Meier curve for cumulative incidence of first recurrences is shown in Box 2. The median time to recurrence after treatment was 50 days (IQR, 30–171 days) for first lesions and 90 days (IQR, 33–171 days) for all lesions. Recurrence involved a lesion ≤ 3 cm from the original lesion in 13 cases, and > 3 cm in nine (two patients had recurrences both ≤ 3 cm and > 3 cm from the original lesion).
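As an arithmetic check (the helper function below is ours, not the authors’), an incidence rate is simply events divided by person-years; a Wald 95% CI on the log scale closely reproduces the figures quoted above.

```python
import math

def rate_per_100py(events, person_years):
    """Incidence rate per 100 person-years with a log-scale Wald 95% CI."""
    rate = events / person_years * 100
    half_width = 1.96 / math.sqrt(events)  # SE of log(rate) is 1/sqrt(events)
    return rate, rate * math.exp(-half_width), rate * math.exp(half_width)

print(rate_per_100py(16, 38.3))  # first recurrences: ~41.8 (25.6–68.2)
print(rate_per_100py(20, 41.6))  # all recurrences:   ~48.1 (31.0–74.6)
```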

On univariable analysis, factors associated with treatment failure after surgery were age ≥ 60 years, distal lesion position, positive margins, immunosuppression and duration of symptoms before diagnosis of > 75 days (Box 3). On multivariable analysis, positive margins and immunosuppression remained strongly associated with treatment failure (Box 3). The multivariable Poisson regression model including only first episodes of treatment showed the strength of these associations persisted when multiple episodes in individual patients were excluded (data not shown).

Discussion

In our study, recurrence of M. ulcerans disease occurred in about a third of patients treated with surgery alone. This proportion is slightly higher than reported in studies from Africa (17%–22%)6,7 and northern Australia (11%).15 In previous studies, we found adjunctive antibiotics were associated with a reduced risk of recurrence compared with surgery alone, especially if there were positive histological margins or patients had major surgery.5,16 Therefore, we recommend antibiotics as first-line therapy for M. ulcerans infection. However, there are patients in whom antibiotics are contraindicated, not tolerated or declined. We found 68% of patients were cured with a single surgical procedure, suggesting a role for exclusive surgical treatment as a potential alternative to antibiotics in selected cases. This study provides further prognostic information to aid decision making when considering whether surgery alone is appropriate.

We found that positive histological margins were associated with nearly an eightfold increased rate of treatment failure after surgery alone. This is likely due to incomplete excision of mycobacteria from the initial lesion, and the immune system being unable to clear them. A study from Africa similarly reported increased recurrence rates when excision was macroscopically incomplete.6 Even when excisions are performed with wide margins of macroscopically normal tissue, evidence of infection extending to excision margins is found on microscopy and PCR testing in most cases.17 Hence, we believe that histological examination of the excision margins to ensure they are free of signs of inflammation or infection is important to reduce the risk of recurrence. Nevertheless, M. ulcerans can spread subclinically from the initial lesion, including to non-contiguous body parts,16,18,19 as shown in our study by 45% of recurrences occurring > 3 cm from the original lesion. These distant foci will not be removed by wide excision of the initial lesion alone.

Studies from Africa have found increased recurrence rates in young patients (< 16 and < 30 years).6,7 In our region, the disease affects mainly older adults,2 and there were not enough children in our patient population to examine this association. However, univariable analysis showed a 14-fold increased risk of recurrence in patients ≥ 60 years old. An increase in the point estimate remained on multivariable analysis, but the evidence for an association was not strong. Nevertheless, it is plausible that, in older patients, a weakened immune system would allow more subclinical dissemination and thus greater risk of recurrence with surgery alone, and a study with greater patient numbers may find a stronger association. If true, this may explain the slightly higher recurrence rates seen in our older population compared with those reported from Africa. Our data suggest that until more evidence is obtained, caution should be exercised in treating patients aged ≥ 60 years with surgery alone.

We found immunosuppression was associated with a sixfold increase in recurrence rates, which we believe is the first report of this association. This is biologically plausible, as T-cell immunity plays an important role in clearance of M. ulcerans,20,21 sometimes clearing infection in the absence of medical treatment.22 Patients with an attenuated immune response may have an increased risk of recurrence. This is supported by evidence from Mycobacterium tuberculosis treatment, where it has been shown that HIV-related immunosuppression is a risk factor for recurrence after treatment.23

Similar to a study from the Ivory Coast,7 our univariable analysis found an increased rate of recurrence when the duration of symptoms before diagnosis was longer than 75 days. This may relate to potentially increased dissemination of mycobacteria from a lesion when present in a clinically recognisable form for longer durations.

On multivariable analysis, there was a trend toward reduced recurrence risk with proximal lesions (P = 0.07), which may have been strengthened with greater patient numbers. Although this association may be due to chance, other possible reasons include improved local immunity in proximal body regions; proximal body parts being more frequently covered, potentially reducing exposure to M. ulcerans or inhibiting its growth through higher skin temperatures;21 or wider excision margins being obtained due to easier closure of proximal wounds.

Our study has several limitations. First, it is observational and there may be other unmeasured confounders that could affect the validity of the findings. Second, the number of patients was small, affecting the power of multivariable analyses to detect weaker associations with identified variables. Third, there were no data on lesion size, so we could not measure its effect on outcomes. However, the included data on the type of surgery broadly separates small and large lesions, as small lesions are amenable to excision and direct closure, whereas larger lesions require split skin graft or vascularised flaps. Finally, missing data prevented testing the strength of the association of duration of symptoms before diagnosis in the multivariable model, thus weakening conclusions regarding its effect.

In conclusion, recurrence rates after exclusive surgical treatment of M. ulcerans infection in this Australian cohort are high, with increased rates associated with immunosuppression or positive histological margins on excised lesions. Our findings suggest that patients aged ≥ 60 years and those who have had clinical symptoms longer than 75 days or with distal lesions may also be at increased risk of recurrent disease. Further research to validate these risk factors is recommended.

1 Baseline characteristics of the study population

Characteristic                                    All patients in cohort   Patients with treatment failure

Patient characteristics
Sex (n = 50)
   Female                                         28 (56.0%)               8 (28.6%)
   Male                                           22 (44.0%)               8 (36.4%)
Age (n = 50)
   < 60 years                                     21 (42.0%)               2 (9.5%)
   ≥ 60 years                                     29 (58.0%)               14 (48.3%)
Diabetes (n = 50)
   No                                             48 (96.0%)               14 (29.2%)
   Yes                                            2 (4.0%)                 2 (100.0%)
Immunosuppression (n = 50)
   No                                             46 (92.0%)               13 (28.3%)
   Yes                                            4 (8.0%)                 3 (75.0%)
Duration of symptoms before diagnosis (n = 44)*
   ≤ 75 days                                      32 (72.7%)               7 (21.9%)
   > 75 days                                      12 (27.3%)               7 (58.3%)

Treatment episode characteristics
Lesion site (n = 58)
   Upper limb                                     19 (32.8%)               6 (31.6%)
   Lower limb                                     35 (60.3%)               14 (40.0%)
   Torso                                          4 (6.9%)                 0
   Proximal                                       15 (25.9%)               1 (6.7%)
   Distal                                         43 (74.1%)               19 (44.2%)
   Not over joint                                 42 (72.4%)               14 (33.3%)
   Over joint                                     16 (27.6%)               6 (37.5%)
Lesion type (n = 42)
   Ulcer                                          40 (95.2%)               19 (47.5%)
   Nodule                                         2 (4.8%)                 0
Major excision (n = 58)
   No                                             37 (63.8%)               12 (32.4%)
   Yes                                            21 (36.2%)               8 (38.1%)
Positive margins (n = 57)
   No                                             37 (64.9%)               5 (13.5%)
   Yes                                            20 (35.1%)               15 (75.0%)


* First episodes.

2 Cumulative incidence of first recurrences for Mycobacterium ulcerans lesions

3 Poisson regression model showing adjusted and unadjusted associations between identified factors and treatment failure

Factor                      Failure episodes   Follow-up (years)   Rate per 100 person-years (95% CI)   Crude rate ratio (95% CI)   P         Adjusted rate ratio (95% CI)   P

Sex
   Female                   8                  22.8                35.1 (17.5–70.1)                      1                                     1
   Male                     12                 18.8                64.0 (36.3–112.7)                     1.23 (0.21–7.02)            0.82      0.52 (0.19–1.39)               0.20
Age
   < 60 years               2                  19.5                10.2 (2.6–40.9)                       1                                     1
   ≥ 60 years               18                 22.0                81.7 (51.4–129.6)                     13.84 (2.21–86.68)          < 0.01    3.21 (0.65–15.88)              0.12
Lesion type
   Ulcer                    19                 24.7                76.8 (49.0–120.4)
   Nodule                   0                  2.0
Lesion site
   Upper limb               6                  13.9                43.1 (19.4–95.9)                      1
   Lower limb               14                 23.7                59.2 (35.1–99.9)                      1.33 (0.23–7.77)            0.19
   Torso                    0
Lesion position
   Proximal                 1                  13.5                7.4 (1.0–52.8)                        1                                     1
   Distal                   19                 28.1                67.6 (43.1–105.9)                     20.43 (1.97–212.22)         < 0.01    4.49 (0.58–34.51)              0.07
Over a joint
   No                       14                 31.0                45.2 (26.8–76.3)                      1
   Yes                      6                  10.6                56.6 (25.4–126.0)                     2.00 (0.53–7.60)            0.32
Positive margins
   No                       5                  32.6                15.3 (6.4–36.9)                       1                                     1
   Yes                      15                 8.4                 178.0 (107.3–295.3)                   21.02 (5.51–80.26)          < 0.001   7.72 (2.71–22.01)              < 0.001
Major excision
   No                       12                 27.3                44.0 (25.0–77.5)                      1
   Yes                      8                  14.3                56.0 (28.0–111.9)                     1.64 (0.28–9.61)            0.58
Diabetes
   No                       18                 41.2                43.6 (27.5–69.3)                      1
   Yes                      2                  0.3                 603.7 (151.0–2413.9)                  9.94 (0.43–227.8)           0.13
Immunosuppression
   No                       13                 39.9                32.6 (18.9–56.1)                      1                                     1
   Yes                      7                  1.7                 416.4 (198.5–873.5)                   17.97 (4.17–77.47)          < 0.01    6.45 (2.42–17.20)              0.01
Duration of symptoms before diagnosis
   ≤ 75 days                7                  27.1                25.8 (12.3–54.2)                      1
   > 75 days                7                  6.3                 111.3 (53.1–233.6)                    10.13 (1.76–58.23)          0.02

Direct-to-consumer genetic testing — where should we focus the policy debate?

What are the implications for health systems, children and informed public debate?

Until recently, human genetic tests were usually performed in clinical genetics centres. In this context, tests are provided under specific protocols that often include medical supervision, counselling and quality assurance schemes that assess the value of the genetic testing services. Direct-to-consumer (DTC) genetic testing companies operate outside such schemes, as noted by Trent in this issue of the Journal.1 While the uptake of DTC genetic testing has been relatively modest, the number of DTC genetic testing services continues to grow.2 Although the market continues to evolve,3 it seems likely that the DTC genetic testing industry is here to stay.

This reality has led to calls for regulation, with some jurisdictions going so far as to ban public access to genetic tests outside the clinical setting.4,5 In Australia, as Nicol and Hagger observe, the regulatory situation is still ambiguous;6 regulation is further complicated by the activity of internet-accessible companies that lie outside Australia’s jurisdiction. In general, the numerous policy documents that have emanated from governments and scientific and professional organisations cast DTC services in a negative light, seeing more harms than benefits, and, in some jurisdictions, governments have tried to regulate their services and products accordingly.7,8 Policy debates have focused on the possibility that DTC tests could lead to anxiety and inappropriate health decisions due to misinterpretation of the results. But are these concerns justified? Might they be driven by the hype that has surrounded the field of genetics in general? If so, what policy measures are actually needed and appropriate?

Time for a hype-free assessment of the issues?

Driven in part by the scientific excitement associated with the Human Genome Project, high expectations and a degree of popular culture hype have attracted both public research funds and venture capital to support the development of disease risk-prediction tests.3 This hype — which, to be fair, is created by a range of complex social and commercial forces9 — likely contributed to both the initial interest in the clinical potential of genetic testing and the initial concerns about possible harms. Both are tied to the perceived — and largely exaggerated — predictive power of genetic risk information, especially in the context of common diseases. There are numerous ironies to this state of affairs, including the fact that the call for tight regulation of genetic testing services may have been the result, at least in part, of the hype created by both the research community and the private sector around the utility of genetic technologies.9 This enthusiasm helped to create a perception that genetic information is unique, powerful and highly sensitive, and specifically that, as a result, the genetic testing market warrants careful oversight.

Now that research on both the impact and utility of genetic information is starting to emerge, a more dispassionate assessment can be made about risks and the need for regulation. Are the concerns commonly found in policy reports justified? Where should we direct our policymaking energy?

It may be true that consumers of genetic information — and, for that matter, physicians — have difficulty understanding probabilistic risk information. However, the currently available evidence does not show that the information received from DTC companies causes significant individual harm, such as increased anxiety or worry.10,11 In addition, there is little empirical support for the idea that genetic susceptibility information results in unhealthy behavioural changes (eg, the adoption of a fatalistic attitude).5

The concerns about consumer anxiety and unhealthy behaviour change have driven much of the policy discussion surrounding DTC testing. As such, the research could be interpreted as suggesting that there is no need for regulation or further ethical analysis. This is not the case. We suggest that the emerging research invites us to focus our policy attention on issues that reach beyond the potential harms to the individual adult consumer — where, one could argue, there seems to be little empirical evidence to support the idea that the individual choice to use DTC testing should be curtailed — to consideration of the implications of DTC testing for health systems, children and informed public debate.

Health system costs

Although genetic testing is often promoted as a way of making health care more efficient and effective by enabling personalised medical treatment, it has been suggested that the growth in genetic testing will increase health system costs. A recent survey of 1254 United States physicians reported that 56% believed new genetic tests will increase overall health care spending.12

Will DTC testing exacerbate these health system issues by increasing costs and, perhaps, the incidence of iatrogenic injuries due to unnecessary follow-up? This seems a reasonable concern given that studies have consistently shown that DTC consumers view the provided data as health information that should be brought to a physician for interpretation. One study, for example, found that 87% of the general public would seek more information about test results from their doctor.13 The degree to which these stated intentions translate into actual physician visits is unclear. But for health systems striving to contain costs, even a small increase in use is a potential health policy issue, particularly given the questionable clinical utility of most tests offered by DTC companies. It seems likely that there will be an increase in costs with limited offsetting health benefits — although more research is needed on both these possible outcomes.

Compounding the health system concerns is the fact that few primary care physicians are equipped to respond to inquiries about DTC tests. A recent US study found that only 38% of the surveyed physicians were aware of DTC testing and even fewer (15%) felt prepared to answer questions.14 As Trent notes, even specialists can encounter difficulties in interpreting DTC genetic tests.1 This raises interesting questions about how primary care physicians will react to DTC test results. Will they, for example, order unnecessary follow-up tests or referrals, thus amplifying the concerns about the impact of DTC testing on costs?

Testing of children

While there is currently little evidence of harm caused by DTC genetic testing, most of the research has been done in the context of the adult population. The issues associated with the testing of minors are more complicated, involving children’s individual autonomy and their right to control information about themselves. Many DTC genetic testing companies include tests for adult-onset diseases or carrier status. Testing children for such traits contravenes professional guidelines. Nevertheless, research indicates that only a few DTC companies have addressed this concern. A study of 29 DTC companies found that 13 did not have policies on the issue and eight allowed testing if requested by a parent.15 While it is hard to prevent parents from submitting samples from minors to genetic testing companies, this calls for an important policy debate on whether there are limits on parental rights to access the genetic information of their children. Current paediatric genetic guidelines recommend delaying testing in minors unless it is in their best interests, but these are not enforceable and not actively monitored.16

In addition, unique policy challenges remain with regard to the submission of DNA samples in a DTC setting. It is difficult for DTC companies to check whether the sample received is from the person claiming to be the sample donor. Policymakers should consider strategies, such as sanctions, that eliminate the ordering of tests without the consent of the tested person.

Truth in advertising

The DTC industry is largely based on reaching consumers via the internet. Research has shown that the company websites — which, in many ways, represent the face of the industry — contain a range of untrue or exaggerated claims of value.17 Advertisements for tests that have no or limited clinical value have a higher risk of misleading consumers, because the claims needed to promote these services are likely to be exaggerated. It is no surprise that stopping the dissemination of false or misleading statements about the predictive power of genetics has emerged as one of the most agreed policy priorities.8 While evidence of actual harm caused by this trend is far from robust, it is hard to argue against the development of policies that encourage truth in advertising and the promotion of more informed consumers. Moreover, the claims found on these websites may add to the general misinformation about value and risks associated with genetic information that now permeates popular culture. Taking steps to correct this phenomenon is likely to help public debate and policy deliberations. For example, this might include a coordinated and international push by national consumer protection agencies to ensure that, at a minimum, the information provided by DTC companies is accurate.18

Conclusion

These are not the only social and ethical issues associated with DTC genetic testing. Others, like the use of DTC data for research and the implications of cheap whole genome sequencing, also need to be considered. But they stand as examples of issues worthy of immediate policy attention, regardless of what the evidence says about a lack of harm to individual adult users. We must seek policies that, on the one hand, allow legitimate commercial development in genomics and, on the other, achieve appropriate and evidence-based consumer protection. In finding this balance, we should not be distracted by hype or unsupported assertions of either harm or benefit.

Deciding when quality and safety improvement interventions warrant widespread adoption

Evaluative criteria are needed to determine the likelihood of successful implementation and acceptable return on investment

Determining when a specific quality and safety improvement intervention (QSII) has sufficient evidence of effectiveness to warrant widespread implementation is highly controversial.1,2 Some large-scale QSIIs have been shown to be less effective than originally thought (Box).3–8 Reporting guidelines for QSII studies stipulate sufficient detail to allow users to gauge the feasibility and reproducibility of a specific QSII within local contexts.9 Some authors have focused on study designs and statistical methods used to evaluate QSIIs.10 An international expert group has distilled several key themes that researchers should consider and discuss when describing experiences with specific QSIIs.11

While considerable resources are being directed at QSIIs in Australia and elsewhere, recent literature reviews show significant shortcomings in research on QSII effects.12–14 Based on these reviews and our experience with various QSIIs, we propose a checklist of evaluative criteria that decision makers — clinicians, quality teams, policymakers and statutory bodies — can apply to existing literature relating to specific QSIIs to determine whether they are fit for purpose and whether widespread adoption is justified.

Checklist of evaluative criteria

1. Has the problem to be addressed by the QSII been fully characterised?

What is the problem; where, when and how often does it occur; who does it affect and by how much; what are the predisposing or mitigating factors; and what are the potential levers for remediation? Qualitative and quantitative data are necessary to elucidate the root cause of the problem, which should inform the design of a responsive QSII.

2. Does a sound change theory underpin the intervention?

What individual or organisational behaviour is the QSII trying to change and how will it do this? Many QSIIs are complex, multifaceted, socially embedded, non-linear interventions which vary in their context (target population and setting), content and application (the QSII itself and how it will be delivered), and outcomes. The QSII should have a sound theoretical construct which explains and predicts how it will effect change in care and is fully cognisant of the beliefs and attitudes of target groups.15 Validated theories of behavioural and organisational change need to have been considered in developing a model of change which addresses the key issues listed in Appendix 1.16,17 A review of guideline implementation studies found that only 23% mentioned a theoretical framework, most referring to only one theory.18

3. Has the QSII undergone preliminary testing to confirm proof of concept?

Pilot testing of a QSII should have demonstrated its feasibility and potential benefit, while exposing any “weak links”, learning curves, unanticipated contextual barriers and undesirable consequences. Systematic literature reviews should have been conducted to identify prior experience with similar QSIIs, including field studies or modelling exercises that assess feasibility regarding up-front implementation costs.19

4. Is the QSII standardised and replicable?

Demonstrating successful results from implementation of a single-site QSII does not guarantee generalisability of effect. If a QSII is to be replicated and tested in multiple settings, it must be standardised to some degree. However, strict standardisation may impede local adaptation required for successful implementation. During implementation of the World Health Organization’s Surgical Safety Checklist, which was associated with significant reductions in mortality and complications across eight sites in different countries, local refinement of each step according to perceived need was allowed.20 However, we propose that a QSII implemented in more than one setting should have — as a minimum level of standardisation — common objectives, theoretical framework, target populations and core components.

5. Have the effects of the QSII been evaluated with a sufficient level of rigour?

Measuring the success of a QSII is prone to bias if it relies on qualitative self-reports of individuals directly involved in its design and implementation,21 so externally verifiable outcome data are preferred. Also, outcome measures should be standardised and appropriate, data should be collected accurately and comprehensively, and study designs should minimise the risk of confounding.

Were outcome measures standardised and appropriate? Well defined and objective patient-important outcomes minimise ascertainment bias3,22 (Appendix 2). QSII studies which report surrogate or intermediate outcomes (eg, change in medication error rates, or compliance with surgical site identification policies) should indicate how strongly such measures correlate with hard clinical end points. For example, improvements in “safety culture” as measured by survey tools show tenuous associations with reductions in patient harm.23

Were data collected accurately and comprehensively? The extent of inaccurate or missing data in many QSII studies is significant, and cherry-picked data from sites that perform better than others are often presented as the generalisable result. In addition, the intervention itself can alter how data are collected. For example, greater pharmacist participation in clinical teams may not only prevent prescribing errors but also unearth previously undetected errors.24

Did study designs minimise the risk of confounding? Investigators should have used study designs which minimise bias (Appendix 2). Randomised studies minimise selection bias in attributing improved patient outcomes to QSII effects. Cluster randomised trials involving multiple sites avoid contamination of control groups within sites. If extensive rollout of a QSII is already occurring or about to occur, then stepped wedge designs which insert randomisation into the phasing of implementation are preferred. Where randomisation is impractical, non-randomised studies (controlled before–after trials, interrupted time series studies, statistical process control charts) can be used.
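As a small illustration of the stepped wedge idea (site names hypothetical), every site begins under control conditions, the crossover points are fixed in advance, and only the order in which sites cross over to the QSII is randomised:

```python
import random

sites = ["Site A", "Site B", "Site C", "Site D"]  # hypothetical sites
random.shuffle(sites)  # randomise only the order of crossover

# One baseline period with no site exposed, then one step per site.
for period in range(len(sites) + 1):
    exposed = sites[:period]  # sites that have crossed over by this period
    print(f"Period {period}: intervention at {exposed if exposed else 'no sites'}")
```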

6. Have the observed effects been reconciled with the underpinning theoretical framework?

Were data that adequately tested the theory collected? Evaluations should have assessed whether what the theory predicted occurred in terms of behaviour change, and whether contingencies were accurately foreseen and responded to. Variables which may, theoretically, have an impact on QSII effectiveness (eg, participant characteristics, intervention intensity, or effect modifiers) should be measured quantitatively and qualitatively.

Were detailed process evaluations reported? Theory-driven process evaluations of QSIIs which describe actual implementation (intervention as performed) versus original intention (intervention as planned) enable users to differentiate between lack of effect due to potentially avoidable implementation failure and across-the-board ineffectiveness. This helps identify instances where no amount of intervention re-engineering is likely to render it sufficiently effective to be worth pursuing. However, process evaluations that suggest good execution of a QSII do not guarantee effectiveness. For example, in one study, educational visits for general practitioners aimed at influencing prescribing practice were well received and associated with high recall, but prescribing behaviour changed little and was constrained by patient preference and local hospital policy.25

7. Has the potential for adverse and unintended effects been evaluated?

The potential for some QSIIs to harm patients should have been considered. For example, it has been suggested that decreasing junior medical staff working hours to reduce fatigue-related errors might increase errors due to greater discontinuity of care and multiple handovers. However, this concern has been allayed.26

8. Have resource use and costs been assessed?

While formal economic analyses of QSIIs are rare, some attempt should have been made to quantify resource use and costs involved in implementation (personnel, equipment, training programs, consumables, etc) to compare investment required with achievable benefits. While cost savings may accrue by minimising expensive safety errors in patient care, QSIIs may incur considerable opportunity costs, as has been claimed for the 100,000 Lives Campaign.27

9. Are QSII effects clinically plausible and consistent?

Studies of QSIIs that report large benefits over short periods are more likely to be true if:

  • prevalence of suboptimal or unsafe care, in the absence of the QSII, was quite high

  • the effects are plausibly explained by the theory underpinning the QSII and supported by process evaluations

  • plausible confounders that would have reduced the observed effect have been accounted for

  • similarly large effects have been observed across multiple studies

  • levels of uncertainty of effect estimates, as expressed by confidence intervals, are relatively small.

10. Has sustainability of the intervention been assessed?

QSIIs should be favoured if there is evidence of sustainability of effects across multiple sites over 2 years or more.

11. Have methodological limitations and conflicts of interest been assessed?

Methodological limitations and possible sources of confounding, particularly for observational studies, should have been openly acknowledged, together with any conflicts of interest involving researchers who may benefit financially from providing QSII consultancies.

12. Is publication bias unlikely?

As it is highly unlikely that every study of a QSII will have returned a positive result, the complete absence of negative studies should raise suspicion of publication bias.

Applying the checklist to a specific QSII

Before applying the checklist to a specific QSII, users must retrieve as much published evidence relating to the QSII as possible. By applying the checklist to this evidence, it is possible to build a profile of the QSII according to the evaluative criteria. Responses to the criteria may be dichotomous (yes or no) or, if the evidence is more subjective and uncertain, graded (using a 5-point Likert scale). Users may also wish to give different weightings to individual criteria depending on how critical they regard them to the overall utility of the QSII. We do not imply that all 12 criteria must attract favourable responses before proceeding with QSII implementation, although we feel that most QSIIs should satisfy Criteria 1–8. If they do not, we recommend that detailed longitudinal evaluations are undertaken as the QSII is implemented in pilot sites.
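A minimal sketch of how such a graded, weighted profile might be tallied is shown below; the scores and weights are entirely hypothetical, as the checklist itself prescribes no scoring algorithm.

```python
# Hypothetical graded responses: criterion number -> (1-5 Likert score, weight).
responses = {
    1: (5, 1.0),  # problem fully characterised
    2: (4, 1.0),  # sound change theory
    3: (3, 0.5),  # proof of concept
    5: (2, 2.0),  # rigour of evaluation, weighted as critical by this user
    9: (4, 1.0),  # plausible and consistent effects
}

# Weighted total versus the maximum attainable, giving a simple QSII profile.
score = sum(s * w for s, w in responses.values())
max_score = sum(5 * w for _, w in responses.values())
print(f"QSII profile: {score:.1f} of {max_score:.1f} ({100 * score / max_score:.0f}%)")
```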

If a major quality problem is evident and requires urgent remediation, QSIIs that appear promising but have not been extensively evaluated may need to be considered. In such cases, we advise rigorous evaluation during implementation.

The strength of the checklist is that it encourages a structured appraisal of how QSIIs have been developed, implemented and evaluated. As an example, the checklist is applied to evidence around hospital rapid-response teams in Appendix 3. The results indicate that if this checklist had been available some years ago, it may have tempered early enthusiasm for rapid-response teams. The checklist also highlights the need for frequent and systematic evaluations of newly developed QSIIs. In an era of limited resources, the potential effectiveness and likely return on investment of specific QSIIs must be assessed. The checklist may contribute to greater discipline and transparency of investment decisions and help clarify which QSIIs require further refinement and testing before large-scale implementation.

Comparison of initial and later experience of two large-scale quality and safety improvement interventions

Rapid-response teams (RRTs)

RRTs are multidisciplinary teams of medical, nursing and airway management staff charged with prompt bedside evaluation, triage and treatment of clinically deteriorating patients throughout all hospital wards outside intensive care units (ICUs). Their aim is to reduce preventable deaths, cardiac arrest, unplanned ICU admissions and postsurgical complications.

Initial experience: Early trials suggested a large potential benefit of RRTs in reducing unexpected cardiac arrests (by up to 50%), unplanned ICU admissions (by up to 44%), postoperative deaths (by up to 37%) and mean length of hospital stay (by up to 4 days).3,4 As a result of such observations and advocacy for RRTs from the Institute for Healthcare Improvement’s 100,000 Lives Campaign, hundreds of hospitals worldwide have implemented RRTs.

Later experience: The validity of earlier positive observations has been challenged and a meta-analysis of 18 high-quality trials confirmed no reduction in mortality, although cardiac arrest calls were reduced by a third.5

Pay-for-performance (P4P) schemes

P4P schemes involve defined changes in reimbursement to clinical providers (individual clinicians, group practices or hospitals) in direct response to a change in one or more performance measures as a result of one or more practice innovations. Their aim is to incentivise optimal provider performance and improve quality and safety of care.

Initial experience: In the United Kingdom, large-scale implementation of P4P contracts for family practitioners over 12 months was reported in 2006 to have resulted in practitioners achieving a median of 97% of their available points covering quality of clinical care, well in excess of the predicted 75%.6 However, no baseline was established for most indicators. The United States Institute of Medicine and high-profile quality experts recommended greater use of P4P programs to improve quality of care, and by 2009 more than 200 P4P programs covering over 50 million beneficiaries were implemented.

Later experience: A review of 17 studies (12 controlled trials) showed modest improvement (4%–8% absolute increases) in some or all process-of-care measures in five of six studies of clinician-level financial incentives and seven of nine studies of group practice-level incentives.7 Four studies showed unintended adverse effects (gaming, patient exclusion, and tick-box documentation of undelivered care). A 2009 review of P4P schemes in the UK showed that, within 2 years of commencement, there was no further improvement in quality-of-care indicators despite a more than £1 billion budget overrun and a decline in continuity of care.

Emergency surgery model improves outcomes for patients with acute cholecystitis

To the Editor: Reducing the time from presentation to cholecystectomy in patients with acute cholecystitis has been shown to benefit patients (eg, by reducing the duration of patient discomfort before surgery) and to be cost-effective.1–3 Benefits have also been shown for performing cholecystectomy during the index admission for gallstone pancreatitis.4

Geelong Hospital (in regional Victoria) introduced daily general surgery emergency theatre sessions in February 2011. We compared 401 patients who presented to the emergency department (ED) with acute cholecystitis from February 2008 to January 2011 (control period) with 137 who presented from February 2011 to January 2012 (intervention period). We also compared patients who presented with gallstone pancreatitis — 91 in the control period and 38 in the intervention period. For patients who underwent cholecystectomy during their index admission, we analysed the time of presentation to the ED and time of surgery. Complication rates (for bile duct injury, bile leak requiring intervention, unplanned endoscopic retrograde cholangiopancreatography, mortality or unplanned reoperation) were analysed by medical record review.

We found an increase in the proportion of patients with acute cholecystitis who had a cholecystectomy during their index admission, excluding those who were transferred to the private system, from 53% (199/373) to 72% (94/130) (P < 0.001). We also found a decrease in the median waiting time from patient arrival in the ED to operation for those with acute cholecystitis who had a cholecystectomy during their index admission, from 41.8 to 26.4 hours (P < 0.001). However, there was no significant difference in the complication rate for patients with acute cholecystitis who received a cholecystectomy in the control and intervention periods (P = 0.96).
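The letter does not state which statistical test produced these P values; as a check under that assumption, a standard chi-squared test on the reported counts is consistent with the quoted P < 0.001 for the index-admission comparison.

```python
from scipy.stats import chi2_contingency

# Index-admission cholecystectomy (yes, no) in each period.
control = [199, 373 - 199]      # 53% of 373 assessable patients
intervention = [94, 130 - 94]   # 72% of 130 assessable patients

chi2, p, dof, expected = chi2_contingency([control, intervention])
print(f"chi2 = {chi2:.1f}, P = {p:.2g}")  # P is well below 0.001
```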

Patients with gallstone pancreatitis underwent a cholecystectomy after their pancreatitis had settled. Of those who presented with gallstone pancreatitis in the control period, 42% (38/91) had their cholecystectomy during their index admission; this increased to 63% (24/38) in the intervention period (P = 0.03).

The proportion of cholecystectomies (for acute cholecystitis or gallstone pancreatitis) performed after-hours did not increase, despite an increase, from 51% to 70%, in patients receiving cholecystectomy during their index admission. Operations were performed in-hours for 73% (172/237) of those who underwent cholecystectomy during their index admission in the control period and 70% (83/118) of those who underwent cholecystectomy during their index admission in the intervention period (P = 0.15). For both of these groups, the median postoperative length of stay was 2 days (P = 0.67).

These data show that introducing dedicated general surgery emergency theatre sessions improved our ability to perform surgery in a timely manner for patients who presented with cholecystitis or gallstone pancreatitis.

Only the best: medical student selection in Australia

To the Editor: I share Mahar’s concern regarding any future screening of prospective medical students for signs that they are likely to develop mental or physical impairment.1 Although Wilson and colleagues do acknowledge that screening may not be ethical, their apparent conflation of likelihood of illness with impaired long-term ability to practise is a separate and problematic issue.2 Mental illnesses, particularly depression and anxiety, are common in medical students.3

Even in the context of medical schools’ “fitness to practise” procedures, which Wilson et al consider more practical, it is important that criteria and processes for removal of students are not so broad that they can be applied selectively. Further, the tendency for the medical mind to seek to eliminate personified risk factors of future problems should be resisted; further research may not be “paramount”.2 It is not the possibility or probability of developing later illness that matters (dark actions wait at the end of that uncertain path) — individuals should be judged on the acts that demonstrate potential harm to patients.

Students and society will benefit if the attitude of schools to students suffering illness is one of compassionate support — many students’ conditions may improve. We should also not neglect the current weakness of our own “postmarketing surveillance” — our management of colleagues whose behaviour is far from ideal. We struggle with the fact that they, unlike students, are cloaked with the tribal defence of Fellowship, though this should not change our judgement of their actions — as it certainly does not change their harmful consequences.

Risks of complaints and adverse disciplinary findings against international medical graduates in Victoria and Western Australia

Correction

Incorrect author name: In a letter responding to “Risks of complaints and adverse disciplinary findings against international medical graduates in Victoria and Western Australia” in the Matters Arising section of the 18 March 2013 issue of the Journal (Med J Aust 2013; 198: 256), an error occurred in the second author’s name. The name should have been Tuan V Nguyen.

Should hospitals have intensivist consultants in-house 24 hours a day? – No

Twenty-four-hour coverage is costly, has not demonstrated benefit and diminishes the quality of intensivists’ training

At first glance, proposals for having an in-house consultant intensivist providing 24-hour care have some appeal. It has been suggested that because daily intensivist input improves outcomes in the critically ill, moving from an after-hours consultation service to a 24-hour presence onsite would improve the quality of health care.1 However, this belief is purely speculative and is not supported by data. It is important to recognise that in other areas of medicine, treatments require a certain “dose”, and when given in excess of this dose there is no further improvement. For example, excessive administration of what some may consider relatively benign therapies, such as oxygen, intravenous fluid and enteral nutrition, has no benefit and indeed can be harmful beyond a certain dose. The optimal “dose” of an intensivist remains uncertain.

Before introducing major structural changes to a system, its problems should be identified, and the solution provided should have the potential to fix or ameliorate the problems. Accordingly, if onsite intensivists are the solution, there must be a problem with the current level of care provided to the critically ill, and the problem must be one that intensivists have the capacity to address. Recently, Bhonagiri and colleagues evaluated more than 200 000 patient admissions to Australian intensive care units (ICUs) and observed that after adjusting for severity of illness, patients admitted unexpectedly have similar mortality regardless of whether the admission occurs in-hours or after-hours.2 The investigators did report that patients with planned admissions after undergoing elective surgery were at greater risk of death if they were admitted after-hours when compared with those admitted in-hours. However, a prolonged time spent in theatre (and later admission as a result) is more likely to reflect surgical problems. It is therefore unlikely that an onsite intensivist will influence outcomes in these patients.

A number of ICUs overseas have adopted the model of having a consultant intensivist onsite 24 hours a day. We propose that data from these ICUs will be biased toward showing associations with reduced mortality even in the absence of causality. This is based on the likelihood that refusal to admit to ICU on the grounds of futility will be more frequent when intensivists are onsite, thereby reducing ICU mortality while hospital mortality remains unaffected. Further, most studies from these ICUs have evaluated mortality using a before-and-after intervention design. However, ICU mortality appears to be falling over time,3 so using such a study design is biased toward observing a reduction in mortality even when the intervention is ineffective.

Despite these inherent biases, every published study has reported that ICU mortality is unaffected by the presence of 24-hour onsite intensivists. Moreover, the pivotal study in this area evaluated staffing across 49 United States ICUs and 65 752 patient admissions.4 This study reported that in “closed” ICUs (the model used in Australia), mortality was similar whether intensivists were onsite after-hours or available as a consultative service.

An important part of medical training is the progression to independent decision making that is developed when a senior registrar has responsibility for some decisions, but is supported as required by a consultant. In our opinion this skill is a fundamental determinant of subsequent success as an intensivist. The presence of consultant intensivists in-house 24 hours a day will “protect” senior registrars from making independent decisions. Indeed, the whole premise on which this endeavour is based is that all clinical decision making should be effected by the onsite consultant. Junior consultants will subsequently need to acquire these skills without the benefit of senior support.

Australian health care expenditure continues to rise at a rate greater than gross domestic product.5 The cost implication of introducing intensivists onsite 24 hours a day would be substantial, as salary costs for the increased number of consultant intensivists are fixed, whereas any potential reduction in patient bed-days is unrealised unless beds and smaller ICUs are closed. Such closures are often unpopular and may have unforeseen consequences. For these reasons rigorous cost–benefit modelling must be done, particularly as to date there is no sign of benefit from 24-hour onsite intensivists.

In summary, while the mechanisms underlying any proposed benefit of increasing intensivist “dose” are questionable, the intervention will be costly and may adversely affect training. Unless future well designed studies show an actual benefit for patients, hospitals and health care policymakers should resist any attempts to enforce this potentially expensive and ineffective practice.

Comparative effectiveness research — the missing link in evidence-informed clinical medicine and health care policy making

To change practice, we should move beyond trial-based efficacy to real-world effectiveness

Meaningful health care reform requires robust evidence about which interventions work best, for whom and under what circumstances. The Institute of Medicine in the United States has estimated that less than 50% of current treatments are supported by evidence and that 30% of health care expenditure reflects care of uncertain value.1 Among studies testing established clinical standards of care, more than half reported evidence that contradicted standard care or was inconclusive.2 Many Medicare Benefits Schedule services lack comprehensive evidence of comparative safety or effectiveness, while many that have been evaluated have been shown to be ineffective, harmful or of uncertain value compared with alternative forms of care.3

Filling the void — the rise of comparative effectiveness research

Comparative effectiveness research (CER) compares new or existing interventions (or a new dose or intensity of an intervention) with one or more non-placebo alternatives, which may include “usual care”. It can be used to evaluate a broad spectrum of clinical interventions, including diagnostic tests or strategies, screening programs, surgical procedures, pharmaceuticals, prostheses and medical devices, quality and safety improvement interventions, behavioural change and prevention strategies, and care delivery systems. While CER is not a new process — many past trials have compared different interventions — it represents a new focus and consolidation of approaches in clinical and health services research.

In 2009, the US Congress authorised $1.1 billion for CER and, in 2010, the Patient-Centered Outcomes Research Institute was established to identify CER priorities and develop appropriate methodologies. In the United Kingdom, the National Institute for Health Research (which was established in 2006) commissions and disseminates CER that informs clinical decision making, overseen by the National Institute for Health and Clinical Excellence. However, in Australia, there is no comparable group or agency with CER as the prime focus of activity.

For CER to realise its full potential, the research community must accommodate four prerequisites, in the following order.

1. Involvement of all relevant stakeholders in setting the research agenda

Research has often lacked meaningful engagement of health care providers and patients in the choice of research questions and in the design and implementation of the research effort. Researchers and consumers of research must collaboratively identify important unanswered questions among current systematic reviews and clinical guidelines.4 Questions should be selected for CER on the basis of: the perceived needs of key stakeholders (clinicians, patients and health care managers); factors related to potential impact (eg, disease burden, cost of care and variation in outcomes); paucity of effectiveness data among specific populations; and emerging concerns about undisclosed harm.4 In the US, the Agency for Healthcare Research and Quality has developed iterative and transparent methods for defining and prioritising future research needs that involve a wide spectrum of stakeholders (http://www.effectivehealthcare.ahrq.gov). Quantitative modelling methods that calculate the potential value of information in filling existing gaps in knowledge can also assist in prioritisation.5 The Institute of Medicine has issued an initial list of 100 national CER priorities derived by consensus, which includes patient-level and health system-level interventions.6
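
To make the value-of-information idea concrete, the sketch below computes a per-patient expected value of perfect information (EVPI) by Monte Carlo simulation. It is a minimal illustration only: the two interventions, their costs, effect distributions and the willingness-to-pay threshold are all invented, and the calculation is the generic EVPI formula rather than the specific modelling methods cited above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # Monte Carlo draws over the uncertain effectiveness parameters

# Hypothetical example: net monetary benefit (NMB) of two interventions,
# each depending on an uncertain effect size (all numbers are invented).
effect_a = rng.normal(0.60, 0.10, n)  # assumed effect of intervention A
effect_b = rng.normal(0.55, 0.05, n)  # assumed effect of intervention B
wtp = 50_000                          # willingness to pay per unit of effect
cost_a, cost_b = 12_000, 8_000        # assumed mean costs per patient

nmb = np.column_stack([wtp * effect_a - cost_a,
                       wtp * effect_b - cost_b])

# EVPI = E[max over options of NMB] - max over options of E[NMB]:
# the expected gain from resolving all parameter uncertainty before choosing.
evpi = nmb.max(axis=1).mean() - nmb.mean(axis=0).max()
print(f"EVPI per patient: ${evpi:,.0f}")
```

Multiplied by the size of the population affected by the decision, a large EVPI flags a question where further comparative research is likely to be worth funding; a small one suggests existing evidence already suffices.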

2. Flexible approach to evidentiary standards

To be useful, CER must use the best possible data sources and methods to provide credible, timely and relevant evidence. The analytic scope of CER includes reanalysing existing data from available studies (in the form of systematic reviews, meta-analyses or decision analyses) or, if these fail to provide answers, generating additional data from new studies.

The aim of CER is to determine intervention benefit among unselected patients in real-world practice settings (ie, measure effectiveness), as opposed to doing so among highly selected patients in tightly controlled experiments (ie, measure efficacy). The design and conduct of CER studies must reflect this aim (Box).7

CER encounters the vexed question regarding the relative clinical utility of observational studies versus experimental trials. Randomised controlled trials (RCTs) have high internal validity, but narrow patient selection criteria limit their generalisability. Observational studies use data on care delivered routinely to unselected populations in various settings, but their results are more vulnerable to confounding and bias owing to the absence of randomisation. The way forward for CER is to encourage more large-scale, real-world RCTs (pragmatic trials) and more rigorous observational studies (see Appendix).

In RCTs, the inclusion of as-treated and per-protocol analyses (in addition to intention-to-treat analyses) can help expose patient-specific differences in intervention uptake and response. More head-to-head RCTs that fairly compare appropriately administered alternative interventions are needed. Network (or mixed-treatment) meta-analysis combines direct and indirect comparisons of different treatments into one synthesis, making greater use of all available RCT evidence than traditional meta-analysis.
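
As a concrete illustration of an indirect comparison (our example, not one drawn from the studies cited), the simplest case estimates the effect of treatment A versus treatment B from trials of A versus a common comparator C and of B versus C, on a log scale:

```latex
% Adjusted indirect comparison (Bucher method):
% d_AC and d_BC are trial-based effect estimates against the common comparator C.
\[
  \hat{d}_{AB} = \hat{d}_{AC} - \hat{d}_{BC}, \qquad
  \mathrm{SE}\bigl(\hat{d}_{AB}\bigr)
    = \sqrt{\mathrm{SE}\bigl(\hat{d}_{AC}\bigr)^{2} + \mathrm{SE}\bigl(\hat{d}_{BC}\bigr)^{2}}.
\]
```

Network meta-analysis generalises this logic to whole networks of trials, preserving randomisation within each trial while borrowing strength across comparisons.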

In observational studies, several design features improve rigour: prospective and standardised data collection, blinded outcome assessors, prespecified matching or stratification of patient groups, and analytic techniques that minimise confounding, such as risk-adjusted regression modelling and interrupted time series analysis (a minimal sketch of the latter follows the list below). Multiple high-quality studies of a single question that consistently show large intervention effects persisting after all important sources of bias are discounted confer a high level of credibility. High-quality observational studies may fill evidence gaps more proficiently than RCTs in situations where:

  • technologies are rapidly evolving (ie, there are moving targets)

  • technologies cannot easily be randomised

  • no head-to-head trials exist

  • RCTs exclude certain types of patients or conditions

  • information is being sought about modification of treatment effects due to

    • variation in patient adherence and tolerance

    • use of concomitant treatments

    • dosing or intensity of treatments

    • selection or switching of treatments according to provider and patient preferences.
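
The sketch below fits a segmented regression to a hypothetical interrupted time series. Everything in it is assumed for illustration (the simulated monthly rates, the sizes of the level and slope changes, and the three-lag autocorrelation correction); it is a generic textbook formulation, not a method taken from any study cited here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical monthly outcome series: 24 months before and 24 months
# after an intervention (all values simulated for illustration).
rng = np.random.default_rng(1)
month = np.arange(48)
post = (month >= 24).astype(int)              # 1 from the intervention onwards
time_since = np.where(post == 1, month - 24, 0)

# Simulated data: a secular downward trend, an immediate level drop at
# the intervention, and a further change in slope afterwards.
rate = 20 - 0.05 * month - 2.0 * post - 0.10 * time_since \
       + rng.normal(0, 0.5, 48)
df = pd.DataFrame({"rate": rate, "month": month,
                   "post": post, "time_since": time_since})

# Segmented regression: 'post' estimates the immediate level change and
# 'time_since' the change in trend, net of the pre-existing slope.
# HAC standard errors guard against autocorrelation in the series.
model = smf.ols("rate ~ month + post + time_since", data=df).fit(
    cov_type="HAC", cov_kwds={"maxlags": 3})
print(model.params)
```

The point of the design is that the pre-intervention slope is modelled explicitly, so a secular trend is not mistaken for an intervention effect.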

The choice of study design will depend on: the context in which CER results will be applied; the necessary level of rigour, given the consequences of incorrect inferences from the sample studied (eg, the potential harm or waste from introducing a mass screening program, as opposed to a small-scale niche therapy, on the basis of invalid analyses); the feasibility and costs of different study designs; and the urgency of the need for evidence. The overarching goal is to describe methods that, if consistently applied, give decision makers a reasonable level of confidence that one intervention is more effective than, or as effective as, another.

3. Investment in and redesign of research infrastructure

To fill evidence gaps quickly and definitively, CER will require substantially increased investment in current research infrastructure, both human and technical. This includes expanding existing research teams and adding new ones, developing new research methods and establishing collaborative research partnerships across multiple sites. It also requires data linkage at the patient level (involving administrative databases and clinical registries for public and private patients), which enables more patients to be studied and facilitates better quality and diversity of studies. Such data linkage will require the establishment of data standards and common vocabularies, unique patient identifiers, data quality control and privacy protection systems, and informatics grids that connect practice-based research networks.
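
As a toy illustration of patient-level linkage (invented identifiers and fields; real linkage requires probabilistic matching, quality control and privacy safeguards far beyond this), two sources keyed on a shared patient identifier can be joined as follows:

```python
import pandas as pd

# Hypothetical extracts: an administrative admissions database and a
# clinical registry, both keyed on a common unique patient identifier.
admissions = pd.DataFrame({
    "patient_id": ["P001", "P002", "P003"],
    "admission_date": ["2012-03-01", "2012-03-04", "2012-03-09"],
    "drg_code": ["F10A", "E65B", "F10A"],
})
registry = pd.DataFrame({
    "patient_id": ["P001", "P003", "P004"],
    "ejection_fraction": [35, 55, 60],
})

# Patient-level linkage: keep all admissions, attach registry data
# where the identifier matches (left join).
linked = admissions.merge(registry, on="patient_id", how="left")
print(linked)
```

Agreed data standards and common vocabularies matter precisely because joins like this fail silently when identifiers or codes are inconsistent across sources.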

Of even greater importance is tackling the current bottlenecks in clinical research: ethics approvals, contract negotiations and incentives for organisations to participate. Truly harmonised and rapidly responsive ethics approval procedures across multiple jurisdictions, standardised contract language, and logistical and financial support for institutions to collect and share data are required.

To attract researchers, CER also requires a dedicated funding stream for investigator-led studies, as is the case with basic biomedical science research and with RCTs that predominantly or solely measure efficacy.8 In 2011, the National Health and Medical Research Council spent less than 5% of its $800 million budget on CER, compared with 47% on biomedical research.9

4. Implementation of CER in changing clinical practice

Generating or revising clinical guidelines using CER results will, by itself, have minimal impact in changing clinical practice quickly. Implementation drivers that have greater impact include: redesign of care processes, professional roles and systems of care; financial incentives that reward better practice; performance reporting and feedback; health information technology and clinical decision support; mandates for shared decision making with patients; and better training of clinicians in CER and its application.10 The interdisciplinary field of implementation science, used to study successful diffusion of innovation, will become an important tool,11 aided by CER trials designed to simultaneously evaluate intervention effectiveness and optimal methods of implementation.12

The biggest implementation challenge is reconciling clinicians to important shifts in who delivers what care, and to whom, under different circumstances. The Medicare Benefits Schedule and Pharmaceutical Benefits Scheme will need to move towards greater investment in efficiently priced interventions that CER shows to be effective and disinvestment in interventions that are not. These reforms will require strong political endorsement, independent researchers, early and ongoing engagement with stakeholders around reimbursement decisions, and demonstrable commitment to evidence-informed best practice. However, CER should not be perceived as a means to substantially reduce overall health care spending. In the US, estimated cost savings from CER are less than 0.1% of total expenditure.13 Instead, the aim is to facilitate better return on investment. In fact, CER may lead to recommendations to adopt new interventions.

Current status of CER and its impact on clinical practice

In a recent review of 231 CER studies (37% on drugs, 29% on behavioural interventions and 16% on procedures), only 35% favoured the new intervention; in contrast, 79% of 804 non-CER studies favoured the new intervention.14 More than 70% of the CER studies relied on non-commercial funding, but less than a quarter evaluated safety and cost.14

CER is informing health care policy and changing clinical practice in Australia and overseas. Australian researchers have reported sentinel CER trials comparing saline infusions with albumin infusions in intensive care15 and early dialysis with late dialysis in end-stage renal failure.16 In Norway, an entire colorectal cancer screening program has been set up as a series of adaptive randomised trials testing different screening tests and procedures.17

Conclusion

CER has the potential to reform health care and transform health care research. The research community needs to accommodate a greater emphasis on CER and address challenges regarding optimal methods for selecting stakeholders, prioritising research questions, selecting study designs that best answer the clinical question posed, determining funding and governance arrangements, and implementing CER findings into practice and policy making.

Elements of clinical and health services research that distinguish efficacy and effectiveness studies*

Study elements | Efficacy studies | Effectiveness studies
Intervention | Protocol strictly enforced; treatments masked; cross-overs discouraged | Highly flexible, as used in routine health care; treatments not masked; cross-overs permitted
Patient population | High disease risk, highly compliant, few comorbidities | Anyone with the condition of interest
Study sites | Academic settings with well resourced research specialists | Routine clinical practice settings
Outcome measures | Often short-term surrogates or composite outcomes | Outcomes that are clinically relevant to patients, clinicians and health care managers
Duration of study | Often short (eg, several months to a year) | Often long
Intensity of monitoring | Intense | Depends on condition of interest and practice setting
Data sources | Specific to trial | Various, including administrative databases
Data collection | Ceases when study is discontinued | Continues as part of routine care
Analysis | Typically intention to treat | Various, depending on study aims and design

* Adapted from Gartlehner et al.7 When the study design is a randomised controlled trial (RCT), efficacy studies are termed explanatory RCTs and effectiveness studies are termed pragmatic RCTs.