Development and validation of statistical learning models to predict breast cancer diagnosis and breast cancer mortality in the general population: a cohort study

Status

Ongoing

Title

What is the aim of the study and why is it important?

Breast cancer is the most common cancer affecting women. In the UK each year, 55,000 women are diagnosed with it, and over 11,000 women die from it. Breast cancer screening uses mammograms (a form of X-ray) to try to detect breast cancers at an earlier stage and help women be treated more effectively. However, there is significant debate around the balance of harms and benefits of screening, with a recent independent group finding that for each breast cancer death avoided with screening, three women are diagnosed with a tumour that would never have affected them. These women may undergo a mastectomy, chemotherapy or other treatment that they never needed. Other studies suggest that this balance might be 1:10 rather than 1:3. Offering the same form of breast screening to the whole female population ignores the fact that individual women may have very different risks of developing breast cancer or developing a breast cancer that threatens their life.

‘Risk-stratified breast screening’ is a relatively new idea which suggests that targeting screening to those at highest risk might reduce the harms and enhance the benefits of screening. ‘Personalising’ screening could mean changing the types of screening a woman receives (such as using MRI instead of mammograms), the age at which she starts, or the time between screening scans. Furthermore, an improved way of calculating risk might also help identify women who could take medications or change their lifestyle habits to reduce their breast cancer risk. However, the best way to calculate the risk of an individual woman is not yet clear. There are some mathematical methods available, but these do not perform well enough to guide a risk-stratified approach for breast screening. Models need to be able to accurately tell apart women who eventually develop breast cancer from those who do not. There also needs to be close agreement between the risks that a model predicts and what the ‘true risk’ is.
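The two requirements described above are usually called discrimination and calibration. The sketch below is only an illustration on made-up numbers, not the study's own code or data: it computes a simple concordance statistic (the chance that a woman who develops cancer was given a higher predicted risk than a woman who does not) and an overall observed-to-expected ratio. The function names and toy values are assumptions, and the sketch treats the outcome as a simple yes/no, ignoring follow-up time and censoring, which a real analysis would need to handle.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """Share of (case, non-case) pairs in which the case has the higher
    predicted risk; tied pairs count as half."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for case_risk, non_case_risk in product(cases, non_cases):
        if case_risk > non_case_risk:
            score += 1.0
        elif case_risk == non_case_risk:
            score += 0.5
    return score / (len(cases) * len(non_cases))


def observed_expected_ratio(risks, outcomes):
    """Number of observed events divided by the number the model expected
    (the sum of predicted risks). A value near 1 suggests good overall
    agreement between predicted and observed risk."""
    expected = sum(risks)
    if expected == 0:
        raise ValueError("sum of predicted risks is zero")
    return sum(outcomes) / expected


# Hypothetical predicted risks and outcomes (1 = developed breast cancer).
toy_risks = [0.05, 0.10, 0.20, 0.40, 0.60, 0.65]
toy_outcomes = [0, 0, 1, 0, 1, 1]

print(f"C statistic: {c_statistic(toy_risks, toy_outcomes):.3f}")
print(f"Observed/expected: {observed_expected_ratio(toy_risks, toy_outcomes):.2f}")
```

In this toy example the model ranks women fairly well but predicts fewer cancers than actually occur, which shows why both properties need to be checked separately.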

Some researchers believe that predicting the risk of a potentially lethal breast cancer may be more helpful to direct screening, rather than predicting every possible tumour. Additionally, predicting how ‘risky’ a breast tumour is after being diagnosed might help identify lower risk lesions that need less aggressive treatment.

The QResearch database is a collection of data from general practices that has been used many times to develop statistical equations (also called models) that can make predictions about individual people’s risk for specific conditions, and many of these are installed in GP software systems.

This project seeks to use the anonymised data of millions of British women to develop high-performing risk prediction equations that could be used for ‘personalising’ breast screening, and/or personalising the treatment of breast tumours once they are diagnosed. These equations will be developed using statistical techniques, but also techniques from the ‘machine learning’ or ‘artificial intelligence’ fields. All models will be evaluated in terms of how well they perform in the QResearch dataset compared with existing models, and also in a separate dataset to make sure they would still perform well if applied in real-world settings. We are particularly interested in the situations in which the models perform better or worse, such as their accuracy in different age groups or in women from different ethnic groups.
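Checking performance in different groups of women can be sketched as follows. This is a minimal, hypothetical illustration rather than the project's actual analysis: for each subgroup it compares the average predicted risk with the proportion of women who actually had the outcome. The function name, the age bands and all numbers are invented for the example.

```python
from collections import defaultdict


def calibration_by_subgroup(records):
    """records: iterable of (subgroup, predicted_risk, outcome) tuples,
    where outcome is 1 if the event occurred and 0 otherwise.
    Returns, per subgroup, the number of women, the mean predicted risk
    and the observed event rate."""
    totals = defaultdict(lambda: [0, 0.0, 0])  # count, sum of risks, events
    for subgroup, risk, outcome in records:
        entry = totals[subgroup]
        entry[0] += 1
        entry[1] += risk
        entry[2] += outcome
    return {
        subgroup: {
            "n": n,
            "mean_predicted": risk_sum / n,
            "observed_rate": events / n,
        }
        for subgroup, (n, risk_sum, events) in totals.items()
    }


# Hypothetical records: (age band, predicted risk, outcome).
toy_records = [
    ("50-59", 0.02, 0), ("50-59", 0.04, 0), ("50-59", 0.06, 1), ("50-59", 0.08, 0),
    ("60-69", 0.10, 0), ("60-69", 0.20, 1), ("60-69", 0.30, 1), ("60-69", 0.40, 0),
]

summary = calibration_by_subgroup(toy_records)
for subgroup, stats in summary.items():
    print(subgroup, stats)
```

A large gap between the mean predicted risk and the observed rate in one subgroup, but not in others, would suggest the model is less reliable for that group.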

Chief Investigator

Dr Ashley Clift

Sponsor

Oxford

Location of research

University of Oxford

Date on which research approved

16-Nov-2020

Project reference ID

OX129

Generic ethics approval reference

18/EM/0400

Are all data accessed in anonymised form?

Yes

Brief summary of the dataset to be released (including any sensitive data)

GP data on age, demographics, risk factors for breast cancer, cancer diagnoses.

Hospital data regarding diagnoses using ICD-10 of above conditions, invasive breast cancer, or ductal carcinoma in situ during hospital admission or appointment and associated dates. OPCS codes for undergoing previous breast biopsy, previous hysterectomy, previous systemic cytotoxic chemotherapy, or radiotherapy to chest wall and associated dates.

Mortality data including date and causes of death.

Cancer registration data: information regarding all breast cancer diagnoses in women. This will include stage at diagnosis, route to diagnosis, initial treatment, tumour grade, tumour size, ER status, PR status, HER2 status, nodes involved, nodes excised, Nottingham prognostic index, and whether or not the cancer was screening detected.

What were the main findings?

This study resulted in new prediction models that could, pending further evaluation such as external validation and cost-effectiveness simulations, be useful to inform novel, risk-based early detection strategies for breast cancer, and/or risk stratification for women diagnosed with breast cancer.

We developed the first ever model that predicts the risk of dying from breast cancer in women who do not currently have the condition. This focus on identifying women at higher risk of developing 'cancers that kill' could inform different approaches to risk-based screening or prevention that explicitly target mortality reduction, and reduce the risks of 'overdiagnosis' in screening. By using a cohort of over 11 million women, this study was also the largest yet undertaken to develop breast cancer risk prediction models, and it compared the performance of statistical and machine learning methodologies. The findings were published in The Lancet Digital Health (https://www.thelancet.com/journals/landig/article/PIIS2589-7500(23)00113-9/fulltext).

Another publication, in The BMJ (https://www.bmj.com/content/381/bmj-2022-073800), describes the development and validation of a new prediction tool for breast cancer mortality that can be used in any woman diagnosed with the condition. This part of the study used data for over 140,000 women diagnosed with breast cancer, and presented two models that could be useful for counselling women regarding their risks, for identifying 'higher risk' women who could be eligible for clinical trials, or for improving trial efficiency.

Ash Clift and the research team would like to thank Cancer Research UK for funding the work, and also the thousands of healthcare professionals and millions of patients who contributed to the QResearch database.

Funding Source

Cancer Research UK Clinical Doctoral Fellowship

Research Team

Dr Ashley Clift, Professor Julia Hippisley-Cox, Professor Stavros Petrou, Professor Gary Collins, Dr Simon Lord, Dr David Dodwell, all from the University of Oxford

Approval Letter

Download Approval Letter

Publications

Development and validation of clinical prediction models for breast cancer incidence and mortality: a protocol for a dual cohort study
Authors: Ashley Kieran Clift, Julia Hippisley-Cox, David Dodwell, Simon Lord, Mike Brady, Stavros Petrou, Gary S. Collins
Ref:
https://bmjopen.bmj.com/content/12/3/e050828.abstract?ct=
Development and internal-external validation of statistical and machine learning models for breast cancer prognostication: cohort study
Authors: Clift AK, Dodwell D, Lord S, Petrou S, Brady M, Collins GS, Hippisley-Cox J
Ref:
https://www.bmj.com/content/381/bmj-2022-073800
The current status of risk-stratified breast screening
Authors: Clift AK, Dodwell D, Lord S, Petrou S, Brady M, Collins GS, Hippisley-Cox J
Ref:
https://www.nature.com/articles/s41416-021-01550-3
Predicting 10-year breast cancer mortality risk in the general female population in England: a model development and validation study
Authors: Ash Kieran Clift MBBS, Prof Gary S Collins, Simon Lord, Prof Stavros Petrou, David Dodwell, Prof Michael Brady, Prof Julia Hippisley-Cox
Ref:
https://www.sciencedirect.com/science/article/pii/S2589750023001139

Press Releases

New model could offer personalised breast cancer screening approach, say experts

Access Type

Trusted Research Environment (TRE)
