Status
Ongoing
Title
Comparisons of risk prediction algorithms using three clinical research databases (QResearch, CPRD Aurum and CPRD Gold)
What is the aim of the study and why is it important?
QResearch, CPRD Gold and CPRD Aurum databases are three large general practice databases which are widely used for research. The CPRD databases are similar to the QResearch database but contain a different group of practices and are linked to different external data sources.
Our study will compare characteristics of the databases, including how common various diseases are and how well the data are recorded in each. We will develop new prediction algorithms and compare them with existing algorithms. We will then check how well various risk algorithms work in each data source. For example, we will develop algorithms on QResearch and test them on both CPRD databases.
Risk algorithms are tools which work out the chance that a patient already has a disease (such as diabetes or cancer) or might develop one in the future, based on information about them such as their age, sex, ethnicity, illnesses and treatments. In clinical practice, such tools can be used to help patients understand their risk of different diseases and to identify those who might need help to reduce their risk or a referral to hospital for tests. In this study, we want to see how well these algorithms work on each database and to understand the similarities and differences between the databases, which will help us interpret the results.
How is the research being done?
OBJECTIVE 1
To identify and quantify systematic differences between the three UK research databases (QResearch, CPRD Gold and CPRD Aurum) including geographical spread, diversity of the registered patient population, and clinical coding.
We will analyse linked GP data to assess how completely outcomes (e.g. diabetes) are recorded, by counting the cases identified from each combination of data sources listed below. These counts will then be compared to understand differences and similarities between the databases.
(a) GP record alone
(b) GP record or death record
(c) GP or HES record
(d) GP or HES or death record
(e) GP or HES or death or cancer registry
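As an illustration only, the case counts for definitions (a) to (e) could be tallied as in the sketch below. The per-patient flags, the source names (`gp`, `hes`, `death`, `cancer_registry`) and the helper functions are hypothetical and are not taken from the study code; they simply show how widening the set of linked sources changes the number of cases found.

```python
# Minimal sketch: count outcome cases under each source definition (a)-(e).
# All names and the toy data below are assumptions made for illustration.

SOURCES = ("gp", "hes", "death", "cancer_registry")

def flags(*recorded):
    """Build a per-patient record of which sources contain the outcome."""
    return {source: source in recorded for source in SOURCES}

# Toy patients: each entry says where the outcome was recorded, if anywhere.
patients = [
    flags("gp"),               # recorded in the GP record only
    flags("hes"),              # recorded in hospital data only
    flags("death"),            # recorded on the death record only
    flags("cancer_registry"),  # recorded in the cancer registry only
    flags("gp", "hes"),        # recorded in both GP and hospital data
    flags(),                   # outcome not recorded anywhere
]

# Source combinations matching definitions (a) to (e) in the text.
DEFINITIONS = {
    "a": ("gp",),
    "b": ("gp", "death"),
    "c": ("gp", "hes"),
    "d": ("gp", "hes", "death"),
    "e": ("gp", "hes", "death", "cancer_registry"),
}

def count_cases(patients, sources):
    """Count patients with the outcome recorded in at least one source."""
    return sum(any(p[s] for s in sources) for p in patients)

counts = {label: count_cases(patients, sources)
          for label, sources in DEFINITIONS.items()}
```

Comparing `counts["a"]` with `counts["e"]` for the same outcome in each database would give one simple measure of how much the linked sources add to the GP record.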
We will also compare rates with external data sources where available. For example, we will compare mortality rates with published statistics from the Office for National Statistics.
OBJECTIVE 2
To validate the performance of multiple new and existing risk prediction algorithms for identifying patients at risk of different types of outcomes on each of the three databases (QResearch, CPRD Gold and CPRD Aurum). This includes the assessment of discrimination, calibration, decision curve analysis, sensitivity, specificity, and positive and negative predictive values at different thresholds, with and without accounting for the competing risk of death. It will also include cross validation by developing some models in each database and validating them in the other two.
Combined with objective 1, we will determine whether the inclusion of linked data materially affects the calibration or discrimination of the algorithms.
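The performance measures named under Objective 2 can be sketched for a simple binary outcome as follows. This is a simplified illustration, not the study's method: it ignores censoring and the competing risk of death, uses the ratio of observed to expected cases as a crude calibration summary, and computes net benefit (the quantity plotted in decision curve analysis) at a single threshold. The function names and toy data are assumptions.

```python
# Minimal sketch of validation metrics for predicted risks and binary outcomes.
# Assumes every cell of the 2x2 table is reachable (no zero denominators).

def confusion(risks, outcomes, threshold):
    """Return (tp, fp, tn, fn) when patients with risk >= threshold are flagged."""
    tp = fp = tn = fn = 0
    for risk, outcome in zip(risks, outcomes):
        flagged = risk >= threshold
        if flagged and outcome:
            tp += 1
        elif flagged:
            fp += 1
        elif outcome:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def threshold_metrics(risks, outcomes, threshold):
    """Sensitivity, specificity, PPV, NPV and net benefit at one threshold."""
    tp, fp, tn, fn = confusion(risks, outcomes, threshold)
    n = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Net benefit weights false positives by the odds of the threshold.
        "net_benefit": tp / n - (fp / n) * threshold / (1 - threshold),
    }

def c_statistic(risks, outcomes):
    """Discrimination: share of case/non-case pairs ranked correctly (ties count half)."""
    cases = [r for r, y in zip(risks, outcomes) if y]
    controls = [r for r, y in zip(risks, outcomes) if not y]
    score = sum(1.0 if c > k else 0.5 if c == k else 0.0
                for c in cases for k in controls)
    return score / (len(cases) * len(controls))

def calibration_ratio(risks, outcomes):
    """Calibration-in-the-large: observed cases divided by expected cases."""
    return sum(outcomes) / sum(risks)

# Toy data: predicted risks and whether the outcome occurred (1) or not (0).
risks = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
outcomes = [0, 0, 1, 0, 1, 1]
metrics = threshold_metrics(risks, outcomes, threshold=0.3)
```

Running the same functions on predictions from each database, with outcomes defined first from GP data alone and then from GP plus linked data, would show whether the linked data change calibration or discrimination.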
Chief Investigator
Professor Julia Hippisley-Cox
Lead Applicant Organisation Name
Sponsor
Oxford
Location of research
University of Oxford
Date on which research approved
22-May-2023
Project reference ID
OX330
Generic ethics approval reference
18/EM/0400
Are all data accessed in anonymised form?
Yes
Brief summary of the dataset to be released (including any sensitive data)
We will undertake cohort studies in a large population of primary care patients from an open cohort using data from all three databases: QResearch, CPRD Gold and CPRD Aurum. We will include all practices which have been using their current GP clinical computer system for at least a year. We will use the latest data available from each database at the time of the analysis. We will identify cohorts from each database which will include patients registered with practices on or after 01 Jan 1998 until the latest date for which data are available at the time of the study.
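The inclusion rules above could be expressed roughly as in the sketch below. It rests on several assumptions that the summary does not state: "at least a year" is approximated as 365 days before the data extract date, a patient's cohort entry is taken as the later of their registration date and 01 Jan 1998, and follow-up ends at deregistration or the extract date. All function and parameter names are hypothetical.

```python
# Minimal sketch of open-cohort eligibility under the assumptions stated above.
from datetime import date, timedelta

STUDY_START = date(1998, 1, 1)

def practice_eligible(system_start, extract_date):
    """Practice has used its current clinical system for at least a year (365 days)."""
    return extract_date - system_start >= timedelta(days=365)

def cohort_entry(registration_date, deregistration_date, extract_date):
    """Return the cohort entry date, or None if there is no follow-up in the window.

    deregistration_date is None for patients still registered at extract.
    """
    entry = max(registration_date, STUDY_START)
    exit_date = min(deregistration_date or extract_date, extract_date)
    return entry if entry <= exit_date else None

extract = date(2023, 1, 1)  # hypothetical latest date of available data
```

For example, a patient registered in 1990 and still registered at extract would enter on 01 Jan 1998, while a patient who left their practice in 1995 would contribute no follow-up.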
We will use the GP data linked to hospital episode statistics, cancer registry and mortality.
1. GP DATA
demographics including age, sex, ethnicity, deprivation, region
clinical diagnoses - major chronic diseases e.g. diabetes, cardiovascular disease, thromboembolism, cancer, fracture, haemorrhage
clinical values e.g. body mass index, smoking, alcohol
laboratory investigations e.g. full blood count, electrolytes, liver function tests, CA125
commonly prescribed medication
2. HES Data
HES data to identify outcomes of interest e.g. diabetes, cardiovascular disease, thromboembolism, cancer, fracture, haemorrhage
3. Mortality Data
mortality data to identify outcomes of interest on the death certificate e.g. diabetes, cardiovascular disease, thromboembolism, cancer, fracture, haemorrhage and cancer treatments.
4. Cancer Registry Data
Cancer registry data to identify characteristics of cancers e.g. type, location, stage, grade, route to diagnosis, treatments (e.g. chemotherapy, radiotherapy, hormonal, surgery)
Funding Source
John Fell Fund
Public Benefit Statement
Research Team
Julia Hippisley-Cox, University of Oxford
Carol AC Coupland, University of Oxford
Mona Bafadhel, King’s College London
Richard EK Russell, King’s College London
Aziz Sheikh, University of Edinburgh
Peter Brindle, University of Bristol
Keith M. Channon, University of Oxford
Approval Letter
Publications
- Development and validation of a new algorithm for improved cardiovascular risk prediction
Authors: Hippisley-Cox J, Coupland CAC, Bafadhel M, Russell REK, Sheikh A, Brindle P, Channon KM
Ref: https://www.nature.com/articles/s41591-024-02905-y
Press Releases
- New risk score for cardiovascular disease with improved performance
- New heart disease calculator could save lives by identifying high-risk patients missed by current tools
Access Type
Trusted Research Environment (TRE)