Datasets available through QResearch
QResearch is a member of the Health Data Research UK (HDRUK) Alliance. HDRUK supports the HDR Innovation Gateway, through which UK health datasets can be discovered. More information about the datasets available through QResearch can be found on the Gateway website.
Summary of Data Items in the QResearch-linked Database
The QResearch linked database has high-quality data to support world-leading research to improve our understanding of disease and improve patient care. This page summarises the data items which are available on the QResearch linked database. Some datasets have been provided for collaborative COVID-19 research and are governed by bespoke data sharing agreements. If you wish to access any of these datasets, please contact [email protected].
Summaries of current and recent projects using the database, and research papers using the database, are available. There is also information for practices wishing to contribute to QResearch and for researchers interested in accessing data.
The GP data is anonymised in the GP computer system and then provided by the GP computer systems provider (EMIS). No information about patients' names or addresses is included, nor are any free-text notes which GPs make about consultations. The data includes the following information.
- Demographics include a person's year of birth, sex, self-assigned ethnicity, geographical region, quintile of Townsend Deprivation Score, date of registration with the practice and date on which the patient left the practice.
- Prescriptions include name of medication, ingredient, dose, date of issue, number of tablets, duration of the course and estimated NHS cost.
- Diagnoses and problems include clinical codes for problems and diagnosis (such as heart disease or stroke) and the date. A patient may have many problems or diagnoses.
- Laboratory tests include things like blood tests and X-rays which the doctor might order to find out more about a person's illness or monitor their treatment. The information includes the type of test (e.g. blood count), the date and the result of the test.
- Clinical values include measurements the GP surgery might make such as weight, height, body mass index, blood pressure and peak flow meter reading, along with the relevant date.
- Symptoms include information which has been coded by the clinician such as shortness of breath, bleeding, headache, abdominal pain, chest pain.
- Consultations include information about the date on which a person was seen, what type of health care professional they saw and whether it was in the surgery, on the phone or at home.
- Appointments include information about the date of the appointment, how long it was for, which type of health care professional it was with (GP, nurse, physio, osteopath, pharmacist etc.) and whether the person attended or not.
- Referrals include information on the date of the referral, the urgency of the referral (routine, urgent) and the type of referral.
Cancer Registry Data linked to QResearch
Information is provided by Public Health England on diagnoses of cancer. This is a subset of the information on the National Cancer Registry (information about the full cancer registry dataset). The subset of information provided is then linked to the GP data. A summary is given below.
- Type and Site of Cancer - which part of the body (e.g. breast cancer or lung cancer)
- Year of birth.
- Ethnic group - this is one of the 17 groups defined at the 2001 census.
- Deprivation quintile - 5 groups to indicate information about the level of deprivation.
- Date of Diagnosis - this is the date when the diagnosis was made.
- Basis for diagnosis - how was the diagnosis made e.g. histology, clinical.
- Morphology - what the cancer looks like down a microscope, mapped to ICD-O-2 cancer morphology codes or a number (8000-9990).
- Grade of cancer - how aggressive the cancer appears down a microscope (graded 1-4 where 4 is high grade).
- Gleason grade - grade of cancer using the Gleason scoring system (relevant to prostate cancer).
- Stage of cancer - how far it has spread (graded 1-4 where 4 is late stage when cancer has spread far).
- Cancer behaviour - how the cancer is behaving. This can be one of the following: benign; uncertain; in-situ; malignant; micro invasive; malignant metastatic; malignant where primary is uncertain.
- Receptor status - Oestrogen or progesterone or HER2 receptor state (relevant to breast cancer).
- Route to Diagnosis - how the cancer was picked up. Options include on screening, by an urgent GP referral, emergency presentation at hospital, death certificate, during an inpatient episode, via outpatients, from a GP two week wait referral.
- Treatment - what treatment has been given e.g. radiotherapy, chemotherapy, surgery, hormones, other.
- Size of the tumour
- Nottingham Prognostic Index - indicates likely survival and is relevant to breast cancer
- Nodes excised - number of glands removed at operation
- Nodes involved - number of nodes involved
- Excision margin - whether the excision margin at operation was clear of tumour and if so by how much
- Date of death - (where applicable)
- Cause of death - coded using ICD-9 and ICD-10 codes from the mortality register.
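The Nottingham Prognostic Index listed above is conventionally calculated from three of the other items in this summary: tumour size, nodes involved and grade. The sketch below shows that standard published formula only as an illustration. It is not taken from the registry documentation, and it assumes that size is expressed in centimetres and that grade is the 1-3 histological grade used by the index (the registry grade above runs 1-4). The function names are hypothetical.

```python
def node_stage(nodes_involved: int) -> int:
    """Convert a count of involved nodes into the 1-3 nodal stage used by the NPI."""
    if nodes_involved < 0:
        raise ValueError("nodes_involved cannot be negative")
    if nodes_involved == 0:
        return 1
    if nodes_involved <= 3:
        return 2
    return 3


def nottingham_prognostic_index(size_cm: float, nodes_involved: int, grade: int) -> float:
    """Standard NPI: 0.2 x tumour size (cm) + nodal stage (1-3) + histological grade (1-3)."""
    if grade not in (1, 2, 3):
        raise ValueError("the NPI is defined for histological grades 1 to 3")
    if size_cm < 0:
        raise ValueError("size_cm cannot be negative")
    return 0.2 * size_cm + node_stage(nodes_involved) + grade
```

A higher index indicates a poorer expected prognosis. Whether the linked data supplies a precomputed index or only its components is not stated here, so treat this as a way of reading the field rather than of reproducing it.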
Civil Registration Data linked to QResearch
Information is provided by NHS Digital on deaths which have been recorded on the National Death Register. This information is then linked to the GP data. A summary is given below.
- Date of death
- Main cause of death as recorded on the death certificate
- Other causes of death - Up to 14 other causes of death which may be recorded on the death certificate using ICD-10 codes.
Hospital Episode Statistics linked to QResearch
Information is provided by NHS Digital about contacts (known as 'episodes') in which a person is looked after by a hospital. An episode may take place in the Accident and Emergency Department, in Outpatients, or as an admission to hospital because the person needs some tests, is unwell, needs to have an operation or is having a baby. This information includes the following key variables:
- Date the episode started
- Date the episode ended
- Type of the episode (was it A&E, Outpatients or Inpatients or Maternity)
- Type of admission (was it a planned admission or emergency)
- Diagnoses made during the episode (these are coded using a system known as ICD-10 which is the International Classification of Diseases)
- Operations undertaken during the episode (these are coded using a system known as OPCS)
- Destination on discharge (e.g. back home, to a nursing home or transfer to another hospital)
- Maternity includes information on the delivery, any interventions needed (e.g. forceps) and general health of the baby at birth.
- Critical care - for people who have been very unwell, they may need critical care (also known as 'intensive care' or ITU). Information includes the date on which they were admitted to critical care, any diagnoses and the date when they were discharged.
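As a rough illustration of how the key variables above fit together, the sketch below models a single episode as a Python record and derives its length in days. The class name, field names and category labels are assumptions made for the example. They are not the actual HES column names or coded values.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Episode:
    """One hospital episode, using hypothetical field names."""
    start_date: date
    end_date: date
    episode_type: str    # assumed labels: "A&E", "Outpatients", "Inpatients", "Maternity"
    admission_type: str  # assumed labels: "planned" or "emergency"
    diagnoses: list[str] = field(default_factory=list)   # ICD-10 codes
    operations: list[str] = field(default_factory=list)  # OPCS codes
    discharge_destination: str = ""                      # e.g. "home", "nursing home"

    def length_in_days(self) -> int:
        """Whole days between the start and the end of the episode."""
        return (self.end_date - self.start_date).days


def emergency_inpatient_episodes(episodes: list[Episode]) -> list[Episode]:
    """Keep only inpatient episodes that began as an emergency admission."""
    return [
        e for e in episodes
        if e.episode_type == "Inpatients" and e.admission_type == "emergency"
    ]
```

For example, `Episode(date(2020, 3, 1), date(2020, 3, 5), "Inpatients", "emergency", ["I21.9"])` describes a four-day emergency admission with one illustrative ICD-10 diagnosis code.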
Intensive Care National Audit & Research Centre (ICNARC) linked to QResearch
Information is provided by ICNARC from their Case Mix Programme (CMP). The CMP is a recognised national audit of patient outcomes from adult, general critical care units (intensive care and combined intensive care/high dependency units) covering England, Wales and Northern Ireland. Currently 100% of adult, general critical care units participate in the CMP. Other specialist units, including neurosciences, cardiac and high dependency units, also participate. This information from the CMP is linked to the GP data and is provided to support specific COVID-19 research projects as a collaboration between ICNARC and the University of Oxford.
This dataset contains data items including:
- Demographics include a person’s year of birth, sex, ethnicity, and residence prior to admission
- Diagnoses include past medical history, pregnancy, cancer, cirrhosis, hepatic encephalopathy, portal hypertension, congenital immunohumoral or cellular immune deficiency, HIV, respiratory disease, and cardiovascular disease
- Laboratory tests include blood lactate, serum creatinine, serum urea, haemoglobin, and platelets
- Clinical values include blood pressure, Glasgow Coma Score (GCS), oxygenation and pH, respiratory rate, and temperature
- Treatments and interventions include chemotherapy, radiotherapy, cardiopulmonary resuscitation (CPR), home ventilation, renal therapy, steroids, and sedation
- Outcomes include discharge from ICU and hospital, and death
- Dates and times include admissions to and discharges from ICU and hospital, and number of support days (e.g. respiratory support days)
- Derived variables include APACHE II score and ICNARC physiology score
Second Generation Surveillance System (SGSS) - COVID test data linked to QResearch
SGSS is the preferred method for capturing routine laboratory surveillance data on infectious diseases and antimicrobial resistance from laboratories across England. This information from the SGSS from January 2020 onwards is linked to the GP data. This dataset contains data items including:
- Demographics include a person’s age in years, sex, ethnicity description, and county
- Laboratory tests include COVID-19 result
- Dates and times include specimen and lab report
This virtual registry, derived from GP and HES linked records, will become available for use by QResearch projects during 2021.
This is a registry of fertile women aged 12-49 years old in the UK between 2016 and the latest date for which data are available at the time of the study. This is a subset of the information contained within the GP data. The subset of information provided is linked to the Hospital Episode Statistics (which provides outcomes of pregnancy in England) and Civil Registration Data (date of death, main and other causes of death). A summary is given below:
- Demographics include a woman's year of birth, self-assigned ethnicity, geographical region, quintile of Townsend Deprivation Score, date of registration with the practice and date on which the woman left the practice.
- Pregnancy includes dates and types of term pregnancy (delivered) and non-term pregnancy (most commonly by miscarriage/termination) as well as women currently pregnant.
- Mother-baby link includes follow up of children and their mothers. It uses a household identifier which allows siblings and fathers to be identified (fathers only when a single adult male is living in the house).
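The rule for identifying fathers through the household identifier can be sketched as below. This is a minimal illustration and not the QResearch implementation: the `Person` fields, the `household_id` name and the adult age threshold of 18 are all assumptions made for the example, since none of them are defined on this page.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    person_id: str     # hypothetical pseudonymised identifier
    household_id: str  # hypothetical household identifier
    sex: str           # "M" or "F"
    year_of_birth: int


ADULT_AGE = 18  # assumed threshold; the page does not define "adult"


def candidate_fathers(people: list[Person], index_year: int) -> dict[str, Optional[str]]:
    """Map each household to its sole adult male, or to None otherwise.

    A household with zero adult males, or with more than one, yields None
    because the father cannot be identified unambiguously.
    """
    adult_males: dict[str, list[str]] = defaultdict(list)
    households: set[str] = set()
    for p in people:
        households.add(p.household_id)
        if p.sex == "M" and index_year - p.year_of_birth >= ADULT_AGE:
            adult_males[p.household_id].append(p.person_id)
    return {
        h: adult_males[h][0] if len(adult_males[h]) == 1 else None
        for h in households
    }
```

Returning `None` for ambiguous households, rather than picking one of several adult males, mirrors the condition stated above that a father is identified only when one adult male lives in the house.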
COVID National Immunisation Database (NIMS)
The COVID national immunisation dataset is provided by NHS Digital for COVID-19 related research. The dataset includes the following items:
- Vaccine Date
- Vaccine Dose
- Vaccine Type
- Setting where administered
Additional Data Linkages Expected during 2022
Additional datasets will be linked during 2021 to support specific research projects and collaborations. More information will be provided once it is available.
1. Lung cancer screening data from the National Lung Cancer Screening programme
2. Maternity medical dataset (NHS Digital)
3. Births Notifications (NHS Digital, subject to approval)
QCode Group Library
Researchers using the QWeb software can choose to publish their code groups under a Creative Commons License.