Version: 1.0.0 | Published: 24 Mar 2026 | Updated: 22 days ago

CPRD Cardiovascular Disease Synthetic Dataset

Dataset

Summary

Description:
This synthetic dataset is based on anonymised real primary care patient data extracted from the CPRD Aurum database. It focuses on cardiovascular disease risk factors and was developed as a proof of concept.
Identifier:
10.11581/yk6n-b652
Contact Point:
Health Category:
Electronic Health Records (EHRs)
Number of Unique Individuals:
499344

Documentation

Documentation:
This wholly synthetic dataset is based on real anonymised primary care patient data extracted from the CPRD Aurum database and focuses on cardiovascular disease risk factors. To preserve patient privacy, researchers cannot access the real anonymised patient data extract that was used as the basis for generating the synthetic dataset. That ground truth extract was pre-processed before generation, so the synthetic dataset derived from it does not reflect the structure of the source CPRD Aurum database. The synthetic dataset was developed as part of a project funded by the Regulators’ Pioneer Fund, launched by the Department for Business, Energy and Industrial Strategy (BEIS) and managed by Innovate UK. The methodology used to generate and evaluate the dataset is outlined in Wang et al. 2019.

Coverage

Spatial

Spatial Coverage:
United Kingdom

Temporal

Start Date:
25 March 2020
Frequency:
OTHER
Date of Latest Release:
28 June 2020
Date of First Release:
08 October 2024
Temporal Aggregation:
Unknown

Provenance

Origin

Purpose:
Study
Collection Situation:
Other
Image Contrast:
Not stated

Access and Governance

Usage

Data Use Requirements:
  • Project specific restriction
  • Geographical restriction
  • User specific restriction
  • Time limit on use
  • Institution specific restriction

Access

Jurisdiction:
Great Britain
Data Controller:
Clinical Practice Research Datalink (CPRD)
Data Processor:
CPRD
Delivery Lead Time:
1-2 months
Legal Basis:
General research use, No linkage, Research-specific restrictions, Research use only
Health Data Access Body:
CPRD

Format and Standards

Language:
English
Format:
Tab delimited text
Coding System:
SNOMED CT

Data Distribution

Data Status:
Not available
Distribution:
Access to CPRD data, including UK Primary Care Data and linked data such as Hospital Episode Statistics, is subject to protocol approval via CPRD’s Research Data Governance (RDG) Process. Independent scientific and patient advice is provided by Expert Review Committees (ERCs) and the Central Advisory Committee (CAC). See https://www.cprd.com/research-applications and https://www.cprd.com/pricing

Observations

Name
Population Type
Value
Description
Variable Measured
Unit Code
Observation Date
Number of Records
Minimum Typical Age
Maximum Typical Age
Persons
499344
Patients in the dataset
COUNT
28 June 2020
499344
0
150