Version: 1.0.0 | Published: 24 Mar 2026 | Updated: 22 days ago
Summary
Description:
This synthetic dataset is based on anonymised real primary care patient data extracted from the CPRD Aurum database. It focuses on cardiovascular disease risk factors and was developed as a proof of concept.
Identifier:
10.11581/yk6n-b652
Contact Point:
Health Category:
Electronic Health Records (EHRs)
Number of Unique Individuals:
499344
Documentation
Associated Media:
Documentation:
This wholly synthetic dataset is based on real anonymised primary care patient data extracted from the CPRD Aurum database and focuses on cardiovascular disease risk factors.
To preserve patient privacy, researchers will not be able to access the real anonymised patient data extract that was used as the basis for generating the synthetic dataset.
The ground truth data extract was pre-processed, so the synthetic dataset, which is based on it, does not reflect the structure of the source CPRD Aurum database.
This synthetic dataset was developed as part of a project funded by the Regulators’ Pioneer Fund, launched by the Department for Business, Energy and Industrial Strategy (BEIS) and managed by Innovate UK.
The methodology used to generate and evaluate this synthetic dataset is outlined in Wang et al. 2019.
Coverage
Spatial
Spatial Coverage:
United Kingdom
Temporal
Start Date:
25 March 2020
Frequency:
OTHER
Date of Latest Release:
28 June 2020
Date of First Release:
08 October 2024
Temporal Aggregation:
Unknown
Provenance
Origin
Purpose:
Study
Collection Situation:
Other
Image Contrast:
Not stated
Access and Governance
Usage
Data Use Requirements:
- Project specific restriction
- Geographical restriction
- User specific restriction
- Time limit on use
- Institution specific restriction
Access
Access Rights:
Jurisdiction:
Great Britain
Data Controller:
Clinical Practice Research Datalink (CPRD)
Data Processor:
CPRD
Delivery Lead Time:
1-2 months
Legal Basis:
General research use, No linkage, Research-specific restrictions, Research use only
Health Data Access Body:
CPRD
Format and Standards
Language:
English
Format:
Tab delimited text
Coding System:
SNOMED CT
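Because the dataset is supplied as tab delimited text, a first sanity check after delivery is to count rows and distinct patients and compare the result with the published number of unique individuals. The following is a minimal sketch only: the column names `patid`, `age` and `sbp`, the function name `count_unique_patients`, and the sample values are hypothetical and are not taken from the dataset documentation.

```python
import csv
import io


def count_unique_patients(handle, id_column="patid"):
    """Return (row_count, distinct_id_count) for a tab-delimited file.

    `id_column` is a hypothetical column name; replace it with the
    identifier column named in the dataset's own documentation.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or id_column not in reader.fieldnames:
        raise KeyError(f"column {id_column!r} not found in header")
    seen = set()
    rows = 0
    for row in reader:
        rows += 1
        seen.add(row[id_column])
    return rows, len(seen)


# Tiny in-memory stand-in for a real file; all values are made up.
sample = io.StringIO("patid\tage\tsbp\n1\t54\t132\n2\t61\t145\n2\t61\t150\n")
rows, patients = count_unique_patients(sample)
print(rows, patients)
```

For a real file, the handle would typically come from `open(path, newline="", encoding="utf-8")`. The rows are streamed one at a time, so memory use is driven by the set of distinct identifiers rather than by the size of the file.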
Data Distribution
Data Status:
Not available
Distribution:
Access to CPRD data, including UK Primary Care Data, and linked data such as
Hospital Episode Statistics, is subject to protocol approval via CPRD’s Research
Data Governance (RDG) Process. Independent scientific and patient advice is
provided by Expert Review Committees (ERCs) and the Central Advisory Committee
(CAC): https://www.cprd.com/research-applications, https://www.cprd.com/pricing
Observations
Name
Population Type
Value
Description
Variable Measured
Unit Code
Observation Date
Number of Records
Minimum Typical Age
Maximum Typical Age
Persons
499344
Patients in the dataset
COUNT
28 June 2020
499344
0
150
Origin
Name:
Data Catalogue