Version: 1.0.0 | Published: 24 Mar 2026 | Updated: 35 days ago
Summary
Description:
A synthetic dataset featuring patient-level information for >24,000 acute admissions with atrial fibrillation, including demographics, co-morbidities, symptoms, investigations, medications and outcomes, derived from real patient records.
Access Tier:
Controlled
Contact Point:
Health Category:
- Electronic Health Records (EHRs)
- Data on factors impacting on health, including socio-economic, environmental & behavioural determinants of health
Number of Unique Individuals:
24800
Documentation
Documentation:
Atrial fibrillation (AF) is a common abnormal heart rhythm that causes the heart to beat irregularly and often too fast. AF increases the risk of stroke and heart failure. AF primarily affects older adults and individuals with chronic conditions such as heart disease, high blood pressure, or obesity. Additional risk factors include congenital heart disease and cardiomyopathy. AF can be treated by ablation or controlled using medication. The risk of stroke can be reduced using anticoagulants.
This synthetic AF dataset comprises 24.8k “patients”, including demographics, co-morbidities, presenting symptoms and medical events during hospital stays, coded with ICD-10 and SNOMED-CT.
Using the Synthetic Data Vault package with a GAN synthesizer, a synthetic dataset was generated from real clinical data. The dataset includes demographic information and hospital admission details. The real data was pre-processed for correct datetime parsing and metadata was defined to capture schema structure, guiding the synthesizer in learning data distributions and relationships. The resulting synthetic dataset closely mirrors the statistical properties of the original, supporting privacy-preserving analysis and model training.
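The pre-processing and metadata steps described above can be illustrated with a minimal sketch. It uses only the Python standard library and does not call the Synthetic Data Vault package; the column names, datetime layouts and the `SCHEMA` mapping are assumptions made for illustration, since the real schema is not given in this entry.

```python
from datetime import datetime

# Hypothetical raw rows; column names and values are illustrative only.
RAW_ROWS = [
    {"patient_id": "P001", "age": "71", "admission_datetime": "19/11/2017 08:15"},
    {"patient_id": "P002", "age": "64", "admission_datetime": "2019-03-02T22:40:00"},
]

# Assumed datetime layouts, tried in order.
DATETIME_FORMATS = ("%d/%m/%Y %H:%M", "%Y-%m-%dT%H:%M:%S")

# Illustrative schema metadata: one semantic type per column.
SCHEMA = {
    "patient_id": "id",
    "age": "numerical",
    "admission_datetime": "datetime",
}

# Python type expected for each semantic type after pre-processing.
PY_TYPES = {"id": str, "numerical": int, "datetime": datetime}


def parse_datetime(text):
    """Return a datetime for the first layout that matches, else raise ValueError."""
    for fmt in DATETIME_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised datetime: {text!r}")


def preprocess(rows):
    """Convert string fields to typed values before any model fitting."""
    return [
        {
            "patient_id": row["patient_id"],
            "age": int(row["age"]),
            "admission_datetime": parse_datetime(row["admission_datetime"]),
        }
        for row in rows
    ]


def conforms(rows, schema):
    """Check that every row has exactly the schema's columns with the expected types."""
    return all(
        set(row) == set(schema)
        and all(isinstance(row[col], PY_TYPES[kind]) for col, kind in schema.items())
        for row in rows
    )


clean_rows = preprocess(RAW_ROWS)
print(conforms(clean_rows, SCHEMA))
```

Parsing datetimes up front means the column reaches the synthesizer as a true datetime rather than free text, and an explicit schema records which columns are identifiers, numbers or timestamps.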
Geography: The West Midlands has a population of 6 million & includes a diverse ethnic & socio-economic mix. University Hospitals Birmingham NHS Foundation Trust (UHB) is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & an ITU capacity of > 120 beds. UHB runs a fully electronic healthcare record (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.
Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.
Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models.
Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.
Coverage
Spatial
Spatial Coverage:
Temporal
Start Date:
19 November 2017
End Date:
16 January 2021
Frequency:
QUARTERLY
Date of Latest Release:
02 December 2024
Date of First Release:
02 December 2024
Temporal Aggregation:
1 - 10 Years
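Because the records are machine generated, users may want to confirm that the dates they receive fall inside the stated coverage window. The sketch below is one way to do that with the Python standard library; the start and end dates are those listed above, while the sample admission dates and function names are hypothetical.

```python
from datetime import date

# Coverage window as stated in this entry.
COVERAGE_START = date(2017, 11, 19)
COVERAGE_END = date(2021, 1, 16)


def in_coverage(day):
    """True if the date lies within the stated window (both ends inclusive)."""
    return COVERAGE_START <= day <= COVERAGE_END


def out_of_window(admission_dates):
    """Return the dates that fall outside the stated coverage window."""
    return [day for day in admission_dates if not in_coverage(day)]


# Hypothetical admission dates, for illustration only.
sample_dates = [date(2017, 11, 19), date(2020, 6, 1), date(2021, 1, 17)]
print(out_of_window(sample_dates))
```

Treating both boundary dates as inclusive is an assumption; the entry does not state whether the end date is inclusive.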
Provenance
Origin
Purpose:
Care
Collection Situation:
Secondary care - Accident and Emergency
Image Contrast:
Not stated
Method of Collection:
Machine generated
Access and Governance
Usage
Data Use Requirements:
Project specific restriction
Access
Jurisdiction:
England
Data Controller:
University Hospitals Birmingham NHS Foundation Trust
Data Processor:
NOT APPLICABLE
Delivery Lead Time:
1-2 months
Legal Basis:
General research use
Health Data Access Body:
This publication uses data from PIONEER, an ethically approved database and
analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
Format and Standards
Language:
English
Format:
SQL
Conforms To:
LOCAL
Coding System:
- SNOMED CT
- ICD10
- OPCS4
Data Distribution
Data Status:
Available
Distribution:
Trusted Research Environments (TREs) are built using Microsoft Azure services and
hosted in the UK to provide research teams with a safe, secure and agile environment
which allows users to quickly analyse, interpret and form an enriched view of
primary care information through a range of integrated datasets. Health data
collated from multiple sources is ingested into a secure data lake, which
then allows subsets of data to be made available to research teams on approval of
a data request. Once a request is approved, a customer-specific TRE is made available
with a standard set of leading analytical tools from Microsoft including Azure
Databricks, Azure Machine Learning, Azure SQL and Azure Synapse (for large-scale
data warehouses). Specific tools can be provided at an additional cost over the
standard platform data access charge, and the PIONEER team will work with you to
determine your exact needs. Access to the TRE is managed using the latest
virtual desktop technology to provide a safe and secure end-user experience. By
using leading-edge design, PIONEER is able to create TREs rapidly to
service any customer requirement.
www.pioneerdatahub.co.uk/data/data-services-costs/
Observations
Name
Population Type
Value
Description
Variable Measured
Unit Code
Observation Date
Number of Records
Minimum Typical Age
Maximum Typical Age
Persons
24800
24800 spells for patients with AF between 19/11/2017 and 16/01/2021
Count
26 September 2022
24800
0
120
Origin
Name:
Data Catalogue