Version: 1.0.0 | Published: 24 Mar 2026 | Updated: 16 days ago
Summary
Description:
The dataset contains anonymised patient and public conversations that took place online about more than 50 cancer types, including both the most commonly experienced cancers and rarer types.
Contact Point:
Health Theme:
Cancer
Health Category:
- Electronic Health Records (EHRs)
- Data on factors impacting on health, including socio-economic, environmental & behavioural determinants of health
- Data from clinical trials, clinical studies & clinical investigations
Number of Unique Individuals:
118984
Documentation
Documentation:
The dataset contains anonymised patient and public conversations that took place online about more than 50 cancer types, including both the most commonly experienced cancers and rarer types.
The dataset is curated from specific cancer types and cancer patient forums. It does not include every social post about cancer from the online sources, because many such posts are irrelevant to the patient experience.
Coverage
Spatial
Spatial Coverage:
United Kingdom
Temporal
Start Date:
01 March 2023
Frequency:
IRREGULAR
Date of First Release:
07 April 2025
Temporal Aggregation:
Other
Provenance
Origin
Purpose:
Research cohort
Collection Situation:
Other
Image Contrast:
Not stated
Method of Collection:
Free text NLP
Access and Governance
Usage
Data Use Requirements:
Project specific restriction
Access
Access Rights:
In Progress
Jurisdiction:
United Kingdom of Great Britain and Northern Ireland
Data Controller:
White Swan
Data Processor:
White Swan
Delivery Lead Time:
1-2 months
Legal Basis:
Project-specific restrictions
Health Data Access Body:
White Swan is a registered charity in England and Wales (1176486) improving health and wellbeing through AI technology and analytics.
Format and Standards
Language:
English
Format:
- csv
- xlsx
- web page explorer
Conforms To:
- OTHER
- LOCAL
Coding System:
- LOCAL
- OTHER
- HPO
Data Distribution
Data Status:
Available
Distribution:
On Request, On Request
Observations
Name:
Population Type:
Persons
Value:
118984
Description:
Persons in this dataset are counted as the number of unique display names chosen by users. The count is calculated per source (Reddit, reviews, other forums) and the per-source counts are then totalled. On other forums and review domains, persons may choose to identify themselves as anonymous. In that case, anonymous users are counted once per domain, for example once for 'https://healthunlocked.com/lungcancer'.
Variable Measured:
Unique online names indicating number of persons
Unit Code:
Observation Date:
16 April 2025
Number of Records:
118984
Minimum Typical Age:
0
Maximum Typical Age:
112
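The counting rule in the Observations description (unique display names per source, anonymous users counted once per domain, per-source counts summed) can be illustrated with a minimal Python sketch. This is not the data controller's actual implementation. The row layout `(source, domain, display_name)`, the function name `count_persons`, the case-insensitive match on the literal label "anonymous", and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical label used to detect anonymous posters (an assumption).
ANONYMOUS_LABEL = "anonymous"


def count_persons(posts):
    """Count persons from (source, domain, display_name) rows.

    Returns (per_source_counts, total). Names are de-duplicated within
    each source; anonymous posters are counted once per domain.
    """
    keys_by_source = defaultdict(set)
    for source, domain, display_name in posts:
        name = display_name.strip()
        if name.lower() == ANONYMOUS_LABEL:
            # All anonymous posts on one domain collapse into one person.
            key = ("anonymous", domain)
        else:
            key = ("named", name)
        keys_by_source[source].add(key)

    per_source = {source: len(keys) for source, keys in keys_by_source.items()}
    return per_source, sum(per_source.values())


# Illustrative rows only; not taken from the dataset.
sample_posts = [
    ("reddit", "reddit.com", "user_a"),
    ("reddit", "reddit.com", "user_a"),  # repeat name, same source
    ("reddit", "reddit.com", "user_b"),
    ("other forums", "https://healthunlocked.com/lungcancer", "Anonymous"),
    ("other forums", "https://healthunlocked.com/lungcancer", "anonymous"),
    ("other forums", "https://example.org/forum", "Anonymous"),
    ("reviews", "https://example.org/reviews", "user_a"),
]

per_source, total = count_persons(sample_posts)
print(per_source, total)
```

Because the count is taken per source and then summed, the same display name appearing in two sources (here `user_a` on Reddit and in reviews) contributes twice to the total, so the figure is an upper bound on distinct individuals rather than an exact count.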
Origin
Name:
Data Catalogue