Version: 1.0.0 | Published: 24 Mar 2026

White Swan UK Oncology Online Patient & Public Conversations Dataset

Dataset

Summary

Description:
The dataset contains anonymised patient and public conversations that have taken place online about more than 50 cancer types, including both the most commonly experienced cancers and rarer types.
Health Theme:
Cancer
Health Category:
  • Electronic Health Records (EHRs)
  • Data on factors impacting on health, including socio-economic, environmental & behavioural determinants of health
  • Data from clinical trials, clinical studies & clinical investigations
Number of Unique Individuals:
118984

Documentation

Documentation:
The dataset contains anonymised patient and public conversations that have taken place online about more than 50 cancer types, including both the most commonly experienced cancers and rarer types. Curation is based on specific cancer types and cancer patient forums. It does not include every social media post about cancer in the online sources, because many such posts are irrelevant to the patient experience.

Coverage

Spatial

Spatial Coverage:
United Kingdom

Temporal

Start Date:
01 March 2023
Frequency:
IRREGULAR
Date of First Release:
07 April 2025
Temporal Aggregation:
Other

Provenance

Origin

Purpose:
Research cohort
Collection Situation:
Other
Image Contrast:
Not stated
Method of Collection:
Free text NLP

Access and Governance

Usage

Data Use Requirements:
Project specific restriction

Access

Access Rights:
In Progress
Jurisdiction:
United Kingdom of Great Britain and Northern Ireland
Data Controller:
White Swan
Data Processor:
White Swan
Delivery Lead Time:
1-2 months
Legal Basis:
Project-specific restrictions
Health Data Access Body:
White Swan is a registered charity in England and Wales (1176486) improving health and wellbeing through AI technology and analytics.

Format and Standards

Language:
English
Format:
  • csv
  • xlsx
  • web page explorer
Conforms To:
  • OTHER
  • LOCAL
Coding System:
  • LOCAL
  • OTHER
  • HPO

Data Distribution

Data Status:
Available
Distribution:
On Request

Observations

Population Type:
Persons
Value:
118984
Description:
The number of persons in this dataset is determined by counting the unique display names in the data. This count is calculated separately for each source (Reddit, reviews, other forums), and the per-source counts are then summed. On other forums and review sites, users may choose to appear as anonymous; in this case, all anonymous users on a domain are counted as one person (for example, on 'https://healthunlocked.com/lungcancer').
Variable Measured:
Unique online names indicating number of persons
Observation Date:
16 April 2025
Number of Records:
118984
Minimum Typical Age:
0
Maximum Typical Age:
112
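The person-counting rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not White Swan's actual pipeline: the `Post` record, the `is_anonymous` helper and the sample posts are hypothetical, anonymous posters are assumed to be marked by the literal label "anonymous", and display names are assumed to be deduplicated across all domains within a source (the description says the count is made per source but does not say how names on different domains of the same source are treated).

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class Post(NamedTuple):
    """One post record (hypothetical schema, not the dataset's actual columns)."""
    source: str        # e.g. "reddit", "reviews", "other forums"
    domain: str        # site or sub-forum the post came from
    display_name: str  # name chosen by the poster


def is_anonymous(name: str) -> bool:
    # Assumption: anonymous posters are identified by the literal label "anonymous".
    return name.strip().lower() == "anonymous"


def count_persons(posts: Iterable[Post]) -> tuple[dict[str, int], int]:
    """Count persons per source, then total the per-source counts."""
    names: dict[str, set[str]] = defaultdict(set)         # named users per source
    anon_domains: dict[str, set[str]] = defaultdict(set)  # domains with anonymous posts
    for post in posts:
        if is_anonymous(post.display_name):
            # All anonymous users on one domain collapse to a single person.
            anon_domains[post.source].add(post.domain)
        else:
            # Assumption: a display name is counted once per source, across its domains.
            names[post.source].add(post.display_name)
    per_source = {
        source: len(names.get(source, set())) + len(anon_domains.get(source, set()))
        for source in names.keys() | anon_domains.keys()
    }
    # A name that appears on two different sources is counted once per source.
    return per_source, sum(per_source.values())


# Hypothetical sample data for illustration only.
posts = [
    Post("reddit", "reddit.com/r/cancer", "user_a"),
    Post("reddit", "reddit.com/r/cancer", "user_a"),        # repeat name, counted once
    Post("reddit", "reddit.com/r/lungcancer", "user_b"),
    Post("other forums", "healthunlocked.com/lungcancer", "Anonymous"),
    Post("other forums", "healthunlocked.com/lungcancer", "anonymous"),
    Post("other forums", "forum.example/bowel", "Anonymous"),  # second domain, +1
    Post("other forums", "healthunlocked.com/lungcancer", "user_a"),
]

per_source, total = count_persons(posts)
print(per_source, total)
```

In this sample, Reddit contributes two named users and "other forums" contributes one named user plus one anonymous person for each of its two domains, giving a total of five. Because counts are made per source and then summed, someone who uses the same display name on Reddit and on a forum is counted twice.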