What data is involved from General Practice?
From General Practice, the TVS SDE receives only structured (coded) data from the GP record. This is the data that would normally appear in a shared care record, such as diagnoses, medications, test results, observations, referrals, and coded clinical events. Free‑text GP notes and unstructured narrative entries are explicitly excluded and are not approved for transfer.
The data initially contains identifiers so it can be linked to records from other care settings (e.g. hospital data), but identifiers are removed before researchers access the data.
How does the data flow?
GP data flows from the GP system into the Shared Care Record, then securely into the NHS‑run Secure Data Environment, where it is linked, de‑identified and tightly controlled before any approved research use.
Please can you provide clarity on when data is pseudonymised and when it is anonymised in the TVS SDE?
Pseudonymisation happens early in the SDE process, after data has been received and linked, but before it is made available to users.
The data flow begins with the receipt, linkage, and validation of data, during which identifiers such as NHS numbers and MRNs are used temporarily to connect records accurately and to verify opt‑outs.
This is followed by processing within a controlled environment, where identifiable information is needed to link records and produce reliable, high‑quality datasets.
At this stage:
- Direct identifiers exist only within the restricted processing environment
- Access is limited to a small, authorised technical team
- Identifiers are replaced or separated, but re‑identification would still be possible if additional information were available. Pseudonymised data is still personal data under UK GDPR, because re‑identification is theoretically possible.
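As an illustration only, the linkage, opt‑out check and identifier replacement described above can be sketched in Python. This is a minimal sketch and not the TVS SDE's actual implementation: the keyed‑hash (HMAC‑SHA256) technique, the field names, the function names and the example key are all assumptions made for the example. It shows why pseudonymised data remains personal data: anyone holding the separately stored key could recompute the pseudonym for a known NHS number.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; a real key would be held separately
# from the data and never be visible to researchers.
SECRET_KEY = b"example-key-held-outside-the-research-environment"


def pseudonymise(nhs_number: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256."""
    return hmac.new(key, nhs_number.encode("utf-8"), hashlib.sha256).hexdigest()


def link_and_pseudonymise(gp_rows, hospital_rows, opted_out, key=SECRET_KEY):
    """Link GP and hospital rows on NHS number, skip opt-outs,
    and replace the NHS number with a pseudonym."""
    # Index hospital rows by NHS number so each GP row can be matched.
    hospital_by_nhs = {}
    for row in hospital_rows:
        hospital_by_nhs.setdefault(row["nhs_number"], []).append(row)

    linked = []
    for gp in gp_rows:
        nhs = gp["nhs_number"]
        if nhs in opted_out:
            continue  # honour the opt-out before anything else happens
        linked.append({
            "pseudo_id": pseudonymise(nhs, key),
            "gp_diagnosis": gp["diagnosis"],
            "admissions": [h["admission_date"] for h in hospital_by_nhs.get(nhs, [])],
        })
    return linked
```

The same NHS number always yields the same pseudonym under a given key, which is what allows records from different care settings to stay linked after the identifier itself has been removed.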
Anonymisation happens before data is released to users.
In the TVS SDE, this takes place when data extracts are generated according to the data specification requested for the approved project, before the data is transferred to the analysis environment where researchers can access it.
At this stage:
- All direct identifiers are removed
- Data is purpose‑minimised
- Users cannot access raw data or identifiers
- Outputs are further controlled via an airlock process, with only aggregate results allowed to leave the environment
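The extract and output controls listed above can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the TVS SDE's actual process: the list of direct identifiers, the function names and the small‑number suppression threshold are hypothetical, and the source states only that aggregate results may leave the environment.

```python
from collections import Counter

# Hypothetical list of direct identifiers, for illustration only.
DIRECT_IDENTIFIERS = {"nhs_number", "mrn", "name", "date_of_birth", "postcode"}


def build_extract(records, requested_fields):
    """Keep only the fields in the approved data specification,
    and never include a direct identifier even if it was requested."""
    allowed = [f for f in requested_fields if f not in DIRECT_IDENTIFIERS]
    return [{f: r.get(f) for f in allowed} for r in records]


def aggregate_for_release(extract, field, min_count=10):
    """Count rows per value of `field`; counts below `min_count` are
    replaced with None (suppressed). The threshold is an assumption."""
    counts = Counter(row[field] for row in extract)
    return {value: (n if n >= min_count else None) for value, n in counts.items()}
```

In this sketch, `build_extract` stands in for purpose minimisation and `aggregate_for_release` stands in for the kind of check an airlock review might apply before results leave the environment.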
How can I be reassured that there are no re-identification risks for my patients’ data when it is pseudonymised or anonymised? And who/what process checks this?
The TVS SDE has been set up in line with the Five Safes Framework. One core element of the Five Safes Framework is the implementation of safeguards against the risk of re-identification.
The TVS SDE programme has also been following the Information Commissioner’s Office (ICO) guidance on pseudonymisation and anonymisation.
The datasets that flow into the researchers' workspace will be de-identified and stripped of identifiers. The key to the pseudonyms is kept separately and cannot be accessed by researchers. To ensure this, technical and organisational measures, including strict access controls, are in place. This should ensure that there is no legal way to link the data with identifiers that would allow patients to be re-identified.
Furthermore, researchers are required to sign an Acceptable Use Policy when they request access to the database. This policy sets standards that all staff and users (e.g. analysts and researchers) must follow and restricts how they can use the data. They are strictly prohibited from attempting to re-identify anyone through the access they receive, and a breach can lead to severe consequences for them. Their access to data can be audited, and if any misconduct is detected, access privileges will be revoked.
What safeguards are in place to protect the data from cyber-attacks?
Strong safeguards are in place to protect the data from cyber‑attacks. The TVS SDE is operated within Oxford University Hospitals’ NHS infrastructure, meets national NHS security standards, and complies with the Data Security and Protection Toolkit (DSPT). The environment is undergoing ISO 27001 accreditation and DHSC accreditation, uses industry‑standard secure data transfer mechanisms, and has been subject to independent penetration testing to check that the design is secure.
Access is tightly restricted, monitored, and controlled, and only a small authorised NHS technical team can access identifiable data, with additional technical and contractual controls in place to prevent misuse or re‑identification.
What should we expect if there is a breach?
If there were a data breach, it would be handled under standard NHS and UK GDPR incident‑management processes.
Oxford University Hospitals, as host of the Secure Data Environment, would lead the investigation, containment and mitigation of the incident, working with all participating organisations as joint controllers.
Any breach would be formally assessed and reported where required (including to the Information Commissioner’s Office), and affected organisations and patients would be informed if there were a risk to individuals.
Clear accountability, agreed indemnity arrangements, and established NHS security and governance frameworks are in place to ensure breaches are managed transparently, lawfully and consistently, with lessons learned to prevent recurrence.