Workshop Recap - Anonymization and Pseudonymization of Research Data (Latvia 2026)

A two-day online training on statistical disclosure control delivered for Latvia's Higher Education and Science IT Shared Service Centre (VPC).

On April 28 and 29, 2026, we ran a two-day online training called “Anonymization and Pseudonymization of Research Data” for the Latvian Higher Education and Science IT Shared Service Centre (VPC). It was aimed at people who work with sensitive research data across Latvia’s education and science system, and the goal was simple: give them methods they can actually use.

We started with the basics that everything else rests on: the legal grounds for processing personal data, the difference between anonymization and pseudonymization, and a practical overview of statistical disclosure control.

From there we moved into the core techniques: recoding, local suppression, microaggregation and noise addition. Participants tried them out directly in the R package sdcMicro, which our team develops, working on real datasets like the Coleman and EU-SILC examples. Applying the methods yourself is the best way to feel the trade-off between protecting privacy and keeping the data useful.
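To give a flavour of two of these techniques, here is a minimal sketch in plain Python. It is not the sdcMicro API and not the workshop material: the function names `recode_age` and `microaggregate`, the band width, the group size `k` and the toy values are all assumptions made for illustration. Recoding replaces exact values with coarser categories, and univariate microaggregation sorts a numeric variable, forms groups of at least `k` records and replaces each value with its group mean.

```python
from statistics import mean


def recode_age(age, width=10):
    """Global recoding: replace an exact age with a band such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def microaggregate(values, k=3):
    """Univariate microaggregation (simplified sketch).

    Sort the values, cut them into consecutive groups of k, and replace
    each value with the mean of its group. A tail shorter than k is merged
    into the last group, so every group has at least k members whenever
    there are at least k values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    n = len(order)
    start = 0
    while start < n:
        end = start + k
        if n - end < k:
            end = n  # fold the short tail into this group
        group = order[start:end]
        group_mean = mean(values[i] for i in group)
        for i in group:
            result[i] = group_mean
        start = end
    return result


# Toy values, invented for this example.
ages = [23, 27, 34, 38, 41, 45, 59]
incomes = [1200.0, 1350.0, 1800.0, 2100.0, 2250.0, 2600.0, 4100.0]

age_bands = [recode_age(a) for a in ages]
protected_incomes = microaggregate(incomes, k=3)
```

Larger bands and larger groups protect more but blur the data more, which is the trade-off described above. Microaggregation keeps the overall mean of the variable unchanged, while the extreme value of 4100 is pulled towards its neighbours.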

The second day went deeper into the two questions that decide whether a dataset can be shared: how much re-identification risk is left in it, and how much utility it still has. We looked at attacker scenarios, risk metrics and utility measures, so participants could judge for themselves when an “anonymized” dataset is genuinely safe to share. We also spent time on synthetic data, from classical statistical synthesis to newer deep-learning and LLM-based approaches, along with the more practical side of documentation and responsible data sharing.
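As a rough illustration of how the two questions can be put into numbers, the sketch below computes one simple risk indicator and one simple utility measure in plain Python. It assumes an attacker who knows a fixed set of quasi-identifiers, and it uses invented toy records. The names `class_sizes`, `share_below_k` and `mean_abs_change` are hypothetical, and the measures are deliberately simpler than those used in practice.

```python
from collections import Counter
from statistics import mean


def class_sizes(rows, keys):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(row[c] for c in keys) for row in rows)


def share_below_k(rows, keys, k):
    """Risk indicator: share of records whose key combination occurs fewer than k times."""
    sizes = class_sizes(rows, keys)
    at_risk = sum(1 for row in rows if sizes[tuple(row[c] for c in keys)] < k)
    return at_risk / len(rows)


def mean_abs_change(original, protected):
    """Utility measure: average absolute difference between original and protected values."""
    return mean(abs(o - p) for o, p in zip(original, protected))


# Toy records, invented for this example.
rows = [
    {"age_band": "20-29", "region": "Riga"},
    {"age_band": "20-29", "region": "Riga"},
    {"age_band": "30-39", "region": "Riga"},
    {"age_band": "30-39", "region": "Kurzeme"},
    {"age_band": "30-39", "region": "Kurzeme"},
    {"age_band": "40-49", "region": "Latgale"},
]

risk = share_below_k(rows, ["age_band", "region"], k=2)
loss = mean_abs_change([1200.0, 1350.0, 1800.0], [1450.0, 1450.0, 1450.0])
```

Here two of the six records are unique on age band and region, so a third of the file violates 2-anonymity. A lower share suggests less re-identification risk under this scenario, and a smaller average change suggests the protected variable stays closer to the original.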

This is exactly the kind of work our SNF Bridge project is about: turning anonymization research into skills people can put to use. A big thank you to Daina Kosīte, Mikus Melderis and Karīna Znotiņa at VPC, who organised the workshop and made it all run smoothly.