Legal framework
The GDPR does not explicitly define “anonymization” in its operative text. The key provision is Recital 26, which states that “the principles of data protection should not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable”. In other words, true anonymization takes the data outside the scope of the GDPR, while anything “in between” remains personal data.
Pseudonymization is defined in Art. 4(5) GDPR as “the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person”. Pseudonymized data remain personal data and are fully subject to the Regulation.
EDPB Guidelines 04/2025 on anonymization are the Board’s attempt to provide a new standard after Opinion 05/2014 of the Article 29 Working Party. They set stricter technical and legal requirements and must be read together with national supervisory authority practice, including the Bulgarian CPDP.
For help applying these guidelines, see our resources at gdprbg.com, where the Innovires Legal GDPR team publishes guides, templates and case notes.
Anonymization vs. pseudonymization — key differences
The differences between the two concepts have direct legal consequences. The table below summarizes the key points:
| Criterion | Anonymization | Pseudonymization |
|---|---|---|
| GDPR scope | OUTSIDE scope (if truly anonymous) | Within scope — still personal data |
| Reversibility | Irreversible | Reversible (with key) |
| Legal basis | Not required | Still required — pseudonymization is not a legal basis |
| DPIA | Not required | May be required |
| Data subject rights | Do not apply | Apply |
| Breach notification | Not required | Required (but lower risk) |
| Third-country transfers | Free | Under GDPR rules |
The practical significance: if you classify a dataset as “anonymous” when it is in fact only pseudonymized, the entire legal basis for processing may be wrongly determined. This is exactly why the EDPB proposes strict identifiability tests.
The three EDPB tests for true anonymization
The three cumulative identifiability tests originate from Opinion 05/2014 of the Article 29 Working Party and are confirmed and expanded in the new Guidelines:
- Single-out — can an individual data subject be isolated from the dataset, even without knowing their name? A unique combination of attributes is already a problem.
- Linkability — can records relating to the same subject across different datasets (or within one dataset) be linked together?
- Inference — can attributes of the subject be inferred with high probability based on other values in the dataset?
True anonymization requires a NEGATIVE answer to ALL THREE tests. If even one of them is positive, the dataset still contains personal data and should be treated, at best, as pseudonymized.
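The single-out test can be automated to a first approximation: any record whose combination of quasi-identifiers is unique in the dataset can be isolated even without a name. The sketch below is a minimal illustration, assuming a list-of-dicts dataset and invented field names; a real audit would also probe linkability against external sources and inference.

```python
from collections import Counter

def singled_out(records, quasi_identifiers):
    """Return records whose quasi-identifier combination is unique
    in the dataset -- a positive 'single-out' result."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [r for r, key in zip(records, keys) if counts[key] == 1]

rows = [
    {"city": "Sofia",   "birth_year": 1984, "gender": "F"},
    {"city": "Sofia",   "birth_year": 1984, "gender": "F"},
    {"city": "Plovdiv", "birth_year": 1990, "gender": "M"},
]
unique = singled_out(rows, ["city", "birth_year", "gender"])
# The Plovdiv record is isolated by its attribute combination alone,
# even though no name appears anywhere in the data.
```

A non-empty result means the single-out test is positive and the dataset cannot be classified as anonymous without further transformation.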
For practical testing and anonymization audits, our team at gdprbg.com offers a structured methodology including residual-risk documentation.
Anonymization techniques
There is no universal technique. The choice depends on the purpose of the processing, the data type and the acceptable trade-off between utility and protection. The main techniques:
| Technique | Description | Weakness |
|---|---|---|
| Generalization | Replacement with a broader category (e.g. “Sofia” → “Bulgaria”) | Loss of utility |
| Suppression | Removing values or entire records | Incomplete dataset |
| k-anonymity | Each record indistinguishable from k-1 others | Does not protect from inference |
| l-diversity | Diversity of sensitive attributes within a group | Does not cover skewness |
| t-closeness | Group distribution close to the overall distribution | Complex implementation |
| Differential privacy | Mathematical privacy guarantee via added noise | Hard to apply generally |
| Synthetic data | New data with statistics similar to the original | Can still “memorize” rare records |
In practice, several techniques are almost always combined, e.g. generalization + k-anonymity + l-diversity. A documentation-first approach (documenting each transformation) is mandatory for demonstrating compliance to the supervisory authority.
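The combination of generalization with a k-anonymity check can be sketched as follows. This is a deliberately simplified illustration with invented city and birth-year fields; production tooling would use tested libraries and cover l-diversity and t-closeness as well.

```python
from collections import Counter

def generalize(record):
    """Illustrative generalization: coarsen city to country and
    birth year to a decade band (e.g. "Sofia" -> "Bulgaria")."""
    return {
        "region": "Bulgaria" if record["city"] in {"Sofia", "Plovdiv", "Varna"} else "Other",
        "age_band": (record["birth_year"] // 10) * 10,
    }

def k_anonymity(records):
    """Size of the smallest equivalence class over all attributes:
    the dataset is k-anonymous for this k."""
    counts = Counter(tuple(sorted(r.items())) for r in records)
    return min(counts.values())

raw = [
    {"city": "Sofia",   "birth_year": 1984},
    {"city": "Plovdiv", "birth_year": 1987},
    {"city": "Varna",   "birth_year": 1990},
    {"city": "Sofia",   "birth_year": 1993},
]
generalized = [generalize(r) for r in raw]
k = k_anonymity(generalized)  # every record now shares its values with at least one other
```

After generalization, each record is indistinguishable from at least one other (k = 2 here), whereas every raw record was unique. Documenting the transformation and the resulting k is exactly the kind of record the documentation-first approach requires.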
Pseudonymization techniques
- Hashing — one-way transformation, but vulnerable to dictionary and rainbow-table attacks if no salt is used.
- Keyed hash (HMAC) — hash with a secret key, which makes attacks significantly harder; however, compromise of the key renders all records re-identifiable.
- Deterministic encryption — enables JOINs across datasets because the same input yields the same ciphertext.
- Tokenization — replacement of values with random tokens, with the mapping between token and original value stored in a separate token vault.
- Encryption with an externally managed key — classic pseudonymization because the key is stored separately and under a different access regime.
Important: hashing alone, without additional controls, is almost never sufficient for anonymization. The EDPB and the Article 29 WP have consistently treated it as pseudonymization.
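The difference between a plain hash and a keyed hash (HMAC) can be shown in a few lines. This is a sketch with an illustrative hard-coded key and an invented sample identifier; in practice the key would come from a managed secret store on separate infrastructure.

```python
import hashlib
import hmac

def naive_hash(value):
    """Plain SHA-256: deterministic and keyless, so vulnerable to
    dictionary and rainbow-table attacks on low-entropy inputs."""
    return hashlib.sha256(value.encode()).hexdigest()

def keyed_hash(value, key):
    """HMAC-SHA-256: without the secret key, an attacker cannot
    precompute a lookup table of candidate inputs."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

key = b"illustrative-key-kept-separately"  # in practice: a managed secret
national_id = "8412105678"                 # invented sample identifier
token = keyed_hash(national_id, key)
# The same input with the same key always yields the same token,
# which still permits JOINs across datasets -- so the result is
# pseudonymization, not anonymization.
```

Note that determinism is a feature for linkage and a risk for privacy: whoever holds the key (or compromises it) can re-identify every record, which is why key management is the decisive control.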
We cover the technical details in our GDPR audits at gdprbg.com — from algorithm choice to key management.
Typical business applications
- Analytics and reporting without exposure of personal data — BI dashboards, management reporting.
- Machine learning — training sets that do not require subject identity.
- Research — scientific publications, clinical research, academic collaboration.
- Data sharing with partners — B2B integrations where identity is not necessary.
- Marketing analytics — privacy-friendly alternatives to GA4 and other tracking solutions.
- Health data for research — under the strict regime of special categories in Art. 9 GDPR.
- Fraud detection — anomaly detection without direct identification.
In all these cases the choice between anonymization and pseudonymization has direct implications for legal basis, retention period and data subject rights.
Risks and limitations
The main risk of anonymization is re-identification — revealing identity by combining with other public or private datasets. Several classic case studies illustrate this:
- Netflix Prize — anonymous movie ratings combined with IMDb profiles led to de-anonymization of subscribers.
- AOL search log — published “anonymous” search queries revealed user identities through query content.
- Medical data — research shows that the combination of ZIP code + date of birth + gender is often enough for unique identification.
Some residual risk always remains: truly absolute anonymization is almost impossible with rich datasets. This is why the EDPB requires a risk-based approach, including assessment of context, the means available to a hypothetical attacker and available external sources.
Step-by-step process
- Define the purpose of processing — what will the output be used for.
- Inventory of personal data in the source dataset — which fields, which categories.
- Risk assessment for re-identification — who are likely “attackers” and what external data they have.
- Choose a technique — anonymization vs. pseudonymization, based on purpose and risk.
- Technical implementation of the chosen transformations.
- Testing — the three EDPB tests (single-out, linkability, inference).
- DPIA if the process is high-risk or involves special categories.
- Documentation — records of every step, including decisions and justifications.
- Regular review — the risk profile changes with new data and techniques.
If your process is complex, request a free GDPR audit at gdprbg.com.
Relation to DPIA
When you use pseudonymization as a risk-mitigation measure in a DPIA (Data Protection Impact Assessment), it reduces residual risk and is usually viewed favourably by the supervisory authority. Importantly, however, pseudonymization does not exempt you from the obligation to perform a DPIA — it is an element of the assessment, not an exception.
The DPIA report should explicitly describe: (i) the chosen pseudonymization method, (ii) the location and protection of the key, (iii) the circle of persons with access and (iv) the procedure for periodic review.
Further guidance and a DPIA template: DPIA guide at gdprbg.com, as well as the DPO role in the anonymization process.
Pseudonymization and data breaches
In a breach of pseudonymized data, the risk to subjects is often objectively lower — the attacker gets tokens or hashes rather than direct identity. Nevertheless, notification to the CPDP within 72 hours under Art. 33 GDPR remains mandatory unless the breach is unlikely to result in a risk to the rights and freedoms of data subjects.
If, alongside the pseudonymized data, the re-identification key (or tokenization vault) is also compromised, the risk increases sharply and notification of the data subjects themselves under Art. 34 GDPR becomes mandatory. Practical tip: store keys with a different provider or at least on separate infrastructure.
Full breach protocol: 72-hour breach protocol at gdprbg.com.
Need help with anonymization or pseudonymization?
Our dedicated GDPR team at gdprbg.com provides assessment, implementation and auditing of anonymization and pseudonymization techniques under EDPB Guidelines 04/2025. Request a free consultation or fill in the form below.