Homework 4
Download the Covid-19 dataset and anonymise it using the ARX tool. Explain your choices throughout the process.
Choose identifying, quasi-identifying, sensitive and insensitive attributes. For quasi-identifiers, define levels of anonymisation (create hierarchies). Choose privacy models; k-anonymity and l-diversity are sufficient. Run the anonymisation process and choose an appropriate transformation. Analyse the anonymised dataset in terms of utility. Try to make sure that no columns besides the identifying attributes are entirely deleted, while keeping the number of suppressed records under control. Export the final ARX project file.
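As an illustration of what "levels of anonymisation" means for a numeric quasi-identifier, here is a minimal sketch that builds an interval hierarchy for an age value. The function name `age_hierarchy` and the interval widths (5, 10, 20) are assumptions made for this example, not values prescribed by the assignment or by ARX. In ARX itself you would normally build the hierarchy with the hierarchy wizard.

```python
def age_hierarchy(age, widths=(5, 10, 20)):
    """Return the generalisation levels for one age value.

    Level 0 is the exact value, the middle levels are increasingly wide
    intervals, and the last level is full suppression ("*").
    Each width divides the next one, so every finer interval is nested
    inside the coarser interval of the following level.
    """
    levels = [str(age)]
    for width in widths:
        lower = (age // width) * width
        levels.append(f"[{lower}, {lower + width})")
    levels.append("*")
    return levels


# One row per original value, from most specific to most general.
for age in (37, 38, 52):
    print(";".join(age_hierarchy(age)))
```

Adding more intermediate levels gives the anonymisation more options between "exact value" and "fully suppressed", which is why a finer hierarchy can reduce information loss.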
Create a report, where you explain why you made the choices you made (you can briefly describe previous attempts if they gave unsatisfactory results).
- Explain why you chose the levels of generalisations that you did (if you chose ordering, explain why you ordered things the way you did; if you chose intervals, explain how you chose them; if you created a custom hierarchy you can explain the logic behind it).
- In your own words, add a short explanation of what guarantees the privacy models offer in terms of quasi-identifiers and sensitive attributes.
- Briefly explain which transformation you chose and why.
- Explain what level the transformation chose for each attribute. Do this by attribute (for instance “Age: Level 2” is sufficient, the actual levels are visible from the project file).
- Report the minimal class size from the input and output data.
- Report how many records were suppressed (Analyze utility → Class sizes) and which attributes had the most missing values (Analyze utility → Quality models).
- Analyze the risk of the input data: report the estimated percentage of records from the input data that had a re-identification risk higher than 50%.
- Analyze the risk of the output data: report the estimated percentage of records that have a re-identification risk below 5%.
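The quantities asked for in the list above (minimal class size, and the share of records above or below a risk threshold) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names `age`, `zip` and `diagnosis` and the toy records are hypothetical, l is counted as the number of distinct sensitive values per class, and the risk of a record is taken as 1 divided by the size of its equivalence class (the prosecutor model). The figures ARX reports may be based on additional risk models, so use the values from ARX in your report.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same combination of quasi-identifier values."""
    classes = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        classes[key].append(rec)
    return classes


def k_and_l(classes, sensitive):
    """k = smallest class size; l = fewest distinct sensitive values in any class."""
    k = min(len(rows) for rows in classes.values())
    l = min(len({r[sensitive] for r in rows}) for rows in classes.values())
    return k, l


def share_with_risk(classes, predicate):
    """Fraction of records whose risk (1 / class size) satisfies the predicate."""
    total = sum(len(rows) for rows in classes.values())
    hits = sum(len(rows) for rows in classes.values() if predicate(1 / len(rows)))
    return hits / total


# Hypothetical, already generalised toy data.
records = [
    {"age": "[30, 40)", "zip": "12*", "diagnosis": "positive"},
    {"age": "[30, 40)", "zip": "12*", "diagnosis": "negative"},
    {"age": "[30, 40)", "zip": "12*", "diagnosis": "positive"},
    {"age": "[40, 50)", "zip": "13*", "diagnosis": "negative"},
    {"age": "[40, 50)", "zip": "13*", "diagnosis": "negative"},
    {"age": "[50, 60)", "zip": "14*", "diagnosis": "positive"},
]

classes = equivalence_classes(records, ["age", "zip"])
k, l = k_and_l(classes, "diagnosis")
high_risk = share_with_risk(classes, lambda risk: risk > 0.5)
low_risk = share_with_risk(classes, lambda risk: risk < 0.05)
print(f"k = {k}, l = {l}")
print(f"share with risk > 50%: {high_risk:.1%}, share with risk < 5%: {low_risk:.1%}")
```

Under this model, a record has a risk above 50% exactly when it is alone in its class, and a risk below 5% only when its class contains more than 20 records.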
Hints
If the anonymisation suppresses entire attributes or too many records:
- Check the suppression level (suggested: 100%);
- Define more hierarchy levels for quasi-identifiers;
- Experiment with which attributes are sensitive and which are quasi-identifiers (the distinction between them is not always clear: turning a sensitive attribute into a quasi-identifier allows ARX to generalise it, while turning a quasi-identifier into a sensitive attribute means that it is no longer considered when forming the k-anonymous classes).
Submission Form
Submit a zip archive containing the ARX project file (*.deid) and the PDF of the report.