Institute of Computer Science
Privacy-Preserving Technologies (LTAT.04.007), spring 2022/23

Homework 4

Download the Covid-19 dataset and anonymise it using the ARX tool. Explain your choices throughout the process.

Choose the identifying, quasi-identifying, sensitive and insensitive attributes. For the quasi-identifiers, define the generalisation levels (create hierarchies). Choose privacy models; it is enough to use k-anonymity and l-diversity. Run the anonymisation process and choose an appropriate transformation. Analyse the anonymised dataset in terms of utility. Try to make sure that no columns other than the identifying attributes are deleted entirely, while keeping the number of suppressed records under control. Export the final ARX project file.
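To make the idea of a hierarchy concrete, here is a minimal Python sketch of an interval-based hierarchy for a numeric quasi-identifier. It is only an illustration, not part of ARX: the function name `age_hierarchy`, the sample ages and the interval widths are assumptions, and the semicolon-separated output is merely one plausible way to write a hierarchy down (value first, then increasingly general levels, ending in `*`). In the assignment itself the hierarchies are created in the ARX tool.

```python
def age_hierarchy(ages, widths=(10, 20)):
    """Map each age to [age, level-1 interval, level-2 interval, ..., "*"].

    Each width should be a multiple of the previous one, so that every
    interval at one level lies inside exactly one interval at the next level.
    """
    rows = {}
    for age in sorted(set(ages)):
        levels = [str(age)]  # level 0: the original value
        for width in widths:
            low = (age // width) * width
            levels.append(f"[{low}, {low + width})")
        levels.append("*")  # top level: the value is fully generalised
        rows[age] = levels
    return rows


# Hypothetical sample values, not taken from the Covid-19 dataset.
hierarchy = age_hierarchy([23, 37, 41, 58])
for levels in hierarchy.values():
    print(";".join(levels))
```

Wider intervals at the higher levels give the anonymisation more room to reach the privacy model without suppressing the whole column, at the cost of utility.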

Create a report in which you explain why you made your choices (you can briefly describe previous attempts if they gave unsatisfactory results).

  • Explain why you chose the levels of generalisation that you did (if you chose ordering, explain why you ordered things the way you did; if you chose intervals, explain how you chose them; if you created a custom hierarchy, explain the logic behind it).
  • In your own words, add a short explanation of what guarantees the privacy models offer in terms of quasi-identifiers and sensitive attributes.
  • Briefly explain which transformation you chose and why.
  • State which level the transformation chose for each attribute. Do this attribute by attribute (for instance, “Age: Level 2” is sufficient; the actual levels are visible in the project file).
  • Report the minimal class size from the input and output data.
  • Report how many records were suppressed (Analyze utility → Class sizes) and which attributes had the most missing values (Analyze utility → Quality models).
  • Analyze the risk of the input data: report the estimated percentage of input records that have a re-identification risk higher than 50%.
  • Analyze the risk of the output data: report the estimated percentage of output records that have a re-identification risk below 5%.
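The quantities asked for in the report can be illustrated with a small Python sketch. It groups records into equivalence classes by their quasi-identifier values, then derives the minimal class size (the k of k-anonymity), the smallest number of distinct sensitive values per class (distinct l-diversity), and the share of records whose risk crosses a threshold. The risk used here is simply 1 / class size, which is an assumption made for illustration; the figures reported by ARX may be based on other risk models, so use the values from the tool in the report. All column names and records are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        classes[key].append(record)
    return classes


def k_anonymity(classes):
    """Minimal class size: the dataset is k-anonymous for this k."""
    return min(len(group) for group in classes.values())


def distinct_l_diversity(classes, sensitive):
    """Smallest number of distinct sensitive values found in any class."""
    return min(len({r[sensitive] for r in group}) for group in classes.values())


def share_with_risk(classes, predicate):
    """Fraction of records whose risk (1 / class size) satisfies predicate."""
    total = sum(len(group) for group in classes.values())
    hits = sum(len(g) for g in classes.values() if predicate(1 / len(g)))
    return hits / total


# Hypothetical, already generalised records.
records = [
    {"age": "[20, 40)", "region": "North", "outcome": "recovered"},
    {"age": "[20, 40)", "region": "North", "outcome": "hospitalised"},
    {"age": "[20, 40)", "region": "North", "outcome": "recovered"},
    {"age": "[40, 60)", "region": "South", "outcome": "recovered"},
    {"age": "[40, 60)", "region": "South", "outcome": "hospitalised"},
]

classes = equivalence_classes(records, ["age", "region"])
k = k_anonymity(classes)
l_value = distinct_l_diversity(classes, "outcome")
share_high_risk = share_with_risk(classes, lambda risk: risk > 0.5)
share_low_risk = share_with_risk(classes, lambda risk: risk < 0.05)
print(k, l_value, share_high_risk, share_low_risk)
```

Under this simple model, a risk below 5% requires classes of more than 20 records, which is why the output data usually has to be generalised much further than the input.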

Hints

If the anonymisation suppresses entire attributes or too many records:

  • Check the suppression level (suggested: 100%);
  • Define more hierarchy levels for quasi-identifiers;
  • Experiment with which attributes are sensitive and which are quasi-identifiers. The distinction between them is not always clear. Turning a sensitive attribute into a quasi-identifier allows ARX to generalise it, while turning a quasi-identifier into a sensitive attribute means that it is no longer considered when the k-anonymous classes are formed.
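The effect of the last hint can be sketched in Python. The example counts how many records sit in classes smaller than k and would therefore have to be suppressed, first with two quasi-identifiers and then with only one. This is a simplified model of suppression assumed for illustration, not a description of how ARX works internally, and the function name, column names and records are hypothetical.

```python
from collections import Counter


def suppression_needed(records, quasi_identifiers, k):
    """Fraction of records in classes smaller than k (candidates for suppression)."""
    sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    suppressed = sum(size for size in sizes.values() if size < k)
    return suppressed / len(records)


# Hypothetical records; "region" is the attribute whose role is changed.
records = [
    {"age": "[20, 40)", "region": "North"},
    {"age": "[20, 40)", "region": "North"},
    {"age": "[20, 40)", "region": "South"},
    {"age": "[40, 60)", "region": "South"},
    {"age": "[40, 60)", "region": "South"},
]

# "region" treated as a quasi-identifier: one record is alone in its class.
with_region = suppression_needed(records, ["age", "region"], k=2)
# "region" no longer a quasi-identifier: every class has at least 2 records.
without_region = suppression_needed(records, ["age"], k=2)
print(with_region, without_region)
```

Fewer or more strongly generalised quasi-identifiers produce larger classes and less suppression, but an attribute that is no longer a quasi-identifier is also no longer protected by k-anonymity, so the choice should be justified in the report.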

Submission Form

Submit a zip archive containing the ARX project file (*.deid) and the report as a PDF.
