Homework 6: Privacy-preserving joint data analysis
You are a PETs consultant. Three banks (COOP, LHV and Holm) come to you with a problem. They want to jointly analyse their databases to create a joint machine learning model for estimating the risk of housing loans. The problem is that they do not want to share their data as is, and they are not legally allowed to do so anyway. They would prefer that the data never leave their data centres in any readable form.
Task (part A): Using your favourite visual modelling language (BPMN, UML, or tiny stick people and cylinders), please model four versions of the solution (hint: some of the models can be very similar):
- anonymisation,
- synthetic data generation,
- federated machine learning,
- secure multi-party computation.
You can abstract away the database structure and assume that the banks can convert their data into a unified format. The dataset contains integer values, floating-point values, time series and classifiers (categorical values). You can treat the ML method as a black box, but assume that it is of medium computational complexity (if you know more about ML, logistic regression is fine; there is no need for deep learning).
Task (part B): Compare the models based on information disclosure to the parties, complexity of deployment, and cost of development (a comparative assessment, not an exact sum). Suggest which solution you think would be best for these banks. Please try to fit the text on one A4 page; the models do not have to fit on that page.
Deadline: May 19th, 2021 23:59 EEST. If you need an extension, let me know as early as possible.
Guidelines:
- The task is individual.
- The homework should be submitted as one PDF document that contains:
  - not more than 1 page of text for the evaluation and your suggestion,
  - the 4 models as images (the images can be placed between the text and do not count towards the text length).