Homework 4: Predictive Process Monitoring
This assignment uses the preprocessed BPIC15 event log (link [1]). The aim is for you to train and apply predictive process monitoring techniques that predict the outcome of a process from an event log. To meet this goal, make the necessary modifications to the framework proposed by Teinemaa et al. (link [2]), which was reviewed in class.
Tasks
- (2 points) As part of the log preprocessing, categorize the process traces as deviant or regular. A case is deviant if its duration exceeds the mean case duration of the log by more than 10% (i.e., duration > mean-case-duration * 1.10). Otherwise, the case is regular. Create a new column in the log holding a case attribute called label, which takes the value 1 for deviant cases and 0 for regular ones.
- (2 points) Add a column to the event log that records the work in progress (WIP) of the process at each event. Recall that WIP is the number of active cases, that is, cases that have started but have not yet been completed.
- (3 points) Using the WIP column from Task 2 and the outcome (label) from Task 1, train an XGBoost model and a Random Forest model using single bucketing and combined encoding.
- (3 points) Repeat Task 3 without the WIP column. Compare the resulting models with those from Task 3 and explain the differences in the results. Did the use of the WIP attribute affect the accuracy of the models? If so, explain why this effect can occur.
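As a starting point for Tasks 1 and 2, the following is a minimal pandas sketch, not the framework's own code. It assumes hypothetical column names case_id and timestamp (the actual names in the BPIC15 file may differ), takes a case's duration to be the time between its first and last event, and counts a case as active from its first event through its last event, inclusive. The helper names add_label and add_wip are illustrative only.

```python
import numpy as np
import pandas as pd


def add_label(log, case_col="case_id", time_col="timestamp", factor=1.10):
    """Add a case-level `label` column: 1 = deviant, 0 = regular."""
    out = log.copy()
    grouped = out.groupby(case_col)[time_col]
    # Assumption: case duration = last event timestamp - first event timestamp.
    duration = grouped.max() - grouped.min()
    threshold = duration.mean() * factor
    labels = (duration > threshold).astype(int)
    # Broadcast the case-level label to every event of the case.
    out["label"] = out[case_col].map(labels)
    return out


def add_wip(log, case_col="case_id", time_col="timestamp"):
    """Add a `wip` column: number of cases active at each event's timestamp."""
    out = log.copy()
    grouped = out.groupby(case_col)[time_col]
    starts = grouped.min().sort_values().to_numpy()
    ends = grouped.max().sort_values().to_numpy()
    ts = out[time_col].to_numpy()
    # Cases whose first event is at or before the event's timestamp.
    started = np.searchsorted(starts, ts, side="right")
    # Cases whose last event is strictly before the event's timestamp.
    finished = np.searchsorted(ends, ts, side="left")
    out["wip"] = started - finished
    return out


# Tiny made-up log, for illustration only (not BPIC15 data).
demo = pd.DataFrame({
    "case_id": ["A", "B", "A", "C", "C", "B"],
    "timestamp": pd.to_datetime([
        "2020-01-01 00:00", "2020-01-01 12:00", "2020-01-02 00:00",
        "2020-01-03 00:00", "2020-01-04 00:00", "2020-01-11 12:00",
    ]),
})
demo = add_wip(add_label(demo))
print(demo)
```

Other conventions are defensible, for example excluding the current case from the WIP count or treating a case as closed at the instant of its last event. Whichever you choose, state it in your report, since it changes the values the models see.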
What to submit?
A report in PDF format containing an explanation of the modifications made to the approach, an evaluation of the changes, and a link to the repository containing your changes. Submit this document via the Submit link on the course website.
References
- https://owncloud.ut.ee/owncloud/index.php/s/8DC9KXyTnJWNRgJ/download/HW4-BPIC15.csv
- https://github.com/Mcamargo85/predictive-monitoring-benchmark.git