Homework 4: Predictive Process Monitoring
This homework uses the preprocessed BPIC12W event log. The goal is to train and apply predictive process monitoring techniques to predict the remaining time of a running case from an event log. Make the necessary modifications to the frameworks reviewed in class to meet this goal.
Tasks
- (2 points) As part of the log preprocessing, calculate the remaining time for each event in the log: the difference in seconds between the end timestamp of the event and the end timestamp of the last event in the same case. To implement this, create a new column called "remtime" in the log that stores this value as an event attribute.
- (2 points) One way to improve the model's accuracy is to extract new contextual time features from the timestamps. These features can give the model meaningful information about possible seasonal influences on the process behavior. For example, from a start timestamp like "2023-04-18 13:00:00", we can extract the month of the year (e.g., 4 for April), the day of the week (e.g., 1 for Tuesday, where Monday is 0 and Sunday is 6), and the time in seconds since midnight (e.g., 46800 for 1:00 PM, as midnight is 0). To implement this, create six new columns in the log, containing these three contextual features for both the start and the complete timestamps. NB! Consider using the weekday() method of Python's datetime objects.
- (3 points) Using the "remtime" column from Task 1 and the contextual feature columns from Task 2, train an XGBoost Regressor using single bucketing and aggregation encoding.
- (3 points) Perform Task 3 again, but this time exclude the contextual features. Then compare the accuracy of the resulting model with that of the model from Task 3, and explain any differences in the results. Consider whether the contextual features affected the accuracy, and explain why such an effect may or may not have occurred.
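The preprocessing in Tasks 1 and 2 can be sketched as follows. This is a minimal illustration using only Python's standard library, not the class framework: the field names "caseid" and "end_timestamp", the output keys "month", "weekday" and "daysecs", and the timestamp format are all assumptions made for the example, and the toy log is invented. In the actual log the same logic would be applied to both the start and the complete timestamps.

```python
from datetime import datetime

# Assumed timestamp format; adjust it to match the actual log.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def time_features(ts):
    """Return month, weekday (Monday=0) and seconds since midnight for one timestamp."""
    dt = datetime.strptime(ts, TS_FORMAT)
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    return {
        "month": dt.month,
        "weekday": dt.weekday(),
        "daysecs": int((dt - midnight).total_seconds()),
    }


def add_remtime(events):
    """Add 'remtime': seconds from each event's end to the last end in its case."""
    # First pass: find the latest end timestamp per case.
    last_end = {}
    for ev in events:
        end = datetime.strptime(ev["end_timestamp"], TS_FORMAT)
        case = ev["caseid"]
        if case not in last_end or end > last_end[case]:
            last_end[case] = end
    # Second pass: attach the remaining time to a copy of each event.
    result = []
    for ev in events:
        end = datetime.strptime(ev["end_timestamp"], TS_FORMAT)
        remtime = (last_end[ev["caseid"]] - end).total_seconds()
        result.append({**ev, "remtime": remtime})
    return result


# Invented toy log with hypothetical field names.
toy_log = [
    {"caseid": "c1", "end_timestamp": "2023-04-18 13:00:00"},
    {"caseid": "c1", "end_timestamp": "2023-04-18 13:30:00"},
    {"caseid": "c2", "end_timestamp": "2023-04-19 09:00:00"},
]

features = time_features("2023-04-18 13:00:00")
with_remtime = add_remtime(toy_log)
print(features)
print([ev["remtime"] for ev in with_remtime])
```

The last event of every case gets a remaining time of 0, which is a quick sanity check for the new column.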
What to submit?
A report in PDF format that explains the modifications made to the approach, evaluates the effect of those changes, and includes a link to the repository containing your changes. You must submit this document via the Submit link on the course website.