Homework 4: Predictive Process Monitoring
This assignment uses the preprocessed BPIC12W event log (https://owncloud.ut.ee/owncloud/s/pMRrERiTnpE9srJ) to train and apply predictive process monitoring techniques that predict the outcome of a process. To do so, modify the frameworks reviewed in class as needed.
Tasks:
- (2 points) As part of log preprocessing, classify each process trace as either deviant or regular. A case is deviant if its total duration exceeds the mean duration of all cases. To record this, create a new column in the log that holds a case attribute called 'label'. The attribute takes the value 1 for deviant cases and 0 for regular cases, and every event of a case carries the same value.
- (2 points) To improve the predictive performance of the model, extract new time-related contextual features from the timestamps. These features can tell the model about seasonal influences on process behavior. For instance, from a start timestamp like "2024-04-18 13:00:00", we can extract the month of the year (e.g., 04), the day of the week (e.g., Thursday as 3, where Monday is 0 and Sunday is 6), and the time in seconds since midnight (e.g., 46800 for 1:00 PM, where midnight is 0). To implement this, add six new columns to the log, containing these three contextual features for both the start and the complete timestamps. Using Python's datetime weekday() method is recommended.
- (3 points) Train an XGBoost classifier using single bucketing and last-state encoding, with the 'label' column as the target and the contextual feature columns included in the training log.
- (3 points) Repeat Task 3, but this time exclude the contextual features. Compare the accuracy of the resulting model with that of the model from the previous step and explain any differences. State whether the contextual features affected accuracy, and explain why such an effect may or may not have occurred.
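The labeling and feature-extraction steps in Tasks 1 and 2 could be sketched with pandas roughly as follows. This is only an illustration on a toy log, not the required implementation: the column names `case_id`, `start_timestamp`, and `end_timestamp`, the helper `add_time_features`, and the names of the six new columns are assumptions, so adapt them to the actual columns of the preprocessed BPIC12W log. Case duration is taken here as the span from the earliest start to the latest end timestamp of a case, which is one possible reading of "total duration".

```python
import pandas as pd

# Toy event log. All column names here are hypothetical placeholders.
log = pd.DataFrame({
    "case_id": ["A", "A", "B", "B", "C"],
    "start_timestamp": pd.to_datetime([
        "2024-04-18 13:00:00", "2024-04-18 14:00:00",
        "2024-04-19 09:00:00", "2024-04-19 09:30:00",
        "2024-04-20 08:00:00",
    ]),
    "end_timestamp": pd.to_datetime([
        "2024-04-18 13:30:00", "2024-04-18 18:00:00",
        "2024-04-19 09:10:00", "2024-04-19 10:00:00",
        "2024-04-20 08:30:00",
    ]),
})

# Task 1: case duration = latest end minus earliest start, in seconds.
grouped = log.groupby("case_id")
case_duration = (
    grouped["end_timestamp"].max() - grouped["start_timestamp"].min()
).dt.total_seconds()

# A case is deviant (1) if its duration exceeds the mean over all cases.
case_label = (case_duration > case_duration.mean()).astype(int)

# Broadcast the case-level label to every event of the case.
log["label"] = log["case_id"].map(case_label)


def add_time_features(df, column, prefix):
    """Add month, weekday (Monday=0), and seconds since midnight for `column`."""
    ts = df[column]
    df[f"{prefix}_month"] = ts.dt.month
    df[f"{prefix}_weekday"] = ts.dt.weekday
    df[f"{prefix}_seconds_since_midnight"] = (
        ts.dt.hour * 3600 + ts.dt.minute * 60 + ts.dt.second
    )
    return df


# Task 2: three features for each of the two timestamps gives six new columns.
log = add_time_features(log, "start_timestamp", "start")
log = add_time_features(log, "end_timestamp", "end")

print(log[["case_id", "label", "start_month", "start_weekday",
           "start_seconds_since_midnight"]])
```

In this toy log, only case A exceeds the mean case duration, so its events receive label 1 and the others receive 0. The vectorized `.dt.weekday` accessor follows the same Monday-is-0 convention as the `weekday()` method on individual `datetime` objects.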
What do you need to submit? You must submit a report in PDF format that includes a comprehensive explanation of the modifications made to your approach, an evaluation of the changes, and a link to the repository that contains the modifications you have made. Kindly submit this document through the 'Submit' link provided on the course website.