Potential project topics and supervisors
- This semester you have to find a topic and a supervisor yourself.
- The grand introduction of topics and supervisors takes place in autumn terms.
- Spring term projects are for those who really know what they want to do.
Finding patterns for dynamically generated classes from Java stack traces
Student: Kairi Hennoch
Supervisor: --
Problem description:
Plumbr is a tool for monitoring Java performance issues. When it detects an issue, it informs the user of the details and origin of the problem, usually including a stack trace. It also groups problems together, so it can report something like "There have been 123 issues that have this stack trace". This works well if the problem caused by some line of code always has the exact same stack trace. In Java, however, classes can be created dynamically, because classes are loaded only when the program actually needs them. Such classes might be generated with different names every time the JVM (Java Virtual Machine) is restarted, which leads to differing stack traces for essentially the same problem.
The dataset for this problem is a few hundred gigabytes of stack traces gathered from hundreds of JVMs. The end goal is to detect generated classes in these stack traces and to define patterns that would enable us to detect such generated classes in future stack traces. We already know a few of them: 'sun.reflect.GeneratedMethodAccessor', 'com.sun.proxy.$Proxy', 'CGLIB$$', 'org.apache.wicket.proxy.$Proxy', '$$_javassist_', all of which are followed by a generated part of the class name (usually some number). One idea is to find stack traces that differ in only a few places; the differing frames would be the generated class candidates. Another idea is to find class names that occur many times across different stack traces with only small differences.
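Both ideas can be illustrated with a small sketch. It is not Plumbr's grouping logic. It assumes the generated part is a run of decimal digits, and it covers only three of the known prefixes, because the suffix formats for CGLIB and javassist are not given in the text. The function names, the `<generated>` placeholder and the `<N>` template marker are made up for this illustration.

```python
import re
from collections import Counter, defaultdict

# Known prefixes from the description above. Assumption: the generated
# suffix is a run of decimal digits.
GENERATED_PATTERNS = [
    re.compile(r"(sun\.reflect\.GeneratedMethodAccessor)\d+"),
    re.compile(r"(com\.sun\.proxy\.\$Proxy)\d+"),
    re.compile(r"(org\.apache\.wicket\.proxy\.\$Proxy)\d+"),
]
PLACEHOLDER = "<generated>"  # hypothetical marker, chosen for this sketch


def normalize_frame(frame):
    """Replace the generated suffix of a known prefix with a placeholder."""
    for pattern in GENERATED_PATTERNS:
        frame = pattern.sub(lambda m: m.group(1) + PLACEHOLDER, frame)
    return frame


def group_traces(traces):
    """Count stack traces that are identical after normalization."""
    return Counter(tuple(normalize_frame(f) for f in trace) for trace in traces)


def candidate_templates(class_names, min_variants=2):
    """Find names that differ only in their digits (generated-class candidates)."""
    variants = defaultdict(set)
    for name in class_names:
        template = re.sub(r"\d+", "<N>", name)
        if template != name:
            variants[template].add(name)
    return {t: v for t, v in variants.items() if len(v) >= min_variants}
```

`group_traces` merges traces that differ only in a known generated suffix, while `candidate_templates` is a crude version of the second idea: it proposes new patterns wherever many class names collapse onto the same digit-free template.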
Morphological Segmentation via Bayesian Inference
Student: ---
Supervisor: Mark Fishel
Problem description:
Implement and test the method published in http://www.aclweb.org/anthology/P08-1084. A very friendly introduction to the methodology: http://www.isi.edu/natural-language/people/bayes-with-tears.pdf
Morphology Induction via Deep Learning
Student: ---
Supervisor: Mark Fishel
Problem description:
Implement and test the method published in http://www.aclweb.org/anthology/N15-1186.
Porting predictive modelling code from GNU R to Java
Student: Kerwin Jorbina
Supervisor: Fabrizio Maggi
Problem description:
Predictive process monitoring is concerned with exploiting event logs to predict how running (uncompleted) cases will unfold up to their completion. In this project, we implement an instance of a predictive process monitoring framework for estimating the probability that a given predicate will be fulfilled upon completion of a running case. The prediction problem is approached in two phases. First, prefixes of previous traces are clustered according to control flow information. Second, a classifier is built for each cluster using event data to discriminate between fulfillments and violations. At runtime, a prediction is made on a running case by mapping it to a cluster and applying the corresponding classifier. The approach will be implemented in Java as a plug-in of the process mining tool ProM.
Relevant literature: http://arxiv.org/abs/1506.01428
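The two-phase scheme in the problem description (cluster prefixes by control flow, then apply one classifier per cluster) can be sketched as a toy. It uses deliberately simple stand-ins, not the methods of the referenced paper: an exact match on the activity sequence replaces clustering, and a per-cluster frequency table over the last event's data value replaces the classifier. The class name `TwoPhasePredictor` and the event format `(activity, payload)` are assumptions made for this illustration, and prefixes are assumed to be non-empty.

```python
from collections import defaultdict


def control_flow_key(prefix):
    """Stand-in for clustering: same activity sequence means same cluster."""
    return tuple(activity for activity, _ in prefix)


class TwoPhasePredictor:
    def __init__(self, prefix_length):
        self.prefix_length = prefix_length
        # cluster key -> payload value -> [fulfilled count, total count]
        self.counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def fit(self, traces, labels):
        """Phase 1 assigns each prefix to a cluster; phase 2 fills its table."""
        for trace, fulfilled in zip(traces, labels):
            prefix = trace[: self.prefix_length]
            payload = prefix[-1][1]
            cell = self.counts[control_flow_key(prefix)][payload]
            cell[0] += int(fulfilled)
            cell[1] += 1
        return self

    def predict_proba(self, running_case):
        """Estimate P(predicate fulfilled); None if the control flow is unseen."""
        prefix = running_case[: self.prefix_length]
        key = control_flow_key(prefix)
        if key not in self.counts:
            return None
        cluster = self.counts[key]
        payload = prefix[-1][1]
        if payload in cluster:
            fulfilled, total = cluster[payload]
        else:
            # Unseen data value: fall back to the cluster-wide frequency.
            fulfilled = sum(c[0] for c in cluster.values())
            total = sum(c[1] for c in cluster.values())
        return fulfilled / total
```

A model is trained with `TwoPhasePredictor(2).fit(traces, labels)`, where each trace is a list of `(activity, payload)` pairs and each label says whether the predicate held at completion. A running case is then scored with `predict_proba`.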
Topic in Process Mining
Student: ---
Supervisor: Fabrizio Maggi
Problem description:
Process discovery techniques try to generate process models from execution logs. Declarative process modeling languages are more suitable than procedural notations for representing discovery results derived from logs of processes that run in dynamic and hard-to-predict environments. However, existing declarative discovery approaches mine declarative specifications by treating each activity in a business process as an atomic, instantaneous event. In this project, we investigate how to use discriminative rule mining in the discovery task to characterize lifecycles that lead to constraint violations and lifecycles that ensure constraint fulfillments. The approach will be implemented in Java as a plug-in of the process mining tool ProM.
Related literature: http://link.springer.com/chapter/10.1007%2F978-3-319-09870-8_21
Topic in Deviance Mining
Student: ---
Supervisor: Fabrizio Maggi
Problem description:
Deviant executions of a business process are those that deviate in a negative or positive way from normative or desirable outcomes, such as executions that undershoot or exceed performance targets. This project aims at implementing a new approach for discriminating between normal and deviant executions. We start from the requirement that the discovered rules should explain potential causes of observed deviances. Using feature types extracted with pattern mining techniques as a baseline, we explore more complex feature types to achieve higher accuracy. The approach will be implemented in Java as a plug-in of the process mining tool ProM.
Related literature: http://link.springer.com/chapter/10.1007%2F978-3-662-45563-0_25
Python library for k-means clustering with missing values
Student: Markus Lippus
Supervisor: Sven Laur
Problem description:
Standard implementations of the k-means algorithm assume that none of the attributes (measurements) are missing. This is hardly ever the case for medical measurements. Hence, one needs a variant of the k-means algorithm that can handle missing values naturally. The aim of the work is to implement and test a few alternatives described on the internet
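As an illustration of what such a variant might look like, here is a minimal sketch of one possible approach. It is not necessarily one of the alternatives the project has in mind. Distances are computed over the observed attributes only and rescaled to the full dimension, and each centroid coordinate is the mean of the observed values in its cluster. Missing values are encoded as `None`. The function names are made up for this sketch, which also assumes at least `k` fully observed points are available for initialization.

```python
import math
import random


def partial_distance(point, centroid):
    """Squared Euclidean distance over observed attributes, rescaled."""
    observed = [(x, c) for x, c in zip(point, centroid) if x is not None]
    if not observed:
        return math.inf  # nothing observed: no basis for comparison
    total = sum((x - c) ** 2 for x, c in observed)
    return total * len(point) / len(observed)


def kmeans_missing(points, k, iterations=20, seed=0):
    """Toy k-means that skips missing (None) attributes."""
    rng = random.Random(seed)
    dims = len(points[0])
    # Assumption: at least k points have no missing attributes.
    complete = [p for p in points if None not in p]
    centroids = [list(p) for p in rng.sample(complete, k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid under the partial distance.
        assignment = [
            min(range(k), key=lambda j: partial_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: per-coordinate mean over observed values only.
        for j in range(k):
            for d in range(dims):
                values = [
                    p[d]
                    for p, a in zip(points, assignment)
                    if a == j and p[d] is not None
                ]
                if values:
                    centroids[j][d] = sum(values) / len(values)
    return centroids, assignment
```

The rescaling factor keeps distances comparable between points with different numbers of observed attributes. Imputing missing values before running standard k-means would be a different design choice with its own trade-offs.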