Seminar on Enterprise Software - Kursused

Incremental Predictive Monitoring of Business Processes

Predictive monitoring of business processes aims at providing predictions on the execution of business processes by learning from the past. Traditional approaches to predictive process monitoring are based on a training phase, in which training data are used to learn a model, and a running phase, in which the future of ongoing traces is predicted. However, as soon as the future of the current trace becomes present, more up-to-date training data become available. The purpose of this thesis is to investigate incremental machine learning algorithms that update the predictive model incrementally and thus provide more accurate and up-to-date predictions.
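To make the idea of an incremental update concrete, here is a minimal sketch in Python. It is not one of the algorithms the thesis would investigate: the class name, the frequency-based prediction rule and the activity names are all illustrative assumptions. It only shows the pattern of folding each newly completed trace into the model instead of retraining from scratch.

```python
from collections import Counter, defaultdict


class IncrementalNextActivityPredictor:
    """Hypothetical frequency-based predictor, updated one completed trace at a time."""

    def __init__(self):
        # Maps an activity to counts of the activities that directly followed it.
        self.successors = defaultdict(Counter)

    def update(self, trace):
        """Fold one newly completed trace into the model without retraining."""
        for current, following in zip(trace, trace[1:]):
            self.successors[current][following] += 1

    def predict(self, prefix):
        """Return the most frequent successor of the prefix's last activity, or None."""
        if not prefix or prefix[-1] not in self.successors:
            return None
        return self.successors[prefix[-1]].most_common(1)[0][0]


predictor = IncrementalNextActivityPredictor()
predictor.update(["register", "check", "approve"])
print(predictor.predict(["register", "check"]))  # approve

# Two later traces shift the majority, and the prediction follows.
predictor.update(["register", "check", "reject"])
predictor.update(["register", "check", "reject"])
print(predictor.predict(["register", "check"]))  # reject
```

A real incremental learner would also have to decide how quickly to forget old behavior, which this sketch does not address.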

[1] Marco Maisenbacher, Matthias Weidlich: Handling Concept Drift in Predictive Process Monitoring. SCC 2017: 1-8

Proposed by F. M. Maggi

email: f.m.maggi@ut.ee

Interpretable Predictive Monitoring of Business Processes

Recent advances in supervised machine learning in various tasks stem from the use of powerful and complex models (neural networks, deep learning, random forests). However, adoption in practice remains challenging because of the limited interpretability of these methods and their low actionability (that is, what the user should do to alter the ongoing process instance to improve the expected/predicted outcome). Lack of understandability and actionability poses a serious challenge in domains such as financial and medical services, where understanding the decision behind the prediction is crucial. Moreover, the interpretability of the model can provide valuable feedback for improving it even further. As such, this thesis project goes beyond the state of the art in predictive process monitoring by developing methods and techniques to translate complex predictive models into understandable knowledge for key stakeholders in the process.

[1] https://www.sciencedirect.com/science/article/pii/S0957417417303950

Proposed by F. M. Maggi

email: f.m.maggi@ut.ee

Conformance Checking with Data-Aware Declarative Models

Conformance checking is a branch of process mining embracing approaches for verifying whether the behavior of a process, as recorded in a log, is in line with the expected behavior specified in a process model. One of the open challenges in the context of conformance checking is the capability of supporting multi-perspective specifications, i.e., specifications that also cover data, time, and resources. In this thesis, we close this gap by providing a framework for conformance checking based on MP-Declare, a multi-perspective version of the declarative process modeling language Declare. The approach will be implemented in the process mining tool ProM and evaluated in real-life case studies.

[1] Andrea Burattin, Fabrizio Maria Maggi, Alessandro Sperduti: Conformance checking based on multi-perspective declarative process models. Expert Syst. Appl. 65: 194-211 (2016)

[2] http://link.springer.com/chapter/10.1007%2F978-3-642-32885-5_6

Proposed by F. M. Maggi

email: f.m.maggi@ut.ee

Discovery of Hybrid Process Models

The declarative-procedural dichotomy is highly relevant when choosing the most suitable process modeling language to represent a discovered process in the context of process discovery techniques. Less-structured processes with a high level of variability can be described in a more compact way using a declarative language. By contrast, procedural process modeling languages seem more suitable to describe structured and stable processes. However, in various cases, a process may incorporate parts that are better captured in a declarative fashion, while other parts are more suitable to be described procedurally. In these scenarios, hybrid models are the best choice for describing the discovery results. In this thesis, an approach for the discovery of hybrid process models from logs of process executions will be developed. The approach will be implemented in the process mining tool ProM and evaluated in real-life case studies.

[1] Fabrizio Maria Maggi, Tijs Slaats, Hajo A. Reijers: The Automated Discovery of Hybrid Processes. BPM 2014: 392-399

Proposed by F. M. Maggi

email: f.m.maggi@ut.ee

Declarative Hierarchical Process Editor

While the notion of hierarchy has been widely investigated in the literature for procedural process models, in terms of both languages for describing the process/subprocess relationship and its applications, it is still a relatively young field for declarative process models. No tools or instruments supporting the modeling of hierarchical declarative process models are currently available. The purpose of this thesis is to implement a tool that supports the modeling of hierarchical declarative process models and to investigate its advantages in terms of understandability and reuse.

[1] Riccardo De Masellis, Chiara Di Francescomarino, Chiara Ghidini, Fabrizio Maria Maggi: Declarative Process Models: Different Ways to Be Hierarchical. ICSOC 2016: 104-119

Proposed by F. M. Maggi

email: f.m.maggi@ut.ee

Change Propagation between Business Rules and Process Models

In real-world situations, (procedural) business process models often have to comply with constraints (e.g., laws, policies, standards). In these situations, whenever a change occurs in the rules, the process model has to be changed accordingly (i.e., the change has to be propagated to the process model) to guarantee that compliance is preserved. It can also happen, however, that business process models evolve over time and the rules become obsolete. In this scenario, a change in the process model has to be propagated to the rules. The purpose of this work is to investigate how to propagate changes from the rules to the process models, or vice versa, so as to preserve compliance.

[1] Riccardo De Masellis, Chiara Di Francescomarino, Chiara Ghidini, Arne Laponin, Fabrizio Maria Maggi: Rule Propagation: Adapting Procedural Process Models to Declarative Business Rules. EDOC 2017: 165-174

Proposed by F. M. Maggi

email: f.m.maggi@ut.ee

Runtime Monitoring of Business Processes

A business process execution can be monitored to see if it is in line with some expected behavior. This expected behavior can be modeled as a set of compliance rules that should be satisfied during the process execution. If one of these rules is violated during the process execution, an exception is raised and feedback about the violation is given to the user. In this thesis, an approach for monitoring the compliance of running process executions will be developed. Ideally, the approach should take into consideration not only the control-flow perspective of the process but also the data perspective.
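As a rough illustration of what monitoring a single compliance rule at runtime could look like, the sketch below tracks one rule of the form "every trigger activity is eventually followed by a target activity". The class, its four state names and the activity names are hypothetical. It covers only the control-flow perspective and is not the automata-based technique of the reference below.

```python
class ResponseMonitor:
    """Hypothetical monitor for the rule 'every trigger is eventually followed by target'."""

    def __init__(self, trigger, target):
        self.trigger = trigger
        self.target = target
        self.pending = False  # True while a trigger still awaits its target
        self.closed = False   # True once the process instance has ended

    def observe(self, activity):
        """Consume one event of the running process instance."""
        if activity == self.trigger:
            self.pending = True
        elif activity == self.target:
            self.pending = False

    def end(self):
        """Signal that no further events will arrive."""
        self.closed = True

    def state(self):
        """Report the current verdict; it is final only after end() was called."""
        if self.pending:
            return "violated" if self.closed else "possibly violated"
        return "satisfied" if self.closed else "possibly satisfied"


monitor = ResponseMonitor("create order", "send invoice")
for event in ["create order", "pick items"]:
    monitor.observe(event)
print(monitor.state())  # possibly violated

monitor.end()
print(monitor.state())  # violated
```

Reporting a tentative state while the instance is still running is what allows feedback to reach the user before the case completes.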

[1] Fabrizio Maria Maggi, Marco Montali, Michael Westergaard, Wil M. P. van der Aalst: Monitoring Business Constraints with Linear Temporal Logic: An Approach Based on Colored Automata. BPM 2011: 132-147

Proposed by F. M. Maggi

email: f.m.maggi@ut.ee

Online Data-Aware Declarative Process Discovery from Event Streams

Stream processing is defined as “technologies designed to process large real-time streams of event data” and one of its example applications is process monitoring. The challenge of dealing with streaming event data is also discussed in the Process Mining Manifesto. A process discovery algorithm is a function that maps an event log to a process model such that the model is representative of the behavior seen in the event log. A declarative process model is a set of business rules that describe the process behavior under an open world assumption, i.e., everything that is not forbidden by the model is allowed. These models can express process behaviors involving multiple alternatives in a compact way and can be enriched with data-aware conditions that depend on the values of attributes in the data. Compared with conventional procedural approaches, they are well suited to changeable and unstable environments. In [1], an approach to automatically discover declarative process models from streams of data has been presented. However, this approach did not consider data-aware conditions. In this thesis, we extend the algorithm in [1] in order to generate data-aware declarative process models.

[1] Andrea Burattin, Marta Cimitile, Fabrizio Maria Maggi, Alessandro Sperduti: Online Discovery of Declarative Process Models from Event Streams. IEEE Trans. Services Computing 8(6): 833-846 (2015)

Proposed by F. M. Maggi

email: f.m.maggi@ut.ee

Systematic Literature Review of Predictive Monitoring Methods for Predicting Continuous Process Performance Indicators

Predictive business process monitoring methods exploit historical process execution logs to provide predictions about running instances of a process, which enable process workers and managers to preempt performance issues or compliance violations. A number of approaches have been proposed to predict various process performance indicators, such as remaining cycle time, cost, or probability of deadline violation. Unfortunately, due to numerous differences in evaluation setups, such as the choice of datasets, evaluation metrics and baselines, the overall picture of the relative performance of various methods remains largely unclear. Accordingly, in this thesis, we conduct a systematic review and taxonomy of methods for the predictive monitoring of process performance indicators, and their comparative experimental evaluation.

[1] Andreas Rogge-Solti, Mathias Weske: Prediction of Remaining Service Execution Time Using Stochastic Petri Nets with Arbitrary Firing Delays. ICSOC 2013: 389-403

Proposed by F. M. Maggi

email: f.m.maggi@ut.ee

A Framework for the Categorization of Business Rules

Imperative process modeling languages such as BPMN, Petri nets, UML ADs, EPCs and BPEL, are very useful in environments that are stable and where the decision procedures can be predefined. Participants can be guided based on such process models. However, they are less appropriate for environments that are more variable and that require more flexibility. Consider, for instance, a physician in a hospital confronted with a variety of patients that need to be handled in a flexible manner. Nevertheless, there are some general regulations and guidelines to be followed. In such cases, business rules are more effective than imperative process models. In comparison to imperative approaches, which produce “closed” models (what is not explicitly specified is forbidden), declarative languages are “open” (everything that is not forbidden is allowed). In this way, models offer flexibility and still remain compact. There are several types of business rules in the literature that can be used to describe a business process. This thesis aims at providing a framework to classify them. The possible types of business rules are harvested based on a systematic literature review.

[1] Maja Pesic, Helen Schonenberg, Wil M. P. van der Aalst: DECLARE: Full Support for Loosely-Structured Processes. EDOC 2007: 287-300

Proposed by F. M. Maggi

email: f.m.maggi@ut.ee

Deviance Mining of Business Processes

Deviant business process executions are those that deviate in a negative or positive way with respect to normative or desirable outcomes, such as executions that undershoot or exceed performance targets. There are classification methods that can be used to discriminate between normal and deviant executions. In particular, they can be used to discover rules that explain potential causes of observed deviances. In this thesis, an approach for deviance mining of business processes will be implemented in the process mining tool ProM and evaluated in real-life case studies.

[1] Hoang Nguyen, Marlon Dumas, Marcello La Rosa, Fabrizio Maria Maggi, Suriadi Suriadi: Business Process Deviance Mining: Review and Evaluation. CoRR abs/1608.08252 (2016)

Proposed by F. M. Maggi

email: f.m.maggi@ut.ee

A Process Modeling Tool for Hybrid Process Models

The declarative-procedural dichotomy is highly relevant when choosing the most suitable process modeling language to represent a business process. Less-structured processes with a high level of variability can be described in a more compact way using a declarative language. By contrast, procedural process modeling languages seem more suitable to describe structured and stable processes. However, in various cases, a process may incorporate parts that are better captured in a declarative fashion, while other parts are more suitable to be described procedurally. In these scenarios, hybrid models are the best choice for describing business processes. In this thesis, starting from a well-defined formal semantics of hybrid models, a tool for modelling hybrid processes will be implemented and evaluated in real-life case studies.

[1] Tijs Slaats, Dennis M. M. Schunselaar, Fabrizio Maria Maggi, Hajo A. Reijers: The Semantics of Hybrid Process Models. OTM Conferences 2016: 531-551

Proposed by F. M. Maggi

email: f.m.maggi@ut.ee

Monitoring of Business Processes with Fuzzy Logic

In different research fields, a recurring research issue has been to establish whether the external, observed behavior of an entity conforms to some rules/specifications/expectations. Most of the available systems, however, provide only simple yes/no answers to the conformance question. Some works introduce the idea of gradual conformance, expressed in fuzzy terms: the conformance degree of a process execution is represented through a fuzzy score. In this thesis, we provide an approach to monitor process executions and give diagnostics at runtime based on fuzzy conformance. The approach will be implemented in the process mining tool ProM and evaluated in real-life case studies.
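The difference between a yes/no answer and a fuzzy score can be sketched for a single deadline rule. The linear membership function, the `tolerance` parameter and the use of the minimum to combine several scores are illustrative assumptions, not the formulation of the reference below.

```python
def fuzzy_deadline_conformance(elapsed, deadline, tolerance):
    """Degree in [0, 1] to which a completion time conforms to a deadline.

    The score is 1.0 up to the deadline and falls linearly to 0.0 at
    deadline + tolerance. The linear shape is an illustrative assumption.
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    if elapsed <= deadline:
        return 1.0
    if elapsed >= deadline + tolerance:
        return 0.0
    return 1.0 - (elapsed - deadline) / tolerance


def overall_conformance(scores):
    """Combine per-rule scores with the minimum, a common fuzzy conjunction."""
    return min(scores, default=1.0)


# A task finished 2 hours after a 10-hour deadline, with 4 hours of tolerance.
late = fuzzy_deadline_conformance(elapsed=12, deadline=10, tolerance=4)
on_time = fuzzy_deadline_conformance(elapsed=8, deadline=10, tolerance=4)
print(late, on_time, overall_conformance([late, on_time]))  # 0.5 1.0 0.5
```

A crisp checker would report the late task as a plain violation, whereas the fuzzy score distinguishes a slight delay from a severe one.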

[1] Stefano Bragaglia, Federico Chesani, Paola Mello, Marco Montali, Davide Sottara: Fuzzy Conformance Checking of Observed Behaviour with Expectations. AI*IA 2011: 80-91

Proposed by F. M. Maggi

email: f.m.maggi@ut.ee

Generation of Logs from Declarative Process Models

In the process mining field, several techniques have been developed in recent years for the discovery of declarative process models from event logs. This type of model describes processes on the basis of temporal constraints. Every behavior that does not violate such constraints is allowed, and this characteristic has proven to be suitable for representing highly flexible processes. One way to test a process discovery technique is to generate an event log by simulating a process model, and then verify that the process discovered from such a log matches the original one. For this reason, a tool for generating event logs starting from declarative process models becomes vital for the evaluation of declarative process discovery techniques. In this thesis, we develop an approach for the automated generation of event logs, starting from process models that are based on Declare, one of the most used declarative modeling languages in the process mining literature.
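To illustrate what generating a log from a set of temporal constraints means, the sketch below uses a naive generate-and-filter strategy: it draws random traces and keeps those that violate no constraint. The two constraint templates are simplified readings of Declare's response and precedence, and every function name is hypothetical. A practical generator would need something more efficient than rejection sampling.

```python
import random


def response(trigger, target):
    """Constraint: every occurrence of trigger is eventually followed by target."""
    def holds(trace):
        return all(target in trace[i + 1:] for i, a in enumerate(trace) if a == trigger)
    return holds


def precedence(first, then):
    """Constraint: then may occur only after first has occurred."""
    def holds(trace):
        return all(first in trace[:i] for i, a in enumerate(trace) if a == then)
    return holds


def generate_log(activities, constraints, n_traces, max_len, seed=0):
    """Draw random traces and keep only those that satisfy every constraint.

    The loop ends only if the constraints admit at least one trace of length
    between 1 and max_len over the given activities.
    """
    rng = random.Random(seed)
    log = []
    while len(log) < n_traces:
        trace = [rng.choice(activities) for _ in range(rng.randint(1, max_len))]
        if all(holds(trace) for holds in constraints):
            log.append(trace)
    return log


model = [response("a", "b"), precedence("a", "c")]
for trace in generate_log(["a", "b", "c"], model, n_traces=5, max_len=6):
    print(trace)
```

Because the model is open, any trace that passes the filter is valid, including traces in which neither "a" nor "c" occurs at all.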

[1] Claudio Di Ciccio, Mario Luca Bernardi, Marta Cimitile, Fabrizio Maria Maggi: Generating Event Logs Through the Simulation of Declare Models. EOMAS@CAiSE 2015: 20-36

Proposed by F. M. Maggi

email: f.m.maggi@ut.ee

A Systematic Survey of the GDPR-compliance Oriented Methods

Starting from May 2018, organisations will have to comply with the GDPR when processing personal data. The goal of this systematic literature review is to understand the extent of the compliance methods that could help business organisations meet this requirement. The starting papers to get acquainted with the problem could be:

[1] M. Robol, M. Salnitri and P. Giorgini, “Toward GDPR-Compliant Socio-Technical Systems: Modeling Language and Reasoning Framework,” in IFIP Working Conference on The Practice of Enterprise Modeling, Leuven, 2017.

[2] M.-L. Alaküla and R. Matulevicius, “An Experience Report of Improving Business Process Compliance Using Security Risk-Oriented Patterns,” in The Practice of Enterprise Modeling 8th IFIP WG 8.1. Working Conference, Valencia, 2015.

[3] V. Diamantopoulou, K. Angelopoulos, M. Pavlidis and H. Mouratidis, “A Metamodel for GDPR-based Privacy Level Agreements,” in Proceedings of the ER Forum 2017 and the ER 2017 Demo Track, Valencia, 2017.

[4] J. Becker, M. Heddiger, S. Brauer and R. Knackstedt, “Integrating Regulatory Requirements into Information Systems Design and Implementation,” in The International Conference on Information Systems, Auckland, 2014.

[5] S. Islam, H. Mouratidis and J. Jürjens, “A framework to support alignment of secure software engineering with legal regulations,” Software & Systems Modeling, vol. 10, no. 3, pp. 369-394, July 2011.

[6] L. T. Ly, F. M. Maggi, M. Montali, S. Rinderle-Ma and W. M. P. van der Aalst, “Compliance monitoring in business processes: Functionalities, application, and tool-support,” Information Systems, vol. 54, no. 1, pp. 209-234, 2015.

[7] J. García-Galán, L. Pasquale, G. Grispos and B. Nuseibeh, “Towards adaptive compliance,” in SEAMS '16 Proceedings of the 11th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, Texas, 2016.

Proposed by R. Matulevicius

email: rma@ut.ee

A Systematic Survey of the Security and Security Risks in the Internet of Things Systems

Security is an important cornerstone when designing and implementing Internet of Things (IoT) systems. The goal of this systematic study is to understand what the typical security and privacy risks are and how they are mitigated in IoT systems. The starting points for getting acquainted with the problem could be:

[1] M. Ammar, G. Russello, B. Crispo, Internet of Things: A survey on the security of IoT frameworks, Journal of Information Security and Applications, Volume 38, February 2018, Pages 8-27

[2] Fadele Ayotunde Alaba, Mazliza Othman, Ibrahim Abaker Targio Hashem, Faiz Alotaibi, Internet of Things security: A survey, Journal of Network and Computer Applications, Volume 88, 15 June 2017, Pages 10-28

[3] Arbia Riahi Sfar, Enrico Natalizio, Yacine Challal, Zied Chtourou, A roadmap for security challenges in the Internet of Things, Digital Communications and Networks, In press, corrected proof, Available online 13 April 2017

Proposed by R. Matulevicius

email: rma@ut.ee

A Systematic Survey of the Cognitive Visual Notations for the Information Systems Modelling

Nowadays there exist a number of proposals to improve modelling languages with cognitively effective notations [1]. Examples of these studies cover the Unified Modelling Language [2], process modelling languages [3], languages for goal modelling [4], and languages for function modelling [5]. The goal of this work is to perform a systematic survey to understand the extent of the visual notation analysis.

[1] Moody, D. (2009a). The “physics” of notations: toward a scientific basis for constructing visual notations in software engineering. IEEE Transactions on Software Engineering, 35(6), 756–779.

[2] Moody, D., & van Hillegersberg, J. (2008). Evaluating the visual syntax of UML: An analysis of the cognitive effectiveness of the UML family of diagrams. In International Conference on Software Language Engineering (pp. 16–34). Springer.

[3] Genon, N., Heymans, P., & Amyot, D. (2010). Analysing the cognitive effectiveness of the BPMN 2.0 visual notation. In International Conference on Software Language Engineering (pp. 377–396). Springer. Retrieved from https://link.springer.com/chapter/10.1007/978-3-642-19440-5_25

[4] Moody, D., Heymans, P., & Matulevicius, R. (2010). Visual syntax does matter: improving the cognitive effectiveness of the i* visual notation. Requirements Engineering, 15(2), 141–175. https://doi.org/10.1007/s00766-010-0100-1

[5] Saleh F., El-Attar M., A scientific evaluation of the misuse case diagrams visual syntax, Journal Information and Software Technology, Volume 66 Issue C, October 2015, pp. 73-96

Proposed by R. Matulevicius

email: rma@ut.ee

A Systematic Survey of Information Systems Privacy Modelling Languages

Privacy modelling and design is an important activity in modern information system development. There exist a number of modelling languages [1] [2] [3] that provide means to capture a few major privacy concerns. The goal of this systematic survey is to understand the extent to which current modelling languages can cover the privacy modelling and analysis domain.

[1] Ladha, W., Mehandjiev, N., Sampaio, P.: Modelling of privacy-aware business processes in BPMN to protect personal data. In: Proceedings of the 29th Annual ACM Symposium on Applied Computing, pp. 1399–1405 (2014)

[2] Mouratidis, H., Kalloniatis, C., Islam, S., Hudic, A., Zechner, L.: Model based process to support security and privacy requirements engineering. Int. J. Secur. Softw. Eng. 3(3), 1–22 (2012)

[3] Pullonen P., Matulevicius R., Bogdanov D., PE-BPMN: Privacy-Enhanced Business Process Model and Notation., BPM 2017, LNCS 10445, pp. 40–56, 2017

Proposed by R. Matulevicius

email: rma@ut.ee

Impact of Business Process Compliance on Information Systems

The introduction of the GDPR in May this year shook up businesses across Europe regardless of their industry. While there are several approaches to business process compliance itself, the events that followed its announcement exposed the need for a standardized, or at the very least, structured approach to policy impact evaluation and implementation in information systems. Such an approach would combine technical and legal viewpoints to guide organizational decision makers. For this seminar, the student will be required to perform a literature review and present the state of the art in business process compliance, focused on managing its technical impact.

[1] http://ieeexplore.ieee.org/document/7182466/

[2] https://www.sciencedirect.com/science/article/pii/S0740624X11000700?via%3Dihub

Proposed by J. Tom and R. Matulevicius

email: rma@ut.ee

Types of Hackathon Events

Hackathons started out as time-bounded competitive events during which young developers formed small ad-hoc teams and engaged in short-term intense collaboration on software projects for pizza and sometimes the prospect of a future job. In recent years, however, hackathons have increasingly diversified. There is now a large variety of events aiming at creating innovative prototypes for arts and culture, medicine, and civic open innovation, as well as events that are aimed at strengthening interaction in specific scientific domains and teaching specific skills. Your task is to provide a comprehensive overview of the different types of hackathon events. This overview should include the goals of the different events as well as the specific way they are organized.

[1] Esteve Almirall, Melissa Lee, and Ann Majchrzak. 2014. Open innovation requires integrated competition-community ecosystems: Lessons learned from civic open innovation. Business Horizons 57, 3 (2014), 391–400.

[2] Hoang, C., J. Liu, Z. Bokhari, and A. Chan (2016): ‘IBM 2016 community hackathon’. In: Proceedings of the 26th Annual International Conference on Computer Science and Software Engineering. pp. 331–332.

[3] Komssi, M., D. Pichlis, M. Raatikainen, K. Kindström, and J. Järvinen (2015): ‘What are Hackathons for?’. IEEE Software, vol. 32, no. 5, pp. 60–67.

[4] Nandi, A. and M. Mandernach (2016): ‘Hackathons as an informal learning platform’. In: Proceedings of the 47th ACM Technical Symposium on Computing Science Education. pp. 346–351.

Proposed by A. Nolte

email: alexander.nolte@udo.edu

Hackathon Outcomes

Research on hackathons is just emerging. Most current work focuses on how to organize them and how to manage group dynamics during phases of intensive collaboration. There is little to no research on the outcomes of hackathons so far. Your task is to find work on hackathon outcomes and categorize it along the following three dimensions: hackathon projects, teams, and individual participants. The description should contain information about the specifics of how each event was organized.

[1] Baccarne, B., P. Mechant, D. Schuurma, L. De Marez, and P. Colpaert (2014): ‘Urban socio-technical innovations with and by citizens’. Interdisciplinary Studies Journal, vol. 3, no. 4, pp. 143.

[2] Cobham, D., B. Hargrave, K. Jacques, C. Gowan, J. Laurel, S. Ringham, et al. (2017): ‘From hackathon to student enterprise: an evaluation of creating successful and sustainable student entrepreneurial activity initiated by a university hackathon’.

[3] Ruiz-Garcia, A., L. Subirats, and A. Freire. ‘Lessons learned in promoting new technologies and engineering in girls through a girls hackathon and mentoring’. Health Sciences, vol. 164, pp. 72–573.

[4] Tandon, J., R. Akhavian, M. Gumina, and N. Pakpour (2017): ‘CSU East Bay Hack Day: A University hackathon to combat malaria and zika with drones’. In: Global Engineering Education Conference (EDUCON), 2017 IEEE. pp. 985–989.

Proposed by A. Nolte

email: alexander.nolte@udo.edu

Comparing Business Process Variants Based on Their Event Logs

Companies often manage multiple variants of the same business process, like for example multiple variants of an order-to-cash process for different classes of customers (e.g. retail customers vs. wholesale customers) or for different products. It is sometimes useful to understand why one variant of a process performs better or worse than another one. To this end, it is useful to compare the event logs produced during the executions of these processes, in order to detect differences that might explain why one variant performs better than the other.
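One very simple way to compare two event logs is to look at which activity pairs directly follow each other in one variant but never in the other. The sketch below does exactly that and nothing more. The function names and the toy order-to-cash traces are made up for illustration, and the references below describe considerably richer comparison techniques.

```python
from collections import Counter


def directly_follows(log):
    """Count how often each activity is directly followed by another across a log."""
    counts = Counter()
    for trace in log:
        counts.update(zip(trace, trace[1:]))
    return counts


def compare_variants(log_a, log_b):
    """Return the directly-follows pairs seen in only one of the two logs."""
    df_a, df_b = directly_follows(log_a), directly_follows(log_b)
    return set(df_a) - set(df_b), set(df_b) - set(df_a)


# Toy logs for two hypothetical variants of an order-to-cash process.
retail = [["order", "pay", "ship"], ["order", "pay", "ship"]]
wholesale = [["order", "credit check", "ship", "pay"]]

only_retail, only_wholesale = compare_variants(retail, wholesale)
print(sorted(only_retail))     # [('order', 'pay'), ('pay', 'ship')]
print(sorted(only_wholesale))  # [('credit check', 'ship'), ('order', 'credit check'), ('ship', 'pay')]
```

Differences of this kind are only candidate explanations: whether they account for a performance gap still has to be checked against timing or outcome data.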

[1] Alfredo Bolt, Massimiliano de Leoni, Wil M. P. van der Aalst: A Visual Approach to Spot Statistically-Significant Differences in Event Logs Based on Process Metrics. CAiSE 2016: 151-166

[2] N. R. T. P. van Beest, Marlon Dumas, Luciano García-Bañuelos, Marcello La Rosa: Log Delta Analysis: Interpretable Differencing of Business Process Event Logs. BPM 2015: 386-405

[3] Suriadi Suriadi, Moe Thandar Wynn, Chun Ouyang, Arthur H. M. ter Hofstede, Nienke J. van Dijk: Understanding Process Behaviours in a Large Insurance Company in Australia: A Case Study. CAiSE 2013: 449-464

[4] https://arxiv.org/abs/1608.08252

Proposed by M. Dumas

email: marlon.dumas@ut.ee

Customer Churn Prediction Based on Social Ties

Customer churn refers to the phenomenon of customers discontinuing their regular purchase of a product or service such as canceling their subscription to a telecommunications service (e.g. canceling a mobile phone subscription). Many data mining methods for churn prediction have been studied over time. In this literature review, you are asked to focus on churn prediction approaches that use social ties between consumers of the service, for example:

[1] W. Verbeke, D. Martens, and B. Baesens. Social network analysis for customer churn prediction. Applied Soft Computing, 14:431–446, 2014

[2] Yossi Richter, Elad Yom-Tov, Noam Slonim. Predicting Customer Churn in Mobile Networks through Analysis of Social Groups. In Proc. of SDM 2010, pp. 732-741.

[3] Koustuv Dasgupta, Rahul Singh, Balaji Viswanathan, Dipanjan Chakraborty, Sougata Mukherjea, Amit Anil Nanavati, Anupam Joshi. Social ties and their relevance to churn in mobile telecom networks. Proc. of EDBT 2008, pp. 668-677

[4] https://dl.acm.org/citation.cfm?id=3105832

Proposed by M. Dumas

email: marlon.dumas@ut.ee

Queue Mining: Data Mining to Analyze Queuing Effects in Business Processes

Queues caused by resource contention (tasks waiting for available resources) are a major source of delays during the execution of day-to-day business processes. A number of data mining techniques to analyze queues based on business process execution logs have been proposed recently, for example:

[1] Luc de Smet. Queue Mining: Combining Process Mining and Queueing Analysis to Understand Bottlenecks, to Predict Delays, and to Suggest Process Improvements. Masters Thesis, Eindhoven University of Technology, The Netherlands, 2014. - http://alexandria.tue.nl/extra1/afstversl/wsk-i/Smet_de_2014.pdf

[2] Arik Senderovich, Sander J. J. Leemans, Shahar Harel, Avigdor Gal, Avishai Mandelbaum, Wil M. P. van der Aalst: Discovering Queues from Event Logs with Varying Levels of Information. Business Process Management Workshops 2015: 154-166

[3] Arik Senderovich, Matthias Weidlich, Avigdor Gal, Avishai Mandelbaum. Queue Mining - Predicting Delays in Service Processes. In Proceedings of CAiSE'2014, Springer, pp. 42-57.

[4] Arik Senderovich, Matthias Weidlich, Avigdor Gal, Avishai Mandelbaum, Sarah Kadish, Craig A. Bunnell: Discovery and Validation of Queueing Networks in Scheduled Processes. CAiSE 2015: 417-433

Proposed by M. Dumas

email: marlon.dumas@ut.ee

Data Race Detection

Data races occur when concurrent threads in a program access the same data in a way that creates non-determinism with undesirable effects. Numerous techniques and tools for data race detection have been developed in the past two decades, but only recently some of them have reached maturity and have been used on a large scale. Write a synthesis of the state of the art in the field. Below are some initial pointers to the literature. I do not provide full bibliographic details, but you should be able to retrieve them in the DBLP bibliographic database or in the papers themselves.

[1] http://dslab.epfl.ch/pubs/portend.pdf

[2] http://dl.acm.org/citation.cfm?id=1791203

[3] http://research.google.com/pubs/pub37123.html

[4] http://dl.acm.org/citation.cfm?id=2688205

[5] http://homepages.inf.ed.ac.uk/dts/students/spathoulas/spathoulas.pdf
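For readers new to the topic, the following minimal Python sketch shows the kind of defect these tools look for: a shared counter updated by several threads with and without synchronization. The class and function names are hypothetical, and the example is not taken from any of the pointers above.

```python
import threading


class SharedCounter:
    """Counter shared between threads, with a racy and a synchronized update."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def unsafe_increment(self):
        # Read-modify-write without synchronization: two threads may read the
        # same old value, so one of the two updates can be lost (a data race).
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # The lock makes the read-modify-write atomic with respect to other threads.
        with self.lock:
            self.value += 1


def run(increment, n_threads=4, n_iterations=10_000):
    """Call increment n_iterations times from each of n_threads threads."""
    def worker():
        for _ in range(n_iterations):
            increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


safe = SharedCounter()
run(safe.safe_increment)
print(safe.value)  # 40000

racy = SharedCounter()
run(racy.unsafe_increment)
print(racy.value)  # at most 40000; lost updates can make it smaller, and it may vary between runs
```

The racy version often produces the right answer anyway, which is why such defects are hard to find by testing alone and why dedicated detection techniques exist.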

Proposed by M. Dumas

email: marlon.dumas@ut.ee

Credit Scoring Using Social Network Data

Companies such as Lenddo and Cignifi use social media data for credit scoring. In your seminar paper, you will review emerging technology, methods and initial practical experiences related to the development of credit scoring models using social network data. Below are some initial pointers:

[1] http://pubsonline.informs.org/doi/abs/10.1287/mksc.2015.0949

[2] http://essay.utwente.nl/68338/

[3] http://people.stern.nyu.edu/bakos/wise/papers/wise2009-p09_paper.pdf

[4] http://e3journals.org/cms/articles/1330783408_Wanting.pdf

[5] https://www.google.com/patents/US5870721

[6] https://www.google.com/patents/US8806584

Proposed by M. Dumas

email: marlon.dumas@ut.ee

Mining Behavior Models and Business Process Models from Web Applications

In contemporary information systems, a large proportion of business processes are supported by Web applications used by employees across an enterprise as well as customers and business partners. As these Web applications grow and evolve, their behavior becomes increasingly complex to the point that no stakeholder in the company has a full detailed understanding of the business process. Yet, this understanding is necessary to maintain and improve the business process. This observation has motivated a range of research that aims to develop methods and tools for discovering (or "reverse engineering") behavior models and business process models from Web applications. Your task is to review literature in this field and provide a synthesis of the main approaches and their intended applications. Below are some initial pointers to the literature. I do not provide full bibliographic details, but you should be able to retrieve them in the DBLP bibliographic database or in the papers themselves.

[1] http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4812747&tag=1

[2] http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6258316

[3] http://link.springer.com/chapter/10.1007/978-3-642-40176-3_7

[4] http://dl.acm.org/citation.cfm?id=2568234

[5] http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7169616

Proposed by M. Dumas

email: marlon.dumas@ut.ee

Robotic Business Process Automation

Robotic process automation refers to the emerging practice of automating business processes by means of software systems that engage in complex interactions or take complex decisions which would traditionally be performed by human actors. Several pilot deployments and case studies of robotic process automation have been reported in recent years. Your task is to synthesize the ongoing research and developments in this field. Below are some initial pointers to the literature. I do not provide full bibliographic details, but you should be able to retrieve them in the DBLP bibliographic database or in the papers themselves.

[1] https://tinyurl.com/y8qxt8jt

[2] http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1261370

[3] http://eprints.lse.ac.uk/64519/

[4] http://eprints.lse.ac.uk/64518/

[5] http://eprints.lse.ac.uk/64516/

Proposed by M. Dumas

email: marlon.dumas@ut.ee

Predicting Business Processes Completion Time

Predictive monitoring techniques aim at making predictions about ongoing executions of a business process (i.e. active cases). One specific predictive monitoring task is that of predicting the completion time of an active case. Several techniques have been proposed to address this predictive task. Write a synthesis of the state of the art in the field. Below are some initial pointers to the literature. I do not provide full bibliographic details, but you should be able to retrieve them in the DBLP bibliographic database or in the papers themselves (note: these articles can be downloaded from inside the university network or VPN).

[1] http://kodu.ut.ee/~dumas/pubs/icssp2017whitebox.pdf

[2] http://www.sciencedirect.com/science/article/pii/S0306437915000642

[3] http://link.springer.com/chapter/10.1007/978-3-642-45005-1_27

[4] http://www.sciencedirect.com/science/article/pii/S0306437910000864

[5] http://link.springer.com/chapter/10.1007/978-3-540-88871-0_22

Proposed by M. Dumas

email: marlon.dumas@ut.ee

Is the Media Biased? An Empirical Analysis

News channels often portray news stories from their own perspectives. It has been observed that media houses in particular are biased towards specific topics, people and political parties. In this thesis, you will analyze a set of news stories drawn from different news websites (such as BBC, CNN etc.). The aim of the study is to explore whether news channels are biased towards specific 1) topics, 2) people or 3) political parties. You will use data science techniques (such as opinion mining and machine learning) to perform the empirical analysis.

[1] Robert M. Entman. Media framing biases and political power: Explaining slant in news of Campaign 2008. Journalism. Vol 11, Issue 4, pp. 389 - 408

[2] David Niven. Bias in the News: Partisanship and Negativity in Media Coverage of Presidents George Bush and Bill Clinton. International Journal of Press and Politics. Vol 6, Issue 3, pp. 31 - 46

Proposed by R. Sharma

email: rajesh.sharma@ut.ee

Understanding Filter bubbles in social media networks

A filter bubble is an algorithmic bias that skews or limits the information an individual user sees on the internet. The bias is caused by the weighted algorithms that search engines, social media sites and marketers use to personalize the user experience. The concept is particularly important in creating opinionated individuals. In this thesis, a study will be performed to understand the effect of filter bubbles on social media users.

[1] Mario Haim, Andreas Graefe & Hans-Bernd Brosius, Burst of the Filter Bubble? Effects of personalization on the diversity of Google News. Journal Digital Journalism.

[2] Nguyen, Tien T. and Hui, Pik-Mai and Harper, F. Maxwell and Terveen, Loren and Konstan, Joseph A. Exploring the Filter Bubble: The Effect of Using Recommender Systems on Content Diversity, WWW 2014

Proposed by R. Sharma

email: rajesh.sharma@ut.ee

Analyzing Echo Chambers in Social Networks

An echo chamber is a metaphorical description of a situation in which information, ideas, or beliefs are amplified or reinforced by communication and repetition inside a defined system. In this thesis, we will investigate echo chambers in social media platforms such as Twitter or Facebook and their effect on social media users. Techniques such as network science and machine learning will be explored for understanding echo chambers in social media.

[1] Eric Gilbert, Tony Bergstrom, and Karrie Karahalios. 2009. Blogs are echo chambers: Blogs are echo chambers. In 42nd Hawaii International Conference on System Sciences. IEEE, 1–10.

[2] Eric Lawrence, John Sides, and Henry Farrell. 2010. Self-segregation or deliberation? Blog readership, participation, and polarization in American politics. Perspectives on Politics 8, 1 (2010), 141–157

Proposed by R. Sharma

email: rajesh.sharma@ut.ee

Left, Center or Right?: Controversial groups on Social Media

Description: With respect to political views, social media users can often be classified broadly into three categories, namely left, right or center. In this thesis, users of social media platforms, in particular platforms like Facebook, will be studied anonymously. The crux of the problem will be to predict users' inclination towards right, center or left political parties. Data science techniques such as network science, machine learning and sentiment analysis will be explored for this prediction problem.

[1] Gottipati S., Qiu M., Yang L., Zhu F., Jiang J. (2013) Predicting User’s Political Party Using Ideological Stances. In: Jatowt A. et al. (eds) Social Informatics. SocInfo 2013. Lecture Notes in Computer Science, vol 8238. Springer,

[2] Michael D. Conover, Bruno Gonçalves, Jacob Ratkiewicz, Alessandro Flammini and Filippo Menczer. Predicting the Political Alignment of Twitter Users. IEEE Third International Conference on Social Computing (SocialCom), 2011.

[3] Aaron Acosta (ateam91), Silviana Ciurea-Ilcus (smci), Michal Wegrzynski (michalw). Predicting users’ political support from their Reddit comment history . goo.gl/Gchzyw

[4] Daniel Xiaodan Zhou, Paul Resnick, Qiaozhu Mei. Classifying the Political Leaning of News Articles and Users from User Votes. Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media 2011

[5] Waheed H, Anjum M, Rehman M, Khawaja A (2017) Investigation of user behavior on social networking sites. PLoS ONE 12(2)

Proposed by R. Sharma

email: rajesh.sharma@ut.ee

Measuring Corporate Reputation through Online Social Media

When businesses are caught engaging in illegal or immoral activities, their reputation might suffer. Corporate reputation is a reflection of how a business is regarded by its customers and the public in general. If corporate misbehaviour negatively affects a business’ reputation, customers might switch to rival businesses. For this reason, reputation plays a central role in free markets, as it has the potential to deter businesses from misbehaving. The extent to which corporate wrongdoing triggers a reputational loss is still debated and is the subject of a large body of academic work. Most of these works rely on survey methods to measure reputation. This research relies on a more direct method to measure reputational changes, by conducting a sentiment analysis of how the public reacted on Twitter to some of the most high-profile cases of corporate misconduct. In this thesis, corporate reputation will be studied using the Volkswagen (VW) scandal as a case study, together with the public reaction it created on Twitter. The VW scandal has been chosen because it has been widely covered over time by both traditional and social media. Moreover, we can measure how changes in media coverage and social media reaction affected VW’s financial performance. The dataset and related literature will be provided to speed up the work.

[1] Corné Dijkmans, Peter Kerkhof, Camiel J. Beukeboom, A stage to engage: Social media use and corporate reputation, 2014.

[2] Nadine Gatzert, The impact of corporate reputation and reputation damaging events on financial performance: Empirical evidence from the literature, 2015.

Proposed by R. Sharma and P. Ormosi

email: rajesh.sharma@ut.ee

Behavior analysis of bike users in a city setting

As part of smart city initiatives, authorities are creating separate lanes and bicycle racks for cyclists. However, the key question is how well these resources, put in place for better traffic management, are actually utilized. In this thesis, we will analyze a real dataset from an Italian city with a population of 385,192 inhabitants. The dataset covers a period of 6 months, from April 2017 to September 2017. We will predict users' behavior in terms of their use of these resources. We expect you to use data science and machine learning techniques. The dataset is the property of SRM Reti e Mobilità Srl, and all analysis must respect the NDA, preserving the privacy and anonymity of the users.

[1] Gabriel Martins Dias, Boris Bellalta and Simon Oechsner. Predicting Occupancy Trends in Barcelona’s Bicycle Service Stations Using Open Data. SAI Intelligent Systems Conference 2015

[2] Eoin O’Mahony, David B. Shmoys. Data Analysis and Optimization for (Citi)Bike Sharing. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence 2015

Proposed by R. Sharma, F. Bertini and S. Rizzo

email: rajesh.sharma@ut.ee

Tracking Unusual Activities in Traffic Police Data

Sensors and cameras placed on highways are meant to keep traffic flowing without disruption, to detect accidents and to enable appropriate and timely action. However, the traffic data can also be used for detecting unusual activities. In this thesis, you will analyze large-scale traffic police data shared by the Italian authorities to detect anomalies or unusual behavior on highways. The dataset will be provided. We expect you to find unusual behavior in this traffic data using machine learning techniques, in particular anomaly detection techniques.

[1] Liang Xiong, Xi Chen, Jeff Schneider. Direct Robust Matrix Factorization for Anomaly Detection. International Conference on Data Mining (ICDM), 2011.

[2] Hamid Palangi, Li Deng, Yelong Shen, Jianfeng Gao, Xiaodong He, Jianshu Chen, Xinying Song, Rabab Ward. Deep Sentence Embedding Using Long Short-Term Memory Networks: Analysis and Application to Information Retrieval. https://arxiv.org/pdf/1502.06922.pdf

[3] Jefferson Ryan Medel. Anomaly Detection Using Predictive Convolutional Long Short-Term Memory Units. http://scholarworks.rit.edu/theses/9319/

Proposed by R. Sharma, F. Bertini and S. Rizzo

email: rajesh.sharma@ut.ee

Analysing Customer Behavior Using Purchase Data

The purchasing transactions performed by customers can provide valuable insights into the behavior of individuals. For example, transactions can reveal the purchasing power (money) as well as the eating habits of users. In this thesis, you will analyze large-scale customer transaction data, using data science techniques, to predict customer behavior.

[1] Diego Pennacchioli, Michele Coscia, Salvatore Rinzivillo, Dino Pedreschi, Fosca Giannotti. Explaining the Product Range Effect in Purchase Data BigData 2013

[2] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” SIGMOD Rec., vol. 22, no. 2, pp. 207–216, Jun. 1993.

Proposed by R. Sharma

email: rajesh.sharma@ut.ee

Analyzing Question-Answering System: Quora Case Study

Question answering systems (QASs) generate answers to questions asked in natural language. Early QASs were developed for restricted domains and had limited capabilities. Platforms like Quora have since helped to diminish these boundaries. In this thesis, using Quora as a case study, you will perform user analytics to understand the reasons behind the success of a platform where "all kind of questions are welcome". We expect you to conduct an empirical study of the platform's users.

[1] Abdelghani Bouziane, Djelloul Bouchiha, Noureddine Doumi, Mimoun Malki. Question Answering Systems: Survey and Trends. Procedia Computer Science

[2] Albert Tung and Eric Xu, Determining Entailment of Questions in the Quora Dataset. Class Reports https://web.stanford.edu/class/cs224n/reports/2748301.pdf

Proposed by R. Sharma

email: rajesh.sharma@ut.ee

Vulnerability Analysis in Multilayer Networks: A Data Science Approach

Description: Traditionally, networks are studied individually. In other words, studies do not consider the relationships that might exist between two or more connected networks. For example, some social media users are present not only on Facebook but also on Twitter and other social media platforms such as Instagram [1]. As another example, related to transportation, it is more prudent to analyse various modes of mobility such as road, rail and air jointly [2]. A bird's-eye view of a collection of such networks is called a multilayer network (ML), encompassing the multitude of related individual networks [3]. In such ML networks, we would like to study the problem of vulnerability analysis, that is, how robust a (complex) network is against any kind of breakage. The breakage could occur due to natural or man-made events (for example, earthquakes, accidents, riots). In this master's thesis, by focusing on transportation networks as a use case, vulnerability analysis will be studied by considering and possibly defining novel metrics adapted to the context of multilayer networks (e.g., centrality, risk probability, etc.), building on previous work from the team [4, 5].

[1] Mikko Kivelä, Alex Arenas, Marc Barthelemy, James P. Gleeson, Yamir Moreno, Mason A. Porter; Multilayer networks, Journal of Complex Networks, Volume 2, Issue 3, 1 September 2014, Pages 203–271

[2] Aleta, Alberto, Sandro Meloni, and Yamir Moreno. "A Multilayer perspective for the analysis of urban transportation systems." Scientific reports 7 (2017): 44359.

[3] Kivelä, Mikko, et al. "Multilayer networks." Journal of complex networks 2.3 (2014): 203-271.

[4] A Furno, NE El Faouzi, R Sharma, E Zimeo. Two-level Clustering Fast Betweenness Centrality Computation for Requirement-driven Approximation. IEEE Big Data 2017

Proposed by R. Sharma and A. Furno

email: rajesh.sharma@ut.ee

Replication of Empirical Software Engineering Case Study Experiments

The empirical software engineering community publishes many case studies validating different approaches and analytical algorithms for software engineering. Unfortunately, these studies are rarely validated by independent replication. To make matters worse, the studies use different validation metrics, which makes them incomparable. Thus, your mission, should you choose to accept it, is to analyse different published case studies on one topic (e.g. bug detection, code churn estimation) to evaluate their replicability and replicate the studies in order to make them comparable. In short you will: 1. envisage a workflow/pipeline for replicating published studies (including testing for replicability); 2. use the workflow to replicate several studies; 3. validate these studies and compare their results on a common scale.

[1] Le Goues, C., Dewey-Vogt, M., Forrest, S., & Weimer, W. (2012, June). A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each. In Software Engineering (ICSE), 2012 34th International Conference on (pp. 3-13). IEEE. http://ieeexplore.ieee.org/abstract/document/6227211/

[2] Tian, Y., Lawall, J., & Lo, D. (2012, June). Identifying linux bug fixing patches. In Proceedings of the 34th International Conference on Software Engineering (pp. 386-396). IEEE Press. https://dl.acm.org/citation.cfm?id=2337269

[3] Kagdi, H., Collard, M. L., & Maletic, J. I. (2007). A survey and taxonomy of approaches for mining software repositories in the context of software evolution. Journal of Software: Evolution and Process, 19(2), 77-131. http://onlinelibrary.wiley.com/doi/10.1002/smr.344/full

[4] Thomas, S. W. (2011, May). Mining software repositories using topic models. In Proceedings of the 33rd International Conference on Software Engineering (pp. 1138-1139). ACM. https://dl.acm.org/citation.cfm?id=1986020

Proposed by S. Karus

email: siim.karus@ut.ee

GPU-Accelerated Data Analytics

In this project, a set of GPU-accelerated data mining or analytics algorithms will be implemented as an extension to an analytical database solution. For this task, you will need to learn parallel processing optimisations specific to GPU programming (balancing between bandwidth and processing power), implement the analytics algorithms, and design a user interface to accompany them. As the aim is to provide an extension to analytical databases (preferably MSSQL, Oracle or PostgreSQL), you will also need to learn the extension interfaces of these databases and their native development and BI tools. Finally, you will assess the performance gains of your algorithms compared to comparable algorithms in existing analytical database tools.

[1] Bakkum, P., & Skadron, K. (2010, March). Accelerating SQL database operations on a GPU with CUDA. In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units (pp. 94-103). ACM. https://dl.acm.org/citation.cfm?id=1735706

[2] Breß, S., Heimel, M., Siegmund, N., Bellatreche, L., & Saake, G. (2014). GPU-accelerated database systems: Survey and open challenges. In Transactions on Large-Scale Data-and Knowledge-Centered Systems XV (pp. 1-35). Springer, Berlin, Heidelberg. https://link.springer.com/chapter/10.1007/978-3-662-45761-0_1

[3] Sitaridi, E. A., & Ross, K. A. (2016). GPU-accelerated string matching for database applications. The VLDB Journal, 25(5), 719-740. https://link.springer.com/article/10.1007/s00778-015-0409-y

[4] Karnagel, T., Mueller, R., & Lohman, G. M. (2015). Optimizing GPU-accelerated Group-By and Aggregation. In ADMS@ VLDB (pp. 13-24).

Proposed by S. Karus

email: siim.karus@ut.ee

Graph Reasoning on Software Repositories

Software repositories offer many insights into the software development process. Most of the analytical processes used on software repositories rely heavily on the availability of training data, that is, samples of positive and negative cases. These samples, however, have to be determined by people. This limits the usefulness of the analytics, as people might miss possible relationships or make mistakes in specifying the training output values. Graph reasoning, on the other hand, is a machine learning technique that does not require training data and uses internal rules to find relationships in the data. As such, graph reasoning can be used to discover unwanted or unnoticed patterns in software and software evolution. Your task is to bridge these two disciplines in order to further our understanding of software and its evolution, and perhaps even to improve the quality assurance process.

[1] Kiefer, C., Bernstein, A., & Tappolet, J. (2007, May). Mining software repositories with iSPARQL and a software evolution ontology. In Proceedings of the Fourth International Workshop on Mining Software Repositories (p. 10). IEEE Computer Society. https://dl.acm.org/citation.cfm?id=1269048

[2] Watkins, E. R., & Nicole, D. A. (2006, January). Named graphs as a mechanism for reasoning about provenance. In Asia-Pacific Web Conference (pp. 943-948). Springer, Berlin, Heidelberg. https://link.springer.com/chapter/10.1007/11610113_99

[3] Keivanloo, I., Forbes, C., Hmood, A., Erfani, M., Neal, C., Peristerakis, G., & Rilling, J. (2012, June). A linked data platform for mining software repositories. In Mining Software Repositories (MSR), 2012 9th IEEE Working Conference on (pp. 32-35). IEEE. http://ieeexplore.ieee.org/abstract/document/6224296/

[4] Martinez, M., & Monperrus, M. (2015). Mining software repair models for reasoning on the search space of automated program fixing. Empirical Software Engineering, 20(1), 176-205. https://link.springer.com/article/10.1007/s10664-013-9282-8

Proposed by S. Karus

email: siim.karus@ut.ee

1. Over the past years we have seen how digital technologies have transformed businesses and how we interact with traditional services. Many traditional business models have been disrupted and new business models have emerged. This topic is about investigating the main ways in which digital technologies have enabled new companies to offer the same things as incumbents, but digitally. In other words, how have digital technologies changed business models? Below you will find a few references as a starting point.

[1] Big data for big business? A taxonomy of data-driven business models used by start-up firms PM Hartmann, M Zaki, N Feldmann, 2014 - nsuchaud.fr http://www.nsuchaud.fr/wp-content/uploads/2014/08/Big-Data-for-Big-Business-A-Taxonomy-of-Data-driven-Business-Models-used-by-Start-up-Firm.pdf

[2] Business models in a new digital culture: The open long tail model A Rieple, P Pisano - Symphonya, 2015 - search.proquest.com http://symphonya.unimib.it/article/view/2015.2.06rieple.pisano/10695

[3] Ride on! Mobility business models for the sharing economy B Cohen, J Kietzmann - Organization & Environment, 2014 - journals.sagepub.com http://journals.sagepub.com/doi/abs/10.1177/1086026614546199

[4] Reflections on societal and business model transformation arising from digitization and big data analytics: A research agenda C Loebbecke, A Picot - The Journal of Strategic Information Systems, 2015 - Elsevier https://www.sciencedirect.com/science/article/pii/S0963868715000372

Proposed by F. Milani

email: milani@ut.ee

2. With the emergence of digital technologies, physical presence is no longer the necessity it used to be. Work is conducted from remote places, and teams can be located on different continents but still work together. For instance, elicitation and collaboration within projects can be carried out by virtual teams. This topic aims at investigating how such teams can work and what is required for successful collaboration and “teamwork”. In other words, how is work conducted in virtual teams?

[1] An analysis of virtual team characteristics: A model for virtual project managers S Morley, K Cormican, P Folan - Journal of technology management & …, 2015 - SciELO Chile https://scielo.conicyt.cl/scielo.php?pid=S0718-27242015000100014&script=sci_arttext

[2] Communication Strategies for Successful Virtual Teams S Dunn, C Grannan, M Raisinghani… - … (HICSS), 2015 48th …, 2015 - ieeexplore.ieee.org http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=7069700

Proposed by F. Milani

email: milani@ut.ee

3. The customer perspective is becoming increasingly important. One type of model that captures the user perspective is the “customer journey map”. However, capturing the complexity of customer behaviour and the many ways in which customers can interact with a product or company is not easy. This topic is about how multi- and omni-channel interactions can be captured and managed in customer journey maps.

[1] Mapping customer journeys in multichannel decision-making J Wolny, N Charoensuksai - Journal of Direct, Data and Digital Marketing …, 2014 - Springer https://link.springer.com/article/10.1057/dddmp.2014.24

[2] Managing multi-and omni-channel distribution: metrics and research directions KL Ailawadi, PW Farris - Journal of retailing, 2017 - Elsevier https://www.sciencedirect.com/science/article/pii/S0022435916300823

Proposed by F. Milani

email: milani@ut.ee

4. Digital technologies have changed the way we do our work. Many companies have become digital and many are struggling with digital transformation. As such, companies are at different degrees of “digital maturity”. The question of this topic is simply: how can one measure the digital maturity of businesses?

[1] DIGITAL MATURITY IN TRADITIONAL INDUSTRIES–AN EXPLORATORY ANALYSIS G Remane, A Hanelt, F Wiesboeck, L Kolbe - 2017 - aisel.aisnet.org http://aisel.aisnet.org/cgi/viewcontent.cgi?article=1009&context=ecis2017_rp

[2] Stages in Digital Business Transformation: Results of an Empirical Maturity Study. S Berghaus, A Back - MCIS, 2016 - pdfs.semanticscholar.org https://pdfs.semanticscholar.org/d416/aa50e0eb6abb3f5e6e5fa071931f9a494d28.pdf

Proposed by F. Milani

email: milani@ut.ee

5. How to align IT and business to form a digital business strategy?

[1] Digital business strategy: toward a next generation of insights A Bharadwaj, O El Sawy, P Pavlou, N Venkatraman - 2013 - papers.ssrn.com https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2742300

[2] A framework of IS/business alignment management practices to improve the design of IT Governance architectures J Orozco, A Tarhini, T Tarhini - International Journal of Business and …, 2015 - ccsenet.org http://www.ccsenet.org/journal/index.php/ijbm/article/view/44329/25166

[3] STRATEGIC IT-BUSINESS ALIGNMENT AS MANAGERS' EXPLORATIVE AND EXPLOITATIVE STRATEGIES A Tarhini, RH Al-Dmour, BY Obeidat - European Scientific Journal …, 2015 - eujournal.org https://eujournal.org/index.php/esj/article/view/5334/5158

Proposed by F. Milani

email: milani@ut.ee

6. Digital technologies are now part of every company's concerns and strategy. However, becoming digital is not easy. It requires certain capabilities and changes that might go deep into the very culture of the company. This topic is about learning more about these capabilities: what are the digital capabilities of a company and how can they be developed?

[1] Digital capability framework: A toolset to become a digital enterprise AE Uhl, 2016 - books.google.com (section 2.3, p35-42) https://books.google.com.om/books?hl=en&lr=&id=LIYGDAAAQBAJ&oi=fnd&pg=PA27&dq=digital+capability+map&ots=VGLC-uODC7&sig=Wp80Uw4LPKBtdXGMSESCxqMnoYo&redir_esc=y#v=onepage&q=digital%20capability%20map&f=false

[2] Becoming a Digital Organization: The Journey to Digital Dexterity DL Soule, AD Puram, GF Westerman, D Bonnet - 2016 - papers.ssrn.com https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2697688

[3] The Role of IS Capabilities in the Development of Multi-Sided Platforms: The Digital Ecosystem Strategy of Alibaba.com Barney Tan, Shan L. Pan, Xianghua Lu, Lihua Huang (section 2.2) https://pdfs.semanticscholar.org/2c66/bf6386751038c9f95867ac2676d7a624066e.pdf?_ga=2.232466143.1569047573.1519547002-181788468.1519547002

Proposed by F. Milani

email: milani@ut.ee

7. Data is now part of the assets a company has. However, not all companies make use of this valuable asset. In particular, data is highly valuable in product lifecycle management. This topic is about how big data techniques can be used in product lifecycle management.

[1] Big data in product lifecycle management J Li, F Tao, Y Cheng, L Zhao - The International Journal of Advanced …, 2015 - Springer https://link.springer.com/content/pdf/10.1007%2Fs00170-015-7151-x.pdf

[2] A framework for Big Data driven product lifecycle management Y Zhang, S Ren, Y Liu, T Sakao, D Huisingh - Journal of Cleaner …, 2017 - Elsevier https://www.sciencedirect.com/science/article/pii/S0959652617309150

Proposed by F. Milani

email: milani@ut.ee

8. Digital transformation is one of the most pressing issues at most companies. In fact, many CEOs have identified it as one of the most pressing issues in their corporate strategy. However, it is not easy to know where to start, how to start, or what it means. To this end, some form of framework is needed, and several digital transformation frameworks have been proposed. This topic is about identifying such frameworks and presenting them. In other words, what are the existing frameworks for digital transformation?

[1] https://www.bcg.com/capabilities/technology-digital/digitalization-strategy-framework.aspx

[2] http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A899087&dswid=-4323

[3] http://www.emeraldinsight.com/doi/pdfplus/10.1108/IJRDM-09-2015-0140

[4] https://www.mckinsey.com/business-functions/organization/our-insights/six-building-blocks-for-creating-a-high-performing-digital-enterprise

Proposed by F. Milani

email: milani@ut.ee

Risk-based Testing Approach for Test Case Prioritization

It is impossible to test software systems exhaustively. If the software cannot be tested exhaustively, it must be tested selectively. Risk-based testing is an approach which analyses software risks and helps to optimize the resources available for testing. There are mutual dependencies between risk analysis and testing processes, which help us to determine the priority of tests during test automation. Your task is to provide a systematic overview of the techniques and methods that have been used for test case prioritization using risk models.

[1] http://www.sciencedirect.com/science/article/pii/S0164121200000194

[2] http://dl.acm.org/citation.cfm?id=2667940

[3] http://link.springer.com/article/10.1007/s10009-014-0330-5

[4] http://publica.fraunhofer.de/documents/N-256557.html

[5] http://dl.acm.org/citation.cfm?id=2667945

Proposed by D. Pfahl and B. Nazarbakhsh

email: dietmar.pfahl@ut.ee

Automatic Test Case Generation

Software testing has a valuable and important place in the software development life cycle. The most difficult (and costly) activity during testing is test case design. Automatic test case generation reduces the time and effort required of testers. Your task is to find, and categorize by type and tool support, techniques and tools used for automatic test case generation.

[1] http://link.springer.com.ezproxy.utlib.ut.ee/chapter/10.1007/11498490_18

[2] http://link.springer.com.ezproxy.utlib.ut.ee/chapter/10.1007/978-3-642-39277-1_2

[3] http://www.sciencedirect.com.ezproxy.utlib.ut.ee/science/article/pii/S0950584914001037

[4] http://dl.acm.org/citation.cfm?id=2771812&CFID=890586683&CFTOKEN=59410927

Proposed by D. Pfahl and B. Nazarbakhsh

email: dietmar.pfahl@ut.ee

Techniques for measuring software energy consumption: A comparative analysis

Green software engineering applies the practices and concepts of software engineering in such a way that the resulting products have minimal negative impact on the environment. The green software engineering domain can be explored along five dimensions: social, environmental, economic, technical and individual. As products such as laptops, mobile phones and tablets are widely used today, one of the concerns within the technical dimension is to find out how much energy these devices consume at the software level and how we can achieve the same performance while minimizing energy consumption. Your task is to produce a comparative analysis of the techniques used for measuring software energy consumption.

[1] Becker, C., Chitchyan, R., Duboc, L., Easterbrook, S., Penzenstadler, B., Seyff, N., & Venters, C. C. (2015). Sustainability Design and Software: The Karlskrona Manifesto. Proceedings - International Conference on Software Engineering, 2, 467–476. https://doi.org/10.1109/ICSE.2015.179

[2] Calero, C., & Piattini, M. (2015). Green in software engineering. Green in Software Engineering, 1–327. https://doi.org/10.1007/978-3-319-08581-4

[3] Chowdhury, S. A., & Hindle, A. (2016). GreenOracle: Estimating Software Energy Consumption with Energy Measurement Corpora. MSR ’16. https://doi.org/10.1145/2901739.2901763

Proposed by D. Pfahl and H. Anwar

email: dietmar.pfahl@ut.ee

Evaluating the impact of refactoring code smells on the power consumption of Android apps

Code smells point to areas in software applications that could benefit from refactoring. Refactoring is defined as a technique for restructuring an existing body of code, changing its internal structure without changing its external behavior. Choosing not to resolve code smells by refactoring will not cause the application to stop working, but will likely make it harder to maintain. Thus refactoring helps to improve the maintainability of an application. Given that maintenance is considered the most expensive stage of software development and that the proportion of maintenance cost over total software cost is increasing, refactoring is supposed to save development effort in the long run. Research suggests, however, that while refactoring might improve maintainability, it might not reduce power consumption and might even increase it. Your task is to find and analyse literature for evidence of the effects of refactoring on (1) maintainability and (2) power consumption of software systems.

[1] Investigating the Energy Impact of Android Smells (2017) http://ieeexplore.ieee.org.ezproxy.utlib.ut.ee/document/7884614/?reload=true

[2] Understanding Code Smells in Android Applications (2016) https://dl.acm.org/citation.cfm?id=2897094

[3] Refactoring for Energy Efficiency: A Reflection on the State of the Art (2015) http://ieeexplore.ieee.org.ezproxy.utlib.ut.ee/document/7168335/

[4] How Do Code Refactorings Affect Energy Usage? (2014) https://dl.acm.org/citation.cfm?id=2652538

Proposed by D. Pfahl and H. Anwar

email: dietmar.pfahl@ut.ee

Automatic extraction of app features from App Reviews: Evaluation on a benchmark dataset

App Store and Play Store platforms enable app users to give feedback in the form of reviews. In these reviews, users express opinions about various aspects of an app (feature evaluations, feature requests, bugs/issues, praise). Over the past few years, research has been conducted on the automatic identification, extraction, and summarization of this information to help software developers improve their apps. In the same direction, a few studies have attempted to automatically extract fine-grained app features from user reviews to perform feature-based sentiment analysis. Because each study uses its own dataset (training and test) to evaluate its proposed feature-extraction technique, the accuracy of these techniques on a different dataset has not been evaluated. Thus, the purpose of this study is to evaluate or cross-validate existing feature-extraction techniques on a common benchmark dataset.

[1] http://ieeexplore.ieee.org/document/7372064/

[2] http://ieeexplore.ieee.org/document/6912257/

[3] http://www.lrec-conf.org/proceedings/lrec2016/pdf/59_Paper.pdf

[4] http://dl.acm.org/citation.cfm?id=2916003

Proposed by D. Pfahl and F. A. Shah

email: dietmar.pfahl@ut.ee

Evaluation of Sentiment analysis tools for Feature-based sentiment analysis on App Reviews

A recent study has shown that sentiment analysis tools do not agree with the sentiments recognized by human annotators in the software engineering domain, because these tools have been trained on product reviews or customer reviews. Over the past few years, app reviews have emerged as a special kind of software repository whose mining provides a wealth of information for developers to improve mobile apps based on user feedback. Existing research has used sentiment analysis tools (such as SentiStrength and the Stanford NLP sentiment analyzer, Deeply Moving) for feature-based sentiment analysis of app reviews. The purpose of this study is therefore to evaluate to what extent sentiment analysis tools agree with human annotators for the task of feature-based sentiment analysis on app reviews.
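As an illustration of what "agreement with human annotators" could mean in practice, the sketch below computes Cohen's kappa between a tool's labels and a human annotator's labels. The choice of kappa is an assumption made here for illustration (the topic description does not prescribe an agreement measure), and the label names and example data are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical sentiment labels for six feature mentions in app reviews.
human = ["pos", "neg", "neu", "pos", "neg", "neu"]
tool = ["pos", "neg", "pos", "pos", "neu", "neu"]
kappa = cohen_kappa(human, tool)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 indicates agreement well beyond chance, while a value near 0 indicates agreement no better than chance. Other measures (for example, per-class precision and recall against the human labels) may suit the task equally well.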

[1] http://link.springer.com/article/10.1007/s10664-016-9493-x

[2] http://ieeexplore.ieee.org/document/7372064/

[3] http://ieeexplore.ieee.org/document/6912257/

Proposed by D. Pfahl and F. A. Shah

email: dietmar.pfahl@ut.ee

Data Mining for Technical Debt Analysis

The metaphor of technical debt in software development was introduced two decades ago to explain to nontechnical stakeholders the need for what we now call "refactoring." As the term is being used to describe a wide range of phenomena, decision support for managing technical debt requires clarity about the definition of debt items that can be identified and about approaches to measuring technical debt. Your task is to survey and summarize the existing literature related to technical debt identification and measurement. Particularly interesting are automatic approaches to measuring technical debt using data mining.

[1] http://ieeexplore.ieee.org/abstract/document/6336722/

[2] http://dl.acm.org/citation.cfm?id=2666044

[3] http://dl.acm.org/citation.cfm?id=2892643

Proposed by D. Pfahl

email: dietmar.pfahl@ut.ee

How start-ups benefit from open innovation

Fifteen years after the publication of Chesbrough's seminal book on Open Innovation (2003), research in the field of open innovation has grown steadily, and scholarly interest in the theme is far from exhausted. Your task is to find and summarize literature that discusses the use of open innovation by software start-up companies. Questions that could be answered include: Why, to what extent, and how do software start-ups take advantage of open innovation?

[1] https://www.sciencedirect.com/science/article/pii/S0164121217302169

[2] https://www.sciencedirect.com/science/article/pii/S095058491730513X

[3] https://www.sciencedirect.com/science/article/pii/S0950584917304512

Proposed by D. Pfahl

email: dietmar.pfahl@ut.ee

Analysis of developers’ interactions through social network data

Nowadays, software development is supported by different social media. Developers usually communicate and manage their projects through well-known social coding platforms such as GitHub, Bitbucket, and JIRA. These repositories contain a wealth of information that can be analyzed to understand how developers interact with and influence each other during software development. In this context, several studies have explored developers’ interactions and their social behavior patterns through the analysis of social networks. Your task is to provide an overview of the main findings as well as the methods and techniques that have been proposed to understand the interactions of software developers in the context of social networks.

[1] http://ieeexplore.ieee.org/document/7752419/

[2] https://arxiv.org/abs/1407.2535

[3] https://dl.acm.org/citation.cfm?id=2666571

Proposed by E. Scott and R. Sharma

email: ezequielscott@gmail.com

Methods and techniques to identify dependencies among User Stories

In Agile software development, requirements are usually expressed as User Stories. Although User Stories are expected to follow a fixed structure (“As <a role>, I want to <a feature> in order to <a benefit>”), they are still written in natural language with informal descriptions. This can lead to low-quality User Stories that overlap conceptually and cannot be scheduled and implemented in any order. In this context, some studies have explored different approaches to organizing User Stories and identifying dependencies among them. Your task is to provide an overview of the methods and techniques that have been proposed to deal with the dependencies among User Stories in agile contexts.
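To make the fixed structure mentioned above concrete, here is a minimal sketch that checks whether a story follows the “As <a role>, I want to <a feature> in order to <a benefit>” template and splits it into its three parts. The regular expression, the function name `parse_user_story`, and the example story are illustrative assumptions, not a method taken from the referenced studies. Real stories vary in wording far more than this pattern allows.

```python
import re

# Pattern for the template quoted in the topic description (an assumption:
# real-world stories often deviate, e.g. "so that" instead of "in order to").
STORY_PATTERN = re.compile(
    r"^As (?P<role>.+?), I want to (?P<feature>.+?) in order to (?P<benefit>.+?)\.?$",
    re.IGNORECASE,
)


def parse_user_story(text):
    """Return the role, feature, and benefit of a story, or None if it does not fit."""
    match = STORY_PATTERN.match(text.strip())
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}


# Hypothetical example story.
story = "As a customer, I want to save my cart in order to finish the purchase later."
parsed = parse_user_story(story)
print(parsed)
```

Splitting stories into role, feature, and benefit is only a possible first step. Detecting overlaps or dependencies would still require comparing the extracted parts across stories.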

[1] https://link.springer.com/chapter/10.1007/978-3-319-06862-6_8

[2] http://ieeexplore.ieee.org/document/7549299/

[3] http://ieeexplore.ieee.org/document/7272550/

[4] https://link.springer.com/chapter/10.1007/978-3-642-13054-0_17

Proposed by E. Scott

email: ezequielscott@gmail.com

Seminar on Enterprise Software 2017/18 spring