Quo vadis entrepreneurial hackathons?
Time-bounded events such as hackathons, data dives, codefests, hack-days, sprints and edit-a-thons have received widespread attention in recent years. Events that are organized with the aim of supporting teams in developing innovative products and services that can be turned into successful start-ups have been at the forefront of this recent surge. The question, however, remains how hackathons and entrepreneurship are actually connected. The aim of this topic is to conduct a comprehensive literature review covering the current state of the art of research on entrepreneurial hackathons and their connection to the startup scene. This review should outline current knowledge as well as open questions and shortcomings.
[1] Cobham, D., Hargrave, B., Jacques, K., Gowan, C., Laurel, J., & Ringham, S. (2017). From hackathon to student enterprise: an evaluation of creating successful and sustainable student entrepreneurial activity initiated by a university hackathon.
[2] Komssi, M., Pichlis, D., Raatikainen, M., Kindström, K., & Järvinen, J. (2015). What are hackathons for?. IEEE Software, 32(5), 60-67.
[3] Taylor, N., & Clarke, L. (2018, April). Everybody's Hacking: Participation and the Mainstreaming of Hackathons. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (p. 172). ACM.
Proposed by A. Nolte
email: alexander.nolte@udo.edu
Replication of Empirical Software Engineering Case Study Experiments
The empirical software engineering community publishes many case studies validating different approaches and analytical algorithms for software engineering. Unfortunately, these studies are rarely validated by independent replication. To make matters worse, the studies use different validation metrics, which makes them incomparable. Thus, your mission, should you choose to accept it, is to analyse different published case studies on one topic (e.g. bug detection, code churn estimation) to evaluate their replicability and to replicate the studies in order to make them comparable. In short, you will: 1. envisage a workflow/pipeline for replicating published studies (including testing for replicability); 2. use the workflow to replicate several studies; 3. validate these studies and compare their results on a common scale.
[1] Le Goues, C., Dewey-Vogt, M., Forrest, S., & Weimer, W. (2012, June). A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each. In Software Engineering (ICSE), 2012 34th International Conference on (pp. 3-13). IEEE. http://ieeexplore.ieee.org/abstract/document/6227211/
[2] Tian, Y., Lawall, J., & Lo, D. (2012, June). Identifying linux bug fixing patches. In Proceedings of the 34th International Conference on Software Engineering (pp. 386-396). IEEE Press. https://dl.acm.org/citation.cfm?id=2337269
[3] Kagdi, H., Collard, M. L., & Maletic, J. I. (2007). A survey and taxonomy of approaches for mining software repositories in the context of software evolution. Journal of Software: Evolution and Process, 19(2), 77-131. http://onlinelibrary.wiley.com/doi/10.1002/smr.344/full
[4] Thomas, S. W. (2011, May). Mining software repositories using topic models. In Proceedings of the 33rd International Conference on Software Engineering (pp. 1138-1139). ACM. https://dl.acm.org/citation.cfm?id=1986020
Proposed by S. Karus
email: siim.karus@ut.ee
GPU-Accelerated Data Analytics
In this project, a set of GPU-accelerated data mining or analytics algorithms will be implemented as an extension to an analytical database solution. For this task, you will need to learn parallel processing optimizations specific to GPU programming (balancing between bandwidth and processing power), implement the analytics algorithms, and design a user interface to accompany them. As the aim is to provide an extension to analytical databases (preferably MSSQL, Oracle or PostgreSQL), you will also need to learn the extension interfaces of these databases and their native development and BI tools. Finally, you will assess the performance gains of your algorithms compared to comparable algorithms in existing analytical database tools.
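To give a flavour of the bandwidth-versus-compute trade-off mentioned above, here is a minimal sketch that compares a simple aggregation on the CPU (NumPy) with the same aggregation on the GPU (CuPy). It assumes a CUDA-capable GPU and the cupy package; the data size and the aggregation are illustrative only and not part of the project requirements.

    # Minimal sketch: CPU aggregation (NumPy) vs. GPU aggregation (CuPy).
    # Assumes a CUDA-capable GPU and the cupy package; sizes are illustrative only.
    import time
    import numpy as np
    import cupy as cp

    n = 50_000_000
    cpu_data = np.random.rand(n).astype(np.float32)

    t0 = time.perf_counter()
    cpu_sum = cpu_data.sum()                  # aggregation on the CPU
    t_cpu = time.perf_counter() - t0

    gpu_data = cp.asarray(cpu_data)           # host-to-device transfer: this is the bandwidth cost
    t0 = time.perf_counter()
    gpu_sum = cp.sum(gpu_data)                # aggregation on the GPU (runs asynchronously)
    cp.cuda.Stream.null.synchronize()         # wait for the kernel before stopping the timer
    t_gpu = time.perf_counter() - t0

    print(f"CPU: {cpu_sum:.1f} in {t_cpu:.3f}s; GPU: {float(gpu_sum):.1f} in {t_gpu:.3f}s")

In the actual project the GPU kernels would, of course, live inside the database extension rather than in a standalone script.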
[1] Bakkum, P., & Skadron, K. (2010, March). Accelerating SQL database operations on a GPU with CUDA. In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units (pp. 94-103). ACM. https://dl.acm.org/citation.cfm?id=1735706
[2] Breß, S., Heimel, M., Siegmund, N., Bellatreche, L., & Saake, G. (2014). GPU-accelerated database systems: Survey and open challenges. In Transactions on Large-Scale Data- and Knowledge-Centered Systems XV (pp. 1-35). Springer, Berlin, Heidelberg. https://link.springer.com/chapter/10.1007/978-3-662-45761-0_1
[3] Sitaridi, E. A., & Ross, K. A. (2016). GPU-accelerated string matching for database applications. The VLDB Journal, 25(5), 719-740. https://link.springer.com/article/10.1007/s00778-015-0409-y
[4] Karnagel, T., Mueller, R., & Lohman, G. M. (2015). Optimizing GPU-accelerated Group-By and Aggregation. In ADMS@ VLDB (pp. 13-24).
Proposed by S. Karus
email: siim.karus@ut.ee
Graph Reasoning on Software Repositories
Software repositories offer many insights into the software development process. Most of the analytical processes used on software repositories rely heavily on the availability of training data – samples of positive and negative cases. These samples, however, have to be determined by people. This limits the usefulness of the analytics, as people might miss possible relationships or make mistakes in specifying the training output values. Graph reasoning, on the other hand, is a machine learning technique that does not require training data and uses internal rules to find relationships in the data. As such, graph reasoning can be used to discover unwanted or unnoticed patterns in software and software evolution. Your task is to bridge these two disciplines in order to further our understanding of software and its evolution, and perhaps even to improve the quality assurance process.
[1] Kiefer, C., Bernstein, A., & Tappolet, J. (2007, May). Mining software repositories with iSPARQL and a software evolution ontology. In Proceedings of the Fourth International Workshop on Mining Software Repositories (p. 10). IEEE Computer Society. https://dl.acm.org/citation.cfm?id=1269048
[2] Watkins, E. R., & Nicole, D. A. (2006, January). Named graphs as a mechanism for reasoning about provenance. In Asia-Pacific Web Conference (pp. 943-948). Springer, Berlin, Heidelberg. https://link.springer.com/chapter/10.1007/11610113_99
[3] Keivanloo, I., Forbes, C., Hmood, A., Erfani, M., Neal, C., Peristerakis, G., & Rilling, J. (2012, June). A linked data platform for mining software repositories. In Mining Software Repositories (MSR), 2012 9th IEEE Working Conference on (pp. 32-35). IEEE. http://ieeexplore.ieee.org/abstract/document/6224296/
[4] Martinez, M., & Monperrus, M. (2015). Mining software repair models for reasoning on the search space of automated program fixing. Empirical Software Engineering, 20(1), 176-205. https://link.springer.com/article/10.1007/s10664-013-9282-8
Proposed by S. Karus
email: siim.karus@ut.ee
Interpretable Predictive Monitoring of Business Processes
Recent advances of supervised machine learning in various tasks stem from the use of powerful and complex models (neural networks, deep learning, random forests). However, adoption in practice remains challenging because of the limited interpretability of these methods and their low actionability (what should the user do to alter the ongoing process instance to improve the expected/predicted outcome?). The lack of understandability and actionability poses a serious challenge in domains such as financial and medical services, where understanding the decision behind a prediction is crucial. Moreover, the interpretability of the model can provide valuable feedback for improving it even further. As such, this thesis project goes beyond the state of the art in predictive process monitoring by developing methods and techniques to translate complex predictive models into understandable knowledge for key stakeholders in the process.
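As a minimal illustration of the problem (not a proposed solution), the sketch below trains a random-forest outcome predictor on hand-crafted prefix features and inspects it with permutation importance, one simple form of post-hoc interpretation. The feature names and the synthetic data are hypothetical; scikit-learn is assumed.

    # Minimal sketch: a "black-box" outcome predictor plus permutation importance.
    # Feature names and data are hypothetical; scikit-learn is assumed.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    features = ["num_events_so_far", "amount_requested", "num_reworks", "elapsed_hours"]
    X = rng.random((500, len(features)))
    y = (X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.random(500) > 1.0).astype(int)  # synthetic outcome

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

    result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
    for name, score in sorted(zip(features, result.importances_mean), key=lambda p: -p[1]):
        print(f"{name}: {score:.3f}")   # which prefix features drive the prediction?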
[1] https://www.sciencedirect.com/science/article/pii/S0957417417303950
Proposed by F. M. Maggi
email: f.m.maggi@ut.ee
Discovery of Hybrid Process Models
The declarative-procedural dichotomy is highly relevant when choosing the most suitable process modeling language to represent a discovered process in the context of process discovery techniques. Less-structured processes with a high level of variability can be described in a more compact way using a declarative language. By contrast, procedural process modeling languages seem more suitable for describing structured and stable processes. However, in various cases, a process may incorporate parts that are better captured in a declarative fashion, while other parts are more suitable to be described procedurally. In these scenarios, hybrid models are the best choice for describing the discovery results. In this thesis, an approach for the discovery of hybrid process models from logs of process executions will be developed. The approach will be implemented in the process mining tool ProM and evaluated in real-life case studies.
[1] Fabrizio Maria Maggi, Tijs Slaats, Hajo A. Reijers: The Automated Discovery of Hybrid Processes. BPM 2014: 392-399
Proposed by F. M. Maggi
email: f.m.maggi@ut.ee
Online Data-Aware Declarative Process Discovery from Event Streams
Stream processing is defined as “technologies designed to process large real-time streams of event data”, and one of its example applications is process monitoring. The challenge of dealing with streaming event data is also discussed in the Process Mining Manifesto. A process discovery algorithm is a function that maps an event log onto a process model such that the model is representative of the behavior seen in the event log. A declarative process model is a set of business rules that describe the process behavior under an open world assumption, i.e., everything that is not forbidden by the model is allowed. Such models can compactly express process behaviors involving multiple alternatives, can be enriched with data-aware conditions that depend on values represented as attributes in the data, and are very suitable for changeable and unstable environments compared to conventional procedural approaches. In [1], an approach to automatically discover declarative process models from streams of data has been presented. However, this approach does not consider data-aware conditions. In this thesis, the algorithm in [1] will be extended in order to generate data-aware declarative process models.
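For intuition only, the sketch below (which is not the algorithm of [1]) incrementally checks a single data-aware Declare constraint over an event stream and maintains an online estimate of its support. The constraint, the activity names, and the data attribute are hypothetical.

    # Minimal sketch: online checking of one data-aware Declare constraint,
    #   response(A, B) with data condition amount > 1000
    # i.e. whenever A occurs with amount > 1000, B must eventually follow in the same case.
    from collections import defaultdict

    pending = defaultdict(int)       # case id -> open activations of A still waiting for B
    activations = fulfilments = 0

    def observe(case_id, activity, attributes):
        """Consume one event from the stream and return the current constraint support."""
        global activations, fulfilments
        if activity == "A" and attributes.get("amount", 0) > 1000:
            pending[case_id] += 1
            activations += 1
        elif activity == "B" and pending[case_id] > 0:
            fulfilments += pending[case_id]
            pending[case_id] = 0
        return fulfilments / activations if activations else 1.0

    for event in [("c1", "A", {"amount": 1500}), ("c1", "B", {}), ("c2", "A", {"amount": 200})]:
        print(observe(*event))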
[1] Andrea Burattin, Marta Cimitile, Fabrizio Maria Maggi, Alessandro Sperduti: Online Discovery of Declarative Process Models from Event Streams. IEEE Trans. Services Computing 8(6): 833-846 (2015)
Proposed by F. M. Maggi
email: f.m.maggi@ut.ee
Deviance Mining of Business Processes
Deviant business process executions are those that deviate in a negative or positive way from normative or desirable outcomes, such as executions that undershoot or exceed performance targets. There are classification methods that can be used to discriminate between normal and deviant executions. In particular, they can be used to discover rules that explain potential causes of observed deviances. In this thesis, an approach for deviance mining of business processes will be implemented in the process mining tool ProM and evaluated in real-life case studies.
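A minimal sketch of the basic idea, assuming scikit-learn and using hypothetical toy traces: encode each trace as a vector of activity frequencies, train a decision tree to separate normal from deviant executions, and print the learned rules as candidate explanations of the deviance.

    # Minimal sketch: frequency-encode traces and learn interpretable deviance rules.
    # Toy traces and labels are hypothetical; a recent scikit-learn is assumed.
    from collections import Counter
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.tree import DecisionTreeClassifier, export_text

    traces = [
        ["register", "check", "approve"],                     # normal
        ["register", "check", "approve"],                     # normal
        ["register", "check", "rework", "check", "reject"],   # deviant
        ["register", "rework", "rework", "reject"],           # deviant
    ]
    labels = ["normal", "normal", "deviant", "deviant"]

    vec = DictVectorizer(sparse=False)
    X = vec.fit_transform(Counter(t) for t in traces)         # activity-frequency encoding

    tree = DecisionTreeClassifier(max_depth=3).fit(X, labels)
    print(export_text(tree, feature_names=list(vec.get_feature_names_out())))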
[1] Hoang Nguyen, Marlon Dumas, Marcello La Rosa, Fabrizio Maria Maggi, Suriadi Suriadi: Business Process Deviance Mining: Review and Evaluation. CoRR abs/1608.08252 (2016)
Proposed by F. M. Maggi
email: f.m.maggi@ut.ee
Static Analysis of Node.js applications
Node.js is a runtime environment that allows JavaScript applications to run outside the Web browser. It has become a popular platform for implementing server-side applications in JavaScript. The lack of type safety of JavaScript makes it prone to errors and vulnerabilities, including injection attacks. A number of analysis techniques specifically designed for Node.js have emerged in recent years. Your task is to conduct a review of techniques and tools for static and dynamic analysis of JavaScript in general, and Node.js in particular, and to discuss the maturity and limitations of existing solutions in this field.
[1] https://dl.acm.org/citation.cfm?id=3179527
[2] https://dl.acm.org/citation.cfm?id=3236502
[3] https://plg.uwaterloo.ca/~olhotak/pubs/oopsla15.pdf
[4] https://dl.acm.org/citation.cfm?doid=3145473.3106739
[5] https://dl.acm.org/citation.cfm?doid=2771783.2771809
Proposed by M. Dumas
email: marlon.dumas@ut.ee
Case Studies of Robotic Process Automation
Robotic Process Automation (RPA) tools, such as UIPath and Automation Anywhere, allow organizations to automate repetitive work by executing scripts that encode sequences of fine-grained interactions with Web and desktop applications, such as opening a file, selecting a field in a form or a cell in a spreadsheet, and copy-pasting data across fields or cells. A typical task that can be automated using an RPA tool is transferring data from one system to another via their respective user interfaces, e.g. copying records from a spreadsheet application into a Web-based enterprise information system. Several case studies of robotic process automation have been reported in recent years. Your task is to do a survey of case studies in the field of RPA, and to derive from this survey some advantages and pitfalls of RPA. Below are some initial examples of such case studies, which you can use as a starting point.
- http://eprints.lse.ac.uk/64518/
- http://tinyurl.com/y5m8gxht
- https://journals.sagepub.com/doi/pdf/10.1057/jittc.2016.5
- http://tinyurl.com/yxu2rs64
Proposed by M. Dumas
email: marlon.dumas@ut.ee
Automated Discovery of Data Transformations From Examples
Implementing transformations between multiple data schemas or document formats is a recurrent task in enterprise system integration efforts. Recently, a new family of techniques has emerged, which seeks to automatically discover mappings between two data schemas based on examples. Your task is to conduct a survey of the literature in this field and to discuss the capabilities and limitations of existing techniques. Below are some starting points.
- https://web.eecs.umich.edu/~michjc/papers/jin_foofah_sigmod17.pdf
- http://tinyurl.com/yyzns4l2
- https://www.cc.gatech.edu/~xchu33/chu-papers/TDE-paper.pdf
- https://cs.uwaterloo.ca/~ilyas/papers/AbedjanICDE16.pdf
Proposed by M. Dumas
email: marlon.dumas@ut.ee
Collaborative Business Process Execution on Blockchains
Blockchain technology provides basic building blocks to support the execution of collaborative business processes involving mutually untrusted parties in a decentralized environment. Several research proposals have demonstrated the feasibility of designing blockchain-based collaborative business processes using a high-level notation, such as the Business Process Model and Notation (BPMN), and thereon automatically generating the code artifacts required to execute these processes on a blockchain platform. Your task is to conduct a review of the state of the art in this field.
- http://kodu.ut.ee/~dumas/pubs/bpm2017caterpillar-demo.pdf
- https://arxiv.org/pdf/1808.03517.pdf
- http://tinyurl.com/yyk29bbb
- https://arxiv.org/pdf/1704.03610.pdf
Proposed by M. Dumas
email: marlon.dumas@ut.ee
Predicting the Next Task in a Business Process
Predictive process monitoring is a family of techniques to predict future events or properties of running executions of a process. Within this field, several techniques have been proposed to address the following question: what will be the next event or task to occur in an ongoing execution of a process? Your task is to review the literature in this field and to discuss the capabilities and limitations of existing techniques. Below are some initial pointers; a minimal baseline sketch follows them.
- https://arxiv.org/pdf/1612.04600.pdf
- https://arxiv.org/pdf/1612.02130.pdf
- https://link.springer.com/article/10.1007/s00607-018-0593-x
- https://arxiv.org/pdf/1811.00062.pdf
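For comparison, the minimal baseline sketch below predicts the next activity purely from transition frequencies observed in a hypothetical toy log; the techniques in the pointers above aim to beat exactly this kind of baseline.

    # Minimal baseline sketch (not one of the surveyed techniques): predict the next
    # activity from the most recent one using transition frequencies in a toy log.
    from collections import Counter, defaultdict

    log = [
        ["register", "check", "approve", "pay"],
        ["register", "check", "reject"],
        ["register", "check", "approve", "pay"],
    ]

    transitions = defaultdict(Counter)
    for trace in log:
        for current_act, next_act in zip(trace, trace[1:]):
            transitions[current_act][next_act] += 1

    def predict_next(prefix):
        """Return the most frequent successor of the last activity of the running case."""
        last = prefix[-1]
        return transitions[last].most_common(1)[0][0] if transitions[last] else None

    print(predict_next(["register", "check"]))   # -> 'approve'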
Proposed by M. Dumas
email: marlon.dumas@ut.ee
Principles of microservice architectures
Microservice architectures are a modern approach to develop, deploy, operate, and maintain software applications, particularly in enterprise settings. They rely on the tenet that the data and business logic of an application should be decomposed into cohesive pieces, which are developed, deployed and operated independently. But what exactly is a microservice architecture? What are the key principles or tenets of a microservice architecture? And what trade-offs does it strike with respect to other alternative types of software architectures? Your task is to conduct a review of definitions and principles of microservice architectures in order to provide a synthetic answer to the above questions.
- https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7030212
- https://www.computer.org/csdl/proceedings/icsaw/2017/4793/00/07958492.pdf
- https://research.birmingham.ac.uk/portal/files/29076197/bare_conf_1.pdf
- https://www.ifsoftware.ch/uploads/tx_icscrm/1_msa-pospaperzio4summersoc2016v15nc.pdf
Proposed by M. Dumas
email: marlon.dumas@ut.ee
Migrating monoliths to microservices
Microservice architectures are a modern approach to develop, deploy, operate, and maintain software applications, particularly in enterprise settings. They rely on the tenet that the data and business logic of an application should be decomposed into cohesive pieces, which are developed, deployed and operated independently. Over the past few years, many companies have migrated their enterprise systems from a so-called monolithic (three-tier) architecture to a microservices architecture. Your task is to conduct a review of case studies and methods for migrating monolithic applications to microservices.
- https://arxiv.org/pdf/1704.04173.pdf
- https://www.computer.org/csdl/proceedings/icsaw/2017/4793/00/07958457.pdf
- https://arxiv.org/ftp/arxiv/papers/1807/1807.10059.pdf
- http://www.ivanomalavolta.com/files/papers/ICSA_2018.pdf
Proposed by M. Dumas
email: marlon.dumas@ut.ee
Offline-First Web Applications
Most Web applications are architected in such a way that they cannot be used in the absence of a (stable) connection between the client and the server. In the context of mobile computing, it is however desirable to have at least some functionality available even when the device is disconnected from the network. There are several approaches for developing offline-first web applications, capable of operating (possibly in degraded mode) without a connection. Your task is to review existing approaches in this field.
- https://aaltodoc.aalto.fi/bitstream/handle/123456789/29096/master_Vanhala_Janne_2017.pdf
- http://tinyurl.com/yxv7nvzj
- https://link.springer.com/chapter/10.1007/978-3-319-04244-2_15
- http://tinyurl.com/yy3nwftq
Proposed by M. Dumas
email: marlon.dumas@ut.ee
Progressive Web Applications
Web applications are said to be progressive when they provide a similar user experience regardless of the browser and the device (e.g. desktop PC, tablet, or smartphone) and can work offline in a degraded mode. Your task is to review existing definitions of progressive Web apps and approaches to implement such apps. Below are some initial pointers:
- http://www.scitepress.org/Papers/2017/63537/63537.pdf
- https://aisel.aisnet.org/hicss-51/st/mobile_app_development/7/
- https://link.springer.com/chapter/10.1007/978-3-319-93527-0_4
- https://www.smashingmagazine.com/2018/11/guide-pwa-progressive-web-applications/
Proposed by M. Dumas
email: marlon.dumas@ut.ee
Static Bug Detection
Detecting bugs automatically is a holy grail in the field of software verification. Decades of development in the field of static program analysis have led to relatively mature bug finding tools, which have been deployed in industrial settings in the past 10-15 years. Your task is to conduct a survey of bug finding tools based on static analysis (e.g. FindBugs, Infer, Julia) and to discuss their current capabilities and limitations. Below are some initial pointers:
[1] https://storage.googleapis.com/pub-tools-public-publication-data/pdf/34339.pdf
[2] https://storage.googleapis.com/pub-tools-public-publication-data/pdf/32791.pdf
[3] https://link.springer.com/chapter/10.1007/978-3-662-53413-7_3
[4] http://www.cs.columbia.edu/~junfeng/14fa-e6121/papers/coverity.pdf
[5] https://link.springer.com/chapter/10.1007%2F978-3-319-17524-9_1
Proposed by M. Dumas
email: marlon.dumas@ut.ee
Sampling-based Java Profiling
Developers generally rely on profilers to analyze the performance of software applications and to identify the source of performance issues. However, profiling comes at a cost: it generally slows down the application under observation noticeably, which in many cases diminishes the usefulness of profilers. Sampling-based profiling is a technique to profile applications in a way that limits the performance overhead. This technique is, however, difficult to implement correctly and with minimal overhead, particularly in the context of multi-threaded applications. A significant amount of research and development is still ongoing in the field of sampling-based profiling of (e.g.) Java applications.
[1] Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, Peter F. Sweeney. Evaluating the accuracy of Java profilers. PLDI 2010: 187-197
[2] Peter Hofer, David Gnedt, Hanspeter Mössenböck: Lightweight Java Profiling with Partial Safepoints and Incremental Stack Tracing. Proc. of ICPE 2015: 75-86
[3] Peter Hofer, Hanspeter Mössenböck: Efficient and accurate stack trace sampling in the Java hotspot virtual machine. ICPE 2014: 277-280
Proposed by M. Dumas
email: marlon.dumas@ut.ee
Process Mining for Auditing
Auditing is a time-consuming task. The widespread use of enterprise systems (e.g. CRM and ERP systems) to perform day-to-day business processes in companies is making it possible to apply data mining techniques to support the work of auditors. In particular, several experiences have shown that process mining can be used to support certain auditing tasks.
[1] Mieke Jans, Michael Alles, Miklos Vasarhelyib: The case for process mining in auditing: Sources of value added and areas of application. International Journal of Accounting Information Systems 14(1): 1-20 (2013)
[2] Wil M. P. van der Aalst, Kees M. van Hee, Jan Martijn E. M. van der Werf, Marc Verdonk: Auditing 2.0: Using Process Mining to Support Tomorrow's Auditor. IEEE Computer 43(3): 90-93 (2010)
Proposed by M. Dumas
email: marlon.dumas@ut.ee
Discovering concept drift in business processes from event logs
In the field of business process mining, the term "concept drift" refers to the fact that business processes tend to change over time. Several methods have been proposed to discover concept drift in business processes from event logs, for example the ones below; a minimal drift-detection sketch follows the references.
[1] R. P. Jagadeesh Chandra Bose, Wil M. P. van der Aalst, Indre Zliobaite, Mykola Pechenizkiy: Dealing With Concept Drifts in Process Mining. IEEE Trans. Neural Netw. Learning Syst. 25(1): 154-171 (2014)
[2] Josep Carmona, Ricard Gavaldà: Online Techniques for Dealing with Concept Drift in Process Mining. Proc. of IDA 2012, pp. 90-102
[3] Abderrahmane Maaradji, Marlon Dumas, Marcello La Rosa, Alireza Ostovar: Detecting Sudden and Gradual Drifts in Business Processes from Execution Traces. IEEE Trans. Knowl. Data Eng. 29(10): 2140-2154 (2017)
[4] Alireza Ostovar, Abderrahmane Maaradji, Marcello La Rosa, Arthur H. M. ter Hofstede: Characterizing Drift from Event Streams of Business Processes. CAiSE 2017: 210-228
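As a minimal drift-detection sketch in the spirit of the statistical-test-based methods above (cf. [3]), the snippet below compares the activity-frequency distributions of two adjacent windows of a log with a chi-square test. The windows are hypothetical and scipy is assumed.

    # Minimal sketch: test whether two adjacent log windows differ significantly in
    # their activity frequencies (a possible sign of concept drift). Toy data; scipy assumed.
    from collections import Counter
    from scipy.stats import chi2_contingency

    window_1 = ["A", "B", "C", "A", "B", "C", "A", "B"]   # activities before the candidate drift point
    window_2 = ["D", "C", "D", "A", "D", "D", "C", "D"]   # activities after it

    activities = sorted(set(window_1) | set(window_2))
    c1, c2 = Counter(window_1), Counter(window_2)
    table = [[c1[a] for a in activities], [c2[a] for a in activities]]

    stat, p_value, _, _ = chi2_contingency(table)
    print(f"chi2 = {stat:.2f}, p-value = {p_value:.4f}")
    if p_value < 0.05:
        print("distributions differ significantly -> possible concept drift")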
Proposed by M. Dumas
email: marlon.dumas@ut.ee
Is media biased? An empirical analysis
News channels often try to portray news stories from their own perspectives. In particular, it has been observed that media houses are biased towards specific topics, people and political parties. In this thesis, you will be analysing a set of news stories derived from different news websites (such as BBC, CNN, etc.). The study will be done with the intention of exploring whether news channels are biased towards specific 1) topics, 2) people or 3) political parties. You will be using data science techniques (such as opinion mining and machine learning) for performing the empirical analysis of your study.
[1] Robert M. Entman. Media framing biases and political power: Explaining slant in news of Campaign 2008. Journalism. Vol 11, Issue 4, pp. 389 - 408
[2] David Niven. Bias in the News: Partisanship and Negativity in Media Coverage of Presidents George Bush and Bill Clinton. International Journal of Press and Politics. Vol 6, Issue 3, pp. 31 - 46
Proposed by R. Sharma
email: rajesh.sharma@ut.ee
Understanding Filter bubbles in social media networks
A filter bubble is an algorithmic bias that skews or limits the information an individual user sees on the internet. The bias is caused by the weighted algorithms that search engines, social media sites and marketers use to personalize the user experience. The concept is particularly important in creating opinionated individuals. In this thesis, a study will be performed to understand the effect of filter bubbles on social media users.
[1] Mario Haim, Andreas Graefe & Hans-Bernd Brosius, Burst of the Filter Bubble? Effects of personalization on the diversity of Google News. Digital Journalism.
[2] Nguyen, Tien T. and Hui, Pik-Mai and Harper, F. Maxwell and Terveen, Loren and Konstan, Joseph A. Exploring the Filter Bubble: The Effect of Using Recommender Systems on Content Diversity, WWW 2014
Proposed by R. Sharma
email: rajesh.sharma@ut.ee
Analysing echo chambers in social networks
An echo chamber is a metaphorical description of a situation in which information, ideas, or beliefs are amplified or reinforced by communication and repetition inside a defined system. In this thesis, we will investigate echo chambers in social media platforms such as Twitter or Facebook and their effect on social media users. Techniques from network science and machine learning will be explored for understanding echo chambers in social media.
[1] Eric Gilbert, Tony Bergstrom, and Karrie Karahalios. 2009. Blogs are echo chambers: Blogs are echo chambers. In 42nd Hawaii International Conference on System Sciences. IEEE, 1–10.
[2] Eric Lawrence, John Sides, and Henry Farrell. 2010. Self-segregation or deliberation? Blog readership, participation, and polarization in American politics. Perspectives on Politics 8, 1 (2010), 141–157
Proposed by R. Sharma
email: rajesh.sharma@ut.ee
Left, Center or Right?: Controversial groups on Social Media
With respect to political views, users in social media can often be classified broadly into three categories, namely left, right or center. In this thesis, users of social media platforms, in particular platforms like Facebook, will be studied anonymously. The crux of the problem will be to predict users' inclination towards right, center or left political parties. Data science techniques such as network science, machine learning and sentiment analysis will be explored for this prediction problem.
[1] Gottipati S., Qiu M., Yang L., Zhu F., Jiang J. (2013) Predicting User’s Political Party Using Ideological Stances. In: Jatowt A. et al. (eds) Social Informatics. SocInfo 2013. Lecture Notes in Computer Science, vol 8238. Springer,
[2] Michael D. Conover, Bruno Gonçalves, Jacob Ratkiewicz, Alessandro Flammini and Filippo Menczer. Predicting the Political Alignment of Twitter Users. IEEE Third International Conference on Social Computing (SocialCom), 2011.
[3] Aaron Acosta (ateam91), Silviana Ciurea-Ilcus (smci), Michal Wegrzynski (michalw). Predicting users' political support from their Reddit comment history. goo.gl/Gchzyw
[4] Daniel Xiaodan Zhou, Paul Resnick, Qiaozhu Mei. Classifying the Political Leaning of News Articles and Users from User Votes. Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media 2011
[5] Waheed H, Anjum M, Rehman M, Khawaja A (2017) Investigation of user behavior on social networking sites. PLoS ONE 12(2)
Proposed by R. Sharma
email: rajesh.sharma@ut.ee
Measuring corporate reputation through online social media
When businesses are caught engaging in illegal or immoral activities, their reputation might suffer. Corporate reputation is a reflection of how a business is regarded by its customers and the public in general. If corporate misbehavior negatively affects a business' reputation, customers might switch to rival businesses. For this reason, reputation has a central role in free markets, as it has the potential to deter businesses from misbehaving. The extent to which corporate wrongdoings trigger a reputational loss is still debated and is the subject of a large body of academic work. Most of these works are based on survey methods to measure reputation. This research relies on a more direct method to measure reputational changes, by conducting a sentiment analysis of how the public reacted on Twitter to some of the most high-profile corporate misconducts. In this thesis, corporate reputation will be studied using the Volkswagen (VW) scandal as a case study, together with the public reaction it created on Twitter. VW's scandal has been chosen because it has been widely covered over time through both traditional and social media. Moreover, we can measure how changes in media coverage and social media reaction affected VW's financial performance. The dataset and related literature will be provided to speed up the work.
[1] Corné Dijkmans, Peter Kerkhof, Camiel J. Beukeboom. A stage to engage: Social media use and corporate reputation, 2014.
[2] Nadine Gatzert, The impact of corporate reputation and reputation damaging events on financial performance: Empirical evidence from the literature, 2015.
Proposed by R. Sharma
email: rajesh.sharma@ut.ee
Predicting information diffusion using regression techniques
Information diffusion on online social media platforms (such as Facebook, Twitter, etc.) has applications in various domains such as viral marketing and news propagation. Some information spreads faster than other information, depending on the topics of interest of the online users. For example, taking Twitter as a use case, researchers have investigated tweet prediction, i.e., given a tweet, predicting how many times that tweet will be retweeted. The problem has been analysed both as a classification and as a regression problem. Your task is to review the related literature that has analysed the problem as a regression (or classification) problem.
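A minimal sketch of the regression formulation, assuming scikit-learn and using synthetic data; the tweet features below (follower count, number of hashtags, tweet length, posting hour) are hypothetical stand-ins for whatever features the reviewed papers actually use.

    # Minimal sketch: retweet-count prediction as a regression problem over simple features.
    # Synthetic data and hypothetical features; scikit-learn is assumed.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(1)
    n = 1000
    followers = rng.integers(10, 100_000, n)
    hashtags = rng.integers(0, 5, n)
    length = rng.integers(10, 280, n)
    hour = rng.integers(0, 24, n)
    X = np.column_stack([followers, hashtags, length, hour])
    retweets = 0.001 * followers + 5 * hashtags + rng.poisson(3, n)   # synthetic target

    X_tr, X_te, y_tr, y_te = train_test_split(X, retweets, random_state=1)
    model = GradientBoostingRegressor(random_state=1).fit(X_tr, y_tr)
    print("MAE on held-out tweets:", mean_absolute_error(y_te, model.predict(X_te)))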
[1] Hong, L., Dan, O., Davison, B.D.: Predicting popular messages in twitter. In: Proceedings of the 20th International Conference Companion on World Wide Web, WWW'11, pp. 57-58. ACM, New York, NY, USA (2011)
[2] K Lytvyniuk, R Sharma, A Jurek-Loughrey. Predicting Information Diffusion in Online Social Platforms: A Twitter Case Study. International Workshop on Complex Networks and their Applications, 405-417, 2017.
Proposed by R. Sharma
email: rajesh.sharma@ut.ee
The impact of social media on consumers
By analyzing social media discussions about products, we would like to determine whether there is any correlation between the sales or economic success of a particular product and the product's popularity before and after its announcement and release (and, in case of a positive correlation, to find the parameters or features that drive success). In particular, we are interested in analyzing popular products (mobile phones, computers or video games) through social platforms like Facebook, Twitter, blogs or web pages. This may involve sentiment analysis of the discussions about products, feature extraction of the product itself, using critics' opinions and ratings, and looking at the sales numbers of the products in question.
[1] M. Nick Hajli. A study of the impact of social media on consumers. International Journal of Market Research Vol. 56 Issue 3, 2014
[2] M Odhiambo. Social media as a tool of marketing and creating brand awareness.
Proposed by R. Sharma
email: rajesh.sharma@ut.ee
The predictive power of the wisdom of the crowd
The term “wisdom of the crowd” refers to the collective opinion of a community or group. In comparison, expert views refer to the views expressed by the experts of a particular domain. In this thesis, you will investigate whether it is the experts or the wisdom of the crowd that can better predict the box office outcome of movies. In particular, you will analyse tweets about movies around the period of their release dates. A small dataset of tweets about various movies will be provided. However, we also expect to expand the analysis by collecting tweets about more movies during the thesis period. The thesis involves sentiment analysis of the tweets and, subsequently, the proposal of a model for predicting the box office results of the movies.
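A minimal sketch of the sentiment-analysis step, assuming NLTK with the vader_lexicon resource downloaded; the tweets are hypothetical. Aggregate scores computed this way would then be fed, together with other features, into the box-office prediction model.

    # Minimal sketch: aggregate the sentiment of pre-release tweets about a movie.
    # Requires nltk and nltk.download("vader_lexicon"); the tweets are hypothetical.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    tweets = [
        "Cannot wait for this movie, the trailer looks amazing!",
        "Meh, looks like a boring sequel to me.",
        "Best cast of the year, definitely watching on opening night.",
    ]

    analyzer = SentimentIntensityAnalyzer()
    scores = [analyzer.polarity_scores(t)["compound"] for t in tweets]   # -1 (negative) .. +1 (positive)
    avg_sentiment = sum(scores) / len(scores)
    share_positive = sum(s > 0.05 for s in scores) / len(scores)
    print(f"average sentiment: {avg_sentiment:.2f}, share of positive tweets: {share_positive:.2f}")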
[1] Fabian Abel, Ernesto Diaz-Aviles, Nicola Henze, Daniel Krause, and Patrick Siehndel. Analyzing the blogosphere for predicting the success of music and movie products. In Advances in Social Networks Analysis and Mining (ASONAM), 2010 International Conference on, pages 276–280. IEEE, 2010.
[2] Sitaram Asur and Bernardo A Huberman. Predicting the future with social media. In Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on, volume 1, pages 492–499. IEEE, 2010.
Proposed by R. Sharma
email: rajesh.sharma@ut.ee
Behavior analysis of bike users in a city setting
As part of smart cities, authorities are creating separate lanes and bicycle racks for bikers. However, the key question is the utilization of these resources put in place for better traffic management. In this thesis, we will be analyzing a real dataset of an Italian city with a population of 385,192 inhabitants. The dataset covers a period of 6 months, from April 2017 to September 2017. We will be predicting users' behavior in terms of using these resources. We expect you to use data science and machine learning techniques. The dataset is the property of SRM Reti e Mobilità Srl, and all analyses must respect the NDA, preserving the privacy and anonymity of the users.
[1] Gabriel Martins Dias, Boris Bellalta and Simon Oechsner . Predicting Occupancy Trends in Barcelona’s Bicycle Service Stations Using Open Data , SAI Intelligent Systems Conference 2015
[2] Eoin O’Mahony, David B. Shmoys. Data Analysis and Optimization for (Citi)Bike Sharing. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence 2015
Proposed by R. Sharma
email: rajesh.sharma@ut.ee
Tracking unusual activities in Traffic police data
Sensors and cameras placed on highways are meant to keep traffic free of disruptions by detecting accidents so that appropriate and timely actions can be taken. However, the collected data can also be used for detecting unusual activities. In this thesis, you will be analyzing large-scale traffic police data shared by the Italian authorities to detect anomalies or unusual behavior on highways. A dataset will be provided. We expect you to find unusual behavior in this traffic data using machine learning techniques, in particular anomaly detection techniques.
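A minimal sketch of one possible approach, assuming scikit-learn; the per-interval features (vehicle count and average speed) are synthetic, hypothetical stand-ins for the real traffic data.

    # Minimal sketch: flag unusual time intervals in traffic data with an Isolation Forest.
    # Synthetic, hypothetical features; scikit-learn is assumed.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(7)
    normal = np.column_stack([rng.normal(500, 50, 200), rng.normal(100, 10, 200)])  # count, km/h
    unusual = np.array([[900, 20], [50, 140]])    # e.g. a jam and an almost empty, speeding interval
    X = np.vstack([normal, unusual])

    detector = IsolationForest(contamination=0.01, random_state=7).fit(X)
    labels = detector.predict(X)                  # -1 = anomaly, 1 = normal
    print("intervals flagged as unusual:", np.where(labels == -1)[0])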
[1] Liang Xiong, Xi Chen, Jeff Schneider. Direct Robust Matrix Factorization for Anomaly Detection. International Conference on Data Mining (ICDM), 2011.
[2] Hamid Palangi, Li Deng, Yelong Shen, Jianfeng Gao, Xiaodong He, Jianshu Chen, Xinying Song, Rabab Ward. Deep Sentence Embedding Using Long Short-Term Memory Networks: Analysis and Application to Information Retrieval. https://arxiv.org/pdf/1502.06922.pdf
[3] Jefferson Ryan Medel. Anomaly Detection Using Predictive Convolutional Long Short-Term Memory Units. http://scholarworks.rit.edu/theses/9319/
Proposed by R. Sharma
email: rajesh.sharma@ut.ee
Analyzing customers behavior using purchase data
The purchasing transactions performed by customers can provide valuable insights about the behavior of individuals. For example, transactions can reveal the purchasing power as well as the eating habits of users. In this thesis, you will analyse a large-scale set of customer transactions, using data science techniques, for predicting customers' behavior.
[1] Diego Pennacchioli, Michele Coscia, Salvatore Rinzivillo, Dino Pedreschi, Fosca Giannotti. Explaining the Product Range Effect in Purchase Data. BigData 2013
[2] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” SIGMOD Rec., vol. 22, no. 2, pp. 207–216, Jun. 1993.
[3] https://bigml.com/user/czuriaga/gallery/dataset/info?reload
Proposed by R. Sharma
email: rajesh.sharma@ut.ee
Analyzing question-answering systems: The Quora case study
Question answering systems (QASs) generate answers to questions asked in natural language. Early QASs were developed for restricted domains and had limited capabilities. More recently, platforms like Quora have helped to remove these boundaries. In this thesis, using Quora as a case study, you will perform user analytics to understand the reasons behind the success of a platform where "all kinds of questions are welcome". We expect you to perform an empirical study of the users of the platform.
[1] Abdelghani Bouziane, Djelloul Bouchiha, Noureddine Doumi, Mimoun Malki. Question Answering Systems: Survey and Trends. Procedia Computer Science
[2] Albert Tung and Eric Xu, Determining Entailment of Questions in the Quora Dataset. Class Reports https://web.stanford.edu/class/cs224n/reports/2748301.pdf
Proposed by R. Sharma
email: rajesh.sharma@ut.ee
Vulnerability analysis in multilayer networks: A data science approach
Traditionally, networks are studied individually; that is, studies do not consider the relations that might exist between two or more networks connected with each other. For example, some social media users are present not only on Facebook but also on Twitter and other social media platforms such as Instagram. As another example, consider transportation networks: it is more prudent to analyze the various transportation modes, such as road, rail and air, collectively. A bird's-eye view of such a collection of networks is called a multilayer (ML) network, encompassing various individual networks. In such ML networks, we would like to study the problem of vulnerability analysis, that is, how strong a network is against any kind of breakage. The breakage could occur due to natural or unnatural causes (for example, earthquakes, accidents or riots). In this thesis, using transportation networks as a use case, a study will be performed with a particular focus on vulnerability analysis.
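For intuition, the sketch below builds a toy two-layer transportation network (road and rail over the same stations) with networkx and uses a deliberately simple vulnerability measure: the relative drop in the size of the largest connected component when a node fails. The layers and the measure are illustrative only.

    # Minimal sketch: toy multilayer (road + rail) network and a naive vulnerability measure.
    # networkx is assumed; nodes and edges are toy data.
    import networkx as nx

    road = nx.Graph([("A", "B"), ("B", "C"), ("C", "D")])
    rail = nx.Graph([("A", "C"), ("C", "E")])
    combined = nx.compose(road, rail)        # flattened view of the multilayer network

    def largest_component_size(g):
        return max((len(c) for c in nx.connected_components(g)), default=0)

    baseline = largest_component_size(combined)
    for node in list(combined.nodes):
        damaged = combined.copy()
        damaged.remove_node(node)
        print(node, "->", largest_component_size(damaged) / baseline)   # lower = more critical node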
[1] Mikko Kivelä, Alex Arenas, Marc Barthelemy, James P. Gleeson, Yamir Moreno, Mason A. Porter; Multilayer networks, Journal of Complex Networks, Volume 2, Issue 3, 1 September 2014, Pages 203–271
[2] Aleta, Alberto, Sandro Meloni, and Yamir Moreno. "A Multilayer perspective for the analysis of urban transportation systems." Scientific reports 7 (2017): 44359.
[3] Kivelä, Mikko, et al. "Multilayer networks." Journal of complex networks 2.3 (2014): 203-271.
[4] A Furno, NE El Faouzi, R Sharma, E Zimeo. Two-level Clustering Fast Betweenness Centrality Computation for Requirement-driven Approximation. IEEE Big Data 2017
Proposed by R. Sharma
email: rajesh.sharma@ut.ee
Analyzing Server Logs for predicting Job Failures
Server logs generally refer to files which are created for monitoring the activities performed on servers. In recent years, a lot of research has been performed on analyzing server logs to determine the status of the jobs or tasks that arrive at servers. In this thesis, you will be analyzing logs from a Google cluster, which is a set of machines responsible for running real Google jobs, for example search queries. The research encompasses the domains of large-scale data analytics and machine learning. The main contribution of the thesis is a model to predict job failures on servers. A real dataset of Google traces will be provided along with related literature to ramp up the learning process.
[1] Chunhong Liu et al. Predicting of Job Failure in Compute Cloud Based on Online Extreme Learning Machine: A Comparative Study. 2017
[2] Andrea Rosa et al. Predicting and Mitigating Jobs Failures in Big Data Cluster. 2015
Proposed by R. Sharma
email: rajesh.sharma@ut.ee
Context Based Text Classification using Deep Learning
Rent seeking is a term used by economists to capture situations where businesses acquire extra wealth without doing anything productive (lobbying is almost synonymous). A textbook example is when taxi companies lobby governments around the world to limit the entry of Uber-like operators. Rent seeking is one of the most important ideas in economics, but its measurement has always been a challenge for economists because of its latent nature, and the empirical literature is very thin. One way to address this problem is to use classical machine learning (SVMs, etc.) for text classification; however, these approaches do not return good results. An alternative is to use deep learning approaches, such as word embeddings (word2vec), for classifying the text, i.e., deciding whether a document is about rent seeking or not. For this topic, you will investigate the state-of-the-art literature on capturing the context of documents for better document classification.
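A minimal sketch of the embedding-based alternative, assuming gensim (version 4 or later) and scikit-learn, with a hypothetical toy corpus and labels: each document is represented as the average of its word2vec vectors and classified with logistic regression.

    # Minimal sketch: average word2vec vectors as document features for classification.
    # Toy corpus and labels are hypothetical; gensim >= 4 and scikit-learn are assumed.
    import numpy as np
    from gensim.models import Word2Vec
    from sklearn.linear_model import LogisticRegression

    docs = [
        "taxi lobby urges government to block new ride hailing licenses",
        "firm invests in new factory to increase production capacity",
        "industry group pays lobbyists to keep import quotas in place",
        "startup launches faster delivery service for rural customers",
    ]
    labels = [1, 0, 1, 0]                    # 1 = rent seeking, 0 = not (toy labels)
    tokens = [d.split() for d in docs]

    w2v = Word2Vec(sentences=tokens, vector_size=50, window=3, min_count=1, epochs=50, seed=0)

    def doc_vector(words):
        return np.mean([w2v.wv[w] for w in words if w in w2v.wv], axis=0)

    X = np.vstack([doc_vector(t) for t in tokens])
    clf = LogisticRegression().fit(X, labels)
    print(clf.predict(X))                    # sanity check on the training documents

In practice, pre-trained embeddings (or contextual models) trained on a large corpus would replace the tiny word2vec model trained here.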
[1] Leila Arras, Franziska Horn, Grégoire Montavon, Klaus-Robert Müller, Wojciech Samek. "What is Relevant in a Text Document?": An Interpretable Machine Learning Approach. Link on arxiv: https://arxiv.org/abs/1612.07843
[2] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean. Distributed representations of words and phrases and their compositionality. Neural information processing systems
Proposed by R. Sharma
email: rajesh.sharma@ut.ee
Analysis of developers’ interactions through social network data
Nowadays, software development is supported by different social media. Developers usually communicate and manage their projects through well-known social coding platforms such as GitHub, Bitbucket, and JIRA. These repositories contain a wealth of information that is available for analysis and allows us to understand how developers interact and influence each other during software development. In this context, several studies have explored developers' interactions and their social behaviour patterns through the analysis of social networks. Your task is to provide an overview of the main findings as well as the methods and techniques that have been proposed to understand the interactions of software developers in the context of social networks.
[1] http://ieeexplore.ieee.org/document/7752419/
[2] https://arxiv.org/abs/1407.2535
[3] https://dl.acm.org/citation.cfm?id=2666571
Proposed by E. Scott and R. Sharma
email: ezequiel.scott@ut.ee
Methods and techniques to identify dependencies among User Stories
In Agile software development, requirements are usually expressed as User Stories. Although User Stories are expected to follow a fixed structure (“As <a role>, I want to <a feature> in order to <a benefit>”), they are still written using natural language and informal descriptions. This can lead to poor-quality User Stories that overlap each other in concept and cannot be scheduled and implemented in an arbitrary order. In this context, some studies have explored different approaches to organize and identify dependent user stories. Your task is to provide an overview of the methods and techniques that have been proposed to deal with the dependencies among User Stories in agile contexts.
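One simple heuristic is to flag potentially overlapping or dependent stories by the textual similarity of their descriptions. The sketch below, assuming scikit-learn and using hypothetical stories and an arbitrary threshold, illustrates the idea with TF-IDF vectors and cosine similarity.

    # Minimal sketch: flag user-story pairs with high textual similarity as candidates
    # for overlap/dependency analysis. Stories and threshold are hypothetical.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    stories = [
        "As a customer, I want to reset my password in order to regain access to my account",
        "As a customer, I want to change my password in order to keep my account secure",
        "As an admin, I want to export monthly sales reports in order to share them with finance",
    ]

    tfidf = TfidfVectorizer(stop_words="english").fit_transform(stories)
    sim = cosine_similarity(tfidf)

    threshold = 0.3
    for i in range(len(stories)):
        for j in range(i + 1, len(stories)):
            if sim[i, j] > threshold:
                print(f"stories {i} and {j} may be related (similarity {sim[i, j]:.2f})")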
[1] https://link.springer.com/chapter/10.1007/978-3-319-06862-6_8
[2] http://ieeexplore.ieee.org/document/7549299/
[3] http://ieeexplore.ieee.org/document/7272550/
[4] https://link.springer.com/chapter/10.1007/978-3-642-13054-0_17
Proposed by E. Scott
email: ezequiel.scott@ut.ee
Team performance in agile software development
Having teams that perform well is critical for the success of a project, and good performance is often related to effectiveness. For this reason, one of the agile principles states that teams should have room to reflect on how to become more effective and adjust their behaviour accordingly. However, there is still a need for a clear definition of team performance, team effectiveness, and what enables team performance. Your task is to review the current literature on definitions, models, and approaches related to team performance, and on how they can be used to improve the effectiveness of agile software development teams.
[1] Lindsjørn, Y., Sjøberg, D. I., Dingsøyr, T., Bergersen, G. R., & Dybå, T. (2016). Teamwork quality and project success in software development: A survey of agile development teams. Journal of Systems and Software, 122, 274-286.
[2] Kozak, Yavuz (2013). Barriers Against Better Team Performance in Agile Software Projects. Chalmers University of Technology, Sweden.
[3] Downey, S., & Sutherland, J. (2013, January). Scrum metrics for hyperproductive teams: how they fly like fighter aircraft. In 2013 46th Hawaii International Conference on System Sciences (pp. 4870-4878). IEEE.
Proposed by E. Scott
email: ezequiel.scott@ut.ee
Issue/bug recommender systems
In agile software development, issue allocation is often based on self-assignment. That is, developers choose the issues (e.g. user stories, bugs) that they will work on during the sprint according to their own preferences and experience. Industry practices give some evidence to support this method of issue allocation, but how it takes place is not completely clear yet. As far as we know, developers apply different strategies for self-assigning different types of issues (new features, enhancements, bug fixes). Recommender systems have been developed to help developers choose their issues/bugs. The goal of this project is to review the current literature on which approaches for recommending issues/bugs to developers have been proposed, which sources of information have been used, how those approaches have been evaluated, and what their importance is for agile software development.
[1] Kanwal, J., & Maqbool, O. (2010, December). Managing open bug repositories through bug report prioritization using SVMs. In Proceedings of the International Conference on Open-Source Systems and Technologies, Lahore, Pakistan (pp. 22-24).
[2] Anvik, J., Hiew, L., & Murphy, G. C. (2006, May). Who should fix this bug?. In Proceedings of the 28th international conference on Software engineering (pp. 361-370). ACM.
[3] Alenezi, M., Banitaan, S., & Zarour, M. (2018). Using Categorical Features in Mining Bug Tracking Systems to Assign Bug Reports. arXiv preprint arXiv:1804.07803.
Proposed by E. Scott
email: ezequiel.scott@ut.ee
Case studies of fintech companies applying agile software development
Agile software development has been popular during the last decades. There are many industrial sectors that rely on agile practices, ranging from e-commerce to automotive and healthcare. The financial services and fintech industry is no exception; in fact, it is considered the second most relevant industrial sector that uses agile methods [1]. However, there is still a lack of understanding of the extent to which this sector has tailored its processes to support its particularities. Your task is to summarize the case studies of fintech companies and financial services that have applied agile practices and to determine the particularities that impact the software development process.
[1] "The 12th annual State of Agile Development survey”, Version One, 2018, [online] Available: https://explore.versionone.com/state-of-agile/versionone-12th-annual-state-of-agile-report.
[2] Gechevski, D., Poposka, K., Angelova, B., & Gecevska, V. (2014). AGILE SOFTWARE DEVELOPMENT PRODUCTS FOR FINTECH-FINANCIAL TECHNOLOGIES. In Proceedings of the 8th International Conference on Mass Customization and Personalization– (Vol. 23, No. 6, p. 107). UNIVERSITY OF NOVI SAD–FACULTY OF TECHNICAL SCIENCES DEPARTMENT OF INDUSTRIAL ENGINEERING AND MANAGEMENT 21000 Novi Sad, Trg Dositeja Obradovića 6, Serbia.
[3] Dapp, T. F. (2017). Fintech: the digital transformation in the financial sector. In Sustainability in a Digital World (pp. 189-199). Springer, Cham.
[4] Kilu, E., Milani, F., Scott, E., & Pfahl, D. (2019, January). Agile Software Process Improvement by Learning from Financial and Fintech Companies: LHV Bank Case Study. In International Conference on Software Quality (pp. 57-69). Springer, Cham.
Proposed by E. Scott
email: ezequiel.scott@ut.ee
Lean Software Development metrics
Lean Software Development is an approach where traditional Lean manufacturing philosophies, principles, and tools are applied to software development. In recent years, Lean Software Development has gained popularity and many companies have used this approach to optimize efficiency and minimize waste in the development of their software. In this context, the use of meaningful software metrics is critical. This project aims to summarize and review the current metrics related to Lean Software Development that are relevant to industry, as well as to describe how they should be used in a given context.
[1] Alahyari, H., Gorschek, T., & Svensson, R. B. (2019). An exploratory study of waste in software development organizations using agile or lean approaches: A multiple case study at 14 organizations. Information and Software Technology, 105, 78-94.
[2] Feyh, M., & Petersen, K. (2013). Lean software development measures and indicators-a systematic mapping study. In Lean Enterprise Software and Systems (pp. 32-47). Springer, Berlin, Heidelberg.
[3] Kupiainen, E., Mäntylä, M. V., & Itkonen, J. (2015). Using metrics in Agile and Lean Software Development–A systematic literature review of industrial studies. Information and Software Technology, 62, 143-163.
[4] Staron, M., Meding, W., & Palm, K. (2012, May). Release readiness indicator for mature agile and lean software development projects. In International Conference on Agile Software Development (pp. 93-107). Springer, Berlin, Heidelberg.
Proposed by E. Scott and F. Milani
email: ezequiel.scott@ut.ee
A Survey of Security Risks in Permissionless Blockchain Applications
The goal of this survey is to explain the security risks that can be mitigated by permissionless blockchain applications, such as those based on Ethereum. In addition, once the infrastructure of a system is moved to blockchain-supported platforms, new security risks can potentially be introduced. Thus, the second goal is to identify and explain the most frequent security risks that can be found in this type of technology.
Proposed by R. Matulevicius
email: raimundas.matulevicius@ut.ee
A Survey of Security Risks in Permissioned Blockchain Applications
The goal of this survey is to explain the security risks that can be mitigated by permissioned blockchain applications, such as those based on Hyperledger Fabric. In addition, once the infrastructure of a system is moved to blockchain-supported platforms, new security risks can potentially be introduced. Thus, the second goal is to identify and explain the most frequent security risks that can be found in this type of technology.
Proposed by R. Matulevicius
email: raimundas.matulevicius@ut.ee
A Survey of the Privacy-by-Design Techniques and Tools to Comply with GDPR
GDPR is the European regulation that guarantees the privacy of personal data when it is processed. The goal of this survey is to explain the privacy-by-design techniques that could be applied to make organisational processes compliant with the (selected) articles of the regulation. The survey should provide a comparative explanation and potentially suggest some guidelines for selecting the proper techniques.
Proposed by R. Matulevicius
email: raimundas.matulevicius@ut.ee
A Survey of the Architecture Frameworks of IoT Systems for the Security Need
The architecture of Internet of Things (IoT) systems potentially consists of various layers, components and their relationships. The goal of this survey is to explain the differences and similarities between IoT architecture proposals. The major emphasis should be placed on system and business assets in order to characterise various levels of security need.
Proposed by R. Matulevicius
email: raimundas.matulevicius@ut.ee
A Survey of Modelling Techniques for the Security Risk Modelling
There exist a number of modelling techniques for security risk modelling (e.g., misuse cases, Secure Tropos, attack trees, etc.). The goal of this report is to explain which techniques should be selected for modelling particular targeted security risks, e.g., distributed denial of service attacks, semantic social engineering attacks, and botnet-based attacks.
Proposed by R. Matulevicius
email: raimundas.matulevicius@ut.ee
How to make mutants resemble actual defects introduced into the code by programmers?
Mutation testing provides a way to measure the effectiveness of a test suite with regard to finding defects in a software program. A mutant is a slightly changed version of the program under test. If an existing test suite can detect the mutated line of code in the program, then the mutant has been killed – if not, the mutant is alive. The ratio between killed and alive mutants is a measure of the strength of the test suite: the more mutants the test suite kills, the better. One limitation of mutation testing is that the automatically generated mutants might not correspond to actual defects introduced by programmers. To address this shortcoming, improved mutant generation methods have been proposed that help in creating potential defects that are more closely coupled with changes made by actual programmers. What are these methods? To answer this question, below follow a few references that help to get started with the literature search; a minimal sketch of conventional mutant generation follows the references.
[1] [Wild-caught mutants] David Bingham Brown, Michael Vaughn, Ben Liblit, and Thomas Reps. 2017. The care and feeding of wild-caught mutants. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2017). ACM, New York, NY, USA, 511-522. DOI: https://doi.org/10.1145/3106237.3106280
[2] [Higher-Order Mutants] Jackson A. Prado Lima and Silvia R. Vergilio. 2017. A Multi-objective optimization approach for selection of second order mutant generation strategies. In Proceedings of the 2nd Brazilian Symposium on Systematic and Automated Software Testing (SAST). ACM, New York, NY, USA, Article 6, 10 pages. DOI: https://doi.org/10.1145/3128473.3128479
[3] [Weighted Random Mutant Selection] Ali Parsai, Alessandro Murgia, and Serge Demeyer. 2016. Evaluating random mutant selection at class-level in projects with non-adequate test suites. In Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering (EASE '16). ACM, New York, NY, USA, Article 11, 10 pages. DOI: https://doi.org/10.1145/2915970.2915992
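For context, the minimal sketch below shows the kind of conventional, rule-based mutant generation that the methods above try to improve on: Python's ast module applies a classical arithmetic-operator mutation to a small, hypothetical function (Python 3.9+ is assumed for ast.unparse).

    # Minimal sketch: conventional mutant generation with a single mutation operator.
    # The target function is hypothetical; Python >= 3.9 is assumed (ast.unparse).
    import ast
    import textwrap

    source = textwrap.dedent("""
        def price_with_tax(price, tax):
            return price + price * tax
    """)

    class SwapAddToSub(ast.NodeTransformer):
        def visit_BinOp(self, node):
            self.generic_visit(node)
            if isinstance(node.op, ast.Add):   # the classical arithmetic-operator-replacement mutation
                node.op = ast.Sub()
            return node

    mutant = SwapAddToSub().visit(ast.parse(source))
    ast.fix_missing_locations(mutant)
    print(ast.unparse(mutant))                 # the mutated program; a good test suite should kill it

Such mutants are syntactically plausible but may not look like the defects real programmers introduce, which is exactly the gap the references above address.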
Proposed by D. Pfahl
email: dietmar.pfahl@ut.ee
How to speed up mutation testing?
Mutation testing provides a way to measure the effectiveness of a test suite with regard to finding defects in a software program. A mutant is a slightly changed version of the program under test. If an existing test suite can detect the mutated line of code in the program, then the mutant has been killed – if not, the mutant is alive. The ratio between killed and alive mutants is a measure of the strength of the test suite: the more mutants the test suite kills, the better. One limitation of mutation testing is that, for large programs and test suites, the actual execution of all tests on all mutants might take a prohibitively long time. Experiments in industry have indicated that in some situations it may take several weeks to execute all tests on all mutants. To address this shortcoming, various strategies have been proposed. What are these strategies? To answer this question, below follow a few references that help to get started with the literature search.
[1] [Using cloud computing] Sten Vercammen, Serge Demeyer, Markus Borg, and Sigrid Eldh. 2018. Speeding up mutation testing via the cloud: lessons learned for further optimisations. In Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM '18). ACM, New York, NY, USA, Article 26, 9 pages. DOI: https://doi.org/10.1145/3239235.3240506
[2] [Systematic mutant reduction] Goran Petrović and Marko Ivanković. 2018. State of mutation testing at google. In Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP '18). ACM, New York, NY, USA, 163-171. DOI: https://doi.org/10.1145/3183519.3183521
[3] [Mutation testing with focal methods] Sten Vercammen, Mohammad Ghafari, Serge Demeyer, and Markus Borg. 2018. Goal-oriented mutation testing with focal methods. In Proceedings of the 9th ACM SIGSOFT International Workshop on Automating TEST Case Design, Selection, and Evaluation (A-TEST 2018). ACM, New York, NY, USA, 23-30. DOI: https://doi.org/10.1145/3278186.3278190
[4] [Avoiding useless mutants] Leonardo Fernandes, Márcio Ribeiro, Luiz Carvalho, Rohit Gheyi, Melina Mongiovi, André Santos, Ana Cavalcanti, Fabiano Ferrari, and José Carlos Maldonado. 2017. Avoiding useless mutants. In Proceedings of the 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE 2017). ACM, New York, NY, USA, 187-198. DOI: https://doi.org/10.1145/3136040.3136053
[5] [Higher Order Mutation Testing] Jackson A. Prado Lima, Giovani Guizzo, Silvia R. Vergilio, Alan P. C. Silva, Helson L. Jakubovski Filho, and Henrique V. Ehrenfried. 2016. Evaluating Different Strategies for Reduction of Mutation Testing Costs. In Proceedings of the 1st Brazilian Symposium on Systematic and Automated Software Testing (SAST). ACM, New York, NY, USA, Article 4, 10 pages. DOI: https://doi.org/10.1145/2993288.2993292
[6] [Predictive Mutation Testing] Jie Zhang, Ziyi Wang, Lingming Zhang, Dan Hao, Lei Zang, Shiyang Cheng, and Lu Zhang. 2016. Predictive mutation testing. In Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA 2016). ACM, New York, NY, USA, 342-353. DOI: https://doi.org/10.1145/2931037.2931038
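As a concrete illustration of one of the simplest cost-reduction strategies, random mutant sampling, the sketch below estimates the mutation score from a random subset of mutants instead of executing the test suite against every mutant. The helpers generate_mutants and kills are hypothetical placeholders; in practice they would be supplied by a mutation-testing tool.

```python
import random

def generate_mutants(program):
    """Return the full list of mutants for the program (hypothetical placeholder)."""
    raise NotImplementedError

def kills(test_suite, mutant):
    """Return True if the test suite kills the given mutant (hypothetical placeholder)."""
    raise NotImplementedError

def sampled_mutation_score(program, test_suite, sampling_rate=0.1, seed=42):
    """Estimate the mutation score from a random sample of mutants.

    Running the tests against, say, 10% of the mutants costs roughly 10%
    of a full run, in exchange for a less precise score estimate.
    """
    mutants = generate_mutants(program)
    rng = random.Random(seed)
    sample_size = max(1, int(len(mutants) * sampling_rate))
    sample = rng.sample(mutants, sample_size)
    killed = sum(1 for mutant in sample if kills(test_suite, mutant))
    return killed / len(sample)
```

The references above cover more sophisticated strategies, such as distributing test runs over cloud infrastructure, reducing the mutant set systematically, or predicting test outcomes without executing the tests at all.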
Proposed by D. Pfahl
email: dietmar.pfahl@ut.ee
Smart Contract Patterns
Blockchain technology is slowly maturing, and one part of this technology is the concept of smart contracts. Given the immutability of smart contracts, it is important to code them well. In light of this, a number of papers have proposed design patterns for smart contracts. For this topic, you are to review the existing proposed patterns, categorise them, and present them as a repository of patterns. A good starting point would be the following publications.
[1] Xu, Xiwei, et al. "A Pattern Collection for Blockchain-based Applications." Proceedings of the 23rd European Conference on Pattern Languages of Programs. ACM, 2018.
[2] Liu, Yue, et al. "Applying Design Patterns in Smart Contracts." International Conference on Blockchain. Springer, Cham, 2018.
[3] Bartoletti, Massimo, and Livio Pompianu. "An empirical analysis of smart contracts: platforms, applications, and design patterns." International Conference on Financial Cryptography and Data Security. Springer, Cham, 2017.
[4] Worley, Carl R., and Anthony Skjellum. "Opportunities, Challenges, and Future Extensions for Smart-Contract Design Patterns." International Conference on Business Information Systems. Springer, Cham, 2018.
[5] Wohrer, Maximilian, and Uwe Zdun. "Smart contracts: Security patterns in the ethereum ecosystem and solidity." 2018 International Workshop on Blockchain Oriented Software Engineering (IWBOSE). IEEE, 2018.
Proposed by F. Milani
email: fredrik.milani@ut.ee
Blockchain Application Architectures
Blockchain applications are growing and maturing. The body of research now contains enough applications and papers to analyse the architectures and architectural options of blockchain-based applications. For this topic, you are to review papers dealing with this topic and synthesise the results. A good starting point would be the following papers.
[1] Zheng, Zibin, et al. "An overview of blockchain technology: Architecture, consensus, and future trends." 2017 IEEE international congress on big data (BigData congress). IEEE, 2017.
[2] Xu, Xiwei, et al. "A taxonomy of blockchain-based systems for architecture design." 2017 IEEE International Conference on Software Architecture (ICSA). IEEE, 2017.
[3] Cachin, Christian. "Architecture of the hyperledger blockchain fabric." Workshop on distributed cryptocurrencies and consensus ledgers. Vol. 310. 2016.
Proposed by F. Milani
email: fredrik.milani@ut.ee
Hyperledger Fabric Architecture
Hyperledger Fabric has become one of the most popular blockchain platforms for commercial use cases. However, Fabric is rich in functionality and offers a wide range of design choices. For this topic, you are to review papers that present Fabric and produce a report that summarises and categorises all the design choices (including their rationale) available in Fabric. A good starting point would be the following papers.
[1] Cachin, Christian. "Architecture of the hyperledger blockchain fabric." Workshop on distributed cryptocurrencies and consensus ledgers. Vol. 310. 2016.
[2] Androulaki, Elli, et al. "Hyperledger fabric: a distributed operating system for permissioned blockchains." Proceedings of the Thirteenth EuroSys Conference. ACM, 2018.
[3] Brandenburger, Marcus, et al. "Blockchain and trusted computing: Problems, pitfalls, and a solution for hyperledger fabric." arXiv preprint arXiv:1805.08541 (2018).
[4] Sukhwani, Harish, et al. "Performance Modeling of Hyperledger Fabric (Permissioned Blockchain Network)." 2018 IEEE 17th International Symposium on Network Computing and Applications (NCA). IEEE, 2018.
Proposed by F. Milani
email: fredrik.milani@ut.ee
Government Cloud
G-cloud, government cloud, or e-cloud are names for cloud-based solutions for governmental e-services. For this topic, the task is to review the literature in order to give an overview of what G-cloud is, including aspects such as naming differences, solutions, architectures, use cases, and acceptance. A good starting point might be the following papers.
[1] Almarabeh, Tamara, Yousef Kh Majdalawi, and Hiba Mohammad. "Cloud computing of e-government." (2016).
[2] Aubakirov, Margulan, and Evgeny Nikulchev. "Development of system architecture for e-government cloud platforms." arXiv preprint arXiv:1603.08297 (2016).
[3] Mahmood, Zaigham. "Cloud computing technologies for open connected government." Cloud Computing Technologies for Connected Government. IGI Global, 2016. 1-14.
[4] Kotka, Taavi, and Innar Liiv. "Concept of Estonian Government cloud and data embassies." International Conference on Electronic Government and the Information Systems Perspective. Springer, Cham, 2015.
Proposed by F. Milani
email: fredrik.milani@ut.ee
Estonian Data Embassy
Estonia is the first country in the world to have implemented a data embassy, located in Luxembourg. For this topic, you are to review and summarise, from a software solution perspective, how data embassies work, including technical solutions, architecture, advantages and disadvantages, and risks. A good starting point might be the following papers.
[1] Millard, Christopher. "Forced Localization of Cloud Services: Is Privacy the Real Driver?." IEEE Cloud Computing 2.2 (2015): 10-14.
[2] Robinson, Nick, and Keith Martin. "Distributed denial of government: The Estonian data embassy initiative." Network Security 2017.9 (2017): 13-16.
[3] Thurnay, Lőrinc, et al. "The Potential of the Estonian e-Governance Infrastructure in Supporting Displaced Estonian Residents." International Conference on Electronic Government and the Information Systems Perspective. Springer, Cham, 2017.
Proposed by F. Milani
email: fredrik.milani@ut.ee
Water-Scrum-Fall
Agile methods are very popular, but there is also the idea of combining agile with waterfall, and some propose that such a hybrid is more effective. For this topic, you are to review relevant papers and summarise what the research says on the topic. A good starting point would be the following references.
[1] West, Dave, et al. "Water-scrum-fall is the reality of agile for most organizations today." Forrester Research 26 (2011).
[2] Theocharis, Georgios, et al. "Is water-scrum-fall reality? on the use of agile and traditional development practices." International Conference on Product-Focused Software Process Improvement. Springer, Cham, 2015.
[3] Schlauderer, Sebastian, Sven Overhage, and Björn Fehrenbach. "Widely used but also highly valued? Acceptance factors and their perceptions in water-scrum-fall projects." (2015).
[4] Kuhrmann, Marco, et al. "Hybrid software and system development in practice: waterfall, scrum, and beyond." Proceedings of the 2017 International Conference on Software and System Process. ACM, 2017.
[5] Gregorio, Donna D. "How the Business Analyst supports and encourages collaboration on agile projects." 2012 IEEE International Systems Conference SysCon 2012. IEEE, 2012.
Proposed by F. Milani
email: fredrik.milani@ut.ee