Topics
Risk-based Testing Approach for Test Case Prioritization
Proposer: Dietmar Pfahl (dietmar.pfahl@ut.ee)
It is impossible to test software systems exhaustively, so they must be tested selectively. Risk-based testing is an approach that analyses software risks and helps to optimize the use of the resources available for testing. There are mutual dependencies between the risk analysis and testing processes, which help to determine the priority of tests during test automation. Your task is to provide a systematic overview of the techniques and methods that have been used for test case prioritization using risk models.
- http://www.sciencedirect.com/science/article/pii/S0164121200000194
- http://dl.acm.org/citation.cfm?id=2667940
- http://link.springer.com/article/10.1007/s10009-014-0330-5
- http://publica.fraunhofer.de/documents/N-256557.html
- http://dl.acm.org/citation.cfm?id=2667945
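As a starting intuition for risk-based prioritization, the sketch below orders test cases by a simple risk exposure score, computed as estimated failure probability times estimated impact. This scoring model, the class name `CaseRisk`, and all numbers are illustrative assumptions and are not taken from the referenced papers, which may use richer risk models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRisk:
    """A test case annotated with hypothetical risk estimates."""
    name: str
    failure_probability: float  # assumed estimate in [0, 1]
    failure_impact: float       # assumed severity score (higher is worse)


def risk_exposure(case: CaseRisk) -> float:
    # Assumed risk model: exposure = probability of failure * impact of failure.
    return case.failure_probability * case.failure_impact


def prioritize(cases):
    # Run the riskiest tests first.
    return sorted(cases, key=risk_exposure, reverse=True)


# Made-up example suite.
suite = [
    CaseRisk("login", 0.2, 9.0),
    CaseRisk("report_export", 0.5, 2.0),
    CaseRisk("payment", 0.3, 10.0),
]
ordered = [case.name for case in prioritize(suite)]
print(ordered)
```

With limited testing time, the suite would then be executed in this order and cut off when the budget is exhausted.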
Automatic Test Case Generation
Proposer: Dietmar Pfahl (dietmar.pfahl@ut.ee)
Software testing has an important place in the software development life cycle. The most difficult (and costly) activity during testing is test case design. Automatic test case generation can reduce the time and effort required of testers. Your task is to find techniques and tools used for automatic test case generation and to categorize them by type and tool support.
- http://link.springer.com.ezproxy.utlib.ut.ee/chapter/10.1007/11498490_18
- http://link.springer.com.ezproxy.utlib.ut.ee/chapter/10.1007/978-3-642-39277-1_2
- http://www.sciencedirect.com.ezproxy.utlib.ut.ee/science/article/pii/S0950584914001037
- http://dl.acm.org/citation.cfm?id=2771812&CFID=890586683&CFTOKEN=59410927
Techniques for measuring software energy consumption: A comparative analysis
Proposer: Dietmar Pfahl (dietmar.pfahl@ut.ee)
Green software engineering applies the practices and concepts of software engineering in such a way that the products produced have minimal negative impact on the environment. The green software engineering domain can be explored in five dimensions: social, environmental, economic, technical, and individual. As products such as laptops, mobile phones, and tablets are widely used today, one of the concerns within the technical dimension is to find out how much energy these devices consume at the software level and how the same performance can be achieved while minimizing energy consumption. Your task is to produce a comparative analysis of the techniques used for measuring software energy consumption.
- Becker, C., Chitchyan, R., Duboc, L., Easterbrook, S., Penzenstadler, B., Seyff, N., & Venters, C. C. (2015). Sustainability Design and Software: The Karlskrona Manifesto. Proceedings - International Conference on Software Engineering, 2, 467–476. https://doi.org/10.1109/ICSE.2015.179
- Calero, C., & Piattini, M. (2015). Green in software engineering. Green in Software Engineering, 1–327. https://doi.org/10.1007/978-3-319-08581-4
- Chowdhury, S. A., & Hindle, A. (2016). GreenOracle: Estimating Software Energy Consumption with Energy Measurement Corpora. MSR ’16. https://doi.org/10.1145/2901739.2901763
Evaluating the impact of refactoring code smells on the power consumption of Android apps
Proposer: Dietmar Pfahl (dietmar.pfahl@ut.ee)
Code smells point to areas in software applications that could benefit from refactoring. Refactoring is defined as a technique for restructuring an existing body of code, changing its internal structure without changing its external behavior. Choosing not to resolve code smells by refactoring will not cause the application to fail, but will likely make it harder to maintain. Thus, refactoring helps to improve the maintainability of an application. Given that maintenance is considered the most expensive stage of software development and that the proportion of maintenance cost in total software cost is increasing, refactoring is expected to save development effort in the long run. Research suggests, however, that while refactoring might improve maintainability, it might not reduce power consumption and might even increase it. Your task is to find and analyse literature with regard to evidence for the effects of refactoring on (1) maintainability and (2) power consumption of software systems.
- Investigating the Energy Impact of Android Smells (2017). http://ieeexplore.ieee.org.ezproxy.utlib.ut.ee/document/7884614/?reload=true
- Understanding Code Smells in Android Applications (2016). https://dl.acm.org/citation.cfm?id=2897094
- Refactoring for Energy Efficiency: A Reflection on the State of the Art (2015). http://ieeexplore.ieee.org.ezproxy.utlib.ut.ee/document/7168335/
- How Do Code Refactorings Affect Energy Usage? (2014). https://dl.acm.org/citation.cfm?id=2652538
Automatic extraction of app features from App Reviews: Evaluation on a benchmark dataset
Proposer: Dietmar Pfahl (dietmar.pfahl@ut.ee)
App distribution platforms such as the App Store and the Play Store enable app users to give feedback in the form of reviews. In these reviews, users express opinions about various aspects of an app (feature evaluations, feature requests, bugs/issues, praise). Over the past few years, research has been conducted on the automatic identification, extraction, and summarization of this information to help software developers improve their apps. Along the same lines, a few studies have attempted to automatically extract fine-grained app features from user reviews in order to perform feature-based sentiment analysis. However, each study uses its own dataset (training and test) to evaluate the proposed feature extraction technique, so the accuracy of these techniques on a different dataset has not been evaluated. Thus, the purpose of this study is to evaluate or cross-validate existing feature extraction techniques on a common benchmark dataset.
- http://ieeexplore.ieee.org/document/7372064/
- http://ieeexplore.ieee.org/document/6912257/
- http://www.lrec-conf.org/proceedings/lrec2016/pdf/59_Paper.pdf
- http://dl.acm.org/citation.cfm?id=2916003
Evaluation of Sentiment analysis tools for Feature-based sentiment analysis on App Reviews
Proposer: Dietmar Pfahl (dietmar.pfahl@ut.ee)
A recent study has shown that sentiment analysis tools do not agree with the sentiments recognized by human annotators in the software engineering domain, because these tools have been trained on product reviews or customer reviews. Over the past few years, app reviews have emerged as a special kind of software repository, the mining of which provides a wealth of information for developers to improve mobile apps based on user feedback. Existing research has used sentiment analysis tools (such as SentiStrength and the Stanford NLP sentiment analyzer, Deeply Moving) for feature-based sentiment analysis of app reviews. The purpose of this study is therefore to evaluate to what extent sentiment analysis tools agree with human annotators for the task of feature-based sentiment analysis on app reviews.
- http://link.springer.com/article/10.1007/s10664-016-9493-x
- http://ieeexplore.ieee.org/document/7372064/
- http://ieeexplore.ieee.org/document/6912257/
Data Mining for Technical Debt Analysis
Proposer: Dietmar Pfahl (dietmar.pfahl@ut.ee)
The metaphor of technical debt in software development was introduced two decades ago to explain to nontechnical stakeholders the need for what we now call "refactoring." As the term is being used to describe a wide range of phenomena, decision support for managing technical debt requires clarity about the definition of debt items that can be identified and about approaches to measuring technical debt. Your task is to survey and summarize the existing literature related to technical debt identification and measurement. Of particular interest are automatic approaches to measuring technical debt using data mining.
- http://ieeexplore.ieee.org/abstract/document/6336722/
- http://dl.acm.org/citation.cfm?id=2666044
- http://dl.acm.org/citation.cfm?id=2892643
How to make mutants resemble actual defects introduced into the code by programmers?
Proposer: Dietmar Pfahl (dietmar.pfahl@ut.ee)
Mutation testing provides a way to measure the effectiveness of a test suite with regard to finding defects in a software program. A mutant is a slightly changed version of the program under test. If an existing test suite can detect the mutated line of code in the program, then the mutant has been killed; if not, the mutant is alive. The proportion of mutants that the test suite kills is a measure of the strength of the test suite: the more mutants the test suite kills, the better. One limitation of mutation testing is that the automatically generated mutants might not correspond to actual defects introduced by programmers. To address this shortcoming, improved mutant generation methods have been proposed that help in creating potential defects that more closely resemble the changes made by actual programmers. What are these methods? To help answer this question, a few references for getting started with the literature search are listed below.
- [Wild-caught mutants] David Bingham Brown, Michael Vaughn, Ben Liblit, and Thomas Reps. 2017. The care and feeding of wild-caught mutants. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2017). ACM, New York, NY, USA, 511-522. DOI: https://doi.org/10.1145/3106237.3106280
- [Higher-Order Mutants] Jackson A. Prado Lima and Silvia R. Vergilio. 2017. A Multi-objective optimization approach for selection of second order mutant generation strategies. In Proceedings of the 2nd Brazilian Symposium on Systematic and Automated Software Testing (SAST). ACM, New York, NY, USA, Article 6, 10 pages. DOI: https://doi.org/10.1145/3128473.3128479
- [Weighted Random Mutant Selection] Ali Parsai, Alessandro Murgia, and Serge Demeyer. 2016. Evaluating random mutant selection at class-level in projects with non-adequate test suites. In Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering (EASE '16). ACM, New York, NY, USA, Article 11, 10 pages. DOI: https://doi.org/10.1145/2915970.2915992
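The kill/alive mechanics described above can be made concrete with a toy example. The sketch below uses a hypothetical one-line function with three hand-written mutants and computes the mutation score under the common definition of killed mutants divided by all mutants (an assumption about the exact ratio intended). Real tools generate mutants automatically from the source code; the lambdas here merely stand in for that step.

```python
def original(age):
    """Program under test (hypothetical)."""
    return age >= 18


# Hand-written mutants, each a slightly changed version of `original`.
mutants = {
    "relational operator: >= replaced by >": lambda age: age > 18,
    "relational operator: >= replaced by <": lambda age: age < 18,
    "constant: 18 replaced by 17": lambda age: age >= 17,
}


def is_killed(mutant, tests):
    # A mutant is killed if at least one test observes a wrong result.
    return any(mutant(value) != expected for value, expected in tests)


def mutation_score(mutants, tests):
    killed = sum(is_killed(mutant, tests) for mutant in mutants.values())
    return killed / len(mutants)


# Tests are (input, expected output) pairs that the original program passes.
weak_suite = [(30, True), (5, False)]
strong_suite = weak_suite + [(18, True), (17, False)]  # adds boundary values

print(mutation_score(mutants, weak_suite))
print(mutation_score(mutants, strong_suite))
```

The weak suite leaves the two boundary mutants alive, while adding boundary-value tests kills all three, which is the sense in which the score reflects test suite strength.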
How to speed up mutation testing?
Proposer: Dietmar Pfahl (dietmar.pfahl@ut.ee)
Mutation testing provides a way to measure the effectiveness of a test suite with regard to finding defects in a software program. A mutant is a slightly changed version of the program under test. If an existing test suite can detect the mutated line of code in the program, then the mutant has been killed; if not, the mutant is alive. The proportion of mutants that the test suite kills is a measure of the strength of the test suite: the more mutants the test suite kills, the better. One limitation of mutation testing is that, for large programs and test suites, executing all tests on all mutants might take a prohibitively long time. Experiments in industry have indicated that in some situations it may take several weeks to execute all tests on all mutants. To address this shortcoming, various strategies have been proposed. What are these strategies? To help answer this question, a few references for getting started with the literature search are listed below.
- [Using cloud computing] Sten Vercammen, Serge Demeyer, Markus Borg, and Sigrid Eldh. 2018. Speeding up mutation testing via the cloud: lessons learned for further optimisations. In Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM '18). ACM, New York, NY, USA, Article 26, 9 pages. DOI: https://doi.org/10.1145/3239235.3240506
- [Systematic mutant reduction] Goran Petrović and Marko Ivanković. 2018. State of mutation testing at google. In Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP '18). ACM, New York, NY, USA, 163-171. DOI: https://doi.org/10.1145/3183519.3183521
- [Mutation testing with focal methods] Sten Vercammen, Mohammad Ghafari, Serge Demeyer, and Markus Borg. 2018. Goal-oriented mutation testing with focal methods. In Proceedings of the 9th ACM SIGSOFT International Workshop on Automating TEST Case Design, Selection, and Evaluation (A-TEST 2018). ACM, New York, NY, USA, 23-30. DOI: https://doi.org/10.1145/3278186.3278190
- [Avoiding useless mutants] Leonardo Fernandes, Márcio Ribeiro, Luiz Carvalho, Rohit Gheyi, Melina Mongiovi, André Santos, Ana Cavalcanti, Fabiano Ferrari, and José Carlos Maldonado. 2017. Avoiding useless mutants. In Proceedings of the 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE 2017). ACM, New York, NY, USA, 187-198. DOI: https://doi.org/10.1145/3136040.3136053
- [Higher Order Mutation Testing] Jackson A. Prado Lima, Giovani Guizzo, Silvia R. Vergilio, Alan P. C. Silva, Helson L. Jakubovski Filho, and Henrique V. Ehrenfried. 2016. Evaluating Different Strategies for Reduction of Mutation Testing Costs. In Proceedings of the 1st Brazilian Symposium on Systematic and Automated Software Testing (SAST). ACM, New York, NY, USA, Article 4, 10 pages. DOI: https://doi.org/10.1145/2993288.2993292
- [Predictive Mutation Testing] Jie Zhang, Ziyi Wang, Lingming Zhang, Dan Hao, Lei Zang, Shiyang Cheng, and Lu Zhang. 2016. Predictive mutation testing. In Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA 2016). ACM, New York, NY, USA, 342-353. DOI: https://doi.org/10.1145/2931037.2931038
How to attack the oracle problem with metamorphic testing?
Proposer: Dietmar Pfahl (dietmar.pfahl@ut.ee)
Testing involves examining the behaviour of a system in order to discover potential faults. Given an input for a system, the challenge of distinguishing the corresponding desired, correct behaviour from potentially incorrect behaviour is called the “test oracle problem”. Test oracle automation is important for removing a current bottleneck that inhibits greater overall test automation. Without test oracle automation, a human has to determine whether the observed behaviour is correct. The literature on test oracles has introduced techniques for oracle automation, including modelling, specifications, contract-driven development, and metamorphic testing. Your task is to describe what metamorphic testing is and how it helps overcome the oracle problem. This could be done with the help of applications reported in the literature.
- E. T. Barr, M. Harman, P. McMinn, M. Shahbaz, and S. Yoo. 2015. The Oracle Problem in Software Testing: A Survey. IEEE Transactions on Software Engineering 41, 5 (May 2015), 507–525. https://doi.org/10.1109/TSE.2014.2372785
- J. Brown, Z. Q. Zhou, and Y.-W. Chow. 2018. Metamorphic Testing of Navigation Software: A Pilot Study with Google Maps. In Proceedings of the 51st Annual Hawaii International Conference on System Sciences (HICSS-51). 5687–5696. Available: http://hdl.handle.net/10125/50602
- T. Y. Chen, F.-C. Kuo, H. Liu, P. L. Poon, D. Towey, T. H. Tse, and Z. Q. Zhou. 2018. Metamorphic Testing: A Review of Challenges and Opportunities. ACM Computing Surveys 51, 1 (2018), 4:1–4:27. https://doi.org/10.1145/3143561
- T. Y. Chen, F.-C. Kuo, W. Ma, W. Susilo, D. Towey, J. Voas, and Z. Q. Zhou. 2016. Metamorphic Testing for Cybersecurity. Computer 49, 6 (June 2016), 48–55. https://doi.org/10.1109/MC.2016.176
- S. Segura, G. Fraser, A. Sanchez, and A. Ruiz-Cortés. 2016. A Survey on Metamorphic Testing. IEEE Transactions on Software Engineering 42, 9 (Sept 2016), 805–824. https://doi.org/10.1109/TSE.2016.2532875
- Z. Q. Zhou, S. Xiang, and T. Y. Chen. 2016. Metamorphic Testing for Software Quality Assessment: A Study of Search Engines. IEEE Transactions on Software Engineering 42, 3 (March 2016), 264–284. https://doi.org/10.1109/TSE.2015.2478001
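To illustrate the idea behind metamorphic testing, the sketch below checks a sine implementation without knowing the correct output for any single input. Instead, it checks a relation between outputs that must hold for related inputs, here the identity sin(x) = sin(π − x). The function names, the tolerance, and the deliberately faulty `buggy_sine` are illustrative assumptions and do not come from the referenced papers.

```python
import math
import random


def check_sine_relation(sine, trials=1000, tol=1e-9, seed=0):
    """Return True if sine(x) == sine(pi - x) holds on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(-10.0, 10.0)
        # No oracle for sine(x) itself is needed; only the relation
        # between the source output and the follow-up output is checked.
        if not math.isclose(sine(x), sine(math.pi - x), abs_tol=tol):
            return False
    return True


def buggy_sine(x):
    # Hypothetical faulty implementation: a Taylor series truncated too early.
    return x - x ** 3 / 6


print(check_sine_relation(math.sin))
print(check_sine_relation(buggy_sine))
```

A violated relation reveals a fault, whereas a satisfied relation does not prove correctness, so several complementary relations are typically used in practice.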
Architectural Principles for Scalable Web applications
Proposer: Marlon Dumas (marlon.dumas@ut.ee)
Scalability is a crucial requirement in (enterprise) Web applications. Building scalable Web applications, particularly when resources are limited, requires a disciplined architectural approach. Your task is to critically review the literature on scalable Web architectures and to synthesize a set of basic concepts and principles that should be kept in mind when approaching scalability requirements in Web applications. Here are some starting points:
- http://aosabook.org/en/distsys.html
- http://aosabook.org/en/nginx.html
- https://blog.hartleybrody.com/scale-load/
Function-as-a-Service Architectures
Proposer: Marlon Dumas (marlon.dumas@ut.ee)
Function-as-a-Service (FaaS) is an architectural approach to developing cloud-based software applications in a way that decouples software development from the specific infrastructure on which the application is deployed. This architectural approach forces a separation between the stateful components of a software application and its stateless (functional) components. FaaS has proved to be useful in a variety of settings. Your task is to conduct a critical review of the literature to address the following questions: What are the main elements of a FaaS architecture? What are the benefits of FaaS applications? What is the scope of applicability of FaaS architectures (i.e. for what types of applications would it be most useful)? And how does FaaS relate to microservice architectures? Here are some initial pointers:
- https://ieeexplore.ieee.org/document/8894540
- https://research.aalto.fi/files/30436087/mohanty2018serverless.pdf
- https://dl.acm.org/doi/abs/10.1145/3185768.3186308
Weaknesses and limitations of the F-score measure
Proposer: Marlon Dumas (marlon.dumas@ut.ee)
F-score is a commonly used measure for assessing the quality of information retrieval systems and binary classification models. However, this measure has a few weaknesses. Your task is to critically read the literature on the limitations of F-score and to synthesize some recommendations regarding when and how to use the F-score measure and which other measures can be used to address the shortcomings of the F-score. Here are some initial pointers:
- https://arxiv.org/ftp/arxiv/papers/1503/1503.06410.pdf
- https://pdfs.semanticscholar.org/9046/27c2d5a91ab8cb1b682e42f06f1ca192aea6.pdf
- https://ieeexplore.ieee.org/document/5128907?denied=
- https://www.sciencedirect.com/science/article/pii/S2210832718301546
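One well-known weakness can be shown with a few lines of arithmetic: the F1-score is computed from true positives, false positives, and false negatives only, so it ignores true negatives entirely. The sketch below contrasts F1 with the Matthews correlation coefficient (MCC), which is used here only as one example of an alternative measure; the confusion-matrix counts are made up for illustration.

```python
import math


def f1_score(tp, fp, fn, tn=0):
    # F1 = 2*TP / (2*TP + FP + FN); `tn` is accepted but never used.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# A classifier that labels everything positive on a 90/10 imbalanced set.
always_positive = dict(tp=90, fp=10, fn=0, tn=0)
print(f1_score(**always_positive), mcc(**always_positive))

# Same TP/FP/FN, very different numbers of correctly rejected negatives.
few_negatives = dict(tp=10, fp=5, fn=5, tn=0)
many_negatives = dict(tp=10, fp=5, fn=5, tn=1000)
print(f1_score(**few_negatives), f1_score(**many_negatives))
print(mcc(**few_negatives), mcc(**many_negatives))
```

The trivial classifier obtains a high F1 despite having no discriminative power, and F1 does not change when the number of true negatives changes, whereas MCC reacts in both cases.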
Turning predictions into actions with uplift modeling
Proposer: Marlon Dumas (marlon.dumas@ut.ee)
Uplift modeling is an approach to estimating the benefit of taking an action (or "applying a treatment") to achieve a given outcome versus not applying the treatment. Uplift modeling has a wide range of business applications. For example, it can be used to estimate the benefit of giving a discount to a customer in order to entice them to buy a product, versus not doing so. Your task is to conduct a literature review and critical analysis in order to address the following questions: What types of business applications can benefit from uplift modeling? What techniques for uplift modeling are available and what are their relative pros and cons? Below are some initial pointers:
- https://towardsdatascience.com/uplift-modeling-e38f96b1ef60
- http://proceedings.mlr.press/v67/gutierrez17a/gutierrez17a.pdf
- http://stochasticsolutions.com/pdf/sig-based-up-trees.pdf
- https://humboldt-wi.github.io/blog/research/applied_predictive_modeling_19/multiple_treatments_uplift/
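The basic quantity behind uplift modeling can be sketched as the difference between the outcome rate of treated customers and that of untreated (control) customers, computed per customer segment. The sketch below assumes that treatment was assigned at random within each segment; the function name, the record layout, and the toy data are all hypothetical. The techniques in the references (e.g. uplift trees) estimate this quantity with learned models rather than fixed segments.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """Estimate uplift per segment from (segment, treated, converted) records."""
    # counts[segment][treated] = [number of conversions, number of customers]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, converted in records:
        cell = counts[segment][treated]
        cell[0] += int(converted)
        cell[1] += 1

    uplift = {}
    for segment, groups in counts.items():
        (t_conv, t_n), (c_conv, c_n) = groups[True], groups[False]
        if t_n and c_n:  # both groups must be non-empty
            uplift[segment] = t_conv / t_n - c_conv / c_n
    return uplift


# Made-up data: a discount helps new customers but not loyal ones.
records = (
    [("new", True, i < 6) for i in range(10)]
    + [("new", False, i < 2) for i in range(10)]
    + [("loyal", True, i < 8) for i in range(10)]
    + [("loyal", False, i < 8) for i in range(10)]
)
estimates = uplift_by_segment(records)
print(estimates)
```

In this toy data, loyal customers convert more often than new ones, yet the discount only changes the behaviour of new customers. Targeting by predicted conversion alone would therefore spend the discount on the wrong group.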
Privacy-preserving process mining
Proposer: Marlon Dumas (marlon.dumas@ut.ee)
Process mining is a family of techniques for analyzing business processes based on event logs extracted from information systems. Mainstream process mining tools are designed for intra-organizational settings, insofar as they assume that an event log is available for processing as a whole. The use of such tools for inter-organizational process analysis is hampered by the fact that such processes involve independent parties who are unwilling to, or sometimes legally prevented from, sharing detailed event logs with each other. For example, a typical healthcare treatment process involves multiple healthcare providers (multiple clinics, laboratories, and health insurance providers). Everyone would benefit from being able to analyze the end-to-end patient treatment process, but in many jurisdictions, the organizations involved are not allowed to share private patient data with each other.
Several methods and tools have been proposed to enable cross-organizational process mining. Some rely on anonymization. Others rely on cryptographic protocols such as secure multi-party computation protocols. Your task is to review the state of the art in this field and to identify which technologies are ready to be transferred and which ones require further development in order to mature.
- http://ceur-ws.org/Vol-2420/paperDT9.pdf
- https://link.springer.com/article/10.1007%2Fs12599-019-00613-3
- https://ieeexplore.ieee.org/document/8786060
- https://arxiv.org/abs/1912.01855
Microservice Design Patterns and Smells
Proposer: Marlon Dumas (marlon.dumas@ut.ee)
Microservice architectures have become widely used in the past decade. A microservice architecture allows multiple autonomous teams to work effectively towards building large-scale systems. Also, when carefully designed, microservice architectures have appealing properties from the perspective of scalability and fault isolation. However, poorly designed and governed microservice architectures can create long-term maintainability issues. It is therefore important to follow proven patterns when designing microservices and to periodically refactor the system in order to prevent certain "smells". A number of microservice design patterns and microservice "smells" have been proposed in the literature. Your task is to review and synthesize these patterns and smells. Some pointers:
- https://arxiv.org/abs/1906.01553
- http://eprints.cs.univie.ac.at/6230/1/MAP-MS2019PaperOASIcs.pdf
- https://arxiv.org/abs/1609.05830
- https://microservices.io/patterns/
Demand forecasting methods for the retail sector
Proposer: Marlon Dumas (marlon.dumas@ut.ee)
Forecasting future demand for different types of products is a crucial task in supply chain management, particularly in the retail sector. It allows purchasing managers to manage their inventory efficiently and effectively, by avoiding out-of-stocks while maintaining relatively low inventory levels. Your task is to review different methods for demand forecasting, with a focus on demand forecasting in retail companies, and to provide a short-list of state-of-the-art methods and tools in this field.
- https://eprints.lancs.ac.uk/id/eprint/128587/2/retail_forecasting_review_180924_v9.2_1_.pdf
- https://aaltodoc.aalto.fi/bitstream/handle/123456789/37915/master_Moskalev_Artem_2019.pdf
- https://www.altexsoft.com/blog/demand-forecasting-methods-using-machine-learning/
- https://www.sciencedirect.com/science/article/abs/pii/S0148296317301376
Modelling Languages for Blockchain Applications
Proposers: Mubashar Iqbal and Prof. Raimundas Matulevičius (raimundas.matulevicius [ät] ut.ee)
While designing and developing blockchain applications (dApps), developers need to deal with the principles of distributed ledgers, chains of blocks, smart contracts, crypto-hashes, and other domain-specific concepts. However, the standard modelling languages (e.g., BPMN, UML, ArchiMate) do not contain constructs to represent dApp components. Although there have been a few attempts to enrich the modelling languages, these mainly result in model annotations and do not include systematic extensions of the modelling language. The main goal of this topic is to develop semantic and syntactic modelling constructs to support the modelling of blockchain applications. The main steps of the research include:
- Review the literature on blockchain application modelling
- Define the dApp modelling domain
- Develop the semantics and the concrete and abstract syntax for dApp modelling (this could be done either as extensions of the existing standard languages or as a proposal for a new modelling language)
- Illustrate the feasibility of the proposal in a dApp modelling example.
The topic is associated with the BlockNet project (https://www.knf.vu.lt/en/blocknet).
- Iqbal M., Matulevičius R. (2019) Comparison of Blockchain-Based Solutions to Mitigate Data Tampering Security Risk. In: Di Ciccio C. et al. (eds) Business Process Management: Blockchain and Central and Eastern Europe Forum. BPM 2019. Lecture Notes in Business Information Processing, vol 361. Springer, Cham
- Filipova S.: Modeling Business Processes on a Blockchain Ecosystem using CMMN, Master thesis, 2019, University of Tartu
- Markovska M.: Modelling Business Processes on a Blockchain Eco-System (BPMN), Master thesis, 2019, University of Tartu
- https://www.knf.vu.lt/en/blocknet
Information System Security Risk Management in the Autonomous Driving Vehicles. Thesis topic.
Proposers: Abasi-Amefon O. Affia and Prof. Raimundas Matulevičius (raimundas.matulevicius [ät] ut.ee)
An autonomous driving vehicle is a complex cyber-physical system. It uses a network, sensors, and electronic control units (ECUs) to control the functions of the vehicle and to connect the vehicle to other system entities (e.g., other connected vehicles, roadside equipment, and traffic management centres). In this way, it exchanges information about the vehicle’s location, environment, direction, and driving conditions, as well as the information necessary for controlling the vehicle’s devices. However, such a system could suffer from various security risks. For example, an attacker could establish a connection between the attacker’s device and the target vehicle. Security risks could be mitigated by limiting the VMM port functionality, by monitoring the incoming information, and by blocking abnormal requests/services. The goals of this topic are:
- Explain the system and business assets in autonomous driving vehicles;
- Assess the security risks in autonomous driving vehicles;
- Analyse the trade-offs in order to define the best-suited countermeasures to mitigate these risks.
To reach the above goals, you would use the information systems security risk management approach combined with model-driven and data analysis methods. The approach includes: (1) systematic explanation of the architecture of the connected autonomous vehicle, resulting in models of the system and business assets; (2) definition of security needs (e.g., regarding the vehicle’s tire pressure data, fuel level data, braking service, gearing service, information in emergency situations, infotainment services, firmware, etc.); (3) systematic analysis and estimation of the security risks using data analysis methods; (4) reasoning about and taking the security risk treatment decisions; (5) elicitation of the security requirements; (6) recommendations to implement security controls regarding secure network services, communication, data privacy, secure software/firmware, physical security, access control, data input, fault tolerance, and others.
This topic is associated with the BOLT project.
- Affia A-a. O., Matulevičius R., Nolte A.: Security Risk Management in Cooperative Intelligent Transportation Systems: A Systematic Literature Review. Panetto H. et al. (eds.), LNCS 11877, CoopIS 2019, Springer
- https://www.cs.ut.ee/en/news/bolt-kicks-self-driving-technology-research-partnership-university-tartu
Future, Challenges, and Trends of EEG-based Brain-Computer Interface (BCI) Applications
Proposer: Yar Muhammad (Yar dot Muhammad [ät] ut [dot] ee)
A brain-computer interface (BCI) is a device that allows a person to control a computer, or another device via a computer, using their thoughts, without relying on ordinary means of interaction (e.g. using hands). BCI applications have been categorized by domain (medical or non-medical) and by field, which describes the current trends in BCI application development in more detail. One of the purposes of the study is to introduce the current trends and possibilities for the development of EEG-based BCI applications. The study shows that, although the initial starting point for BCI application development was medical need, non-medical applications are currently in rapid development. BCI application development started from the need to allow locked-in patients to communicate with others and possibly take part in daily life by controlling computers and external devices via brainwaves. It is now moving into the daily life of ordinary people, with applications that monitor their attention or are developed for entertainment.
- https://pdfs.semanticscholar.org/0059/23c473094016346b4646e9c094b34569dc28.pdf
- https://iopscience.iop.org/article/10.1088/1361-6579/aad57e/pdf
Comparative Analyses on potential models for EEG-based BCI Applications
Proposer: Yar Muhammad (Yar dot Muhammad [ät] ut [dot] ee)
The focus of this study is a comparative analysis of potential models, such as NeuCube, EEGNet, Shallow ConvNet, and Deep ConvNet, for EEG-based brain-computer interface (BCI) applications. Conclusions will be drawn based on robustness of learning, accuracy, performance, and efficiency. NeuCube [1] is a spiking neural network architecture for spatio- and spectro-temporal brain data (STBD) that can be used in many BCI applications. EEGNet [2] is a compact convolutional neural network (CNN) architecture designed as a cross-paradigm EEG classifier.
- Nikola K. Kasabov, NeuCube: A spiking neural network architecture for mapping, learning and understanding of spatio-temporal brain data, Neural Netw. 52, 62-76 (2014)
- V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, ‘EEGNet: A compact convolutional neural network for EEG-based brain-computer interfaces’, J. Neural Eng., vol. 15, no. 5, pp. 1–30, 2018, https://doi.org/10.1088/1741-2552/aace8c
How Transfer Learning techniques are used and will be used in BCI
Proposer: Yar Muhammad (Yar dot Muhammad [ät] ut [dot] ee)
A brain-computer interface (BCI) is a device that allows a person to control a computer, or another device via a computer, using their thoughts, without relying on ordinary means of interaction (e.g. using hands). Transfer learning (TL) is a research problem in machine learning (ML) that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. For example, knowledge gained while learning to recognize cars could be applied when trying to recognize trucks. This area of research bears some relation to the long history of psychological literature on transfer of learning, although formal ties between the two fields are limited. From a practical standpoint, reusing or transferring information from previously learned tasks for the learning of new tasks has the potential to significantly improve the sample efficiency of a reinforcement learning agent. In BCI, transfer learning extracts information from different domains (raw data, features, or classification domain) to compensate for the lack of labelled data from the test subject.
- https://pdfs.semanticscholar.org/0059/23c473094016346b4646e9c094b34569dc28.pdf
- https://towardsdatascience.com/a-comprehensive-hands-on-guide-to-transfer-learning-with-real-world-applications-in-deep-learning-212bf3b2f27a
Analysis, formalization, critique and application of Ross’ Business Rules Diagrams. Thesis topic.
Proposer: Kuldar Taveter (kuldar dot taveter [ät] ut [dot] ee)
The purpose of this Master’s project is to conduct an analysis of the feasibility of the notation for business rules proposed by Ronald G. Ross – Ross’ Business Rules Diagrams (see, for example, http://www.businessrulesgroup.org/first_paper/br01c0.htm), relying, among other materials, on the paper by [...]. The Master’s project should also find out if and how Ross’ Business Rules Diagrams can be formalized by, e.g., predicate logic, Object Role Modeling, Object Constraint Language, etc. The thesis should relate Ross’ Business Rules Diagrams to the goal and domain models by Sterling & Taveter (2009) and Miller, Lu, Sterling, Beydoun, & Taveter (2014). Finally, the thesis should compile an honest analysis of the applicability of Ross’ Business Rules Diagrams, based on real-life case studies, preferably conducted by the Master’s student herself/himself. In addition, the Master’s student should find out whether Ross’ Business Rules Diagrams are geared towards relational databases or are equally well usable in the context of NoSQL databases.
- Taveter, K., & Wagner, G. (2001). Agent-Oriented Enterprise Modeling Based on Business Rules. In: H.S. Kunii, S. Jajodia, and A. Solvberg (Eds.): Conceptual Modeling (ER 2001). Springer, 527-540.
- Sterling, L., & Taveter, K. (2009). The Art of Agent-Oriented Modeling. Cambridge, MA, and London, England: MIT Press.
- Miller, T., Lu, B., Sterling, L., Beydoun, G., & Taveter, K. (2014). Requirements Elicitation and Specification Using the Agent Paradigm: The Case Study of an Aircraft Turnaround Simulator. IEEE Transactions on Software Engineering, 40, 1007-1024.
Goal modelling for Xatkit. Thesis topic.
Proposer: Kuldar Taveter (kuldar dot taveter [ät] ut [dot] ee) and Jordi Cabot
Xatkit (https://xatkit.com) is an open source platform that allows anyone to easily create and deploy single chatbots using a domain-specific high-level chatbot definition language. Xatkit takes care of translating these chatbot specifications into the actual running bot. The specification involves a set of intents, where each intent represents a possible intention the customer has when interacting with the bot, and, for each intent, the corresponding reaction to be executed: either a text reply as part of the conversation, a call to an external service (possibly involving a person), or both. Intents are recognized via a Natural Language Understanding (NLU) component. The purpose of the Master’s thesis is to explore the possible role of goal modelling, as put forward by Sterling & Taveter (2009) and Miller, Lu, Sterling, Beydoun, & Taveter (2014), in deciding which intentions the customer might have when interacting with the chatbot. The resulting intentions are then specified as the intents for the chatbot. This method is based on systematic hierarchical modelling of the goals of the customer. These include functional goals, answering the question “What should be accomplished?”; quality goals, characterizing what qualities should be considered when achieving the functional goals; and emotional goals, characterizing what the customer should or should not feel when interacting with the chatbot to achieve the functional goals.
- Sterling, L., & Taveter, K. (2009). The Art of Agent-Oriented Modeling. Cambridge, MA, and London, England: MIT Press.
- Miller, T., Lu, B., Sterling, L., Beydoun, G., & Taveter, K. (2014). Requirements Elicitation and Specification Using the Agent Paradigm: The Case Study of an Aircraft Turnaround Simulator. IEEE Transactions on Software Engineering, 40, 1007-1024.
Design and implementation of a probabilistic cognitive architecture for predictive processing in the brain. Thesis topic.
Proposer: Kuldar Taveter (kuldar dot taveter [ät] ut [dot] ee)
The purpose of this Master’s project is to design and implement a deep probabilistic cognitive architecture for predictive processing in the brain. The implemented architecture would help us gain better insight into how humans behave within sociotechnical systems, which is currently an acute research topic for many large corporations. The architecture should be implemented in such a way that it can be used right away for experimentation. The implementation should be well documented for further development. The architecture should be implemented in a mainstream imperative programming language, such as Python. The thesis should also investigate the computational complexity of the solution, as described by Kwisthout & van Rooij (2019), and should propose how heuristics can be used to overcome computational complexity for certain applications or domains. The outcome would be similar to how the appraisal theories of emotion have been simulated by software agents, as reported by Si, Marsella, & Pynadath (2010), but it would be based on a different paradigm: the constructional view of how the brain works.
- The Bayesian Brain: An Introduction to Predictive Processing (https://www.mindcoolness.com/blog/bayesian-brain-predictive-processing/)
- Barrett, L. F. (2017). How emotions are made: The secret life of the brain. Houghton Mifflin Harcourt.
- Pfeffer, A., & Lynn, S. K. (2018). Scruff: A Deep Probabilistic Cognitive Architecture for Predictive Processing. In Biologically Inspired Cognitive Architectures Meeting (pp. 245-259). Springer, Cham.
- Si, M., Marsella, S. C., & Pynadath, D. V. (2010). Modeling appraisal in theory of mind reasoning. Autonomous Agents and Multi-Agent Systems, 20(1), 14.
- Kwisthout, J., & van Rooij, I. (2019). Computational Resource Demands of a Predictive Bayesian Brain. Computational Brain & Behavior, 1-15.
The dark side of hackathons - diversity and technical solutions
Proposer: Alexander Nolte (alexander dot nolte [ät] ut [dot] ee)
Hackathons have received widespread attention in various domains in recent years. They are organized with the aim of creating innovative technical solutions that can be turned into products, tackling social and environmental issues, supporting informal and collaborative learning, and creating new or expanding existing communities, just to name a few. There is, however, also a growing concern that hackathons favor individuals who possess specific technical expertise, which in turn can be intimidating to individuals who do not possess this expertise or who do not perceive themselves to be proficient enough to participate. Moreover, the short time span of a hackathon leads participants to go for the first and allegedly easiest technical solution to a problem without understanding the problem properly. The aim of this topic is to conduct a comprehensive literature review covering the current state-of-the-art of research on potential issues of hackathons related to a lack of diversity and an unhealthy focus on technical solutionism. The review should outline current issues with the hackathon format, point towards potential solutions or approaches to overcome them, and discuss open questions.
- Hope, A., D'Ignazio, C., Hoy, J., Michelson, R., Roberts, J., Krontiris, K., & Zuckerman, E. (2019, May). Hackathons as participatory design: iterating feminist utopias. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1-14).
- Irani, L. (2015). Hackathons and the making of entrepreneurial citizenship. Science, Technology, & Human Values, 40(5), 799-824.
- Morozov, E. (2013). To save everything, click here: The folly of technological solutionism.
Passenger interaction with autonomous ride-hailing vehicles
Proposer: Alexander Nolte (alexander dot nolte [ät] ut [dot] ee)
There has been a surge of interest in the development of user interfaces for autonomous vehicles in recent years. The main focus in this context has been on supporting the driver in retaking control of a vehicle, particularly in case the built-in automation fails. Moreover, most work has focused on a scenario where the car is operated by its owner. Little attention has been paid so far to the potential of using autonomous vehicles for ride-hailing services such as Bolt, Uber, Lyft and others. This context is significantly different in that the passenger does not have time to get used to the car and in that s/he typically sits in the back with no opportunity to take control. The aim of this topic is thus to conduct a comprehensive literature review covering the current state-of-the-art of research on passenger interaction with autonomous vehicles. The review should outline current practices, open questions and shortcomings.
- Mirnig, A. G., Gärtner, M., Wallner, V., Trösterer, S., Meschtscherjakov, A., & Tscheligi, M. (2019). Where Does It Go? A Study on Visual On-Screen Designs for Exit Management in an Automated Shuttle Bus. In Proceedings of the 11th International Conference on Automotive User Interfaces and Interactive Vehicular Applications (pp. 233-243).
- Oliveira, L., Luton, J., Iyer, S., Burns, C., Mouzakitis, A., Jennings, P., & Birrell, S. (2018). Evaluating how interfaces influence the user interaction with fully autonomous vehicles. In Proceedings of the 10th International Conference on Automotive User Interfaces and Interactive Vehicular Applications (pp. 320-331).
Quo vadis entrepreneurial hackathons?
Proposer: Alexander Nolte (alexander dot nolte [ät] ut [dot] ee)
Time-bounded events such as hackathons, data dives, codefests, hack-days, sprints and edit-a-thons have received widespread attention in recent years. Events organized with the aim of supporting teams in developing innovative products and services that can be turned into successful start-ups have been at the forefront of this recent surge. The question, however, remains how hackathons and entrepreneurship are actually connected. The aim of this topic is to conduct a comprehensive literature review covering the current state-of-the-art of research on entrepreneurial hackathons and their connection to the startup scene. This review should outline current knowledge as well as open questions and shortcomings.
- Cobham, D., Hargrave, B., Jacques, K., Gowan, C., Laurel, J., & Ringham, S. (2017). From hackathon to student enterprise: an evaluation of creating successful and sustainable student entrepreneurial activity initiated by a university hackathon.
- Komssi, M., Pichlis, D., Raatikainen, M., Kindström, K., & Järvinen, J. (2015). What are hackathons for?. IEEE Software, 32(5), 60-67.
- Taylor, N., & Clarke, L. (2018, April). Everybody's Hacking: Participation and the Mainstreaming of Hackathons. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (p. 172). ACM.
Multiple facets of XAI. Thesis topic.
Proposer: Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee)
Black-box models in Artificial Intelligence (AI) have, in general, outperformed classical white-box machine learning models. However, there are many domains (finance, biomedicine, etc.) where researchers are interested not only in efficient (or almost accurate) models but also in transparency. By transparency we mean that the model should be able to describe how and why a certain conclusion has been drawn. A new domain of AI, called Explainable AI (XAI), deals with this. Many different research and philosophical paths have emerged. For example, some deal with the pre-modeling phase, where they examine the bias present in the model, and others deal with the in-modeling phase to understand the fairness of the models. In this thesis, we expect the student to perform a rigorous study of research papers dealing with multiple aspects of XAI. The outcome of this study could be a review article that uses an ontology-based mechanism to summarise the multiple facets of XAI.
- [1] Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.
- [2] Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., & Pedreschi, D. (2018). A survey of methods for explaining black box models. ACM computing surveys (CSUR), 51(5), 1-42.
- [3] Adadi, Amina, and Mohammed Berrada. "Peeking inside the black-box: A survey on Explainable Artificial Intelligence (XAI)." IEEE Access 6 (2018): 52138-52160.
A user based study to understand XAI in the legal sector. Thesis topic.
Proposers: Rajesh Sharma and Martin Ebers (Prof. of Law, UT) (rajesh dot sharma [ät] ut [dot] ee)
As in other sectors, AI has been explored in the legal sector for predicting the outcome of legal cases. However, it has been shown that AI systems used in justice and legal systems are also affected by bias, which leads to wrongful predictions of high risk levels with respect to sentencing [4], criminal behavior [5], and recidivism [3]. There has been very limited work in the legal and justice domain, such as [1] and [2], where the authors mainly investigated individual fairness metrics. The student is expected to study the possible impact of fairness and bias on AI models in the legal sector. If the student later picks up the topic for a Master's thesis, we will provide a legal dataset that can be used for proposing explainable AI models for predicting the outcome of legal cases. Alternatively, the student can perform a user study (surveys and questionnaires) to understand the contextual definition of explainability among the multiple parties involved in the legal sector. The outcome of this user study would highlight the importance of checking fairness and bias when employing AI models for classification tasks.
- [1] Grgic-Hlaca, Nina & Engel, Christoph & Gummadi, Krishna. (2019). Human Decision Making with Machine Assistance: An Experiment on Bailing and Jailing. 10.1145/3359280.
- [2] H Wang, N Grgic-Hlaca, P Lahoti, KP Gummadi, A Weller. An Empirical Study on Learning Fairness Metrics for COMPAS Data with Human Supervision. arXiv preprint arXiv:1910.10255
- [3] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And it’s Biased Against Blacks. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing, 2016.
- [4] John Lightbourne, Damned Lies & Criminal Sentencing Using Evidence-Based Tools, 15 Duke Law & Technology Review 327-343 (2017)
- [5] R. Berk and J. Bleich, “Statistical procedures for forecasting criminal behavior: A comparative assessment,” Criminol. Public Policy, vol. 12, no. 3, pp. 513–544, 2013.
Transfer Learning Based Explainable AI method for detecting tweets related to drug use. Thesis topic.
Proposers: Rajesh Sharma and Vijay Mago (Prof. in Computer Science, Lakehead University, Canada) (rajesh dot sharma [ät] ut [dot] ee)
Twitter is one of the most prominent platforms for expressing opinions and statements on various topics. Among these topics, drug use is of particular concern because 1) government agencies are very much concerned about drug abuse and, importantly, 2) the drug-using community uses hidden slang and non-explicit expressions to keep its conversations secret. In this work, we plan to use transfer learning techniques, training on a manually labeled dataset (~3000 tweets), to classify tweets related to drug use. If you pick this topic as a Master's student, we expect you to perform an extensive study of transfer learning, especially in the domain of text analytics. If you choose this topic for a future Master's thesis, the dataset will be provided for applying data science prediction techniques, that is, transfer learning and other deep learning techniques. We also plan to make these models explainable so that they can help governmental agencies keep track of online social media conversations regarding drug use.
- [1] Pan, Sinno Jialin, and Qiang Yang. "A survey on transfer learning." IEEE Transactions on knowledge and data engineering 22.10 (2009): 1345-1359.
- [2] Do, Chuong B., and Andrew Y. Ng. "Transfer learning for text classification." Advances in Neural Information Processing Systems. 2006.
Fake news detection using Graph Neural Networks
Proposer: Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee)
In news articles there can be various entities (such as individuals, events, places, etc.) which might or might not be related to each other. The possible relations among the entities of an article can be used to create (knowledge) graphs*. On such knowledge graphs, we plan to apply graph neural networks, an emerging area at the intersection of graph theory and deep learning, to detect whether a news article is fake or not.
* Knowledge Graph is a service by Google that represents related entities in an infobox, reflecting the fact that data points are connected.
- [1] Popping, Roel. "Knowledge graphs and network text analysis." Social Science Information 42.1 (2003): 91-106.
- [2] Zhou, Jie, et al. "Graph neural networks: A review of methods and applications." arXiv preprint arXiv:1812.08434 (2018).
- [3] Zhou, Xinyi, and Reza Zafarani. "Fake news: A survey of research, detection methods, and opportunities." arXiv preprint arXiv:1812.00315 (2018).
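To make the graph-based idea above more concrete, here is a minimal sketch of one round of neighbourhood aggregation, the basic operation that many graph neural networks build on. It is not a trained model and not the method of this topic: the entity names, edges and two-dimensional feature vectors are all hypothetical, and a real system would add learned weights and non-linearities.

```python
# Toy entity graph extracted from a (hypothetical) news article:
# nodes are entities, edges are relations between them.
edges = [("politician", "city"), ("city", "event")]

# Hand-made feature vectors, purely illustrative.
features = {
    "politician": [1.0, 0.0],
    "city": [0.0, 1.0],
    "event": [1.0, 1.0],
}


def build_adjacency(edge_list):
    """Return an undirected adjacency mapping: node -> sorted list of neighbours."""
    adjacency = {}
    for source, target in edge_list:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return {node: sorted(neighbours) for node, neighbours in adjacency.items()}


def aggregate_once(adjacency, node_features):
    """One message-passing round: average each node's vector with its neighbours' vectors."""
    updated = {}
    for node, neighbours in adjacency.items():
        vectors = [node_features[node]] + [node_features[n] for n in neighbours]
        dimension = len(node_features[node])
        updated[node] = [
            sum(vector[i] for vector in vectors) / len(vectors)
            for i in range(dimension)
        ]
    return updated


adjacency = build_adjacency(edges)
hidden = aggregate_once(adjacency, features)

for node, vector in hidden.items():
    print(node, vector)
```

After aggregation, each entity's vector reflects its neighbourhood. Pooling the node vectors into one article-level vector and feeding it to a classifier would be one possible next step.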
Identifying Fake News using Linked Data and Network Science Approaches. Thesis topic.
Proposers: Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Deepak Padmanabhan (Queen's University Belfast, U.K)
Fake news is often generated with the malicious intent of spreading misinformation and rumours. The content of fake news is generally created to mislead readers for financial or political gain, as well as to grab attention. Apart from social media such as Twitter, Facebook and WhatsApp, there are dedicated news agencies that propagate fake news. The goal of this thesis is to use the content of news stories to identify them as fake or not by using “Linked Data” in combination with “Network Science” approaches. The Linked Data approach will be used for identifying fake news indicators, such as enhanced topical scatter, in the news content to be analyzed. The network science approach will be used for identifying the similarity among the topics of the content to boost the accuracy of fake news detection. This involves analysis of a corpus of news stories that will be collected for the purpose of this project. Guidance on network science and Linked Data will be provided to get started on the project.
- [1] Zhou, Xinyi, and Reza Zafarani. "Fake news: A survey of research, detection methods, and opportunities." arXiv preprint arXiv:1812.00315 (2018).
- [2] Jiawei Zhang, Bowen Dong, Philip S. Yu. FAKEDETECTOR: Effective Fake News Detection with Deep Diffusive Neural Network. https://arxiv.org
- [3] Thota, Aswini; Tilak, Priyanka; Ahluwalia, Simrat; and Lohia, Nibrat (2018) "Fake News Detection: A Deep Learning Approach," SMU Data Science Review: Vol. 1 : No. 3 , Article 10. Available at: https://scholar.smu.edu/datasciencereview/vol1/iss3/10
Analysing Server Logs for predicting Job Failures. Thesis topic.
Proposers: Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Alina Sirbu (University of Pisa)
Server logs generally refer to files created for monitoring the activities performed on servers. In recent years, a lot of research has analysed server logs to determine the status of the jobs or tasks that arrive on servers.
In this thesis, you will be analysing logs from a Google cluster, which is a set of machines responsible for running real Google jobs, for example, search queries. The research encompasses the domains of large-scale data analytics and machine learning. The main contribution of the thesis is a proposed model for predicting job failures on servers. A real dataset of Google traces will be provided, along with related literature to ramp up the learning process.
- [1] Chunhong Liu et al. Predicting of Job Failure in Compute Cloud Based on Online Extreme Learning Machine: A Comparative Study. 2017.
- [2] Andrea Rosa et al. Predicting and Mitigating Jobs Failures in Big Data Cluster. 2015.
Discriminatory Speech on Digital Media Platforms. Thesis topic.
Proposers: Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Christian Simon Ritter (Tallinn University)
This thesis will explore how discriminatory (for example, anti-immigrant) and non-discriminatory (for example, pro-immigrant) groups spread their ideas (emergence and circulation) on digital media platforms (online social media). In particular, the student will identify a specific number of groups in each category. Three social media platforms will be selected, namely Instagram, Twitter and Facebook. Drawing on critical social science perspectives on group classifications and boundary maintenance within ethnic and religious online communities, the student will identify discriminatory narratives on digital media platforms. The project will involve mixed-method (qualitative and quantitative) research. By analyzing qualitative and quantitative data in parallel, the project will provide new insights into the circulation of hate (bridging) speech, exclusionary (or inclusive) narratives, and anti-immigration (pro-immigration) discourses on digital media platforms. A possible outcome of this work is a set of recommended strategies for more inclusive platform politics.
- Christopher A. Bail, Lisa P. Argyle, Taylor W. Brown, John P. Bumpus, Haohan Chen, M. B. Fallin Hunzaker, Jaemin Lee, Marcus Mann, Friedolin Merhout, Alexander Volfovsky. (2018) Exposure to opposing views on social media can increase political polarization. Proceedings of the National Academy of Sciences, Sep 2018, 115 (37) 9216-9221; DOI: 10.1073/pnas.1804840115
- Matuszewski, P., & Szabó, G. (2019). Are Echo Chambers Based on Partisanship? Twitter and Political Polarity in Poland and Hungary. Social Media + Society. https://doi.org/10.1177/2056305119837671
- Bracey, G. & Moore, W. (2017) “Race Tests”: Racial Boundary Maintenance in White Evangelical Churches. Sociological Inquiry 87(2), 282-302. https://doi.org/10.1111/soin.12174
- Burgess, J. & Matamoros-Fernandez, A. (2016) Mapping sociocultural controversies across digital media platforms: One week of #gamergate on Twitter, YouTube, and Tumblr. Communication Research and Practice 2(1), 79-96. https://doi.org/10.1080/22041451.2016.1155338
Learning Social Representation using Deep Neural Networks. Thesis topic.
Proposers: Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Shirin Dora (Ulster University, U.K)
The catalogue of techniques in machine learning is massive, but recent research in this area has spotlighted the immense potential of deep neural networks for solving many problems. Deep learning is a field of machine learning that involves developing learning algorithms for training neural networks with a large number of layers. Deep neural networks are presented with a real-valued multidimensional representation of an input and, through multiple layers of processing, learn to extract meaningful information from this input.
The focus of this thesis will be the application of deep learning to learning social network representations. A social network is represented as a collection of nodes and edges which connect these nodes. Each node represents a single member of the network, and the edges emanating from this node represent the connections of this member. As a result of this representation, there is no straightforward way to describe each node using real-valued features. This makes it difficult to use machine learning techniques to deal with problems pertaining to social networks, like network classification, content recommendation, etc. The problem becomes more complex for large social networks.
To overcome this issue, many researchers focus on developing techniques that learn representations for each node using the information stored in the social network. These representations provide a real-valued multidimensional input for nodes in the social network which can be processed by existing machine learning techniques. These representations have been used for various problems in the area of neural networks. In this thesis, the goal is to leverage the capabilities of deep neural networks to train a neural network to simultaneously learn representations and perform a given social network related task. This generic approach would involve training the neural network on a particular social network problem without worrying about presenting appropriate representations, as the onus of learning suitable representations lies with the neural network.
- Grover, Aditya, and Jure Leskovec. "node2vec: Scalable feature learning for networks." Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. 2016.
- Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. "Deepwalk: Online learning of social representations." Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 2014.
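As a small illustration of how node representations can be bootstrapped from network structure alone, the sketch below generates truncated random walks over a toy graph. DeepWalk (cited above) treats such walks like sentences and feeds them to a skip-gram model to obtain node vectors. This sketch covers only the walk-generation step, and the graph, node names and parameter values are hypothetical.

```python
import random

# Hypothetical toy friendship network: node -> list of neighbours.
graph = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "cat"],
    "cat": ["ann", "bob", "dan"],
    "dan": ["cat"],
}


def random_walk(graph, start, length, rng):
    """Uniform random walk containing up to `length` nodes, starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph[walk[-1]]
        if not neighbours:  # isolated node: stop early
            break
        walk.append(rng.choice(neighbours))
    return walk


def generate_walks(graph, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(graph, node, length, rng))
    return walks


walks = generate_walks(graph, walks_per_node=2, length=5)
for walk in walks:
    print(" -> ".join(walk))
```

Nodes that often appear close together in these walks end up with similar vectors once an embedding model is trained on them. The thesis goal stated above differs in that it aims to learn representations and the downstream task jointly rather than in two separate stages.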
Bias and polarization in online media. Thesis topic.
Proposer: Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee)
Description: News channels often try to portray news stories from their own perspectives. It has been observed, in particular, that media houses are biased towards specific topics, people and political parties. In this thesis, you will be analyzing a set of news stories derived from different news websites (such as BBC, CNN, etc.). The study will explore whether the news channels are biased towards specific 1) topics, 2) people or 3) political parties. You will be using data science techniques (such as opinion mining and machine learning) to perform the empirical analysis of your study.
- Robert M. Entman. Media framing biases and political power: Explaining slant in news of Campaign 2008. Journalism. Vol 11, Issue 4, pp. 389 - 408
- David Niven. Bias in the News: Partisanship and Negativity in Media Coverage of Presidents George Bush and Bill Clinton. International Journal of Press and Politics. Vol 6, Issue 3, pp. 31 - 46
Predicting Transaction type to be performed by a mobile user
Proposers: Huber Flores and Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee)
This project consists of modeling the different data transfer rates of mobile users based on mobility patterns within a trajectory. Given a dataset collected in the wild (features described below), the goal is to estimate the different types of transactions (app usage sessions) and the amount of data that can be transferred in a particular transaction type, such that it is possible to predict the transaction type to be performed by a mobile user. This prediction is important to extend mobility-based contracts that will ensure that there is enough time to perform a valid transaction while the user is on the move. Dataset: CellularTraffic_OneWeek is the traffic data collected by an ISP in Shanghai between Aug 1st and Aug 7th 2014. For security reasons, the device IDs and base station IDs are all anonymized.
Exploring Group Mobility. Thesis topic.
Proposers: Huber Flores and Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee)
While the mobility of individual devices has been widely explored, group mobility is less well understood. In this thesis, we will analyze a large-scale dataset to understand group mobility by analysing the trajectories of users. Our goal is to identify groups of users that move together between different points of interest in a city, for instance, between a train station and a residential area. The dataset will be provided.
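One simple way to make "moving together" operational is to bucket trips by origin, destination and departure time. The sketch below is only an assumption about how such a baseline could look. The trip records, the field layout and the 10-minute window are hypothetical and not taken from the dataset mentioned above.

```python
from collections import defaultdict

# Hypothetical trips: (user_id, origin, destination, departure minute of the day).
trips = [
    ("u1", "train_station", "residential_area", 481),
    ("u2", "train_station", "residential_area", 484),
    ("u3", "train_station", "residential_area", 489),
    ("u4", "train_station", "city_centre", 483),
    ("u5", "residential_area", "train_station", 1020),
]


def find_groups(trips, window_minutes=10, min_size=2):
    """Group users who share origin, destination and departure-time bucket."""
    buckets = defaultdict(set)
    for user, origin, destination, minute in trips:
        key = (origin, destination, minute // window_minutes)
        buckets[key].add(user)
    # Keep only buckets with enough users to count as a group.
    return {
        key: sorted(users)
        for key, users in buckets.items()
        if len(users) >= min_size
    }


groups = find_groups(trips)
for (origin, destination, bucket), users in groups.items():
    print(origin, "->", destination, "bucket", bucket, users)
```

Fixed time buckets are a deliberate simplification: two users departing one minute apart can land in different buckets if they straddle a boundary. A sliding window or trajectory-similarity clustering would avoid that at the cost of more computation.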
Understanding Racism through Twitter. Thesis topic.
Proposers: Rajesh Sharma and Rahul Goel (rajesh dot sharma [ät] ut [dot] ee)
Online social media platforms such as Twitter are often used by individuals for racial abuse. A report released by Demos (a U.K.-based think tank) found that, on average, roughly 10,000 racist and ethnic slurs in English are used on Twitter per day. The important question is how many of them are identified (by algorithms) as racist. In the past, researchers from multiple disciplines have worked on hate speech targeting different attributes (color, gender, etc.). In this work, we will focus on the color aspect of racism using Twitter data. We will use machine learning techniques to classify tweets related to color. In particular, we plan to use techniques such as transfer learning on a semi-labeled dataset.
- Waseem, Zeerak, and Dirk Hovy. "Hateful symbols or hateful people? predictive features for hate speech detection on twitter." Proceedings of the NAACL student research workshop. 2016.
- Cisneros, J. David, and Thomas K. Nakayama. "New media, old racisms: Twitter, Miss America, and cultural logics of race." Journal of International and Intercultural Communication 8.2 (2015): 108-127.
- Waseem, Zeerak. "Are you a racist or am i seeing things? annotator influence on hate speech detection on twitter." Proceedings of the first workshop on NLP and computational social science. 2016.
Understanding Veganism through Twitter
Proposers: Rajesh Sharma and Rahul Goel (rajesh dot sharma [ät] ut [dot] ee)
In 2019, it was reported that about 1.16% of the world population is vegan, compared to 0.25% in 2014. Vegans not only believe that it is morally right to avoid animal products but also that doing so is healthier and better for the environment. Various studies have examined the psychology of vegans and the merits and demerits of vegan diets, mostly with the help of face-to-face interviews. However, a study to understand the multi-dimensional aspect of vegan activism using social media data is still missing. In this study, we will use Twitter data as a representative of social media platforms to analyze vegan activism.
- Janssen, Meike, et al. "Motives of consumers following a vegan diet and their attitudes towards animal agriculture." Appetite 105 (2016): 643-651.
- Hodson, Gordon, and Megan Earle. "Conservatism predicts lapses from vegetarian/vegan diets to meat consumption (through lower social justice concerns and social support)." Appetite 120 (2018): 75-81.
Digital Detox through Twitter
Proposers: Rajesh Sharma and Rahul Goel (rajesh dot sharma [ät] ut [dot] ee)
Description: A detox means cleansing yourself of everything harmful. In the digital era, many people consider mobile phones and social networking sites (such as Facebook, Twitter, Instagram, TikTok, etc.) toxic. One reason people seek a digital detox is to break the pattern of holding a smartphone in hand at all times, hearing it ring or beep when it isn't (phantom ringing), and not being able to leave the house without it. In this work, we will focus on digital detox using Twitter data. We will identify the characteristics of users who are looking for a digital detox, their reasons and their success rate.
- Syvertsen, Trine, and Gunn Enli. "Digital detox: Media resistance and the promise of authenticity." Convergence (2019): 1354856519847325.
- Goodin, Tanya. OFF. Your Digital Detox for a Better Life. Hachette UK, 2017.
Predicting calls through data science
Proposers: Rajesh Sharma and Rahul Goel (rajesh dot sharma [ät] ut [dot] ee)
Description: Call data records are a rich source of information for inferring hidden aspects of the population. For example, this information is used by various commercial sectors to identify densely populated areas in order to provide better services. Mobile service providers can also use it to predict expected calls and thus serve their users better. In this work, we will use time series forecasting, together with other factors such as seasonality and events, to predict call traffic in Estonia.
- Bianchi, Filippo Maria, et al. "Identifying user habits through data mining on call data records." Engineering Applications of Artificial Intelligence 54 (2016): 49-61.
- Hiir, Hendrik, et al. "Impact of Natural and Social Events on Mobile Call Data Records–An Estonian Case Study." International Conference on Complex Networks and Their Applications. Springer, Cham, 2019.
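As a point of reference for seasonality-aware forecasting of call traffic, the following sketch implements a seasonal-average baseline: each future day is predicted as the mean of past values at the same position in the weekly cycle. The call counts are made up for illustration, and the function name and parameters are assumptions. A thesis would likely compare stronger models against a baseline of this kind.

```python
def seasonal_average_forecast(history, period, horizon):
    """Forecast each future step as the mean of past values at the same seasonal position.

    Assumes `history` starts at seasonal position 0 (e.g. a Monday for period=7).
    """
    if len(history) < period:
        raise ValueError("need at least one full season of history")
    forecast = []
    for step in range(horizon):
        position = (len(history) + step) % period
        same_position = history[position::period]
        forecast.append(sum(same_position) / len(same_position))
    return forecast


# Hypothetical daily call counts for two weeks (Mon..Sun), not real data.
calls = [100, 110, 105, 120, 150, 80, 60,
         104, 114, 103, 124, 154, 84, 64]

next_week = seasonal_average_forecast(calls, period=7, horizon=7)
print(next_week)
```

External factors such as public events, mentioned in the description above, are not modelled here. They could be added as extra regressors in a more expressive model.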