Seminar Topics
Topics of Chinmaya Dehury
1. Agent AI behind Cloud Management. In today's digital world, Artificial Intelligence (AI) is everywhere: in education, healthcare, agriculture, and defense. Behind the scenes, the cloud provides the resources that power AI. But who manages the cloud? Is it just a software package, or a human? Can we combine the intelligence of both to handle the large pool of resources in the cloud?
- In this topic, we will focus on the following tasks:
- What is cloud resource management?
- What is AI (in brief)? What is Agent AI?
- Survey of AI tools in cloud resource management.
- How far has Agent AI already penetrated cloud resource management?
- [Optional] What are the current challenges?
- [Optional] Making Agent AI more intelligent.
- What will you learn?
- Basics of Cloud Computing
- What is cloud resource management
- Basic ML algorithms
- Get recent updates on applications of AI in cloud computing
2. AI-based cloud resource failure prediction. Today's businesses offer cloud-based services to their users, such as Office 365, Netflix, Spotify, Snapchat, Pokémon, etc. Cloud service providers such as Google, Amazon, and Microsoft lose billions due to cloud outages. So the goal of this topic is to predict such failures using AI tools (a minimal prediction sketch follows the list below).
- In this topic, we will focus on the following tasks:
- Understanding cloud resource failures.
- Finding the reasons behind failures.
- Gathering datasets related to cloud resource failures.
- Applying ML tools for failure prediction.
- What will you learn?
- Basics of Cloud Computing
- How cloud resources are distributed among users
- Basic ML algorithms
- Advantages and limitations of basic ML algorithms
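As a taste of the final task, below is a minimal sketch of failure prediction on synthetic machine-health data; the feature names (CPU utilization, temperature, disk errors) are hypothetical stand-ins for fields found in real provider traces, not part of any specific dataset.

```python
# Minimal sketch: binary failure prediction from synthetic machine-health metrics.
# Features and data are hypothetical; a real study would use provider traces.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)
n = 5000
cpu_util = rng.uniform(0, 100, n)        # % CPU utilization
temperature = rng.normal(60, 10, n)      # sensor temperature (hypothetical)
disk_errors = rng.poisson(0.5, n)        # error counters per hour

# Synthetic ground truth: hot, overloaded, error-prone machines fail more often.
risk = 0.02 * cpu_util + 0.05 * (temperature - 60) + 0.8 * disk_errors
failed = (risk + rng.normal(0, 1, n) > 3.0).astype(int)

X = np.column_stack([cpu_util, temperature, disk_errors])
X_train, X_test, y_train, y_test = train_test_split(X, failed, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```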
3. Predicting Cloud service demands. Most frequently used apps, such as Instagram, Twitter, and Spotify, are deployed in cloud environments. Sometimes the usage of such applications is very high, and sometimes it is very low. But can we predict how heavily an app will be used in the next few hours? In short, what will the future demand for a cloud-based service be? That is the question we will answer in this topic (a minimal forecasting sketch follows the list below).
- In this topic, we will focus on the following tasks:
- Find out how the cloud resources are allocated to an app/service.
- Gather the dataset related to the resource usage of different cloud-based applications (2-4 use cases)
- Apply AI tools to predict and verify the result using the dataset.
- What will you learn?
- Basics of cloud computing.
- How are the resources allocated to the applications?
- Basics of ML algorithms.
- How to apply ML tools to predict something?
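The sketch below shows the prediction step in its simplest form: an autoregressive least-squares fit that forecasts the next hour's request rate from the previous 24 hours. The workload here is synthetic; in the topic itself, real traces from the chosen applications would be used.

```python
# Minimal sketch: forecast next hour's request rate from the previous k hours
# with a simple autoregressive least-squares fit. The workload is synthetic.
import numpy as np

rng = np.random.default_rng(0)
hours = np.arange(24 * 14)                          # two weeks of hourly samples
daily = 100 + 60 * np.sin(2 * np.pi * hours / 24)   # daily usage cycle
requests = daily + rng.normal(0, 10, hours.size)    # observed requests per hour

k = 24  # use the previous 24 hours as features
X = np.array([requests[i:i + k] for i in range(requests.size - k)])
y = requests[k:]

# Least-squares fit of next-hour demand against the previous window (+ bias term).
coef, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(len(X))]), y, rcond=None)
last_window = np.append(requests[-k:], 1.0)
print(f"Predicted requests next hour: {last_window @ coef:.0f}")
```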
4. Understanding Cloud usage data. In this topic, we will look into cloud server usage data, such as the number of VMs deployed, the percentage of server usage, the resource utilization of VMs and physical servers, etc. We will gather the data from different sources, such as Google, Delft University of Technology, etc. (a minimal analysis sketch follows the list below).
- In this topic, we will focus on the following tasks:
- Gathering the related dataset from 4-5 cloud service providers.
- Understand the data and their limitations.
- Apply ML/Scientific tools to understand how the cloud servers are performing.
- Analyze the data to acquire hidden information
- What will you learn?
- Basics of cloud computing
- Basic knowledge of cloud infrastructure
- Knowledge of basic ML tools
- Knowledge of scientific tools such as SciPy (in Python)
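The following sketch shows the kind of exploratory analysis intended here, using pandas and SciPy on a synthetic usage table; the column names are hypothetical stand-ins for fields found in public traces such as the Google cluster data.

```python
# Minimal sketch: exploring a (synthetic) cluster-usage table with pandas and SciPy.
# Column names are hypothetical stand-ins for fields in public usage traces.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "vm_id": np.arange(n),
    "cpu_util": rng.beta(2, 5, n) * 100,   # % CPU utilization
    "mem_util": rng.beta(2, 4, n) * 100,   # % memory utilization
})
# Add a correlated component so the example shows a non-trivial relationship.
df["mem_util"] = 0.5 * df["cpu_util"] + 0.5 * df["mem_util"]

print(df[["cpu_util", "mem_util"]].describe())
r, p = stats.pearsonr(df["cpu_util"], df["mem_util"])
print(f"CPU-memory correlation: r={r:.2f}, p={p:.3g}")
```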
5. Data flow from the user's device to the cloud.
- In general, we will focus on the following tasks:
- Learn how data are uploaded from the user's devices to cloud infrastructure.
- Understand the concept of the data pipeline and ETL (a minimal ETL sketch follows this list).
- Recent updates on data pipelines from commercial cloud service providers.
- Recent literature survey on data pipeline frameworks/architectures.
- Research challenges in the data pipeline.
- How far we have progressed in maturing data pipelines.
- What will you learn?
- The concept of data pipeline
- Basics of ETL
- Advantages of Data pipeline
- Data pipeline architecture in AWS and other cloud infrastructure.
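As a first intuition for ETL, the sketch below implements a toy extract-transform-load flow in Python: readings are pulled from a stand-in source, invalid rows are dropped, and the result is loaded into a local SQLite table. The reading format and schema are made up for illustration; a real pipeline would read from device gateways or message queues and load into a cloud data store.

```python
# Minimal ETL sketch: extract raw device readings, transform (validate and
# convert), and load into a local SQLite table. Data and schema are hypothetical.
import sqlite3

def extract():
    # Stand-in for reading from a device API or message queue.
    return [
        {"device": "sensor-1", "temp_c": "21.5"},
        {"device": "sensor-2", "temp_c": "bad-value"},
        {"device": "sensor-3", "temp_c": "19.0"},
    ]

def transform(rows):
    cleaned = []
    for row in rows:
        try:
            cleaned.append((row["device"], float(row["temp_c"])))
        except ValueError:
            continue  # drop rows that fail validation
    return cleaned

def load(rows, db_path="readings.db"):
    with sqlite3.connect(db_path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS readings (device TEXT, temp_c REAL)")
        conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)

load(transform(extract()))
```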
6. Survey of Reinforcement learning frameworks. Reinforcement learning (RL) is one of the three ML paradigms. Here, a software agent takes actions based on its observations of the environment and its accumulated experience, for example, finding a path from one location to another or solving a knight-prince problem. There are several frameworks that address different kinds of problems. In this topic, we will study different RL frameworks (a minimal example follows the list below).
- In this topic, we will focus on the following tasks:
- Understanding the fundamental concept of Reinforcement Learning
- Survey of different RL frameworks (such as OpenAI Gym, DeepMind Lab, Amazon SageMaker RL, Dopamine, etc.)
- What will you learn?
- Basics of AI
- Basics of Reinforcement Learning
- Advantages and limitations of RL frameworks
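To make the agent-environment loop concrete, here is a minimal random-policy episode loop written against the classic OpenAI Gym API (pre-0.26); newer Gym releases and the Gymnasium fork return additional values from reset() and step(), so the exact signatures may differ.

```python
# Minimal sketch: a random agent interacting with an OpenAI Gym environment.
# Assumes the classic Gym API (pre-0.26).
import gym

env = gym.make("CartPole-v1")
for episode in range(3):
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()            # random policy; RL would learn this
        obs, reward, done, info = env.step(action)    # observe, collect reward
        total_reward += reward
    print(f"Episode {episode}: return {total_reward}")
env.close()
```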
Topics of Mainak Adhikari
- Auto-scaling of data pipelines in a Serverless Environment. A microservice-based application is composed of a set of small services that run within their own processes and communicate through a lightweight mechanism. Nowadays, container-based virtualization techniques have emerged in serverless environments for processing such microservices efficiently. Still, there are shortcomings, such as resource provisioning and auto-scaling methods that leverage the unique features of the computing nodes for microservices. Auto-scaling measures the capacity of the cloud servers and scales the resources out or in automatically based on the status of the requests. It addresses two research challenges: i) cost efficiency, by allocating only the required resources, and ii) time efficiency, by allocating the applications to the available resources with minimum deployment time.
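As a minimal illustration of the scaling decision itself (not of any particular platform's API), the sketch below implements a threshold-based auto-scaler; the metric, thresholds, and scaling step are hypothetical, and a production system would read metrics from a monitoring service and call the platform's scaling API.

```python
# Minimal sketch of a threshold-based auto-scaler; thresholds and step size are
# hypothetical. Real systems would call the platform's scaling API.
def desired_replicas(current_replicas, cpu_utilization,
                     scale_out_at=0.75, scale_in_at=0.30,
                     min_replicas=1, max_replicas=20):
    """Return the replica count for the next control interval."""
    if cpu_utilization > scale_out_at:
        return min(current_replicas + 1, max_replicas)
    if cpu_utilization < scale_in_at:
        return max(current_replicas - 1, min_replicas)
    return current_replicas

# Example control loop over a few observed utilization samples.
replicas = 2
for util in [0.82, 0.91, 0.64, 0.22, 0.18]:
    replicas = desired_replicas(replicas, util)
    print(f"utilization={util:.0%} -> replicas={replicas}")
```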
- Quality Testing tools for validating non-functional requirements in a Serverless Environment. Quality testing tools are used to validate non-functional requirements of business logic encoded in microservices and serverless FaaS, and of data pipelines. For business logic, a continuous testing approach is used, through execution in the development stages immediately preceding the actual deployment of the software, to help detect performance issues and bugs before they manifest in production. Continuous testing is bootstrapped with the set-up of test cases as part of a CI/CD pipeline. As these test cases have to reflect the real usage of the software, one approach to this challenge is to use information extracted from production data in addition to predefined fixed inputs. For data pipelines, users need to model the data flow by defining custom data generation profiles that produce test data with the desired target characteristics to verify adherence to requirements. A challenge is to automate the inference of representative workloads from given traces and historical data, accounting for advanced properties of the data such as burstiness and cross-correlation between events and data types in transit.
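To make the continuous-testing idea concrete, here is a minimal sketch of a non-functional test that could run in a CI/CD stage; the handler, the recorded inputs, and the 10 ms latency budget are all hypothetical.

```python
# Minimal sketch of a non-functional (latency) test for a CI/CD stage.
# The handler, inputs, and budget are hypothetical. Run with `pytest`.
import json
import time

def handle_request(payload):
    # Placeholder for the microservice / FaaS handler under test.
    return {"echo": payload}

def load_recorded_inputs():
    # Stand-in for inputs extracted from recorded production data.
    return [json.dumps({"user": i}) for i in range(100)]

def test_latency_budget():
    budget_seconds = 0.01  # hypothetical 10 ms per-request budget
    for payload in load_recorded_inputs():
        start = time.perf_counter()
        handle_request(payload)
        assert time.perf_counter() - start < budget_seconds
```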
- IoT task Scheduling with dwindling resource requirements. The main goal of the emerging technologies in a distributed environment is to utilize the resources of the computing devices efficiently. The service providers receive sensed data from various sensors, and each application consists of several contiguous stages, each having a specific size and resource requirements. Each application is associated with a response time and a deadline. So, the main goal of IoT task scheduling with dwindling resource requirements is to find an optimal computing device with sufficient resource capacity that meets the deadlines of the tasks with efficient resource utilization. For example, patient monitoring data should be offloaded to local Fog devices for faster processing with minimum delay within the QoS constraint.
- Power and Energy efficient task offloading in a hierarchical Fog-cloud Environment. Cloud computing is a large-scale computing model that consists of thousands of servers and consumes an extremely large amount of electricity, which increases the cost for the service provider and harms the environment. In contrast, Fog devices are distributed globally with limited resource capacity and consume a minimal amount of energy while processing IoT applications. The energy consumption of the resources is directly proportional to the CO2 emission rate and the temperature of the computing devices, which also affects the environment. Moreover, unlike VM instances, containers require a minimal amount of resources and therefore consume less energy. So, an energy-efficient offloading strategy is an important issue in the Fog and cloud domain for reducing energy consumption and minimizing the CO2 emission rate and temperature of the computing devices. One energy-efficient scheduling strategy is to place the IoT applications on local computing devices with minimum delay and transmission time.
- Workload prediction and run-time estimation in a hierarchical Fog-cloud Environment. The emerging computing infrastructure consists of large-scale heterogeneous devices that are required to meet the QoS parameters. Understanding the characteristics and patterns of the workload of the computing devices is a critical task for improving the resource utilization and operational conditions of the system. Analyzing the workload and predicting the target computing device for deploying tasks, based on realistic parameters such as CPU and memory usage, is also urgently needed to investigate the impact of workload characteristics on emerging distributed environments, including cloud, Fog, and serverless computing. On the other hand, the dynamic nature of the computing resources suggests that the performance prediction of the workload should also be dynamic, in order to select the target device more accurately and estimate the runtime of the user-requested IoT application. The performance model could be built from the past usage of the computing resources, dynamic monitoring of the performance of the resources, time-series analysis of future trends, etc. The ideal scenario is to design algorithms that are less reliant on runtime estimations and are capable of producing high-quality schedules to improve the prediction technique. This technique may help to find the optimal target computing devices for IoT applications with efficient resource utilization.
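One simple way to keep such a dynamic estimate is exponential smoothing of observed runtimes, sketched below; the observed values and the smoothing factor are made up for illustration.

```python
# Minimal sketch: exponentially weighted runtime estimation, one way to keep a
# per-task-type estimate that adapts as new executions are observed.
def update_estimate(previous_estimate, observed_runtime, alpha=0.3):
    """Blend the new observation into the running estimate (0 < alpha <= 1)."""
    return alpha * observed_runtime + (1 - alpha) * previous_estimate

estimate = 5.0  # initial guess in seconds (hypothetical)
for observed in [4.8, 6.1, 5.5, 9.0, 5.2]:
    estimate = update_estimate(estimate, observed)
    print(f"observed={observed:.1f}s -> estimate={estimate:.2f}s")
```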
- Resource prediction and task Scheduling in a hierarchical Fog-cloud Environment. Most of the existing scheduling strategies deploy IoT applications to computing devices, i.e., Fog nodes, containers or VM instances on a suitable cloud server, or functions in a serverless environment, without considering the availability and the current load of the devices. Moreover, the selection of suitable resources for IoT applications based on QoS requirements is one of the biggest problems due to their dynamic nature. So, to overcome the above-mentioned challenges, resource prediction is one of the important issues for scheduling IoT applications. A resource prediction strategy may help to find a suitable computing device at each time instance for an IoT application, based on the resource availability and the workloads of the devices, which may improve the accuracy and performance of the system. Another important point is that a better resource prediction strategy may satisfy multiple QoS parameters, including minimizing computation time and cost as well as transmission time and delay, and maximizing resource utilization, while meeting various QoS constraints.
Topics of Mohan Liyanage
The Web of Things (WoT) is a high-level application protocol designed to maximize interoperability within the IoT. The WoT can be considered a refinement of the IoT to integrate smart things into the Internet and web architecture.
- Study the WoT architecture and develop a fully functional Web of Things framework (Reference: "Building the Web of Things" by Dominique Guinard)
- Develop an Android application that reads QR codes and barcodes, and an Arduino application that reads NFC tags, based on the EVRYTHNG API, to track items and give them an active digital identity on the Web
- Mozilla WebThings is an open platform for monitoring and controlling devices over the web. It is an open-source implementation of the emerging Web of Things standards at the W3C. Set up Mozilla's proposed WebThings Gateway for the smart home, which allows users to directly monitor and control their smart home over the web.
Real-time data processing -- Pelle Jakovits (Responsible person)
- Orchestrating complex Data Pipelines processing real-time IoT data. The student should investigate existing data pipeline orchestration frameworks (such as Apache NiFi) and recent literature on this topic, concentrating on managing IoT data flows that fuse data from a large number of geographically distributed data sources and that may require deploying data processing tasks at different distances from the data sources (Fog Computing scenario).
- Real time vs micro-batching in streaming data processing: performance and guidelines. Typically, stream processing frameworks buffer incoming data and process it in batches. But some stream processing frameworks (such as Apache Storm) process each incoming data object in real time. The task of the student is to give an overview of the newest advances in stream processing and to compare the performance of real-time vs micro-batching engines for different use cases. The student should also investigate which data- or use-case-specific characteristics should be considered when choosing between the respective streaming data processing approaches. In addition, the student should also look into Structured Streaming, which is a new stream processing abstraction built on top of the Spark SQL engine.
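As a rough illustration of the difference between the two processing styles (independent of any particular engine), the sketch below handles a toy event stream first per event and then in small micro-batches; the event source and processing functions are hypothetical stand-ins for a stream-processing engine's internals.

```python
# Minimal sketch contrasting per-event processing with micro-batching.
# Event source and processing functions are hypothetical stand-ins.
import time

def event_source(n=10):
    for i in range(n):
        yield {"id": i, "value": i * i}

def process_event(event):
    print(f"real-time: processed event {event['id']}")

def process_batch(batch):
    print(f"micro-batch: processed {len(batch)} events together")

# Per-event handling: each record is processed as soon as it arrives.
for event in event_source():
    process_event(event)

# Micro-batching: flush every 4 events or every 0.5 s, whichever comes first.
batch, last_flush = [], time.monotonic()
for event in event_source():
    batch.append(event)
    if len(batch) >= 4 or time.monotonic() - last_flush > 0.5:
        process_batch(batch)
        batch, last_flush = [], time.monotonic()
if batch:
    process_batch(batch)
```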
- Stream data processing on resource-constrained devices - With the increasing amount of data to be collected and processed from IoT data sources, it becomes more and more expensive to simply stream all data to the cloud for processing. Depending on the specific scenario, it may be beneficial to pre-process the data as close to its source as possible. However, there are typically limits on how much computing power is available in such cases. The student should study existing solutions that aim to solve these issues, give an overview of them, and demonstrate example scenarios and solutions if possible.
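A minimal sketch of the idea, assuming a hypothetical sensor source and upload call: aggregate a window of raw readings at the edge and ship only a compact summary to the cloud.

```python
# Minimal sketch of edge-side pre-processing: upload a summary of a window of
# samples instead of every raw reading. Source and upload call are hypothetical.
import statistics

def sensor_readings():
    # Stand-in for raw samples from a local sensor.
    return [20.1, 20.3, 20.2, 25.9, 20.4, 20.2, 20.3, 20.1]

def upload_to_cloud(summary):
    print(f"uploading summary: {summary}")  # placeholder for an HTTP/MQTT call

window = sensor_readings()
summary = {
    "count": len(window),
    "mean": round(statistics.mean(window), 2),
    "max": max(window),   # keep the outlier visible for anomaly checks
}
upload_to_cloud(summary)  # one message instead of len(window) messages
```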
- Visualizing streaming data. The student should perform a literature study and present in the seminar the newest advances, best practices, and available solutions for visualizing large-scale streaming data. If suitable open-source visualization tools are available in the context of this topic, the student should demonstrate real-time data visualization on an illustrative scenario.
- Real-time visitor count estimation in lecture rooms - The Delta Building is a new building to house the Institute of Computer Science. Its construction is to be finished in 2020. There are plans for a number of different modern sensors to be placed in the building. The Computer Graphics and Virtual Reality lab’s students are working on a real-time visualization of the people and activities inside the building. For that purpose there is a desire to know how many people occupy each room (including the hallways) at any given moment. The goal of this topic is to study the state-of-the-art of sensor analytics or image processing (or fusion) and to propose a usable approach for real-time visitor count estimation in lecture rooms.
Cloud Computing Frameworks -- Pelle Jakovits (Responsible person)
- Docker performance aspects when running a large number of small Docker containers.
- Docker based device integration in Cumulocity: issues and challenges
- Real-time event processing in Cumulocity: limitations, issues and performance.
- Viability of Serverless - Performance of FaaS cloud applications in comparison to micro-service and monolithic applications in real life scenarios
- Service-mesh based security of cloud applications - using service-mesh and security policies to secure cloud applications composed of micro-services
Edge analytics, 5G IoT hardware and Smart Cities (Alo Peets)
Real-life usage of 5G IoT LPWAN hardware (NB-IoT, SigFox, LoRa, etc.) and real-life testing results
- [IP:1] Description and selection of two (2) LPWAN hardware for further testing
- [IP:2] Technical description how to connect and use LPWAN devices (description of actually completed tasks)
- [IP:3] Presentation of results and analysis how to implement LPWAN devices in smart solutions
Review of edge-analytics use-cases in smart cities
- [IP:1] Extensive review of research papers and internet searches on smart cities around the world (present a few of the most interesting cases)
- [IP:2] Deep technical analysis of smart city solutions and use cases that use edge analytics or could be improved by using edge analytics
- [IP:3] Review paper and comparison table (list) of smart city use cases that illustrate how edge analytics could (have) improve(d) smart-city solutions