Seminar Topics
Topics of Chinmaya Dehury
email: chinmaya.dehury@ut.ee
1. Agent AI behind Cloud Management. In today's digital world, Artificial Intelligence (AI) is everywhere: in education, healthcare, agriculture, defense, and beyond. Behind the scenes, the cloud provides the resources that AI runs on. But who manages the cloud? Is it just a software package, or a human? Can we combine the intelligence of both to manage the cloud's large pool of resources?
- In this topic, we will focus on the following tasks:
- What is cloud resource management?
- What is AI (in brief)? What is Agent AI?
- Survey of AI tools in cloud resource management.
- How far has Agent AI already penetrated cloud resource management?
- [Optional] What are the current challenges?
- [Optional] Making Agent AI more intelligent.
- What will you learn?
- Basics of Cloud Computing
- What cloud resource management is
- Basic ML algorithms
- Recent developments in applying AI to cloud computing
2. AI-based cloud resource failure prediction. As we know, today's businesses offer cloud-based services to their users, such as Office 365, Netflix, Spotify, Snapchat, Pokémon, etc. Cloud service providers such as Google, Amazon, and Microsoft lose billions due to cloud outages. So the goal of this topic is to predict failures using AI tools; a minimal sketch of this idea follows the lists below.
- In this topic, we will focus on the following tasks:
- Understanding cloud resource failures.
- Finding the reasons behind failures.
- Gathering datasets related to cloud resource failures.
- Applying ML tools for failure prediction.
- What will you learn?
- Basics of Cloud Computing
- How cloud resources are distributed among users
- Basic ML algorithms
- Advantages and limitations of basic ML algorithms
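Below is a minimal, hedged sketch of the prediction step. The CSV file, feature columns, and label are hypothetical placeholders; a real study would derive them from an actual trace, and failure labels are typically heavily imbalanced, which is why the report shows precision and recall rather than plain accuracy.

```python
# A minimal failure-prediction sketch with scikit-learn. The dataset,
# column names, and label are illustrative assumptions, not a real trace.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("machine_telemetry.csv")            # hypothetical dataset
X = df[["cpu_util", "mem_util", "disk_io", "temp"]]  # illustrative features
y = df["failed_next_hour"]                           # illustrative binary label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Failures are rare, so precision/recall are more telling than accuracy.
print(classification_report(y_test, model.predict(X_test)))
```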
3. Predicting Cloud service demands. As we know, most frequently used apps, such as Instagram, Twitter, and Spotify, are deployed in cloud environments. Sometimes the usage of such applications is very high, and sometimes it is very low. But can we predict how heavily an app will be used in the next few hours? In short, what will the future demand for a cloud-based service be? This is the question we will answer in this topic; a minimal forecasting sketch follows the lists below.
- In this topic, we will focus on the following tasks:
- Find out how cloud resources are allocated to an app/service.
- Gather datasets on the resource usage of different cloud-based applications (2-4 use cases).
- Apply AI tools to predict demand and verify the results against the dataset.
- What will you learn?
- Basics of cloud computing.
- How resources are allocated to applications.
- Basics of ML algorithms.
- How to apply ML tools to make predictions.
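Below is a minimal forecasting sketch under stated assumptions: a synthetic hourly series with daily seasonality stands in for a real usage trace, and the previous 24 hourly values serve as features for a plain linear regression. It illustrates the shape of the task, not a production model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
hours = pd.date_range("2020-01-01", periods=500, freq="H")
# Synthetic daily-seasonal demand plus noise, standing in for real data.
demand = 100 + 40 * np.sin(2 * np.pi * hours.hour / 24) + rng.normal(0, 5, 500)
series = pd.Series(demand, index=hours)

# Lag features: demand at t-1 .. t-24 predicts demand at t.
lags = pd.concat({f"lag_{k}": series.shift(k) for k in range(1, 25)}, axis=1)
data = lags.assign(target=series).dropna()

train, test = data.iloc[:-48], data.iloc[-48:]   # hold out the last two days
model = LinearRegression().fit(train.drop(columns="target"), train["target"])
pred = model.predict(test.drop(columns="target"))
mae = float(np.mean(np.abs(pred - test["target"])))
print("mean absolute error over the held-out 48 h:", round(mae, 1))
```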
4. Understanding Cloud usage data. In this topic, we will look into cloud server usage data, such as the number of VMs deployed, the percentage of server usage, and the resource utilization of VMs and physical servers. We will gather the data from different cloud service providers, such as Google, Delft University of Technology, etc. A minimal data-exploration sketch follows the lists below.
- In this topic, we will focus on the following tasks:
- Gathering the related dataset from 4-5 cloud service providers.
- Understand the data and their limitations.
- Apply ML/scientific tools to understand how the cloud servers are performing.
- Analyze the data to uncover hidden information.
- What will you learn?
- Basics of cloud computing
- Basic knowledge of cloud infrastructure
- Knowledge of basic ML tools
- Knowledge of scientific tools such as SciPy (in Python)
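The sketch below shows what such an exploration might look like. The file and column names are illustrative assumptions; real traces (for example, the Google cluster data or the workload archives published by TU Delft) each come with their own schema and quirks.

```python
# Exploring a hypothetical cluster-usage trace with pandas and SciPy.
import pandas as pd
from scipy import stats

usage = pd.read_csv("cluster_usage.csv")   # hypothetical per-VM usage samples

# Distribution of CPU utilization across all samples.
print(usage["cpu_util"].describe())

# Is CPU usage correlated with memory usage? Spearman tolerates skewed data.
rho, p = stats.spearmanr(usage["cpu_util"], usage["mem_util"])
print(f"Spearman rho = {rho:.2f} (p = {p:.3g})")

# Mean utilization per physical server: are some servers nearly idle?
per_server = usage.groupby("server_id")["cpu_util"].mean().sort_values()
print(per_server.head(10))                 # the ten least-loaded servers
```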
5. Data flow from the user's device to the cloud.
- In general, we will focus on the following tasks:
- Learn how data are uploaded from the user's devices to cloud infrastructure.
- Understand the concepts of data pipelines and ETL (a minimal ETL sketch follows this topic's lists).
- Recent updates on data pipelines from commercial cloud service providers.
- A survey of recent literature on data pipeline frameworks/architectures.
- Research challenges in data pipelines.
- How far we have progressed in maturing data pipelines.
- What will you learn?
- The concept of a data pipeline
- Basics of ETL
- Advantages of data pipelines
- Data pipeline architectures in AWS and other cloud infrastructures.
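To make the ETL concept concrete, here is a minimal sketch in plain Python: extract raw device records, transform them (drop malformed rows, convert units), and load them into a local SQLite table. The input file and field names are invented for illustration; a real pipeline would use a managed service or framework, but the three stages are conceptually the same.

```python
import json
import sqlite3

def extract(path):
    """Extract: read raw JSON-lines records as produced by a device."""
    with open(path) as f:
        for line in f:
            yield json.loads(line)

def transform(records):
    """Transform: drop malformed rows, convert Fahrenheit to Celsius."""
    for r in records:
        if "temp_f" not in r or "device_id" not in r:
            continue                        # skip incomplete records
        yield (r["device_id"], (r["temp_f"] - 32) * 5 / 9)

def load(rows, db_path="readings.db"):
    """Load: append cleaned rows into a SQLite table."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS readings (device_id TEXT, temp_c REAL)")
    con.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    con.commit()
    con.close()

load(transform(extract("sensor_data.jsonl")))   # hypothetical input file
```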
6. Survey of Reinforcement Learning frameworks. Reinforcement learning (RL) is one of the three main ML paradigms. Here, a software agent takes actions based on its observations of the environment and its accumulated experience, for example when finding a path from one location to another or solving a knight-prince problem. There are several frameworks that address different kinds of problems. In this topic, we will study different RL frameworks; a minimal sketch of the interaction loop they all share follows the lists below.
- In this topic, we will focus on the following tasks:
- Understanding the fundamental concept of Reinforcement Learning
- Survey of different RL frameworks (such as OpenAI Gym, DeepMind Lab, Amazon SageMaker RL, Dopamine, etc.)
- What will you learn?
- Basics of AI
- Basics of Reinforcement Learning
- Advantages and limitations of RL frameworks
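All of the frameworks above revolve around the same agent-environment interaction loop. The sketch below shows it with the classic OpenAI Gym API (note that newer Gymnasium releases changed the reset/step signatures); the agent here acts randomly, whereas a real agent would learn from the observed transitions.

```python
import gym

env = gym.make("CartPole-v1")
observation = env.reset()

total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()     # random policy as a placeholder
    observation, reward, done, info = env.step(action)
    total_reward += reward

print(f"episode finished with total reward {total_reward}")
env.close()
```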
7. Survey of Reinforcement Learning techniques in cloud resource management. Reinforcement learning is one of the three main ML paradigms: a software agent takes actions based on its observations of the environment and its accumulated experience (see Topic 6). In this topic, the student will focus on existing RL techniques for cloud resource management; a toy Q-learning sketch follows the lists below.
- In this topic, we will focus on the following tasks:
- Understanding the fundamental concept of Reinforcement Learning
- Basics of cloud computing
- Basics of resource management in cloud computing
- What will you learn?
- Basics of AI
- Basics of Reinforcement Learning
- What is resource management in cloud computing
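As a toy illustration of RL applied to resource management, the sketch below uses tabular Q-learning to learn where to place incoming tasks so that the load on two servers stays balanced. The state/action/reward design is invented for illustration; published work uses far richer formulations.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
Q = defaultdict(float)                   # Q[(state, action)] -> value

def step(state, action):
    """Place one task on server `action`; the reward penalizes imbalance."""
    loads = list(state)
    loads[action] += 1
    reward = -abs(loads[0] - loads[1])   # balanced placements score higher
    return tuple(loads), reward

for episode in range(2000):
    state = (0, 0)                       # both servers start empty
    for _ in range(8):                   # an episode = placing 8 tasks
        if random.random() < EPSILON:    # epsilon-greedy exploration
            action = random.choice([0, 1])
        else:
            action = max([0, 1], key=lambda a: Q[(state, a)])
        nxt, reward = step(state, action)
        best_next = max(Q[(nxt, a)] for a in [0, 1])
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                       - Q[(state, action)])
        state = nxt

# The greedy policy should now pick the lighter server, e.g. in state (3, 2):
print(max([0, 1], key=lambda a: Q[((3, 2), a)]))   # expected output: 1
```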
Topics of Mainak Adhikari
- Auto-scaling of data pipelines in a Serverless Environment. A microservice-based application is composed of a set of small services that run within their own processes and communicate through a lightweight mechanism. Nowadays, container-based virtualization techniques have emerged in serverless environments for processing such microservices efficiently. Still, there are shortcomings, such as resource provisioning and auto-scaling methods that leverage the unique features of the computing nodes for microservices. Auto-scaling can measure the capacity of the cloud servers and scale the resources out or down automatically based on the status of the requests (a minimal threshold-based sketch follows this topic). It addresses two research challenges: i) cost efficiency, by allocating only the required resources, and ii) time efficiency, by allocating applications to the available resources with minimum deployment time.
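The sketch below shows the threshold-based scaling rule in its simplest form. The thresholds, bounds, and metric source are illustrative assumptions; production autoscalers (e.g. the Kubernetes Horizontal Pod Autoscaler) also handle cooldowns, flapping, and custom metrics.

```python
def desired_replicas(current: int, avg_cpu: float,
                     scale_out_at: float = 0.80, scale_in_at: float = 0.30,
                     min_r: int = 1, max_r: int = 10) -> int:
    """Return a new replica count given average CPU utilization in [0, 1]."""
    if avg_cpu > scale_out_at and current < max_r:
        return current + 1       # requests are piling up: add an instance
    if avg_cpu < scale_in_at and current > min_r:
        return current - 1       # capacity sits idle: remove an instance
    return current               # within the target band: do nothing

# Example: utilization samples arriving from a monitoring loop.
replicas = 2
for cpu in [0.85, 0.90, 0.75, 0.25, 0.20]:
    replicas = desired_replicas(replicas, cpu)
    print(f"avg CPU {cpu:.0%} -> {replicas} replica(s)")
```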
- Quality testing tools for validating non-functional requirements in a Serverless Environment. Quality testing tools are used to validate non-functional requirements of business logic encoded in microservices and serverless FaaS, and of data pipelines. For business logic, a continuous testing approach is used: software is executed in the development stages immediately preceding its actual deployment, to help detect performance issues and bugs before they manifest in production. Continuous testing is bootstrapped with the set-up of test cases as part of a CI/CD pipeline. As these test cases have to reflect the real usage of the software, one approach to this challenge is to use information extracted from production data, in addition to predefined fixed inputs. For data pipelines, users need to model the data flow by defining custom data generation profiles that produce test data with the desired target characteristics, to verify adherence to requirements. A challenge is to automate the inference of representative workloads from given traces and historical data, accounting for advanced properties of the data such as burstiness and cross-correlation between events and data types in transit.
- Power- and energy-efficient task offloading in a hierarchical Fog-cloud Environment. Cloud computing is a large-scale computing model that consists of thousands of servers and consumes an extremely large amount of electricity, which increases the service provider's costs and harms the environment. Fog devices, in contrast, are distributed globally with limited resource capacity and consume a minimal amount of energy while processing IoT applications. The energy consumption of the resources is directly proportional to the CO2 emission rate and the temperature of the computing devices, which also affects the environment. Moreover, unlike VM instances, containers require a minimal amount of resources and therefore consume minimal energy. So, an energy-efficient offloading strategy is an important issue in the fog and cloud domain for reducing energy consumption and minimizing the CO2 emission rate and temperature of the computing devices. One energy-efficient scheduling strategy is to place IoT applications on local computing devices with minimum delay and transmission time.
- Workload prediction and run-time estimation in a hierarchical Fog-cloud Environment. Emerging computing environments consist of large-scale heterogeneous devices that are required to meet QoS parameters. Understanding the characteristics and patterns of the workload on these devices is critical for improving resource utilization and the operational condition of the system. Analyzing the workload and predicting the target computing device for deploying tasks, based on realistic parameters such as CPU and memory usage, is also urgently needed to investigate the impact of workload characteristics on emerging distributed environments, including cloud, fog, and serverless computing. On the other hand, the dynamic nature of the computing resources suggests that the performance prediction of the workload should also be dynamic, in order to choose a more accurate target device and estimate the runtime of the user-requested IoT application. The performance model could be built from the past usage of the computing resources, dynamic monitoring of resource performance, time-series analysis of future trends, etc. The ideal scenario is to design algorithms that are less reliant on runtime estimations and are capable of producing high-quality schedules that improve the prediction technique. Such a technique may help find the optimal target computing devices for IoT applications with efficient resource utilization.
- IoT task scheduling with dwindling resource requirements of the Fog devices. The main goal of emerging technologies in a distributed environment is to utilize the resources of the computing devices efficiently. The service providers receive sensed data from various sensors in several contiguous stages, each having a specific size and resource requirements. Each application is associated with a response time and a deadline. So, the main goal of IoT task scheduling with dwindling resource requirements is to find an optimal computing device, with sufficient resource capacity, that meets the deadlines of the tasks while utilizing resources efficiently. For example, patient monitoring data should be offloaded to local fog devices for faster processing with minimum delay within the QoS constraints.
- Resource prediction and task scheduling in a hierarchical Fog-cloud Environment. Most existing scheduling strategies deploy IoT applications to computing devices, i.e. fog nodes, containers or VM instances on a suitable cloud server, or functions in a serverless environment, without regard for the availability and current load of the devices. Moreover, selecting suitable resources for IoT applications based on QoS requirements is one of the biggest problems due to their dynamic nature. So, to overcome the above-mentioned challenges, resource prediction is one of the key issues in scheduling IoT applications. A resource prediction strategy may help find a suitable computing device at each time instance for an IoT application, based on the resource availability and the workloads of the devices. This may improve the accuracy of the system's performance. Another important point is that a good resource prediction strategy may satisfy multiple QoS parameters, including minimizing computation time and cost, transmission time, and delay, and maximizing resource utilization, while meeting various QoS constraints.
- Hybrid optimization Strategy on Edge/Fog computing. A hybrid optimization strategy combining multiple NIMH algorithms is useful for making optimal decisions for complex problems. Edge computing involves multiple conflicting parameters when making an optimal decision to meet various research challenges, for example, an edge controller choosing a suitable edge device based on (a) the congestion of the network and (b) the CPU and memory availability of the active edge devices. In such scenarios, a single NIMH algorithm may not produce an optimal decision within a stipulated time period. Thus, a hybrid optimization strategy with multiple NIMH algorithms is useful: it divides the tasks into multiple categories and optimizes the QoS parameters independently to reach an optimal decision in the edge environment.
- NIMH/Dynamic/Greedy algorithms with an SDN controller for network control. SDN has become a promising network controller in edge computing due to its simplified switching and routing management using a centralized controller. Researchers now want to combine the SDN controller with NIMH algorithms for two reasons. (a) First, the control plane (CP) has global network and resource monitoring functions; as a result, the switches and routers of the data plane (DP) can feed performance metrics (packet loss, queueing delay, resource utilization, link rate, etc.) back to the centralized CP in real time, which helps build a global view of the network. (b) Second, the CP is a centralized controller, and a NIMH-based controller can process and update the rules of the flow table based on the analyzed data coming from the CP. The main research issue in this field is to optimize the collected parametric values using a NIMH/dynamic/greedy/deep learning algorithm, improving various performance metrics and thereby the performance of the overall network.
- Topology Control in Edge computing. Topology control is a fundamental research aspect of a dynamic edge computing environment. An edge computing domain is made up of heterogeneous nodes that seamlessly interact with each other when reshaping the network topology. Thus, one of the most challenging tasks is to relocate the fog devices to optimal locations to repair or augment coverage, or to deploy or reallocate the static sensor nodes or objects. Both of these perspectives require an optimal approach to meet the objectives. NIMH/dynamic/greedy/deep learning algorithms help with self-relocation of the sensor nodes or edge devices based on user demand, which reduces the latency and energy consumption of data offloading and processing. Moreover, such algorithms find reliable communication networks for transmitting data to suitable edge devices in quick succession. This policy reduces the overall transmission time of real-time applications with minimum network overhead.
- Online resource provisioning for real-time applications. Online resource provisioning for real-time data processing on local edge devices is one of the prominent research challenges in edge computing. Due to the streaming nature of real-time applications, knowing the volume of the data sequence in advance when selecting a suitable edge device is not feasible for the edge controller. The edge controller therefore needs to find an optimal edge device or server for each real-time application in immediate mode, using a NIMH/dynamic/greedy/deep learning algorithm, based on various QoS constraints (resource availability, deadline and budget constraints, etc.) and without prior knowledge of the data volume. Moreover, selecting an optimal fog device in online mode can minimize the total queueing time and energy consumption of the applications.
- Advancement of Federated Learning (FL) at edge devices. Unlike traditional ML approaches, FL with different DL approaches trains a distributed model, such as in edge computing, based on the data stored on the local edge devices. Afterwards, the trained gradient weights are aggregated and transferred to the centralized cloud servers to build a global intelligent edge-cloud model. Finally, the global model is pushed back to the edge devices for inference. More importantly, during data analysis with the FL technique, the data always stays on the local edge device, which minimizes data leakage and data transmission cost. Given these properties of the FL technique at the edge, and the need to improve the accuracy and speed of the network model, FL-aware training in edge computing is one of the important research challenges. Most FL-aware approaches can process hundreds of edge devices in parallel in a synchronized order to train the edge network. However, the limited processing capacity and battery endurance of the edge devices, together with the variety of offloading and scheduling strategies, make it difficult to synchronize the edge devices in each iteration. Moreover, due to limited battery power, some edge devices may be in a sleep state, which can cause infrequent task training. As a result, an asynchronous FL technique at the edge devices is one possible solution to overcome the above-mentioned bottlenecks. Thus, adapting FL to different edge devices and training the network is still a research challenge in edge computing (a minimal FedAvg-style aggregation sketch follows this topic).
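The sketch below illustrates the aggregation step in a FedAvg style: each device refines the global model on its private data and returns only weights, which the server averages in proportion to local dataset sizes. A two-parameter linear model stands in for a real neural network, and all data here is synthetic.

```python
import numpy as np

def local_update(global_w, X, y, lr=0.1, epochs=5):
    """Device side: gradient descent on private data (linear model, squared
    loss); only the updated weights leave the device, never the data."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fed_avg(weights, sizes):
    """Server side: dataset-size-weighted average of device weight vectors."""
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(weights, sizes))

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
# Three devices, each holding a private dataset of a different size.
devices = []
for n in (20, 50, 80):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(0, 0.1, n)
    devices.append((X, y))

global_w = np.zeros(2)
for _ in range(10):                               # communication rounds
    locals_ = [local_update(global_w, X, y) for X, y in devices]
    global_w = fed_avg(locals_, [len(X) for X, _ in devices])
print("global model after 10 rounds:", global_w)  # approaches [2, -1]
```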
- Distributed DRL on edge computing. Although the DRL methodology is a powerful ML tool for dynamically extracting network information and the resource utilization of edge devices, it can place a heavy burden on a single gateway or edge device when analyzing performance metrics. To decrease the complexity of the DRL methodology and minimize the overhead of the running gateway or edge device, researchers want to distribute the load among multiple gateways or edge devices using a distributed DRL methodology. In distributed DRL, the performance parameters of the network and edge devices are spread among nearby smart gateway devices, which analyze them independently. The gradient NN layer should update the weights of the layer to form a group with neighboring gateway and edge devices based on the proper allocation of the neural cells. Moreover, the gateway and edge devices analyze the input QoS parameters and exchange output data with other edge devices over secure communication protocols. Such parameters optimize various performance metrics of the network and edge devices. As a result, the distributed DRL methodology converges to a stable result for real-time data offloading.
Real-time data processing -- Pelle Jakovits (Responsible person)
- Orchestrating complex Data Pipelines processing real-time IoT data. The student should investigate existing data pipeline orchestration frameworks (such as Apache NiFi) and recent literature on this topic, concentrating on managing IoT data flows that fuse data from a large number of geographically distributed data sources and that may require deploying data processing tasks at different distances from the data sources (a Fog Computing scenario).
- Real-time vs micro-batching in streaming data processing: performance and guidelines. Typically, stream processing frameworks buffer incoming data and process it in batches. However, some stream processing frameworks (such as Apache Storm) can process each incoming data object in real time. The task of the student is to give an overview of the newest advances in stream processing and to compare the performance of real-time vs micro-batching engines for different use cases. The student should also investigate which data- or use-case-specific characteristics should be considered when choosing between the two streaming approaches. In addition, the student should look into Structured Streaming, a stream processing abstraction built on top of the Spark SQL engine; a minimal Structured Streaming sketch follows this topic.
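As a starting point, here is the canonical streaming word count expressed in Structured Streaming (PySpark); the socket source and port are for local testing only (e.g. fed by `nc -lk 9999`). Internally this executes as a series of micro-batches, which is precisely the trade-off the topic asks the student to evaluate.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("StreamingWordCount").getOrCreate()

# Read lines from a text socket as an unbounded, continuously growing table.
lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999).load())

# Split lines into words and maintain a running count per word.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Print the full updated result table to the console after each micro-batch.
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```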
- Stream data processing on resource-constrained devices - With the increasing amount of data collected and processed from IoT data sources, it becomes more and more expensive to simply stream all data to the cloud for processing. Depending on the scenario, it may be beneficial to pre-process the data as close to its source as possible. However, there are typically limits on how many, and how powerful, computing resources are available in such cases. The student should study existing solutions that aim to solve these issues, give an overview of them, and demonstrate example scenarios and solutions where possible.
- Real-time Visualization of streaming data - The student should perform a literature study and present in the seminar the newest advances, best practices, and available solutions for visualizing large-scale streaming data in real time. If suitable open-source visualization tools are available in the context of this topic, the student should demonstrate real-time data visualization on an illustrative scenario.
- Real-time visitor count estimation in lecture rooms - The Delta Building is a new building to house the Institute of Computer Science. Its construction is to be finished in 2020. There are plans for a number of different modern sensors to be placed in the building. The Computer Graphics and Virtual Reality lab’s students are working on a real-time visualization of the people and activities inside the building. For that purpose there is a desire to know how many people occupy each room (including the hallways) at any given moment. The goal of this topic is to study the state-of-the-art of sensor analytics or image processing (or fusion) and to propose a usable approach for real-time visitor count estimation in lecture rooms.
Cloud Computing Frameworks -- (Pelle Jakovits)
- Comparison of State-of-the-Art (SOTA) open source IoT platforms, focusing on platforms that provide features for both device integration and data storage. The theoretical part involves a survey of the most popular SOTA platforms. The practical part involves setting up and demonstrating a selected candidate platform.
- Docker performance aspects when running a large number of small Docker containers on resource-constrained devices.
- Docker based device integration in Cumulocity: issues and challenges
- Real-time event processing in Cumulocity: limitations, issues and performance.
- Viability of Serverless - performance of FaaS cloud applications in comparison to microservice-based and monolithic applications in real-life scenarios
- Service-mesh-based security of cloud applications - using a service mesh and security policies to secure cloud applications composed of microservices
Topics by Shivananda R Poojara
Topic 1: Accelerating FaaS federation in an edge environment. This topic aims at developing an API to connect multiple FaaS providers across an edge environment, i.e. connecting two or more different FaaS provider types together, for instance Kubernetes (faas-netes) and Lambda (faas-lambda). This means you can have a single, centralized control plane but deploy to both AWS Lambda and Kubernetes at the same time. Contributions can be made to the open-source project https://github.com/openfaas-incubator/faas-federation.
Topic 2: Intelligent epilepsy seizure prediction using fog/edge computing environments. This topic aims to design a system that predicts the occurrence of seizures in epilepsy patients by collecting various data from mobile phones and other behavioural parameters. The prediction can make use of existing machine learning algorithms. To detect seizures on the fly, whether the patient is at work, travelling, or anywhere else, a fog environment can be used as the computation infrastructure for the prediction tasks. Such applications are sensitive and require extra attention to produce early results and respond immediately, so service latency and reliability are important parameters to consider when scheduling computation tasks on the fog environment. The topic also aims to design and develop an efficient scheduling algorithm for the fog environment and to use the power of serverless computing.
Topic 3: Minimizing execution cost via FaaS federation in an edge environment. This topic aims at designing an efficient FaaS federation strategy to minimize the execution cost of an application. This can be achieved by efficiently allocating serverless functions to different FaaS providers during real-time processing across edge environments. The FaaS providers may include local FaaS clusters or commercial providers such as AWS Lambda or Azure Functions. A toy cost-aware placement sketch follows below.
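The sketch below illustrates the cost-minimizing idea: for each invocation, pick the cheapest federated provider whose expected latency still meets the deadline. Provider names, prices, and latencies are invented for illustration; real cost models (per-GB-second billing, cold starts, data transfer) are considerably more involved.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    cost_per_invocation: float   # illustrative flat price, in cents
    expected_latency_ms: float   # e.g. estimated from monitoring data

PROVIDERS = [
    Provider("local-faas-cluster", 0.00, 300.0),  # free but currently slow
    Provider("aws-lambda",         0.20, 120.0),
    Provider("azure-functions",    0.25, 110.0),
]

def place(deadline_ms: float) -> Provider:
    """Greedy rule: cheapest provider whose latency meets the deadline."""
    feasible = [p for p in PROVIDERS if p.expected_latency_ms <= deadline_ms]
    if not feasible:
        raise RuntimeError("no provider can meet the deadline")
    return min(feasible, key=lambda p: p.cost_per_invocation)

print(place(deadline_ms=500).name)   # -> local-faas-cluster (cheapest feasible)
print(place(deadline_ms=150).name)   # -> aws-lambda (local is too slow here)
```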
Topic 4: Recent trends in health care edge analytics.
Topic 5: Evolution of edge analytics and industry approaches.
Topic 6: The role of DevOps in edge analytics.
Topics by Jakob Mass
Overview and comparison of IoT device management frameworks/firmware
Management of a large number of end devices (sensors, actuators) can become tedious without an additional framework. Such frameworks typically help with functions such as:
- updating the configuration of devices (e.g. how often to publish data, which server to publish to)
- updating the code (pushing new firmware over the internet)
- remote logging
- managing duty cycles of the devices
In this topic, the student should try out the two platforms below hands-on, using an ESP32 or similar development board, and present a comparison of them.