Practice 6 - Load balancing
In this lab, we will look at load-balancing web applications running on top of IaaS cloud instances. We will use Nginx as a load balancer and try out different load-balancing algorithms (such as round-robin, weighted round-robin, IP hash, etc.). You will set up three instances: two simple web servers and one load balancer. The goal is to learn how load balancing works in the cloud, how to set up load generation with Locust for simple performance testing, and how to configure different load-balancing strategies. For the web application, we will use a simple Python Flask web server backed by a Redis in-memory database.
Exercise 6.1. Setting up the Application Server instances
We will create two OpenStack instances and install the example Web application inside them. We will use a Flask application that tracks how many users have visited the page and from which IPs the user requests originated.
NB! Before starting, make sure you have deleted the VMs that you have created in the previous labs.
- Log into OpenStack: https://stack.cloud.hpc.ut.ee/auth/login/?next=/project/
- Create two VM instances from the VM Image: ubuntu22.04
- Choose the following VM flavors:
  - Application Server 1 flavour: m3.nano (m3 uses newer generation hardware)
  - Application Server 2 flavour: m1.xsmall (m1 is based on the oldest generation hardware). If this flavour is not available, use m2.tiny instead.
- Make sure to enable the "Delete Volume on Instance Delete" option
- Make sure to use the proper naming convention for the instances.
  - Names should start with "Lab6", contain your last name, and be chosen so that you can easily differentiate the instances.
- Log into the instances through SSH
- Install the required software packages on the two instances:
  - Refresh the list of software packages: sudo apt update
  - Install the Redis server package: sudo apt install redis-server
  - You may also need to install the pip and Flask packages. Follow the same steps that you used in the previous labs to prepare Flask applications.
About the Python Application and Redis
This web app checks the IP address of whoever is making the request and stores/increments counter values in a hash map, tracking how many times each IP address has visited the application (key: IP address, value: number of requests). The hash map is stored in the Redis in-memory data structure store. In this lab, we are simply using Redis as a database, but Redis is designed for distributed use, meaning it can be used as a fast cache/database/message broker shared by multiple machines in a cluster.
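To illustrate the pattern, here is a minimal hypothetical sketch of such a visit counter (not the actual lab code, which you clone in the next step; the route and template names are assumptions, and it requires the flask and redis pip packages):

# Hypothetical sketch of the visit-counter pattern described above;
# the real application comes from the repository cloned below.
from flask import Flask, request, render_template
import redis

app = Flask(__name__)
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

@app.route("/")
def home():
    ip = request.remote_addr      # IP address of the requester
    r.hincrby("visits", ip, 1)    # increment this IP's counter in the Redis hash
    return render_template("index.html", visits=r.hgetall("visits"))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)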
- Prepare & run the application on both of the servers
  - Clone the sample code of the application: git clone https://shivupoojar87@bitbucket.org/shivupoojar87/2023lab6app.git
  - Create and run the Flask application similarly to how we have run it in the previous exercises.
  - Don't forget to install the Python packages specified in the requirements.txt file using the pip install command
- Access your instance through a web browser using its Public IP and port 5000 to check that the website is working as required in both of the VMs
- Modify ./templates/index.html to make the web page more personal to you.
  - How you modify it is up to you, but your full name should at least be present on the page. It would also be good to make each application server look a bit different from the other; this may help with testing different load-balancing outcomes later.
  - A different background color on each application server makes it easy to recognize which server you reached when accessing the load balancer.
- If at any point you need to clear your entire Redis storage, you can run this command: redis-cli FLUSHDB
Exercise 6.2. Setting up the load balancer instance and balancing multiple application servers
In this exercise, we will set up another instance, install an Nginx load balancer, and configure it to distribute incoming user requests between the two application servers.
A load balancer (LB) distributes user requests to a web page across multiple application servers. Load balancers are used to increase the capacity (concurrent users) and reliability of applications, and they are a required component for auto-scaling, i.e. changing the number of application servers dynamically based on the number of active users.
(Image: load-balancing overview diagram, taken from https://www.nginx.com/resources/glossary/load-balancing/)
- We are using the Nginx (http://nginx.org/en/) load balancer.
- Create a new instance (should also be Ubuntu 22.04) for the load balancer
  - Make sure to select the "Delete Volume on Instance Delete" option in the Source tab
  - Use the m3.nano flavor
- Install Nginx on the instance
  - Refresh the list of software packages: sudo apt update
  - Install the following Ubuntu package: sudo apt install nginx-full
- Modify the Nginx configuration to add your application server IPs to the list of managed servers. (A sketch of the expected configuration structure is shown after this list.)
  - Download the example Nginx load balancer configuration file: wget -O default "https://courses.cs.ut.ee/2023/cloud/spring/Main/Practice6?action=download&upname=default"
  - Overwrite the default Nginx site configuration on the instance with the downloaded configuration file: sudo cp default /etc/nginx/sites-enabled/default
  - Modify the default configuration file: sudo nano /etc/nginx/sites-enabled/default
    - Find the upstream lab6 block
    - Modify the example server 172.17.64.109:5000; lines in the configuration file, replacing both entries with the IP addresses of your application servers.
- Reload the Nginx service: sudo service nginx reload
  - Remember to run this command again every time you modify the Nginx configuration file
- Visit the IP address of the load balancer using a web browser and verify that it displays one of your application servers
  - You do not need to specify the port. Nginx listens on port 80 (not 5000) and routes the traffic to one of the application servers on the correct port.
- You can also check current incoming HTTP connections to your application server using the following command: netstat | grep http
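For orientation, the relevant part of the downloaded configuration should look roughly like the sketch below. This is an assumption about the file's structure (the upstream name lab6 and port 5000 come from the instructions above); the IP addresses are placeholders for your application servers.

# Sketch of the expected structure of /etc/nginx/sites-enabled/default.
upstream lab6 {
    server 192.168.42.11:5000;    # Application Server 1 (placeholder IP)
    server 192.168.42.12:5000;    # Application Server 2 (placeholder IP)
}

server {
    listen 80 default_server;

    location / {
        proxy_pass http://lab6;   # forward incoming requests to the upstream group
    }
}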
Exercise 6.3. Generating additional user traffic for benchmarking
Let us check how well our load-balanced system can handle a higher number of simultaneous user requests and how the generated user requests are distributed between the application servers.
Your task is to generate many concurrent user requests to the load balancer address and investigate how many of those requests are routed to each of your application servers. We will also try out different load-balancing algorithms in the next exercise.
We will use Locust, a Python-based framework for simulating users and generating web traffic, to generate a large number of user requests.
NB! Locust can be installed either on your PC/laptop or the load balancer OpenStack VM.
- Let's install Locust and generate web user requests.
  - Install Locust as a pip package: pip3 install locust
- Creating web traffic using the locust tool requires writing a locustfile.py Python file and running it.
- An excellent tutorial for creating locustfile.py can be found here: http://docs.locust.io/en/stable/quickstart.html
- Let's prepare locustfile.py for our web application testing, as shown below.
  - @task marks a web task executed by the locust tool.
  - In the following code, the task invokes the endpoint / of the web server.
  - stages contains the load generation specification with the following parameters:
    - duration -- how long the request generation should last, for example 60 seconds.
    - users -- the total number of simultaneous users that will send/generate requests.
    - spawn_rate -- the number of users to add per second; this allows ramping up the request generation.
from locust import HttpUser, task, constant
from locust import LoadTestShape

class HelloWorldUser(HttpUser):
    wait_time = constant(1)

    @task
    def hello_world(self):
        self.client.get("/")

class StagesShape(LoadTestShape):
    stages = [{'users': 100, 'duration': 60, 'spawn_rate': 10}]

    def tick(self):
        run_time = self.get_run_time()
        for stage in self.stages:
            if run_time < stage["duration"]:
                tick_data = (stage["users"], stage["spawn_rate"])
                return tick_data
        return None
- Run the Locust script in the terminal/Windows PowerShell of your machine using python3 -m locust -f locustfile.py. Locust should start up and print the address of its web interface.
- Locust provides a web interface to generate web traffic.
  - It can be accessed at port 8089.
    - For example, if you deploy Locust on your PC, its address would be: http://localhost:8089
    - If you run Locust on an OpenStack VM, replace localhost with the actual IP address of the VM
- Open the Locust web interface in your browser and configure the test:
- Change the Host to http://VM_IP:80, where VM_IP is the IP of your Nginx load balancer server
- Click on Start Swarming to start the load generation process
- Open the Flask application VM terminals. Each Flask app's output should display the requests that the load balancer routed to that server.
- Open the Locust web user interface and wait for the load test to be completed.
- Note down how many requests were made during the load test. You can get this information from the # Requests field on the dashboard.
- Answer the following question:
  - Question 1: What percent of generated user requests visited each of your servers? What was the distribution between Server 1 and Server 2? If the percentage is not 50%, why do you think that is so?
    - The total number of user requests sent to each application server can be estimated by noting down the counter values before and after the experiment on both application server pages.
- Also, check the response time graph by clicking on Charts --> Response Time (ms), which shows two metrics:
  - 95% percentile response time: the response time within which 95% of the user requests completed.
  - Mean response time: the mean response time of all user requests in the given time window.
- Download the test report from the Locust UI and name it "Task6_3_report.html". (To download the report, go to Locust UI --> Download Data --> Download Report.)
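As a side note, Locust can also run without the web UI in "headless" mode and write the HTML report directly to a file. A sketch, assuming a recent Locust version (with a LoadTestShape class in the locustfile, the user count and spawn rate are taken from the shape):

# Optional: run the same test headless and save the report directly.
python3 -m locust -f locustfile.py --headless --host http://VM_IP:80 --html Task6_3_report.html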
Exercise 6.4. Working with load-balancing techniques using the Nginx service
In this exercise, we are going to learn different load-balancing techniques that are supported by the Nginx load balancer. Load balancing is a technique to distribute incoming user requests across multiple application servers. By default, the round-robin technique is used in the Nginx load balancer. The different strategies are listed below; a configuration sketch showing how they map to Nginx directives follows the list.
- Round Robin and weighted Round Robin: Distributes requests across the upstream servers in order.
- Least Connections and weighted Least Connections: Forwards requests to the server with the lowest number of active connections.
- Least Time and weighted Least Time: Forwards requests to the least loaded server based on a calculation that combines response time and the number of active connections. (Mostly used in commercial version NGINX Plus)
- IP Hash (based on client IP address) and weighted IP Hash: Distributes requests based on the first three octets of the client IP address.
- Hash (on specified request characteristics): Distributes requests based on a specified key, for example, client IP address or request URL.
- Random with Two Choices: Pick two servers at random and forward the request to the one with the lower number of active connections.
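In the open-source Nginx, a policy is selected with a directive at the top of the upstream block (round-robin is the default when no directive is given). A sketch with placeholder IPs; uncomment at most one policy directive:

upstream lab6 {
    # least_conn;             # Least Connections
    # ip_hash;                # IP Hash (first three octets of the client IPv4 address)
    # hash $request_uri;      # Hash on a chosen key, here the request URL
    # random two least_conn;  # Random with Two Choices
    server 192.168.42.11:5000;   # append weight=N for the weighted variants
    server 192.168.42.12:5000;
}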
You can also look here for further information on available load-balancing approaches.
By default, Nginx uses the round-robin approach, which we have already experienced in the previous tasks.
Now let us work with the weighted load balancing technique:
- Weighted round-robin is configured by appending a weight value to each entry in the server group section of the Nginx site configuration file. Set the weight of your slowest server to 1, and then set the weights of the other servers relative to that setting: a server with weight 2 receives twice the amount of traffic, and a server with weight 4 receives four times the amount.
- Let's modify the Nginx configuration (sudo nano /etc/nginx/sites-enabled/default) by assigning a weight to each application server in the upstream block, for example server server1.example.com weight=1; and server server2.example.com weight=2; (see the sketch after this list).
- Reload the Nginx server.
- Move on to the Locust web UI, click on New Test, and start swarming.
- Wait for the load test to complete and answer the question:
  - Question 2: What percent of your generated user requests visited each of your servers?
    - Also, provide information on what weights you assigned to the two servers with VM flavor types m3.nano and m1.xsmall.
- Take a screenshot of the load balancer while the application server 1 page is visible
- Take a screenshot of the load balancer while the application server 2 page is visible
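A sketch of a weighted upstream block under the assumptions above (placeholder IPs; here the m3.nano server is given twice the weight of the m1.xsmall one, so it should receive roughly two thirds of the requests):

upstream lab6 {
    server 192.168.42.11:5000 weight=1;   # m1.xsmall (older, slower hardware)
    server 192.168.42.12:5000 weight=2;   # m3.nano (newer, faster hardware)
}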
Exercise 6.5. Comparing the response times of different load-balancing techniques
The Flask web application used in the previous tasks performs no compute- or memory-intensive operations. We will modify the application by adding a compute-intensive task and then check the response time of user requests under various load-balancing techniques.
Task 6.5.1: Modify the web application
We will add code that calculates the value of Pi to 1000 decimal places.
- Add some Pi calculation code to the Python application on BOTH application servers
  - You can find a Python implementation for computing Pi here: https://levelup.gitconnected.com/generating-the-value-of-pi-to-a-known-number-of-decimals-places-in-python-e93986bb474d (scroll to "Writing Chudnovsky algorithm in python codes")
  - Add the Pi calculation code as the method def compute_pi(n): in app.py and call it in the request handling part (the home() function) with compute_pi(1000).
  - Also, update the index.html template file so that it outputs the calculated value of Pi as well (you need to pass the value to the template as an argument to render_template() at the end of home()).
    - To make the calculated Pi value fit into the HTML page, you can put it inside an HTML element whose style specifies the element width and word wrapping. For example, you can use something like: <h2 style="width: 1200px; word-wrap: break-word;"> ... </h2>
- As a result, your application should now also print out a very long number that represents the first 1000 digits of Pi. (A sketch of a possible compute_pi() implementation follows.)
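For reference, here is a sketch of compute_pi() based on the iterative Chudnovsky formulation; the linked article's version is similar, and any implementation that returns Pi to n decimal places is fine:

from decimal import Decimal, getcontext

def compute_pi(n):
    """Return Pi to roughly n decimal places (Chudnovsky algorithm)."""
    getcontext().prec = n + 2                 # requested precision plus guard digits
    C = 426880 * Decimal(10005).sqrt()
    K, M, X, L = 6, 1, 1, 13591409
    S = Decimal(L)
    for i in range(1, n // 14 + 2):           # each term adds ~14 correct digits
        M = M * (K**3 - 16 * K) // i**3       # exact integer division by construction
        L += 545140134
        X *= -262537412640768000
        S += Decimal(M * L) / X
        K += 12
    return str(C / S)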
Task 6.5.2: Configure load balancer to use the least_conn load-balancing technique
- Configure Nginx to use the least_conn; load-balancing strategy (see the sketch after this list).
- Restart Locust.
- Start the load generation in the Locust web UI and track how many user requests end up in Application Server 1 vs Application Server 2.
- You should also check the CPU utilization using the top command on both application server instances while the requests are being sent to the instances, and answer the following questions:
  - Question 3: What is the CPU load value on both instances? What does this indicate regarding the performance of these cloud instance types?
  - Question 4: When using the least_conn; load-balancing strategy, what percent of user requests are sent to the first and second application servers?
  - Question 5: Which of these instances is more capable in terms of responding to user requests?
- Download the test report from the Locust UI and name it Task6_5_least_conn.html.
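A minimal sketch of the change, with placeholder IPs as before:

upstream lab6 {
    least_conn;                   # send each request to the server with the fewest active connections
    server 192.168.42.11:5000;
    server 192.168.42.12:5000;
}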
Task 6.5.3: Comparing the Round Robin and weight-based load balancing techniques.
In this subtask we will compare the results of using two of the previously used load-balancing strategies.
- Configure the round-robin technique (basically, remove least_conn; from the default configuration file and reload Nginx)
- Generate the load by clicking on New Test --> Start Swarming
- Wait until the load test is completed.
- Download the test report from the Locust UI and name it Task6_5_round_robin.html for the deliverable.
- Repeat the same procedure for the weight-based technique. Assign a smaller weight to the m1.xsmall type VM and a larger weight to the m3.nano type VM.
- Download the test report from the Locust UI and name it Task6_5_weight_based.html for the deliverable.
- Answer the following question:
- Question 6: Which of these two techniques has the overall best minimum, maximum, and average response time? Why do you think it is so?
- Also, retake the same two screenshots (now with updated counters after all the performed experiments):
- Take a screenshot of the load balancer while the application server 1 page is visible
- Take a screenshot of the load balancer while the application server 2 page is visible
Deliverables
- DO NOT leave your instances running after submitting the solution. This lab requires several VM CPU cores per student, which may mean some students cannot start the lab exercises until previous students have shut down their instances.
- Submit deliverables from the exercise:
- Exercise 6.3: Answer to Q1, and report Task6_3_report.html
- Exercise 6.4: Answer to Q2, 2 screenshots.
- Exercise 6.5.2: Answer to Q3, Q4, Q5 and report Task6_5_least_conn.html
- Exercise 6.5.3: Answer to Q6 and reports Task6_5_round_robin.html and Task6_5_weight_based.html, 2 screenshots.
In case of problems, check these potential solutions to common issues:
- If you get errors about permissions when running command line commands inside the instance:
- In several exercises you are asked to modify, delete or edit files outside your home folder.
- You will have to use sudo command to elevate file manipulation command permissions to be able to complete such operations.
- For example: sudo nano /etc/nginx/sites-enabled/default will run the nano command under the root user with elevated permissions.
- NB! But be careful, not everything should be run through sudo!
- To check Nginx logs:
  - Check the content of the files in /var/log/nginx/
  - Check the service output: sudo service nginx status
- If you get "Too many authentication failures" trying to ssh into openstack instances, try
ssh -o IdentitiesOnly=yes -i path_to_key_file.pem ubuntu@vm_ip