Practice 10 - Cloud Monitoring
Monitoring helps keep track of important events and metrics so that downtime or other anomalies in cloud nodes, services, and applications are noticed immediately. Administrators can be notified of abnormal behavior through alerts delivered by email, SMS, or webhook messages to Zulip or Slack.
In this practice session, we look into Azure monitoring for analyzing the logs and metrics of our App Service (Message Board application). We also investigate the logs and metrics of the Blob and Cosmos DB services.
Azure Monitor is a monitoring solution for collecting, analyzing, and responding to monitoring data from cloud services. We can use Azure Monitor to maximize the availability and performance of our applications and services. Azure Monitor Logs is based on Azure Data Explorer, and log queries are written using the Kusto Query Language (KQL).
The following image shows a logical model of how application insights can be gathered with Azure Monitor. The diagram visualizes the components of Application Insights and how they interact.
References
- Azure Monitor documentation: https://learn.microsoft.com/en-us/azure/azure-monitor/overview
- KQL: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/
Exercise 10.1 Get ready with the Message Board application
We will use the Message Board application created in Practice 5. Deploy the same application into the Azure App Service.
- All resources should be inside the same resource group.
- You can reuse existing resources.
- If you create a new resource group, name it lab10.
- Make sure to create an Azure Blob container
- Make sure to create COSMOS database containers
- Deploy the application on Azure App Service
- You can use the same approach followed in Practice 5 or through the web-based deployment.
- Make sure to add environment variables (COSMOS_URL, MasterKey, STORAGE_ACCOUNT, CONN_KEY) in Azure App Service under Configuration or in Environment variables.
- When reading these environment variables in the application code, use the APPSETTING_ prefix (for example, APPSETTING_COSMOS_URL).
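The APPSETTING_ prefix mentioned above could be handled in the Flask application along the following lines. This is only a minimal sketch: the helper names read_setting and load_settings are hypothetical, and the fallback to the unprefixed name is an assumption added for convenience when running the app outside App Service. Adapt it to however your Practice 5 code reads its configuration.

```python
import os

# Setting names as listed in this exercise.
SETTING_NAMES = ("COSMOS_URL", "MasterKey", "STORAGE_ACCOUNT", "CONN_KEY")


def read_setting(name, environ=os.environ):
    """Return an app setting, trying the APPSETTING_ prefixed key first.

    The unprefixed fallback is an assumption for local runs, not something
    the exercise requires.
    """
    for key in ("APPSETTING_" + name, name):
        value = environ.get(key)
        if value:
            return value
    raise KeyError(f"App setting {name!r} is not set")


def load_settings(environ=os.environ):
    """Collect all settings needed by the Message Board application."""
    return {name: read_setting(name, environ) for name in SETTING_NAMES}
```

Calling load_settings() once at application start-up fails early with a KeyError if one of the four variables is missing from the App Service configuration.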
Exercise 10.2 Enable Application Insights for the Azure resources
In this task, we use Azure Application Insights, a feature of Azure Monitor used for Application Performance Management (APM) of live applications and cloud services. We enable Application Insights for our web application, Cosmos DB, and Blob storage service.
Enable the Application Insights feature for all the resources used in your web application.
- Open the Web App that you created
- Go to Settings -> Application Insights
- Click on button Turn on Appli...
- Click on Enable
- Select Create New Resource
- Keep the Resource Name as it is for this Application Insights resource
- Make sure that the location is North Europe
- Make sure it is created inside the same resource group.
- Click on Apply
- Under the App Service resource, enable Diagnostic Settings of your web application
- Go to Monitoring --> Diagnostic Settings
- Give the setting a name of your choice, something like lab10appinsights
- Select all the checkboxes under Categories
- Select Send to Log Analytics workspace in Destination details
- Select AllMetrics
- Similarly, enable Diagnostic Settings for Azure Blob and COSMOS DB under the Resource Group:
- Go to the resource group (lab10)
- Select Monitoring -> Diagnostic Settings
- To enable Diagnostic Setting for Blob service
- Click on the blob type.
- Click on Add Diagnostic Setting
- Give the Diagnostic setting a name, something like lab10blobinsights
- Enable all logs, including the audit one, as below:
- Make sure that Send to Log Analytics workspace is checked.
- Click on the Save button
- Similarly, enable the Diagnostic Setting for CosmosDB as shown below:
- Diagnostic setting name could be lab10cosmosdbinsights
- Under the resource group (Diagnostic settings page) you should see the insights enabled for blob and Cosmos:
- Similarly, diagnostics should be enabled under the App Service Diagnostic settings page.
- PS!! Wait for a few minutes until Azure configures everything in the background.
Exercise 10.3 Sending workload to message board application and accessing application insights
In this task, you need to send some workloads to your message board application through the web interface of your application. Further, we will use Application Insights to access the monitoring data of the web application.
- Make sure you have completed Exercise 10.1 and the application is working.
- Now upload some messages and images (Create around 10 messages together with images).
- Verify that the blobs are created and there are entries inside the CosmosDB
- Try to refresh the web app multiple times. We are just trying to send multiple GET requests to the root endpoint of the web app.
- You may have to wait for a few minutes (5 to 10 minutes), so that recorded insights appear in Application Insights.
- Accessing Application Insights
- Move on to Application Insights (you can find it through the resources search box)
- Go to your Application Insights resource as shown below
- Move on to Investigate --> Application Map
- If you see No data available, try changing the Time Range (the default is Last hour)
- Explore the interface by yourself.
- You can refer to a quick guide here
- Take a screenshot of the Application Map and it should contain your username located in the top right corner. (Name the screenshot as 10_3.jpg/png)
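The repeated GET requests to the root endpoint described in this exercise can also be scripted instead of refreshing the browser by hand. The following is a rough sketch using only the Python standard library; the function names, the default request count, and the delay are illustrative assumptions, and the messages with images still need to be created through the web interface.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Send one GET request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error responses (4xx/5xx) are returned as codes instead of raising.
        return err.code


def send_get_requests(url, count=20, delay_s=0.5, fetch=fetch_status):
    """Call fetch(url) count times and tally the returned status codes."""
    tally = {}
    for _ in range(count):
        status = fetch(url)
        tally[status] = tally.get(status, 0) + 1
        time.sleep(delay_s)  # small pause so the app is not flooded
    return tally
```

For example, send_get_requests("https://<your-app-name>.azurewebsites.net/") returns a dictionary mapping status codes to counts, where the URL is a placeholder for your own App Service address. Connection failures are not caught and will surface as an exception.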
Exercise 10.4: Accessing the logs and some metrics
In this task, we will query the Azure logs using KQL. Kusto Query Language (KQL) is a language for querying structured, semi-structured, and unstructured data. We will use it to analyze and monitor the telemetry data of our application and cloud resources.
- Open the Logs subpage under the resource group's Monitoring section to write KQL queries
- Go to your Resource group (lab10)
- Move on to Monitoring --> Logs
- Close this default Queries wizard
- You will see the query editor as below
- Expand the resource type, double-click on any table, and hit the Run button.
- If the query includes the name of a single Table, the content of the table will be shown.
- Explore the result of the query with the metrics from this table.
- In the following exercises, we will make heavy use of these tables and write KQL queries to learn the ins and outs of our resources (Web App, Blob storage, and CosmosDB)
- Let's familiarize ourselves with KQL
- As a first example, write a KQL query that counts the number of entries in the AzureMetrics table:
AzureMetrics | count
Task 10.4.1 Queries related to App Service
- Let us write KQL queries to analyze the App Service logs
- Q1.1: KQL query to count the number of times you invoked the web application in last 7 days
AppServiceHTTPLogs | where TimeGenerated > ago(7days) | where CsUriStem == "/" | count
- Similarly try with last 1hr (Q1.1.1), 12hrs(Q1.1.2) and 24hrs(Q1.1.3).
- Take a screenshot of the query editor containing KQL query (Q1.1.3) along with the corresponding result, and it should contain your username located in the top right corner. (Name screenshot as Q1_1_3.jpg/png)
- Q1.2: Write a KQL query to count the number of times /handle_message is invoked in last 7 days
- You need to change only CsUriStem
- Take screenshot for deliverable (Q1_2.jpg/png)
- Q1.3: List the top 5 slowest requests to your web app in the last 7 days
AppServiceHTTPLogs | where TimeGenerated > ago(7days) | where CsUriStem == "/" | order by TimeTaken desc | take 5
- Q1.4: Similarly, list the top 5 slowest requests to your /handle_message endpoint in the last 7 days
- Take screenshot for deliverable (Q1_4.jpg/png)
- Q1.5: List the failed requests in the last 7 days
AppServiceHTTPLogs | where TimeGenerated > ago(7days) | where CsUriStem == "/handle_message" or CsUriStem == "/" | where Result contains "Error" or Result contains "fail"
- Q1.6: Endpoint Usage: Determine which endpoints are hit most frequently to understand the usage patterns of your app.
AppServiceHTTPLogs | where CsUriStem == "/" or CsUriStem == "/handle_message" | summarize EndpointCount = count() by CsUriStem | order by EndpointCount desc
- Q1.7: IP Address Analysis: Identify which IP addresses are making the most requests to spot potentially abusive or suspicious activity.
- Summarize by CIp and order by RequestCount
- Take screenshot for deliverable (Q1_7.jpg/png)
- Q1.8: Response Times: Calculate average, minimum, maximum, and percentile response times to gauge the performance of your app.
AppServiceHTTPLogs | where CsUriStem == "/" or CsUriStem == "/handle_message" | summarize AverageResponseTime = avg(TimeTaken), MinimumResponseTime = min(TimeTaken), MaximumResponseTime = max(TimeTaken), Percentile90ResponseTime = percentile(TimeTaken, 90), Percentile95ResponseTime = percentile(TimeTaken, 95), Percentile99ResponseTime = percentile(TimeTaken, 99)
- Take screenshot for deliverable (Q1_8.jpg/png)
Task 10.4.2 Queries related to Blob Storage
- Let us write KQL queries to analyze the Blob Storage logs. You need to access the logs from the Storage Accounts tables. Use the knowledge from the previous task and try to write the queries yourself.
- Q2.1 Write a KQL query to list the number of times you accessed (GET request) the blobs
StorageBlobLogs | where OperationName has_all ("GetBlob") | where StatusCode has_all ("200")
- Q2.2 Number of times you successfully uploaded a blob/file
- You can use OperationName as PutBlob
- Take screenshot for deliverable (Q2_2.jpg/png)
- Q2.3 Write a KQL query to calculate the average upload duration (in ms)
- Take screenshot for deliverable (Q2_3.jpg/png)
- Q2.4 Write KQL for calculating the average duration while reading the blobs
- Take screenshot for deliverable (Q2_4.jpg/png)
- Q2.5 Create a pie chart of operations used over the last three days.
StorageBlobLogs | where TimeGenerated > ago(3d) | summarize count() by OperationName | sort by count_ desc | render piechart
- Take screenshot for deliverable (Q2_5.jpg/png)
Task 10.4.3 Queries related to CosmosDB
- Let us write KQL queries to analyze the CosmosDB logs. You need to access the logs from the Azure Cosmos DB table.
- Q3.1 The number of times you insert a record (from both Windows and Linux machine)
CDBDataPlaneRequests | where OperationName == "Create"
- Q3.2 Write a KQL query to read or query the records. Change the OperationName to Read
- Q3.3 Average read time (DurationMs) for requests made while the app is live (deployed, Linux user agent) rather than from your own Flask app.
CDBDataPlaneRequests | where OperationName == "Read" | where UserAgent contains "Linux" | summarize AverageReadTimeLinux = avg(DurationMs)
- Q3.4 Similarly, what is the average read time (DurationMs) when the app is NOT live and you are accessing the Flask app?
- Replace Linux with Windows
- Take screenshot for deliverable (Q3_4.jpg/png)
- Q3.5 Write a KQL query to find the slowest read operation in the last hour.
- Take screenshot for deliverable (Q3_5.jpg/png)
- Q3.6 Total number of unique clients that accessed your CosmosDB (through the Web App)
CDBDataPlaneRequests | where OperationName == "Read" | summarize UniqueClients = dcount(ClientIpAddress)
- Q3.7 Create a pie chart for DataPlaneRequests from different client IPs over the last three days.
- Take screenshot for deliverable (Q3_7.jpg/png)
Exercise 10.5: Querying the Deployment Logs
The deployment logs are an alternative to the Log stream, which may not be able to give you the history. For example, you can check them to see whether any errors were logged. Here, the query matches free-text log messages rather than structured (table or JSON) fields.
- Q4.1: Check the number of times you deployed your application. When you deploy a new version, you basically start a new container. In this case, Azure might start a new container for the Web App. You can use the message "Starting container for site" as a query string to check how many times new containers are created for deployment purposes.
AppServicePlatformLogs | where Message contains "Starting container for site"
- Q4.2: Write a KQL query to list the container restarts in the deployment.
- Take screenshot for deliverable (Q4_2.jpg/png)
- Q4.3: Write a KQL query to list the error logs in the deployment.
- You can try out Level in the query
- Take screenshot for deliverable (Q4_3.jpg/png)
- Q4.4: List all the Warnings that contain a message with the keyword "unhealthy"
- Take screenshot for deliverable (Q4_4.jpg/png)
Deliverables:
- Terminate and delete your web application, storage account, and CosmosDB
- Screenshots from the following tasks:
- From 10.3 - 1 screenshot
- From 10.4 - 12 screenshots
- From 10.5 - 3 screenshots