Amazon CloudWatch

aws

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-cloudwatch.html
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/mon-scripts.html
http://blog.krishnachaitanya.ch/2016/03/monitor-ec2-memory-usage-using-aws.html - printed, Monitoring memory usage
https://forums.aws.amazon.com/message.jspa?messageID=265299 - Monitoring memory usage
http://arr.gr/blog/2013/08/monitoring-ec2-instance-memory-usage-with-cloudwatch/ - printed, Monitoring memory usage
https://signalfx.com/blog/12-top-things-to-monitor-in-amazon-ec2/
https://www.logicmonitor.com/blog/guide-to-monitoring-memory-for-aws-ec2-linux-instances/
https://www.otreva.com/blog/monitoring-memory-ram-aws-elasticbeanstalk-ec2-instances-cloudwatch/
http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/graph_a_metric.html
http://docs.aws.amazon.com/autoscaling/latest/userguide/as-instance-monitoring.html
http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/switch_graph_axes.html

What is Amazon CloudWatch?

Amazon CloudWatch is a web service that provides monitoring for AWS cloud resources and applications, starting with Amazon EC2. It provides you with visibility into resource utilization, operational performance, and overall demand patterns—including metrics such as CPU utilization, disk reads and writes, and network traffic. You can get statistics, view graphs, and set alarms for your metric data.

What questions should we ask ourselves to come up with a monitoring plan?

  • What are our goals for monitoring?
  • What resources do we need to monitor?
  • How often do we need to monitor these resources?
  • What monitoring tools will we use?
  • Who will perform the monitoring tasks?
  • Who should be notified when something goes wrong?

Why should we establish a baseline?

After we have defined our monitoring goals and created our monitoring plan, the next step is to establish a baseline for normal Amazon EC2 performance. We should measure EC2 performance at various times and under different load conditions. As we monitor EC2, we should store a history of the monitoring data that we collect. We can compare current EC2 performance to this historical data to identify normal performance patterns and anomalies, and devise methods to address them. For example, we can monitor CPU utilization, disk I/O, and network utilization. When performance falls outside the established baseline, we might need to reconfigure or optimize the instances to reduce CPU utilization, improve disk I/O, or reduce network traffic.
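As a rough illustration of comparing current performance to a baseline, the sketch below summarizes historical samples as a mean and standard deviation and flags values that fall far outside that range. The function names, the sample data, and the "3 standard deviations" rule are assumptions chosen for illustration; they are not something CloudWatch prescribes.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical samples as (mean, sample standard deviation)."""
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, k=3.0):
    """Return True when value is more than k standard deviations from the mean."""
    mu, sigma = baseline
    return abs(value - mu) > k * sigma


# Hypothetical history of average CPU utilization (percent) under normal load.
history = [22.0, 25.0, 24.0, 23.0, 26.0, 24.0]
baseline = build_baseline(history)

print(is_anomalous(25.0, baseline))  # within the normal range
print(is_anomalous(90.0, baseline))  # far outside the normal range
```

In practice the history would come from stored monitoring data, and separate baselines for different times of day or load conditions may be more useful than a single one.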

What should we monitor?

  1. CPU utilization
  2. Memory utilization
  3. Memory used
  4. Memory available
  5. Network utilization
    1. NetworkIn
    2. NetworkOut
    3. Number of sockets open
    4. Number of sockets in time-wait state
  6. Disk performance
    1. DiskReadOps
    2. DiskWriteOps
  7. Disk swap utilization
  8. Page file utilization (Windows-only)

How can we send memory usage statistics to CloudWatch?

Because AWS does not have access to an EC2 instance at the operating system level, only CPU, network utilization, disk I/O, and other metrics that can be observed through the hypervisor layer are available by default in the AWS console. As a result, AWS does not monitor memory utilization by default. However, AWS does provide a set of scripts that we can use to send memory utilization information to CloudWatch as custom metrics. The process of sending these custom metrics is different for Linux and Windows instances, and even the process of installing the prerequisites differs slightly between Linux distributions. The overall steps are:

  1. Create an IAM user. This user does not need a password or console access.
  2. Copy the access key and secret key for this IAM user and keep them handy; we need them every time we configure custom metrics to be sent to CloudWatch.
  3. Create a policy with the required actions and attach it to the user. See http://blog.krishnachaitanya.ch/2016/03/monitor-ec2-memory-usage-using-aws.html for details.
  4. Install the prerequisites and the monitoring scripts on the instance.

To create an IAM user:

  1. Log into AWS Console
  2. Find the IAM service and click on it
  3. Click on the "Users" tab
  4. Click on the "Add" button
  5. Provide the username
  6. Check the check box next to "Programmatic access"
  7. Click on the "Next (Permission)" button
  8. Click on the "Next (Review)" button
  9. Click on the "Create user" button. On this screen, make sure that we copy the access key and the secret to a safe place.

To create a policy:

  1. Find the user that we just created
  2. Click on the user that we just created
  3. In the Permission tab, click on the "Add Permission" button.
  4. Click on the "Attach existing policy directly" button. This will open a new tab.
  5. Click on the "Create Policy" button
  6. Click on the "Select" button under the "Create Your Own Policy" section.
  7. Specify a name and description for this policy
  8. Copy and paste the policy from http://blog.krishnachaitanya.ch/2016/03/monitor-ec2-memory-usage-using-aws.html into the "Policy Document" field.
  9. Click on the "Create Policy" button at the bottom. This creates the policy. We can now close this tab and go back to the previous tab.
  10. Click on the "Refresh" button that is next to the "Create Policy" button.
  11. In the search box, search for the policy that we just created.
  12. Check the check box next to the policy that we just created.
  13. Click on the "Next (Review)" button
  14. Click on the "Add Permission" button
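The policy document itself lives in the linked post. As a rough guide to its shape, the sketch below builds a policy that allows the four actions AWS's documentation lists for the monitoring scripts. Treat the action list as an assumption and check it against the linked post before using it; the helper name build_policy is hypothetical.

```python
import json

# Assumption: these are the permissions AWS documents for the CloudWatch
# monitoring scripts. Verify against the policy in the linked post.
REQUIRED_ACTIONS = [
    "cloudwatch:PutMetricData",
    "cloudwatch:GetMetricStatistics",
    "cloudwatch:ListMetrics",
    "ec2:DescribeTags",
]


def build_policy(actions):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": "*"},
        ],
    }


# The printed JSON is what would be pasted into the "Policy Document" field.
policy_json = json.dumps(build_policy(REQUIRED_ACTIONS), indent=2)
print(policy_json)
```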

To install the prerequisites and the monitoring scripts (on a yum-based distribution such as Amazon Linux):

sudo yum install perl-Switch perl-DateTime perl-Sys-Syslog perl-LWP-Protocol-https
curl http://aws-cloudwatch.s3.amazonaws.com/downloads/CloudWatchMonitoringScripts-1.2.1.zip -O
unzip CloudWatchMonitoringScripts-1.2.1.zip
rm CloudWatchMonitoringScripts-1.2.1.zip
mv aws-scripts-mon /opt/
cd /opt/aws-scripts-mon/
cp awscreds.template awscreds.conf 
vi awscreds.conf
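The awscreds.conf file edited in the last step holds the access key and secret of the IAM user created earlier. The sketch below shows the two-line key=value layout that the template is expected to use; the key names AWSAccessKeyId and AWSSecretKey should be confirmed against the awscreds.template file that ships with the scripts. The helper names are hypothetical, and the credentials are AWS's documented placeholder values, not real keys.

```python
import os


def render_awscreds(access_key_id, secret_key):
    """Return the contents of awscreds.conf for the given credentials."""
    return f"AWSAccessKeyId={access_key_id}\nAWSSecretKey={secret_key}\n"


def write_awscreds(path, access_key_id, secret_key):
    """Write the credentials file so that only its owner can read it (mode 0600)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(render_awscreds(access_key_id, secret_key))


# Placeholder credentials from AWS documentation; replace with the real keys.
print(render_awscreds("AKIAIOSFODNN7EXAMPLE",
                      "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"))
```

Restricting the file to its owner is a design choice of this sketch, since the file contains a long-lived secret.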

Run the command below:

/opt/aws-scripts-mon/mon-put-instance-data.pl --mem-util --verify --verbose

to check that everything is OK. If everything is properly configured, we can proceed to configure cron to send memory metrics to CloudWatch every 5 minutes. Type crontab -e at the shell prompt and append the line below to the end of the file:

*/5 * * * * /opt/aws-scripts-mon/mon-put-instance-data.pl --mem-util --mem-used --mem-avail --swap-util --swap-used --from-cron

Save and exit. After 10 to 15 minutes, the memory utilization of this instance should appear in the CloudWatch console.
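To make the reported numbers less of a black box, here is a minimal sketch of how memory metrics can be derived from /proc/meminfo on Linux. It is an approximation under stated assumptions, not the Perl script's actual code: it treats free, buffer, and cache memory as available, and it uses a hard-coded sample instead of reading the real file. The function names and the instance ID are hypothetical. The metric names mirror the ones the scripts report, and the output list has the shape of a CloudWatch MetricData payload.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: kilobytes}."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_metrics(meminfo):
    """Compute utilization (percent) plus used and available memory (megabytes)."""
    total = meminfo["MemTotal"]
    # Assumption: buffers and cache count as available memory.
    available = meminfo["MemFree"] + meminfo.get("Buffers", 0) + meminfo.get("Cached", 0)
    used = total - available
    return {
        "MemoryUtilization": 100.0 * used / total,
        "MemoryUsed": used / 1024,
        "MemoryAvailable": available / 1024,
    }


def to_metric_data(metrics, instance_id):
    """Shape the metrics as a list of CloudWatch MetricData entries."""
    units = {"MemoryUtilization": "Percent",
             "MemoryUsed": "Megabytes",
             "MemoryAvailable": "Megabytes"}
    return [
        {
            "MetricName": name,
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Unit": units[name],
            "Value": value,
        }
        for name, value in metrics.items()
    ]


# Hypothetical sample; on a real instance this text would come from /proc/meminfo.
SAMPLE = """MemTotal:        4000000 kB
MemFree:         1000000 kB
Buffers:          200000 kB
Cached:           800000 kB
"""

metrics = memory_metrics(parse_meminfo(SAMPLE))
print(metrics["MemoryUtilization"])  # 50.0
print(to_metric_data(metrics, "i-0123456789abcdef0")[0]["MetricName"])
```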

How can we create an alarm?

Assuming we have already created an Auto Scaling policy that adds to the number of running instances, we can associate that policy with an alarm action. When the alarm is triggered, Auto Scaling is notified and makes the appropriate changes to our resources. To save time, we'll create just one alarm; however, the same procedure applies to other alarms. For example, we could create another alarm to notify Auto Scaling that it needs to terminate an instance.

How can we delete a CloudWatch alarm?

  1. Open the Amazon CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
  2. In the Navigation pane, under Regions, click the region that contains the alarm, for example US East (Virginia).
  3. Select the check box next to the alarm that you want to delete, and then click Delete.
  4. When a confirmation message appears, click Yes, Delete.

How can we create a CloudWatch alarm?

You can create an Amazon CloudWatch alarm that monitors any one of your Amazon EC2 instance’s CloudWatch metrics. CloudWatch will automatically send you a notification when the metric reaches a threshold you specify.

To create a CloudWatch alarm for high CPU:

  1. On the Amazon EC2 console, select the instance for which you want to create an alarm.
  2. On the instance’s Monitoring tab in the lower pane, click Create Alarm.
  3. In the Create Alarm dialog box, set the criteria for your alarm. In this example, we’ll set an alarm if the instance’s average CPU utilization is above 70 percent.
  4. The check box next to Send a notification is selected by default. Select an existing topic, or click Create topic and enter a name. (Notifications use Amazon Simple Notification Service (Amazon SNS)).
  5. In the With these recipients box, enter the email addresses of the recipients you want to notify. You can enter up to 10 email addresses, each separated by a comma.
  6. Configure the threshold for your alarm:
    1. In the Whenever boxes, select Average and CPU Utilization.
    2. In the Is boxes, define the threshold for the alarm by selecting > and entering 70.
    3. In the For at least boxes, specify the sampling period and number of samples evaluated by the alarm. You can leave the defaults or define your own. For our example, we’ll monitor for 1 period of 15 minutes. A shorter period creates a more sensitive alarm. A longer period can mitigate brief spikes in a metric.
    4. In Name of alarm, a name is automatically generated for you. You can type in the field to change the name. You cannot modify the name after you create the alarm.
  7. Click Create Alarm.

After you create the alarm, you can use the Monitoring tab in the Amazon EC2 console to view a summary of alarms that have been set for that instance. From there, you can also edit the alarm. If you created a new Amazon SNS topic for this alarm or added new email addresses to an existing topic, each email address added will receive a subscription confirmation email from Amazon SNS. The person who receives the email must confirm it by clicking the included link in order to receive notifications.
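The same high-CPU alarm can also be described programmatically. The sketch below builds the parameters for the console example above (average CPUUtilization above 70 percent for 1 period of 15 minutes) in the form accepted by boto3's put_metric_alarm call. It assumes boto3 is installed and configured with credentials, which the original procedure does not require; the instance ID, topic ARN, alarm name, and helper names are hypothetical. The create_alarm function is defined but never called here, so nothing is sent to AWS.

```python
def cpu_alarm_params(instance_id, action_arn, threshold=70.0,
                     period_seconds=900, evaluation_periods=1):
    """Build keyword arguments for a high-CPU alarm on one EC2 instance."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period_seconds,             # 15 minutes, in seconds
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # An SNS topic ARN for notifications; an Auto Scaling policy ARN
        # can be listed here instead to trigger scaling.
        "AlarmActions": [action_arn],
    }


def create_alarm(params):
    """Create the alarm in CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # third-party dependency, assumed to be installed
    boto3.client("cloudwatch").put_metric_alarm(**params)


params = cpu_alarm_params(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:example-topic",
)
print(params["AlarmName"], params["Threshold"], params["Period"])
```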

How can we create and edit status check alarms?

You can create instance status and system status alarms to notify you when an instance has a failed status check.

  • To create a status check alarm:
    1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
    2. In the Navigation pane, click Instances.
    3. Select an instance, and then on the Status Checks tab, click Create Status Check Alarm.
    4. In the Create Alarm dialog box, select the Send a notification to check box, and then choose an existing Amazon Simple Notification Service (SNS) topic or create a new SNS topic to use for this alarm.
    5. In the With these recipients box, type your email address (e.g., john.stiles@example.com) and the addresses of any additional recipients, separated by commas.
    6. In the Whenever drop-down list, select the status check you want to be notified about (e.g., Status Check Failed (Any), Status Check Failed (Instance), or Status Check Failed (System)).
    7. In the For at least box, set the number of periods you want to evaluate (for example, 2), and in the consecutive periods drop-down menu, select the evaluation period duration (for example, 5 minutes) before triggering the alarm and sending an email.
    8. To change the default name for the alarm, in the Name of alarm box, type a friendly name for the alarm (for example, StatusCheckFailed), and then click Create Alarm. If you added an email address to the list of recipients or created a new topic, Amazon SNS will send a subscription confirmation email message to each new address shortly after you create an alarm. Remember to click the link contained in that message, which confirms your subscription. Alert notifications are only sent to confirmed addresses.
  • To edit a status check alarm:
    1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
    2. In the Navigation pane, click Instances.
    3. Select an instance, click Actions, and then click Add/Edit Alarms.
    4. In the Alarm Details dialog box, click the name of the alarm.
    5. In the Edit Alarm dialog box, make the desired changes, and then click Save.
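The "for at least N consecutive periods" setting in the steps above can be pictured with a small simulation. This is a simplified sketch, not CloudWatch's actual evaluation logic, which has more options (for example, how missing data is treated). It assumes one datapoint per period, where 1 means the status check failed in that period and 0 means it passed. The function name is hypothetical; the state names match the alarm states CloudWatch reports.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Return the alarm state given per-period datapoints, oldest first."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    # The alarm fires only when every one of the most recent periods breaches.
    return "ALARM" if all(value >= threshold for value in recent) else "OK"


print(alarm_state([0, 0, 1, 1], threshold=1, evaluation_periods=2))  # ALARM
print(alarm_state([0, 1, 0, 1], threshold=1, evaluation_periods=2))  # OK
print(alarm_state([1], threshold=1, evaluation_periods=2))  # INSUFFICIENT_DATA
```

With 2 periods of 5 minutes, a single failed check does not trigger the alarm; the failure has to persist across both periods.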

How frequently does AWS perform status checks?

Status checks are performed every minute and each returns a pass or a fail status. If all checks pass, the overall status of the instance is OK. If one or more checks fail, the overall status is impaired.

Can we disable status checks?

No. Status checks are built into Amazon EC2, so they cannot be disabled or deleted. You can, however, create or delete alarms that are triggered based on the result of the status checks.

What are the two types of status checks?

  1. system status checks: monitor the AWS systems required to use your instance to ensure they are working properly. These checks detect problems with your instance that require AWS involvement to repair.
  2. instance status checks: monitor the software and network configuration of your individual instance. These checks detect problems that require your involvement to repair. When an instance status check fails, typically you will need to address the problem yourself.
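The rule for combining the two checks, and who is expected to fix a failure, can be written down in a few lines. This is only a restatement of the descriptions above as code; the function names are hypothetical.

```python
def overall_status(system_check_passed, instance_check_passed):
    """Overall status is OK only if every check passes, otherwise impaired."""
    return "ok" if system_check_passed and instance_check_passed else "impaired"


def who_repairs(system_check_passed, instance_check_passed):
    """List who needs to get involved for the checks that failed."""
    owners = []
    if not system_check_passed:
        owners.append("AWS")  # system status problems need AWS involvement
    if not instance_check_passed:
        owners.append("us")   # instance status problems are ours to fix
    return owners


print(overall_status(True, True))   # ok
print(overall_status(True, False))  # impaired
print(who_repairs(True, False))     # ['us']
```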

What can you do when there is a 'system status check' failure?

When a system status check fails, you can choose to wait for AWS to fix the issue or you can resolve it yourself (for example, by stopping and restarting or terminating and replacing an instance). Examples of problems that cause system status checks to fail include:

  • Loss of network connectivity
  • Loss of system power
  • Software issues on the physical host
  • Hardware issues on the physical host

What can you do when there is an 'instance status check' failure?

When an instance status check fails, typically you will need to address the problem yourself (for example by rebooting the instance or by making modifications in your operating system). Examples of problems that may cause instance status checks to fail include:

  • Failed system status checks
  • Misconfigured networking or startup configuration
  • Exhausted memory
  • Corrupted file system
  • Incompatible kernel

Status checks that occur during instance reboot or while a Windows instance store-backed instance is being bundled will report an instance status check failure until the instance becomes available again.

How can we view status checks?

  1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
  2. In the navigation pane, click Instances.
  3. On the Instances page, the Status Checks column lists the operational status of each instance.
  4. To view an individual instance’s status, select the instance, and then click the Status Checks tab.
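The same information is available outside the console. The sketch below summarizes a response shaped like the one returned by boto3's EC2 describe_instance_status call. It assumes boto3 is installed and configured, which the console steps above do not require; the sample response, instance IDs, and helper names are hypothetical. The fetch_statuses function is defined but not called here, so the example runs without AWS access.

```python
def summarize_statuses(response):
    """Return one (instance id, system status, instance status) tuple per instance."""
    return [
        (item["InstanceId"],
         item["SystemStatus"]["Status"],
         item["InstanceStatus"]["Status"])
        for item in response.get("InstanceStatuses", [])
    ]


def impaired_instances(response):
    """Return the ids of instances where either status check is not 'ok'."""
    return [
        instance_id
        for instance_id, system, instance in summarize_statuses(response)
        if system != "ok" or instance != "ok"
    ]


def fetch_statuses():
    """Fetch live status data (needs boto3 and AWS credentials)."""
    import boto3  # third-party dependency, assumed to be installed
    return boto3.client("ec2").describe_instance_status(IncludeAllInstances=True)


# Hypothetical response with one healthy and one impaired instance.
SAMPLE_RESPONSE = {
    "InstanceStatuses": [
        {"InstanceId": "i-0123456789abcdef0",
         "SystemStatus": {"Status": "ok"},
         "InstanceStatus": {"Status": "ok"}},
        {"InstanceId": "i-0fedcba9876543210",
         "SystemStatus": {"Status": "ok"},
         "InstanceStatus": {"Status": "impaired"}},
    ]
}

for row in summarize_statuses(SAMPLE_RESPONSE):
    print(row)
print(impaired_instances(SAMPLE_RESPONSE))  # ['i-0fedcba9876543210']
```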

How can we report status feedback?

You can provide feedback about your instances if you are having problems with an instance. AWS uses reported feedback to identify issues impacting multiple customers, but does not respond to individual account issues reported via this form.

  1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
  2. In the navigation pane, click Instances.
  3. On the Instances page, click on the instance on which you want to report status.
  4. Click the Status Checks tab and then click Submit feedback.
  5. Complete the information on the Report Instance Status page.

Does AWS respond to your individual feedback when you submit feedback for a status check?

No. AWS uses reported feedback to identify issues impacting multiple customers, but does not respond to individual account issues reported via this form.

What are the differences between instance status and event?

Instance status comes from the automated status checks described above, which report whether an instance is currently OK or impaired. Events, in contrast, are specific activities that AWS may schedule for your instances, such as a reboot or retirement. These scheduled events are not frequent. If one of your instances will be affected by a scheduled event, you'll receive an email prior to the scheduled event with details about the event, as well as a start and end date.

What are the different types of scheduled events?

  • Reboot: A reboot can be either an instance reboot or a system reboot.
  • System maintenance: An instance may be temporarily affected by network maintenance or power maintenance.
  • Instance retirement: An instance that's scheduled for retirement will be stopped or terminated.
  • Instance stop: An instance may need to be stopped in order to migrate it to new hardware.

How does AWS notify you about scheduled events?

If one of your instances will be affected by a scheduled event, you'll receive an email prior to the scheduled event with details about the event, as well as a start and end date.

How can we view scheduled events for your instances?

  1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
  2. In the navigation pane, click Events. You can see a list of all resources with events associated with them. You can filter by instance or volume, or by specific status types.
  3. Alternatively, you can do the following to view upcoming scheduled events:
    1. In the navigation pane, click the EC2 Dashboard.
    2. Under Scheduled Events, you can see the events associated with your Amazon EC2 instances and volumes.
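Scheduled events can also be read from the status data that the EC2 API returns for each instance. The sketch below pulls the events out of a response shaped like the output of boto3's describe_instance_status call and sorts them by start time. The sample response, instance IDs, dates, and the helper name are hypothetical; obtaining a real response would require boto3 and AWS credentials, which this example does not use.

```python
from datetime import datetime, timezone


def scheduled_events(response):
    """Return (start, instance id, event code, description) tuples, earliest first."""
    events = []
    for item in response.get("InstanceStatuses", []):
        for event in item.get("Events", []):
            events.append((
                event["NotBefore"],
                item["InstanceId"],
                event["Code"],
                event.get("Description", ""),
            ))
    return sorted(events)


# Hypothetical response: two instances with events, one without.
SAMPLE_RESPONSE = {
    "InstanceStatuses": [
        {"InstanceId": "i-0aaaaaaaaaaaaaaa1",
         "Events": [{"Code": "system-reboot",
                     "Description": "Scheduled reboot",
                     "NotBefore": datetime(2017, 3, 10, 2, 0, tzinfo=timezone.utc),
                     "NotAfter": datetime(2017, 3, 10, 4, 0, tzinfo=timezone.utc)}]},
        {"InstanceId": "i-0bbbbbbbbbbbbbbb2",
         "Events": [{"Code": "instance-retirement",
                     "Description": "Scheduled retirement",
                     "NotBefore": datetime(2017, 3, 5, 0, 0, tzinfo=timezone.utc)}]},
        {"InstanceId": "i-0ccccccccccccccc3"},
    ]
}

for start, instance_id, code, description in scheduled_events(SAMPLE_RESPONSE):
    print(start.isoformat(), instance_id, code, description)
```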

How can we work with instances that have a scheduled event?

If one of your instances is scheduled for any of the above events, you may be able to take actions to control the timing of the event, or to minimize downtime.

For all scheduled events, your course of action will differ depending on whether your instance’s root device volume is an Amazon EBS volume or an instance store volume. You can determine the root device type for an instance by checking the value of the Root device type field in the details pane on the Instances page.

See http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-instances-status-check_sched.html

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License