Amazon CloudWatch

aws

http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html
http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/graph_metrics.html
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-volume-status.html#ebs-metrics
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet-cloudwatch-metrics.html
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring_get_statistics.html
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-cloudwatch.html
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/mon-scripts.html
http://blog.krishnachaitanya.ch/2016/03/monitor-ec2-memory-usage-using-aws.html - printed, Monitoring memory usage
https://forums.aws.amazon.com/message.jspa?messageID=265299 - Monitoring memory usage
http://arr.gr/blog/2013/08/monitoring-ec2-instance-memory-usage-with-cloudwatch/ - printed, Monitoring memory usage
https://signalfx.com/blog/12-top-things-to-monitor-in-amazon-ec2/
https://www.logicmonitor.com/blog/guide-to-monitoring-memory-for-aws-ec2-linux-instances/
https://www.otreva.com/blog/monitoring-memory-ram-aws-elasticbeanstalk-ec2-instances-cloudwatch/
http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/graph_a_metric.html
http://docs.aws.amazon.com/autoscaling/latest/userguide/as-instance-monitoring.html
http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/switch_graph_axes.html

What is Amazon CloudWatch?

Amazon CloudWatch is a web service that provides monitoring for AWS cloud resources and applications, starting with Amazon EC2. It provides you with visibility into resource utilization, operational performance, and overall demand patterns—including metrics such as CPU utilization, disk reads and writes, and network traffic. You can get statistics, view graphs, and set alarms for your metric data.

Amazon CloudWatch monitors your Amazon Web Services (AWS) resources and the applications you run on AWS in real time. You can use CloudWatch to collect and track metrics, which are variables you can measure for your resources and applications. CloudWatch alarms send notifications or automatically make changes to the resources you are monitoring based on rules that you define. For example, you can monitor the CPU usage and disk reads and writes of your Amazon EC2 instances and then use this data to determine whether you should launch additional instances to handle increased load. You can also use this data to stop under-used instances. In addition to monitoring the built-in metrics that come with AWS, you can monitor your own custom metrics. With CloudWatch, you gain system-wide visibility into resource utilization, application performance, and operational health.

What questions should we ask ourselves to come up with a monitoring plan?

  • What are our goals for monitoring?
  • What resources do we need to monitor?
  • How often do we need to monitor these resources?
  • What monitoring tools will we use?
  • Who will perform the monitoring tasks?
  • Who should be notified when something goes wrong?

Why should we establish a baseline?

After we have defined our monitoring goals and created our monitoring plan, the next step is to establish a baseline for normal Amazon EC2 performance. We should measure EC2 performance at various times and under different load conditions. As we monitor EC2, we should store a history of the monitoring data that we have collected. We can compare current EC2 performance to this historical data to help us identify normal performance patterns and anomalies, and devise methods to address them. For example, we can monitor CPU utilization, disk I/O, and network utilization. When performance falls outside the established baseline, we might need to reconfigure or optimize the instances to reduce CPU utilization, improve disk I/O, or reduce network traffic.

What should we monitor?

  1. CPU utilization
  2. Memory utilization
  3. Memory used
  4. Memory available
  5. Network utilization
    1. NetworkIn
    2. NetworkOut
    3. Number of sockets open
    4. Number of sockets in time-wait state
  6. Disk performance
    1. DiskReadOps
    2. DiskWriteOps
  7. Disk swap utilization
  8. Page file utilization (Windows-only)
  9. Page file available (Windows-only)
  10. Disk space utilization
  11. Disk space used
  12. Disk space available

What are the automated tools that are made available by CloudWatch?

  1. System Status Checks: Monitor the AWS systems required to use your instance. These checks detect problems with your instance that require AWS involvement to repair. When a system status check fails, we can choose to wait for AWS to fix the issue or we can resolve it ourselves, for example, by stopping and restarting or by terminating and replacing the instance. Examples of problems that cause system status checks to fail:
    1. Loss of network connectivity
    2. Loss of system power
    3. Software issues on the physical host
    4. Hardware issues on the physical host that impact network reachability
  2. Instance Status Checks: Monitor the software and network configuration of your individual instance. These checks detect problems that require our involvement to repair. When an instance status check fails, typically we need to address the problem ourselves, for example by rebooting the instance or by making modifications to the operating system ourselves. Examples of problems that may cause instance status checks to fail:
    1. Failed system status checks
    2. Misconfigured networking or startup configuration
    3. Exhausted memory
    4. Corrupted file system
    5. Incompatible kernel
  3. Amazon CloudWatch Alarms
  4. Amazon CloudWatch Events
  5. Amazon CloudWatch Logs
  6. AWS Management Pack for Microsoft System Center Operations Manager

What is the most important difference between 'System Status Checks' and 'Instance Status Checks'?

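The key difference is who must act to repair the problem. A failed system status check points to a problem with the underlying AWS systems that requires AWS involvement to repair: we can wait for AWS to fix it, or we can work around it ourselves by stopping and starting the instance, or by terminating and replacing it. A failed instance status check points to a problem with the instance's own software or network configuration, which we have to fix ourselves, for example by rebooting the instance or by changing the operating system configuration.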

What is the purpose of Amazon CloudWatch Alarms?

Watch a single metric over a time period we specify, and perform one or more actions based on the value of the metric relative to a given threshold over a number of time periods. The action is a notification sent to an Amazon Simple Notification Service (Amazon SNS) topic or an Auto Scaling policy. Alarms invoke actions for sustained state changes only. CloudWatch alarms will not invoke actions simply because they are in a particular state. The state must have changed and been maintained for a specified number of periods.

Will a CloudWatch alarm invoke actions simply because it is in a particular state?

No. Alarms invoke actions for sustained state changes only. CloudWatch alarms will not invoke actions simply because they are in a particular state. The state must have changed and been maintained for a specified number of periods.

What is the purpose of Amazon CloudWatch Events?

Automate our AWS services and respond automatically to system events. Events from AWS services are delivered to CloudWatch Events in near real time, and we can specify automated actions to take when an event matches a rule.

CloudWatch Events helps you to respond to state changes in your AWS resources. When your resources change state they automatically send events into an event stream. You can create rules that match selected events in the stream and route them to targets to take action. You can also use rules to take action on a pre-determined schedule. For example, you can configure rules to:

  1. Automatically invoke an AWS Lambda function to update DNS entries when an event notifies you that an Amazon EC2 instance has entered the Running state
  2. Direct specific API records from CloudTrail to a Kinesis stream for detailed analysis of potential security or availability risks
  3. Take a snapshot of an Amazon EBS volume on a schedule
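
The first rule in the list above could be sketched with the AWS CLI roughly as follows. This is only an illustration: the helper name create_running_rule, the rule name, and the Lambda function ARN are hypothetical placeholders. The calls used are aws events put-rule and aws events put-targets.

```shell
# Sketch: create a CloudWatch Events rule that matches EC2 instances entering
# the "running" state and routes matching events to a Lambda function.
# The rule name and function ARN are placeholders, not values from these notes.
create_running_rule() {
  rule_name="$1"     # e.g. ec2-running-update-dns (hypothetical)
  lambda_arn="$2"    # ARN of an existing Lambda function (hypothetical)

  # Event pattern for EC2 state-change notifications with state "running".
  pattern='{"source":["aws.ec2"],"detail-type":["EC2 Instance State-change Notification"],"detail":{"state":["running"]}}'

  aws events put-rule --name "$rule_name" --event-pattern "$pattern" &&
  aws events put-targets --rule "$rule_name" --targets "Id=1,Arn=$lambda_arn"
}

# Example call (placeholder values):
#   create_running_rule ec2-running-update-dns \
#       arn:aws:lambda:us-east-1:123456789012:function:update-dns
```

The Lambda function also needs a permission that allows CloudWatch Events to invoke it (aws lambda add-permission), which this sketch does not set up.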

AWS can schedule events for our instances, such as a reboot, stop/start, or retirement. These events do not occur frequently. If one of our instances will be affected by a scheduled event, AWS sends an email to the address associated with our AWS account prior to the scheduled event, with details about the event, including the start and end date. Depending on the event, we might be able to take action to control the timing of the event.

What is the purpose of Amazon CloudWatch Logs?

Monitor, store, and access our log files from Amazon EC2 instances, AWS CloudTrail, or other sources.

What is the purpose of AWS Management Pack for Microsoft System Center Operations Manager?

Links Amazon EC2 instances and the Windows or Linux operating systems running inside them. This is an extension to Microsoft System Center Operations Manager. It uses a designated computer in our datacenter (called a watcher node) and the Amazon Web Services APIs to remotely discover and collect information about our AWS resources.

Can we disable system status checks?

No. Status checks are performed every minute, and each returns a pass or fail status. If one or more checks fail, the overall status is impaired. Status checks are built into EC2, so they cannot be disabled or deleted. We can, however, create or delete alarms that are triggered based on the result of the status check.

How can we view the result of status checks?

To view status checks using the console:

  1. Log into AWS console
  2. In the navigation pane, click on Instances
  3. On the Instances page, the Status Checks column lists the operational status of each instance
  4. To view the status of a specific instance, select the instance, and then choose the Status Checks tab.

How can we view the result of status checks using the Command Line or API?

To view the status of all instances:

aws ec2 describe-instance-status

To get the status of all instances with instance status of impaired:

aws ec2 describe-instance-status --filters Name=instance-status.status,Values=impaired

To get the status of a single instance:

aws ec2 describe-instance-status --instance-ids i-1234567890abcdef0

How can we open a support ticket with AWS when we have a system status check failure?

There should be a button in the AWS console somewhere. I NEED TO COMPLETE THIS.

Does AWS respond to our individual feedback?

No. We can provide feedback if we are having problems with an instance whose status is not shown as impaired, or if we want to send AWS additional details about the problems we are experiencing with an impaired instance. AWS uses the reported feedback to identify issues impacting multiple customers, but does not respond to individual issues. Providing feedback does not change the status check result.

How can we report status feedback using the console?

  1. Log into AWS console
  2. In the navigation pane, choose Instances
  3. Select the instance
  4. Select the Status Checks tab, and then choose Submit Feedback.
  5. Complete the Report Instance Status form, and click on Submit.

How can we send memory usage statistics to CloudWatch?

Because AWS does not have access to an EC2 instance at the operating system level, only CPU, network utilization, I/O, and other metrics that can be observed through the hypervisor layer are available by default in the AWS console. For this reason, AWS does not monitor memory utilization by default. However, AWS does provide a set of scripts that we can use to send memory utilization information to CloudWatch. The process of sending these custom metrics is different for Linux and Windows instances, and even the process of installing prerequisites differs slightly between Linux distributions. The overall steps are:

  1. Create an IAM user. This user does not need a password or console access.
  2. Copy and keep the access key for the above IAM user handy as we would need this every time we configure custom metrics to be sent to CloudWatch
  3. Create and Attach an Inline Policy to the user with below actions. See http://blog.krishnachaitanya.ch/2016/03/monitor-ec2-memory-usage-using-aws.html for details
  4. Installing pre-requisites

To create an IAM user:

  1. Log into AWS Console
  2. Find the IAM service and click on it
  3. Click on the "Users" tab
  4. Click on the "Add" button
  5. Provide the username
  6. Check the check box next to "Programmatic access"
  7. Click on the "Next (Permission)" button
  8. Click on the "Next (Review)" button
  9. Click on the "Create user" button. On this screen, make sure that we copy the access key and the secret to a safe place.

To create a policy:

  1. Find the user that we just created
  2. Click on the user that we just created
  3. In the Permission tab, click on the "Add Permission" button.
  4. Click on the "Attach existing policy directly" button. This will open a new tab.
  5. Click on the "Create Policy" button
  6. Click on the "Select" button under the "Create Your Own Policy" section.
  7. Specify a name and description for this policy
  8. Copy and paste the policy from http://blog.krishnachaitanya.ch/2016/03/monitor-ec2-memory-usage-using-aws.html into the "Policy Document" field.
  9. Click on the "Create Policy" button at the bottom. This creates the policy. We can now close this tab and go back to the previous tab.
  10. Click on the "Refresh" button that is next to the "Create Policy" button.
  11. In the search box, search for the policy that we just created.
  12. Check the check box next to the policy that we just created.
  13. Click on the "Next (Review)" button
  14. Click on the "Add Permission" button
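
The console steps for creating the user and attaching a policy could also be sketched with the AWS CLI. This is an illustration only: the helper name create_monitoring_user and the user and policy names are hypothetical. The policy below lists the actions commonly documented for the CloudWatch monitoring scripts and should be compared with the policy in the blog post referenced above before use. It is attached as an inline policy with aws iam put-user-policy, rather than as a separately created policy as in the console steps.

```shell
# Sketch: create a programmatic-only IAM user for the monitoring scripts and
# attach an inline policy. User and policy names are placeholders.
create_monitoring_user() {
  user_name="$1"                                # e.g. cloudwatch-mon (hypothetical)
  policy_name="${2:-cloudwatch-mon-scripts}"    # hypothetical default name

  # Assumed action list; verify against the referenced policy before using.
  policy='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":["cloudwatch:PutMetricData","cloudwatch:GetMetricStatistics","cloudwatch:ListMetrics","ec2:DescribeTags"],"Resource":"*"}]}'

  # create-access-key prints the access key ID and secret only once,
  # so the output has to be stored somewhere safe.
  aws iam create-user --user-name "$user_name" &&
  aws iam put-user-policy --user-name "$user_name" \
      --policy-name "$policy_name" --policy-document "$policy" &&
  aws iam create-access-key --user-name "$user_name"
}

# Example call (placeholder value):
#   create_monitoring_user cloudwatch-mon
```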

To install pre-requisites:

sudo yum install perl-Switch perl-DateTime perl-Sys-Syslog perl-LWP-Protocol-https
curl http://aws-cloudwatch.s3.amazonaws.com/downloads/CloudWatchMonitoringScripts-1.2.1.zip -O
unzip CloudWatchMonitoringScripts-1.2.1.zip
rm CloudWatchMonitoringScripts-1.2.1.zip
mv aws-scripts-mon /opt/
cd /opt/aws-scripts-mon/
cp awscreds.template awscreds.conf 
vi awscreds.conf

Run the below command:

/opt/aws-scripts-mon/mon-put-instance-data.pl --mem-util --verify --verbose

to check that everything is OK. If everything is properly configured, we can proceed to configure cron to send memory metrics to CloudWatch every 5 minutes. Type crontab -e at the shell prompt and append the line below to the end of the file:

*/5 * * * * /opt/aws-scripts-mon/mon-put-instance-data.pl --mem-util --mem-used --mem-avail --swap-util --swap-used --from-cron

Save and exit. After 10 to 15 minutes, we should see the memory utilization of this instance in the CloudWatch console.
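
Besides the console, the reported values can be read back with the AWS CLI. The sketch below assumes that the scripts publish to the System/Linux namespace under the metric name MemoryUtilization with an InstanceId dimension, and that GNU date is available for the -d option. The helper name get_memory_utilization is hypothetical.

```shell
# Sketch: fetch the last hour of memory utilization for one instance,
# averaged over 5-minute periods. Assumes GNU date (for -d) and the
# System/Linux namespace used by the monitoring scripts.
get_memory_utilization() {
  instance_id="$1"
  start=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)
  end=$(date -u +%Y-%m-%dT%H:%M:%SZ)

  aws cloudwatch get-metric-statistics \
      --namespace System/Linux \
      --metric-name MemoryUtilization \
      --dimensions "Name=InstanceId,Value=$instance_id" \
      --start-time "$start" --end-time "$end" \
      --period 300 --statistics Average
}

# Example call (placeholder instance ID):
#   get_memory_utilization i-1234567890abcdef0
```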

How can we create a CloudWatch alarm?

An alarm can be associated with an Auto Scaling policy as its action, for example a policy that adds to the number of running instances. When the alarm is triggered, Auto Scaling is notified and makes the appropriate changes to your resources. The procedure below creates just one alarm; however, you can apply the same procedure to create other alarms. For example, you could create another alarm to notify Auto Scaling that it needs to terminate an instance.

You can create an Amazon CloudWatch alarm that monitors any one of your Amazon EC2 instance’s CloudWatch metrics. CloudWatch will automatically send you a notification when the metric reaches a threshold you specify.

To create a CloudWatch alarm for high CPU:

  1. On the Amazon EC2 console, select the instance for which you want to create an alarm.
  2. On the instance’s Monitoring tab in the lower pane, click Create Alarm.
  3. In the Create Alarm dialog box, set the criteria for your alarm. In this example, we’ll set an alarm if the instance’s average CPU utilization is above 70 percent.
  4. The check box next to Send a notification is selected by default. Select an existing topic, or click Create topic and enter a name. (Notifications use Amazon Simple Notification Service (Amazon SNS)).
  5. In the With these recipients box, enter the email addresses of the recipients you want to notify. You can enter up to 10 email addresses, each separated by a comma.
  6. Configure the threshold for your alarm:
    1. In the Whenever boxes, select Average and CPU Utilization.
    2. In the Is boxes, define the threshold for the alarm by selecting > and entering 70.
    3. In the For at least boxes, specify the sampling period and number of samples evaluated by the alarm. You can leave the defaults or define your own. For our example, we’ll monitor for 1 period of 15 minutes. A shorter period creates a more sensitive alarm. A longer period can mitigate brief spikes in a metric.
    4. In Name of alarm, a name is automatically generated for you. You can type in the field to change the name. You cannot modify the name after you create the alarm.
  7. Click Create Alarm.

After you create the alarm, you can use the Monitoring tab in the Amazon EC2 console to view a summary of alarms that have been set for that instance. From there, you can also edit the alarm. If you created a new Amazon SNS topic for this alarm or added new email addresses to an existing topic, each email address added will receive a subscription confirmation email from Amazon SNS. The person who receives the email must confirm it by clicking the included link in order to receive notifications.
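
The same high-CPU alarm could be sketched with the AWS CLI using aws cloudwatch put-metric-alarm. The helper name create_cpu_alarm, the alarm name, and the SNS topic ARN are hypothetical. The threshold (70 percent) and the single 15-minute period (900 seconds) mirror the console example above.

```shell
# Sketch: alarm when average CPU utilization of one instance is above 70%
# for 1 period of 15 minutes, notifying an existing SNS topic.
create_cpu_alarm() {
  instance_id="$1"   # e.g. i-1234567890abcdef0
  topic_arn="$2"     # ARN of an existing SNS topic (placeholder)

  aws cloudwatch put-metric-alarm \
      --alarm-name "high-cpu-$instance_id" \
      --namespace AWS/EC2 --metric-name CPUUtilization \
      --dimensions "Name=InstanceId,Value=$instance_id" \
      --statistic Average --period 900 --evaluation-periods 1 \
      --comparison-operator GreaterThanThreshold --threshold 70 \
      --alarm-actions "$topic_arn"
}

# Example call (placeholder values):
#   create_cpu_alarm i-1234567890abcdef0 arn:aws:sns:us-east-1:123456789012:my-topic
```

Raising --evaluation-periods or --period makes the alarm less sensitive to brief spikes, as described in the threshold step above.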

We can create status check alarms to monitor instance status or system status. We can configure the alarm to send us a notification by email, or to stop, terminate, or recover an instance when it fails an instance status check or a system status check. To create a status check alarm:

  1. Log into AWS Console
  2. In the navigation pane, choose Instances
  3. Select the instance
  4. Select the Status Checks tab, and click on the Create Status Check Alarm button
  5. Select Send a notification to. Choose an existing SNS topic, or click create topic to create a new one. If creating a new topic, in With these recipients, enter your email address and the addresses of any additional recipients, separated by commas.
  6. (Optional). Choose Take the action, and then select the action that we would like to take
  7. In Whenever, select the status check that we want to be notified about. Note if we select Recover this instance in the previous step, select Status Check Failed (System).
  8. In For at least, set the number of periods we want to evaluate and in consecutive periods, select the evaluation period duration before triggering the alarm and sending an email.
  9. (Optional). In Name of alarm, replace the default name with another name for the alarm.
  10. Click on the Create Alarm button. Note if we added an email address to the list of recipients or created a new topic, Amazon SNS sends a subscription confirmation email message to each new address. Each recipient must confirm the subscription by clicking the link contained in that message. Alert notifications are sent only to confirmed addresses.
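
A status check alarm with the recover action could be sketched with the AWS CLI as follows. The helper name create_recover_alarm and the alarm name are hypothetical. The sketch uses the StatusCheckFailed_System metric, since the recover action applies to system status check failures, and the evaluation settings (2 periods of 60 seconds) are only an example.

```shell
# Sketch: recover an instance when its system status check fails for
# 2 consecutive 1-minute periods.
create_recover_alarm() {
  instance_id="$1"   # e.g. i-1234567890abcdef0
  region="$2"        # region of the instance, e.g. us-east-1

  aws cloudwatch put-metric-alarm \
      --alarm-name "recover-$instance_id" \
      --namespace AWS/EC2 --metric-name StatusCheckFailed_System \
      --dimensions "Name=InstanceId,Value=$instance_id" \
      --statistic Maximum --period 60 --evaluation-periods 2 \
      --comparison-operator GreaterThanOrEqualToThreshold --threshold 1 \
      --alarm-actions "arn:aws:automate:$region:ec2:recover"
}

# Example call (placeholder values):
#   create_recover_alarm i-1234567890abcdef0 us-east-1
```

To be notified as well, an SNS topic ARN can be added as a second value to --alarm-actions.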

How can we edit a status check alarm?

  1. Log into AWS console
  2. In the navigation pane, choose Instances
  3. Select the instance, choose Actions, select CloudWatch Monitoring, and then choose Add/Edit Alarms
  4. In the Alarm Details dialog box, choose the name of the alarm
  5. In the Edit Alarm dialog box, make the desired changes, and then choose Save.

How can we delete a CloudWatch alarm?

  1. Open the Amazon CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
  2. If necessary, switch to the region that contains the alarm (for example, US East (N. Virginia)), and then choose Alarms in the navigation pane.
  3. Select the check box next to the alarm that you want to delete, and then click Delete.
  4. When a confirmation message appears, click Yes, Delete.
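
From the command line, alarms can be listed and deleted with aws cloudwatch describe-alarms and aws cloudwatch delete-alarms. The helper names below are hypothetical wrappers added for illustration.

```shell
# Sketch: list the names of all alarms in the current region.
list_alarm_names() {
  aws cloudwatch describe-alarms \
      --query 'MetricAlarms[].AlarmName' --output text
}

# Sketch: delete one or more alarms by name.
delete_alarms() {
  aws cloudwatch delete-alarms --alarm-names "$@"
}

# Example calls (placeholder alarm name):
#   list_alarm_names
#   delete_alarms high-cpu-i-1234567890abcdef0
```

Unlike the console, the CLI does not ask for confirmation before deleting.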

How can we create and edit status check alarms?

You can create instance status and system status alarms to notify you when an instance has a failed status check.

  • To create a status check alarm:
    1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
    2. In the Navigation pane, click Instances.
    3. Select an instance, and then on the Status Checks tab, click Create Status Check Alarm.
    4. In the Create Alarm dialog box, select the Send a notification to check box, and then choose an existing Amazon Simple Notification Service (SNS) topic or create a new SNS topic to use for this alarm.
    5. In the With these recipients box, type your email address (e.g., john.stiles@example.com) and the addresses of any additional recipients, separated by commas.
    6. In the Whenever drop-down list, select the status check you want to be notified about (e.g., Status Check Failed (Any), Status Check Failed (Instance), or Status Check Failed (System)).
    7. In the For at least box, set the number of periods you want to evaluate (for example, 2) and in the consecutive periods drop-down menu, select the evaluation period duration (for example, 5 minutes) before triggering the alarm and sending an email
    8. To change the default name for the alarm, in the Name of alarm box, type a friendly name for the alarm (for example, StatusCheckFailed), and then click Create Alarm. If you added an email address to the list of recipients or created a new topic, Amazon SNS will send a subscription confirmation email message to each new address shortly after you create an alarm. Remember to click the link contained in that message, which confirms your subscription. Alert notifications are only sent to confirmed addresses.
  • To edit a status check alarm:
    1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
    2. In the Navigation pane, click Instances.
    3. Select an instance, click Actions, and then click Add/Edit Alarms.
    4. In the Alarm Details dialog box, click the name of the alarm.
    5. In the Edit Alarm dialog box, make the desired changes, and then click Save.

How frequently does AWS perform status checks?

Status checks are performed every minute and each returns a pass or a fail status. If all checks pass, the overall status of the instance is OK. If one or more checks fail, the overall status is impaired.

Can we disable status checks?

No. Status checks are built into Amazon EC2, so they cannot be disabled or deleted. You can, however, create or delete alarms that are triggered based on the result of the status checks.

What are the two types of status checks?

  1. system status checks: monitor the AWS systems required to use your instance to ensure they are working properly. These checks detect problems with your instance that require AWS involvement to repair.
  2. instance status checks: monitor the software and network configuration of your individual instance. These checks detect problems that require your involvement to repair. When an instance status check fails, typically you will need to address the problem yourself.

What can you do when there is a 'system status checks' failure?

When a system status check fails, you can choose to wait for AWS to fix the issue or you can resolve it yourself (for example, by stopping and restarting or terminating and replacing an instance). Examples of problems that cause system status checks to fail include:

  • Loss of network connectivity
  • Loss of system power
  • Software issues on the physical host
  • Hardware issues on the physical host

What can you do when there is an 'instance status check' failure?

When an instance status check fails, typically you will need to address the problem yourself (for example by rebooting the instance or by making modifications in your operating system). Examples of problems that may cause instance status checks to fail include:

  • Failed system status checks
  • Misconfigured networking or startup configuration
  • Exhausted memory
  • Corrupted file system
  • Incompatible kernel

Status checks that occur during instance reboot or while a Windows instance store-backed instance is being bundled will report an instance status check failure until the instance becomes available again.

How can we view status checks?

  1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
  2. In the navigation pane, click Instances.
  3. On the Instances page, the Status Checks column lists the operational status of each instance.
  4. To view an individual instance’s status, select the instance, and then click the Status Checks tab.

How can we report status feedback?

You can provide feedback about your instances if you are having problems with an instance. AWS uses reported feedback to identify issues impacting multiple customers, but does not respond to individual account issues reported via this form.

  1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
  2. In the navigation pane, click Instances.
  3. On the Instances page, click on the instance on which you want to report status.
  4. Click the Status Checks tab and then click Submit feedback.
  5. Complete the information on the Report Instance Status page.

Does AWS respond to your individual feedback when you submit feedback for a status check?

No. AWS uses reported feedback to identify issues impacting multiple customers, but does not respond to individual account issues reported via this form.

What are the differences between instance status and event?

Instance status reports the results of the status checks, that is, whether an instance is currently healthy. Events describe specific actions that AWS may schedule for your instances, such as a reboot or retirement. These scheduled events are not frequent. If one of your instances will be affected by a scheduled event, you'll receive an email prior to the scheduled event with details about the event, as well as a start and end date.

What are the different types of scheduled events?

  • Instance stop: An instance may need to be stopped in order to migrate it to new hardware. The instance will be stopped. When you start it again, it's migrated to a new host computer. This applies only to instances backed by Amazon EBS.
  • Instance retirement: An instance that's scheduled for retirement will be stopped or terminated.
  • Reboot: A reboot can be either an instance reboot or a system reboot.
  • System maintenance: An instance may be temporarily affected by network maintenance or power maintenance.

How does AWS notify you about scheduled events?

If one of your instances will be affected by a scheduled event, you'll receive an email prior to the scheduled event with details about the event, as well as a start and end date.

How can we view scheduled events for your instances?

  1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
  2. In the navigation pane, click Events. We can see a list of all resources with events associated with them. We can filter by resource type, by specific event types, instance or volume, or by specific status types. We can select the resource to view details.
  3. Alternatively, you can do the following to view upcoming scheduled events:
    1. In the navigation pane, click the EC2 Dashboard.
    2. Under Scheduled Events, you can see the events associated with your Amazon EC2 instances and volumes.

Note that events are also shown for affected resource. For example, in the navigation pane, choose Instances, and then select an instance. If the instance has an associated event, it is displayed in the lower pane.
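
Scheduled events can also be listed from the command line with aws ec2 describe-instance-status, which returns an Events list for each instance. The helper name list_scheduled_events is hypothetical, and the --query expression is just one possible way to trim the output.

```shell
# Sketch: list instances that have a scheduled event, with the event details.
# Extra arguments (for example --instance-ids i-...) are passed through.
list_scheduled_events() {
  aws ec2 describe-instance-status \
      --filters "Name=event.code,Values=instance-reboot,system-reboot,system-maintenance,instance-retirement,instance-stop" \
      --query 'InstanceStatuses[].{Instance:InstanceId,Events:Events}' \
      "$@"
}

# Example calls (placeholder instance ID):
#   list_scheduled_events
#   list_scheduled_events --instance-ids i-1234567890abcdef0
```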

How can we work with instances that have a scheduled event?

If one of your instances is scheduled for any of the above events, you may be able to take actions to control the timing of the event, or to minimize downtime.

For all scheduled events, your course of action will differ depending on whether your instance’s root device volume is an Amazon EBS volume or an instance store volume. You can determine the root device type for an instance by checking the value of the Root device type field in the details pane on the Instances page.

See http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-instances-status-check_sched.html

What should we be aware of when dealing with instances that are scheduled to be stopped or retired?

When AWS detects irreparable failure of the underlying host computer for your instance, it schedules the instance to stop or terminate, depending on the type of root device for the instance. If the root device is an EBS volume, the instance is scheduled to stop. If the root device is an instance store volume, the instance is scheduled to terminate.

Any data stored on instance store volumes is lost when an instance is stopped or terminated. This includes instance store volumes that are attached to an instance that has an EBS volume as the root device. Be sure to save data from your instance store volumes that you will need later before the instance is stopped or terminated.

For instances backed by EBS, we can wait for the instance to stop as scheduled. Alternatively, we can stop and start the instance ourselves, which migrates it to a new host computer.

For instances not backed by EBS, AWS recommends that we launch a replacement instance from our most recent AMI and migrate all necessary data to the replacement instance before the instance is scheduled to terminate. Then, we can terminate the original instance, or wait for it to terminate as scheduled.
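
For an EBS-backed instance, the stop and start described above could be sketched with the AWS CLI as follows. The helper name migrate_ebs_instance is hypothetical. As noted earlier, any data on instance store volumes is lost when the instance stops, so this is only appropriate after that data has been saved.

```shell
# Sketch: stop and then start an EBS-backed instance so that it comes back
# up on a new host computer. Data on instance store volumes is lost.
migrate_ebs_instance() {
  instance_id="$1"   # e.g. i-1234567890abcdef0

  aws ec2 stop-instances --instance-ids "$instance_id" &&
  aws ec2 wait instance-stopped --instance-ids "$instance_id" &&
  aws ec2 start-instances --instance-ids "$instance_id" &&
  aws ec2 wait instance-running --instance-ids "$instance_id"
}

# Example call (placeholder instance ID):
#   migrate_ebs_instance i-1234567890abcdef0
```

A reboot is not a substitute here, because a rebooted instance stays on the same host.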

Can we do a system reboot?

No. When AWS needs to perform tasks such as installing updates or maintaining the underlying host computer, it can schedule an instance or the underlying host computer for the instance for a reboot. You can determine whether the reboot event is an instance reboot or a system reboot:

  1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
  2. In the navigation pane, choose Events.
  3. Select Instance resources from the filter list, and then select your instance.
  4. In the bottom pane, locate Event type. The value is either system-reboot or instance-reboot.

You can wait for the instance reboot to occur within its scheduled maintenance window. Alternatively, you can reboot your instance yourself at a time that is convenient for you. After you reboot your instance, the scheduled event for the instance reboot is canceled immediately and the event's description is updated. The pending maintenance to the underlying host computer is completed, and you can begin using your instance again after it has fully booted.

It is not possible for you to reboot the system yourself. We recommend that you wait for the system reboot to occur during its scheduled maintenance window. A system reboot typically completes in a matter of minutes, the instance retains its IP address and DNS name, and any data on local instance store volumes is preserved. After the system reboot has occurred, the scheduled event for the instance is cleared, and you can verify that the software on your instance is operating as you expect.

Alternatively, if it is necessary to maintain the instance at a different time, you can stop and start an EBS-backed instance, which migrates it to a new host. However, the data on the local instance store volumes would not be preserved. In the case of an instance store-backed instance, you can launch a replacement instance from your most recent AMI.

How should we handle instances that are scheduled for maintenance?

When AWS needs to maintain the underlying host computer for an instance, it schedules the instance for maintenance. There are two types of maintenance events: network maintenance and power maintenance. During network maintenance, scheduled instances lose network connectivity for a brief period of time. Normal network connectivity to your instance will be restored after maintenance is complete. During power maintenance, scheduled instances are taken offline for a brief period, and then rebooted. When a reboot is performed, all of your instance's configuration settings are retained.

After your instance has rebooted (this normally takes a few minutes), verify that your application is working as expected. At this point, your instance should no longer have a scheduled event associated with it, or the description of the scheduled event begins with [Completed]. It sometimes takes up to 1 hour for this instance status to refresh. Completed maintenance events are displayed on the Amazon EC2 console dashboard for up to a week.

For instances backed by Amazon EBS, we can wait for the maintenance to occur as scheduled. Alternatively, we can stop and start the instance, which migrates it to a new host computer. For instances not backed by EBS, we can wait for the maintenance to occur as scheduled. Alternatively, if we want to maintain normal operation during a scheduled maintenance window, we can launch a replacement instance from our most recent AMI, migrate all necessary data to the replacement instance before the scheduled maintenance window, and then terminate the original instance.
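
To find out whether an instance has a scheduled event before deciding how to handle it, we can query its status from the command line. This is a sketch only, assuming a configured AWS CLI; the function name list_scheduled_events is hypothetical.

```shell
# Hypothetical helper: print the scheduled events (if any) for one instance.
# An empty list means no maintenance is currently scheduled.
list_scheduled_events() {
  aws ec2 describe-instance-status \
    --instance-ids "$1" \
    --query 'InstanceStatuses[].Events[]'
}

# Usage (placeholder instance ID):
# list_scheduled_events i-1234567890abcdef0
```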

How long does AWS CloudWatch retain historical information?

15 months. You can monitor your instances using Amazon CloudWatch, which collects and processes raw data from Amazon EC2 into readable, near real-time metrics. These statistics are recorded for a period of 15 months, so that you can access historical information and gain a better perspective on how your web application or service is performing.

By default, Amazon EC2 sends metric data to CloudWatch in 5-minute periods. To send metric data for your instance to CloudWatch in 1-minute periods, you can enable detailed monitoring on the instance.
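
Detailed monitoring can also be switched on and off from the AWS CLI. The sketch below assumes a configured AWS CLI; the two function names are hypothetical wrappers around the monitor-instances and unmonitor-instances commands.

```shell
# Hypothetical helpers: switch one or more instances between basic
# (5-minute) and detailed (1-minute) monitoring.
enable_detailed_monitoring() {
  aws ec2 monitor-instances --instance-ids "$@"
}

disable_detailed_monitoring() {
  aws ec2 unmonitor-instances --instance-ids "$@"
}

# Usage (placeholder instance ID):
# enable_detailed_monitoring i-1234567890abcdef0
```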

What are the metrics?

  • CPUCreditUsage: [T2 instances] The number of CPU credits consumed by the instance. One CPU credit equals one vCPU running at 100% utilization for one minute or an equivalent combination of vCPUs, utilization, and time (for example, one vCPU running at 50% utilization for two minutes or two vCPUs running at 25% utilization for two minutes). CPU credit metrics are available only at a 5 minute frequency. If you specify a period greater than five minutes, use the Sum statistic instead of the Average statistic. Units: Count
  • CPUCreditBalance: [T2 instances] The number of CPU credits available for the instance to burst beyond its base CPU utilization. Credits are stored in the credit balance after they are earned and removed from the credit balance after they expire. Credits expire 24 hours after they are earned. CPU credit metrics are available only at a 5 minute frequency. Units: Count
  • CPUUtilization: The percentage of allocated EC2 compute units that are currently in use on the instance. This metric identifies the processing power required to run an application upon a selected instance. To use the percentiles statistic, you must enable detailed monitoring. Depending on the instance type, tools in your operating system can show a lower percentage than CloudWatch when the instance is not allocated a full processor core. Units: Percent
  • DiskReadOps: Completed read operations from all instance store volumes available to the instance in a specified period of time. To calculate the average I/O operations per second (IOPS) for the period, divide the total operations in the period by the number of seconds in that period. Units: Count
  • DiskWriteOps: Completed write operations to all instance store volumes available to the instance in a specified period of time. To calculate the average I/O operations per second (IOPS) for the period, divide the total operations in the period by the number of seconds in that period. Units: Count
  • DiskReadBytes: Bytes read from all instance store volumes available to the instance. This metric is used to determine the volume of the data the application reads from the hard disk of the instance. This can be used to determine the speed of the application. The number reported is the number of bytes read during the period. If you are using basic (five-minute) monitoring, you can divide this number by 300 to find Bytes/second. If you have detailed (one-minute) monitoring, divide it by 60. Units: Bytes
  • DiskWriteBytes: Bytes written to all instance store volumes available to the instance. This metric is used to determine the volume of the data the application writes onto the hard disk of the instance. This can be used to determine the speed of the application. The number reported is the number of bytes written during the period. If you are using basic (five-minute) monitoring, you can divide this number by 300 to find Bytes/second. If you have detailed (one-minute) monitoring, divide it by 60. Units: Bytes
  • NetworkIn: The number of bytes received on all network interfaces by the instance. This metric identifies the volume of incoming network traffic to a single instance. The number reported is the number of bytes received during the period. If you are using basic (five-minute) monitoring, you can divide this number by 300 to find Bytes/second. If you have detailed (one-minute) monitoring, divide it by 60. Units: Bytes
  • NetworkOut: The number of bytes sent out on all network interfaces by the instance. This metric identifies the volume of outgoing network traffic from a single instance. The number reported is the number of bytes sent during the period. If you are using basic (five-minute) monitoring, you can divide this number by 300 to find Bytes/second. If you have detailed (one-minute) monitoring, divide it by 60. Units: Bytes
  • NetworkPacketsIn: The number of packets received on all network interfaces by the instance. This metric identifies the volume of incoming traffic in terms of the number of packets on a single instance. This metric is available for basic monitoring only. Units: Count. Statistics: Minimum, Maximum, Average
  • NetworkPacketsOut: The number of packets sent out on all network interfaces by the instance. This metric identifies the volume of outgoing traffic in terms of the number of packets on a single instance. This metric is available for basic monitoring only. Units: Count. Statistics: Minimum, Maximum, Average.
  • MemoryUtilization: The --mem-util option of the CloudWatch monitoring scripts (mon-put-instance-data.pl) collects and sends MemoryUtilization metrics, reported in percentages. This option reports only memory allocated by applications and the operating system, and excludes memory in cache and buffers.
  • MemoryUsed: The --mem-used option collects and sends MemoryUsed metrics, reported in megabytes. This option reports only memory allocated by applications and the operating system, and excludes memory in cache and buffers.
  • MemoryAvailable: The --mem-avail option collects and sends MemoryAvailable metrics, reported in megabytes. This option reports memory available for use by applications and the operating system.
  • SwapUtilization: The --swap-util option collects and sends SwapUtilization metrics, reported in percentages.
  • SwapUsed: The --swap-used option collects and sends SwapUsed metrics, reported in megabytes.
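
Several of the metrics above report a total for the period, which we divide by the period length to get a per-second rate (300 seconds for basic monitoring, 60 for detailed). The arithmetic can be sketched as below; the function name per_second and the sample numbers are made up for illustration.

```shell
# Hypothetical helper: convert a period total (bytes or operations)
# into an average per-second rate.
#   $1 = total reported for the period
#   $2 = period length in seconds (300 for basic, 60 for detailed monitoring)
per_second() {
  awk -v total="$1" -v period="$2" \
    'BEGIN { if (period <= 0) exit 1; printf "%.2f\n", total / period }'
}

# 1,500,000 bytes of NetworkIn over a 5-minute period:
per_second 1500000 300   # prints 5000.00 (bytes/second)

# 90,000 DiskReadOps over a 1-minute period:
per_second 90000 60      # prints 1500.00 (IOPS)
```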

What are the status check metrics?

Status check metrics are available at a 1 minute frequency. For a newly-launched instance, status check metric data is only available after the instance has completed the initialization state (within a few minutes of the instance entering the running state).

  • StatusCheckFailed: Reports whether the instance has passed both the instance status check and the system status check in the last minute. This metric can be either 0 (passed) or 1 (failed). Units: Count
  • StatusCheckFailed_Instance: Reports whether the instance has passed the instance status check in the last minute. This metric can be either 0 (passed) or 1 (failed). Units: Count
  • StatusCheckFailed_System: Reports whether the instance has passed the system status check in the last minute. This metric can be either 0 (passed) or 1 (failed). Units: Count
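
Because each status check metric is either 0 or 1, the Maximum statistic over 1-minute periods shows whether any check failed in a given minute. The following is a sketch, assuming a configured AWS CLI; the function name status_check_failures is hypothetical.

```shell
# Hypothetical helper: fetch per-minute StatusCheckFailed values.
#   $1 = instance ID, $2 = start time, $3 = end time (ISO 8601, UTC)
# A Maximum of 1 in any data point means a check failed in that minute.
status_check_failures() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name StatusCheckFailed \
    --dimensions Name=InstanceId,Value="$1" \
    --statistics Maximum \
    --period 60 \
    --start-time "$2" --end-time "$3"
}

# Usage (placeholder values):
# status_check_failures i-1234567890abcdef0 2016-10-18T23:18:00 2016-10-19T23:18:00
```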

What are the dimensions?

You can use the following dimensions to refine the metrics returned for your instances.

  • AutoScalingGroupName: This dimension filters the data you request for all instances in a specified capacity group. An Auto Scaling group is a collection of instances you define if you're using Auto Scaling. This dimension is available only for Amazon EC2 metrics when the instances are in such an Auto Scaling group. Available for instances with Detailed or Basic Monitoring enabled.
  • ImageId: This dimension filters the data you request for all instances running this Amazon EC2 Amazon Machine Image (AMI). Available for instances with Detailed Monitoring enabled.
  • InstanceId: This dimension filters the data you request for the identified instance only. This helps you pinpoint an exact instance from which to monitor data.
  • InstanceType: This dimension filters the data you request for all instances running with this specified instance type. This helps you categorize your data by the type of instance running. For example, you might compare data from an m1.small instance and an m1.large instance to determine which has the better business value for your application. Available for instances with Detailed Monitoring enabled.
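
On the command line, a dimension is passed as a Name/Value pair. As an illustration, the sketch below filters CPUUtilization by the AutoScalingGroupName dimension; it assumes a configured AWS CLI, and the function name asg_cpu_average and the group name are hypothetical.

```shell
# Hypothetical helper: average CPU utilization across one Auto Scaling group.
#   $1 = Auto Scaling group name, $2 = start time, $3 = end time (ISO 8601, UTC)
asg_cpu_average() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=AutoScalingGroupName,Value="$1" \
    --statistics Average \
    --period 300 \
    --start-time "$2" --end-time "$3"
}

# Usage (placeholder values):
# asg_cpu_average my-asg 2016-10-18T23:18:00 2016-10-19T23:18:00
```

Swapping the dimension name for ImageId, InstanceId, or InstanceType filters by the other dimensions listed above.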

How can we view available metrics by category?

  1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
  2. In the navigation pane, choose Metrics.
  3. Select the EC2 metric namespace.
  4. Select a metric dimension (for example, Per-Instance Metrics).
  5. To sort the metrics, use the column heading. To graph a metric, select the check box next to the metric. To filter by resource, choose the resource ID and then choose Add to search. To filter by metric, choose the metric name and then choose Add to search.
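
The same listing is available from the AWS CLI. This is a sketch, assuming a configured AWS CLI; the function name list_ec2_metrics is hypothetical.

```shell
# Hypothetical helper: list the metrics in the AWS/EC2 namespace,
# optionally restricted to a single instance.
#   $1 = instance ID (optional)
list_ec2_metrics() {
  if [ -n "${1:-}" ]; then
    aws cloudwatch list-metrics --namespace AWS/EC2 \
      --dimensions Name=InstanceId,Value="$1"
  else
    aws cloudwatch list-metrics --namespace AWS/EC2
  fi
}

# Usage (placeholder instance ID):
# list_ec2_metrics
# list_ec2_metrics i-1234567890abcdef0
```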

How does AWS CloudWatch make the aggregation?

Statistics are metric data aggregations over specified periods of time. CloudWatch provides statistics based on the metric data points provided by your custom data or provided by other services in AWS to CloudWatch. Aggregations are made using the namespace, metric name, dimensions, and the data point unit of measure, within the time period you specify. The following list describes the available statistics.

  1. Minimum: The lowest value observed during the specified period. You can use this value to determine low volumes of activity for your application.
  2. Maximum: The highest value observed during the specified period. You can use this value to determine high volumes of activity for your application.
  3. Sum: All values submitted for the matching metric added together. This statistic can be useful for determining the total volume of a metric.
  4. Average: The value of Sum / SampleCount during the specified period. By comparing this statistic with the Minimum and Maximum, you can determine the full scope of a metric and how close the average use is to the Minimum and Maximum. This comparison helps you to know when to increase or decrease your resources as needed.
  5. SampleCount: The count (number) of data points used for the statistical calculation.
  6. pNN.NN: The value of the specified percentile. You can specify any percentile, using up to two decimal places (for example, p95.45).
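
To make the relationship between these statistics concrete, the sketch below computes them locally for a handful of made-up data points. It only illustrates the definitions (for example, Average = Sum / SampleCount) and is not how CloudWatch itself is implemented; the function name summarize is hypothetical, and percentiles are left out.

```shell
# Hypothetical helper: read one numeric data point per line on stdin and
# print Minimum, Maximum, Sum, SampleCount, and Average for that set.
summarize() {
  awk '
    NR == 1 { min = $1; max = $1 }
    { sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1 }
    END {
      if (NR == 0) exit 1   # no data points, nothing to report
      printf "Minimum=%s Maximum=%s Sum=%s SampleCount=%d Average=%.2f\n",
             min, max, sum, NR, sum / NR
    }'
}

# Three sample CPUUtilization values (made up):
printf '%s\n' 10 20 60 | summarize
# prints: Minimum=10 Maximum=60 Sum=90 SampleCount=3 Average=30.00
```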

How can we get statistics for a specific instance using the console?

  1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
  2. In the navigation pane, choose Metrics.
  3. Select the EC2 metric namespace.
  4. Select the Per-Instance Metrics dimension.
  5. In the search field, type CPUUtilization and press Enter. Select the row for the specific instance, which displays a graph for the CPUUtilization metric for the instance. To name the graph, choose the pencil icon. To change the time range, select one of the predefined values or choose custom.
  6. To change the statistic or the period for the metric, choose the Graphed metrics tab. Choose the column heading or an individual value, and then choose a different value.

How can we get statistics for a specific instance using AWS CLI?

aws cloudwatch get-metric-statistics --namespace AWS/EC2 --metric-name CPUUtilization --period 3600 \
--statistics Maximum --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
--start-time 2016-10-18T23:18:00 --end-time 2016-10-19T23:18:00

How can we aggregate statistics across instances?

Aggregate statistics are available for the instances that have detailed monitoring enabled. Instances that use basic monitoring are not included in the aggregates. In addition, Amazon CloudWatch does not aggregate data across regions. Therefore, metrics are completely separate between regions. Before you can get statistics aggregated across instances, you must enable detailed monitoring (at an additional charge), which provides data in 1-minute periods.

This example shows you how to use detailed monitoring to get the average CPU usage for your EC2 instances. Because no dimension is specified, CloudWatch returns statistics for all dimensions in the AWS/EC2 namespace.

This technique for retrieving all dimensions across an AWS namespace does not work for custom namespaces that you publish to Amazon CloudWatch. With custom namespaces, you must specify the complete set of dimensions that are associated with any given data point to retrieve statistics that include the data point.

To display average CPU utilization across your instances:

  1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
  2. In the navigation pane, choose Metrics.
  3. Select the EC2 namespace and then select Across All Instances.
  4. Select the row that contains CPUUtilization, which displays a graph for the metric for all your EC2 instances. To name the graph, choose the pencil icon. To change the time range, select one of the predefined values or choose custom.
  5. To change the statistic or the period for the metric, choose the Graphed metrics tab. Choose the column heading or an individual value, and then choose a different value.
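
The equivalent CLI request simply omits the --dimensions option. The following is a sketch, assuming a configured AWS CLI and instances with detailed monitoring enabled; the function name fleet_cpu_average is hypothetical.

```shell
# Hypothetical helper: average CPU utilization across all EC2 instances
# in the current region (no dimension specified).
#   $1 = start time, $2 = end time (ISO 8601, UTC)
fleet_cpu_average() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --statistics Average SampleCount \
    --period 3600 \
    --start-time "$1" --end-time "$2"
}

# Usage (placeholder times):
# fleet_cpu_average 2016-10-18T23:18:00 2016-10-19T23:18:00
```

Requesting SampleCount alongside Average shows how many data points went into each hourly aggregate.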

How can we create alarms that stop, terminate, reboot, or recover an instance?

Using Amazon CloudWatch alarm actions, you can create alarms that automatically stop, terminate, reboot, or recover your instances. Every alarm action you create uses alarm action ARNs. One set of ARNs is more secure because it requires you to have the EC2ActionsAccess IAM role in your account. This IAM role enables you to perform stop, terminate, or reboot actions; previously, you could not execute an action if you were using an IAM role. Existing alarms that use the previous alarm action ARNs do not require this IAM role; however, it is recommended that you change the ARN and add the role when you edit an existing alarm that uses these ARNs.

The EC2ActionsAccess role enables AWS to perform alarm actions on your behalf. When you create an alarm action for the first time using the Amazon EC2 or Amazon CloudWatch consoles, AWS automatically creates this role for you.

You can add the stop, terminate, reboot, or recover actions to any alarm that is set on an Amazon EC2 per-instance metric, including basic and detailed monitoring metrics provided by Amazon CloudWatch (in the AWS/EC2 namespace), as well as any custom metrics that include the InstanceId dimension, as long as its value refers to a valid running Amazon EC2 instance.

If you are an AWS Identity and Access Management (IAM) user, you must have the following permissions to create or modify an alarm:

  1. ec2:DescribeInstanceStatus and ec2:DescribeInstances: for all alarms on Amazon EC2 instance status metrics
  2. ec2:StopInstances: for alarms with stop actions
  3. ec2:TerminateInstances: for alarms with terminate actions
  4. ec2:DescribeInstanceRecoveryAttribute and ec2:RecoverInstances: for alarms with recover actions

If you have read/write permissions for Amazon CloudWatch but not for Amazon EC2, you can still create an alarm but the stop or terminate actions won't be performed on the Amazon EC2 instance. However, if you are later granted permission to use the associated Amazon EC2 APIs, the alarm actions you created earlier will be performed. For more information about IAM permissions, see Permissions and Policies in the IAM User Guide.

If you want to use an IAM role to stop, terminate, or reboot an instance using an alarm action, you can only use the EC2ActionsAccess role. Other IAM roles are not supported. If you are using another IAM role, you cannot stop, terminate, or reboot the instance. However, you can still see the alarm state and perform any other actions such as Amazon SNS notifications or Auto Scaling policies.

You can create an alarm that stops an Amazon EC2 instance when a certain threshold has been met. For example, you may run development or test instances and occasionally forget to shut them off. You can create an alarm that is triggered when the average CPU utilization percentage has been lower than 10 percent for 24 hours, signaling that it is idle and no longer in use. You can adjust the threshold, duration, and period to suit your needs, plus you can add an Amazon Simple Notification Service (Amazon SNS) notification, so that you will receive an email when the alarm is triggered.

Instances that use an Amazon EBS volume as the root device can be stopped or terminated, whereas instances that use the instance store as the root device can only be terminated.
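
The idle-instance alarm described above can be sketched with the AWS CLI as follows. This assumes a configured AWS CLI and the arn:aws:automate:REGION:ec2:stop form of the stop action ARN; the function name create_idle_stop_alarm and the alarm name are hypothetical, and an SNS topic ARN could be appended to --alarm-actions for email notification.

```shell
# Hypothetical helper: stop an instance when its average CPU utilization
# stays below 10 percent for 24 consecutive 1-hour periods.
#   $1 = instance ID, $2 = region used in the stop action ARN
create_idle_stop_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "stop-idle-$1" \
    --alarm-description "Stop the instance when average CPU < 10% for 24 hours" \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value="$1" \
    --statistic Average \
    --period 3600 \
    --evaluation-periods 24 \
    --threshold 10 \
    --comparison-operator LessThanThreshold \
    --alarm-actions "arn:aws:automate:$2:ec2:stop"
}

# Usage (placeholder values):
# create_idle_stop_alarm i-1234567890abcdef0 us-east-1
```

The threshold, period, and number of evaluation periods are the values to adjust for a different definition of "idle".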

See http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/UsingAlarmActions.html

How can we view the history of triggered alarms and actions using the Amazon CloudWatch console?

You can view alarm and action history in the Amazon CloudWatch console. Amazon CloudWatch keeps the last two weeks' worth of alarm and action history. To view the history of triggered alarms and actions:

  1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
  2. In the navigation pane, choose Alarms.
  3. Select an alarm.
  4. The Details tab shows the most recent state transition along with the time and metric values.
  5. Choose the History tab to view the most recent history entries.
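
The history is also available from the AWS CLI. This is a sketch, assuming a configured AWS CLI; the function name alarm_state_history and the alarm name are hypothetical.

```shell
# Hypothetical helper: show the state transitions recorded for one alarm.
# Other valid history item types are ConfigurationUpdate and Action.
alarm_state_history() {
  aws cloudwatch describe-alarm-history \
    --alarm-name "$1" \
    --history-item-type StateUpdate
}

# Usage (placeholder alarm name):
# alarm_state_history stop-idle-i-1234567890abcdef0
```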

How can we view our custom metrics?

  1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
  2. Choose View Metrics.
  3. Our custom metrics are listed under the System/Linux namespace.
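
The custom metrics can also be listed from the AWS CLI. This is a sketch, assuming a configured AWS CLI and that the metrics were published to the System/Linux namespace; the function name list_custom_metrics is hypothetical.

```shell
# Hypothetical helper: list the metrics in a custom namespace.
#   $1 = namespace (optional, defaults to System/Linux)
list_custom_metrics() {
  aws cloudwatch list-metrics --namespace "${1:-System/Linux}"
}

# Usage:
# list_custom_metrics
```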

How can we troubleshoot issues related to CloudWatchClient.pm?

The CloudWatchClient.pm module caches instance metadata locally. If we create an AMI from an instance where we have run the monitoring scripts, any instances launched from the AMI within the cache TTL (default: six hours, or 24 hours for Auto Scaling groups) emit metrics using the instance ID of the original instance. After the cache TTL period passes, the script retrieves fresh data and the monitoring scripts use the instance ID of the current instance. To correct this immediately, remove the cached data using the following command:

rm /var/tmp/aws-mon/instance-id