Power to People: Monitoring Power Consumption of Low Power Nodes
June 22, 2018 by Joe Stubbs
Summary
Did you ever wonder how much power was consumed by executing a program? The Chameleon team recently implemented a feature that automatically collects power usage data on all low power nodes in the system. Instantaneous power usage data (in watts) are collected through the IPMI interface on the chassis controller for the nodes. This “out-of-band” approach does not consume additional power on the node itself and runs even when the node is powered off. Low power nodes for which power usage data are now being collected include all Intel Atoms, low power Xeons, and ARM64s. In this blog post we look at this new feature and use it to calculate the power consumption associated with multiplying two large Python numpy arrays.
Pre-requisites
To illustrate the power monitoring feature, we assume we have reserved a low-power Atom node and installed the CLI and metrics plugins (for instructions on how to reserve a node, see Reservations; for installation instructions, see the Metrics documentation).
Getting Power Data for an Instance
To retrieve power data for a specific low power node, nothing needs to be configured on the node: all we need is the node’s UUID and the Chameleon command line interface (CLI). Once you have an instance reserved, retrieve the corresponding node UUID (not the instance UUID) by selecting your lease from the Lease tab on CHI@TACC:
https://chi.tacc.chameleoncloud.org/dashboard/project/leases/
The node ID is displayed at the bottom of the Lease Details page under “Nodes”.
In this example, the node ID for my node is 05dd5e25-440f-4492-b3b8-9d39af83b8bc
Once we have identified the UUID of the node, we can start retrieving power consumption data using the CLI. The basic command is:
$ openstack metric measures show power --resource-id=<node_uuid> --refresh
*Note: to conserve computing resources, the metrics API does not always include the most recent measurements in its responses. To ensure the show power command retrieves the very latest data, pass the --refresh option. We recommend always passing --refresh, at least the first time a query is issued. In a future release, we plan to make all metrics data immediately available, which will make the --refresh option unnecessary.
For general information on using the OpenStack CLI to collect metrics, see the documentation.
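For scripted analysis, the same query can be driven from Python. The following is a minimal sketch rather than an official client: the helper names (build_command, parse_measures, fetch_power) are hypothetical, and it assumes that the generic OpenStack CLI formatter option -f json is available for this command and that each JSON record uses the same keys as the table columns (timestamp, granularity, value).

```python
import json
import subprocess


def build_command(node_uuid, start=None, stop=None, refresh=True):
    """Assemble the argument list for `openstack metric measures show power`."""
    cmd = ["openstack", "metric", "measures", "show", "power",
           "--resource-id", node_uuid]
    if start:
        cmd += ["--start", start]
    if stop:
        cmd += ["--stop", stop]
    if refresh:
        cmd.append("--refresh")
    # Assumption: "-f json" selects JSON output, as with other OpenStack CLI commands.
    cmd += ["-f", "json"]
    return cmd


def parse_measures(json_text):
    """Convert the JSON output into (timestamp, granularity, watts) tuples."""
    # Assumption: the record keys match the table's column headers.
    return [(row["timestamp"], float(row["granularity"]), float(row["value"]))
            for row in json.loads(json_text)]


def fetch_power(node_uuid, **kwargs):
    """Run the CLI; needs the same credentials as the shell command."""
    result = subprocess.run(build_command(node_uuid, **kwargs),
                            check=True, capture_output=True, text=True)
    return parse_measures(result.stdout)
```

With credentials loaded in the environment, fetch_power("<node_uuid>", start="2018-05-14T14:35:00") would issue the same request as the shell command and return the rows as Python tuples.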
Let’s look at power consumption right before our reservation started.
$ openstack metric measures show power --start 2018-05-14T14:35:00 --stop 2018-05-14T14:55:00 --resource-id=$uuid --refresh
+---------------------------+-------------+---------------+
| timestamp                 | granularity | value         |
+---------------------------+-------------+---------------+
| 2018-05-14T14:00:00-05:00 | 3600.0      | 3.04133333333 |
| 2018-05-14T14:38:00-05:00 | 60.0        | 3.075         |
| 2018-05-14T14:40:00-05:00 | 60.0        | 2.988         |
| 2018-05-14T14:38:29-05:00 | 1.0         | 3.075         |
| 2018-05-14T14:40:49-05:00 | 1.0         | 2.988         |
+---------------------------+-------------+---------------+
Power consumption was about 3 W even though the node was not in use. That is expected: the node draws some power even when the CPU is not running.
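The table above mixes rows of several granularities (3600, 60, and 1), which appear to be aggregation windows in seconds. As a rough sketch of how such output could be post-processed, the following parses the printed table and averages the readings per granularity. The function names are hypothetical, and the parser assumes the three-column layout shown above.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Data rows copied from the table above.
TABLE = """\
| 2018-05-14T14:00:00-05:00 | 3600.0 | 3.04133333333 |
| 2018-05-14T14:38:00-05:00 | 60.0 | 3.075 |
| 2018-05-14T14:40:00-05:00 | 60.0 | 2.988 |
| 2018-05-14T14:38:29-05:00 | 1.0 | 3.075 |
| 2018-05-14T14:40:49-05:00 | 1.0 | 2.988 |
"""


def parse_table(text):
    """Return (timestamp, granularity, watts) tuples from CLI table text."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue  # border lines such as +-----+-----+
        try:
            rows.append((datetime.fromisoformat(cells[0]),
                         float(cells[1]), float(cells[2])))
        except ValueError:
            continue  # header row
    return rows


def mean_by_granularity(rows):
    """Average the power readings separately for each granularity."""
    groups = defaultdict(list)
    for _, granularity, watts in rows:
        groups[granularity].append(watts)
    return {g: mean(values) for g, values in groups.items()}


for granularity, watts in sorted(mean_by_granularity(parse_table(TABLE)).items()):
    print(f"{granularity:>7.1f} s window: {watts:.3f} W")
```

Keeping the granularities separate avoids counting the same measurement twice, since the coarser rows seem to be aggregates of the finer ones.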
How much power did it take to boot the node? Let’s retrieve the power consumption data points for a 50-minute interval, starting about 15 minutes before we launched the instance.
$ openstack metric measures show power --start 2018-05-14T14:30:00-00:00 --stop 2018-05-14T15:20:00-00:00 --resource-id=$uuid --refresh
+---------------------------+-------------+--------------------+
| timestamp | granularity | value |
+---------------------------+-------------+--------------------+
| 2018-05-14T14:30:00-05:00 | 60.0 | 3.075 |
| 2018-05-14T14:32:00-05:00 | 60.0 | 3.0315 |
| 2018-05-14T14:38:00-05:00 | 60.0 | 3.075 |
| 2018-05-14T14:40:00-05:00 | 60.0 | 2.988 |
| 2018-05-14T14:42:00-05:00 | 60.0 | 3.075 |
| 2018-05-14T14:43:00-05:00 | 60.0 | 3.074 |
| 2018-05-14T14:44:00-05:00 | 60.0 | 2.914 |
| 2018-05-14T14:45:00-05:00 | 60.0 | 2.84 |
| 2018-05-14T14:46:00-05:00 | 60.0 | 8.7165 |
| 2018-05-14T14:47:00-05:00 | 60.0 | 8.752 |
| 2018-05-14T14:48:00-05:00 | 60.0 | 8.839 |
| 2018-05-14T14:53:00-05:00 | 60.0 | 12.979 |
| 2018-05-14T14:54:00-05:00 | 60.0 | 7.723 |
| 2018-05-14T14:57:00-05:00 | 60.0 | 12.52 |
| 2018-05-14T14:58:00-05:00 | 60.0 | 13.562 |
| 2018-05-14T14:59:00-05:00 | 60.0 | 9.062 |
| 2018-05-14T15:00:00-05:00 | 60.0 | 9.372 |
| 2018-05-14T15:01:00-05:00 | 60.0 | 8.752 |
| 2018-05-14T15:02:00-05:00 | 60.0 | 8.913 |
| 2018-05-14T15:03:00-05:00 | 60.0 | 8.9065 |
| 2018-05-14T15:04:00-05:00 | 60.0 | 8.603 |
| 2018-05-14T15:08:00-05:00 | 60.0 | 8.913 |
| 2018-05-14T15:10:00-05:00 | 60.0 | 8.677 |
| 2018-05-14T15:14:00-05:00 | 60.0 | 8.839 |
| 2018-05-14T15:15:00-05:00 | 60.0 | 8.913 |
| 2018-05-14T15:17:00-05:00 | 60.0 | 8.8795 |
| 2018-05-14T15:18:00-05:00 | 60.0 | 9.065 |
+---------------------------+-------------+--------------------+
Plotted as a function of time, these data show power consumption ramping up while the node is powering on, peaking at about 13.5 W, until it reaches a steady state of about 9 W.
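To put numbers on the phases, the 60-second rows above can be averaged before and after the boot. This is only an illustrative sketch: the phase boundaries (powered off through 14:45, steady state from 15:01) were chosen by eye from the table and are assumptions, as is the helper name phase_mean.

```python
from statistics import mean

# (HH:MM, watts) pairs copied from the 60-second rows above.
READINGS = [
    ("14:30", 3.075), ("14:32", 3.0315), ("14:38", 3.075), ("14:40", 2.988),
    ("14:42", 3.075), ("14:43", 3.074), ("14:44", 2.914), ("14:45", 2.84),
    ("14:46", 8.7165), ("14:47", 8.752), ("14:48", 8.839), ("14:53", 12.979),
    ("14:54", 7.723), ("14:57", 12.52), ("14:58", 13.562), ("14:59", 9.062),
    ("15:00", 9.372), ("15:01", 8.752), ("15:02", 8.913), ("15:03", 8.9065),
    ("15:04", 8.603), ("15:08", 8.913), ("15:10", 8.677), ("15:14", 8.839),
    ("15:15", 8.913), ("15:17", 8.8795), ("15:18", 9.065),
]


def phase_mean(readings, first, last):
    """Average the readings whose HH:MM label lies in [first, last]."""
    # Zero-padded HH:MM strings compare correctly as plain strings.
    return mean(w for t, w in readings if first <= t <= last)


off_watts = phase_mean(READINGS, "14:30", "14:45")     # assumed: node powered off
steady_watts = phase_mean(READINGS, "15:01", "15:18")  # assumed: boot finished
peak_watts = max(w for _, w in READINGS)

print(f"powered off: {off_watts:.2f} W")
print(f"boot peak:   {peak_watts:.2f} W")
print(f"steady:      {steady_watts:.2f} W")
```

With these boundaries the averages come out near 3.0 W while powered off and near 8.8 W once booted, consistent with the figures quoted in the text.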
Finally, let us SSH to the node and run some computation. We’ll use Python’s numpy library to multiply two large matrices. While it is running, we will check resource consumption, and after it is done, we will compare that with power consumption.
Here is a basic Python function that multiplies two random square numpy arrays of a given size.
In [1]: import numpy as np
In [2]: def f(std_dev, size):
   ...:     A = np.random.normal(0, std_dev, (size, size))
   ...:     B = np.random.normal(0, std_dev, (size, size))
   ...:     C = np.dot(A, B)
   ...:     return C[0]
In [3]: for i in range(20):
   ...:     f(100, 8138)
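To query the power data afterwards, it helps to know when the computation started and stopped. One possible approach, sketched below, is to wrap the workload in a small helper that records the time window. The helper name timed_window and the two-minute padding are assumptions made for illustration, and the stand-in workload is deliberately trivial so the sketch runs anywhere.

```python
from datetime import datetime, timedelta


def timed_window(workload, pad_minutes=2):
    """Run workload() and return (result, start, stop) as ISO 8601 strings.

    The window is padded on both sides because power samples can be
    a minute or more apart.
    """
    started = datetime.now().astimezone()   # local time with UTC offset
    result = workload()
    finished = datetime.now().astimezone()
    pad = timedelta(minutes=pad_minutes)
    start = (started - pad).replace(microsecond=0).isoformat()
    stop = (finished + pad).replace(microsecond=0).isoformat()
    return result, start, stop


# Stand-in workload; on the node this would be the matrix loop above.
result, start, stop = timed_window(lambda: sum(range(1000)))
print(f"--start {start} --stop {stop}")
```

On the node, the loop above could be passed in as timed_window(lambda: [f(100, 8138) for _ in range(20)]), and the printed values reused as the --start and --stop arguments of the show power command.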
After running the code above, let's retrieve and plot the associated power consumption data.
$ openstack metric measures show power --start 2018-05-14T18:04:00 --stop 2018-05-14T18:33:00 --resource-id=$uuid --refresh
+---------------------------+-------------+-------------------+
| timestamp | granularity | value |
+---------------------------+-------------+-------------------+
| 2018-05-14T18:04:00-05:00 | 60.0 | 8.952 |
| 2018-05-14T18:04:10-05:00 | 1.0 | 8.842 |
| 2018-05-14T18:04:49-05:00 | 1.0 | 9.062 |
| 2018-05-14T18:07:00-05:00 | 60.0 | 9.607 |
| 2018-05-14T18:07:50-05:00 | 1.0 | 9.607 |
| 2018-05-14T18:09:00-05:00 | 60.0 | 12.062 |
| 2018-05-14T18:09:29-05:00 | 1.0 | 12.062 |
| 2018-05-14T18:12:00-05:00 | 60.0 | 25.8905 |
| 2018-05-14T18:12:29-05:00 | 1.0 | 25.748 |
| 2018-05-14T18:12:50-05:00 | 1.0 | 26.033 |
| 2018-05-14T18:14:00-05:00 | 60.0 | 25.822 |
| 2018-05-14T18:14:50-05:00 | 1.0 | 25.822 |
| 2018-05-14T18:17:00-05:00 | 60.0 | 28.115 |
| 2018-05-14T18:17:10-05:00 | 1.0 | 28.115 |
| 2018-05-14T18:20:00-05:00 | 60.0 | 27.5486666667 |
| 2018-05-14T18:20:10-05:00 | 1.0 | 28.189 |
| 2018-05-14T18:20:29-05:00 | 1.0 | 28.189 |
| 2018-05-14T18:20:49-05:00 | 1.0 | 26.268 |
| 2018-05-14T18:24:00-05:00 | 60.0 | 24.658 |
| 2018-05-14T18:25:00-05:00 | 60.0 | 27.954 |
| 2018-05-14T18:26:00-05:00 | 60.0 | 28.115 |
| 2018-05-14T18:28:00-05:00 | 60.0 | 26.9995 |
| 2018-05-14T18:24:29-05:00 | 1.0 | 24.658 |
| 2018-05-14T18:25:29-05:00 | 1.0 | 27.954 |
| 2018-05-14T18:26:10-05:00 | 1.0 | 28.115 |
| 2018-05-14T18:28:10-05:00 | 1.0 | 25.884 |
| 2018-05-14T18:28:29-05:00 | 1.0 | 28.115 |
| 2018-05-14T18:30:00-05:00 | 60.0 | 9.136 |
| 2018-05-14T18:30:29-05:00 | 1.0 | 9.136 |
+---------------------------+-------------+-------------------+
We can see that while the matrices were being multiplied the node drew roughly 25 to 28 W, about three times its idle draw of 9 W.
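Power in watts is a rate, so the energy used by the run is the integral of power over time. As a rough sketch, the 60-second rows above can be integrated with the trapezoidal rule. The samples are unevenly spaced, so this is only an estimate. The 9 W idle baseline is taken from the steady state observed after boot, and the function name energy_joules is hypothetical.

```python
from datetime import datetime

# (HH:MM, watts) pairs copied from the 60-second rows above.
SAMPLES = [
    ("18:04", 8.952), ("18:07", 9.607), ("18:09", 12.062), ("18:12", 25.8905),
    ("18:14", 25.822), ("18:17", 28.115), ("18:20", 27.5486666667),
    ("18:24", 24.658), ("18:25", 27.954), ("18:26", 28.115),
    ("18:28", 26.9995), ("18:30", 9.136),
]

IDLE_WATTS = 9.0  # assumption: steady-state idle draw seen after boot


def energy_joules(samples):
    """Trapezoidal integral of power (W) over time (s), in joules."""
    points = sorted((datetime.strptime(t, "%H:%M"), w) for t, w in samples)
    total = 0.0
    for (t0, w0), (t1, w1) in zip(points, points[1:]):
        total += 0.5 * (w0 + w1) * (t1 - t0).total_seconds()
    return total


total_j = energy_joules(SAMPLES)
duration_s = 26 * 60                        # 18:04 to 18:30
net_j = total_j - IDLE_WATTS * duration_s   # energy above the idle baseline

total_wh = total_j / 3600
net_wh = net_j / 3600
print(f"total: {total_wh:.2f} Wh, above idle: {net_wh:.2f} Wh")
```

Under these assumptions the 26-minute window works out to roughly 9.5 Wh in total, of which about 5.6 Wh is above the idle baseline and can be attributed to the 20 matrix multiplications.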