author: Hemant Kumar <hekumar@redhat.com>, 2017-04-04 18:31:47 -0400
committer: Hemant Kumar <hekumar@redhat.com>, 2017-04-27 15:42:46 -0400
commit: b0ddb73377ad888e65a52f57cd0fcdb947363098
tree: dea6d8ba4692f83ce5c2d0c0243c9e6c0a150b02
parent: 712dd6e91bb250f84ad8773a6154ffa5e996a713
Amend cloudprovider metric proposal for new metric names
We are going to use the same name for all metrics, and each API request being made will be added as a dimension. This makes aggregating metrics much easier, without resorting to wildcards.
-rw-r--r-- contributors/design-proposals/cloudprovider-storage-metrics.md | 91
1 file changed, 71 insertions, 20 deletions
diff --git a/contributors/design-proposals/cloudprovider-storage-metrics.md b/contributors/design-proposals/cloudprovider-storage-metrics.md
index a0217b62..fbe5a394 100644
--- a/contributors/design-proposals/cloudprovider-storage-metrics.md
+++ b/contributors/design-proposals/cloudprovider-storage-metrics.md
@@ -49,37 +49,88 @@ Since we are interested in count(or rate) and latency percentile metrics of API
the external Cloud Provider - we will use [Histogram](https://prometheus.io/docs/practices/histograms/) type for
emitting these metrics.
-We will be using `HistogramVec` type so as we can attach dimensions at runtime. Whenever available
-`namespace` will reported as a dimension with the metric.
+We will be using the `HistogramVec` type so that we can attach dimensions at runtime. All metrics will contain the API action
+being taken as a dimension. The cloudprovider maintainer may choose to add additional dimensions as needed. If a
+dimension is not available at the point of emission, the sentinel value `<n/a>` should be emitted as a placeholder.
-### GCE Implementation
+We are also interested in a counter of cloudprovider API errors. The `CounterVec` type (created with `NewCounterVec`) will be used for keeping
+track of API errors.
-For GCE we simply use `gensupport.RegisterHook()` to register a function which will be called
-when request is made and response returns.
+### GCE Implementation
To begin with we will start emitting following metrics for GCE. Because these metrics are of type
-`Summary` - both count and latency will be automatically calculated.
+`Histogram` - both count and latency will be automatically calculated.
+
+#### GCE Latency metrics
+
+All GCE latency metrics will be named `cloudprovider_gce_api_request_duration_seconds`. The API request
+being made will be reported as a dimension.
+
+
+To begin with, we will emit the following metrics:
+
+```
+cloudprovider_gce_api_request_duration_seconds { request = "instance_list"}
+cloudprovider_gce_api_request_duration_seconds { request = "disk_insert"}
+cloudprovider_gce_api_request_duration_seconds { request = "disk_delete"}
+cloudprovider_gce_api_request_duration_seconds { request = "attach_disk"}
+cloudprovider_gce_api_request_duration_seconds { request = "detach_disk"}
+cloudprovider_gce_api_request_duration_seconds { request = "list_disk"}
+```
-1. gce_instance_list
-2. gce_disk_insert
-3. gce_disk_delete
-4. gce_attach_disk
-5. gce_detach_disk
-6. gce_list_disk
+#### GCE API error metrics
+
+All GCE error metrics will be named `cloudprovider_gce_api_request_errors`. The API request being made will be
+reported as a dimension.
+
+To begin with, we expect to report the following error metrics:
+
+```
+cloudprovider_gce_api_request_errors { request = "instance_list"}
+cloudprovider_gce_api_request_errors { request = "disk_insert"}
+cloudprovider_gce_api_request_errors { request = "disk_delete"}
+cloudprovider_gce_api_request_errors { request = "attach_disk"}
+cloudprovider_gce_api_request_errors { request = "detach_disk"}
+cloudprovider_gce_api_request_errors { request = "list_disk"}
+```
-A POC implementation can be found here - https://github.com/kubernetes/kubernetes/pull/40338/files
### AWS Implementation
For AWS we will currently use the wrapper type `awsSdkEC2` to intercept all storage API calls and
emit metric datapoints. We are not using the approach used for `aws/log_handler` because the AWS SDK doesn't use Contexts, and hence we can't pass custom information such as the API call name or namespace to record with metrics.
+
+#### AWS Latency metrics
+
+All AWS latency metrics will be named `cloudprovider_aws_api_request_duration_seconds`. `request` will be reported as a dimension.
+The AWS maintainer may choose to add additional dimensions as needed.
+
To begin with we will start emitting following metrics for AWS:
-1. aws_attach_volume
-2. aws_create_tags
-3. aws_create_volume
-4. aws_delete_volume
-5. aws_describe_instance
-6. aws_describe_volume
-7. aws_detach_volume
+```
+cloudprovider_aws_api_request_duration_seconds { request = "attach_volume"}
+cloudprovider_aws_api_request_duration_seconds { request = "detach_volume"}
+cloudprovider_aws_api_request_duration_seconds { request = "create_tags"}
+cloudprovider_aws_api_request_duration_seconds { request = "create_volume"}
+cloudprovider_aws_api_request_duration_seconds { request = "delete_volume"}
+cloudprovider_aws_api_request_duration_seconds { request = "describe_instance"}
+cloudprovider_aws_api_request_duration_seconds { request = "describe_volume"}
+```
+
+#### AWS Error metrics
+
+All AWS error metrics will be named `cloudprovider_aws_api_request_errors`. The API request being made will be
+reported as a dimension.
+
+To begin with, we expect to report the following error metrics:
+
+```
+cloudprovider_aws_api_request_errors { request = "attach_volume"}
+cloudprovider_aws_api_request_errors { request = "detach_volume"}
+cloudprovider_aws_api_request_errors { request = "create_tags"}
+cloudprovider_aws_api_request_errors { request = "create_volume"}
+cloudprovider_aws_api_request_errors { request = "delete_volume"}
+cloudprovider_aws_api_request_errors { request = "describe_instance"}
+cloudprovider_aws_api_request_errors { request = "describe_volume"}
+```
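
The scheme in this patch (one metric name per provider, with the API call carried in a `request` dimension) can be illustrated with a minimal Go sketch. It is not the proposal's implementation and does not use the Prometheus client: `apiMetrics`, `observe`, `totalCalls`, `errorCount` and `errSimulated` are hypothetical names, and plain maps stand in for what `HistogramVec` and `CounterVec` would hold. Applying the `<n/a>` sentinel to an empty `request` value is likewise only an assumption about how a missing dimension might be handled.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Metric names and the sentinel value are taken from the proposal text.
const (
	latencyMetric = "cloudprovider_gce_api_request_duration_seconds"
	errorMetric   = "cloudprovider_gce_api_request_errors"
	notAvailable  = "<n/a>"
)

// errSimulated is a hypothetical error used to exercise the error counter.
var errSimulated = errors.New("simulated API failure")

// apiMetrics is a hypothetical stdlib stand-in for a HistogramVec plus a
// CounterVec: both maps are keyed by the value of the `request` dimension.
type apiMetrics struct {
	mu        sync.Mutex
	durations map[string][]float64 // observed latencies in seconds
	errs      map[string]int       // error counts
}

func newAPIMetrics() *apiMetrics {
	return &apiMetrics{
		durations: make(map[string][]float64),
		errs:      make(map[string]int),
	}
}

// observe times call, records the latency under the request label, and
// increments the error counter for that label when call fails.
func (m *apiMetrics) observe(request string, call func() error) error {
	if request == "" {
		request = notAvailable // assumption: empty means "dimension unknown"
	}
	start := time.Now()
	err := call()
	elapsed := time.Since(start).Seconds()

	m.mu.Lock()
	defer m.mu.Unlock()
	m.durations[request] = append(m.durations[request], elapsed)
	if err != nil {
		m.errs[request]++
	}
	return err
}

// totalCalls sums observations across every request label. Because all
// series share one metric name, aggregation is a plain sum over labels.
func (m *apiMetrics) totalCalls() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	total := 0
	for _, observed := range m.durations {
		total += len(observed)
	}
	return total
}

// errorCount returns the number of failed calls recorded for one label.
func (m *apiMetrics) errorCount(request string) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.errs[request]
}

func main() {
	m := newAPIMetrics()
	_ = m.observe("disk_insert", func() error { return nil })
	_ = m.observe("attach_disk", func() error { return errSimulated })
	_ = m.observe("", func() error { return nil })

	fmt.Printf("%s: %d observations\n", latencyMetric, m.totalCalls())
	fmt.Printf("%s{request=%q}: %d\n", errorMetric, "attach_disk", m.errorCount("attach_disk"))
}
```

With a real `HistogramVec`, the per-label slices would be replaced by bucketed histograms, and the sum over labels would be done at query time rather than in process.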