| author | Saad Ali <saadali@google.com> | 2017-05-05 10:38:04 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2017-05-05 10:38:04 -0700 |
| commit | efed6b2ec82e8647377683f4f35cd5d7b535b725 (patch) | |
| tree | 5ba7a9419a880594fc0e5450f5acfd8a7f7b05f1 | |
| parent | 09f3ae3b1ffbc6ef977b535eec518960b766e7cb (diff) | |
| parent | b0ddb73377ad888e65a52f57cd0fcdb947363098 (diff) | |
Merge pull request #507 from gnufied/cloudprovider-metrics-amends
Amend cloudprovider metric proposal for new metric names
| -rw-r--r-- | contributors/design-proposals/cloudprovider-storage-metrics.md | 91 |
1 files changed, 71 insertions, 20 deletions
diff --git a/contributors/design-proposals/cloudprovider-storage-metrics.md b/contributors/design-proposals/cloudprovider-storage-metrics.md
index a0217b62..fbe5a394 100644
--- a/contributors/design-proposals/cloudprovider-storage-metrics.md
+++ b/contributors/design-proposals/cloudprovider-storage-metrics.md
@@ -49,37 +49,88 @@ Since we are interested in count(or rate) and latency percentile metrics of API
 the external Cloud Provider - we will use [Histogram](https://prometheus.io/docs/practices/histograms/) type for emitting these metrics.
-We will be using `HistogramVec` type so as we can attach dimensions at runtime. Whenever available
-`namespace` will reported as a dimension with the metric.
+We will be using `HistogramVec` type so as we can attach dimensions at runtime. All metrics will contain API action
+being taken as a dimension. The cloudprovider maintainer may choose to add additonal dimensions as needed. If a
+dimension is not available at point of emission sentinel value `<n/a>` should be emitted as a placeholder.
-### GCE Implementation
+We are also interested in counter of cloudprovider API errors. `NewCounterVec` type will be used for keeping
+track of API errors.
-For GCE we simply use `gensupport.RegisterHook()` to register a function which will be called
-when request is made and response returns.
+### GCE Implementation
 To begin with we will start emitting following metrics for GCE. Because these metrics are of type
-`Summary` - both count and latency will be automatically calculated.
+`Histogram` - both count and latency will be automatically calculated.
+
+#### GCE Latency metrics
+
+All gce latency metrics will be named - `cloudprovider_gce_api_request_duration_seconds`. api request
+being made will be reported as dimensions.
+
+
+To begin we will start emitting following metrics:
+
+```
+cloudprovider_gce_api_request_duration_seconds { request = "instance_list"}
+cloudprovider_gce_api_request_duration_seconds { request = "disk_insert"}
+cloudprovider_gce_api_request_duration_seconds { request = "disk_delete"}
+cloudprovider_gce_api_request_duration_seconds { request = "attach_disk"}
+cloudprovider_gce_api_request_duration_seconds { request = "detach_disk"}
+cloudprovider_gce_api_request_duration_seconds { request = "list_disk"}
+```
-1. gce_instance_list
-2. gce_disk_insert
-3. gce_disk_delete
-4. gce_attach_disk
-5. gce_detach_disk
-6. gce_list_disk
+#### GCE API error metrics.
+
+All gce error metrics will be named `cloudprovider_gce_api_request_errors`. api request being made will be
+reported as a dimension.
+
+To begin with we expect to report following error metrics:
+
+```
+cloudprovider_gce_api_request_errors { request = "instance_list"}
+cloudprovider_gce_api_request_errors { request = "disk_insert"}
+cloudprovider_gce_api_request_errors { request = "disk_delete"}
+cloudprovider_gce_api_request_errors { request = "attach_disk"}
+cloudprovider_gce_api_request_errors { request = "detach_disk"}
+cloudprovider_gce_api_request_errors { request = "list_disk"}
+```
-A POC implementation can be found here - https://github.com/kubernetes/kubernetes/pull/40338/files
 ### AWS Implementation
 For AWS currently we will use wrapper type `awsSdkEC2` to intercept all storage API calls and emit metric datapoints. The reason we are not using approach used for `aws/log_handler` is - because AWS SDK doesn't uses Contexts and hence we can't pass custom information such as API call name or namespace to record with metrics.
+
+#### AWS Latency metrics
+
+All aws API metrics will be named - `cloudprovider_aws_api_request_duration_seconds`. `request` will be reported as dimensions.
+AWS maintainer may choose to add additional dimensions as needed.
+
 To begin with we will start emitting following metrics for AWS:
-1. aws_attach_volume
-2. aws_create_tags
-3. aws_create_volume
-4. aws_delete_volume
-5. aws_describe_instance
-6. aws_describe_volume
-7. aws_detach_volume
+```
+cloudprovider_aws_api_request_duration_seconds { request = "attach_volume"}
+cloudprovider_aws_api_request_duration_seconds { request = "detach_volume"}
+cloudprovider_aws_api_request_duration_seconds { request = "create_tags"}
+cloudprovider_aws_api_request_duration_seconds { request = "create_volume"}
+cloudprovider_aws_api_request_duration_seconds { request = "delete_volume"}
+cloudprovider_aws_api_request_duration_seconds { request = "describe_instance"}
+cloudprovider_aws_api_request_duration_seconds { request = "describe_volume"}
+```
+
+#### AWS Error metrics
+
+All aws error metrics will be named `cloudprovider_aws_api_request_errors`. api request being made will be
+reported as a dimension.
+
+To begin with we expect to report following error metrics:
+
+```
+cloudprovider_aws_api_request_errors { request = "attach_volume"}
+cloudprovider_aws_api_request_errors { request = "detach_volume"}
+cloudprovider_aws_api_request_errors { request = "create_tags"}
+cloudprovider_aws_api_request_errors { request = "create_volume"}
+cloudprovider_aws_api_request_errors { request = "delete_volume"}
+cloudprovider_aws_api_request_errors { request = "describe_instance"}
+cloudprovider_aws_api_request_errors { request = "describe_volume"}
+```
