Cherry-pick #14829 to 7.x: [Metricbeat] Add Google Cloud Platform module #15571


@sayden sayden commented Jan 15, 2020

Cherry-pick of PR #14829 to 7.x branch. Original message:

ONGOING work on docs, but most code is ready to go.

Seed PR for the Google Cloud Platform module for Metricbeat.

It includes the following:

  • Stackdriver metricset
  • Compute metricset, based on Stackdriver, as a config-based module

Ignore the following metricsets; they are included in the PR for testing purposes only and are not going to be merged yet (they'll be removed before merging):

  • Storage
  • Firebase
  • Firestore
  • Loadbalancing
  • PubSub

Some vocabulary for people new to Google Cloud

Here are some rough AWS equivalents of the GCP services involved:

  • Stackdriver -> Cloudwatch
  • Compute -> EC2
  • PubSub -> SQS
  • Storage (GCS) -> S3
  • Firebase / Firestore -> ~DynamoDB
  • Bigquery -> ~Redshift+Athena

Labels / Metadata

You'll see many mentions of metadata inside the code. These refer to two different entities within GCP: labels and metadata. For Elasticsearch purposes both can be considered metadata, so whether you read "label" or "metadata", they are treated as the same thing at the end of the pipeline.
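
A minimal sketch of that merging step, using plain maps and a hypothetical helper name (not the module's actual function):

```go
package main

import "fmt"

// mergeLabelsAndMetadata is a hypothetical helper illustrating the idea:
// GCP "labels" and "metadata" end up in the same place on the event and are
// not distinguished downstream.
func mergeLabelsAndMetadata(labels, metadata map[string]string) map[string]interface{} {
	out := map[string]interface{}{}
	for k, v := range labels {
		out[k] = v
	}
	for k, v := range metadata {
		out[k] = v
	}
	return out
}

func main() {
	event := mergeLabelsAndMetadata(
		map[string]string{"env": "prod"},             // GCP label
		map[string]string{"created-by": "terraform"}, // GCP metadata
	)
	fmt.Println(event)
}
```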

Grouping of events

The way GCP labels metrics makes it somewhat complex to generate "service based" events. Metrics are exported individually, so you don't request "compute metrics" or "metrics of this compute instance"; instead you have to ask "give me all cpu_utilization values of compute instances", and a single response brings one or more values per instance, for all your instances, within the specified timeframe.

For example, a request for CPU utilization can return (in pseudocode):

{
    "metadata": {
        "zone": "eu-central-1",
        "project": "project1"
    },
    "metric": "cpu_utilization",
    "points": [
        {
            "time": 1,
            "value": 2,
            "metadata": {
                "instance": "instance-1"
            }
        },
        {
            "time": 2,
            "value": 2,
            "metadata": {
                "instance": "instance-1"
            }
        }
    ]
}
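
For reference, the kind of request that produces a response like the one above might look roughly like this with the Cloud Monitoring (Stackdriver) Go client. This is a minimal sketch, not the module's actual code; the project name and time window are placeholders:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	monitoring "cloud.google.com/go/monitoring/apiv3"
	"github.com/golang/protobuf/ptypes"
	"google.golang.org/api/iterator"
	monitoringpb "google.golang.org/genproto/googleapis/monitoring/v3"
)

func main() {
	ctx := context.Background()
	// Credentials are picked up from GOOGLE_APPLICATION_CREDENTIALS.
	client, err := monitoring.NewMetricClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	end := time.Now()
	start := end.Add(-5 * time.Minute)
	startTS, _ := ptypes.TimestampProto(start)
	endTS, _ := ptypes.TimestampProto(end)

	// One request returns cpu_utilization points for *all* instances in the project.
	it := client.ListTimeSeries(ctx, &monitoringpb.ListTimeSeriesRequest{
		Name:   "projects/project1", // placeholder project
		Filter: `metric.type="compute.googleapis.com/instance/cpu/utilization"`,
		Interval: &monitoringpb.TimeInterval{
			StartTime: startTS,
			EndTime:   endTS,
		},
		View: monitoringpb.ListTimeSeriesRequest_FULL,
	})
	for {
		ts, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(ts.GetResource().GetLabels(), len(ts.GetPoints()))
	}
}
```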

Then, a second call must be made (in this example, to the Compute API) to request instance metadata (such as the working group, network group, user labels or user metadata, which is associated only with the instance and not with a particular metric like CPU). That call returns data like this (again, in pseudocode):

{
    "instance":"instance-1",
    "metadata":{
        "user":{
            "key":"value"
        },
        "system":{
            "key":"value"
        }
    },
    ...
}

At the end, both responses for that particular metric must be grouped into a single event that shares some common metadata. For Compute this includes the instance_id and availability zone, apart from the timestamp. Each service requires a specific implementation to get the non-Stackdriver metadata. The service metadata implementation is only developed for Compute at the moment and can be seen in googlecloud/stackdriver/compute; the rest of the services use only the metadata provided by Stackdriver.
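
A minimal sketch of that grouping step, using plain maps and hypothetical types (the real module has its own structures):

```go
package main

import "fmt"

// Point is a hypothetical metric sample with its per-point metadata.
type Point struct {
	Time     int64
	Value    float64
	Instance string
}

// groupByInstance merges metric points and per-instance metadata into one
// event per instance, carrying the metadata shared by the whole response.
func groupByInstance(shared map[string]string, points []Point, instanceMeta map[string]map[string]string) []map[string]interface{} {
	byInstance := map[string][]Point{}
	for _, p := range points {
		byInstance[p.Instance] = append(byInstance[p.Instance], p)
	}

	var events []map[string]interface{}
	for instance, pts := range byInstance {
		event := map[string]interface{}{
			"instance": instance,
			"points":   pts,
		}
		for k, v := range shared {
			event[k] = v // zone, project, ... shared by the whole response
		}
		for k, v := range instanceMeta[instance] {
			event[k] = v // user/system metadata fetched from the Compute API
		}
		events = append(events, event)
	}
	return events
}

func main() {
	events := groupByInstance(
		map[string]string{"zone": "eu-central-1", "project": "project1"},
		[]Point{{Time: 1, Value: 2, Instance: "instance-1"}, {Time: 2, Value: 2, Instance: "instance-1"}},
		map[string]map[string]string{"instance-1": {"key": "value"}},
	)
	fmt.Println(events)
}
```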

ECS

Metadata returned from Stackdriver is ECS compliant for Compute metadata (mainly availability zone, account id, cloud provider, instance id and instance name). Some of the metadata might still be written outside the ECS fields; more deployment configurations plus testing are needed to find them all.
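
A minimal sketch of that mapping onto the standard ECS cloud fields (the output keys are ECS field names; the input keys are illustrative, not necessarily the labels Stackdriver returns):

```go
package main

import "fmt"

// toECSCloudFields maps Compute metadata onto the standard ECS cloud.* fields.
func toECSCloudFields(meta map[string]string) map[string]interface{} {
	return map[string]interface{}{
		"cloud.provider":          "gcp",
		"cloud.account.id":        meta["project_id"],
		"cloud.availability_zone": meta["zone"],
		"cloud.instance.id":       meta["instance_id"],
		"cloud.instance.name":     meta["instance_name"],
	}
}

func main() {
	fmt.Println(toECSCloudFields(map[string]string{
		"project_id":    "project1",
		"zone":          "europe-west1-a",
		"instance_id":   "1234567890",
		"instance_name": "instance-1",
	}))
}
```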

Modules

All services from https://cloud.google.com/monitoring/api/metrics_gcp can be added through configuration alone. Tests so far show no problems, but their service-specific metadata must be developed separately for each of them.

Limitations

You cannot set a period under 300s (you can right now, but it won't return any metrics). I think it's a limitation of Stackdriver, because its metrics are sampled every 60 to 300 seconds.
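
A minimal sketch of how such a constraint could be enforced at config validation time (hypothetical, not the module's actual validation):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// validatePeriod is a hypothetical check: Stackdriver samples most metrics
// every 60-300s, so shorter collection periods return no data points.
func validatePeriod(period time.Duration) error {
	if period < 300*time.Second {
		return errors.New("period must be at least 300s for the googlecloud module")
	}
	return nil
}

func main() {
	fmt.Println(validatePeriod(60 * time.Second))  // error
	fmt.Println(validatePeriod(300 * time.Second)) // <nil>
}
```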

Happy reviewing :)

Sorry for the big PR, it was impossible to make it smaller

This PR introduces support for Google Cloud Platform in Functionbeat. This branch is located in the `elastic/beats` repository, so anyone on our team has access to it.

### Manager

#### Authentication

To use the API to deploy, remove and update functions, users need to set the environment variable `GOOGLE_APPLICATION_CREDENTIALS`. This variable should point to a JSON file which contains all the relevant information for Google to authenticate.

(About authentication for GCP libs: https://cloud.google.com/docs/authentication/getting-started)
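
For illustration, Google's Go client libraries pick these credentials up automatically through Application Default Credentials. A minimal sketch (the scope below is the generic cloud-platform scope, used here only as an example):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"golang.org/x/oauth2/google"
)

func main() {
	ctx := context.Background()
	// Reads the JSON key file pointed to by GOOGLE_APPLICATION_CREDENTIALS
	// (or falls back to other Application Default Credentials sources).
	creds, err := google.FindDefaultCredentials(ctx, "https://www.googleapis.com/auth/cloud-platform")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("authenticated for project:", creds.ProjectID)
}
```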

#### Required roles

* Cloud Functions Developer
* Cloud Functions Service Agent
* Service Account User
* Storage Admin
* Storage Object Admin

Note: the Cloud Functions Developer role is in beta. We should not make GCP support GA until it becomes stable.

#### Configuration

```yaml
# Configure functions to run on Google Cloud Platform, currently, we assume that the credentials
# are present in the environment to correctly create the function when using the CLI.
#
# Configure which region your project is located in.
functionbeat.provider.gcp.location_id: "europe-west1"
# Configure which Google Cloud project to deploy your functions to.
functionbeat.provider.gcp.project_id: "my-project-123456"
# Configure the Google Cloud Storage bucket the function artifact should be uploaded to.
functionbeat.provider.gcp.storage_name: "functionbeat-deploy"

functionbeat.provider.gcp.functions:
```

#### Export

Function templates can be exported into YAML. With this YAML configuration, users can deploy the function using the [Google Cloud Deployment Manager](https://cloud.google.com/deployment-manager/).

### New functions

#### Google Pub/Sub

A function under the folder `pkg/pubsub` is available to get events from Google Pub/Sub.
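
For context, a background function triggered by a Pub/Sub topic.publish event receives the message payload directly. A minimal sketch of such a handler, using the vendored `cloud.google.com/go/functions/metadata` package (the `PubSubMessage` struct and `Handle` name here are illustrative, not necessarily what `pkg/pubsub` uses):

```go
package pubsub

import (
	"context"
	"log"

	"cloud.google.com/go/functions/metadata"
)

// PubSubMessage mirrors the payload delivered to background functions
// triggered by Pub/Sub topic.publish events.
type PubSubMessage struct {
	Data       []byte            `json:"data"`
	Attributes map[string]string `json:"attributes"`
}

// Handle is an illustrative entry point; the real function in pkg/pubsub
// forwards the event through the Functionbeat publishing pipeline instead.
func Handle(ctx context.Context, m PubSubMessage) error {
	meta, err := metadata.FromContext(ctx)
	if err != nil {
		return err
	}
	log.Printf("event %s from %s: %s", meta.EventID, meta.Resource.Name, string(m.Data))
	return nil
}
```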

##### Configuration

```yaml
  # Define the list of available functions; each function must have a unique name.
  # Create a function that accepts events coming from Google Pub/Sub.
  - name: pubsub
    enabled: false
    type: pubsub

    # Description of the function to help identify it when you run multiple functions.
    description: "Google Cloud Function for Pub/Sub"

    # The maximum memory allocated for this function, the configured size must be a factor of 64.
    # Default is 256MiB.
    #memory_size: 256MiB

    # Execution timeout in seconds. If the function does not finish in time,
    # it is considered failed and terminated. Default is 60s.
    #timeout: 60s

    # Email of the service account of the function. Defaults to {projectid}@appspot.gserviceaccount.com
    #service_account_email: {projectid}@appspot.gserviceaccount.com

    # Labels of the function.
    #labels:
    # mylabel: label

    # VPC Connector this function can connect to.
    # Format: projects/*/locations/*/connectors/* or fully-qualified URI
    #vpc_connector: ""

    # Number of maximum instances running at the same time. Default is unlimited.
    #maximum_instances: 0

    trigger:
      event_type: "providers/cloud.pubsub/eventTypes/topic.publish"
      resource: "projects/_/pubsub/myPubSub"
      #service: "pubsub.googleapis.com"

    # Optional fields that you can specify to add additional information to the
    # output. Fields can be scalar values, arrays, dictionaries, or any nested
    # combination of these.
    #fields:
    #  env: staging

    # Define custom processors for this function.
    #processors:
    #  - dissect:
    #      tokenizer: "%{key1} %{key2}"
```

#### Google Cloud Storage

A function under the folder `pkg/storage` is available to get events from Google Cloud Storage.

##### Configuration
```yaml
 # Create a function that accepts events coming from Google Cloud Storage.
 - name: storage
   enabled: false
   type: storage

   # Description of the function to help identify it when you run multiple functions.
   description: "Google Cloud Function for Cloud Storage"

   # The maximum memory allocated for this function, the configured size must be a factor of 64.
   # Default is 256MiB.
   #memory_size: 256MiB

   # Execution timeout in seconds. If the function does not finish in time,
   # it is considered failed and terminated. Default is 60s.
   #timeout: 60s

   # Email of the service account of the function. Defaults to {projectid}@appspot.gserviceaccount.com
   #service_account_email: {projectid}@appspot.gserviceaccount.com

   # Labels of the function.
   #labels:
   # mylabel: label

   # VPC Connector this function can connect to.
   # Format: projects/*/locations/*/connectors/* or fully-qualified URI
   #vpc_connector: ""

   # Number of maximum instances running at the same time. Default is unlimited.
   #maximum_instances: 0

   # Optional fields that you can specify to add additional information to the
   # output. Fields can be scalar values, arrays, dictionaries, or any nested
   # combination of these.
   #fields:
   #  env: staging

   # Define custom processors for this function.
   #processors:
   #  - dissect:
   #      tokenizer: "%{key1} %{key2}"
```

### Vendor
* `cloud.google.com/go/functions/metadata`
* `cloud.google.com/go/storage`

(cherry picked from commit e8e18d0)

# Conflicts:
#	vendor/vendor.json
@sayden sayden closed this Jan 15, 2020
@sayden sayden deleted the backport_14829_7.x branch January 15, 2020 11:30