# Azure Data Explorer output plugin
This plugin writes metrics collected by any of the input plugins of Telegraf to Azure Data Explorer.
## Pre-requisites

- Create an Azure Data Explorer cluster and database
- A VM, compute instance, or container to host Telegraf. It can run locally alongside the apps/services being monitored, or remotely on dedicated monitoring compute or a container.
## Configuration

```toml
[[outputs.azure_data_explorer]]
  ## The URI property of the Azure Data Explorer resource on Azure
  ## ex: https://myadxresource.australiasoutheast.kusto.windows.net
  # endpoint_url = ""

  ## The Azure Data Explorer database that the metrics will be ingested into.
  ## The plugin will NOT generate this database automatically, it's expected that this database already exists before ingestion.
  ## ex: "exampledatabase"
  # database = ""

  ## Timeout for Azure Data Explorer operations
  # timeout = "20s"

  ## Type of metrics grouping used when pushing to Azure Data Explorer.
  ## Default is "TablePerMetric" for one table per different metric.
  ## For more information, please check the plugin README.
  # metrics_grouping_type = "TablePerMetric"

  ## Name of the single table to store all the metrics (Only needed if metrics_grouping_type is "SingleTable").
  # table_name = ""
```
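For reference, a filled-in configuration could look like the following sketch; the endpoint URL and database name reuse the examples from the comments above and must be replaced with your own resources:

```toml
[[outputs.azure_data_explorer]]
  ## Example values taken from the comments above - replace with your own cluster and database
  endpoint_url = "https://myadxresource.australiasoutheast.kusto.windows.net"
  database = "exampledatabase"
  timeout = "20s"
  metrics_grouping_type = "TablePerMetric"
```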
## Metrics Grouping
Metrics can be grouped in two ways before being sent to Azure Data Explorer. To specify which grouping type the plugin should use, set the `metrics_grouping_type` option in the config file. If no value is given, the metrics are grouped using `TablePerMetric` by default.
### TablePerMetric
The plugin will group the metrics by metric name and send each group of metrics to a dedicated Azure Data Explorer table. If the table doesn't exist, the plugin will create it; if the table exists, the plugin will try to merge the Telegraf metric schema into the existing table. For more information about the merge process, check the `.create-merge` documentation.

The table name will match the `name` property of the metric, which means the name of the metric should comply with the Azure Data Explorer table naming constraints, in particular if you plan to add a prefix to the metric name.
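As an illustration, for a Telegraf metric named `cpu` the plugin would issue a command equivalent to the following; the metric name `cpu` is purely a hypothetical example (see the Table Schema section below for the general form):

```kql
.create-merge table ['cpu'] (['fields']:dynamic, ['name']:string, ['tags']:dynamic, ['timestamp']:datetime)
```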
### SingleTable
The plugin will send all the metrics it receives to a single Azure Data Explorer table. The name of the table must be supplied via the `table_name` option in the config file. If the table doesn't exist, the plugin will create it; if the table exists, the plugin will try to merge the Telegraf metric schema into the existing table. For more information about the merge process, check the `.create-merge` documentation.
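A minimal sketch of a SingleTable configuration, assuming a hypothetical table name `telegraf_metrics`:

```toml
[[outputs.azure_data_explorer]]
  endpoint_url = "https://myadxresource.australiasoutheast.kusto.windows.net"
  database = "exampledatabase"
  metrics_grouping_type = "SingleTable"
  ## Hypothetical name - all metrics will be ingested into this one table
  table_name = "telegraf_metrics"
```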
## Table Schema
The schema of the Azure Data Explorer table will match the structure of the Telegraf `Metric` object. The corresponding Azure Data Explorer command would be like the following:

```kql
.create-merge table ['table-name'] (['fields']:dynamic, ['name']:string, ['tags']:dynamic, ['timestamp']:datetime)
```
The corresponding table mapping would be like the following:

```kql
.create-or-alter table ['table-name'] ingestion json mapping 'table-name_mapping' '[{"column":"fields", "Properties":{"Path":"$[\'fields\']"}},{"column":"name", "Properties":{"Path":"$[\'name\']"}},{"column":"tags", "Properties":{"Path":"$[\'tags\']"}},{"column":"timestamp", "Properties":{"Path":"$[\'timestamp\']"}}]'
```
Note: This plugin will automatically create the Azure Data Explorer tables and the corresponding table mappings using the commands above. Since the `Metric` object is a complex type, the only supported output format is JSON.
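To illustrate the mapping, a single Telegraf metric serialized to JSON could look like the following sketch; the measurement name, tags, and fields here are hypothetical:

```json
{
  "fields": {"usage_idle": 91.5, "usage_user": 5.2},
  "name": "cpu",
  "tags": {"cpu": "cpu-total", "host": "myhost"},
  "timestamp": 1624456789
}
```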
## Authentication

### Supported Authentication Methods
This plugin provides several types of authentication. The plugin will check the existence of several specific environment variables and, based on them, choose the right method. These methods are:

- AAD Application Tokens (Service Principals with secrets or certificates).

  For guidance on how to create and register an App in Azure Active Directory check this article, and for more information on the Service Principals check this article.

- AAD User Tokens

  Allows Telegraf to authenticate like a user. This method is mainly used for development purposes only.

- Managed Service Identity (MSI) token

  If you are running Telegraf from an Azure VM or Azure infrastructure, then this is the preferred authentication method.
Whichever method is chosen, the designated Principal needs to be assigned the Database User role on the Database level in Azure Data Explorer. This role will allow the plugin to create the required tables and ingest data into them.
### Configurations of the chosen Authentication Method

The plugin will authenticate using the first available of the following configurations. It's important to understand that the assessment, and consequently the choice of authentication method, will happen in the order listed below:
1. Client Credentials: Azure AD Application ID and Secret.

   Set the following environment variables:

   - `AZURE_TENANT_ID`: Specifies the Tenant to which to authenticate.
   - `AZURE_CLIENT_ID`: Specifies the app client ID to use.
   - `AZURE_CLIENT_SECRET`: Specifies the app secret to use.

2. Client Certificate: Azure AD Application ID and X.509 Certificate.

   - `AZURE_TENANT_ID`: Specifies the Tenant to which to authenticate.
   - `AZURE_CLIENT_ID`: Specifies the app client ID to use.
   - `AZURE_CERTIFICATE_PATH`: Specifies the certificate path to use.
   - `AZURE_CERTIFICATE_PASSWORD`: Specifies the certificate password to use.

3. Resource Owner Password: Azure AD User and Password. This grant type is not recommended; use device login instead if you need interactive login.

   - `AZURE_TENANT_ID`: Specifies the Tenant to which to authenticate.
   - `AZURE_CLIENT_ID`: Specifies the app client ID to use.
   - `AZURE_USERNAME`: Specifies the username to use.
   - `AZURE_PASSWORD`: Specifies the password to use.

4. Azure Managed Service Identity: Delegate credential management to the platform. Requires that the code is running in Azure, e.g. on a VM. All configuration is handled by Azure. See Azure Managed Service Identity for more details. Only available when using the Azure Resource Manager.
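As a sketch, configuring the Client Credentials method on a Linux host could look like the following before starting Telegraf; all values are placeholders:

```sh
# Placeholders - substitute your own tenant ID, app (client) ID, and secret
export AZURE_TENANT_ID="00000000-0000-0000-0000-000000000000"
export AZURE_CLIENT_ID="11111111-1111-1111-1111-111111111111"
export AZURE_CLIENT_SECRET="my-app-secret"

telegraf --config telegraf.conf
```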
## Querying collected metrics data in Azure Data Explorer

With all of the above configurations in place, the data is stored in the following standard format for each metric type, as an Azure Data Explorer table:
| ColumnName | ColumnType |
|---|---|
| fields | dynamic |
| name | string |
| tags | dynamic |
| timestamp | datetime |
As "fields" and "tags" are of dynamic data type so following multiple ways to query this data -
- Query JSON attributes directly: This is one of the coolest feature of Azure Data Explorer so you can run query like this -
Tablename | where fields.size_kb == 9120 - Use Update policy: to transform data, in this case, to flatten dynamic data type columns. This is the recommended performant way for querying over large data volumes compared to querying directly over JSON attributes.
There are two ways to flatten dynamic columns as explained below. You can use either of these ways in above mentioned update policy function - 'Transform_TargetTableName()'// Function to transform data .create-or-alter function Transform_TargetTableName() { SourceTableName | extend clerk_type = tags.clerk_type | extend host = tags.host } // Create the destination table (if it doesn't exist already) .set-or-append TargetTableName <| Transform_TargetTableName() | limit 0 // Apply update policy on destination table .alter table TargetTableName policy update @'[{"IsEnabled": true, "Source": "SourceTableName", "Query": "Transform_TargetTableName()", "IsTransactional": false, "PropagateIngestionProperties": false}]'-
Use bag_unpack plugin to unpack the dynamic columns as shown below. This method will unpack all columns, it could lead to issues in case source schema changes.
Tablename | evaluate bag_unpack(tags) | evaluate bag_unpack(fields) -
Use extend operator as shown below. This is the best way provided you know what columns are needed in the final destination table. Another benefit of this method is even if schema changes, it will not break your queries or dashboards.
Tablename | extend clerk_type = tags.clerk_type | extend host = tags.host
-
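Once the update policy is in place, dashboards and queries can read the flattened table directly; for example (using the hypothetical `TargetTableName` and columns from the sketch above):

```kql
TargetTableName
| where host == "myhost"
| take 10
```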