docs: updated azure data explorer plugin documentation (#9816)

Minni Walia 2021-10-05 21:51:45 +00:00 committed by GitHub
# Azure Data Explorer output plugin
This plugin writes data collected by any of the Telegraf input plugins to [Azure Data Explorer](https://azure.microsoft.com/en-au/services/data-explorer/).
Azure Data Explorer is a distributed, columnar store, purpose-built for any type of logs, metrics, and time-series data.
## Pre-requisites:
- [Create Azure Data Explorer cluster and database](https://docs.microsoft.com/en-us/azure/data-explorer/create-cluster-database-portal)
- VM/compute or container to host Telegraf - it could be hosted locally where an app/service to be monitored is deployed, or remotely on a dedicated monitoring compute/container.
## Configuration:
```toml
# database = ""

## Timeout for Azure Data Explorer operations
# timeout = "20s"

## Type of metrics grouping used when pushing to Azure Data Explorer.
## Default is "TablePerMetric" for one table per different metric.
# metrics_grouping_type = "TablePerMetric"

## Name of the single table to store all the metrics (Only needed if metrics_grouping_type is "SingleTable").
# table_name = ""
```
## Metrics Grouping
### TablePerMetric
The plugin will send each metric type to its own Azure Data Explorer table (one table per different metric). The table name will match the `name` property of the metric, which means the metric name should comply with the Azure Data Explorer table naming constraints.
### SingleTable
The plugin will send all the metrics received to a single Azure Data Explorer table. The name of the table must be supplied via `table_name` in the config file. If the table doesn't exist, the plugin will create it; if the table exists, the plugin will try to merge the Telegraf metric schema into the existing table. For more information about the merge process, check the [`.create-merge` documentation](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/management/create-merge-table-command).
## Tables Schema
The schema of the Azure Data Explorer table will match the structure of the Telegraf `Metric` object. The corresponding Azure Data Explorer command generated by the plugin would be like the following:
```
.create-merge table ['table-name'] (['fields']:dynamic, ['name']:string, ['tags']:dynamic, ['timestamp']:datetime)
```
The corresponding table mapping would be like the following:

```
.create-or-alter table ['table-name'] ingestion json mapping 'table-name_mapping' '[{"column":"fields", "Properties":{"Path":"$[\'fields\']"}},{"column":"name", "Properties":{"Path":"$[\'name\']"}},{"column":"tags", "Properties":{"Path":"$[\'tags\']"}},{"column":"timestamp", "Properties":{"Path":"$[\'timestamp\']"}}]'
```
**Note**: This plugin will automatically create the Azure Data Explorer tables and the corresponding table mappings as per the above-mentioned commands.
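To confirm what the plugin created, you can list the tables and inspect the generated ingestion mapping. A minimal sketch, reusing the `['table-name']` placeholder from the commands above:

```
.show tables
.show table ['table-name'] ingestion json mappings
```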
## Authentication
The plugin will authenticate using the first available of the following configurations. **It's important to understand that the assessment, and consequently the choice of authentication method, happens in the order listed below**:
1. **Client Credentials**: Azure AD Application ID and Secret.
Set the following environment variables:
- `AZURE_TENANT_ID`: Specifies the Tenant to which to authenticate.
- `AZURE_CLIENT_ID`: Specifies the app client ID to use.
- `AZURE_CLIENT_SECRET`: Specifies the app secret to use.
## Querying data collected in Azure Data Explorer
With all of the above configurations in place, the data for each metric type will be stored in an Azure Data Explorer table in the following standard format -
ColumnName | ColumnType
---------- | ----------
fields | dynamic
name | string
tags | dynamic
timestamp | datetime
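As a quick sanity check, the raw rows in this format can be previewed with a simple query, using `Tablename` as a placeholder for your metric table:

```
Tablename
| where timestamp > ago(1h)
| take 10
```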
Examples of data transformations and queries that would be useful to gain insights -
1. **Data collected using SQL input plugin**
Sample SQL metrics data -
name | tags | timestamp | fields
-----|------|-----------|-------
sqlserver_database_io|{"database_name":"azure-sql-db2","file_type":"DATA","host":"adx-vm","logical_filename":"tempdev","measurement_db_type":"AzureSQLDB","physical_filename":"tempdb.mdf","replica_updateability":"READ_WRITE","sql_instance":"adx-sql-server"}|2021-09-09T13:51:20Z|{"current_size_mb":16,"database_id":2,"file_id":1,"read_bytes":2965504,"read_latency_ms":68,"reads":47,"rg_read_stall_ms":42,"rg_write_stall_ms":0,"space_used_mb":0,"write_bytes":1220608,"write_latency_ms":103,"writes":149}
sqlserver_waitstats|{"database_name":"azure-sql-db2","host":"adx-vm","measurement_db_type":"AzureSQLDB","replica_updateability":"READ_WRITE","sql_instance":"adx-sql-server","wait_category":"Worker Thread","wait_type":"THREADPOOL"}|2021-09-09T13:51:20Z|{"max_wait_time_ms":15,"resource_wait_ms":4469,"signal_wait_time_ms":0,"wait_time_ms":4469,"waiting_tasks_count":1464}
Since the collected metrics object is of a complex type, "fields" and "tags" are stored as dynamic data type columns. There are multiple ways to query this data -
- **Query JSON attributes directly**: Azure Data Explorer provides the ability to query JSON data in raw format without parsing it, so JSON attributes can be queried directly in the following way -
```
Tablename
| where name == "sqlserver_azure_db_resource_stats" and todouble(fields.avg_cpu_percent) > 7
```
```
Tablename
| distinct tostring(tags.database_name)
```
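The raw JSON attributes can also feed aggregations directly. A minimal sketch based on the sample `sqlserver_database_io` data shown above:

```
Tablename
| where name == "sqlserver_database_io"
| summarize avg(todouble(fields.read_latency_ms)) by tostring(tags.database_name), bin(timestamp, 5m)
```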
**Note** - This approach could have a performance impact in case of large volumes of data; use the below-mentioned approach for such cases.
- **Use [Update policy](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/management/updatepolicy)**: Transform dynamic data type columns using update policy. This is the recommended performant way for querying over large volumes of data compared to querying directly over JSON attributes.
As "fields" and "tags" are of dynamic data type so following multiple ways to query this data -
1. **Query JSON attributes directly**: This is one of the coolest feature of Azure Data Explorer so you can run query like this -
```
Tablename
| where fields.size_kb == 9120
```
2. **Use [Update policy](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/management/updatepolicy)**: to transform data, in this case, to flatten dynamic data type columns. This is the recommended performant way for querying over large data volumes compared to querying directly over JSON attributes.
```
// Function to transform data
.create-or-alter function Transform_TargetTableName() {
    SourceTableName
    | mv-apply fields on (extend key = tostring(bag_keys(fields)[0]))
    | project fieldname=key, value=todouble(fields[key]), name, tags, timestamp
}

// Create destination table with above query's results schema (if it doesn't exist already)
.set-or-append TargetTableName <| Transform_TargetTableName() | limit 0

// Apply update policy on destination table
.alter table TargetTableName policy update
@'[{"IsEnabled": true, "Source": "SourceTableName", "Query": "Transform_TargetTableName()", "IsTransactional": true, "PropagateIngestionProperties": false}]'
```
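Once the update policy is in place, queries can target the flattened destination table instead of the raw JSON. A minimal sketch, assuming the `TargetTableName` used above and the sample SQL metrics:

```
TargetTableName
| where name == "sqlserver_database_io" and fieldname == "read_latency_ms"
| summarize avg(value) by bin(timestamp, 5m)
```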
2. **Data collected using syslog input plugin**
Sample syslog data -
name | tags | timestamp | fields
-----|------|-----------|-------
syslog|{"appname":"azsecmond","facility":"user","host":"adx-linux-vm","hostname":"adx-linux-vm","severity":"info"}|2021-09-20T14:36:44Z|{"facility_code":1,"message":" 2021/09/20 14:36:44.890110 Failed to connect to mdsd: dial unix /var/run/mdsd/default_djson.socket: connect: no such file or directory","procid":"2184","severity_code":6,"timestamp":"1632148604890477000","version":1}
syslog|{"appname":"CRON","facility":"authpriv","host":"adx-linux-vm","hostname":"adx-linux-vm","severity":"info"}|2021-09-20T14:37:01Z|{"facility_code":10,"message":" pam_unix(cron:session): session opened for user root by (uid=0)","procid":"26446","severity_code":6,"timestamp":"1632148621120781000","version":1}
There are multiple ways to flatten dynamic columns, using either the 'extend' operator or the 'bag_unpack' plugin. You can use either of these in the above-mentioned update policy function - 'Transform_TargetTableName()'
- Use the [extend](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/extendoperator) operator - This is the recommended approach compared to 'bag_unpack' as it is faster and more robust. Even if the schema changes, it will not break queries or dashboards.
```
Tablename
| extend facility_code=toint(fields.facility_code), message=tostring(fields.message), procid= tolong(fields.procid), severity_code=toint(fields.severity_code),
SysLogTimestamp=unixtime_nanoseconds_todatetime(tolong(fields.timestamp)), version= todouble(fields.version),
appname= tostring(tags.appname), facility= tostring(tags.facility),host= tostring(tags.host), hostname=tostring(tags.hostname), severity=tostring(tags.severity)
| project-away fields, tags
```
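These flattened columns can then be filtered and projected like regular columns. For example, a sketch that pulls recent error-level messages (severity codes 0-3) from the sample syslog data:

```
Tablename
| extend severity_code = toint(fields.severity_code), appname = tostring(tags.appname), message = tostring(fields.message)
| where severity_code <= 3
| project timestamp, appname, message
```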
- Use the [bag_unpack plugin](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/bag-unpackplugin) to unpack the dynamic type columns automatically. This method could lead to issues if the source schema changes, as it expands columns dynamically.
```
Tablename
| evaluate bag_unpack(tags, columnsConflict='replace_source')
| evaluate bag_unpack(fields, columnsConflict='replace_source')
```
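After unpacking, the tag columns behave like regular columns. A minimal sketch counting messages per severity and application over time:

```
Tablename
| evaluate bag_unpack(tags, columnsConflict='replace_source')
| summarize count() by severity, appname, bin(timestamp, 1h)
```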