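I'm looking at setting up an OpenTelemetry pipeline that uses the OpenTelemetry Collector to route metric data to different backends. I'd like to let the Datadog instrumentation already in our code send metrics to the OpenTelemetry deployment, using the statsd receiver together with the datadog and file exporters. This is my test configuration: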
receivers:
  statsd/2:
    endpoint: "localhost:9125"
    aggregation_interval: 10s
exporters:
  datadog:
    api:
      site: datadoghq.com
      key: ${env:DD_API_KEY}
  file/no_rotation:
    path: /tmp/otel_exporter
service:
  pipelines:
    metrics:
      receivers: [statsd/2]
      processors: []
      exporters: [datadog, file/no_rotation]
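For testing purposes, I'm running the Datadog agent and the OTel Collector with the statsd receiver side by side (on different ports) and sending data to both in parallel, like so: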
watch -n 1 'echo "test.metricA:1|c" | nc -w 1 -u -c localhost 9125'
watch -n 1 'echo "test.metricB:1|c" | nc -w 1 -u -c localhost 8125'
So both receive a ping with a value of 1 once per second. Given that my aggregation interval is 10 seconds, I should get a count of roughly 10 per interval.
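To make that expectation concrete, here is a rough sketch of the arithmetic I assume should happen within one 10-second window. The `sum_counters` helper is hypothetical and written only for illustration; it is not how the statsd receiver is implemented, and it ignores sample rates and tags.

```shell
# Hypothetical helper (illustration only): sum the values of StatsD counter
# lines of the form "name:value|c" read from stdin, grouped by metric name.
sum_counters() {
  awk -F'[:|]' '$3 == "c" { total[$1] += $2 } END { for (name in total) print name, total[name] }'
}

# Ten pings of "test.metricA:1|c", i.e. what arrives during one 10s interval.
for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "test.metricA:1|c"
done | sum_counters
# prints: test.metricA 10
```

In other words, ten counter increments of 1 inside one interval should come out as a single data point with a value of 10, not ten data points with a value of 1.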
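However, for some reason the statsd receiver does not seem to aggregate these correctly. When I look at the file exporter output, I see multiple values stored with the same timestamp, each with a count of 1.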
{
"resourceMetrics": [
{
"resource": {},
"scopeMetrics": [
{
"scope": {
"name": "otelcol/statsdreceiver",
"version": "0.85.0"
},
"metrics": [
{
"name": "test.metricA",
"sum": {
"dataPoints": [
{
"startTimeUnixNano": "1696956608876128922",
"timeUnixNano": "1696956618875880086",
"asInt": "1"
}
],
"aggregationTemporality": 1
}
}
]
}
]
}
]
}
{
"resourceMetrics": [
{
"resource": {},
"scopeMetrics": [
{
"scope": {
"name": "otelcol/statsdreceiver",
"version": "0.85.0"
},
"metrics": [
{
"name": "test.metricA",
"sum": {
"dataPoints": [
{
"startTimeUnixNano": "1696956608876128922",
"timeUnixNano": "1696956618875880086",
"asInt": "1"
}
],
"aggregationTemporality": 1
}
}
]
}
]
}
]
}
{
"resourceMetrics": [
{
"resource": {},
"scopeMetrics": [
{
"scope": {
"name": "otelcol/statsdreceiver",
"version": "0.85.0"
},
"metrics": [
{
"name": "test.metricA",
"sum": {
"dataPoints": [
{
"startTimeUnixNano": "1696956608876128922",
"timeUnixNano": "1696956618875880086",
"asInt": "1"
}
],
"aggregationTemporality": 1
}
}
]
}
]
}
]
}
{
"resourceMetrics": [
{
"resource": {},
"scopeMetrics": [
{
"scope": {
"name": "otelcol/statsdreceiver",
"version": "0.85.0"
},
"metrics": [
{
"name": "test.metricA",
"sum": {
"dataPoints": [
{
"startTimeUnixNano": "1696956608876128922",
"timeUnixNano": "1696956618875880086",
"asInt": "1"
}
],
"aggregationTemporality": 1
}
}
]
}
]
}
]
}
{
"resourceMetrics": [
{
"resource": {},
"scopeMetrics": [
{
"scope": {
"name": "otelcol/statsdreceiver",
"version": "0.85.0"
},
"metrics": [
{
"name": "test.metricA",
"sum": {
"dataPoints": [
{
"startTimeUnixNano": "1696956608876128922",
"timeUnixNano": "1696956618875880086",
"asInt": "1"
}
],
"aggregationTemporality": 1
}
}
]
}
]
}
]
}
{
"resourceMetrics": [
{
"resource": {},
"scopeMetrics": [
{
"scope": {
"name": "otelcol/statsdreceiver",
"version": "0.85.0"
},
"metrics": [
{
"name": "test.metricA",
"sum": {
"dataPoints": [
{
"startTimeUnixNano": "1696956608876128922",
"timeUnixNano": "1696956618875880086",
"asInt": "1"
}
],
"aggregationTemporality": 1
}
}
]
}
]
}
]
}
{
"resourceMetrics": [
{
"resource": {},
"scopeMetrics": [
{
"scope": {
"name": "otelcol/statsdreceiver",
"version": "0.85.0"
},
"metrics": [
{
"name": "test.metricA",
"sum": {
"dataPoints": [
{
"startTimeUnixNano": "1696956608876128922",
"timeUnixNano": "1696956618875880086",
"asInt": "1"
}
],
"aggregationTemporality": 1
}
}
]
}
]
}
]
}
{
"resourceMetrics": [
{
"resource": {},
"scopeMetrics": [
{
"scope": {
"name": "otelcol/statsdreceiver",
"version": "0.85.0"
},
"metrics": [
{
"name": "test.metricA",
"sum": {
"dataPoints": [
{
"startTimeUnixNano": "1696956608876128922",
"timeUnixNano": "1696956618875880086",
"asInt": "1"
}
],
"aggregationTemporality": 1
}
}
]
}
]
}
]
}
{
"resourceMetrics": [
{
"resource": {},
"scopeMetrics": [
{
"scope": {
"name": "otelcol/statsdreceiver",
"version": "0.85.0"
},
"metrics": [
{
"name": "test.metricA",
"sum": {
"dataPoints": [
{
"startTimeUnixNano": "1696956608876128922",
"timeUnixNano": "1696956618875880086",
"asInt": "1"
}
],
"aggregationTemporality": 1
}
}
]
}
]
}
]
}
{
"resourceMetrics": [
{
"resource": {},
"scopeMetrics": [
{
"scope": {
"name": "otelcol/statsdreceiver",
"version": "0.85.0"
},
"metrics": [
{
"name": "test.metricA",
"sum": {
"dataPoints": [
{
"startTimeUnixNano": "1696956608876128922",
"timeUnixNano": "1696956618875880086",
"asInt": "1"
}
],
"aggregationTemporality": 1
}
}
]
}
]
}
]
}
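This appears to be sent to Datadog in the same form, which results in invalid counts:
I don't know exactly what is happening, but Datadog only shows one value out of the 10 in each time block. My guess is that some kind of deduplication in Datadog drops the other values, or perhaps only one value is actually sent? Either way, since we have a 10-second aggregation interval, I would expect all 10 of these metrics to be merged into one and sent to Datadog and the file exporter like this: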
{
"resourceMetrics": [
{
"resource": {},
"scopeMetrics": [
{
"scope": {
"name": "otelcol/statsdreceiver",
"version": "0.85.0"
},
"metrics": [
{
"name": "test.metricA",
"sum": {
"dataPoints": [
{
"startTimeUnixNano": "1696956608876128922",
"timeUnixNano": "1696956618875880086",
"asInt": "10"
}
],
"aggregationTemporality": 1
}
}
]
}
]
}
]
}
What am I doing wrong, and why isn't this working? I find it hard to believe that the statsd receiver in OTel is simply broken.