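I extracted GCP billing data through BigQuery. I ran a query to retrieve the data for a specific time period and found two types of duplicate data:

1. Duplicates caused by unnesting the labels column.
2. Duplicates produced by the plain query (shown below), where every column of the affected rows has exactly the same value.
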
```sql
SELECT
  billing_account_id AS id,
  service.id AS ServiceId,
  service.description AS ServiceDescription,
  sku.id AS SkuId,
  sku.description AS SkuDescription,
  usage_start_time,
  usage_end_time,
  project.id AS ProjectId,
  project.number AS ProjectNumber,
  project.name AS ProjectName,
  project.labels AS ProjectLabels,
  project.ancestry_numbers AS ProjectAncestryNumbers,
  labels AS Labels,
  system_labels AS SystemLabels,
  location.location AS Location,
  location.country AS Country,
  location.region AS Region,
  location.zone AS Zone,
  SUBSTRING(CAST(export_time AS STRING), 1, 19) AS ExportTime,
  cost AS Cost,
  currency AS Currency,
  currency_conversion_rate AS CurrencyConversionRate,
  usage.amount AS UsageAmount,
  usage.unit AS UsageUnit,
  usage.amount_in_pricing_units AS UsageAmountInPricingUnits,
  usage.pricing_unit AS UsagePricingUnit,
  credits AS Credits,
  invoice.month AS InvoiceMonth,
  cost_type AS CostType,
  adjustment_info.id AS adjustmentInfoId,
  adjustment_info.description AS adjustmentInfoDescription,
  adjustment_info.mode AS adjustmentInfoMode,
  adjustment_info.type AS adjustmentInfoType
FROM
  'NAME OF TABLE'
WHERE DATE(_PARTITIONTIME) BETWEEN subtractFromTime(utcnow(),60,'Day','yyyy-MM-dd') AND formatDateTime(utcnow(),'yyyy-MM-dd')
```
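
To make the two cases concrete, here is a minimal Python sketch that uses made-up rows, not my real billing data. It assumes that unnesting behaves like a cross join that emits one output row per label, and that "duplicate" means every field is identical. The field names and the helper functions `unnest_labels` and `count_exact_duplicates` are hypothetical and only serve the illustration.

```python
from collections import Counter

# Made-up rows for illustration only; real export rows have many more columns.
rows = [
    {"sku": "A", "cost": 10.0, "labels": [("env", "prod"), ("team", "data")]},
    {"sku": "B", "cost": 5.0, "labels": [("env", "prod")]},
    {"sku": "B", "cost": 5.0, "labels": [("env", "prod")]},  # identical to the row above
]


def unnest_labels(rows):
    """Mimic a cross join with the unnested labels: one output row per (row, label) pair."""
    out = []
    for row in rows:
        for key, value in row["labels"]:
            out.append(
                {"sku": row["sku"], "cost": row["cost"],
                 "label_key": key, "label_value": value}
            )
    return out


def count_exact_duplicates(rows):
    """Return the rows that occur more than once, keyed by all of their field values."""
    counts = Counter((r["sku"], r["cost"], tuple(r["labels"])) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Case 1: summing cost after unnesting counts a row once per label it carries.
total_before_unnest = sum(r["cost"] for r in rows)
unnested = unnest_labels(rows)
total_after_unnest = sum(r["cost"] for r in unnested)

# Restricting to a single label key brings back one output row per source row
# (for source rows that carry that key), so the sum is no longer inflated.
total_for_env_key = sum(r["cost"] for r in unnested if r["label_key"] == "env")

# Case 2: rows whose every field is identical, as returned by the plain query.
duplicates = count_exact_duplicates(rows)

print(total_before_unnest, total_after_unnest, total_for_env_key)
print(duplicates)
```

In this toy data the sum grows from 20.0 to 30.0 after unnesting because the row with two labels is counted twice, which matches the first type of duplication I see. The second type corresponds to the two identical `B` rows, and that is the case I cannot explain.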
**Why is there duplicate data? How can it be handled, and how can I be sure that the cost calculation has not been affected by it?**
I would appreciate it if anyone could help me.
Best,
Shokoufeh