Google Cloud billing data has duplicate rows

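I export GCP billing data through BigQuery. When I ran a query to retrieve the data for a specific time period, I found two kinds of duplicate data:

  1. Duplicates caused by unnesting the labels columns

  2. Duplicates produced by a normal query (shown below), where every column of the rows has exactly the same value.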

```
SELECT
  billing_account_id AS id,
  service.id AS ServiceId,
  service.description AS ServiceDescription,
  sku.id AS SkuId,
  sku.description AS SkuDescription,
  usage_start_time,
  usage_end_time,
  project.id AS ProjectId,
  project.number AS ProjectNumber,
  project.name AS ProjectName,
  project.labels AS ProjectLabels,
  project.ancestry_numbers AS ProjectAncestryNumbers,
  labels AS Labels,
  system_labels AS SystemLabels,
  location.location AS Location,
  location.country AS Country,
  location.region AS Region,
  location.zone AS Zone,
  SUBSTRING(CAST(export_time AS STRING), 1, 19) AS ExportTime,
  cost AS Cost,
  currency AS Currency,
  currency_conversion_rate AS CurrencyConversionRate,
  usage.amount AS UsageAmount,
  usage.unit AS UsageUnit,
  usage.amount_in_pricing_units AS UsageAmountInPricingUnits,
  usage.pricing_unit AS UsagePricingUnit,
  credits AS Credits,
  invoice.month AS InvoiceMonth,
  cost_type AS CostType,
  adjustment_info.id AS adjustmentInfoId,
  adjustment_info.description AS adjustmentInfoDescription,
  adjustment_info.mode AS adjustmentInfoMode,
  adjustment_info.type AS adjustmentInfoType
FROM
  'NAME OF TABLE'
WHERE DATE(_PARTITIONTIME) BETWEEN subtractFromTime(utcnow(),60,'Day','yyyy-MM-dd') AND formatDateTime(utcnow(),'yyyy-MM-dd')
```


**I'd like to know why there is duplicate data, how it can be handled, and how I can be sure that the cost calculation has not been affected by it.**

I'd appreciate it if anyone can help me.


Bests,
Shokoufeh

Answer 1

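1. Why are rows duplicated when unnesting the labels field?

When you unnest a field of repeated type (such as `labels`), duplicated rows are expected. To be precise, each row is repeated once per element of that row's `labels` array.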

[Image: unnest count vs. total number of labels]
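The comparison above can be reproduced with a query along these lines, followed by one way to read a label without multiplying rows. This is only a sketch: the table name `project.dataset.gcp_billing_export_v1_XXXXXX` is a placeholder, and the label key `env` is a hypothetical example, not something taken from the question.

```sql
-- Placeholder table name; replace it with your own billing export table.

-- 1) Rows produced by unnesting vs. total number of label entries.
--    The two numbers should match, because each row is repeated once per label.
SELECT
  (SELECT COUNT(*)
   FROM `project.dataset.gcp_billing_export_v1_XXXXXX` AS t,
        UNNEST(t.labels) AS l) AS unnested_rows,
  (SELECT SUM(ARRAY_LENGTH(labels))
   FROM `project.dataset.gcp_billing_export_v1_XXXXXX`) AS total_labels;

-- 2) Cost per value of a single label key ('env' is hypothetical).
--    The scalar subquery reads the label inside each row, so every source row
--    is counted once and SUM(cost) is not inflated by the unnest.
SELECT
  (SELECT value FROM UNNEST(labels) WHERE key = 'env') AS env,
  SUM(cost) AS total_cost
FROM `project.dataset.gcp_billing_export_v1_XXXXXX`
GROUP BY env
ORDER BY total_cost DESC;
```

Rows that do not carry the chosen key end up in a `NULL` group instead of disappearing, so the per-label totals still add up to the overall cost.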

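2. Why are there duplicate rows even before unnesting?

If you create 2 Compute Engine VMs with exactly the same configuration and location, the idle-usage rows for these two VMs in the billing export are exactly the same unless the VMs are labeled differently. The export table has no explicitly exposed primary key.

The granularity of the export table only goes down to the service and SKU, not to the individual resource. This makes the data look duplicated, but the rows are in fact valid usage.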
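To see how many rows are identical in every column, and to check the totals, something like the following could be used. It is a sketch under assumptions: the table name is a placeholder, and the 60-day window simply mirrors the period in the question. Because identical rows are valid usage, they should be kept (no `DISTINCT`) when summing; comparing the monthly total with the invoice is one way to confirm the cost calculation is not affected.

```sql
-- Placeholder table name; replace it with your own billing export table.

-- 1) Groups of rows that are identical in every column, with their copy count.
SELECT
  TO_JSON_STRING(t) AS row_json,
  COUNT(*) AS copies
FROM `project.dataset.gcp_billing_export_v1_XXXXXX` AS t
WHERE DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 60 DAY)
GROUP BY row_json
HAVING COUNT(*) > 1
ORDER BY copies DESC;

-- 2) Total per invoice month over all rows, identical ones included.
--    Credits are summed inside each row, so no unnest multiplies the rows.
SELECT
  invoice.month AS invoice_month,
  SUM(cost)
    + SUM(IFNULL((SELECT SUM(c.amount) FROM UNNEST(credits) AS c), 0)) AS total
FROM `project.dataset.gcp_billing_export_v1_XXXXXX`
GROUP BY invoice_month
ORDER BY invoice_month;
```

If the total from the second query matches the invoice for that month, the identical rows are legitimate line items, and removing them would undercount the cost.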
