使用 PowerQuery 导入键值文本文件

使用 PowerQuery 导入键值文本文件

我需要将防火墙中的一些大型日志文件导入 Excel。截断版本(每个记录有近 100 对)如下所示:

"itime=1682240966","date=""2023-04-23""","time=""18:39:26""","devid=""FG101FTK21000840""","vd=""root""","type=""traffic""","subtype=""forward"""
"itime=1682240960","date=""2023-04-23""","time=""18:39:20""","devid=""FG101FTK21000840""","vd=""root""","type=""traffic""","subtype=""forward"""

开箱即用,PowerQuery 可以很好地导入这些内容,创建键值对的文本列:

   Column1       |     Column2
itime=1682240960 | date="2023-04-23" ... etc

基于分隔符的简单转换/提取即可获得数据。

如果真那么容易就好了。

由于某些奇怪的原因,只有 Fortinet 知道防火墙的导出格式不一致,并且没有两个 CSV 完全相同,这完全扰乱了基于预期哪一列中的数据的导入。

现在,我可以用 VBA 编写脚本,或者把它带到外面用 Python 执行,但我忍不住想到,在 Excel / PowerQuery 中一定有一种更优雅的方式来做到这一点

答案1

如果唯一的区别是某些行不包含所有对key=value,那么在 Power Query 中处理起来相对容易。

鉴于这种csv

"itime=1682240966","date=""2023-04-23""","time=""18:39:26""","devid=""FG101FTK21000840""","vd=""root""","type=""traffic""","subtype=""forward"""
"itime=1682240960","date=""2023-04-23""","time=""18:39:20""","devid=""FG101FTK21000840""","vd=""root""","type=""traffic""","subtype=""forward"""
"itime=1682240966","date=""2023-04-23""","time=""18:40:26""","vd=""root""","type=""traffic""","subtype=""forward"""
"itime=1682240960","date=""2023-04-23""","time=""18:10:20""","devid=""FG101FTK21000840""","type=""traffic""","subtype=""forward"""

阅读代码注释并检查应用步骤以了解算法。

let

//change next line to reflect your actual csv document
    Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\New Text Document.txt"),[Delimiter=",", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None]),

//add an index column representing the row number
    #"Add Row Index" = Table.AddIndexColumn(Source,"Row",0,1,Int64.Type),

//unpivot columns other than the Row column to => a three column table
  //Remove the Attribute column (which would be the column headers
  //And any rows with no key=value pairs
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Add Row Index", {"Row"}, "Attribute", "Value"),
    #"Filtered Rows" = Table.SelectRows(#"Unpivoted Other Columns", each ([Value] <> "")),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Attribute"}),

//Add a custom column consisting of only the "Key" portion of the pair
    #"Add Key Column" = Table.AddColumn(#"Removed Columns", "Key", each Text.Split([Value],"="){0}, type text),

//Create a list of the eventual Column Names
    #"Col Names" = List.Distinct(#"Add Key Column"[Key]),

//Group by Row
//then Pivot each subtable
// will => error if any Key=Value pair is duplicated
    #"Grouped Rows" = Table.Group(#"Add Key Column", {"Row"}, {
        {"Pivot", (t)=>Table.Pivot(t, #"Col Names","Key", "Value")}}),

//Remove now unneeded Row column and expand the Pivoted tables
    #"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"Row"}),
    #"Expanded Pivot" = Table.ExpandTableColumn(#"Removed Columns1", "Pivot", #"Col Names" ),
    #"Set Data Type" = Table.TransformColumnTypes(#"Expanded Pivot", List.Transform(#"Col Names", each {_, type text}))
in
    #"Set Data Type"

结果
在此处输入图片描述

相关内容