我需要将防火墙中的一些大型日志文件导入 Excel。截断版本(每个记录有近 100 对)如下所示:
"itime=1682240966","date=""2023-04-23""","time=""18:39:26""","devid=""FG101FTK21000840""","vd=""root""","type=""traffic""","subtype=""forward"""
"itime=1682240960","date=""2023-04-23""","time=""18:39:20""","devid=""FG101FTK21000840""","vd=""root""","type=""traffic""","subtype=""forward"""
开箱即用,PowerQuery 可以很好地导入这些内容,创建键值对的文本列:
Column1 | Column2
itime=1682240960 | date="2023-04-23" ... etc
基于分隔符的简单转换/提取即可获得数据。
如果真那么容易就好了。
由于某些奇怪的原因,只有 Fortinet 知道防火墙的导出格式不一致,并且没有两个 CSV 完全相同,这完全扰乱了基于预期哪一列中的数据的导入。
现在,我可以用 VBA 编写脚本,或者把它带到外面用 Python 执行,但我忍不住想到,在 Excel / PowerQuery 中一定有一种更优雅的方式来做到这一点
答案1
如果唯一的区别是某些行不包含所有对key=value
,那么在 Power Query 中处理起来相对容易。
鉴于这种csv
:
"itime=1682240966","date=""2023-04-23""","time=""18:39:26""","devid=""FG101FTK21000840""","vd=""root""","type=""traffic""","subtype=""forward"""
"itime=1682240960","date=""2023-04-23""","time=""18:39:20""","devid=""FG101FTK21000840""","vd=""root""","type=""traffic""","subtype=""forward"""
"itime=1682240966","date=""2023-04-23""","time=""18:40:26""","vd=""root""","type=""traffic""","subtype=""forward"""
"itime=1682240960","date=""2023-04-23""","time=""18:10:20""","devid=""FG101FTK21000840""","type=""traffic""","subtype=""forward"""
阅读代码注释并检查应用步骤以了解算法。
let
//change next line to reflect your actual csv document
Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\New Text Document.txt"),[Delimiter=",", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None]),
//add an index column representing the row number
#"Add Row Index" = Table.AddIndexColumn(Source,"Row",0,1,Int64.Type),
//unpivot columns other than the Row column to => a three column table
//Remove the Attribute column (which would be the column headers
//And any rows with no key=value pairs
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Add Row Index", {"Row"}, "Attribute", "Value"),
#"Filtered Rows" = Table.SelectRows(#"Unpivoted Other Columns", each ([Value] <> "")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Attribute"}),
//Add a custom column consisting of only the "Key" portion of the pair
#"Add Key Column" = Table.AddColumn(#"Removed Columns", "Key", each Text.Split([Value],"="){0}, type text),
//Create a list of the eventual Column Names
#"Col Names" = List.Distinct(#"Add Key Column"[Key]),
//Group by Row
//then Pivot each subtable
// will => error if any Key=Value pair is duplicated
#"Grouped Rows" = Table.Group(#"Add Key Column", {"Row"}, {
{"Pivot", (t)=>Table.Pivot(t, #"Col Names","Key", "Value")}}),
//Remove now unneeded Row column and expand the Pivoted tables
#"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"Row"}),
#"Expanded Pivot" = Table.ExpandTableColumn(#"Removed Columns1", "Pivot", #"Col Names" ),
#"Set Data Type" = Table.TransformColumnTypes(#"Expanded Pivot", List.Transform(#"Col Names", each {_, type text}))
in
#"Set Data Type"