Importing a key-value text file with Power Query

Answer 1

If the only variation is that some rows do not contain every key=value pair, this is relatively easy to handle in Power Query.

Given this CSV:

"itime=1682240966","date=""2023-04-23""","time=""18:39:26""","devid=""FG101FTK21000840""","vd=""root""","type=""traffic""","subtype=""forward"""
"itime=1682240960","date=""2023-04-23""","time=""18:39:20""","devid=""FG101FTK21000840""","vd=""root""","type=""traffic""","subtype=""forward"""
"itime=1682240966","date=""2023-04-23""","time=""18:40:26""","vd=""root""","type=""traffic""","subtype=""forward"""
"itime=1682240960","date=""2023-04-23""","time=""18:10:20""","devid=""FG101FTK21000840""","type=""traffic""","subtype=""forward"""

Read the code comments and step through the Applied Steps to understand the algorithm.

let

//change the path on the next line to point to your actual csv file
//Columns=7 matches the longest row in the sample above
    Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\New Text Document.txt"),[Delimiter=",", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None]),

//add an index column representing the row number
    #"Add Row Index" = Table.AddIndexColumn(Source,"Row",0,1,Int64.Type),

//unpivot columns other than the Row column to => a three column table
  //Remove the Attribute column (which holds the original column headers)
  //and any rows with no key=value pair
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Add Row Index", {"Row"}, "Attribute", "Value"),
    #"Filtered Rows" = Table.SelectRows(#"Unpivoted Other Columns", each ([Value] <> "")),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Attribute"}),

//Add a custom column consisting of only the "Key" portion of the pair
    #"Add Key Column" = Table.AddColumn(#"Removed Columns", "Key", each Text.Split([Value],"="){0}, type text),

//Create a list of the eventual Column Names
    #"Col Names" = List.Distinct(#"Add Key Column"[Key]),

//Group by Row
//then Pivot each subtable
// will => error if any Key is duplicated within a single row
    #"Grouped Rows" = Table.Group(#"Add Key Column", {"Row"}, {
        {"Pivot", (t)=>Table.Pivot(t, #"Col Names","Key", "Value")}}),

//Remove now unneeded Row column and expand the Pivoted tables
    #"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"Row"}),
    #"Expanded Pivot" = Table.ExpandTableColumn(#"Removed Columns1", "Pivot", #"Col Names" ),
    #"Set Data Type" = Table.TransformColumnTypes(#"Expanded Pivot", List.Transform(#"Col Names", each {_, type text}))
in
    #"Set Data Type"
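
As far as I can tell, the Value column in the query above still carries the whole pair text when it is pivoted, so each cell would hold something like date="2023-04-23" rather than just the date. If you want only the value in each cell, a variation along these lines might help. It is a minimal sketch, not part of the original query: the inline Pairs table is a hypothetical stand-in for the unpivoted data, and the step names are made up for illustration.

```powerquery
let
    // Hypothetical inline sample standing in for the unpivoted Row/Value table
    Pairs = #table(
        type table [Row = Int64.Type, Value = text],
        {
            {0, "itime=1682240966"},
            {0, "date=""2023-04-23"""},
            {1, "itime=1682240960"}
        }),

    // Key is the text before the first "="
    AddKey = Table.AddColumn(Pairs, "Key", each Text.BeforeDelimiter([Value], "="), type text),

    // Keep only the text after the first "=" and strip surrounding double quotes
    ValueOnly = Table.TransformColumns(
        AddKey,
        {{"Value", each Text.Trim(Text.AfterDelimiter(_, "="), """"), type text}}),

    // List of the eventual column names, in order of first appearance
    ColNames = List.Distinct(ValueOnly[Key]),

    // Pivot the whole table in one step; the Row column keeps each source line separate,
    // and a key that is missing from a row comes back as null
    Pivoted = Table.Pivot(ValueOnly, ColNames, "Key", "Value")
in
    Pivoted
```

Like the original, this errors if a key appears twice in the same row, because Table.Pivot is called without an aggregation function. Passing one as the optional fifth argument is the usual way around that.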

