将 Office 365 日志中的嵌套数组转换为表

将 Office 365 日志中的嵌套数组转换为表

我想发布一个基于 jq 的简单工具,将一些常见的 Office 365 统一审核日志事件转换为表格报告格式,但在某些关键数组的嵌套方式方面遇到了挑战。特别是,当我深入到包含 ID、路径集和包含消息 ID 和大小行的文件夹 [] 集时,我无法找到一种方法来使数组中的相关值保留在同步/整理 - 相反,我得到了每个值的大量组合,就好像我无意中迭代了它们一样。

这是一些示例数据:

{"CreationTime":"2024-02-06T12:13:14","Id":"abcdabcd-1234-1234-5555-888888888888","Operation":"MailItemsAccessed","ResultStatus":"Succeeded","UserId":"[email protected]","ClientIPAddress":"5.5.5.5","Folders":[{"FolderItems":[{"InternetMessageId":"<[email protected]>","SizeInBytes":12345},{"InternetMessageId":"<[email protected]>","SizeInBytes":11122},{"InternetMessageId":"<[email protected]>","SizeInBytes":88888}],"Id":"EEEEEEEE","Path":"\\Outbox"},{"FolderItems":[{"InternetMessageId":"<[email protected]>","SizeInBytes":44444},{"InternetMessageId":"<[email protected]>","SizeInBytes":100000},{"InternetMessageId":"<[email protected]>","SizeInBytes":109000},{"InternetMessageId":"<[email protected]>","SizeInBytes":22000},{"InternetMessageId":"<[email protected]>","SizeInBytes":333333}],"Id":"FFFFFFFFFFFFFFFFFAB","Path":"\\Inbox"}]}
{"CreationTime":"2024-02-06T20:00:00","Id":"abcdabcd-1234-1234-6666-9999999999999","Operation":"MailItemsAccessed","ResultStatus":"Succeeded","UserId":"[email protected]","ClientIPAddress":"7.7.7.7","Folders":{"FolderItems":[{"InternetMessageId":"<[email protected]>","SizeInBytes":77777},{"InternetMessageId":"<[email protected]>","SizeInBytes":888888},{"InternetMessageId":"<[email protected]>","SizeInBytes":99999}],"Id":"12341234","Path":"\\Temp"}}

期望的输出:

创建时间 ID 用户身份 客户端IP地址 文件夹ID 文件夹路径 互联网消息ID 大小(以字节为单位)
2024-02-06T12:13:14 abcdabcd-1234-1234-5555-888888888888 [电子邮件受保护] 5.5.5.5 EEEEEEEE \发件箱 [电子邮件受保护] 12345
2024-02-06T12:13:14 abcdabcd-1234-1234-5555-888888888888 [电子邮件受保护] 5.5.5.5 EEEEEEEE \发件箱 [电子邮件受保护] 11122
2024-02-06T12:13:14 abcdabcd-1234-1234-5555-888888888888 [电子邮件受保护] 5.5.5.5 EEEEEEEE \发件箱 [电子邮件受保护] 88888
2024-02-06T12:13:14 abcdabcd-1234-1234-5555-888888888888 [电子邮件受保护] 5.5.5.5 FFFFFFFFFFFFFFFFAB \收件箱 [电子邮件受保护] 44444
2024-02-06T12:13:14 abcdabcd-1234-1234-5555-888888888888 [电子邮件受保护] 5.5.5.5 FFFFFFFFFFFFFFFFAB \收件箱 [电子邮件受保护] 100000
2024-02-06T12:13:14 abcdabcd-1234-1234-5555-888888888888 [电子邮件受保护] 5.5.5.5 FFFFFFFFFFFFFFFFAB \收件箱 [电子邮件受保护] 109000
2024-02-06T12:13:14 abcdabcd-1234-1234-5555-888888888888 [电子邮件受保护] 5.5.5.5 FFFFFFFFFFFFFFFFAB \收件箱 [电子邮件受保护] 22000
2024-02-06T12:13:14 abcdabcd-1234-1234-5555-888888888888 [电子邮件受保护] 5.5.5.5 FFFFFFFFFFFFFFFFAB \收件箱 [电子邮件受保护] 333333
2024-02-06T20:00:00 12341234 [电子邮件受保护] 7.7.7.7 12341234 \温度 [电子邮件受保护] 77777
2024-02-06T20:00:00 12341234 [电子邮件受保护] 7.7.7.7 12341234 \温度 [电子邮件受保护] 888888
2024-02-06T20:00:00 12341234 [电子邮件受保护] 7.7.7.7 12341234 \温度 [电子邮件受保护] 99999

请注意,该.Folders元素有时可以采用字符串格式,但我可以使用 轻松有条件地加载fromjson。例如:

[...]"Folders": "[{\"FolderItems\":[{\"InternetMessageId\":\""Fo<[email protected]>\",\"SizeInBytes\":12345},[...]

到目前为止的代码:

cat | jq '
    if has("Folders") then
        if(.Folders | type=="string") and .Folders != "" then .Folders |= fromjson  end |
        if(.Folders | type=="string") and .Folders == "" then .Folders = null end
    end | .' |     # works up to here at least
    jq '
if has("Item") then .Item |= (if type=="string" and .!="" then fromjson else {} end) else .Item|={}  end |
    if has("Item") then
            if .Item | has("Id") then .ItemId = .Item.Id else .ItemId={} end |
            if .Item | has("ParentFolder") then
                .ItemParentFolderId=.Item.ParentFolder.Id? |
                    .ItemParentFolderPath=.Item.ParentFolder.Path? |
                    .ItemParentFolderName=.Item.ParentFolder.Name?
            end
        end | . ' | cat # works up to here at least
    jq '
    if has("Folders") then
        if (.Folders | select(type=="array")) then
            .Folders[].Id? |
            .FoldersPath=.Folders[].Path? |
            .FoldersFolderItems=.Folders[].FolderItems?
        else . end
    end
    ' |
jq -r '. | (.TimeGenerated // .CreationTime) as $EventTime |
.ClientIP = if .ClientIP == "" then null else .ClientIP end |
.ClientIP_ = if .ClientIP_ == "" then null else .ClientIP_ end |
.Client_IPAddress = if .Client_IPAddress == "" then null else .Client_IPAddress end |
.ClientIPAddress = if .ClientIPAddress == "" then null else .ClientIPAddress end |
.ActorIpAddress = if .ActorIpAddress == "" then null else .ActorIpAddress end |
(.ClientIP // .ClientIP_ // .Client_IPAddress // .ClientIPAddress // .ActorIpAddress) as $IPAddress |
(.UserId // .UserId_) as $LogonUser |
.FFIIMI as $InternetMessageId |
.FFISIB as $SizeInBytes |
{EventTime: $EventTime, IPAddress: $IPAddress, LogonUser: $LogonUser, InternetMessageId: $InternetMessageId, SizeInBytes: $SizeInBytes} + . |
[.Id, .EventTime, .IPAddress, .LogonUser, .MailboxOwnerUPN, .Operation, .InternetMessageId, .SizeInBytes] | @csv'

答案1

我从您作为示例数据提供的 JSON 开始。目前尚不清楚此 JSON 是否经过某种方式预处理。

由于顶级Folders数组似乎不是如果它包含单个项目,则它是一个数组,如果它还不是一个数组,我们首先需要将其转换为数组。在 中jq,我们可以这样做

.Folders |= (if type == "array" then . else [.] end)

其余转换的总体思路是将上层数据(即来自最顶层的一些键+值对以及来自每个元素的键)Id复制到最低层(即元素)。然后我们可以将每个元素转换为 CSV 记录。PathFoldersFolderItemsFolderItems

为了避免键具有重复的名称,我们还需要将Folders元素的Id键重命名为FolderId(并将Path同一级别重命名为FolderPath,以保持一致性)。

我们想要的最顶层的数据可以被挑选出来并使用内部变量向下传输

pick(.CreationTime, .Id, .UserId, .ClientIPAddress) as $record

这将$record创建

{
  "CreationTime": "2024-02-06T12:13:14",
  "Id": "abcdabcd-1234-1234-5555-888888888888",
  "UserId": "[email protected]",
  "ClientIPAddress": "5.5.5.5"
}

... 第一个 JSON 对象。我们获取的键并不是按照最终 CSV 输出中所需的顺序获取的。

然后我们可以只提取 的Folders元素,.Folders[]并为每个元素挑选出IdPath。由于我们要重命名它们,因此无法pick()再次使用,因此我们需要更多的动手操作:

.Folders[] | { FolderId: .Id, FolderPath: .Path } as $folder

然后我们可以使用它.FolderItems[]来获取一组FolderItems元素,我们可以在其前面添加$record和:$folder

.FolderItems[] | $record + $folder + .

作为单个jq表达式:

.Folders |= (if type == "array" then . else [.] end) |
pick(.CreationTime, .Id, .UserId, .ClientIPAddress) as $record |
.Folders[] | { FolderId: .Id, FolderPath: .Path } as $folder |
.FolderItems[] | $record + $folder + .

考虑到问题中的数据,结果:

{
  "CreationTime": "2024-02-06T12:13:14",
  "Id": "abcdabcd-1234-1234-5555-888888888888",
  "UserId": "[email protected]",
  "ClientIPAddress": "5.5.5.5",
  "FolderId": "EEEEEEEE",
  "FolderPath": "\\Outbox",
  "InternetMessageId": "<[email protected]>",
  "SizeInBytes": 12345
}
{
  "CreationTime": "2024-02-06T12:13:14",
  "Id": "abcdabcd-1234-1234-5555-888888888888",
  "UserId": "[email protected]",
  "ClientIPAddress": "5.5.5.5",
  "FolderId": "EEEEEEEE",
  "FolderPath": "\\Outbox",
  "InternetMessageId": "<[email protected]>",
  "SizeInBytes": 11122
}

(ETC。)

我个人会使用以下方法将这组 JSON 对象转换为 CSV磨坊主:

$ jq '.Folders |= (if type == "array" then . else [.] end) | pick(.CreationTime, .Id, .UserId, .ClientIPAddress) as $record | .Folders[] | { FolderId: .Id, FolderPath: .Path } as $folder | .FolderItems[] | $record + $folder + .' file | mlr --j2c cat
CreationTime,Id,UserId,ClientIPAddress,FolderId,FolderPath,InternetMessageId,SizeInBytes
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,EEEEEEEE,\Outbox,<[email protected]>,12345
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,EEEEEEEE,\Outbox,<[email protected]>,11122
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,EEEEEEEE,\Outbox,<[email protected]>,88888
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,FFFFFFFFFFFFFFFFFAB,\Inbox,<[email protected]>,44444
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,FFFFFFFFFFFFFFFFFAB,\Inbox,<[email protected]>,100000
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,FFFFFFFFFFFFFFFFFAB,\Inbox,<[email protected]>,109000
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,FFFFFFFFFFFFFFFFFAB,\Inbox,<[email protected]>,22000
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,FFFFFFFFFFFFFFFFFAB,\Inbox,<[email protected]>,333333
2024-02-06T20:00:00,abcdabcd-1234-1234-6666-9999999999999,[email protected],7.7.7.7,12341234,\Temp,<[email protected]>,77777
2024-02-06T20:00:00,abcdabcd-1234-1234-6666-9999999999999,[email protected],7.7.7.7,12341234,\Temp,<[email protected]>,888888
2024-02-06T20:00:00,abcdabcd-1234-1234-6666-9999999999999,[email protected],7.7.7.7,12341234,\Temp,<[email protected]>,99999

jq要使用 Miller 来转换为 CSV ,请将mlr命令替换为类似的命令

jq -s -r '(first|keys|@csv), map([.[]]|@csv)[]'

这将从第一个输入对象中选取键作为 CSV 标头,并将所有值格式化为每个对象的一条记录。

相关内容