我想发布一个基于 jq 的简单工具,将一些常见的 Office 365 统一审核日志事件转换为表格报告格式,但在某些关键数组的嵌套方式方面遇到了挑战。特别是,当我深入到包含 ID、路径集和包含消息 ID 和大小行的文件夹 [] 集时,我无法找到一种方法来使数组中的相关值保留在同步/整理 - 相反,我得到了每个值的大量组合,就好像我无意中迭代了它们一样。
这是一些示例数据:
{"CreationTime":"2024-02-06T12:13:14","Id":"abcdabcd-1234-1234-5555-888888888888","Operation":"MailItemsAccessed","ResultStatus":"Succeeded","UserId":"[email protected]","ClientIPAddress":"5.5.5.5","Folders":[{"FolderItems":[{"InternetMessageId":"<[email protected]>","SizeInBytes":12345},{"InternetMessageId":"<[email protected]>","SizeInBytes":11122},{"InternetMessageId":"<[email protected]>","SizeInBytes":88888}],"Id":"EEEEEEEE","Path":"\\Outbox"},{"FolderItems":[{"InternetMessageId":"<[email protected]>","SizeInBytes":44444},{"InternetMessageId":"<[email protected]>","SizeInBytes":100000},{"InternetMessageId":"<[email protected]>","SizeInBytes":109000},{"InternetMessageId":"<[email protected]>","SizeInBytes":22000},{"InternetMessageId":"<[email protected]>","SizeInBytes":333333}],"Id":"FFFFFFFFFFFFFFFFFAB","Path":"\\Inbox"}]}
{"CreationTime":"2024-02-06T20:00:00","Id":"abcdabcd-1234-1234-6666-9999999999999","Operation":"MailItemsAccessed","ResultStatus":"Succeeded","UserId":"[email protected]","ClientIPAddress":"7.7.7.7","Folders":{"FolderItems":[{"InternetMessageId":"<[email protected]>","SizeInBytes":77777},{"InternetMessageId":"<[email protected]>","SizeInBytes":888888},{"InternetMessageId":"<[email protected]>","SizeInBytes":99999}],"Id":"12341234","Path":"\\Temp"}}
期望的输出:
创建时间 | ID | 用户身份 | 客户端IP地址 | 文件夹ID | 文件夹路径 | 互联网消息ID | 大小(以字节为单位) |
---|---|---|---|---|---|---|---|
2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | [电子邮件受保护] | 5.5.5.5 | EEEEEEEE | \发件箱 | [电子邮件受保护] | 12345 |
2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | [电子邮件受保护] | 5.5.5.5 | EEEEEEEE | \发件箱 | [电子邮件受保护] | 11122 |
2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | [电子邮件受保护] | 5.5.5.5 | EEEEEEEE | \发件箱 | [电子邮件受保护] | 88888 |
2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | [电子邮件受保护] | 5.5.5.5 | FFFFFFFFFFFFFFFFAB | \收件箱 | [电子邮件受保护] | 44444 |
2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | [电子邮件受保护] | 5.5.5.5 | FFFFFFFFFFFFFFFFAB | \收件箱 | [电子邮件受保护] | 100000 |
2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | [电子邮件受保护] | 5.5.5.5 | FFFFFFFFFFFFFFFFAB | \收件箱 | [电子邮件受保护] | 109000 |
2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | [电子邮件受保护] | 5.5.5.5 | FFFFFFFFFFFFFFFFAB | \收件箱 | [电子邮件受保护] | 22000 |
2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | [电子邮件受保护] | 5.5.5.5 | FFFFFFFFFFFFFFFFAB | \收件箱 | [电子邮件受保护] | 333333 |
2024-02-06T20:00:00 | 12341234 | [电子邮件受保护] | 7.7.7.7 | 12341234 | \温度 | [电子邮件受保护] | 77777 |
2024-02-06T20:00:00 | 12341234 | [电子邮件受保护] | 7.7.7.7 | 12341234 | \温度 | [电子邮件受保护] | 888888 |
2024-02-06T20:00:00 | 12341234 | [电子邮件受保护] | 7.7.7.7 | 12341234 | \温度 | [电子邮件受保护] | 99999 |
请注意,该.Folders
元素有时可以采用字符串格式,但我可以使用 轻松有条件地加载fromjson
。例如:
[...]"Folders": "[{\"FolderItems\":[{\"InternetMessageId\":\""Fo<[email protected]>\",\"SizeInBytes\":12345},[...]
到目前为止的代码:
cat | jq '
if has("Folders") then
if(.Folders | type=="string") and .Folders != "" then .Folders |= fromjson end |
if(.Folders | type=="string") and .Folders == "" then .Folders = null end
end | .' | # works up to here at least
jq '
if has("Item") then .Item |= (if type=="string" and .!="" then fromjson else {} end) else .Item|={} end |
if has("Item") then
if .Item | has("Id") then .ItemId = .Item.Id else .ItemId={} end |
if .Item | has("ParentFolder") then
.ItemParentFolderId=.Item.ParentFolder.Id? |
.ItemParentFolderPath=.Item.ParentFolder.Path? |
.ItemParentFolderName=.Item.ParentFolder.Name?
end
end | . ' | cat # works up to here at least
jq '
if has("Folders") then
if (.Folders | select(type=="array")) then
.Folders[].Id? |
.FoldersPath=.Folders[].Path? |
.FoldersFolderItems=.Folders[].FolderItems?
else . end
end
' |
jq -r '. | (.TimeGenerated // .CreationTime) as $EventTime |
.ClientIP = if .ClientIP == "" then null else .ClientIP end |
.ClientIP_ = if .ClientIP_ == "" then null else .ClientIP_ end |
.Client_IPAddress = if .Client_IPAddress == "" then null else .Client_IPAddress end |
.ClientIPAddress = if .ClientIPAddress == "" then null else .ClientIPAddress end |
.ActorIpAddress = if .ActorIpAddress == "" then null else .ActorIpAddress end |
(.ClientIP // .ClientIP_ // .Client_IPAddress // .ClientIPAddress // .ActorIpAddress) as $IPAddress |
(.UserId // .UserId_) as $LogonUser |
.FFIIMI as $InternetMessageId |
.FFISIB as $SizeInBytes |
{EventTime: $EventTime, IPAddress: $IPAddress, LogonUser: $LogonUser, InternetMessageId: $InternetMessageId, SizeInBytes: $SizeInBytes} + . |
[.Id, .EventTime, .IPAddress, .LogonUser, .MailboxOwnerUPN, .Operation, .InternetMessageId, .SizeInBytes] | @csv'
答案1
我从您作为示例数据提供的 JSON 开始。目前尚不清楚此 JSON 是否经过某种方式预处理。
由于顶级Folders
数组似乎不是如果它包含单个项目,则它是一个数组,如果它还不是一个数组,我们首先需要将其转换为数组。在 中jq
,我们可以这样做
.Folders |= (if type == "array" then . else [.] end)
其余转换的总体思路是将上层数据(即来自最顶层的一些键+值对以及来自每个元素的键)Id
复制到最低层(即元素)。然后我们可以将每个元素转换为 CSV 记录。Path
Folders
FolderItems
FolderItems
为了避免键具有重复的名称,我们还需要将Folders
元素的Id
键重命名为FolderId
(并将Path
同一级别重命名为FolderPath
,以保持一致性)。
我们想要的最顶层的数据可以被挑选出来并使用内部变量向下传输
pick(.CreationTime, .Id, .UserId, .ClientIPAddress) as $record
这将$record
创建
{
"CreationTime": "2024-02-06T12:13:14",
"Id": "abcdabcd-1234-1234-5555-888888888888",
"UserId": "[email protected]",
"ClientIPAddress": "5.5.5.5"
}
... 第一个 JSON 对象。我们获取的键并不是按照最终 CSV 输出中所需的顺序获取的。
然后我们可以只提取 的Folders
元素,.Folders[]
并为每个元素挑选出Id
和Path
。由于我们要重命名它们,因此无法pick()
再次使用,因此我们需要更多的动手操作:
.Folders[] | { FolderId: .Id, FolderPath: .Path } as $folder
然后我们可以使用它.FolderItems[]
来获取一组FolderItems
元素,我们可以在其前面添加$record
和:$folder
.FolderItems[] | $record + $folder + .
作为单个jq
表达式:
.Folders |= (if type == "array" then . else [.] end) |
pick(.CreationTime, .Id, .UserId, .ClientIPAddress) as $record |
.Folders[] | { FolderId: .Id, FolderPath: .Path } as $folder |
.FolderItems[] | $record + $folder + .
考虑到问题中的数据,结果:
{
"CreationTime": "2024-02-06T12:13:14",
"Id": "abcdabcd-1234-1234-5555-888888888888",
"UserId": "[email protected]",
"ClientIPAddress": "5.5.5.5",
"FolderId": "EEEEEEEE",
"FolderPath": "\\Outbox",
"InternetMessageId": "<[email protected]>",
"SizeInBytes": 12345
}
{
"CreationTime": "2024-02-06T12:13:14",
"Id": "abcdabcd-1234-1234-5555-888888888888",
"UserId": "[email protected]",
"ClientIPAddress": "5.5.5.5",
"FolderId": "EEEEEEEE",
"FolderPath": "\\Outbox",
"InternetMessageId": "<[email protected]>",
"SizeInBytes": 11122
}
(ETC。)
我个人会使用以下方法将这组 JSON 对象转换为 CSV磨坊主:
$ jq '.Folders |= (if type == "array" then . else [.] end) | pick(.CreationTime, .Id, .UserId, .ClientIPAddress) as $record | .Folders[] | { FolderId: .Id, FolderPath: .Path } as $folder | .FolderItems[] | $record + $folder + .' file | mlr --j2c cat
CreationTime,Id,UserId,ClientIPAddress,FolderId,FolderPath,InternetMessageId,SizeInBytes
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,EEEEEEEE,\Outbox,<[email protected]>,12345
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,EEEEEEEE,\Outbox,<[email protected]>,11122
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,EEEEEEEE,\Outbox,<[email protected]>,88888
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,FFFFFFFFFFFFFFFFFAB,\Inbox,<[email protected]>,44444
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,FFFFFFFFFFFFFFFFFAB,\Inbox,<[email protected]>,100000
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,FFFFFFFFFFFFFFFFFAB,\Inbox,<[email protected]>,109000
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,FFFFFFFFFFFFFFFFFAB,\Inbox,<[email protected]>,22000
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,FFFFFFFFFFFFFFFFFAB,\Inbox,<[email protected]>,333333
2024-02-06T20:00:00,abcdabcd-1234-1234-6666-9999999999999,[email protected],7.7.7.7,12341234,\Temp,<[email protected]>,77777
2024-02-06T20:00:00,abcdabcd-1234-1234-6666-9999999999999,[email protected],7.7.7.7,12341234,\Temp,<[email protected]>,888888
2024-02-06T20:00:00,abcdabcd-1234-1234-6666-9999999999999,[email protected],7.7.7.7,12341234,\Temp,<[email protected]>,99999
jq
要使用 Miller 来转换为 CSV ,请将mlr
命令替换为类似的命令
jq -s -r '(first|keys|@csv), map([.[]]|@csv)[]'
这将从第一个输入对象中选取键作为 CSV 标头,并将所有值格式化为每个对象的一条记录。