Getting JSON logs within a date and timestamp range


A sample of my log file (it is in JSON format):

somecontent"TransDateTime\":\"2020-07-01T09:15:01.000Z","receiveTimestamp":"2020-07-01T02:15:01.335142083Z","textPayload":"[7/1/20 23:05],","timestamp":"2020-07-01T23:32:35.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:02.000Z","receiveTimestamp":"2020-07-01T02:15:02.335142083Z","textPayload":"[7/1/20 23:06],","timestamp":"2020-07-01T23:32:36.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:03.000Z","receiveTimestamp":"2020-07-01T02:15:03.335142083Z","textPayload":"[7/1/20 23:07],","timestamp":"2020-07-01T23:34:35.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:04.000Z","receiveTimestamp":"2020-07-01T02:15:04.335142083Z","textPayload":"[7/1/20 23:08],","timestamp":"2020-07-01T23:34:36.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:05.000Z","receiveTimestamp":"2020-07-01T02:15:05.335142083Z","textPayload":"[7/1/20 23:09],","timestamp":"2020-07-01T23:35:35.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:06.000Z","receiveTimestamp":"2020-07-01T02:15:06.335142083Z","textPayload":"[7/1/20 23:10],","timestamp":"2020-07-01T23:35:36.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:07.000Z","receiveTimestamp":"2020-07-01T02:15:07.335142083Z","textPayload":"[7/1/20 23:11],","timestamp":"2020-07-01T23:36:36.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:08.000Z","receiveTimestamp":"2020-07-01T02:15:08.335142083Z","textPayload":"[7/1/20 23:11],","timestamp":"2020-07-01T23:36:37.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:09.000Z","receiveTimestamp":"2020-07-01T02:15:09.335142083Z","textPayload":"[7/1/20 23:12],","timestamp":"2020-07-01T23:37:10.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:10.000Z","receiveTimestamp":"2020-07-01T02:15:10.335142083Z","textPayload":"[7/1/20 23:13],","timestamp":"2020-07-01T23:37:15.8",somecontent

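The log file has timestamps all over the place, but I need to filter on the last "timestamp" field of each line.

I have spent a lot of time on this without finding a solution.

I have tried the following commands.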

cat test | grep '"timestamp":"2020-07-01T23:32:35.8"'

This returns the single line that matches the condition.

cat test | sed -n -e '/"timestamp":/p'

This lists the lines that contain the "timestamp" field.

cat test | sed -n "/23:32/,/23:36/ p" | egrep "manivel"

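This collects the log lines between the two times and applies the grep filter, but it does not restrict the match to the last timestamp on each line.

I am not posting this without having done any research.

The problem is that the string "timestamp" and time values (such as T09:15:06.000Z) appear in many places in this log file, often several times on one line.

This is exactly where I am stuck. I would appreciate an expert answer; it would save me a lot of time.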

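Answer 1

If I understand correctly, you are trying to print the lines whose last timestamp (the one immediately following the string "timestamp") falls between two given times. I am assuming the lines are in chronological order.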

$ sed -n '/"timestamp":"[^"]*T23:32:/,/"timestamp":"[^"]*T23:36:/p' test
somecontent"TransDateTime\":\"2020-07-01T09:15:01.000Z","receiveTimestamp":"2020-07-01T02:15:01.335142083Z","textPayload":"[7/1/20 23:05],","timestamp":"2020-07-01T23:32:35.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:02.000Z","receiveTimestamp":"2020-07-01T02:15:02.335142083Z","textPayload":"[7/1/20 23:06],","timestamp":"2020-07-01T23:32:36.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:03.000Z","receiveTimestamp":"2020-07-01T02:15:03.335142083Z","textPayload":"[7/1/20 23:07],","timestamp":"2020-07-01T23:34:35.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:04.000Z","receiveTimestamp":"2020-07-01T02:15:04.335142083Z","textPayload":"[7/1/20 23:08],","timestamp":"2020-07-01T23:34:36.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:05.000Z","receiveTimestamp":"2020-07-01T02:15:05.335142083Z","textPayload":"[7/1/20 23:09],","timestamp":"2020-07-01T23:35:35.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:06.000Z","receiveTimestamp":"2020-07-01T02:15:06.335142083Z","textPayload":"[7/1/20 23:10],","timestamp":"2020-07-01T23:35:36.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:07.000Z","receiveTimestamp":"2020-07-01T02:15:07.335142083Z","textPayload":"[7/1/20 23:11],","timestamp":"2020-07-01T23:36:36.8",somecontent

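Let me break it down. By including "timestamp" in the pattern, you make sure the time being matched is the one that follows "timestamp".

The sequence [^"]* matches any run of characters other than a double quote. This guarantees that no other field can sit between the string "timestamp" and the time being matched, even if a later line has a different layout.

I used the pattern T23:36: rather than just 23:36 so that it does not accidentally match a minutes-and-seconds pair, as in 23:23:36.8.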
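
The effect of the T anchor can be checked on a single made-up line. This is a minimal sketch; the sample line and the variable names loose and strict are assumptions for illustration and do not come from the original log.

```shell
#!/bin/bash
# Hypothetical line: the hour is 10, but minutes:seconds happen to read 23:36.
line='somecontent"timestamp":"2020-07-01T10:23:36.8",somecontent'

# Unanchored pattern: matches the minutes:seconds part, a false positive.
# "|| true" only keeps grep's non-zero exit status (no match) from aborting
# a script that runs with "set -e".
loose=$(printf '%s\n' "$line" | grep -c '23:36' || true)

# Anchored pattern: T must come directly before the hour, so this line is skipped.
strict=$(printf '%s\n' "$line" | grep -c '"timestamp":"[^"]*T23:36:' || true)

echo "loose=$loose strict=$strict"   # prints: loose=1 strict=0
```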

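Note that this prints every line from the first line matching the first pattern through the first line matching the second pattern. So in this example there are two lines with a "23:36" timestamp, but only the first of them is printed.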
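
If every line in the window is needed, including the later lines of the final minute, one alternative is to extract the last "timestamp" value on each line and compare it as a string instead of relying on a sed range. The following is a minimal sketch using awk, not part of the answer above. The function name filter_by_timestamp and its argument order are hypothetical, and it assumes every timestamp uses the same ISO 8601 layout, so that string order agrees with chronological order.

```shell
#!/bin/bash
# filter_by_timestamp START END FILE   (hypothetical helper)
# Prints the lines whose last "timestamp":"..." value satisfies START <= value < END.
filter_by_timestamp() {
    awk -v start="$1" -v end="$2" '
    {
        ts = ""
        rest = $0
        # Visit every "timestamp":"..." occurrence on the line; the last one wins.
        while (match(rest, /"timestamp":"[^"]*"/)) {
            # Skip the 13-character prefix "timestamp":" and drop the closing quote.
            ts = substr(rest, RSTART + 13, RLENGTH - 14)
            rest = substr(rest, RSTART + RLENGTH)
        }
        # Plain string comparison; valid only for uniformly formatted timestamps.
        if (ts != "" && ts >= start && ts < end) print
    }' "$3"
}
```

For the sample file, filter_by_timestamp 2020-07-01T23:32 2020-07-01T23:37 test would also print the 23:36:37.8 line that the sed range leaves out, and it does not depend on a line with the exact start minute being present.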

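Answer 2

I put together the following script to collect logs between two dates and two timestamps, filtered by a keyword. I am posting it here in case it helps someone looking for a similar script.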

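#!/bin/bash
# Arguments: start date, end date (YYYY/MM/DD, the format used in the loop below),
# start time, end time (HH:MM, for example 23:32), keyword.
set +x
DTE=$(date "+%d-%m-%Y-v%H%M%S")
startdate=$1
enddate=$2
start_time=$3
end_time=$4
keyword=$5
BKT=storage/folder
# Keep only the hour by dropping the last three characters (23:32 -> 23).
i=$start_time
i1=${i%???}
j=$end_time
j1=${j%???}
mkdir -p /tmp/folder
curr=$startdate
while true; do
    echo "$curr"
    # Stop at the end date; the end date itself is not processed.
    [ "$curr" \< "$enddate" ] || break
    # Fetch that day's objects, keep the sed time range, then filter by keyword.
    output=$(gsutil cat -h "gs://$BKT/$curr/${i1}:00:00_${j1}:59:59*" |
        sed -n '/"timestamp":"[^"]*T'"$i"':/,/"timestamp":"[^"]*T'"$j"':/p' |
        grep "$keyword")
    echo "$output" >> "/tmp/folder/mylog-$DTE"
    curr=$(date +%Y/%m/%d --date "$curr +1 day")
done
gsutil cp -r "/tmp/folder/mylog-$DTE" "gs://$BKT/"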
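
The date loop and the object-name pattern can be tried without touching the bucket. The sketch below only prints the patterns the script would request. The helper name list_patterns is hypothetical. It assumes dates in YYYY/MM/DD form, times in HH:MM form, and GNU date for the --date option. Like the script's loop, it does not process the end date itself.

```shell
#!/bin/bash
# list_patterns START_DATE END_DATE START_TIME END_TIME   (hypothetical helper)
# Prints one gs:// object pattern per day, from START_DATE up to but not
# including END_DATE.
list_patterns() {
    local curr end_date start_hour end_hour
    curr=$1
    end_date=$2
    # Keep only the hour part of HH:MM, e.g. 23:32 -> 23.
    start_hour=${3%:*}
    end_hour=${4%:*}
    # YYYY/MM/DD strings sort chronologically, so a string comparison is enough.
    while [ "$curr" \< "$end_date" ]; do
        echo "gs://storage/folder/$curr/${start_hour}:00:00_${end_hour}:59:59*"
        # Advance by one day (GNU date syntax).
        curr=$(date +%Y/%m/%d --date "$curr +1 day")
    done
}

list_patterns 2020/06/30 2020/07/02 23:32 23:36
```

Printing the patterns first makes it easier to confirm that the date range and the hour window are what was intended before running gsutil against real data.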

Thanks to @jezzaaaa for the sed command.
