How to convert a log file to CSV using bash


Is it possible to convert this log entry into a properly formatted CSV file using sed/awk? I have tried several approaches with sed/awk, but unfortunately I couldn't get it to work, and I'd rather not fall back to Perl. Is this possible in awk/sed, or even plain bash?

Jan 21 11:10:45 hostname.internal.nl THREAT 2015/01/21 11:10:44 critical 5 reset-both outside 10.10.10.10 inside 10.16.61.4 tcp 39898 80 web-browsing any Bash Remote Code Execution Vulnerability(36731)

Like this:

Date | Hostname | Threat | DATE+time | Critical/High | Count | --- | External IP | Internal IP | TCP/UDP | Port | External Port | Category | Vulnerability

Jan 21 11:10:45 | hostname.internal.nl | THREAT | 2015/01/21 11:10:44 | critical 5 reset-both | outside 10.10.10.10 | inside 10.16.61.4 | tcp 39898 | 80 | web-browsing | any Bash Remote Code Execution Vulnerability(36731)

Answer 1

Yes, you can do this in bash, though I don't know why you would want to. Here is a pure bash solution:

$ while read -r mon day time host threat date time2 crit count sugg out exip \
                in inip tcp port export cat vuln; do
     printf "%s | " "$mon $day $time" "$host" "$threat" "$date $time2" \
                    "$crit $count $sugg" "$out $exip" "$in $inip" \
                    "$tcp $port" "$export" "$cat" "$vuln"
  done < file; printf "\n"
Jan 21 11:10:45 | hostname.internal.nl | THREAT | 2015/01/21 11:10:44 | critical 5 reset-both | outside 10.10.10.10 | inside 10.16.61.4 | tcp 39898 | 80 | web-browsing | any Bash Remote Code Execution Vulnerability(36731) | 

I could give you a better approach in perl or awk, but it would depend on how exactly your fields are defined. If you update your question with more details (for example, does the critical field always contain 3 words? We need to know which parts of your input are always the same and which can change), I can give you a better solution.
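For example, if the only part that varies is the length of the vulnerability description at the end of each line, an awk sketch along these lines (assuming the first 18 fields are always single whitespace-separated words) would cope with it:

awk '{
    vuln = $19                      # first word of the vulnerability text
    for (i = 20; i <= NF; i++)      # append the rest, however many words there are
        vuln = vuln " " $i
    print $1" "$2" "$3 " | " $4 " | " $5 " | " $6" "$7 " | " $8" "$9" "$10 " | " $11" "$12 " | " $13" "$14 " | " $15" "$16 " | " $17 " | " $18 " | " vuln
}' file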

Answer 2

If your input lines all have the same format, you can do it like this:

#!/bin/bash

LOG="/root/1.txt"

# Write the header row first, then append one converted row per log line.
echo "Date | Hostname | Threat | DATE+time | Critical/High | Count | --- | External IP | Internal IP | TCP/UDP | Port | External Port | Category | Vulnerability" > 1.csv

< "$LOG" awk '{print $1" "$2" "$3 " | " $4 " | " $5 " | " $6" "$7 " | " $8" "$9" "$10 " | " $11" "$12 " | " $13" "$14 " | " $15" "$16 " | " $17 " | " $18 " | " $19" "$20" "$21" "$22" "$23" "$24}' >> 1.csv

Input:

Jan 21 11:10:45 hostname.internal.nl THREAT 2015/01/21 11:10:44 critical 5 reset-both outside 10.10.10.10 inside 10.16.61.4 tcp 39898 80 web-browsing any Bash Remote Code Execution Vulnerability(36731)

Jan 22 11:12:45 hostname2.internal.nl THREAT2 2015/01/22 12:10:44 critical2 52 reset-both2 outside2 10.10.10.12 inside2 10.16.61.42 udp 39899 82 web-browsing2 any2 Bash Remote Code Execution Vulnerability(36731)2

Output:

Date | Hostname | Threat | DATE+time | Critical/High | Count | --- | External IP | Internal IP | TCP/UDP | Port | External Port | Category | Vulnerability
Jan 21 11:10:45 | hostname.internal.nl | THREAT | 2015/01/21 11:10:44 | critical 5 reset-both | outside 10.10.10.10 | inside 10.16.61.4 | tcp 39898 | 80 | web-browsing | any Bash Remote Code Execution Vulnerability(36731)
Jan 22 11:12:45 | hostname2.internal.nl | THREAT2 | 2015/01/22 12:10:44 | critical2 52 reset-both2 | outside2 10.10.10.12 | inside2 10.16.61.42 | udp 39899 | 82 | web-browsing2 | any2 Bash Remote Code Execution Vulnerability(36731)2
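If you want actual comma-separated values rather than the " | " separators above, the same approach works with commas; here is a minimal variation (assuming the same fixed field layout, and that the header line is also written with commas):

< "$LOG" awk -v OFS=',' '{ print $1" "$2" "$3, $4, $5, $6" "$7, $8" "$9" "$10,
                                 $11" "$12, $13" "$14, $15" "$16, $17, $18,
                                 $19" "$20" "$21" "$22" "$23" "$24 }' >> 1.csv

Fields that themselves contain commas would still need quoting, but none of the fields in the sample input do.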

Answer 3

I found this useful for collecting CloudFront log files from S3 and loading them into Google Drive.

I used the awscli on macOS, installed via Homebrew, and ran the command:

aws s3 sync s3://bucketname/domain/ .

The bucket was full of CloudFront logs (I had enabled this in the CloudFront settings), and all of the files were pulled down to my local machine. I unzipped the .gz files and got plain text files.
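After syncing, all of the downloaded .gz logs can be unpacked in one go, for example:

find . -type f -name '*.gz' -exec gunzip {} +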

After that, in the same folder, I used a find + exec command to build one big log file, as shown below. Then I broke the log up according to the CloudFront log format and used awk with a comma as the separator to produce a CSV file. That CSV was uploaded to Google Drive for some pivot analysis, which helped me find bad edges and missing files.

Here is the bash script I used for this:

#!/bin/bash
# Join the logs into one big file using cat. If your files contain any
# default header lines, you need to remove them first.
FILE_NAMES_START='E1I*'
find . -type f -name "$FILE_NAMES_START" -exec cat {} + >> big_log_file.txt

# Combined log file to convert
LOG="big_log_file.txt"

echo "DATE , TIME , X-Edge-Location , SC-Bytes , Client-IP , Method , Cs-Host , File-URI , HTTP Status , Referer , User Agent , URI Query , Cookie-Id , Edge-Result , Edge-Request-ID , Host Header , Protocol , Bytes , Time Taken , X-Forwarded-For , SSL Protocol , SSL Cipher , CDN-Result-Type" >> csv_data.csv
< "$LOG" awk '{print $1 " , " $2 " , " $3 " , " $4 " , " $5 " , " $6 " , " $7 " , " $8 " , " $9 " , " $10 " , " $11 " , " $12 " , " $13 " , " $14 " , " $15 " , " $16 " , " $17 " , " $18 " , " $19 " , " $20 " , " $21 " , " $22 " , " $23 " , " $24}' >> csv_data.csv
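The "default headers" mentioned in the first comment are the "#Version:" and "#Fields:" lines at the top of each CloudFront log file. One way to drop them while concatenating is to filter out the comment lines, for example:

find . -type f -name 'E1I*' -exec cat {} + | grep -v '^#' >> big_log_file.txt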

Make sure the txt files and this bash script are in the same directory, or add the required paths to the script.

A CSV larger than about 100 MB does not work well in Google Sheets.
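If the file ends up larger than that, it can be split into smaller pieces and those uploaded separately, for example (500,000 lines per chunk; adjust to taste):

split -l 500000 csv_data.csv csv_part_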
