Split a text file into a CSV with multiple delimiters in bash?

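I'm trying to parse a text file into a CSV. The problem is that the file has multiple delimiters, which I would ideally like to use as the column headers, although they can be dropped from the CSV output. I'd prefer to use bash, but whatever works is fine... This is running on a macOS system.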

Sample text (DISA STIG)


 ----------
Group ID (Vulid): V-81749
Group Title: SRG-OS-000067-GPOS-00035
Rule ID: SV-96463r1_rule
Severity: CAT II
Rule Version (STIG-ID): AOSX-13-067035
Rule Title: The macOS system must enable certificate for smartcards.
_
_

 Vulnerability Discussion: To prevent untrusted certificates the certificates on a smartcard card must be valid in these ways: its issuer is system-trusted, the certificate is not expired, its "valid-after" date is in the past, and it passes CRL and OCSP checking.
Check Content:
To view the setting for the smartcard certification configuration, run the following command:
sudo /usr/sbin/system_profiler SPConfigurationProfileDataType | /usr/bin/grep checkCertificateTrust
If the output is null or not "checkCertificateTrust = 1;" this is a finding.
Fix Text: This setting is enforced using the "Smartcard" configuration profile.
CCI: CCI-000186 ___________________________________________________
<Break>

----------

Basically, I want to break this out into a CSV with the following columns:

Group ID (Vulid)
Group Title:
Rule ID:
Severity:
Rule Version (STIG-ID):
Rule Title:
Vulnerability Discussion:
Check Content:
Fix Text:
CCI:

The <Break> delimiter marks the start of the next row.

I'd like my columns to end up looking like this:

    Group ID (Vulid)    Group Title:    Rule ID:    Severity:   Rule Version (STIG-ID)  Rule Title: Vulnerability Discussion    Check Content   Fix Text:   CCI:    CCI:

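Would the best approach be to strip each header, replace it with an arbitrary delimiter, and then split with awk? I've never tried splitting on multiple criteria like this, so I'm a bit stuck on how best to handle it.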

Answer 1

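First, you need to clean up the file and make it uniform, like 505942.txt in the postscript. I'm leaving that work to you, since only you know the original file and its intricacies, and you can easily look up simple sed commands. Note that you may have to write specific sed commands for lines that deviate from the norm, or make those changes by hand if that is less trouble (for example, I wouldn't write a 5-6 line script for a simple character deletion).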
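The cleanup step is left open above. As one possible starting point, the sketch below folds the raw sample from the question into one "Header: value" line per field. It is an illustration under assumptions, not the answerer's method: the function name normalize_stig is hypothetical, the header list is copied from the question, blank lines, dashed rules and runs of underscores are treated as noise, and any line that does not start with a known header is assumed to continue the previous field.

```shell
# normalize_stig: hypothetical helper, not part of the original answer.
# Reads raw STIG text on stdin, prints one "Header: value" line per field.
normalize_stig() {
  awk '
    function emit() { if (line != "") print line; line = "" }
    {
      sub(/^[ \t]+/, "")             # drop leading whitespace
      sub(/[ \t]*_+[ \t]*$/, "")     # drop a trailing run of underscores
      sub(/[ \t]+$/, "")             # drop trailing whitespace
    }
    $0 == "" || /^-+$/ { next }      # skip blank lines and dashed rules
    $0 == "<Break>" { emit(); next } # end of record
    /^(Group ID \(Vulid\)|Group Title|Rule ID|Severity|Rule Version \(STIG-ID\)|Rule Title|Vulnerability Discussion|Check Content|Fix Text|CCI):/ {
      emit(); line = $0; next        # a known header starts a new field
    }
    { line = line " " $0 }           # anything else continues the current field
    END { emit() }
  '
}
```

It could be used as normalize_stig < raw.txt > 505942.txt, where raw.txt is a placeholder name for the original export. Only awk features common to the BSD and GNU versions are used, so it should not need extra tools on macOS.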

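When working with strings, it helps to break the work into simple tasks. Here is an example of how to convert the given file into a comma-separated values (CSV) file. The final file is 505942.csv, which is also shown in the postscript.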

# These commands assume GNU sed and GNU xargs. The BSD sed shipped with macOS needs `sed -i ''` for in-place edits, and its xargs may not support `-d`.
sed -n 1,10p 505942.txt > 505942-headers.txt # Save the first 10 lines before the headers are stripped below; the column names are extracted from this copy.

sed -i 's/^.*\(: \)/\1/g' 505942.txt # '-i' edits the file in place. Remove the header from each line, keeping the ': '. Note that '.*' is greedy, so this removes everything up to the LAST ': ' on the line and truncates values that themselves contain ': ' (see 505942.csv in the postscript); 's/^[^:]*: //' would stop at the first one.
sed -i 's/^\(: \)//g' 505942.txt # Remove the leading colon and the space after it.
sed -i 's/^/"/' 505942.txt # Add a double quote at the beginning of each line. Quoting helps when parsing the final comma-separated file, since some fields already contain commas.
sed -i 's/$/",/' 505942.txt # Add a double quote and a comma at the end of each line.
xargs -n10 -d'\n' < 505942.txt > 505942-after-xargs.txt # Join every 10 lines of the file into one line.
sed -i 's/,$//' 505942-after-xargs.txt # Remove the trailing comma from each line.

sed -i 's/:.*//' 505942-headers.txt # Remove everything from the first colon onwards, leaving only the header names.
sed -i 's/^/"/' 505942-headers.txt # Add the opening double quote, as above.
sed -i 's/$/",/' 505942-headers.txt # Add the closing double quote and a comma, as above.
xargs -n10 -d'\n' < 505942-headers.txt > 505942-headers-after-xargs.txt # Join the 10 header lines into one line, as above.
sed -i 's/,$//' 505942-headers-after-xargs.txt # Remove the trailing comma, as above.

cat 505942-after-xargs.txt >> 505942-headers-after-xargs.txt # Join the files: append the data rows to the header file.

cat 505942-headers-after-xargs.txt # Check that everything looks fine.
cp 505942-headers-after-xargs.txt 505942.csv # Copy to the final .csv file.

Postscript

Contents of 505942.txt:

Group ID (Vulid): V-81749
Group Title: SRG-OS-000067-GPOS-00035
Rule ID: SV-96463r1_rule
Severity: CAT II
Rule Version (STIG-ID): AOSX-13-067035
Rule Title: The macOS system must enable certificate for smartcards.
Vulnerability Discussion: To prevent untrusted certificates the certificates on a smartcard card must be valid in these ways: its issuer is system-trusted, the certificate is not expired, its "valid-after" date is in the past, and it passes CRL and OCSP checking.
Check Content: To view the setting for the smartcard certification configuration, run the following command: sudo /usr/sbin/system_profiler SPConfigurationProfileDataType | /usr/bin/grep checkCertificateTrust If the output is null or not "checkCertificateTrust = 1;" this is a finding.
Fix Text: This setting is enforced using the "Smartcard" configuration profile.
CCI: CCI-000186
Group ID (Vulid): V-81749
Group Title: SRG-OS-000067-GPOS-00035
Rule ID: SV-96463r1_rule
Severity: CAT II
Rule Version (STIG-ID): AOSX-13-067035
Rule Title: The macOS system must enable certificate for smartcards.
Vulnerability Discussion: To prevent untrusted certificates the certificates on a smartcard card must be valid in these ways: its issuer is system-trusted, the certificate is not expired, its "valid-after" date is in the past, and it passes CRL and OCSP checking.
Check Content: To view the setting for the smartcard certification configuration, run the following command: sudo /usr/sbin/system_profiler SPConfigurationProfileDataType | /usr/bin/grep checkCertificateTrust If the output is null or not "checkCertificateTrust = 1;" this is a finding.
Fix Text: This setting is enforced using the "Smartcard" configuration profile.
CCI: CCI-000186

Contents of 505942.csv:

"Group ID (Vulid)", "Group Title", "Rule ID", "Severity", "Rule Version (STIG-ID)", "Rule Title", "Vulnerability Discussion", "Check Content", "Fix Text", "CCI"
"V-81749", "SRG-OS-000067-GPOS-00035", "SV-96463r1_rule", "CAT II", "AOSX-13-067035", "The macOS system must enable certificate for smartcards.", "its issuer is system-trusted, the certificate is not expired, its "valid-after" date is in the past, and it passes CRL and OCSP checking.", "sudo /usr/sbin/system_profiler SPConfigurationProfileDataType | /usr/bin/grep checkCertificateTrust If the output is null or not "checkCertificateTrust = 1;" this is a finding.", "This setting is enforced using the "Smartcard" configuration profile.", "CCI-000186"
"V-81749", "SRG-OS-000067-GPOS-00035", "SV-96463r1_rule", "CAT II", "AOSX-13-067035", "The macOS system must enable certificate for smartcards.", "its issuer is system-trusted, the certificate is not expired, its "valid-after" date is in the past, and it passes CRL and OCSP checking.", "sudo /usr/sbin/system_profiler SPConfigurationProfileDataType | /usr/bin/grep checkCertificateTrust If the output is null or not "checkCertificateTrust = 1;" this is a finding.", "This setting is enforced using the "Smartcard" configuration profile.", "CCI-000186"
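Two things stand out in the output above: the embedded double quotes are left as they are, whereas most CSV readers expect each one to be written as two double quotes, and the text before the last ": " in a value is lost. The following single-pass awk sketch avoids both. It is an alternative illustration rather than the answerer's pipeline: the name to_csv is hypothetical, the input is assumed to be uniform "Header: value" lines like 505942.txt, every record is assumed to have exactly the same number of fields (10 unless another count is passed), and fields are separated by a bare comma instead of a comma and a space.

```shell
# to_csv: hypothetical helper. Reads "Header: value" lines on stdin and
# assumes every record has exactly n fields (10 by default).
to_csv() {
  awk -v n="${1:-10}" '
    # Wrap a field in double quotes, doubling any quotes inside it.
    function quote(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
    {
      i = index($0, ": ")   # first ": " only, so a value may contain more of them
      if (i == 0) { name = $0; sub(/:$/, "", name); value = "" }
      else { name = substr($0, 1, i - 1); value = substr($0, i + 2) }
      if (NR <= n) header = header (NR > 1 ? "," : "") quote(name)
      row = row ((NR - 1) % n ? "," : "") quote(value)
      if (NR % n == 0) {    # last field of a record
        if (NR == n) print header
        print row
        row = ""
      }
    }
  '
}
```

For example, to_csv < 505942.txt > 505942.csv would write the header row taken from the first record, followed by one row per record.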

Answer 2

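Split the input on one delimiter first, treating several logical fields as a single field. Then split that field on another delimiter. Finally, write out the whole record.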

This may not be a "pure" solution, but that's bash...
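No code accompanies this answer, so the following is only one reading of it, sketched with awk called from the shell. Stage 1 cuts the raw text into records at each <Break> line; stage 2 cuts each record at the known headers. The name split_records is hypothetical, the header list is copied from the question, and every record is assumed to contain all ten headers in that order (records that do not are skipped).

```shell
# split_records: hypothetical helper sketching a two-stage split.
# Assumes every record contains all ten headers, in this order.
split_records() {
  awk '
    BEGIN {
      list = "Group ID (Vulid)|Group Title|Rule ID|Severity|Rule Version (STIG-ID)"
      list = list "|Rule Title|Vulnerability Discussion|Check Content|Fix Text|CCI"
      n = split(list, names, "|")
      for (i = 1; i <= n; i++) hdr = hdr (i > 1 ? "," : "") "\"" names[i] "\""
      print hdr                                 # header row
    }
    function emit(   i, k, pos, v, out) {
      pos = 1
      for (i = 1; i <= n; i++) {                # stage 2: locate each header in order
        k = index(substr(rec, pos), names[i] ":")
        if (k == 0) return                      # incomplete record: skip it
        hs[i] = pos + k - 1                     # where header i starts
        vs[i] = hs[i] + length(names[i]) + 1    # where its value starts
        pos = vs[i]
      }
      out = ""
      for (i = 1; i <= n; i++) {
        v = substr(rec, vs[i], (i < n ? hs[i + 1] : length(rec) + 1) - vs[i])
        gsub(/^[ \t]+|[ \t_]+$/, "", v)         # trim spaces and trailing underscores
        gsub(/"/, "\"\"", v)                    # double embedded quotes
        out = out (i > 1 ? "," : "") "\"" v "\""
      }
      print out
    }
    /^<Break>/ { emit(); rec = ""; next }       # stage 1: a <Break> line ends a record
    { rec = rec " " $0 }                        # otherwise keep collecting the record
    END { emit() }
  '
}
```

Because the headers are matched as literal strings with index(), the parentheses in names such as "Group ID (Vulid)" need no escaping. A header name that also appears inside an earlier value would be mistaken for the real header, which is a limitation of this sketch.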
