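In a text file, if the first 25 characters of a line are spaces, how can I append that line to the previous line, and keep doing so until a line appears that starts with an ASCII character in the first column? Since this can't be displayed here, I've added screenshots. In the original file I first have to remove the trailing spaces from each line. That works, but I don't know how to do the rest. I'd prefer to do the whole thing as a script (no Perl or similar).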
Original file:
08/07/2023 09:02:07 ANR8592T Session 137576 connection is using protocol
TLSVI3, cipher specification TLS_AES_256_GCM_SHA384,
certificate TSM Self-Signed Certificate. (SESSION:
137576)
08/07/2023 09:02:07 ANR@B4OT Session 137576 started for administrator ADMIN
(WinNT) (SSL MU-SV-SPS1.de.bertrandt.net[192.168.171.56]-
:65234) on MU-SV-SPS1.de.bertrandt.net:1500. (SESSTON:
137576)
08/07/2023 09:02:07 ANR2017T Administrator ADMIN issued command: select
status from processes where process="NAS SnapMirror
Backup’ and status like 'WMU-SV-CL2%' (SESSION: 137576)
08/07/2023 09:02:07 ANR@46ST Session 137576 ended for administrator ADMIN
(WinNT). (SESSION: 137576)
08/07/2023 09:02:38 ANR8592T Session 137577 connection is using protocol
TLSVI3, cipher specification TLS_AES_256_GCM_SHA384,
certificate TSM Self-Signed Certificate. (SESSION:
137577)
08/07/2023 09:02:38 ANR@B4OT Session 137577 started for administrator ADMIN
(WinNT) (SSL MU-SV-SPS1.de.bertrandt.net[192.168.171.56]-
:65235) on MU-SV-SPS1.de.bertrandt.net:1560. (SESSTON:
137577)
08/07/2023 09:02:38 ANR2017T Administrator ADMIN issued command: select
node_name, filespace_name, BACKUP_START, BACKUP_END,
CAPACITY, PCT_UTIL from filespaces where node_name like
“MU-SV-CL2%" (SESSION: 137577)
08/07/2023 09:02:38 ANR@46ST Session 137577 ended for administrator ADMIN
(WinNT). (SESSION: 137577)
08/07/2023 09:02:38 ANR8592T Session 137578 connection is using protocol
TLSVI3, cipher specification TLS_AES_256_GCM_SHA384,
certificate TSM Self-Signed Certificate. (SESSION:
137578)
08/07/2023 09:02:38 ANR@B4OT Session 137578 started for administrator ADMIN
(WinNT) (SSL MU-SV-SPS1.de.bertrandt.net[192.168.171.56]-
:65236) on MU-SV-SPS1.de.bertrandt.net:1560. (SESSTON:
137578)
Requested result:
08/07/2023 09:02:07 ANR8592T Session 137576 connection is using protocol TLSVI3, cipher specification TLS_AES_256_GCM_SHA384, certificate TSM Self-Signed Certificate. (SESSION: 137576)
08/07/2023 09:02:07 ANR@B4OT Session 137576 started for administrator ADMIN (WinNT) (SSL MU-SV-SPS1.de.bertrandt.net[192.168.171.56]- :65234) on MU-SV-SPS1.de.bertrandt.net:1500. (SESSTON: 137576)
08/07/2023 09:02:07 ANR2017T Administrator ADMIN issued command: select status from processes where process="NAS SnapMirror Backup’ and status like 'WMU-SV-CL2%' (SESSION: 137576) 08/07/2023 09:02:07 ANR@46ST Session 137576 ended for administrator ADMIN (WinNT). (SESSION: 137576)
08/07/2023 09:02:38 ANR8592T Session 137577 connection is using protocol TLSVI3, cipher specification TLS_AES_256_GCM_SHA384, certificate TSM Self-Signed Certificate. (SESSION: 137577)
08/07/2023 09:02:38 ANR@B4OT Session 137577 started for administrator ADMIN (WinNT) (SSL MU-SV-SPS1.de.bertrandt.net[192.168.171.56]- :65235) on MU-SV-SPS1.de.bertrandt.net:1560. (SESSTON: 137577)
08/07/2023 09:02:38 ANR2017T Administrator ADMIN issued command: select node_name, filespace_name, BACKUP_START, BACKUP_END, CAPACITY, PCT_UTIL from filespaces where node_name like “MU-SV-CL2%" (SESSION: 137577)
08/07/2023 09:02:38 ANR@46ST Session 137577 ended for administrator ADMIN (WinNT). (SESSION: 137577)
08/07/2023 09:02:38 ANR8592T Session 137578 connection is using protocol TLSVI3, cipher specification TLS_AES_256_GCM_SHA384, certificate TSM Self-Signed Certificate. (SESSION: 137578)
08/07/2023 09:02:38 ANR@B4OT Session 137578 started for administrator ADMIN (WinNT) (SSL MU-SV-SPS1.de.bertrandt.net[192.168.171.56]- :65236) on MU-SV-SPS1.de.bertrandt.net:1560. (SESSTON: 137578)
Answer 1
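Using `sed`: see the solution at the end of this answer.
Using the `ed` editor, first replace every run of blanks (tabs or spaces) at the start of a line with a single space, then join each line that starts with a space onto the line before it: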
printf '%s\n' 'g/^[[:blank:]]\{1,\}/ s// /' 'g/^ / -,.j' ,p Q | ed -s file
The resulting document is printed to standard output, but you can change `,p` to `w` to write it back to the original file instead.
The two main commands in this editing session are:
g/^[[:blank:]]\{1,\}/ s// /
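This replaces every run of one or more blanks at the start of a line with a single space.
g/^ / -,.j
This joins each line that starts with a space onto the line before it.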
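To inspect the intermediate result of the first step on its own, before any lines are joined, the same substitution can be run with `sed`. This is a minimal sketch, not part of the answer; `normalize_indent` is a hypothetical helper name, and it assumes a `sed` that supports POSIX bracket classes such as `[[:blank:]]`.

```shell
#!/bin/sh
# normalize_indent is a hypothetical helper: it performs only the first
# step of the ed session, replacing each run of leading blanks (spaces
# or tabs) with a single space. No lines are joined yet.
normalize_indent() {
    sed 's/^[[:blank:]]\{1,\}/ /' "$@"
}

# Example: prints "XXX one", " two" and " three" on separate lines.
printf 'XXX one\n\t   two\n      three\n' | normalize_indent
```

Every line that now starts with a single space is what the second command, `g/^ / -,.j`, joins onto its predecessor.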
The two `g` commands can be combined into a single `g` command that executes both the `s` and the `j` command:
printf '%s\n' 'g/^[[:blank:]]\{1,\}/ s// /\' '-,.j' ,p Q | ed -s file
Testing it on this sample input:
XXX XXX XXX YYY YYY YYY YYY YYY
        ZZZ ZZZ ZZZ ZZZ ZZZ
        YYY ZZZ YYY ZZZ YYY
        YYY ZZZ YYY ZZZ YYY
        YYY ZZZ YYY ZZZ YYY
XXX XXX XXX YYY YYY YYY YYY YYY
        ZZZ ZZZ ZZZ ZZZ ZZZ
        YYY ZZZ YYY ZZZ YYY
XXX XXX XXX YYY YYY YYY YYY YYY
        ZZZ ZZZ ZZZ ZZZ ZZZ
        YYY ZZZ YYY ZZZ YYY
        YYY ZZZ YYY ZZZ YYY
        YYY ZZZ YYY ZZZ YYY
Result:
XXX XXX XXX YYY YYY YYY YYY YYY ZZZ ZZZ ZZZ ZZZ ZZZ YYY ZZZ YYY ZZZ YYY YYY ZZZ YYY ZZZ YYY YYY ZZZ YYY ZZZ YYY
XXX XXX XXX YYY YYY YYY YYY YYY ZZZ ZZZ ZZZ ZZZ ZZZ YYY ZZZ YYY ZZZ YYY
XXX XXX XXX YYY YYY YYY YYY YYY ZZZ ZZZ ZZZ ZZZ ZZZ YYY ZZZ YYY ZZZ YYY YYY ZZZ YYY ZZZ YYY YYY ZZZ YYY ZZZ YYY
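It doesn't matter whether the indented lines are indented with spaces or tabs. The spacing between the first bit of each "section" (the XXX... bit in my example) and the rest of that line is also left unchanged, whether it consists of spaces or tabs.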
With `sed`:
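sed -e '/^[[:blank:]]\{1,\}/ { s///; H; $!d; }' -e 'x; 1d; y/\n/ /' file
This detects any line with one or more leading blanks, removes those blanks and appends the line to the hold space (an auxiliary buffer that is not cleared between `sed` cycles). Unless it is the last line, the line is then deleted and the script moves on to the next input line.
For any other line (and for the last line of the document, if it starts with blanks), the pattern space is exchanged with the hold space, and all newlines (inserted as separators by the `H` command) are replaced by spaces before the result is printed. On line 1 the exchange yields the still-empty hold space, which `1d` discards so that no empty line is printed first.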
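This produces the same output as the `ed` pipeline above, but if the last input line does not start with a blank, it is left in the hold space and never printed (as far as I can tell from the image, that is not the case with the sample text).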
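If the last input line may not be indented, a different `sed` technique sidesteps the hold space entirely: a "sliding window" built from `N`, `P` and `D`, which keeps the current record plus one lookahead line in the pattern space. This is a swapped-in sketch, not the answer's script; `join_continuations` is a hypothetical helper name, and only POSIX `sed` commands are assumed.

```shell
#!/bin/sh
# join_continuations (hypothetical name): append every line that starts
# with blanks to the preceding line, separated by a single space.
join_continuations() {
    # :a        loop label
    # $!N       unless on the last line, append the next line after "\n"
    # s/.../ /  if that line was indented, replace "\n" + blanks with " "
    # ta        if the join succeeded, loop to look at the following line
    # P         otherwise print the finished record (up to the first "\n")
    # D         delete it and restart the cycle with what remains
    sed -e ':a' -e '$!N' -e 's/\n[[:blank:]]\{1,\}/ /' -e 'ta' -e 'P' -e 'D' "$@"
}

# Prints "A one two three", "B four" and "C five" on separate lines;
# the unindented last line is printed too.
printf 'A one\n    two\n\tthree\nB four\nC five\n' | join_continuations
```

Because the last line is handled by the `$!N` guard rather than by a swap with the hold space, an unindented last line is not lost.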
Answer 2
A solution with awk
Script `format_text.awk`:
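#! /usr/bin/awk -f
# A line that starts with a date and time (e.g. 08/07/2023 09:02:07)
# begins a new record: print the record collected so far, start a new one.
/^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
    if (line) {
        print line
    }
    line = $0
    next
}
# Any other line is a continuation: strip its leading blanks and append it.
{
    gsub(/^[ \t]+/, "")
    line = line " " $0
}
# Print the last record.
END {
    if (line) {
        print line
    }
}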
Then make it executable:
chmod +x format_text.awk
Run it like this:
./format_text.awk log.txt
Answer 3
Using any POSIX awk:
$ awk -F'^[[:space:]]+' '
NF==1 { if (NR>1) print rec; rec=$0; next }
{ rec = rec OFS $2 }
END { print rec }
' file
08/07/2023 09:02:07 ANR8592T Session 137576 connection is using protocol TLSVI3, cipher specification TLS_AES_256_GCM_SHA384, certificate TSM Self-Signed Certificate. (SESSION: 137576)
08/07/2023 09:02:07 ANR@B4OT Session 137576 started for administrator ADMIN (WinNT) (SSL MU-SV-SPS1.de.bertrandt.net[192.168.171.56]- :65234) on MU-SV-SPS1.de.bertrandt.net:1500. (SESSTON: 137576)
08/07/2023 09:02:07 ANR2017T Administrator ADMIN issued command: select status from processes where process="NAS SnapMirror Backup’ and status like 'WMU-SV-CL2%' (SESSION: 137576) 08/07/2023 09:02:07 ANR@46ST Session 137576 ended for administrator ADMIN (WinNT). (SESSION: 137576)
08/07/2023 09:02:38 ANR8592T Session 137577 connection is using protocol TLSVI3, cipher specification TLS_AES_256_GCM_SHA384, certificate TSM Self-Signed Certificate. (SESSION: 137577)
08/07/2023 09:02:38 ANR@B4OT Session 137577 started for administrator ADMIN (WinNT) (SSL MU-SV-SPS1.de.bertrandt.net[192.168.171.56]- :65235) on MU-SV-SPS1.de.bertrandt.net:1560. (SESSTON: 137577)
08/07/2023 09:02:38 ANR2017T Administrator ADMIN issued command: select node_name, filespace_name, BACKUP_START, BACKUP_END, CAPACITY, PCT_UTIL from filespaces where node_name like “MU-SV-CL2%" (SESSION: 137577)
08/07/2023 09:02:38 ANR@46ST Session 137577 ended for administrator ADMIN (WinNT). (SESSION: 137577)
08/07/2023 09:02:38 ANR8592T Session 137578 connection is using protocol TLSVI3, cipher specification TLS_AES_256_GCM_SHA384, certificate TSM Self-Signed Certificate. (SESSION: 137578)
08/07/2023 09:02:38 ANR@B4OT Session 137578 started for administrator ADMIN (WinNT) (SSL MU-SV-SPS1.de.bertrandt.net[192.168.171.56]- :65236) on MU-SV-SPS1.de.bertrandt.net:1560. (SESSTON: 137578)
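The question mentions stripping trailing spaces as a separate first step. Assuming trailing spaces and tabs are the only thing that needs removing, both steps can be done in a single `awk` pass; this is a sketch in the spirit of the answer above, not its author's code, and `join_and_trim` is a hypothetical helper name. It uses `[ \t]` rather than `[[:space:]]` so that it also works with older awk implementations that lack POSIX character classes.

```shell
#!/bin/sh
# join_and_trim (hypothetical name): strip trailing spaces/tabs from every
# line, then append lines that start with a space or tab to the previous one.
join_and_trim() {
    awk '
        { sub(/[ \t]+$/, "") }              # drop trailing blanks
        /^[ \t]/ {                          # continuation line
            sub(/^[ \t]+/, "")
            rec = rec " " $0
            next
        }
        { if (NR > 1) print rec; rec = $0 } # a new record starts here
        END { if (NR > 0) print rec }
    ' "$@"
}

# Prints "A one two three" and "B four" on separate lines.
printf 'A one  \n      two \n\tthree\nB four\n' | join_and_trim
```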
Answer 4
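Here is a one-line solution using `sed`:
sed -E -z 's/\n[ ]{25}/ /g' ./input.txt > ./output.txt
-E: use extended regular expressions, so `{25}` is an interval without backslashes.
-z: (GNU `sed`) separate input lines with NUL characters instead of newlines; since a text file contains no NULs, the whole file becomes one pattern space and `\n` can be matched.
s/\n[ ]{25}/ /g:
s/ substitute,
\n[ ]{25} a line break followed by exactly 25 spaces,
with a single space, so the joined words stay separated (the trailing spaces were already removed),
/g for every match in the content.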
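`-z` is a GNU extension. A portable sketch that applies the question's rule literally (a line whose first 25 characters are all spaces is a continuation) can use `awk` instead; `join25` is a hypothetical helper name, and dropping any indentation beyond the first 25 spaces is an assumption.

```shell
#!/bin/sh
# join25 (hypothetical name): a line whose first 25 characters are all
# spaces is appended to the previous line; any other line starts a new one.
join25() {
    awk '
        BEGIN { pad = sprintf("%25s", "") }   # a string of 25 spaces
        substr($0, 1, 25) == pad {            # continuation line
            rest = substr($0, 26)
            sub(/^ +/, "", rest)              # drop any extra indentation
            line = line " " rest
            next
        }
        { if (NR > 1) print line; line = $0 }
        END { if (NR > 0) print line }
    ' "$@"
}

# Prints "first line continued" and "second line" on separate lines.
pad=$(printf '%25s' '')
printf 'first line\n%scontinued\nsecond line\n' "$pad" | join25
```

Unlike the whitespace-based answers, a line indented by fewer than 25 spaces is kept as a separate line here.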