Running out of memory when using sed with multiline expressions on a huge file

Answer 1

Your first three commands are the culprit:

:a
N
$!ba

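This loop keeps appending the next line to the pattern space until the last line is reached, so the whole file ends up in memory at once. The following script should keep only one segment in memory at a time: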

% cat test.sed
#!/usr/bin/sed -nf

# Append this line to the hold space. 
# To avoid an extra newline at the start, replace instead of append.
1h
1!H

# If we find a paren at the end...
/)$/{
    # Bring the hold space into the pattern space
    g
    # Remove the newlines
    s/\n//g 
    # Print what we have
    p
    # Delete the hold space
    s/.*//
    h
}
% cat test.in
a
b
c()
d()
e
fghi
j()
% ./test.sed test.in
abc()
d()
efghij()
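
One thing the script above does not cover: lines that come after the last `)` are accumulated in the hold space but never printed. Below is a minimal sketch of a variant that also flushes such a trailing segment at end of input, wrapped in a shell function. The name `join_segments` is made up for illustration, and this is a light adaptation of the script above, not the answer author's own version.

```shell
#!/bin/sh
# join_segments is a hypothetical helper name, not part of the original answer.
join_segments() {
    # 1st -e: collect the current line in the hold space.
    # 2nd -e: on a line ending in ")", print the joined segment and clear the hold space.
    # 3rd -e: on the last line, print whatever is still held, if it is non-empty.
    sed -n \
        -e '1h;1!H' \
        -e '/)$/{g;s/\n//g;p;s/.*//;h;}' \
        -e '${g;s/\n//g;/./p;}' \
        "$@"
}

# Prints "abc()" and then "de".
printf 'a\nb\nc()\nd\ne\n' | join_segments
```

As with the original script, memory use is bounded by the longest segment rather than by the file size.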

This awk solution prints each line as soon as it is read, so only one line is held in memory at a time:

% awk '/\)$/{print;nl=1;next}{printf "%s",$0;nl=0}END{if(!nl)print ""}' test.in
abc()
d()
efghij()

Answer 2

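For completeness, a Perl solution: perl -p -e '/\)$/ || chomp'

To explain: -p wraps the script in a loop that reads the input line by line and prints each line after the script has run. The script given with -e tests whether the line ends with ). If it does not (the match is false), the || falls through to chomp, which removes the trailing newline.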

Answer 3

Use this:

sed -i -z -u 's/\n/ /g' reallyBigFile.log

-z, --null-data
separate lines by NUL characters

-u, --unbuffered
load minimal amounts of data from the input files and flush the output buffers more often
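
One caveat worth keeping in mind: with -z, a file that contains no NUL bytes is a single record, so sed still holds the whole file in the pattern space. If the goal is only to turn every newline into a space, `tr` streams its input and needs no per-record buffer. The sketch below swaps in `tr` as a different technique from the sed command above; the function name `newlines_to_spaces` and the output file name in the usage note are made up for illustration.

```shell
#!/bin/sh
# newlines_to_spaces is a hypothetical helper: it reads stdin and writes stdout.
newlines_to_spaces() {
    # tr translates byte by byte, so memory use does not grow with file size.
    tr '\n' ' '
}

# Prints "one two three " with a trailing space and no final newline.
printf 'one\ntwo\nthree\n' | newlines_to_spaces
```

A possible invocation on the large file would be `newlines_to_spaces < reallyBigFile.log > joined.log`. Unlike `sed -i`, this writes to a separate file instead of editing in place.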
