How do I remove duplicate letters using sed?

Answer 1

Method #1

You can use this sed command to do it:

$ sed 's/\([A-Za-z]\)\1\+/\1/g' file.txt

Example

Using the sample input from the question, I created a file, sample.txt.

$ sed 's/\([A-Za-z]\)\1\+/\1/g' sample.txt 
NAME
       nice - run a program with modified scheduling priority

       SYNOPSIS
              nice     [-n    adjustment]    [-adjustment] [--adjustment=adjustment] [command [a$
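
The \+ quantifier used in method #1 is a GNU extension to basic regular expressions. As a minimal sketch for other sed implementations, the same idea can be spelled with the POSIX interval \{1,\}; the function name squeeze_letters and the sample string are made up for illustration.

```shell
# Hypothetical helper: collapse every run of the same ASCII letter to one letter.
# \{1,\} is the POSIX spelling of "one or more", equivalent to GNU's \+.
squeeze_letters() {
    sed 's/\([A-Za-z]\)\1\{1,\}/\1/g'
}

# Made-up sample line in the style of a doubled man page heading.
printf 'NNAAMMEE\n' | squeeze_letters
# prints: NAME
```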

Method #2

Here is another way, which removes any doubled character, not just letters:

$ sed 's/\(.\)\1/\1/g' file.txt

Example

$ sed 's/\(.\)\1/\1/g' sample.txt 
NAME
    nice - run a program with modified scheduling priority

    SYNOPSIS
       nice   [-n  adjustment]  [-adjustment] [-adjustment=adjustment] [command [a$
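
Because the dot matches any character, method #2 also halves runs of spaces and turns "--" into "-", which is why the indentation and the option name differ in the output above. A small sketch on a made-up line (the helper name squeeze_any is hypothetical):

```shell
# Hypothetical helper: reduce each pair of identical characters to one.
squeeze_any() {
    sed 's/\(.\)\1/\1/g'
}

# Four spaces become two, and "--" becomes "-".
printf 'nice    [--adjustment]\n' | squeeze_any
# prints: nice  [-adjustment]
```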

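Method #3 (uppercase only)

The OP asked whether this could be modified so that only doubled uppercase characters are removed. Here is how, using a modified version of method #1.

Example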

$ sed 's/\([A-Z]\)\1\+/\1/g' sample.txt 
NAME
       nice - run a program with modified scheduling priority

       SYNOPSIS
              nice     [-n    adjustment]    [-adjustment] [--adjustment=adjustment] [command [a$

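Detailed explanation of the above methods

All the examples use the same technique: when a character in the set A-Z or a-z is first encountered, its value is saved. The escaped parentheses wrapped around the character class tell sed to save whatever they match for later use. The value is stored in a temporary variable that you can access immediately or later on. These variables are named \1, \2, and so on.

So the trick is to first match a single letter.

\([A-Za-z]\)

We then turn around and use the value that was just saved as a second character that must occur immediately after the first one, hence:

\([A-Za-z]\)\1

In sed we are also making use of the search-and-replace facility, s/../../g. The g means the substitution is applied globally, that is, to every match on a line rather than only the first.

So whenever we encounter a character followed by an identical one, we replace the pair with a single copy of that character. In methods #1 and #3 the trailing \+ extends the match to one or more repeats, so a whole run of the same letter collapses to one.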
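
To see what the saved value \1 holds, one option is to wrap it in visible markers in the replacement instead of printing it bare. This is only an illustrative sketch; the helper name mark_pairs and the sample text are made up.

```shell
# Hypothetical helper: show each doubled letter as <letter> so the
# captured value \1 is visible in the output.
mark_pairs() {
    sed 's/\([A-Za-z]\)\1/<\1>/g'
}

printf 'NNAAMMEE nice\n' | mark_pairs
# prints: <N><A><M><E> nice
```

The word "nice" is left alone because no letter in it is immediately followed by itself.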

Answer 2

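This command removes all doubled letters:

sed 's/\([[:alpha:]]\)\1/\1/g'

\1 stands for the text matched inside \(…\), so this command means: whenever a letter is followed by itself, replace the pair with that letter alone.

This would turn, for example, command into comand. It is better to restrict the transformation to where it is needed: the non-indented lines.

sed '/^[[:alpha:]]/ s/\([[:alpha:]]\)\1/\1/g'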
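
A quick way to check the address restriction is to feed it one heading line and one indented line. The sketch below uses made-up input and a hypothetical wrapper name, dedupe_headings, around the command above.

```shell
# Hypothetical wrapper: de-duplicate letters only on lines that start
# with a letter, leaving indented lines untouched.
dedupe_headings() {
    sed '/^[[:alpha:]]/ s/\([[:alpha:]]\)\1/\1/g'
}

printf 'NNAAMMEE\n       run a command\n' | dedupe_headings
# prints:
# NAME
#        run a command
```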

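This text is a man page rendered for a terminal, where bold is represented by overstriking: C\bC is rendered as a bold C, where \b is the backspace character (character number 8, also known as ^H). If the control characters are still present, forget about the doubled letters and remove the overstrikes instead. GNU sed reads \b in a regular expression as a word boundary, so the backspace is written \x08 in the commands.

sed -e 's/.\x08//g'

If you have a way to format the output, convert C\bC to bold and _\bC to underline. The escape character is written \x1b, and in the patterns of the second command the [ is escaped so that it is not taken as the start of a bracket expression.

sed -e 's/\(.\)\x08\1/\x1b[1m\1\x1b[22m/g' -e 's/_\x08\(.\)/\x1b[4m\1\x1b[24m/g' |
sed -e 's/\x1b\[22m\x1b\[1m//g' -e 's/\x1b\[24m\x1b\[4m//g'

If your sed does not understand these escapes, use the literal characters (Ctrl+H for backspace and Ctrl+[ for escape).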
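
One way to get a literal backspace into the sed script without typing control characters by hand is to let printf produce it. The sketch below assumes a POSIX shell; the variable name bs, the helper name strip_overstrike and the sample input are made up for illustration.

```shell
# Let printf produce one literal backspace (character 8).
bs=$(printf '\b')

# Hypothetical helper: delete every character that is followed by a
# backspace, together with the backspace itself.
strip_overstrike() {
    sed "s/.$bs//g"
}

# Made-up sample: "NAME" in overstruck bold, "nice" in overstruck underline.
printf 'N\bNA\bAM\bME\bE _\bn_\bi_\bc_\be\n' | strip_overstrike
# prints: NAME nice
```

Since the script never relies on an escape sequence inside sed itself, it should behave the same on GNU and BSD sed.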

Answer 3

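This is by no means a trivial task. Simply replacing doubled letters would be catastrophic. Think of what it would do to words like "attention" or "forgetting" or (more relevant to your case) "command". The script below is a first attempt at a simple solution. It uses a dictionary to determine which words really do contain doubled letters.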

#!/usr/bin/perl

use strict;
use warnings;

my $input_file = shift // die "No file name given\n";
my $dictionary = shift // '/usr/share/dict/words';
open my $if,   '<', $input_file or die "$input_file: $!\n";
open my $dict, '<', $dictionary or die "$dictionary: $!\n";

# Load the dictionary, lowercased so that it matches the lc() lookups below.
my %dictionary;
while (<$dict>) {
    chomp;
    $dictionary{lc $_}++;
}
close $dict;    # close the filehandle, not the file name

LINE: while (<$if>) {
    chomp;

    WORD: for my $word ( split /\s+/ ) {
        # Words already in the dictionary are printed unchanged.
        print "$word " and next WORD if exists $dictionary{lc $word};

        # Otherwise drop one doubled letter at a time until the word is
        # found in the dictionary or no doubled letters remain.
        SUBSTITUTION: while ( $word =~ s{([A-Z])\1}{$1}i ) {
            exists $dictionary{lc $word} and last SUBSTITUTION;
        } #END SUBSTITUTION
        print "$word ";

    } #END WORD

    print "\n";

} #END LINE

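Call it as

[user@host]./myscript.pl input_file optional_dictionary_file >output_file

If you do not supply the second argument, the dictionary file defaults to /usr/share/dict/words, which should be available on any decent GNU/Linux system.

Disclaimer: this is untested.

Caveats:

It will, at the very least, break hyphenated words (it uses whitespace to decide what a "word" is).
It is meant to remove only doubled capital letters, to avoid messing up the content of the man page itself. Note, however, that the i flag on the substitution makes the match case-insensitive, so drop that flag if you want uppercase-only behavior.
It will wreak havoc on hexadecimal values such as 0xFFFF.
There are probably many more that I have not thought of.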
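
The dictionary lookup at the heart of the script can also be tried from the shell. The following is only a rough sketch of that one step, not a port of the script: the helper name in_dict is made up, and a tiny temporary word list stands in for /usr/share/dict/words so the example does not depend on that file being installed.

```shell
# Hypothetical helper: succeed if word $1 appears as a whole line in the
# word list $2 (case-insensitive, fixed-string match).
in_dict() {
    grep -qixF -- "$1" "$2"
}

# Tiny stand-in word list (an assumption for this example only).
dict=$(mktemp)
printf '%s\n' attention command forget > "$dict"

for word in command ccoommaanndd; do
    if in_dict "$word" "$dict"; then
        echo "$word: in dictionary, leave as is"
    else
        echo "$word: not found, candidate for de-duplication"
    fi
done

rm -f "$dict"
```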

Answer 4

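Try:

sed -e 's/\([A-Za-z]\)\1/\1/g'

Just drop the \+; then only pairs of letters are reduced to a single letter. (This works on the assumption that every character has been doubled.)

Try this small test:

echo "PPaayy Atttteenttiioonn ttoo aallll ccoommmmaanndds" > test.txt
sed -e 's/\([A-Za-z]\)\1/\1/g' < test.txt > test2.txt
cat test2.txt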
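
For comparison, here is a sketch that runs the same test sentence through both the pairs-only pattern and the whole-run pattern. The variable names are made up, and the run variant is written with the POSIX interval \{1,\} instead of the GNU \+.

```shell
# Same test sentence as in the small test above.
input='PPaayy Atttteenttiioonn ttoo aallll ccoommmmaanndds'

# Pairs only: each doubled letter becomes one letter.
pairs=$(printf '%s\n' "$input" | sed 's/\([A-Za-z]\)\1/\1/g')

# Whole runs: "tttt" collapses all the way down to a single "t".
runs=$(printf '%s\n' "$input" | sed 's/\([A-Za-z]\)\1\{1,\}/\1/g')

echo "$pairs"   # Pay Attention to all commands
echo "$runs"    # Pay Atention to al comands
```

The pairs-only form restores words that legitimately contain a doubled letter, as long as every character in the input really was doubled.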
