sed: replacing an unknown number of patterns on the same line

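I'm trying to use sed to search for a "primary" pattern that may occur on several lines, where each primary pattern is followed by an unknown number of "secondary" patterns.

The lines containing the pattern begin with test(header_name), followed on the same line by any number of strings. I want to move each of those strings onto its own line, so that every string is preceded by its own test(header_name).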

For example, the original file (mytest.txt):

apples
test("Type1", "hat", "cat", "dog", "house");
bananas
oranges
test("Type2", "brown", "red", "green", "yellow", "purple", "orange");

I want it to become:

apples
test("Type1", "hat");
test("Type1", "cat");
test("Type1", "dog");
test("Type1", "house");
bananas
oranges
test("Type2", "brown");
test("Type2", "red");
test("Type2", "green");
test("Type2", "yellow");
test("Type2", "purple");
test("Type2", "orange");

This would be easy if we knew the number of strings on each line, but in this case it is not fixed.

The clumsy approach would be:

while ( a line exists that starts with 'test' and contains more than two args)
do

   Keep the first and second args
   Move the rest of that line to a new line and add 'test(header)' to the front

done

But this is slow, especially when there are hundreds of strings.

Any ideas?

Answer 1

It's not pretty, but:

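awk '
    /^test\(/ {
        # Default whitespace split: a[1] is test("TypeN", and a[2..n] are the
        # strings (so this assumes the strings themselves contain no spaces)
        n = split($0, a)
        for (i = 2; i <= n; i++) {
            sub(/(,|\);)$/, "", a[i])    # strip the trailing "," or ");"
            printf("%s %s);\n", a[1], a[i])
        }
        next
    }
    { print }    # pass every other line through unchanged
' mytest.txt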
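
If the strings may contain spaces, the whitespace split above would cut them apart. A minimal sketch of one way around that, assuming the arguments are always separated by exactly `", "` and the line ends with `");`, is to use that separator as the awk field separator. The function name split_test_lines is made up for illustration:

```shell
# Sketch only: split_test_lines is a hypothetical helper name.
# Assumes arguments are separated by exactly '", "' and the line ends with '");'.
split_test_lines() {
    awk -F'", "' '
        /^test\("/ && NF > 1 {
            last = $NF
            sub(/"\);[ \t\r]*$/, "", last)    # drop the closing ");
            # $1 is test("TypeN without its closing quote
            for (i = 2; i < NF; i++)
                printf("%s\", \"%s\");\n", $1, $i)
            printf("%s\", \"%s\");\n", $1, last)
            next
        }
        { print }    # everything else is passed through unchanged
    ' "$@"
}

printf '%s\n' 'apples' 'test("Type1", "top hat", "cat");' | split_test_lines
# apples
# test("Type1", "top hat");
# test("Type1", "cat");
```

It reads the files named as arguments, or standard input when none are given, so `split_test_lines mytest.txt` would process the file from the question.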

Answer 2

OK, I found a solution using a while loop and sed. Yes, it's messy, but it's faster than the algorithm I posted earlier!

# Look for the first line that has more than two args
line_num=$(sed -n -e '/test("[^"]*", "[^"]*",/{=;q}' myfile.txt)

while [ -n "$line_num" ]
do

    # Get the first argument
    first_arg=$(sed -n -e "${line_num}"' s/test("\([^"]*\)".*/\1/p' myfile.txt)

    # Move every string to its own line that starts with 'test("first_arg", '
    # (\n in the replacement and -i both require GNU sed)
    sed -i -e "${line_num}"' s/", "/\ntest("'"$first_arg"'", "/g' myfile.txt

    # The original line is now just 'test("first_arg', so it is no longer needed
    sed -i -e "${line_num}d" myfile.txt


    # Check for remaining lines with more than two args
    line_num=$(sed -n -e '/test("[^"]*", "[^"]*",/{=;q}' myfile.txt)

done


# Minor adjustments to the output (add the closing quote, closing bracket and semicolon)
sed -i \
    -e 's/");//g' \
    -e 's/\(test("[^)]*\)/\1");/g' \
myfile.txt
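
For comparison, the same peel-off-one-string-at-a-time idea can probably be done in a single sed pass with the P and D commands, so the file is not rescanned for every string. This is only a sketch: it assumes GNU sed (for `\n` in the replacement) and the exact `", "` spacing shown in the question, and the function name split_with_sed is made up for illustration:

```shell
# Sketch only: split_with_sed is a hypothetical helper name; needs GNU sed.
split_with_sed() {
    # s: turn  test("H", "a", rest  into  test("H", "a");<newline>test("H", rest
    # P: print the first line of the pattern space
    # D: delete that first line and, if anything is left, rerun the script
    #    on the remainder without reading a new input line
    sed -e 's/^\(test("[^"]*", \)\("[^"]*"\), /\1\2);\n\1/' -e 'P;D' "$@"
}

printf '%s\n' 'apples' 'test("Type1", "hat", "cat", "dog");' | split_with_sed
# apples
# test("Type1", "hat");
# test("Type1", "cat");
# test("Type1", "dog");
```

Lines with two or fewer arguments, and lines that do not start with test(, never match the substitution and are printed as they are.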
