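I'm trying to use sed to find a "primary" pattern that can appear on multiple lines, where each primary pattern is followed by an unknown number of "secondary" patterns.
Lines that contain the pattern start with: test(header_name)
That is followed on the same line by any number of strings. I want to move these strings onto their own lines, so that each string is preceded by its own test(header_name).
For example, the original file (mytest.txt):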
apples
test("Type1", "hat", "cat", "dog", "house");
bananas
oranges
test("Type2", "brown", "red", "green", "yellow", "purple", "orange");
I want it to become:
apples
test("Type1", "hat");
test("Type1", "cat");
test("Type1", "dog");
test("Type1", "house");
bananas
oranges
test("Type2", "brown");
test("Type2", "red");
test("Type2", "green");
test("Type2", "yellow");
test("Type2", "purple");
test("Type2", "orange");
This would be easy if we knew the number of strings on each line, but in this case it is not fixed.
The clumsy way would be:
while ( a line exists that starts with 'test' and contains more than two args)
do
Keep the first and second args
Move the rest of that line to a new line and add 'test(header)' to the front
done
But that is slow, especially when there are hundreds of strings.
Any ideas?
Answer 1
It's not pretty, but:
awk '
/test\(/ {
split($0, a)
i=2
while (a[i]) {
sub(/(,|\);)$/, "", a[i])
printf("%s %s);\n", a[1], a[i])
i++
}
next
}
{print}
'
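The script above splits each line on whitespace, so a string that itself contains a space (for example "dog house") would be cut in two. A minimal sketch of a variant that walks the quoted strings with match() instead, assuming the lines start at column 0 with test(" and the strings contain no escaped quotes. The function name split_tests is made up for illustration:

```shell
#!/bin/sh
# split_tests: hypothetical helper name; reads the file on stdin, writes to stdout.
split_tests() {
  awk '
    # Only rewrite lines that begin with test("header"
    /^test\("[^"]*"/ {
      rest = $0
      # Peel off the header, e.g. test("Type1"
      match(rest, /^test\("[^"]*"/)
      header = substr(rest, 1, RLENGTH)
      rest = substr(rest, RLENGTH + 1)
      n = 0
      # Every remaining quoted string gets its own test(...) line
      while (match(rest, /"[^"]*"/)) {
        printf "%s, %s);\n", header, substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)
        n++
      }
      # A header with no further strings is passed through unchanged
      if (n == 0) print
      next
    }
    { print }
  '
}

# Example run on a small inline sample
split_tests <<'EOF'
apples
test("Type1", "hat", "dog house");
bananas
EOF
```

With a real file this would be called as split_tests < mytest.txt > mytest.new.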
Answer 2
OK, I found a solution using a while loop and sed. Yes, it's messy, but it's faster than the algorithm I posted earlier!
# Look for first line that has more than two args
line_num=`sed -n -e '/test("[^"]*", "[^"]*",/{=;q}' myfile.txt`
while [ "$line_num" != "" ]
do
# Get the first argument
first_arg=`sed -ne ''$line_num' s/test("\([^"]*\)".*/\1/pg' myfile.txt`
# All strings will be moved to their own line that includes 'test(first_arg)'
sed -i -e ''$line_num' s/", "/\ntest("'"$first_arg"'", "/g' myfile.txt
# No longer need first line after all targets moved to other lines
sed -i -e ''$line_num'd' myfile.txt
# Check for remaining lines with more than two args
line_num=`sed -n -e '/test("[^"]*", "[^"]*",/{=;q}' myfile.txt`
done
# Minor adjustments to the output (add close-quotation, close-bracket and semi-colon)
sed -i \
-e 's/");//g' \
-e 's/\(test("[^)]*\)/\1");/g' \
myfile.txt
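The loop above rereads and rewrites the file once per matching line. The same splitting can probably be done in one pass by letting sed's P and D commands loop over a single line. This is only a sketch, assuming GNU sed (for -E and for \n in the replacement) and the exact `", "` separators shown in the question. The function name split_tests_sed is made up for illustration:

```shell
#!/bin/sh
# split_tests_sed: hypothetical helper name; reads stdin, writes stdout.
split_tests_sed() {
  # Address: a test(...) line with a header plus at least two strings.
  # s: split off the first string into its own test(header, string); line,
  #    leaving the header and the remaining strings after a newline.
  # P: print the first line of the pattern space.
  # D: delete that first line and rerun the script on what is left,
  #    without reading the next input line.
  sed -E '/^test\("[^"]*"(, "[^"]*"){2,}\);$/{
    s/^(test\("[^"]*")(, "[^"]*")(, .*)$/\1\2);\n\1\3/
    P
    D
  }'
}

# Example run on a small inline sample
split_tests_sed <<'EOF'
oranges
test("Type2", "brown", "red", "green");
EOF
```

Once only one string is left on the line, the address no longer matches and the line is printed as usual. Writing to a new file (split_tests_sed < myfile.txt > myfile.new) avoids the repeated sed -i passes.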