I have a CSV file (data.csv) that looks like this:
apple_val, balloon_val, cherry_val, dog_val
1 ,5 ,6 ,7
3 ,19 ,2 ,3
I have a text file (sentence.txt) that looks like this:
I have apple_val apple(s) and balloon_val balloons. My dog_val dogs were biting the cherry_val cherries.
I want my output file (output.txt) to look like this:
I have 1 apple(s) and 5 balloons. My 7 dogs were biting the 6 cherries.
I have 3 apple(s) and 19 balloons. My 3 dogs were biting the 2 cherries.
I used the script below, but it is specific to the example above.
awk -F "," '{print $1, $2, $3, $4}' data.csv | while read a b c d
do
sed -e "s/apple_val/$a/g" -e "s/balloon_val/$b/g" -e "s/dog_val/$d/g" -e "s/cherry_val/$c/g" sentence.txt >> output.txt
done
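I would like to make it generic by reading the first line (the header) of the CSV file and replacing every occurrence of those strings (such as apple_val) in the text file.
How can I do that?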
Answer 1
A modified variant of alien's script (using arrays):
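#!/bin/bash
# Turn commas into spaces (squeezing repeats) so that read -a can split the fields.
tr -s ',' ' ' <data.csv | {
    read -r -a tokens                  # header row: placeholder names
    while read -r -a values; do        # each remaining row: replacement values
        # Emit one substitution per column; "-f -" makes GNU sed read that script from stdin.
        for index in "${!tokens[@]}"; do
            echo "s/${tokens[$index]}/${values[$index]}/g"
        done | sed -f - sentence.txt
    done
}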
The same with awk:
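awk -F"[, ]+" '
# First file (sentence.txt): collect the whole template in s.
NR == FNR {
    s = s $0 "\n"
    next
}
# First line of data.csv: remember the placeholder names.
FNR == 1 {
    for (i = 1; i <= NF; i++)
        val[i] = $i
    next
}
# Remaining lines: substitute each placeholder in a copy of the template.
{
    p = s
    for (i = 1; i <= NF; i++)
        gsub(val[i], $i, p)
    printf "%s", p    # print via "%s" so a literal % in the text is not read as a format
}
' sentence.txt data.csv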
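One caveat with gsub() as used above: the header names are interpreted as regular expressions, and an & in a value stands for the matched text. Below is a minimal sketch of a variant that replaces literal strings only. It assumes a POSIX awk, a simple CSV without quoted fields or embedded commas, and a non-empty template file. The names fill_template, trim and replace_all are made up for this illustration.

```shell
#!/bin/sh
# fill_template TEMPLATE CSV   (hypothetical helper, not part of the answers above)
# Prints one filled-in copy of TEMPLATE for every data row of CSV.
fill_template() {
    awk -F',' '
    # Strip leading and trailing blanks from a CSV field.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

    # Replace every literal occurrence of "from" in "text" with "to".
    # index() and substr() are used instead of gsub(), so neither string
    # is interpreted as a regex or as a replacement pattern.
    function replace_all(text, from, to,    out, pos) {
        if (from == "") return text
        out = ""
        while ((pos = index(text, from)) > 0) {
            out = out substr(text, 1, pos - 1) to
            text = substr(text, pos + length(from))
        }
        return out text
    }

    # First file: collect the whole template.
    NR == FNR { template = template $0 "\n"; next }

    # Header row of the CSV: remember the placeholder names.
    FNR == 1 { for (i = 1; i <= NF; i++) name[i] = trim($i); ncols = NF; next }

    # Data rows: substitute into a fresh copy of the template and print it.
    {
        filled = template
        for (i = 1; i <= ncols; i++)
            filled = replace_all(filled, name[i], trim($i))
        printf "%s", filled
    }
    ' "$1" "$2"
}
```

It would be called as fill_template sentence.txt data.csv > output.txt. Placeholders are still replaced in column order, so a name that is a substring of another name (for example apple_val and pineapple_val) can still produce surprising results.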
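Answer 2
What you are trying to do here is called "templating", and rolling your own templating tends to be full of unexpected pitfalls :)
Here is a shell script that does what you asked for, but it is not pretty and is probably quite fragile. I strongly suggest looking for a more robust templating solution.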
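#!/bin/bash
sentence=$(cat sentence.txt)
# Header row: placeholder names, separated by spaces.
tokens=$(head -n1 data.csv | cut -d, -f1- --output-delimiter=" ")
# Every remaining row: substitute its values into a fresh copy of the sentence.
# Note: the unquoted expansions let word splitting collapse the stray spaces around CSV values.
tail -n +2 data.csv | while read -r i; do
    token_number=0
    new_sentence=$sentence
    for token in $tokens; do
        let token_number+=1
        value=$(echo $i | cut -d, -f${token_number})
        new_sentence=$(echo $new_sentence | sed -e "s/${token}/${value}/g")
    done
    echo $new_sentence
done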
With the input you specified in the question, this produces:
I have 1 apple(s) and 5 balloons. My 7 dogs were biting the 6 cherries.
I have 3 apple(s) and 19 balloons. My 3 dogs were biting the 2 cherries.
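As an illustration of the kind of pitfall meant here: when a value contains /, & or a backslash, or a placeholder name contains regex characters such as a dot, a plain "s/${token}/${value}/g" either fails or substitutes the wrong text. The following is a rough sketch of escaping both sides before handing them to sed. It assumes single-line tokens and values and basic regular expressions. The function names escape_sed_pattern, escape_sed_replacement and substitute_token are invented for this example.

```shell
#!/bin/sh
# Escape a string so it matches literally as the pattern of a basic-regex s command.
escape_sed_pattern() {
    printf '%s' "$1" | sed -e 's|[][\\/.*^$]|\\&|g'
}

# Escape a string so it is inserted literally as the replacement of an s command.
escape_sed_replacement() {
    printf '%s' "$1" | sed -e 's|[\\/&]|\\&|g'
}

# substitute_token TOKEN VALUE: filter stdin, replacing every TOKEN with VALUE.
substitute_token() {
    pattern=$(escape_sed_pattern "$1")
    replacement=$(escape_sed_replacement "$2")
    sed -e "s/${pattern}/${replacement}/g"
}

# Usage: a value that would break an unescaped substitution.
printf 'I have apple_val apple(s).\n' | substitute_token 'apple_val' '1/2 & more'
# prints: I have 1/2 & more apple(s).
```

This only covers the sed side. Quoted CSV fields, embedded commas and overlapping placeholder names would still need separate handling, which is part of why a dedicated templating tool is usually the safer choice.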