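I have a shell script that extracts the file name and the columns from the given files. A sample file that needs to be read from the directory is: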
2222_AAA Accounting Statistic-42005_04May2020_0900-04May2020_1000.csv
#!/bin/bash
# Go to where the files are located
filedir=/home/vikrant_singh_rana/AAA_USP/sample-Files/*
for filename in $filedir
do
# echo "Processing $filename"
printf '%s,%s\n' "$(basename "$filename" ".csv" | grep -oP '(?<=_).*(?=\-\d\d\d)' )" "$(head -n1 "$filename")"
done > test.txt;
The script above produces the following output (the file name and the header columns of the input file):
cat test.txt
AAA Accounting Statistic,TIMESTAMP,C420050004,C420050005,C420050006,C420050007
What I want instead is the Cartesian product of the file name and the columns in the file:
AAA Accounting Statistic,TIMESTAMP
AAA Accounting Statistic,C420050004
AAA Accounting Statistic,C420050005
AAA Accounting Statistic,C420050006
AAA Accounting Statistic,C420050007
Answer 1
You need a second loop to process the first line of $filename:
for filename in /home/vikrant_singh_rana/AAA_USP/sample-Files/*; do
# ...
b=$(basename "$filename" ".csv" | grep -oP '(?<=_).*(?=\-\d\d\d)' )
for c in $(head -n1 "$filename" | sed 's/,/ /g'); do
printf '%s,%s\n' "$b" "$c"
done
done > test.txt
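PS: This assumes that the first line of $filename contains no spaces or other whitespace characters (the unquoted $(...) is split on whitespace) and no glob characters such as *.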
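If header fields may contain spaces, one way around that word-splitting limitation is to let tr put each field on its own line and read the fields with `while IFS= read -r`. This is a minimal sketch, not Answer 1's code: the function name print_header_pairs and the demo file contents are hypothetical, and it still assumes no embedded commas or newlines in the header fields.

```shell
#!/bin/bash
# Hypothetical helper: print "name,field" for every field of a file's header line.
# Unlike "for c in $(...)", fields containing spaces are kept intact.
# Assumption: header fields contain no embedded commas or newlines.
print_header_pairs() {
    local name=$1 file=$2
    # tr turns each comma into a newline; read -r then takes one field per line
    head -n1 "$file" | tr ',' '\n' | while IFS= read -r c; do
        printf '%s,%s\n' "$name" "$c"
    done
}

# Demo on a throwaway file (made-up header with spaces in the field names)
tmp=$(mktemp)
printf 'TIMESTAMP,Col A,Col B\n1,2,3\n' > "$tmp"
print_header_pairs 'AAA Accounting Statistic' "$tmp"
rm -f "$tmp"
```

Inside Answer 1's outer loop, the inner for loop would then become `print_header_pairs "$b" "$filename"`.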
Answer 2
#!/bin/sh
for pathname in /home/vikrant_singh_rana/AAA_USP/sample-Files/*.csv
do
name=${pathname##*/} # remove directory path
name=${name#*_} # remove *_ prefix (up to first underscore)
name=${name%%-*} # remove -* suffix (from first dash)
awk -F , -v name="$name" 'BEGIN { OFS=FS } { for (i = 1; i <= NF; ++i) print name, $i; exit }' "$pathname"
done
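This iterates over all CSV files and, for each one, removes from the name the directory path, the initial NNNN_ string, and everything from the first - character onwards. The resulting string is saved in $name.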
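The three parameter expansions described above can be checked one at a time on the sample path from the question. The order matters: the directory path itself contains both _ and -, so it has to be stripped first.

```shell
#!/bin/sh
# Apply the loop's three expansions step by step to the question's sample path.
pathname='/home/vikrant_singh_rana/AAA_USP/sample-Files/2222_AAA Accounting Statistic-42005_04May2020_0900-04May2020_1000.csv'

name=${pathname##*/}   # remove longest "*/" prefix: keep only the file name
printf '%s\n' "$name"  # 2222_AAA Accounting Statistic-42005_04May2020_0900-04May2020_1000.csv
name=${name#*_}        # remove shortest "*_" prefix: drops "2222_"
printf '%s\n' "$name"  # AAA Accounting Statistic-42005_04May2020_0900-04May2020_1000.csv
name=${name%%-*}       # remove longest "-*" suffix: drops everything from the first dash
printf '%s\n' "$name"  # AAA Accounting Statistic
```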
A short awk program is then run on that file. It prints the fields of the file's first line on separate lines, each prefixed with the value extracted into $name.
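To see the awk step in isolation, here is a sketch that runs the same program on a throwaway file. The header mirrors the question's sample output; the data row is made up.

```shell
#!/bin/sh
# Run the answer's awk program on a temporary file with a known header.
tmp=$(mktemp)
printf 'TIMESTAMP,C420050004,C420050005\n2020-05-04 09:00,1,2\n' > "$tmp"

name='AAA Accounting Statistic'
# -F , splits fields on commas; OFS=FS makes print join name and field with a comma;
# exit stops after the first record, so the data rows are never printed.
out=$(awk -F , -v name="$name" 'BEGIN { OFS=FS } { for (i = 1; i <= NF; ++i) print name, $i; exit }' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

This prints one `AAA Accounting Statistic,<column>` line per header field, which is the Cartesian product asked for in the question.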
This assumes the CSV files are simple, i.e. there are no embedded commas or newlines in the fields of the first line.
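If you don't have thousands of files (all matching paths are passed to a single awk command, which can hit the system's argument-length limit), you can also use GNU awk like this (BEGINFILE is a GNU awk extension):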
awk -F , '
BEGIN { OFS=FS }
BEGINFILE {
name = FILENAME
sub(".*/", "", name) # remove directory path
sub("^[^_]*_", "", name) # remove *_ prefix (up to first underscore)
sub("-.*", "", name) # remove -* suffix (from first dash)
}
{
for (i = 1; i <= NF; ++i) print name, $i
nextfile
}' /home/vikrant_singh_rana/AAA_USP/sample-Files/*.csv
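If GNU awk is not available, a similar single-command approach can be sketched with a plain FNR == 1 test, which is true on the first line of each input file. This is an alternative, not part of the answer: it avoids BEGINFILE, but awk still reads every line of every file, and the second demo file below is hypothetical.

```shell
#!/bin/sh
# POSIX awk sketch: act only on the first line of each file (FNR == 1)
# instead of using the GNU-only BEGINFILE block.
dir=$(mktemp -d)
printf 'TIMESTAMP,C420050004\n1,2\n' > "$dir/2222_AAA Accounting Statistic-42005_04May2020_0900-04May2020_1000.csv"
printf 'TIMESTAMP,C1\n' > "$dir/3333_BBB Other-1_x.csv"   # hypothetical second file

out=$(awk -F , '
BEGIN { OFS = FS }
FNR == 1 {
    name = FILENAME
    sub(/.*\//, "", name)      # remove directory path
    sub(/^[^_]*_/, "", name)   # remove *_ prefix (up to first underscore)
    sub(/-.*/, "", name)       # remove -* suffix (from first dash)
    for (i = 1; i <= NF; ++i) print name, $i
}' "$dir"/*.csv)
printf '%s\n' "$out"
rm -rf "$dir"
```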