I need to find all files whose first line contains both of the strings "StockID" and "SellPrice".
Here are some example files:
1.csv:
StockID Dept Cat2 Cat4 Cat5 Cat6 Cat1 Cat3 Title Notes Active Weight Sizestr Colorstr Quantity Newprice StockCode DateAdded SellPrice PhotoQuant PhotoStatus Description stockcontrl Agerestricted
<blank> 1 0 0 0 0 22 0 RAF Air Crew Oxygen Connector 50801 1 150 <blank> <blank> 0 0 50866 2018-09-11 05:54:03 65 5 1 <br />\r\nA wartime RAF aircrew oxygen hose connector.<br />\r\n<br />\r\nAir Ministry stamped with Ref. No. 6D/482, Mk IVA.<br />\r\n<br />\r\nBrass spring loaded top bayonet fitting for the 'walk around' oxygen bottle extension hose (see last photo).<br />\r\n<br />\r\nIn a good condition. 2 0
<blank> 1 0 0 0 0 15 0 WW2 US Airforce Type Handheld Microphone 50619 1 300 <blank> <blank> 1 0 50691 2017-12-06 09:02:11 20 9 1 <br />\r\nWW2 US Airforce Handheld Microphone type NAF 213264-6 and sprung mounting Bracket No. 213264-2.<br />\r\n<br />\r\nType RS 38-A.<br />\r\n<br />\r\nMade by Telephonics Corp.<br />\r\n<br />\r\nIn a un-issued condition. 3 0
<blank> 1 0 0 0 0 22 0 RAF Seat Type Parachute Harness <blank> 1 4500 <blank> <blank> 1 0 50367 2016-11-04 12:02:26 155 8 1 <br />\r\nPost War RAF Pilot Seat Type Parachute Harness.<br />\r\n<br />\r\nThis Irvin manufactured harness is 'new old' stock and is unissued.<br />\r\n<br />\r\nThe label states Irvin Harness type C, Mk10, date 1976.<br />\r\nIt has Irvin marked buckles and complete harness straps all in 'mint' condition.<br />\r\n<br />\r\nFully working Irvin Quick Release Box and a canopy release Irvin 'D-Ring' Handle.<br />\r\n<br />\r\nThis harness is the same style type as the WW2 pattern seat type, and with some work could be made to look like one.<br />\r\n<br />\r\nIdeal for the re-enactor or collector (Not sold for parachuting).<br />\r\n<br />\r\nTotal weight of 4500 gms. 3 0
2.csv:
id user_id organization_id hash name email date first_name hear_about
1 2 15 <blank> Fairley [email protected] 1129889679 John 0
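I only want to find the files whose first line contains both "StockID" and "SellPrice"; so in this example, the only output I want is ./1.csv
I managed to get this far, but now I'm stuck ;(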
where=$(find ./backup -type f)
for x in $where; do
  head -1 "$x" | grep -w "StockID"
done
Answer 1
find + awk solution:
find ./backup -type f -exec \
awk 'NR == 1{ if (/StockID.*SellPrice/) print FILENAME; exit }' {} \;
If the order of the key words may vary, replace the pattern /StockID.*SellPrice/ with /StockID/ && /SellPrice/.
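As a quick way to check the order-independent pattern, here is a minimal sketch that builds a throwaway directory and runs the same find + awk command against it. The file names (1.csv, reversed.csv, 2.csv) and the variables demo and result are made up for illustration; only the awk program comes from the answer.

```shell
# Scratch tree with hypothetical files: two matching headers (one with the
# columns in reverse order) and one unrelated header.
demo=$(mktemp -d)
mkdir -p "$demo/backup"
printf 'StockID\tDept\tSellPrice\n1\t2\t3\n' > "$demo/backup/1.csv"
printf 'SellPrice\tDept\tStockID\n3\t2\t1\n' > "$demo/backup/reversed.csv"
printf 'id\tuser_id\tname\n1\t2\tx\n' > "$demo/backup/2.csv"

# Test only line 1 of each file, in either column order; "exit" stops awk
# right after the first line so the rest of the file is not processed.
result=$(cd "$demo" && find ./backup -type f -exec \
  awk 'NR == 1 { if (/StockID/ && /SellPrice/) print FILENAME; exit }' {} \; |
  LC_ALL=C sort)
printf '%s\n' "$result"

rm -rf "$demo"
```

With these three files, the two whose header holds both names should be listed and 2.csv should not.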
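If the number of files is huge, a more efficient alternative is the following (it processes a batch of files at a time, so the total number of invocations of the command will be far smaller than the number of matched files):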
find ./backup -type f -exec \
awk 'FNR == 1 && /StockID.*SellPrice/{ print FILENAME }{ nextfile }' {} +
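To see what the {} + form buys over {} \;, the following sketch counts how many times find starts the command in each mode. It assumes a small scratch directory with five made-up files, and uses sh -c 'echo run' as a stand-in for awk so that each invocation prints exactly one line.

```shell
# Five hypothetical files in a scratch directory.
demo=$(mktemp -d)
for i in 1 2 3 4 5; do
  printf 'StockID\tSellPrice\n' > "$demo/$i.csv"
done

# "{} \;" starts the command once per file: one "run" line per file.
per_file=$(find "$demo" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# "{} +" passes as many file names as fit on one command line, so a handful
# of files ends up in a single invocation.
batched=$(find "$demo" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

printf 'per file: %s, batched: %s\n' "$per_file" "$batched"
rm -rf "$demo"
```

Because a single awk process then sees many files, the batched awk program has to use FNR (the line number within the current file) instead of NR, and nextfile instead of exit.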
Answer 2
With GNU grep or compatible:
grep -Hrnm1 '^' ./backup | sed -n '/StockID.*SellPrice/s/:1:.*//p'
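The recursive grep prints the first line of each file as filename:1:line without reading the whole file (the -m1 flag should make it stop at the first match), and sed prints the filename wherever the line part matches the pattern.
This will fail with file names that themselves contain :1: or newline characters, but that is a risk worth taking rather than resorting to some slow find + awk combination that executes another process for each file.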
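The two stages of that pipeline, and the file-name caveat, can be seen on a small scratch tree. This is only a sketch: the directory and the file names, including the deliberately awkward a:1:b.csv, are invented for the demonstration, and it assumes a grep that supports -r, -H, -n and -m (GNU grep does).

```shell
# Two hypothetical files with a matching header; the second name contains ":1:".
demo=$(mktemp -d)
mkdir -p "$demo/backup"
printf 'StockID\tDept\tSellPrice\n1\t2\t3\n' > "$demo/backup/1.csv"
printf 'StockID\tDept\tSellPrice\n1\t2\t3\n' > "$demo/backup/a:1:b.csv"

# Stage 1: grep emits "filename:1:first line" for every file.
first_lines=$(cd "$demo" && grep -Hrnm1 '^' ./backup | LC_ALL=C sort)

# Stage 2: sed keeps lines matching the pattern and cuts everything from the
# first ":1:" onwards, which is normally where the file name ends.
names=$(printf '%s\n' "$first_lines" | sed -n '/StockID.*SellPrice/s/:1:.*//p')
printf '%s\n' "$names"

rm -rf "$demo"
```

Here ./backup/1.csv should be reported correctly, while the second file should come out truncated to ./backup/a, because sed cuts at the first :1: it finds, which is the one inside the name.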
Answer 3
To avoid both running one command per file and reading each file in full, with GNU awk:
(unset -v POSIXLY_CORRECT; exec find backup/ -type f -exec gawk '
/\<StockID\>/ && /\<SellPrice\>/ {print FILENAME}; {nextfile}' {} +)
Or with zsh:
set -o rematchpcre # where we know for sure \b is supported
for file (backup/**/*(ND.)) {
IFS= read -r line < $file &&
[[ $line =~ "\bStockID\b" ]] &&
[[ $line =~ "\bSellPrice\b" ]] &&
print -r $file
}
Or:
set -o rematchpcre
print -rl backup/**/*(D.e:'
IFS= read -r line < $REPLY &&
[[ $line =~ "\bStockID\b" ]] &&
[[ $line =~ "\bSellPrice\b" ]]':)
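Or with bash, on systems whose native extended regular expressions support the \< and \> word-boundary operators (on other systems, you can also try [[:<:]]/[[:>:]] or \b):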
RE1='\<StockID\>' RE2='\<SellPrice\>' find backup -type f -exec bash -c '
for file do
IFS= read -r line < "$file" &&
[[ $line =~ $RE1 ]] &&
[[ $line =~ $RE2 ]] &&
printf "%s\n" "$file"
done' bash {} +
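Why the word-boundary operators matter can be shown with a header that merely contains the two names as substrings. The sketch below is not the bash code from this answer; as a stand-in it uses a small sh function (has_both_words, a hypothetical name) that reads the first line and checks it with grep -w, which GNU and BSD grep provide. The file names are invented.

```shell
# has_both_words FILE (hypothetical helper): succeed if the first line of FILE
# contains both StockID and SellPrice as whole words.
has_both_words() {
  line=
  # read returns non-zero for a last line without a trailing newline, but it
  # still fills $line, so accept that case as well.
  IFS= read -r line < "$1" || [ -n "$line" ] || return 1
  printf '%s\n' "$line" | grep -qw 'StockID' &&
    printf '%s\n' "$line" | grep -qw 'SellPrice'
}

demo=$(mktemp -d)
printf 'StockID\tDept\tSellPrice\n' > "$demo/exact.csv"
printf 'StockIDs\tDept\tOldSellPrice\n' > "$demo/lookalike.csv"

report=$(for f in "$demo/exact.csv" "$demo/lookalike.csv"; do
  if has_both_words "$f"; then
    printf 'match: %s\n' "${f##*/}"
  else
    printf 'no match: %s\n' "${f##*/}"
  fi
done)
printf '%s\n' "$report"

rm -rf "$demo"
```

A plain substring pattern such as /StockID.*SellPrice/ would accept both headers; the whole-word check should accept only exact.csv.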
Answer 4
egrep + awk:
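# Require both words on the line, in either order; awk keeps only matches on line 1
egrep -Hrn 'StockID.*SellPrice|SellPrice.*StockID' ./backup | awk -F ':' '$2==1{print $1}'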