How do I find the matching data in two files in a shell script, and store the duplicate data in another file?
#!/bin/bash
file1="/home/vekomy/santhosh/bigfiles.txt"
file2="/home/vekomy/santhosh/bigfile2.txt"
while read -r $file1; do
while read -r $file2 ;do
if [$file1==$file2] ; then
echo "two files are same"
else
echo "two files content different"
fi
done
done
I wrote this code, but it doesn't work. How should I write it?
Answer 1
To test whether two files are identical, use cmp -s:
#!/bin/bash
file1="/home/vekomy/santhosh/bigfiles.txt"
file2="/home/vekomy/santhosh/bigfile2.txt"
if cmp -s "$file1" "$file2"; then
printf 'The file "%s" is the same as "%s"\n' "$file1" "$file2"
else
printf 'The file "%s" is different from "%s"\n' "$file1" "$file2"
fi
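The -s flag to cmp makes the utility "silent". When comparing two identical files, the exit status of cmp is zero. The code above uses this to print a message about whether the two files are the same.
If your two input files contain lists of pathnames of the files you want to compare, then use a double loop, like so: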
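As a small illustration of the exit status, here is a minimal sketch. The function name describe_cmp and the temporary file names are made up for this example. It relies only on the documented behaviour of cmp: status 0 for identical files, 1 for different files, and a value greater than 1 when an error occurs (for example, a missing file).

```bash
#!/usr/bin/env bash
# describe_cmp is a hypothetical helper used only for this sketch.
# It turns the exit status of "cmp -s" into a word.
describe_cmp() {
    cmp -s "$1" "$2"
    case $? in
        0) echo "identical" ;;
        1) echo "different" ;;
        *) echo "error" ;;       # e.g. one of the files could not be read
    esac
}

# Usage example with throwaway files.
tmpdir=$(mktemp -d)
printf 'same\n'  >"$tmpdir/a"
printf 'same\n'  >"$tmpdir/b"
printf 'other\n' >"$tmpdir/c"
describe_cmp "$tmpdir/a" "$tmpdir/b"    # prints: identical
describe_cmp "$tmpdir/a" "$tmpdir/c"    # prints: different
rm -r "$tmpdir"
```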
#!/bin/bash
filelist1="/home/vekomy/santhosh/bigfiles.txt"
filelist2="/home/vekomy/santhosh/bigfile2.txt"
mapfile -t files1 <"$filelist1"
while IFS= read -r file2; do
for file1 in "${files1[@]}"; do
if cmp -s "$file1" "$file2"; then
printf 'The file "%s" is the same as "%s"\n' "$file1" "$file2"
fi
done
done <"$filelist2" | tee file-comparison.out
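Here, the result is produced both on the terminal and in the file file-comparison.out.
This assumes that no pathname in the two input files contains embedded newlines.
The code first reads all the pathnames from one of the files into an array, files1, using mapfile. I do this to avoid having to read that file more than once, since we have to go through all of those pathnames for each pathname in the other file. You'll notice that instead of reading from $filelist1 in the inner loop, I just iterate over the names in the files1 array.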
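If the goal is instead what the question title describes, collecting the lines that occur in both files and saving them to a third file, the following is one possible sketch. It assumes the files are plain line-oriented text. The function name common_lines and the output name duplicates.txt are placeholders invented for this example.

```bash
#!/usr/bin/env bash
# common_lines is a hypothetical helper: it prints every line of the
# first file that also appears, as a whole line, in the second file.
common_lines() {
    # -F  treat patterns as fixed strings, not regular expressions
    # -x  only match whole lines
    # -f  read the patterns (one per line) from the file "$2"
    grep -Fxf "$2" "$1"
}

# Usage example with throwaway files.
tmpdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' >"$tmpdir/list1.txt"
printf 'cherry\napple\ndate\n'   >"$tmpdir/list2.txt"
common_lines "$tmpdir/list1.txt" "$tmpdir/list2.txt" >"$tmpdir/duplicates.txt"
cat "$tmpdir/duplicates.txt"
rm -r "$tmpdir"
```

The matching lines come out in the order in which they appear in the first file. A line that is repeated in the first file is printed once per occurrence; piping the result through sort -u would remove such repeats.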
Answer 2
The simplest way is to use the diff command.
Example:
Let's assume the first file is file1.txt and that it contains:
I need to buy apples.
I need to run the laundry.
I need to wash the dog.
I need to get the car detailed.
and the second file, file2.txt, contains:
I need to buy apples.
I need to do the laundry.
I need to wash the car.
I need to get the dog detailed.
Then we can use diff to show automatically which lines differ between the two files, with this command:
diff file1.txt file2.txt
The output will be:
2,4c2,4
< I need to run the laundry.
< I need to wash the dog.
< I need to get the car detailed.
---
> I need to do the laundry.
> I need to wash the car.
> I need to get the dog detailed.
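Let's look at what this output means. The important thing to remember is that when diff describes these differences to you, it does so in a prescriptive way: it tells you how to change the first file to make it match the second file. The first line of the diff output contains:
- line numbers corresponding to the first file,
- a letter (a for add, c for change, d for delete),
- line numbers corresponding to the second file.
In our output above, "2,4c2,4" means: "Lines 2 through 4 in the first file need to be changed to match lines 2 through 4 in the second file." It then tells us the contents of those lines in each file:
- lines preceded by < are lines from the first file;
- lines preceded by > are lines from the second file;
- the three dashes ("---") merely separate the lines of file 1 from the lines of file 2.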
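The example above only shows the c (change) letter. As a rough sketch of the other two letters, the script below compares two tiny throwaway files in both directions. The file names long.txt and short.txt are invented for this example.

```bash
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' >"$tmpdir/long.txt"
printf 'one\nthree\n'      >"$tmpdir/short.txt"

# Going from long.txt to short.txt, line 2 has to be deleted.
deleted=$(diff "$tmpdir/long.txt" "$tmpdir/short.txt")

# Going from short.txt to long.txt, a line has to be added after line 1.
added=$(diff "$tmpdir/short.txt" "$tmpdir/long.txt")

printf '%s\n' "$deleted"   # first line of this output: 2d1
printf '%s\n' "$added"     # first line of this output: 1a2
rm -r "$tmpdir"
```

"2d1" reads as "delete line 2 of the first file, so that it lines up after line 1 of the second file", and "1a2" reads as "after line 1 of the first file, add what is line 2 of the second file".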
Answer 3
Here is a pure bash shell script for comparing files:
#!/usr/bin/env bash
# @(#) s1 Demonstrate rudimentary diff using shell only.
# Infrastructure details, environment, debug commands for forum posts.
# Uncomment export command to run as external user: not context, pass-fail.
# export PATH="/usr/local/bin:/usr/bin:/bin"
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f "$C" ] && $C
set -o nounset
FILE1=${1-data1}
shift
FILE2=${1-data2}
# Display samples of data files.
pl " Data files:"
head "$FILE1" "$FILE2"
# Set file descriptors.
exec 3<"$FILE1"
exec 4<"$FILE2"
# Code based on:
# http://www.linuxjournal.com/content/reading-multiple-files-bash
# Section 2, solution.
pl " Results:"
eof1=0
eof2=0
count1=0
count2=0
while [[ $eof1 -eq 0 || $eof2 -eq 0 ]]
do
if IFS= read -r a <&3; then
let count1++
# printf "%s, line %d: %s\n" $FILE1 $count1 "$a"
else
eof1=1
fi
if IFS= read -r b <&4; then
let count2++
# printf "%s, line %d: %s\n" $FILE2 $count2 "$b"
else
eof2=1
fi
if [ "$a" != "$b" ]
then
echo " File $FILE1 and $FILE2 differ at lines $count1, $count2:"
pe "$a"
pe "$b"
# exit 1
fi
done
exit 0
producing:
$ ./s1
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution : Debian 8.9 (jessie)
bash GNU bash 4.3.30
-----
Data files:
==> data1 <==
I need to buy apples.
I need to run the laundry.
I need to wash the dog.
I need to get the car detailed.
==> data2 <==
I need to buy apples.
I need to do the laundry.
I need to wash the car.
I need to get the dog detailed.
-----
Results:
File data1 and data2 differ at lines 2, 2:
I need to run the laundry.
I need to do the laundry.
File data1 and data2 differ at lines 3, 3:
I need to wash the dog.
I need to wash the car.
File data1 and data2 differ at lines 4, 4:
I need to get the car detailed.
I need to get the dog detailed.
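You can uncomment specific commands if you wish to see each line as it is read, or to exit as soon as the first difference is seen.
See the page http://www.linuxjournal.com/content/reading-multiple-files-bash for details on file descriptors (e.g. "&3").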
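As a compact illustration of the descriptor technique, here is a minimal sketch that opens two files on descriptors 3 and 4, reads them in lockstep, and closes the descriptors afterwards. The function name first_diff_line is made up for this example and is not part of the script above.

```bash
#!/usr/bin/env bash
# first_diff_line is a hypothetical helper. It prints the number of the
# first line at which the two files differ, or nothing if they match.
first_diff_line() {
    local a b n=0 ok_a ok_b
    exec 3<"$1" 4<"$2"              # open both files for reading
    while :; do
        IFS= read -r a <&3; ok_a=$?
        IFS= read -r b <&4; ok_b=$?
        # Both reads failed and returned nothing: both files are exhausted.
        if (( ok_a != 0 && ok_b != 0 )) && [[ -z $a && -z $b ]]; then
            break
        fi
        n=$((n + 1))
        # Report a difference in content, or one file ending before the other.
        if [[ $a != "$b" ]] || (( ok_a != ok_b )); then
            printf '%d\n' "$n"
            break
        fi
    done
    exec 3<&- 4<&-                  # close the descriptors again
}
```

It could be called as first_diff_line data1 data2. Using IFS= and -r with read keeps leading whitespace and backslashes in each line intact, so the comparison sees the lines as they are stored.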
Best wishes ... cheers, drl