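I have two files, A and B, that are almost identical, but some lines differ and some are in a different order. Because both are SystemVerilog files, the lines also contain special characters such as ; , = + and so on.
I want to loop over every line of fileA and check whether fileB contains a matching line. The comparison should follow these rules:
- Leading and trailing whitespace is ignored.
- Multiple spaces/tabs between words are treated as a single space.
- Empty lines are ignored.
The result should show the lines that are in fileA but not in fileB.
I tried tkdiff, but because some of the lines are shuffled, it reports many differences.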
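Answer 1
I can't speak to its portability, but I tried to cover all the bases. I did my best to reproduce the two files in my tests based on your description. If you run into problems with special characters in sed, you can escape them in the second line of the cleanLine function.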
#!/bin/bash
# compare two files and return lines in
# first file that are missing in second file

ProgName=${0##*/}
Pid=$$
CHK_FILE="$1"
REF_FILE="$2"
D_BUG="$3"
TMP_FILE="/tmp/REF_${Pid}.tmp"
declare -a MISSING=()
m=0

scriptUsage() {
cat <<ENDUSE
$ProgName <file_to_check> <reference_file> [-d|--debug]
Lines in 'file_to_check' not present in 'reference_file'
are printed to standard output.
  file_to_check:  File being checked
  reference_file: File to be checked against
  -d|--debug:     Run script in debug mode (Optional)
  -h|--help:      Print this help message
ENDUSE
}
# delete temp file on any exit
trap 'rm -f "$TMP_FILE"' EXIT

#-- check args
[[ $CHK_FILE == "-h" || $CHK_FILE == "--help" ]] && { scriptUsage; exit 0; }
[[ -n $CHK_FILE && -n $REF_FILE ]] || { >&2 echo "Not enough arguments!"; scriptUsage; exit 1; }
[[ $D_BUG == "-d" || $D_BUG == "--debug" ]] && set -x
[[ -s $CHK_FILE ]] || { >&2 echo "File $CHK_FILE not found or empty"; exit 1; }
[[ -s $REF_FILE ]] || { >&2 echo "File $REF_FILE not found or empty"; exit 1; }
#--

#== edit temp file to match the 3 comparison rules
# copy ref file to temp for editing
cp "$REF_FILE" "$TMP_FILE" || { >&2 echo "Unable to create temporary file"; exit 1; }
# rule 3 - ignore empty lines
sed -i '/^[[:space:]]*$/d' "$TMP_FILE"
# rule 1 - ignore begin/end of line spaces
sed -i 's/^[[:space:]][[:space:]]*//;s/[[:space:]][[:space:]]*$//' "$TMP_FILE"
# rule 2 - multi space/tab as single space
sed -i 's/[[:space:]][[:space:]]*/ /g' "$TMP_FILE"
#==
# function to clean LINE to match 3 rules
# & escape regex-special characters ( ] [ \ . * ^ $ / ) for later sed command
cleanLine() {
    var=$(printf '%s\n' "$1" | sed 's/^[[:space:]][[:space:]]*//;s/[[:space:]][[:space:]]*$//;s/[[:space:]][[:space:]]*/ /g')
    printf '%s\n' "$var" | sed 's|[][\.*^$/]|\\&|g'
}

### parse check file
while IFS='' read -r LINE || [[ -n $LINE ]]
do
    CLN_LINE=$(cleanLine "$LINE")
    # skip empty and whitespace-only lines (rule 3)
    [[ -z $CLN_LINE ]] && continue
    # anchor with ^...$ so that only whole-line matches count
    FOUND=$(sed -n "/^${CLN_LINE}\$/{p;q}" "$TMP_FILE")
    [[ -z $FOUND ]] && MISSING[$m]="$LINE" && ((m++))
    FOUND=""
done < "$CHK_FILE"
###
#++ print missing line(s) (if any)
if (( m > 0 ))
then
    printf "\n Missing line(s) found:\n"
    #* SEE BELOW ON THIS
    for (( p=0; p<m; p++ ))
    do
        printf " %s\n" "${MISSING[$p]}"
    done
    echo
else
    printf "\n **No missing lines found**\n\n"
fi
#* using an unquoted 'for p in ${MISSING[@]}' causes:
#* "SPACED LINES" to become:
#* "SPACED"
#* "LINES" when printed to stdout!
#* (quoting the expansion, "${MISSING[@]}", avoids the word splitting)
#++
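The closing note in the script about `for p in ${MISSING[@]}` comes down to quoting. The following is a minimal sketch, not part of the original script, that counts how many items a loop sees with and without quotes. The array contents and the function names count_unquoted and count_quoted are made up for illustration.

```bash
#!/bin/bash
# Hypothetical sample data: two stored lines that contain spaces.
MISSING=("assign a = b + c;" "wire [7:0] data;")

# Unquoted expansion: the shell word-splits every element on whitespace.
count_unquoted() {
    local n=0 p
    set -f    # disable globbing so '[7:0]' is not treated as a filename pattern
    for p in ${MISSING[@]}; do n=$((n + 1)); done
    set +f
    echo "$n"
}

# Quoted expansion: each array element stays one item.
count_quoted() {
    local n=0 p
    for p in "${MISSING[@]}"; do n=$((n + 1)); done
    echo "$n"
}

echo "unquoted: $(count_unquoted) items"   # one item per word
echo "quoted:   $(count_quoted) items"     # one item per stored line
```

With the quoted form, the C-style index loop in the script is not strictly needed; either works.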
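Answer 2
A simple solution:
diff -bB fileA fileB | grep -v '^>'
-b (or --ignore-space-change) means "ignore changes in the amount of white space".
-B (or --ignore-blank-lines) means "ignore changes whose lines are all blank".
grep -v '^>' drops the report of lines that are in fileB but not in fileA.
This does not ignore leading whitespace, but otherwise it is close to what you seem to want.
If "lines that are in B but not in A are also interesting", why not just run diff -bB fileA fileB instead of doing half a diff twice?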
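Since -b does not cover leading whitespace, one way to handle that rule is to strip it before diff sees the files. This is a sketch under assumptions: bash (for process substitution) and a diff that supports -b, such as GNU diff. The function name only_in_first is made up here. Blank lines are removed in the same sed pass, so -B is not needed.

```bash
#!/bin/bash
# only_in_first FILE_A FILE_B
# Print diff's "<" lines, i.e. the lines diff reports as present only in FILE_A.
only_in_first() {
    # Strip leading whitespace and drop blank lines before comparing;
    # -b then takes care of differing amounts of whitespace inside the line.
    diff -b <(sed -e 's/^[[:space:]]*//' -e '/^$/d' "$1") \
            <(sed -e 's/^[[:space:]]*//' -e '/^$/d' "$2") | grep '^<'
}
```

Called as `only_in_first fileA fileB`. diff still compares the files in order, so lines that merely moved are reported as well.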
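Answer 3
diff -w file1 file2
The -w flag to diff makes it ignore whitespace characters (this is an extension that most diff implementations provide).
Given the following input:
file1: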
hello world
abc
123
this is line 2 (the last line)
file2:
hello world
abc
123
this is line 3 (the last line)
the command produces
6c6
< this is line 2 (the last line)
---
> this is line 3 (the last line)
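To make it ignore blank lines as well, preprocess the input files by deleting them. Using a shell that understands process substitution (such as bash or ksh93):
diff -w <( sed '/^[[:space:]]*$/d' file1 ) <( sed '/^[[:space:]]*$/d' file2 )
If your diff has an option for ignoring blank lines (look for -B in the manual if you are using GNU diff), then use that instead. Mine has no such option.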
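diff compares lines in order, so shuffled lines are still reported even with -w. One way around that is to normalize both files, sort them, and let comm print what occurs only in the first. This is a sketch assuming bash; the function names normalize and missing_lines are hypothetical. Unlike -w, it keeps a single space between words rather than ignoring whitespace entirely.

```bash
#!/bin/bash
# normalize FILE
# Squeeze whitespace runs to one space, trim both ends, drop blank lines.
normalize() {
    sed -e 's/[[:space:]][[:space:]]*/ /g' -e 's/^ //' -e 's/ $//' -e '/^$/d' "$1"
}

# missing_lines FILE_A FILE_B
# Print the normalized lines of FILE_A that occur nowhere in FILE_B.
missing_lines() {
    # comm needs both inputs sorted in the same collation order, hence LC_ALL=C.
    # -23 suppresses "only in FILE_B" and "in both", leaving "only in FILE_A".
    LC_ALL=C comm -23 <(normalize "$1" | LC_ALL=C sort -u) \
                      <(normalize "$2" | LC_ALL=C sort -u)
}
```

Called as `missing_lines file1 file2`. The output is sorted and normalized, so it does not preserve the original order or spacing of file1.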
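Answer 4
Here is a bash script. I did not validate the arguments $1 and $2; you need to check that both files exist. I have not tested it much, but I think it meets your three conditions. The script returns 0 if the two files are equal and 1 otherwise; run echo $? right after the script to see the result.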
#!/bin/bash
code=0
n=1
dstcount=$(wc -l < "$2" | awk '{print $1}')

# remove spaces from the beginning/end of a line and compress tabs/spaces
# (quoted, so characters such as * in the line are not expanded by the shell)
normalize() {
    printf '%s\n' "$1" | sed 's/[[:space:]][[:space:]]*/ /g;s/^ //;s/ $//'
}

while IFS= read -r line || [ -n "$line" ]
do
    src=$(normalize "$line")
    if [ -z "$src" ]
    then
        # blank line: advance to next line in source file
        continue
    fi
    dst=$(normalize "$(sed -n "${n}p" "$2")")
    if [ -z "$dst" ]
    then
        # advance to next non-blank line in destination file
        while [ "$n" -le "$dstcount" ]
        do
            dst=$(normalize "$(sed -n "${n}p" "$2")")
            if [ -n "$dst" ]
            then
                break
            fi
            n=$((n + 1))
        done
        if [ "$n" -gt "$dstcount" ]
        then
            code=1
            break
        fi
    fi
    if [ "$src" != "$dst" ]
    then
        code=1
        break
    fi
    n=$((n + 1))
done < "$1"
exit $code
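The script above walks both files position by position and stops at the first mismatch, so it tells you whether the files differ but not which lines are missing, and reordered lines count as a mismatch. An order-independent variant could look like the sketch below. It is not the author's method: the function name report_missing is made up, and it assumes a POSIX awk. It keeps the same exit-code convention (0 when nothing is missing, 1 otherwise).

```bash
#!/bin/bash
# report_missing FILE_A FILE_B
# Print each line of FILE_A (as written) whose normalized form is absent from FILE_B.
report_missing() {
    awk '
        # Squeeze runs of spaces/tabs to one space and trim both ends.
        function norm(s) {
            gsub(/[ \t]+/, " ", s)
            sub(/^ /, "", s)
            sub(/ $/, "", s)
            return s
        }
        # First operand (FILE_B): remember every normalized line.
        FILENAME == ARGV[1] { seen[norm($0)] = 1; next }
        # Second operand (FILE_A): report non-blank lines never seen in FILE_B.
        {
            key = norm($0)
            if (key != "" && !(key in seen)) { print; missing = 1 }
        }
        END { exit missing }
    ' "$2" "$1"
}
```

Called as `report_missing fileA fileB; echo $?`. The comparison is on literal strings, so characters such as ; , = + need no escaping.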