account.txt

Answer 1

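Another simple option is comm. It only requires sorted input, so give it clean input by keeping just the "valid account numbers" (lines consisting of exactly 9 digits) and piping them through sort before redirecting to a new file: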

grep -Ex '[[:digit:]]{9}' account.txt   | sort > account.txt.sorted
grep -Ex '[[:digit:]]{9}' customer.txt  | sort > customer.txt.sorted

...then use comm as you indicated:

{ echo 'Missing Account Number:'; comm -23 account.txt.sorted customer.txt.sorted; }

{ echo 'Extra Customer Number:'; comm -13 account.txt.sorted customer.txt.sorted; }
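
As a reminder of what the flags do: comm prints three columns (lines only in the first file, lines only in the second file, lines in both), and -1, -2 and -3 each suppress the corresponding column. A tiny sketch with made-up data (the left and right files are invented purely for illustration, and mktemp -d is assumed to be available):

```shell
#!/bin/sh
# Illustration only: "left" and "right" are made-up, already-sorted files.
tmp=$(mktemp -d) || exit 1
printf '%s\n' 1 2 3 > "$tmp/left"
printf '%s\n' 2 3 4 > "$tmp/right"

comm -23 "$tmp/left" "$tmp/right"   # only in left:  prints 1
comm -13 "$tmp/left" "$tmp/right"   # only in right: prints 4
comm -12 "$tmp/left" "$tmp/right"   # in both:       prints 2 and 3
```

So -23 leaves only the "missing from the second file" column, and -13 only the "extra in the second file" column.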

Given the sample input:

account.txt

garbage
876251251
716126181
888281211
666615211
666615211extra
787878787
111212134
extra

customer.txt

garbage
876251251
876251251extra
716126181
792342108
792332668
666615211
760332429
791952441
676702288
junk

The resulting output is:

Missing Account Number:
111212134
787878787
888281211

Extra Customer Number:
676702288
760332429
791952441
792332668
792342108
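
If you would rather not leave the .sorted files behind, the same steps can be wrapped in a small function. This is only a sketch under a few assumptions: valid_sorted and compare_ids are names made up here, mktemp -d is assumed to be available, sort -u is used so that a number repeated within one file is listed once, and LC_ALL=C is forced so that sort and comm agree on the ordering.

```shell
#!/bin/sh
# Sketch only: valid_sorted and compare_ids are hypothetical helper names.

# Keep only lines that are exactly 9 digits, sorted and de-duplicated.
valid_sorted() {
    grep -Ex '[[:digit:]]{9}' "$1" | LC_ALL=C sort -u
}

# compare_ids ACCOUNT_FILE CUSTOMER_FILE
compare_ids() {
    tmp=$(mktemp -d) || return 1
    valid_sorted "$1" > "$tmp/a"
    valid_sorted "$2" > "$tmp/c"
    echo 'Missing Account Number:'
    LC_ALL=C comm -23 "$tmp/a" "$tmp/c"   # only in the account file
    echo 'Extra Customer Number:'
    LC_ALL=C comm -13 "$tmp/a" "$tmp/c"   # only in the customer file
    rm -rf "$tmp"
}
```

It would be called as compare_ids account.txt customer.txt.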

Answer 2

Yes, it is possible, and perhaps simplest with diff.

$ diff account.txt customer.txt
1c1
< **account.txt**
---
> **customer.txt**
5c5,6
< 888281211
---
> 792342108
> 792332668
7,8c8,10
< 787878787
< 111212134
---
> 760332429
> 791952441
> 676702288

$ diff account.txt customer.txt|grep '^<'
< **account.txt**
< 888281211
< 787878787
< 111212134

$ diff account.txt customer.txt|grep '^>'
> **customer.txt**
> 792342108
> 792332668
> 760332429
> 791952441
> 676702288

The following shell script, diff-script, is more refined.

#!/bin/bash

# Assumes 9-digit account and customer numbers.

# Sort and de-duplicate so diff compares sets rather than line positions.
sort -u account.txt  > account.srt
sort -u customer.txt > customer.srt

diff account.srt customer.srt > diff.txt

# Lines prefixed with '< ' exist only in the first file.
echo 'only in account.srt:' > result.txt
grep -E '^< [0-9]{9}$' diff.txt | sed 's/^< //' >> result.txt

# Lines prefixed with '> ' exist only in the second file.
echo 'only in customer.srt:' >> result.txt
grep -E '^> [0-9]{9}$' diff.txt | sed 's/^> //' >> result.txt

echo "The result is in the file 'result.txt'"
echo "You can read it with 'less result.txt'"

Example run:

$ ./diff-script
The result is in the file 'result.txt'
You can read it with 'less result.txt'

$ cat result.txt 
only in account.srt:
111212134
787878787
888281211
only in customer.srt:
676702288
760332429
791952441
792332668
792342108
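
The sort step is what makes this reliable: diff compares lines by position, so the same numbers in a different order would otherwise show up as differences. Here is a minimal sketch of the same idea as a function that writes to stdout instead of result.txt. The name only_in_left is hypothetical, mktemp is assumed to be available, and a single sed -n stands in for the grep and sed pair:

```shell
#!/bin/sh
# Sketch only: only_in_left is a hypothetical helper name.
# only_in_left FILE1 FILE2
# Prints the 9-digit lines that occur in FILE1 but not in FILE2.
only_in_left() {
    a=$(mktemp) && b=$(mktemp) || return 1
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    # Keep diff lines of the form "< 123456789" and strip the "< " prefix.
    diff "$a" "$b" | sed -n 's/^< \([0-9]\{9\}\)$/\1/p'
    rm -f "$a" "$b"
}
```

Swapping the two arguments gives the numbers found only in the second file.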

Answer 3

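For this job I would choose awk. The code below only processes valid data, that is, lines consisting of exactly 9 digits. Blank lines, lines with more or fewer than 9 digits, and lines containing letters are ignored.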

$ cat account
876251251

716126181
888281211
asdferfggggg
666615211
787878787
123456789123
111212134

$ cat customer
876251251
716126181
eeeeeeeee
792342108
792332668
666615211
760332429

791952441
676702288

$ awk '/^[0-9]{9}$/{a[$0]++;b[$0]="found only in " FILENAME}END{for (i in a) if (a[i]==1) print i,b[i]}' account customer |sort -k2
111212134 found only in account
787878787 found only in account
888281211 found only in account
676702288 found only in customer
760332429 found only in customer
791952441 found only in customer
792332668 found only in customer
792342108 found only in customer
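
One caveat with the one-liner: a number that appears twice within the same file also reaches a count of 2 and is silently dropped, and some older awk implementations do not support the {9} interval syntax. A hedged variant that counts each number once per file and checks the length explicitly (report_unique, seen, count and origin are names chosen for this sketch, and file names are assumed to contain no spaces):

```shell
#!/bin/sh
# Sketch only: report_unique is a hypothetical helper name.
# report_unique FILE1 FILE2
# Prints each valid 9-digit number that occurs in exactly one of the files.
report_unique() {
    awk '
        /^[0-9]+$/ && length($0) == 9 {
            # Count each number at most once per file.
            if (!seen[FILENAME, $0]++) {
                count[$0]++
                origin[$0] = FILENAME
            }
        }
        END {
            for (id in count)
                if (count[id] == 1)
                    print id, "found only in", origin[id]
        }
    ' "$1" "$2" | LC_ALL=C sort -k5,5 -k1,1   # group by file, then by number
}
```

Called as report_unique account customer, it prints lines in the same "found only in" format as the one-liner, grouped by file.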
