Why does my regex match a line with more than 2 digits?

I have a text file named main.txt that contains the following lines:

1246
12
43
1

I have written the code:

cat main.txt | egrep "[12-4]{,2}"

I got the following output:

1246
12
43
1

Can you please explain why the output includes a line with more than 2 digits in the range 1-4?
(the one where *124*6 was caught)

Answer 1

There are a couple of things going on here.

First, by default, grep will output whole lines that contain a match anywhere (you can change this using the -o or --only-matching flag).

Second, the quantifier {,2} matches from zero to two occurrences, so every line of any file will match (even non-numeric lines). For example:

$ echo a | egrep "[12-4]{,2}"
a

If you try this yourself with grep colors enabled, you will notice that the a is not colored: it's not a that is matching, it's zero occurrences of [12-4].
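The same effect can be seen without colors by using the -o flag mentioned above. This is a minimal sketch, assuming GNU grep (the {,2} form is a GNU extension) and using grep -E in place of egrep; the variable names and the scratch directory are only for illustration.

```shell
# Work in a scratch directory and recreate the sample file from the question.
cd "$(mktemp -d)"
printf '1246\n12\n43\n1\n' > main.txt

# -o prints only the matched text, one match per output line.
pieces=$(grep -oE '[12-4]{,2}' main.txt)
printf '%s\n' "$pieces"

# For "a" the only match is the empty string, so -o has nothing to print.
# "|| true" keeps the script going whatever exit status grep reports here.
empty=$(echo a | grep -oE '[12-4]{,2}' || true)
printf 'matched text for "a": [%s]\n' "$empty"
```

For the line 1246 this should show two separate matches, 12 and 4, rather than one three-digit match: the line is printed in full by plain grep only because it contains a match somewhere.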

If you want to output only lines consisting of exactly zero to two digits from the set [12-4] [1], you can either use line anchors or add the -x (--line-regexp) whole-line flag:

$ cat main.txt 
1246
12
43
1

$ egrep '^[12-4]{,2}$' main.txt 
12
43
1

$ egrep -x '[12-4]{,2}' main.txt 
12
43
1

Note that both of these will match empty lines as well (since an empty line consists of exactly zero digits).
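If empty lines should not be reported, one option is to require at least one digit by giving the quantifier a minimum of 1. The sketch below is an illustration only: the file name with_blank.txt and the variable names are made up, and the sample data is the question's input with one empty line added.

```shell
# Work in a scratch directory; with_blank.txt is a hypothetical file name.
cd "$(mktemp -d)"
printf '1246\n12\n\n43\n1\n' > with_blank.txt

# {0,2} accepts the empty line, so -c counts it along with 12, 43 and 1.
zero_to_two=$(grep -cxE '[1-4]{0,2}' with_blank.txt)

# {1,2} requires at least one digit, so the empty line drops out.
one_to_two=$(grep -xE '[1-4]{1,2}' with_blank.txt)

printf 'lines matching {0,2}: %s\n' "$zero_to_two"
printf '%s\n' "$one_to_two"
```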


[1] Note that [12-4] is equivalent to [1234] and could be written more succinctly as [1-4].

Answer 2

GNU regex does not seem to allow the omission of the min value. What I suspect happens is that scanf is invoked to read the value and ends up with zero, and the loop then runs from zero, hence matching 3 values instead of just 2 (I didn't test this theory).

But the fault here is that you didn't specify the minimum repetition value. You are only allowed to omit the maximum, and only when you don't want to impose an upper limit.
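Whichever explanation applies, writing the minimum explicitly avoids the question. POSIX extended regular expressions define only the {m}, {m,} and {m,n} forms, while GNU grep documents {,n} as an extension, so {0,2} is the portable spelling. A minimal sketch, using grep -E in place of egrep and an illustrative variable name:

```shell
# Work in a scratch directory and recreate the sample file from the question.
cd "$(mktemp -d)"
printf '1246\n12\n43\n1\n' > main.txt

# Explicit minimum: zero to two characters from [1-4], anchored to the whole line.
explicit=$(grep -E '^[1-4]{0,2}$' main.txt)
printf '%s\n' "$explicit"
```

The anchors matter as much as the quantifier: without ^ and $ the line 1246 would still be printed, because it contains a shorter match.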
