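What is the difference between the -g and -n flags of the sort command?
I have tried both flags on the output of ls -la and the results are identical.
The man page says -g means "compare according to general numerical value" and -n means "compare according to string numerical value".
I don't understand what they mean by that.
What is a "general numerical value"? What is a "string numerical value"?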
Answer 1
The main difference is in how numbers in scientific notation are handled. From info sort, when sorting with -n (numeric):
Neither a leading `+' nor exponential notation is recognized. To
compare such strings numerically, use the `--general-numeric-sort'
(`-g') option.
For example,
$ cat file
+1.23e-1
1.23e-2
1.23e-3
1.23e4
1.23e+5
-1.23e6
Then
$ sort -n file
-1.23e6
+1.23e-1
1.23e-2
1.23e-3
1.23e4
1.23e+5
whereas
$ sort -g file
-1.23e6
1.23e-3
1.23e-2
+1.23e-1
1.23e4
1.23e+5
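As a smaller illustration of the same point, the sketch below feeds three made-up values (1e3, 2, 10) to both flags. It assumes GNU sort and forces the C locale so the result does not depend on local settings; the function name sort_values is only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: sort the same three values with the given flag.
# LC_ALL=C pins the decimal point and collation for reproducibility.
sort_values() {
    printf '%s\n' 1e3 2 10 | LC_ALL=C sort "$1"
}

echo "sort -n:"; sort_values -n   # -n stops reading at the 'e', so 1e3 counts as 1
echo "sort -g:"; sort_values -g   # -g reads 1e3 as 1000
```

With -n the line 1e3 sorts first (it is read as 1); with -g it sorts last (it is read as 1000).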
Answer 2
From the sort info page, sort -g is explained as follows:
‘-g’
‘--general-numeric-sort’
‘--sort=general-numeric’
Sort numerically, converting a prefix of each line to a long
double-precision floating point number. *Note Floating point::.
Do not report overflow, underflow, or conversion errors. Use the
following collating sequence:
• Lines that do not start with numbers (all considered to be
equal).
• NaNs (“Not a Number” values, in IEEE floating point
arithmetic) in a consistent but machine-dependent order.
• Minus infinity.
• Finite numbers in ascending numeric order (with -0 and +0
equal).
• Plus infinity.
Use this option only if there is no alternative; it is much slower
than ‘--numeric-sort’ (‘-n’) and it can lose information when
converting to floating point.
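The collating sequence above can be checked with a small sketch. The input values (abc, nan, inf, and so on) are made up for illustration, the function name is hypothetical, and the result assumes GNU sort on a C library whose floating point conversion accepts the spellings inf and nan.

```shell
#!/bin/sh
# Hypothetical helper: feed one line of each kind to sort -g.
# The input is deliberately out of order.
collate_demo() {
    printf '%s\n' inf 3 abc -inf nan 0 | LC_ALL=C sort -g
}

collate_demo
```

Under those assumptions the non-numeric line abc should come first, followed by nan, -inf, the finite numbers 0 and 3, and finally inf.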
sort -n is the natural sort we usually expect:
‘-n’
‘--numeric-sort’
‘--sort=numeric’
Sort numerically. The number begins each line and consists of
optional blanks, an optional ‘-’ sign, and zero or more digits
possibly separated by thousands separators, optionally followed by
a decimal-point character and zero or more digits. An empty number
is treated as ‘0’. The ‘LC_NUMERIC’ locale specifies the
decimal-point character and thousands separator. By default a
blank is a space or a tab, but the ‘LC_CTYPE’ locale can change
this.
Comparison is exact; there is no rounding error.
Neither a leading ‘+’ nor exponential notation is recognized. To
compare such strings numerically, use the ‘--general-numeric-sort’
(‘-g’) option.
See Steeldriver's answer for a better explanation.
Answer 3
From the sort manual:
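‘-n’
‘--numeric-sort’
‘--sort=numeric’
Sort numerically. The number begins each line and consists of optional blanks, an optional ‘-’ sign, and zero or more digits possibly separated by thousands separators, optionally followed by a decimal-point character and zero or more digits. An empty number is treated as ‘0’. The ‘LC_NUMERIC’ locale specifies the decimal-point character and thousands separator. By default a blank is a space or a tab, but the ‘LC_CTYPE’ locale can change this.
Comparison is exact; there is no rounding error.
Neither a leading ‘+’ nor exponential notation is recognized. To compare such strings numerically, use the ‘--general-numeric-sort’ (‘-g’) option.
and
‘-g’
‘--general-numeric-sort’
‘--sort=general-numeric’
Sort numerically, converting a prefix of each line to a long double-precision floating point number. See Floating point. Do not report overflow, underflow, or conversion errors. Use the following collating sequence:
- Lines that do not start with numbers (all considered to be equal).
- NaNs (“Not a Number” values, in IEEE floating point arithmetic) in a consistent but machine-dependent order.
- Minus infinity.
- Finite numbers in ascending numeric order (with -0 and +0 equal).
- Plus infinity.
Use this option only if there is no alternative; it is much slower than ‘--numeric-sort’ (‘-n’) and it can lose information when converting to floating point.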
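So it seems that using -g could lead to incorrect comparisons because of the loss of precision, but for whatever reason I cannot produce such a result: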
$ printf "%s\n" 1 1.23 1.234 1.2345 1.23456 1.234567 1.2345678 1.23456789 1.23456788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888 1.23456788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888878888888888 | sort -g
1
1.23
1.234
1.2345
1.23456
1.234567
1.2345678
1.23456788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888878888888888
1.23456788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888
1.23456789
sort -g correctly places the second long decimal before the first, even though the difference between the two is far beyond the precision of a double:
$ cat test.cpp
#include<iostream>
using namespace std;
int main()
{
cout << (1.23456788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888887888888888888888888888 < 1.23456788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888) << endl;
cout << (1.23456788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888887888888888888888888888 > 1.23456788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888) << endl;
}
$ make test
g++ test.cpp -o test
$ ./test
0
0
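One likely explanation, assuming GNU sort: when all keys compare equal, sort falls back to a last-resort byte-wise comparison of the whole line, and that fallback happens to put the two long decimals in the right order. The -s (--stable) option disables the fallback, so the tie becomes visible. The sketch below uses two made-up values that differ only in the 51st decimal place, which is assumed to be beyond long double precision on common platforms; the variable and function names are only for illustration.

```shell
#!/bin/sh
# Two made-up values that differ only in the last digit.
zeros=$(printf '%050d' 0)      # a run of 50 zeros
big="1.${zeros}2"
small="1.${zeros}1"

# Hypothetical helper: sort the pair (larger value first in the input)
# with -g plus any extra flags.
sort_pair() {
    printf '%s\n' "$big" "$small" | LC_ALL=C sort -g "$@"
}

echo "sort -g:";    sort_pair      # last-resort comparison puts $small first
echo "sort -g -s:"; sort_pair -s   # tie after conversion: input order is kept
```

If the two values really do convert to the same floating point number, plain sort -g still prints the smaller one first, while sort -g -s leaves them in input order, which is the information loss the manual warns about.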