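I have a CSV file containing different revenues. I want to sort the CSV file by revenue, from the highest value to the lowest. I could not find out how to do this in the terminal without using Python.
I don't want to use Python.
I would like to use something simple, such as mlr, sed, or awk.
Input: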
name,location,capital,profit-lost,revenue,employees,year
company1,location1,35527.19,-33226.25,,0.70,2020
company2,location2,-155921.70,-146.03,,,2020
company3,location3,1873134.74,778424.56,13320152.32,16.90,2020
company4,location4,1050987.60,426317.61,,24.90,2021
company5,location5,368506.18,11997.04,,,2019
company6,location6,7965648.89,369947.14,64413602.44,103.30,2019
company7,location7,1531534.27,125750.94,3054307.36,12.10,2020
company8,location8,6161574.62,906591.96,124804038.64,51.30,2021
Output:
name,location,capital,profit-lost,revenue,employees,year
company8,location8,6161574.62,906591.96,124804038.64,51.30,2021
company6,location6,7965648.89,369947.14,64413602.44,103.30,2019
company3,location3,1873134.74,778424.56,13320152.32,16.90,2020
company7,location7,1531534.27,125750.94,3054307.36,12.10,2020
company1,location1,35527.19,-33226.25,,0.70,2020
company2,location2,-155921.70,-146.03,,,2020
company4,location4,1050987.60,426317.61,,24.90,2021
company5,location5,368506.18,11997.04,,,2019
Revenue ranges from empty to billions.
I hope someone can help me solve this.
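Answer 1
So you want a (stable) numeric descending sort on revenue, which sounds like it should be easy in Miller, except that its null-handling rules say:
Records with one or more empty sort-field values sort after records with all sort-field values present
which means they sort first in descending order: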
$ mlr --csv sort -nr revenue file.csv
name,location,capital,profit-lost,revenue,employees,year
company1,location1,35527.19,-33226.25,,0.70,2020
company2,location2,-155921.70,-146.03,,,2020
company4,location4,1050987.60,426317.61,,24.90,2021
company5,location5,368506.18,11997.04,,,2019
company8,location8,6161574.62,906591.96,124804038.64,51.30,2021
company6,location6,7965648.89,369947.14,64413602.44,103.30,2019
company3,location3,1873134.74,778424.56,13320152.32,16.90,2020
company7,location7,1531534.27,125750.94,3054307.36,12.10,2020
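However, a decorate-sort-undecorate is simple with then-chaining: add a key that assigns the number 0 to empty revenues, sort on that key, then remove it again: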
$ mlr --csv put '$key = is_empty($revenue) ? 0 : $revenue' \
then sort -nr key then cut -x -f key file.csv
name,location,capital,profit-lost,revenue,employees,year
company8,location8,6161574.62,906591.96,124804038.64,51.30,2021
company6,location6,7965648.89,369947.14,64413602.44,103.30,2019
company3,location3,1873134.74,778424.56,13320152.32,16.90,2020
company7,location7,1531534.27,125750.94,3054307.36,12.10,2020
company1,location1,35527.19,-33226.25,,0.70,2020
company2,location2,-155921.70,-146.03,,,2020
company4,location4,1050987.60,426317.61,,24.90,2021
company5,location5,368506.18,11997.04,,,2019
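For comparison, the same decorate-sort-undecorate idea can be sketched with awk, sort, and cut alone. This is only an illustration under stated assumptions: the function name sort_by_revenue and the temporary sample file are made up for the example, the fields are assumed to contain no quoted commas, and -s (stable sort) is available in GNU and BSD sort but is not required by POSIX.

```shell
# Hypothetical sample file; the rows are a subset of the question's data.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
name,location,capital,profit-lost,revenue,employees,year
company1,location1,35527.19,-33226.25,,0.70,2020
company3,location3,1873134.74,778424.56,13320152.32,16.90,2020
company5,location5,368506.18,11997.04,,,2019
company8,location8,6161574.62,906591.96,124804038.64,51.30,2021
EOF

# sort_by_revenue is an illustrative name, not part of any tool.
sort_by_revenue() {
    # Print the header line unchanged.
    awk 'NR == 1 { print; exit }' "$1"
    # Decorate: prefix each data row with a numeric key (0 for empty revenue).
    # Sort: numeric, descending, on the key only; -s keeps input order for ties.
    # Undecorate: drop the key field again.
    awk -F, 'NR > 1 { key = ($5 == "" ? 0 : $5); print key "," $0 }' "$1" |
        LC_ALL=C sort -t, -s -k1,1nr |
        cut -d, -f2-
}

result=$(sort_by_revenue "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

The file is read twice here, once for the header and once for the data, which keeps the sketch short at the cost of requiring a regular file rather than a pipe.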
Answer 2
Using sort:
cat input.csv | (sed -u 1q; sort -t, -r -n -k5)
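sed -u 1q is needed so that sort ignores the header. It basically means: process the first line and quit, then pass the remainder to sort. -u is short for --unbuffered (a GNU sed option), which tells sed not to buffer its input, so it does not read past the first line.
Flags for sort:
-t, specifies the delimiter as a comma.
-r makes the sorted output descending. By default, it is ascending.
-n sorts numerically.
-k5 sorts on the fifth key/column. Strictly, the key starts at the fifth field and runs to the end of the line; -k5,5 would limit it to the fifth field alone.
Demo: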
$ cat input.csv | (sed -u 1q; sort -t, -r -n -k5)
name,location,capital,profit-lost,revenue,employees,year
company8,location8,6161574.62,906591.96,124804038.64,51.30,2021
company6,location6,7965648.89,369947.14,64413602.44,103.30,2019
company3,location3,1873134.74,778424.56,13320152.32,16.90,2020
company7,location7,1531534.27,125750.94,3054307.36,12.10,2020
company5,location5,368506.18,11997.04,,,2019
company4,location4,1050987.60,426317.61,,24.90,2021
company2,location2,-155921.70,-146.03,,,2020
company1,location1,35527.19,-33226.25,,0.70,2020
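In the demo above, the rows with an empty revenue come out in the reverse of their input order. With GNU sort this is most likely because rows that tie on the key are compared as whole lines, and the global -r reverses that comparison too. A minimal sketch of one way to keep tied rows in input order, assuming a sort that supports -s (GNU and BSD do; POSIX does not require it), puts the modifiers on the key itself. The temporary file and variable names are made up for the example.

```shell
# Hypothetical sample file; the rows are a subset of the question's data.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
name,location,capital,profit-lost,revenue,employees,year
company1,location1,35527.19,-33226.25,,0.70,2020
company2,location2,-155921.70,-146.03,,,2020
company7,location7,1531534.27,125750.94,3054307.36,12.10,2020
company8,location8,6161574.62,906591.96,124804038.64,51.30,2021
EOF

# Global -r, key from field 5 to end of line (the form used in the answer).
global_reverse=$(tail -n +2 "$tmp" | LC_ALL=C sort -t, -r -n -k5)

# Key limited to field 5, with n and r attached to the key, plus -s:
# rows that tie on revenue stay in their input order.
stable_ties=$(tail -n +2 "$tmp" | LC_ALL=C sort -t, -s -k5,5nr)

printf 'global -r:\n%s\n\n' "$global_reverse"
printf 'stable, key-only:\n%s\n' "$stable_ties"
rm -f "$tmp"
```

LC_ALL=C is set so that numeric parsing does not depend on the locale's decimal point or thousands separator.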
Answer 3
Using mandatory POSIX tools available on all Unix systems:
$ { head -n 1; sort -t, -k5,5rn; } < file
name,location,capital,profit-lost,revenue,employees,year
company8,location8,6161574.62,906591.96,124804038.64,51.30,2021
company6,location6,7965648.89,369947.14,64413602.44,103.30,2019
company3,location3,1873134.74,778424.56,13320152.32,16.90,2020
company7,location7,1531534.27,125750.94,3054307.36,12.10,2020
company1,location1,35527.19,-33226.25,,0.70,2020
company2,location2,-155921.70,-146.03,,,2020
company4,location4,1050987.60,426317.61,,24.90,2021
company5,location5,368506.18,11997.04,,,2019
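See the comments below and "Can head read more lines of input than it outputs?" for other important information about the script above.
Answer 4
Using Raku (formerly known as Perl_6)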
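The concern with head is that, when its input cannot be rewound (a pipe, for example), it may consume more than the one line it prints, so sort would never see those rows. One possible way around this, sketched here as an assumption rather than as the answer's own method, is to take the header with the shell's read builtin, which stops at the first newline. The function name sort_keep_header and the sample file are made up for the example.

```shell
# Hypothetical sample file; the rows are a subset of the question's data.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
name,location,capital,profit-lost,revenue,employees,year
company1,location1,35527.19,-33226.25,,0.70,2020
company3,location3,1873134.74,778424.56,13320152.32,16.90,2020
company5,location5,368506.18,11997.04,,,2019
company8,location8,6161574.62,906591.96,124804038.64,51.30,2021
EOF

# sort_keep_header is an illustrative name, not part of any tool.
sort_keep_header() {
    # read consumes exactly one line, so every remaining line is
    # still available to sort, even when stdin is a pipe.
    IFS= read -r header
    printf '%s\n' "$header"
    LC_ALL=C sort -t, -k5,5rn
}

# Feed the data through a pipe, the case where an over-reading head
# could swallow data rows.
result=$(cat "$tmp" | sort_keep_header)
printf '%s\n' "$result"
rm -f "$tmp"
```

The sort invocation is the one from the answer; only the header handling differs.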
~$ raku -e 'lines.head.put; my @a = lines(); .put for @a.sort(-*.split(",")[4]);' file
#OR
~$ raku -e 'lines.head.put; .put for lines.sort(-*.split(",")[4]);' file
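Briefly, the first line (the header line) is read and immediately output with put. Then the remaining lines are read. In the first example, the value lines are stored in the @a array. In the second example, the lines are sorted directly. The sort function takes a mapper; here each line is split on commas and the fifth column (zero-indexed = 4) is taken. Sorting is alphabetic by default, so prefixing the sort criterion with + or - forces a numeric comparison (e.g. .sort(+*.split(",")[4])). The negative sign in -*. reverses the sort order (descending instead of ascending).
Sample Input: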
name,location,capital,profit-lost,revenue,employees,year
company1,location1,35527.19,-33226.25,,0.70,2020
company2,location2,-155921.70,-146.03,,,2020
company3,location3,1873134.74,778424.56,13320152.32,16.90,2020
company4,location4,1050987.60,426317.61,,24.90,2021
company5,location5,368506.18,11997.04,,,2019
company6,location6,7965648.89,369947.14,64413602.44,103.30,2019
company7,location7,1531534.27,125750.94,3054307.36,12.10,2020
company8,location8,6161574.62,906591.96,124804038.64,51.30,2021
Sample Output:
name,location,capital,profit-lost,revenue,employees,year
company8,location8,6161574.62,906591.96,124804038.64,51.30,2021
company6,location6,7965648.89,369947.14,64413602.44,103.30,2019
company3,location3,1873134.74,778424.56,13320152.32,16.90,2020
company7,location7,1531534.27,125750.94,3054307.36,12.10,2020
company1,location1,35527.19,-33226.25,,0.70,2020
company2,location2,-155921.70,-146.03,,,2020
company4,location4,1050987.60,426317.61,,24.90,2021
company5,location5,368506.18,11997.04,,,2019
For more complex CSV files:
~$ raku -MText::CSV -e 'my @a = csv(in => $*IN); @a[1..*] = @a[1..*].sort(-*.[4]); csv(in => @a, out => $*OUT);' < file
https://docs.raku.org/routine/lines
https://docs.raku.org/routine/split
https://github.com/Tux/CSV
https://raku.org