Stable sort of a file by ordered substrings present on each line

I have a list of sound sources that I am processing with a script. An example:

alsa_input.usb-AVEO_Technology_Corp._USB2.0_Camera-02-Camera.analog-mono
alsa_input.pci-0000_00_14.2.analog-stereo
alsa_input.usb-Plantronics_Plantronics_GameCom_780-00-P780.analog-stereo

I would like to sort them by substrings that may appear anywhere in the line.

For example, I would like sort --by usb file.txt to put the USB devices first (while otherwise preserving their order):

alsa_input.usb-AVEO_Technology_Corp._USB2.0_Camera-02-Camera.analog-mono
alsa_input.usb-Plantronics_Plantronics_GameCom_780-00-P780.analog-stereo
alsa_input.pci-0000_00_14.2.analog-stereo

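I would also like to be able to specify several substrings for finer-grained priorities. So sort --by Plantronics --by usb file.txt would first output any lines containing "Plantronics", then the lines containing "usb", then the rest.

Is there any kind of command-line utility I can use to do this?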

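Answer 1

It sounds like you want a scoring system. Write a script that assigns each line a score indicating how early you want to see it in the output. awk looks well suited to the job. For your example: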

#!/bin/sh
awk '{score=0}
    /usb/ {score=1}
    /Plantronics/ {score=2}
    {print score, NR, $0}' "$@"

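By default this assigns every line a score of 0, then overwrites it with 1 if the line contains usb, and overwrites it with 2 if the line contains Plantronics. I put the usb and Plantronics statements in that order so that a line containing both strings ends up with a score of 2. Then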

./score file.txt | sort -k1nr -k2n | cut -d" " -f3-

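(where score is the name of the script). sort -k1nr means sort on the first field (the score), treating it as a number and putting higher values first (because the score script gives high scores to the lines you are most interested in). -k2n means that lines with the same value in the first field are sorted on the second field, as a number in normal ascending order. The second field is NR, the record number (a.k.a. line number), which ensures that lines with the same score (e.g., lines that contain usb but not Plantronics) appear in their original order. If you don't care about that, remove NR, from the print statement, remove -k2n from the sort command, and change -f3- to -f2-. (Don't rely on sort to keep tied lines in input order by itself: by default it falls back to comparing whole lines, and GNU sort only preserves input order with -s.) And, of course, cut -d" " -f3- strips off the numbers that the score script added to the front of the data.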
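For readers who prefer to see the decorate, sort, undecorate steps in one place, here is a minimal Python sketch of the same pipeline. The names score_line and sort_by_score and the sample list are illustrative assumptions, not part of the original answer; the sample lines are taken from the question.

```python
def score_line(line):
    """Mirror the awk rules: later rules overwrite earlier ones."""
    score = 0
    if "usb" in line:
        score = 1
    if "Plantronics" in line:
        score = 2
    return score


def sort_by_score(lines):
    # Decorate each line with (score, line number), like `print score, NR, $0`.
    decorated = [(score_line(line), nr, line)
                 for nr, line in enumerate(lines, start=1)]
    # Higher score first (-k1nr), then ascending line number (-k2n).
    decorated.sort(key=lambda item: (-item[0], item[1]))
    # Strip the decoration again, like `cut -d" " -f3-`.
    return [line for _, _, line in decorated]


sample = [
    "alsa_input.usb-AVEO_Technology_Corp._USB2.0_Camera-02-Camera.analog-mono",
    "alsa_input.pci-0000_00_14.2.analog-stereo",
    "alsa_input.usb-Plantronics_Plantronics_GameCom_780-00-P780.analog-stereo",
]
for line in sort_by_score(sample):
    print(line)
```

With this sample, the Plantronics line comes first, then the other usb line, then the pci line.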

If you don't fully understand how this works, try running each of the following and compare the output:

./score file.txt

./score file.txt | sort -k1nr -k2n

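This approach is fairly flexible. For example, the code above outputs, in order:

  • all lines that contain Plantronics,
  • all lines that contain usb (but not Plantronics), and
  • all lines that contain neither of the above,

with each group in the order in which it appears in the input file. But by changing the score script as follows,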

#!/bin/sh
awk '{score=0}
    /usb/ {score+=1}
    /Plantronics/ {score+=2}
    {print score, NR, $0}' "$@"

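we can give lines that contain both strings a score of 3, so now we get

  • all lines that contain both Plantronics and usb, followed by
  • all lines that contain Plantronics (but not usb), followed by
  • all lines that contain usb (but not Plantronics), and then
  • all lines that contain neither of the above.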
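To make the four groups concrete, here is a small Python sketch of the additive scoring. The WEIGHTS table mirrors the awk script, while the function name and the sample lines are made up purely for illustration.

```python
# Hypothetical weights matching `score+=1` and `score+=2` in the awk script.
WEIGHTS = {"usb": 1, "Plantronics": 2}


def additive_score(line):
    # Sum the weight of every substring that occurs in the line.
    return sum(weight for text, weight in WEIGHTS.items() if text in line)


lines = [
    "pci-generic",
    "usb-camera",
    "bluetooth-Plantronics-headset",
    "usb-Plantronics-headset",
    "usb-keyboard",
]
# sorted() is stable, also with reverse=True, so lines with equal
# scores keep their input order.
ordered = sorted(lines, key=additive_score, reverse=True)
for line in ordered:
    print(line)
```

If you extend this to more than two substrings, weights that are powers of two (1, 2, 4, ...) keep every combination of matches on a distinct score.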

Answer 2

Using sqlite and an ORDER BY clause:

$ sqlite3 <<\EOT
CREATE TABLE file(line);
.import file.txt file
SELECT * FROM file
ORDER BY
CASE
    WHEN line LIKE '%USB%' THEN 0
    WHEN line LIKE '%Realtek%' THEN 1
    ELSE 3
END,
-- rowid breaks ties, so lines with equal priority keep their input order
rowid;
EOT
00:10.0 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:05.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8100/8101L/8139 PCI Fast Ethernet Adapter (rev 10)
00:06.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8100/8101L/8139 PCI Fast Ethernet Adapter (rev 10)
00:07.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8100/8101L/8139 PCI Fast Ethernet Adapter (rev 10)
00:00.0 Host bridge: VIA Technologies, Inc. CN700/VN800/P4M800CE/Pro Host Bridge
00:00.1 Host bridge: VIA Technologies, Inc. CN700/VN800/P4M800CE/Pro Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. CN700/VN800/P4M800CE/Pro Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. PT890 Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. CN700/VN800/P4M800CE/Pro Host Bridge
00:00.7 Host bridge: VIA Technologies, Inc. CN700/VN800/P4M800CE/Pro Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237/VX700 PCI Bridge
00:0f.0 IDE interface: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
01:00.0 VGA compatible controller: VIA Technologies, Inc. CN700/P4M800 Pro/P4M800 CE/VN800 Graphics [S3 UniChrome Pro] (rev 01)
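The same query can be built for an arbitrary list of substrings. Below is a hedged sketch using Python's standard sqlite3 module; the function name order_lines and the shortened sample lines are assumptions for illustration. It adds rowid as a tie-breaker so that lines of equal priority are returned in input order, and it inherits LIKE's behaviour (case-insensitive for ASCII by default, with % and _ acting as wildcards).

```python
import sqlite3


def order_lines(lines, patterns):
    """Return lines ordered by the first LIKE pattern they match."""
    if not patterns:
        return list(lines)  # CASE needs at least one WHEN branch
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file(line)")
    conn.executemany("INSERT INTO file(line) VALUES (?)",
                     [(line,) for line in lines])
    # One WHEN branch per pattern; the rank is the pattern's position.
    whens = " ".join(f"WHEN line LIKE ? THEN {i}"
                     for i in range(len(patterns)))
    sql = (f"SELECT line FROM file "
           f"ORDER BY CASE {whens} ELSE {len(patterns)} END, rowid")
    # Patterns are bound as parameters; % and _ in them still act as wildcards.
    rows = conn.execute(sql, [f"%{p}%" for p in patterns]).fetchall()
    conn.close()
    return [row[0] for row in rows]


sample = [
    "00:00.0 Host bridge: VIA",
    "00:05.0 Ethernet controller: Realtek",
    "00:10.0 USB controller: VIA",
    "00:06.0 Ethernet controller: Realtek",
    "00:10.1 USB controller: VIA",
]
for line in order_lines(sample, ["USB", "Realtek"]):
    print(line)
```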

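Answer 3

What you are doing is not really sorting but selecting. So what you want is to first select all lines containing "Plantronics", then all lines containing "usb", and so on, and finally all lines that matched nothing.

I don't know of a command that does this in one go, but you can write a small Python script that does what you want: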

import sys

substrings = sys.argv[1:]
lines = sys.stdin.readlines()

# for each substring, print matching lines
for substring in substrings:
    i=0
    while i<len(lines):
        if substring in lines[i]:
            # match: print and remove
            sys.stdout.write(lines[i])
            del lines[i]
        else:
            # no match: try next line
            i += 1

# finally, print all lines that weren't matched
for line in lines:
    sys.stdout.write(line)

It does what you want:

$ lspci | python substrings.py "USB controller" "PCI bridge"
00:1a.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02)
00:1a.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1d.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:00.0 Host bridge: Intel Corporation 82Q33 Express DRAM Controller (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82Q33 Express Integrated Graphics Controller (rev 02)
00:02.1 Display controller: Intel Corporation 82Q33 Express Integrated Graphics Controller (rev 02)
00:03.0 Communication controller: Intel Corporation 82Q33 Express MEI Controller (rev 02)
00:19.0 Ethernet controller: Intel Corporation 82566DM-2 Gigabit Network Connection (rev 02)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 02)
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801IB (ICH9) 2 port SATA Controller [IDE mode] (rev 02)
00:1f.5 IDE interface: Intel Corporation 82801I (ICH9 Family) 2 port SATA Controller [IDE mode] (rev 02)
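The repeated selection above gives the same result as a stable sort keyed on the index of the first substring each line matches. Here is a minimal sketch of that variant; the names rank and prioritize and the shortened sample lines are hypothetical. It makes one pass over the lines to compute keys instead of deleting from the list inside a loop.

```python
def rank(line, substrings):
    # Index of the first substring found in the line;
    # lines that match nothing rank after all substrings.
    for index, substring in enumerate(substrings):
        if substring in line:
            return index
    return len(substrings)


def prioritize(lines, substrings):
    # sorted() is stable, so lines with the same rank keep their input order.
    return sorted(lines, key=lambda line: rank(line, substrings))


sample = [
    "00:00.0 Host bridge\n",
    "00:1a.0 USB controller #4\n",
    "00:1c.0 PCI bridge\n",
    "00:1a.1 USB controller #5\n",
    "00:1b.0 Audio device\n",
]
for line in prioritize(sample, ["USB controller", "PCI bridge"]):
    print(line, end="")
```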

Answer 4

cat file:

22
z
there's
a
moose
loose
in
the
hoose

For example, set the args to: loose moose (the script reads them from "$@"). You can use as many as you like.

by=( "$@" )
nl file | sed -nf <(for ((i=0;i<${#by[@]};i++)) ;do 
                      echo "s/.*${by[i]}.*/$i\t&/; t p"
                    done; echo "s/^/$i\t&/; :p p") |
            sort -nk1 -nk2 | cut -f3-

Output:

loose
moose
22
z
there's
a
in
the
hoose
