Renaming in sed so that further matching is possible before replacement

Answer 1

renamefile needs to be reordered by length so that longer names are replaced first:

awk '{ print length, $0 }' renamefile| sort -nr | cut -d" " -f2- > renamefile2

Output:

s/\<Strawberry juice with lemon\>/3076/g
s/\<Orange juice with pulp\>/3072/g
s/\<Apple juice with lemon\>/3075/g
s/\<Watermelon juice\>/3074/g
s/\<Orange juice\>/3073/g
s/\<Apple juice\>/3071/g

It can then be applied without any problems:

sed -f renamefile2 fileA
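To see why the order matters, here is a small sketch that applies two of the rules to one line in both orders. The input line is an assumption reconstructed from the output shown elsewhere on this page, and GNU sed is assumed for the \< and \> word boundaries.

```shell
# Assumed sample line (fileA itself is not shown in the answer).
line='CD67890    150    0    Orange juice with pulp 22/05   CS'

# Short name first: "Orange juice" also matches inside the longer name,
# because the space after "juice" counts as a word boundary.
short_first=$(printf '%s\n' "$line" |
    sed -e 's/\<Orange juice\>/3073/g' \
        -e 's/\<Orange juice with pulp\>/3072/g')

# Long name first: the longer product is replaced before the shorter rule runs.
long_first=$(printf '%s\n' "$line" |
    sed -e 's/\<Orange juice with pulp\>/3072/g' \
        -e 's/\<Orange juice\>/3073/g')

printf '%s\n%s\n' "$short_first" "$long_first"
```

With the short rule first, the line ends up as "3073 with pulp", which is the failure the reordering avoids.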

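Explanation:

awk loops over the lines:

length is an awk built-in function. Called without an argument, it returns the length of the current line (see the awk documentation for length).
$0 is the current line.

The following command prints the length of each line next to the line itself: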

awk '{ print length, $0 }' renamefile

24 s/\<Apple juice\>/3071/g
35 s/\<Orange juice with pulp\>/3072/g
25 s/\<Orange juice\>/3073/g

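sort sorts the input text:

-n sorts numerically.
-r reverses the result so that it is descending.

cut selects part of the text (the length is not needed in the final script, only the sed command on each line):

-d" " sets the delimiter to a space.
-f2- selects from field 2 to the end of the line.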
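The three stages can be tried end to end on a throwaway copy. This is only a sketch: the temporary directory and the three-line renamefile are assumptions made for the demonstration, not part of the original setup.

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)

# A three-line renamefile, deliberately not sorted by length.
printf '%s\n' \
    's/\<Apple juice\>/3071/g' \
    's/\<Orange juice with pulp\>/3072/g' \
    's/\<Orange juice\>/3073/g' > "$tmp/renamefile"

# Prefix each line with its length, sort descending, drop the prefix again.
awk '{ print length, $0 }' "$tmp/renamefile" |
    sort -nr |
    cut -d" " -f2- > "$tmp/renamefile2"

sorted=$(cat "$tmp/renamefile2")
printf '%s\n' "$sorted"

rm -rf "$tmp"
```

The longest rule (35 characters) comes out first and the shortest (24 characters) last.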

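Answer 2

If the product name is always followed by two digits, then a /, then two more digits, you can include them in the regex and use backreferences to replace them with themselves.

You can also match the four preceding space characters and replace them with themselves.

renamefile: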

s/( {4})Apple juice( [[:digit:]]{2}\/[[:digit:]]{2})/\13071\2/
s/( {4})Orange juice with pulp( [[:digit:]]{2}\/[[:digit:]]{2})/\13072\2/
s/( {4})Orange juice( [[:digit:]]{2}\/[[:digit:]]{2})/\13073\2/
s/( {4})Watermelon juice( [[:digit:]]{2}\/[[:digit:]]{2})/\13074\2/
s/( {4})Apple juice with lemon( [[:digit:]]{2}\/[[:digit:]]{2})/\13075\2/
s/( {4})Strawberry juice with lemon( [[:digit:]]{2}\/[[:digit:]]{2})/\13076\2/

Output:

$ sed -Ef renamefile fileA
AB12345    100    0    3071 20/05   AB
CD67890    150    0    3072 22/05   CS
EF25879    100    0    3074 19/05   CG
GH96314    98    0    3073 20/05   PU
IJ74123    95    0    3076 17/05   ST
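As a quick check of how the anchored pattern behaves, the sketch below runs the "Orange juice" rule alone against two lines. Both input lines are assumptions reconstructed from the output above.

```shell
# The "Orange juice" rule from renamefile: \1 is the four leading spaces,
# \2 is the " dd/dd" date that must follow the name directly.
rule='s/( {4})Orange juice( [[:digit:]]{2}\/[[:digit:]]{2})/\13073\2/'

# Exact product name followed by the date: replaced.
plain=$(printf '%s\n' 'GH96314    98    0    Orange juice 20/05   PU' |
    sed -E "$rule")

# Longer product name: "Orange juice" is followed by " with", not a date,
# so this rule leaves the line alone.
longer=$(printf '%s\n' 'CD67890    150    0    Orange juice with pulp 22/05   CS' |
    sed -E "$rule")

printf '%s\n%s\n' "$plain" "$longer"
```

Because the date is part of the match, the rules no longer need to be sorted by length.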

Answer 3

This is simpler with awk:

$ cat tst.awk
BEGIN {
    id = 3071
    map["Apple juice"]                  = id++
    map["Orange juice with pulp"]       = id++
    map["Orange juice"]                 = id++
    map["Watermelon juice"]             = id++
    map["Apple juice with lemon"]       = id++
    map["Strawberry juice with lemon"]  = id++
}
match($0,/^((\S+\s+){3})(.*\S)((\s+\S+){2})/,a) {
    $0 = a[1] map[a[3]] a[4]
    print
}

$ awk -f tst.awk file
AB12345    100    0    3071 20/05   AB
CD67890    150    0    3072 22/05   CS
EF25879    100    0    3074 19/05   CG
GH96314    98    0    3073 20/05   PU
IJ74123    95    0    3076 17/05   ST

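The above uses GNU awk (for the third argument to match() and for \s and \S), since you are already using GNU sed for the \< and \> word boundaries.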
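If GNU awk is not available, the same idea (cut off the first three and the last two fields, then look the whole name up in a map) can be sketched with the two-argument match() and RSTART/RLENGTH. This is not the script above, only a variant under the assumption that fields are separated by spaces or tabs; the input line and the two-entry map are made up for the demonstration.

```shell
line='CD67890    150    0    Orange juice with pulp 22/05   CS'

result=$(printf '%s\n' "$line" | awk '
BEGIN {
    map["Orange juice"]           = 3073
    map["Orange juice with pulp"] = 3072
}
{
    # head: the first three fields plus the whitespace after each of them
    if (!match($0, /^[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/)) { print; next }
    head = substr($0, 1, RLENGTH)
    rest = substr($0, RLENGTH + 1)

    # tail: the last two fields plus the whitespace in front of each of them
    if (!match(rest, /[ \t]+[^ \t]+[ \t]+[^ \t]+$/)) { print; next }
    name = substr(rest, 1, RSTART - 1)
    tail = substr(rest, RSTART)

    # exact lookup of the full name; unknown names are left unchanged
    code = (name in map) ? map[name] : name
    print head code tail
}')

printf '%s\n' "$result"
```

Since the lookup uses the complete name, "Orange juice" can never match inside "Orange juice with pulp", whatever the order of the map entries.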

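Answer 4

With GNU sed, we first modify renamefile on the fly (so you do not have to edit it by hand) and then use the result as sed code to perform the edits on fileA.

The change we make to renamefile is to look for a newline as the right-hand boundary instead of \>. Before that, we insert a newline into the pattern space for each line of fileA.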

$ sed -re '
     1i\
s/(\\s+\\S+){2}\\s*$/\\n&/
     s/\\>/\\n/
' renamefile | sed -rf - fileA

Output:

AB12345    100    0    3071 20/05   AB
CD67890    150    0    3072 22/05   CS
EF25879    100    0    3074 19/05   CG
GH96314    98    0    3073 20/05   PU
IJ74123    95    0    3076 17/05   ST
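To make the transformation visible, the sketch below prints the generated sed script before applying it. The two-line renamefile (short name first, on purpose) and the input line are assumptions for the demonstration; GNU sed is assumed throughout.

```shell
tmp=$(mktemp -d)

# Short rule first: the order that breaks the plain \> approach.
printf '%s\n' \
    's/\<Orange juice\>/3073/g' \
    's/\<Orange juice with pulp\>/3072/g' > "$tmp/renamefile"

# Same command as above, but captured instead of piped into a second sed.
generated=$(sed -re '
     1i\
s/(\\s+\\S+){2}\\s*$/\\n&/
     s/\\>/\\n/
' "$tmp/renamefile")
printf '%s\n' "$generated"

# Apply the generated script to one sample line.
line='CD67890    150    0    Orange juice with pulp 22/05   CS'
result=$(printf '%s\n' "$line" | sed -r -e "$generated")
printf '%s\n' "$result"

rm -rf "$tmp"
```

The generated script starts with the rule that inserts a newline in front of the last two fields, and every \> in renamefile has become \n, so "Orange juice\n" can only match when the name ends exactly where the newline was inserted.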
