Rearranging columns separated by : in Linux

The question is how to rearrange the columns and their values into a required order.

Input

"a":"val1","c":"val2","b":"val3","d":"val4"
"a":"val1","b":[],"c":"val3","d":"val4"
"a":"val1","d":["val2","val32],"c":"val3","b":"val4"
"d":"val1","a":"val2","c":"val3","b":"val4"

The expected output should list a, b, c, d with their respective values.

"a":"val1"|"b":"val3"|"c":"val2"|"d":"val4"
"a":"val1"|"b":[]|"c":"val3"|"d":"val4"
"a":"val1"|"b":"val4"|"c":"val3"|"d":["val2","val32]
"a":"val2"|"b":"val4"|"c":"val3"|"d":"val1"

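Answer 1

Since your question has changed a lot over time, I will try to address three different problems.

Your attempt [1]

Your awk command tries to split each line at occurrences of admin:. Even if that made sense, you could only refer to fields $1 and $2, because admin: appears only once on each line.

You may be looking for something like this: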

printf '%s\n' '"_id":"asc" ,"name":"enygren" ,"admin":[] ,"creat":"date3"' |
  sed 's/"//g' |
  awk -F' ,' -v OFS='|' '{if ($2~/name:/){print $1,$3,$4,$2} else {$1=$1; print $0}}'

Of course, this may not be a good idea: /name:/ matches any field that contains name:, not just the exact label name:.
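
As a minimal sketch of a stricter test, the regular expression can be anchored to the start of the field. This assumes the double quotes have already been stripped, so a label always sits at the very beginning of its field. The function name reorder and the second sample line (with a username label) are made up for illustration.

```shell
# Hypothetical helper: move field 2 to the end only when its label is exactly "name".
reorder() {
  sed 's/"//g' |
  awk -F' ,' -v OFS='|' '{
    if ($2 ~ /^name:/) { print $1, $3, $4, $2 }   # anchored: "username:" no longer matches
    else               { $1 = $1; print }          # just rebuild the record with OFS
  }'
}

printf '%s\n' '"_id":"asc" ,"name":"enygren" ,"admin":[] ,"creat":"date3"' \
              '"_id":"scd" ,"username":"tzpi" ,"admin":[] ,"creat":"date4"' | reorder
# _id:asc|admin:[]|creat:date3|name:enygren
# _id:scd|username:tzpi|admin:[]|creat:date4
```

With the unanchored /name:/ the second line would have been reordered as well, because username: contains name:.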

In any case, this looks like an XY problem.


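Rearranging columns

Here is an awk solution that you can customize to select and reorder columns, assuming they come from a delimited text file.

It assumes that fields in the input data cannot contain the separator " ," (a space followed by a comma). Based on the code you posted, this sounds reasonable [1], but it actually seems not to be the case. You should resort to a tool specialized in handling structured data (see below), such as csvkit for CSV or jq for JSON (thanks to Kiwi for the tip).

Given the script prog_file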

BEGIN {
                        # Create an array of labels for the fields you want
                        # to keep, in the order you want to print them
    labels[1] = "\"_id\""
    labels[2] = "\"admin\""
    labels[3] = "\"creat\""
    labels[4] = "\"name\""
}
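{
                        # Forget the fields of the previous record, so that
                        # a label missing from this line prints as empty
                        # instead of carrying over a stale value
    split("", fields)
                        # Split any field on ":" and make an array of
                        # full fields indexed by their label.
                        # This assumes labels DO NOT CONTAIN any ":"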
    for ( i=1; i<=NF; i++ ) {
        split($i, chunks, ":")
        fields[chunks[1]] = $i
    }
                        # Reset the record
    $0 = ""
                        # Re-build the record with only the fields
                        # whose labels are in the array we defined in
                        # the BEGIN block.
                        # Explicitly use "4" as the upper bound because
                        # POSIX does not specify the order in which
                        # "for (var in array)" assigns indexes to var
    for ( i=1; i<=4; i++ ) {
        $i = fields[labels[i]]
    }
                        # Strip any double quote
    gsub("\"","")
    print $0
}

and the input [2]

"_id":"123" ,"admin":[src] ,"creat":"date1" ,"name":"dedu"
"_id":"2w3" ,"admin":[analise] ,"creat":"date2" ,"name":"csv"
"_id":"asc" ,"name":"enygren" ,"admin":[] ,"creat":"date3"
"_id":"scd" ,"admin":[] ,"creat":"date4" ,"name":"tzpi"

invoking:

awk -v FS=' ,' -v OFS='|' -f prog_file input_file

gives [3]

_id:123|admin:[src]|creat:date1|name:dedu
_id:2w3|admin:[analise]|creat:date2|name:csv
_id:asc|admin:[]|creat:date3|name:enygren
_id:scd|admin:[]|creat:date4|name:tzpi
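
To illustrate the "customize" part, here is a rough variant that takes the list of wanted labels from the command line instead of hard-coding it in a BEGIN block. It is only a sketch under the same assumptions as the script above (separator " ,", no ":" inside labels); the function name pick and the variable want are invented for this example.

```shell
# Hypothetical helper: print only the fields whose labels are listed in $1
# (comma-separated, without quotes), in that order.
pick() {
  awk -v FS=' ,' -v OFS='|' -v want="$1" '
    BEGIN { n = split(want, labels, ",") }   # n = number of wanted labels
    {
      split("", fields)                      # clear fields left from the previous record
      for (i = 1; i <= NF; i++) {
        split($i, chunks, ":")
        key = chunks[1]
        gsub(/"/, "", key)                   # index by the unquoted label
        fields[key] = $i
      }
      $0 = ""                                # reset the record
      for (i = 1; i <= n; i++) $i = fields[labels[i]]
      gsub(/"/, "")                          # strip any double quote
      print
    }'
}

printf '%s\n' '"_id":"asc" ,"name":"enygren" ,"admin":[] ,"creat":"date3"' | pick 'name,_id'
# name:enygren|_id:asc
```

A label that does not occur on a line simply yields an empty field in the output.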

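Dealing with a data format

The last input data sample you edited into your question does not seem to come from a delimited text file. It looks like a list of JSON objects.
Despite being human-readable, JSON is a data format and requires a different approach; indeed, the awk solution above does not work for that input.

After adding a bit of structure, your sample can be converted (back?) into valid JSON: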

$ cat file
"a":"val1","c":"val2","b":"val3","d":"val4"
"a":"val1","b":[],"c":"val3","d":"val4"
"a":"val1","d":["val2","val32"],"c":"val3","b":"val4"
"d":"val1","a":"val2","c":"val3","b":"val4"

(Note that I assumed the missing " in "d":["val2","val32] is a typo, so I used "d":["val2","val32"] instead.)

$ sed 's/^/{/; s/$/},/; 1 s/^/[/; $ s/,$/]/' file >tmpfile
$ cat tmpfile 
[{"a":"val1","c":"val2","b":"val3","d":"val4"},
{"a":"val1","b":[],"c":"val3","d":"val4"},
{"a":"val1","d":["val2","val32"],"c":"val3","b":"val4"},
{"d":"val1","a":"val2","c":"val3","b":"val4"}]

Then, the safe way is to use a JSON processor such as jq to filter and reorder the data:

$ jq -r '.[] | {a: .a, b: .b, c: .c, d: .d} | @text' tmpfile
{"a":"val1","b":"val3","c":"val2","d":"val4"}
{"a":"val1","b":[],"c":"val3","d":"val4"}
{"a":"val1","b":"val4","c":"val3","d":["val2","val32"]}
{"a":"val2","b":"val4","c":"val3","d":"val1"}

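Removing the remaining opening and closing braces is easy and safe, whereas it would not be safe to blindly remove the double quotes (") or replace the commas (,) with pipes (|) to exactly match your sample output.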
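
A small sketch of that difference, using one of the lines printed by jq above. The function name strip_braces is made up for this example; it assumes one JSON object per line, as in the jq output.

```shell
# Hypothetical helper: drop only the leading "{" and the trailing "}" of each line.
strip_braces() {
  sed 's/^{//; s/}$//'
}

line='{"a":"val1","b":"val4","c":"val3","d":["val2","val32"]}'

# Safe: only the outermost braces are touched.
printf '%s\n' "$line" | strip_braces
# "a":"val1","b":"val4","c":"val3","d":["val2","val32"]

# Not safe: a blind replacement also hits the comma inside the array.
printf '%s\n' "$line" | strip_braces | sed 's/,/|/g'
# "a":"val1"|"b":"val4"|"c":"val3"|"d":["val2"|"val32"]
```

The second command shows why the comma replacement cannot be done blindly: the value of d is no longer a valid array.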


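[1] From revision no. 47 of the question
[2] Inferred from the last part of revision no. 6 of the question
[3] From revision no. 6 of the question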

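Answer 2

Sorry if I ignore your attempt. To me it looks too complicated, piping through so many scripts and tools.

As I understand it, the columns are already in the right order except for idxg_name, which should go last. So I suggest simply doing: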

sed 's/"//g;s/\(,idxg_name:[^,]*\)\(.*\)/\2\1/' yourfile
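  • the s/"//g part removes the ", as you already did
  • ,idxg_name:[^,]* matches the idxg_name field, starting at its leading comma and including everything up to the next comma (note that this fails if the name contains a comma! If that can happen, things get more complicated, because you have to consider whether the comma is inside "")
  • .* matches the rest of the line, and
  • the replacement \2\1 swaps the two parts captured inside \(\), thereby putting the name field at the end of the line. Done.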
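
To see the pieces described above working together, here is the same sed command applied to a single line. The sample line and the function name move_name_last are invented for illustration, since the actual input file is not shown here.

```shell
# Hypothetical helper wrapping the sed command from above.
move_name_last() {
  sed 's/"//g;s/\(,idxg_name:[^,]*\)\(.*\)/\2\1/'
}

printf '%s\n' '"_id":"asc","idxg_name":"enygren","admin":[],"creat":"date3"' | move_name_last
# _id:asc,admin:[],creat:date3,idxg_name:enygren
```

Here \1 captures ,idxg_name:enygren and \2 captures ,admin:[],creat:date3, so writing them back as \2\1 moves the name field to the end.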