The question is how to rearrange the columns and their values into a required order.
Input
"a":"val1","c":"val2","b":"val3","d":"val4"
"a":"val1","b":[],"c":"val3","d":"val4"
"a":"val1","d":["val2","val32],"c":"val3","b":"val4"
"d":"val1","a":"val2","c":"val3","b":"val4"
The expected output should be a, b, c, d with their corresponding values.
"a":"val1"|"b":"val3"|"c":"val2"|"d":"val4"
"a":"val1"|"b":[]|"c":"val3"|"d":"val4"
"a":"val1"|"b":"val4"|"c":"val3"|"d":["val2","val32]
"a":"val2"|"b":"val4"|"c":"val3"|"d":"val1"
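Answer 1
Since your question has changed a lot over time, I will try to address three distinct issues.
Your attempt
Your awk command tries to split each line on occurrences of admin:. Even if that made sense, you would only be able to refer to fields $1 and $2, because admin: appears only once in each line.
You may be looking for something like this: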
printf '%s\n' '"_id":"asc" ,"name":"enygren" ,"admin":[] ,"creat":"date3"' |
sed 's/"//g' |
awk -F' ,' -v OFS='|' '{if ($2~/name:/){print $1,$3,$4,$2} else {$1=$1; print $0}}'
Of course, this may not be a good idea: /name:/ matches any field that contains name:, not just a field whose label is exactly name.
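A minimal sketch of an exact label comparison, assuming labels never contain a colon. The function name reorder_if_name is only illustrative and not part of the original command:

```shell
# Hypothetical helper: reads lines on stdin and moves the second field to the
# end only when its label is exactly "name" (not "surname", "name2", ...).
reorder_if_name() {
  sed 's/"//g' |
  awk -F' ,' -v OFS='|' '{
    split($2, kv, ":")            # kv[1] holds the label of the second field
    if (kv[1] == "name") { print $1, $3, $4, $2 }
    else { $1 = $1; print }       # only rebuild the record with the new OFS
  }'
}

printf '%s\n' '"_id":"asc" ,"name":"enygren" ,"admin":[] ,"creat":"date3"' |
  reorder_if_name
# prints: _id:asc|admin:[]|creat:date3|name:enygren
```

A line whose second label is surname would go through the else branch untouched, whereas the /name:/ test would have reordered it.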
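Anyway, this looks like an XY problem.
Rearranging columns
Here is an awk solution that you can customize to select columns and reorder them, assuming they come from a delimited text file.
It assumes that fields in the input data cannot contain any " or ,. That sounded reasonable based on the code you posted, but it turns out not to be the case. You should resort to a tool that is designed for handling structured data (see below), such as csvkit for CSV or jq for JSON (thanks to Kiwy for the hint).
Given the script prog_file: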
BEGIN {
    # Create an array of labels for the fields you want
    # to keep, in the order you want to print them
    labels[1] = "\"_id\""
    labels[2] = "\"admin\""
    labels[3] = "\"creat\""
    labels[4] = "\"name\""
}
{
    # Forget the fields of the previous record, so that a label
    # missing from this record is not printed with a stale value
    split("", fields)
    # Split any field on ":" and make an array of
    # full fields indexed by their label.
    # This assumes labels DO NOT CONTAIN any ":"
    for ( i=1; i<=NF; i++ ) {
        split($i, chunks, ":")
        fields[chunks[1]] = $i
    }
    # Reset the record
    $0 = ""
    # Re-build the record with only the fields
    # whose labels are in the array we defined in
    # the BEGIN block.
    # Explicitly use "4" as the upper bound because
    # POSIX does not specify the order in which
    # "for (var in array)" assigns indexes to var
    for ( i=1; i<=4; i++ ) {
        $i = fields[labels[i]]
    }
    # Strip any double quote
    gsub("\"","")
    print $0
}
and the input:
"_id":"123" ,"admin":[src] ,"creat":"date1" ,"name":"dedu"
"_id":"2w3" ,"admin":[analise] ,"creat":"date2" ,"name":"csv"
"_id":"asc" ,"name":"enygren" ,"admin":[] ,"creat":"date3"
"_id":"scd" ,"admin":[] ,"creat":"date4" ,"name":"tzpi"
the invocation:
awk -v FS=' ,' -v OFS='|' -f prog_file input_file
gives:
_id:123|admin:[src]|creat:date1|name:dedu
_id:2w3|admin:[analise]|creat:date2|name:csv
_id:asc|admin:[]|creat:date3|name:enygren
_id:scd|admin:[]|creat:date4|name:tzpi
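As a sketch of how the script could be customized, the label list can be passed in from the command line instead of being hard-coded in the BEGIN block. The select_cols wrapper and its want variable are assumptions made for illustration, and the same limitations apply (no " or , inside fields, no : inside labels):

```shell
# Hypothetical wrapper: $1 is a comma-separated list of quoted labels,
# given in the order in which the columns should be printed.
select_cols() {
  awk -v FS=' ,' -v OFS='|' -v want="$1" '
    BEGIN { n = split(want, labels, ",") }
    {
      split("", fields)                 # clear values left from the previous record
      for (i = 1; i <= NF; i++) {
        split($i, chunks, ":")
        fields[chunks[1]] = $i          # index each full field by its label
      }
      $0 = ""
      for (i = 1; i <= n; i++) $i = fields[labels[i]]
      gsub("\"", "")                    # strip double quotes
      print
    }'
}

printf '%s\n' '"_id":"asc" ,"name":"enygren" ,"admin":[] ,"creat":"date3"' |
  select_cols '"name","_id"'
# prints: name:enygren|_id:asc
```

Using the count returned by split as the loop bound keeps the explicit index order that the original script relies on.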
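Dealing with data formats
The last input data sample you edited into your question does not appear to come from a delimited text file. It looks like a list of JSON objects.
Despite being human-readable, JSON is a data format and requires a different approach; indeed, the awk solution above does not work on that input.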
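To see why a plain delimiter is not enough here, a quick sketch that counts comma-separated fields (count_fields is just an illustrative name, and the closing " after val32 is assumed):

```shell
# Hypothetical helper: print how many fields awk sees when splitting on ","
count_fields() { awk -F',' '{ print NF }'; }

printf '%s\n' '"a":"val1","d":["val2","val32"],"c":"val3","b":"val4"' | count_fields
# prints: 5
```

The line holds four key/value pairs, but the comma inside the array is indistinguishable from a field separator, so awk reports five fields.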
After adding a bit of structure, your sample can be converted (back?) into valid JSON:
$ cat file
"a":"val1","c":"val2","b":"val3","d":"val4"
"a":"val1","b":[],"c":"val3","d":"val4"
"a":"val1","d":["val2","val32"],"c":"val3","b":"val4"
"d":"val1","a":"val2","c":"val3","b":"val4"
(Note that I assumed the missing " in "d":["val2","val32] was a typo, and so used "d":["val2","val32"] instead.)
$ sed 's/^/{/; s/$/},/; 1 s/^/[/; $ s/,$/]/' file >tmpfile
$ cat tmpfile
[{"a":"val1","c":"val2","b":"val3","d":"val4"},
{"a":"val1","b":[],"c":"val3","d":"val4"},
{"a":"val1","d":["val2","val32"],"c":"val3","b":"val4"},
{"d":"val1","a":"val2","c":"val3","b":"val4"}]
Then, the safe way is to use a JSON processor, such as jq, to filter and reorder the data:
$ jq -r '.[] | {a: .a, b: .b, c: .c, d: .d} | @text' tmpfile
{"a":"val1","b":"val3","c":"val2","d":"val4"}
{"a":"val1","b":[],"c":"val3","d":"val4"}
{"a":"val1","b":"val4","c":"val3","d":["val2","val32"]}
{"a":"val2","b":"val4","c":"val3","d":"val1"}
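Removing the remaining opening and closing braces is easy and safe, while it would not be safe to blindly remove the double quotes ( " ) or to replace commas with pipes ( , → | ) in order to exactly match your sample output.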
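A minimal sketch of the brace removal, assuming jq printed one compact object per line as above (strip_braces is an illustrative name):

```shell
# Hypothetical helper: drop only the first "{" and the last "}" of each line,
# leaving any braces, quotes and commas inside the values untouched.
strip_braces() { sed 's/^{//; s/}$//'; }

printf '%s\n' '{"a":"val1","b":"val4","c":"val3","d":["val2","val32"]}' | strip_braces
# prints: "a":"val1","b":"val4","c":"val3","d":["val2","val32"]

# Possible use with the jq command above:
#   jq -r '.[] | {a: .a, b: .b, c: .c, d: .d} | @text' tmpfile | strip_braces
```

This only anchors on the start and end of the line, which is why it is safe; the separators inside the line are left as jq wrote them.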
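Answer 2
Sorry if I am ignoring your attempt. To me it looks too complicated, with all the piping through many scripts and tools.
As I understand it, the columns are already in the right order except for idxg_name, which should be placed at the end. So I would suggest simply doing: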
sed 's/"//g;s/\(,idxg_name:[^,]*\)\(.*\)/\2\1/' yourfile
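- The s/"//g part removes the ", as you already did
- ,idxg_name:[^,]* matches the idxg_name field, starting from the comma and including everything up to the next comma (note that this fails if the name contains a comma! If that can happen, things get more complicated, because you have to consider whether the comma is inside "")
- .* matches the rest of the line and
- The replacement \2\1 swaps the order of the two parts captured inside \(\), thereby putting the name field at the end of the line. Done.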
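For illustration, here is the same sed expression wrapped in a function and run on made-up input lines. The sample lines and the move_name_last name are assumptions, since the real file is not shown:

```shell
# Hypothetical wrapper around the sed command above; reads lines on stdin.
move_name_last() { sed 's/"//g;s/\(,idxg_name:[^,]*\)\(.*\)/\2\1/'; }

printf '%s\n' '"id":"1","idxg_name":"foo","admin":[],"creat":"d1"' | move_name_last
# prints: id:1,admin:[],creat:d1,idxg_name:foo

# The caveat from the list: a comma inside the name splits the value
printf '%s\n' '"id":"1","idxg_name":"foo,bar","admin":[],"creat":"d1"' | move_name_last
# prints: id:1,bar,admin:[],creat:d1,idxg_name:foo
```

The second call shows how [^,]* stops at the first comma, so part of the name is left behind in the middle of the line.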