How can I replace "word" in a file with a counter:
{"word":"resolucion","count":40723},{"word":"general","count":20976},
{"word":"","count":13334},{"word":"publica","count":12379},
{"word":"direccion","count":11958},{"word":"secretaria","count":9907},
{"word":"al","count":9324},{"word":"orden","count":8604},
{"word":"anuncia","count":8589},{"word":"concurso","count":6953},
{"word":"diciembre","count":6893},{"word":"adjudicacion","count":6762},
{"word":"estado","count":6154},{"word":"procedimiento","count":5694},
{"word":"julio","count":5598},{"word":"marzo","count":5440},
{"word":"-","count":5437},{"word":"convocatoria","count":5319},
{"word":"ayuntamiento","count":5259},{"word":"publico","count":5203},
{"word":"junio","count":4995},{"word":"convenio","count":4925},
{"word":"real","count":4916},{"word":"febrero","count":4896},
{"word":"proyecto","count":4826},{"word":"abierto","count":4782},
so that it becomes, for example:
{"0":"resolucion","count":40723},{"1":"general","count":20976},
{"2":"","count":13334}, {"3":"publica","count":12379},
{"4":"direccion","count":11958},{"5":"secretaria","count":9907},
{"6":"al","count":9324},{"7":"orden","count":8604},
{"8":"anuncia","count":8589},
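and so on.
Answer 1
Here is one possible solution using our friends grep and sed. It is fine for small files; for anything larger, a perl (or awk?) solution will be more efficient. This is bash syntax: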
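i=0   # start at 0 to match the numbering in the question
# count every occurrence of word in the file
maxnum=$(grep -o '\<word\>' datafile | wc -l)
while (( i < maxnum )); do
  # GNU sed: the range 0,/re/ ends at the first matching line, and s without g
  # changes one match there, so only the first remaining "word" in the file is replaced
  sed -i "0,/\"word\"/s//\"$i\"/" datafile
  (( i++ ))
done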
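grep -o counts the total number of occurrences of word in datafile; that part comes from Glenn. The only trick here is that sed must not be used with the global flag, so only the first matching string is replaced. That is also why this code is so slow: it calls sed maxnum times.
Note that sed -i changes your original data file, so make a copy first.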
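As for the awk alternative mentioned above, a single pass over the file could look roughly like this. It is only a sketch: the function name number_words is made up for illustration, and it assumes the key always appears exactly as "word": (with the colon), so a value that happens to be "word" is left alone.

```shell
# Hypothetical helper: reads the data on stdin, writes the renumbered data to stdout.
number_words() {
  awk '{
    out = ""
    # Consume the line one "word": key at a time; n keeps counting across lines.
    while ((pos = index($0, "\"word\":")) > 0) {
      out = out substr($0, 1, pos - 1) "\"" (n++) "\":"
      $0 = substr($0, pos + 7)   # skip the 7 characters of "word":
    }
    print out $0
  }'
}
```

Usage would be something like number_words < datafile > datafile.new, which reads the file once and leaves the original untouched.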
Answer 2
If it is a JSON file, you can use a scripting language to modify it. For example, if you have NodeJS installed, you can run the following program:
var data = require('./data.json')
console.log(data)
data.forEach(function (obj, idx) { obj[idx] = obj['word']; delete obj.word; });
console.log(data)
I am assuming the file is named "data.json" and that it is valid JSON (yours is not quite: you are missing the wrapping [ ] and there is a spurious , at the end).
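If the file really looks like the sample in the question, something along these lines might turn it into valid JSON first. This is a sketch under assumptions: to_json_array is a hypothetical name, and it assumes the only problems are the missing brackets and a single trailing comma on the last line (no blank line after it).

```shell
# Hypothetical helper: reads the comma-separated objects on stdin,
# writes a JSON array to stdout.
to_json_array() {
  printf '['
  # On the last line only ($), drop the trailing comma and any whitespace after it.
  sed '$ s/,[[:space:]]*$//'
  printf ']\n'
}
```

For example, to_json_array < datafile > data.json would produce the file that the NodeJS program expects.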
Answer 3
This should be quite fast.
perl -0 -ne 's/"word"/q{"} . $x++ . q{"}/ge; print;' INFILE > OUTFILE