I need to process a huge log, or more precisely one column of it. That column holds integer values ranging from 103 to 17431. A sample of the original file:
402
402
402
667
942
342
990
402
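For each number I have to assign an index value from 0 to 9. I was thinking of isolating the column of interest in a separate file, then going through each line and replacing the number found with its specific index. The final output would look like: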
3
3
5
9
7
8
3
I tried to solve this with AWK, but I failed. My code:
csvtool col 2 /my/path/to/list.csv >tmp
awk '($0>=363 && $0<=499) || ($0>=4645 && $0<=4646) {$0="0"}1' tmp
awk '($0>=2174 && $0<=2193) {$0="1"}1' tmp
awk '($0=500) || ($0>=12308 && $0<=12356) {$0="2"}1' tmp
awk '($0>=103 && $0<=220) || ($0>=252 && $0<=299) || ($0>=1980 && $0<=1986) || ($0>=2921 && $0<=2922) {$0="3"}1' tmp
awk '($0>=221 && $0<=251) || ($0>=8085 && $0<=8091) || ($0=8350) || ($0>=12809 && $0<=12945) || ($0>=16834 && $0<=17033) {$0="4"}1' tmp
awk '($0>=300 && $0<=362) || ($0=522) || ($0>=2923 && $0<=2925) || ($0>=3441 && $0<=3442) || ($0=4644)|| ($0>=5677 && $0<=5695) || ($0>=8082 && $0<=8083)|| ($0>=8093 && $0<=8349) || ($0>=12946 && $0<=12947) || ($0>=21986 && $0<=13215) || ($0>=13309 && $0<=13311) {$0="5"}1' tmp
awk '($0>=501 && $0<=504) || ($0>=566 && $0<=600) || ($0>=613 && $0<=637) || ($0>=2015 && $0<=2040) || ($0>=2103 && $0<=2126) || ($0>=2373 && $0<=2374) || ($0>=3828 && $0<=4125) || ($0>=4237 && $0<=4636) || ($0>=4647 && $0<=4889) || ($0>=4991 && $0<=5676) || ($0>=5696 && $0<=5705) || ($0>=6502 && $0<=6595) || ($0>=8429 && $0<=8460) || ($0>=8552 && $0<=8699) || ($0>=10487 && $0<=10977) || ($0>=11326 && $0<=11617) || ($0>=11688 && $0<=11815) || ($0>=11844 && $0<=11938) || ($0>=12490 && $0<=12597) || ($0>=12973 && $0<=12982) || ($0>=13367 && $0<=13414) {$0="6"}1' tmp
awk '($0>=523 && $0<=548) || ($0>=555 && $0<=565) || ($0>=2005 && $0<=2014) || ($0>=2041 && $0<=2063) || ($0>=2091 && $0<=2102) || ($0=2394) || ($0>=2407 && $0<=2411) || ($0>=2926 && $0<=3008) || ($0>=3443 && $0<=3473) || ($0>=3486 && $0<=3813) || ($0>=4132 && $0<=4144) || ($0>=4637 && $0<=4643) || ($0>=4916 && $0<=4981) || ($0>=5711 && $0<=5741) || ($0>=6403 && $0<=6405) || ($0>=6415 && $0<=6466) || ($0>=6701 && $0<=7002) || ($0>=7035 && $0<=7048) || ($0>=8426 && $0<=8428) || ($0>=8496 && $0<=8541) || ($0>=8857 && $0<=9323) || ($0>=9429 && $0<=9618) || ($0>=9674 && $0<=9789) || ($0>=9802 && $0<=9811) || ($0>=9850 && $0<=10009) || ($0>=10131 && $0<=10136) || ($0>=10396 && $0<=10402) || ($0>=11000 && $0<=11175) || ($0=11618) || ($0>=12100 && $0<=12111) || ($0>=12212 && $0<=12219) || ($0=12489) || ($0>=12807 && $0<=12808) || ($0=12983) || ($0>=14616 && $0<=14627) || ($0>=15723 && $0<=15897) {$0="7"}1' tmp
awk '($0=521) || ($0=554) || ($0>=601 && $0<=612) || ($0>=651 && $0<=708) || ($0>=1905 && $0<=1942) || ($0>=1949 && $0<=1979) || ($0>=1987 && $0<=1993) || ($0>=2259 && $0<=2278) || ($0>=2352 && $0<=2362) || ($0>=2395 && $0<=2406) || ($0>=2412 && $0<=2449) || ($0>=2673 && $0<=2919) || ($0>=3009 && $0<=3016) || ($0>=3814 && $0<=3827) || ($0>=4126 && $0<=4131) || ($0>=4982 && $0<=4990) || ($0>=5706 && $0<=5710) || ($0>=6012 && $0<=6181) || ($0>=6285 && $0<=6339) || ($0>=6409 && $0<=6411) || ($0>=6596 && $0<=6700) || ($0>=7191 && $0<=7424) || ($0=8081) || ($0>=8550 && $0<=8551) || ($0>=8700 && $0<=8716) || ($0>=9324 && $0<=9326) || ($0>=9619 && $0<=9624) || ($0=9729) || ($0>=10018 && $0<=10064) || ($0>=10115 && $0<=10126) || ($0>=10198 && $0<=10386) || ($0=10486) || ($0>=12112 && $0<=12115) || ($0>=12209 && $0<=12211) {$0="8"}1' tmp
awk '($0>=489 && $0<=498) || ($0>=505 && $0<=520) || ($0>=549 && $0<=553) || ($0>=638 && $0<=650) || ($0>=709 && $0<=1904) || ($0>=1943 && $0<=1948) || ($0>=1994 && $0<=2004) || ($0>=2064 && $0<=2090) || ($0>=2127 && $0<=2173) || ($0>=2194 && $0<=2258) || ($0>=2279 && $0<=2351) || ($0>=2363 && $0<=2372) || ($0=2393) || ($0>=2450 && $0<=2672) || ($0>=3474 && $0<=3485) || ($0>=4145 && $0<=4236) || ($0>=4890 && $0<=4915) || ($0>=5742 && $0<=6011) || ($0>=7003 && $0<=7034) || ($0>=7049 && $0<=7295) || ($0>=7425 && $0<=8080) || ($0=8084) || ($0>=8352 && $0<=8425) || ($0>=8461 && $0<=8495) || ($0>=8542 && $0<=8549) || ($0>=8717 && $0<=8856) || ($0>=9327 && $0<=9428) || ($0>=9625 && $0<=9673) || ($0>=9790 && $0<=9791) || ($0>=9793 && $0<=9801) || ($0>=9812 && $0<=9849) || ($0>=10010 && $0<=10017) || ($0>=10065 && $0<=10114) || ($0>=10128 && $0<=10130) || ($0>=10137 && $0<=10197) || ($0>=10387 && $0<=10395) || ($0>=10403 && $0<=10485) || ($0>=10978 && $0<=10999) || ($0>=11176 && $0<=11325) || ($0>=11620 && $0<=11687) || ($0>=11816 && $0<=11843) || ($0>=11939 && $0<=12099) || ($0>=12116 && $0<=12208) || ($0>=12220 && $0<=12307) || ($0>=12357 && $0<=12488) || ($0>=12598 && $0<=12806) || ($0>=12948 && $0<=12972) || ($0>=13216 && $0<=13306) || ($0>=13312 && $0<=13366) || ($0>=13415 && $0<=14615) || ($0>=14628 && $0<=15722) || ($0>=15989 && $0<=16833) || ($0>=17402 && $0<=17431) {$0="9"}1' tmp
Unfortunately, the code above produces:
9
9
9
9
9
9
9
Any ideas on how to make this work? Are there other approaches? Thanks.
Answer 1
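To compare $0 with a value, use == and not =. The = operator assigns a new value to $0. If you assign a new value, awk evaluates the expression $0=2393 (for example) as true, because the assigned value is nonzero. The condition therefore matches every line, and awk prints 9.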
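As a minimal sketch of the fix, the rules can live in one awk program that uses == for the equality tests and stops at the first matching rule. The function name bin_index and the "?" placeholder for unmatched values are assumptions made for this illustration, and only a few of the ranges from the question are included; the remaining ones would follow the same pattern. Running the rules in a single pass also matters because each separate awk command in the question reads tmp on its own, so their replacements are never combined.

```shell
# Hypothetical helper: read one number per line on stdin, print its index.
# Only a handful of the ranges from the question are reproduced here.
bin_index() {
  awk '
    ($0 >= 363 && $0 <= 499) || ($0 >= 4645 && $0 <= 4646) { print 0; next }
    ($0 >= 2174 && $0 <= 2193)                             { print 1; next }
    ($0 == 500) || ($0 >= 12308 && $0 <= 12356)            { print 2; next }
    ($0 >= 103 && $0 <= 220) || ($0 >= 252 && $0 <= 299)   { print 3; next }
    { print "?" }   # no rule matched; placeholder chosen for this sketch
  '
}

# Usage: prints 0, 2, 1, 3 (one per line)
printf '402\n500\n2180\n150\n' | bin_index
```

With the real data, the input would come from the extracted column instead, for example: bin_index < tmp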
Answer 2
perl -pi -e 's/(^[^,]*,\d)\d+,/$1,/g' a.csv
This bins by the first digit.