17/04/27 10:50:17 INFO Master: Driver submitted org.apache.spark.deploy.worker.DriverWrapper
17/04/27 10:50:17 INFO Master: Launching driver driver-20170427105017-0000 on worker worker-20170427103840-192.168.5.242-7078
17/04/27 10:50:22 INFO Master: 192.168.5.5:53156 got disassociated, removing it.
17/04/27 10:50:22 INFO Master: 192.168.5.5:37668 got disassociated, removing it.
17/04/27 10:50:22 INFO Master: 192.168.5.5:53154 got disassociated, removing it.
17/04/27 10:55:27 INFO Master: Registering app ETL DataPipeline App
17/04/27 10:55:27 INFO Master: Registered app ETL DataPipeline App with ID app-20170427105527-0000
17/04/27 10:55:27 INFO Master: Launching executor app-20170427105527-0000/0 on worker worker-20170427103842-192.168.5.175-7078
17/04/27 10:55:27 INFO Master: Launching executor app-20170427105527-0000/1 on worker worker-20170427103838-192.168.5.37-7078
17/04/27 11:08:25 INFO Master: Asked to kill driver driver-20170427105017-0000
17/04/27 11:08:25 INFO Master: Kill request for driver-20170427105017-0000 submitted
17/04/27 11:08:26 INFO Master: Received unregister request from application app-20170427105527-0000
How would I get driver-20170427105017-0000 and the corresponding 192.168.5.242, and similarly how would I grep app-20170427105527-0000/0 and its corresponding 192.168.5.175?
Answer 1
Using `sed` to get all `driver` and `executor` messages related to "Launching":
$ sed -n -E 's/^.*Launching (driver|executor) ([^ ]*).*worker-[0-9]*-([^-]*).*$/\2 \3/p' file.in
driver-20170427105017-0000 192.168.5.242
app-20170427105527-0000/0 192.168.5.175
app-20170427105527-0000/1 192.168.5.37
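`[^ ]*` matches any number of any character except a space. `\2` and `\3` are back-references to what was matched by the second and third parenthesized groups, respectively. The second group contains `[^ ]*` and matches the text after `Launching driver` or `Launching executor`; the third group contains `[^-]*` and matches the IP address (up to the `-` that terminates the address).
The `^` and `$` in `s/^...$/.../p` anchor the regular expression at the start and end of the line, while `p` tells `sed` to "print" the result of the substitution (if a substitution was made).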
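To make the back-references easier to see, here is a minimal sketch that also prints the first group (`\1`, the word `driver` or `executor`) in front of the ID and IP. The function name `launch_summary` is made up for this illustration, and the input is a shortened copy of the log from the question.

```shell
# Hypothetical helper: prints "<kind> <id> <ip>" for every "Launching" line.
# Reads the files named as arguments, or standard input if none are given.
launch_summary() {
  # \1 = driver|executor, \2 = the ID after it, \3 = the IP inside the worker name
  sed -n -E 's/^.*Launching (driver|executor) ([^ ]*).*worker-[0-9]*-([^-]*).*$/\1 \2 \3/p' "$@"
}

launch_summary <<'EOF'
17/04/27 10:50:17 INFO Master: Launching driver driver-20170427105017-0000 on worker worker-20170427103840-192.168.5.242-7078
17/04/27 10:55:27 INFO Master: Launching executor app-20170427105527-0000/0 on worker worker-20170427103842-192.168.5.175-7078
17/04/27 11:08:25 INFO Master: Asked to kill driver driver-20170427105017-0000
EOF
```

The "Asked to kill driver" line produces no output, because the substitution does not match it and `-n` suppresses the default printing.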
Alternatively, and possibly more robust because it relies on less regular-expression magic, use `awk`:
$ awk '/Launching/ { split($NF, a, "-"); print $7, a[3] }' file.in
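Since the question asks about one particular ID, the `awk` approach could be narrowed to a single driver or executor. The following is only a sketch: the function name `worker_ip_for` is invented here, and it assumes the same field layout as the log above (ID in field 7, worker name in the last field).

```shell
# Hypothetical helper: prints the worker IP for one driver/executor ID.
# Usage: worker_ip_for ID [file...]   (reads standard input if no file is given)
worker_ip_for() {
  id=$1
  shift
  # $7 is the ID; the last field looks like worker-<timestamp>-<ip>-<port>,
  # so splitting it on "-" leaves the IP in a[3].
  awk -v id="$id" '/Launching/ && $7 == id { split($NF, a, "-"); print a[3] }' "$@"
}

worker_ip_for app-20170427105527-0000/0 <<'EOF'
17/04/27 10:50:17 INFO Master: Launching driver driver-20170427105017-0000 on worker worker-20170427103840-192.168.5.242-7078
17/04/27 10:55:27 INFO Master: Launching executor app-20170427105527-0000/0 on worker worker-20170427103842-192.168.5.175-7078
EOF
```

Comparing `$7` with `==` does an exact string match, so `app-20170427105527-0000/0` does not accidentally match a longer ID that merely starts with the same text.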