I have huge log files (over 100k records each) and need to extract all GPS positions based on the date stamp.
./production.log.109.gz:I, [2022-02-10T10:00:59.703529 #25190] INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:00:35 +0000, GPS: 52.1773033,20.8162, SAT: 17, KM/H: 0, V: 26343
./production.log.109.gz:I, [2022-02-10T10:01:13.939349 #25190] INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:00:40 +0000, GPS: 52.1773033,20.8162, SAT: 17, KM/H: 0, V: 26352
./production.log.109.gz:I, [2022-02-10T10:10:44.757308 #25190] INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:10:40 +0000, GPS: 52.1773033,20.8162, SAT: 18, KM/H: 0, V: 25924
So basically, for these 3 records I need to find the date 10th February 2022, cut out the "GPS:" values, and paste them into a new file named 2022-02-10.txt, or preferably into a proper .KML file.
Answer 1
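Every event is on its own line, so you can read the file line by line and use a regex to find the text after TS: and the text after GPS:. You can then use the TS date as the filename and write to that file in append mode.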
Minimal working example.
I use io with text only to simulate a file in memory, but you should use open().
text = '''./production.log.109.gz:I, [2022-02-10T10:00:59.703529 #25190] INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:00:35 +0000, GPS: 52.1773033,20.8162, SAT: 17, KM/H: 0, V: 26343
./production.log.109.gz:I, [2022-02-10T10:01:13.939349 #25190] INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:00:40 +0000, GPS: 52.1773033,20.8162, SAT: 17, KM/H: 0, V: 26352
./production.log.109.gz:I, [2022-02-10T10:10:44.757308 #25190] INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:10:40 +0000, GPS: 52.1773033,20.8162, SAT: 18, KM/H: 0, V: 25924
'''
import io
import re
# open file for reading
#file_in = open("filename.log")
file_in = io.StringIO(text)
# read line by line
for line in file_in:
    # find values
    ts = re.findall('TS: ([^ ]*) ', line)[0]
    gps = re.findall('GPS: ([^ ]*), ', line)[0]
    # swap order: latitude,longitude -> longitude,latitude
    val = gps.split(',')
    gps = f'{val[1]},{val[0]}'
    print('TS:', ts, '| GPS:', gps)
    # open file for writing in `append mode`
    with open(f'{ts}.txt', 'a') as file_out:
        # write in new line
        file_out.write(gps + '\n')
Result:
TS: 2022-02-10 | GPS: 20.8162,52.1773033
TS: 2022-02-10 | GPS: 20.8162,52.1773033
TS: 2022-02-10 | GPS: 20.8162,52.1773033
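Two things may matter with 100k+ records: the example above reopens the output file for every line, and `re.findall(...)[0]` raises IndexError on any line that has no TS: or GPS: field. A minimal sketch of one way around both is shown below, assuming every event line has the same layout as the three samples. The names LINE_RE, group_by_date and write_groups are made up for this illustration.

```python
import re
from collections import defaultdict

# Assumption: TS holds a YYYY-MM-DD date and GPS holds "latitude,longitude",
# as in the sample lines from the question.
LINE_RE = re.compile(r"TS: (\d{4}-\d{2}-\d{2}) .*?GPS: (-?[\d.]+),(-?[\d.]+)")

def group_by_date(lines):
    """Return {date: ["lon,lat", ...]}, skipping lines that do not match."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not an event line, ignore it instead of crashing
        date, lat, lon = match.groups()
        groups[date].append(f"{lon},{lat}")
    return dict(groups)

def write_groups(groups):
    """Append each date's points to <date>.txt, opening every file once."""
    for date, points in groups.items():
        with open(f"{date}.txt", "a") as file_out:
            file_out.write("\n".join(points) + "\n")

sample = [
    "#<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:00:35 +0000, GPS: 52.1773033,20.8162, SAT: 17, KM/H: 0, V: 26343",
    "#<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:10:40 +0000, GPS: 52.1773033,20.8162, SAT: 18, KM/H: 0, V: 25924",
    "a line without any GPS data",
]
print(group_by_date(sample))
```

To write the files, pass the result on: `write_groups(group_by_date(open("filename.log")))`.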
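KML is a more complex format (it uses an XML structure), and I won't try to write it by hand.
But there are Python modules that can write KML, e.g. simplekml.
It may not have a function to append to an existing file, so you may first need to collect all GPS values, group them by date, and then create one KML per group and save all of its points at once.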
EDIT:
text = '''./production.log.109.gz:I, [2022-02-10T10:00:59.703529 #25190] INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:00:35 +0000, GPS: 52.1773033,20.8162, SAT: 17, KM/H: 0, V: 26343
./production.log.109.gz:I, [2022-02-10T10:01:13.939349 #25190] INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:00:40 +0000, GPS: 52.1773033,20.8162, SAT: 17, KM/H: 0, V: 26352
./production.log.109.gz:I, [2022-02-10T10:10:44.757308 #25190] INFO -- : #<Event::TeltonikaServer:3ffcbe931d90>:357544377733734 TS: 2022-02-10 10:10:40 +0000, GPS: 52.1773033,20.8162, SAT: 18, KM/H: 0, V: 25924
'''
import io
import re
import simplekml
#f = open("filename.log")
f = io.StringIO(text)
# -----------------------
groups = {}
for line in f:
    ts = re.findall('TS: ([^ ]*) ', line)[0]
    gps = re.findall('GPS: ([^ ]*), ', line)[0]
    val = gps.split(',')
    gps = [val[1], val[0]]
    print('TS:', ts, '| GPS:', gps)
    if ts not in groups:
        groups[ts] = []
    groups[ts].append(gps)
#----------------------------------------
for name, values in groups.items():
    print('name:', name)
    kml = simplekml.Kml()
    for gps in values:
        kml.newpoint(coords=[gps])
    # --- after loop ---
    kml.save(f"{name}.kml")
Answer 2
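So after a few days of learning Python I completely changed the code.
Basically, it now:
- reads the data from the .log file
- extracts the GPS position and the timestamp
- puts the GPS positions into an associative array (a dict)
- writes each array to a file named after its TS (e.g. TS.txt)
- adds a header and a footer around the points, using one for loop
That's it, I have got it working the way I wanted. Thanks everyone for the help.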
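The code for this answer was not posted, so the following is only a sketch of what the listed steps might look like, not the author's actual script. It assumes the header and footer are the KML wrapper elements, and the names LINE_RE, KML_HEADER, KML_FOOTER, group_points and build_kml are invented for the illustration.

```python
import re
from collections import defaultdict

# Assumed header/footer: a bare KML 2.2 document wrapper.
KML_HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
    '<Document>\n'
)
KML_FOOTER = '</Document>\n</kml>\n'

# Assumption: GPS is logged as "latitude,longitude", as in the question.
LINE_RE = re.compile(r"TS: (\d{4}-\d{2}-\d{2}) .*?GPS: (-?[\d.]+),(-?[\d.]+)")

def group_points(lines):
    """Collect (lon, lat) pairs into a dict keyed by the TS date."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            date, lat, lon = match.groups()
            groups[date].append((lon, lat))
    return dict(groups)

def build_kml(points):
    """Header, one Placemark per point from a single for loop, then footer."""
    parts = [KML_HEADER]
    for lon, lat in points:
        # KML coordinates are written as longitude,latitude
        parts.append(
            f'<Placemark><Point><coordinates>{lon},{lat}'
            '</coordinates></Point></Placemark>\n'
        )
    parts.append(KML_FOOTER)
    return "".join(parts)

sample = [
    "TS: 2022-02-10 10:00:35 +0000, GPS: 52.1773033,20.8162, SAT: 17, KM/H: 0, V: 26343",
    "TS: 2022-02-10 10:00:40 +0000, GPS: 52.1773033,20.8162, SAT: 17, KM/H: 0, V: 26352",
]
for date, points in group_points(sample).items():
    print(date)
    print(build_kml(points))
```

Saving is then one `open(f"{date}.kml", "w")` per date with the string returned by build_kml.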