Parsing a .txt file to generate a .csv

2024-5-27

I have a text file with the following contents:

Torrent file  : Linux.Format.-.October.2016.-.True.Pdf.-.Set.1001.[ECLiPSE].torrent
Metadata info : 9968 bytes, 412 pieces, 65536 bytes per piece
Torrent name  : Linux Format - October 2016 - True Pdf - Set 1001 [ECLiPSE]
Content info  : 3 files, 26965176 bytes
Announce URL  : http://explodie.org:6969/announce

F#  Bytes       File name
--- ----------- ---------------------------------------------------------------
  1    26944026 linfor1016.pdf
  2       19963 ECLiPSE.txt
  3        1187 Read Me.txt

Torrent file  : linuxmint-13-cinnamon-dvd-64bit.iso.torrent
Metadata info : 32303 bytes, 1602 pieces, 524288 bytes per piece
Torrent name  : linuxmint-13-cinnamon-dvd-64bit.iso
Content info  : single file, 839909376 bytes
Announce URL  : http://torrents.linuxmint.com/announce.php
Torrent file  : linuxmint-13-kde-dvd-64bit.iso.torrent
Metadata info : 35938 bytes, 1784 pieces, 524288 bytes per piece
Torrent name  : linuxmint-13-kde-dvd-64bit.iso
Content info  : single file, 935329792 bytes
Announce URL  : http://torrents.linuxmint.com/announce.php

The file was generated with:

for i in *.torrent; do torrentcheck -t "$i" >> info.txt; done

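Now I want to convert this text file into a CSV file with two columns, Torrent file and Content info (as headers), with one row for each torrent file processed by the bash command above, for example: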

Torrent file,Content info 
Linux.Format.-.October.2016.-.True.Pdf.-.Set.1001.[ECLiPSE].torrent,3 files, 26965176 bytes
linuxmint-13-cinnamon-dvd-64bit.iso.torrent,single file, 839909376 bytes
linuxmint-13-kde-dvd-64bit.iso.torrent,single file, 935329792 bytes

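These columns can then be processed further in any spreadsheet application, for example to sort the torrents by size or by the number of files they contain.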

I can search the file for the relevant strings, e.g.

grep 'Torrent file' info.txt or grep 'Content' info.txt

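But how do I use the returned text to extract the information I need? For example, when I get Torrent file : linuxmint-13-cinnamon-dvd-64bit.iso.torrent, I could use the spreadsheet MID and LEN functions to reduce the string to just linuxmint-13-cinnamon-dvd-64bit.iso.torrent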

Answer 1

A simple awk script can parse the data, for example:

awk -F': ' 'BEGIN { print "Torrent file,Content info,Size" }
$0~/^Torrent file/ { save = $2 }
$0~/^Content info/ { printf "%s,%s\n",save,$2 }'  <info.txt

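This splits each line on ": ", saves the second field of a "Torrent file" line, and prints it later when the matching "Content info" line is found. Because the Content info value itself contains a comma (for example "3 files, 26965176 bytes"), each output row has three comma-separated fields, which is why the header includes a third column, Size.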
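
Since the goal is to sort by size or by file count in a spreadsheet, a variant that writes plain numeric columns and quotes the file name may be easier to work with. The following is only a sketch and not part of the answer above: the function name torrent_to_csv and the column names Files and Bytes are made up for illustration, and it assumes every Content info line looks like "N files, M bytes" or "single file, M bytes", as in the sample.

```shell
#!/bin/sh
# Hypothetical helper: convert torrentcheck-style output into CSV with
# a quoted name column and two numeric columns (file count, total bytes).
# Usage: torrent_to_csv info.txt > info.csv
torrent_to_csv() {
    awk '
    BEGIN { print "Torrent file,Files,Bytes" }
    /^Torrent file/ {
        # Strip the label up to and including the first ": ".
        name = $0
        sub(/^[^:]*: /, "", name)
        # Double any embedded double quotes, as CSV quoting requires.
        gsub(/"/, "\"\"", name)
    }
    /^Content info/ {
        info = $0
        sub(/^[^:]*: /, "", info)
        # info is now e.g. "3 files, 26965176 bytes" or "single file, 839909376 bytes".
        split(info, part, ", ")
        files = part[1]
        sub(/ files?$/, "", files)
        if (files == "single") files = 1
        bytes = part[2]
        sub(/ bytes.*$/, "", bytes)
        # Numbers are kept as strings so large sizes are printed unchanged.
        printf "\"%s\",%s,%s\n", name, files, bytes
    }' "$1"
}
```

Quoting the name keeps a torrent file name that contains a comma in a single column, and treating "single file" as a count of 1 lets the Files column sort numerically.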
