Filtering the rows of a CSV file based on two values from another file in Bash

Question 1

Suppose you have two CSV files, as shown below:

$ cat file1
Chr_Name,h,j,start_pos,end_pos
Chrk,10,20,1010,1025
Chrk,20,10,1020,1040
ChrM,10,10,50,120

$ cat file2
Chr_Name,position
Chrk,1030
ChrM,70

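You can use Miller (mlr) to join the two files on the common field Chr_Name, filter the joined data so that only records whose position field lies between start_pos and end_pos (inclusive) are kept, and finally cut the position field, which is not wanted in the output.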

$ mlr --csv join -f file2 -j Chr_Name then filter '$start_pos <= $position && $position <= $end_pos' then cut -x -f position file1
Chr_Name,h,j,start_pos,end_pos
Chrk,20,10,1020,1040
ChrM,10,10,50,120

The same mlr command, formatted for readability:

mlr --csv \
    join -f file2 -j Chr_Name then \
    filter '$start_pos <= $position && $position <= $end_pos' then \
    cut -x -f position \
    file1
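
For systems where Miller is not installed, the same join-then-filter logic can be roughly sketched in plain awk. This is not part of the approach above but a swapped-in technique, and it rests on assumptions that Miller does not need: the CSV is simple (no quoted fields or embedded commas), the columns appear in the order shown in the sample files, and file2 holds at most one position per Chr_Name (a later duplicate would overwrite an earlier one). The function name filter_rows and the temporary directory are purely illustrative.

```shell
#!/bin/sh
# Sketch only: awk equivalent of "join, filter, cut" for simple CSV input.
# Assumptions: no quoted fields, fixed column order, and one position
# per Chr_Name in the lookup file.

# Recreate the sample inputs in a scratch directory (illustrative).
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
Chr_Name,h,j,start_pos,end_pos
Chrk,10,20,1010,1025
Chrk,20,10,1020,1040
ChrM,10,10,50,120
EOF

cat > "$tmp/file2" <<'EOF'
Chr_Name,position
Chrk,1030
ChrM,70
EOF

# filter_rows LOOKUP DATA
#   LOOKUP: CSV with Chr_Name,position
#   DATA:   CSV with Chr_Name,h,j,start_pos,end_pos
filter_rows() {
    awk -F, '
        # First file: remember the position for each Chr_Name (skip header).
        NR == FNR { if (FNR > 1) pos[$1] = $2; next }
        # Second file: always pass the header line through.
        FNR == 1 { print; next }
        # Keep rows whose [start_pos, end_pos] range contains the position.
        # Adding 0 forces a numeric rather than a string comparison.
        ($1 in pos) && $4 + 0 <= pos[$1] + 0 && pos[$1] + 0 <= $5 + 0
    ' "$1" "$2"
}

filter_rows "$tmp/file2" "$tmp/file1"
```

Because the position is only held in the pos array and never appended to the output record, no separate "cut" step is needed here.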

Using the same two files as above, but with SQLite3 and an in-memory database, as suggested by Marcus Müller in the comments:

$ sqlite3 :memory: '.mode csv' '.headers on' '.import file1 file1' '.import file2 file2' 'SELECT file1.* FROM file1 JOIN file2 ON (file1.Chr_Name = file2.Chr_name) WHERE CAST(position AS INTEGER) BETWEEN start_pos AND end_pos'
Chr_Name,h,j,start_pos,end_pos
Chrk,20,10,1020,1040
ChrM,10,10,50,120

The SQLite3 statements, one per line:

.mode csv
.headers on
.import file1 file1
.import file2 file2

SELECT file1.* FROM file1
    JOIN file2 ON (file1.Chr_Name = file2.Chr_name)
    WHERE CAST(position AS INTEGER) BETWEEN start_pos AND end_pos

The dot-commands import the two files into the tables file1 and file2, while the SELECT statement performs the actual query.

Answer