从 CSV 中提取列符合条件的行

从 CSV 中提取列符合条件的行

我有一个 3GB(2400 万行)的 CSV 文件,需要从中提取数据。我使用 Excel 完成了此类提取,但此文件太大,无法在 Excel 中处理。我想根据第三列的值获取数据行。

这是数据样本。没有标题(但可以根据需要添加)。

5/23/2017,00:04:50,421,         0, ,B,     657.78240967,        -1
5/23/2017,00:04:50,421,         1, ,B,       1.00000000,        -1
5/23/2017,00:04:50,421,         2, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,         3, ,B,       6.03906202,        -1
5/23/2017,00:04:50,421,         4, ,B,    6000.00000000,        -1
5/23/2017,00:04:50,421,         5, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,         6, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,         7, ,B,      11.00000000,        -1
5/23/2017,00:04:50,421,         8, ,B,      65.00000000,        -1
5/23/2017,00:04:50,421,         9, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        10, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        11, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        12, ,B,       2.00000000,        -1
5/23/2017,00:04:50,421,        13, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        14, ,B,       5.28633022,        -1
5/23/2017,00:04:50,421,        15, ,B,    2000.00000000,        -1
5/23/2017,00:04:50,421,        16, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        17, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        18, ,B,       4.40085888,        -1
5/23/2017,00:04:50,421,        19, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        20, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        21, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        22, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        23, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        24, ,B,       7.00000000,        -1
5/23/2017,00:04:50,421,        25, ,B,      60.00000000,        -1
5/23/2017,00:04:50,421,        26, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        27, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        28, ,B,      80.00000000,        -1
5/23/2017,00:04:50,421,        29, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        30, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        31, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        32, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        33, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        34, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        35, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        36, ,B,       4.32588148,        -1
5/23/2017,00:04:50,421,        37, ,B,    1718.00000000,        -1
5/23/2017,00:04:50,421,        38, ,B,       0.05945313,        -1
5/23/2017,00:04:50,421,        39, ,B,       2.95723248,        -1
5/23/2017,00:04:50,421,        40, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        41, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        42, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        43, ,B,   35946.00000000,        -1
5/23/2017,00:04:50,421,        44, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        45, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        46, ,B,   42271.00000000,        -1
5/23/2017,00:04:50,421,        47, ,B,   25402.00000000,        -1
5/23/2017,00:04:50,421,        48, ,B,       1.00000000,        -1
5/23/2017,00:04:50,421,        49, ,B,       1.00000000,        -1
5/23/2017,00:04:50,421,        50, ,B,   40745.00000000,        -1
5/23/2017,00:04:50,421,        51, ,B,   24373.00000000,        -1
5/23/2017,00:04:50,421,        52, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        53, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        54, ,B,   46517.00000000,        -1
5/23/2017,00:04:50,421,        55, ,B,      35.00000000,        -1
5/23/2017,00:04:50,421,        56, ,B,      51.00000000,        -1
5/23/2017,00:04:50,421,        57, ,B,      51.00000000,        -1
5/23/2017,00:04:50,421,        58, ,B,   49396.00000000,        -1
5/23/2017,00:04:50,421,        59, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        60, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        61, ,B,   13610.00000000,        -1
5/23/2017,00:04:50,421,        62, ,B,   50711.00000000,        -1
5/23/2017,00:04:50,421,        63, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        64, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        65, ,B,   93927.00000000,        -1
5/23/2017,00:04:50,421,        66, ,B,   50173.00000000,        -1
5/23/2017,00:04:50,421,        67, ,B,       1.00000000,        -1
5/23/2017,00:04:50,421,        68, ,B,       1.00000000,        -1
5/23/2017,00:04:50,421,        69, ,B,   13797.00000000,        -1
5/23/2017,00:04:50,421,        70, ,B,      75.00000000,        -1
5/23/2017,00:04:50,421,        71, ,B,     114.00000000,        -1
5/23/2017,00:04:50,421,        72, ,B,      75.00000000,        -1
5/23/2017,00:04:50,421,        73, ,B,   47238.00000000,        -1
5/23/2017,00:04:50,421,        74, ,B,       1.00000000,        -1
5/23/2017,00:04:50,421,        75, ,B,       1.00000000,        -1
5/23/2017,00:04:50,421,        76, ,B,   19338.00000000,        -1
5/23/2017,00:04:50,421,        77, ,B,   29879.00000000,        -1
5/23/2017,00:04:50,421,        78, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        79, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        80, ,B,   19633.00000000,        -1
5/23/2017,00:04:50,421,        81, ,B,   16687.00000000,        -1
5/23/2017,00:04:50,421,        82, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        83, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        84, ,B,    6823.00000000,        -1
5/23/2017,00:04:50,421,        85, ,B,      85.00000000,        -1
5/23/2017,00:04:50,421,        86, ,B,      91.00000000,        -1
5/23/2017,00:04:50,421,        87, ,B,      91.00000000,        -1
5/23/2017,00:04:50,421,        88, ,B,       0.00000000,        -1
5/23/2017,00:04:50,421,        89, ,B,       0.00000000,        -1
5/23/2017,00:04:54,312,         0, , ,     660.28082275,         0
5/23/2017,00:04:54,312,         1, , ,       1.00000000,         1
5/23/2017,00:04:54,312,         2, , ,       0.00000000,         2
5/23/2017,00:04:54,312,         3, , ,       5.95734215,         3
5/23/2017,00:04:54,312,         4, , ,    6000.00000000,         4
5/23/2017,00:04:54,312,         5, , ,       0.00000000,         5
5/23/2017,00:04:54,312,         6, , ,       0.00000000,         6
5/23/2017,00:04:54,312,         7, , ,      11.00000000,         7
5/23/2017,00:04:54,312,         8, , ,      65.00000000,         8
5/23/2017,00:04:54,312,         9, , ,       0.00000000,         9
5/23/2017,00:04:54,312,        10, , ,       0.00000000,        10
5/23/2017,00:04:54,312,        11, , ,       0.00000000,        11
5/23/2017,00:04:54,312,        12, , ,       2.00000000,        12
5/23/2017,00:04:54,312,        13, , ,       0.00000000,        13
5/23/2017,00:04:54,312,        14, , ,       5.32481575,        14
5/23/2017,00:04:54,312,        15, , ,    2000.00000000,        15
5/23/2017,00:04:54,312,        16, , ,       0.00000000,        16
5/23/2017,00:04:54,312,        17, , ,       0.00000000,        17
5/23/2017,00:04:54,312,        18, , ,       4.46131372,        18
5/23/2017,00:04:54,312,        19, , ,       0.00000000,        19
5/23/2017,00:04:54,312,        20, , ,       0.00000000,        20
5/23/2017,00:04:54,312,        21, , ,       0.00000000,        21
5/23/2017,00:04:54,312,        22, , ,       0.00000000,        22
5/23/2017,00:04:54,312,        23, , ,       0.00000000,        23
5/23/2017,00:04:54,312,        24, , ,       7.00000000,        24
5/23/2017,00:04:54,312,        25, , ,      60.00000000,        25
5/23/2017,00:04:54,312,        26, , ,       0.00000000,        26
5/23/2017,00:04:54,312,        27, , ,       0.00000000,        27
5/23/2017,00:04:54,312,        28, , ,      80.00000000,        28
5/23/2017,00:04:54,312,        29, , ,       0.00000000,        29
5/23/2017,00:04:54,312,        30, , ,       0.00000000,        30
5/23/2017,00:04:54,312,        31, , ,       0.00000000,        31
5/23/2017,00:04:54,312,        32, , ,       0.00000000,        32
5/23/2017,00:04:54,312,        33, , ,       0.00000000,        33
5/23/2017,00:04:54,312,        34, , ,       0.00000000,        34
5/23/2017,00:04:54,312,        35, , ,       0.00000000,        35
5/23/2017,00:04:54,312,        36, , ,       4.33114624,        36
5/23/2017,00:04:54,312,        37, , ,    1733.00000000,        37
5/23/2017,00:04:54,312,        38, , ,       0.71863282,        38
5/23/2017,00:04:54,312,        39, , ,       2.95582461,        39
5/23/2017,00:04:54,312,        40, , ,       0.00000000,        40
5/23/2017,00:04:54,312,        41, , ,       0.00000000,        41
5/23/2017,00:04:54,312,        42, , ,       0.00000000,        42
5/23/2017,00:04:54,312,        43, , ,   35946.00000000,        43
5/23/2017,00:04:54,312,        44, , ,       0.00000000,        44
5/23/2017,00:04:54,312,        45, , ,       0.00000000,        45
5/23/2017,00:04:54,312,        46, , ,   42271.00000000,        46
5/23/2017,00:04:54,312,        47, , ,   25402.00000000,        47
5/23/2017,00:04:54,312,        48, , ,       1.00000000,        48
5/23/2017,00:04:54,312,        49, , ,       1.00000000,        49
5/23/2017,00:04:54,312,        50, , ,   40745.00000000,        50
5/23/2017,00:04:54,312,        51, , ,   24373.00000000,        51
5/23/2017,00:04:54,312,        52, , ,       0.00000000,        52
5/23/2017,00:04:54,312,        53, , ,       0.00000000,        53
5/23/2017,00:04:54,312,        54, , ,   46517.00000000,        54
5/23/2017,00:04:54,312,        55, , ,      35.00000000,        55
5/23/2017,00:04:54,312,        56, , ,      50.00000000,        56
5/23/2017,00:04:54,312,        57, , ,      50.00000000,        57
5/23/2017,00:04:54,312,        58, , ,   49396.00000000,        58
5/23/2017,00:04:54,312,        59, , ,       0.00000000,        59
5/23/2017,00:04:54,312,        60, , ,       0.00000000,        60
5/23/2017,00:04:54,312,        61, , ,   13610.00000000,        61
5/23/2017,00:04:54,312,        62, , ,   50711.00000000,        62
5/23/2017,00:04:54,312,        63, , ,       0.00000000,        63
5/23/2017,00:04:54,312,        64, , ,       0.00000000,        64
5/23/2017,00:04:54,312,        65, , ,   93927.00000000,        65
5/23/2017,00:04:54,312,        66, , ,   50173.00000000,        66
5/23/2017,00:04:54,312,        67, , ,       1.00000000,        67
5/23/2017,00:04:54,312,        68, , ,       1.00000000,        68
5/23/2017,00:04:54,312,        69, , ,   13797.00000000,        69
5/23/2017,00:04:54,312,        70, , ,      75.00000000,        70
5/23/2017,00:04:54,312,        71, , ,     114.00000000,        71
5/23/2017,00:04:54,312,        72, , ,      75.00000000,        72
5/23/2017,00:04:54,312,        73, , ,   47238.00000000,        73
5/23/2017,00:04:54,312,        74, , ,       1.00000000,        74
5/23/2017,00:04:54,312,        75, , ,       1.00000000,        75
5/23/2017,00:04:54,312,        76, , ,   19338.00000000,        76
5/23/2017,00:04:54,312,        77, , ,   29879.00000000,        77
5/23/2017,00:04:54,312,        78, , ,       0.00000000,        78
5/23/2017,00:04:54,312,        79, , ,       0.00000000,        79
5/23/2017,00:04:54,312,        80, , ,   19633.00000000,        80
5/23/2017,00:04:54,312,        81, , ,   16687.00000000,        81
5/23/2017,00:04:54,312,        82, , ,       0.00000000,        82
5/23/2017,00:04:54,312,        83, , ,       0.00000000,        83
5/23/2017,00:04:54,312,        84, , ,    6823.00000000,        84
5/23/2017,00:04:54,312,        85, , ,      85.00000000,        85
5/23/2017,00:04:54,312,        86, , ,      91.00000000,        86
5/23/2017,00:04:54,312,        87, , ,      91.00000000,        87
5/23/2017,00:04:54,312,        88, , ,       0.00000000,        88
5/23/2017,00:04:54,312,        89, , ,       0.00000000,        89

假设我想要具有索引(第三列)的行,以便37我的数据看起来像这样。

5/23/2017,00:04:50,421,        37, ,B,    1718.00000000,        -1
5/23/2017,00:04:54,312,        37, , ,    1733.00000000,        37

我可以使用什么工具来实现这一点?有什么建议吗?我主要使用Windows 10linux 机器,但如果有很好的工具,也可以找到它。我已经用它对这个文件进行了基本的操作,Sublime Text 3效果很好。

任何帮助将不胜感激。

答案1

我认为你指的是第 4 列。你可以awk在命令行中使用:

awk -F, -v OFS=, '$4==37' input.csv > output.csv

-F,将输入字段分隔符设置为逗号

-v OFS=,将输出字段分隔符设置为逗号

$4==37匹配并打印第四个字段为 37 的任何行

这将创建一个新文件output.csv

5/23/2017,00:04:50,421,        37, ,B,    1718.00000000,        -1
5/23/2017,00:04:54,312,        37, , ,    1733.00000000,        37

awk用途非常广泛。您还可以使用它来合并多个 CSV 文件并删除标题。它是 Linux 和 MacOS 的标准配置,并且提供 Windows 版本。

答案2

我同意 teylyn 的评论:Excel 应该能够处理3GB 文件24k 行

64 位环境对文件大小没有硬性限制。工作簿大小仅受可用内存和系统资源的限制。

工作表上的总行数和列数:1,048,576 行 x 16,384 列

安全的选择是导入或解析 CSV,因为引号字符串内可能有一个逗号(确实,您的数据似乎没有逗号)。


然而,我想我会提出一个perl解决方案:

perl -F, -lane 'print if $F[3]==37' input.csv

perl在类 UNIX 操作系统上是标准配置;要在 Windows 上使用它,您需要安装例如赛格威西弗吉尼亚海岸也应该提供,尽管我没有检查。


可以用 来实现同样的效果awk,正如 Mike Fitzpatrick 所建议的:

awk -F, -v OFS=, '$4==37' input.csv

也可以通过以下方式实现相同的效果sed

sed -n '/[^,]*,[^,]*,[^,]*, *37,/p' input.csv

相关内容