Calculating pairwise time differences (in milliseconds) between matching records in a data table

Question

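I adjusted your sample data by adding three rows with the same CPID, 370013, to show that these are rejected, in keeping with the "occurs exactly twice" requirement. I also added two more rows with CPID 370014, which form a pair that we do want: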

TIME         MPID    CPID
14:00:04.909 10048  370007
14:00:05.320 10048  370007
14:00:05.462 10048  370008
14:00:05.761 10048  370008
14:00:05.809 10048  370009
14:00:05.833 10048  370009
14:00:11.320 10048  370010
14:00:11.453 10048  370010
14:00:11.693 10048  370011
14:00:13.097 10048  370012
14:00:14.124 10048  370012
14:00:14.189 10048  370013
14:00:14.320 10048  370013
14:00:15.020 10048  370013
14:00:16.123 10048  370014
14:00:16.790 10048  370014

Run:

$ txr data.txr data
MPID 10048 CPID 370007 Total time difference: 411 mili seconds
MPID 10048 CPID 370008 Total time difference: 299 mili seconds
MPID 10048 CPID 370009 Total time difference: 24 mili seconds
MPID 10048 CPID 370010 Total time difference: 133 mili seconds
MPID 10048 CPID 370012 Total time difference: 1027 mili seconds
MPID 10048 CPID 370014 Total time difference: 667 mili seconds

Neither the single 370011 entry nor the triple 370013 entry appears in the output.

Code:

@(do (defun mk-time-ms (date ms)
       (let ((tsec (time-parse-utc "%H:%M:%S" date)))
         (+ (* tsec 1000) ms))))
TIME         MPID    CPID
@(repeat)
@d0.@ms0 @mpid @cpid
@d1.@ms1 @mpid @cpid
@  (collect :gap 0)
@extra @mpid @cpid
@  (end)
@  (do (unless (boundp 'extra)
         (let ((t0 (mk-time-ms d0 (toint ms0)))
               (t1 (mk-time-ms d1 (toint ms1))))
           (put-line `MPID @mpid CPID @cpid Total time difference: @(- t1 t0) mili seconds`))))
@(end)

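mk-time-ms is a function that parses the time string into an integer (seconds since the Unix epoch) and combines it with the millisecond value: the time in seconds is multiplied by 1000 and the milliseconds are added.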
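As a cross-check outside TXR, the same arithmetic can be sketched in Python. This is only an illustration, not part of the TXR solution: the name time_to_ms is made up for this example, and it assumes every timestamp has the form HH:MM:SS.mmm. Because only differences matter here, it counts milliseconds from midnight rather than from the Unix epoch.

```python
def time_to_ms(stamp: str) -> int:
    """Convert 'HH:MM:SS.mmm' to integer milliseconds since midnight."""
    hms, ms = stamp.split(".")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    total_seconds = hours * 3600 + minutes * 60 + seconds
    # Same combination step as mk-time-ms: seconds * 1000 + milliseconds.
    return total_seconds * 1000 + int(ms)


if __name__ == "__main__":
    # First pair from the sample data (CPID 370007).
    print(time_to_ms("14:00:05.320") - time_to_ms("14:00:04.909"))  # prints 411
```

The printed value agrees with the first line of the TXR output above.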

We match the header line literally:

TIME         MPID    CPID

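Then we begin a @(repeat) match. We are looking for sequences of lines that begin with two consecutive lines having the same cpid (and mpid). With an additional @(collect), we match zero or more further lines with the same mpid and cpid. From these, we collect the list of times as the extra variable. For each match, if the extra variable has not been bound by the pattern match, it means we matched exactly two lines and nothing extra: in that case we calculate the time difference and produce the required output.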

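@(repeat) skips any lines where no match occurs, which takes care of the singletons. Because @(collect) likewise skips non-matching lines by default, we have to constrain it with :gap 0 so that no gaps are allowed. Otherwise it would consume the entire data, leaving nothing for the outer @(repeat).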
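The "exactly two consecutive rows" rule can also be sketched in Python for comparison. This is not the TXR mechanism, just one reading of the same rule: itertools.groupby splits the rows into maximal runs of consecutive rows with the same (MPID, CPID), and only runs of length two are reported. The names to_ms and exact_pair_diffs are hypothetical, and the sketch assumes whitespace-separated columns with no header row.

```python
from itertools import groupby

SAMPLE = """\
14:00:04.909 10048  370007
14:00:05.320 10048  370007
14:00:05.462 10048  370008
14:00:05.761 10048  370008
14:00:05.809 10048  370009
14:00:05.833 10048  370009
14:00:11.320 10048  370010
14:00:11.453 10048  370010
14:00:11.693 10048  370011
14:00:13.097 10048  370012
14:00:14.124 10048  370012
14:00:14.189 10048  370013
14:00:14.320 10048  370013
14:00:15.020 10048  370013
14:00:16.123 10048  370014
14:00:16.790 10048  370014
"""


def to_ms(stamp):
    """Convert 'HH:MM:SS.mmm' to milliseconds since midnight."""
    hms, ms = stamp.split(".")
    h, m, s = map(int, hms.split(":"))
    return (h * 3600 + m * 60 + s) * 1000 + int(ms)


def exact_pair_diffs(text):
    """Return (mpid, cpid, diff_ms) for each run of exactly two rows."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    result = []
    # groupby yields maximal runs of consecutive rows sharing (MPID, CPID).
    for (mpid, cpid), run in groupby(rows, key=lambda r: (r[1], r[2])):
        run = list(run)
        if len(run) == 2:  # singletons and triples are dropped
            result.append((mpid, cpid, to_ms(run[1][0]) - to_ms(run[0][0])))
    return result


if __name__ == "__main__":
    for mpid, cpid, diff in exact_pair_diffs(SAMPLE):
        print(f"MPID {mpid} CPID {cpid} Total time difference: {diff} ms")
```

On the sample data this reports the same six CPIDs and differences as the TXR run above, with 370011 and 370013 left out.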

If the data never actually contains triples, only pairs or singletons, the code can be reduced to:

@(do (defun mk-time-ms (date ms)
       (let ((tsec (time-parse-utc "%H:%M:%S" date)))
         (+ (* tsec 1000) ms))))
TIME         MPID    CPID
@(repeat)
@d0.@ms0 @mpid @cpid
@d1.@ms1 @mpid @cpid
@  (do (let ((t0 (mk-time-ms d0 (toint ms0)))
             (t1 (mk-time-ms d1 (toint ms1))))
         (put-line `MPID @mpid CPID @cpid Total time difference: @(- t1 t0) mili seconds`)))
@(end)

For our sample data, the output now also includes:

MPID 10048 CPID 370013 Total time difference: 131 mili seconds

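The first two 370013 rows are compared and consumed. The next 370013 row then looks like a singleton and is skipped. If we want to include both differences, we can make the following modification, adding a @(trailer) line in the right place: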

@(do (defun mk-time-ms (date ms)
       (let ((tsec (time-parse-utc "%H:%M:%S" date)))
         (+ (* tsec 1000) ms))))
TIME         MPID    CPID
@(repeat)
@d0.@ms0 @mpid @cpid
@  (trailer)
@d1.@ms1 @mpid @cpid
@  (do (let ((t0 (mk-time-ms d0 (toint ms0)))
             (t1 (mk-time-ms d1 (toint ms1))))
         (put-line `MPID @mpid CPID @cpid Total time difference: @(- t1 t0) mili seconds`)))
@(end)

Now these lines appear in the output:

MPID 10048 CPID 370013 Total time difference: 131 mili seconds
MPID 10048 CPID 370013 Total time difference: 700 mili seconds

These are the differences between the first and second 370013 rows, and between the second and third.

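@(trailer) means that whatever follows it is trailing context: it is matched, but not consumed. So although the body of @(repeat) matches two lines, it now consumes only one. The next iteration of @(repeat) therefore starts just one line further on, even though the previous iteration matched two lines and produced a time difference.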
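The difference between consuming both rows and advancing by one row can be illustrated with a small Python sketch. It is an analogy rather than a description of how TXR works internally: the function names are hypothetical, and the input is assumed to be a list of timestamps that all share the same MPID and CPID.

```python
def to_ms(stamp):
    """Convert 'HH:MM:SS.mmm' to milliseconds since midnight."""
    hms, ms = stamp.split(".")
    h, m, s = map(int, hms.split(":"))
    return (h * 3600 + m * 60 + s) * 1000 + int(ms)


# The three consecutive 370013 timestamps from the sample data.
TIMES = ["14:00:14.189", "14:00:14.320", "14:00:15.020"]


def consuming_diffs(stamps):
    """Pair rows two at a time: each matched pair is used up (no trailer)."""
    ms = [to_ms(s) for s in stamps]
    return [ms[i + 1] - ms[i] for i in range(0, len(ms) - 1, 2)]


def sliding_diffs(stamps):
    """Advance one row per match, so each row can start a new pair."""
    ms = [to_ms(s) for s in stamps]
    return [b - a for a, b in zip(ms, ms[1:])]


if __name__ == "__main__":
    print(consuming_diffs(TIMES))  # prints [131]
    print(sliding_diffs(TIMES))    # prints [131, 700]
```

The first result corresponds to the version without @(trailer), where the third 370013 row is left over; the second corresponds to the @(trailer) version, where the second row is reused as the start of the next pair.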

Answer 1