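I have financial trading software that decodes FAST/FIX messages. I'm running the same binary on two different machines, testing against very similar data sets. The software receives "messages" and decodes them. As a general rule, longer messages take more time to decode:
i7-860, Windows 7: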
Debug 18:23:48.8047325 count=51 decoding take microseconds = 300
Debug 18:23:49.7287854 count=53 decoding take microseconds = 349
Debug 18:23:49.7397860 count=110 decoding take microseconds = 516
Debug 18:23:49.7497866 count=92 decoding take microseconds = 512
Debug 18:23:49.7597872 count=49 decoding take microseconds = 267
Debug 18:23:49.7717878 count=194 decoding take microseconds = 823
Debug 18:23:49.7797883 count=49 decoding take microseconds = 296
Debug 18:23:49.7997894 count=50 decoding take microseconds = 299
Debug 18:23:50.7328428 count=101 decoding take microseconds = 583
Debug 18:23:50.7418433 count=42 decoding take microseconds = 281
Debug 18:23:50.7538440 count=151 decoding take microseconds = 764
Debug 18:23:50.7618445 count=57 decoding take microseconds = 279
Debug 18:23:50.7738452 count=122 decoding take microseconds = 712
Debug 18:23:50.8028468 count=52 decoding take microseconds = 281
Debug 18:23:51.7389004 count=137 decoding take microseconds = 696
Debug 18:23:51.7499010 count=100 decoding take microseconds = 485
Debug 18:23:51.7689021 count=185 decoding take microseconds = 872
Debug 18:23:51.8079043 count=49 decoding take microseconds = 315
Debug 18:23:52.7349573 count=90 decoding take microseconds = 532
Debug 18:23:52.7439578 count=53 decoding take microseconds = 277
Debug 18:23:52.7539584 count=134 decoding take microseconds = 623
Debug 18:23:52.7629589 count=47 decoding take microseconds = 294
Debug 18:23:52.7749596 count=198 decoding take microseconds = 868
Debug 18:23:52.8039613 count=52 decoding take microseconds = 291
Debug 18:23:53.7400148 count=132 decoding take microseconds = 666
Debug 18:23:53.7480153 count=81 decoding take microseconds = 430
Debug 18:23:53.7570158 count=49 decoding take microseconds = 301
Debug 18:23:53.7710166 count=156 decoding take microseconds = 752
Debug 18:23:53.7770169 count=45 decoding take microseconds = 270
Debug 18:23:54.7350717 count=108 decoding take microseconds = 578
Debug 18:23:54.7430722 count=52 decoding take microseconds = 286
Debug 18:23:54.7540728 count=138 decoding take microseconds = 567
Debug 18:23:54.7760741 count=160 decoding take microseconds = 753
Debug 18:23:54.8030756 count=53 decoding take microseconds = 292
Debug 18:23:55.7411293 count=110 decoding take microseconds = 629
Debug 18:23:55.7481297 count=48 decoding take microseconds = 294
Debug 18:23:55.7591303 count=84 decoding take microseconds = 386
Debug 18:23:55.7701309 count=90 decoding take microseconds = 484
Debug 18:23:55.7801315 count=120 decoding take microseconds = 527
Debug 18:23:55.8101332 count=53 decoding take microseconds = 290
Debug 18:23:56.7341861 count=121 decoding take microseconds = 667
Debug 18:23:56.7421865 count=53 decoding take microseconds = 293
Debug 18:23:56.7531872 count=127 decoding take microseconds = 586
Debug 18:23:56.7621877 count=58 decoding take microseconds = 306
Debug 18:23:56.7751884 count=138 decoding take microseconds = 649
Debug 18:23:56.8021900 count=53 decoding take microseconds = 288
Debug 18:23:57.7392436 count=139 decoding take microseconds = 699
Debug 18:23:57.7502442 count=121 decoding take microseconds = 548
Debug 18:23:57.7582446 count=61 decoding take microseconds = 301
Debug 18:23:57.7692453 count=98 decoding take microseconds = 500
Debug 18:23:57.7792458 count=94 decoding take microseconds = 460
Debug 18:23:57.8092476 count=41 decoding take microseconds = 274
Xeon E3-1220, Windows Server 2008 R2 Foundation:
Debug 18:28:57.5087967 count=117 decoding take microseconds = 255
Debug 18:28:57.5087967 count=85 decoding take microseconds = 187
Debug 18:28:57.5087967 count=55 decoding take microseconds = 155
Debug 18:28:57.5243967 count=86 decoding take microseconds = 189
Debug 18:28:57.5243967 count=53 decoding take microseconds = 139
Debug 18:28:57.5243967 count=52 decoding take microseconds = 153
Debug 18:28:57.5243967 count=55 decoding take microseconds = 146
Debug 18:28:57.5243967 count=103 decoding take microseconds = 239
Debug 18:28:57.5243967 count=83 decoding take microseconds = 182
Debug 18:28:57.5243967 count=85 decoding take microseconds = 180
Debug 18:28:57.5243967 count=80 decoding take microseconds = 202
Debug 18:28:57.5243967 count=58 decoding take microseconds = 135
Debug 18:28:57.5243967 count=55 decoding take microseconds = 140
Debug 18:28:57.5243967 count=81 decoding take microseconds = 183
Debug 18:28:57.5243967 count=74 decoding take microseconds = 172
Debug 18:28:57.5243967 count=80 decoding take microseconds = 174
Debug 18:28:57.5243967 count=88 decoding take microseconds = 175
Debug 18:28:57.5243967 count=55 decoding take microseconds = 131
Debug 18:28:57.5243967 count=80 decoding take microseconds = 182
Debug 18:28:57.5243967 count=80 decoding take microseconds = 183
Debug 18:28:57.5243967 count=101 decoding take microseconds = 231
Debug 18:28:57.5243967 count=58 decoding take microseconds = 134
Debug 18:28:57.5243967 count=57 decoding take microseconds = 126
Debug 18:28:57.5243967 count=57 decoding take microseconds = 134
Debug 18:28:57.5399967 count=115 decoding take microseconds = 234
Debug 18:28:57.5399967 count=106 decoding take microseconds = 225
Debug 18:28:57.5399967 count=108 decoding take microseconds = 241
Debug 18:28:57.5399967 count=84 decoding take microseconds = 177
Debug 18:28:57.5399967 count=54 decoding take microseconds = 141
Debug 18:28:57.5399967 count=84 decoding take microseconds = 186
Debug 18:28:57.5399967 count=82 decoding take microseconds = 184
Debug 18:28:57.5399967 count=82 decoding take microseconds = 179
Debug 18:28:57.5399967 count=56 decoding take microseconds = 133
Debug 18:28:57.5399967 count=57 decoding take microseconds = 127
Debug 18:28:57.5399967 count=82 decoding take microseconds = 185
Debug 18:28:57.5399967 count=76 decoding take microseconds = 178
Debug 18:28:57.5399967 count=82 decoding take microseconds = 184
Debug 18:28:57.5399967 count=54 decoding take microseconds = 139
Debug 18:28:57.5399967 count=54 decoding take microseconds = 137
Debug 18:28:57.5399967 count=81 decoding take microseconds = 184
Debug 18:28:57.5399967 count=136 decoding take microseconds = 275
Debug 18:28:57.5399967 count=55 decoding take microseconds = 138
Debug 18:28:57.5555968 count=52 decoding take microseconds = 140
Debug 18:28:57.5555968 count=53 decoding take microseconds = 136
Debug 18:28:57.5555968 count=54 decoding take microseconds = 139
Debug 18:28:57.5555968 count=55 decoding take microseconds = 138
Debug 18:28:57.5555968 count=57 decoding take microseconds = 134
Debug 18:28:57.5555968 count=53 decoding take microseconds = 136
Debug 18:28:57.5555968 count=80 decoding take microseconds = 174
Debug 18:28:57.5555968 count=74 decoding take microseconds = 175
Debug 18:28:57.5555968 count=57 decoding take microseconds = 133
Debug 18:28:57.5555968 count=57 decoding take microseconds = 149
Debug 18:28:57.5555968 count=100 decoding take microseconds = 262
Debug 18:28:57.5555968 count=56 decoding take microseconds = 156
Debug 18:28:57.5555968 count=55 decoding take microseconds = 165
From this test I see that the E3-1220 is about twice as fast as the i7-860.
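The "twice as fast" estimate can be checked by normalizing each decode time by its `count`. The following is only a minimal sketch: it assumes the log format shown above, uses the first few lines from each machine as sample input, and the names `LINE_RE` and `us_per_count` are made up for illustration.

```python
import re
from statistics import mean

# Matches the "count=N decoding take microseconds = M" part of each log line.
LINE_RE = re.compile(r"count=(\d+) decoding take microseconds = (\d+)")

def us_per_count(log_text):
    """Return the mean of (microseconds / count) over all matching log lines."""
    ratios = [int(us) / int(count) for count, us in LINE_RE.findall(log_text)]
    return mean(ratios)

# A few lines copied from the logs above (not the full data set).
I7_SAMPLE = """\
Debug 18:23:48.8047325 count=51 decoding take microseconds = 300
Debug 18:23:49.7397860 count=110 decoding take microseconds = 516
Debug 18:23:49.7717878 count=194 decoding take microseconds = 823
"""

XEON_SAMPLE = """\
Debug 18:28:57.5087967 count=117 decoding take microseconds = 255
Debug 18:28:57.5087967 count=85 decoding take microseconds = 187
Debug 18:28:57.5087967 count=55 decoding take microseconds = 155
"""

i7 = us_per_count(I7_SAMPLE)
xeon = us_per_count(XEON_SAMPLE)
print(f"i7-860: {i7:.2f} us/count, Xeon: {xeon:.2f} us/count, ratio: {i7 / xeon:.2f}")
```

Pasting the complete logs into the two sample strings gives a figure for the whole run rather than for these three lines. The per-count average ignores any fixed per-message overhead, so it is a rough comparison only.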
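Is that possible? In processor ratings these two CPUs score roughly the same.
Could this be due to cache or something else? If so, which processor should I buy to make message decoding another two times faster?
I compared the CPUs using a Pi calculation tool, with the following results: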
Pi 16k
Xeon 00.234 sec
i7-860 00.171 sec
Pi 512k digits
Xeon 5.31 sec
i7-860 (no HT) 5.987 sec
i7-860 (HT) 5.982 sec
Pi 4M digits
Xeon 0.56 min
i7-860 (no HT) 1.11 min
i7-860 (HT) 1.05 min
So the Xeon really is a little faster, but definitely not twice as fast.
Turning off HT on the i7-860 does not change the picture.
i7-860, Windows 7, HT off:
Debug 10:09:30.7436690 count=58 decoding take microseconds = 351
Debug 10:09:34.9269083 count=47 decoding take microseconds = 347
Debug 10:09:34.9959122 count=50 decoding take microseconds = 309
Debug 10:09:35.0359145 count=45 decoding take microseconds = 297
Debug 10:09:35.1469209 count=57 decoding take microseconds = 344
Debug 10:09:35.1979238 count=54 decoding take microseconds = 460
Debug 10:09:35.2179249 count=61 decoding take microseconds = 372
Debug 10:09:35.3009297 count=51 decoding take microseconds = 275
Debug 10:09:35.3479324 count=45 decoding take microseconds = 305
Debug 10:09:35.3779341 count=58 decoding take microseconds = 311
Debug 10:09:35.3879346 count=50 decoding take microseconds = 286
Debug 10:09:35.4379375 count=48 decoding take microseconds = 290
Debug 10:09:35.4789398 count=48 decoding take microseconds = 277
Debug 10:09:35.5089416 count=49 decoding take microseconds = 286
Debug 10:09:35.5589444 count=74 decoding take microseconds = 382
Debug 10:09:35.5679449 count=47 decoding take microseconds = 298
Debug 10:09:35.7389547 count=50 decoding take microseconds = 304
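Processor comparison: http://ark.intel.com/compare/52269,41316
The Xeon has a 50% higher core ratio, a 100% faster system bus, AVX, ECC memory, Turbo Boost 2.0, AES, Intel® Demand Based Switching, Thermal Monitoring Technologies, Intel® Fast Memory Access and Intel® Flex Memory Access.
The i7-860 has HT and Enhanced Intel SpeedStep® Technology.
Perhaps all those extra technologies are what make the Xeon 2 times faster...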
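Answer 1
You completely missed two things.
The first is a number that answers your question directly. The Nehalem generation, a.k.a. i7 1.0, was a big step up from Core 2 Duo, but after Sandy Bridge Intel started struggling to find performance improvements and stalled. Your Xeon is clocked around 20% faster than the i7 version of its generation, and in an L2-optimized workload like this one you can put that closer to 40%, because its L3 is larger than the i7 version's. If this was the main process running on that system at the time (so it had the largest share of resources), it gets even better, because fewer other things were competing for CPU memory.
The second factor, and probably the most important one here: cache size.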
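Intel knows full well that only workloads smaller than the L2 cache can really run at the processor's true speed; every other workload runs at the speed at which the CPU can swap bits in and out of the L2 cache. In a way this is a positive thing, because it disciplines programmers into seeing more benefit in writing smaller, more efficient code, but I also suspect it is a deliberate way of steering consumers; it is another major way of differentiating Xeon from i7 in number-crunching performance (the others being core count and AVX as the main draws). Every other performance aspect of Xeon relative to i7 has to do with bandwidth, and it depends on other companies improving their technology (to use this new bandwidth) before Xeon proves its worth in general (non-AVX/2) workloads.
That is why your Xeon (and most Xeons) has a larger 1MB L2 cache while your i7 has 256KB of L2 (which, given its age, may also be slower). My guess is that, because this process looks like an itemized, easily threaded set of tasks, probably written as a looping function that fits in a small amount of memory, it gets the L2 cache speed advantage very easily and swaps less on the Xeon, which improves performance a lot. Of course, for marketing reasons, L2 cache statistics have been removed from Intel's Ark and replaced by the "SmartCache" statistic, which is basically your L3 cache, i.e. the cache used for swapping data between cores, which is much slower and less closely tied to performance.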
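Answer 2
Answer 3
Have you run any other performance benchmarking utilities? It would be interesting to see a performance profile of the two systems. Maybe other hardware or software is making the difference.
Are both machines running a 64-bit operating system?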