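I have tried linaro's flashbench tool, but I have not been able to work out what my erase block size might be. Some input on how to interpret my results would be great.
My USB stick is a SanDisk Extreme 64GB SDCZ48-064G. I tested it on both USB3 and USB2, using Lubuntu 14 running from a FAT32 Class 10 SD card.
These are my measurements over USB3: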
sudo ./flashbench -a -b=1024 -c 1000 /dev/sdi
align 17179869184 pre 429µs on 438µs post 334µs diff 56.1µs
align 8589934592 pre 429µs on 436µs post 334µs diff 54.5µs
align 4294967296 pre 431µs on 438µs post 337µs diff 54µs
align 2147483648 pre 455µs on 461µs post 365µs diff 51.1µs
align 1073741824 pre 455µs on 461µs post 365µs diff 51.8µs
align 536870912 pre 451µs on 456µs post 358µs diff 51.7µs
align 268435456 pre 452µs on 458µs post 359µs diff 52.1µs
align 134217728 pre 452µs on 459µs post 360µs diff 52.5µs
align 67108864 pre 451µs on 460µs post 360µs diff 54.3µs
align 33554432 pre 434µs on 440µs post 339µs diff 53.7µs
align 16777216 pre 452µs on 460µs post 342µs diff 63µs
align 8388608 pre 451µs on 458µs post 359µs diff 52.7µs
align 4194304 pre 452µs on 458µs post 359µs diff 52.4µs
align 2097152 pre 437µs on 443µs post 359µs diff 45.1µs
align 1048576 pre 423µs on 433µs post 340µs diff 51.4µs
align 524288 pre 423µs on 432µs post 341µs diff 50.4µs
align 262144 pre 421µs on 431µs post 308µs diff 66.3µs
align 131072 pre 379µs on 430µs post 321µs diff 80µs
align 65536 pre 343µs on 358µs post 343µs diff 14.6µs
align 32768 pre 343µs on 358µs post 342µs diff 15.7µs
align 16384 pre 342µs on 356µs post 341µs diff 14.1µs
align 8192 pre 343µs on 356µs post 341µs diff 14µs
align 4096 pre 340µs on 352µs post 340µs diff 12.3µs
align 2048 pre 340µs on 353µs post 340µs diff 12.4µs
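As you can see, there is only one obvious jump, at 128KiB. There are some less obvious jumps at 4MiB and 16MiB. But none of these results are as clear as the examples on the pages linked at the end.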
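To make "jump" less of an eyeball judgement, here is a minimal Python sketch that parses `flashbench -a` style lines and flags every alignment whose `diff` is much larger than the `diff` at the next smaller alignment. The helper names (`parse_align`, `find_jumps`) and the 2x threshold are assumptions chosen for illustration; they are not part of flashbench.

```python
import re

# Matches lines such as:
#   align 131072 pre 379µs on 430µs post 321µs diff 80µs
# Both the micro sign (µ) and the Greek mu (μ) are accepted.
LINE_RE = re.compile(
    r"align (\d+)\s+pre \S+\s+on \S+\s+post \S+\s+diff ([\d.]+)(ns|[µμ]s|ms)"
)

def parse_align(text):
    """Return a list of (alignment_bytes, diff_in_microseconds) tuples."""
    rows = []
    for m in LINE_RE.finditer(text):
        value, unit = float(m.group(2)), m.group(3)
        if unit == "ns":
            value /= 1000.0
        elif unit == "ms":
            value *= 1000.0
        rows.append((int(m.group(1)), value))
    return rows

def find_jumps(rows, ratio=2.0):
    """Alignments whose diff is >= ratio times the diff one step below.

    The ratio of 2.0 is an arbitrary threshold picked for this sketch.
    """
    rows = sorted(rows)  # ascending by alignment
    jumps = []
    for (_, d_small), (a_big, d_big) in zip(rows, rows[1:]):
        if d_small > 0 and d_big / d_small >= ratio:
            jumps.append(a_big)
    return jumps

# Six lines copied from the 1 KiB run above.
SAMPLE = """\
align 1048576 pre 423µs on 433µs post 340µs diff 51.4µs
align 524288 pre 423µs on 432µs post 341µs diff 50.4µs
align 262144 pre 421µs on 431µs post 308µs diff 66.3µs
align 131072 pre 379µs on 430µs post 321µs diff 80µs
align 65536 pre 343µs on 358µs post 343µs diff 14.6µs
align 32768 pre 343µs on 358µs post 342µs diff 15.7µs
"""

for alignment in find_jumps(parse_align(SAMPLE)):
    print(f"jump at {alignment} bytes ({alignment // 1024} KiB)")
```

On this sample the only alignment flagged is 131072 bytes (128 KiB). With a 2x threshold the smaller bumps at 4MiB and 16MiB in the full run would not be flagged, which matches how weak they look in the raw numbers.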
The same measurement with different block sizes:
sudo ./flashbench -a -b=$[8*1024] -c 1000 /dev/sdi
align 17179869184 pre 493µs on 484µs post 397µs diff 38.9µs
align 8589934592 pre 490µs on 480µs post 396µs diff 36.7µs
align 4294967296 pre 493µs on 478µs post 398µs diff 32.6µs
align 2147483648 pre 517µs on 505µs post 429µs diff 31.7µs
align 1073741824 pre 517µs on 503µs post 425µs diff 32.1µs
align 536870912 pre 515µs on 502µs post 425µs diff 31.9µs
align 268435456 pre 515µs on 501µs post 422µs diff 33µs
align 134217728 pre 513µs on 500µs post 423µs diff 32.8µs
align 67108864 pre 514µs on 501µs post 423µs diff 32.8µs
align 33554432 pre 497µs on 485µs post 403µs diff 34.8µs
align 16777216 pre 515µs on 503µs post 405µs diff 42.5µs
align 8388608 pre 513µs on 502µs post 422µs diff 34µs
align 4194304 pre 514µs on 501µs post 421µs diff 33.7µs
align 2097152 pre 505µs on 490µs post 430µs diff 22.2µs
align 1048576 pre 487µs on 477µs post 405µs diff 31µs
align 524288 pre 486µs on 475µs post 405µs diff 29.5µs
align 262144 pre 486µs on 475µs post 374µs diff 45.4µs
align 131072 pre 440µs on 470µs post 381µs diff 59.5µs
align 65536 pre 404µs on 406µs post 406µs diff 1.67µs
align 32768 pre 408µs on 409µs post 409µs diff 735ns
align 16384 pre 409µs on 411µs post 410µs diff 2.09µs
sudo ./flashbench -a -b=$[64*1024] -c 1000 /dev/sdi
align 17179869184 pre 781µs on 796µs post 742µs diff 35.2µs
align 8589934592 pre 786µs on 793µs post 743µs diff 28µs
align 4294967296 pre 787µs on 793µs post 744µs diff 27.3µs
align 2147483648 pre 831µs on 829µs post 794µs diff 16.3µs
align 1073741824 pre 829µs on 827µs post 791µs diff 17.2µs
align 536870912 pre 828µs on 825µs post 792µs diff 15.5µs
align 268435456 pre 827µs on 825µs post 788µs diff 17.7µs
align 134217728 pre 827µs on 825µs post 789µs diff 16.8µs
align 67108864 pre 826µs on 826µs post 788µs diff 19.3µs
align 33554432 pre 798µs on 818µs post 788µs diff 24.9µs
align 16777216 pre 827µs on 826µs post 798µs diff 13.1µs
align 8388608 pre 826µs on 824µs post 788µs diff 17.1µs
align 4194304 pre 828µs on 824µs post 787µs diff 16.4µs
align 2097152 pre 811µs on 811µs post 787µs diff 12µs
align 1048576 pre 799µs on 797µs post 768µs diff 13.9µs
align 524288 pre 801µs on 796µs post 769µs diff 11.2µs
align 262144 pre 798µs on 794µs post 733µs diff 28.4µs
align 131072 pre 764µs on 793µs post 746µs diff 38.5µs
To test the erase block size, I tried the open-au option:
sudo ./flashbench -O -b=$[8*1024] -e=$[4*1024*1024] -c 100 --open-au-nr=1 /dev/sdi
4MiB 74.2M/s
2MiB 100M/s
1MiB 71.8M/s
512KiB 91.1M/s
256KiB 28.7M/s
128KiB 33.8M/s
64KiB 27.1M/s
32KiB 21.9M/s
16KiB 22.7M/s
8KiB 12.7M/s
sudo ./flashbench -O -b=$[8*1024] -e=$[8*1024*1024] -c 100 --open-au-nr=1 /dev/sdi
8MiB 77.6M/s
4MiB 87.9M/s
2MiB 86.2M/s
1MiB 68M/s
512KiB 78.4M/s
256KiB 26.5M/s
128KiB 29.3M/s
64KiB 22.5M/s
32KiB 19.9M/s
16KiB 26.6M/s
8KiB 14.5M/s
sudo ./flashbench -O -b=$[8*1024] -e=$[16*1024*1024] -c 100 --open-au-nr=1 /dev/sdi
16MiB 94.2M/s
8MiB 123M/s
4MiB 112M/s
2MiB 95.7M/s
1MiB 95.9M/s
512KiB 83.3M/s
256KiB 30.2M/s
128KiB 27.7M/s
64KiB 22.8M/s
32KiB 15.9M/s
16KiB 28.5M/s
8KiB 15.3M/s
For comparison, one run with a block size of 1KiB and another with 16KiB:
sudo ./flashbench -O -b=$[1024] -e=$[16*1024*1024] -c 100 --open-au-nr=1 /dev/sdi
16MiB 111M/s
8MiB 114M/s
4MiB 124M/s
2MiB 101M/s
1MiB 97.1M/s
512KiB 77.7M/s
256KiB 30.8M/s
128KiB 27.6M/s
64KiB 22.3M/s
32KiB 15.2M/s
16KiB 33.5M/s
8KiB 15M/s
4KiB 7.47M/s
2KiB 3.31M/s
1KiB 1.63M/s
sudo ./flashbench -O -b=$[16*1024] -e=$[16*1024*1024] -c 100 --open-au-nr=1 /dev/sdi
16MiB 94.5M/s
8MiB 98.6M/s
4MiB 99M/s
2MiB 88.8M/s
1MiB 97.6M/s
512KiB 78.6M/s
256KiB 31.6M/s
128KiB 30.4M/s
64KiB 22M/s
32KiB 14.7M/s
16KiB 28.4M/s
In addition, I tested the find-fat option with different erase block values:
sudo ./flashbench -f -b=$[8*1024] -e=$[4*1024*1024] /dev/sdi
4MiB 90.9M/s 215M/s 169M/s 137M/s 220M/s 150M/s
2MiB 86.9M/s 217M/s 169M/s 138M/s 218M/s 119M/s
1MiB 87.7M/s 215M/s 214M/s 137M/s 216M/s 125M/s
512KiB 85.7M/s 207M/s 162M/s 134M/s 207M/s 146M/s
256KiB 85.7M/s 161M/s 201M/s 112M/s 209M/s 121M/s
128KiB 48.2M/s 60.5M/s 55.6M/s 55.8M/s 48.5M/s 56.2M/s
64KiB 88.1M/s 156M/s 164M/s 112M/s 131M/s 119M/s
32KiB 77.7M/s 159M/s 135M/s 117M/s 159M/s 127M/s
16KiB 73.3M/s 106M/s 119M/s 97.8M/s 103M/s 98.9M/s
8KiB 59.5M/s 67.7M/s 67.8M/s 65M/s 66.3M/s 66.5M/s
sudo ./flashbench -f -b=$[8*1024] -e=$[8*1024*1024] /dev/sdi
8MiB 112M/s 152M/s 143M/s 186M/s 228M/s 170M/s
4MiB 131M/s 172M/s 144M/s 147M/s 221M/s 222M/s
2MiB 131M/s 172M/s 122M/s 183M/s 228M/s 228M/s
1MiB 130M/s 173M/s 144M/s 183M/s 169M/s 227M/s
512KiB 118M/s 170M/s 118M/s 180M/s 221M/s 223M/s
256KiB 120M/s 152M/s 143M/s 180M/s 193M/s 192M/s
128KiB 48.9M/s 57.4M/s 56.2M/s 44.1M/s 61M/s 58.2M/s
64KiB 110M/s 135M/s 135M/s 112M/s 165M/s 156M/s
32KiB 106M/s 122M/s 149M/s 115M/s 171M/s 156M/s
16KiB 93.1M/s 109M/s 113M/s 88.9M/s 116M/s 122M/s
8KiB 64.9M/s 66M/s 66.8M/s 64.9M/s 67.9M/s 67.5M/s
sudo ./flashbench -f -b=$[8*1024] -e=$[16*1024*1024] /dev/sdi
16MiB 173M/s 131M/s 208M/s 223M/s 205M/s 194M/s
8MiB 178M/s 120M/s 221M/s 191M/s 223M/s 169M/s
4MiB 175M/s 125M/s 219M/s 191M/s 220M/s 169M/s
2MiB 175M/s 124M/s 219M/s 190M/s 219M/s 160M/s
1MiB 174M/s 119M/s 215M/s 186M/s 215M/s 163M/s
512KiB 168M/s 121M/s 201M/s 198M/s 200M/s 176M/s
256KiB 158M/s 121M/s 180M/s 209M/s 181M/s 182M/s
128KiB 51.7M/s 51.2M/s 58M/s 58.8M/s 54.5M/s 53.2M/s
64KiB 129M/s 111M/s 116M/s 158M/s 148M/s 144M/s
32KiB 128M/s 111M/s 146M/s 159M/s 147M/s 145M/s
16KiB 98.3M/s 91.1M/s 105M/s 110M/s 106M/s 105M/s
8KiB 66.2M/s 64M/s 67.5M/s 67.5M/s 67.5M/s 65.1M/s
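Now I'm lost, and you probably are too. So how should I make sense of these data points?
Summary: My guess is that I have an 8KiB page size, 128KiB blocks and 16MiB segments, which would also be the erase block size. Can anyone verify this or point out errors in my assumptions or method?
If necessary, I can run new measurements with different parameters, but I have no useful ideas of my own.
For those interested, here are some links: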
- http://www.bradfordembedded.com/2014/05/flashbenching/
- http://lwn.net/Articles/428584/
- https://wiki.linaro.org/WorkingGroups/KernelArchived/Projects/FlashCardSurvey
- http://wiki.laptop.org/go/How_to_Damage_a_FLASH_Storage_Device
- http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/
- http://blog.nuclex-games.com/2009/12/aligning-an-ssd-on-linux/
- https://github.com/bradfa/flashbench/tree/dev
Answer 1
I'm by no means an expert, but I'll give it a try anyway.
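The first set of tests is meant to determine the device's page size. The key point is that two 1KB reads cost about the same whether they happen before, after, or across a 64KB boundary. This may mean the page size is 128KB: if the flash controller cannot read less than a page at a time, then a read that crosses any boundary smaller than 128KB costs about the same as one that does not.
In the original LWN article, a peak at a given alignment boundary is inferred to be a block or page boundary, on the assumption that a page read crossing a block boundary must incur extra cost. Apart from certain specific FTL implementations, I have not seen any explanation of why that would be the case. In fact, the README mentions this caveat:
"Some cards only show a clear pattern using accesses with certain block sizes, others don't show any pattern, which means that the numbers need to be determined differently."
My point is that the trick for determining the block size does not work for your device.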
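The open-au test is meant to find out how many parallel sequential streams the device can handle without the streams interfering with one another. In your case you only tested with --open-au-nr=1, so the test tells you nothing about the flash parameters. Instead it measures how small versus large I/O affects throughput. Small I/O incurs overhead at every layer, hence higher latency and lower throughput.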
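One rough way to see the per-I/O overhead in the question's own numbers is to convert each throughput figure back into an average time per I/O. The Python sketch below does that for the four smallest sizes of the `-b=$[1024]` open-au run. It assumes that flashbench's "M/s" means 10^6 bytes per second (pass `2**20` if it means MiB), and the function name `per_io_time_us` is made up for this example.

```python
# Throughput copied from the question's run with -b=$[1024] and a 16 MiB
# erase size. Keys are I/O sizes in bytes, values are the rates in "M/s".
SAMPLES = {1024: 1.63, 2048: 3.31, 4096: 7.47, 8192: 15.0}

def per_io_time_us(size_bytes, rate_m_per_s, bytes_per_m=10**6):
    """Average time per I/O, in microseconds, implied by a throughput figure.

    Assumption: "M" is 10**6 bytes unless bytes_per_m says otherwise.
    """
    return size_bytes / (rate_m_per_s * bytes_per_m) * 1e6

times = {size: per_io_time_us(size, rate) for size, rate in SAMPLES.items()}
for size, t in times.items():
    print(f"{size // 1024:>2} KiB: {t:6.0f} µs per I/O")
```

Under these assumptions the implied time per I/O stays within roughly 15% across an eightfold change in size, at somewhere around 0.5 to 0.65 ms under either reading of "M". That is what you would expect if a fixed per-request cost dominates at small sizes, so throughput there scales almost linearly with the block size.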