SSD is not persistent


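My SSD forgets data and reverts to a previous version.

I have already posted this question on the Arch forums.

Say I install an application and everything works. After a reboot it is gone, completely, including all of its configuration and related data. The same thing happens when I edit fstab, for example: after rebooting I am back on the previous version.

Rebooting again brings me back to the "new" configuration. A third reboot puts me on the "old" version again. So my system effectively has two versions and switches between them on every reboot.

Next week I will test this configuration by copying it to a new SSD.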


Linux ThijsPC 3.17.1-1-ARCH #1 SMP PREEMPT Wed Oct 15 15:04:35 CEST 2014 x86_64 GNU/Linux

Only 30% of the disk is partitioned. TRIM is enabled. It uses the cfq I/O scheduler. The drive is a Crucial M4 that has been in use for 3 years.

UUID=2bddf92c-de9e-4bbc-bec9-5ba848dea11c / ext4 rw,defaults,noatime,discard 0 1
tmpfs /tmp tmpfs defaults,noatime,mode=1777 0 0

I followed this wiki for troubleshooting.

root@thijspc .:08:38:. {thijs}>hdparm -I /dev/sdc | grep TRIM
*   Data Set Management TRIM supported (limit 8 blocks)
*   Deterministic read data after TRIM

And:

[root@thijspc ~]# cat /sys/block/sdc/queue/scheduler
noop deadline [cfq] 

fdisk

Disk /dev/sdc: 238.5 GiB, 256060514304 bytes, 500118192 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 04171B1D-F083-4569-AFD9-5AC6B38C6063

Device         Start       End   Sectors  Size Type
/dev/sdc1       2048 204802047 204800000 97.7G Linux filesystem
/dev/sdc2  204802048 204804095      2048    1M BIOS boot

grep 'ata' from dmesg (the drive is on ata4)

[    0.000000] ACPI: SSDT 0x00000000DDE30378 00036D (v01 SataRe SataTabl 00001000 INTL 20120711)
[    0.000000] Memory: 16374176K/16718360K available (5381K kernel code, 909K rwdata, 1712K rodata, 1140K init, 1176K bss, 344184K reserved)
[    0.799859] Write protecting the kernel read-only data: 8192k
[    0.847719] libata version 3.00 loaded.
[    0.902244] ata1: SATA max UDMA/133 abar m2048@0xf7f16000 port 0xf7f16100 irq 27
[    0.902245] ata2: DUMMY
[    0.902246] ata3: SATA max UDMA/133 abar m2048@0xf7f16000 port 0xf7f16200 irq 27
[    0.902248] ata4: SATA max UDMA/133 abar m2048@0xf7f16000 port 0xf7f16280 irq 27
[    0.902251] ata5: SATA max UDMA/133 abar m2048@0xf7f16000 port 0xf7f16300 irq 27
[    0.902254] ata6: SATA max UDMA/133 abar m2048@0xf7f16000 port 0xf7f16380 irq 27
[    0.902845] ata7: SATA max UDMA/133 abar m512@0xf7b00000 port 0xf7b00100 irq 28
[    0.902848] ata8: SATA max UDMA/133 abar m512@0xf7b00000 port 0xf7b00180 irq 28
[    1.220536] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    1.220545] ata7: SATA link down (SStatus 0 SControl 300)
[    1.220557] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    1.220573] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)

FSCK

fsck from util-linux 2.25.1
e2fsck 1.42.12 (29-Aug-2014)
Warning!  /dev/sdc1 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 31002 has zero dtime.  Fix? no

Inodes that were part of a corrupted orphan linked list found.  Fix? no

Inode 31010 was part of the orphaned inode list.  IGNORED.
Inode 31060 was part of the orphaned inode list.  IGNORED.
Inode 31199 was part of the orphaned inode list.  IGNORED.
Inode 4075151 was part of the orphaned inode list.  IGNORED.
Inode 4075224 was part of the orphaned inode list.  IGNORED.
Inode 4075385 was part of the orphaned inode list.  IGNORED.
Inode 4075637 was part of the orphaned inode list.  IGNORED.
Inode 4075641 was part of the orphaned inode list.  IGNORED.
Inode 4075873 was part of the orphaned inode list.  IGNORED.
Inode 4075874 was part of the orphaned inode list.  IGNORED.
Inode 4075944 was part of the orphaned inode list.  IGNORED.
Inode 4075968 was part of the orphaned inode list.  IGNORED.
Inode 4075973 was part of the orphaned inode list.  IGNORED.
Pass 2: Checking directory structure
Entry 'E575FA3D8C4DFC93DCE2C4BD0E4E6B4B928AE50E' in /home/thijs/.cache/mozilla/firefox/ltoaq0gi.default/cache2/entries (5269454) has deleted/unused inode 5284764.  Clear? no

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(1684521--1684524) -(1685190--1685198) -(2181790--2181798) -(7614052--7614054) -(7615072--7615074) -(7620243--7620250) -(11070148--11070151) -16529433 -(16529439--16529447) -(21100591--21100598) -(21100641--21100648)
Fix? no

Free blocks count wrong (12652279, counted=12643997).
Fix? no

Inode bitmap differences:  -31002 -31010 -31060 -31199 -4075151 -4075224 -4075385 -4075637 -4075641 -(4075873--4075874) -4075944 -4075968 -4075973 -5284764
Fix? no

Free inodes count wrong for group #645 (572, counted=571).
Fix? no

Free inodes count wrong (5515652, counted=5511054).
Fix? no


/dev/sdc1: ********** WARNING: Filesystem still has errors **********


      890492 inodes used (13.90%, out of 6406144)
        2070 non-contiguous files (0.2%)
         365 non-contiguous directories (0.0%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 869562/388/1
    12947720 blocks used (50.58%, out of 25599999)
           0 bad blocks
           2 large files

      514993 regular files
      339635 directories
           0 character device files
           0 block device files
           2 fifos
        2340 links
       40434 symbolic links (25112 fast symbolic links)
           2 sockets
------------
      897407 files

And smartctl -a

smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.17.1-1-ARCH] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron RealSSD m4/C400/P400
Device Model:     M4-CT256M4SSD2
Serial Number:    0000000012530922F762
LU WWN Device Id: 5 00a075 10922f762
Firmware Version: 040H
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Nov 17 15:36:20 2014 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                    was suspended by an interrupting command from host.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (  300) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    (  19) minutes.
Conveyance self-test routine
recommended polling time:    (   3) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   001    Old_age   Always       -       4240
 12 Power_Cycle_Count       0x0032   100   100   001    Old_age   Always       -       1010
170 Grown_Failing_Block_Ct  0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   001    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   001    Old_age   Always       -       0
173 Wear_Leveling_Count     0x0033   099   099   010    Pre-fail  Always       -       38
174 Unexpect_Power_Loss_Ct  0x0032   100   100   001    Old_age   Always       -       68
181 Non4k_Aligned_Access    0x0022   100   100   001    Old_age   Always       -       520 386 134
183 SATA_Iface_Downshift    0x0032   100   100   001    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   001    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   001    Old_age   Always       -       0
189 Factory_Bad_Block_Ct    0x000e   100   100   001    Old_age   Always       -       80
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x003a   100   100   001    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   001    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   001    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   001    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   001    Old_age   Always       -       0
202 Perc_Rated_Life_Used    0x0018   099   099   001    Old_age   Offline      -       1
206 Write_Error_Rate        0x000e   100   100   001    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Vendor (0xff)       Completed without error       00%      4232         -
# 2  Vendor (0xff)       Completed without error       00%      4219         -
# 3  Vendor (0xff)       Completed without error       00%      1702         -
# 4  Vendor (0xff)       Completed without error       00%      1277         -
# 5  Short offline       Completed without error       00%      1218         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

GRUB

### BEGIN /etc/grub.d/10_linux ###
menuentry 'Arch Linux' --class arch --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-2bddf92c-de9e-4bbc-bec9-5ba848dea11c' {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_gpt
        insmod ext2
        set root='hd2,gpt1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd2,gpt1 --hint-efi=hd2,gpt1 --hint-baremetal=ahci2,gpt1  2bddf92c-de9e-4bbc-bec9-5ba848dea11c
        else
          search --no-floppy --fs-uuid --set=root 2bddf92c-de9e-4bbc-bec9-5ba848dea11c
        fi
        echo    'Loading Linux linux ...'
        linux   /boot/vmlinuz-linux root=UUID=2bddf92c-de9e-4bbc-bec9-5ba848dea11c rw  quiet
        echo    'Loading initial ramdisk ...'
        initrd  /boot/initramfs-linux.img
}

Answer 1

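This could be a bad SSD. Can you reproduce this behavior from a live CD? (fsck, mount, write some data, umount, reboot the live CD, mount, check the data...) If it shows the same problem, the cause is probably the SSD itself, or a filesystem so badly corrupted that fsck cannot actually repair it.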

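If it does not happen with the live CD, the cause may lie elsewhere, such as your kernel, or your installed system not unmounting/flushing properly on reboot for some reason...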
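The live CD test described above (write some data, unmount, reboot, mount again, check the data) could be sketched roughly as follows. This is only a minimal sketch: the function names `persist_write` and `persist_check` and the marker file names are hypothetical, and it assumes `sha256sum` from coreutils is available on the live system.

```shell
#!/bin/sh
# Hypothetical helpers for a write / reboot / verify persistence test.
# The argument is the directory where the suspect filesystem is mounted.

# Write a timestamped marker plus its checksum, then flush to the device.
persist_write() {
    dir=$1
    date +%s > "$dir/persist-marker.txt"
    ( cd "$dir" && sha256sum persist-marker.txt > persist-marker.sha256 )
    sync
}

# After a reboot and remount: check that the marker still exists and
# that its content still matches the recorded checksum.
persist_check() {
    dir=$1
    if [ ! -f "$dir/persist-marker.txt" ]; then
        echo "marker missing"
        return 1
    fi
    if ( cd "$dir" && sha256sum -c persist-marker.sha256 >/dev/null 2>&1 ); then
        echo "marker intact"
    else
        echo "marker corrupted"
        return 1
    fi
}
```

From the live CD this would be used along the lines of `mount /dev/sdc1 /mnt`, `persist_write /mnt`, `umount /mnt`, reboot, then `mount /dev/sdc1 /mnt` and `persist_check /mnt`. Since the problem reportedly alternates between boots, repeating the check over several reboots is probably more telling than a single pass.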

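Apart from the non-4k-aligned accesses, your SMART data does not look odd to me; your value there seems larger than on my own Crucial M4. Are your partitions all aligned? Even if they are not, though, that should not cause an error like this.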
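The alignment question can be checked with simple arithmetic on the fdisk output from the question. The sketch below takes the two start sectors listed there (2048 and 204802048) and the reported 512-byte logical sector size; the helper name `is_4k_aligned` is hypothetical.

```shell
#!/bin/sh
# Succeeds if a partition's starting byte offset is a multiple of 4096.
# $1 = start sector, $2 = logical sector size in bytes (default 512).
is_4k_aligned() {
    start=$1
    sector_size=${2:-512}
    [ $(( start * sector_size % 4096 )) -eq 0 ]
}

# Start sectors of /dev/sdc1 and /dev/sdc2 as shown by fdisk in the question.
for s in 2048 204802048; do
    if is_4k_aligned "$s"; then
        echo "start sector $s: aligned"
    else
        echo "start sector $s: NOT aligned"
    fi
done
```

Both start sectors are multiples of 8 sectors (8 × 512 = 4096 bytes), so both partitions begin on a 4 KiB boundary. That suggests the Non4k_Aligned_Access count comes from something other than partition placement.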
