• Welcome to Peterborough Linux User Group (Canada) Forum.
 

Problem booting Mint - probably messed up root files

Started by buster, November 09, 2021, 11:10:45 AM

buster

I did solve the problem, but I don't fully understand what went wrong. Twice in the last year this has occurred, I think after updates. Choosing a different kernel didn't solve the problem.

The problem: A few oddities were showing up, like email could come in, but not be sent. A reboot led to a black page with a few choices, and that's when I found out a different kernel made no difference. There were also many, many words - enough to bring on depression.

The solution: Buried somewhere in the morass of verbiage was a hint that I might file check the root partition. So I typed fsck /dev/sda5. And then I agreed 'yes go ahead' about 20 to 25 times. And it rebooted nicely.

My question: Since this has happened twice within the last year, is it possible that this isn't just a case of software during the updates getting confused and disoriented, and that it could be a hardware problem or something else?

This is the best desktop in the house by the way, and it runs only Mint. And a more important thing is that it's my wife's computer, so there is no room for error.
Growing up from childhood and becoming an adult is highly overrated.

Jason

It sounds like a hardware issue, possibly an intermittent hard drive issue. I doubt very much it's a software issue unless you've upgraded the same distro for decades. Modern filesystem formats don't need to be fsck'ed. A suggestion to do so is a good sign it's a hardware problem.

Also, have you tested your RAM? A bad RAM stick can cause all kinds of problems, including disk corruption. A LOT of people have bad RAM and have no idea, because a few bad memory cells (each chip has millions or billions) won't cause problems as long as they aren't accessed. Then they are, and presto! Problems. Your distro might have a memory test option in the boot menu. If not, you can make a boot disk with a memory tester; I think it's called Memtest86. There's a paid version and a free one with similar names (the free, open-source one is Memtest86+). I think you know which one you want. It can take a while depending on the age of the computer and how much RAM you have. If it finds errors, you'll need to find out which stick is causing the issue by process of elimination. Say you have 4 sticks and an error shows up. Remove two of the sticks. Run again. If it shows up again, remove one of those sticks. If it doesn't, the bad stick is one of the two you removed, so test those instead.

Even if it turns out to be something else, everything you do on your computer is related to RAM and if you care (or your wife cares) about not having personal files corrupted, memory should be checked anyway.
* Zorin OS 17.1 Core and Windows 11 Pro on a Dell Precision 3630 Tower with an i5-8600 3.1 GHz 6-core processor, dual 22" displays, 16 GB of RAM, 512 GB NVMe and a GeForce 1060 6 GB card
* Motorola Edge (2022) phone with Android 13

buster

"Modern filesystem formats don't need to be fsck'ed." Didn't know that.

The fsck solved the problem in both cases, about 10 months apart. If the problem is a faulty memory stick (I'm understanding this from what you have written), the flawed RAM stick creates some errors on the root (/) filesystem, and fsck is capable of sorting this out, even though modern filesystems don't need to be fsck'd.

Have I got this correct? (So it's either RAM or SSD.)

ssfc72

Sounds like the identical problem I had with my Mint 19.1 this past spring. I was also able to get Mint to boot again by running the fsck command, but the problem came back again fairly quickly, numerous times.

I installed Mint 20.1 and never had the problem happen again.
Mint 20.3 on a Dell 14" Inspiron notebook, HP Pavilion X360, 11" k120ca notebook (Linux Lubuntu), Dell 13" XPS notebook computer (MXLinux)
Cellphone Samsung A50, Koodo pre paid service

buster

ssfc72

You may want to revert to Mint 20.1.
There is a kernel update currently waiting in the Updates list on my Mint 20.1. I think I'd better unselect that kernel update when I go to run the updates.

Jason

If it's twice a year, I wouldn't worry about it, Buster. The system may have gone down uncleanly, most often because of a power outage or a hard crash. In other words, it wasn't a clean shutdown, or something else caused the disk corruption (cosmic rays?). But yes, I think it's the SSD or memory, but it's intermittent enough that it's not a big deal, unless you have important personal data on it. Make sure it's backed up, and test your backups (just open a file or three to make sure they're working).
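Opening a file or three works, but if the backup is a plain copy of a folder you can also let the computer compare everything. Here's a minimal sketch, assuming the backup is an ordinary directory you can read; verify_backup is a made-up helper name, not a standard command.

```shell
# Hypothetical helper: compare an original directory against its backup copy.
# Assumes the backup is a plain directory tree (not an archive or image).
verify_backup() {
    # diff -r walks both trees; -q only reports which files differ.
    # The list of differences goes to stderr so stdout holds just the verdict.
    if diff -r -q "$1" "$2" >&2; then
        echo "backup matches"
    else
        echo "backup differs"
        return 1
    fi
}

# Example call (both paths are placeholders):
# verify_backup "$HOME/Documents" /media/backup/Documents
```

A file that exists on only one side also counts as a difference, so a backup that is missing new files will be reported as differing.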

I was wrong about fsck. It does run after so many boots or after an unclean shutdown (see below for more), but it's usually just replaying the journal (which modern filesystems provide). So the fsck happens, but you usually never see it unless it needs your intervention.

If you experience it again and again in a short time, Bill, I'd look at the bottom of this page for some instructions. Note that if it's the root partition that it's happening on, you'll have to boot a live disk to do the repair, and you'll need to know which partition it's on. You can find that graphically or at a terminal prompt by typing (where X is the actual drive letter):

sudo fdisk -l /dev/sdX

It'll show you the partitions on that drive. The drive letter might change when you boot from a live disk, so try sda and sdb, for example. Don't include the partition number, which comes after the letter (i.e., don't use /dev/sda1). You can also run 'sudo fdisk -l' to list all the drives and partitions, but that might be messy. If it won't fit on one screen, pipe it through less like this:

sudo fdisk -l | less

The character between '-l' and 'less' is the pipe, entered by pressing <shift> <\> (on US English keyboards).
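To make the "drive, not partition" point concrete, here's a rough sketch that turns a partition name into the drive name you'd hand to fdisk. parent_disk is a made-up helper, and it assumes the usual Linux naming (sda5 for SATA/USB, nvme0n1p2 for NVMe, mmcblk0p1 for SD cards).

```shell
# Hypothetical helper: print the whole-disk device for a partition path.
parent_disk() {
    case "$1" in
        /dev/nvme*|/dev/mmcblk*)
            # NVMe and SD/eMMC partitions end in pN, e.g. /dev/nvme0n1p2
            printf '%s\n' "$1" | sed -E 's/p[0-9]+$//'
            ;;
        *)
            # SATA/USB partitions end in a bare number, e.g. /dev/sda5
            printf '%s\n' "$1" | sed -E 's/[0-9]+$//'
            ;;
    esac
}

# Example call:
# sudo fdisk -l "$(parent_disk /dev/sda5)"
```

It only edits the name; it doesn't check that the device exists, so the output of fdisk is still what tells you whether you picked the right drive.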
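Buster: if it ever happens again, use this option with fsck so you don't have to repeatedly press 'Y' (replace sdXN with the partition, which was sda5 in your case):

sudo fsck -p /dev/sdXN

That's for an ext4 filesystem (or ext2/3), which you're likely using since it's the default. The -p option fixes only the problems that are safe to fix automatically and stops if something needs a decision from you. If that happens, -y answers 'yes' to everything, which is what you did by hand.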
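When fsck finishes, it reports what happened through its exit status, which is a sum of flags documented in the fsck man page. Here's a small sketch that spells the number out in words; explain_fsck_status is a made-up helper name, and the wording of the messages is mine.

```shell
# Hypothetical helper: explain the exit status of fsck (a sum of bit flags).
explain_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    # Each flag is tested separately because several can be set at once.
    if [ $((status & 1)) -ne 0 ]; then echo "errors corrected"; fi
    if [ $((status & 2)) -ne 0 ]; then echo "reboot recommended"; fi
    if [ $((status & 4)) -ne 0 ]; then echo "errors left uncorrected"; fi
    if [ $((status & 8)) -ne 0 ]; then echo "operational error"; fi
    if [ $((status & 16)) -ne 0 ]; then echo "usage or syntax error"; fi
    if [ $((status & 32)) -ne 0 ]; then echo "cancelled by user"; fi
    if [ $((status & 128)) -ne 0 ]; then echo "shared library error"; fi
    return 0
}

# Example use from a live disk (sdXN is a placeholder for the partition):
# sudo fsck -p /dev/sdXN; explain_fsck_status $?
```

A status that includes "errors left uncorrected" is the case where -p gave up and a manual run is needed.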

Fsck errors are usually caused by power outages. However, the Arch page says that an fsck occurs every 30 boots automatically. I don't know if that's just with Arch or all Linux distros. The last field in your /etc/fstab for that partition controls whether it gets checked at boot at all: 0 means never, and anything else means an fsck can occur after so many boots (for ext filesystems, the count is a setting stored in the filesystem itself, which tune2fs can show and change). The number also specifies the order. It's 1 or 2. The root (/) should be 1, and the others should be 2 (or 0 if they should never be checked). Filesystems with the same number are checked in the order in which they appear.
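If you want to see what your own fstab says without reading it by eye, here's a minimal sketch that lists the entries eligible for a boot-time check, in order. fsck_order is a made-up helper name; it assumes the standard six-field fstab layout and treats a missing sixth field as 0.

```shell
# Hypothetical helper: list filesystems in boot-time fsck order.
# Prints "pass mountpoint device" for every entry whose last field is not 0.
fsck_order() {
    # Skip comment lines; field 6 is the pass number, field 2 the mount point.
    # sort -s keeps file order for entries that share a pass number.
    awk '!/^[[:space:]]*#/ && NF >= 6 && $6 > 0 { print $6, $2, $1 }' \
        "${1:-/etc/fstab}" | sort -s -n -k1,1
}

# Example calls:
# fsck_order              # reads /etc/fstab
# fsck_order /mnt/etc/fstab   # the installed system's fstab, seen from a live disk
```

On a typical install the root (/) entry shows up first with a 1, and swap or tmpfs lines don't appear at all because their last field is 0.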

But with a journaling filesystem, like all modern filesystems including ext4, fsck usually just replays the journal. If it's a more serious error, a full fsck will be required. If a full fsck is required a lot, it's likely that there is corruption on the disk (probably in the journal). Why is it happening? Who knows. Bad disk, bad memory? I doubt it's the kernel, Bill. Kernels wouldn't cause disk corruption. Linus would never allow a kernel that did that to proceed beyond the development version.

Your reinstall fixed it because it fixed the file corruption and likely didn't write over the same area of the disk. SSDs use wear-levelling, which means new data gets written to unused or less-used cells. So a reinstall likely won't be written to the same cells. If there is a problem with some parts of the drive, you won't see it until those cells are written to again.