Topic: Problem booting Mint - probably messed up root files

Offline buster

  • Member
  • Master
  • *
  • Posts: 1209
Problem booting Mint - probably messed up root files
« on: November 09, 2021, 11:10:45 am »
I did solve the problem, but I don't fully understand what went wrong. Twice in the last year this has occurred, I think after updates. Choosing a different kernel didn't solve the problem.

The problem: A few oddities were showing up, like email could come in, but not be sent. A reboot led to a black page with a few choices, and that's when I found out a different kernel made no difference. There were also many, many words - enough to bring on depression.

The solution: Buried somewhere in the morass of verbiage was a hint that I might file check the root partition. So I typed fsck /dev/sda5. And then I agreed 'yes go ahead' about 20 to 25 times. And it rebooted nicely.

My question: Since this has happened twice within the last year, is it possible that this isn't just a case of software during the updates getting confused and disoriented, and that it could be a hardware problem or something else?

This is the best desktop in the house by the way, and it runs only Mint. And a more important thing is that it's my wife's computer, so there is no room for error.
The Ironic Big Bust Theory: The likelihood of an advanced species imploding in apocalyptic stupidity. (Intergalactic Survey of Disappearing Civilizations: Chapter 4)

Offline Jason

  • Administrator
  • Master
  • *****
  • Posts: 3823
  • Humanist. Skeptic. Husband.
Re: Problem booting Mint - probably messed up root files
« Reply #1 on: November 09, 2021, 11:45:15 am »
It sounds like a hardware issue, possibly an intermittent hard drive issue. I doubt very much it's a software issue unless you've upgraded the same distro for decades. Modern filesystem formats don't need to be fsck'ed. A suggestion to do so is a good sign it's a hardware problem.

Also, have you tested your RAM? A bad RAM stick can cause all kinds of problems, including disk corruption, so disk corruption or bad memory could be behind those issues and others. A LOT of people have bad RAM and have no idea, because a few bad memory cells (each chip has millions or billions) that aren't accessed won't cause problems. Then they are accessed, and presto! Problems. Your distro might have a memory test option when you boot up. If not, you can make a boot disk with a memory tester; I think it's called Memtest86. There's a commercial one and a free, open-source one with similar names (MemTest86 and Memtest86+). I think you know which one you want. It can take a while depending on the age of the computer and how much RAM you have. If it finds errors, you'll need to find out which stick is causing the issue by process of elimination. Say you have 4 sticks and an error shows up. Remove two of the sticks and run it again. If the error shows up again, the bad stick is one of the two still installed, so remove one of those and test again. If it doesn't show up, the bad stick is one of the two you removed.

Even if it turns out to be something else, everything you do on your computer is related to RAM and if you care (or your wife cares) about not having personal files corrupted, memory should be checked anyway.
"With all its sham, drudgery, and broken dreams, it is still a beautiful world." - Max Ehrmann, Desiderata

Offline buster

  • Member
  • Master
  • *
  • Posts: 1209
Re: Problem booting Mint - probably messed up root files
« Reply #2 on: November 09, 2021, 01:55:21 pm »
"Modern filesystem formats don't need to be fsck'ed." Didn't know that.

The fsck solved the problem in both cases, about 10 months apart. If the problem is a faulty memory stick (I'm understanding this from what you have written), the flawed RAM stick creates some errors on the root (/) filesystem, and fsck is capable of sorting this out, even though modern filesystems don't need to be fsck'ed.

Have I got this correct? (So it's either RAM or SSD.)

Offline ssfc72

  • Member
  • Master
  • *
  • Posts: 1855
Re: Problem booting Mint - probably messed up root files
« Reply #3 on: November 09, 2021, 03:10:05 pm »
Sounds like the identical problem I had with my Mint 19.1 this past spring. I was also able to get Mint to boot again by running the fsck command, but the problem came back fairly quickly, numerous times.

I installed Mint 20.1 and never had the problem happen again.
Mint 19.1 on a Dell 14" Inspiron notebook, HP Pavilion X360, 11" k120ca notebook (Linux Lubuntu), Dell 13" XPS notebook computer (MX Linux)
Cellphone Samsung A50, PCMobile pay as you go

Offline buster

  • Member
  • Master
  • *
  • Posts: 1209
Re: Problem booting Mint - probably messed up root files
« Reply #4 on: November 09, 2021, 05:15:55 pm »
At the moment I'm running Mint 20.2.

Offline ssfc72

  • Member
  • Master
  • *
  • Posts: 1855
Re: Problem booting Mint - probably messed up root files
« Reply #5 on: November 10, 2021, 06:06:37 am »
You may want to revert to Mint 20.1.
There is a kernel update currently waiting in the Updates list on my Mint 20.1. I think I'd better unselect that kernel update when I go to run the updates.

Offline Jason

  • Administrator
  • Master
  • *****
  • Posts: 3823
  • Humanist. Skeptic. Husband.
Re: Problem booting Mint - probably messed up root files
« Reply #6 on: November 10, 2021, 02:35:20 pm »
If it's twice a year, I wouldn't worry about it, Buster. The system may have gone down hard, often because of a power outage or a big crash. In other words, it wasn't a clean shutdown, or something else caused the disk corruption (cosmic rays?). But yes, I think it's the SSD or memory, but it's intermittent enough that it's not a big deal, unless you have important personal data on it. Make sure it's backed up and test your backups (just open a file or three to make sure they're working).
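
Opening a file or three is only a spot check. A rough way to check a whole backup, assuming it is a plain copy of a directory, is to let diff compare the two trees. This is just a sketch: `verify_backup` is a made-up helper name and the paths in the example are placeholders, not anything from this thread.

```shell
#!/bin/sh
# Sketch: compare a source directory with its backup copy.
# verify_backup is a hypothetical helper name; pass your own paths.
verify_backup() {
    src=$1
    dst=$2
    # -r: recurse into subdirectories, -q: only report which files differ
    if diff -rq "$src" "$dst"; then
        echo "backup matches"
    else
        echo "backup differs"
        return 1
    fi
}

# Example (placeholder paths):
# verify_backup "$HOME/Documents" /media/backup/Documents
```

If the backup is an archive or made by a backup tool rather than a straight copy, this won't apply as-is; restore to a scratch directory first and compare that.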

I was wrong about fsck: it does run after so many boots or an unclean shutdown (see below for more), but it's usually just replaying the journal (which modern filesystems keep). So it does the fsck, but you usually never see it unless it needs your intervention.

If you experience it again and again in a short time, Bill, I'd look at the bottom of this page for some instructions. Note that if it's the root partition that it's happening on, you'll have to boot a live disk to do the repair. And you'll need to know which partition it's on. You can do this graphically or, at a terminal prompt, type (where X is the actual drive letter):

Code:
sudo fdisk -l /dev/sdX
It'll show you the partitions on that drive. The drive letter might change when you boot up from a live disk, so try sda and sdb, for example. Don't include the partition number, which comes after the letter (i.e., don't do /dev/sda1). You can also do 'sudo fdisk -l' to list all the drives and partitions, but that might be messy. If it won't fit on one screen, pipe it through less like this:

Code:
sudo fdisk -l | less
The character between the '-l' and 'less' is the pipe, entered by pressing <shift> <\> (on US English keyboards).
Buster: if it ever happens again, use this option with fsck (followed by the partition, as you did before) so you don't have to repeatedly press 'Y':

Code:
fsck -p
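That's for an ext4 filesystem (or ext2/3), which you're likely using since it's the default. Note that -p only fixes problems that are safe to fix without asking; if it hits something that needs a decision, it prints the problem and stops, and you'll have to run fsck by hand after all.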
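
To tell whether an fsck run fixed everything, you can look at its exit status, which the fsck man page describes as a sum of flags. Here is a minimal sketch that decodes it; `explain_fsck_status` is a made-up helper name, and the device in the example is just the one from the first post.

```shell
#!/bin/sh
# Sketch: decode fsck's exit status, which is a sum of the flags in fsck(8).
# explain_fsck_status is a hypothetical helper name.
explain_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    # Each flag is one bit of the status, so test the bits one by one.
    [ $((status & 1)) -ne 0 ] && echo "errors corrected"
    [ $((status & 2)) -ne 0 ] && echo "system should be rebooted"
    [ $((status & 4)) -ne 0 ] && echo "errors left uncorrected"
    [ $((status & 8)) -ne 0 ] && echo "operational error"
    [ $((status & 16)) -ne 0 ] && echo "usage or syntax error"
    [ $((status & 32)) -ne 0 ] && echo "checking canceled by user"
    [ $((status & 128)) -ne 0 ] && echo "shared-library error"
    return 0
}

# Example (device name taken from the first post; adjust to yours):
# sudo fsck -p /dev/sda5; explain_fsck_status $?
```

If "errors left uncorrected" shows up, that's the case where -p gave up and a manual fsck is needed.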

Fsck errors are usually caused by power outages. However, the Arch page says that an fsck occurs every 30 boots automatically. I don't know if that's just with Arch or all Linux distros. But if the last field in your /etc/fstab for that partition is anything other than 0, an fsck will occur after so many boots. That number specifies the order and is 1 or 2: the root (/) should be 1, and the others will be 2 unless they're never checked (that's the 0). The 2s are then checked in the order in which they appear.
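
To see what that last field looks like on your own system, something like the following could list each mount point with its pass number. It's only a sketch: `list_fsck_order` is a made-up helper name, and it treats a missing sixth field as 0, which is how the fstab man page says it is interpreted.

```shell
#!/bin/sh
# Sketch: show the fsck pass number (sixth field) for each entry in an fstab file.
# list_fsck_order is a hypothetical helper name; it defaults to /etc/fstab.
list_fsck_order() {
    fstab=${1:-/etc/fstab}
    # Skip comments and blank lines; a missing sixth field counts as 0 (never checked).
    # The stable sort keeps entries with the same number in file order.
    awk '!/^[[:space:]]*#/ && NF >= 2 {
        pass = (NF >= 6) ? $6 : 0
        print pass, $2
    }' "$fstab" | sort -s -n -k1,1
}

# Example:
# list_fsck_order
```

How many mounts pass between full checks is a separate ext filesystem setting, not part of fstab. 'sudo tune2fs -l' on the partition prints it as "Maximum mount count"; on recent installs it is often -1, meaning no count-based check, but check your own output rather than take my word for it.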

But with a journaling filesystem, like all modern filesystems including ext4, fsck usually just replays the journal. If it's a more serious error, a full fsck will be required. If a full fsck is required a lot, it's likely that there is corruption on the disk (probably in the journal). Why is it happening? Who knows. Bad disk, bad memory? I doubt it's the kernel, Bill. Kernels wouldn't cause disk corruption. Linus would never allow a kernel that did that to proceed beyond the development version.
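
One way to narrow down "bad disk, bad memory?" is to see what the kernel logged before the trouble started. The sketch below filters kernel log text for filesystem and disk error lines. `find_disk_errors` is a made-up helper name, and the patterns are my guess at the most common messages, not a complete list.

```shell
#!/bin/sh
# Sketch: pick filesystem and disk error lines out of kernel log text on stdin.
# find_disk_errors is a hypothetical helper name; the patterns are an assumption
# about typical messages, not an exhaustive list.
find_disk_errors() {
    grep -iE 'EXT4-fs (error|warning)|JBD2|I/O error' || echo "nothing suspicious found"
}

# Example: kernel messages from the previous boot (needs a persistent journal):
# journalctl -k -b -1 | find_disk_errors
```

Lots of "I/O error" lines would point more toward the drive or its cable; filesystem errors with no I/O errors around them would make me more suspicious of memory, though that's a rule of thumb and not a diagnosis.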

Your reinstall fixed it because it fixed the file corruption and likely didn't write over the same area of the disk. SSDs use wear-levelling, which means that when new data is written, it goes to unused or less-used cells. So a reinstall likely won't be written to the same cells. If there is a problem with some parts of the drive, you won't see it until those cells are written to again.