Koozali.org: home of the SME Server

Recovery after crash: can't access mail; raid 1 set impaired.

Offline julianop

I have just reconstructed my SME server after a crash last Friday of the system drive (failed sectors, it appears). I took the opportunity to upgrade from 8 to 9.2, and have two problems, for which I'm asking for help here: 1) users are unable to connect to the mail system, and 2) a trashed RAID1 (inadvertently self-induced, I believe).

I'll give the background here, as it relates to both problems, but fully anticipate being instructed to split the discussion into two threads.

My installation for several years was SME 8 through several updates, into which I had introduced a manually constructed RAID 1 array to satisfy the links "/home/e-smith/files/users", "/home/e-smith/files/primary", "/home/e-smith/files/ibays/media", and "/home/e-smith/files/ibays/pub".
All was well.
The mapping in /etc/fstab is now:
"/dev/md127    /mnt/mymd0    ext4  defaults    0 0"

("/mnt/mymd0" is just the legacy name of the mount point; with SME 8, raid md0 was not used for "/". I needed to keep that name for the backup recovery. I'm aware of the use of md0 as a raid structure name for "/", and will change my mount point name to avoid confusion once I'm properly up and running.)

All user parent directories and their subs - "home" and "Maildir" - are therefore on the raid array. This has worked satisfactorily for a number of years, except that when recovering after the crash, I discovered that the "backup to workstation" didn't follow those links, so while I have a backup of the server configuration, and some data that wasn't on the raid array, I don't have email or file backups for my users. That was a stupid mistake: I should have known better than to not check.
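For anyone hitting the same trap, here is a minimal sketch of an archive that follows symlinks instead of storing them as links. It uses plain tar with -h (--dereference) and is not the SME "backup to workstation" mechanism; the function name and the example paths are hypothetical.

```shell
#!/bin/sh
# archive_following_links SRC_DIR ARCHIVE   (hypothetical helper name)
# Creates a tar archive of SRC_DIR in which symlinks are replaced by
# the files and directories they point to.
archive_following_links() {
    src=$1
    archive=$2
    # -h (--dereference) stores the link target's contents, not the link itself.
    # Keep ARCHIVE outside SRC_DIR so the archive does not try to include itself.
    tar -chf "$archive" -C "$src" .
}

# Example (assumed paths, adjust to your own layout):
#   archive_following_links /home/e-smith/files /mnt/external/files.tar
#   tar -tf /mnt/external/files.tar | head
```

Listing the archive with "tar -tf" afterwards is a quick way to confirm that the linked user directories really ended up inside it.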

In my rebuild, I had performed a restore to my clean install of 9.2, but at that point didn't realize that SME 9.2 defaults to 1.2 superblocks, and my raid 1 pair was set up with 0.90. The superblocks show up as bad now, and I can't mount the raid array (though it assembles...). Perhaps through my own ignorance, there was no "/etc/mdadm.conf" or other information to tell 9.2 that the classic 0.90 format was in use.
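One way to check which metadata version a member partition actually carries is to read it from "mdadm --examine". The sketch below only parses that command's output; the helper name and the /dev/sdX1 device are placeholders, not taken from this system.

```shell
#!/bin/sh
# md_metadata_version: reads `mdadm --examine` output on stdin and prints
# the value of its "Version" line (for example 0.90.00 or 1.2).
md_metadata_version() {
    awk -F' : ' '/^[[:space:]]*Version[[:space:]]*:/ { print $2; exit }'
}

# Example, as root, with /dev/sdX1 standing in for a real member partition:
#   mdadm --examine /dev/sdX1 | md_metadata_version
# To see how the running system describes its assembled arrays:
#   mdadm --detail --scan
```

Running it against each member of the old pair shows whether both halves agree before anything is written to them.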

So, the server identity and the user configurations are restored, but I had to reconstruct the users' "Maildir" directory structure on the raid array. I did this by using the admin tool to create a new, dummy user, and then using "rsync -av" to replicate that structure in each of the users' directories. I then adjusted the ownership of each Maildir directory recursively to <username>:<username>.
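The Maildir reconstruction described above could be scripted roughly as follows. This is a sketch under assumptions: it uses find and mkdir rather than the "rsync -av" from the post, so only the directory skeleton is copied (no files from the dummy user, and source permissions are not preserved), and the helper name and example paths are hypothetical.

```shell
#!/bin/sh
# seed_maildir TEMPLATE_HOME USER_HOME [OWNER]   (hypothetical helper name)
# Recreates the directory tree of TEMPLATE_HOME/Maildir under USER_HOME/Maildir.
seed_maildir() {
    template=$1
    target=$2
    owner=${3:-}
    if [ ! -d "$template/Maildir" ]; then
        echo "no Maildir found in $template" >&2
        return 1
    fi
    mkdir -p "$target/Maildir" || return 1
    # Walk the template and create each directory; files are deliberately skipped.
    (cd "$template/Maildir" && find . -type d) | while IFS= read -r dir; do
        mkdir -p "$target/Maildir/$dir"
    done
    # Hand the tree to the user, as in the post (<username>:<username>).
    if [ -n "$owner" ]; then
        chown -R "$owner:$owner" "$target/Maildir" || return 1
    fi
    return 0
}

# Example (assumed paths and user names):
#   seed_maildir /home/e-smith/files/users/dummy /home/e-smith/files/users/alice alice
```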

At this point, incoming mail is beginning to flow into the users' "Maildir/new" folders, and SpamAssassin is heroically diverting bad items to Maildir/.junkmail/new. The problem is that I cannot connect to the server using Thunderbird or by webmail, as I have done in the past.

In dealing with the crash I have been piling through information looking for clues but confess I am fatigued, and need help:
First order of business is to get my users and me access to our email, second is to recover previous email records and my data files from the old raid pair.
/var/log/dovecot/current has a zillion identical errors; perhaps I should start here:
@400000005b15479d2495b70c Fatal: service(imap-login) User doesn't exist: dovenull (See default_login_user setting)
@400000005b15479d24961c9c master: Fatal: service(imap-login) User doesn't exist: dovenull (See default_login_user setting)
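The log message says the account named by default_login_user (dovenull) does not exist, so a first check is whether the system knows that account at all. A minimal sketch, with a hypothetical helper name; why the account would be missing after the restore is not established in this thread.

```shell
#!/bin/sh
# missing_accounts NAME...   (hypothetical helper name)
# Prints each given account name that the system cannot resolve.
missing_accounts() {
    for name in "$@"; do
        # `id -u` fails when the account is unknown to the system.
        id -u "$name" >/dev/null 2>&1 || echo "$name"
    done
}

# Example: check the accounts dovecot normally relies on.
#   missing_accounts dovecot dovenull
```

No output means every listed account exists; any name printed is one the dovecot login service would fail on.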

Thunderbird says "Could not connect to mail server spencer; the connection was refused." (spencer is the name of the server). Local DNS is provided by my router, and the client computer has no trouble resolving the name and getting ping replies from the server. However, whenever I try to access the server to remap drives, Windows 7 Explorer crashes and restarts.


« Last Edit: June 04, 2018, 04:21:45 PM by julianop »

Offline julianop

Re: Recovery after crash: can't access mail; raid 1 set impaired.
« Reply #1 on: June 06, 2018, 04:02:44 AM »
TLDR? OK, well, I fixed it.
I never did find out why the fresh install wouldn't respond to IMAP requests or why it caused Windows Explorer to crash; I simply repeated the installation of 9.2, which solved that problem. However...
I let the installer handle the RAID architecture, which was a big mistake: I gave it a new 250G drive for system (I had hoped it would do more than simply put "/boot" there, but...) and a pair of new 2T's for "/". What it did was to make two three-device RAID 1 arrays, with 232.5G from each drive, ignoring the major part of the 2T's:

[root@spencer log]# lsblk
sdc                      8:32   0   1.8T  0 disk
├─sdc1                   8:33   0   250M  0 part
│ └─md0                  9:0    0   250M  0 raid1 /boot
└─sdc2                   8:34   0   1.8T  0 part
  └─md1                  9:1    0 232.5G  0 raid1
    ├─main-root (dm-0) 253:0    0 225.7G  0 lvm   /
    └─main-swap (dm-1) 253:1    0   6.8G  0 lvm   [SWAP]
sdb                      8:16   0   1.8T  0 disk
├─sdb1                   8:17   0   250M  0 part
│ └─md0                  9:0    0   250M  0 raid1 /boot
└─sdb2                   8:18   0   1.8T  0 part
  └─md1                  9:1    0 232.5G  0 raid1
    ├─main-root (dm-0) 253:0    0 225.7G  0 lvm   /
    └─main-swap (dm-1) 253:1    0   6.8G  0 lvm   [SWAP]
sda                      8:0    0 232.9G  0 disk
├─sda1                   8:1    0   250M  0 part
│ └─md0                  9:0    0   250M  0 raid1 /boot
└─sda2                   8:2    0 232.7G  0 part
  └─md1                  9:1    0 232.5G  0 raid1
    ├─main-root (dm-0) 253:0    0 225.7G  0 lvm   /
    └─main-swap (dm-1) 253:1    0   6.8G  0 lvm   [SWAP]

That's what I get for letting the installer know better, I guess. I'll fix it later; I've lost too much sleep over the past couple of days.
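The 232.5G figure is consistent with how RAID 1 works: every member holds a full copy, so the array can be no larger than its smallest member (here sda2 on the 250G drive), minus a little for metadata. A tiny sketch of that arithmetic, with a hypothetical helper name and sizes rounded from the lsblk listing above:

```shell
#!/bin/sh
# raid1_usable_size SIZE...   (hypothetical helper name)
# Prints the smallest of the given member sizes (whole numbers, same unit),
# which is the upper bound on a RAID 1 array built from those members.
raid1_usable_size() {
    smallest=$1
    for size in "$@"; do
        if [ "$size" -lt "$smallest" ]; then
            smallest=$size
        fi
    done
    echo "$smallest"
}

# Member sizes in GiB, rounded down from lsblk: sdc2, sdb2, sda2
raid1_usable_size 1843 1843 232   # prints 232
```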

And the trashed RAID array? In the end, all I needed was

# dumpe2fs /dev/sda2 | grep superblock

followed by

# fsck -b 32768 /dev/sda2

followed by a lot of "y" responses, on one of the drives, which I then mounted on the fresh 9.2 system to copy all my stuff back.
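For reference, the two commands above can be tied together with a small filter that pulls the backup superblock locations out of the dumpe2fs listing. This is a sketch only: the helper name is hypothetical, /dev/sdX2 is a placeholder, and the read-only "e2fsck -n" pass is an extra precaution that was not part of the original recovery.

```shell
#!/bin/sh
# backup_superblocks: reads `dumpe2fs` output on stdin and prints the block
# number of every backup superblock, one per line.
backup_superblocks() {
    awk '/Backup superblock at/ { sub(/,$/, "", $4); print $4 }'
}

# Example, as root, with /dev/sdX2 standing in for the damaged partition:
#   dumpe2fs /dev/sdX2 2>/dev/null | backup_superblocks
# Dry run against one of the reported blocks (-n answers "no" to everything):
#   e2fsck -n -b 32768 /dev/sdX2
# Real repair, as in the post:
#   fsck -b 32768 /dev/sdX2
```

If the first backup block is itself damaged, the same check can be repeated with the next number in the list.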
I'll figure out how to tell the backup to back up ALL my data next.