Koozali.org: home of the SME Server
Contribs.org Forums => Koozali SME Server 10.x => Topic started by: pmulroney on September 30, 2023, 10:58:08 AM
-
Hi Everyone,
A few days ago, I started getting this error every 5 minutes on my server, sent as an email with the subject: "Cron <root@fileserver> /usr/bin/php /usr/bin/horde-alarms"
Fatal Error:
Server error when querying database.
In /usr/share/pear/Horde/Alarm/Sql.php on line 91
1. Horde_Alarm->notify() /usr/bin/horde-alarms:26
2. Horde_Alarm->listAlarms() /usr/share/pear/Horde/Alarm.php:431
3. Horde_Alarm_Sql->_list() /usr/share/pear/Horde/Alarm.php:116
I think the hard drives are OK, no errors, only 37% full.
There was another error just before the first of these that says there was an error trying to do a database dump:
\nThere was an error trying to dump database tikiwiki, please check for table errors in this db. Forcing a backup of the corrupted DB. \nFailed to force backup of corrupted db tikiwiki as tikiwiki-failed.dump
Today I logged into the server via SSH and while I was looking at the various running processes, this message appeared:
Broadcast message from systemd-journald@fileserver.logicaldevelopments.com.au (Sat 2023-09-30 16:40:01 AWST):
HORDE[17076]: [horde] QUERY FAILED: Can't open file: './horde/horde_alarms.frm' (errno: 24)
DELETE FROM horde_alarms WHERE alarm_end IS NOT NULL AND alarm_end < '2023-09-30 08:40:01' [pid 17076 on line 324 of "/usr/share/pear/Horde/Db/Adapter/Mysqli.php"]
For now, I've basically commented out the horde-alarms code so that it runs but does nothing, but I fear that I'm just masking a deeper issue.
Has MariaDB got some kind of corruption? How do I recover/rebuild it?
-
Check the db logs in /var/log/mariadb
Check /var/log/messages for what occurred prior to the error.
Also
journalctl -xe
Need to try and find the source of the error.
Repair likely depends on what's broken.
I can see a comment regarding tikiwiki, but I'm not sure how it affects horde?
It may require a db restore from backup.
Have a read of my notes on logs & looking for errors:
https://forums.koozali.org/index.php/topic,54724.0.html
https://wiki.koozali.org/How_to_report_a_problem
Get a list of installed contribs & template mods too.
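Something along these lines should gather the relevant bits (a rough sketch, assuming the default SME 10 log locations; adjust the paths if yours differ):
# recent MariaDB errors
grep -i error /var/log/mariadb/mariadb.log | tail -n 50
# related entries in the system log
grep -iE 'mysql|mariadb' /var/log/messages | tail -n 50
# installed contribs and custom/modified templates
/sbin/e-smith/audittools/newrpms
/sbin/e-smith/audittools/templates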
-
You need to take action soon, as a corrupt db prevents the whole system from backing up.
You can either use the GUI with phpMyAdmin or the CLI using mysqlcheck.
First check which table is corrupt, then use repair.
If repair fails, restore that table from backup. This might mean opening the horde db dump and copy/pasting only the relevant part.
As for the cause: check that your hard drive is not failing, and remember whether you had any power outage.
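A rough sketch of the CLI route (this assumes horde_alarms turns out to be the corrupt table; substitute whatever mysqlcheck actually flags):
# find corrupt tables across all databases
mysqlcheck --all-databases
# try to repair the suspect table (works for MyISAM/Aria tables)
mysqlcheck --repair horde horde_alarms
# or the same thing from the mysql client
mysql -e "REPAIR TABLE horde.horde_alarms;"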
-
I knew JP would suggest something sensible!! :lol:
-
Check the db logs in /var/log/mariadb
Checking the db logs in /var/log/mariadb, I get lots of these types of errors on the tikiwiki database:
230930 1:00:47 [ERROR] mysqld: Can't open file: './tikiwiki/tiki_rss_modules.frm' (errno: 24)
230930 1:00:47 [ERROR] mysqld: Can't open file: './tikiwiki/tiki_search_stats.frm' (errno: 24)
... and a bit later on the same with horde, although it's always the same table:
230930 1:25:01 [ERROR] mysqld: Can't open file: './horde/horde_alarms.frm' (errno: 24)
230930 1:30:02 [ERROR] mysqld: Can't open file: './horde/horde_alarms.frm' (errno: 24)
If you go back further in time, there's a log from a month ago with these entries:
230825 15:59:16 InnoDB: The InnoDB memory heap is disabled
230825 15:59:16 InnoDB: Mutexes and rw_locks use GCC atomic builtins
230825 15:59:16 InnoDB: Compressed tables use zlib 1.2.7
230825 15:59:16 InnoDB: Using Linux native AIO
230825 15:59:16 InnoDB: Initializing buffer pool, size = 128.0M
230825 15:59:16 InnoDB: Completed initialization of buffer pool
230825 15:59:16 InnoDB: highest supported file format is Barracuda.
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...
230825 15:59:33 InnoDB: Waiting for the background threads to start
230825 15:59:34 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 508399285
230825 15:59:35 [Note] Plugin 'FEEDBACK' is disabled.
230825 15:59:40 InnoDB: Starting shutdown...
230825 15:59:43 InnoDB: Shutdown completed; log sequence number 508399285
230825 15:59:43 InnoDB: The InnoDB memory heap is disabled
230825 15:59:43 InnoDB: Mutexes and rw_locks use GCC atomic builtins
230825 15:59:43 InnoDB: Compressed tables use zlib 1.2.7
230825 15:59:43 InnoDB: Using Linux native AIO
230825 15:59:43 InnoDB: Initializing buffer pool, size = 128.0M
230825 15:59:43 InnoDB: Completed initialization of buffer pool
230825 15:59:43 InnoDB: highest supported file format is Barracuda.
230825 15:59:43 InnoDB: Waiting for the background threads to start
230825 15:59:44 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 508399285
230825 15:59:44 [Note] Plugin 'FEEDBACK' is disabled.
230825 15:59:44 [Note] Server socket created on IP: '0.0.0.0'.
230825 15:59:45 [Note] Event Scheduler: Loaded 0 events
230825 15:59:45 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.68-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
However, I'm not sure if that's a problem or not, because the recurring error only started a few days ago, not a month ago.
There was a power failure about a month ago, long enough for the UPS to fail as well. No issues until recently.
Check /var/log/messages for what occurred prior to the error.
/var/log/messages contains these messages:
Sep 25 01:29:08 fileserver smartd[799]: Device: /dev/sdb [SAT], SMART Usage Attribute: 7 Seek_Error_Rate changed from 100 to 200
Sep 25 03:29:08 fileserver smartd[799]: Device: /dev/sda [SAT], SMART Usage Attribute: 7 Seek_Error_Rate changed from 100 to 200
Sep 25 18:29:08 fileserver smartd[799]: Device: /dev/sdb [SAT], SMART Usage Attribute: 9 Power_On_Hours changed from 87 to 86
Oct 1 01:20:49 fileserver smartd[801]: Device: /dev/sda [SAT], SMART Usage Attribute: 7 Seek_Error_Rate changed from 200 to 100
Oct 1 01:20:49 fileserver smartd[801]: Device: /dev/sdb [SAT], SMART Usage Attribute: 7 Seek_Error_Rate changed from 200 to 100
Is that bad?
Then there are entries like this which point to a corrupt database:
Sep 28 01:00:01 fileserver esmith::event[14688]: Running event handler: /etc/e-smith/events/pre-backup/S20mysql-dump-tables
Sep 28 01:00:14 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'SHOW TRIGGERS LIKE 'tiki\_referer\_stats'': Can't read dir of './tikiwiki/' (errno: 24) (1018)
Sep 28 01:00:14 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'SHOW TRIGGERS LIKE 'galaxia\_activities'': Can't read dir of './tikiwiki/' (errno: 24) (1018)
Sep 28 01:00:14 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'SHOW TRIGGERS LIKE 'galaxia\_activity\_roles'': Can't read dir of './tikiwiki/' (errno: 24) (1018)
Sep 28 01:00:14 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'SHOW TRIGGERS LIKE 'galaxia\_instance\_activities'': Can't read dir of './tikiwiki/' (errno: 24) (1018)
Sep 28 01:00:14 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'SHOW TRIGGERS LIKE 'galaxia\_instance\_comments'': Can't read dir of './tikiwiki/' (errno: 24) (1018)
Sep 28 01:00:15 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'show fields from `tiki_related_categories`': Out of resources when opening file '/var/tmp/#sql_882_0.MAI' (Errcode: 24) (23)
Sep 28 01:00:15 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'SHOW TRIGGERS LIKE 'tiki\_related\_categories'': Out of resources when opening file '/var/tmp/#sql_882_0.MAI' (Errcode: 24) (23)
Sep 28 01:00:15 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'show table status like 'tiki\_rss\_modules'': Can't read dir of './tikiwiki/' (errno: 24) (1018)
Sep 28 01:00:15 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'show create table `tiki_rss_modules`': Can't open file: './tikiwiki/tiki_rss_modules.frm' (errno: 24) (1016)
Sep 28 01:00:15 fileserver esmith::event[14688]: \nThere was an error trying to dump database tikiwiki, please check for table errors in this db. Forcing a backup of the corrupted DB. \nFailed to force backup of corrupted db tikiwiki as tikiwiki-failed.dump
Sep 28 01:00:15 fileserver esmith::event[14688]: mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_882_2.MAI' (Errcode: 24)' when trying to dump tablespaces
Sep 28 01:00:15 fileserver esmith::event[14688]: mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_882_2.MAI' (Errcode: 24)' when trying to dump tablespaces
Sep 28 01:00:15 fileserver esmith::event[14688]: mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_882_2.MAI' (Errcode: 24)' when trying to dump tablespaces
Sep 28 01:00:15 fileserver esmith::event[14688]: mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_882_2.MAI' (Errcode: 24)' when trying to dump tablespaces
Sep 28 01:00:15 fileserver esmith::event[14688]: S20mysql-dump-tables=action|Event|pre-backup|Action|S20mysql-dump-tables|Start|1695834001 584707|End|1695834015 896282|Elapsed|14.311575
Also
journalctl -xe
# journalctl -xe
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: http://www.freedesktop.org/wiki/Software/systemd/multiseat
--
-- A new session with the ID 1046 has been created for the user pmulroney.
--
-- The leading process of the session is 20062.
Oct 01 06:38:59 fileserver.logicaldevelopments.com.au systemd[1]: Started Session 1046 of user pmulroney.
-- Subject: Unit session-1046.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit session-1046.scope has finished starting up.
--
-- The start-up result is done.
Oct 01 06:38:59 fileserver.logicaldevelopments.com.au sshd[20062]: pam_unix(sshd:session): session opened for user pmulroney by (uid=0)
Oct 01 06:38:59 fileserver.logicaldevelopments.com.au kernel: usb 1-1.6: new low-speed USB device number 51 using ehci-pci
Oct 01 06:38:59 fileserver.logicaldevelopments.com.au kernel: usb 1-1.6: device descriptor read/64, error -32
Oct 01 06:39:00 fileserver.logicaldevelopments.com.au kernel: usb 1-1.6: device descriptor read/64, error -32
Oct 01 06:39:00 fileserver.logicaldevelopments.com.au kernel: usb 1-1-port6: attempt power cycle
The last four lines are repeated over and over. I have a KVM that connects to three servers, and I get these errors a lot, so I think it's background noise.
Get a list of installed contribs & template mods too.
# /sbin/e-smith/audittools/newrpms
Loaded plugins: fastestmirror, post-transaction-actions, priorities, smeserver
Loading mirror speeds from cached hostfile
* base: mirror.internode.on.net
* smeaddons: smeserver.de-labrusse.fr
* smeos: smeserver.de-labrusse.fr
* smeupdates: smeserver.de-labrusse.fr
* updates: mirror.internode.on.net
Extra Packages
GeoIP.x86_64 1.6.12-9.el7.sme @smecontribs
GeoIP-GeoLite-data.noarch 2018.06-7.el7.sme @smecontribs
GeoIP-GeoLite-data-extra.noarch 2018.06-7.el7.sme @smecontribs
php-php-gettext.noarch 1.0.12-1.el7 @smecontribs
php-tcpdf.noarch 6.2.26-1.el7 @smecontribs
php-tcpdf-dejavu-sans-fonts.noarch 6.2.26-1.el7 @smecontribs
phpMyAdmin.noarch 5.1.0-1.el7.sme @smecontribs
smeserver-git.noarch 1.2.0-11.el7.sme @smecontribs
smeserver-gitweb.noarch 1.1.0-14.el7.sme @smetest
smeserver-gitweb-theme.noarch 1.1.0-1.el7.sme @smetest
smeserver-mod_dav.noarch 1.1-8.el7.sme @smecontribs
smeserver-nfs.noarch 1.2.0-21.el7.sme @smecontribs
smeserver-phpmyadmin.noarch 4.0.10.2-13.el7.sme @smecontribs
smeserver-userpanel.noarch 1.4-6.el7.sme @smeupdates
smeserver-webhosting.noarch 0.0.9-16.el7.sme @smecontribs
smeserver-wordpress.noarch 1.2-12.el7.sme @smecontribs
smeserver-wsdd.noarch 0.2-5.el7.sme @smecontribs
wsdd.noarch 0.7.0-1.el7 @smecontribs
# /sbin/e-smith/audittools/templates
/etc/e-smith/templates-custom/etc/dar/DailyBackup.dcf/45prune: MANUALLY_ADDED, OVERRIDE
It may require a db restore from backup.
Agreed. The tikiwiki database is a leftover from many years ago - we moved to dokuwiki for our intranet - so that database could just be removed. I reckon I can restore horde_alarms from another SME server, since I'm fairly sure it's basically empty anyway.
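If I go that route, this is roughly what I have in mind - a last-chance dump of tikiwiki before dropping it (the dump file names below are just placeholders):
# best-effort dump before deleting anything (--force continues past table errors)
mysqldump --force tikiwiki > /root/tikiwiki-final.sql
# drop the unused database
mysql -e "DROP DATABASE tikiwiki;"
# restore just the horde_alarms table from a dump taken on the other SME server
mysql horde < horde_alarms-from-other-server.sql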
-
You need to take action soon, as a corrupt db prevents the whole system from backing up.
You can either use the GUI with phpMyAdmin or the CLI using mysqlcheck.
First check which table is corrupt, then use repair.
If repair fails, restore that table from backup. This might mean opening the horde db dump and copy/pasting only the relevant part.
As for the cause: check that your hard drive is not failing, and remember whether you had any power outage.
I ran mysqlcheck --all-databases yesterday and there were no errors. Today I ran the same command and a bunch of database tables were reported as corrupt. Very weird!
How can I tell if the hard drive is failing?
[root@fileserver ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[2] sda2[0]
1952869376 blocks super 1.2 [2/2] [UU]
bitmap: 2/15 pages [8KB], 65536KB chunk
md0 : active raid1 sdb1[2] sda1[0]
510976 blocks super 1.2 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
unused devices: <none>
[root@fileserver ~]# smartctl -a /dev/sda
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 197 197 021 Pre-fail Always - 3116
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 20
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
9 Power_On_Hours 0x0032 074 074 000 Old_age Always - 19488
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 19
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 14
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 43
194 Temperature_Celsius 0x0022 118 107 000 Old_age Always - 29
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
-
FWIW the SMART values for sda look OK to me; what about the SMART values for sdb?
That said, given the RAID status is still OK, I would lean more towards an earlier file/db corruption that is being faithfully replicated on your otherwise healthy RAID array.
-
First thing is to lose the tikiwiki db.
Then try a repair on Horde.
If that fails, restore the Horde db from a backup.
I can't see any particular disk errors.
-
It might be worth running smartctl -t short /dev/sda (and the same for sdb).
RAID is able to keep the two drives in sync; however, if the initial write/read is bad, it will replicate the same corruption onto the second drive.
As pointed out, get rid of the db you do not need, and try to repair the horde_alarms table.
It could also simply be that the open file limit has been reached:
https://cloudblue.freshdesk.com/support/solutions/articles/44001885703-mysql-errors-appear-on-websites-can-t-open-file-table001-frm-errno-24-
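For the self-test, something like this (the short test only takes a couple of minutes; repeat for /dev/sdb):
smartctl -t short /dev/sda
# after a few minutes, read the self-test log
smartctl -l selftest /dev/sda
And on the open file limit angle: errno 24 is the OS "Too many open files" error. If the perror utility that ships with MariaDB is installed, running perror 24 should confirm that, which would fit the pattern of different tables failing at different times.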
-
It might be worth running smartctl -t short /dev/sda (and the same for sdb).
RAID is able to keep the two drives in sync; however, if the initial write/read is bad, it will replicate the same corruption onto the second drive.
As pointed out, get rid of the db you do not need, and try to repair the horde_alarms table.
It could also simply be that the open file limit has been reached:
https://cloudblue.freshdesk.com/support/solutions/articles/44001885703-mysql-errors-appear-on-websites-can-t-open-file-table001-frm-errno-24-
I think the MySQL/MariaDB open file limit is the issue. I did the following:
[root@fileserver ~]# mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 53
Server version: 5.5.68-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> SHOW VARIABLES LIKE 'open_files_limit';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| open_files_limit | 962 |
+------------------+-------+
1 row in set (0.00 sec)
Hmmm ... very low! I then figured out that this is a config database setting:
[root@fileserver etc]# config getprop mariadb OpenFilesLimit
[root@fileserver etc]# config setprop mariadb OpenFilesLimit 2048
[root@fileserver etc]# expand-template /etc/my.cnf
and verified that it placed this in the file:
open_files_limit=2048
Then ran the following:
[root@fileserver etc]# systemctl stop mariadb
[root@fileserver etc]# systemctl start mariadb
Then I verified that the open_files_limit setting had changed to 2048, and ran the following:
[root@fileserver etc]# mysqlcheck --all-databases
And all databases this time came up clean.
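For what it's worth, you can sanity-check how close the server runs to the new limit with something like this (assuming the root mysql client connects without a password, as on a stock SME install):
mysql -e "SHOW GLOBAL VARIABLES LIKE 'open_files_limit';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Open_files';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Open_tables';"
If Open_files sits anywhere near open_files_limit, the errno 24 errors are likely to come back.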
I thought it might be a drive failure, so I started the process of swapping drives. I purchased 2 x 2TB drives, swapped out one of the current drives for a new one, and the RAID array started rebuilding. Now I'm wondering if I really needed to do this?
In the recent past we added a couple of new MySQL databases to the server, but I didn't think that would cause a weird issue like this - would it?
Regards,
Paul.
-
Interesting, as if you do this using mysql105:
[root@myhome ~]# mysql105
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 1028
Server version: 10.5.9-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> SHOW VARIABLES LIKE 'open_files_limit';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| open_files_limit | 32184 |
+------------------+-------+
1 row in set (0.031 sec)
-
Interesting.
An old v9 server gives:
| open_files_limit | 1024 |
v10 maria (upgraded from v9)
| open_files_limit | 962 |
v10 maria 105
| open_files_limit | 32186 |
I know JP can't answer right now as he is off grid for a little while, but this may need some investigation?
-
Bug this so we have 32186 or 10000 as the default, to avoid the backup issue.
-
https://bugs.koozali.org/show_bug.cgi?id=12417
-
Hi Everyone,
Thank you for all your help! I was starting to think I was going crazy!
Regards,
Paul.
-
Hi Paul
Can you drop some feedback in the bug as to the result of setting the open files number to the higher limit? It will go towards verification, as reproducing the error for test purposes needs the right scenario.
Thanks mate