Koozali.org: home of the SME Server
Contribs.org Forums => Koozali SME Server 10.x => Topic started by: pmulroney on September 30, 2023, 10:58:08 AM
-
Hi Everyone,
A few days ago, I started getting this error every 5 minutes on my server, sent as an email with the subject: "Cron <root@fileserver> /usr/bin/php /usr/bin/horde-alarms"
Fatal Error:
Server error when querying database.
In /usr/share/pear/Horde/Alarm/Sql.php on line 91
1. Horde_Alarm->notify() /usr/bin/horde-alarms:26
2. Horde_Alarm->listAlarms() /usr/share/pear/Horde/Alarm.php:431
3. Horde_Alarm_Sql->_list() /usr/share/pear/Horde/Alarm.php:116
I think the hard drives are OK, no errors, only 37% full.
There was another error just before the first of these that says there was an error trying to do a database dump:
\nThere was an error trying to dump database tikiwiki, please check for table errors in this db. Forcing a backup of the corrupted DB. \nFailed to force backup of corrupted db tikiwiki as tikiwiki-failed.dump
Today I logged into the server via SSH and while I was looking at the various running processes, this message appeared:
Broadcast message from systemd-journald@fileserver.logicaldevelopments.com.au (Sat 2023-09-30 16:40:01 AWST):
HORDE[17076]: [horde] QUERY FAILED: Can't open file: './horde/horde_alarms.frm' (errno: 24)
DELETE FROM horde_alarms WHERE alarm_end IS NOT NULL AND alarm_end < '2023-09-30 08:40:01' [pid 17076 on line 324 of "/usr/share/pear/Horde/Db/Adapter/Mysqli.php"]
For now, I've basically commented out the horde-alarms code so that it runs but does nothing, but I fear that I'm just masking a deeper issue.
Has MariaDB got some kind of corruption? How do I recover/rebuild it?
-
Check the db logs in /var/log/mariadb
Check /var/log/messages for what occurred prior to the error.
Also
journalctl -xe
Need to try and find the source of the error.
Repair likely depends on what's broken.
I can see a comment regarding tikiwiki, but I'm not sure how it affects horde?
It may require a db restore from backup.
Have a read of my notes on logs & looking for errors:
https://forums.koozali.org/index.php/topic,54724.0.html
https://wiki.koozali.org/How_to_report_a_problem
Get a list of installed contribs & template mods too.
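Something along these lines should gather the relevant bits (a rough sketch, assuming the default SME 10 log locations; adjust the paths if yours differ):
# recent MariaDB errors
grep -i error /var/log/mariadb/mariadb.log | tail -n 50
# related entries in the system log
grep -iE 'mysql|mariadb' /var/log/messages | tail -n 50
# installed contribs and custom/modified templates
/sbin/e-smith/audittools/newrpms
/sbin/e-smith/audittools/templates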
-
You need to take action soon, as a corrupt db prevents the whole system from backing up.
You can either use the GUI with phpMyAdmin or the CLI using mysqlcheck.
First check which table is corrupt, then use repair.
If repair fails, restore that table from backup. This might mean opening the horde db dump and copy/pasting only the relevant part.
As for the cause: check that your hard drive is not failing, and remember whether you had any power outage.
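A rough sketch of the CLI route (this assumes horde_alarms turns out to be the corrupt table; substitute whatever mysqlcheck actually flags):
# find corrupt tables across all databases
mysqlcheck --all-databases
# try to repair the suspect table (works for MyISAM/Aria tables)
mysqlcheck --repair horde horde_alarms
# or the same thing from the mysql client
mysql -e "REPAIR TABLE horde.horde_alarms;"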
-
I knew JP would suggest something sensible!! :lol:
-
Check the db logs in /var/log/mariadb
Checking the db logs in /var/log/mariadb, I get lots of these types of errors on the tikiwiki database:
230930 1:00:47 [ERROR] mysqld: Can't open file: './tikiwiki/tiki_rss_modules.frm' (errno: 24)
230930 1:00:47 [ERROR] mysqld: Can't open file: './tikiwiki/tiki_search_stats.frm' (errno: 24)
... and a bit later on the same with horde, although it's always the same table:
230930 1:25:01 [ERROR] mysqld: Can't open file: './horde/horde_alarms.frm' (errno: 24)
230930 1:30:02 [ERROR] mysqld: Can't open file: './horde/horde_alarms.frm' (errno: 24)
If you go back further in time, there's a log from a month ago with these entries:
230825 15:59:16 InnoDB: The InnoDB memory heap is disabled
230825 15:59:16 InnoDB: Mutexes and rw_locks use GCC atomic builtins
230825 15:59:16 InnoDB: Compressed tables use zlib 1.2.7
230825 15:59:16 InnoDB: Using Linux native AIO
230825 15:59:16 InnoDB: Initializing buffer pool, size = 128.0M
230825 15:59:16 InnoDB: Completed initialization of buffer pool
230825 15:59:16 InnoDB: highest supported file format is Barracuda.
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...
230825 15:59:33 InnoDB: Waiting for the background threads to start
230825 15:59:34 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 508399285
230825 15:59:35 [Note] Plugin 'FEEDBACK' is disabled.
230825 15:59:40 InnoDB: Starting shutdown...
230825 15:59:43 InnoDB: Shutdown completed; log sequence number 508399285
230825 15:59:43 InnoDB: The InnoDB memory heap is disabled
230825 15:59:43 InnoDB: Mutexes and rw_locks use GCC atomic builtins
230825 15:59:43 InnoDB: Compressed tables use zlib 1.2.7
230825 15:59:43 InnoDB: Using Linux native AIO
230825 15:59:43 InnoDB: Initializing buffer pool, size = 128.0M
230825 15:59:43 InnoDB: Completed initialization of buffer pool
230825 15:59:43 InnoDB: highest supported file format is Barracuda.
230825 15:59:43 InnoDB: Waiting for the background threads to start
230825 15:59:44 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 508399285
230825 15:59:44 [Note] Plugin 'FEEDBACK' is disabled.
230825 15:59:44 [Note] Server socket created on IP: '0.0.0.0'.
230825 15:59:45 [Note] Event Scheduler: Loaded 0 events
230825 15:59:45 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.68-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
However, I'm not sure if that's a problem or not, because the recurring error only started a few days ago, not a month ago.
There was a power failure about a month ago, long enough for the UPS to fail as well. No issues until recently.
Check /var/log/messages for what occurred prior to the error.
/var/log/messages contains these messages:
Sep 25 01:29:08 fileserver smartd[799]: Device: /dev/sdb [SAT], SMART Usage Attribute: 7 Seek_Error_Rate changed from 100 to 200
Sep 25 03:29:08 fileserver smartd[799]: Device: /dev/sda [SAT], SMART Usage Attribute: 7 Seek_Error_Rate changed from 100 to 200
Sep 25 18:29:08 fileserver smartd[799]: Device: /dev/sdb [SAT], SMART Usage Attribute: 9 Power_On_Hours changed from 87 to 86
Oct 1 01:20:49 fileserver smartd[801]: Device: /dev/sda [SAT], SMART Usage Attribute: 7 Seek_Error_Rate changed from 200 to 100
Oct 1 01:20:49 fileserver smartd[801]: Device: /dev/sdb [SAT], SMART Usage Attribute: 7 Seek_Error_Rate changed from 200 to 100
Is that bad?
Then there are entries like this which point to a corrupt database:
Sep 28 01:00:01 fileserver esmith::event[14688]: Running event handler: /etc/e-smith/events/pre-backup/S20mysql-dump-tables
Sep 28 01:00:14 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'SHOW TRIGGERS LIKE 'tiki\_referer\_stats'': Can't read dir of './tikiwiki/' (errno: 24) (1018)
Sep 28 01:00:14 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'SHOW TRIGGERS LIKE 'galaxia\_activities'': Can't read dir of './tikiwiki/' (errno: 24) (1018)
Sep 28 01:00:14 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'SHOW TRIGGERS LIKE 'galaxia\_activity\_roles'': Can't read dir of './tikiwiki/' (errno: 24) (1018)
Sep 28 01:00:14 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'SHOW TRIGGERS LIKE 'galaxia\_instance\_activities'': Can't read dir of './tikiwiki/' (errno: 24) (1018)
Sep 28 01:00:14 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'SHOW TRIGGERS LIKE 'galaxia\_instance\_comments'': Can't read dir of './tikiwiki/' (errno: 24) (1018)
Sep 28 01:00:15 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'show fields from `tiki_related_categories`': Out of resources when opening file '/var/tmp/#sql_882_0.MAI' (Errcode: 24) (23)
Sep 28 01:00:15 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'SHOW TRIGGERS LIKE 'tiki\_related\_categories'': Out of resources when opening file '/var/tmp/#sql_882_0.MAI' (Errcode: 24) (23)
Sep 28 01:00:15 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'show table status like 'tiki\_rss\_modules'': Can't read dir of './tikiwiki/' (errno: 24) (1018)
Sep 28 01:00:15 fileserver esmith::event[14688]: mysqldump: Couldn't execute 'show create table `tiki_rss_modules`': Can't open file: './tikiwiki/tiki_rss_modules.frm' (errno: 24) (1016)
Sep 28 01:00:15 fileserver esmith::event[14688]: \nThere was an error trying to dump database tikiwiki, please check for table errors in this db. Forcing a backup of the corrupted DB. \nFailed to force backup of corrupted db tikiwiki as tikiwiki-failed.dump
Sep 28 01:00:15 fileserver esmith::event[14688]: mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_882_2.MAI' (Errcode: 24)' when trying to dump tablespaces
Sep 28 01:00:15 fileserver esmith::event[14688]: mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_882_2.MAI' (Errcode: 24)' when trying to dump tablespaces
Sep 28 01:00:15 fileserver esmith::event[14688]: mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_882_2.MAI' (Errcode: 24)' when trying to dump tablespaces
Sep 28 01:00:15 fileserver esmith::event[14688]: mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_882_2.MAI' (Errcode: 24)' when trying to dump tablespaces
Sep 28 01:00:15 fileserver esmith::event[14688]: S20mysql-dump-tables=action|Event|pre-backup|Action|S20mysql-dump-tables|Start|1695834001 584707|End|1695834015 896282|Elapsed|14.311575
Also
journalctl -xe
# journalctl -xe
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: http://www.freedesktop.org/wiki/Software/systemd/multiseat
--
-- A new session with the ID 1046 has been created for the user pmulroney.
--
-- The leading process of the session is 20062.
Oct 01 06:38:59 fileserver.logicaldevelopments.com.au systemd[1]: Started Session 1046 of user pmulroney.
-- Subject: Unit session-1046.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit session-1046.scope has finished starting up.
--
-- The start-up result is done.
Oct 01 06:38:59 fileserver.logicaldevelopments.com.au sshd[20062]: pam_unix(sshd:session): session opened for user pmulroney by (uid=0)
Oct 01 06:38:59 fileserver.logicaldevelopments.com.au kernel: usb 1-1.6: new low-speed USB device number 51 using ehci-pci
Oct 01 06:38:59 fileserver.logicaldevelopments.com.au kernel: usb 1-1.6: device descriptor read/64, error -32
Oct 01 06:39:00 fileserver.logicaldevelopments.com.au kernel: usb 1-1.6: device descriptor read/64, error -32
Oct 01 06:39:00 fileserver.logicaldevelopments.com.au kernel: usb 1-1-port6: attempt power cycle
The last four lines are repeated over and over. I have a KVM that connects to three servers, and I get these errors a lot, so I think it's background noise.
Get a list of installed contribs & template mods too.
# /sbin/e-smith/audittools/newrpms
Loaded plugins: fastestmirror, post-transaction-actions, priorities, smeserver
Loading mirror speeds from cached hostfile
* base: mirror.internode.on.net
* smeaddons: smeserver.de-labrusse.fr
* smeos: smeserver.de-labrusse.fr
* smeupdates: smeserver.de-labrusse.fr
* updates: mirror.internode.on.net
Extra Packages
GeoIP.x86_64 1.6.12-9.el7.sme @smecontribs
GeoIP-GeoLite-data.noarch 2018.06-7.el7.sme @smecontribs
GeoIP-GeoLite-data-extra.noarch 2018.06-7.el7.sme @smecontribs
php-php-gettext.noarch 1.0.12-1.el7 @smecontribs
php-tcpdf.noarch 6.2.26-1.el7 @smecontribs
php-tcpdf-dejavu-sans-fonts.noarch 6.2.26-1.el7 @smecontribs
phpMyAdmin.noarch 5.1.0-1.el7.sme @smecontribs
smeserver-git.noarch 1.2.0-11.el7.sme @smecontribs
smeserver-gitweb.noarch 1.1.0-14.el7.sme @smetest
smeserver-gitweb-theme.noarch 1.1.0-1.el7.sme @smetest
smeserver-mod_dav.noarch 1.1-8.el7.sme @smecontribs
smeserver-nfs.noarch 1.2.0-21.el7.sme @smecontribs
smeserver-phpmyadmin.noarch 4.0.10.2-13.el7.sme @smecontribs
smeserver-userpanel.noarch 1.4-6.el7.sme @smeupdates
smeserver-webhosting.noarch 0.0.9-16.el7.sme @smecontribs
smeserver-wordpress.noarch 1.2-12.el7.sme @smecontribs
smeserver-wsdd.noarch 0.2-5.el7.sme @smecontribs
wsdd.noarch 0.7.0-1.el7 @smecontribs
# /sbin/e-smith/audittools/templates
/etc/e-smith/templates-custom/etc/dar/DailyBackup.dcf/45prune: MANUALLY_ADDED, OVERRIDE
It may require a db restore from backup.
Agreed. The tikiwiki database is a leftover from many years ago - we moved to dokuwiki for our intranet - so that database could just be removed. I reckon I can restore horde_alarms from another SME server, since I'm fairly sure it's basically empty anyway.
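If I go that route, this is roughly what I have in mind - a last-chance dump of tikiwiki before dropping it (the dump file names below are just placeholders):
# best-effort dump before deleting anything (--force continues past table errors)
mysqldump --force tikiwiki > /root/tikiwiki-final.sql
# drop the unused database
mysql -e "DROP DATABASE tikiwiki;"
# restore just the horde_alarms table from a dump taken on the other SME server
mysql horde < horde_alarms-from-other-server.sql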
-
You need to take action soon, as a corrupt db prevents the whole system from backing up.
You can either use the GUI with phpMyAdmin or the CLI using mysqlcheck.
First check which table is corrupt, then use repair.
If repair fails, restore that table from backup. This might mean opening the horde db dump and copy/pasting only the relevant part.
As for the cause: check that your hard drive is not failing, and remember whether you had any power outage.
I ran mysqlcheck --all-databases yesterday and there were no errors. Today I ran the same command and a bunch of database tables were reported as corrupt. Very weird!
How can I tell if the hard drive is failing?
[root@fileserver ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[2] sda2[0]
1952869376 blocks super 1.2 [2/2] [UU]
bitmap: 2/15 pages [8KB], 65536KB chunk
md0 : active raid1 sdb1[2] sda1[0]
510976 blocks super 1.2 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
unused devices: <none>
[root@fileserver ~]# smartctl -a /dev/sda
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 197 197 021 Pre-fail Always - 3116
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 20
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
9 Power_On_Hours 0x0032 074 074 000 Old_age Always - 19488
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 19
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 14
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 43
194 Temperature_Celsius 0x0022 118 107 000 Old_age Always - 29
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
-
FWIW the SMART values for sda look OK to me; what about the SMART values for sdb?
That said, given the RAID status is still OK, I would lean more towards an earlier file/db corruption that is being faithfully replicated on your otherwise healthy RAID array.
-
First thing is to lose the tikiwiki db.
Then try a repair on Horde.
If that fails, restore the Horde db from a backup.
I can't see any particular disk errors.
-
It might be worth running smartctl -t short /dev/sda (and the same for sdb).
RAID is able to keep the two drives in sync; however, if the initial write/read is bad, it will replicate the same corruption onto the second drive.
As pointed out, get rid of the db you do not need, and try to repair the horde_alarms table.
It could also simply be that the open file limit has been reached:
https://cloudblue.freshdesk.com/support/solutions/articles/44001885703-mysql-errors-appear-on-websites-can-t-open-file-table001-frm-errno-24-
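For the self-test, something like this (the short test only takes a couple of minutes; repeat for /dev/sdb):
smartctl -t short /dev/sda
# after a few minutes, read the self-test log
smartctl -l selftest /dev/sda
And on the open file limit angle: errno 24 is the OS "Too many open files" error. If the perror utility that ships with MariaDB is installed, running perror 24 should confirm that, which would fit the pattern of different tables failing at different times.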
-
It might be worth running smartctl -t short /dev/sda (and the same for sdb).
RAID is able to keep the two drives in sync; however, if the initial write/read is bad, it will replicate the same corruption onto the second drive.
As pointed out, get rid of the db you do not need, and try to repair the horde_alarms table.
It could also simply be that the open file limit has been reached:
https://cloudblue.freshdesk.com/support/solutions/articles/44001885703-mysql-errors-appear-on-websites-can-t-open-file-table001-frm-errno-24-
I think the MySQL/MariaDB open file limit is the issue. I did the following:
[root@fileserver ~]# mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 53
Server version: 5.5.68-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> SHOW VARIABLES LIKE 'open_files_limit';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| open_files_limit | 962 |
+------------------+-------+
1 row in set (0.00 sec)
Hmmm ... very low! I then figured out that this is a config database setting:
[root@fileserver etc]# config getprop mariadb OpenFilesLimit
[root@fileserver etc]# config setprop mariadb OpenFilesLimit 2048
[root@fileserver etc]# expand-template /etc/my.cnf
and verified that it placed this in the file:
open_files_limit=2048
Then ran the following:
[root@fileserver etc]# systemctl stop mariadb
[root@fileserver etc]# systemctl start mariadb
Then I verified that the open_files_limit setting had changed to 2048, and ran the following:
[root@fileserver etc]# mysqlcheck --all-databases
And all databases this time came up clean.
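For what it's worth, you can sanity-check how close the server runs to the new limit with something like this (assuming the root mysql client connects without a password, as on a stock SME install):
mysql -e "SHOW GLOBAL VARIABLES LIKE 'open_files_limit';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Open_files';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Open_tables';"
If Open_files sits anywhere near open_files_limit, the errno 24 errors are likely to come back.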
I thought it might be a drive failure, so I started the process of swapping drives. I purchased 2 x 2TB drives, swapped out one of the current drives for a new one, and the RAID array started rebuilding. Now I'm wondering if I really needed to do this?
In the recent past we added a couple of new MySQL databases to the server, but I didn't think that would cause a weird issue like this - would it?
Regards,
Paul.
-
Interesting, as if you do this using mysql105:
[root@myhome ~]# mysql105
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 1028
Server version: 10.5.9-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> SHOW VARIABLES LIKE 'open_files_limit';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| open_files_limit | 32184 |
+------------------+-------+
1 row in set (0.031 sec)
-
Interesting.
An old v9 server gives:
| open_files_limit | 1024 |
v10 maria (upgraded from v9)
| open_files_limit | 962 |
v10 maria 105
| open_files_limit | 32186 |
I know JP can't answer right now as he is off grid for a little while, but this may need some investigation?
-
Bug this so we have 32186 or 10000 as the default, to avoid the backup issue.
-
https://bugs.koozali.org/show_bug.cgi?id=12417
-
Hi Everyone,
Thank you for all your help! I was starting to think I was going crazy!
Regards,
Paul.
-
Hi Paul
Can you drop some feedback in the bug as to the result of setting the open files number to the higher limit? It will go towards verification, as reproducing the error for test purposes needs the right scenario.
Thanks mate