Koozali.org: home of the SME Server
Contribs.org Forums => Koozali SME Server 10.x => Topic started by: stavi on December 29, 2022, 04:39:36 PM
-
Hello,
It seems that my network card is not working 100%. Sometimes the incoming internet connection is interrupted.
In the sme messages.log it says:
Dec 29 08:41:52 jarvis kernel: e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang: TDH <cc> TDT <e4> next_to_use <e4> next_to_clean <c9> buffer_info[next_to_clean]: time_stamp <1527b89ca> next_to_watch <cc> jiffies <1527b989c> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10>
Dec 29 08:41:54 jarvis kernel: [1384066.200720] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
Dec 29 08:41:54 jarvis kernel: [1384066.200720] TDH <cc>
Dec 29 08:41:54 jarvis kernel: [1384066.200720] TDT <e4>
Dec 29 08:41:54 jarvis kernel: [1384066.200720] next_to_use <e4>
Dec 29 08:41:54 jarvis kernel: [1384066.200720] next_to_clean <c9>
Dec 29 08:41:54 jarvis kernel: [1384066.200720] buffer_info[next_to_clean]:
Dec 29 08:41:54 jarvis kernel: [1384066.200720] time_stamp <1527b89ca>
Dec 29 08:41:54 jarvis kernel: [1384066.200720] next_to_watch <cc>
Dec 29 08:41:54 jarvis kernel: [1384066.200720] jiffies <1527ba06c>
Dec 29 08:41:54 jarvis kernel: [1384066.200720] next_to_watch.status <0>
Dec 29 08:41:54 jarvis kernel: [1384066.200720] MAC Status <80083>
Dec 29 08:41:54 jarvis kernel: [1384066.200720] PHY Status <796d>
Dec 29 08:41:54 jarvis kernel: [1384066.200720] PHY 1000BASE-T Status <3800>
Dec 29 08:41:54 jarvis kernel: [1384066.200720] PHY Extended Status <3000>
Dec 29 08:41:54 jarvis kernel: [1384066.200720] PCI Status <10>
Dec 29 08:41:54 jarvis kernel: e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang: TDH <cc> TDT <e4> next_to_use <e4> next_to_clean <c9> buffer_info[next_to_clean]: time_stamp <1527b89ca> next_to_watch <cc> jiffies <1527ba06c> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10>
Dec 29 08:41:55 jarvis kernel: [1384067.212222] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
Dec 29 08:41:55 jarvis kernel: e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
Dec 29 08:41:59 jarvis kernel: [1384070.441241] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Dec 29 08:41:59 jarvis kernel: e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Dec 29 08:41:59 jarvis kernel: [1384071.266124] e1000e: eno1 NIC Link is Down
Dec 29 08:41:59 jarvis kernel: e1000e: eno1 NIC Link is Down
Dec 29 08:42:02 jarvis kernel: [1384074.273124] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Dec 29 08:42:02 jarvis kernel: e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
I suspect the network card driver is at fault. It is the stock driver installed by SME. How can I update it?
Here is what I have found so far:
WAN:
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet ------------ netmask 255.255.255.128 broadcast --.---.--.---
ether 34:17:eb:a1:53:9c txqueuelen 1000 (Ethernet)
RX packets 131794932 bytes 111342822731 (103.6 GiB)
RX errors 0 dropped 4276 overruns 0 frame 0
TX packets 210651591 bytes 256347429783 (238.7 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 20 memory 0xf7d00000-f7d20000
There are a lot of dropped packets on this interface (4276 RX dropped).
LAN:
enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.8.1 netmask 255.255.255.0 broadcast 192.168.8.255
ether e8:de:27:01:4c:63 txqueuelen 1000 (Ethernet)
RX packets 213078452 bytes 256094133341 (238.5 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 128300984 bytes 110600305563 (103.0 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@jarvis ~]# lspci | egrep -i --color 'network|ethernet'
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 04)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
[root@jarvis ~]# lshw -class network -short
H/W path Device Class Description
=======================================================
/0/100/19 eno1 network Ethernet Connection I217-LM
/0/100/1c.4/0 enp2s0 network RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
/1 dummy0 network Ethernet interface
[root@jarvis ~]# ethtool -i eno1
driver: e1000e
version: 3.2.6-k
firmware-version: 0.13-4
expansion-rom-version:
bus-info: 0000:00:19.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
It's part of a home network, just a "sandbox".
Server type: dell optiplex 9020 https://www.dell.com/support/home/en-uk/product-support/product/optiplex-9020-desktop/drivers
WAN card:
https://ark.intel.com/content/www/us/en/ark/products/60019/intel-ethernet-connection-i217lm.html
driver from intel:
https://www.intel.com/content/www/us/en/products/sku/60019/intel-ethernet-connection-i217lm/downloads.html
https://www.intel.com/content/www/us/en/download/15084/intel-ethernet-adapter-complete-driver-pack.html
I would like some help to move forward. Is this driver pack suitable for my card?
If so, how can I install it?
Thank you for your answers in advance.
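For anyone who wants to track how often the adapter resets, here is a minimal sketch. It assumes the kernel messages land in a file such as /var/log/messages (the path is an assumption, pass whichever log holds them), and the function name count_resets is made up for illustration. In the excerpt above each event is logged twice, once with the kernel uptime stamp in brackets and once without, so the sketch counts only the stamped lines.

```shell
#!/bin/sh
# Count e1000e adapter resets in a log file.
# Only lines carrying the bracketed kernel uptime stamp are counted,
# so the unstamped duplicate of each message is ignored.
count_resets() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'kernel: \[[0-9. ]*] e1000e .*Reset adapter unexpectedly' "$1"
}

# Usage on the server (log path is an assumption):
#   count_resets /var/log/messages
```

Comparing the count before and after a driver change gives a rough idea of whether the change helped.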
Hi Stavi
-
Check the RH/CentOS 7 hardware compatibility list.
It isn't recommended to use other drivers.
-
So it's not that unusual. Here is another thread turned up by Google:
https://access.redhat.com/discussions/5928261
-
You could try a more recent driver from here http://elrepo.reloumirrors.net/elrepo/el7/x86_64/RPMS/
http://elrepo.reloumirrors.net/elrepo/el7/x86_64/RPMS/kmod-e1000e-3.8.7-1.el7_9.elrepo.x86_64.rpm
It doesn't seem to apply in your case, but Realtek cards are also notorious for loading the buggy r8169 driver. The correct r8168 driver is there too:
http://elrepo.reloumirrors.net/elrepo/el7/x86_64/RPMS/kmod-r8168-8.051.02-1.el7_9.elrepo.x86_64.rpm
-
Happy New Year to Everyone!
Thanks for the help.
Strange error, previously it had Win 2012 Srv, it did not produce such an error.
I will try this driver replacement. I've never done that before. What should I watch out for? If it still doesn't work, is it possible to restore the previous state?
hello Stavi
-
Strange error, previously it had Win 2012 Srv, it did not produce such an error.
But this is not Windows... It quite possibly had some hacky Windows driver.
If it still doesn't work, is it possible to restore the previous state?
Yup. Have a read on installing, uninstalling & downgrading rpms.
Also you can buy decent compatible second hand cards for pennies.
-
wget http://elrepo.reloumirrors.net/elrepo/el7/x86_64/RPMS/kmod-e1000e-3.8.7-1.el7_9.elrepo.x86_64.rpm
yum localinstall kmod-e1000e-3.8.7-1.el7_9.elrepo.x86_64.rpm
signal-event reboot
To revert the driver (see https://www.tecmint.com/view-yum-history-to-find-packages-info/):
yum history
yum history undo top_ID_from_above
or simply
yum erase kmod-e1000e
and
signal-event reboot
-
Hi,
Thank you very much for all your help.
It seems to have worked.
[root@jarvis cucc]# sudo ethtool -i eno1
driver: e1000e
version: 3.2.6-k
firmware-version: 0.13-4
I wonder if there will be any more errors.
I'm new to Linux, but I have to start somewhere :)
-
Actually no, the version is the same as before. Did you reboot?
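To compare the driver version before and after the reboot without reading the whole output, something like the sketch below can pull out just the version field from `ethtool -i`. The helper name driver_version is hypothetical; the interface name eno1 is the one from this thread.

```shell
#!/bin/sh
# Print only the driver version from `ethtool -i` output read on stdin.
# Lines such as "firmware-version:" are skipped because the field name
# must be exactly "version".
driver_version() {
    awk -F': ' '$1 == "version" { print $2 }'
}

# Usage on the server:
#   ethtool -i eno1 | driver_version
```

If it still prints the in-kernel version (3.2.6-k here) after installing the kmod package, the new module has not been loaded yet.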
-
ohh, wrong clipboard :)
[root@jarvis ~]# sudo ethtool -i eno1
driver: e1000e
version: 3.8.7-NAPI
-
Damn it.
Unfortunately the error persists :(
Jan 4 10:58:37 jarvis kernel: e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang: TDH <96> TDT <bf> next_to_use <bf> next_to_clean <94> buffer_info[next_to_clean]: time_stamp <1054ffcff> next_to_watch <96> jiffies <10550128c> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10>
Jan 4 10:58:39 jarvis kernel: [89436.537790] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
Jan 4 10:58:39 jarvis kernel: [89436.537790] TDH <96>
Jan 4 10:58:39 jarvis kernel: [89436.537790] TDT <bf>
Jan 4 10:58:39 jarvis kernel: [89436.537790] next_to_use <bf>
Jan 4 10:58:39 jarvis kernel: [89436.537790] next_to_clean <94>
Jan 4 10:58:39 jarvis kernel: [89436.537790] buffer_info[next_to_clean]:
Jan 4 10:58:39 jarvis kernel: [89436.537790] time_stamp <1054ffcff>
Jan 4 10:58:39 jarvis kernel: [89436.537790] next_to_watch <96>
Jan 4 10:58:39 jarvis kernel: [89436.537790] jiffies <105501a5c>
Jan 4 10:58:39 jarvis kernel: [89436.537790] next_to_watch.status <0>
Jan 4 10:58:39 jarvis kernel: [89436.537790] MAC Status <80083>
Jan 4 10:58:39 jarvis kernel: [89436.537790] PHY Status <796d>
Jan 4 10:58:39 jarvis kernel: [89436.537790] PHY 1000BASE-T Status <3800>
Jan 4 10:58:39 jarvis kernel: [89436.537790] PHY Extended Status <3000>
Jan 4 10:58:39 jarvis kernel: [89436.537790] PCI Status <10>
Jan 4 10:58:39 jarvis kernel: e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang: TDH <96> TDT <bf> next_to_use <bf> next_to_clean <94> buffer_info[next_to_clean]: time_stamp <1054ffcff> next_to_watch <96> jiffies <105501a5c> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10>
Jan 4 10:58:40 jarvis kernel: [89437.549525] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
Jan 4 10:58:40 jarvis kernel: e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
Jan 4 10:58:43 jarvis kernel: [89440.546745] e1000e 0000:00:19.0 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Jan 4 10:58:43 jarvis kernel: e1000e 0000:00:19.0 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
I will try it with another NIC.
-
Damn it.
Unfortunately the error persists :(
I will try it with another NIC.
:-(
Yes, save yourself the hassle.....
-
Holy thread resurrection batman!
I'm fighting this issue on a Dell Optiplex 9020 with its embedded NIC.
After much research & digging the general consensus is:
. use a different NIC
. use the latest driver, done!
. turn off NIC offloading (https://forum.proxmox.com/threads/intel-nic-e1000e-hardware-unit-hang.106001/)
It's a long way to go to get to the server so replacing the NIC is my last resort.
That leaves the offloading option. For now I've applied the offloading commands manually and it's looking promising, so I'm looking for a way to run the command as part of the server startup. I would prefer something that is attached or linked to the ifup command, so that it is also invoked when ifup is run manually.
The command I need to run is: ethtool -K eno1 gso off gro off tso off tx off rx off
Of course, if there's a way to make these settings permanent that would be optimal, but as far as I can see that is only possible with nmcli. NetworkManager is available in the updates repo, if that is the best way to do this.
Any suggestions or recommendations on where and how best to implement this?
Cheers all
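To confirm that the manual command actually switched the offloads off, a sketch like this can filter the output of `ethtool -k` (lowercase k lists the current feature state). The function name offloads_still_on is made up, and the feature names are the ones recent ethtool versions print; older versions may label them differently.

```shell
#!/bin/sh
# Read `ethtool -k` output on stdin and print each of the offload
# features targeted by the command above that is still switched on.
offloads_still_on() {
    awk -F': ' '
        $1 ~ /^(rx-checksumming|tx-checksumming|tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload)$/ \
            && $2 ~ /^on/ { print $1 }
    '
}

# Usage on the server:
#   ethtool -k eno1 | offloads_still_on
```

No output means all five features report off.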
-
It looks to me like ethtool options are supported in the config database.
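{
    # Select the config database record for the interface being processed.
    my $this_device;
    $this_device = \%InternalInterface if $is_internal;
    $this_device = \%ExternalInterface if $is_external;
    return unless $this_device;

    # Emit an ETHTOOL_OPTS line only when the EthtoolOpts property is set.
    my $ethtool_options = $this_device->{EthtoolOpts};
    return unless $ethtool_options;
    return "ETHTOOL_OPTS=\"$ethtool_options\"";
}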
Combining that with this post from serverfault:
https://serverfault.com/questions/463111/how-to-persist-ethtool-settings-through-reboot
These commands insert the indicated values into ifcfg-xxxx, but I don't have an offloading NIC to test with:
config setprop ExternalInterface EthtoolOpts "-K $(config getprop ExternalInterface Name) gso off gro off tso off tx off rx off"
signal-event console-save
config setprop InternalInterface EthtoolOpts "-K $(config getprop InternalInterface Name) gso off gro off tso off tx off rx off"
signal-event console-save
Note (for others who might end up here) -
I assume that "ETHTOOL_OPTS" would not do anything unless ethtool has been installed with yum install ethtool...
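To check that the property made it into the generated interface file, here is a rough sketch. It assumes the file lives at the usual CentOS 7 location, /etc/sysconfig/network-scripts/ifcfg-<name>, which is not stated in this thread, and the function name ethtool_opts_from_ifcfg is made up for illustration.

```shell
#!/bin/sh
# Print the value of ETHTOOL_OPTS from an ifcfg file, without the
# surrounding double quotes. Prints nothing if the line is absent.
ethtool_opts_from_ifcfg() {
    sed -n 's/^ETHTOOL_OPTS="\(.*\)"$/\1/p' "$1"
}

# Usage on the server (path is an assumption):
#   ethtool_opts_from_ifcfg /etc/sysconfig/network-scripts/ifcfg-eno1
```

An empty result after signal-event console-save would suggest the EthtoolOpts property was not picked up by the template.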
-
Awesome, thanks.
I didn't expect ethtool to already be supported
-
Courtesy of mmcarn I've now made this setting permanent in the SME db.
BTW - for others with this issue, the newest version of the e1000e driver that's readily available is in the elrepo repository.