La Fibre
Télécom => Logiciels et systèmes d'exploitation => Linux (usage serveur) => Discussion démarrée par: renaud07 le 13 septembre 2019 à 01:37:46
-
Bonsoir,
Je rencontre un problème mystérieux sur ma VM debian qui héberge wireguard et asterisk : après un redémarrage, les services mettent bien 3 à 4 minutes avant de se relancer.
Mais, et c'est là que je n'ai jamais vu ça, si j'ai une interaction avec la machine directement (genre je commence à taper un login) les services démarrent instantanèment. Je le vois à la connexion SSH, tant que je n’interagis pas avec le prompt, j'ai une connexion refused et dès que je tape ne serait-ce qu'une lettre, je peux enfin me connecter.
Et si je ne touche rien il faut que j'attende 3-4 minutes. Je l'ai vérifié des dizaines de fois.
Dans les logs, on voit que les services essaient de démarrer, timeout, puis ça fini par y arriver au bout de 3 minutes (d'ailleurs la ligne startup finished l'indique bien) :
Sep 12 21:58:52 astwg dbus-daemon[296]: [system] Successfully activated service 'org.freedesktop.systemd1'
Sep 12 21:58:52 astwg systemd[1]: Started Login Service.
Sep 12 21:58:52 astwg systemd[1]: Started System Logging Service.
Sep 12 21:58:53 astwg systemd-udevd[208]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 12 21:58:53 astwg systemd-udevd[208]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 12 21:58:53 astwg systemd-udevd[208]: Could not generate persistent MAC address for vxlan10: No such file or directory
Sep 12 21:58:53 astwg sh[236]: Waiting for DAD... Done
Sep 12 21:58:53 astwg sh[236]: ens18=ens18
Sep 12 21:58:53 astwg systemd[1]: Started Raise network interfaces.
Sep 12 21:58:53 astwg systemd[1]: Reached target Network.
Sep 12 21:58:53 astwg systemd[1]: Starting OpenBSD Secure Shell server...
Sep 12 21:58:53 astwg systemd[1]: Starting Permit User Sessions...
Sep 12 21:58:53 astwg systemd[1]: Reached target Network is Online.
Sep 12 21:58:53 astwg systemd[1]: Starting Asterisk PBX...
Sep 12 21:58:53 astwg systemd[1]: Starting LSB: exim Mail Transport Agent...
Sep 12 21:58:53 astwg systemd[1]: Started Permit User Sessions.
Sep 12 21:58:53 astwg systemd[1]: Started Getty on tty1.
Sep 12 21:58:53 astwg systemd[1]: Reached target Login Prompts.
Sep 12 21:58:56 astwg exim4[465]: Starting MTA: exim4.
Sep 12 21:58:56 astwg systemd[1]: Started LSB: exim Mail Transport Agent.
Sep 12 21:59:05 astwg asterisk[464]: #033[0mPBX UUID: 1047cf1c-a505-4e69-b6b9-7b6e4a66a3cb
Sep 12 21:59:05 astwg asterisk[464]: [Sep 12 21:59:05] #033[1;33mNOTICE#033[0m[464]: #033[1;37mloader.c#033[0m:#033[1;37m2230#033[0m #033[1;37mload_modules#033[0m: 330 modules will be loaded.
Sep 12 21:59:08 astwg asterisk[464]: [Sep 12 21:59:08] #033[1;33mNOTICE#033[0m[464]: #033[1;37mres_config_ldap.c#033[0m:#033[1;37m1830#033[0m #033[1;37mparse_config#033[0m: No directory user found, anonymous binding as default.
Sep 12 21:59:08 astwg asterisk[464]: [Sep 12 21:59:08] #033[1;31mERROR#033[0m[464]: #033[1;37mres_config_ldap.c#033[0m:#033[1;37m1856#033[0m #033[1;37mparse_config#033[0m: No directory URL or host found.
Sep 12 21:59:08 astwg asterisk[464]: [Sep 12 21:59:08] #033[1;31mERROR#033[0m[464]: #033[1;37mres_config_ldap.c#033[0m:#033[1;37m1718#033[0m #033[1;37mload_module#033[0m: Cannot load LDAP RealTime driver.
Sep 12 21:59:10 astwg asterisk[464]: [Sep 12 21:59:10] #033[1;33mNOTICE#033[0m[464]: #033[1;37mcdr.c#033[0m:#033[1;37m4485#033[0m #033[1;37mcdr_toggle_runtime_options#033[0m: CDR simple logging enabled.
Sep 12 21:59:18 astwg systemd-timesyncd[201]: Synchronized to time server for the first time [2001:bc8:4730:940e::1]:123 (2.debian.pool.ntp.org).
Sep 12 22:00:22 astwg systemd[1]: ssh.service: Start-pre operation timed out. Terminating.
Sep 12 22:00:22 astwg systemd[1]: asterisk.service: Start operation timed out. Terminating.
Sep 12 22:00:22 astwg systemd[1]: ssh.service: Control process exited, code=killed, status=15/TERM
Sep 12 22:00:22 astwg systemd[1]: ssh.service: Failed with result 'timeout'.
Sep 12 22:00:22 astwg systemd[1]: Failed to start OpenBSD Secure Shell server.
Sep 12 22:00:23 astwg systemd[1]: ssh.service: Service RestartSec=100ms expired, scheduling restart.
Sep 12 22:00:23 astwg systemd[1]: ssh.service: Scheduled restart job, restart counter is at 1.
Sep 12 22:00:23 astwg systemd[1]: Stopped OpenBSD Secure Shell server.
Sep 12 22:00:23 astwg systemd[1]: Starting OpenBSD Secure Shell server...
Sep 12 22:01:22 astwg asterisk[464]: [Sep 12 22:01:22] #033[1;31mWARNING#033[0m[591]: #033[1;37masterisk.c#033[0m:#033[1;37m3401#033[0m #033[1;37mcanary_thread#033[0m: The canary is no more. He has ceased to be! He's expired and gone to meet his maker! He's a stiff! Bereft of life, he rests in peace. His metabolic processes are now history! He's off the twig! He's kicked the bucket. He's shuffled off his mortal coil, run down the curtain, and joined the bleeding choir invisible!! THIS is an EX-CANARY. (Reducing priority)
Sep 12 22:01:53 astwg systemd[1]: asterisk.service: State 'stop-sigterm' timed out. Killing.
Sep 12 22:01:53 astwg systemd[1]: asterisk.service: Killing process 464 (asterisk) with signal SIGKILL.
Sep 12 22:01:53 astwg systemd[1]: ssh.service: Start-pre operation timed out. Terminating.
Sep 12 22:01:53 astwg systemd[1]: ssh.service: Control process exited, code=killed, status=15/TERM
Sep 12 22:01:53 astwg systemd[1]: ssh.service: Failed with result 'timeout'.
Sep 12 22:01:53 astwg systemd[1]: Failed to start OpenBSD Secure Shell server.
Sep 12 22:01:53 astwg systemd[1]: asterisk.service: Main process exited, code=killed, status=9/KILL
Sep 12 22:01:53 astwg systemd[1]: asterisk.service: Failed with result 'timeout'.
Sep 12 22:01:53 astwg systemd[1]: Failed to start Asterisk PBX.
Sep 12 22:01:53 astwg systemd[1]: Reached target Multi-User System.
Sep 12 22:01:53 astwg systemd[1]: Reached target Graphical Interface.
Sep 12 22:01:53 astwg systemd[1]: Starting Update UTMP about System Runlevel Changes...
Sep 12 22:01:53 astwg systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
Sep 12 22:01:53 astwg systemd[1]: Started Update UTMP about System Runlevel Changes.
Sep 12 22:01:53 astwg systemd[1]: Startup finished in 5.269s (kernel) + 3min 9.408s (userspace) = 3min 14.678s.
Sep 12 22:01:53 astwg systemd[1]: ssh.service: Service RestartSec=100ms expired, scheduling restart.
Sep 12 22:01:53 astwg systemd[1]: ssh.service: Scheduled restart job, restart counter is at 2.
Sep 12 22:01:53 astwg systemd[1]: Stopped OpenBSD Secure Shell server.
Sep 12 22:01:53 astwg systemd[1]: Starting OpenBSD Secure Shell server...
Sep 12 22:01:54 astwg systemd[1]: asterisk.service: Service RestartSec=1s expired, scheduling restart.
Sep 12 22:01:54 astwg systemd[1]: asterisk.service: Scheduled restart job, restart counter is at 1.
Sep 12 22:01:54 astwg systemd[1]: Stopped Asterisk PBX.
Sep 12 22:01:54 astwg systemd[1]: Starting Asterisk PBX...
Voici le paramétrage systemd du serveur ssh :
cat /lib/systemd/system/ssh.service
[Unit]
Description=OpenBSD Secure Shell server
Documentation=man:sshd(8) man:sshd_config(5)
After=network.target auditd.service
ConditionPathExists=!/etc/ssh/sshd_not_to_be_run
[Service]
EnvironmentFile=-/etc/default/ssh
ExecStartPre=/usr/sbin/sshd -t
ExecStart=/usr/sbin/sshd -D $SSHD_OPTS
ExecReload=/usr/sbin/sshd -t
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartPreventExitStatus=255
Type=notify
RuntimeDirectory=sshd
RuntimeDirectoryMode=0755
[Install]
WantedBy=multi-user.target
Alias=sshd.service
Je ne connais pas bien la gestion avec systemd à part les commandes courantes, y'a peut-être un truc à modifier ? idem pour asterisk ?
cat /lib/systemd/system/asterisk.service
[Unit]
Description=Asterisk PBX
Documentation=man:asterisk(8)
Wants=network-online.target
After=network-online.target
[Service]
Type=notify
ExecStart=/usr/sbin/asterisk -g -f -p -U asterisk
ExecReload=/usr/sbin/asterisk -rx 'core reload'
Restart=on-failure
RestartSec=1
WorkingDirectory=/var/lib/asterisk
[Install]
WantedBy=multi-user.target
Merci d'avance
-
La commande 'systemd-analyze' sert a cela:
sudo systemd-analyze critical-chain
et
sudo systemd-analyze blame
-
Voici le retour :
On dirait que systemd-analyze blame ne prend en compte que le dernier démarrage réussi au vu du timing.
systemd-analyze critical-chain
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.
graphical.target @3min 9.366s
└─multi-user.target @3min 9.366s
└─getty.target @9.057s
└─getty@tty1.service @9.057s
└─systemd-user-sessions.service @8.990s +60ms
└─network.target @8.925s
└─networking.service @6.370s +2.553s
└─ifupdown-pre.service @3.148s +3.218s
└─systemd-udev-trigger.service @2.880s +234ms
└─systemd-udevd-control.socket @2.865s
└─system.slice @2.685s
└─-.slice @2.685s
systemd-analyze blame
6.418s asterisk.service
4.584s man-db.service
3.680s dev-vda1.device
3.218s ifupdown-pre.service
2.553s networking.service
2.067s apparmor.service
1.852s exim4.service
1.818s ssh.service
1.098s logrotate.service
838ms keyboard-setup.service
692ms systemd-logind.service
688ms rsyslog.service
511ms sys-kernel-debug.mount
506ms systemd-remount-fs.service
496ms user@0.service
477ms dev-mqueue.mount
465ms dev-hugepages.mount
449ms systemd-tmpfiles-clean.service
440ms systemd-sysusers.service
345ms systemd-tmpfiles-setup.service
310ms systemd-udevd.service
234ms systemd-udev-trigger.service
212ms systemd-random-seed.service
207ms systemd-modules-load.service
200ms systemd-sysctl.service
181ms systemd-timesyncd.service
159ms systemd-update-utmp.service
149ms systemd-tmpfiles-setup-dev.service
147ms kmod-static-nodes.service
135ms sys-fs-fuse-connections.mount
133ms dev-disk-by\x2duuid-730ce55e\x2d5690\x2d41c3\x2d80ca\x2da2f7f890a9ad.swap
78ms console-setup.service
77ms systemd-journald.service
60ms systemd-user-sessions.service
39ms user-runtime-dir@0.service
39ms systemd-journal-flush.service
33ms systemd-update-utmp-runlevel.service
-
Mais, et c'est là que je n'ai jamais vu ça, si j'ai une interaction avec la machine directement (genre je commence à taper un login) les services démarrent instantanèment. Je le vois à la connexion SSH, tant que je n’interagis pas avec le prompt, j'ai une connexion refused et dès que je tape ne serait-ce qu'une lettre, je peux enfin me connecter.
Ca me fait penser à un manque d'entropie : https://wiki.debian.org/BoottimeEntropyStarvation.
Pour une VM, l'idéal est soit virtio-rng (que pour qemu je crois) qui demande de l'entropie à l'hôte, soit un CPU qui supporte RDRAND et que son utilisation soit activée (cf la page Debian).
Sinon il y a d'autres solutions, mais c'est moins bon.
-
Ça à l'air de bien correspondre. Surtout que cette VM a été upgradée de stretch à buster.
Mais j'en ai une 2nd, qui a subi le même traitement, et à priori elle n'est pas affectée (ou peu).
Vu que ce n'est pas trop long, je vais tenter une installation fraîche voir si ça change quelque chose.
-
On dirait bien que j'ai le même problème et cette fois j'ai une ligne qui parle de crng :
Sep 13 20:43:19 vpn systemd[1]: ssh.service: Start-pre operation timed out. Terminating.
Sep 13 20:43:19 vpn systemd[1]: ssh.service: Control process exited, code=killed, status=15/TERM
Sep 13 20:43:19 vpn systemd[1]: ssh.service: Failed with result 'timeout'.
Sep 13 20:43:19 vpn systemd[1]: Failed to start OpenBSD Secure Shell server.
Sep 13 20:43:19 vpn systemd[1]: Reached target Multi-User System.
Sep 13 20:43:19 vpn systemd[1]: Reached target Graphical Interface.
Sep 13 20:43:19 vpn systemd[1]: Starting Update UTMP about System Runlevel Changes...
Sep 13 20:43:19 vpn systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
Sep 13 20:43:19 vpn systemd[1]: Started Update UTMP about System Runlevel Changes.
Sep 13 20:43:19 vpn systemd[1]: Startup finished in 4.168s (kernel) + 1min 35.183s (userspace) = 1min 39.351s.
Sep 13 20:43:20 vpn kernel: [ 99.539408] random: crng init done
Sep 13 20:43:20 vpn kernel: [ 99.539419] random: 7 urandom warning(s) missed due to ratelimiting
Sep 13 20:43:20 vpn systemd[1]: ssh.service: Service RestartSec=100ms expired, scheduling restart.
Sep 13 20:43:20 vpn systemd[1]: ssh.service: Scheduled restart job, restart counter is at 1.
Sep 13 20:43:20 vpn systemd[1]: Stopped OpenBSD Secure Shell server.
Sep 13 20:43:20 vpn systemd[1]: Starting OpenBSD Secure Shell server...
Sep 13 20:43:20 vpn systemd[1]: Started OpenBSD Secure Shell server.
Sep 13 20:57:06 vpn systemd[1]: Starting Cleanup of Temporary Directories...