DCOS Troubleshooting
Reference
[DC/OS Troubleshooting 1.9](https://docs.mesosphere.com/1.9/installing/troubleshooting/)
NTP Servers
Useful commands
- SSH to a master node and view the Admin Router logs since boot: journalctl -u dcos-adminrouter -b
- SSH to an agent node and view the Mesos agent logs since boot: journalctl -u dcos-mesos-slave -b
- SSH to a master node and view the Marathon logs since boot: journalctl -u dcos-marathon -b
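The three commands above differ only in the unit name, so a small helper can generate them. This is only a sketch: the function name dcos_boot_log_cmd is made up for illustration and is not part of DC/OS.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DC/OS): print the journalctl command
# that shows a systemd unit's logs since the current boot.
dcos_boot_log_cmd() {
    # -u selects the systemd unit, -b limits output to the current boot
    printf 'journalctl -u %s -b\n' "$1"
}

# Units mentioned above: dcos-adminrouter and dcos-marathon run on masters,
# dcos-mesos-slave runs on agents.
for unit in dcos-adminrouter dcos-marathon dcos-mesos-slave; do
    dcos_boot_log_cmd "$unit"
done
```

Run the printed command on the matching node type; a master has no dcos-mesos-slave unit to query.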
Failed to start Navstar: A distributed systems & network overlay orchestration engine
[dcosadmin@dcostest03 ~]$ service dcos-navstar restart
Redirecting to /bin/systemctl restart dcos-navstar.service
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to manage system services or units.
Authenticating as: dcosadmin
Password:
==== AUTHENTICATION COMPLETE ===
Job for dcos-navstar.service failed because the control process exited with error code. See "systemctl status dcos-navstar.service" and "journalctl -xe" for details.
[dcosadmin@dcostest03 ~]$
[dcosadmin@dcostest03 ~]$ systemctl status dcos-navstar.service
● dcos-navstar.service - Navstar: A distributed systems & network overlay orchestration engine
Loaded: loaded (/opt/mesosphere/packages/navstar--1128db0234105a64fb4be52f4453cd6aa895ff30/dcos.target.wants/dcos-navstar.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Tue 2017-04-18 10:41:57 MDT; 2s ago
Process: 13580 ExecStartPre=/opt/mesosphere/bin/check-time (code=exited, status=1/FAILURE)
Apr 18 10:41:57 dcostest03.test.flairpackaging.com check-time[13580]: Time is not synchronized / marked as bad by the kernel.
Apr 18 10:41:57 dcostest03.test.flairpackaging.com systemd[1]: dcos-navstar.service: control process exited, code=exited status=1
Apr 18 10:41:57 dcostest03.test.flairpackaging.com systemd[1]: Failed to start Navstar: A distributed systems & network overlay orchestration engine.
Apr 18 10:41:57 dcostest03.test.flairpackaging.com systemd[1]: Unit dcos-navstar.service entered failed state.
Apr 18 10:41:57 dcostest03.test.flairpackaging.com systemd[1]: dcos-navstar.service failed.
[dcosadmin@dcostest03 ~]$
Result from a working node for Navstar:
[dcosadmin@dcostest02 ~]$ systemctl status dcos-navstar.service
● dcos-navstar.service - Navstar: A distributed systems & network overlay orchestration engine
Loaded: loaded (/opt/mesosphere/packages/navstar--1128db0234105a64fb4be52f4453cd6aa895ff30/dcos.target.wants/dcos-navstar.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2017-04-17 16:54:20 MDT; 17h ago
Main PID: 35307 (navstar-env)
Memory: 65.2M
CGroup: /system.slice/dcos-navstar.service
├─35307 /bin/bash /opt/mesosphere/active/navstar/navstar/bin/navstar-env foreground
├─35316 /opt/mesosphere/packages/navstar--1128db0234105a64fb4be52f4453cd6aa895ff30/nav...
├─35438 erl_child_setup 16384
├─35463 inet_gethost 4
├─35464 inet_gethost 4
└─36012 inet_gethost 4
Apr 18 10:42:08 dcostest02.test.flairpackaging.com navstar-env[35307]: [warning] <0.931.0>@lashup...
Apr 18 10:42:13 dcostest02.test.flairpackaging.com navstar-env[35307]: [warning] <0.931.0>@lashup...
Apr 18 10:42:15 dcostest02.test.flairpackaging.com navstar-env[35307]: [warning] <0.931.0>@lashup...
Apr 18 10:42:18 dcostest02.test.flairpackaging.com navstar-env[35307]: [warning] <0.931.0>@lashup...
Apr 18 10:42:23 dcostest02.test.flairpackaging.com navstar-env[35307]: [warning] <0.931.0>@lashup...
Apr 18 10:42:25 dcostest02.test.flairpackaging.com navstar-env[35307]: [warning] <0.931.0>@lashup...
Apr 18 10:42:28 dcostest02.test.flairpackaging.com navstar-env[35307]: [warning] <0.931.0>@lashup...
Apr 18 10:42:33 dcostest02.test.flairpackaging.com navstar-env[35307]: [warning] <0.931.0>@lashup...
Apr 18 10:42:35 dcostest02.test.flairpackaging.com navstar-env[35307]: [warning] <0.931.0>@lashup...
Apr 18 10:42:38 dcostest02.test.flairpackaging.com navstar-env[35307]: [warning] <0.931.0>@lashup...
Hint: Some lines were ellipsized, use -l to show in full.
[dcosadmin@dcostest02 ~]$
When I checked ntptime on the failing node, it returned an error. After I restarted the ntpd service, the clock synchronized and Navstar started.
[dcosadmin@dcostest03 ~]$ ntptime
ntp_gettime() returns code 5 (ERROR)
time dca0c0ba.62c68000 Tue, Apr 18 2017 10:43:38.385, (.385841),
maximum error 16000000 us, estimated error 16000000 us, TAI offset 0
ntp_adjtime() returns code 5 (ERROR)
modes 0x0 (),
offset 0.000 us, frequency 0.000 ppm, interval 1 s,
maximum error 16000000 us, estimated error 16000000 us,
status 0x41 (PLL,UNSYNC),
time constant 7, precision 1.000 us, tolerance 500 ppm,
[dcosadmin@dcostest03 ~]$ sudo service ntpd restart
Redirecting to /bin/systemctl restart ntpd.service
[dcosadmin@dcostest03 ~]$ ntptime
ntp_gettime() returns code 0 (OK)
time dca0c0cf.70a2d000 Tue, Apr 18 2017 10:43:59.439, (.439984),
maximum error 1016 us, estimated error 16 us, TAI offset 0
ntp_adjtime() returns code 0 (OK)
modes 0x0 (),
offset 0.000 us, frequency 0.000 ppm, interval 1 s,
maximum error 1016 us, estimated error 16 us,
status 0x1 (PLL),
time constant 7, precision 1.000 us, tolerance 500 ppm,
[dcosadmin@dcostest03 ~]$ systemctl status dcos-navstar.service
● dcos-navstar.service - Navstar: A distributed systems & network overlay orchestration engine
Loaded: loaded (/opt/mesosphere/packages/navstar--1128db0234105a64fb4be52f4453cd6aa895ff30/dcos.target.wants/dcos-navstar.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2017-04-18 10:44:03 MDT; 1s ago
Process: 13990 ExecStartPre=/usr/bin/env ip link set minuteman up (code=exited, status=0/SUCCESS)
Process: 13984 ExecStartPre=/usr/bin/env ip link add minuteman type dummy (code=exited, status=0/SUCCESS)
Process: 13981 ExecStartPre=/usr/bin/env modprobe dummy (code=exited, status=0/SUCCESS)
Process: 13978 ExecStartPre=/usr/bin/mkdir -p /var/lib/dcos/navstar/lashup (code=exited, status=0/SUCCESS)
Process: 13975 ExecStartPre=/usr/bin/mkdir -p /var/lib/dcos/navstar/mnesia (code=exited, status=0/SUCCESS)
Process: 13968 ExecStartPre=/opt/mesosphere/bin/bootstrap dcos-minuteman (code=exited, status=0/SUCCESS)
Process: 13960 ExecStartPre=/opt/mesosphere/bin/bootstrap dcos-navstar (code=exited, status=0/SUCCESS)
Process: 13946 ExecStartPre=/opt/mesosphere/bin/setup_iptables.sh (code=exited, status=0/SUCCESS)
Process: 13941 ExecStartPre=/usr/bin/env modprobe ip_vs_wlc (code=exited, status=0/SUCCESS)
Process: 13938 ExecStartPre=/bin/ping -c1 ready.spartan (code=exited, status=0/SUCCESS)
Process: 13936 ExecStartPre=/opt/mesosphere/bin/check-time (code=exited, status=0/SUCCESS)
Main PID: 13995 (navstar-env)
Memory: 33.0M
CGroup: /system.slice/dcos-navstar.service
├─13995 /bin/bash /opt/mesosphere/active/navstar/navstar/bin/navstar-env foreground
├─14006 /opt/mesosphere/packages/navstar--1128db0234105a64fb4be52f4453cd6aa895ff30/nav...
└─14128 erl_child_setup 16384
Apr 18 10:44:03 dcostest03.test.flairpackaging.com bootstrap[13960]: [INFO] Zookeeper connection...D
Apr 18 10:44:03 dcostest03.test.flairpackaging.com bootstrap[13960]: [DEBUG] bootstrapping dcos-...r
Apr 18 10:44:03 dcostest03.test.flairpackaging.com bootstrap[13968]: [INFO] Clearing proxy envir...s
Apr 18 10:44:03 dcostest03.test.flairpackaging.com bootstrap[13968]: [INFO] Connecting to zk-3.z...1
Apr 18 10:44:03 dcostest03.test.flairpackaging.com bootstrap[13968]: [INFO] Zookeeper connection...D
Apr 18 10:44:03 dcostest03.test.flairpackaging.com bootstrap[13968]: [DEBUG] bootstrapping dcos-...n
Apr 18 10:44:03 dcostest03.test.flairpackaging.com systemd[1]: Started Navstar: A distributed sy....
Apr 18 10:44:04 dcostest03.test.flairpackaging.com navstar-env[13995]: Exec: /opt/mesosphere/pack...
Apr 18 10:44:04 dcostest03.test.flairpackaging.com navstar-env[13995]: Root: /opt/mesosphere/pack...
Apr 18 10:44:04 dcostest03.test.flairpackaging.com navstar-env[13995]: /opt/mesosphere/packages/n...
Hint: Some lines were ellipsized, use -l to show in full.
[dcosadmin@dcostest03 ~]$
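The manual check above (look at ntptime, restart ntpd if it reports an error) can be sketched as a small function. It reads ntptime output on stdin and only looks at the two markers visible in the transcripts: the "returns code 0 (OK)" line and the UNSYNC status flag. The function name clock_is_synced is illustrative, and this is not how DC/OS's own check-time script is implemented.

```shell
#!/bin/sh
# Sketch: decide from `ntptime` output (read on stdin) whether the kernel
# clock is synchronized. Returns 0 when synchronized, 1 otherwise.
clock_is_synced() {
    out="$(cat)"
    # A healthy clock reports "returns code 0 (OK)" ...
    printf '%s\n' "$out" | grep -qF 'ntp_gettime() returns code 0 (OK)' || return 1
    # ... and does not carry the UNSYNC status flag.
    if printf '%s\n' "$out" | grep -q 'UNSYNC'; then
        return 1
    fi
    return 0
}
```

A possible use on a node: ntptime | clock_is_synced || sudo systemctl restart ntpd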
Checking NTP settings
Check the NTP servers
[dcosadmin@dcotest03 ~]$ sudo vi /etc/ntp.conf
[dcosadmin@dcotest03 ~]$ service ntpd stop
Redirecting to /bin/systemctl stop ntpd.service
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to manage system services or units.
Authenticating as: dcosadmin
Password:
==== AUTHENTICATION COMPLETE ===
[dcosadmin@dcotest03 ~]$
[dcosadmin@dcotest03 ~]$
[dcosadmin@dcotest03 ~]$ ntpq
ntpq>
exit
[dcosadmin@dcotest03 ~]$ service ntpd start
Redirecting to /bin/systemctl start ntpd.service
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to manage system services or units.
Authenticating as: dcosadmin
Password:
==== AUTHENTICATION COMPLETE ===
[dcosadmin@dcotest03 ~]$ ntpq
ntpq> peer
remote refid st t when poll reach delay offset jitter
==============================================================================
itachi.tux-host 193.49.184.17 3 u - 64 1 47.916 -0.383 0.102
ns522433.ip-158 18.26.4.105 2 u 2 64 1 57.707 5.409 0.194
159.203.31.244 23.213.115.25 2 u 1 64 1 51.155 32.976 0.162
206-248-144-162 .PPS. 1 u 2 64 1 46.510 -1.090 0.000
ntpq> peer
remote refid st t when poll reach delay offset jitter
==============================================================================
+itachi.tux-host 193.49.184.17 3 u 6 64 1 47.863 -0.394 0.076
+ns522433.ip-158 18.26.4.105 2 u 5 64 1 57.427 5.302 0.064
-159.203.31.244 23.213.115.25 2 u 4 64 1 51.155 32.976 0.102
*206-248-144-162 .PPS. 1 u 4 64 1 46.068 -1.099 0.129
ntpq> quit
[dcosadmin@dcotest03 ~]$ service ntpd restart
Redirecting to /bin/systemctl restart ntpd.service
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to manage system services or units.
Authenticating as: dcosadmin
Password:
==== AUTHENTICATION COMPLETE ===
[dcosadmin@dcotest03 ~]$ ntpq
ntpq> peer
remote refid st t when poll reach delay offset jitter
==============================================================================
ca.picquenot.co 193.190.230.66 2 u 1 64 1 47.831 1.169 0.166
ns522433.ip-158 18.26.4.105 2 u 2 64 1 57.797 5.207 0.045
ns1.ptpbroadban 132.246.11.227 3 u 1 64 1 47.310 -0.718 0.059
time.srv.ualber 129.128.153.62 2 u 2 64 1 9.979 -0.672 0.000
ntpq> exit
[dcosadmin@dcotest03 ~]$ exit
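In the peer tables above, the first character of each row is ntpq's tally code: "*" marks the peer ntpd has selected as its sync source, "+" a candidate, and "-" an outlier. Right after a restart no row has a code yet, which is why the first listing shows none. A minimal sketch for checking this from a script follows; the function name selected_peer is an assumption for illustration.

```shell
#!/bin/sh
# Sketch: read an ntpq peer table on stdin and print the peer that ntpd has
# selected as its sync source (the row whose first character is "*").
# Exits with status 1 when no peer has been selected yet.
selected_peer() {
    awk '/^\*/ { print substr($1, 2); found = 1 } END { exit !found }'
}
```

A possible use: ntpq -p | selected_peer. If it prints nothing shortly after a restart, wait a few poll intervals and try again.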
Hang when running sudo /opt/mesosphere/bin/./3dt -diag
- Make sure the firewall is disabled
[dcosadmin@dcostest01 ~]$ sudo systemctl stop firewalld && sudo systemctl disable firewalld
- Check the log using journalctl -xe
Apr 25 13:57:24 dcotest06.test.flairpackaging.com bootstrap[2632]: [WARNING] Connection dropped: socket connection error: Name or service not known
Apr 25 13:57:24 dcotest06.test.flairpackaging.com bootstrap[2632]: [INFO] Connecting to zk-3.zk:2181
This is a ZooKeeper connection error. Check the Exhibitor logs with journalctl -u dcos-exhibitor -b
- Check detect_ip
[dcosadmin@dcotest03 ~]$ cat /opt/mesosphere/bin/detect_ip
#!/usr/bin/env bash
set -o nounset -o errexit
export PATH=/usr/sbin:/usr/bin:$PATH
echo $(ip addr show ens160 | grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | head -1)
[dcosadmin@dcotest03 ~]$
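The detect_ip script above hard-codes the interface name ens160 and keeps the first IPv4-looking string from the ip output. To see what that pattern would pick on a given node, the extraction step can be isolated as below. This is a sketch; the function name first_ipv4 is made up, and the pattern is copied from the script.

```shell
#!/bin/sh
# Sketch: print the first IPv4-looking string found in `ip addr show <iface>`
# output read from stdin, using the same pattern as detect_ip.
first_ipv4() {
    grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | head -1
}
```

A possible use: ip addr show ens160 | first_ipv4. If the node's interface has a different name, the result will most likely be empty, so check the interface name before anything else.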
NTP Error
[dcosadmin@dcotest05 ~]$ ntptime
ntp_gettime() returns code 5 (ERROR)
time dcaa4b8d.a65aa000 Tue, Apr 25 2017 16:26:21.649, (.649820),
maximum error 16000000 us, estimated error 16000000 us, TAI offset 0
ntp_adjtime() returns code 5 (ERROR)
modes 0x0 (),
offset 0.000 us, frequency 0.000 ppm, interval 1 s,
maximum error 16000000 us, estimated error 16000000 us,
status 0x40 (UNSYNC),
time constant 2, precision 1.000 us, tolerance 500 ppm,
[dcosadmin@dcotest05 ~]$ service ntpd restart
The best fix is to replace the server list in /etc/ntp.conf with the servers recommended in the NTP Servers reference, which says: "In most cases it's best to use pool.ntp.org to find an NTP server (or 0.pool.ntp.org, 1.pool.ntp.org, etc. if you need multiple server names). The system will try finding the closest available servers for you."
[dcosadmin@dcotest05 ~]$ cat /etc/ntp.conf
......
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
server 3.pool.ntp.org iburst
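To confirm which servers a node is actually configured with, the active server lines can be listed without opening the file in an editor. The sketch below assumes the plain "server <host> [options]" layout shown above; the function name list_ntp_servers is illustrative.

```shell
#!/bin/sh
# Sketch: print the hosts named on active "server" lines of an ntp.conf-style
# file. Commented-out lines are skipped because their first field is not
# exactly "server". Defaults to /etc/ntp.conf when no path is given.
list_ntp_servers() {
    awk '$1 == "server" { print $2 }' "${1:-/etc/ntp.conf}"
}
```

Running list_ntp_servers on each node and comparing the output is a quick way to spot a node that still has an old server list.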
If the time is still not synchronized, run the command below. It changes the "NTP enabled" value from no to yes.
[dcosadmin@dcotest05 ~]$ timedatectl set-ntp true
[dcosadmin@dcotest05 ~]$ timedatectl
Local time: Wed 2017-04-26 10:51:46 MDT
Universal time: Wed 2017-04-26 16:51:46 UTC
RTC time: Wed 2017-04-26 16:51:45
Time zone: America/Edmonton (MDT, -0600)
NTP enabled: yes
NTP synchronized: yes
RTC in local TZ: no
DST active: yes
Last DST change: DST began at
Sun 2017-03-12 01:59:59 MST
Sun 2017-03-12 03:00:00 MDT
Next DST change: DST ends (the clock jumps one hour backwards) at
Sun 2017-11-05 01:59:59 MDT
Sun 2017-11-05 01:00:00 MST
[dcosadmin@dcotest05 ~]$
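The "NTP synchronized" line in the output above is the one to watch. A minimal sketch for checking it from a script follows. It matches the label exactly as printed above; some newer systemd releases word this line differently, so the pattern may need adjusting. The function name ntp_synchronized is an assumption for illustration.

```shell
#!/bin/sh
# Sketch: read `timedatectl` output on stdin and succeed only when it
# contains the line "NTP synchronized: yes" (leading spaces allowed).
ntp_synchronized() {
    grep -Eq '^[[:space:]]*NTP synchronized:[[:space:]]*yes[[:space:]]*$'
}
```

A possible use: timedatectl | ntp_synchronized && echo "clock is synchronized"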