Connectivity Troubleshooting

When Kasm is deployed at scale in enterprise settings, with advanced L7 firewalls, proxies, and other security devices in the path, the possible combinations of devices and configurations are nearly endless. This guide helps engineers find problems in their environment that may keep Kasm from working properly.

Note

When troubleshooting, always test by creating a new session after making changes; a resumed session may not have the changes applied.

This section assumes that users can reach the Kasm Workspaces application and log in, and that when they create a session, Kasm provisions a container or VM for them. The user then typically sees the connecting screen below, which may loop. The screenshot is shown with Chrome Developer Tools open.

Screenshot: Session hangs at requesting Kasm

Browser Extensions

As a first step, close any open Incognito windows, then open a new one. This ensures that browser extensions are disabled and that cookie collisions do not occur. Use that Incognito window for all further testing in the following sections. If the problem goes away immediately in the Incognito window, try a normal window with all browser extensions disabled, then re-enable them one at a time until the issue reappears. If a session fails to load in a normal window with all extensions disabled but loads in an Incognito window, the problem is likely cookie related. Follow the troubleshooting steps in the Validate Session Cookies, Cookie Conflict, and Cookie Transit sections.

Validate Session Port

Below is an expanded view of the same example with the URL column widened. In it, requests to the KasmVNC session go to port 8443, while the Kasm web application appears to be on port 443. This is likely a misconfiguration. Many organizations run Kasm Workspaces on a high port internally and expose it to the internet through a proxy on port 443. This is very common in the DoD and federal sector, where DISA STIGs disallow privileged low port numbers on internal servers.

Screenshot: DevTools showing the KasmVNC session on port 8443 while the web app is on port 443

Typically, Kasm Workspaces is installed internally on a high port. Accessing Kasm directly on that port works fine, but when Kasm is accessed externally through a proxy on port 443, sessions fail to load. The relevant setting is the Zone's Proxy Port, which is relative to the client. For example, if Kasm runs internally on 8443 but is proxied by an F5 on port 443, set the Zone's Proxy Port to 443. After changing the setting, destroy any existing sessions; newly created sessions will use the new port.
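You can also compare the two ports with a script instead of reading them off the screen. The hypothetical shell functions below take two URLs copied from DevTools: the web app URL and the KasmVNC session URL (use Copy > Copy URL in the Network tab's context menu). They extract the effective port from each and report any mismatch. This is a minimal sketch. It assumes ordinary http/https/ws/wss URLs and treats a missing port as the scheme default. It does not handle IPv6 literal hosts.

```shell
# Hypothetical helper: print the port a URL actually uses.
# A URL without an explicit port uses the scheme default (443 for https/wss,
# 80 otherwise). IPv6 literal hosts such as [::1] are not handled.
effective_port() {
  local url="$1" scheme rest hostport
  scheme=${url%%://*}
  rest=${url#*://}
  hostport=${rest%%/*}      # drop the path
  hostport=${hostport%%\?*} # drop any query string
  case $hostport in
    *:*) printf '%s\n' "${hostport##*:}" ;;
    *)
      case $scheme in
        https|wss) echo 443 ;;
        *) echo 80 ;;
      esac
      ;;
  esac
}

# Hypothetical helper: compare the web app URL ($1) with the session URL ($2).
# Returns 1 when the ports differ.
compare_ports() {
  local app_port session_port
  app_port=$(effective_port "$1")
  session_port=$(effective_port "$2")
  if [ "$app_port" = "$session_port" ]; then
    echo "match: both use port $app_port"
  else
    echo "mismatch: web app on $app_port, session on $session_port (check the Zone Proxy Port)"
    return 1
  fi
}

# Example:
#   compare_ports 'https://kasm.example.com/' \
#     'https://kasm.example.com:8443/desktop/<session id>/vnc/vnc.html'
```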

Validate Session Cookies

While standard API calls use tokens in the JSON body of the request, requests to KasmVNC use two cookies that authorize the connection. The cookies are validated at each hop in the path to the user's container. KasmVNC itself does not check the cookies. Instead, the last NGINX server in the path injects an HTTP Authorization header with a unique token, and the client never has access to or knowledge of this token. Later sections walk through validating this process; this section focuses on confirming that the cookies are present and that they reach the last hop.

Screenshot: DevTools Cookies sub-tab for the vnc.html request

Using the screenshot above as an example, find and select the request that loads vnc.html. On the request's Cookies sub-tab, make sure the show filtered out request cookies checkbox is unchecked. Kasm requires the username and session_token cookies; ensure there is exactly one of each.

If the username or session_token cookie is missing, check show filtered out request cookies to see whether the cookie exists but is being blocked by your browser. To troubleshoot this, see the Browser Blocking Cookies subsection.

If there are multiple username or multiple session_token cookies, see the Cookie Conflict section.

If both cookies are present and there is only one of each, continue on to the Cookie Transit subsection.
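The same check can be scripted against the raw Cookie request header. DevTools shows this header under Headers > Request Headers for the vnc.html request. The functions below are a hypothetical sketch and are not part of Kasm. They count how many times each required cookie name appears in the header. The Cookie header contains only the cookies the browser actually sent, so a cookie hidden by the filter will show as missing here.

```shell
# Hypothetical helper: count occurrences of cookie $1 in Cookie header string $2.
count_cookie() {
  printf '%s\n' "$2" | tr ';' '\n' | sed 's/^[[:space:]]*//' | grep -c "^$1=" || true
}

# Hypothetical helper: check that exactly one username and one session_token
# cookie are present. Returns 1 if either is missing or duplicated.
check_kasm_cookies() {
  local header="$1" name n rc=0
  for name in username session_token; do
    n=$(count_cookie "$name" "$header")
    if [ "$n" -eq 0 ]; then
      echo "$name: missing"
      rc=1
    elif [ "$n" -gt 1 ]; then
      echo "$name: $n copies (see Cookie Conflict)"
      rc=1
    else
      echo "$name: ok"
    fi
  done
  return $rc
}

# Example (paste the Cookie header value between the quotes):
#   check_kasm_cookies 'username=...; session_token=...'
```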

Browser Blocking Cookies

Cookies can be blocked by the browser itself, by browser extensions, or by security software. A good place to start is the Console tab in DevTools: if the browser itself is blocking a cookie, it usually logs the reason. A common reason is a Cross-Origin Resource Sharing (CORS) security mechanism. If the Console output indicates CORS issues, make sure you are running the latest release of Kasm Workspaces; version 1.14.0 improved the handling of CORS issues for common architectures.

CORS issues can come into play when Kasm Workspaces is set up with multiple Zones, each in a different region and each with its own domain name. When a user navigates to the main site, such as kasm.acme.com, all requests go to the primary region's API servers. When the user creates a session, the iframe for the KasmVNC or RDP session loads from the region-specific hostname, such as us-east.kasm.acme.com. All zone domain names must be subdomains of the primary Kasm Workspaces domain name; otherwise, you will run into CORS issues. For example, do not use kasm.acme.com as the primary application domain and us-east.acme.com as a zone domain.
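A quick way to check a planned naming scheme is a suffix test on the hostnames. The function below is a hypothetical sketch. It returns success only when the zone hostname ends in a dot followed by the primary domain. Names are compared case-insensitively, and a trailing dot is ignored.

```shell
# Hypothetical helper: succeed if zone hostname $1 is a subdomain of primary domain $2.
is_zone_subdomain() {
  local zone primary
  zone=$(printf '%s' "${1%.}" | tr '[:upper:]' '[:lower:]')
  primary=$(printf '%s' "${2%.}" | tr '[:upper:]' '[:lower:]')
  case $zone in
    *."$primary") return 0 ;;
    *) return 1 ;;
  esac
}

# Examples using the hostnames from the text:
#   is_zone_subdomain us-east.kasm.acme.com kasm.acme.com && echo ok
#   is_zone_subdomain us-east.acme.com kasm.acme.com || echo "CORS risk"
```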

Validate Session Authorization

A user's request to their session container first traverses the NGINX container on a WebApp server. It then travels to the agent server hosting the session, traverses the NGINX container there, and finally reaches the user's container. On both servers, the NGINX container makes an API call to the kasm_api container. On the WebApp server, kasm_api runs on the same server and in the same Docker network as NGINX. The WebApp server's NGINX container calls /api/kasm_connect to retrieve the details of where to forward the request. In a distributed architecture, the agent can be anywhere and can be privately addressed. Neither the client nor the user's HTTP request contains the agent's IP address or hostname, so NGINX calls the kasm_connect API to retrieve that information. Run the following command to check that the API container receives the request and returns a 202 status code.

sudo docker logs -f --tail 10 kasm_api 2>&1 | grep /api/kasm_connect
2023-11-17 18:31:01,529 [INFO] cherrypy.access.140087972390848: 172.18.0.9 - - [17/Nov/2023:18:31:01] "GET /api/kasm_connect/ HTTP/1.0" 202 - "https://kasm.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"

In the example above, the /api/kasm_connect call was received and returned status code 202, indicating that it should have succeeded. Next, ensure that the target agent is reachable from each API server. NGINX receives the same IP address or hostname that is shown in the Kasm Admin UI under Infrastructure -> Docker Agents. Use that address in the curl command below to confirm that the API server can reach the agent in question.

curl -k https://<agent_ip>:8443/agent/__healthcheck
{"ok": true}
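Reading the log by eye works for a single attempt. When testing repeatedly, a small filter can tally status codes per endpoint instead. The function below is a hypothetical sketch. It assumes that the request line is the first double-quoted field and that the status code follows it, as in the sample kasm_api log line above. The same filter can be used for /api/internal_auth on the WebApp servers.

```shell
# Hypothetical helper: tally HTTP status codes for requests whose request
# line (first double-quoted field) contains the path given in $1.
# Reads access log lines on stdin and prints "<status> <count>" lines.
status_counts() {
  awk -F'"' -v ep="$1" '
    index($2, ep) { split($3, f, " "); n[f[1]]++ }
    END { for (s in n) print s, n[s] }' | sort
}

# Example (no -f, so the command finishes and the totals are printed):
#   sudo docker logs --tail 1000 kasm_api 2>&1 | status_counts /api/kasm_connect
```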

The agent server sends API calls to the Kasm WebApp servers to authorize incoming requests, which can cause issues in complex environments. First, check your deployment's configuration: in the Kasm Admin UI, navigate to Infrastructure -> Zones and edit the applicable Zone. Among the Zone settings is the Upstream Auth Address. For a multi-server deployment, the default value is $request_host$, a Kasm variable that is replaced at runtime with the hostname of the incoming request. Consider the following example.

Client -> Load Balancer (kasm.example.com) -> Layer 7 Firewall -> Kasm WebApp Servers -> Agent

A client connects to Kasm at https://kasm.example.com. The load balancer forwards the request to one of four Kasm WebApp servers, which forwards it to the agent hosting the user's container. When the agent receives the request, it needs to authorize it. With default Zone settings on a multi-server deployment, the agent sends that API call to https://kasm.example.com/api/internal_auth. This default works for many deployments. In this example, however, kasm.example.com does not point directly to the WebApp servers; it points to a public load balancer in a DMZ elsewhere in the enterprise. The Kasm agents may not be able to send API calls there, or the calls might pass through a forward proxy with SSL inspection. To validate whether your agent can reach the API servers with this default setting, run the following command on the agent.

# success
curl -k https://kasm.example.com/api/__healthcheck
{"ok": true}

# name resolution fails
curl -k https://kasm.example.com/api/__healthcheck
curl: (6) Could not resolve host: kasm.example.com

# connection fails (no route to host)
curl -k https://kasm.example.com/api/__healthcheck
curl: (7) Failed to connect to 10.0.0.251 port 443 after 3074 ms: No route to host

If you do not get the JSON response shown in the first example above, your agent likely cannot reach the WebApp servers through the domain name your clients use to access Kasm. For enterprise-grade deployments, a better architecture is an internal load balancer with its own hostname. Set the Zone's Upstream Auth Address to that hostname, and ensure you can curl the API health check through the internal load balancer from the agents. Another approach is an internal DNS name that resolves to all four WebApp servers, with the Zone's Upstream Auth Address set to that internal hostname. After changing this setting, delete any existing sessions; newly created sessions will use the new setting.
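To run the health check from several agents and get a labeled result, you can map curl's exit code to the failure modes shown above. The function below is a hypothetical sketch. The exit codes it names are standard curl exit codes: 6 (could not resolve host), 7 (failed to connect), 28 (timeout), and 35 (TLS handshake error). Any other code is reported as-is.

```shell
# Hypothetical helper: classify the result of an API health check.
# $1 is curl's exit code, $2 is the response body.
classify_healthcheck() {
  case $1 in
    0)
      if printf '%s' "$2" | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'; then
        echo "ok"
      else
        echo "reachable, but unexpected response (proxy or SSL inspection in the path?)"
        return 1
      fi
      ;;
    6) echo "name resolution failed"; return 1 ;;
    7) echo "connection failed (no route, refused, or filtered)"; return 1 ;;
    28) echo "timed out"; return 1 ;;
    35) echo "TLS handshake failed"; return 1 ;;
    *) echo "curl exited with code $1"; return 1 ;;
  esac
}

# Example, run on each agent (replace the hostname with your Upstream Auth Address):
#   body=$(curl -sk --max-time 10 https://kasm.example.com/api/__healthcheck); rc=$?
#   classify_healthcheck "$rc" "$body"
```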

Finally, confirm that an API server actually received the internal_auth request and check how it responded. Run the following command on each WebApp server to search the API container logs for internal_auth requests.

sudo docker logs -f kasm_api 2>&1 | grep internal_auth
2023-11-15 18:43:18,076 [INFO] cherrypy.access.140168522947744: 172.18.0.9 - - [15/Nov/2023:18:43:18] "GET /api/internal_auth/ HTTP/1.1" 202 - "https://kasm.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"

The example log above shows a 202 response, indicating that the request was not only received but also authorized.

WebSockets

The desktop session stream itself travels over a WebSocket connection. WebSocket connections are handled differently from ordinary HTTP requests. They may be blocked or disrupted by security software on the client system or by security devices between the client and the Kasm WebApp servers. First, make sure the client sends the WebSocket request in the first place. Open DevTools in the browser, go to the Network tab, and attempt to connect to a session. Click the WS filter, shown in the screenshot below, to show only WebSocket connections, and look for the websockify request.

Screenshot: DevTools WebSocket (Network tab with the WS filter selected)

Also check the Console tab in DevTools for errors.

Screenshot: WebSocket error in the DevTools Console

Next, determine whether the websockify request reaches the target agent. The quickest way is to run the following command on the agent.

sudo docker logs -f kasm_proxy 2>&1 | grep '/vnc/websockify ' | grep -v -P '(internal_auth|kasm_connect)'
123.123.123.123 - - [15/Nov/2023:18:50:20 +0000] "GET /desktop/72248a05-922d-4518-b92f-7a9d1ea529eb/vnc/websockify HTTP/1.1" 101 3104787 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" "-"

In the example above, the request was received and the response code was 101 (Switching Protocols), which is the expected result. If you see this request arrive on the agent, the WebSocket has traversed the entire stack to the agent. In that case, proceed to the next section.

If the command above produces no output on the agent, something in your stack in front of Kasm is interfering with the WebSocket connection.
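When several sessions are being tested, it can help to list every websockify request the agent's proxy received, along with its status. The function below is a hypothetical sketch. It assumes the NGINX log format shown above and a path of the form /desktop/<session id>/vnc/websockify, as in the sample line. Any status other than 101 means the connection was not upgraded to a WebSocket.

```shell
# Hypothetical helper: summarize websockify requests from kasm_proxy access logs.
# Prints "<session id> <status> <verdict>" for each matching request on stdin.
websockify_status() {
  awk -F'"' '$2 ~ /\/vnc\/websockify / {
    split($2, req, " ")     # method, path, protocol
    split(req[2], p, "/")   # "", "desktop", session id, "vnc", "websockify"
    split($3, f, " ")       # status code, bytes sent
    verdict = (f[1] == "101") ? "ok" : "not-upgraded"
    print p[3], f[1], verdict
  }'
}

# Example, on the agent:
#   sudo docker logs --tail 1000 kasm_proxy 2>&1 | websockify_status
```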

KasmVNC Troubleshooting

If you have verified all of the above, the next step is to troubleshoot KasmVNC. First, enable debug logging in the user's session container.

  1. In the Kasm Admin UI, navigate to Workspaces -> Workspaces

  2. Find the target Workspace in the list and click the Edit button.

  3. Scroll down to the Docker Run Config Override field and paste in the following. { "environment": { "KASM_DEBUG": 1 } }

  4. Launch a new session

Now SSH to the agent where the session was provisioned and run the following commands to get a shell inside the container.

# Get a list of running containers and identify your session container, the name of the container contains your partial username and session ID.
sudo docker ps
CONTAINER ID   IMAGE                                                               COMMAND                  CREATED         STATUS                PORTS                                           NAMES
125348fe9990   kasmweb/core-ubuntu-jammy-private:feature_KASM-5078_multi-monitor   "/dockerstartup/kasm…"   3 minutes ago   Up 3 minutes          4901/tcp, 5901/tcp, 6901/tcp                    mattkasm.loc_4f8ada7f

# Get a shell inside of the container
sudo docker exec -it 125348fe9990 /bin/bash
default:~$ 

# Tail the KasmVNC logs
default:~$ tail -f .vnc/125348fe9990\:1.log

With the log tail running, try to connect to the session. You will see a number of normal HTTP requests loading static page resources such as vnc.html, JavaScript, and stylesheets. They look like the following:

 2023-11-17 14:51:07,525 [DEBUG] websocket 141: BasicAuth matched
 2023-11-17 14:51:07,525 [INFO] websocket 141: /websockify request failed websocket checks, not a GET request, or missing Upgrade header
 2023-11-17 14:51:07,525 [DEBUG] websocket 141: Invalid WS request, maybe a HTTP one
 2023-11-17 14:51:07,525 [DEBUG] websocket 141: Requested file '/app/sounds/bell.oga'
 2023-11-17 14:51:07,525 [INFO] websocket 141: 172.18.0.9 71.62.47.171 kasm_user "GET /app/sounds/bell.oga HTTP/1.1" 200 8701
 2023-11-17 14:51:07,525 [DEBUG] websocket 141: No connection after handshake
 2023-11-17 14:51:07,525 [DEBUG] websocket 141: handler exit

Each line begins with the date and time; the number after the comma is milliseconds. The number after websocket, following the log level, is the HTTP request ID. Group the lines by request ID so that you have all logs for a specific request. In the example above, all lines belong to request 141, the request for /app/sounds/bell.oga. The message “/websockify request failed websocket checks, not a GET request, or missing Upgrade header” is misleading and is not, in and of itself, an issue. It merely means that the incoming request was not a WebSocket request; KasmVNC originally had only a WebSocket server and did not handle other types of web requests. The message is relevant only if the requested file was /websockify. If you see it for a /websockify request, KasmVNC was unable to identify the request as a valid WebSocket connection.
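Grouping by request ID by hand is tedious in a busy log. The function below is a hypothetical sketch that assumes the log line format shown in these examples. For each request ID, it prints the requested path, taken from the Requested file line or the access line, or - if neither was logged. It also shows how KasmVNC treated the request: websocket if it logged “using protocol”, or http if it logged the failed websocket checks message. A /websockify request reported as http is the case that matters.

```shell
# Hypothetical helper: summarize KasmVNC debug log lines on stdin, such as
#   2023-11-17 14:51:07,525 [DEBUG] websocket 141: Requested file '/app/sounds/bell.oga'
# Prints "<request id> <path or -> <websocket|http|unknown>" in order of first appearance.
summarize_kasmvnc_requests() {
  awk -v q="'" '
    $4 == "websocket" && $5 ~ /^[0-9]+:$/ {
      id = substr($5, 1, length($5) - 1)
      if (!(id in seen)) { seen[id] = 1; order[++n] = id }
      p = index($0, "Requested file " q)
      if (p && !(id in file)) {
        rest = substr($0, p + 16)           # text after the opening quote
        file[id] = substr(rest, 1, index(rest, q) - 1)
      }
      p = index($0, "\"")                   # access line: "GET <path> HTTP/1.1"
      if (p && !(id in file)) { split(substr($0, p + 1), r, " "); file[id] = r[2] }
      if (index($0, "using protocol")) proto[id] = 1
      if (index($0, "failed websocket checks")) failed[id] = 1
    }
    END {
      for (i = 1; i <= n; i++) {
        id = order[i]
        kind = (id in proto) ? "websocket" : ((id in failed) ? "http" : "unknown")
        print id, ((id in file) ? file[id] : "-"), kind
      }
    }'
}

# Example, in the container shell opened above:
#   summarize_kasmvnc_requests < .vnc/125348fe9990:1.log
```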

Below is an example of what you should see for the WebSocket connection. Note the message “using protocol HyBi/IETF 6455 13”, which indicates that KasmVNC correctly identified the WebSocket specification used by the browser (RFC 6455, protocol version 13).

 2023-11-17 14:51:07,526 [DEBUG] websocket 142: using SSL socket
 2023-11-17 14:51:07,526 [DEBUG] websocket 142: X-Forwarded-For ip '71.62.47.171'
 2023-11-17 14:51:07,529 [DEBUG] websocket 142: BasicAuth matched
 2023-11-17 14:51:07,529 [DEBUG] websocket 142: using protocol HyBi/IETF 6455 13
 2023-11-17 14:51:07,529 [DEBUG] websocket 142: connecting to VNC target
 2023-11-17 14:51:07,529 [DEBUG] XserverDesktop: new client, sock 32

In some cases, the problem may be a bug in KasmVNC. In the recent past, KasmVNC could crash when cookies were large or when the WebSocket connection did not exactly follow the specification. These issues have since been fixed, but the combinations of devices and services that sit between users and Kasm are endless. Some security devices or services manipulate HTTP requests in ways that take them out of compliance with the specification, which can cause KasmVNC to handle them improperly. The following is an example of a crash in the KasmVNC logs.

(EE) 
(EE) Backtrace:
(EE) 0: /usr/bin/Xvnc (xorg_backtrace+0x4d) [0x5e48dd]
(EE) 1: /usr/bin/Xvnc (0x400000+0x1e8259) [0x5e8259]
(EE) 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f5a57ef6000+0x12980) [0x7f5a57f08980]
(EE) 3: /lib/x86_64-linux-gnu/libc.so.6 (epoll_wait+0x57) [0x7f5a552eca47]
(EE) 4: /usr/bin/Xvnc (ospoll_wait+0x37) [0x5e8d07]
(EE) 5: /usr/bin/Xvnc (WaitForSomething+0x1c3) [0x5e2813]
(EE) 6: /usr/bin/Xvnc (Dispatch+0xa7) [0x597007]
(EE) 7: /usr/bin/Xvnc (dix_main+0x36e) [0x59b1fe]
(EE) 8: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xe7) [0x7f5a551ecbf7]
(EE) 9: /usr/bin/Xvnc (_start+0x2a) [0x46048a]
(EE) 
(EE) Received signal 11 sent by process 17182, uid 0
(EE) 
Fatal server error:
(EE) Caught signal 11 (Segmentation fault). Server aborting
(EE) 

KasmVNC will be restarted automatically by the container’s entrypoint script, so you may see this repeat. Copy the backtrace output and provide it to Kasm support, along with the output of the following command.

sudo docker exec -it 125348fe9990 Xvnc -version

Xvnc KasmVNC 1.2.0.e4a5004f4b89b9da78c9b5f5aee59c08c662ccec - built Oct 31 2023 11:22:56
Copyright (C) 1999-2018 KasmVNC Team and many others (see README.me)
See http://kasmweb.com for information on KasmVNC.
Underlying X server release 12008000, The X.Org Foundation

With this information, Kasm support should be able to symbolize the backtrace and potentially identify the issue.
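Because KasmVNC is restarted after each crash, the log can contain several backtraces. The function below is a hypothetical sketch. It prints each block from the “(EE) Backtrace:” line through the “Server aborting” line, separated by ----, so the output can be saved and attached to a support request.

```shell
# Hypothetical helper: print each Xvnc crash block from a KasmVNC log on stdin,
# from "(EE) Backtrace:" through the "Server aborting" line, separated by "----".
extract_xvnc_crashes() {
  awk '
    /\(EE\) Backtrace:/ { inside = 1; if (count++) print "----" }
    inside { print }
    /Server aborting/ { inside = 0 }'
}

# Example, from the agent host:
#   sudo docker exec 125348fe9990 cat '.vnc/125348fe9990:1.log' | extract_xvnc_crashes > xvnc-backtraces.txt
```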

Server Configuration Issues

The following subsections cover configuration issues on individual servers. These issues lie at the host level: not with Kasm itself, but with the configuration of the host operating system or its dependencies.

Confirm Local Connectivity

If individual Kasm service containers are started, stopped, or restarted during troubleshooting, the Kasm proxy container may lose local hostname resolution for the other containers. First, stop and start the Kasm services to refresh hostname resolution and ensure that all containers start in the proper order.

sudo /opt/kasm/bin/stop
sudo /opt/kasm/bin/start

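Next, confirm that all services are up, running, and healthy. The following output, from a single-server deployment, shows all services in that state. Containers that show Up without a (healthy) label, such as kasm_proxy and kasm_api, have no Docker health check defined but are running.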

sudo docker ps -a
CONTAINER ID   IMAGE                       COMMAND                  CREATED      STATUS                PORTS                                           NAMES
401963ee87a0   kasmweb/nginx:1.25.1        "/docker-entrypoint.…"   6 days ago   Up 6 days             80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   kasm_proxy
0eb604899140   kasmweb/agent:1.14.0        "/bin/sh -c '/usr/bi…"   6 days ago   Up 6 days (healthy)   4444/tcp                                        kasm_agent
140660c3c201   kasmweb/manager:1.14.0      "/bin/sh -c '/usr/bi…"   6 days ago   Up 6 days (healthy)   8181/tcp                                        kasm_manager
1ed22b860c6b   kasmweb/kasm-guac:1.14.0    "/dockerentrypoint.sh"   6 days ago   Up 6 days (healthy)                                                   kasm_guac
45fceb6bff08   kasmweb/share:1.14.0        "/bin/sh -c '/usr/bi…"   6 days ago   Up 6 days (healthy)   8182/tcp                                        kasm_share
5759e5692a85   kasmweb/api:1.14.0          "/bin/sh -c 'python3…"   6 days ago   Up 6 days             8080/tcp                                        kasm_api
482059a66347   redis:5-alpine              "docker-entrypoint.s…"   7 days ago   Up 7 days             6379/tcp                                        kasm_redis
670da792ed27   postgres:12-alpine          "docker-entrypoint.s…"   7 days ago   Up 7 days (healthy)   5432/tcp                                        kasm_db

For a WebApp server on a multi-server deployment, the output should look like the following.

sudo docker ps -a
CONTAINER ID   IMAGE                       COMMAND                  CREATED      STATUS                PORTS                                           NAMES
401963ee87a0   kasmweb/nginx:1.25.1        "/docker-entrypoint.…"   6 days ago   Up 6 days             80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   kasm_proxy
140660c3c201   kasmweb/manager:1.14.0      "/bin/sh -c '/usr/bi…"   6 days ago   Up 6 days (healthy)   8181/tcp                                        kasm_manager
45fceb6bff08   kasmweb/share:1.14.0        "/bin/sh -c '/usr/bi…"   6 days ago   Up 6 days (healthy)   8182/tcp                                        kasm_share
5759e5692a85   kasmweb/api:1.14.0          "/bin/sh -c 'python3…"   6 days ago   Up 6 days             8080/tcp                                        kasm_api

For an Agent server on a multi-server deployment, the output should look like the following.

sudo docker ps -a
CONTAINER ID   IMAGE                       COMMAND                  CREATED      STATUS                PORTS                                           NAMES
401963ee87a0   kasmweb/nginx:1.25.1        "/docker-entrypoint.…"   6 days ago   Up 6 days             80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   kasm_proxy
0eb604899140   kasmweb/agent:1.14.0        "/bin/sh -c '/usr/bi…"   6 days ago   Up 6 days (healthy)   4444/tcp                                        kasm_agent
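The listings above can also be checked automatically. The function below is a hypothetical sketch. It reads docker ps output formatted as name and status and compares it with the container names shown above for each role (single, webapp, agent). It flags containers that are missing, not running, unhealthy, or still starting. A container shown as Up without (healthy) is treated as running. The expected lists come from the 1.14.0 examples above; any additional containers in other releases are ignored.

```shell
# Hypothetical helper: compare containers with the Kasm 1.14.0 examples above.
# stdin: output of  docker ps -a --format '{{.Names}} {{.Status}}'
# $1: role, one of single, webapp, agent
check_kasm_containers() {
  local role="$1" expected name state rc=0 listing
  case $role in
    single) expected="kasm_proxy kasm_agent kasm_manager kasm_guac kasm_share kasm_api kasm_redis kasm_db" ;;
    webapp) expected="kasm_proxy kasm_manager kasm_share kasm_api" ;;
    agent) expected="kasm_proxy kasm_agent" ;;
    *) echo "unknown role: $role" >&2; return 2 ;;
  esac
  listing=$(cat)
  for name in $expected; do
    # Status text for this container, e.g. "Up 6 days (healthy)"
    state=$(printf '%s\n' "$listing" | awk -v n="$name" '$1 == n { $1 = ""; sub(/^ /, ""); print; exit }')
    case $state in
      "") echo "$name: missing"; rc=1 ;;
      Up*unhealthy*) echo "$name: unhealthy ($state)"; rc=1 ;;
      Up*"health: starting"*) echo "$name: still starting ($state)"; rc=1 ;;
      Up*) echo "$name: ok ($state)" ;;
      *) echo "$name: not running ($state)"; rc=1 ;;
    esac
  done
  return $rc
}

# Example, on a WebApp server:
#   sudo docker ps -a --format '{{.Names}} {{.Status}}' | check_kasm_containers webapp
```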

Note the port mapping of the kasm_proxy container. The examples above show 0.0.0.0:443->443/tcp, meaning the host's port 443 is mapped to the container's port 443, so Kasm was installed on port 443. Ensure this matches your expectation.

Next, ensure that there is a program listening on the target port.

ss -ltn
State               Recv-Q              Send-Q                            Local Address:Port                              Peer Address:Port              Process              
LISTEN              0                   4096                                    0.0.0.0:111                                    0.0.0.0:*                                      
LISTEN              0                   4096                              127.0.0.53%lo:53                                     0.0.0.0:*                                      
LISTEN              0                   128                                     0.0.0.0:22                                     0.0.0.0:*                                      
LISTEN              0                   4096                                    0.0.0.0:443                                    0.0.0.0:*                                      
LISTEN              0                   511                                   127.0.0.1:35521                                  0.0.0.0:*                                      
LISTEN              0                   511                                     0.0.0.0:9001                                   0.0.0.0:*                                      
LISTEN              0                   4096                                       [::]:111                                       [::]:*                                      
LISTEN              0                   128                                        [::]:22                                        [::]:*                                      
LISTEN              0                   4096                                       [::]:443                                       [::]:*

The output above shows that the server is listening on port 443 on both IPv4 and IPv6.
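To check a specific port without scanning the whole table, you can filter the ss output on the Local Address:Port column. The function below is a hypothetical sketch. It prints every listening address whose port matches and returns non-zero if nothing is listening on that port.

```shell
# Hypothetical helper: list LISTEN sockets on port $1 from `ss -ltn` output on stdin.
# Exits non-zero if no socket is listening on that port.
listeners_on_port() {
  awk -v p="$1" '
    $1 == "LISTEN" {
      n = split($4, a, ":")   # the port is the text after the last colon
      if (a[n] == p) { print $4; found = 1 }
    }
    END { exit (found ? 0 : 1) }'
}

# Example:
#   ss -ltn | listeners_on_port 443 || echo "nothing is listening on 443"
```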

Next, get the local IP address of the user facing network interface.

ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 02:00:17:0f:8c:19 brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    inet 10.0.0.106/24 metric 100 brd 10.0.0.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::17ff:fe0f:8c19/64 scope link 
       valid_lft forever preferred_lft forever

Ignore the loopback interface, docker0, bridge interfaces, and any interfaces without an IP address. In the example above, the IP address is 10.0.0.106. Use curl to request the API health check from that address over HTTPS on the expected port.

# Correct Output
curl -k https://10.0.0.106:443/api/__healthcheck
{"ok": true}

# Failure
curl -k https://10.0.0.106:443/api/__healthcheck
curl: (7) Failed to connect to 10.0.0.106 port 443 after 0 ms: Connection refused

If the command hangs, or immediately returns a Connection refused message as in the failure case above, a firewall is likely running on the local system; see the next section.
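On hosts with many interfaces, you can list the candidate addresses automatically. The function below is a hypothetical sketch. It reads ip addr output and prints the interface name and IPv4 address for each global-scope address. It skips docker0 and the br-* and veth* interfaces that Docker creates. Each printed address can then be tested with the health check curl command.

```shell
# Hypothetical helper: print "<interface> <IPv4 address>" for user-facing addresses.
# Reads `ip addr` output on stdin; loopback (scope host) and Docker interfaces are skipped.
user_facing_ipv4() {
  awk '$1 == "inet" && / scope global / && $NF !~ /^(docker0|br-|veth)/ {
    split($2, a, "/")   # strip the /prefix length
    print $NF, a[1]
  }'
}

# Example: run the health check against each address on the expected port.
#   ip -4 addr show | user_facing_ipv4 | while read -r ifname addr; do
#     echo "$ifname:"; curl -k "https://$addr:443/api/__healthcheck"; echo
#   done
```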

Host Based Firewalls

Host-based firewalls, such as McAfee HBSS or even UFW (Ubuntu's default firewall tool), can interfere with communications. Docker manages its own iptables rules, and other firewalls and security products either apply additional rules or use iptables themselves. This can leave the iptables rules in a broken state.

If UFW is installed, run the following commands to check its status and allow HTTPS on port 443. See the UFW documentation for allowing alternative ports and for additional usage instructions.

sudo ufw status

sudo ufw allow https
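Before clearing any rules, it is prudent to save the current rule set with iptables-save so it can be restored with iptables-restore. It also helps to note how many rules each table contains. The function below is a hypothetical sketch that counts the -A (append rule) lines per table in iptables-save output. Comparing the counts later can show whether other software has re-applied rules.

```shell
# Hypothetical helper: count rules per table in iptables-save output on stdin.
# Prints "<table> <number of -A rules>" in the order the tables appear.
count_rules_per_table() {
  awk '
    /^\*/ { table = substr($0, 2); if (!(table in n)) { n[table] = 0; order[++k] = table } }
    /^-A / { n[table]++ }
    END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }'
}

# Example (save before changing anything, then compare later):
#   sudo iptables-save > ~/iptables-before.rules
#   count_rules_per_table < ~/iptables-before.rules
#   sudo iptables-save | count_rules_per_table
# To roll back:
#   sudo iptables-restore < ~/iptables-before.rules
```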

The following commands completely clear all IPv4 iptables rules, including the nat, mangle, and raw tables. IPv6 rules managed by ip6tables are not affected.

Warning: this may break your system. Make sure you know what you are doing before running these commands.

# Shut down Kasm
sudo /opt/kasm/bin/stop

# Set the default policies to ACCEPT first so flushing the rules does not lock you out of SSH
sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT

# Flush all rules in the filter table
sudo iptables -F

# Delete all user-defined chains
sudo iptables -X

# Zero all packet and byte counters
sudo iptables -Z

# Flush and delete all nat, mangle, and raw table rules and chains
sudo iptables -t nat -F
sudo iptables -t nat -X
sudo iptables -t mangle -F
sudo iptables -t mangle -X
sudo iptables -t raw -F
sudo iptables -t raw -X

# Restarting Docker regenerates the iptables rules that Docker needs
sudo systemctl restart docker

# Bring Kasm back up
sudo /opt/kasm/bin/start

If this fixes your issue, the fix may only be temporary: security or configuration management software on your server may eventually re-apply the offending rules. Consult that software's documentation for remediation.