Sizing and Deployment Guide

This guide covers sizing requirements and deployment guidance for the Kasm Workspaces platform and assumes a distributed architecture with separate API, database, and agent components. For single server deployments, use the guidance provided in the agent sizing section.

Note

For all components, maximum concurrent Kasm sessions is the metric that should be used when sizing. Some use cases have a large number of total users but a small number of concurrent users, while others have more concurrent sessions than total users (more than one session per user). In all cases, the maximum concurrent session count is the metric to use when considering sizing requirements.

Web App Role

The web app server role comprises several containerized services: API, Manager, Share, and proxy. The API server is responsible for handling API calls from users and, potentially, from external services using the developer API. The manager service is responsible for managing clusters of agents and for auto scaling new agents in the cloud if configured. The web app server role is both vertically scalable (adding more CPUs and RAM) and horizontally scalable (adding more web app servers).

Redundancy

For large deployments, it is recommended to have N+1 redundancy for the web app servers. This eases maintenance, allowing administrators to bring down a server for patching without affecting users.

Sizing

Concurrent sessions are used for deployment size recommendations, so while you may be using a named user license, you still need to understand your concurrent session requirements in order to size your deployment appropriately. There are two deployment architectures, Standard and Direct-to-Agent, which have drastically different sizing requirements for the Web App Server role. The Standard deployment architecture has all traffic traverse the web app servers responsible for each respective zone. The Direct-to-Agent workflow has client API calls going to the Web App servers while the desktop rendering goes directly to the server hosting the application container. Direct-to-Agent is only applicable to containerized desktop/app sessions and does not apply to servers connected to via RDP.

Deployment        | Server Specifications        | Concurrent Sessions
Standard          | 4 CPUs, 4GB RAM, 80 GB HD    | 70 to 150
Direct-to-Agent   | 4 CPUs, 4GB RAM, 80 GB HD    | 300 to 500

The range of concurrent sessions a single server can support is highly dependent on the use case. Administrators supporting an online gaming service would be on the low end of the spectrum while more moderate uses like remote work or browser isolation could expect to be near the high end of the spectrum.

An administrator expecting to support up to 200 concurrent sessions for a remote work use case, in a Standard deployment architecture, would need 3 servers using the N+1 redundancy recommendation. With 3 API servers, the administrator could expect to support up to 450 concurrent sessions. If the administrator needs to remove one for maintenance, the deployment would still be able to support up to 300 concurrent sessions, still within the desired capacity.
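The arithmetic above can be scripted as a quick sanity check. This is a minimal sketch, assuming 150 sessions per server (the high end of the Standard range) and the N+1 recommendation; adjust the values to your own use case.

#!/bin/bash
# Estimate web app server count for a Standard deployment with N+1 redundancy.
# The sessions-per-server figure is an assumption; pick the point in the
# 70-150 range that matches your use case.
PEAK_SESSIONS=200
SESSIONS_PER_SERVER=150

# Round up to the number of servers needed to carry the peak load, then add one.
BASE=$(( (PEAK_SESSIONS + SESSIONS_PER_SERVER - 1) / SESSIONS_PER_SERVER ))
TOTAL=$(( BASE + 1 ))

echo "Servers required (N+1): ${TOTAL}"
echo "Capacity with all servers up: $(( TOTAL * SESSIONS_PER_SERVER )) sessions"
echo "Capacity with one server down: $(( (TOTAL - 1) * SESSIONS_PER_SERVER )) sessions"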

Load Balancing

Load distribution to different deployed web app servers can be accomplished via DNS load balancing or using a network load balancer such as an F5 or cloud load balancer.

DNS Load Balancing

With DNS load balancing, there is an A record for the same domain name pointing to each web app server. Health checks can be used to automatically remove a DNS entry should the corresponding web app server become unresponsive.

DNS load balancing offers a simple solution that requires no additional components between users and the backend servers. This simplifies the architecture and significantly reduces the cost of providing resilient services.

There are many services, as well as on-premises physical devices, that can provide DNS load balancing with health checks. AWS Route 53 is one example. You don’t need to host Kasm in AWS to use Route 53: Route 53 can provide public DNS services for resources on-premises or even in other cloud providers.
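As an illustration, the Route 53 approach can be scripted with the AWS CLI. This is a minimal sketch, not an authoritative procedure: the domain, hosted zone ID, IP address, and record type (multivalue answer routing) are assumptions and should be adapted to your environment.

# Create a health check that probes the Kasm API health endpoint over HTTPS.
aws route53 create-health-check \
  --caller-reference kasm-webapp-1 \
  --health-check-config '{"Type": "HTTPS", "IPAddress": "203.0.113.10", "Port": 443,
    "ResourcePath": "/api/__healthcheck", "RequestInterval": 10, "FailureThreshold": 3}'

# Attach the health check to a multivalue A record for one web app server.
# Repeat for each server with its own SetIdentifier and health check ID.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0000000000000 \
  --change-batch '{"Changes": [{"Action": "UPSERT", "ResourceRecordSet": {
    "Name": "kasm.example.com", "Type": "A", "TTL": 60,
    "SetIdentifier": "webapp-1", "MultiValueAnswer": true,
    "HealthCheckId": "<id returned by create-health-check>",
    "ResourceRecords": [{"Value": "203.0.113.10"}]}}]}'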

The disadvantage of DNS based load balancing is slow convergence on failure. Even with the most aggressive settings, 10 seconds between health checks and 3 consecutive failures, it will take at least 30 seconds to recognize the failure. At that point the DNS record is removed; however, clients will not get the update until the TTL has expired. A typical TTL in AWS is 5 minutes, though it can be set as low as 60 seconds. Therefore, the best case scenario is that, on failure, some clients will be down for at least 90 seconds.

Network Load Balancer

Another option is to use a network load balancer, either a physical on-premise load balancer such as an F5 or a cloud based load balancer. In either case, the load balancer is placed between the users and the web app servers and distributes the load between all the web app servers. Load balancers typically use health checks to determine if an upstream server is offline. These health checks can be passive, such as looking at the TCP sessions or HTTP return codes of traffic. Active health checks can also be used, which actively make HTTP calls to a specific path and expect a configured HTTP status code.

Network load balancers have the advantage of providing convergence within seconds after a failure. They can also try multiple upstream web app servers for a single request, meaning users typically wouldn’t notice a web app server going down. The disadvantage is that they complicate the architecture with additional components between the end users and the backend servers. The load balancers themselves can also go down and require maintenance, which usually means maintaining N+1 redundancy for the load balancers and using DNS load balancing to distribute traffic to them.

Network load balancers may be required for some organizations as they provide many other potential benefits:

  • Single entry point for all publicly served websites for the organization

  • Web Application Firewall (WAF)

  • SSL inspection

  • Data Loss Prevention

  • Anti-virus

  • Logging of all traffic at the enterprise level

  • Other advanced security features

Health Checks

Whether using DNS load balancing, network load balancers, or both, health checks are imperative to ensuring failover and alerting. Kasm has a built in health check located at /api/__healthcheck, which checks the operational status of the API server and the API server’s access to the database. If the API returns a status code outside the HTTP 200 series, the health check fails. AWS Route 53 can be configured with email alerts on failure, in addition to removing the associated A record. When the health check starts returning healthy, the A record will be re-enabled. For network load balancers the premise is the same: upstream servers will automatically be removed/added as their health check status changes.
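For manual verification, or to wire the same endpoint into an external monitor, the health check can be probed directly. A minimal sketch; the hostname is a placeholder and -k is only needed with self-signed certificates.

# Probe the Kasm health check endpoint and report the HTTP status code.
STATUS=$(curl -sk -o /dev/null -w '%{http_code}' https://kasm.example.com/api/__healthcheck)
if [ "${STATUS}" -ge 200 ] && [ "${STATUS}" -lt 300 ]; then
  echo "Healthy (HTTP ${STATUS})"
else
  echo "Unhealthy (HTTP ${STATUS})" >&2
  exit 1
fi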

Maintenance

It is recommended to have a scheduled maintenance window; however, Web App servers can be gracefully taken offline without affecting users. This requires removing the selected server from the load balancing mechanism. For DNS load balancing, that means removing the A record. For network load balancers, that means removing the selected server from the upstream list of servers and applying the configuration change to each load balancer. After the server has been patched and rebooted, the DNS record or upstream change can be reverted. With DNS load balancing, it can take a while for user requests to stop arriving at the server. The amount of time needed is determined by the TTL of the DNS record and potentially by DNS infrastructure at the user’s site that caches records for longer than the configured TTL. Therefore, if DNS load balancing is used, it is recommended to monitor the traffic going through the API server before starting maintenance activities.
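One simple way to confirm traffic has drained before patching is to watch established connections to the server’s HTTPS listener. A minimal sketch, run on the web app server itself after it has been removed from load balancing.

# Count established connections to the HTTPS listener every 10 seconds.
# Begin patching once this number drops to, or near, zero.
watch -n 10 "ss -Htn state established '( sport = :443 )' | wc -l"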

Database Role

By default, Kasm runs Postgres and Redis containers on the database role server.

Redis

The Redis data store is used by Kasm for chat in shared Kasm sessions. While Redis may be used for more features in the future, at the time of this writing Redis resource requirements are very minimal. Kasm utilizes the latest version of Redis 5.x and is compatible with Redis services such as AWS ElastiCache for Redis. Since Redis is not used for storage, smaller instance types such as cache.t4g.small can be used, which have 2 vCPUs and 1.37GB of RAM. The default installation of Kasm utilizes a simple single node containerized Redis deployment. Administrators can use their own Redis deployment. See the Redis documentation for guidance on sizing, redundancy, and maintenance.

PostgreSQL

Kasm utilizes the latest version of Postgres 14.x and uses a simple containerized version in the default installation. Larger deployments may choose to use an external PostgreSQL database. PostgreSQL’s baseline requirements are modest; however, we recommend at least 2 CPUs and 4 GB of RAM. These minimum requirements will handle deployments into the hundreds of concurrent users. The database server can be vertically scaled to increase performance for larger deployments.

Disk space requirements are more complicated. Kasm collects logs from all components and by default keeps debug logs for 4 hours and all other logs for 7 days. These logs are ingested into the database and used by the dashboard and logging panels of the UI. For larger installations it is highly recommended to configure Kasm to forward logs to an enterprise class SIEM solution, such as Splunk, and to set the Kasm log retention period to 0, effectively disabling database logs. With the default log retention settings, the database will use between 250 and 550MB for a single server install. Deployments with hundreds of concurrent users can easily use several tens of gigabytes of storage with default log retention settings. Smaller deployments with fewer than around 50 concurrent sessions can rely on Kasm’s built-in logging mechanisms, with a healthy database volume of around 150GB on the higher end of the spectrum. Deployments larger than this should use an external SIEM solution and set log retention to 0. If disabling the built in logging is not an option, monthly database maintenance should be performed to ensure VACUUM operations are being performed and space is being released back to the operating system.
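The checks below can help identify where that space is going and reclaim it. This is a minimal sketch; the container name (kasm_db), database (kasm), and role (kasmapp) reflect a default single server install and may differ in your deployment, and VACUUM FULL takes exclusive locks so it should only be run during a maintenance window.

# List the ten largest tables in the Kasm database.
sudo docker exec -it kasm_db psql -U kasmapp -d kasm -c \
  "SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) AS total_size
     FROM pg_catalog.pg_statio_user_tables
     ORDER BY pg_total_relation_size(relid) DESC LIMIT 10;"

# Rewrite tables and return space to the operating system (takes exclusive locks).
sudo docker exec -it kasm_db psql -U kasmapp -d kasm -c "VACUUM FULL ANALYZE;"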

Postgres Compatible Databases

Kasm supports using AWS RDS and AWS Aurora PostgreSQL databases. Other Postgres compatible solutions, such as CockroachDB, may work but are not officially supported by Kasm Technologies. Solutions such as AWS RDS provide the added benefits of ease of maintenance, scalability, redundancy/failover, and automated backups.

Redundancy and Backups

Due to the criticality of the database to Kasm operations, it is important to carefully consider database redundancy and backups. At a minimum, regular backups of the database should be performed. See the Kasm documentation on performing backups and restorations of the database. The simplest form of redundancy is automated scheduled backups using a cron job, which are transferred to a remote location (NFS, S3, etc.) and can be restored on a standby server in case of a failure of the primary database. For a solution that provides high availability, see the PostgreSQL documentation for HA deployments.
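A minimal sketch of such a scheduled backup follows. The container name, database name, role, and destination paths are assumptions based on a default install; see the Kasm backup documentation for the authoritative procedure and adapt this to your environment (for example, schedule it with cron: 0 2 * * * /usr/local/bin/kasm_db_backup.sh).

#!/bin/bash
# Nightly logical backup of the Kasm database, copied to remote storage.
set -euo pipefail

BACKUP_DIR=/srv/backups
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="${BACKUP_DIR}/kasm_db_${TIMESTAMP}.sql.gz"

mkdir -p "${BACKUP_DIR}"
docker exec kasm_db pg_dump -U kasmapp kasm | gzip > "${BACKUP_FILE}"

# Copy to a remote location (NFS mount, S3 bucket, etc.), for example:
# aws s3 cp "${BACKUP_FILE}" s3://example-kasm-backups/

# Keep only the 14 most recent local backups.
ls -1t "${BACKUP_DIR}"/kasm_db_*.sql.gz | tail -n +15 | xargs -r rm -f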

Sizing

The following sizing recommendation is for the system backing the Postgres database. The standard Kasm deployment uses a single database server; however, RDS and other Postgres compatible PaaS deployments are supported.

Server Specifications        | Concurrent Sessions
4 CPUs, 4GB RAM, 80 GB HD    | 50 to 200
4 CPUs, 8GB RAM, 100 GB HD   | 200 to 500
8 CPUs, 8GB RAM, 150 GB HD   | 500 to 1000

Agent Role

The agent servers are responsible for hosting containers for user desktops/apps. These servers have high CPU, RAM, and disk requirements that are heavily dependent on the specific use case.

Redundancy

User sessions are automatically spread across available agents. When an agent goes offline or does not respond, new sessions are directed to healthy agents within the same zone that meet the requirements needed to fulfill the requested Workspace. Existing sessions may be interrupted until the agent is brought back online.

Sizing

Concurrent sessions are used for deployment size recommendations, so while you may be using a named user license, you still need to understand your concurrent session requirements in order to size your deployment appropriately. Agents only support container based Workspace sessions, so ensure you are sizing only for expected container based Workspace sessions. The number of user sessions each agent can run depends on the Workspaces being deployed on that agent and the Agent Settings. The Agent Settings allow the administrator to override the CPU and RAM values that Kasm uses to calculate how many sessions an agent can take, which allows the administrator to oversubscribe the agents. Each Workspace definition can be configured with a different CPU count and RAM amount. The table below assumes every Workspace running on the agent is configured with identical specifications.

Workspace Specs    | Agent Size           | Agent Overrides        | Concurrent Sessions
2 CPUs, 4 GB RAM   | 16 CPUs, 64GB RAM    | N/A                    | 8
4 CPUs, 4 GB RAM   | 16 CPUs, 64GB RAM    | 96 CPUs, 80GB RAM      | 20
4 CPUs, 4 GB RAM   | 32 CPUs, 128GB RAM   | 192 CPUs, 192GB RAM    | 48

Note

For container workspaces that will have session recording enabled, one additional CPU should be assigned to the workspace to handle the session recording encoding workload.

In the first example, there is no agent override configured. For each session on the agent, 2 CPUs are subtracted from the available resources. Since the agent physically has 16 CPUs, only 8 sessions can be established. This is not optimal because the system still has RAM to spare and could potentially handle twice as many sessions if the CPU count were overridden to 32. How far to override the CPU and RAM depends on how large the agent is and what the use case is. For a game streaming use case, users will be using most of their resources at all times. For standard use cases, however, overriding is generally safe but requires the administrator to monitor the environment and understand the baseline usage. The subsequent two rows show two more overriding scenarios. The more sessions a single system can take, the more you can generally override, because there are more sessions on the system, each of which is not fully utilizing its RAM and CPU. The final example is very aggressive and is achievable depending on the use case. It is critical that actual RAM utilization on the agents is never allowed to get near full capacity. Getting the right override requires careful monitoring of your deployment in real world usage. Use the Kasm Workspaces Admin dashboard to view RAM utilization over time on each agent.
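The capacity calculation can be approximated as the smaller of the CPU based and RAM based quotients. A minimal sketch using the second table row; the values are illustrative only.

#!/bin/bash
# Approximate max sessions for a 4 CPU / 4 GB Workspace on an agent
# overridden to 96 CPUs / 80 GB RAM (second row of the table above).
WORKSPACE_CPUS=4
WORKSPACE_RAM_GB=4
OVERRIDE_CPUS=96
OVERRIDE_RAM_GB=80

BY_CPU=$(( OVERRIDE_CPUS / WORKSPACE_CPUS ))
BY_RAM=$(( OVERRIDE_RAM_GB / WORKSPACE_RAM_GB ))
MAX_SESSIONS=$(( BY_CPU < BY_RAM ? BY_CPU : BY_RAM ))

echo "Max concurrent sessions on this agent: ${MAX_SESSIONS}"   # prints 20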

Disk Space

The amount of disk space required will depend on the number of concurrent users the agent is expected to host and how much data each user is expected to accumulate during their session. When a new session is created, almost no additional space is required on the agent to provision the session. As the user adds files or uses applications like Chrome, changes to the container’s file system accumulate space. The amount of disk space that each user session consumes can vary greatly based on your use case. In addition, when session recording is enabled, each session running on the agent where the user is part of a group with session recording enabled will need additional space to store the recording while it is being encoded and uploaded to S3 buckets.

It is recommended to use a separate volume for /var/lib/docker that uses the XFS file system. The /var/lib/docker volume should have at least 80GB of disk space as a baseline plus the number of users times the amount of space expected per user (80GB + (Users * space_per_user)). The following table provides the calculations for several scenarios. If persistent profiles are enabled and use a remote file system such as NFS, the persistent profile does not reside on the agent and thus does not count when figuring out the expected size per user. You can utilize the command sudo docker system df -v to see the amount of disk space utilized by each container.

Base Size   | Concurrent Sessions   | Max Size Per User   | Total Volume Size
80GB        | 8                     | 5GB                 | 120 GB
80GB        | 20                    | 10GB                | 280 GB
80GB        | 48                    | 12GB                | 656 GB
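To set up the dedicated XFS volume recommended above, something like the following can be used. This is a sketch under assumptions: /dev/sdb is a placeholder device, and in production you should stop Docker and migrate any existing data before switching the mount. The pquota mount option is included because it is required for the per-container size limits discussed in the next section.

# Format a dedicated volume with XFS and mount it at /var/lib/docker.
sudo mkfs.xfs /dev/sdb
sudo mkdir -p /var/lib/docker
echo '/dev/sdb /var/lib/docker xfs defaults,pquota 0 0' | sudo tee -a /etc/fstab
sudo mount -a

# Verify the filesystem type and review per-container disk usage.
df -hT /var/lib/docker
sudo docker system df -v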

Limiting Disk Usage

It is possible to place size and speed restrictions on disk usage of containers. See the Docker Reference Docs for more details. If using the default overlay2 storage driver, the backing file system must be XFS for the size and speed restrictions to work. The following JSON can be placed in the Workspace definition in the Docker Run Override field. With the example in place, the user’s session would be capped at 10GB of disk space. This does not include the base desktop environment or installed applications, only the changes the user makes within their environment.

Restricting space used by a user’s session.

{"storage_opt":{"size":"10g"}}

Restricting Disk I/O with read/write bps.

{"device_read_bps":[{"Path":"/dev/vda","Rate":1000000}]}

Restricting Disk I/O with relative weights.

{"blkio_weight":200}

CPU Allocation Method

Workspaces supports provisioning session containers with one of two methods, Quotas (--cpus) or Shares (--cpu-shares).

See Docker Resource Constraints for more details on how Docker utilizes these flags.

The default method is Shares and is governed by the Global Setting Default CPU Allocation Method. The allocation method can also be updated on the Workspace configuration by changing the CPU Allocation Method in the Workspace Settings. By default, the Workspace setting is configured to Inherit, which means to use the Global Setting.

Shares

Note

CPU and Cores are used interchangeably in this section. Ultimately, what is being referenced is the number of logical processors that are presented to the system. This will vary depending on the physical processor, such as those that are multi-core or support hyper-threading. It may also vary depending on the operating environment (e.g. virtual machines / cloud environments).

When the Shares CPU Allocation Method is used, session containers are provisioned with the Docker equivalent of --cpu-shares=<workspace.cores * 1024>. For example, if the Cores setting on the Workspace is set to 2, the container would be provisioned with --cpu-shares=2048.

When a container utilizes shares, the amount of CPU resources the container can use is weighted against other containers and their share value. Most notably, the container is only throttled if there is CPU contention. If there is no contention, the container can use as much as it needs.

For example, on an 8 CPU machine, if the Cores Workspace setting is configured at 4, this will result in the container created with --cpu-shares=4096.

  • If no CPU contention exists, the container can use all 8 CPUs.

  • If there are 2 containers both with --cpu-shares=4096, each will be able to use the full CPU resources if no contention exists.

  • If contention exists, each container will be allowed up to 50% of the CPU resources because their shares (weights) are equal.
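The behavior can be observed directly with Docker outside of Kasm. A minimal sketch; the stress image and the 8 CPU host are assumptions for illustration.

# Two CPU-bound containers with equal shares: docker stats should show each
# at roughly 50% of the host while both run; stop one and the other expands.
sudo docker run -d --name share-a --cpu-shares=4096 progrium/stress --cpu 8
sudo docker run -d --name share-b --cpu-shares=4096 progrium/stress --cpu 8
sudo docker stats --no-stream share-a share-b

# Clean up.
sudo docker rm -f share-a share-b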

The Shares method is useful for maximizing the usage of CPU resources, as all containers can use as much as needed when there is no contention. For bursty workloads, this will likely result in a better overall user experience when compared to the Quotas method. However, user experience may not be as consistent depending on the CPU activity of other containers.

Quotas

When the Quotas CPU Allocation Method is used, session containers get provisioned with the Docker equivalent of --cpus=X. The value used is based on the Cores setting defined on the Workspace. This sets a ceiling for the amount of CPU resources the container can use. For example, on an 8 CPU system with the Workspace configured at 2 cores, the container will only be allowed to use up to 25% of the CPUs.

This strategy may be more helpful if the desire is to provide a more consistent performance profile. It may also be helpful if Kasm is running on systems with additional applications and utilizing all available CPU resources at times is not appropriate.
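The difference is easy to see with the same kind of test. A minimal sketch, again assuming an 8 CPU host and a generic stress image.

# A hard ceiling: this container cannot exceed 2 CPUs' worth of time, even
# when the other 6 CPUs are idle; docker stats should hover around 200%.
sudo docker run -d --name quota-a --cpus=2 progrium/stress --cpu 8
sudo docker stats --no-stream quota-a

# Clean up.
sudo docker rm -f quota-a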

Cloud Auto-Scaling Sizing

Kasm can automatically scale agents in a number of cloud service providers. The instance size of the VM to use for auto scaled agents is configured in the VM Provider Config. Administrators can choose an instance size that allows a single Kasm session to be provisioned, or an instance size that allows many sessions to be provisioned per agent. Using larger instance sizes allows for CPU and RAM oversubscription, while using smaller instance sizes allows resources to be released faster as user sessions end. Administrators will need to monitor the use of the system and select a strategy that maximizes cost savings and performance for their specific use case. For example, Kasm Technologies currently uses an instance size that can accommodate two sessions per instance for the personal SaaS product.

Redundancy

When a user creates a new session, the manager API service will select an agent and attempt to provision the container there. If the provisioning fails, the manager will move on to the next available agent. This ensures redundancy for creating new sessions. The manager can only use agents that have the Docker image and Docker network (if specified in the Workspace settings) available and are assigned to the correct zone. If a central container image registry is used, agents will automatically pull images down. Agents will only pull images that can be provisioned on that agent. If a Kasm Workspace is defined that is assigned a specific Docker network and that network does not exist on an agent, the agent will not pull that Docker image. It is recommended to reduce differences between agents within a Zone and treat all agents in a Zone as a cluster of identically configured servers. This ensures Kasm will provision new user containers evenly across the cluster of agents.

For cloud deployments that auto scale, capacity is managed by Kasm. For deployments with static agents, however, capacity planning is needed. For large deployments with static agents, it is recommended to keep the number of agents at N+2. In other words, ensure the number of agents is enough to handle peak capacity if you were to lose 2 agents. This allows for both maintenance and the loss of 1 agent. If an agent goes offline or is disabled by an administrator, Kasm will automatically send new sessions to the remaining agents.

For existing sessions, it is not possible to provide redundancy if the system they are on goes down. However, there are additional resiliencies built in. The agent service can go down and user containers will continue to operate uninterrupted. Similarly, the agent can be disabled in the Kasm admin UI and existing sessions will continue to operate.

Maintenance

For systems in the cloud with auto scaling, the AMI ID defined in the VM Provider Config can be updated to start using an updated AMI. It is recommended to have a testing zone or a testing Kasm deployment to test updating to a new AMI before applying to production.

For deployments with static agents it is recommended to keep 1 agent disabled in rotation, for system patching. For example, if the deployment had 6 agents, the administrator would disable one, wait for all user sessions to close on that agent, then perform system patching and restart the agent. Once the agent was back up and ready, the admin would enable the agent in the Kasm UI. This process would be repeated on the next agent. This allows for administrators to keep Kasm agents updated continuously, without the need for scheduled downtime. This is the reason that N+2 redundancy for agents is recommended for larger deployments.
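On the agent host itself, the patching step is ordinary OS maintenance once the agent has been disabled and drained. A minimal sketch, assuming an Ubuntu/Debian based agent; adapt the package commands to your distribution.

# Confirm no user session containers remain before patching (only Kasm
# service containers should be listed), then patch and reboot.
sudo docker ps
sudo apt update && sudo apt -y upgrade
sudo reboot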

Connection Proxy

The connection proxy service is a customized Guacamole connection handler. This encodes and proxies standard RDP, VNC, and SSH connections to a websocket format usable by a modern web browser. Each Zone in Kasm requires at least one connection proxy, if the zone will have servers with RDP/VNC/SSH connections.

Redundancy and Load Balancing

Kasm will automatically load balance user sessions across multiple connection proxies and it is recommended to have at least two for redundancy. Kasm checks that a connection proxy is actively working prior to directing a user session to one. This ensures automatic failover while also load balancing sessions across any number of servers per Zone.

Maintenance

When connection proxies are taken offline, either by a reboot or by shutting down the service, Kasm will stop sending sessions through the down server. This is done automatically without admin intervention, however, any user sessions currently flowing through the connection proxy will be disconnected. The clients will automatically reconnect to their session and flow through a different connection proxy. It is recommended to perform maintenance on connection proxy servers during a scheduled maintenance window, however, it is not strictly required. It is important to test your maintenance procedures in your own environment in order to understand the operational impact to users.

Sizing

The required size of the connection proxy varies widely with how users actually use their sessions. For example, if all users are watching full screen videos within their remote Windows desktop, the connection proxy will consume approximately 1.25 CPU cores per user, without session recording enabled. Typically, users are not all watching full screen videos or playing video games at the same time; in fact, you may find that 60% of your users are not actively engaged with a session at any given time. Users may be reading an email or document, with little to no interaction. The larger your deployment, the more slack you have to play the law of averages. Some Kasm deployments with several hundred users easily accommodate all users with two servers of 16 cores each. These deployments are business customers, where users are typically interacting with documents and emails. Your mileage will vary; it is important to monitor your own deployment’s resource utilization to understand your user base’s real world usage and adjust accordingly. Memory requirements for the connection proxy are modest, with 4 GB of RAM being a minimum.

With session recording enabled, additional resources are required. Sessions are initially recorded in a raw format that consumes a lot of local disk space. The video segments are then encoded into a compressed video format and uploaded. The Session Recording Queue Length global setting defines how many video clips are processed concurrently. This setting should be increased when scaling up the number of concurrent sessions each connection proxy is sized to handle. It is recommended to set the Session Recording Queue Length to half the number of cores a system has. So if a system has 16 cores, set the Session Recording Queue Length to 8. This ensures half the resources of a single server are dedicated to recording while the other 8 cores are reserved for session streaming. Session recording is enabled as a Group Setting, therefore session recording may not be enabled on all user sessions, depending on your configuration. Monitor the real world usage of your environment and adjust as needed.

Warning

All connection proxies should be identical in size; this is even more important when session recording is enabled.

Disk usage is minimal without session recording. With session recording enabled, you should plan for at least 1GB of disk space per concurrent session. Additionally, disk I/O can become a bottleneck at higher capacities. For cloud based deployments, check your cloud provider documentation on how to increase disk I/O if needed.

Example Specifications

Note

The following table provides a starting point; each environment will have different requirements based on several real world factors, such as actual user desktop activity, CPU architecture and class, cloud provider, and more.

Deployment                  | Server Specifications          | Concurrent Sessions
Without Session Recording   | 8 CPUs, 4GB RAM, 80 GB HD      | 32
With Session Recording      | 16 CPUs, 8GB RAM, 112 GB HD    | 32