Sizing and Deployment Guide

This guide covers sizing requirements and deployment guidance for the Kasm Workspaces platform and assumes a distributed architecture with separate API, database, and agent components. For single server deployments, use the guidance provided in the agent sizing section.

Note

For all components, maximum concurrent Kasm sessions is the metric that should be used when sizing. Some use cases will have many total users but only a small number of concurrent users, while others will have more concurrent sessions than total users (more than 1 session per user). In all cases, the maximum concurrent session count should be the metric used when considering sizing requirements.

Web App Role

The web app server role is comprised of several containerized services: API, Manager, Share, and proxy. The API server is responsible for taking API calls from users and potentially external services using the developer API. The manager service is responsible for managing clusters of agents and auto scaling new agents in the cloud if configured. The web app server role is both vertically scalable (adding more CPUs and RAM) and horizontally scalable (adding more web app servers).

Redundancy

For large deployments, it is recommended to have N+1 redundancy for the web app servers. This eases maintenance, allowing administrators to bring down a server for patching without affecting users.

Load Balancing

Load distribution to different deployed web app servers can be accomplished via DNS load balancing or using a network load balancer such as an F5 or cloud load balancer.

DNS Load Balancing

With DNS load balancing, the same domain name has one A record pointing to each web app server. Health checks can be used to automatically remove the DNS entry should the corresponding web app server become unresponsive.

DNS load balancing offers a simple solution that requires no additional components between users and the backend servers. This simplifies the architecture and significantly reduces the cost of providing resilient services.

There are many services and physical devices on premise that can provide DNS load balancing with health checks. AWS Route 53 is one example. You don’t need to host Kasm in AWS to use Route 53. Route 53 can provide public DNS services for resources on-premise or even in other cloud providers.

The disadvantage of DNS-based load balancing is slow convergence on failure. Even the most aggressive settings, 10 seconds between health checks and 3 consecutive failures, take at least 30 seconds to recognize a failure. At that point the DNS record is removed, but clients will not see the update until the TTL has expired. A typical TTL in AWS is 5 minutes, although it can be set as low as 60 seconds. Therefore, in the best case, some clients will be down for at least 90 seconds after a failure.
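The arithmetic behind the 90 second figure can be sketched as follows. This is only an illustration of the reasoning above; the function name and parameters are hypothetical and are not part of Kasm or Route 53.

```python
def dns_failover_seconds(check_interval_s: int, failure_threshold: int, ttl_s: int) -> int:
    """Estimate how long a client may keep resolving to a failed server.

    Detection takes (interval * consecutive failures); after the record is
    removed, clients can keep using their cached answer for up to one TTL.
    Resolvers that cache beyond the TTL are ignored here.
    """
    detection_s = check_interval_s * failure_threshold
    return detection_s + ttl_s


# Most aggressive settings described above: 10 s checks, 3 failures, 60 s TTL.
print(dns_failover_seconds(10, 3, 60))   # 90
# Same health check settings with a typical 5 minute TTL.
print(dns_failover_seconds(10, 3, 300))  # 330
```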

Network Load Balancer

Another option is to use a network load balancer, either a physical on-premise load balancer such as an F5 or a cloud based load balancer. In either case, the load balancer is placed between the users and the web app servers and distributes the load between all the web app servers. Load balancers typically use health checks to determine if an upstream server is offline. These health checks can be passive, such as looking at the TCP sessions or HTTP return codes of traffic. Active health checks can also be used, which actively make HTTP calls to a specific path and expect a configured HTTP status code.

Network load balancers have the advantage of providing convergence within seconds after a failure. They can also try multiple upstream web app servers for a single request, meaning users typically wouldn’t notice a web app server going down. The disadvantage is that they complicate the architecture with additional components between the end users and the backend servers. The load balancers themselves can also go down and require maintenance, which usually means maintaining N+1 redundancy for them and using DNS load balancing to balance traffic across the load balancers.

Network load balancers may be required for some organizations as they provide many other potential benefits:

  • Single entry point for all publicly served websites for the organization

  • Web Application Firewall (WAF)

  • SSL inspection

  • Data Loss Prevention

  • Anti-virus

  • Logging of all traffic at the enterprise level

  • Other advanced security features

Health Checks

Whether using DNS load balancing, network load balancers, or both, health checks are imperative for failover and alerting. Kasm has a built-in health check located at /api/__healthcheck, which checks the operational status of the API server and the API server’s access to the database. If the API returns anything other than an HTTP 200-series status code, the health check has failed. AWS Route 53 can be configured to send email alerts on failure, in addition to removing the associated A record. When the health check starts returning healthy again, the A record is re-enabled. For network load balancers the premise is the same: upstream servers are automatically removed or added as their health check status changes.
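As a rough illustration of an active health check against this endpoint, the sketch below treats any 2xx response as healthy and everything else, including connection errors, as a failure. Only the /api/__healthcheck path comes from this guide; the function names, the timeout, and the hostname in the usage note are assumptions made for the example.

```python
import urllib.error
import urllib.request


def is_healthy_status(status: int) -> bool:
    """A 200-series status means healthy; anything else counts as a failure."""
    return 200 <= status < 300


def probe(base_url: str, timeout_s: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """Return True if the health check endpoint answers with a 2xx status.

    `opener` is injectable so the function can be exercised without a network.
    """
    url = base_url.rstrip("/") + "/api/__healthcheck"
    try:
        with opener(url, timeout=timeout_s) as response:
            return is_healthy_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is still available.
        return is_healthy_status(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, TLS error, and similar.
        return False
```

A monitoring script could call probe("https://kasm.example.com") on a schedule, where the hostname is a placeholder, and raise an alert after several consecutive False results.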

Sizing Requirements

The Web App role servers require modest specifications. For Enterprise deployments with user counts in the hundreds or more, it is recommended to start with at least 2 servers, each with the following specifications:

  • 4 vCPUs

  • 4 GB RAM

  • 80 GB SSD storage

Each use case of the Kasm platform is different, so results will vary between installations. However, this baseline provides a good starting point for a large deployment. Additional servers should be added as needed, always maintaining a minimum of N+1 redundancy.

Maintenance

It is recommended to have a scheduled maintenance window, however, Web App servers can be gracefully taken offline without affecting users. This requires removing the selected server from the load balancing mechanism. For DNS load balancing, that means removing the A record. For network load balancers, that means removing the selected server from the upstream list of servers and applying the configuration change to each load balancer. After the server has been patched and rebooted, the DNS record or upstream change can be reverted. For DNS load balancing, it can take a while for user requests to stop coming into the server. The amount of time needed is determined by the TTL of the DNS record and potentially by DNS infrastructure at the user’s site that caches DNS records for longer than the configured TTL. Therefore, if DNS load balancing is used, it is recommended to monitor the traffic going through the API server before starting maintenance activities.

Database Role

By default, Kasm uses Postgres and Redis containers on the database role server.

Redis

The Redis data store is used by Kasm for chat in shared Kasm sessions. While Redis may be used for more features in the future, at the time of this writing Redis resource requirements are minimal. Kasm utilizes the latest version of Redis 5.x and is compatible with Redis services such as AWS ElastiCache with Redis compatibility. Since Redis is not used for storage, smaller instance types such as cache.t4g.small, which have 2 vCPUs and 1.37 GB of RAM, can be used. The default installation of Kasm utilizes a simple single-node containerized Redis deployment. Administrators can use their own Redis deployment. See the Redis documentation for guidance on sizing, redundancy, and maintenance.

PostgreSQL

Kasm utilizes the latest version of Postgres 12.x and uses a simple containerized version in the default installation. Larger deployments may choose to use an external PostgreSQL database. PostgreSQL itself has low minimum requirements, however, we recommend at least 2 CPUs and 4 GB of RAM. These specifications will handle deployments into the hundreds of concurrent users. The database server can be vertically scaled to increase performance for larger deployments.

Disk space requirements are more complicated. Kasm collects logs from all components and by default keeps debug logs for 4 hours and all other logs for 7 days. These logs are ingested into the database and used by the dashboard and logging panels of the UI. For larger installations it is highly recommended to configure Kasm to forward logs to an Enterprise class SIEM solution, such as Splunk, and set the Kasm log retention period to 0, effectively disabling database logs. With the default log retention settings, the database will use between 250 and 550 MB for a single server install. Deployments with hundreds of concurrent users can easily use several tens of gigabytes of storage with default log retention settings. Smaller deployments with fewer than around 50 concurrent sessions can rely on Kasm’s built-in logging mechanisms, with a healthy database volume of around 150 GB on the higher end of the spectrum. Deployments larger than this should use an external SIEM solution and set log retention to 0. If disabling the built-in logging is not an option, monthly database maintenance should be performed to ensure autovacuum is running and the space is being released back to the operating system.

Postgres Compatible Databases

Kasm supports using AWS RDS and AWS Aurora Postgres databases. Other Postgres compatible solutions, such as CockroachDB, may work but are not officially supported by Kasm Technologies. Solutions such as AWS RDS provide added benefits of ease of maintenance, scalability, redundancy/failover, and automated backups.

Redundancy and Backups

Due to the criticality of the database to Kasm operations, it is important to carefully consider database redundancy and backups. At a minimum, regular backups of the database should be performed. See the Kasm documentation on performing backups and restorations of the database. The simplest form of redundancy is automated scheduled backups using a cron job, which are transferred to a remote location (NFS, S3, etc.) and can be restored on a standby server if the primary database fails. For a solution that provides high availability, see the PostgreSQL documentation for HA deployments.

Agent Role

The agent servers are responsible for hosting containers for user desktops/apps. These servers have high CPU, RAM, and disk requirements that are heavily dependent on the specific use case.

Sizing

A good place to start for sizing is the default Workspace settings and potential minimum specifications for each user container. By default, Workspaces defined in Kasm have 2 CPUs and 2.77 GB of RAM assigned to them. This may or may not be adequate for a given use case and should be adjusted to meet its requirements. Kasm must manage the compute resources (agents) it has available when provisioning a container for a new session. The default mechanism for tracking resources is simply subtracting the Workspace’s configured CPU and RAM settings from the assigned agent’s physical cores and RAM. If the deployment had a single agent with 4 CPUs and 4 GB of RAM, provisioning a single default Workspace would leave 2 CPUs and 1.23 GB of RAM for more sessions. The agents can, however, be configured to override that default logic. In the agent settings you can override the physical CPUs and RAM. It is generally safe to override CPUs, and the larger the agent, the more you can safely override them, as not all users will be using 100% of their allocated CPUs at the same time.
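The default accounting can be modeled in a few lines. This is a simplified sketch of the subtraction described above, not Kasm’s actual implementation; the Agent class and provision function are hypothetical names used only for illustration. An override would be modeled by constructing the agent with the overridden values instead of the physical ones.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    cpus: float    # physical (or overridden) CPU count still unallocated
    ram_gb: float  # physical (or overridden) RAM in GB still unallocated


def provision(agent: Agent, ws_cpus: float = 2, ws_ram_gb: float = 2.77) -> bool:
    """Subtract a Workspace's configured CPU and RAM from the agent if it fits."""
    if ws_cpus > agent.cpus or ws_ram_gb > agent.ram_gb:
        return False
    agent.cpus -= ws_cpus
    # Round to avoid floating point noise in the remaining RAM figure.
    agent.ram_gb = round(agent.ram_gb - ws_ram_gb, 2)
    return True


agent = Agent(cpus=4, ram_gb=4)
print(provision(agent), agent)  # first default Workspace fits
print(provision(agent), agent)  # second does not: only 1.23 GB of RAM remains
```

In this example RAM, not CPU, is the limiting resource, which is typical when CPUs are overridden and RAM is not.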

For agents installed on bare metal (not virtualized), a good place to start is setting the CPU override to double the number of physical cores. Most hypervisors and cloud service providers present each hyper-thread as a single vCPU, so if the processor is an Intel Xeon that supports two hyper-threads per core, each core represents two vCPUs. Intel claims that hyper-threading increases a core’s throughput by 30%, however, it is highly application dependent. Since vCPUs are therefore already oversubscribed, a safe place to start for virtualized agents is overriding the agent CPUs by 25%. Administrators will need to tune the override settings carefully over time to achieve the best balance of performance and cost savings for their specific use case and hardware capabilities.
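These two rules of thumb can be written down as a small helper. It is a sketch under stated assumptions: the function name is hypothetical, "by 25%" is read as 25% above the vCPU count, and rounding down to a whole CPU is a choice made for the example rather than documented Kasm behavior.

```python
import math


def starting_cpu_override(cpus: int, virtualized: bool) -> int:
    """Suggested starting value for the agent CPU override.

    cpus: physical cores for bare metal, vCPUs for a virtual machine.
    """
    if virtualized:
        # vCPUs are usually hyper-threads already, so oversubscribe by only 25%.
        return math.floor(cpus * 1.25)
    # Bare metal: start at double the physical core count.
    return cpus * 2


print(starting_cpu_override(16, virtualized=False))  # 32
print(starting_cpu_override(8, virtualized=True))    # 10
```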

It is not recommended to override RAM in general, however, each use case and deployment is different. Use the Agents view in Kasm to inspect each system; it shows how much RAM is allocated and how much RAM is actually being used. After operating the deployment for enough time, administrators will know how much RAM is typically available during normal use. Total system RAM should never be exceeded for performance reasons, and each agent should have a swap file or partition.

CPU Allocation Method

Workspaces supports provisioning session containers with one of two methods, Quotas (--cpus) or Shares (--cpu-shares).

See Docker Resource Constraints for more details on how Docker utilizes these flags.

The default method is Shares and is governed by the Global Setting Default CPU Allocation Method. The allocation method can also be updated on the Workspace configuration by changing the CPU Allocation Method in the Workspace Settings. By default, the Workspace setting is configured to Inherit, which means to use the Global Setting.

Shares

Note

CPU and Cores are used interchangeably in this section. Ultimately, what is being referenced is the number of Logical Processors that are presented to the system. This will vary depending on the physical processor, such as those that are multi-core or support hyper-threading. It may also vary depending on the operating environment (e.g. Virtual Machines / Cloud Environments).

When the Shares CPU Allocation Method is used, session containers are provisioned with the Docker equivalent of --cpu-shares=<workspace.cores * 1024>. For example, if the Cores setting on the Workspace is set to 2, the container would be provisioned with --cpu-shares=2048.

When a container utilizes shares, the amount of CPU resources the container can use is weighted against other containers and their share value. Most notably, the container is only throttled if there is CPU contention. If there is no contention, the container can use as much as it needs.

For example, on an 8 CPU machine, if the Cores Workspace setting is configured at 4, this will result in the container created with --cpu-shares=4096.

  • If no CPU contention exists, the container can use all 8 CPUs.

  • If there are 2 containers both with --cpu-shares=4096, each will be able to use the full CPU resources if no contention exists.

  • If contention exists, each container will be allowed up to 50% of the CPU resources because their shares (weights) are equal.
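The weighting in the bullets above can be sketched numerically. This is a simplified model that assumes every container is fully CPU-bound at the same time; in practice the scheduler hands unused capacity to whichever containers want it. The function names are hypothetical.

```python
from __future__ import annotations


def cpu_shares(cores: float) -> int:
    """Docker --cpu-shares value for a Workspace Cores setting (1024 per core)."""
    return int(cores * 1024)


def contended_cpus(shares: list[int], host_cpus: int) -> list[float]:
    """CPUs each container is entitled to when all containers compete at once."""
    total = sum(shares)
    return [host_cpus * s / total for s in shares]


print(cpu_shares(2))                          # 2048
print(cpu_shares(4))                          # 4096
# Two 4-core Workspaces on an 8 CPU host: equal weights, so 50% (4 CPUs) each.
print(contended_cpus([4096, 4096], 8))        # [4.0, 4.0]
```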

The Shares method is useful for maximizing the usage of CPU resources, as all containers can use as much as needed when there is no contention. For bursty workloads, this will likely result in a better overall user experience when compared to the Quotas method. However, user experience may not be as consistent depending on the CPU activity of other containers.

Quotas

When the Quotas CPU Allocation Method is used, session containers get provisioned with the Docker equivalent of --cpus=X. The value used is based on the Cores setting defined on the Workspace. This sets a ceiling for the amount of CPU resources the container can use. For example, on an 8 CPU system with the Workspace configured at 2 cores, the container will only be allowed to use up to 25% of the CPUs.
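The ceiling in the example above is simply the Cores setting divided by the host CPU count. A minimal sketch, with a hypothetical function name:

```python
def quota_ceiling_percent(workspace_cores: float, host_cpus: int) -> float:
    """Percentage of total host CPU a container started with --cpus=<cores> may use."""
    return workspace_cores / host_cpus * 100


# 2-core Workspace on an 8 CPU system.
print(quota_ceiling_percent(2, 8))  # 25.0
```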

This strategy may be more helpful if the desire is to provide a more consistent performance profile. It may also be helpful if Kasm is running on systems with additional applications and utilizing all available CPU resources at times is not appropriate.

Cloud Auto-Scaling Sizing

Kasm can automatically scale agents in a number of cloud service providers. The instance size of the VM to use for auto scaled agents is configured in the VM Provider Config. Administrators can choose an instance size that allows a single Kasm session to be provisioned or an instance size that allows many sessions to be provisioned per agent. Using larger instance sizes allows for CPU and RAM oversubscribing, however, using smaller instance sizes allows resources to be released faster as user sessions end. Administrators will need to monitor the use of the system and select a strategy that maximizes cost savings and performance for their specific use case. For example, Kasm Technologies currently uses an instance size that can accommodate two sessions per instance for the personal SaaS product.

Redundancy

When a user creates a new session, the manager API service will select an agent and attempt to provision the container there. If the provision fails, the manager will move on to the next available agent. This ensures redundancy for creating new sessions. The manager can only use agents that have the Docker image and Docker network (if specified in the Workspace settings) available and are assigned to the correct zone. If a central container image registry is used, agents will automatically pull images down. Agents will only pull images that can be provisioned on that agent. If a Kasm Workspace is assigned a specific Docker network and that network does not exist on an agent, the agent will not pull that specific Docker image. It is recommended to reduce differences between agents within a Zone and treat all agents in a Zone as a cluster of identically configured servers. This ensures Kasm will provision new user containers evenly across the cluster of agents.

For cloud deployments that auto scale, capacity is managed by Kasm. For deployments with static agents, however, capacity planning is needed. For large deployments with static agents, it is recommended to keep the number of agents at N+2. In other words, ensure the number of agents is enough to handle peak capacity if you were to lose 2 agents. This allows for both maintenance and the loss of 1 agent. If an agent goes offline or is disabled by an administrator, Kasm will automatically send new sessions to the remaining agents.
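The N+2 rule translates into a simple calculation. The sketch below is illustrative only; the function name and the example figures (200 peak sessions, 12 sessions per agent) are hypothetical and should be replaced with measured values for a given deployment.

```python
import math


def static_agents_needed(peak_sessions: int, sessions_per_agent: int, spare_agents: int = 2) -> int:
    """Agents required to carry peak load (N) plus spares for maintenance and failure."""
    n = math.ceil(peak_sessions / sessions_per_agent)
    return n + spare_agents


# 200 peak concurrent sessions, 12 sessions per agent: N = 17, so deploy 19.
print(static_agents_needed(200, 12))  # 19
```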

For existing sessions, it is not possible to provide redundancy if the system they are on goes down. However, there are additional resiliencies built in. The agent service can go down and user containers will continue to operate uninterrupted. Similarly, the agent can be disabled in the Kasm admin UI and existing sessions will continue to operate.

Maintenance

For systems in the cloud with auto scaling, the AMI ID defined in the VM Provider Config can be updated to start using an updated AMI. It is recommended to have a testing zone or a testing Kasm deployment to test updating to a new AMI before applying to production.

For deployments with static agents it is recommended to keep 1 agent disabled in rotation, for system patching. For example, if the deployment had 6 agents, the administrator would disable one, wait for all user sessions to close on that agent, then perform system patching and restart the agent. Once the agent was back up and ready, the admin would enable the agent in the Kasm UI. This process would be repeated on the next agent. This allows for administrators to keep Kasm agents updated continuously, without the need for scheduled downtime. This is the reason that N+2 redundancy for agents is recommended for larger deployments.