Hardware Requirements

Cluster-level pod sizing, node defaults, GPU, storage, and networking requirements for an on-premise, airgapped, or bring-your-own-Kubernetes (BYOK) Atolio deployment.

Atolio runs entirely inside your infrastructure. There is no Atolio-hosted SaaS layer. This page covers hardware sizing for on-premise, airgapped, and bring-your-own-Kubernetes (BYOK) deployments, where you supply the cluster and node hardware.

Atolio also supports managed deployments on AWS, Azure, and GCP, where Atolio’s Terraform provisions sized node groups for you. For per-cloud instance types, quotas, and networking detail, see the deployment guides for AWS, Azure, and GCP.

Overview

A deployment requires:

  • General compute nodes for application services and the search database.
  • At least one GPU node for the embedding service. Production requires GPU-backed embedding; CPU-only is for proof-of-concept or staging.
  • Persistent block storage for Vespa content (index) nodes.
  • S3-compatible object storage (MinIO, Ceph RGW, or any S3-compatible store) for assets such as user avatars and shared logs.
  • An ingress gateway for HTTPS. Inbound connectivity is required for connectors that receive webhook updates.

Query-time LLM inference is deployed independently of the Atolio cluster. It can run entirely on-premise for fully sovereign deployments, on an OpenAI-compatible endpoint you operate, or via an external managed provider. Sizing the inference layer is outside the scope of this page. Only embedding requires an in-cluster GPU.

The major workloads:

| Workload | Role |
| --- | --- |
| Vespa | The core index. Scales horizontally for failover and replication. |
| GPU Embedder | Calculates dense vector embeddings at index and query time. |
| Core Services | Ingest pipeline, API, web UI, control plane. |
| Connectors | One microservice per source system; fetches documents, metadata, users, and permissions. |
| Apache Tika | Extracts text from PDFs, Office documents, and similar files. |

Three things drive sizing: active connector count, indexed corpus size, and concurrent users. The figures below are baselines. Atolio’s deployment team will check them against your expected scale before you provision.

Pod-level Resource Requirements

This is the canonical sizing view. Use it to size worker nodes in your Kubernetes or OpenShift cluster.

| Service | Quantity | CPU Cores | Min Memory | Disk | GPU |
| --- | --- | --- | --- | --- | --- |
| Vespa Content | 2+ | 8 | 64 GB | 500+ GB stateful PVC / 30 GB stateless | No |
| Vespa Admin | 3 | 2 | 6 GB | 30 GB stateful PVC / 60 GB stateless | No |
| Vespa Container | 2+ | 8 | 16 GB | 30 GB stateless | No |
| Tika | 1 | 16 | 32 GB | 20 GB stateless | No |
| Embedder | 1 | 4 | 16 GB | 20 GB stateless | Yes |
| Connectors × N | 1+ | 4 | 4 GB | 20 GB stateless | No |
| System Services | 3 | 4 | 16 GB | 20 GB stateless | No |

The 2 Vespa content pods and 3 Vespa admin pods must each run on physically separate machines (admin pods can be co-located with other services). The remaining services can share nodes wherever capacity exists.
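The physical-separation requirement is typically enforced with pod anti-affinity. A minimal sketch for the content pods, assuming a hypothetical `app: vespa-content` label (the actual labels and pod template come from the atolio-db Helm chart):

```yaml
# Hypothetical pod-template fragment; real label keys come from the
# atolio-db chart. requiredDuringScheduling forces each content pod
# onto a distinct host.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: vespa-content        # assumed label, not from the chart docs
        topologyKey: kubernetes.io/hostname
```

For admin pods, which only need to be on separate machines from each other, the same pattern applies with their own label selector.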

Cluster Requirements

Atolio runs on any conformant Kubernetes 1.33+ cluster, including airgapped environments. OpenShift (including ROSA) is fully supported and tested; vanilla Kubernetes, Rancher/RKE2, and EKS-Anywhere style installs work as well.

BYOK fits when data-residency or compliance rules require infrastructure inside your perimeter, when the cluster is airgapped, when you already operate a hardened Kubernetes platform, or when the LLM, embedder, index, and connectors must all run on customer-owned hardware.

What Atolio provides

  • Helm charts: atolio-db (Vespa) and atolio-svc (Lumen application services).
  • Container images suitable for an internal or airgapped registry.
  • Deployment runbooks and direct support during install.

What you provide

  • A Kubernetes or OpenShift cluster sized per the Pod-level Resource Requirements, with the role split below.
  • A CSI block storage class for Vespa PVCs.
  • An S3-compatible object store (MinIO, Ceph RGW, or any S3-compatible store).
  • A reachable container registry plus credentials. For airgapped installs, your internal mirror.
  • An ingress controller and TLS certificate (or cert-manager).
  • An OIDC provider for SSO. Atolio supports Okta, Microsoft Entra ID, Google, and Keycloak.

Node role split

| Role | Min nodes | Per-node target | Notes |
| --- | --- | --- | --- |
| Vespa Content | 2 | 8 vCPU / 64 GB RAM / 500 GB SSD | Physically separate machines. Stateful; PVCs must survive node loss. |
| Vespa Admin | 3 | 2 vCPU / 8 GB RAM | Physically separate machines. Hosts the Vespa configserver (~30 GB on the OS disk). |
| Vespa Container | 1+ | 8 vCPU / 16 GB RAM | Stateless query/feed dispatch. Scale horizontally with query load. |
| Lumen Services | 2+ | 4 vCPU / 16 GB RAM | Stateless. Hosts core services, Tika, and connector pods. |
| Embedding Services | 1+ | 4 vCPU / 16+ GB RAM + 1× GPU (16 GB+) | x86_64. NVIDIA T4, A4000, A4500, A10, A10G, or L4 with 16–24 GB VRAM. |

x86_64 is the typical on-prem architecture. ARM64 images are also published and supported.
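GPU nodes are commonly tainted so only embedding pods schedule onto them. A sketch of the embedder pod-spec fragment, assuming the NVIDIA GPU operator's feature-discovery labels and an illustrative taint key:

```yaml
# Hypothetical embedder scheduling fragment. Assumes GPU nodes carry the
# nvidia.com/gpu.present label (set by the GPU operator's feature
# discovery) and a nvidia.com/gpu NoSchedule taint you apply yourself.
nodeSelector:
  nvidia.com/gpu.present: "true"
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
resources:
  limits:
    nvidia.com/gpu: 1        # one T4/L4-class accelerator per embedder pod
```

The taint keeps general-purpose Lumen and connector pods off the (expensive) GPU node; only pods that tolerate it land there.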

Airgapped deployments

In airgapped mode, all components run inside your perimeter, with no outbound traffic at runtime. Atolio’s deployment team can supply an offline image bundle and matching runbook on request.

Storage

| Volume | Default Size | Notes |
| --- | --- | --- |
| Vespa content data disk (per node) | 500 GB | Indexed documents and embedding vectors. Raise for large corpora or higher-dimensional models. |
| Vespa content / container node OS disk | 30 GB | OS and local Vespa logs. |
| Vespa admin node OS disk | 60 GB | Accommodates the Vespa configserver (up to ~30 GB). |
| Lumen services / embedding node OS disk | 20 GB | Standard OS disk. |
| Object storage bucket | n/a | Two buckets per deployment (shared and private). Any S3-compatible store. No fixed quota. |

Encryption at rest is provided by your CSI driver and storage class. Set the reclaim policy of the StorageClasses backing the Vespa content and admin PVCs to Retain so the underlying volumes survive chart redeploys.
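One way to satisfy the Retain requirement is a dedicated StorageClass for the Vespa PVCs. A sketch, with a placeholder provisioner string (the real value depends on your CSI driver):

```yaml
# Hypothetical StorageClass for Vespa content/admin PVCs.
# reclaimPolicy: Retain keeps the backing volume when the chart
# (and its PVCs) is deleted, so the index can be re-attached.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vespa-retain
provisioner: csi.example.com          # replace with your CSI driver's provisioner
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true            # lets you grow content disks later
```

`WaitForFirstConsumer` defers volume creation until the pod is scheduled, which keeps each PVC in the same failure domain as its node.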

GPU Requirements

The embedder is the only in-cluster service that requires a GPU. It generates dense vector embeddings for indexed content; running on GPU keeps ingest throughput up during large connector backfills.

  • GPU: NVIDIA T4 with 16 GB VRAM is the baseline. Higher-end accelerators (A4000, A4500, A10, A10G, L4) with 16–24 GB VRAM are also supported.
  • Driver: install the NVIDIA GPU operator (or the OpenShift NVIDIA GPU operator) so GPU nodes expose the nvidia.com/gpu resource.
  • Query-time inference: hosted independently (on-premise or external). See Overview.

For proof-of-concept or staging clusters where CPU-only embedding is acceptable, set embedder.useGPU=false in the Lumen Helm values and omit the GPU node pool.
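As a sketch, that toggle might sit in a values override file like this. Only `embedder.useGPU` is documented above; the surrounding key layout is illustrative and should be checked against the Lumen chart's `values.yaml`:

```yaml
# values-poc.yaml -- CPU-only embedding for proof-of-concept or staging.
# Only embedder.useGPU is a documented key; the rest is an assumed layout.
embedder:
  useGPU: false          # skip the GPU node pool; CPU-only embedding
  resources:
    requests:
      cpu: "4"           # matches the pod-level baseline above
      memory: 16Gi
```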

Networking

Atolio runs in a single VPC/VNet/VLAN with public and private subnets across multiple failure domains.

  • CIDR: any non-overlapping /20+ range works.
  • Subnets: at least /22 (1,024 IPs) per private subnet to avoid IPv4 exhaustion as pod counts grow.
  • Failure domains: at least three, for HA across racks, hypervisors, or zones.
  • Ingress: any ingress controller works (NGINX, HAProxy, OpenShift Router, F5 BIG-IP). Inbound HTTPS is required for webhook-driven connectors.
  • Egress: outbound HTTPS to source-system APIs, the LLM endpoint, and the container registry. Airgapped deployments host the LLM and registry internally.
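If your platform enforces NetworkPolicy, the egress posture above can be expressed along these lines. The namespace, selector, and CIDR are placeholders for your environment:

```yaml
# Hypothetical NetworkPolicy allowing outbound HTTPS from Atolio pods.
# Namespace and CIDR are placeholders; tighten 0.0.0.0/0 to the
# source-system, LLM-endpoint, and registry ranges you actually use.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: atolio-egress-https
  namespace: atolio                  # assumed namespace
spec:
  podSelector: {}                    # applies to all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443
    - ports:                         # keep in-cluster DNS working
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Airgapped deployments can drop the HTTPS rule entirely once the LLM endpoint and registry are hosted inside the cluster's own network.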

Scaling Guidance

Sizing increases with:

  • Document count and size: more or larger documents (e.g. PDFs, design files) → more Vespa content nodes or higher per-node memory.
  • Embedding dimensionality: higher-dimensional models pressure Vespa content memory. Plan on 128 GB+ per content node for high-dimensional models.
  • Concurrent users / query rate: scale Vespa container nodes and Lumen services horizontally.
  • Connector throughput: heavy connectors (Slack, GitHub at scale, large Drive estates) increase ingest CPU load and GPU embedding demand.
  • HA posture: for stricter SLAs, run 3+ Vespa content nodes and 3+ Lumen services pods.

For larger deployments, Atolio’s deployment team can run a pre-deployment content scan against your expected connectors to estimate the document inventory and recommend node count, memory tier, and disk size before provisioning.

Customizing the Defaults

Every node count and disk size is exposed as a Helm value in the Vespa and Lumen charts. Commonly tuned settings cover:

  • Lumen services replica count and resource requests.
  • Vespa content node count and content disk size.
  • Vespa container node count.
  • Embedding services replica count, GPU toggle, and resource requests.
  • OpenShift-specific knobs: security context constraints, image pull secrets, storage class overrides.

If you need a configuration outside the documented defaults (for example, an internal procurement standard, a specific Kubernetes distribution, or an unusual storage driver), share the target hardware and platform details (architecture, base image, GPU drivers, and CSI driver) with your Atolio support team so they can confirm compatibility.