Installing Sherpa AIServer (CPU-only, no GPU)#
This guide describes installing and running the stack when the language model runs on CPU (no NVIDIA GPU). Preparation follows the general installation; what differs is the set of Docker images and how Compose is launched (the cpu profile).
Suitable for#
- A server without an NVIDIA GPU, or one where a GPU is not required for LLMs.
- Deployments that can accept significantly slower LLM response times than on GPU.
Important: Services like Whisper and BGE Reranker in docker-compose.yml are designed for GPU. For CPU-only scenarios do not enable whisper, reranker, or full profiles — use the cpu profile instead (and decide separately about embed/pg which do not require NVIDIA).
What to download (same as regular installation, replacing the LLM image)#
Download the client archive, models, and other images.
Differences from GPU installation#
| Component | GPU installation | CPU installation |
|---|---|---|
| LLM image | aiserver-llm-server | aiserver-llm-server-cpu |
| Compose launch | docker-compose.yml without cpu profile | docker-compose.yml with cpu profile |
Everything else follows the general preparation list if you don't use optional GPU services.
- Client archive: client-files/latest
- Images: aiserver, aiserver-pg, aiserver-embed, aiserver-code-interpreter, aiserver-nginx, aiserver-websocket, optionally aiserver-whisper and aiserver-bge-reranker (only if you plan GPU profiles)
- Models: embed_model_store.tar.gz (required for embed), and one LLM model of your choice
Direct links for the CPU-LLM image#
Download the aiserver-llm-server-cpu image instead of aiserver-llm-server (or additionally if you want both).
The aiserver-llm-server image is optional for CPU scenarios — you may skip it to save space and bandwidth.
Unpack and load images into Docker#
- Unpack the client archive and prepare scripts.
- Load images via sudo ./sh_scripts/load_all_docker_images.sh.
- If there is an aiserver-llm-server-cpu_*.tar.gz archive and the loader does not pick it up, import manually:
docker load --input aiserver-llm-server-cpu_*.tar.gz
- Run extract_models.sh, extract_vllm.sh and configure .env and certificates as in the installation.
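After loading, it can be worth confirming that the CPU image actually reached the local Docker image store before moving on. The following is a minimal sketch, not part of the shipped scripts: has_image is a hypothetical helper that reads repository names on standard input and succeeds if one of them ends in the given image name. It assumes the loaded image keeps the name aiserver-llm-server-cpu, possibly behind a registry prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Sherpa AIServer scripts).
# Reads repository names, one per line, on stdin and succeeds if any of them
# is exactly NAME or ends in "/NAME" (to tolerate a registry prefix).
has_image() {
  local name="$1"
  grep -Eq "(^|/)${name}\$"
}

# Intended use on the Docker host:
#   docker image ls --format '{{.Repository}}' | has_image aiserver-llm-server-cpu \
#     && echo "CPU LLM image is loaded" \
#     || echo "CPU LLM image is missing; load it manually" >&2
```

Matching on the end of the name keeps aiserver-llm-server (the GPU image) from being mistaken for aiserver-llm-server-cpu.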
Configure .env for CPU-LLM#
The main aiserver service connects to the LLM via LLM_MODEL_API_BASE_URL / LLM_API_HOST and LLM_API_PORT. For the aiserver-llm-server-cpu container the API listens on port 8000 inside the Docker network (host mapping in docker-compose.yml is 3007).
Set:
LLM_API_HOST=aiserver-llm-server-cpu
LLM_API_PORT=8000
LLM_MODEL_API_BASE_URL=http://aiserver-llm-server-cpu:8000
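Because the same endpoint is written three times, the values can drift apart when one of them is edited. A small sketch of a consistency check follows; env_value and check_llm_env are hypothetical helpers, and the check assumes plain KEY=value lines without quotes or trailing spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the Sherpa AIServer scripts).

# Print the value of KEY from an env file; if KEY appears more than once,
# the last assignment is used.
env_value() {
  local key="$1" file="$2"
  sed -n "s/^${key}=//p" "$file" | tail -n 1
}

# Succeed only if LLM_MODEL_API_BASE_URL equals http://LLM_API_HOST:LLM_API_PORT.
check_llm_env() {
  local file="${1:-.env}"
  local host port url
  host="$(env_value LLM_API_HOST "$file")"
  port="$(env_value LLM_API_PORT "$file")"
  url="$(env_value LLM_MODEL_API_BASE_URL "$file")"

  if [ "$url" = "http://${host}:${port}" ]; then
    echo "OK: ${url}"
  else
    echo "MISMATCH: LLM_MODEL_API_BASE_URL=${url}, host/port give http://${host}:${port}" >&2
    return 1
  fi
}
```

Running check_llm_env .env in the installation directory prints the agreed URL, or reports the mismatch and returns a non-zero status.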
Start with docker-compose.yml and cpu profile#
In the installation directory use the provided docker-compose.yml and explicitly enable the cpu profile so the aiserver-llm-server-cpu service starts and GPU profiles (gpu, gpu2) are not activated.
Basic start (CPU LLM only)#
docker compose -f docker-compose.yml --profile cpu up -d
Stop#
docker compose -f docker-compose.yml --profile cpu down
Verify containers#
docker compose -f docker-compose.yml --profile cpu ps
The aiserver-llm-server-cpu container should be Up.
Optional host check for LLM#
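The compose file maps container port 8000 to port 3007 on the host. Query the model list and fall back to the health endpoint if that request fails (-f makes curl treat HTTP errors as failures, so the fallback also runs on a 4xx/5xx response):
curl -fsS "http://127.0.0.1:3007/v1/models" || curl -fsS "http://127.0.0.1:3007/health"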
Check logs if needed: docker compose -f docker-compose.yml logs -f aiserver-llm-server-cpu.
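The API may not answer immediately after the container reports Up, so a single curl can fail even on a healthy install. The sketch below polls the endpoint until it responds or a time limit passes. wait_for_llm is a hypothetical helper; the 600-second default and 5-second interval are assumptions, not values from the product documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Sherpa AIServer scripts).
# Polls URL until it returns a successful HTTP status or TIMEOUT seconds of
# waiting have accumulated (time spent inside curl itself is not counted).
wait_for_llm() {
  local url="${1:-http://127.0.0.1:3007/v1/models}"
  local timeout="${2:-600}"   # assumed default, adjust to your hardware
  local interval="${3:-5}"    # seconds between attempts
  local waited=0

  while [ "$waited" -lt "$timeout" ]; do
    if curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
      echo "LLM API answered after about ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done

  echo "LLM API did not answer within ${timeout}s" >&2
  return 1
}
```

Calling wait_for_llm with no arguments checks the host-mapped port from this guide; a different URL, time limit, and interval can be passed as the first, second, and third arguments.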
Quick checklist#
- Download artifacts as for a normal installation, but use the aiserver-llm-server-cpu image instead of aiserver-llm-server.
- Unpack client files, load images, extract models.
- Set .env: LLM_MODEL_API_BASE_URL=http://aiserver-llm-server-cpu:8000 (and matching LLM_API_HOST / LLM_API_PORT).
- Start: docker compose -f docker-compose.yml --profile cpu up -d.
- Do not enable whisper / reranker / full if no GPU is available.
For CPU-only installs, the NVIDIA/Container Toolkit sections can be skipped; they apply only if you also run GPU containers.