Large language models and chat assistants
Interactive LLM products usually combine inference servers, request routing, caching, retrieval systems, logging, and safety layers. They are commonly deployed on Linux servers because accelerator drivers, container images, and orchestration tools are mature on that platform.
- Typical compute: inference fleets built on GPUs or specialized accelerators.
- Supporting systems: vector search, queues, observability, policy filters.
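
To show how the components listed above might fit together on a single request, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular product: `ChatPipeline`, `BLOCKED_TERMS`, `echo_model`, and the keyword-based `retrieve` method are all hypothetical stand-ins. A real deployment would call an inference server over the network, use a vector index for retrieval, and apply a far richer policy layer.

```python
import hashlib
import logging
from dataclasses import dataclass, field
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("chat_pipeline")

# Hypothetical policy list; real safety layers are much more involved.
BLOCKED_TERMS = {"forbidden"}


@dataclass
class ChatPipeline:
    # Stand-in for a call to an inference server (assumed signature).
    generate: Callable[[str], str]
    # Stand-in for a retrieval system: keyword -> context snippet.
    documents: dict[str, str] = field(default_factory=dict)
    # In-memory response cache keyed by a hash of the prompt.
    cache: dict[str, str] = field(default_factory=dict)

    def passes_policy(self, prompt: str) -> bool:
        # Reject the prompt if any whitespace-separated word is blocked.
        words = set(prompt.lower().split())
        return not (words & BLOCKED_TERMS)

    def retrieve(self, prompt: str) -> list[str]:
        # Naive keyword match standing in for vector search.
        lowered = prompt.lower()
        return [text for key, text in self.documents.items() if key in lowered]

    def handle(self, prompt: str) -> str:
        # 1. Safety layer runs before any compute is spent.
        if not self.passes_policy(prompt):
            log.info("request blocked by policy filter")
            return "Request declined by policy filter."

        # 2. Cache lookup avoids repeating inference for identical prompts.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self.cache:
            log.info("cache hit")
            return self.cache[key]

        # 3. Retrieval adds context, then 4. the model generates a reply.
        context = self.retrieve(prompt)
        reply = self.generate("\n".join(context + [prompt]))

        # 5. Store the reply and log the request for observability.
        self.cache[key] = reply
        log.info("served by inference backend, context_docs=%d", len(context))
        return reply


def echo_model(prompt: str) -> str:
    # Placeholder model that echoes its input.
    return f"echo: {prompt}"


pipeline = ChatPipeline(
    generate=echo_model,
    documents={"linux": "Linux is a kernel."},
)
```

Calling `pipeline.handle("Tell me about Linux")` twice reaches the model only once, because the second call is answered from the cache. The ordering is a design choice in this sketch: the policy check comes first so that declined requests cost nothing, and the cache comes before retrieval so that repeated prompts skip both retrieval and inference.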