Data flow per deployment model – Solver Hubs

Purpose and scope

This document describes the data flows, deployment options, and privacy boundaries of the AI Agent Platform. It addresses client concerns about where data is processed and stored (on-premises or in the cloud) and what data, if any, leaves the client’s environment. The Platform comprises:

Chat API: A WebSocket interface for real-time user–assistant messaging.
CRUD Conversation API: Endpoints for managing conversation state (user messages, tool inputs/outputs).
Agent, Skill, and Tool APIs: Interfaces for defining chatbot agents and integrations, with metadata in MongoDB.
Retrieval API: Hybrid keyword + vector search powered by OpenSearch.
Data Processing Pipeline: Jobs that ingest client data (documents, items, or other structured sources) via crawling, storage push, or system pulls, then encode inputs into vector embeddings using Hugging Face models.
Model API: LLM inference endpoint, runnable on-premises or as a cloud-hosted service.

All components run in Kubernetes and utilize OpenSearch for search indices and MongoDB for logs and metadata. Both OpenSearch and MongoDB can be deployed on-premises or in the cloud.

Data flow

1. Fully on-premises deployment

Every component and data store operates within the client’s infrastructure, ensuring zero external egress.

Chat & conversation management
- Users connect via WebSocket to the Chat API over TLS.
- Conversation events (user messages, tool inputs, tool outputs) stream through the CRUD Conversation API and persist in MongoDB.
- Chat API forwards context to the on-prem Model API for inference.
- Model API, running on local GPUs (≥2 H100), returns responses to the Chat API.
- Chat API relays the reply to the user and updates the MongoDB logs.
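
The conversation events listed above can be pictured as small records that are converted to plain documents before being persisted. The following is a minimal sketch only: the class name, field names, and the `to_document` helper are hypothetical and are not taken from the Platform's actual schema.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Any, Literal

# The three event kinds named in the text. The string values are assumptions.
EventKind = Literal["user_message", "tool_input", "tool_output"]


@dataclass
class ConversationEvent:
    """Hypothetical shape of one conversation event."""
    conversation_id: str
    kind: EventKind
    payload: dict[str, Any]
    # UTC timestamp recorded when the event object is created.
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_document(event: ConversationEvent) -> dict[str, Any]:
    """Convert the event to a plain dict that a document store could persist."""
    return asdict(event)
```

A caller would build an event such as `ConversationEvent("c1", "user_message", {"text": "hi"})` and hand the result of `to_document` to whatever persistence layer sits behind the CRUD Conversation API.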
Data processing pipeline
- On-prem jobs ingest client data via crawling, storage pushes, or system pulls (e.g., SharePoint).
- Hugging Face encoder models generate embeddings.
- Embeddings are indexed into the on-prem OpenSearch cluster.
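
The ingest, encode, and index steps above can be sketched as a single loop. This is an illustration under stated assumptions, not the Platform's pipeline: `toy_encoder` is a deterministic stand-in for a Hugging Face encoder model, and a plain dictionary stands in for the OpenSearch index. All names are hypothetical.

```python
import hashlib
from typing import Callable, Iterable

Vector = list[float]


def toy_encoder(text: str, dim: int = 8) -> Vector:
    """Stand-in for a real encoder: derives a fixed-size pseudo-embedding
    from a SHA-256 digest (dim must be at most 32)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def run_pipeline(
    docs: Iterable[tuple[str, str]],
    encode: Callable[[str], Vector],
    index: dict[str, dict],
) -> int:
    """Encode each (doc_id, text) pair and store it in the index.
    Returns the number of documents indexed."""
    count = 0
    for doc_id, text in docs:
        index[doc_id] = {"text": text, "embedding": encode(text)}
        count += 1
    return count
```

Passing the encoder in as a function keeps the loop independent of the model choice, so a real encoder and a real index client could be substituted without changing the control flow.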
Retrieval & search
- Retrieval API queries the on-prem OpenSearch indices, combining keyword and vector search.
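
This document does not state how the keyword and vector result lists are merged. One common option is reciprocal rank fusion (RRF), shown here purely as an assumed example. The function name and the constant `k = 60` are illustrative choices, not Platform settings.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1. Higher total score ranks first; ties are
    broken alphabetically by document id.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["a", "b", "c"]  # e.g. ids returned by the keyword query
vector_hits = ["b", "c", "d"]   # e.g. ids returned by the vector query
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
# fused == ["b", "c", "a", "d"]
```

Documents that appear in both lists ("b" and "c") rise above those found by only one method, which is the behaviour a hybrid search is meant to provide.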

Data residency: All data—messages, tool I/O, embeddings, and metadata—remains within the client’s network.

2. Hybrid deployment: cloud inference only

LLM inference is offloaded to the cloud; all other services stay on-premises.

Inference workflow
- Chat API checks local GPU availability. If fewer than two H100 GPUs are available, it encrypts the conversation payload and sends it to the Cloud Model API over TLS.
- Cloud Model API performs stateless inference and returns the assistant’s reply.
- Chat API integrates the reply with MongoDB logs and delivers it to the user.

Data transmitted: Only the encrypted inference payload (conversation context and tool parameters) leaves the client’s network. No MongoDB or OpenSearch records are sent.
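
The routing check and the restriction on what leaves the network can be sketched as two small functions. This is a simplified sketch: the field names and function names are hypothetical, and encryption and the TLS transport are deliberately left out.

```python
# Fields allowed to leave the client's network, per the text above.
# The key names themselves are assumptions.
ALLOWED_CLOUD_FIELDS = ("conversation_context", "tool_parameters")


def needs_cloud_inference(available_h100: int, minimum: int = 2) -> bool:
    """Return True when fewer than `minimum` H100 GPUs are available locally."""
    return available_h100 < minimum


def build_cloud_payload(request: dict) -> dict:
    """Keep only the allow-listed fields; everything else stays on-premises."""
    return {key: request[key] for key in ALLOWED_CLOUD_FIELDS if key in request}
```

Using an allow-list rather than a block-list means any new field added to the request stays on-premises by default until it is explicitly permitted.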

3. Fully managed cloud deployment

All Platform components and dependencies run in the client’s cloud account as managed services.

Service deployment
- Chat & CRUD Conversation APIs: Containerized services behind a load balancer.
- Agent, Skill, Tool & Retrieval APIs: Connected to MongoDB Atlas and OpenSearch Service.
- Model API: Serverless or container-based inference endpoints.
- Data Processing Pipeline: Cloud-native workflows ingest from cloud storage, encode inputs, and index embeddings.
Data flow & storage
- User events and tool outputs flow through cloud-hosted APIs into MongoDB Atlas.
- Data Processing Pipeline ingests client data from cloud storage, generates embeddings, and indexes in OpenSearch Service.
- Retrieval API queries the managed OpenSearch cluster.

Data residency: All client data resides in the cloud under the client’s account with provider-enforced controls.

Summary of data egress

Deployment Model	Data Egress
Fully On-Premises	None
Hybrid (Cloud Inference Only)	Encrypted inference payload (conversation context and tool parameters)
Fully Managed Cloud	None outside the client’s cloud account; all data flows stay within the cloud environment

Appendix: Acronyms

CRUD: Create, Read, Update, Delete
GPU: Graphics Processing Unit
TLS: Transport Layer Security
mTLS: Mutual TLS
RBAC: Role-Based Access Control