🤖 Self-Hosted AI Platform
iNdex.ai
Your data never leaves your servers.
Your intelligence. Your infrastructure. Your rules.
Llama 3.1
CodeLlama
Mistral
Phi-3
🔒 100% LOCAL — ZERO CLOUD
llama3.1:8b
Supported Models
Neural Network Ecosystem
Swap between frontier open-source models in real time. No API keys. No cloud dependencies.
Technical Design
Clean Architecture
Four-layer design. Blazor frontend, CQRS application layer, pure domain, and Ollama-powered infrastructure.
01
Presentation
Blazor Server + MudBlazor UI components and REST API endpoints
02
Application
CQRS pattern. Commands, queries, and handlers; business logic dispatched via MediatR
03
Domain
Entities, interfaces, pure domain logic. Zero external dependencies
04
Infrastructure
Ollama AI engine, PostgreSQL, file system. Repository implementations
Data Sovereignty
Zero External Calls
Air-Gapped Operation
Runs fully offline. No internet required after installation.
Your Infrastructure
Deploy on-premise, private cloud, or bare metal GPU server.
GDPR & HIPAA Ready
Compliance-ready by design. Data never crosses organizational boundaries.
No Vendor Lock-in
Open standards. Swap models freely. Own your AI stack forever.
Benchmarks
Raw Performance
Measured on consumer-grade NVIDIA GPU. Results vary by hardware configuration.
Streaming API
Real-Time Token Streaming
# POST /api/chat/stream
curl -X POST http://localhost:5000/api/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.1:8b","prompt":"Explain neural networks"}'
↓ streaming response ...
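A client can consume the stream above in a few lines of Python. This is a minimal sketch: it assumes each event arrives as a standard SSE `data: {...}` line whose JSON payload carries a `token` field. Both the framing and the field name are assumptions, so check them against the actual response shape.

```python
# Minimal Python client for the /api/chat/stream endpoint shown above.
# Assumes each SSE event is a "data: {...}" line whose JSON payload
# carries a "token" field -- both are assumptions about the wire format.
import json
import urllib.request

def parse_sse_line(line: str):
    """Return the JSON payload of a 'data:' line, or None for other lines."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):])

def stream_chat(prompt: str, model: str = "llama3.1:8b",
                url: str = "http://localhost:5000/api/chat/stream"):
    """Yield tokens from the chat stream as they arrive."""
    body = json.dumps({"model": model, "prompt": prompt}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # HTTPResponse is iterable line by line
            event = parse_sse_line(raw.decode("utf-8"))
            if event is not None:
                yield event.get("token", "")

if __name__ == "__main__":
    for token in stream_chat("Explain neural networks"):
        print(token, end="", flush=True)
```

Printing with `flush=True` is what makes tokens appear as they are generated rather than after the full completion.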
Server-Sent Events
See each token as it's generated. No waiting for full completion. Native SSE protocol.
SSE
REST + WebSocket
Both sync and streaming endpoints. Integrate with any client in any language.
REST
Semantic Kernel
Microsoft's AI orchestration. Build agents, chains, and plugins natively in .NET.
SK
Get Running in Minutes
One Command Deploy
Three paths to production. Docker, native .NET, or GPU-accelerated bare metal.
Docker Compose
Spin up the full stack in one command. Postgres, Ollama, and the API fully orchestrated.
$ docker-compose up -d
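The stack that command brings up might look roughly like this. A hypothetical compose file, not the one that ships with the project: service names, image tags, and ports are illustrative assumptions.

```yaml
# Hypothetical docker-compose.yml sketching the orchestrated stack.
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example   # replace with a real secret
    volumes:
      - pgdata:/var/lib/postgresql/data
  ollama:
    image: ollama/ollama
    volumes:
      - ollama:/root/.ollama       # model weights persist here
  api:
    build: .
    ports:
      - "5000:5000"
    depends_on:
      - postgres
      - ollama
volumes:
  pgdata:
  ollama:
```

Persisting the Ollama volume matters: without it, multi-gigabyte model weights are re-downloaded every time the container is recreated.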
Native .NET
ASP.NET Core Minimal APIs. Fast, modern, cross-platform across Linux, Windows, and macOS.
$ dotnet run --project iNdex.Api
GPU Accelerated
NVIDIA CUDA via Ollama + LLamaSharp. 16GB RAM minimum. GPU optional but recommended.
$ ollama pull llama3.1:8b
Start Building
Own Your AI.
Own Your Future.
Self-hosted. Privacy-first. Enterprise-ready. iNdex.ai puts the power of large language models entirely within your control.