Model Management
Neural Inverse includes a built-in model management layer so you never have to leave the IDE to get a model running. From the Agent Manager (Cmd+Alt+A), you can install base models locally in one click, deploy private GPU endpoints to AWS or Azure, and see the live status of every model available in your environment.
Quick Install (Simple Mode)
The Models tab opens in Simple mode by default. It shows a curated grid of base models organized by category — Code, Chat, Reasoning, and Multimodal.
Each card shows:
- Model name and originating org (e.g. Meta, Mistral AI, Google)
- Parameter count badge (7B, 13B, 70B, etc.)
- One-click Install button that uses Ollama under the hood
- Live download progress bar while pulling
How it works
Simple mode requires Ollama to be running locally. Neural Inverse detects it automatically. If Ollama is not running, a status pill at the top of the screen shows its state and links to the install page.
When you click Install, Neural Inverse calls the Ollama pull API (POST /api/pull) and streams the download progress directly into the card UI. No terminal window required.
Once installed, the model is auto-detected and registered in your Ollama provider settings — no manual configuration needed.
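The Install button's pull can also be reproduced outside the IDE. The sketch below assumes Ollama is listening on its default `http://localhost:11434` and relies on the streaming response of `POST /api/pull`: one JSON object per line with a `status` field, plus `total` and `completed` byte counts while a layer downloads. The helper names (`pull_model`, `parse_stream`, `progress_percent`) are illustrative and are not part of Neural Inverse.

```python
import json
import urllib.request
from typing import Iterator, Optional

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def progress_percent(event: dict) -> Optional[float]:
    """Return download progress (0-100) for one streamed event, or None when the
    event carries no byte counts (e.g. "pulling manifest" or "success")."""
    total = event.get("total")
    completed = event.get("completed", 0)
    if not total:
        return None
    return round(100.0 * completed / total, 1)


def parse_stream(lines) -> Iterator[dict]:
    """Decode newline-delimited JSON, skipping blank lines."""
    for raw in lines:
        line = raw.decode() if isinstance(raw, bytes) else raw
        line = line.strip()
        if line:
            yield json.loads(line)


def pull_model(model: str, base_url: str = OLLAMA_URL) -> Iterator[tuple]:
    """Yield (status, percent) pairs while Ollama pulls `model`."""
    body = json.dumps({"model": model}).encode()  # older Ollama releases used "name"
    req = urllib.request.Request(
        f"{base_url}/api/pull",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        for event in parse_stream(resp):  # the response is read line by line
            if "error" in event:
                raise RuntimeError(event["error"])
            yield event.get("status", ""), progress_percent(event)
```

For example, iterating over `pull_model("qwen2.5-coder:7b")` first yields `("pulling manifest", None)`, then download events with percentages, and ends with a `success` status. After that, the model appears in the list returned by `GET /api/tags`.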
Curated models
| Category | Models available |
|---|---|
| Code | Qwen2.5-Coder 7B/14B/32B, DeepSeek-Coder V2, CodeLlama 13B/34B, Codestral |
| Chat | Llama 3.3 70B, Llama 3.2 3B, Phi-4 14B, Gemma 3 12B, Mistral 7B |
| Reasoning | DeepSeek-R1 7B/14B/32B, QwQ 32B |
| Multimodal | LLaVA 13B, LLaVA-Phi-3, BakLLaVA |
Switching to Advanced mode
Click Advanced / Marketplace in the top-right of the Models tab to open the full model marketplace with search, filters, provider selection, and cloud deployment options.
Model Marketplace (Advanced Mode)
The marketplace gives you access to the full model catalog across all providers. Use it when the curated list doesn't have what you need.
Sidebar controls:
- Search box with live filtering
- Provider filter chips (Ollama, vLLM, HuggingFace, Cloud)
- Category filters (Code, Chat, Vision, Embedding, etc.)
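Combining the search box with the chips usually amounts to an AND of independent predicates. The sketch below assumes a hypothetical `CatalogEntry` record and treats an empty chip selection as "no filter". The marketplace's actual data model and matching rules may differ.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List


@dataclass(frozen=True)
class CatalogEntry:
    # Hypothetical shape of a marketplace entry; field names are illustrative.
    name: str
    provider: str   # e.g. "Ollama", "vLLM", "HuggingFace", "Cloud"
    category: str   # e.g. "Code", "Chat", "Vision", "Embedding"
    description: str = ""


def filter_catalog(
    entries: Iterable[CatalogEntry],
    query: str = "",
    providers: AbstractSet[str] = frozenset(),
    categories: AbstractSet[str] = frozenset(),
) -> List[CatalogEntry]:
    """Apply search text and filter chips; an empty chip set means 'no filter'."""
    q = query.strip().lower()
    result = []
    for e in entries:
        if providers and e.provider not in providers:
            continue
        if categories and e.category not in categories:
            continue
        # Case-insensitive substring match on name or description.
        if q and q not in e.name.lower() and q not in e.description.lower():
            continue
        result.append(e)
    return result
```

Re-running the function on every keystroke is usually cheap enough for a catalog of a few thousand entries. Larger catalogs would call for an index.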
Model detail pane:
- Full description, parameter counts, quantization options
- Provider compatibility
- Install / Deploy button
From the marketplace you can also trigger cloud deployments directly — select a model, choose Deploy to Cloud, and the wizard opens.
Cloud Deployment
Neural Inverse can provision a private GPU instance on AWS or Azure running vLLM — fully within your own cloud account. Your data never touches Neural Inverse infrastructure.
Prerequisites
Before deploying you need stored cloud credentials:
- Open Agent Manager → Settings → Cloud Credentials
- Select AWS or Azure and enter your credentials
- Neural Inverse validates the format and tests connectivity before storing
Credentials are encrypted via the OS secret store (ISecretStorageService) — never written to disk in plaintext.
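A format check can catch obvious paste errors before any network call is made. This sketch encodes assumed rules: AWS access key IDs are 20 uppercase characters beginning with AKIA (long-term) or ASIA (temporary), secret access keys are 40 characters, and Azure tenant IDs are GUIDs in canonical form. The connectivity test that follows in the real flow is out of scope here. For AWS, that test could be an STS GetCallerIdentity call.

```python
import re
import uuid

# Assumed format rules. AWS documents a wider length range for AccessKeyId,
# but keys issued today are 20 characters with an AKIA/ASIA prefix.
AWS_ACCESS_KEY_ID = re.compile(r"(AKIA|ASIA)[A-Z0-9]{16}")
AWS_SECRET_KEY = re.compile(r"[A-Za-z0-9/+]{40}")


def validate_aws(access_key_id: str, secret_access_key: str) -> list:
    """Return a list of format problems (empty if the pair looks well-formed)."""
    problems = []
    if not AWS_ACCESS_KEY_ID.fullmatch(access_key_id.strip()):
        problems.append("access key ID should be 20 uppercase characters starting with AKIA or ASIA")
    if not AWS_SECRET_KEY.fullmatch(secret_access_key.strip()):
        problems.append("secret access key should be 40 characters")
    return problems


def validate_azure_tenant(tenant_id: str) -> bool:
    """Accept only a canonical, hyphenated GUID (Azure tenant IDs are GUIDs)."""
    s = tenant_id.strip()
    try:
        return str(uuid.UUID(s)) == s.lower()
    except ValueError:
        return False
```

Returning a list of problems, rather than a single boolean, lets the form report every field that needs fixing in one pass.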
Deployment wizard
The cloud deploy wizard walks through:
| Step | What happens |
|---|---|
| Instance selection | Pick GPU type, vCPUs, RAM, storage. Cost/hour shown per option. |
| Region | Choose deployment region. Region format is validated to prevent misconfiguration. |
| Model | Confirm the model to serve. |
| Deploy | Neural Inverse runs the provisioning sequence and streams live timeline logs. |
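Region validation can combine a format check with an allowlist. The sketch assumes AWS region names follow the familiar `us-east-1` / `us-gov-west-1` shape and uses a deliberately short, illustrative Azure allowlist. It normalizes case and whitespace and rejects anything else, including injected shell or path characters. `validate_region` is a hypothetical name.

```python
import re

# Assumed AWS region shape: e.g. "us-east-1", "ap-southeast-2", "us-gov-west-1".
AWS_REGION = re.compile(r"[a-z]{2}(-gov)?-[a-z]+-\d")

# Illustrative excerpt only; a real allowlist would cover every supported Azure region.
AZURE_REGIONS = frozenset({"eastus", "eastus2", "westus2", "westeurope", "northeurope"})


def validate_region(cloud: str, region: str) -> str:
    """Normalize and validate a region name, raising ValueError if it is not acceptable."""
    r = region.strip().lower()
    if cloud == "aws":
        ok = AWS_REGION.fullmatch(r) is not None
    elif cloud == "azure":
        ok = r in AZURE_REGIONS
    else:
        raise ValueError(f"unknown cloud: {cloud!r}")
    if not ok:
        raise ValueError(f"invalid {cloud} region: {region!r}")
    return r
```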
The provisioning sequence runs in a dedicated terminal:
- Creates a security group with IP-restricted inbound rules (your IP only)
- Launches the instance with IMDSv2 enforced
- Installs CUDA, Python, and vLLM via cloud-init
- Starts vLLM as a systemd service with auto-restart and a generated API key
- Polls the health endpoint until it responds (for up to 15 minutes, with a visible retry counter)
You can Abort at any time during provisioning. The wizard shows elapsed time and the current step.
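The wait at the end of provisioning is a bounded polling loop. This is a minimal sketch that assumes the instance exposes vLLM's `GET /health` route on port 8000. It also assumes a 10-second polling interval, which is a guess and not documented. The `abort` event plays the role of the Abort button, and `on_attempt` receives the retry count and elapsed seconds that the wizard displays. `http_ok` and `wait_until_healthy` are hypothetical names.

```python
import threading
import time
import urllib.request
from typing import Callable, Optional


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """True if GET `url` returns HTTP 200; any network or HTTP error counts as 'not yet'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False


def wait_until_healthy(
    check: Callable[[], bool],
    deadline_s: float = 15 * 60,   # the 15-minute ceiling described above
    interval_s: float = 10.0,      # assumed polling interval
    abort: Optional[threading.Event] = None,
    on_attempt: Callable[[int, float], None] = lambda n, elapsed: None,
) -> bool:
    """Poll `check` until it succeeds, the deadline passes, or `abort` is set."""
    abort = abort or threading.Event()
    start = time.monotonic()
    attempt = 0
    while not abort.is_set():
        attempt += 1
        on_attempt(attempt, time.monotonic() - start)  # e.g. "retry N, elapsed mm:ss"
        if check():
            return True
        if time.monotonic() - start >= deadline_s:
            return False
        abort.wait(interval_s)  # sleeps, but returns early if abort is set
    return False
```

In practice, the check would be something like `lambda: http_ok(f"http://{public_ip}:8000/health")`, where `public_ip` is the new instance's address. Using `Event.wait` instead of `time.sleep` is what lets Abort take effect without waiting out the current interval.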
Deployment states
| Status | Meaning |
|---|---|
| provisioning | Instance is being created |
| running | Endpoint is live and healthy |
| unreachable | Health check failed; may be a transient network issue |
| stopping | Teardown in progress |
| stopped | Instance terminated |
| error | Provisioning failed; see the log for details |
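These statuses behave like a small state machine. The transitions below are an assumption inferred from the table, not documented behavior. For example, the sketch assumes that an `unreachable` endpoint can recover to `running` and that Abort moves a `provisioning` deployment to `stopping`. The real registry may allow other transitions.

```python
from enum import Enum


class Status(str, Enum):
    PROVISIONING = "provisioning"
    RUNNING = "running"
    UNREACHABLE = "unreachable"
    STOPPING = "stopping"
    STOPPED = "stopped"
    ERROR = "error"


# Assumed legal transitions, inferred from the status descriptions above.
TRANSITIONS = {
    Status.PROVISIONING: {Status.RUNNING, Status.ERROR, Status.STOPPING},  # STOPPING covers Abort
    Status.RUNNING: {Status.UNREACHABLE, Status.STOPPING},
    Status.UNREACHABLE: {Status.RUNNING, Status.STOPPING, Status.ERROR},
    Status.STOPPING: {Status.STOPPED, Status.ERROR},
    Status.STOPPED: set(),                 # terminal
    Status.ERROR: {Status.STOPPING},       # clean up a failed deployment
}


def advance(current: Status, new: Status) -> Status:
    """Return `new` if the transition is allowed, otherwise raise ValueError."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```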
Security
- API keys are randomly generated (32 random bytes, hex-encoded) and stored encrypted
- Security groups restrict port 8000 to your current IP at deploy time
- IMDSv2 is enforced on all EC2 instances, so metadata requests require a session token; this mitigates SSRF attacks against the instance metadata endpoint
- Azure tenant IDs are URI-encoded; region names are validated against an allowlist
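Each of these measures maps onto standard-library calls and documented AWS request shapes. This sketch reads "32-byte hex" as 32 random bytes rendered as 64 hex characters. It builds the `IpPermissions` entry and `MetadataOptions` value in the shape that boto3's EC2 `authorize_security_group_ingress` and `run_instances` accept, but it does not call AWS. The Azure helper URI-encodes the tenant before placing it in the Microsoft identity platform token URL. All function and constant names are hypothetical.

```python
import ipaddress
import secrets
from urllib.parse import quote

# Passed as MetadataOptions to run_instances: session tokens required, i.e. IMDSv2 only.
IMDSV2_METADATA_OPTIONS = {"HttpTokens": "required", "HttpEndpoint": "enabled"}


def generate_api_key() -> str:
    """32 random bytes from the OS CSPRNG, hex-encoded (64 characters)."""
    return secrets.token_hex(32)


def vllm_ingress_rule(client_ip: str, port: int = 8000) -> dict:
    """Build an EC2 IpPermissions entry that admits only `client_ip` on `port`."""
    ip = ipaddress.ip_address(client_ip)  # raises ValueError on malformed input
    rule = {"IpProtocol": "tcp", "FromPort": port, "ToPort": port}
    if ip.version == 4:
        rule["IpRanges"] = [{"CidrIp": f"{ip}/32", "Description": "deployer IP only"}]
    else:
        rule["Ipv6Ranges"] = [{"CidrIpv6": f"{ip}/128", "Description": "deployer IP only"}]
    return rule


def azure_token_url(tenant_id: str) -> str:
    """Encode the tenant so characters like '/' or '?' cannot alter the URL path."""
    return f"https://login.microsoftonline.com/{quote(tenant_id, safe='')}/oauth2/v2.0/token"
```

Parsing the address with `ipaddress` before building the rule means a malformed value fails loudly instead of producing an overly broad CIDR.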
Deployments Tab
The Deployments tab in Agent Manager gives you a live view of everything running in your model environment.
Local providers
The top section shows auto-detected local providers:
| Provider | Default port | Detection |
|---|---|---|
| Ollama | 11434 | Health check every 30s |
| vLLM | 8000 | Health check every 30s |
| LM Studio | 1234 | Health check every 30s |
Each row shows: status badge (Running / Stopped), endpoint URL, list of loaded models, and last-checked timestamp.
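Detection amounts to requesting each server's model-list route on its default port. The sketch assumes the routes each server publishes: `/api/tags` for Ollama and the OpenAI-compatible `/v1/models` for vLLM and LM Studio. It treats any connection error, HTTP error, or non-JSON reply as "not running". A vLLM server started with `--api-key` would reject this unauthenticated request, so a real probe would need to send the key. The names here are illustrative.

```python
import json
import urllib.request
from typing import List, Optional

# Default ports from the table above, paired with each server's model-list route.
LOCAL_PROVIDERS = {
    "Ollama": ("http://localhost:11434", "/api/tags"),
    "vLLM": ("http://localhost:8000", "/v1/models"),
    "LM Studio": ("http://localhost:1234", "/v1/models"),
}


def model_names(payload: dict) -> List[str]:
    """Extract model names from an Ollama or an OpenAI-compatible list response."""
    if "models" in payload:                            # Ollama: {"models": [{"name": ...}]}
        return [m["name"] for m in payload["models"]]
    return [m["id"] for m in payload.get("data", [])]  # OpenAI-style: {"data": [{"id": ...}]}


def probe(provider: str, timeout: float = 2.0) -> Optional[List[str]]:
    """Return loaded model names, or None if the provider is not reachable."""
    base, path = LOCAL_PROVIDERS[provider]
    try:
        with urllib.request.urlopen(base + path, timeout=timeout) as resp:
            return model_names(json.load(resp))
    except (OSError, ValueError):  # refused, timed out, HTTP error, or not JSON
        return None
```

Calling `probe` for each provider every 30 seconds and storing a timestamp with each result provides the status badge, model list, and last-checked time that each row shows.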
Cloud deployments
The bottom section lists all cloud deployments from your account with their current status, GPU type, model, region, cost/hour, and quick-action buttons (Open endpoint, Stop, Delete).
Live updates
The Deployments tab listens to the DeploymentRegistryService event bus. When a local provider comes up or goes down, or when a cloud deployment changes status, the UI updates in real time without requiring a manual refresh.
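The event-driven refresh can be pictured as a small publish/subscribe registry that emits an event only when a status actually changes. This is a toy model. The class and method names are hypothetical and only loosely follow VS Code's onDidChange-style naming. The real DeploymentRegistryService interface is not described on this page.

```python
from typing import Callable, Dict, List

Listener = Callable[[str, str, str], None]  # (deployment_id, old_status, new_status)


class DeploymentRegistry:
    """Toy stand-in for an event-emitting registry (names are hypothetical)."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}
        self._listeners: List[Listener] = []

    def on_did_change(self, listener: Listener) -> Callable[[], None]:
        """Subscribe; the returned function unsubscribes."""
        self._listeners.append(listener)
        return lambda: self._listeners.remove(listener)

    def update(self, deployment_id: str, status: str) -> None:
        old = self._status.get(deployment_id, "")
        if old == status:
            return  # unchanged: no event, so the UI does not re-render
        self._status[deployment_id] = status
        for listener in list(self._listeners):  # copy so listeners may unsubscribe
            listener(deployment_id, old, status)
```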
Auto-Configuration
When Neural Inverse detects a new deployment (local or cloud), it can automatically configure the corresponding provider in your LLM settings — but only if the provider is currently unconfigured.
Rules:
- If you have already set a custom endpoint or API key, auto-config will not overwrite it
- If the provider has _didFillInProviderSettings = true, auto-config skips it
- A notification appears with OK, Undo, and Don't auto-configure options
- Selecting "Don't auto-configure" dismisses future auto-config for that provider permanently (stored per profile)
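These rules reduce to a short guard that is checked before anything is written. This is a minimal sketch that assumes a hypothetical `ProviderSettings` record, in which `did_fill_in` mirrors _didFillInProviderSettings. It also assumes that opt-outs are kept as a per-profile set of provider names.

```python
from dataclasses import dataclass
from typing import AbstractSet


@dataclass
class ProviderSettings:
    # Hypothetical per-provider settings record.
    endpoint: str = ""
    api_key: str = ""
    did_fill_in: bool = False  # mirrors _didFillInProviderSettings


def should_auto_configure(settings: ProviderSettings, provider: str,
                          opted_out: AbstractSet[str]) -> bool:
    """Apply the auto-config rules in order; any match means 'leave it alone'."""
    if provider in opted_out:                  # user chose "Don't auto-configure"
        return False
    if settings.did_fill_in:                   # _didFillInProviderSettings = true
        return False
    if settings.endpoint or settings.api_key:  # never overwrite user-entered values
        return False
    return True
```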
What gets configured:
| Deployment type | What is set |
|---|---|
| Local (Ollama/vLLM/LM Studio) | Available models registered; provider enabled |
| Cloud (vLLM endpoint) | Endpoint URL, API key, and model registered; provider enabled |
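The table can be read as a settings patch whose contents depend on the deployment type. In this sketch, the keys `enabled`, `models`, `endpoint`, and `apiKey` are hypothetical and are not Neural Inverse's actual setting names. The function only shows which fields each kind of deployment supplies, and it rejects a cloud patch that lacks an endpoint or key.

```python
from typing import Dict, List, Optional


def settings_patch(kind: str, models: List[str], endpoint: Optional[str] = None,
                   api_key: Optional[str] = None) -> Dict[str, object]:
    """Build the fields auto-config would set for a detected deployment (hypothetical keys)."""
    patch: Dict[str, object] = {"enabled": True, "models": sorted(set(models))}
    if kind == "cloud":
        if not endpoint or not api_key:
            raise ValueError("cloud deployments need both an endpoint URL and an API key")
        patch["endpoint"] = endpoint.rstrip("/")
        patch["apiKey"] = api_key
    elif kind != "local":
        raise ValueError(f"unknown deployment kind: {kind!r}")
    return patch
```

Because the patch is a plain dictionary, storing the pre-patch values next to it is one straightforward way to support the Undo action.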
Auto-config rules are stored and can be reviewed or reverted from the Deployments tab.
Enterprise
Model Management for Teams is available on the Enterprise plan. View pricing or contact sales.
Neural Inverse Enterprise adds organization-wide model management:
- Private cloud model registry — deploy models once, share endpoints across your entire engineering team within your org's private cloud and network
- Policy enforcement — define which models developers can use, block unapproved providers at the org level
- Centralized credential management — cloud credentials stored and rotated at the org level; developers never handle raw AWS/Azure keys
- Local model governance — IT can define approved local model lists; developers can install from the approved catalog only
- Usage telemetry — token counts, model usage by developer, cost attribution per team or project
- SSO-gated deployment access — only authorized roles can provision or terminate cloud GPU instances
Contact sales@neuralinverse.com for early access.
Related
LLM Providers (BYOLLM)
Connect cloud providers — Anthropic, OpenAI, Bedrock, Gemini, and more.
Power Mode
Run agentic coding sessions using any configured model.
Contributing — Model Management
Architecture guide for contributors working on the model layer.