Model Management

Neural Inverse includes a built-in model management layer so you never have to leave the IDE to get a model running. From the Agent Manager (Cmd+Alt+A), you can install base models locally in one click, deploy private GPU endpoints to AWS or Azure, and see the live status of every model available in your environment.

Quick Install (Simple Mode)

The Models tab opens in Simple mode by default. It shows a curated grid of base models organized by category — Code, Chat, Reasoning, and Multimodal.

Each card shows:

Model name and originating org (e.g. Meta, Mistral AI, Google)
Parameter count badge (7B, 13B, 70B, etc.)
One-click Install button that uses Ollama under the hood
Live download progress bar while pulling

How it works

Simple mode requires Ollama to be running locally. Neural Inverse detects it automatically. If Ollama is not running, a status pill at the top of the screen shows its state and links to the install page.

When you click Install, Neural Inverse calls the Ollama pull API (POST /api/pull) and streams the download progress directly into the card UI. No terminal window required.
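
The same stream can be consumed outside the IDE. The following is a minimal sketch, assuming a local Ollama server on its default port 11434. `POST /api/pull` responds with newline-delimited JSON objects, and layer-download objects carry `total` and `completed` byte counts. The helper names `parse_progress` and `pull_model` are illustrative and are not Neural Inverse internals. The request field is `model` in current Ollama API docs (older releases used `name`).

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional, Tuple


def parse_progress(lines: Iterable[bytes]) -> Iterator[Tuple[str, Optional[float]]]:
    """Yield (status, percent) for each NDJSON line of a pull stream.

    percent is None for status-only messages that carry no byte counts.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        event = json.loads(raw)
        if "error" in event:
            raise RuntimeError(event["error"])
        total = event.get("total")
        completed = event.get("completed", 0)
        percent = 100.0 * completed / total if total else None
        yield event.get("status", ""), percent


def pull_model(name: str, base_url: str = "http://localhost:11434") -> Iterator[Tuple[str, Optional[float]]]:
    """Stream pull progress for `name` from a running Ollama server."""
    body = json.dumps({"model": name}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/pull",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # The HTTP response object iterates line by line.
        yield from parse_progress(response)
```

Iterating over `pull_model("qwen2.5-coder:7b")` yields one `(status, percent)` pair per streamed event, which is enough to drive a progress bar.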

Once installed, the model is auto-detected and registered in your Ollama provider settings — no manual configuration needed.
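
To check which models a local Ollama server reports as installed, you can query `GET /api/tags` yourself. This is a sketch of that lookup, not the detection code Neural Inverse uses; `model_names` and `list_installed` are hypothetical helper names.

```python
import json
import urllib.request
from typing import List


def model_names(tags_payload: dict) -> List[str]:
    """Extract sorted model names from a decoded /api/tags response."""
    return sorted(entry["name"] for entry in tags_payload.get("models", []))


def list_installed(base_url: str = "http://localhost:11434") -> List[str]:
    """Ask a running Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return model_names(json.load(response))
```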

Curated models

Category	Models available
Code	Qwen2.5-Coder 7B/14B/32B, DeepSeek-Coder V2, CodeLlama 13B/34B, Codestral
Chat	Llama 3.3 70B, Llama 3.2 3B, Phi-4 14B, Gemma 3 12B, Mistral 7B
Reasoning	DeepSeek-R1 7B/14B/32B, QwQ 32B
Multimodal	LLaVA 13B, LLaVA-Phi-3, BakLLaVA

Switching to Advanced mode

Click Advanced / Marketplace in the top-right of the Models tab to open the full model marketplace with search, filters, provider selection, and cloud deployment options.

Model Marketplace (Advanced Mode)

The marketplace gives you access to the full model catalog across all providers. Use it when the curated list doesn't have what you need.

Sidebar controls:

Search box with live filtering
Provider filter chips (Ollama, vLLM, HuggingFace, Cloud)
Category filters (Code, Chat, Vision, Embedding, etc.)

Model detail pane:

Full description, parameter counts, quantization options
Provider compatibility
Install / Deploy button

From the marketplace you can also trigger cloud deployments directly — select a model, choose Deploy to Cloud, and the wizard opens.

Cloud Deployment

Neural Inverse can provision a private GPU instance on AWS or Azure running vLLM — fully within your own cloud account. Your data never touches Neural Inverse infrastructure.

Prerequisites

Before deploying, you need cloud credentials stored in Neural Inverse:

Open Agent Manager → Settings → Cloud Credentials
Select AWS or Azure and enter your credentials
Neural Inverse validates the format and tests connectivity before storing

Credentials are encrypted via the OS secret store (ISecretStorageService) — never written to disk in plaintext.
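
A format check of the kind described above might look like the sketch below. The patterns reflect the conventional shapes of AWS access keys and Azure tenant GUIDs and should be treated as heuristics. The exact rules Neural Inverse applies are not documented here, so the function names and patterns are assumptions. The connectivity test is not shown.

```python
import re
import uuid

# Conventional shapes for AWS keys; heuristics, not guarantees.
_AWS_ACCESS_KEY_ID = re.compile(r"(AKIA|ASIA)[A-Z0-9]{16}")
_AWS_SECRET_KEY = re.compile(r"[A-Za-z0-9/+=]{40}")


def looks_like_aws_credentials(access_key_id: str, secret_access_key: str) -> bool:
    """True when both values match the usual AWS key shapes."""
    return bool(
        _AWS_ACCESS_KEY_ID.fullmatch(access_key_id)
        and _AWS_SECRET_KEY.fullmatch(secret_access_key)
    )


def looks_like_azure_tenant_id(tenant_id: str) -> bool:
    """Azure tenant IDs are GUIDs in canonical 8-4-4-4-12 hex form."""
    try:
        return str(uuid.UUID(tenant_id)) == tenant_id.lower()
    except ValueError:
        return False
```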

Deployment wizard

The cloud deploy wizard walks through:

Step	What happens
Instance selection	Pick GPU type, vCPUs, RAM, storage. Cost/hour shown per option.
Region	Choose deployment region. Region format is validated to prevent misconfiguration.
Model	Confirm the model to serve.
Deploy	Neural Inverse runs the provisioning sequence and streams live timeline logs.

The provisioning sequence runs in a dedicated terminal:

Creates a security group with IP-restricted inbound rules (your IP only)
Launches the instance with IMDSv2 enforced
Installs CUDA, Python, and vLLM via cloud-init
Starts vLLM as a systemd service with auto-restart and a generated API key
Polls the health endpoint until it responds (up to 15 minutes, with retry counter)
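
The final polling step can be sketched as a bounded retry loop. This is an illustration under assumptions: the 10-second interval and the name `wait_until_healthy` are not taken from Neural Inverse, and the probe is passed in so the loop does not depend on any particular HTTP client.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout_s: float = 15 * 60,
    interval_s: float = 10.0,  # assumed interval, not documented
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call `probe` until it returns True; return the number of attempts.

    Raises TimeoutError once `timeout_s` has elapsed without success.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            if probe():
                return attempts
        except OSError:
            pass  # connection refused or reset while the service is still booting
        if clock() >= deadline:
            raise TimeoutError(f"endpoint not healthy after {attempts} attempts")
        sleep(interval_s)
```

A probe could be, for example, a GET against the endpoint's health URL that returns True on HTTP 200. Errors raised by `urllib` derive from `OSError`, so they count as a failed attempt and do not abort the loop.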

You can Abort at any time during provisioning. The wizard shows elapsed time and the current step.

Deployment states

Status	Meaning
`provisioning`	Instance is being created
`running`	Endpoint is live and healthy
`unreachable`	Health check failed — may be a transient network issue
`stopping`	Teardown in progress
`stopped`	Instance terminated
`error`	Provisioning failed — see log for details
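
If you track these states in your own tooling, an enum keeps the string values in one place. The transition map below is an assumption inferred from the table; the source does not specify which transitions are legal.

```python
from enum import Enum


class DeploymentStatus(str, Enum):
    PROVISIONING = "provisioning"
    RUNNING = "running"
    UNREACHABLE = "unreachable"
    STOPPING = "stopping"
    STOPPED = "stopped"
    ERROR = "error"


# Assumed transitions, inferred from the status table; not confirmed by the source.
ALLOWED = {
    DeploymentStatus.PROVISIONING: {
        DeploymentStatus.RUNNING,
        DeploymentStatus.ERROR,
        DeploymentStatus.STOPPING,
    },
    DeploymentStatus.RUNNING: {DeploymentStatus.UNREACHABLE, DeploymentStatus.STOPPING},
    DeploymentStatus.UNREACHABLE: {DeploymentStatus.RUNNING, DeploymentStatus.STOPPING},
    DeploymentStatus.STOPPING: {DeploymentStatus.STOPPED, DeploymentStatus.ERROR},
    DeploymentStatus.STOPPED: set(),
    DeploymentStatus.ERROR: {DeploymentStatus.STOPPING},
}


def can_transition(current: DeploymentStatus, target: DeploymentStatus) -> bool:
    """True when `target` is an assumed-legal next state after `current`."""
    return target in ALLOWED[current]
```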

Security

API keys are randomly generated (32-byte hex) and stored encrypted
Security groups restrict port 8000 to your current IP at deploy time
IMDSv2 is enforced on all EC2 instances, which mitigates SSRF attacks against the instance metadata endpoint
Azure tenant IDs are URI-encoded; region names are validated against an allowlist
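
The measures above map onto a few standard-library calls. This is a sketch only: the function names are hypothetical, the Azure region set is a small illustrative subset of a real allowlist, and the AWS pattern checks only the shape of a region code, not whether the region exists.

```python
import ipaddress
import re
import secrets
from urllib.parse import quote

# Illustrative subset only; a real allowlist would cover every supported region.
AZURE_REGIONS = frozenset({"eastus", "eastus2", "westus2", "westeurope", "northeurope"})
# Shape of AWS region codes such as us-east-1 or ap-southeast-2.
AWS_REGION = re.compile(r"[a-z]{2}(-[a-z]+)+-\d+")


def new_api_key() -> str:
    """32 random bytes rendered as 64 hex characters."""
    return secrets.token_hex(32)


def ingress_cidr(public_ip: str) -> str:
    """Single-host CIDR (/32 for IPv4, /128 for IPv6) for an inbound rule."""
    return str(ipaddress.ip_network(public_ip))


def is_valid_region(provider: str, region: str) -> bool:
    """Allowlist check for Azure, shape check for AWS."""
    if provider == "azure":
        return region in AZURE_REGIONS
    if provider == "aws":
        return AWS_REGION.fullmatch(region) is not None
    return False


def tenant_path_segment(tenant_id: str) -> str:
    """Percent-encode a tenant ID so it is safe inside a URL path."""
    return quote(tenant_id, safe="")
```

Because `ingress_cidr` parses the address before formatting it, a malformed IP raises `ValueError` and never reaches a security group rule.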

Deployments Tab

The Deployments tab in Agent Manager gives you a live view of everything running in your model environment.

Local providers

The top section shows auto-detected local providers:

Provider	Default port	Detection
Ollama	11434	Health check every 30s
vLLM	8000	Health check every 30s
LM Studio	1234	Health check every 30s

Each row shows: status badge (Running / Stopped), endpoint URL, list of loaded models, and last-checked timestamp.
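
A detection pass over the three default ports could be sketched as follows. The probe paths are assumptions: `/api/tags` (Ollama), `/health` (vLLM), and `/v1/models` (LM Studio) are endpoints those servers expose, but the paths Neural Inverse actually checks are not documented here. `LocalProvider`, `http_ok`, and `detect` are hypothetical names.

```python
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class LocalProvider:
    name: str
    port: int
    health_path: str  # assumed probe path, see the note above

    @property
    def health_url(self) -> str:
        return f"http://localhost:{self.port}{self.health_path}"


PROVIDERS = (
    LocalProvider("Ollama", 11434, "/api/tags"),
    LocalProvider("vLLM", 8000, "/health"),
    LocalProvider("LM Studio", 1234, "/v1/models"),
)


def http_ok(url: str, timeout_s: float = 2.0) -> bool:
    """True when the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except OSError:  # covers refused connections, timeouts, and HTTP errors
        return False


def detect(probe: Callable[[str], bool] = http_ok) -> Dict[str, str]:
    """Map each provider name to the status badge text."""
    return {p.name: "Running" if probe(p.health_url) else "Stopped" for p in PROVIDERS}
```

Calling `detect()` on a 30-second timer would reproduce the refresh cadence in the table.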

When Neural Inverse detects a local provider or a cloud deployment, it can configure the matching provider settings automatically, under these rules:

If you have already set a custom endpoint or API key, auto-config will not overwrite it
If the provider has `_didFillInProviderSettings = true`, auto-config skips it
A notification appears with OK, Undo, and "Don't auto-configure" options
Selecting "Don't auto-configure" dismisses future auto-config for that provider permanently (stored per profile)
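
The rules above amount to a short guard function. In this sketch the field names are hypothetical, apart from the flag that mirrors `_didFillInProviderSettings`. It also simplifies the first rule by skipping auto-config entirely when a custom endpoint or API key is present, which may be stricter than the real behavior.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ProviderSettings:
    # Hypothetical field names, except the flag mirrored from the rules above.
    endpoint: Optional[str] = None
    api_key: Optional[str] = None
    did_fill_in_provider_settings: bool = False  # mirrors _didFillInProviderSettings


def should_auto_configure(name: str, settings: ProviderSettings, opted_out: Set[str]) -> bool:
    """Decide whether auto-config may touch the provider called `name`."""
    if name in opted_out:
        return False  # user chose "Don't auto-configure" for this provider
    if settings.did_fill_in_provider_settings:
        return False  # settings were already filled in
    if settings.endpoint or settings.api_key:
        return False  # never overwrite a custom endpoint or API key
    return True
```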

What gets configured:

Deployment type	What is set
Local (Ollama/vLLM/LM Studio)	Available models registered; provider enabled
Cloud (vLLM endpoint)	Endpoint URL, API key, and model registered; provider enabled

Auto-config rules are stored and can be reviewed or reverted from the Deployments tab.

Enterprise

Model Management for Teams is available on the Enterprise plan.

Neural Inverse Enterprise adds organization-wide model management:

Private cloud model registry — deploy models once, share endpoints across your entire engineering team within your org's private cloud and network
Policy enforcement — define which models developers can use, block unapproved providers at the org level
Centralized credential management — cloud credentials stored and rotated at the org level; developers never handle raw AWS/Azure keys
Local model governance — IT can define approved local model lists; developers can install from the approved catalog only
Usage telemetry — token counts, model usage by developer, cost attribution per team or project
SSO-gated deployment access — only authorized roles can provision or terminate cloud GPU instances

Contact sales@neuralinverse.com for early access.
