Migrate from Helicone to Neural Inverse

Helicone has moved into maintenance mode following its acquisition by Mintlify. This guide covers how to migrate your prompt management and observability/tracing setup from Helicone to Neural Inverse.

Migrating Prompt Management

Helicone offers two approaches to prompt management: a gateway approach, where you pass a prompt_id and the gateway compiles the template server-side in a single API call, and an SDK approach, where you fetch and compile prompts client-side via the @helicone/helpers package. Neural Inverse uses an SDK-based model similar to Helicone's SDK approach: prompts are fetched and compiled in your application code, with built-in caching to keep latency low and prompts available.

If you used Helicone's gateway approach, the main change is moving prompt resolution into your application code. If you already used Helicone's SDK approach (HeliconePromptManager → getPromptBody), the migration is more straightforward: you replace one SDK fetch-and-compile pattern with another.

1. Export Prompts from Helicone

Use the Helicone Prompt API to export your existing prompts:

List all prompts via POST /v1/prompt-2025/query to get prompt IDs.
List versions for each prompt via POST /v1/prompt-2025/query/versions.
Fetch the full body for each version via GET /v1/prompt-2025/{promptVersionId}/prompt-body.
Record environment assignments via POST /v1/prompt-2025/query/environment-version to know which version is deployed where (production, staging, etc.).
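
The export steps above can be sketched as a small Python loop. This is a minimal sketch, not a reference client: the endpoint paths come from the list above, but the base URL, the Bearer authorization header, the request bodies (an empty filter, a promptId field) and the response field names (data, id) are assumptions to check against the Helicone API reference. The environment lookup is left out because its request body is not described here.

```python
import json
import urllib.request
from typing import Any, Callable, Optional

# Assumption: base URL and Bearer auth are not stated in this guide.
HELICONE_BASE_URL = "https://api.helicone.ai"


def make_http_caller(api_key: str) -> Callable[[str, str, Optional[dict]], Any]:
    """Return a function that sends one JSON request and decodes the response."""
    def call(method: str, path: str, body: Optional[dict]) -> Any:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(
            HELICONE_BASE_URL + path,
            data=data,
            method=method,
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))
    return call


def unwrap(response: Any) -> list:
    """Accept a bare list or a {"data": [...]} envelope (both shapes are assumed)."""
    if isinstance(response, dict):
        return response.get("data") or []
    return response or []


def export_prompts(call: Callable[[str, str, Optional[dict]], Any]) -> list:
    """Walk prompts -> versions -> bodies and collect one record per version."""
    exported = []
    for prompt in unwrap(call("POST", "/v1/prompt-2025/query", {})):
        versions = unwrap(
            call("POST", "/v1/prompt-2025/query/versions", {"promptId": prompt["id"]})
        )
        for version in versions:
            body = call("GET", f"/v1/prompt-2025/{version['id']}/prompt-body", None)
            exported.append(
                {"prompt_id": prompt["id"], "version_id": version["id"], "body": body}
            )
    return exported
```

Passing the HTTP function in as an argument keeps the traversal logic testable without network access; in a real run you would call export_prompts(make_http_caller(api_key)) and write the result to disk.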

2. Convert Variable Syntax

Helicone uses typed variables ({{hc:name:type}}), while Neural Inverse uses plain {{variable}} placeholders compiled at runtime via .compile().

Helicone	Neural Inverse
`{{hc:customer_name:string}}`	`{{customer_name}}`
`{{hc:is_premium:boolean}}`	`{{is_premium}}`

If you relied on Helicone's type validation, move that logic into your application code before calling .compile().
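
As an illustration, the syntax conversion and the relocated type check could look like the sketch below. The function names are hypothetical, the regular expression assumes variable names and types are plain identifiers, and the "number" entry in the type map is an assumption (only string and boolean appear in the table above).

```python
import re

# Matches {{hc:name:type}} and captures the variable name and its declared type.
HC_VARIABLE = re.compile(r"\{\{\s*hc:([A-Za-z_]\w*):([A-Za-z_]\w*)\s*\}\}")

# Assumed mapping from Helicone type names to Python types.
PYTHON_TYPES = {"string": str, "boolean": bool, "number": (int, float)}


def convert_variables(text):
    """Rewrite {{hc:name:type}} to {{name}} and return the declared types."""
    declared_types = {}

    def replace(match):
        name, type_name = match.group(1), match.group(2)
        declared_types[name] = type_name
        return "{{" + name + "}}"

    return HC_VARIABLE.sub(replace, text), declared_types


def validate_inputs(inputs, declared_types):
    """Raise if a variable is missing or has the wrong type; run before .compile()."""
    for name, type_name in declared_types.items():
        if name not in inputs:
            raise KeyError(f"missing prompt variable: {name}")
        expected = PYTHON_TYPES.get(type_name)
        if expected is None:
            continue  # unknown type name: nothing to check
        value = inputs[name]
        # bool is a subclass of int in Python, so reject it explicitly for numbers.
        is_bool_as_number = type_name == "number" and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_number:
            raise TypeError(f"{name} should be of type {type_name}")
```

Keeping the declared types from the export means the validation rules survive the move to untyped placeholders.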

3. Map Prompt Bodies to Neural Inverse

Helicone stores the full LLM request shape (model, messages, temperature, tools, etc.) in a single prompt body. In Neural Inverse, split this into two parts:

Prompt content (type chat): the messages array with converted variable syntax. See prompt data model.
Prompt config (JSON): model parameters (model, temperature, max_tokens) and tool definitions (tools, tool_choice, response_format). See prompt config.
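
The split can be sketched as a small helper. This assumes an exported prompt body is a dictionary with a messages key alongside the request parameters; the function name is hypothetical. Everything other than messages goes into the config, so parameters not listed above are carried over rather than dropped.

```python
def split_prompt_body(body):
    """Split a Helicone prompt body into (messages, config).

    messages: the chat messages, to become the prompt content.
    config:   every other top-level key (model, temperature, tools, ...).
    """
    messages = list(body.get("messages", []))
    config = {key: value for key, value in body.items() if key != "messages"}
    return messages, config
```

Remember to run the variable-syntax conversion over each message's content before storing the messages as prompt content.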

4. Recreate Prompts in Neural Inverse

Create prompts in Neural Inverse via the SDK or API, setting:

Prompt name: maps to Helicone's prompt_id.
Prompt type: chat (since Helicone stores chat messages).
Labels: map Helicone environments (production/staging) to Neural Inverse labels. For example, the Helicone version assigned to "production" gets the production label in Neural Inverse.
Config JSON: include model parameters and tool definitions.
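
A minimal sketch of this mapping, assuming client is the SDK client object used in the application-code examples in this guide and that it exposes a create_prompt method accepting name, type, prompt, labels, and config keyword arguments. The helper name recreate_prompt and its parameters are hypothetical.

```python
def recreate_prompt(client, helicone_prompt_id, messages, config, environments):
    """Create one chat prompt version from exported Helicone data.

    environments: the Helicone environments this version was assigned to,
    reused as labels (for example ["production"]).
    """
    return client.create_prompt(
        name=helicone_prompt_id,      # prompt name maps to Helicone's prompt_id
        type="chat",                  # Helicone stores chat messages
        prompt=messages,              # messages with converted {{variable}} syntax
        labels=list(environments),    # environment -> label
        config=config,                # model parameters and tool definitions
    )
```

Call this once per exported version, in the original version order, so that version history is recreated in sequence and each label ends up on the intended version.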

5. Migrate Prompt Partials and Composition

Helicone prompt partials ({{hcp:prompt_id:index:environment}}) pull messages from other prompts. Neural Inverse offers two alternatives:

Shared system instructions: create a Neural Inverse text prompt for the shared snippet and reference it via prompt composability.
Multi-message fragments: fetch both prompts in code, compile each, and merge the message arrays, or use message placeholders to insert messages at specific positions at runtime.
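
For the merge-in-code option, the sketch below inserts one compiled message list into another at a given position, mirroring the index part of a Helicone partial. The helper name is hypothetical, and both arguments are assumed to be plain lists of chat message dictionaries as returned by compiling a chat prompt.

```python
def merge_messages(shared, specific, index=0):
    """Return a new list with the shared messages inserted into specific at index."""
    merged = list(specific)        # copy so the inputs are left untouched
    merged[index:index] = shared   # slice assignment inserts without replacing
    return merged
```

With the default index of 0 the shared messages come first, which suits shared system instructions.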

6. Update Application Code

Replace Helicone's prompt integration with Neural Inverse's fetch + compile flow.

If you used Helicone's gateway approach (prompt_id + inputs in the API call):

# Before (Helicone gateway)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    prompt_id="customer_support",
    inputs={"customer_name": "Alice", "issue_type": "billing"}
)

# After (Neural Inverse)
from langfuse import Langfuse

# `client` is the same OpenAI client used in the "Before" snippet
langfuse = Langfuse()
prompt = langfuse.get_prompt("customer_support", label="production", type="chat")
compiled_messages = prompt.compile(customer_name="Alice", issue_type="billing")

response = client.chat.completions.create(
    model=prompt.config.get("model", "gpt-4o-mini"),
    messages=compiled_messages
)

If you used Helicone's SDK approach (@helicone/helpers / HeliconePromptManager):

# Before (Helicone SDK)
# body = prompt_manager.get_prompt_body(prompt_id="customer_support", inputs={...})
# response = client.chat.completions.create(**body)

# After (Neural Inverse): same pattern, different SDK
from langfuse import Langfuse

langfuse = Langfuse()
prompt = langfuse.get_prompt("customer_support", label="production", type="chat")
compiled_messages = prompt.compile(customer_name="Alice", issue_type="billing")

response = client.chat.completions.create(
    model=prompt.config.get("model", "gpt-4o-mini"),
    messages=compiled_messages
)

See the prompt management get-started guide for full Python and TypeScript examples.

Migrating Tracing / Observability

Helicone logs LLM requests at the gateway level. Neural Inverse provides hierarchical traces with nested spans, giving you visibility into multi-step agent workflows, not just individual LLM calls.

Option A: Use the Neural Inverse SDK (Recommended)

Neural Inverse offers Python and TypeScript SDKs that can wrap any application code, plus native integrations with 80+ frameworks and model providers, including OpenAI, LangChain, LlamaIndex, Vercel AI SDK, and Anthropic. Choose the integration that matches your stack; see the full integrations overview for all options.

The simplest starting point for OpenAI users is the drop-in OpenAI SDK wrapper. Since Helicone is OpenAI-compatible, you can even keep Helicone as a gateway during the transition:

from langfuse.openai import openai

client = openai.OpenAI(
    api_key="your-api-key",
    base_url="https://api.openai.com/v1"  # or keep Helicone's URL during transition
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

Beyond simple LLM call logging, the Neural Inverse SDK provides the @observe() decorator (Python) and equivalent patterns in TypeScript to trace any function in your application. These create hierarchical traces with nested spans for multi-step agent workflows, tool calls, retrieval steps, and more, giving you the full trace context that gateway-only logging cannot provide.

To link traced generations to your migrated prompts, pass the prompt object to the generation. See linking prompts to traces.

Option B: Use OpenTelemetry

If you already have OpenTelemetry instrumentation, you can point your OTLP exporter at Neural Inverse's OTLP endpoint:

export OTEL_EXPORTER_OTLP_ENDPOINT="https://cloud.langfuse.com/api/public/otel"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic <base64(public_key:secret_key)>,x-langfuse-ingestion-version=4"

Key constraints:

Neural Inverse supports OTLP over HTTP via both HTTP/JSON and HTTP/protobuf (gRPC is not supported yet).
The x-langfuse-ingestion-version=4 header enables real-time visibility in the Neural Inverse v4 tracing table for data sent directly via OTEL.
See the OpenTelemetry integration docs for full setup details.
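
The base64 part of the Authorization header is the standard HTTP Basic encoding of public_key:secret_key. A small sketch for building the header string shown above; the function name is hypothetical, and the keys in the usage note are placeholders.

```python
import base64


def otlp_headers(public_key, secret_key):
    """Build the OTEL_EXPORTER_OTLP_HEADERS value for Basic auth."""
    credentials = f"{public_key}:{secret_key}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization=Basic {token},x-langfuse-ingestion-version=4"
```

For example, print(otlp_headers("your-public-key", "your-secret-key")) produces a value you can export as OTEL_EXPORTER_OTLP_HEADERS. Some OpenTelemetry exporters expect the space after Basic to be URL-encoded as %20, so check your exporter's documentation if authentication fails.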

Option C: Replace the Gateway with LiteLLM Proxy

If you used Helicone primarily as an AI gateway (multi-provider routing, failover), LiteLLM Proxy is a drop-in replacement with native Neural Inverse integration:

# litellm_config.yaml
litellm_settings:
  callbacks: ["langfuse_otel"]

This preserves the gateway pattern while routing all traces to Neural Inverse. See the LiteLLM Proxy integration guide for details.
