Multi-Modality and Attachments

Neural Inverse supports multi-modal traces including text, images, audio, and other attachments.

By default, base64 encoded data URIs are handled automatically by the Neural Inverse SDKs. They are extracted from the payloads commonly used in multi-modal LLMs, uploaded to Neural Inverse's object storage, and linked to the trace.

This also works if you:

Reference media files via external URLs.
Customize the handling of media files in the SDKs via the LangfuseMedia class.
Integrate via the Neural Inverse API directly.

Learn how to get started and how this works under the hood below.

Availability

Neural Inverse Cloud

Multi-modal attachments are currently free on Neural Inverse Cloud. We reserve the option to introduce a new pricing metric in the near future to account for the additional storage and compute costs of large multi-modal traces.

Self-hosting

Multi-modal attachments are available today. You need to configure your own object storage bucket via the Neural Inverse environment variables (LANGFUSE_S3_MEDIA_UPLOAD_*). See the self-hosting documentation for details on these environment variables. S3-compatible APIs are supported across all major cloud providers and can be self-hosted via MinIO. Note that the configured storage bucket must have a publicly resolvable hostname so that the SDKs can upload to it directly and the browser can fetch media assets from it.

Supported media formats

Neural Inverse supports:

Images: .png, .jpg, .webp
Audio files: .mpeg, .mp3, .wav
Other attachments: .pdf, plain text

If you require support for additional file types, please let us know in our GitHub Discussion where we're actively gathering feedback on multi-modal support.

Get Started

Base64 data URI encoded media

If you use base64 encoded images, audio, or other files in your LLM applications, upgrade to the latest version of the Neural Inverse SDKs. The Neural Inverse SDKs automatically detect and handle base64 encoded media by extracting it, uploading it separately as a Neural Inverse Media file, and including a reference in the trace.

This works with standard Data URI (MDN) formatted media (like those used by OpenAI and other LLMs).

This notebook includes a couple of examples using the OpenAI SDK and LangChain.
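
For reference, the data URI format mentioned above can be produced from raw bytes with the Python standard library alone. This is a minimal sketch, not SDK code: the helper name to_data_uri is hypothetical and the sample bytes are just the PNG file signature.

```python
import base64


def to_data_uri(content: bytes, content_type: str) -> str:
    """Encode raw bytes as a data URI with a base64 payload (hypothetical helper)."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{content_type};base64,{encoded}"


# The 8-byte PNG file signature, used here only as sample content.
png_signature = b"\x89PNG\r\n\x1a\n"
print(to_data_uri(png_signature, "image/png"))
```

A string of this shape, placed in a trace input or output, is what the SDKs look for when extracting media.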

External media (URLs)

Neural Inverse supports in-line rendering of media files via URLs if they follow common formats. In this case, the media file is not uploaded to Neural Inverse's object storage but simply rendered in the UI directly from the source.

Supported formats:

![Alt text](https://example.com/image.jpg)

{
  "content": [
    {
      "role": "system",
      "content": "You are an AI trained to describe and interpret images. Describe the main objects and actions in the image."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's happening in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        }
      ]
    }
  ]
}

Custom attachments

If you want to have more control or your media is not base64 encoded, you can upload arbitrary media attachments to Neural Inverse via the SDKs using the new LangfuseMedia class. Wrap media with LangfuseMedia before including it in trace inputs, outputs, or metadata. See the multi-modal documentation for examples.

from langfuse import get_client, observe, propagate_attributes
from langfuse.media import LangfuseMedia

# Create a LangfuseMedia object from a file

with open("static/bitcoin.pdf", "rb") as pdf_file:
    pdf_bytes = pdf_file.read()

# Wrap media in LangfuseMedia class

pdf_media = LangfuseMedia(content_bytes=pdf_bytes, content_type="application/pdf")

# Using with the decorator

@observe()
def process_document():
    langfuse = get_client()

    # Propagate metadata (including media) to all child observations
    with propagate_attributes(
        metadata={"document": pdf_media}
    ):
        pass

    # Or update the current span
    langfuse.update_current_span(
        input={"document": pdf_media}
    )

# Using with context managers

langfuse = get_client()

with langfuse.start_as_current_observation(as_type="span", name="analyze-document") as span:
    # Include media in the span input, output, or metadata
    span.update(
        input={"document": pdf_media},
        metadata={"file_size": len(pdf_bytes)}
    )

    # Process document...

    # Add results with media to the output
    span.update(output={
        "summary": "This document explains Bitcoin...",
        "original": pdf_media
    })

import fs from "fs";
import { LangfuseMedia } from "@langfuse/core";
import { startObservation } from "@langfuse/tracing";

// Wrap media in LangfuseMedia class
const wrappedMedia = new LangfuseMedia({
  source: "bytes",
  contentBytes: fs.readFileSync(new URL("./bitcoin.pdf", import.meta.url)),
  contentType: "application/pdf",
});

// Optionally, access media via wrappedMedia.obj
console.log(wrappedMedia.obj);

// Include media in any trace or observation
const span3 = startObservation("media-pdf-generation");

const generation3 = span3.startObservation('llm-call', {
  model: 'gpt-4',
  input: wrappedMedia,
}, {asType: "generation"});

generation3.end();

span3.end();

API

If you use the API directly to log traces to Neural Inverse, you need to follow these steps:

Upload media to Neural Inverse

If you use base64 encoded media: you need to extract it from the trace payloads similar to how the Neural Inverse SDKs do it.
Initialize the upload and get a mediaId and presignedURL: POST /api/public/media.
Upload media file: PUT [presignedURL].

See this end-to-end example (Python) on how to use the API directly to upload media files.
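
As a rough illustration of the first API step, the sketch below prepares a request body for POST /api/public/media from raw bytes. The JSON field names (traceId, contentType, contentLength, sha256Hash, field) and the base64 encoding of the hash are assumptions made for this example; check the API reference for the actual schema. The HTTP calls themselves are intentionally left out.

```python
import base64
import hashlib


def build_media_upload_request(
    content: bytes, content_type: str, trace_id: str, field: str
) -> dict:
    """Assemble a request body for initializing a media upload.

    All field names are assumptions for illustration, not a confirmed schema.
    """
    digest = hashlib.sha256(content).digest()
    return {
        "traceId": trace_id,
        "contentType": content_type,
        "contentLength": len(content),
        # Assumed encoding: base64 of the raw SHA-256 digest.
        "sha256Hash": base64.b64encode(digest).decode("ascii"),
        # Which part of the trace the media belongs to, e.g. "input".
        "field": field,
    }


body = build_media_upload_request(
    content=b"%PDF-1.7 sample bytes",
    content_type="application/pdf",
    trace_id="trace-123",  # hypothetical trace ID
    field="input",
)
print(body["contentLength"], body["sha256Hash"])
```

The response to this request would carry the mediaId and presignedURL used for the subsequent PUT.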

Add reference to mediaId in trace/observation

Use the Neural Inverse Media Token to reference the mediaId in the trace or observation input, output, or metadata.

How does it work?

When using media files (that are not referenced via external URLs), Neural Inverse handles them in the following way:

1. Media Upload Process

Detection and Extraction

Neural Inverse supports media files in traces and observations on input, output, and metadata fields
SDKs separate media from tracing data client-side for performance optimization
Media files are uploaded directly to object storage (AWS S3 or compatible)
Original media content is replaced with a reference string

Security and Optimization

Uploads use presigned URLs with content validation (content length, content type, content SHA256 hash)
Deduplication: Files are simply replaced by their mediaId reference string if already uploaded
File uniqueness determined by project, content type, and content SHA256 hash
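
The deduplication rule above can be pictured as a lookup keyed on project, content type, and content hash. The following is an in-memory sketch only: the MediaRegistry class, its method names, and the generated IDs are hypothetical and do not reflect how Neural Inverse stores media.

```python
import hashlib
from typing import Dict, Tuple

# (project_id, content_type, SHA-256 hex digest of the content)
MediaKey = Tuple[str, str, str]


class MediaRegistry:
    """In-memory stand-in for a server-side media lookup (hypothetical)."""

    def __init__(self) -> None:
        self._ids: Dict[MediaKey, str] = {}

    def get_or_register(
        self, project_id: str, content_type: str, content: bytes
    ) -> Tuple[str, bool]:
        """Return (media_id, needs_upload) for the given file."""
        key = (project_id, content_type, hashlib.sha256(content).hexdigest())
        if key in self._ids:
            # Same project, type, and hash: reuse the existing ID, skip upload.
            return self._ids[key], False
        media_id = f"media-{len(self._ids) + 1}"
        self._ids[key] = media_id
        return media_id, True


registry = MediaRegistry()
first = registry.get_or_register("project-a", "image/png", b"same bytes")
second = registry.get_or_register("project-a", "image/png", b"same bytes")
print(first, second)
```

Because the project is part of the key, identical files in two different projects are treated as separate media.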

Implementation Details

Python SDK: Background thread handling for non-blocking execution
JS/TS SDKs: Asynchronous, non-blocking implementation
API support for direct uploads (see guide)

2. Media Reference System

The base64 data URIs and the wrapped LangfuseMedia objects in Neural Inverse traces are replaced by references to the mediaId in the following standardized token format, which helps reconstruct the original payload if needed:

@@@langfuseMedia:type={MIME_TYPE}|id={LANGFUSE_MEDIA_ID}|source={SOURCE_TYPE}@@@

MIME_TYPE: MIME type of the media file, e.g., image/jpeg
LANGFUSE_MEDIA_ID: ID of the media file in Neural Inverse's object storage
SOURCE_TYPE: Source type of the media file, can be base64_data_uri, bytes, or file

Based on this token, the Neural Inverse UI can automatically detect the mediaId and render the media file inline. The LangfuseMedia class provides utility functions to extract the mediaId from the reference string.
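
To make the token format concrete, here is a small sketch that builds and parses such a reference string with a regular expression. The function names are hypothetical and this is not the LangfuseMedia utility itself; it assumes that none of the three fields contains a "|" or "@" character.

```python
import re
from typing import Dict, Optional

TOKEN_PATTERN = re.compile(
    r"@@@langfuseMedia:type=(?P<type>[^|@]+)"
    r"\|id=(?P<id>[^|@]+)"
    r"\|source=(?P<source>[^|@]+)@@@"
)


def format_media_token(mime_type: str, media_id: str, source: str) -> str:
    """Build a reference string in the token format shown above."""
    return f"@@@langfuseMedia:type={mime_type}|id={media_id}|source={source}@@@"


def parse_media_token(token: str) -> Optional[Dict[str, str]]:
    """Return the token's fields, or None if the string is not a media token."""
    match = TOKEN_PATTERN.fullmatch(token)
    return match.groupdict() if match else None


token = format_media_token("image/jpeg", "some-uuid", "bytes")
print(parse_media_token(token))
```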

3. Resolving Media References

When dealing with traces, observations, or dataset items that include media references, you can convert them back to their base64 data URI format using the resolve_media_references utility method provided by the Neural Inverse client. This is particularly useful for reinserting the original content during fine-tuning, dataset runs, or replaying a generation. The utility method traverses the parsed object and returns a deep copy with all media reference strings replaced by the corresponding base64 data URI representations.

from langfuse import get_client

# Initialize Neural Inverse client
langfuse = get_client()

# Example object with media references
obj = {
    "image": "@@@langfuseMedia:type=image/jpeg|id=some-uuid|source=bytes@@@",
    "nested": {
        "pdf": "@@@langfuseMedia:type=application/pdf|id=some-other-uuid|source=bytes@@@"
    }
}

# Resolve media references to base64 data URIs
resolved_obj = langfuse.resolve_media_references(
    obj=obj,
    resolve_with="base64_data_uri"
)

# Result:
# {
#     "image": "data:image/jpeg;base64,/9j/4AAQSkZJRg...",
#     "nested": {
#         "pdf": "data:application/pdf;base64,JVBERi0xLjcK..."
#     }
# }

from langfuse import Langfuse

# Initialize Neural Inverse client
langfuse = Langfuse()

# Example object with media references
obj = {
    "image": "@@@langfuseMedia:type=image/jpeg|id=some-uuid|source=bytes@@@",
    "nested": {
        "pdf": "@@@langfuseMedia:type=application/pdf|id=some-other-uuid|source=bytes@@@"
    }
}

# Resolve media references to base64 data URIs
resolved_trace = langfuse.resolve_media_references(
    obj=obj,
    resolve_with="base64_data_uri"
)

# Result:
# {
#     "image": "data:image/jpeg;base64,/9j/4AAQSkZJRg...",
#     "nested": {
#         "pdf": "data:application/pdf;base64,JVBERi0xLjcK..."
#     }
# }

import { LangfuseClient } from "@langfuse/client";

const langfuse = new LangfuseClient()

// Example object with media references
const obj = {
  image: "@@@langfuseMedia:type=image/jpeg|id=some-uuid|source=bytes@@@",
  nested: {
    pdf: "@@@langfuseMedia:type=application/pdf|id=some-other-uuid|source=bytes@@@",
  },
};

// Resolve media references to base64 data URIs
const resolvedTrace = await langfuse.resolveMediaReferences({
  obj: obj,
  resolveWith: "base64DataUri",
});

// Result:
// {
//     image: "data:image/jpeg;base64,/9j/4AAQSkZJRg...",
//     nested: {
//         pdf: "data:application/pdf;base64,JVBERi0xLjcK..."
//     }
// }
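
Conceptually, the traversal described in this section can be sketched as a recursive walk that rebuilds dicts and lists and rewrites any media tokens found in strings. This is an illustration under stated assumptions, not the SDK's implementation: resolve_references and the fetch callable (which stands in for downloading the media bytes by ID) are hypothetical.

```python
import base64
import re
from typing import Any, Callable

TOKEN = re.compile(
    r"@@@langfuseMedia:type=([^|@]+)\|id=([^|@]+)\|source=([^|@]+)@@@"
)


def resolve_references(obj: Any, fetch: Callable[[str], bytes]) -> Any:
    """Return a copy of obj with each media token replaced by a data URI."""
    if isinstance(obj, str):
        def to_data_uri(match: "re.Match[str]") -> str:
            mime_type, media_id = match.group(1), match.group(2)
            payload = base64.b64encode(fetch(media_id)).decode("ascii")
            return f"data:{mime_type};base64,{payload}"

        return TOKEN.sub(to_data_uri, obj)
    if isinstance(obj, dict):
        return {key: resolve_references(value, fetch) for key, value in obj.items()}
    if isinstance(obj, list):
        return [resolve_references(item, fetch) for item in obj]
    return obj  # numbers, booleans, None, and other values pass through


# Hypothetical in-memory store standing in for object storage.
store = {"some-uuid": b"hi"}
trace = {"note": "@@@langfuseMedia:type=text/plain|id=some-uuid|source=bytes@@@"}
print(resolve_references(trace, store.__getitem__))
```

The input object is never mutated, which matches the deep-copy behavior described above.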
