Testable Minds Integration

Testable Minds is a human evaluation platform provided by Testable that connects your LLM traces to a diverse pool of pre-screened participants. While Neural Inverse offers automated evaluations out of the box, many teams need real human feedback to understand how their AI performs in practice.

We've built an integration to make it easy to answer questions like:

"How do real users perceive the quality of my LLM outputs?"
"Are my AI responses helpful, accurate, and appropriate across different demographics?"
"How does human feedback correlate with my automated evaluation scores?"
"What blind spots exist in my automated evaluation pipeline?"

How It Works

The integration creates an automated loop between Neural Inverse and Testable Minds:

Testable Minds polls your Neural Inverse project for traces matching your filters
Traces are batched into evaluation sessions and presented to qualified participants
Participants evaluate each trace against your Neural Inverse score configurations
Results are automatically pushed back to Neural Inverse as scores on the original traces

Get Started

Create a Testable account

Sign up at testable.org/ai/langfuse and select the Neural Inverse account type during registration to get access to Testable Minds.

Set up your Neural Inverse score configurations

Ensure you have at least one score configuration defined in Neural Inverse. Score configurations determine what questions participants will answer about your traces. All score types (numeric, categorical, boolean) are supported.

Tip: Use clear, objective questions that non-expert participants can understand. Include descriptions with examples for best results.
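
To illustrate the tip above, the sketch below describes one configuration per supported score type as plain Python data, each with a participant-facing description that includes an example. The field names (`name`, `data_type`, `min_value`, `categories`, `description`) and the helper `missing_descriptions` are assumptions made for illustration, not the Neural Inverse schema; check your project's score configuration settings for the actual fields.

```python
# Hypothetical score configurations, one per supported score type.
# Field names are illustrative assumptions, not the Neural Inverse schema.
SCORE_CONFIGS = [
    {
        "name": "helpfulness",
        "data_type": "numeric",
        "min_value": 1,
        "max_value": 5,
        "description": (
            "How helpful is the response to the user's question? "
            "1 = does not address the question, 5 = fully answers it."
        ),
    },
    {
        "name": "tone",
        "data_type": "categorical",
        "categories": ["friendly", "neutral", "rude"],
        "description": (
            "Which word best describes the tone of the response? "
            "Example: 'Sure, happy to help!' is friendly."
        ),
    },
    {
        "name": "answers_question",
        "data_type": "boolean",
        "description": (
            "Does the response answer the question that was asked? "
            "Example: a reply about pricing to a question about shipping is 'no'."
        ),
    },
]


def missing_descriptions(configs):
    """Return the names of configs that lack a participant-facing description."""
    return [c["name"] for c in configs if not c.get("description", "").strip()]
```

Running `missing_descriptions(SCORE_CONFIGS)` before you connect a study is one simple way to catch configurations that participants would see without any guidance.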

Connect Neural Inverse to Testable

Note: If you are using Neural Inverse for research and education, you can get a Neural Inverse research grant. More information here.

Neural Inverse Cloud:
Navigate to Account → Connections in your Testable dashboard
Enter your Neural Inverse Secret Key and Public Key
Select your server region (Cloud EU or Cloud US)
Click Check Connection to verify, then Save Connection

Self-hosted Neural Inverse:
Navigate to Account → Connections in your Testable dashboard
Enter your Neural Inverse Secret Key and Public Key
Select Self hosted and enter your full Neural Inverse base URL
Click Check Connection to verify, then Save Connection

Create a study

Go to Dashboard → Studies and click Create Study
Configure your study:

Participants
- Number of respondents per trace
- Optional gender balance enforcement
- Selection criteria for targeting specific demographics, e.g., age, location, or language proficiency
Neural Inverse Data Settings
- Traces minimum count: Minimum unassigned traces before launching a session.
- Score Configs: Select which Neural Inverse score configurations to evaluate.
- Tags (optional): Filter traces by Neural Inverse tags, e.g., external_eval or testable_minds_eval. Multiple tags are combined with AND logic: a trace must carry every selected tag.
- Environments (optional): Filter by environment, e.g., production or staging. Multiple environments are combined with OR logic: a trace in any selected environment matches.
Participant-facing content
- Title and description shown to participants
Top up your Testable Minds balance to cover evaluation costs
Toggle Start traces collection in your study header
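
The tag and environment filters in the data settings above combine differently, which is easy to get wrong. The following is a minimal sketch of that matching logic, not Testable's implementation. The function name `matches_filters` and its parameters are hypothetical, and treating an empty filter as "match everything" is an assumption based on both filters being optional.

```python
def matches_filters(trace_tags, trace_environment,
                    required_tags=(), allowed_environments=()):
    """Return True if a trace passes a study's tag and environment filters.

    Tags use AND logic: the trace must carry every required tag.
    Environments use OR logic: the trace's environment must be one of
    the allowed environments.
    Assumption: an empty filter matches every trace.
    """
    has_all_tags = set(required_tags).issubset(trace_tags)
    in_allowed_env = (not allowed_environments
                      or trace_environment in allowed_environments)
    return has_all_tags and in_allowed_env
```

For example, with the tags `external_eval` and `testable_minds_eval` selected, a trace tagged only `external_eval` is skipped, while with the environments `production` and `staging` selected, a trace from either one is accepted.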

View scores in Neural Inverse

Once participants complete evaluations, scores are automatically pushed back to your Neural Inverse traces. View them in the Traces section.
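
Once human scores sit next to your automated ones, you can start on the question of how the two correlate. The sketch below is one possible approach, assuming you have already exported both sets of numeric scores as dictionaries keyed by trace ID; the export step, the function names, and the choice of Pearson correlation are all assumptions for illustration.

```python
from math import sqrt


def paired_scores(human, automated):
    """Align two {trace_id: numeric_score} dicts on the trace IDs they share."""
    shared = sorted(human.keys() & automated.keys())
    return [human[t] for t in shared], [automated[t] for t in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)
```

A value near 1 suggests the automated evaluator ranks traces much as participants do; a value near 0 points to a possible blind spot worth inspecting trace by trace. For ordinal scales such as 1 to 5 ratings, a rank-based measure may be a better fit than Pearson.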

Integration Details

Trace Import

Traces are polled while the study is active. You can pause imports by toggling Pause traces collection.
Only traces with both input and output are considered for import.
Only text-based input/output is currently supported.
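
The import rules above can be pre-checked on your side before you expect a trace to show up in a study. This is a minimal sketch under stated assumptions: a trace is represented as a dictionary with `input` and `output` keys, "text-based" is approximated as a Python string, and whitespace-only values are treated as missing. The actual checks performed by Testable Minds may differ.

```python
def is_importable(trace):
    """Return True if a trace has both input and output as non-empty text.

    Assumptions: `trace` is a dict with 'input' and 'output' keys, text is
    represented as `str`, and whitespace-only values count as missing.
    """
    for field in ("input", "output"):
        value = trace.get(field)
        if not isinstance(value, str) or not value.strip():
            return False
    return True
```

Under these assumptions, a trace whose output is still empty, or whose input is a structured payload rather than plain text, would not be picked up.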

Evaluation Session Creation

A new evaluation session is created when the number of unassigned traces reaches your configured minimum count. Each session includes built-in attention checks to ensure response quality, and responses that fail these checks are automatically excluded.
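
The threshold rule can be pictured with a small sketch. It assumes that a session takes every trace that is unassigned at the moment the minimum is reached; the source does not say how many traces a session contains, so that detail, like the function name `next_session_batch`, is hypothetical.

```python
def next_session_batch(unassigned_trace_ids, minimum_count):
    """Return the trace IDs for a new session, or [] if the minimum is not met.

    Assumption: once the threshold is reached, the session takes all
    currently unassigned traces.
    """
    if minimum_count < 1:
        raise ValueError("minimum_count must be at least 1")
    if len(unassigned_trace_ids) < minimum_count:
        return []
    return list(unassigned_trace_ids)
```

With a minimum count of 10 and only 7 matching traces, no session is launched, which is the most common reason a newly started study appears idle.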

Score Delivery

Participant answers are mapped to your selected Neural Inverse score configurations and pushed back as scores on the original traces.
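
As a rough picture of that mapping, the sketch below converts one participant answer into a score record for a given configuration. The record layout, the field names, and the function `answer_to_score` are assumptions for illustration only; they are not the Neural Inverse score API or Testable's internal format.

```python
def answer_to_score(trace_id, config, answer):
    """Convert one participant answer into a score record for one config.

    `config` is a hypothetical dict with 'name' and 'data_type' keys, plus
    'categories' for categorical configs.
    """
    data_type = config["data_type"]
    if data_type == "numeric":
        value = float(answer)
    elif data_type == "boolean":
        value = bool(answer)
    elif data_type == "categorical":
        if answer not in config["categories"]:
            raise ValueError(f"unknown category: {answer!r}")
        value = answer
    else:
        raise ValueError(f"unsupported score type: {data_type!r}")
    return {
        "trace_id": trace_id,
        "name": config["name"],
        "data_type": data_type,
        "value": value,
    }
```

When a study selects several score configurations, each participant produces one such record per configuration for every trace they evaluate.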

Data Quality

Evaluations are conducted by a pre-qualified subset of Testable Minds participants who are specifically screened for AI and LLM evaluation tasks and have a proven track record of high-quality work. All participants have verified identities and are fairly compensated based on task complexity and length.

Each evaluation session includes built-in attention checks, and responses that fail these checks are automatically excluded to ensure your Neural Inverse scores remain accurate and reliable.

Cost

Costs are dynamically calculated by Testable Minds based on the number of participants, total traces evaluated, and trace length (input and output).

Troubleshooting

Connection test fails? Re-enter your secret key (it's not auto-filled after saving), verify that the base URL is correct, and ensure your API keys have access to score configs and traces.

Missing scores in Neural Inverse? Check that your score configurations are still active (not archived).

No sessions being created? Verify that your study is active (not paused), that your Testable Minds balance is sufficient, and that enough traces match your filters to meet the minimum count.

FAQ

Can I use multiple score configs in one study? Yes. Participants will evaluate each trace against all selected configurations, and scores are pushed back for each config.

Can I run multiple studies simultaneously? Yes. Each study operates independently with its own filters, participants, and score configurations.

Can I pause a running study? Yes. Toggle Pause traces collection to stop imports and new sessions. Existing sessions continue until completion.

Does self-hosted Neural Inverse work? Yes. Select "Self hosted" in connections and enter your full base URL.

Feedback

If you have any feedback or questions, contact us.
