# QLANKR Test

> QLANKR Test is an AI evaluation platform. Users submit AI agent output (chat logs, RAG Q&A pairs, tool call traces, classification results, generated content) and receive a scored report with a QI (QLANKR Intelligence) composite score from 0 to 100.

## What QLANKR Test does

QLANKR Test evaluates AI systems across multiple quality dimensions using independent AI judges. It produces a QI score, per-dimension breakdowns, identified strengths, and specific improvement recommendations. Results are presented as shareable report cards with permanent verification URLs.

## Who it is for

- Developers building AI agents, chatbots, or automated systems
- Teams evaluating RAG pipelines, tool-calling agents, or content generation
- Anyone who needs structured, repeatable AI quality assessment

## Assessment types

There are 10 assessment templates:

- Support Agent Assessment: evaluates customer service chatbots for accuracy, tone, completeness, escalation handling, and safety
- RAG Accuracy Check: tests retrieval-augmented generation for faithfulness, relevancy, hallucination resistance, and citation quality
- Tool-Use Correctness: evaluates tool selection, parameter accuracy, sequencing, and error handling
- Prompt Robustness: tests jailbreak resistance, instruction following, safety, and graceful refusal
- Content Generation Quality: evaluates factual accuracy, coherence, style, completeness, and originality
- Multi-Agent Coordination: tests delegation logic, coordination, conflict resolution, and task completion
- Classification & Extraction: evaluates label accuracy, extraction completeness, format compliance, and edge cases
- Agent Production Readiness: tests reliability, latency, error recovery, observability, and graceful degradation
- Code Generation Accuracy: evaluates functional correctness, code quality, security, documentation, and edge cases
- General Readiness Checklist: self-assessment covering error handling, fallback behavior, monitoring, security, and UX

## QI Scoring

QI (QLANKR Intelligence) is a composite score from 0 to 100. It is the average of the per-dimension scores, each assigned independently by an AI judge. Pro users get dual-judge scoring with agreement metrics. Scores map to four bands: Strong (90-100), Moderate (70-89), Developing (40-69), and Early (0-39).
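The scoring rule above can be illustrated with a minimal sketch. It assumes an unweighted mean of the dimension scores and treats each band's lower bound as inclusive, so a fractional score such as 89.5 falls in Moderate. The function names, the dimension names, and the sample scores are hypothetical and are not taken from the QLANKR API; weighting and rounding on the actual platform may differ.

```python
from statistics import mean

# Band lower bounds as listed in this document, highest first.
BANDS = [(90, "Strong"), (70, "Moderate"), (40, "Developing"), (0, "Early")]


def qi_score(dimension_scores: dict[str, float]) -> float:
    """Average the per-dimension scores (assumption: equal weights)."""
    if not dimension_scores:
        raise ValueError("at least one dimension score is required")
    return mean(list(dimension_scores.values()))


def qi_band(score: float) -> str:
    """Map a QI score in [0, 100] to its band label."""
    if not 0 <= score <= 100:
        raise ValueError("QI score must be between 0 and 100")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: the last band starts at 0")


# Hypothetical dimension scores for a RAG Accuracy Check.
rag_scores = {
    "faithfulness": 92,
    "relevancy": 85,
    "hallucination_resistance": 78,
    "citation_quality": 81,
}
score = qi_score(rag_scores)  # (92 + 85 + 78 + 81) / 4 = 84
band = qi_band(score)         # "Moderate"
```

With dual-judge scoring, each dimension would have two scores; how they are combined is not stated here, so the sketch covers the single-judge case only.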

## Access and pricing

- Anonymous (no account): 1 AI-judged evaluation per day, single judge, and the report is automatically public. The General Readiness Checklist (self-assessed) works without limits and without login.
- Free account: 3 AI-judged evaluations per day, single judge, last 5 reports saved, 2 public reports.
- Pro ($19/month, or $15/month billed annually at $180/year): 25 evaluations per day, dual-judge scoring with agreement metrics, unlimited report history, unlimited public reports, PDF export, custom rubrics, stability checks, programmatic API access, webhooks, per-item breakdown scoring.
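The daily limits and the Pro pricing arithmetic above can be summarized in a short sketch. The tier keys and the `can_run_evaluation` helper are hypothetical names chosen for illustration, not part of any QLANKR API, and the sketch assumes the limit is a simple per-day counter; when the counter resets is not stated in this document.

```python
# Daily AI-judged evaluation limits per tier, as listed in this document.
DAILY_LIMITS = {"anonymous": 1, "free": 3, "pro": 25}


def can_run_evaluation(tier: str, used_today: int, self_assessed: bool = False) -> bool:
    """Return True if one more evaluation fits within the tier's daily limit."""
    if self_assessed:
        # The General Readiness Checklist is self-assessed and has no limit.
        return True
    if used_today < 0:
        raise ValueError("used_today cannot be negative")
    return used_today < DAILY_LIMITS[tier]


# Pro pricing: $19/month paid monthly versus $180/year billed annually.
PRO_MONTHLY_USD = 19
PRO_ANNUAL_USD = 180
annual_savings_usd = PRO_MONTHLY_USD * 12 - PRO_ANNUAL_USD  # 228 - 180 = 48
effective_monthly_usd = PRO_ANNUAL_USD / 12                 # 15.0
```

For example, `can_run_evaluation("free", 3)` is false because a free account has already used its three evaluations, while a self-assessed checklist run is always allowed.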

## Links

- Homepage: https://test.qlankr.com
- Pricing: https://test.qlankr.com/pricing
- API docs: https://test.qlankr.com/test/api
- Methodology: https://test.qlankr.com/methodology
- Guides: https://test.qlankr.com/guides
- FAQ: https://test.qlankr.com/faq
- Public reports: https://test.qlankr.com/reports