> ## Documentation Index > Fetch the complete documentation index at: https://arklex-06dfaf56-feat-arkdock-documentation.mintlify.site/llms.txt > Use this file to discover all available pages before exploring further. # FAQ > Common questions about connecting agents, running simulations, evaluating performance, and team workflows in Arklex. ## Getting started Most teams connect an agent and run their first simulation within minutes. Point Arklex at your agent's endpoint, define a scenario, and run it — no setup or scripting required. No. Arklex calls your agent over HTTP. As long as your agent exposes a compatible endpoint, no changes are needed in your codebase. Arklex supports two integration types: **Chat Completions** (any endpoint following the OpenAI `/chat/completions` schema) and **A2A** (Agent-to-Agent protocol). Most agents built on popular frameworks — LangChain, CrewAI, the OpenAI Agents SDK, or custom code — connect via the Chat Completions endpoint with no code changes. ## Simulations and evaluations A **simulation** is the execution layer: it runs scenarios against your agent and produces conversation transcripts. An **evaluation** is the scoring layer: it takes one or more completed simulations and scores the transcripts using an LLM judge. Multi-turn. Each simulated user follows a persona and goal across a full conversation, producing transcripts that reflect realistic interaction patterns rather than one-shot prompts. Manual testing is slow, subjective, and hard to repeat. Arklex runs the same scenarios consistently across every agent version, scores them with a defined rubric, and keeps a record you can compare over time — so you catch regressions instead of rediscovering them in production. ## Metrics Seven built-in metrics cover general agent quality, spanning both quantitative and qualitative dimensions. You can also define **custom metrics** for behaviors specific to your domain. Write a scoring prompt in plain language describing what good and bad look like on a 1–5 scale. The LLM judge applies it from the next evaluation onward. Custom metrics are versioned, reusable across evaluations, and mix freely with built-ins. The judge is a starting point, not the final word. When your team disagrees with a score, reviewers add their own in the **Annotations** tab. The **Calibration** tab then shows agreement rates per metric and surfaces common disagreements, giving you the evidence to refine a metric's prompt and bring automated scoring in line with human judgment over time. ## Team workflow and security Yes. Multiple reviewers can annotate turns independently. Admins can toggle to a view showing all reviewers' annotations alongside the auto-evaluation scores and the resolved values used for calibration. Header values — typically API keys and auth tokens — are encrypted at rest. The platform displays masked values and never returns the raw secret after it's saved.