Deep dive into evaluation results, execution traces, tool performance, and multi-agent architecture
Each agent evaluated across 5 destinations: Tokyo, Paris, Bangkok, New York, Bali
Individual tool call evaluations with pass/fail checks
Watch the multi-agent workflow execute step by step
Interactive tool testing — run any agent tool with custom parameters
Search flights between cities with airline data, pricing, and booking links
Find real hotels with ratings, amenities, and booking links
Curated local experiences, attractions, and restaurants
Smart budget allocation with real city cost data
LangGraph-powered sequential pipeline — direct tool calls for speed, LLM only for final compilation
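As a rough illustration of the pipeline described above, the sketch below runs tool steps in a fixed order as plain function calls and invokes the LLM exactly once, to compile the final itinerary. It uses plain Python rather than LangGraph so that it runs without dependencies. All function names, the state keys, and the 40/35/25 budget split are assumptions made for this example, not the application's actual code.

```python
from typing import Callable, Dict, List

State = Dict[str, object]


def search_flights(state: State) -> State:
    # Hypothetical tool: a real one would query a flight data source.
    route = f"{state['origin']} -> {state['destination']}"
    return {**state, "flights": [route]}


def find_hotels(state: State) -> State:
    # Hypothetical tool: returns placeholder data only.
    return {**state, "hotels": [f"Sample hotel in {state['destination']}"]}


def find_activities(state: State) -> State:
    # Hypothetical tool: returns placeholder data only.
    return {**state, "activities": [f"Walking tour of {state['destination']}"]}


def allocate_budget(state: State) -> State:
    # Illustrative fixed split (an assumption): 40% flights, 35% hotels, 25% activities.
    budget = int(state["budget"])
    plan = {
        "flights": budget * 40 // 100,
        "hotels": budget * 35 // 100,
        "activities": budget * 25 // 100,
    }
    return {**state, "budget_plan": plan}


# Steps run in this fixed order; none of them calls the LLM.
PIPELINE: List[Callable[[State], State]] = [
    search_flights,
    find_hotels,
    find_activities,
    allocate_budget,
]


def compile_itinerary(state: State, llm: Callable[[str], str]) -> str:
    # The only LLM call: turn the collected tool results into a final itinerary.
    prompt = (
        f"Write a trip plan for {state['destination']}.\n"
        f"Flights: {state['flights']}\n"
        f"Hotels: {state['hotels']}\n"
        f"Activities: {state['activities']}\n"
        f"Budget: {state['budget_plan']}"
    )
    return llm(prompt)


def run_pipeline(request: State, llm: Callable[[str], str]) -> str:
    state = dict(request)
    for step in PIPELINE:
        state = step(state)  # direct tool call, no LLM round trip
    return compile_itinerary(state, llm)


if __name__ == "__main__":
    # Stand-in for a real model client: echoes the prompt it receives.
    request = {"origin": "Paris", "destination": "Tokyo", "budget": 2000}
    print(run_pipeline(request, llm=lambda prompt: prompt))
```

Passing the model in as a callable keeps the tool steps testable without network access, and a client for any provider could be swapped in at that single call site.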
| Provider | Model | Cost | Speed |
|---|---|---|---|
| Groq | llama-3.3-70b-versatile | FREE | ⚡ Blazing fast |
| Google | gemini-2.0-flash | FREE | ⚡ Fast |
| OpenAI | gpt-4o-mini | ~$0.15/1M input tokens | ⚡ Fast |