Model Evaluation

Systematic assessment of LLM performance using benchmarks, human evaluation, and automated metrics. Evaluates accuracy, hallucination rate, latency, cost, and task-specific performance. Critical before deploying AI in production.
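
As a rough illustration of the automated-metrics part, the sketch below scores a model on exact-match accuracy and mean latency. The names `evaluate`, `EvalResult`, and `toy_model` are hypothetical, and `toy_model` is a stand-in for a real LLM call; hallucination rate, cost, and human evaluation would need their own measurements.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalResult:
    accuracy: float        # fraction of cases answered correctly
    mean_latency_s: float  # average wall-clock seconds per call


def evaluate(model: Callable[[str], str], cases: list[tuple[str, str]]) -> EvalResult:
    """Run each (prompt, expected) pair through `model` and score exact matches."""
    if not cases:
        raise ValueError("cases must not be empty")
    correct = 0
    total_latency = 0.0
    for prompt, expected in cases:
        start = time.perf_counter()
        answer = model(prompt)
        total_latency += time.perf_counter() - start
        # Case-insensitive exact match: a deliberately simple metric.
        if answer.strip().lower() == expected.strip().lower():
            correct += 1
    return EvalResult(correct / len(cases), total_latency / len(cases))


def toy_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call.
    answers = {"2+2?": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


cases = [
    ("2+2?", "4"),
    ("Capital of France?", "paris"),
    ("Largest planet?", "Jupiter"),
]
result = evaluate(toy_model, cases)
print(f"accuracy={result.accuracy:.2f}")
```

Exact match is only suitable for short, closed-form answers; open-ended tasks usually need a rubric, a reference-based metric, or human review.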

Related Terms

Hallucination Guardrails

LLM Evaluation Framework

More AI Terms

AI Agent

An autonomous AI system that can plan, use tools, and execute multi-step tasks to achieve goals. Goes beyond simple chat: agents can browse the web, write code, query databases, and interact with APIs. The frontier of applied AI.
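
The tool-use loop can be sketched in a few lines, under heavy simplifying assumptions: the plan is hard-coded rather than produced by a model, and the `TOOLS` registry, `run_agent`, and the `$prev` placeholder are hypothetical names invented for this example.

```python
from typing import Callable

# Hypothetical tool registry: each tool takes a string argument and returns a string.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
    "upper": lambda arg: arg.upper(),
}


def run_agent(plan: list[tuple[str, str]]) -> list[str]:
    """Execute (tool, argument) steps in order.

    The placeholder "$prev" in an argument is replaced by the previous
    step's result, which is how one step feeds the next.
    """
    results: list[str] = []
    prev = ""
    for tool, arg in plan:
        if tool not in TOOLS:
            raise KeyError(f"unknown tool: {tool}")
        prev = TOOLS[tool](arg.replace("$prev", prev))
        results.append(prev)
    return results


steps = run_agent([("add", "2+3"), ("add", "$prev+10")])
print(steps)
```

In a real agent, the model would choose the next tool and argument after seeing each result, instead of following a fixed list.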

AI Cost Optimization

Strategies for reducing the operational cost of AI systems: prompt caching, model selection (using cheaper models for simple tasks), batching, output length control, and caching frequent queries. Can reduce costs by 60-90% with little or no quality loss, depending on the workload.
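
Two of these strategies, model selection and caching frequent queries, can be sketched as follows. The model tiers `small` and `large`, the word-count threshold, and the function names are assumptions made for illustration; the returned string stands in for a real API call.

```python
from functools import lru_cache

# Counts how many times the (simulated) model is actually invoked.
calls = {"count": 0}


def pick_model(prompt: str, threshold_words: int = 50) -> str:
    """Route short prompts to the cheaper tier.

    Word count is a crude proxy for task difficulty, used here only
    to keep the example small.
    """
    return "small" if len(prompt.split()) <= threshold_words else "large"


@lru_cache(maxsize=1024)
def cached_answer(prompt: str) -> str:
    """Answer a prompt, reusing the stored result for repeated prompts."""
    calls["count"] += 1
    model = pick_model(prompt)
    return f"[{model}] response to: {prompt}"  # stand-in for a real API call


first = cached_answer("What is a token?")
second = cached_answer("What is a token?")  # served from the cache
print(calls["count"])
```

An in-process cache like this only helps with exact repeats within one process; shared or semantic caches are a separate design decision.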

AI Safety

The field focused on ensuring AI systems behave as intended and don't cause harm. Encompasses alignment (AI goals match human goals), robustness (resistance to adversarial attacks), interpretability (understanding AI reasoning), and governance.

Claude

Anthropic's family of large language models designed for safety and helpfulness. Known for long context windows (200K tokens), strong reasoning, and lower hallucination rates. Claude 4 is the latest generation at the time of writing. A main competitor to GPT-4.

Computer Vision

AI that enables machines to interpret visual information from images and video. Applications include object detection, image classification, facial recognition, OCR, and medical imaging. Models: YOLO, ResNet, Vision Transformers.

Context Window

The maximum amount of text an LLM can process in a single interaction, measured in tokens. Claude offers 200K tokens; GPT-4o offers 128K. Larger windows make it possible to process entire documents in one pass, but they increase cost and latency.
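
A quick pre-flight check for whether a document fits can be sketched like this. The limits are the figures quoted in this entry; the 4-characters-per-token ratio is only a rough rule of thumb for English text, and the function names and output reserve are assumptions. A real check should use the tokenizer for the specific model.

```python
import math

# Context limits as quoted in this glossary entry (in tokens).
CONTEXT_LIMITS = {"claude": 200_000, "gpt-4o": 128_000}


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; assumes about 4 characters per token."""
    return math.ceil(len(text) / chars_per_token)


def fits(text: str, model: str, reserve_for_output: int = 1_000) -> bool:
    """True if the estimated input plus an output reserve fits the window."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_LIMITS[model]


document = "x" * 600_000  # roughly 150K tokens under the assumed ratio
print(fits(document, "claude"), fits(document, "gpt-4o"))
```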
