
RLHF

Reinforcement Learning from Human Feedback — a training technique in which human evaluators rank model outputs to train a reward model, which then serves as the reward signal for reinforcement-learning fine-tuning of the LLM toward preferred responses. The technique that made ChatGPT conversational.
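A minimal sketch of the first stage, training a reward model from pairwise human preferences, is shown below. The tiny MLP, feature dimensions, and random tensors are illustrative assumptions standing in for an LLM backbone and real preference data; production systems score full prompt-plus-response token sequences with a transformer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy stand-in for an LLM-based reward model (assumption: 16-dim features)."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)  # scalar reward per response

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Hypothetical batch: embeddings of responses humans preferred (chosen)
# vs. did not prefer (rejected) for the same prompts.
chosen = torch.randn(32, 16)
rejected = torch.randn(32, 16)

for step in range(100):
    r_chosen = model(chosen)
    r_rejected = model(rejected)
    # Bradley-Terry pairwise loss: push the reward of the preferred
    # response above the reward of the rejected one.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The trained reward model would then score candidate outputs during reinforcement-learning fine-tuning (commonly with PPO), steering the LLM toward responses humans rank highly.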
