Approaches to training AI to be safe

From LawSnap

part of AI Risk

Reinforcement Learning from Human Feedback

Ayush Thakur, Understanding Reinforcement Learning from Human Feedback (RLHF)

Constitutional AI

Anthropic, Claude's Constitution

Yuntao Bai, Saurav Kadavath et al., Constitutional AI: Harmlessness from AI Feedback