Approaches to training AI to be safe

From LawSnap

Part of AI Risk

Reinforcement Learning from Human Feedback

Ayush Thakur, Understanding Reinforcement Learning from Human Feedback (RLHF)

Constitutional AI

Anthropic, Claude's Constitution

Yuntao Bai, Saurav Kadavath et al., Constitutional AI: Harmlessness from AI Feedback
