Constitutional AI (CAI) guides LLM behavior using constitutions, but determining which constitutional principles are most effective remains a challenge. We introduce the C3AI framework (Constitutions for CAI models), which serves two key functions: (1) crafting constitutions by selecting and phrasing effective principles; and (2) evaluating whether CAI models follow their constitutions. By analyzing principles from AI and psychology, we found that positively framed and behavior-based principles align more closely with human decisions than negatively framed or trait-based principles. In a safety alignment use case, we applied a graph-based principle selection method to refine an existing CAI constitution, improving safety measures while maintaining strong general capabilities. Notably, fine-tuned CAI models performed well on negative principles but struggled with positive principles. Overall, C3AI provides a systematic and transparent approach to developing and evaluating constitutional AI, offering a foundation for more reliable LLM alignment.