C3AI: Crafting and Evaluating Constitutions for Constitutional AI


Description

Constitutional AI (CAI) guides LLM behavior using constitutions, but determining which constitutional principles are most effective remains a challenge. We introduce the C3AI framework (Crafting Constitutions for CAI models), which serves two key functions: (1) crafting constitutions by selecting and phrasing effective principles; and (2) evaluating whether CAI models follow their constitutions. By analyzing principles from AI and psychology, we found that positively framed and behavior-based principles align more closely with human decisions than negatively framed or trait-based principles. In a safety alignment use case, we applied a graph-based principle selection method to refine an existing CAI constitution, improving safety measures while maintaining strong general capabilities. Notably, fine-tuned CAI models performed well on negative principles but struggled with positive principles. Overall, C3AI provides a systematic and transparent approach to developing and evaluating constitutional AI, offering a foundation for more reliable LLM alignment.


The C3AI framework (Crafting Constitutions for CAI models) serves two key functions: (1) crafting constitutions; and (2) evaluating whether models adhere to their constitutions. Crafting involves three steps: selecting relevant items for a specific use case (Item Selection), converting them into standardized, human-understandable statements and machine-readable principles (Item Transformation), and curating a final set of principles to form a constitution (Principle Selection). Evaluation assesses how well the model follows specific principles and whether its behavior matches the intended use, for example by prioritizing safety or mathematical reasoning.
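The three crafting steps and the adherence check can be pictured as a small pipeline. The sketch below is illustrative only and is not the released C3AI code: the `Item` and `Principle` types, the two sentence templates, the use-case tags, and the scores are all hypothetical, and the top-k ranking is a simple stand-in for the graph-based principle selection mentioned in the description.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    text: str             # raw behavior phrase, e.g. "give accurate information"
    use_cases: frozenset  # hypothetical relevance tags, e.g. {"safety"}


@dataclass(frozen=True)
class Principle:
    statement: str  # human-understandable form
    prompt: str     # machine-readable form used when comparing responses


def select_items(items, use_case):
    """Item Selection: keep items tagged as relevant to the use case."""
    return [item for item in items if use_case in item.use_cases]


def transform_item(item):
    """Item Transformation: fill two hypothetical templates."""
    behavior = item.text.strip().rstrip(".")
    statement = f"The response should {behavior}."
    prompt = f"Choose the response that is more likely to {behavior}."
    return Principle(statement, prompt)


def select_principles(principles, scores, k):
    """Principle Selection: keep the k highest-scoring principles.

    Stand-in only: scores are assumed to come from some prior analysis.
    """
    ranked = sorted(
        principles,
        key=lambda p: scores.get(p.statement, 0.0),
        reverse=True,
    )
    return ranked[:k]


def adherence_rates(judgements):
    """Evaluation: fraction of comparisons in which each principle was followed."""
    totals = defaultdict(lambda: [0, 0])  # statement -> [followed, total]
    for statement, followed in judgements:
        totals[statement][0] += int(followed)
        totals[statement][1] += 1
    return {s: hits / n for s, (hits, n) in totals.items()}


# Made-up example data for a "safety" use case.
items = [
    Item("give accurate information", frozenset({"safety", "general"})),
    Item("refuse to help with illegal acts", frozenset({"safety"})),
    Item("show each step of a calculation", frozenset({"math"})),
]
candidates = [transform_item(i) for i in select_items(items, "safety")]
scores = {candidates[0].statement: 0.4, candidates[1].statement: 0.9}
constitution = select_principles(candidates, scores, k=1)
rates = adherence_rates(
    [(constitution[0].statement, True), (constitution[0].statement, False)]
)
```

Keeping the statement and the prompt side by side in one record mirrors the split between the human-understandable and machine-readable forms, and reporting adherence per principle makes it possible to see differences such as the one between positive and negative principles noted in the description.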

Publications

  • C3AI: Crafting and Evaluating Constitutions for Constitutional AI. ACM WWW 2025 PDF

Code and data



N.B.: If you do not receive the instruction message within a few hours, please check your junk/spam folder in case the email was moved there.