Social Dynamics

C3AI: Crafting and Evaluating Constitutions for Constitutional AI

The C3AI framework (Crafting Constitutions for CAI models) serves two key functions: (1) crafting constitutions; and (2) evaluating whether models adhere to their constitutions. Crafting involves three steps: selecting relevant items for a specific use case (Item Selection), converting them into standardized, human-understandable statements and machine-readable principles (Item Transformation), and curating a final set of principles to form a constitution (Principle Selection). Evaluating model adherence assesses how well the model follows specific principles, and whether it aligns with intended uses by, for example, prioritizing safety or mathematical reasoning.
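The three crafting steps can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the keyword filter, the sentence templates, the scores, and every name (`Principle`, `select_items`, `transform_item`, `select_principles`) are hypothetical stand-ins for the framework's actual selection and transformation procedures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principle:
    statement: str  # human-understandable statement
    principle: str  # machine-readable principle for comparing two responses


def select_items(items, keywords):
    """Item Selection: keep items relevant to the use case (toy keyword filter)."""
    return [it for it in items if any(k in it.lower() for k in keywords)]


def transform_item(item):
    """Item Transformation: derive a statement and a principle (assumed templates)."""
    text = item.strip().rstrip(".")
    statement = f"The AI should {text}."
    principle = f"Choose the response in which the AI is more likely to {text}."
    return Principle(statement, principle)


def select_principles(principles, scores, k):
    """Principle Selection: keep the k highest-scoring principles (placeholder scores)."""
    ranked = sorted(principles, key=lambda p: scores[p.statement], reverse=True)
    return ranked[:k]


# Hypothetical items and scores, invented purely for illustration.
items = ["avoid giving harmful advice.", "respect user privacy", "show each step of a proof"]
selected = select_items(items, keywords=("harm", "privacy"))
principles = [transform_item(it) for it in selected]
scores = {principles[0].statement: 0.4, principles[1].statement: 0.9}
constitution = select_principles(principles, scores, k=1)
```

Here the constitution is simply a list of `Principle` objects. A real scoring step would replace the hand-written `scores` dictionary with whatever selection criterion the use case calls for.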

Description

Constitutional AI (CAI) guides LLM behavior using constitutions, but determining which constitutional principles are most effective remains a challenge. We introduce the C3AI framework (Crafting Constitutions for CAI models), which serves two key functions: (1) crafting constitutions by selecting and phrasing effective principles; and (2) evaluating whether CAI models follow their constitutions. By analyzing principles from AI and psychology, we found that positively framed and behavior-based principles align more closely with human decisions than negatively framed or trait-based principles. In a safety alignment use case, we applied a graph-based principle selection method to refine an existing CAI constitution, improving safety measures while maintaining strong general capabilities. Notably, fine-tuned CAI models performed well on negative principles but struggled with positive principles. Overall, C3AI provides a systematic and transparent approach to developing and evaluating constitutional AI, offering a foundation for more reliable LLM alignment.
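One simple way to picture the evaluation function is as a per-group adherence rate: the share of pairwise comparisons in which the model picks the response that a principle favors. The sketch below assumes that formulation, which the text above does not spell out. The record layout, the function name `adherence_by_group`, and the records themselves are invented toy values, not results from the paper.

```python
from collections import defaultdict


def adherence_by_group(records):
    """Return {group: share of comparisons where the model chose the principle-aligned response}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, model_choice, aligned_choice in records:
        totals[group] += 1
        hits[group] += int(model_choice == aligned_choice)
    return {group: hits[group] / totals[group] for group in totals}


# Toy records: (principle framing, model's choice, principle-aligned choice).
# These values are made up for illustration only.
records = [
    ("negative", "A", "A"),
    ("negative", "B", "B"),
    ("positive", "A", "B"),
    ("positive", "B", "B"),
]
rates = adherence_by_group(records)
```

Grouping by framing is just one option. The same function works for any grouping key, such as individual principle identifiers.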

Publications

C3AI: Crafting and Evaluating Constitutions for Constitutional AI. ACM WWW 2025.

Code and data

The datasets are available on request. To receive the link, fill in the form with your last name, first name, institution, email address, and reason for requesting the data (max 100 words). Your email address will never be shared with anyone else.

N.B.: If you do not receive the email with instructions within a few hours, please check your junk/spam folder in case it was moved there.