Specify Target Character for LLMs
Target Character Integrity (TCI): a framework for specifying and measuring LLM character alignment
Best TCI (Constitutional AI): 0.864
Worst TCI (Sycophantic): 0.696
Character Traits: 8
Archetypes Tested: 6
Charts (images not reproduced): TCI Scores by Archetype; Trait Scores Across Pipeline Stages; Trait-Level Target vs Achieved (RLHF); Pipeline Stage Mean Trait Scores