Specify Target Character for LLMs

Target Character Integrity (TCI) framework for specifying and measuring LLM character alignment

Best TCI (Constitutional AI): 0.864
Worst TCI (Sycophantic): 0.696
Character Traits: 8
Archetypes Tested: 6

[Figure: TCI Scores by Archetype]
[Figure: Trait Scores Across Pipeline Stages]
[Figure: Trait-Level Target vs Achieved (RLHF)]
[Figure: Pipeline Stage Mean Trait Scores]