Evaluating whether LLM-driven architecture generation trends generalize across datasets, modalities, and tasks. Based on Khalid et al. (2026).