
CreBench

Human-Aligned Creativity Evaluation from Idea to Process to Product

Accepted at AAAI 2026
Authors: Kaiwen Xue¹*, Li Chenglong¹*, Zhonghong Ou²†, Guoxin Zhang¹, Kaoyan Lu³, Shuai Lyu¹, Yifan Zhu¹, Zong Ping¹, Junpeng Ding¹, Xinyu Liu⁴, Qunlin Chen⁴, Weiwei Qin¹, Yiran Shen¹, Jiayi Cen⁵ (*Co-first authors, †Corresponding author)

🌟 Overview

CreBench is a creativity evaluation benchmark designed for open‑ended drawing and visual design tasks. It systematically measures model performance across three interconnected dimensions: Creative Ideas, Creative Processes, and Creative Products. CreBench focuses on whether a model’s generative behavior aligns with human‑like creative thinking throughout the full workflow of ideation, exploration, refinement, and visual execution.

🧠 Definition of Creativity

In this benchmark, creativity is defined as:
The process of generating, evaluating, and refining problem‑solving ideas, and manipulating visual or structural elements to produce novel and appropriate solutions. This includes divergent exploration, conceptual combination, refinement of a core idea, and elaboration of visual details. A drawing is considered creative when it is both an effective solution and a meaningful creative expression.

📐 Evaluation Dimensions

CreBench decomposes creativity into three major dimensions, each with further sub‑criteria:

1. Creative Ideas

2. Creative Processes

3. Creative Products

📊 Rubric Design

CreBench provides a five-level scoring rubric (1–5) for each criterion, ranging from Absent (1) to Highly Developed (5). The rubric captures behaviors such as:

The rubric supports both human annotation and LLM‑as‑a‑judge evaluations with structured scoring instructions.
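As a rough illustration of how rubric scores of this shape might be handled in practice, the sketch below validates 1–5 levels and averages them per dimension. It is not the authors' implementation. The criterion names ("novelty", "fluency"), the function names, and the choice of a plain mean as the aggregate are all assumptions made for the example; only the three dimension names and the 1–5 range come from the description above.

```python
from statistics import mean

# The three dimensions named in this document.
DIMENSIONS = ("Creative Ideas", "Creative Processes", "Creative Products")

# Rubric range from this document: 1 (Absent) to 5 (Highly Developed).
MIN_LEVEL, MAX_LEVEL = 1, 5


def validate_level(level):
    """Return level unchanged if it is an integer in the 1-5 rubric range."""
    if isinstance(level, bool) or not isinstance(level, int):
        raise TypeError(f"rubric level must be an int, got {type(level).__name__}")
    if not MIN_LEVEL <= level <= MAX_LEVEL:
        raise ValueError(f"rubric level {level} is outside {MIN_LEVEL}-{MAX_LEVEL}")
    return level


def aggregate_by_dimension(scores):
    """Average criterion levels within each dimension.

    scores maps a dimension name to a dict of {criterion: level}.
    Averaging is an assumption for illustration, not a documented
    CreBench aggregation rule.
    """
    result = {}
    for dimension, criteria in scores.items():
        if dimension not in DIMENSIONS:
            raise KeyError(f"unknown dimension: {dimension}")
        if not criteria:
            raise ValueError(f"no criteria scored for {dimension}")
        levels = [validate_level(level) for level in criteria.values()]
        result[dimension] = mean(levels)
    return result


if __name__ == "__main__":
    # Hypothetical criterion names, used only to show the data shape.
    example = {
        "Creative Ideas": {"novelty": 4, "fluency": 3},
        "Creative Products": {"novelty": 2, "fluency": 2},
    }
    print(aggregate_by_dimension(example))
```

The same structure works whether the levels come from a human annotator or from a judge model, since both only need to emit one integer per criterion.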

🚀 Key Contributions

📁 Resources