A Comprehensive Evaluation of 26 State-of-the-Art Text-to-Image Models

Tags: api framework microsoft

Date posted: October 12, 2024

Feed: Hacker Noon - Medium

View: Original article

:::info Authors:

(1) Tony Lee, Stanford, equal contribution;

(2) Michihiro Yasunaga, Stanford, equal contribution;

(3) Chenlin Meng, Stanford, equal contribution;

(4) Yifan Mai, Stanford;

(5) Joon Sung Park, Stanford;

(6) Agrim Gupta, Stanford;

(7) Yunzhi Zhang, Stanford;

(8) Deepak Narayanan, Microsoft;

(9) Hannah Benita Teufel, Aleph Alpha;

(10) Marco Bellagente, Aleph Alpha;

(11) Minguk Kang, POSTECH;

(12) Taesung Park, Adobe;

(13) Jure Leskovec, Stanford;

(14) Jun-Yan Zhu, CMU;

(15) Li Fei-Fei, Stanford;

(16) Jiajun Wu, Stanford;

(17) Stefano Ermon, Stanford;

(18) Percy Liang, Stanford.

:::

Table of Links

Abstract and 1 Introduction

2 Core framework

7 Experiments and results

Author contributions, Acknowledgments and References

B Scenario details

C Metric details

D Model details

E Human evaluation procedure

6 Models

We evaluate 26 recent text-to-image models that span a range of types (e.g., diffusion, autoregressive, GAN), sizes (0.4B to 13B parameters), developing organizations, and accessibility (open or closed). Table 4 summarizes the models and their properties. For each model, we use the default inference configuration provided in its API, GitHub repository, or Hugging Face repository.

:::info This paper is available on arXiv under the CC BY 4.0 DEED license.

:::