AI Paper Research

Surveys and summaries of AI papers

Multimodal: 2023

3 papers

NeurIPS 2023 · Oral · 3,000+ citations

Visual Instruction Tuning

Haotian Liu, Chunyuan Li, Qingyang Wu et al. (2023)

ICML 2023 · 5,000+ citations

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

Junnan Li, Dongxu Li, Silvio Savarese et al. (2023)

arXiv · 1,000+ citations

CogVLM: Visual Expert for Pretrained Language Models

Weihan Wang, Qingsong Lv, Wenmeng Yu et al. (2023)