AI Paper Research

Survey and summaries of AI papers

Multimodal: 2024

2 papers

arXiv · 3,000+

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team, Google DeepMind (2024)

CVPR 2024 · 1,000+

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Zhe Chen, Jiannan Wu, Wenhai Wang et al. (2024)
