AI Paper Research

A survey and summary of AI papers


Efficient AI (2023)

5 papers

NeurIPS 2023 · 3,000+ citations

QLoRA: Efficient Finetuning of Quantized LLMs

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman et al. (2023)

MLSys 2024 · 800+ citations

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Ji Lin, Jiaming Tang, Haotian Tang et al. (2023)

ICML 2023 · 600+ citations

Fast Inference from Transformers via Speculative Decoding

Yaniv Leviathan, Matan Kalman, Yossi Matias (2023)

SOSP 2023 · 1,000+ citations

Efficient Memory Management for Large Language Model Serving with PagedAttention

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang et al. (2023)

ICLR 2024 · 800+ citations

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Tri Dao (2023)
