AI Paper Research

A survey and summary of AI papers

Efficient AI — 2022

3 papers

JMLR 2022 · 3,000+ citations

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
William Fedus, Barret Zoph, Noam Shazeer (2022)

NeurIPS 2022 · 3,000+ citations

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao, Daniel Y. Fu, Stefano Ermon et al. (2022)

ICLR 2023 · 1,500+ citations

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Elias Frantar, Saleh Ashkboos, Torsten Hoefler et al. (2022)