
AI Paper Research

Surveys and summaries of AI papers


AI Safety & Alignment — 2019

1 paper

ICML 2019 · 2,000+ citations

Certified Adversarial Robustness via Randomized Smoothing


Jeremy Cohen, Elan Rosenfeld, J. Zico Kolter (2019)
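The technique named in the title, randomized smoothing, turns any base classifier into a "smoothed" classifier by taking a majority vote over Gaussian-perturbed copies of the input; the paper then certifies an ℓ₂ robustness radius from how decisive that vote is. A minimal sketch of the prediction step (the toy base classifier and parameter values here are illustrative, not from the paper):

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=1000, rng=None):
    """Majority vote of base_classifier over Gaussian-perturbed copies of x."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Sample n_samples noise vectors eps ~ N(0, sigma^2 I).
    noise = rng.normal(scale=sigma, size=(n_samples,) + x.shape)
    votes = np.array([base_classifier(x + eps) for eps in noise])
    classes, counts = np.unique(votes, return_counts=True)
    return int(classes[np.argmax(counts)])

# Toy base classifier on 2-D inputs: label 1 if the coordinate sum is positive.
f = lambda z: int(z.sum() > 0)
print(smoothed_predict(f, np.array([2.0, 1.0])))  # clearly positive input -> 1
```

In the paper, the fraction of votes won by the top class also yields a certified radius (roughly σ·Φ⁻¹ of the top-class probability), which this sketch omits.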
