Efficient Large Scale Language Modeling with Mixtures of Experts

Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation. This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide …

Rethinking Automatic Evaluation in Sentence Simplification