
Subject: Is nobody discussing DeepSeek, which has blown up these past few days? -- 俺本懒人

DeepSeek's model is not small at all

DeepSeek's own introduction:

V3: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models.
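The "671B total parameters with 37B activated for each token" line is the key to the MoE design: a router picks only a few experts per token, so most of the weights sit idle on any given forward pass. Below is a toy sketch of generic top-k expert routing in Python; the expert count, top-k value, and softmax gating are illustrative placeholders, not DeepSeek's actual DeepSeekMoE/MLA configuration.

    import numpy as np

    def moe_layer(x, experts, router_w, top_k=2):
        # x: (d_model,) activation for one token
        # experts: list of (W_in, W_out) weight pairs, one pair per expert
        # router_w: (num_experts, d_model) router projection
        scores = router_w @ x                       # one affinity score per expert
        top = np.argsort(scores)[-top_k:]           # indices of the top_k experts
        gates = np.exp(scores[top]) / np.exp(scores[top]).sum()  # gate weights over selected experts
        out = np.zeros_like(x)
        for g, i in zip(gates, top):
            W_in, W_out = experts[i]
            out += g * (W_out @ np.maximum(W_in @ x, 0.0))  # only the selected experts run
        return out

    # Toy sizes: 8 experts exist, but each token touches only 2 of them,
    # so per-token compute scales with the active experts, not the total.
    d_model, d_ff, num_experts = 16, 64, 8
    experts = [(np.random.randn(d_ff, d_model), np.random.randn(d_model, d_ff))
               for _ in range(num_experts)]
    router_w = np.random.randn(num_experts, d_model)
    y = moe_layer(np.random.randn(d_model), experts, router_w)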

R1 is built on top of V3, so it has the same 671B total parameters with 37B activated for each token. For comparison, Meta's Llama 3 was trained on about 15 trillion tokens, roughly the same amount of data as DeepSeek's 14.8 trillion. But the Llama 3 family is dense: the largest model has 405B total parameters, alongside 8B and 70B versions, and in a dense model every parameter is active for every token.
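A rough back-of-envelope comparison makes the point. Using the common approximation of about 2 FLOPs per active parameter per token for a forward pass (ignoring attention and other overheads), the dense 405B model costs roughly an order of magnitude more compute per token than DeepSeek-V3's 37B activated parameters, despite V3 having more parameters in total.

    # Rough per-token forward-pass cost, using the ~2 FLOPs per
    # active parameter rule of thumb (ignores attention overheads).
    def flops_per_token(active_params):
        return 2 * active_params

    deepseek_v3 = flops_per_token(37e9)    # 37B activated per token (MoE)
    llama3_405b = flops_per_token(405e9)   # dense: all 405B parameters active
    print(llama3_405b / deepseek_v3)       # ~10.9x more compute per token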

DeepSeek's two main breakthroughs are:

1. It lowered training cost and shortened training time.

2. It shows the AI's reasoning process (see the sketch below).
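On the second point: the released R1 weights emit the chain of thought as visible text before the final answer, wrapped in <think> ... </think> delimiters, which is what makes the "thinking" readable to the user. The sketch below just splits that trace from the answer; the example string is made up, and a hosted API may instead expose the reasoning as a separate field rather than inline tags.

    import re

    def split_r1_output(text):
        # Separate the visible chain of thought from the final answer.
        m = re.search(r"<think>(.*?)</think>(.*)", text, re.S)
        if m:
            return m.group(1).strip(), m.group(2).strip()
        return "", text.strip()  # no reasoning block found

    reasoning, answer = split_r1_output(
        "<think>First compare the two fractions...</think>The answer is 3/4."
    )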


