Topic: Is nobody discussing DeepSeek, which has blown up these past few days? -- 俺本懒人
DeepSeek's own introduction:
V3: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models.
R1 is built on V3: the same 671B total parameters with 37B activated for each token. For comparison, Meta's Llama 3 was also trained on about 15 trillion tokens, essentially on par with DeepSeek, yet its largest model is 405B total parameters, alongside 8B and 70B variants; as dense models, all of their parameters are active for every token. The sketch below illustrates the total-versus-activated distinction.
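To make "total versus activated parameters" concrete, here is a minimal sketch of top-k MoE routing in PyTorch. The dimensions, expert count, and top_k value are invented for illustration; this is not DeepSeek's MLA/DeepSeekMoE implementation, only the general sparse-routing idea behind it.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Toy sparse Mixture-of-Experts layer: all experts exist in memory
    (total parameters), but each token is routed to only top_k of them
    (activated parameters)."""

    def __init__(self, dim: int = 64, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.router(x)                         # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top_k experts
        weights = weights.softmax(dim=-1)               # normalize their gate weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```

With 8 experts and top_k=2, each token touches only a quarter of the expert weights. Scaled up, the same routing idea is why V3 can hold 671B parameters while computing with only 37B per token (the real architecture also has always-on shared layers, omitted here).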
DeepSeek's two main breakthroughs:
1. It lowered training cost and shortened training time.
2. It exposes the AI's reasoning process (see the sketch after this list).
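On point 2, here is a minimal sketch of how a client might separate that visible reasoning from the final answer. DeepSeek-R1's public outputs wrap the chain of thought in <think>...</think> tags; the parsing below assumes that format, and the sample response text is invented for illustration.

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split an R1-style response into (visible reasoning, final answer).
    Assumes the reasoning is wrapped in <think>...</think> tags; other
    models may use a different format."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m:
        return m.group(1).strip(), text[m.end():].strip()
    return "", text.strip()  # no visible reasoning found

reasoning, answer = split_reasoning(
    "<think>9.11 < 9.9 because 0.11 < 0.90</think>9.9 is larger.")
print(reasoning)  # 9.11 < 9.9 because 0.11 < 0.90
print(answer)     # 9.9 is larger.
```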
- Related replies (8, in thread order)
🙂Like generative models: creativity just generates, and judgment falls to the user. Formal logic is a big problem 14 nobodyknowsI 4216 chars 2025-01-30 12:42:24
🙂A very distinctive perspective, enlightening for me 8 唐家山 1177 chars 2025-01-30 21:30:42
🙂DeepSeek hasn't reached the level of logic yet; it is still at the level of natural language 8 nobodyknowsI 3141 chars 2025-01-31 03:39:02
🙂DeepSeek's models are hardly small, though
🙂At what level can AI output things humans don't yet know 1 贼不走空 207 chars 2025-01-31 04:49:08
🙂Do "narrative is power, imagination is the battlefield" and "in the city they debate rites and law, in the countryside they grow rice and grain" count 4 nobodyknowsI 503 chars 2025-01-31 07:37:39
🙂DS's self-teaching path: semantic creation and critique 1 瓷航惊涛 386 chars 2025-02-01 01:52:48
🙂You could run an experiment on this 1 胡辣汤 408 chars 2025-01-31 11:39:53