
Subject: Is nobody discussing DeepSeek, which has blown up these past few days? -- 俺本懒人

DeepSeek's model is not small at all

DeepSeek's own introduction:

V3: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models.
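The "671B total parameters with 37B activated for each token" line is the key to the MoE design: a router picks only a few experts per token, so most of the weights sit idle on any given forward pass. Below is a toy sketch of generic top-k expert routing in Python; the expert count, top-k value, and softmax gating are illustrative placeholders, not DeepSeek's actual DeepSeekMoE/MLA configuration.

    import numpy as np

    def moe_layer(x, experts, router_w, top_k=2):
        # x: (d_model,) activation for one token
        # experts: list of (W_in, W_out) weight pairs, one pair per expert
        # router_w: (num_experts, d_model) router projection
        scores = router_w @ x                       # one affinity score per expert
        top = np.argsort(scores)[-top_k:]           # indices of the top_k experts
        gates = np.exp(scores[top]) / np.exp(scores[top]).sum()  # gate weights over selected experts
        out = np.zeros_like(x)
        for g, i in zip(gates, top):
            W_in, W_out = experts[i]
            out += g * (W_out @ np.maximum(W_in @ x, 0.0))  # only the selected experts run
        return out

    # Toy sizes: 8 experts exist, but each token touches only 2 of them,
    # so per-token compute scales with the active experts, not the total.
    d_model, d_ff, num_experts = 16, 64, 8
    experts = [(np.random.randn(d_ff, d_model), np.random.randn(d_model, d_ff))
               for _ in range(num_experts)]
    router_w = np.random.randn(num_experts, d_model)
    y = moe_layer(np.random.randn(d_model), experts, router_w)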

R1 is built on top of V3, so it has the same 671B total parameters with 37B activated for each token. For comparison, Meta's Llama 3 was trained on about 15 trillion tokens, roughly the same amount of data as DeepSeek's 14.8 trillion. But the Llama 3 family is dense: the largest model has 405B total parameters, alongside 8B and 70B versions, and in a dense model every parameter is active for every token.
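A rough back-of-envelope comparison makes the point. Using the common approximation of about 2 FLOPs per active parameter per token for a forward pass (ignoring attention and other overheads), the dense 405B model costs roughly an order of magnitude more compute per token than DeepSeek-V3's 37B activated parameters, despite V3 having more parameters in total.

    # Rough per-token forward-pass cost, using the ~2 FLOPs per
    # active parameter rule of thumb (ignores attention overheads).
    def flops_per_token(active_params):
        return 2 * active_params

    deepseek_v3 = flops_per_token(37e9)    # 37B activated per token (MoE)
    llama3_405b = flops_per_token(405e9)   # dense: all 405B parameters active
    print(llama3_405b / deepseek_v3)       # ~10.9x more compute per token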

DeepSeek's two main breakthroughs are:

1. It lowered training cost and shortened training time.

2. It shows the AI's reasoning process (see the sketch below).
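On the second point: the released R1 weights emit the chain of thought as visible text before the final answer, wrapped in <think> ... </think> delimiters, which is what makes the "thinking" readable to the user. The sketch below just splits that trace from the answer; the example string is made up, and a hosted API may instead expose the reasoning as a separate field rather than inline tags.

    import re

    def split_r1_output(text):
        # Separate the visible chain of thought from the final answer.
        m = re.search(r"<think>(.*?)</think>(.*)", text, re.S)
        if m:
            return m.group(1).strip(), m.group(2).strip()
        return "", text.strip()  # no reasoning block found

    reasoning, answer = split_r1_output(
        "<think>First compare the two fractions...</think>The answer is 3/4."
    )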


