Insights: deepseek-ai/DeepSeek-V3
Overview
- 0 Merged pull requests
- 6 Open pull requests
- 18 Closed issues
- 11 New issues
There hasn’t been any commit activity on deepseek-ai/DeepSeek-V3 in the last week.
6 Pull requests opened by 6 people
-
add intro file
#729 opened
Feb 28, 2025 -
Update kernel.py
#735 opened
Mar 3, 2025 -
Docs: add LightLLM as supported engine
#736 opened
Mar 3, 2025 -
Add zh version of README
#747 opened
Mar 5, 2025 -
Fix: Add metadata to bf16 safetensors for compatibility with transformers
#749 opened
Mar 6, 2025 -
NoneType check
#751 opened
Mar 6, 2025
18 Issues closed by 4 people
-
Could you provide a few data examples for pure RL training?
#744 closed
Mar 7, 2025 -
When calling the API to test the model, the parameter n can only be set to 1
#291 closed
Mar 6, 2025 -
An issue about pretraining deepseek v3
#293 closed
Mar 6, 2025 -
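How do I enable web search, image upload parsing, and attachment upload parsing in the official API? Is there live human customer support or a technical support group?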
#745 closed
Mar 5, 2025 -
Confusion in ParallelEmbedding layer in model.py
#298 closed
Mar 5, 2025 -
[BUG] If the bias in the model's Gate is a large negative value, masked expert groups get selected
#741 closed
Mar 4, 2025 -
[BUG] Network failure on captcha
#256 closed
Mar 4, 2025 -
[BUG] Findings on how tokenization errors affect large model results
#263 closed
Mar 4, 2025 -
Question: Where can I download a 1.5B model for ollama?
#731 closed
Mar 3, 2025 -
Could the open-source repository be mirrored to Gitee?
#730 closed
Mar 3, 2025 -
[Question] Are there any reference cases of deploying a consumer-facing (toC) V3 inference service?
#733 closed
Mar 3, 2025 -
RTL style (direction) for Persian Chats
#725 closed
Mar 3, 2025 -
Generated text makes no sense (converted to bf16)
#190 closed
Mar 3, 2025 -
[Question] How is the bubble size of DualPipe calculated
#211 closed
Mar 3, 2025 -
Questions on the workflow of all-to-all combine, and MoE Experts placement on 320 GPUs
#270 closed
Mar 3, 2025 -
[BUG] English tokenization issues and findings on text sources.
#273 closed
Mar 3, 2025 -
Question about MLA with TP
#283 closed
Mar 3, 2025 -
MoE infra question/suggestion
#287 closed
Mar 3, 2025
11 Issues opened by 11 people
-
[BUG]
#752 opened
Mar 6, 2025 -
[Question] Questions about MMA FP8 accumulator precision
#750 opened
Mar 6, 2025 -
[BUG] When a conversation runs long enough, the AI stops seeming lively and even starts to feel somewhat robotic
#748 opened
Mar 6, 2025 -
Sharing my study notes about DeepSeekV3
#746 opened
Mar 5, 2025 -
Create an AI that can write a WORKING SIMPLE Python APP.
#743 opened
Mar 4, 2025 -
Question for the maintainers: judging from the papers, the SFT Reasoning Data of V3 and R1 appear to reference each other circularly?
#742 opened
Mar 4, 2025 -
Inquiry about FP8 training
#740 opened
Mar 4, 2025 -
TypeError: ModelArgs.__init__() got an unexpected keyword argument 'attention_dropout'
#739 opened
Mar 4, 2025 -
[BUG] it keeps thinking
#738 opened
Mar 3, 2025 -
[BUG] There is no tools definition in chat_template(tokenizer_config.json)
#737 opened
Mar 3, 2025 -
A question about DualPipe: still unclear after reading the paper and looking up references; could your company please answer?
#734 opened
Mar 3, 2025
23 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
how to finetune this model?
#10 commented on
Mar 1, 2025 • 0 new comments -
[BUG] An appeal for help persuading the company to take DeepSeek accessibility issues seriously
#609 commented on
Mar 1, 2025 • 0 new comments -
training hyper-parameters for ablation studies
#489 commented on
Mar 2, 2025 • 0 new comments -
Request: Amending the end licence to include planet/environment focused restrictions
#343 commented on
Mar 2, 2025 • 0 new comments -
Integrating Anthropic's MCP
#309 commented on
Mar 2, 2025 • 0 new comments -
[BUG]no code on mail at signup
#288 commented on
Mar 2, 2025 • 0 new comments -
[BUG]json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0).
#721 commented on
Mar 3, 2025 • 0 new comments -
Bug: Broken Google OAuth login- RECAPTCHA_VERIFY_FAILED on platform.deepseek.com
#170 commented on
Mar 4, 2025 • 0 new comments -
Open source or open model?
#12 commented on
Mar 4, 2025 • 0 new comments -
Function Calling not working
#7 commented on
Mar 4, 2025 • 0 new comments -
How convert bfloat16 to fp8 model?
#661 commented on
Mar 4, 2025 • 0 new comments -
[BUG] Not able to download model through HuggingFace
#532 commented on
Mar 5, 2025 • 0 new comments -
[BUG] While selecting text and dragging it in chat window.
#521 commented on
Mar 5, 2025 • 0 new comments -
generator_model = AutoModelForCausalLM.from_pretrained('deepseek-ai/DeepSeek-R1', trust_remote_code=True) throws error in RAG model
#335 commented on
Mar 5, 2025 • 0 new comments -
Question about FP8 Tensor Core Mantissa Precision
#197 commented on
Mar 5, 2025 • 0 new comments -
How is route_scale derived? The hyperparameter is in the code but not mentioned in the paper?
#235 commented on
Mar 6, 2025 • 0 new comments -
[BUG] Accessibility Improvement for Screen Reader Users in DeepSeek v3 Chat Feature
#233 commented on
Mar 6, 2025 • 0 new comments -
here windows installation
#173 commented on
Mar 6, 2025 • 0 new comments -
v3 repetitive function call ?
#15 commented on
Mar 6, 2025 • 0 new comments -
[BUG] (Due to technical issues, the search service is temporarily unavailable.)
#711 commented on
Mar 6, 2025 • 0 new comments -
Create CODE_OF_CONDUCT.md
#700 commented on
Mar 1, 2025 • 0 new comments -
Add me as contributor of Amharic and Oromo Language Translator
#701 commented on
Mar 5, 2025 • 0 new comments -
modify the explanation of MLA
#720 commented on
Mar 2, 2025 • 0 new comments