infiniflow · KevinHuSh · Dec 18, 2024 · Dec 16, 2024 · Dec 17, 2024 · Dec 17, 2024
diff --git a/README_zh.md b/README_zh.md
@@ -158,7 +158,7 @@
    | nightly           | &approx;9       | :heavy_check_mark:    | *Unstable* nightly build |
    | nightly-slim      | &approx;2       | ❌                    | *Unstable* nightly build |
 
-   > [!TIP]  
+   > [!TIP] 
    > 如果你遇到 Docker 镜像拉不下来的问题，可以在 **docker/.env** 文件内根据变量 `RAGFLOW_IMAGE` 的注释提示选择华为云或者阿里云的相应镜像。
    > - 华为云镜像名：`swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow`
    > - 阿里云镜像名：`registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow`

diff --git a/docs/guides/deploy_local_llm.mdx b/docs/guides/deploy_local_llm.mdx
@@ -9,6 +9,8 @@ import TabItem from '@theme/TabItem';
 
 Run models locally using Ollama, Xinference, or other frameworks.
 
+---
+
 RAGFlow supports deploying models locally using Ollama, Xinference, IPEX-LLM, or jina. If you have locally deployed models to leverage or wish to enable GPU or CUDA for inference acceleration, you can bind Ollama or Xinference into RAGFlow and use either of them as a local "server" for interacting with your local models.
 
 RAGFlow seamlessly integrates with Ollama and Xinference, without the need for further environment configurations. You can use them to deploy two types of local models in RAGFlow: chat models and embedding models.

diff --git a/docs/guides/run_health_check.md b/docs/guides/run_health_check.md
@@ -7,6 +7,8 @@ slug: /run_health_check
 
 Double-check the health status of RAGFlow's dependencies.
 
+---
+
 The operation of RAGFlow depends on four services:
 
 - **Elasticsearch** (default) or [Infinity](https://github.com/infiniflow/infinity) as the document engine

diff --git a/docs/references/http_api_reference.md b/docs/references/http_api_reference.md
@@ -1372,15 +1372,15 @@ curl --request POST \
   - `"model_name"`, `string`  
     The chat model name. If not set, the user's default chat model will be used.  
   - `"temperature"`: `float`  
-    Controls the randomness of the model's predictions. A lower temperature increases the model's confidence in its responses; a higher temperature increases creativity and diversity. Defaults to `0.1`.  
+    Controls the randomness of the model's predictions. A lower temperature results in more conservative responses, while a higher temperature yields more creative and diverse responses. Defaults to `0.1`.  
   - `"top_p"`: `float`  
     Also known as “nucleus sampling”, this parameter sets a threshold to select a smaller set of words to sample from. It focuses on the most likely words, cutting off the less probable ones. Defaults to `0.3`  
   - `"presence_penalty"`: `float`  
     This discourages the model from repeating the same information by penalizing words that have already appeared in the conversation. Defaults to `0.2`.
   - `"frequency penalty"`: `float`  
     Similar to the presence penalty, this reduces the model’s tendency to repeat the same words frequently. Defaults to `0.7`.
   - `"max_token"`: `integer`  
-    The maximum length of the model's output, measured in the number of tokens (words or pieces of words). If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses. Defaults to `512`. 
+    The maximum length of the model's output, measured in the number of tokens (words or pieces of words). Defaults to `512`. If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses.  
 - `"prompt"`: (*Body parameter*), `object`  
   Instructions for the LLM to follow. If it is not explicitly set, a JSON object with the following values will be generated as the default. A `prompt` JSON object contains the following attributes:  
   - `"similarity_threshold"`: `float` RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted reranking score during retrieval. This argument sets the threshold for similarities between the user query and chunks. If a similarity score falls below this threshold, the corresponding chunk will be excluded from the results. The default value is `0.2`.
@@ -1507,15 +1507,15 @@ curl --request PUT \
   - `"model_name"`, `string`  
     The chat model name. If not set, the user's default chat model will be used.  
   - `"temperature"`: `float`  
-    Controls the randomness of the model's predictions. A lower temperature increases the model's confidence in its responses; a higher temperature increases creativity and diversity. Defaults to `0.1`.  
+    Controls the randomness of the model's predictions. A lower temperature results in more conservative responses, while a higher temperature yields more creative and diverse responses. Defaults to `0.1`.  
   - `"top_p"`: `float`  
     Also known as “nucleus sampling”, this parameter sets a threshold to select a smaller set of words to sample from. It focuses on the most likely words, cutting off the less probable ones. Defaults to `0.3`  
   - `"presence_penalty"`: `float`  
     This discourages the model from repeating the same information by penalizing words that have already appeared in the conversation. Defaults to `0.2`.
   - `"frequency penalty"`: `float`  
     Similar to the presence penalty, this reduces the model’s tendency to repeat the same words frequently. Defaults to `0.7`.
   - `"max_token"`: `integer`  
-    The maximum length of the model's output, measured in the number of tokens (words or pieces of words). If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses. Defaults to `512`.  
+    The maximum length of the model's output, measured in the number of tokens (words or pieces of words). Defaults to `512`. If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses.  
 - `"prompt"`: (*Body parameter*), `object`  
   Instructions for the LLM to follow.  A `prompt` object contains the following attributes:  
   - `"similarity_threshold"`: `float` RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted rerank score during retrieval. This argument sets the threshold for similarities between the user query and chunks. If a similarity score falls below this threshold, the corresponding chunk will be excluded from the results. The default value is `0.2`.
@@ -2149,6 +2149,7 @@ Failure:
 ---
 
 ## Create session with agent
+
 *If there are parameters in the `begin` component, the session cannot be created in this way.*
 
 **POST** `/api/v1/agents/{agent_id}/sessions`
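
The `temperature` and `top_p` descriptions in the `http_api_reference.md` hunks above can be illustrated with the textbook formulation of temperature scaling and nucleus sampling. This is a minimal sketch, not RAGFlow's or any model provider's implementation; the function names and the example scores are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose total mass reaches top_p."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for index, p in ranked:
        kept.append(index)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


logits = [2.0, 1.0, 0.5, 0.1]  # made-up scores for four candidate tokens
cold = softmax_with_temperature(logits, 0.1)  # the documented default
warm = softmax_with_temperature(logits, 1.0)

# The low-temperature distribution concentrates on the top token.
print(max(cold) > max(warm))
# A small top_p keeps fewer candidates than a large one.
print(top_p_filter(warm, 0.3), top_p_filter(warm, 0.9))
```

With a low temperature almost all probability lands on the highest-scoring token, which matches the "more conservative responses" wording; a low `top_p` such as the documented default of `0.3` narrows the candidate set further.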

diff --git a/docs/references/python_api_reference.md b/docs/references/python_api_reference.md
@@ -950,15 +950,15 @@ The LLM settings for the chat assistant to create. Defaults to `None`. When the
 - `model_name`: `str`  
   The chat model name. If it is `None`, the user's default chat model will be used.  
 - `temperature`: `float`  
-  Controls the randomness of the model's predictions. A lower temperature increases the model's confidence in its responses; a higher temperature increases creativity and diversity. Defaults to `0.1`.  
+  Controls the randomness of the model's predictions. A lower temperature results in more conservative responses, while a higher temperature yields more creative and diverse responses. Defaults to `0.1`.  
 - `top_p`: `float`  
   Also known as “nucleus sampling”, this parameter sets a threshold to select a smaller set of words to sample from. It focuses on the most likely words, cutting off the less probable ones. Defaults to `0.3`  
 - `presence_penalty`: `float`  
   This discourages the model from repeating the same information by penalizing words that have already appeared in the conversation. Defaults to `0.2`.
 - `frequency penalty`: `float`  
   Similar to the presence penalty, this reduces the model’s tendency to repeat the same words frequently. Defaults to `0.7`.
 - `max_token`: `int`  
-  The maximum length of the model's output, measured in the number of tokens (words or pieces of words). If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses. Defaults to `512`.
+  The maximum length of the model's output, measured in the number of tokens (words or pieces of words). Defaults to `512`. If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses.
 
 #### prompt: `Chat.Prompt`
 
@@ -1016,11 +1016,11 @@ A dictionary representing the attributes to update, with the following keys:
 - `"dataset_ids"`: `list[str]` The datasets to update.
 - `"llm"`: `dict` The LLM settings:
   - `"model_name"`, `str` The chat model name.
-  - `"temperature"`, `float` Controls the randomness of the model's predictions.  
+  - `"temperature"`, `float` Controls the randomness of the model's predictions. A lower temperature results in more conservative responses, while a higher temperature yields more creative and diverse responses.  
   - `"top_p"`, `float` Also known as “nucleus sampling”, this parameter sets a threshold to select a smaller set of words to sample from.  
   - `"presence_penalty"`, `float` This discourages the model from repeating the same information by penalizing words that have appeared in the conversation.
   - `"frequency penalty"`, `float` Similar to presence penalty, this reduces the model’s tendency to repeat the same words.
-  - `"max_token"`, `int` The maximum length of the model's output, measured in the number of tokens (words or pieces of words). If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses. Defaults to `512`.
+  - `"max_token"`, `int` The maximum length of the model's output, measured in the number of tokens (words or pieces of words). Defaults to `512`. If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses.
 - `"prompt"` : Instructions for the LLM to follow.
   - `"similarity_threshold"`: `float` RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted rerank score during retrieval. This argument sets the threshold for similarities between the user query and chunks. If a similarity score falls below this threshold, the corresponding chunk will be excluded from the results. The default value is `0.2`.
   - `"keywords_similarity_weight"`: `float` This argument sets the weight of keyword similarity in the hybrid similarity score with vector cosine similarity or reranking model similarity. By adjusting this weight, you can control the influence of keyword similarity in relation to other similarity measures. The default value is `0.7`.
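
The `llm` settings listed in the `python_api_reference.md` hunks above could be assembled into an update dictionary along these lines. This is a sketch under assumptions: `build_llm_settings` is a hypothetical helper, not part of the RAGFlow SDK; the defaults are the ones documented for creating a chat assistant; and the underscore spelling `frequency_penalty` is assumed, since the reference writes it as `frequency penalty`.

```python
# Defaults as documented in the reference text above.
# Assumption: the key is "frequency_penalty" (the reference prints "frequency penalty").
DEFAULT_LLM_SETTINGS = {
    "temperature": 0.1,
    "top_p": 0.3,
    "presence_penalty": 0.2,
    "frequency_penalty": 0.7,
    "max_token": 512,
}


def build_llm_settings(model_name=None, **overrides):
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULT_LLM_SETTINGS)
    if unknown:
        raise KeyError(f"unknown LLM setting(s): {sorted(unknown)}")
    settings = {**DEFAULT_LLM_SETTINGS, **overrides}
    # Mirrors the UI message "Max Tokens cannot be less than 0."
    if settings["max_token"] < 0:
        raise ValueError("max_token cannot be less than 0")
    if model_name is not None:
        settings["model_name"] = model_name
    return settings


# A dictionary shaped like the `update` argument described above.
update_payload = {"llm": build_llm_settings(temperature=0.8, max_token=1024)}
print(update_payload)
```

Rejecting unknown keys up front is a design choice for the sketch: it surfaces misspelled setting names before any request is sent.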

diff --git a/docs/release_notes.md b/docs/release_notes.md
@@ -7,6 +7,40 @@ slug: /release_notes
 
 Key features, improvements and bug fixes in the latest releases.
 
+## v0.15.0
+
+Released on December 18, 2024.
+
+### New features
+
+- Introduces additional Agent-specific APIs.
+- Supports using page rank score to improve retrieval performance when searching across multiple knowledge bases.
+- Offers an iframe in Chat and Agent to facilitate the integration of RAGFlow into your webpage.
+- Adds a Helm chart for deploying RAGFlow on Kubernetes.
+- Supports importing or exporting an agent in JSON format.
+- Supports stepping for Agent components/tools.
+- Adds a new UI language (*contributed by the community*): Japanese.
+- Supports resuming GraphRAG and RAPTOR from a failure, enhancing task management resilience.
+- Adds more Mistral models.
+- Adds a dark mode to the UI, allowing users to toggle between light and dark themes.
+
+### Improvements
+
+- Upgrades document layout recognition models for Deepdoc.
+- Significantly enhances the retrieval performance when using [Infinity](https://github.com/infiniflow/infinity) as document engine.
+
+### Related APIs
+
+#### HTTP APIs
+
+- [List agent sessions](https://ragflow.io/docs/dev/http_api_reference#list-agent-sessions)
+- [List agents](https://ragflow.io/docs/dev/http_api_reference#list-agents)
+
+#### Python APIs
+
+- [List agent sessions](https://ragflow.io/docs/dev/python_api_reference#list-agent-sessions)
+- [List agents](https://ragflow.io/docs/dev/python_api_reference#list-agents)
+
 ## v0.14.1
 
 Released on November 29, 2024.

diff --git a/web/src/locales/en.ts b/web/src/locales/en.ts
@@ -136,7 +136,7 @@ export default {
       toMessage: 'Missing end page number (excluded)',
       layoutRecognize: 'Layout recognition',
       layoutRecognizeTip:
-        'Use visual models for layout analysis to better understand the structure of the document and effectively locate document titles, text blocks, images, and tables. If disabled, only the plain text from the PDF will be retrieved.',
+        'Use visual models for layout analysis to better understand the structure of the document and effectively locate document titles, text blocks, images, and tables. If disabled, only the plain text in the PDF will be retrieved.',
       taskPageSize: 'Task page size',
       taskPageSizeMessage: 'Please input your task page size!',
       taskPageSizeTip: `During layout recognition, a PDF file is split into chunks and processed in parallel to increase processing speed. This parameter sets the size of each chunk. A larger chunk size reduces the likelihood of splitting continuous text between pages.`,
@@ -398,7 +398,7 @@ The above is the content you need to summarize.`,
         'Similar to the presence penalty, this reduces the model’s tendency to repeat the same words frequently.',
       maxTokens: 'Max tokens',
       maxTokensMessage: 'Max tokens is required',
-      maxTokensTip: `This sets the maximum length of the model's output, measured in the number of tokens (words or pieces of words). If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses. Defaults to 512.`,
+      maxTokensTip: `This sets the maximum length of the model's output, measured in the number of tokens (words or pieces of words). Defaults to 512. If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses.`,
       maxTokensInvalidMessage: 'Please enter a valid number for Max Tokens.',
       maxTokensMinMessage: 'Max Tokens cannot be less than 0.',
       quote: 'Show quote',
@@ -430,7 +430,7 @@ The above is the content you need to summarize.`,
       partialTitle: 'Partial Embed',
       extensionTitle: 'Chrome Extension',
       tokenError: 'Please create API Token first!',
-      betaError: 'Please apply an API key in system setting firstly.',
+      betaError: 'Please acquire a RAGFlow API key from the System Settings page first.',
       searching: 'Searching...',
       parsing: 'Parsing',
       uploading: 'Uploading',
@@ -453,7 +453,7 @@ The above is the content you need to summarize.`,
       profileDescription: 'Update your photo and personal details here.',
       maxTokens: 'Max Tokens',
       maxTokensMessage: 'Max Tokens is required',
-      maxTokensTip: `This sets the maximum length of the model's output, measured in the number of tokens (words or pieces of words). If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses. Defaults to 512.`,
+      maxTokensTip: `This sets the maximum length of the model's output, measured in the number of tokens (words or pieces of words). Defaults to 512. If disabled, you lift the maximum token limit, allowing the model to determine the number of tokens in its responses.`,
       maxTokensInvalidMessage: 'Please enter a valid number for Max Tokens.',
       maxTokensMinMessage: 'Max Tokens cannot be less than 0.',
       password: 'Password',