Merge branch 'infiniflow:main' into main
isthaison authored Nov 21, 2024
2 parents 37450c4 + cc5960b commit f9df213
Showing 71 changed files with 1,030 additions and 354 deletions.
29 changes: 25 additions & 4 deletions .github/workflows/tests.yml
@@ -70,15 +70,26 @@ jobs:
echo "RAGFLOW_IMAGE=infiniflow/ragflow:dev" >> docker/.env
sudo docker compose -f docker/docker-compose.yml up -d
- name: Run tests against Elasticsearch
- name: Run sdk tests against Elasticsearch
run: |
export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
export HOST_ADDRESS=http://host.docker.internal:9380
until sudo docker exec ragflow-server curl -s --connect-timeout 5 ${HOST_ADDRESS} > /dev/null; do
echo "Waiting for service to be available..."
sleep 5
done
cd sdk/python && poetry install && source .venv/bin/activate && cd test && pytest --tb=short t_dataset.py t_chat.py t_session.py t_document.py t_chunk.py
cd sdk/python && poetry install && source .venv/bin/activate && cd test/test_sdk_api && pytest -s --tb=short get_email.py t_dataset.py t_chat.py t_session.py t_document.py t_chunk.py
- name: Run frontend api tests against Elasticsearch
run: |
export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
export HOST_ADDRESS=http://host.docker.internal:9380
until sudo docker exec ragflow-server curl -s --connect-timeout 5 ${HOST_ADDRESS} > /dev/null; do
echo "Waiting for service to be available..."
sleep 5
done
cd sdk/python && poetry install && source .venv/bin/activate && cd test/test_frontend_api && pytest -s --tb=short get_email.py test_dataset.py
- name: Stop ragflow:dev
if: always() # always run this step even if previous steps failed
@@ -89,15 +100,25 @@ jobs:
run: |
sudo DOC_ENGINE=infinity docker compose -f docker/docker-compose.yml up -d
- name: Run tests against Infinity
- name: Run sdk tests against Infinity
run: |
export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
export HOST_ADDRESS=http://host.docker.internal:9380
until sudo docker exec ragflow-server curl -s --connect-timeout 5 ${HOST_ADDRESS} > /dev/null; do
echo "Waiting for service to be available..."
sleep 5
done
cd sdk/python && poetry install && source .venv/bin/activate && cd test/test_sdk_api && pytest -s --tb=short get_email.py t_dataset.py t_chat.py t_session.py t_document.py t_chunk.py
- name: Run frontend api tests against Infinity
run: |
export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
export HOST_ADDRESS=http://host.docker.internal:9380
until sudo docker exec ragflow-server curl -s --connect-timeout 5 ${HOST_ADDRESS} > /dev/null; do
echo "Waiting for service to be available..."
sleep 5
done
cd sdk/python && poetry install && source .venv/bin/activate && cd test && pytest --tb=short t_dataset.py t_chat.py t_session.py t_document.py t_chunk.py
cd sdk/python && poetry install && source .venv/bin/activate && cd test/test_frontend_api && pytest -s --tb=short get_email.py test_dataset.py
- name: Stop ragflow:dev
if: always() # always run this step even if previous steps failed
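The readiness loop that each test step repeats (probe with `curl`, print a message, sleep 5 seconds) can be sketched in Python as follows. This is only an illustration: `wait_until_available`, its `probe` callback, and the `max_attempts` cap are hypothetical names introduced here. The workflow's shell loop has no upper bound, so the cap is an assumption added to keep the sketch from spinning forever.

```python
import time
from typing import Callable


def wait_until_available(
    probe: Callable[[], bool],
    interval: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call probe() until it returns True and return the number of attempts.

    `probe` stands in for the `curl -s --connect-timeout 5 ${HOST_ADDRESS}`
    call in the workflow; `max_attempts` is an assumption, not part of it.
    """
    for attempt in range(1, max_attempts + 1):
        if probe():
            return attempt
        print("Waiting for service to be available...")
        sleep(interval)
    raise TimeoutError(f"service not available after {max_attempts} attempts")


if __name__ == "__main__":
    # Simulated service that becomes reachable on the third probe.
    answers = iter([False, False, True])
    attempts = wait_until_available(lambda: next(answers), interval=0.0)
    print(f"ready after {attempts} attempts")
```

Passing `sleep` as a parameter is a design choice that lets the loop be exercised without real delays.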
1 change: 1 addition & 0 deletions agent/component/__init__.py
@@ -30,6 +30,7 @@
from .akshare import AkShare, AkShareParam
from .crawler import Crawler, CrawlerParam
from .invoke import Invoke, InvokeParam
from .template import Template, TemplateParam


def component_class(class_name):
10 changes: 7 additions & 3 deletions agent/component/base.py
@@ -385,10 +385,14 @@ def __str__(self):
"""
return """{{
"component_name": "{}",
"params": {}
"params": {},
"output": {},
"inputs": {}
}}""".format(self.component_name,
self._param
)
self._param,
json.dumps(json.loads(str(self._param))["output"], ensure_ascii=False),
json.dumps(json.loads(str(self._param))["inputs"], ensure_ascii=False)
)

def __init__(self, canvas, id, param: ComponentParamBase):
self._canvas = canvas
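To see what the extended `__str__` format produces, here is a minimal standalone sketch. `FakeParam` and `render_component` are hypothetical stand-ins, and the sketch assumes, as the hunk implies, that `str(self._param)` is a JSON document containing `output` and `inputs` keys.

```python
import json


class FakeParam:
    """Hypothetical stand-in for a component parameter object whose str() is JSON."""

    def __init__(self):
        self.output = {"content": "hello"}
        self.inputs = [{"component_id": "begin", "content": "hi"}]

    def __str__(self):
        return json.dumps({"output": self.output, "inputs": self.inputs},
                          ensure_ascii=False)


def render_component(component_name: str, param) -> str:
    # Parse the parameter JSON once; the hunk above parses it twice.
    as_dict = json.loads(str(param))
    return """{{
    "component_name": "{}",
    "params": {},
    "output": {},
    "inputs": {}
}}""".format(component_name,
             param,
             json.dumps(as_dict["output"], ensure_ascii=False),
             json.dumps(as_dict["inputs"], ensure_ascii=False))


if __name__ == "__main__":
    rendered = render_component("Template", FakeParam())
    # The rendered string is itself valid JSON.
    print(json.loads(rendered)["output"])
```

Under these assumptions, `output` and `inputs` appear both inside `params` and as top-level keys of the rendered document.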
2 changes: 1 addition & 1 deletion agent/component/generate.py
@@ -145,7 +145,7 @@ def _run(self, history, **kwargs):
else: retrieval_res = pd.DataFrame([])

for n, v in kwargs.items():
prompt = re.sub(r"\{%s\}" % re.escape(n), re.escape(str(v)), prompt)
prompt = re.sub(r"\{%s\}" % re.escape(n), str(v), prompt)

if not self._param.inputs and prompt.find("{input}") >= 0:
retrieval_res = self.get_input()
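The one-line change above swaps `re.escape(str(v))` for `str(v)` as the replacement argument of `re.sub`. The sketch below illustrates the difference. The `fill` helper and its `mode` argument are hypothetical, and the `"literal"` mode is an alternative shown for comparison only; it is not what the commit does.

```python
import re


def fill(prompt: str, name: str, value, mode: str) -> str:
    """Replace `{name}` in `prompt` using one of three replacement styles."""
    pattern = r"\{%s\}" % re.escape(name)
    if mode == "escaped":
        # Behaviour before this commit: the backslashes added by re.escape
        # survive into the output text.
        return re.sub(pattern, re.escape(str(value)), prompt)
    if mode == "raw":
        # Behaviour after this commit: the value is used as a replacement
        # template, so backslash sequences inside it are still interpreted.
        return re.sub(pattern, str(value), prompt)
    # "literal" (hypothetical alternative): a function replacement is
    # inserted verbatim, with no template processing at all.
    return re.sub(pattern, lambda _m: str(value), prompt)


if __name__ == "__main__":
    print(repr(fill("v={x}", "x", "1.5", "escaped")))       # 'v=1\\.5'
    print(repr(fill("v={x}", "x", "1.5", "raw")))           # 'v=1.5'
    print(repr(fill("p={x}", "x", r"C:\temp", "raw")))      # 'p=C:\temp' (a real tab)
    print(repr(fill("p={x}", "x", r"C:\temp", "literal")))  # 'p=C:\\temp'
```

For values without backslashes the `raw` and `literal` modes agree; they differ only when the substituted text contains sequences such as `\t` or `\1`.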
85 changes: 85 additions & 0 deletions agent/component/template.py
@@ -0,0 +1,85 @@
#
# Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import re
from agent.component.base import ComponentBase, ComponentParamBase


class TemplateParam(ComponentParamBase):
    """
    Define the Template component parameters.
    """

    def __init__(self):
        super().__init__()
        self.content = ""
        self.parameters = []

    def check(self):
        self.check_empty(self.content, "[Template] Content")
        return True


class Template(ComponentBase):
    component_name = "Template"

    def get_dependent_components(self):
        cpnts = set([para["component_id"].split("@")[0] for para in self._param.parameters \
                     if para.get("component_id") \
                     and para["component_id"].lower().find("answer") < 0 \
                     and para["component_id"].lower().find("begin") < 0])
        return list(cpnts)

    def _run(self, history, **kwargs):
        content = self._param.content

        self._param.inputs = []
        for para in self._param.parameters:
            if not para.get("component_id"): continue
            component_id = para["component_id"].split("@")[0]
            if para["component_id"].lower().find("@") >= 0:
                cpn_id, key = para["component_id"].split("@")
                for p in self._canvas.get_component(cpn_id)["obj"]._param.query:
                    if p["key"] == key:
                        kwargs[para["key"]] = p.get("value", "")
                        self._param.inputs.append(
                            {"component_id": para["component_id"], "content": kwargs[para["key"]]})
                        break
                else:
                    assert False, f"Can't find parameter '{key}' for {cpn_id}"
                continue

            cpn = self._canvas.get_component(component_id)["obj"]
            if cpn.component_name.lower() == "answer":
                hist = self._canvas.get_history(1)
                if hist:
                    hist = hist[0]["content"]
                else:
                    hist = ""
                kwargs[para["key"]] = hist
                continue

            _, out = cpn.output(allow_partial=False)
            if "content" not in out.columns:
                kwargs[para["key"]] = ""
            else:
                kwargs[para["key"]] = " - "+"\n - ".join([o if isinstance(o, str) else str(o) for o in out["content"]])
            self._param.inputs.append({"component_id": para["component_id"], "content": kwargs[para["key"]]})

        for n, v in kwargs.items():
            content = re.sub(r"\{%s\}" % re.escape(n), str(v), content)

        return Template.be_output(content)
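The three string operations that `Template._run` relies on can be tried in isolation with the sketch below. `split_reference`, `bullet_join`, and `render` are hypothetical helper names; the canvas, component lookup, and `be_output` are deliberately left out. One difference is assumed on purpose: `split_reference` uses `partition`, so it tolerates more than one `@`, whereas the two-name unpacking of `split("@")` in the new file would raise `ValueError` in that case.

```python
import re


def split_reference(component_id: str):
    """Split 'component@key' into (component, key); key is None without '@'."""
    cpn_id, sep, key = component_id.partition("@")
    return cpn_id, (key if sep else None)


def bullet_join(items) -> str:
    """Join upstream output rows into the ' - ' bullet text used by the component."""
    return " - " + "\n - ".join(o if isinstance(o, str) else str(o) for o in items)


def render(content: str, values: dict) -> str:
    """Fill every `{name}` placeholder, mirroring the final loop of `_run`."""
    for name, value in values.items():
        content = re.sub(r"\{%s\}" % re.escape(name), str(value), content)
    return content


if __name__ == "__main__":
    print(split_reference("begin@city"))
    print(render("Hi {name}, sources:\n{docs}",
                 {"name": "Ann", "docs": bullet_join(["a", 2])}))
```

Placeholders with no matching key are left untouched, because the loop only substitutes the names it is given.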

5 changes: 4 additions & 1 deletion api/apps/canvas_app.py
@@ -120,7 +120,10 @@ def sse():
try:
for ans in canvas.run(stream=True):
if ans.get("running_status"):
yield "data:" + json.dumps({"code": 0, "message": "", "data": ans}, ensure_ascii=False) + "\n\n"
yield "data:" + json.dumps({"code": 0, "message": "",
"data": {"answer": ans["content"],
"running_status": True}},
ensure_ascii=False) + "\n\n"
continue
for k in ans.keys():
final_ans[k] = ans[k]
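The reshaped running-status event can be reproduced outside the Flask handler with the sketch below. `sse_running_event` is a hypothetical helper; it assumes only what the hunk shows, namely that `ans` carries a `content` key and that each event is framed as `data:<json>` followed by a blank line.

```python
import json


def sse_running_event(ans: dict) -> str:
    """Build one server-sent event frame for an intermediate (running) answer."""
    payload = {
        "code": 0,
        "message": "",
        # Only the content is forwarded, under the key "answer".
        "data": {"answer": ans["content"], "running_status": True},
    }
    return "data:" + json.dumps(payload, ensure_ascii=False) + "\n\n"


if __name__ == "__main__":
    frame = sse_running_event({"content": "Retrieving...", "running_status": True})
    print(repr(frame))
```

Compared with the previous line, which forwarded the whole `ans` dict as `data`, a client now receives a fixed two-key object for every running-status event.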
4 changes: 1 addition & 3 deletions api/apps/kb_app.py
@@ -167,9 +167,7 @@ def rm():
if not KnowledgebaseService.delete_by_id(req["kb_id"]):
return get_data_error_result(
message="Database error (Knowledgebase removal)!")
tenants = UserTenantService.query(user_id=current_user.id)
for tenant in tenants:
settings.docStoreConn.deleteIdx(search.index_name(tenant.tenant_id), req["kb_id"])
settings.docStoreConn.delete({"kb_id": req["kb_id"]}, search.index_name(kbs[0].tenant_id), req["kb_id"])
return get_json_result(data=True)
except Exception as e:
return server_error_response(e)
2 changes: 2 additions & 0 deletions api/apps/sdk/doc.py
@@ -115,6 +115,7 @@ def upload(dataset_id, tenant_id):
return get_result(
message="No file selected!", code=settings.RetCode.ARGUMENT_ERROR
)
'''
# total size
total_size = 0
for file_obj in file_objs:
@@ -127,6 +128,7 @@ def upload(dataset_id, tenant_id):
message=f"Total file size exceeds 10MB limit! ({total_size / (1024 * 1024):.2f} MB)",
code=settings.RetCode.ARGUMENT_ERROR,
)
'''
e, kb = KnowledgebaseService.get_by_id(dataset_id)
if not e:
raise LookupError(f"Can't find the dataset with ID {dataset_id}!")
3 changes: 2 additions & 1 deletion api/apps/user_app.py
@@ -517,7 +517,8 @@ def user_register(user_id, user):
"llm_name": llm.llm_name,
"model_type": llm.model_type,
"api_key": settings.API_KEY,
"api_base": settings.LLM_BASE_URL,
"api_base": settings.LLM_BASE_URL
#"max_tokens": llm.max_tokens if llm.max_tokens else 8192
}
)

12 changes: 9 additions & 3 deletions docker/.env
@@ -1,5 +1,7 @@
# The type of doc engine to use.
# Supported values are `elasticsearch`, `infinity`.
# Available options:
# - `elasticsearch` (default)
# - `infinity` (https://github.com/infiniflow/infinity)
DOC_ENGINE=${DOC_ENGINE:-elasticsearch}

# ------------------------------
@@ -20,7 +22,7 @@ ES_HOST=es01
ES_PORT=1200

# The password for Elasticsearch.
# When updated, you must revise the `es.password` entry in service_conf.yaml accordingly.
# When updated, you must revise the `es.password` entry in service_conf.yaml accordingly.
ELASTIC_PASSWORD=infini_rag_flow

# The port used to expose the Kibana service to the host machine,
@@ -85,7 +87,7 @@ RAGFLOW_IMAGE=infiniflow/ragflow:dev-slim
# RAGFLOW_IMAGE=infiniflow/ragflow:dev
#
# The Docker image of the dev edition includes:
# - Embedded embedding models:
# - Built-in embedding models:
# - BAAI/bge-large-zh-v1.5
# - BAAI/bge-reranker-v2-m3
# - maidalun1020/bce-embedding-base_v1
@@ -123,3 +125,7 @@ TIMEZONE='Asia/Shanghai'
# Optimizations for MacOS
# Uncomment the following line if your OS is MacOS:
# MACOS=1

# The maximum file size for each uploaded file, in bytes.
# You can uncomment this line and update the value if you wish to change the 128M file size limit.
# MAX_CONTENT_LENGTH=134217728
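As a quick check on the numbers, 128M corresponds to 128 × 1024 × 1024 = 134217728 bytes. The sketch below shows one way a service could read the variable with that default. How RAGFlow itself consumes `MAX_CONTENT_LENGTH` is not shown in this diff, so `max_content_length` and its fallback behaviour are assumptions.

```python
import os

# 128 MiB expressed in bytes, matching the commented-out value in docker/.env.
DEFAULT_MAX_CONTENT_LENGTH = 128 * 1024 * 1024


def max_content_length(env=None) -> int:
    """Return the per-file upload limit in bytes (hypothetical helper).

    Falls back to the 128M default when the variable is unset or blank.
    """
    env = os.environ if env is None else env
    raw = env.get("MAX_CONTENT_LENGTH", "")
    return int(raw) if raw.strip() else DEFAULT_MAX_CONTENT_LENGTH


if __name__ == "__main__":
    print(max_content_length({}))
    print(max_content_length({"MAX_CONTENT_LENGTH": "1048576"}))
```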
7 changes: 6 additions & 1 deletion docker/README.md
@@ -77,7 +77,7 @@ The [.env](./.env) file contains important environment variables for Docker.

- `infiniflow/ragflow:dev-slim` (default): The RAGFlow Docker image without embedding models.
- `infiniflow/ragflow:dev`: The RAGFlow Docker image with embedding models including:
- Embedded embedding models:
- Built-in embedding models:
- `BAAI/bge-large-zh-v1.5`
- `BAAI/bge-reranker-v2-m3`
- `maidalun1020/bce-embedding-base_v1`
@@ -117,6 +117,11 @@ The [.env](./.env) file contains important environment variables for Docker.
- `MACOS`
Optimizations for MacOS. It is disabled by default. You can uncomment this line if your OS is MacOS.

### Maximum file size

- `MAX_CONTENT_LENGTH`
The maximum file size for each uploaded file, in bytes. You can uncomment this line if you wish to change the 128M file size limit.

## 🐋 Service configuration

[service_conf.yaml](./service_conf.yaml) specifies the system-level configuration for RAGFlow and is used by its API server and task executor. In a dockerized setup, this file is automatically created based on the [service_conf.yaml.template](./service_conf.yaml.template) file (replacing all environment variables by their values).
6 changes: 3 additions & 3 deletions docs/configurations.md
@@ -64,7 +64,7 @@ The [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) file con
### MySQL

- `MYSQL_PASSWORD`
The password for MySQL.
The password for MySQL.
- `MYSQL_PORT`
The port used to expose the MySQL service to the host machine, allowing **external** access to the MySQL database running inside the Docker container. Defaults to `5455`.

@@ -75,7 +75,7 @@ The [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) file con
- `MINIO_PORT`
The port used to expose the MinIO API service to the host machine, allowing **external** access to the MinIO object storage service running inside the Docker container. Defaults to `9000`.
- `MINIO_USER`
The username for MinIO.
The username for MinIO.
- `MINIO_PASSWORD`
The password for MinIO.

@@ -95,7 +95,7 @@ The [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) file con

- `infiniflow/ragflow:dev-slim` (default): The RAGFlow Docker image without embedding models.
- `infiniflow/ragflow:dev`: The RAGFlow Docker image with embedding models including:
- Embedded embedding models:
- Built-in embedding models:
- `BAAI/bge-large-zh-v1.5`
- `BAAI/bge-reranker-v2-m3`
- `maidalun1020/bce-embedding-base_v1`
4 changes: 2 additions & 2 deletions docs/quickstart.mdx
@@ -286,8 +286,8 @@ Once you have selected an embedding model and used it to parse a file, you are n
_When the file parsing completes, its parsing status changes to **SUCCESS**._

:::caution NOTE
- If your file parsing gets stuck at below 1%, see [FAQ 4.3](https://ragflow.io/docs/dev/faq#43-why-does-my-document-parsing-stall-at-under-one-percent).
- If your file parsing gets stuck at near completion, see [FAQ 4.4](https://ragflow.io/docs/dev/faq#44-why-does-my-pdf-parsing-stall-near-completion-while-the-log-does-not-show-any-error)
- If your file parsing gets stuck at below 1%, see [this FAQ](https://ragflow.io/docs/dev/faq#why-does-my-document-parsing-stall-at-under-one-percent).
- If your file parsing gets stuck at near completion, see [this FAQ](https://ragflow.io/docs/dev/faq#why-does-my-pdf-parsing-stall-near-completion-while-the-log-does-not-show-any-error).
:::

## Intervene with file parsing