
Fix Dataprep Upload Link issue #913

Merged: 9 commits, Nov 19, 2024


letonghan
Collaborator

Description

Fix the HTML content loading problem in dataprep.
Use LangChain's AsyncHtmlLoader to load and parse HTML content.
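The PR body names only the loader, so the flow it describes (download each uploaded link, then turn the raw HTML into plain text for indexing) is sketched below under stated assumptions. In LangChain, `AsyncHtmlLoader` (from `langchain_community.document_loaders`) fetches pages concurrently and returns documents whose content is raw HTML; it is commonly paired with `Html2TextTransformer` (from `langchain_community.document_transformers`), which depends on the `html2text` package. To stay dependency-free, this sketch swaps both for the Python standard library (`asyncio`, `urllib.request`, `html.parser`). It is not the PR's code, and `load_links_as_text` and `html_to_text` are hypothetical names.

```python
import asyncio
import urllib.request
from html.parser import HTMLParser

# Tags whose text should never appear in the output.
SKIP_TAGS = {"script", "style"}
# Tags treated as line breaks so block-level text is not glued together.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "title", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects visible text chunks from an HTML document."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Hypothetical helper: turn HTML into plain text, one block per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def _fetch(url: str, timeout: float = 10.0) -> str:
    """Blocking download of one URL, decoded with the server's charset."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


async def _fetch_all(urls: list[str]) -> list[str]:
    """Download all URLs concurrently in worker threads, keeping input order."""
    return list(await asyncio.gather(*(asyncio.to_thread(_fetch, u) for u in urls)))


def load_links_as_text(urls: list[str]) -> list[str]:
    """Hypothetical entry point: fetch each uploaded link and return its text."""
    if not urls:  # nothing uploaded, so skip the event loop entirely
        return []
    pages = asyncio.run(_fetch_all(urls))
    return [html_to_text(page) for page in pages]
```

For example, `load_links_as_text(["https://example.com"])` returns one plain-text string per link. `asyncio.run` raises `RuntimeError` if it is called from inside an already-running event loop (for example, an async web handler); in that case, await `_fetch_all` directly. Unlike `html2text`, which emits Markdown, this sketch drops links and formatting.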

Issues

HTML content could not be retrieved after uploading HTML links.

Type of change


  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

Add html2text to requirements.txt.

Tests

Tested locally.

pre-commit-ci bot and others added 8 commits November 18, 2024 09:52
* Add outputs.

Signed-off-by: ZePan110 <[email protected]>

* Add empty list check

Signed-off-by: ZePan110 <[email protected]>

* test CI.

Signed-off-by: ZePan110 <[email protected]>

* Remove test files

Signed-off-by: ZePan110 <[email protected]>

* remove debug code

Signed-off-by: chensuyue <[email protected]>

---------

Signed-off-by: ZePan110 <[email protected]>
Signed-off-by: chensuyue <[email protected]>
Co-authored-by: chensuyue <[email protected]>
Signed-off-by: letonghan <[email protected]>
chensuyue added this to the v1.1 milestone Nov 19, 2024
chensuyue
Collaborator

chensuyue commented Nov 19, 2024

The remaining issue in the VDMS microservice will be tracked in a separate PR.
@srinarayan-srikanthan will submit the fix.

lvliang-intel merged commit 1bfc430 into opea-project:main Nov 19, 2024
19 of 21 checks passed
chensuyue mentioned this pull request Nov 19, 2024
cameronmorin pushed a commit to opea-aws-proserve/GenAIComps that referenced this pull request Nov 22, 2024
* fix html content loading problem

Signed-off-by: letonghan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add empty list check (opea-project#914)

* Add outputs.

Signed-off-by: ZePan110 <[email protected]>

* Add empty list check

Signed-off-by: ZePan110 <[email protected]>

* test CI.

Signed-off-by: ZePan110 <[email protected]>

* Remove test files

Signed-off-by: ZePan110 <[email protected]>

* remove debug code

Signed-off-by: chensuyue <[email protected]>

---------

Signed-off-by: ZePan110 <[email protected]>
Signed-off-by: chensuyue <[email protected]>
Co-authored-by: chensuyue <[email protected]>

* Fix hardware tag retrieval issue (opea-project#916)

Signed-off-by: ZePan110 <[email protected]>

* fix html content loading problem

Signed-off-by: letonghan <[email protected]>

* fix milvus connection issue

Signed-off-by: letonghan <[email protected]>

* update parse_html function for all dbs

Signed-off-by: letonghan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: letonghan <[email protected]>
Signed-off-by: ZePan110 <[email protected]>
Signed-off-by: chensuyue <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ZePan110 <[email protected]>
Co-authored-by: chensuyue <[email protected]>
cameronmorin pushed a commit to opea-aws-proserve/GenAIComps that referenced this pull request Nov 28, 2024
cameronmorin pushed a commit to opea-aws-proserve/GenAIComps that referenced this pull request Dec 2, 2024
Signed-off-by: Cameron Morin <[email protected]>
letonghan deleted the dataprep/upload_link branch December 19, 2024 08:08
Labels
None yet
Projects
None yet

4 participants