web_base loader is broken #11095

DmitryKatson · 2023-09-27T04:57:02Z

System Info

Name: langchain
Version: 0.0.299

Who can help?

@eyurtsev @hwchase17

Information

The official example notebooks/scripts
My own modified scripts

Related Components

Reproduction

Just follow https://python.langchain.com/docs/integrations/document_loaders/web_base

from langchain.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://www.espn.com/")
data = loader.load()

Expected behavior

The standard WebBaseLoader is broken when web_path is passed as a single string, as described in the docs:
loader = WebBaseLoader("https://www.espn.com/")

However, it works if we pass the URL inside a list:
loader = WebBaseLoader(["https://www.espn.com/"])

The reason for that is this commit

Because of this breaking change, the other custom web loaders are broken as well: IMSDbLoader, AZLyricsLoader, and CollegeConfidentialLoader.


dosubot · 2023-09-27T05:04:48Z

🤖

Hello,

Thank you for bringing this issue to our attention. It seems like the problem is due to the way the web_paths attribute is set in the __init__ method of the WebBaseLoader class. If web_path is a string, it is not considered a Sequence and hence, it is not converted to a list. Instead, it is directly assigned to self.web_paths. This means that self.web_paths becomes a string instead of a list of strings. This could cause issues in methods like lazy_load, load, and aload where self.web_paths is iterated over, expecting a list of strings.

This issue would affect the functionality of other custom web loaders like IMSDbLoader, AZLyricsLoader, and CollegeConfidentialLoader if they inherit from WebBaseLoader and use the web_paths attribute, as they would also face the same issue when a single webpath is passed as a string.

To fix this issue, you could modify the __init__ method to always convert web_path to a list, even if it's a single string:

if web_paths:
    self.web_paths = list(web_paths)
else:
    self.web_paths = [web_path] if isinstance(web_path, str) else list(web_path)

This way, self.web_paths will always be a list of strings, whether web_path is a single string or a sequence of strings.
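The normalization described above can be sketched as a standalone function. This is a minimal illustration, not the library's actual code: `normalize_web_paths` is a hypothetical helper name, and the parameter defaults are assumptions made so the snippet runs on its own.

```python
from typing import List, Sequence, Union


def normalize_web_paths(
    web_path: Union[str, Sequence[str]] = "",
    web_paths: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: return a list of URLs however they were passed."""
    if web_paths:
        return list(web_paths)
    # Check for str first: a str is itself a sequence, so list("abc")
    # would split it into single characters.
    if isinstance(web_path, str):
        return [web_path] if web_path else []
    return list(web_path)


print(normalize_web_paths("https://www.espn.com/"))
print(normalize_web_paths(["https://www.espn.com/"]))
```

Both calls print the same one-element list, which is the behavior the docs example relies on.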

As for the commit you mentioned, I'm unable to find the specific changes that could have caused the WebBaseLoader to break. Could you please provide more details about the changes in that commit?

Thank you for your patience and cooperation.


DmitryKatson · 2023-09-27T05:16:41Z

This is not very useful, as the web_base loader doesn't work the way the docs describe. The team should fix the web_base loader, or change the docs and all dependent loaders.

JahangirJadi · 2023-09-27T08:01:58Z

from langchain.document_loaders import WebBaseLoader
from langchain.indexes import VectorstoreIndexCreator

url = 'https://lilianweng.github.io/posts/2023-06-23-agent/'
loader = WebBaseLoader(url)

This made me crazy.

loader.load not working. Please fix it.

error:
436 raise InvalidURL(*e.args)
438 if not scheme:
--> 439 raise MissingSchema(
440 f"Invalid URL {url!r}: No scheme supplied. "
441 f"Perhaps you meant https://{url}?"
442 )
444 if not host:
445 raise InvalidURL(f"Invalid URL {url!r}: No host supplied")

MissingSchema: Invalid URL 'h': No scheme supplied. Perhaps you meant https://h?
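The 'h' in the error message suggests that the URL string was iterated character by character, so only its first character reached the HTTP layer. The snippet below is plain Python with no LangChain involved. It shows why a string is easy to mistake for a list of URLs: a str passes a Sequence check, and iterating it yields single characters.

```python
from collections.abc import Sequence

url = "https://lilianweng.github.io/posts/2023-06-23-agent/"

# A str counts as a Sequence, so this check alone cannot tell
# a single URL apart from a list of URLs.
is_sequence = isinstance(url, Sequence)

# Converting (or iterating) a str yields its characters one by one.
first_item = list(url)[0]

print(is_sequence)  # True
print(first_item)   # h
```

Any loop over the stored value would therefore try to fetch 'h', then 't', and so on, which matches the reported MissingSchema error.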

DmitryKatson · 2023-09-27T08:46:37Z


@JahangirJadi
As a quick workaround, this works:

url = 'https://lilianweng.github.io/posts/2023-06-23-agent/'
loader = WebBaseLoader([url])

selfmeister · 2023-09-28T17:26:23Z

worked for me:

if isinstance(web_path, str):
    self.web_paths = [web_path]
else:
    self.web_paths = list(web_paths)


dosubot · 2023-12-28T16:02:23Z

Hi, @DmitryKatson

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, the issue was raised because WebBaseLoader breaks when web_path is passed as described in the documentation. You and other users discussed potential fixes and workarounds, but the issue remains unresolved.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you for your understanding and contributions.


dosubot bot added the labels "Ɑ: doc loader" (related to document loader module, not documentation), "🤖:bug" (related to a bug, vulnerability, unexpected error with an existing feature), and "🤖:docs" (changes to documentation and examples) on Sep 27, 2023

This was referenced Sep 27, 2023

Fix web-base loader #11134

Closed

Fix web-base loader #11135

Merged

mrtj mentioned this issue Sep 28, 2023

WebBaseLoader interprets incorrectly the web_path parameter #11180

Closed

14 tasks

finaldie mentioned this issue Sep 28, 2023

[Article & RSS] Robust saving phase finaldie/auto-news#31

Merged

baskaryan pushed a commit that referenced this issue Sep 29, 2023

Fix web-base loader (#11135)

2c114fc

Fix initialization #11095

dosubot bot added the "stale" label (issue has not had recent activity or appears to be solved) on Dec 28, 2023

dosubot bot closed this as not planned on Jan 30, 2024

dosubot bot removed the "stale" label on Jan 30, 2024
