Commit

feat: load all namespaces (langchain-ai#13549)
- **Description:** This change makes `MWDumpLoader` load all namespaces,
including custom ones, by default, instead of loading only the
[default namespaces](https://www.mediawiki.org/wiki/Help:Namespaces#Localisation).
  - **Tag maintainer:** @hwchase17
andstu authored and amiaxys committed Nov 23, 2023
1 parent a516099 commit 93a2dd2
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions libs/langchain/langchain/document_loaders/mediawikidump.py
@@ -55,7 +55,7 @@ def __init__(
         self.file_path = file_path if isinstance(file_path, str) else str(file_path)
         self.encoding = encoding
         # Namespaces range from -2 to 15, inclusive.
-        self.namespaces = namespaces or list(range(-2, 16))
+        self.namespaces = namespaces
         self.skip_redirects = skip_redirects
         self.stop_on_error = stop_on_error

@@ -76,7 +76,7 @@ def load(self) -> List[Document]:
         for page in dump.pages:
             if self.skip_redirects and page.redirect:
                 continue
-            if page.namespace not in self.namespaces:
+            if self.namespaces and page.namespace not in self.namespaces:
                 continue
             try:
                 for revision in page:
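
The effect of the two changed lines can be illustrated in isolation. The following is a minimal sketch, not the loader itself: `Page`, `filter_pages`, and the sample pages are hypothetical stand-ins introduced here, and the namespace ID 3000 is only an assumed example of a custom namespace. It mirrors the new check, in which a `namespaces` value of `None` (or an empty list) disables namespace filtering entirely.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence


@dataclass
class Page:
    """Hypothetical stand-in for a dump page; only the fields the filter reads."""

    title: str
    namespace: int
    redirect: Optional[str] = None


def filter_pages(
    pages: Iterable[Page],
    namespaces: Optional[Sequence[int]] = None,
    skip_redirects: bool = False,
) -> List[Page]:
    """Keep pages the same way the updated loop in the diff does."""
    kept = []
    for page in pages:
        if skip_redirects and page.redirect:
            continue
        # New behavior: a falsy `namespaces` (None or []) means "keep everything".
        if namespaces and page.namespace not in namespaces:
            continue
        kept.append(page)
    return kept


# Assumed sample data; 3000 stands in for a custom namespace ID.
pages = [
    Page("Main article", 0),
    Page("Talk:Main article", 1),
    Page("Custom:Page", 3000),
    Page("Old name", 0, redirect="Main article"),
]

# The old default, list(range(-2, 16)), would have excluded the custom page.
old_default = list(range(-2, 16))
all_titles = [p.title for p in filter_pages(pages)]
old_titles = [p.title for p in filter_pages(pages, namespaces=old_default)]
main_only = [p.title for p in filter_pages(pages, namespaces=[0], skip_redirects=True)]
```

Passing an explicit list still restricts the result as before, so callers who relied on the old default would need to pass `list(range(-2, 16))` themselves to keep the previous behavior.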
