We have a site set up in the scraper config that is hosted on a non-standard HTTP/HTTPS port (3000). When start_urls is set to a hostname with a port, e.g. http://my-host:3000/, the scraper fails with a message saying that it does not accept domains with ports. The old Algolia scraper configs appear to have supported ports, so I assume this is related to an update to the Scrapy package used in this fork.
Steps to reproduce
Build and run a Docusaurus site locally, serving on http://localhost:3000
Update the DocSearch config to set start_urls: "start_urls":["http://localhost:3000/"]
Run the DocSearch scraper
Expected Behavior
Site is scraped and uploaded to Typesense server
Actual Behavior
Error returned from scraper:
PortWarning: allowed_domains accepts only domains without ports. Ignoring entry localhost:3000 in allowed_domains.
warnings.warn(message, PortWarning)
Metadata
Typesense Version:
Docker images:
typesense/typesense:0.24.1
typesense/docsearch-scraper:0.6.0
OS: Linux
typesense-docsearch-scraper has all the commits from algolia-docsearch-scraper up to Dec 22, 2020. I don't see any updates in the algolia scraper since then where this port limitation was addressed...
Also, the master branch of Scrapy still contains that warning about ports not being allowed in allowed_domains, so this limitation still exists as of today.
So I'm surprised to see a config in the docsearch scraper configs repo with a port number!
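One possible workaround is to strip the port before a host reaches allowed_domains. The following is only a minimal sketch: it assumes the scraper derives allowed_domains from the hosts in start_urls, which is not confirmed anywhere in this thread. The helper name allowed_domain_from_url and the second URL are hypothetical and used for illustration only.

```python
from urllib.parse import urlsplit


def allowed_domain_from_url(url: str) -> str:
    """Return the bare hostname of `url`, with any port removed.

    Hypothetical helper, not part of Scrapy or the DocSearch scraper.
    """
    host = urlsplit(url).hostname  # .hostname excludes the ":3000" part
    if host is None:
        raise ValueError(f"no hostname in URL: {url!r}")
    return host


# Example start_urls as in the report above (the second one is made up).
start_urls = ["http://localhost:3000/", "http://my-host:3000/docs/"]

# Deduplicate and sort so the result is stable.
allowed_domains = sorted({allowed_domain_from_url(u) for u in start_urls})
print(allowed_domains)  # ['localhost', 'my-host']
```

As far as I can tell, Scrapy's offsite filtering compares only the hostname of each request, so an entry such as localhost should still allow requests to localhost:3000. That is worth verifying against the Scrapy version bundled in the Docker image before relying on it.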