
Failed to connect to the browser instance, will retry in 5 secs #674

Open
snowdream opened this issue Nov 19, 2024 · 19 comments
Labels
bug Something isn't working

Comments

@snowdream

Describe the Bug

https://docs.hoarder.app/Installation/docker

I tried to run Hoarder with Docker Compose, but it failed.

(screenshots of the error attached)

Steps to Reproduce

  1. Create `.env`:
HOARDER_VERSION=release
NEXTAUTH_SECRET=super_random_string
MEILI_MASTER_KEY=another_random_string
NEXTAUTH_URL=http://localhost:3000
  2. Create `docker-compose.yml`:
version: "3.8"
services:
  web:
    image: ghcr.io/hoarder-app/hoarder:${HOARDER_VERSION:-release}
    restart: unless-stopped
    volumes:
      - data:/data
    ports:
      - 3000:3000
    env_file:
      - .env
    environment:
      MEILI_ADDR: http://meilisearch:7700
      BROWSER_WEB_URL: http://chrome:9222
      # OPENAI_API_KEY: ...
      DATA_DIR: /data
  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123
    restart: unless-stopped
    command:
      - --no-sandbox
      - --disable-gpu
      - --disable-dev-shm-usage
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --hide-scrollbars
  meilisearch:
    image: getmeili/meilisearch:v1.11.1
    restart: unless-stopped
    env_file:
      - .env
    environment:
      MEILI_NO_ANALYTICS: "true"
    volumes:
      - meilisearch:/meili_data

volumes:
  meilisearch:
  data:
  3. Run `docker compose up -d`

Expected Behaviour

http://localhost:3000/ loads correctly.

Screenshots or Additional Context

(screenshot attached)

Device Details

Microsoft Edge version 131.0.2903.48 (official build) (x86_64) on macOS

Exact Hoarder Version

release

@Azhelor

Azhelor commented Nov 19, 2024

I have a similar error with the latest Hoarder version. The app can be used, but when I add a bookmark, it can't retrieve any image or description.

@Crush-RY

Same issue here.

@kamtschatka
Contributor

Is everyone using Docker Desktop? We have seen before that networking works differently on e.g. Windows vs. Linux.

@Crush-RY

My deployment system is Linux, and this is my config file:

version: "3.8"
networks:
  traefiknet:
    external: true
services:
  web:
    image: ghcr.io/hoarder-app/hoarder:release
    restart: unless-stopped
    container_name: hoarder
    volumes:
      - /opt/mydocker/hoarder/data:/data
    ports:
      - 54110:3000
    env_file:
      - .env
    networks:
      - traefiknet
    labels:
      - traefik.docker.network=traefiknet
      - traefik.enable=true
      - traefik.http.routers.hoarder.rule=Host(`hoarder.my.domain`)
      - traefik.http.routers.hoarder.entrypoints=http,https
      - traefik.http.routers.hoarder.priority=10
      - traefik.http.routers.hoarder.tls=true
      - traefik.http.services.hoarder.loadbalancer.server.port=3000
      - traefik.http.routers.hoarder.tls.certresolver=mycloudflare

  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123
    restart: unless-stopped
    container_name: chrome
    command:
      - --no-sandbox
      - --disable-gpu
      - --disable-dev-shm-usage
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --hide-scrollbars
    networks:
      - traefiknet
  meilisearch:
    image: getmeili/meilisearch:v1.11.1
    restart: unless-stopped
    container_name: meilisearch
    env_file:
      - .env
    environment:
      MEILI_NO_ANALYTICS: "true"
    volumes:
      - /opt/mydocker/hoarder/meilisearch:/meili_data
    networks:
      - traefiknet

@MohamedBassem
Collaborator

@Crush-RY can you share the logs from the web container?

@MohamedBassem
Collaborator

Hmmm, it seems like there are multiple people hitting this now. So I'll label this as a bug until we figure out what's going on.

@MohamedBassem
Collaborator

Was anyone running Hoarder before and only hit this problem after an upgrade, or are these all new installations?

@MohamedBassem
Collaborator

I've just pushed 393d097 to log more details on the connection failure reason. It'll take about 15 minutes for the container to be built. Once it's built, can someone switch to the nightly build and capture the error for me?

@Azhelor

Azhelor commented Nov 21, 2024

Sure, here it is:

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service init-db-migration: starting
Running db migration script
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service init-db-migration successfully started
s6-rc: info: service svc-workers: starting
s6-rc: info: service svc-web: starting
s6-rc: info: service svc-workers successfully started
s6-rc: info: service svc-web successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
  ▲ Next.js 14.2.13
  - Local:        http://localhost:3000
  - Network:      http://0.0.0.0:3000

 ✓ Starting...
 ✓ Ready in 411ms

> @hoarder/[email protected] start:prod /app/apps/workers
> tsx index.ts

2024-11-21T22:48:49.735Z info: Workers version: nightly
2024-11-21T22:48:49.748Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
(node:121) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2024-11-21T22:48:49.763Z info: [Crawler] Successfully resolved IP address, new address: http://172.21.0.3:9222/
(node:69) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)

(process:69): VIPS-WARNING **: 22:49:40.996: threads clipped to 1024
2024-11-21T22:51:10.022Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs: FetchError: request to https://raw.githubusercontent.com/cliqz-oss/adblocker/master/packages/adblocker/assets/easylist/easylist.txt failed, reason: getaddrinfo EAI_AGAIN raw.githubusercontent.com
    at ClientRequest.<anonymous> (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/node-fetch/lib/index.js:1501:11)
    at ClientRequest.emit (node:events:518:28)
    at ClientRequest.emit (node:domain:489:12)
    at emitErrorEvent (node:_http_client:103:11)
    at TLSSocket.socketErrorListener (node:_http_client:506:5)
    at TLSSocket.emit (node:events:518:28)
    at TLSSocket.emit (node:domain:489:12)
    at emitErrorNT (node:internal/streams/destroy:170:8)
    at emitErrorCloseNT (node:internal/streams/destroy:129:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)
2024-11-21T22:51:10.023Z info: Starting crawler worker ...
2024-11-21T22:51:10.025Z info: Starting inference worker ...
2024-11-21T22:51:10.026Z info: Starting search indexing worker ...
2024-11-21T22:51:10.027Z info: Starting tidy assets worker ...
2024-11-21T22:51:10.028Z info: Starting video worker ...
2024-11-21T22:51:10.029Z info: Starting feed worker ...
2024-11-21T22:51:10.171Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:10.171Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:15.023Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2024-11-21T22:51:15.174Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:15.224Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:15.249Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:15.249Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:20.173Z error: [search][20] search job failed: MeiliSearchCommunicationError: fetch failed
MeiliSearchCommunicationError: fetch failed
    at node:internal/deps/undici/undici:13392:13
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
2024-11-21T22:51:20.251Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:20.251Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:20.273Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:20.273Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:25.274Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:25.275Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:25.303Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:25.303Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:30.214Z error: [search][20] search job failed: MeiliSearchCommunicationError: fetch failed
MeiliSearchCommunicationError: fetch failed
    at node:internal/deps/undici/undici:13392:13
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
2024-11-21T22:51:30.304Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:30.304Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:30.325Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:30.326Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:35.326Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:35.327Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:35.373Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:35.374Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:40.243Z error: [search][20] search job failed: MeiliSearchCommunicationError: fetch failed
MeiliSearchCommunicationError: fetch failed
    at node:internal/deps/undici/undici:13392:13
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
2024-11-21T22:51:40.374Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:40.374Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578) 

Hope that helps.

And to answer your first question: I've only been running Hoarder for a few days, and this error has been there from the beginning.

@MohamedBassem
Collaborator

Yeah, this is actually very helpful. I think I know how I can fix that!

@MohamedBassem
Collaborator

So basically what's happening here is that, for one reason or another (your network policies, GitHub being blocked, etc.), Hoarder is failing to download the adblock list used in the crawler. I've sent 378ad9b to ensure that this doesn't block worker startup. In your case, you might also want to set CRAWLER_ENABLE_ADBLOCKER=false so that the always-failing download doesn't delay worker startup each time. Can you give it a try once the container is built?
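For anyone following along, the suggested workaround is a one-line addition to the same `.env` file created in the reproduction steps above (a sketch; the variable name comes from the comment above, and the file is already passed to the `web` service via `env_file`):

```shell
# Skip downloading the adblock list at worker startup
# (useful when raw.githubusercontent.com is unreachable from the container).
CRAWLER_ENABLE_ADBLOCKER=false
```

After editing `.env`, recreate the container (`docker compose up -d`) so the new variable takes effect.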

@Azhelor

Azhelor commented Nov 22, 2024

Thanks again for your very quick answer. I tried the fix you pushed and also added the line you suggested to the .env file, but unfortunately it does not work.

Here are the logs:

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service init-db-migration: starting
Running db migration script
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service init-db-migration successfully started
s6-rc: info: service svc-workers: starting
s6-rc: info: service svc-web: starting
s6-rc: info: service svc-workers successfully started
s6-rc: info: service svc-web successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
  ▲ Next.js 14.2.13
  - Local:        http://localhost:3000
  - Network:      http://0.0.0.0:3000

 ✓ Starting...
 ✓ Ready in 358ms

> @hoarder/[email protected] start:prod /app/apps/workers
> tsx index.ts

(node:69) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2024-11-22T00:50:20.170Z info: Workers version: nightly
2024-11-22T00:50:20.182Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
(node:121) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2024-11-22T00:50:20.199Z info: [Crawler] Successfully resolved IP address, new address: http://172.21.0.3:9222/
2024-11-22T00:50:20.324Z info: Starting crawler worker ...
2024-11-22T00:50:20.325Z info: Starting inference worker ...
2024-11-22T00:50:20.325Z info: Starting search indexing worker ...
2024-11-22T00:50:20.326Z info: Starting tidy assets worker ...
2024-11-22T00:50:20.326Z info: Starting video worker ...
2024-11-22T00:50:20.326Z info: Starting feed worker ...
2024-11-22T00:50:20.365Z info: [Crawler][22] Will crawl "https://www.wikipedia.org/" for link with id "m2wi6yovvkafmnjegdic7b6c"
2024-11-22T00:50:20.365Z info: [Crawler][22] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:50:20.462Z info: [search][23] Attempting to index bookmark with id m2wi6yovvkafmnjegdic7b6c ...
2024-11-22T00:50:20.594Z info: [search][23] Completed successfully
2024-11-22T00:50:25.370Z error: [Crawler][22] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.
s [TRPCError]: Bookmark not found
    at /app/apps/web/.next/server/chunks/6815.js:1:16914
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async t (/app/apps/web/.next/server/chunks/440.js:4:32333)
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async t (/app/apps/web/.next/server/chunks/440.js:4:33299)
    at async /app/apps/web/.next/server/app/api/trpc/[trpc]/route.js:1:4379
    at async Promise.all (index 1) {
  code: 'NOT_FOUND',
  [cause]: undefined
}
2024-11-22T00:50:34.670Z info: [search][24] Attempting to index bookmark with id m2wi6yovvkafmnjegdic7b6c ...
2024-11-22T00:50:34.758Z info: [search][24] Completed successfully
2024-11-22T00:50:35.751Z error: [Crawler][22] Crawling job failed: Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
    at navigate (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:171:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Deferred.race (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:36:20)
    at async CdpFrame.goto (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:137:25)
    at async CdpPage.goto (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:590:20)
    at async crawlPage (/app/apps/workers/crawlerWorker.ts:3:1456)
    at async crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7607)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10740)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.772Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.790Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.807Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.826Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.847Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)

(process:69): VIPS-WARNING **: 00:50:42.563: threads clipped to 1024
2024-11-22T00:50:42.890Z info: [Crawler][25] Will crawl "https://www.wikipedia.org/" for link with id "c77a1dclbtoswxfg1dehix2z"
2024-11-22T00:50:42.891Z info: [Crawler][25] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:50:42.909Z info: [search][26] Attempting to index bookmark with id c77a1dclbtoswxfg1dehix2z ...
2024-11-22T00:50:42.989Z info: [search][26] Completed successfully
2024-11-22T00:50:47.893Z error: [Crawler][25] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.
2024-11-22T00:50:58.038Z error: [Crawler][25] Crawling job failed: Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
    at navigate (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:171:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Deferred.race (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:36:20)
    at async CdpFrame.goto (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:137:25)
    at async CdpPage.goto (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:590:20)
    at async crawlPage (/app/apps/workers/crawlerWorker.ts:3:1456)
    at async crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7607)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10740)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:58.060Z info: [Crawler][25] Will crawl "https://www.wikipedia.org/" for link with id "c77a1dclbtoswxfg1dehix2z"
2024-11-22T00:50:58.060Z info: [Crawler][25] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:51:03.061Z error: [Crawler][25] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.
2024-11-22T00:51:13.201Z error: [Crawler][25] Crawling job failed: Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
    at navigate (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:171:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Deferred.race (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:36:20)
    at async CdpFrame.goto (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:137:25)
    at async CdpPage.goto (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:590:20)
    at async crawlPage (/app/apps/workers/crawlerWorker.ts:3:1456)
    at async crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7607)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10740)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:51:13.224Z info: [Crawler][25] Will crawl "https://www.wikipedia.org/" for link with id "c77a1dclbtoswxfg1dehix2z"
2024-11-22T00:51:13.224Z info: [Crawler][25] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:51:18.225Z error: [Crawler][25] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.

@MohamedBassem
Collaborator

MohamedBassem commented Nov 22, 2024

OK, now it's clear that you have some DNS/internet problems in the container :) Basically, your container can't resolve DNS, which the crawler requires. This is not a Hoarder problem at this point.
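A quick way to confirm a diagnosis like this (a sketch, assuming the compose service is named `web` as in the compose file above and that the image provides `getent`):

```shell
# Can the container resolve an external hostname?
docker compose exec web getent hosts raw.githubusercontent.com

# Compare with a compose-internal service name, which resolves
# via Docker's embedded DNS rather than upstream resolvers:
docker compose exec web getent hosts chrome

# If the internal name resolves but the external one fails (EAI_AGAIN
# in the logs above), the container reaches Docker's DNS but upstream
# resolution is broken.
```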

@snowdream
Author

As you know, I am in China.

Does Hoarder access any API that I might not be able to reach?

@Azhelor

Azhelor commented Nov 22, 2024

ok, now it's clear that you have some dns/internet problems in the container :) Basically your container can't resolve dns and this is required for the crawler to work. This is not a hoarder problem at this point.

Alright, I will investigate on my side, thank you for your help!

@Crush-RY

I'm also in China, and as you mentioned, it turns out to be a network issue. I tried deploying hoarder on a VPS without network restrictions, and it worked perfectly.

Thanks a lot for your help!

@Azhelor

Azhelor commented Nov 23, 2024

I finally managed to fix the error, and it was indeed caused by a bad default Docker configuration. For people facing the same issue, here are the steps I followed: https://stackoverflow.com/questions/39400886/docker-cannot-resolve-dns-on-private-network and then I restarted my server.
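For reference, that kind of fix amounts to giving the Docker daemon explicit upstream resolvers (a sketch; 8.8.8.8 and 1.1.1.1 are placeholder DNS servers, substitute resolvers that are reachable from your network):

```shell
# Write /etc/docker/daemon.json with explicit DNS servers,
# then restart the daemon so containers pick them up.
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "dns": ["8.8.8.8", "1.1.1.1"]
}
EOF
sudo systemctl restart docker
```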

@AlotOfBlahaj

AlotOfBlahaj commented Nov 25, 2024

In my situation, the web container was not included in the network that chrome and meilisearch were in.

docker network inspect hoarder-app-eeoke0_default
[
    {
        "Name": "hoarder-app-eeoke0_default",
        "Id": "b0c282cc6d82c4c22e8124fb046eefff913b2e0f019693f6fe6cd6b86f685047",
        "Created": "2024-11-11T07:01:38.95469184+01:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.20.0.0/16",
                    "Gateway": "172.20.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "23200b27a616749aa75a266277134e15300de124e77bc009fac94bc055170b1d": {
                "Name": "hoarder-app-eeoke0-chrome-1",
                "EndpointID": "12a25700f84cafefaa61e55ec9f8c9abafc638a68d6b928c3b6de9c71461028d",
                "MacAddress": "02:42:ac:14:00:02",
                "IPv4Address": "172.20.0.2/16",
                "IPv6Address": ""
            },
            "76f3d8333267c3d0e1c4120d1520499466e44ab46ff728f345a6fa815ad3af50": {
                "Name": "hoarder-app-eeoke0-meilisearch-1",
                "EndpointID": "5362897cb6d2a74414ab7bcbee65223f40d895671d7b1a3f3ed67df2e5f3dc8e",
                "MacAddress": "02:42:ac:14:00:03",
                "IPv4Address": "172.20.0.3/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {
            "com.docker.compose.network": "default",
            "com.docker.compose.project": "hoarder-app-eeoke0",
            "com.docker.compose.version": "2.29.7"
        }
    }
]

I manually added web to the network, and it works.
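If you hit the same symptom, attaching the running container can be done ad hoc (a sketch; the network name comes from the inspect output above, while `hoarder-app-eeoke0-web-1` is an assumed auto-generated container name, adjust to yours):

```shell
# Attach the already-running web container to the compose network:
docker network connect hoarder-app-eeoke0_default hoarder-app-eeoke0-web-1

# Verify it now appears in the network's container list:
docker network inspect hoarder-app-eeoke0_default --format '{{json .Containers}}'
```

The durable fix is to declare the network under the `web` service's `networks:` key in the compose file, so the attachment survives container recreation.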

@FlorentLM

Had the same DNS issue with alpine-chrome.

Adding the launch parameter --headless=new fixed it. However, it created another error:

Failed to fetch browser webSocket URL from http://172.28.0.4:9222/json/version: fetch failed

But that seems to be inconsequential (for now), so the workaround is fine.
