
Failed to connect to the browser instance, will retry in 5 secs #674

Open
snowdream opened this issue Nov 19, 2024 · 19 comments
Labels
bug Something isn't working

Comments

@snowdream

Describe the Bug

https://docs.hoarder.app/Installation/docker

I tried to run Hoarder with Docker Compose, but it failed.

(screenshots of the error attached)

Steps to Reproduce

  1. Create `.env`:
HOARDER_VERSION=release
NEXTAUTH_SECRET=super_random_string
MEILI_MASTER_KEY=another_random_string
NEXTAUTH_URL=http://localhost:3000
  2. Create `docker-compose.yml`:
version: "3.8"
services:
  web:
    image: ghcr.io/hoarder-app/hoarder:${HOARDER_VERSION:-release}
    restart: unless-stopped
    volumes:
      - data:/data
    ports:
      - 3000:3000
    env_file:
      - .env
    environment:
      MEILI_ADDR: http://meilisearch:7700
      BROWSER_WEB_URL: http://chrome:9222
      # OPENAI_API_KEY: ...
      DATA_DIR: /data
  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123
    restart: unless-stopped
    command:
      - --no-sandbox
      - --disable-gpu
      - --disable-dev-shm-usage
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --hide-scrollbars
  meilisearch:
    image: getmeili/meilisearch:v1.11.1
    restart: unless-stopped
    env_file:
      - .env
    environment:
      MEILI_NO_ANALYTICS: "true"
    volumes:
      - meilisearch:/meili_data

volumes:
  meilisearch:
  data:
  3. Run `docker compose up -d`

Expected Behaviour

http://localhost:3000/ loads correctly.

Screenshots or Additional Context

(screenshot attached)

Device Details

Microsoft Edge version 131.0.2903.48 (official build) (x86_64) on macOS

Exact Hoarder Version

release

@Azhelor

Azhelor commented Nov 19, 2024

I have a similar error with the latest Hoarder version. The app can be used, but when I add a bookmark, it can't retrieve any image or description.

@Crush-RY

Same issue here.

@kamtschatka
Contributor

Is everyone using Docker Desktop? We have seen before that networking works differently on e.g. Windows vs. Linux.

@Crush-RY

My deployment system is Linux, and this is my config file:

version: "3.8"
networks:
  traefiknet:
    external: true
services:
  web:
    image: ghcr.io/hoarder-app/hoarder:release
    restart: unless-stopped
    container_name: hoarder
    volumes:
      - /opt/mydocker/hoarder/data:/data
    ports:
      - 54110:3000
    env_file:
      - .env
    networks:
      - traefiknet
    labels:
      - traefik.docker.network=traefiknet
      - traefik.enable=true
      - traefik.http.routers.hoarder.rule=Host(`hoarder.my.domain`)
      - traefik.http.routers.hoarder.entrypoints=http,https
      - traefik.http.routers.hoarder.priority=10
      - traefik.http.routers.hoarder.tls=true
      - traefik.http.services.hoarder.loadbalancer.server.port=3000
      - traefik.http.routers.hoarder.tls.certresolver=mycloudflare

  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123
    restart: unless-stopped
    container_name: chrome
    command:
      - --no-sandbox
      - --disable-gpu
      - --disable-dev-shm-usage
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --hide-scrollbars
    networks:
      - traefiknet
  meilisearch:
    image: getmeili/meilisearch:v1.11.1
    restart: unless-stopped
    container_name: meilisearch
    env_file:
      - .env
    environment:
      MEILI_NO_ANALYTICS: "true"
    volumes:
      - /opt/mydocker/hoarder/meilisearch:/meili_data
    networks:
      - traefiknet

@MohamedBassem
Collaborator

@Crush-RY can you share the logs from the web container?

@MohamedBassem
Collaborator

Hmmm, it seems like there are multiple people hitting this now. So I'll label this as a bug until we figure out what's going on.

@MohamedBassem
Collaborator

Was anyone running Hoarder before and only hit this problem after an upgrade, or are these all new installations?

@MohamedBassem
Collaborator

I've just pushed 393d097 to log more details on the connection failure reason. It'll take about 15 minutes for the container to be built. Once it's built, can someone switch to the nightly build and capture the error for me?

@Azhelor

Azhelor commented Nov 21, 2024

Sure, here it is:

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service init-db-migration: starting
Running db migration script
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service init-db-migration successfully started
s6-rc: info: service svc-workers: starting
s6-rc: info: service svc-web: starting
s6-rc: info: service svc-workers successfully started
s6-rc: info: service svc-web successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
  ▲ Next.js 14.2.13
  - Local:        http://localhost:3000
  - Network:      http://0.0.0.0:3000

 ✓ Starting...
 ✓ Ready in 411ms

> @hoarder/[email protected] start:prod /app/apps/workers
> tsx index.ts

2024-11-21T22:48:49.735Z info: Workers version: nightly
2024-11-21T22:48:49.748Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
(node:121) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2024-11-21T22:48:49.763Z info: [Crawler] Successfully resolved IP address, new address: http://172.21.0.3:9222/
(node:69) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)

(process:69): VIPS-WARNING **: 22:49:40.996: threads clipped to 1024
2024-11-21T22:51:10.022Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs: FetchError: request to https://raw.githubusercontent.com/cliqz-oss/adblocker/master/packages/adblocker/assets/easylist/easylist.txt failed, reason: getaddrinfo EAI_AGAIN raw.githubusercontent.com
    at ClientRequest.<anonymous> (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/node-fetch/lib/index.js:1501:11)
    at ClientRequest.emit (node:events:518:28)
    at ClientRequest.emit (node:domain:489:12)
    at emitErrorEvent (node:_http_client:103:11)
    at TLSSocket.socketErrorListener (node:_http_client:506:5)
    at TLSSocket.emit (node:events:518:28)
    at TLSSocket.emit (node:domain:489:12)
    at emitErrorNT (node:internal/streams/destroy:170:8)
    at emitErrorCloseNT (node:internal/streams/destroy:129:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)
2024-11-21T22:51:10.023Z info: Starting crawler worker ...
2024-11-21T22:51:10.025Z info: Starting inference worker ...
2024-11-21T22:51:10.026Z info: Starting search indexing worker ...
2024-11-21T22:51:10.027Z info: Starting tidy assets worker ...
2024-11-21T22:51:10.028Z info: Starting video worker ...
2024-11-21T22:51:10.029Z info: Starting feed worker ...
2024-11-21T22:51:10.171Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:10.171Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:15.023Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2024-11-21T22:51:15.174Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:15.224Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:15.249Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:15.249Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:20.173Z error: [search][20] search job failed: MeiliSearchCommunicationError: fetch failed
MeiliSearchCommunicationError: fetch failed
    at node:internal/deps/undici/undici:13392:13
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
2024-11-21T22:51:20.251Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:20.251Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:20.273Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:20.273Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:25.274Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:25.275Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:25.303Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:25.303Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:30.214Z error: [search][20] search job failed: MeiliSearchCommunicationError: fetch failed
MeiliSearchCommunicationError: fetch failed
    at node:internal/deps/undici/undici:13392:13
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
2024-11-21T22:51:30.304Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:30.304Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:30.325Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:30.326Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:35.326Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:35.327Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:35.373Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:35.374Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:40.243Z error: [search][20] search job failed: MeiliSearchCommunicationError: fetch failed
MeiliSearchCommunicationError: fetch failed
    at node:internal/deps/undici/undici:13392:13
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
2024-11-21T22:51:40.374Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:40.374Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578) 

Hope that helps.

And to answer your first question: I've only been running Hoarder for a few days, and this error has been there from the beginning.

@MohamedBassem
Collaborator

Yeah, this is actually very helpful. I think I know how I can fix that!

@MohamedBassem
Collaborator

So basically what's happening here is that, for one reason or another (your network policies, GitHub being blocked, etc.), Hoarder is failing to download the adblock list used in the crawler. I've sent 378ad9b to ensure that this doesn't block worker startup. In your case, you might also want to set CRAWLER_ENABLE_ADBLOCKER=false so that the always-failing download doesn't delay worker startup each time. Can you give it a try once the container is built?
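For anyone following along, the suggested workaround is a one-line addition to the same `.env` file created in the reproduction steps above (a sketch; the variable name comes from the comment above, and the file is already passed to the `web` service via `env_file`):

```shell
# Skip downloading the adblock list at worker startup
# (useful when raw.githubusercontent.com is unreachable from the container).
CRAWLER_ENABLE_ADBLOCKER=false
```

After editing `.env`, recreate the container (`docker compose up -d`) so the new variable takes effect.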

@Azhelor

Azhelor commented Nov 22, 2024

Thanks again for your very quick answer. I tried the fix you pushed and also added the line you suggested to the .env file, but unfortunately it does not work.

Here are the logs:

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service init-db-migration: starting
Running db migration script
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service init-db-migration successfully started
s6-rc: info: service svc-workers: starting
s6-rc: info: service svc-web: starting
s6-rc: info: service svc-workers successfully started
s6-rc: info: service svc-web successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
  ▲ Next.js 14.2.13
  - Local:        http://localhost:3000
  - Network:      http://0.0.0.0:3000

 ✓ Starting...
 ✓ Ready in 358ms

> @hoarder/[email protected] start:prod /app/apps/workers
> tsx index.ts

(node:69) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2024-11-22T00:50:20.170Z info: Workers version: nightly
2024-11-22T00:50:20.182Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
(node:121) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2024-11-22T00:50:20.199Z info: [Crawler] Successfully resolved IP address, new address: http://172.21.0.3:9222/
2024-11-22T00:50:20.324Z info: Starting crawler worker ...
2024-11-22T00:50:20.325Z info: Starting inference worker ...
2024-11-22T00:50:20.325Z info: Starting search indexing worker ...
2024-11-22T00:50:20.326Z info: Starting tidy assets worker ...
2024-11-22T00:50:20.326Z info: Starting video worker ...
2024-11-22T00:50:20.326Z info: Starting feed worker ...
2024-11-22T00:50:20.365Z info: [Crawler][22] Will crawl "https://www.wikipedia.org/" for link with id "m2wi6yovvkafmnjegdic7b6c"
2024-11-22T00:50:20.365Z info: [Crawler][22] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:50:20.462Z info: [search][23] Attempting to index bookmark with id m2wi6yovvkafmnjegdic7b6c ...
2024-11-22T00:50:20.594Z info: [search][23] Completed successfully
2024-11-22T00:50:25.370Z error: [Crawler][22] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.
s [TRPCError]: Bookmark not found
    at /app/apps/web/.next/server/chunks/6815.js:1:16914
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async t (/app/apps/web/.next/server/chunks/440.js:4:32333)
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async t (/app/apps/web/.next/server/chunks/440.js:4:33299)
    at async /app/apps/web/.next/server/app/api/trpc/[trpc]/route.js:1:4379
    at async Promise.all (index 1) {
  code: 'NOT_FOUND',
  [cause]: undefined
}
2024-11-22T00:50:34.670Z info: [search][24] Attempting to index bookmark with id m2wi6yovvkafmnjegdic7b6c ...
2024-11-22T00:50:34.758Z info: [search][24] Completed successfully
2024-11-22T00:50:35.751Z error: [Crawler][22] Crawling job failed: Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
    at navigate (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:171:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Deferred.race (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:36:20)
    at async CdpFrame.goto (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:137:25)
    at async CdpPage.goto (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:590:20)
    at async crawlPage (/app/apps/workers/crawlerWorker.ts:3:1456)
    at async crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7607)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10740)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.772Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.790Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.807Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.826Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.847Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)

(process:69): VIPS-WARNING **: 00:50:42.563: threads clipped to 1024
2024-11-22T00:50:42.890Z info: [Crawler][25] Will crawl "https://www.wikipedia.org/" for link with id "c77a1dclbtoswxfg1dehix2z"
2024-11-22T00:50:42.891Z info: [Crawler][25] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:50:42.909Z info: [search][26] Attempting to index bookmark with id c77a1dclbtoswxfg1dehix2z ...
2024-11-22T00:50:42.989Z info: [search][26] Completed successfully
2024-11-22T00:50:47.893Z error: [Crawler][25] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.
2024-11-22T00:50:58.038Z error: [Crawler][25] Crawling job failed: Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
    at navigate (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:171:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Deferred.race (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:36:20)
    at async CdpFrame.goto (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:137:25)
    at async CdpPage.goto (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:590:20)
    at async crawlPage (/app/apps/workers/crawlerWorker.ts:3:1456)
    at async crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7607)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10740)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:58.060Z info: [Crawler][25] Will crawl "https://www.wikipedia.org/" for link with id "c77a1dclbtoswxfg1dehix2z"
2024-11-22T00:50:58.060Z info: [Crawler][25] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:51:03.061Z error: [Crawler][25] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.
2024-11-22T00:51:13.201Z error: [Crawler][25] Crawling job failed: Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
    at navigate (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:171:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Deferred.race (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:36:20)
    at async CdpFrame.goto (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:137:25)
    at async CdpPage.goto (/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:590:20)
    at async crawlPage (/app/apps/workers/crawlerWorker.ts:3:1456)
    at async crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7607)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10740)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/[email protected][email protected]/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:51:13.224Z info: [Crawler][25] Will crawl "https://www.wikipedia.org/" for link with id "c77a1dclbtoswxfg1dehix2z"
2024-11-22T00:51:13.224Z info: [Crawler][25] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:51:18.225Z error: [Crawler][25] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.

@MohamedBassem
Collaborator

MohamedBassem commented Nov 22, 2024

OK, now it's clear that you have some DNS/internet problems in the container :) Basically, your container can't resolve DNS, which the crawler requires. This is not a Hoarder problem at this point.
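A quick way to confirm a diagnosis like this (a sketch, assuming the compose service is named `web` as in the compose file above and that the image provides `getent`):

```shell
# Can the container resolve an external hostname?
docker compose exec web getent hosts raw.githubusercontent.com

# Compare with a compose-internal service name, which resolves
# via Docker's embedded DNS rather than upstream resolvers:
docker compose exec web getent hosts chrome

# If the internal name resolves but the external one fails (EAI_AGAIN
# in the logs above), the container reaches Docker's DNS but upstream
# resolution is broken.
```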

@snowdream
Author

As you know, I am in China.

Does Hoarder access any API that I might not be able to reach?

@Azhelor

Azhelor commented Nov 22, 2024

ok, now it's clear that you have some dns/internet problems in the container :) Basically your container can't resolve dns and this is required for the crawler to work. This is not a hoarder problem at this point.

Alright, I will investigate on my side, thank you for your help!

@Crush-RY

I'm also in China, and as you mentioned, it turns out to be a network issue. I tried deploying hoarder on a VPS without network restrictions, and it worked perfectly.

Thanks a lot for your help!

@Azhelor

Azhelor commented Nov 23, 2024

I finally managed to fix the error, and it was indeed caused by a bad default Docker configuration. For people facing the same issue, here are the steps I followed: https://stackoverflow.com/questions/39400886/docker-cannot-resolve-dns-on-private-network and then I restarted my server.
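For reference, that kind of fix amounts to giving the Docker daemon explicit upstream resolvers (a sketch; 8.8.8.8 and 1.1.1.1 are placeholder DNS servers, substitute resolvers that are reachable from your network):

```shell
# Write /etc/docker/daemon.json with explicit DNS servers,
# then restart the daemon so containers pick them up.
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "dns": ["8.8.8.8", "1.1.1.1"]
}
EOF
sudo systemctl restart docker
```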

@AlotOfBlahaj

AlotOfBlahaj commented Nov 25, 2024

In my situation, the web container was not included in the network that chrome and meilisearch were in.

docker network inspect hoarder-app-eeoke0_default
[
    {
        "Name": "hoarder-app-eeoke0_default",
        "Id": "b0c282cc6d82c4c22e8124fb046eefff913b2e0f019693f6fe6cd6b86f685047",
        "Created": "2024-11-11T07:01:38.95469184+01:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.20.0.0/16",
                    "Gateway": "172.20.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "23200b27a616749aa75a266277134e15300de124e77bc009fac94bc055170b1d": {
                "Name": "hoarder-app-eeoke0-chrome-1",
                "EndpointID": "12a25700f84cafefaa61e55ec9f8c9abafc638a68d6b928c3b6de9c71461028d",
                "MacAddress": "02:42:ac:14:00:02",
                "IPv4Address": "172.20.0.2/16",
                "IPv6Address": ""
            },
            "76f3d8333267c3d0e1c4120d1520499466e44ab46ff728f345a6fa815ad3af50": {
                "Name": "hoarder-app-eeoke0-meilisearch-1",
                "EndpointID": "5362897cb6d2a74414ab7bcbee65223f40d895671d7b1a3f3ed67df2e5f3dc8e",
                "MacAddress": "02:42:ac:14:00:03",
                "IPv4Address": "172.20.0.3/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {
            "com.docker.compose.network": "default",
            "com.docker.compose.project": "hoarder-app-eeoke0",
            "com.docker.compose.version": "2.29.7"
        }
    }
]

I manually added web to the network, and it works.
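If you hit the same symptom, attaching the running container can be done ad hoc (a sketch; the network name comes from the inspect output above, while `hoarder-app-eeoke0-web-1` is an assumed auto-generated container name, adjust to yours):

```shell
# Attach the already-running web container to the compose network:
docker network connect hoarder-app-eeoke0_default hoarder-app-eeoke0-web-1

# Verify it now appears in the network's container list:
docker network inspect hoarder-app-eeoke0_default --format '{{json .Containers}}'
```

The durable fix is to declare the network under the `web` service's `networks:` key in the compose file, so the attachment survives container recreation.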

@FlorentLM

Had the same DNS issue with alpine-chrome.

Adding the launch parameter --headless=new fixed it. However, it created another error:

Failed to fetch browser webSocket URL from http://172.28.0.4:9222/json/version: fetch failed

But that seems to be inconsequential (for now), so the workaround is fine.
