Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Take full page screenshots #143 #148

Merged
merged 1 commit into from
May 12, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions apps/web/components/dashboard/preview/LinkContentSection.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -16,10 +16,12 @@ function ScreenshotSection({ link }: { link: ZBookmarkedLink }) {
return (
<div className="relative h-full min-w-full">
<Image
fill={true}
alt="screenshot"
src={`/api/assets/${link.screenshotAssetId}`}
className="object-contain"
width={0}
height={0}
sizes="100vw"
style={{ width: "100%", height: "auto" }}
/>
</div>
);
Expand Down
3 changes: 2 additions & 1 deletion apps/workers/crawlerWorker.ts
Original file line number Diff line number Diff line change
Expand Up @@ -231,10 +231,11 @@ async function crawlPage(jobId: string, url: string) {
// If you change this, you need to change the asset type in the store function.
type: "png",
encoding: "binary",
fullPage: serverConfig.crawler.fullPageScreenshot,
}),
]);
logger.info(
`[Crawler][${jobId}] Finished capturing page content and a screenshot.`,
`[Crawler][${jobId}] Finished capturing page content and a screenshot. FullPageScreenshot: ${serverConfig.crawler.fullPageScreenshot}`,
);
return { htmlContent, screenshot, url: page.url() };
} finally {
Expand Down
1 change: 1 addition & 0 deletions docs/docs/03-configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,5 +42,6 @@ Either `OPENAI_API_KEY` or `OLLAMA_BASE_URL` need to be set for automatic taggin
| CRAWLER_NUM_WORKERS | No | 1 | Number of allowed concurrent crawling jobs. By default, we're only doing one crawling request at a time to avoid consuming a lot of resources. |
| CRAWLER_DOWNLOAD_BANNER_IMAGE | No | true | Whether to cache the banner image used in the cards locally or fetch it each time directly from the website. Caching it consumes more storage space, but is more resilient against link rot and rate limits from websites. |
| CRAWLER_STORE_SCREENSHOT | No | true | Whether to store a screenshot from the crawled website or not. Screenshots act as a fallback for when we fail to extract an image from a website. You can also view the stored screenshots for any link. |
| CRAWLER_FULL_PAGE_SCREENSHOT | No | false | Whether to store a screenshot of the full page or not. Disabled by default, as it can lead to much higher disk usage. If disabled, the screenshot will only include the visible part of the page |
| CRAWLER_JOB_TIMEOUT_SEC | No | 60 | How long to wait for the crawler job to finish before timing out. If you have a slow internet connection or a low powered device, you might want to bump this up a bit |
| CRAWLER_NAVIGATE_TIMEOUT_SEC | No | 30 | How long to spend navigating to the page (along with its redirects). Increase this if you have a slow internet connection |
2 changes: 2 additions & 0 deletions packages/shared/config.ts
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ const allEnv = z.object({
CRAWLER_NUM_WORKERS: z.coerce.number().default(1),
CRAWLER_DOWNLOAD_BANNER_IMAGE: stringBool("true"),
CRAWLER_STORE_SCREENSHOT: stringBool("true"),
CRAWLER_FULL_PAGE_SCREENSHOT: stringBool("false"),
MEILI_ADDR: z.string().optional(),
MEILI_MASTER_KEY: z.string().default(""),
LOG_LEVEL: z.string().default("debug"),
Expand Down Expand Up @@ -66,6 +67,7 @@ const serverConfigSchema = allEnv.transform((val) => {
navigateTimeoutSec: val.CRAWLER_NAVIGATE_TIMEOUT_SEC,
downloadBannerImage: val.CRAWLER_DOWNLOAD_BANNER_IMAGE,
storeScreenshot: val.CRAWLER_STORE_SCREENSHOT,
fullPageScreenshot: val.CRAWLER_FULL_PAGE_SCREENSHOT,
},
meilisearch: val.MEILI_ADDR
? {
Expand Down
Loading