-
Notifications
You must be signed in to change notification settings - Fork 738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: Add Testing Unit Guide (jest/vitest) #1999
Comments
Can you share more about what you tried? I am not aware of any real problems, we do use vitest in a few crawlers and it works just fine, but it depends on what you want to test. I wouldn't even try jest, as jest with ESM projects is a pain to deal with. |
Thank you for the swift reply! Here's my vitest test setup: import { describe, it, expect } from 'vitest'
import { scrapeWebsite } from './scrape-website'
describe('Email Extraction', async () => {
it('should extract all email addresses from the website', async () => {
let emails = await scrapeWebsite()
console.log(emails)
// expect(emails).toEqual(['[email protected]'])
})
}) import { CheerioCrawler, Configuration } from 'crawlee'
function extractAllMailToLinksAsEmailList($) {
/**
* @type {string[]}
*/
const mailtoLinks = []
$('a[href^="mailto:"]').each((index, element) => {
const link = $(element).attr('href')
const email = link?.replace('mailto:', '')
mailtoLinks.push(email)
})
return mailtoLinks
}
export async function scrapeWebsite() {
// Create new configuration
const crawlerConfig = new Configuration({
// do not write scraping data to filesystem
persistStorage: false,
})
let mailAddressesFromMailtoLinks = []
// crawler
const crawler = new CheerioCrawler(
{
async requestHandler({ $, request }) {
// handle loaded data
const title = $('title').text()
console.log(`The title of "${request.url}" is: ${title}.`)
mailAddressesFromMailtoLinks =
extractAllMailToLinksAsEmailList($)
},
},
crawlerConfig
)
// Start the crawler with the provided URLs
await crawler.run(['https://www.w3docs.com/snippets/html/how-to-create-mailto-links.html'])
return mailAddressesFromMailtoLinks
} The function works fine when calling it directly, but using ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ Unhandled Errors ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Vitest caught 1 unhandled error during the test run.
This might cause false positive tests. Resolve unhandled errors to make sure your tests are not affected.
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ Unhandled Error ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Error: Unexpected message on Worker: { type: 'ack', messageId: '4b4efb25-a9a2-4980-8182-5abf9a1163e0' }
❯ Worker.<anonymous> node_modules/.pnpm/[email protected]/node_modules/tinypool/dist/esm/index.js:548:28
❯ Worker.emit node:events:527:28
❯ MessagePort.<anonymous> node:internal/worker:232:53
❯ [nodejs.internal.kHybridDispatch] node:internal/event_target:639:20
❯ exports.emitMessage node:internal/per_context/messageport:23:28
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ |
I see someone actually already reported this to vitest, unfortunately no response in there. |
Looks like it's because the cc @vladfrangu |
hmm, it's definitely not our module doing any worker related stuff @B4nan, we've had that turned off for a while due to random v8/node hard crashes related to it and fs calls (also debt to remove since it's not needed anymore)... From that stacktrace, its something using tinypool. I don't know if that's vitest's doing or something else, maybe list what's importing tinypool (pnpm should have something like npm ls or yarn why) |
devDependencies: |
Oof, I found the issue. It is on our side, I forgot that code path was still called. I'll submit a PR in a minute to remove it! |
I'd say that error means that crawlee or something it depends on is posting messages that vitest does not understand. @vladfrangu so this is a dead code? if so, we should remove that, we have git to store the history, no need to keep code that is not in use. |
Thanks for the super quick diagnosis on this issue, guys! 👍 |
Hi there, just been wondering on the PR ETA. Is the issue more complex than previously thought ? Should I add a separate bug report? |
Nono, I'll have this submitted today! Sorry for the bit slower responses |
Which package is the feature request for? If unsure which one to select, leave blank
Feature
(First, thanks for this amazing package!)
A reliant software CI/CD pipeline requires testing. It would be great if Crawlee had a doc page that describes how to use a testing framework (like jest/vitest) to validate expected outcome.
Motivation
A reliant software CI/CD pipeline requires testing.
Tried adding unit test for a crawler, but it's not straightforward to get the results.
Ideal solution or implementation, and any additional constraints
A short page in the docs on how to add jest/vitest testing.
Alternative solutions or implementations
No response
Other context
No response
The text was updated successfully, but these errors were encountered: