Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Add Testing Unit Guide (jest/vitest) #1999

Closed
MentalGear opened this issue Jul 20, 2023 · 11 comments · Fixed by #2004
Closed

docs: Add Testing Unit Guide (jest/vitest) #1999

MentalGear opened this issue Jul 20, 2023 · 11 comments · Fixed by #2004
Labels
feature Issues that represent new features or improvements to existing features.

Comments

@MentalGear
Copy link

MentalGear commented Jul 20, 2023

Which package is the feature request for? If unsure which one to select, leave blank

Feature

(First, thanks for this amazing package!)

A reliant software CI/CD pipeline requires testing. It would be great if Crawlee had a doc page that describes how to use a testing framework (like jest/vitest) to validate expected outcome.

Motivation

A reliant software CI/CD pipeline requires testing.
Tried adding unit test for a crawler, but it's not straightforward to get the results.

Ideal solution or implementation, and any additional constraints

A short page in the docs on how to add jest/vitest testing.

Alternative solutions or implementations

No response

Other context

No response

@MentalGear MentalGear added the feature Issues that represent new features or improvements to existing features. label Jul 20, 2023
@B4nan
Copy link
Member

B4nan commented Jul 20, 2023

Tried adding unit test for a crawler, but it's not straightforward to get the results.

Can you share more about what you tried? I am not aware of any real problems, we do use vitest in a few crawlers and it works just fine, but it depends on what you want to test. I wouldn't even try jest, as jest with ESM projects is a pain to deal with.

@MentalGear
Copy link
Author

Thank you for the swift reply!

Here's my vitest test setup:

import { describe, it, expect } from 'vitest'
import { scrapeWebsite } from './scrape-website'

describe('Email Extraction', async () => {
    it('should extract all email addresses from the website', async () => {
        let emails = await scrapeWebsite()
        console.log(emails)
        // expect(emails).toEqual(['[email protected]'])
    })
})
import { CheerioCrawler, Configuration } from 'crawlee'

function extractAllMailToLinksAsEmailList($) {
    /**
     * @type {string[]}
     */
    const mailtoLinks = []

    $('a[href^="mailto:"]').each((index, element) => {
        const link = $(element).attr('href')
        const email = link?.replace('mailto:', '')
        mailtoLinks.push(email)
    })

    return mailtoLinks
}

export async function scrapeWebsite() {
    // Create new configuration
    const crawlerConfig = new Configuration({
        // do not write scraping data to filesystem
        persistStorage: false,
    })

    let mailAddressesFromMailtoLinks = []

    // crawler
    const crawler = new CheerioCrawler(
        {
            async requestHandler({ $, request }) {
                // handle loaded data
                const title = $('title').text()
                console.log(`The title of "${request.url}" is: ${title}.`)

                mailAddressesFromMailtoLinks =
                    extractAllMailToLinksAsEmailList($)
            },
        },
        crawlerConfig
    )

    // Start the crawler with the provided URLs
    await crawler.run(['https://www.w3docs.com/snippets/html/how-to-create-mailto-links.html'])

    return mailAddressesFromMailtoLinks
}

The function works fine when calling it directly, but using vitest I get the following error:

⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ Unhandled Errors ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯

Vitest caught 1 unhandled error during the test run.
This might cause false positive tests. Resolve unhandled errors to make sure your tests are not affected.

⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ Unhandled Error ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Error: Unexpected message on Worker: { type: 'ack', messageId: '4b4efb25-a9a2-4980-8182-5abf9a1163e0' }
 ❯ Worker.<anonymous> node_modules/.pnpm/[email protected]/node_modules/tinypool/dist/esm/index.js:548:28
 ❯ Worker.emit node:events:527:28
 ❯ MessagePort.<anonymous> node:internal/worker:232:53
 ❯ [nodejs.internal.kHybridDispatch] node:internal/event_target:639:20
 ❯ exports.emitMessage node:internal/per_context/messageport:23:28

⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯

@B4nan
Copy link
Member

B4nan commented Jul 20, 2023

I see someone actually already reported this to vitest, unfortunately no response in there.

vitest-dev/vitest#2882

@B4nan
Copy link
Member

B4nan commented Jul 20, 2023

Looks like it's because the isMainThread method no longer works as expected, as vitest is using worker threads to run the tests (pointed out here).

cc @vladfrangu

@vladfrangu
Copy link
Member

hmm, it's definitely not our module doing any worker related stuff @B4nan, we've had that turned off for a while due to random v8/node hard crashes related to it and fs calls (also debt to remove since it's not needed anymore)...

From that stacktrace, its something using tinypool. I don't know if that's vitest's doing or something else, maybe list what's importing tinypool (pnpm should have something like npm ls or yarn why)

@MentalGear
Copy link
Author

devDependencies:
vitest 0.25.8
└── tinypool 0.3.0

@vladfrangu
Copy link
Member

Oof, I found the issue. It is on our side, I forgot that code path was still called. I'll submit a PR in a minute to remove it!

@B4nan
Copy link
Member

B4nan commented Jul 20, 2023

I'd say that error means that crawlee or something it depends on is posting messages that vitest does not understand.

@vladfrangu so this is a dead code? if so, we should remove that, we have git to store the history, no need to keep code that is not in use.

https://github.com/apify/crawlee/blob/master/packages/memory-storage/src/workers/file-storage-worker.ts

@MentalGear
Copy link
Author

MentalGear commented Jul 20, 2023

Thanks for the super quick diagnosis on this issue, guys! 👍
Looking forward to the PR to continue TDD'ing with crawlee ! 👨‍💻💻

@MentalGear
Copy link
Author

MentalGear commented Jul 21, 2023

Oof, I found the issue. It is on our side, I forgot that code path was still called. I'll submit a PR in a minute to remove it!

Hi there, just been wondering on the PR ETA. Is the issue more complex than previously thought ? Should I add a separate bug report?

@vladfrangu
Copy link
Member

Oof, I found the issue. It is on our side, I forgot that code path was still called. I'll submit a PR in a minute to remove it!

Hi there, just been wondering on the PR ETA. Is the issue more complex than previously thought ? Should I add a separate bug report?

Nono, I'll have this submitted today! Sorry for the bit slower responses

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Issues that represent new features or improvements to existing features.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants