
Serverside Rendering on Vercel fails; missing GLIBC_2.29 #15

Open · ItsMeBrianD opened this issue Apr 14, 2023 · 18 comments

Comments

@ItsMeBrianD commented Apr 14, 2023

What happens?

When attempting to deploy a JavaScript project that uses SSR and DuckDB to Vercel, the build fails.

The error message presented by DuckDB is /lib64/libm.so.6: version 'GLIBC_2.29' not found (required by /vercel/path0/node_modules/duckdb/lib/binding/duckdb.node).

This has worked previously.

To Reproduce

This repo has a simple reproduction of the issue: create a Vercel project based on it (or a fork), and the build will fail with the error message above:
https://github.com/ItsMeBrianD/duckdb-vercel-repro

OS: Vercel

DuckDB Version: 0.7.1

DuckDB Client: node

Full Name: Brian Donald

Affiliation: Evidence

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree
@archiewood

@Mause we're using the NodeJS client - we're not sure, but perhaps this is new in 0.7.1?

@Mause (Member) commented Apr 15, 2023

> @Mause we're using the NodeJS client - we're not sure, but perhaps this is new in 0.7.1?

Which version does it work with? We can check for changes.

@tobilg commented Apr 17, 2023

As far as I know Vercel runs on AWS Lambda, so I have a hard time imagining that this has worked before: Lambda environments are currently based on Amazon Linux 2, which uses GLIBC 2.26. See https://repost.aws/questions/QUrXOioL46RcCnFGyELJWKLw/glibc-2-27-on-amazon-linux-2

I guess you could download my DuckDB for Lambda layer and extract the build artifacts: https://github.com/tobilg/duckdb-nodejs-layer#arns

@pgzmnk commented Jul 24, 2023

Experiencing a similar error on Vercel with both Node 18.x and 16.x.

https://github.com/pgzmnk/openb


@tobilg commented Sep 27, 2023

I therefore created https://www.npmjs.com/package/duckdb-lambda-x86, which should solve the actual issue.

@Mause (Member) commented Oct 17, 2023

> @Mause we're using the NodeJS client - we're not sure, but perhaps this is new in 0.7.1?
>
> Which version does it work with? We can check for changes

@archiewood any updates?

@hanshino

I've encountered the same problem as described. Specifically, I'm using [email protected].

Environment:

  • Operating System: Ubuntu 22.04 and Mac M1 Sonoma
  • Encountered inside a Docker container
  • Docker Base Image: node:14

Steps to Reproduce:

docker run --rm -it node:14 bash

Inside the node:14 container:

mkdir app && cd app
yarn init -y
yarn add [email protected]
cd node_modules/duckdb
npm test

Are there any necessary packages that I need to install?

(Translated by ChatGPT.)

Sorry, my English is not good. I hope there's no offense.

@tobilg commented Oct 21, 2023

@hanshino the default duckdb npm package will not work IMO due to GLIBC incompatibilities, as described above. For Lambda usage, I maintain the https://www.npmjs.com/package/duckdb-lambda-x86 package, which should fix your issues.
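
For reference, duckdb-lambda-x86 is published as a drop-in for the regular duckdb package, so usage looks roughly like the sketch below (this assumes the package exposes the same Database API as duckdb; the query is just a smoke test):

// Minimal sketch, assuming duckdb-lambda-x86 mirrors the duckdb Node API.
// require() is used here in case the package ships without TypeScript declarations.
const duckdb = require("duckdb-lambda-x86");

// In-memory database; swap in a file path if you need persistence.
const db = new duckdb.Database(":memory:");
db.all("SELECT 42 AS answer", (err: any, rows: any[]) => {
    if (err) throw err;
    console.log(rows); // [ { answer: 42 } ]
});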

@ryan-williams commented Nov 17, 2023

Here's a wrapper over duckdb-async and duckdb-lambda-x86 that I just wrote, which seems to work both on my M1 MacBook (which requires duckdb-async) and on an EC2 instance where I was previously hitting the GLIBC_2.29 error (and where duckdb-lambda-x86 works instead):

// lib/duckdb.ts
// Try duckdb-async first (works locally, e.g. on an M1 MacBook); if the native
// binding fails to load (e.g. the GLIBC_2.29 error), fall back to duckdb-lambda-x86.
let _query: Promise<(query: string) => any>

_query = import("duckdb-async")
    .then(duckdb => duckdb.Database)
    .then(Database => Database.create(":memory:"))
    .then((db: any) => ((query: string) => db.all(query)))
    .catch(async error => {
        console.log("duckdb init error:", error)
        const duckdb = await import("duckdb-lambda-x86");
        const Database: any = duckdb.Database;
        const db = new Database(":memory:")
        const connection = db.connect()
        // Wrap the callback-style API in a Promise to match duckdb-async's interface
        return (query: string) => {
            return new Promise((resolve, reject) => {
                connection.all(query, (err: any, res: any) => {
                    if (err) reject(err);
                    else resolve(res);
                })
            })
        }
    })

export { _query }

Sample API endpoint that uses it:

// /api/query.ts
import { _query } from "@/lib/duckdb"
import { NextApiRequest, NextApiResponse } from "next";

// Convert BigInts to numbers
function replacer(key: string, value: any) {
    if (typeof value === 'bigint') {
        return Number(value)
    } else {
        return value;
    }
}

export default async function handler(
    req: NextApiRequest,
    res: NextApiResponse,
) {
    const { body: { path } } = req
    const query = await _query
    const rows = await query(`select * from read_parquet("${path}")`)  // 🚨 unsafe / SQLi 🚨
    res.status(200).send(JSON.stringify(rows, replacer))
}
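
A hypothetical client-side call against that endpoint might look like this (the parquet URL is a placeholder):

// Hypothetical caller for /api/query; the parquet URL below is a placeholder.
const res = await fetch("/api/query", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ path: "https://example.com/data.parquet" }),
});
const rows = await res.json();
console.log(rows);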

@michaelwallabi

FYI for others who run into this. I ended up using @tobilg's duckdb-lambda-x86 to resolve this with Vercel. In my case I'm just replacing the default duckdb.node binary with the duckdb-lambda-x86 version in the CI build.
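
A minimal sketch of that swap as a post-install/CI step; the duckdb binding path matches the one in the error message above, while the duckdb-lambda-x86 path is an assumption and may differ by package version:

// replace-duckdb-binding.ts -- hypothetical post-install step; paths are assumptions.
import { copyFileSync, existsSync } from "fs";
import { join } from "path";

const source = join("node_modules", "duckdb-lambda-x86", "lib", "binding", "duckdb.node");
const target = join("node_modules", "duckdb", "lib", "binding", "duckdb.node");

if (existsSync(source)) {
    // Overwrite the GLIBC_2.29-linked binding with the Lambda-compatible build.
    copyFileSync(source, target);
    console.log("Replaced duckdb.node with the duckdb-lambda-x86 build");
} else {
    console.warn("duckdb-lambda-x86 binding not found; skipping replacement");
}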

@iku000888

@michaelwallabi Thank you for the tip - replacing the binary at build/deploy time was by far the most ergonomic solution (and the only one that I was able to get to work for my project).
I want to extend my sincere appreciation to @tobilg for the effort that enabled it in the first place as well.

Ideally, running DuckDB in a Lambda should be easy out of the box, as it is a great use case, so I look forward to future releases that don't require hacks/workarounds.

@Dev-rick commented Sep 16, 2024

Even after replacing the binaries, I am getting the following issue on version 1.0.0 (I am on Vercel, Node.js 20):

Unhandled Rejection: [Error: IO Error: Can't find the home directory at ''
Specify a home directory using the SET home_directory='/path/to/dir' option.] {
errno: -1,
code: 'DUCKDB_NODEJS_ERROR',
errorType: 'IO'
}

Setting a home directory also results in an error:
Error: TypeError: Failed to set configuration option home_directory: Invalid Input Error: Could not set option "home_directory" as a global option
at new Database (/var/task/node_modules/duckdb-async/dist/duckdb-async.js:226:19)

Can anyone help me please? Thank you!

@michaelwallabi

Like @iku000888, I do the following when creating a DB, which seems to work:

    // (tmpdir is imported from "os"; Database comes from duckdb-async)
    const db = Database.create(":memory:");
    const tempDirectory = tmpdir() || '/tmp';
    await (await db).exec(`
        SET home_directory='${tempDirectory}';
        .... other settings here
    `);

@Dev-rick commented Sep 17, 2024

@iku000888 and @michaelwallabi Thanks for the input!

Unfortunately, I am now getting the following error (on Vercel); locally, everything works fine with the same env variables.

Error: HTTP Error: HTTP GET error on 'https://XXX.s3.amazonaws.com/XXX.parquet' (HTTP 400)] {
errno: -1,
code: 'DUCKDB_NODEJS_ERROR',
errorType: 'HTTP'
}

My code is:

import { Database } from "duckdb-async";

const S3_LAKE_BUCKET_NAME = process.env.S3_LAKE_BUCKET_NAME
const AWS_S3_ACCESS_KEY = process.env['AWS_S3_ACCESS_KEY']
const AWS_S3_SECRET_KEY = process.env['AWS_S3_SECRET_KEY']
const AWS_S3_REGION = process.env['AWS_S3_REGION']

const retrieveDataFromParquet = async ({
  key,
  sqlStatement,
  tableName,
}: {
  key: string
  sqlStatement: string
  tableName: string
}) => {
  try {
    // Create a new DuckDB database connection
    const db = await Database.create(':memory:')

    console.log('Setting home directory...')
    await db.all(`SET home_directory='/tmp';`)

    console.log('Installing and loading httpfs extension...')
    await db.all(`
      INSTALL httpfs;
      LOAD httpfs;
    `)

    console.log('Setting S3 credentials...')
    await db.all(`
      SET s3_region='${AWS_S3_REGION}';
      SET s3_access_key_id='${AWS_S3_ACCESS_KEY}';
      SET s3_secret_access_key='${AWS_S3_SECRET_KEY}';
    `)

    // Test S3 access
    console.log('Testing S3 access...')
    try {
      const testResult = await db.all(`
        SELECT * FROM parquet_metadata('s3://${S3_LAKE_BUCKET_NAME}/${key}');
      `)
      console.log('S3 access test result successfully loaded:')
    } catch (s3Error) {
      console.error('Error testing S3 access:', s3Error)
      throw s3Error // Rethrow the error to stop execution
    }

    // Try to read file info without actually reading the file
    console.log('Checking file info...')
    try {
      const fileInfo = await db.all(`
        SELECT * FROM parquet_scan('s3://${S3_LAKE_BUCKET_NAME}/${key}') LIMIT 0;
      `)
      console.log('File info loaded')
    } catch (fileError) {
      console.error('Error checking file info:', fileError)
    }

    // If everything above works, try creating the table
    console.log('Creating table...')
    await db.all(
      `CREATE TABLE ${tableName} AS SELECT * FROM parquet_scan('s3://${S3_LAKE_BUCKET_NAME}/${key}');`,
    )

    console.log('Table created successfully')

    // Execute the query
    const result = db.all(sqlStatement)

    // Close the database connection
    db.close()

    // Send the result
    return result as unknown as Promise<{ [k: string]: any }[]>
  } catch (error) {
    console.error('Error:', error)
    return null
  }
}

@tobilg commented Sep 17, 2024

Have a look at my implementation at https://github.com/tobilg/serverless-duckdb/blob/main/src/lib/awsSecret.ts and at triggering https://github.com/tobilg/serverless-duckdb/blob/main/src/functions/queryS3Express.ts#L95 before any access to S3.

Hint: IMO you also need to pass the SESSION_TOKEN, and possibly the ENDPOINT as well if you're using S3 Express One Zone.

I'm wondering why you're seeing a 400 status (invalid request) and not a 403, though.
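
In terms of the snippet above, the session-token hint would amount to something like this sketch (the environment variable names are assumptions; s3_session_token and s3_endpoint are standard httpfs settings):

// Sketch: forward temporary credentials, including the session token, to DuckDB.
// Drop-in for the "Setting S3 credentials" step in retrieveDataFromParquet above;
// the env var names are assumptions, use whatever your runtime actually provides.
await db.all(`
  SET s3_region='${process.env.AWS_S3_REGION}';
  SET s3_access_key_id='${process.env.AWS_S3_ACCESS_KEY}';
  SET s3_secret_access_key='${process.env.AWS_S3_SECRET_KEY}';
  SET s3_session_token='${process.env.AWS_S3_SESSION_TOKEN}';
`)
// For S3 Express One Zone you would additionally SET s3_endpoint to the zone-specific endpoint.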

@tobilg commented Sep 17, 2024

> @michaelwallabi Thank you for the tip - replacing the binary at build/deploy time was by far the most ergonomic solution (and the only one that I was able to get to work for my project). I want to extend my sincere appreciation to @tobilg for the effort that enabled it in the first place as well.

Thank you, appreciate the feedback!

> Ideally, running DuckDB in a Lambda should be easy out of the box, as it is a great use case, so I look forward to future releases that don't require hacks/workarounds.

This is honestly not a "fault" of DuckDB, but of AWS using very outdated GLIBC versions in all Node runtimes before Node 20 (see https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html#runtimes-supported). Node 20 uses AL2023, which has an updated GLIBC that should work with the normal duckdb-node package as well, AFAIK.

@iku000888

> This is honestly not a "fault" of DuckDB, but of AWS using very outdated GLIBC versions in all Node runtimes before Node 20 (see https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html#runtimes-supported). Node 20 uses AL2023, which has an updated GLIBC that should work with the normal duckdb-node package as well, AFAIK.

Oh hm, that is interesting. I thought I was running my lambdas on Node 20 and was still getting ELF errors, so either AL2023 still has issues or I'm not actually on Node 20 🤔
