struct.error: required argument is not an integer #37

Open
SamuelMarks opened this issue Aug 23, 2023 · 1 comment

@SamuelMarks
Contributor

I'm using your library to batch-insert data from Parquet files into PostgreSQL. The test data is prepared like so:

from datetime import datetime

import numpy as np
import pyarrow as pa

schema = pa.schema(
    [
        pa.field("timestamp_col", pa.timestamp("ms", tz="UTC")),
        pa.field("json_col", pa.struct([("can", pa.string())])),
        pa.field("array_str_col", pa.list_(pa.string())),
        pa.field("array_bigint_col", pa.list_(pa.int64())),
        pa.field(
            "array_json_col",
            pa.list_(pa.struct([("foo", pa.string()), ("can", pa.string())])),
        ),
    ]
)
row = pa.Table.from_pydict(
    {
        "timestamp_col": [datetime.utcnow().timestamp()],
        "json_col": [{"can": "haz"}],
        "array_str_col": [["Flamingo", "Centipede"]],
        "array_bigint_col": [np.arange(6, dtype=np.int64)],
        "array_json_col": [[{"foo": "bar"}, {"can": "haz"}]],
    },
    schema=schema,
)

The insert fails with this error:

Error
Traceback (most recent call last):
  File "lib/python3.11/site-packages/pgcopy/copy.py", line 210, in f
    return formatter(v)
           ^^^^^^^^^^^^
  File "lib/python3.11/site-packages/pgcopy/copy.py", line 179, in <lambda>
    return lambda v: array_formatter(att.typelem, formatter, v)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lib/python3.11/site-packages/pgcopy/copy.py", line 162, in array_formatter
    return str_formatter(struct.pack(''.join(fmt), *data))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: required argument is not an integer

The above exception was the direct cause of the following exception:
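
For context, the last frame is a plain `struct.pack` call, and the standard-library `struct` module raises this exact message whenever an integer format code receives a value that is not an `int` and does not implement `__index__`. A minimal sketch, independent of pgcopy (the `try_pack` helper, the `">q"` format string, and the values are illustrative assumptions, not what pgcopy builds internally):

```python
import struct


def try_pack(fmt, *values):
    """Return the packed bytes, or the struct.error message on failure."""
    try:
        return struct.pack(fmt, *values)
    except struct.error as exc:
        return str(exc)


# A plain int packs fine with the big-endian 64-bit integer code "q".
ok = try_pack(">q", 5)

# A float, a str, or None in an integer slot all fail the same way.
float_result = try_pack(">q", 5.0)
str_result = try_pack(">q", "5")
none_result = try_pack(">q", None)
```

So some value reaching an integer slot in `array_formatter` is presumably not an integer, though the traceback alone does not say which column or element it is.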

For completeness, here are the related functions:

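import inspect
from collections import deque
from operator import methodcaller
from os import environ

import pyarrow.parquet as pq
from pgcopy import CopyManager
from pyarrow.parquet import ParquetFile
from sqlalchemy import create_engine

pq.write_table(row, parquet_filepath)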
parquet_to_table(parquet_filepath, table_name=inspect.stack()[0].function)

def parquet_to_table(filename, table_name=None, database_uri=None, dry_run=False):
    """
    Parquet file to an executable insertion `COPY FROM` into a table

    :param filename: Path to a Parquet file
    :type filename: ```str```

    :param table_name: Table name to use, else use penultimate underscore surrounding word from filename basename
    :type table_name: ```Optional[str]```

    :param database_uri: Database connection string. Defaults to `RDBMS_URI` in your env vars.
    :type database_uri: ```Optional[str]```
    """
    parquet_file = ParquetFile(filename)
    engine = create_engine(
        environ["RDBMS_URI"] if database_uri is None else database_uri
    )

    deque(
        map(
            lambda df: df.to_sql(
                table_name,
                con=engine,
                if_exists="append",
                method=psql_insert_copy,
                index=False,
            ),
            map(methodcaller("to_pandas"), parquet_file.iter_batches()),
        ),
        maxlen=0,
    )

def psql_insert_copy(table, conn, keys, data_iter):
    """
    Execute SQL statement inserting data

    Parameters
    ----------
    table : pandas.io.sql.SQLTable
    conn : sqlalchemy.engine.Engine or sqlalchemy.engine.Connection
    keys : list of str
        Column names
    data_iter : Iterable that iterates the values to be inserted
    """

    columns = keys[1:] if keys and keys[0] == "index" else keys

    mgr = CopyManager(conn.connection, table.name, columns)
    mgr.copy(map(lambda line: map(parse_col, line), data_iter))
    conn.connection.commit()
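
One way to narrow this down might be to scan the cell values of a suspect integer-array column before they reach `CopyManager.copy`. The sketch below is a hypothetical debugging helper (`find_non_integers` is not part of pgcopy or pandas) that walks nested lists and reports every leaf failing the same `__index__` check that `struct` applies to integer codes. It assumes plain Python lists; a NumPy array would be reported as a single leaf unless converted with `.tolist()` first.

```python
import operator


def find_non_integers(values, path=()):
    """Yield (path, value) for each leaf that cannot be used as an integer.

    Nested lists/tuples are walked recursively; `path` records the indexes.
    """
    for i, value in enumerate(values):
        here = path + (i,)
        if isinstance(value, (list, tuple)):
            yield from find_non_integers(value, here)
            continue
        try:
            operator.index(value)  # protocol struct relies on for integer codes
        except TypeError:
            yield here, value


# Hypothetical cell values for a bigint[] column, not taken from the issue.
bad = list(find_non_integers([0, 1, 2.0, None, [3, "4"]]))
```

Here `bad` lists the float, the `None`, and the nested string along with their index paths, which is the kind of value that would trigger the error above.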

@altaurog
Owner

Unfortunately, I don’t have much experience with Parquet or with df.to_sql. I can try to find some time to look at this, but it likely won’t be this week.
