Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Allow passing MetaData (or meta kwargs) to high-level SQL functions #7441

Open
aldanor opened this issue Jun 12, 2014 · 16 comments
Open
Labels
Enhancement IO SQL to_sql, read_sql, read_sql_query

Comments

@aldanor
Copy link
Contributor

aldanor commented Jun 12, 2014

Use case: I want to query an Oracle table but I'm not the owner, so meta.reflect(engine) would do nothing and always return an empty list of tables rendering read_sql_table useless. This works: meta.reflect(engine, schema='the_real_owner'), however there's no way of passing the metadata / schema to the high-level pandas.io.sql functions. One may also want to use a meta.reflect(engine, only=[...]) or meta.reflect(engine, oracle_resolve_synonyms=True) or any other dialect-specific argument.

PandasSQLAlchemy class already support passing meta in the constructor, so it's just a matter of a adding an extra keyword argument to the three io.sql.read_sql* functions. Or maybe allow passing through arbitrary kwargs to meta's reflect method, like read_sql_table(table, engine, reflect=dict(schema='my_schema', oracle_resolve_synonyms=True)) -- ugly but functional.

@aldanor aldanor changed the title [ENH] Allow passing MetaData to read_sql, read_sql_table, read_sql_query [ENH] Allow passing MetaData (or MetaData kwargs) to high-level pandas.io.sql functions Jun 12, 2014
@aldanor aldanor changed the title [ENH] Allow passing MetaData (or MetaData kwargs) to high-level pandas.io.sql functions [ENH] Allow passing MetaData (or kwargs) to high-level pandas.io.sql functions Jun 12, 2014
@aldanor aldanor changed the title [ENH] Allow passing MetaData (or kwargs) to high-level pandas.io.sql functions [ENH] Allow passing MetaData (or meta kwargs) to high-level pandas.io.sql functions Jun 12, 2014
@jorisvandenbossche jorisvandenbossche changed the title [ENH] Allow passing MetaData (or meta kwargs) to high-level pandas.io.sql functions [ENH] Allow passing MetaData (or meta kwargs) to high-level SQL functions Jun 12, 2014
@jorisvandenbossche
Copy link
Member

@aldanor on the only keyword you mentioned, this is already filed as #7396.

For passing a MetaData object to the read_sql_query/table functions, this was implemented partially in the past, but removed from the functional API (see discussion here: #6300 (comment) and commit ad97d27) to keep that as simple as possible for a first release (easier to add than to remove keywords). In any case, it is available in the OO API with PandasSQLAlchemy.

So basically, we just have to decide if we want to provide this functionality in both (read_* functions and object-interface), or only via the object (so that when you need this functionality, you should use the OO API).

@aldanor
Copy link
Contributor Author

aldanor commented Jun 12, 2014

@jorisvandenbossche It may be just me, but I think it might make sense to allow the metadata object to be passed to the higher-level functions (or a set of arbitrary kwargs for reflect since there may be dialect-specific options) for those functions to be useful in setups more involved than the most basic ones.

In some environments (e.g. a typical corporate Oracle setup), you would absolutely have to pass the schema argument for reflect to work. And you may want to pass only in case your schema hosts ten thousand tables.

@jorisvandenbossche
Copy link
Member

What would be the most common things you would want to adapt in the meta object? Of course the schema, but are there also other things? Can you give some examples?

Just asking as if the schema argument would be used a lot, I think we should provide it as a real keyword and not only through passing a MetaData object (most users should not start adapting a MetaData object themselves, so if a lot of users need the schema argument, we should provide it seperately of meta=..).

@mangecoeur @danielballan @hayd

@jorisvandenbossche jorisvandenbossche added this to the 0.14.1 milestone Jun 13, 2014
@aldanor
Copy link
Contributor Author

aldanor commented Jun 13, 2014

@jorisvandenbossche Do we consider the only argument being a special case here (as in, don't query the entire schema in read_sql functions)? As for the schema being a separate keyword, should the users be able to pass meta as well, so the function signature would contain schema=None, meta=None? If that doesn't clutter the API too much, that'd be nice.

As for myself, I sometimes have to use oracle_resolve_synonyms=True in MetaData#reflect or Table#__init__ when working with dblinks. I guess there exist a bunch of other useful dialect-specific options that I'm not aware of.

@jorisvandenbossche
Copy link
Member

As I said above, the only keyword is already discussed in #7396, and it will be used by default (reflect only the database you are reading, and none for read_sql_query) so no seperate keyword needed I think? If you have further comments or remarks on that, you can comment there.

For the API, that seems a good option. Another one would be to provide a way to pass options directly to reflect with something like reflect_kwds={...}, but that seems a bit more ugly (and limits it to reflect).
Passing things to Table I think is too advanced for read_sql, but maybe we can see how we can do this in the OO API.

Would you be interested to do a PR to add this?

@aldanor
Copy link
Contributor Author

aldanor commented Jun 13, 2014

I'm by no means sqlalchemy expert, so if someone more skilled in this would do it, it would be a better idea :) Elsewise I can give it a try on the weekend, will have to figure out how to set up test environment etc for the sql test suite then.

Re: API, schema and meta keywords look fine, what about the only keyword when you pass the meta in? In this case the user is fully responsible for reflecting whatever he needs, right?

@aldanor
Copy link
Contributor Author

aldanor commented Jun 14, 2014

@jorisvandenbossche Would someone help me out in setting up a test environment so that I can run all tests in test_sql.py without skipping? I mean usernames, passwords, databases, schemas for both mysql and postgresql. It's quite confusing by just looking at the code and it may make sense to put intitialization scripts somewhere in the test suite to make the whole thing more accessible to other developers (kind of like this: https://bitbucket.org/anthony_tuininga/cx_oracle/src/dcc80926aa86d524f36500a27a658d3b4c14cbb1/test/SetupTest.sql?at=default).

@jreback
Copy link
Contributor

jreback commented Jun 14, 2014

@aldanor that would close #7091

@jorisvandenbossche
Copy link
Member

An overview:

  • sqlite: built-in and using in memory database -> nothing to set-up
  • mysql: root@localhost/pandas_nosetest -> database 'pandas_nosetest' with username 'root' and blank password, on host 'localhost'
  • postgresql: postgres@localhost/pandas_nosetest -> database 'pandas_nosetest' with username 'postgres' and blank password, on host 'localhost'

So no special schema or something, I think for both default values when you would create such a database locally.
They are created here https://github.com/pydata/pandas/blob/master/.travis.yml#L99 with mysql -e 'create database pandas_nosetest;'

If you want tro try to set-up something like #7091 or what you linked to, certainly try! If we support specifying schema's, will will also have to test this, so the database set-up will have to be more advanced.

@jorisvandenbossche
Copy link
Member

@aldanor Did it work out to look at this?

@jreback
Copy link
Contributor

jreback commented Jul 6, 2014

@jorisvandenbossche status?

@jorisvandenbossche
Copy link
Member

I have unfortunately no time this week to do it myself, so pushing to 0.15

@jorisvandenbossche
Copy link
Member

@jorisvandenbossche
Copy link
Member

@aldanor I started implementing this, see #7952. For now only a schema support. Could you try this out to see if this solves your basic problems reported here?

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.15.1, 0.15.0 Oct 2, 2014
@jorisvandenbossche
Copy link
Member

This is partly implemented (schema kwarg, possibility to pass meta to SQLDatabase), but pushing to 0.15.1 to see if we want to provide more.

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@Homesteady
Copy link

I'm resurrecting an ancient issue here, but is there any interest in including schema as an optional kwarg in read_sql? The read_sql wrapper is a convenient one, but the inability to pass a schema value to be used by read_sql_table limits the usefulness of this method in practice. I have a WIP PR adding this in, but it looks like this has been debated already...?

Anyways, it's an addition that I would find very useful, but I am new to contributing to Pandas, so someone feel free to please set me straight here. 🙏

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

No branches or pull requests

5 participants