API: user api for new sql functionality #6300
I don't recall if we test deprecations in general (e.g. call the function in an
This is, I think, easy to agree on, to be consistent with the other io modules:
But
@hayd @mangecoeur @jreback @danielballan @y-p Opinions?
it seems that you can use read_table if you can reflect the table definition well. it seems that pandas should then try to use read_table if possible (e.g. try to reflect the table), then fall back to read_sql? why is this a problem or not possible? it is really confusing to have 2 functions which basically do the same thing depending on an implementation detail
@jreback I addressed some of the reasons here, and some more reasons below. I don't see this as confusing at all, and the functionality is distinct - you have low level SQL querying and higher level "table selection" apis.

On the other hand, if you issue a read_sql, you will be using an SQL query that may not return values stored in a db TABLE - e.g. you might do something like

If you used read_table in the case you only wanted one table (and we'll see below why that's not straightforward) you would suddenly find you had different type casting rules, because instead of the rules that apply to read_sql (limited type coercion) you get the ones that apply to read_table. You should never create that sort of unexpected behaviour.

In any case, how would you know if you were only selecting one table? You would have to parse the SQL string to find out that "SELECT * FROM table_name" (and its functionally equivalent alternatives, of which there are a few) meant you want everything from table_name. Which means you need to introduce an SQL parser and to deal with the differences between SQL dialects, which completely defeats the point of using SQLAlchemy.

Next, there are keyword arguments which have meaning for read_table but not for read_sql, and vice versa. E.g. "flavor" is meaningless for read_table, "columns" is likewise for read_sql. If you tried to merge the functions these options would have a different effect depending on the value of the input arg - that to me is really bad API design; it's as if the accelerator pedal of your car turned into the brake pedal depending on whether you are wearing a hat. It's just nuts!

Having 2 functions which represent different levels of abstraction from raw SQL completely eliminates these problems. It leaves lots of room to add interesting functionality to read_table and to take advantage of the higher level of abstraction offered by SQLAlchemy's expression language. All we need to do is figure out nice API names.
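The first point can be illustrated with a small stdlib-only sketch (the table, columns, and query are made up for illustration): a read_table-style call maps to a stored table whose declared types can be reflected, while a read_sql-style query can return computed values that exist in no table at all.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (qty INTEGER, unit_price REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(2, 9.5), (3, 4.0)])

# Table-style access: the declared column types can be reflected,
# so dtype decisions can be based on the schema.
cols = [(name, decl_type) for _, name, decl_type, *_ in
        conn.execute("PRAGMA table_info(orders)")]

# Query-style access: the result column 'total' is computed on the fly
# and stored in no table, so there is no schema to reflect from.
rows = conn.execute("SELECT qty * unit_price AS total FROM orders").fetchall()
```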
If a top level function is required (though IMHO it's not that important) I think
@mangecoeur ok...I hear ya...so to summarize, the big question here is whether to expose the currently named
Since we are breaking.....why don't we use the names you suggest
I actually think the top-level names ARE important as most users will simply use those by default. In general you don't get too many diving into the modules (nor should they necessarily). If these are both useful, then they can both be top-level.
I had the same thought: separate read_sql_query and read_sql_table (I think this will be used more). @jreback Isn't it the other way around? read_sql_table takes a table_name and infers dtypes based on the dtypes of the sql table. read_sql_query takes arbitrary sql and has a go at inferring? I still think there is a case for read_sql to do
Definitely the functionality of read_sql_table should be top-level.
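A minimal sketch of that inference difference, assuming pandas and the stdlib sqlite3 driver (the table here is made up): a query-based read only sees the returned values, so a NULL in a declared INTEGER column comes back inferred as float64 with NaN, whereas a table-based read could have used the declared type.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,)])

# read_sql_query infers dtypes from the result set: the NULL forces the
# integer column to float64 (NaN), even though the table declares INTEGER.
df = pd.read_sql_query("SELECT a FROM t", conn)
```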
@hayd oh...I see I did put them 'backwards'. Ok if we go with seems minimum pain
@mangecoeur About the top level function: that's the design choice made earlier in pandas, and I think it is important to be consistent on this: the main read_.. function in top level namespace. Indeed a good summary: there are two cases for reading sql data:
And due to the potential differences between the results of both methods, it is indeed important to clearly identify those two cases in the docs (whether as two cases in one function, or two cases in two functions). For me it is ok to go with
I think that I am referring to these top-level functions reversed, so ignore me!
Something else, @mangecoeur, is it also the intention that users could use one of the objects? (meaning: does this have an added value, so that it would be useful to mention it in the docs) Because now, in the docs there is a mention of
@jorisvandenbossche well spotted, that should be updated to read PandasSQLAlchemy, and maybe also to include PandasSQLTable. But the OO API probably still needs work anyway. It's very useful if you need more fine control for certain use cases (I use it for debugging to check the create table statements before doing an insert, for example)
@mangecoeur you might want to put a small explanation at the top of the sql.py file for these 'internal' classes. Clearly they will not be externally exposed, but just like anything else, I am sure someone will find a use for them!
@jreback Not sure what you mean by "externally exposed". They should remain accessible through pandas.io.sql for anyone who needs them (that includes me). The whole module probably needs more docstrings in general, the classes in particular, but they might change a little still so I guess I would wait until most of the bugs are ironed out.
@mangecoeur I just mean they aren't in the public API (which they are not now)
@jreback The original intention was for one to be in the public api. Just like you create an HDF5Store and query it, you'd create a PandasSQL (perhaps should rename to SQLStore or something?) and then do read_sql and write_sql against this. IMO you want to have a single constructor (in the global API) which you can pass either a conn or engine to, since these can be reused.
See also the discussion in a previous issue on this topic, around #4163 (comment)
@hayd ok...that makes sense then...sure....I like
@jreback would that be SQLStore to refer to what is now the PandasSqlEngine (for access to tables and queries of tables in a DB) or to the PandasSqlTable (that handles mapping between a single DataFrame and db table)?
@mangecoeur I think that would be caveat: I haven't really looked at this in detail, though. Idea being this would be the first thing a user would use if they, say, want to open a db then interactively work on it (as opposed to using the
OK, we have to come to a decision here. Options:
Other options? Or combinations?
There's a tweak to option 4 that I rate: have a top level read_sql which does both, that is still called
It could take an optional kwarg if you wanted to be more explicit (like we do with regex kwarg in select):
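A rough sketch of how such an explicit kwarg could disambiguate — the kind argument, the fallback guess, and the returned tuples are hypothetical stand-ins for illustration, not the actual pandas API:

```python
def read_sql(sql, con, kind=None):
    """Hypothetical combined entry point (names are illustrative only)."""
    if kind == "table":
        return ("read_table", sql)   # stand-in for dispatching to sql.read_table
    if kind == "query":
        return ("read_query", sql)   # stand-in for dispatching to sql.read_sql
    # No explicit kind: guess. A bare identifier is probably a table name;
    # anything else (spaces, keywords, punctuation) is treated as a query.
    if sql.isidentifier():
        return ("read_table", sql)
    return ("read_query", sql)
```

With this shape, read_sql("users", con) would go down the table path, read_sql("SELECT * FROM users", con) down the query path, and an explicit kind="query" would override the guess.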
I think a modified option 2 might be the best bet:
then
maybe we come back in a later version to allowing something along the lines of what @hayd is suggesting (where you can specify a
I don't think we need to deprecate, so maybe call this option 2a.
I like 2a. I like what @hayd is suggesting for read_sql. I agree, @jreback. If we implement 2a and do what @hayd suggests, all at once, nothing breaks.
@jreback problem with your idea is that So, we have a
is it always unambiguous what the meaning is, so that it can easily figure out whether to send to
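One way the dispatch can avoid parsing SQL altogether is to ask the database whether the string names an existing table — a sketch against sqlite's catalog (the function name here is made up; a real implementation would use something like the existing has_table):

```python
import sqlite3

def names_a_table(conn, sql):
    """Hypothetical check: True if `sql` is the name of an existing table."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (sql,),
    ).fetchone()
    return row is not None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER)")
```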
ok
The above has been implemented in the meantime in different PRs (at the moment I left both
Now there should be a last round about the OO interface. But I am not really familiar with that (not using it, so don't have a strong opinion on what it should look like) and I don't have time for that this week, so I propose we leave this for next release? Unless someone else wants to tackle this.
The
Yes, but the datetime support is not that fantastic in all databases, so that could be a reason to keep your dates as strings in a database (e.g. in sqlite you have to store them as strings). And also, even if there is a datetime type, I don't think everyone in the wild uses this. Actually, I would quite like to be able to use
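For example, with sqlite (stdlib only) datetime values come back as plain strings, and a parse_dates-style option amounts to applying a conversion like this after the query (the schema and format string are illustrative):

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (stamp TEXT)")  # sqlite has no datetime type
conn.executemany("INSERT INTO events VALUES (?)",
                 [("2014-04-09 12:00:00",), ("2014-04-10 08:30:00",)])

# The query returns strings; parsing to datetime is a post-processing step.
raw = [s for (s,) in conn.execute("SELECT stamp FROM events")]
parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in raw]
```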
yes, that would not be hard for
@danielballan @mangecoeur any comments on the
I've never used it and I don't completely understand its purpose in sqlalchemy, so I don't have an informed opinion.
well, that's my problem too ... :-) But I personally think that this would be a reason to use the OO API using the
That seems sensible to me. Without the input of an experienced user of "meta," future generations can't fault us for removing it. :-) I say go for it.
@mangecoeur I suppose you put it in; if you have any objections to removing it, please speak up (and we can always easily add it back in the future if it is wanted)
Bumped this to 0.14.1 for the remaining OO API part |
Removed it in jorisvandenbossche@c5c78d5 in #7120
@jorisvandenbossche @danielballan wait a sec! Sent my feedback via email but it doesn't seem to have shown up here. The reason for being able to specify meta is that way you can specify the "schema" argument for use with DBs that support schemas, such as Postgres: http://docs.sqlalchemy.org/en/rel_0_8/core/metadata.html#sqlalchemy.schema.MetaData We should make sure this is still possible; though since it's an advanced use case it doesn't need to be in the functional API, it should remain in the OO api. I can also imagine scenarios where you might want to manually mess with the Metadata object before passing it.
@mangecoeur Thanks for the feedback! I only removed it in the functional api, so it is still in the OO api. Now, we still have to clean up the OO api a bit.
- remove meta kwarg from read_sql_table (see discussion in pandas-dev#6300)
- remove flavor kwarg from read_sql (not necessary + not there in 0.13, so would have been API change)
- update docstring of to_sql in generic with latest changes
- enhance docstring of get_schema
The last! round is the OO API, but I created a new issue for that (this was becoming a bit too lengthy), so closing this. Further discussion in #7960.
Starting a new issue to discuss this. See merged PR #5950 and follow-up issue #6292.

Summary of situation:

Current (0.13):
- read_sql, read_frame, write_frame and DataFrame.to_sql
- uquery, tquery, has_table (+ read_sql)

New after #5950:
- read_sql and DataFrame.to_sql (but not yet used in the docs)
- read_sql, read_table, to_sql, has_table, execute
- read_frame, write_frame, uquery, tquery are deprecated

Points to discuss:
- read_sql and DataFrame.to_sql instead of functions from sql
- sql.read_table vs existing pd.read_table. Is this a problem?
- read_table and read_sql or not (see this comment of @mangecoeur to not do that: ENH: sql support via SQLAlchemy, with legacy fallback #5950 (comment))