Improve error messages for Redshift load errors #53

Closed
wants to merge 7 commits into from


JoshRosen
Contributor

This patch improves our error reporting for Redshift LOAD errors. When a load error occurs, we will now try to automatically fetch more detailed error information from Redshift's STL_LOAD_ERRORS table.

As an example of the improved error messages:

Old:

java.sql.SQLException: [Amazon](500310) Invalid operation: Load into table 'error_message_when_string_too_long_3596907251636891354' failed.  Check 'stl_load_errors' system table for details.;

New:

java.sql.SQLException: Error #1204 while loading data into Redshift: "String length exceeds DDL length".
Table name: the_table_name
Column name: a
Column type: varchar(256)
Raw line: [...]
Raw field value: [...]
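
The lookup described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the code from this patch: the query text, the use of `pg_last_copy_id()`, and the `format_load_error` helper are all hypothetical, and the sample row is made-up data. The column names (`err_code`, `err_reason`, `colname`, `type`, `col_length`, `raw_line`, `raw_field_value`) come from Redshift's documented `STL_LOAD_ERRORS` schema.

```python
# Hypothetical query: fetch the error row for the most recent COPY in this session.
LOAD_ERROR_QUERY = """
SELECT err_code, err_reason, colname, type, col_length, raw_line, raw_field_value
FROM stl_load_errors
WHERE query = pg_last_copy_id()
ORDER BY starttime DESC
LIMIT 1
"""


def format_load_error(row, table_name):
    """Build a readable message from one STL_LOAD_ERRORS row given as a dict."""

    def clean(value):
        # Several of these columns are fixed-width and padded with spaces.
        return str(value).strip()

    col_type = clean(row["type"])
    col_length = clean(row["col_length"])
    if col_length:
        col_type = f"{col_type}({col_length})"

    return "\n".join([
        f'Error #{clean(row["err_code"])} while loading data into Redshift: '
        f'"{clean(row["err_reason"])}".',
        f"Table name: {table_name}",
        f"Column name: {clean(row['colname'])}",
        f"Column type: {col_type}",
        f"Raw line: {clean(row['raw_line'])}",
        f"Raw field value: {clean(row['raw_field_value'])}",
    ])


# Made-up sample row, shaped like a result of LOAD_ERROR_QUERY.
sample_row = {
    "err_code": 1204,
    "err_reason": "String length exceeds DDL length   ",
    "colname": "a   ",
    "type": "varchar   ",
    "col_length": "256       ",
    "raw_line": "xxxxxxxx",
    "raw_field_value": "xxxxxxxx",
}
print(format_load_error(sample_row, "the_table_name"))
```

In a real client the row would come from running the query over the same JDBC or database connection that issued the failed load, and the original exception would be kept as a fallback if the lookup itself fails.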

@JoshRosen JoshRosen added this to the 0.5 milestone Aug 26, 2015
@JoshRosen JoshRosen mentioned this pull request Aug 26, 2015
@codecov-io

Current coverage is 87.22%

Merging #53 into master will increase coverage by +0.87% as of 2840223

@@            master     #53   diff @@
======================================
  Files           10      10       
  Stmts          337     368    +31
  Branches        79      87     +8
  Methods          0       0       
======================================
+ Hit            291     321    +30
  Partial          0       0       
- Missed          46      47     +1


@JoshRosen
Contributor Author

Hmm, I guess this should also include the destination table name in the error message.

@jaley
Contributor

jaley commented Aug 26, 2015

This is awesome, thanks for adding this, much sanity saved here :)

I don't recall exactly what the default permissions are for the load errors table. If new users need an explicit grant before they can query it, we might want to add a note to the docs telling users to run that grant to enable the better error messages.

It's possible that Redshift actually does something magic where you can only read rows added by loads from your own user account, in which case it should just work I guess.

@emlyn
Contributor

emlyn commented Aug 26, 2015

Nice! I believe anyone can read from stl_load_errors and Redshift will only return rows that relate to the current user (at least that's what I've seen when querying it).

@marmbrus
Contributor

LGTM

@JoshRosen JoshRosen closed this in 9f19e1c Aug 26, 2015
@JoshRosen JoshRosen deleted the load-error-reporting branch August 26, 2015 19:29
JoshRosen added a commit that referenced this pull request Aug 27, 2015
This patch allows users to specify a `maxlength` column metadata entry for string columns in order to control the width of `VARCHAR` columns in generated Redshift table schemas. This is necessary in order to support string columns that are wider than 256 characters. In addition, this configuration can be used as an optimization to save space in Redshift. For more background on the motivation for this feature, see #29.

See also: #53 to improve error reporting when LOAD fails.

Author: Josh Rosen

Closes #54 from JoshRosen/max-length.
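
As a rough illustration of the `maxlength` idea in the commit message above, the mapping from column metadata to a `VARCHAR` width might look like the sketch below. This is Python used purely for illustration, not the project's code: the `varchar_type` helper and the metadata-as-dict shape are hypothetical, and the default of 256 is an assumption inferred from the "wider than 256 characters" remark.

```python
DEFAULT_VARCHAR_LENGTH = 256  # assumed default, inferred from the commit message


def varchar_type(column_metadata):
    """Return the VARCHAR type for a string column, given its metadata dict."""
    max_length = column_metadata.get("maxlength", DEFAULT_VARCHAR_LENGTH)
    # Reject anything that is not a positive integer (bool is excluded explicitly
    # because it is a subclass of int in Python).
    if isinstance(max_length, bool) or not isinstance(max_length, int) or max_length <= 0:
        raise ValueError(f"maxlength must be a positive integer, got {max_length!r}")
    return f"VARCHAR({max_length})"


print(varchar_type({}))                   # no metadata: falls back to the default
print(varchar_type({"maxlength": 1024}))  # wider column requested via metadata
```

Setting `maxlength` below the default is how the space-saving case mentioned above would be expressed in this sketch.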