chore: update pgx to v4 #820
Pull Request Test Coverage Report for Build 2ad6bcf46-PR-820
💛 - Coveralls
Well, this is incredible! I will get a review done this week. Thank you for the contribution!!!
Did a quick visual scan through the code. Nothing major jumps out at me.
Thank you for this!
LGTM
This looks really good! I have done a pass on the code and left some minor comments inline. I will work on some local QA next.
👏
@iwpnd can you rebase the …
How do you do local QA, @ARolek? Maybe it's something I can help you with?
@iwpnd apologies for the slow response here. I had to evacuate my family from the Marshall Fire. Our home survived, but over 1000 homes are gone. Truly a sad situation. For local QA, I'm planning on spinning up Postgres 13 and Postgres 14 with some datasets to make sure everything works as expected. Did you do any of this type of testing locally?
Don't apologize for that. Glad to hear you and your family are okay.
Spun up a pg13/PostGIS 3.1 instance with some of our data and it worked. I wouldn't call it testing yet, though: no TLS connection or anything involved. When I'm back from vacation, I plan to fork an image with the change and run it on our staging cluster with some more live data.
Sounds good! I will report my findings as well once I get back into the swing of things. I appreciate your contributions very much. Enjoy your trip!
I have successfully tested this PR against Postgres 14 using the Natural Earth dataset. Looking good! I still need to test the TLS support, but no issues encountered thus far.
I'll be forking the image today and will try to connect to our pg13.4/PostGIS 3.1 cluster using TLS. I'll keep you posted here. Update: no issues with the TLS connection itself, but I'm getting quite a lot of these:
@iwpnd I just saw your update, and doing some additional testing I'm hitting the same issue. I found the fix. We need to change our error statements from:

    return fmt.Errorf("error replacing layer tokens for layer (%v) SQL (%v): %v", layer, sql, err)

to:

    return fmt.Errorf("error replacing layer tokens for layer (%v) SQL (%v): %w", layer, sql, err)

Note the change of the final %v to %w. With the error wrapped, a check using errors.Is can detect the cancellation:

    if err != nil {
        switch {
        case errors.Is(err, context.Canceled):
            // Do nothing if we were cancelled.
        default:
            z, x, y := tile.ZXY()
            // TODO (arolek): should we return an error to the response or just log the error?
            // we can't just write to the response as the waitgroup is going to write to the response as well
            log.Printf("err fetching tile (z: %v, x: %v, y: %v) features: %v", z, x, y, err)
        }
        return
    }

Can you please update the errors in the …
While this is not the error that is raised on my end, I see what you mean and will address it. My error occurs in this block:

    if err != nil {
        switch err {
        case context.Canceled:
            // TODO: add debug logs
            return
        default:
            errMsg := fmt.Sprintf("error marshalling tile: %v", err)
            log.Error(errMsg)
            http.Error(w, errMsg, http.StatusInternalServerError)
            return
        }
    }

which should actually be:

    if err != nil {
        switch {
        case errors.Is(err, context.Canceled):
            // TODO: add debug logs
            return
        default:
            errMsg := fmt.Sprintf("error marshalling tile: %v", err)
            log.Error(errMsg)
            http.Error(w, errMsg, http.StatusInternalServerError)
            return
        }
    }

right? This solved the issue.

Update 1: Now I'm left with the occasional:
:|
Update 2:
for those cases
Update 3: Executing a query by not passing the

    rows, err := p.pool.Query(context.Background(), sql)
Can you log the error type so we can see what pgx is returning?

    log.Printf("error is of type %T, err: %v", err, err)

We might just need to add an additional check if pgx is not wrapping the context.Canceled error correctly.
Hey @ARolek, sorry for the late reply; I'm back on the grind after vacation.
Looking at this, you may need to unwrap the error a bit more. It looks like the …
I also found this related issue: jackc/pgx#933. I think what @gdey is saying might be correct: we're going to need to unwrap the errors from the …
I made a local update to wrap the errors in the messages with
"Operation was canceled" does not seem to be a
That does not happen on startup, but I believe that as the pool allocates more connections, this error shows up for each new connection added to the pool. I'm still investigating but want to share my notes.
Apparently we need to check
Regarding the TLS connection: v4 made a major change to how fallback connections and TLS are handled. As I noted in my initial comment, I only skip the verification in most cases.
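For reference, a rough sketch of the kind of sslmode-to-TLS mapping discussed here, where most modes encrypt but skip certificate verification. tlsConfigFor is hypothetical and only loosely modelled on libpq's sslmode values; it is not pgx's implementation (in v4, pgconn derives the TLS config and fallback configs from the connection string), and it does not model the plaintext fallback of allow/prefer or the CA-only check of verify-ca.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// tlsConfigFor is a hypothetical helper mapping an sslmode value to a
// *tls.Config. A nil config means "connect without TLS".
func tlsConfigFor(sslmode, host string) (*tls.Config, error) {
	switch sslmode {
	case "disable":
		return nil, nil // plain TCP, no TLS
	case "allow", "prefer", "require":
		// Encrypt the connection but skip certificate verification.
		return &tls.Config{InsecureSkipVerify: true}, nil
	case "verify-full":
		// Verify the certificate chain and that it matches host.
		return &tls.Config{ServerName: host}, nil
	default:
		return nil, fmt.Errorf("sslmode %q not handled in this sketch", sslmode)
	}
}

func main() {
	for _, mode := range []string{"disable", "require", "verify-full", "verify-ca"} {
		cfg, err := tlsConfigFor(mode, "db.example.com")
		fmt.Printf("%-12s tls=%v err=%v\n", mode, cfg != nil, err)
	}
}
```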
I have reviewed the code and we are doing this already. It's usually the last statement in the method:

    return rows.Err()
Good morning @ARolek! I took a look at your changes, and they were not sufficient to quiet the errors. Both the connection error and the context canceled error were spammed continuously:
I took your advice and did two things. First, I made sure that I handle the error in … Why exactly is the context canceled here anyway, and what does it entail?
Good question. Context cancellation is really important in this codebase, as it accounts for the situation where users pan the map before a tile has been fully fetched, encoded, and returned to the client. Without proper context handling, we keep processing requests that are no longer needed.
case strings.Contains(err.Error(), "operation was canceled"):
    // do nothing
We are trying to get away from this if possible.
@gdey yeah, I know, but we can't in this instance. I left a code comment about this situation in map.go:

    // the underlying net.Dial function is not properly reporting
    // context.Canceled errors. Because of this, a string check on the error is performed.
    // there's an open issue for this and it appears it will be fixed eventually
    // but for now we have this check to avoid unnecessary logs
    // https://github.com/golang/go/issues/36208
    case strings.Contains(err.Error(), "operation was canceled"):
// is not constant, so we look up the OID once per provider and store it.
// Extensions have to be registered for every new connection.

if !hstore.hasInit {
Nice! I like this hasInit check.
pat yourself on the back for that, it was your idea 😄
@@ -201,38 +207,43 @@ func BuildDBConfig(cs string) (*pgxpool.Config, error) {
"application_name": "tegola",
}

var hasInit bool
var hstore hstoreOID
If we persist this value outside of the AfterConnect call, then it will only be set on the first connection. If I understand you correctly, the hstore registration needs to happen for every connection. If that's the case, then we should move this var to inside the AfterConnect function.
I'm not sure I understand your concern.
The idea is that I only query the database once (for the first connection) for the OID of hstore, and save the OID value per provider in the hstore var.
On subsequent connections the query is skipped, but the datatype is still registered with the OID stored in the hstore var.
If I were to add a log line inside the if !hstore.hasInit block, I'd only see it once, yet registering the datatype happens over and over again.
LGTM
Updating Postgres driver to pgx v4
* Refactored providers/postgis to use the pgx v4 client. Support for Postgres versions > 12 is now possible.
* provider/postgis: Properly wrap errors in messages by moving from %v to %w when returning errors in messages.
* Added an error string check for context.Canceled. The underlying net.Dial function is not properly reporting context.Canceled errors. Because of this, a string check on the error is performed. There's an open issue for this, and it appears it will be fixed eventually, but for now we have this check to avoid unnecessary logs. Related issue: golang/go#36208
* Added a ctxErr() check, which checks whether the supplied context has an error (i.e. context canceled) and, if so, returns that error; otherwise it returns the supplied error. This is useful because not all of Go's stdlib has adopted error wrapping, so context.Canceled errors are not always easy to capture.

closes #748
Hey, as discussed I took a look at the pgx upgrade you wanted to do. It involved some refactoring as you expected.
Some things that took me a while were:
* pgx, pgtype and pgproto3 are now separate packages
* pgxpool is instantiated with a connection string. For now I'm creating the string from the config parameters to allow backward compatibility. This now lends itself to allowing a connection string in the config.toml.
* ConfigTLS(), as most of the options are gone now, and fallback connections are handled differently