
[BigQuery Data Transfer Service, all GAPIC clients]: provide a close() method and context manager to clean up client-owned resources #9457

Closed
yebrahim opened this issue Oct 11, 2019 · 15 comments
Labels: api: bigquerydatatransfer (Issues related to the BigQuery Data Transfer Service API.), api: core, type: feature request ('Nice-to-have' improvement, new feature or different behavior or design.)

Comments

@yebrahim

Environment details

pip list output:

cachetools                         3.1.1    
certifi                            2019.9.11
chardet                            3.0.4    
google-api-core                    1.14.3   
google-api-python-client           1.7.11   
google-auth                        1.6.3    
google-auth-httplib2               0.0.3    
google-cloud-bigquery              1.20.0   
google-cloud-bigquery-datatransfer 0.4.1    
google-cloud-core                  1.0.3    
google-cloud-pubsub                1.0.2    
google-resumable-media             0.4.1    
googleapis-common-protos           1.6.0    
grpc-google-iam-v1                 0.12.3   
grpcio                             1.24.1   
httplib2                           0.14.0   
idna                               2.8      
pip                                19.2.3   
protobuf                           3.10.0   
pyasn1                             0.4.7    
pyasn1-modules                     0.2.6    
pytz                               2019.3   
requests                           2.22.0   
rsa                                4.0      
setuptools                         41.2.0   
six                                1.12.0   
uritemplate                        3.0.0    
urllib3                            1.25.6   
wheel                              0.33.6   

OS: Ubuntu
Python 3.5.2
API: BigQuery Data Transfer Service

Steps to reproduce

Initialize more than one data transfer client.

Code example

from google.cloud import bigquery_datatransfer_v1
import time

while True:
  dts_client = bigquery_datatransfer_v1.DataTransferServiceClient()
  try:
    dts_client.get_transfer_run('some run id').state
  except Exception:
    print('error!')

  time.sleep(2)

Output of ll /proc/<pid>/fd after three seconds:
0 1 10 11 2 3 4 5 6 7 8 9
And keeps growing.

There's also no way to clean up the client as it doesn't implement __exit__.

@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Oct 12, 2019
@busunkim96 busunkim96 changed the title Initializing transfer service client leaks two file descriptors BigQuery Data Transfer: Initializing transfer service client leaks two file descriptors Oct 14, 2019
@busunkim96 busunkim96 added api: bigquerydatatransfer Issues related to the BigQuery Data Transfer Service API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed triage me I really want to be triaged. labels Oct 14, 2019
@tswast (Contributor) commented Oct 14, 2019

This client is 100% auto-generated via our gRPC toolchain. I'll try to summon some of our folks who work at those lower levels to investigate.

@busunkim96 (Contributor)

@yebrahim This is unrelated, but you only need to create the dts_client once.

from google.cloud import bigquery_datatransfer_v1
import time

dts_client = bigquery_datatransfer_v1.DataTransferServiceClient()
while True:
  try:
    dts_client.get_transfer_run('some run id').state
  except Exception:
    print('error!')

  time.sleep(2)

@yebrahim (Author)

Thank you both.
@busunkim96 indeed unrelated. My specific use case required creating the client on the fly using some parameters that I receive as method args. The example above is a simplification that only reproduces the problem.

@tseaver (Contributor) commented Oct 15, 2019

This isn't a "leak", per se: every GAPIC client has a transport attribute, which in turn holds a gRPC channel as its _channel attribute: each channel holds two open sockets.

There's also no way to clean up the client as it doesn't implement __exit__.

__exit__ is part of Python's context manager protocol, designed for use via the with statement: our clients don't fit that usage. You can, if desired, manually close the channel, e.g., client.transport.channel.close().
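
For reference, that manual workaround can itself be wrapped in `contextlib.closing`, which calls `close()` when a `with` block exits. Below is a minimal runnable sketch; `StubChannel` is a hypothetical stand-in for `client.transport.channel` (the real channel needs network setup, and the stub only mirrors its `close()` method):

```python
import contextlib


class StubChannel:
    """Stand-in for the gRPC channel held at client.transport.channel."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


channel = StubChannel()
with contextlib.closing(channel):
    pass  # issue RPCs here; close() runs even if they raise

print(channel.closed)  # True
```

With a real client this would read `with contextlib.closing(client.transport.channel): ...`, assuming the transport exposes the channel as described in the comment above.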

@tseaver tseaver added type: question Request for information or clarification. Not an issue. and removed priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Oct 15, 2019
@yebrahim (Author)

@tseaver:

  • Why doesn't the client fit the usage? Is it not Pythonic to encapsulate client library calls within a Python context?
  • If I do close the transport channel manually, can the client be reused afterwards or should I then del and recreate it if needed?

@tswast (Contributor) commented Nov 13, 2019

FYI: I am able to reproduce this by using the psutil.Process.connections method to observe leaked resources.

import psutil

from google.cloud import bigquery_datatransfer_v1


current_process = psutil.Process()
print("connections before creating client: {}".format(len(current_process.connections())))
client = bigquery_datatransfer_v1.DataTransferServiceClient()
print("connections after creating client: {}".format(len(current_process.connections())))
sources = list(client.list_data_sources("projects/swast-scratch"))
print("connections after API request: {}".format(len(current_process.connections())))
del client
print("connections after deleting client: {}".format(len(current_process.connections())))

@tswast (Contributor) commented Nov 13, 2019

Output from ☝️ :

connections before creating client: 0
connections after creating client: 0
connections after API request: 2
connections after deleting client: 2

@tswast (Contributor) commented Nov 13, 2019

The workaround of client.transport.channel.close() does indeed clean up resources.

import psutil

from google.cloud import bigquery_datatransfer_v1


current_process = psutil.Process()
print("connections before creating client: {}".format(len(current_process.connections())))
client = bigquery_datatransfer_v1.DataTransferServiceClient()
print("connections after creating client: {}".format(len(current_process.connections())))
sources = list(client.list_data_sources("projects/swast-scratch"))
print("connections after API request: {}".format(len(current_process.connections())))
client.transport.channel.close()
print("connections after closing client.transport.channel: {}".format(len(current_process.connections())))

Output:

connections before creating client: 0
connections after creating client: 0
connections after API request: 2
connections after closing client.transport.channel: 0

@yebrahim (Author)

Does it make sense to wrap this call in an API method? Something like client.close()?
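
As a sketch of what such a method might look like before it lands in the generated clients, a thin wrapper can forward `close()` to the transport channel and double as a context manager. All the `Fake*` classes below are hypothetical stand-ins for a real GAPIC client; the wrapper only assumes the client exposes `transport.channel.close()` as described earlier in this thread:

```python
class FakeChannel:
    """Stand-in for a gRPC channel; only mirrors close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class FakeTransport:
    """Stand-in for a GAPIC transport holding a channel."""

    def __init__(self):
        self.channel = FakeChannel()


class FakeClient:
    """Stand-in for a GAPIC client that owns a transport."""

    def __init__(self):
        self.transport = FakeTransport()


class ClosableClient:
    """Hypothetical wrapper adding close() and context-manager support."""

    def __init__(self, client):
        self._client = client

    def close(self):
        self._client.transport.channel.close()

    def __enter__(self):
        return self._client

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()


client = FakeClient()
with ClosableClient(client) as c:
    pass  # use c for API calls

print(client.transport.channel.closed)  # True
```

This is only an illustration of the proposed shape, not library code; a built-in `client.close()` would avoid reaching into private transport internals.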

@tswast tswast added api: core type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed type: question Request for information or clarification. Not an issue. labels Nov 27, 2019
@tswast tswast changed the title BigQuery Data Transfer: Initializing transfer service client leaks two file descriptors [BigQuery Data Transfer Service, all GAPIC clients]: provide a close() method and context manager to clean up client-owned resources Nov 27, 2019
@tswast (Contributor) commented Nov 27, 2019

We have just added such a method in the main BigQuery client. Personally, I agree that this makes sense to do across all our client libraries. I've drafted an internal proposal to be reviewed by our core libraries team.

@tswast tswast assigned busunkim96 and unassigned tswast Jan 8, 2020
@tswast (Contributor) commented Jan 8, 2020

Googlers: see approved proposal to update generated clients at go/closable-clients-python

As discussed offline, since this requires changes to the generator, this feature will wait until after the new Python 3-only generator migration is underway.

@b-long commented Feb 8, 2020

I'm only asking this in terms of viability, regarding @tseaver's comment:

__exit__ is part of Python's context manager protocol, designed for use via the with statement: our clients don't fit that usage

Is there anything wrong with this approach?

from google.cloud import bigquery
from google.cloud.bigquery.dbapi import Connection
from google.cloud.bigquery.dbapi.cursor import Cursor

class Pep249Cursor(Cursor):
    def __init__(self, connection):
        super(Pep249Cursor, self).__init__(connection)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()


def create_tbl_if_not_exists(c):
    c.execute("CREATE TABLE IF NOT EXISTS ...")


client = bigquery.Client()
conn = Connection(client)
with Pep249Cursor(conn) as c:
    create_tbl_if_not_exists(c)

@tswast (Contributor) commented Feb 10, 2020

The DB-API is a bit of an oddball with regard to context managers. I would expect a Cursor context manager to commit a transaction but not close the connection, which doesn't apply to BigQuery as it is not a transactional database.

@meredithslota (Contributor)

This still feels like a pressing issue, unless some of the linked issues have solved it (I can't immediately tell). @parthea can you also take a look? Maybe it needs to be moved to https://github.com/googleapis/gapic-generator-python?

@parthea (Contributor) commented Oct 21, 2023

Closing as obsolete. This was completed in googleapis/gapic-generator-python#987

@parthea parthea closed this as completed Oct 21, 2023
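
For readers arriving later: the generator change referenced above gives generated clients the context-manager protocol, with `__exit__` delegating to the transport's `close()`. The shape of that pattern can be sketched with stubs; the class names below are illustrative, not the generated code itself:

```python
class StubTransport:
    """Stand-in for a generated transport owning sockets/channels."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Release client-owned resources.
        self.closed = True


class GeneratedClient:
    """Illustrative shape of a client after the generator change."""

    def __init__(self):
        self.transport = StubTransport()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.transport.close()


with GeneratedClient() as client:
    pass  # make API calls

print(client.transport.closed)  # True
```

With clients produced by versions of the generator that include this change, usage along the lines of `with bigquery_datatransfer_v1.DataTransferServiceClient() as client: ...` releases the underlying sockets on exit, which is what this issue asked for.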