Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add gsheet #163

Open
wants to merge 17 commits into
base: master
Choose a base branch
from
Open

Add gsheet #163

wants to merge 17 commits into from

Conversation

aschroed
Copy link
Member

DRAFT - Do not merge

  • added support for reading google workbooks online
  • the google authentication is currently in testing mode
    • to add this I believe it will need to be approved by google
    • it uses oauth support built into gspread to ask the user to login and then will store token info in a specific file in a specific place
    • it also requires a credentials file that has info about the app as registered on google cloud - this took me a while to figure out how it was OK to share this file because most places in both gspread and google docs rightfully warns about sharing credentials, however when I finally located the info on the way to do this via google sign in it says something like 'of course the credentials.secret will not be secret in this case but as it's the users google sign in info that is used for authentication and is not shared that is OK'

My main question remaining about this is what would be the best practice regarding the credentials.json file required for this to work (the one that are not really secrets as indicated above).
Options:

  • keep it like I currently have it with the credentials.json included in the repo in the .config/gspread/ directory and git ignore the authorized_user.json
  • add a template of the credentials.json file and then only provide it to a submitter who wants to use google spreadsheet and tell the to replace the template with the file we provide
  • git ignore the whole .conf/... directory and have the user make their own and then provide the credentials.json file to them
  • some other option?

Additional changes in the PR include removing any support for downloading or uploading via ftp.

@coveralls
Copy link

coveralls commented Oct 28, 2022

Coverage Status

Coverage increased (+3.3%) to 86.105% when pulling 06c3d70 on add_gsheet into 6d29d78 on master.

@@ -23,6 +23,7 @@ Custom_scripts/
Data_Files/MicroscopyCalibration/Files/
.pytest_cache/
.python-version
wranglertools/.config/gspread/authorized_user.json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is nothing worth tracking in .config I would gitignore the whole thing just to be safe... You also may want to provide another means to specify these credentials since folks who install from PyPi won't have this path.

assert message0 == outlist[0]
assert message1 == outlist[1]
assert message2 == outlist[2]
# def test_workbook_reader_post_ftp_file_upload(capsys, mocker, connection_mock, workbooks):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider just adding a comment here indicating the ftp support has been removed.

'image/tiff',
)

# If modifying these scopes, delete the file token.json.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where would this be located if installing from PyPi? Make targets fixing these things may be useful... You may also want to read the SCOPES from a config file.

Comment on lines 179 to 187
def google_authenticate():
gsauth = None
creddir = pp.Path(__file__).parent.joinpath('.config', 'gspread')
gsauth = gspread.oauth(
credentials_filename=creddir.joinpath('credentials.json'),
authorized_user_filename=creddir.joinpath('authorized_user.json'),
scopes=SCOPES
)
return gsauth
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this work for users who pip install the package? You probably want to allow folks to set a path in an environment variable as well and maybe fallback to this if not found? I would also definitely add everything in .config to .gitignore.

@@ -1,6 +1,6 @@
[tool.poetry]
name = "Submit4DN"
version = "3.1.1"
version = "3.2.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might consider bumping major version since you are removing FTP support.

"""Set of check for connection, file, dryrun, and prompt."""
def check_and_return_input_type(inputname):
if pp.Path(inputname).is_file():
xlsx_mime = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a good candidate for a constant, as we've seen issues with this mime in the past.

print("ERROR: URL provided does not appear to be google sheet url")
sys.exit(1)
# parse out the bookId
inputname = re.search("/d/([A-Za-z0-9_-]+)/*", inputname).group(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like constants for regex as well as they are often too complex to interpret inline, defining at top level and giving it a good name with an example or two is most helpful.

inputname = re.search("/d/([A-Za-z0-9_-]+)/*", inputname).group(1)
if not re.match("^[A-Za-z0-9_-]+$", inputname):
print("ERROR: invalid format of the google sheet ID in input - {}".format(inputname))
return inputname, 'gsheet'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gsheet and excel are probably good candidates for constant.

print(f"ERROR: File {inputname} not recognized as excel file")
sys.exit(1)
return inputname, 'excel'
elif inputname.startswith('http'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to enforce https here.

Comment on lines 1708 to 1709
if booktype == 'gsheet':
gauth = google_authenticate()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned before it's highly probable you will want to provide alternative means of specifying the Google credentials, either by argument directly or environment variable to path. I don't see how a user installing from pip could easily use this. You'll need at least some barebones documentation on this as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants