Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replaced simplejson to json in _json.py and dbstore.py #168

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

Enigma04
Copy link

@Enigma04 Enigma04 commented Apr 10, 2021

Fixes #80
Subtask of #133

@cclauss I have replaced two more files for you to merge.

Also side note: The infogami/infobase/client.py already has the JSON package imported and replaced so you can check that file off the list.

@cclauss cclauss mentioned this pull request Apr 10, 2021
20 tasks
@Enigma04 Enigma04 changed the title Replaced simplejson to json in _json.py and common.py Replaced simplejson to json in _json.py and client.py Apr 10, 2021
@Enigma04 Enigma04 changed the title Replaced simplejson to json in _json.py and client.py Replaced simplejson to json in _json.py and common.py Apr 10, 2021
Copy link

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem stated in the mentioned issue was that this file is redundant as we have the builtin JSON module which contains the same functions.

What this PR should do is to delete the file (_json.py) and change its references to directly use the builtin JSON module. In this case, the only reference of this file is in infogami/infobase/dbstore.py

Please make the necessary changes.

@dhruvmanila
Copy link

@cclauss Can you close and reopen this PR to trigger the CI once again? Thanks.

@cclauss cclauss closed this Apr 10, 2021
@cclauss cclauss reopened this Apr 10, 2021
@dhruvmanila
Copy link

It seems that the problem is related to the changes made in this PR. The CI is getting stuck with these failures (https://github.com/internetarchive/infogami/pull/168/checks?check_run_id=2313629409#step:8:445) which are all passing on master:

...
0.0 (4): SAVEPOINT webpy_sp_1
0.0 (5): SELECT c.relname FROM pg_class c WHERE c.relkind = 'S'
0.0 (6): INSERT INTO thing (key) VALUES ('/type/type'); SELECT currval('thing_id_seq')
0.0 (7): UPDATE thing SET type = 1 WHERE id=1
0.0 (8): INSERT INTO version (revision, thing_id) VALUES (1, 1)
infogami/infobase/tests/test_account.py EEEEE               <----------------
0.0 (1): SELECT * FROM thing LIMIT 1
0.0 (2): SELECT * FROM thing WHERE key='/type/type'
0.0 (3): SELECT * FROM thing WHERE key='/type/type'
0.0 (4): SAVEPOINT webpy_sp_1
0.0 (5): SELECT c.relname FROM pg_class c WHERE c.relkind = 'S'

Let me take a look at what is happening.

@dhruvmanila
Copy link

@Enigma04 Hi, can you please remove the changes done in infogami/infobase/common.py. The failing test file is using that import: https://github.com/internetarchive/infogami/blob/master/infogami/infobase/tests/test_account.py#L3. Let's see by removing one change at a time whether that is the cause of the problem.

@Enigma04
Copy link
Author

@dhruvmanila will make the changes

@cclauss cclauss changed the title Replaced simplejson to json in _json.py and common.py Replaced simplejson to json in _json.py and dbstore.py Apr 11, 2021
@dhruvmanila
Copy link

Alright, so the problem seems to be in the removal of _json.py. Let me take a deeper look into why that module is actually implemented.

@cclauss
Copy link

cclauss commented Apr 11, 2021

@Enigma04 In the meantime, can you please move the common.py changes to a separate PR. Thx!

@Enigma04
Copy link
Author

@cclauss @dhruvmanila can I close this PR and make a new one?

@cclauss
Copy link

cclauss commented Apr 11, 2021

We should keep this pull request open while creating a second separate PR.

Do you make these pull requests with git on the command line or with the GitHub web UI?

If you do them with the GitHub web UI then just go to https://github.com/internetarchive/infogami/edit/master/infogami/infobase/common.py and then make the appropriate changes and click Commit changes which will allow you to create a second PR while leaving this PR open.

@Enigma04
Copy link
Author

Enigma04 commented Apr 11, 2021

@cclauss So for the common.py, should I replace simplejson with JSON and then make a PR or should I keep it as it is?

@cclauss
Copy link

cclauss commented Apr 11, 2021

Yes. The goal of #133 is to Remove simplejson dependency so replace simplejson with Python's built-in json module. Each pull request should try to modify only one file if possible.

@cclauss
Copy link

cclauss commented Apr 11, 2021

Perhaps cherrypick a few changes from #172 ?

@dhruvmanila
Copy link

Hi, @Enigma04 Can you please rebase on the latest master? Thanks

@cclauss cclauss closed this Apr 12, 2021
@cclauss cclauss reopened this Apr 12, 2021
@cclauss
Copy link

cclauss commented Apr 12, 2021

Closed and reopened to rebase.

@dhruvmanila
Copy link

Ok, so I found where the problem was. We were getting a datetime.datetime object to serialize into JSON which is not possible in here:

self.db.insert(
'data', False, thing_id=id, revision=1, data=simplejson.dumps(data)
)

The object (datetime.datetime) is being passed through here:

last_modified = datetime.datetime.utcnow()
data = dict(
key='/type/type',
type={'key': '/type/type'},
last_modified={'type': '/type/datetime', 'value': last_modified},
created={'type': '/type/datetime', 'value': last_modified},
revision=1,
latest_revision=1,
id=id,
)

This was handled by the _json module but as we are deleting it, that was resulting in the CI being stuck. I don't know why it was getting stuck but I debugged locally and found that this was the problem. A simple fix is that json.dumps method provides a parameter default which can be a function to help serialize any arbitrary object (in our case it is datetime.datetime).

@cclauss I fixed it locally and it works. I think this PR can be closed as I will open another one to solve it.

@cclauss
Copy link

cclauss commented Apr 14, 2021

My strong preference (if it is not too complicated) would be that you help @Enigma04 to understand how to fix this PR so it passes the tests and can be approved and merged. Rohit has worked hard on this PR and it would be great to support him in taking it over the goal line.

We bumped into the datetime in json issue once before so please keep a look out for it in future simplejson --> json efforts.

@dhruvmanila
Copy link

Yeah, no problem. It's a simple fix.
Hey @Enigma04, as mentioned in my comment, we are passing a datetime.datetime object to json.dumps which was making the CI stuck forever. As per the documentation of JSON module, we can pass a function to the default parameter which will be called if any object is not JSON serializable.

These Python objects can be serialized by the JSON module:

Python JSON
dict object
list, tuple array
str string
int, float, int- & float-derived Enums number
True true
False false
None null

So, a simple fix would be to make a function which takes an arbitrary object, say obj and checks whether it is a datetime.datetime and returns a JSON serializable object. The function content will be similar to this:

elif isinstance(d, datetime.datetime):
return d.isoformat()

Else, we will raise a TypeError.

@Enigma04
Copy link
Author

Hey @dhruvmanila @cclauss sorry I had exams so I couldn't get work done. This issue hasn't been fixed yet right?

@dhruvmanila
Copy link

dhruvmanila commented Apr 21, 2021

Hey @Enigma04 it's all good. Hope your exams went well! No, this issue hasn't been fixed. You can make the necessary changes as mentioned in my previous comments and it will be good to go :)

@Enigma04
Copy link
Author

Enigma04 commented Apr 29, 2021

Hey @dhruvmanila so do I have to wrap this line in a function that will take an obj and datetime.datetime?

 self.db.insert( 
     'data', False, thing_id=id, revision=1, data=simplejson.dumps(data) 
 ) 

@dhruvmanila
Copy link

dhruvmanila commented Apr 30, 2021

No, a separate function which will take in some object and check if it is datetime.datetime and return a string if it is otherwise raise an error. That function needs to be passed to default keyword argument here: json.dumps(data, default=...)

Can you please read the documentation as mentioned in my previous link? That will clear it up.

@Enigma04
Copy link
Author

Enigma04 commented May 6, 2021

@dhruvmanila can you tell me if what I've written is correct? Feel free to tell me any changes that should be made :). Once again I apologize for the delay, I am currently having my theory exams.

if isinstance(obj, datetime.datetime):
return obj.isoformat()
else:
return TypeError
Copy link

@cclauss cclauss May 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return TypeError
raise TypeError()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes have been made

@@ -612,6 +613,13 @@ def find_user(self, email):
thing_id = d and d[0].thing_id or None
return thing_id and self.get_metadata_from_id(thing_id).key

def check_datetime(self, obj):
Copy link

@dhruvmanila dhruvmanila May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be a method inside the object. It's just a helper function not specific to this object. So, please make this function global.

A bit of documentation could go a long way :)

"""Helper function to serialize ``datetime.datetime`` object to JSON compatible.
  
It can be extended to serialize other types of objects as per the requirement."""

We're helping the module in encoding a Python object to a JSON object.

Suggested change
def check_datetime(self, obj):
def encode_datetime(self, obj):

Comment on lines 619 to 621
else:
raise TypeError()

Copy link

@dhruvmanila dhruvmanila May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An appropriate error message will help us in the future :)

Suggested change
else:
raise TypeError()
raise TypeError(
f'Object of type "{obj.__class__.__name__}" is not JSON serializable'
)

self.db.insert(
'data', False, thing_id=id, revision=1, data=simplejson.dumps(data)
)
self.db.insert('data', False, thing_id=id, revision=1, data= json.dumps(data, default=self.check_datetime(t)))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You only have to pass the reference to the function, so we do not need to call it. The module will call it only if some object is not serializable. Our function is a fallback for the JSON module.

Suggested change
self.db.insert('data', False, thing_id=id, revision=1, data= json.dumps(data, default=self.check_datetime(t)))
self.db.insert('data', False, thing_id=id, revision=1, data= json.dumps(data, default=self.encode_datetime))

@dhruvmanila
Copy link

Awesome work! Can you fix the formatting issues and then this can be merged.

cc @cclauss @cdrini

@cclauss
Copy link

cclauss commented May 13, 2021

The failing test says that black would reformat infogami/infobase/dbstore.py so let’s reformat that file like this…

python3 -m pip install --upgrade black  # make sure we have the current version
black infogami/infobase/dbstore.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace infogami/infobase/_json.py
3 participants