
Massive commit of decoupling from mongo and adding a PanFileDB type #414

Merged
merged 14 commits into from
Feb 4, 2018

Conversation

wtgee
Member

@wtgee wtgee commented Jan 24, 2018

  • PanDB is a factory class that will load either PanMongoDB or PanFileDB
  • PanFileDB merely writes the JSON to a flat file
    • One file per "collection", along with a current_<collection> file for
      current entries. Note this means each "current" collection is a
      separate file.
    • Use bson.json_util for serializing. Changes a few things with time
      because it can store timezone-aware info, which we don't use properly
      in current_time.
  • Try to only rely on one DB for weakref keep - WIP
  • Clear both mongo and file types of their "current" entries upon POCS init
  • Change include_collection to store_permanently for calls to insert_current
  • Change use_mongo to store_result where appropriate
  • Private POCS method _check_environment changed to a public class method, i.e.
    POCS.check_environment
  • Tests that use the db are now parameterized across db types. Testing now
    takes almost twice as long! (should deal with this somehow)

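The factory pattern described in the first bullet might look like the following minimal sketch. All class bodies and parameter names here are illustrative stand-ins, not the actual POCS implementation:

```python
# Hypothetical sketch of the PanDB factory idea: PanDB dispatches to a
# concrete backend based on a db_type string. Names are illustrative only.
class PanFileDB:
    def __init__(self, db_name):
        self.db_name = db_name  # flat files written under this name

class PanMongoDB:
    def __init__(self, db_name):
        self.db_name = db_name  # mongo database of this name

class PanDB:
    """Factory that returns the concrete DB implementation."""
    def __new__(cls, db_type='mongo', db_name='panoptes'):
        if db_type == 'file':
            return PanFileDB(db_name)
        if db_type == 'mongo':
            return PanMongoDB(db_name)
        raise ValueError('Unknown db_type: {!r}'.format(db_type))

db = PanDB(db_type='file', db_name='panoptes_testing')
```

Because `__new__` returns an instance of a different class, `PanDB(...)` hands back the backend object directly and callers never need to know which type they got.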
@jamessynge
Contributor

I think we can use Python's json module, as it supports custom encoder and decoder classes. I'm willing to help with that if you like.
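A sketch of what that custom encoder/decoder approach could look like with the stdlib json module; the `PanEncoder` name and the `__datetime__` tag are invented here for illustration, not anything in the codebase:

```python
import json
from datetime import datetime

# Illustrative custom encoder/decoder pair: datetimes are tagged on the
# way out and reconstructed on the way in, without needing bson.
class PanEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return {'__datetime__': obj.isoformat()}
        return super().default(obj)

def pan_hook(dct):
    # object_hook is called for every decoded dict; untag datetimes.
    if '__datetime__' in dct:
        return datetime.fromisoformat(dct['__datetime__'])
    return dct

record = {'date': datetime(2018, 1, 26, 12, 30)}
round_trip = json.loads(json.dumps(record, cls=PanEncoder), object_hook=pan_hook)
```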

@wtgee
Member Author

wtgee commented Jan 26, 2018

I think we can use Python's json module, as it supports custom encoder and decoder classes. I'm willing to help with that if you like.

I can see some advantages to a custom encoder (as mentioned in #206) but I don't see any immediate need when this already does what we want.

@jamessynge
Contributor

Don't you have to install mongodb to use bson.json_util?

@wtgee
Member Author

wtgee commented Jan 26, 2018

No, bson is a standalone module. It's installed by pymongo but doesn't require pymongo.

But I could still see some use cases for a custom encoder/decoder, just not sure it's a priority.

@jamessynge
Contributor

FWIW, I notice that we don't use bson.json_util for much: Most of the loads() and dumps() calls are using the regular json library.

$ egrep '\b(bson|json_util|dumps|loads)\b' $(find . -name '*.py')
./scripts/simple_sensors_capture.py:from bson import json_util
./scripts/simple_sensors_capture.py:                        f.write(json_util.dumps(data) + '\n')
./scripts/sensors_simulator.py:import bson.json_util as json_util
./scripts/collect-grid-stats.py:        f.write(json.dumps(readings))
./peas/tests/serial_handlers/protocol_arduinosimulator.py:        s = json.dumps(self.message) + '\r\n'
./peas/tests/serial_handlers/protocol_arduinosimulator.py:            message = json.loads("""
./peas/tests/serial_handlers/protocol_arduinosimulator.py:            message = json.loads("""
./pocs/dome/bisque.py:            response_obj = json.loads(response)
./pocs/mount/bisque.py:            response_obj = json.loads(response)
./pocs/camera/sbigudrv.py:        On construction loads SBIG's shared library which must have already
./pocs/utils/messaging.py:from bson import ObjectId
./pocs/utils/messaging.py:from json import dumps
./pocs/utils/messaging.py:from json import loads
./pocs/utils/messaging.py:        msg_object = dumps(message, skipkeys=True)
./pocs/utils/messaging.py:                msg_obj = loads(msg)
./pocs/utils/logger.py:    logger_key = (profile, json.dumps(log_config, sort_keys=True))
./pocs/utils/rs232.py:        return json.loads(line)
./pocs/utils/database.py:from bson import json_util
./pocs/utils/database.py:                content = json.dumps(entries, default=json_util.default)
./pocs/utils/images/cr2.py:from json import loads
./pocs/utils/images/cr2.py:        exif = loads(subprocess.check_output(cmd_list).decode('utf-8'))
./build/lib/pocs/dome/bisque.py:            response_obj = json.loads(response)
./build/lib/pocs/mount/bisque.py:            response_obj = json.loads(response)
./build/lib/pocs/camera/sbigudrv.py:        On construction loads SBIG's shared library which must have already
./build/lib/pocs/utils/messaging.py:from bson import ObjectId
./build/lib/pocs/utils/messaging.py:from json import dumps
./build/lib/pocs/utils/messaging.py:from json import loads
./build/lib/pocs/utils/messaging.py:        msg_object = dumps(message, skipkeys=True)
./build/lib/pocs/utils/messaging.py:                msg_obj = loads(msg)
./build/lib/pocs/utils/logger.py:    logger_key = (profile, json.dumps(log_config, sort_keys=True))
./build/lib/pocs/utils/rs232.py:        return json.loads(line)
./build/lib/pocs/utils/database.py:from bson import json_util
./build/lib/pocs/utils/database.py:                content = json.dumps(entries, default=json_util.default)
./build/lib/pocs/utils/images/cr2.py:from json import loads
./build/lib/pocs/utils/images/cr2.py:        exif = loads(subprocess.check_output(cmd_list).decode('utf-8'))

@wtgee
Member Author

wtgee commented Jan 30, 2018

I get the feeling someone doesn't want us to use bson. :)

We should just have a common encode/decode method in our utils folder that would let us change out the underlying implementation however we want. As for the files listed above, it's not a great example of the usage, and the main ones in messaging and database are the ones that have given us problems precisely because json doesn't handle things out of the box.

I'm not necessarily tied to bson, it just seems like we will be writing in a lot of the same functionality.

For now I'll make a common function that can be called so it's easier to change in future.

@jamessynge
Contributor

I just don't want to be tied to Mongo. I think I now better understand that dumps(default=bson.json_util.default) provides the hook for serializing. We can leave that as is. BUT, let's put the json code into a single py file, so that we can swap it out easily.
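The single-file wrapper suggested here might look like this sketch. The real `pocs/utils/serializers.py` presumably delegates to `bson.json_util`; this illustrative version uses `default=str` as a stand-in so it runs without pymongo installed:

```python
# Sketch of a pocs/utils/serializers.py-style wrapper: all callers use
# these two functions, so the underlying library can be swapped in one
# place. default=str is an illustrative stand-in for bson.json_util.default.
import json

def dumps(obj):
    # Fall back to str() for types json can't serialize (e.g. datetime).
    return json.dumps(obj, default=str)

def loads(msg):
    return json.loads(msg)

msg = dumps({'safe': True, 'source': 'weather'})
```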

@codecov

codecov bot commented Feb 1, 2018

Codecov Report

Merging #414 into develop will increase coverage by 0.45%.
The diff coverage is 81.93%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #414      +/-   ##
===========================================
+ Coverage    67.22%   67.68%   +0.45%     
===========================================
  Files           61       62       +1     
  Lines         5058     5164     +106     
  Branches       702      719      +17     
===========================================
+ Hits          3400     3495      +95     
- Misses        1471     1476       +5     
- Partials       187      193       +6
Impacted Files Coverage Δ
pocs/camera/simulator.py 100% <ø> (ø) ⬆️
pocs/observatory.py 73.63% <0%> (+0.28%) ⬆️
peas/weather.py 0% <0%> (ø) ⬆️
pocs/utils/serializers.py 100% <100%> (ø)
pocs/utils/__init__.py 87.09% <100%> (ø) ⬆️
peas/sensors.py 36.88% <50%> (-1.21%) ⬇️
pocs/core.py 79.52% <57.14%> (-1.11%) ⬇️
pocs/base.py 90.47% <75%> (-6.5%) ⬇️
pocs/utils/database.py 89.04% <87.85%> (-2.9%) ⬇️
... and 2 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 21c6e32...5463843. Read the comment docs.

Contributor

@jamessynge jamessynge left a comment

Rename the PR (remove Major WIP) when you're ready for a final review.

@@ -20,6 +20,7 @@ directories:
mounts: POCS/resources/mounts
db:
name: panoptes
type: mongo
Contributor

When should we switch to the new type?

Member Author

I actually want to add an sqlite type and make that the default. I think the flat file is a good alternative but I'm not sure it's ideal.

@@ -76,8 +75,9 @@ def take_exposure(self, seconds=1.0 * u.second, filename=None, dark=False, block
# Build FITS header
header = self._fits_header(seconds, dark)

# Set up a Timer that will wait for the duration of the exposure then copy a dummy FITS file
# to the specified path and adjust the headers according to the exposure time, type.
# Set up a Timer that will wait for the duration of the exposure then
Contributor

We keep having little formatting/cleanup changes like these inside of bigger changes. Perhaps after this you'd be willing to do another PR that scrubs the whole repo? No need to remove this change from ths PR.

Member Author

@wtgee wtgee Feb 3, 2018

Good idea. I've just been doing them as I see them, which does pollute the PR some.

@@ -290,15 +290,12 @@ def analyze_recent(self):
self.current_offset_info))

# Update the observation info with the offsets
Contributor

Obsolete comment?

pocs/core.py Outdated
@@ -366,15 +364,16 @@ def is_weather_safe(self, stale=180):
record = self.db.get_current('weather')

is_safe = record['data'].get('safe', False)
timestamp = record['date']
timestamp = record['date'].replace(tzinfo=None)
Contributor

What is that replace tzinfo about?

Member Author

Ah, I thought I had made a comment about this. One thing about bson is that it is correctly serializing the datetime, and that includes timezone information. However we are not properly using that timezone information in current_time. Basic math between a naive datetime and a timezone-aware datetime cannot be done, so here we are just dropping the tzinfo, which reverts to the same behaviour we currently have.
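The naive-vs-aware problem described above can be demonstrated with the stdlib alone, along with the `replace(tzinfo=None)` workaround used in this PR:

```python
from datetime import datetime, timezone

# What a bson round-trip hands back (timezone-aware) vs. what a naive
# current_time-style value looks like.
aware = datetime(2018, 2, 4, tzinfo=timezone.utc)
naive = datetime(2018, 2, 4)

# Subtracting a naive datetime from an aware one raises TypeError.
try:
    naive - aware
    raise AssertionError('expected TypeError')
except TypeError:
    pass

# Dropping the tzinfo restores the old all-naive arithmetic.
delta = naive - aware.replace(tzinfo=None)
```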

I had a change to current_time(datetime=True) that made it timezone aware based off what timezone is listed in the config file, but that had implications beyond the scope of this PR. Will file an issue about it.

@@ -68,11 +68,10 @@ def test_send_datetime(forwarder, sub, pub):

def test_mongo_objectid(forwarder, sub, pub, config, db):

db.insert_current('config', {'foo': 'bar'})
id0 = db.insert_current('config', {'foo': 'bar'}, store_permanently=False)
Contributor

Is this a test of serialization/deserialization of a mongo specific type?

Member Author

It was for a mongo-specific ObjectId. With flat files that is a uuid4 id. Will rename the test, but fundamentally it is testing whether that is delivered via the messaging system correctly.

rec = {'test': 'insert'}
db.insert_current('config', rec, include_collection=False)
id0 = db.insert_current('config', rec, store_permanently=False)
Contributor

I should have asked earlier: what does store_permanently mean?

Member Author

It replaces include_collection and basically means that it stores it in the permanent collection and not just in the current collection.
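In other words, insert_current always updates the "current" record and only appends to the permanent collection when asked. A toy sketch of those semantics (the class name and dict-based storage are invented for illustration):

```python
# Illustrative sketch of the store_permanently semantics: "current" is
# always overwritten, the permanent collection is append-only and optional.
import uuid

class SketchDB:
    def __init__(self):
        self.current = {}       # one latest record per collection
        self.collections = {}   # permanent, append-only history

    def insert_current(self, collection, obj, store_permanently=True):
        self.current[collection] = obj
        if store_permanently:
            self.collections.setdefault(collection, []).append(obj)
        return str(uuid.uuid4())  # flat-file backend uses uuid4 ids

db = SketchDB()
db.insert_current('config', {'foo': 'bar'}, store_permanently=False)
```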

config=config,
simulator=['all'],
ignore_local_config=True,
db_name='panoptes_testing',
Contributor

I think config provides this db name.

@@ -25,6 +31,7 @@ def pocs(config, observatory):
pocs = POCS(observatory,
run_once=True,
config=config,
db_name='panoptes_testing',
Contributor

Ditto.

pocs.power_down()


def test_run_interrupt_with_reschedule_of_target(observatory):
Contributor

Probably very worthwhile, but is this related?

Member Author

Hm, not sure why this is in this PR. 😕

@jamessynge
Contributor

I skimmed the latest 5 commits, which look reasonable and pass travis. Is this still WIP?

@wtgee wtgee changed the title Massive commit of decoupling from mongo and adding a PanFileDB type - Major WIP Massive commit of decoupling from mongo and adding a PanFileDB type Feb 4, 2018
@wtgee
Member Author

wtgee commented Feb 4, 2018

I'm not too sure about the test_pocs.py as it now runs almost double the tests so I was going to look into changing that. I removed a few of the sleep calls.

But yes, I think ready for checking and I'd like to get on PAN008 soon since I can't have mongo.

@jamessynge
Contributor

No mongo because it is a Raspberry Pi?

@wtgee
Member Author

wtgee commented Feb 4, 2018

No mongo because it is a Raspberry Pi?

Correct. You can get it on there, it's just a pain to get a recent version.

FWIW, I checked out this branch on PAN008 last night and it seemed to work well. It was cloudy and I was mostly doing alignment so I didn't run through a full observation run, but the basics worked ok.

This PR doesn't handle any kind of cleanup/maintenance of the files it creates. I'm also not sure the $PANDIR/json_store is the best location.

@wtgee wtgee merged commit f544638 into panoptes:develop Feb 4, 2018
@wtgee wtgee deleted the decouple-storage-from-mongo-120 branch February 4, 2018 23:37