Skip to content

Open Questions

ehennestad edited this page Nov 30, 2022 · 16 revisions

EH

Manifest classes for item tables

  • There is a lot of commonality between the OphysManifest class and the EphysManifest class. It would make sense to gather this in a superclass, and the Manifest class is an obvious candidate. Would that be ok to do, or would that violate some principles of design?

    • VI: Much of OOP design pattern lore guides to "favor association over inheritance", which is what's done currently. However I believe this lore largely stems from Java & strongly reflects its lack of multiple inheritance. I agree with your instinct to favor inheritance where rational such as here. Seems a worthwhile refactor to consider. And regardless of association or inheritance relationship, refactoring to centralize common functionality (e.g. updateManifest as noted below) would be worthwhile. As your next question gets at, this may involve detective work to determine what is or could/should be common.
    • EH: This is now implemented and there were a handful of methods that are shared among Ephys- and OphysManifest that are now methods of the Manifest superclass
  • The Ephys- and OphysManifest classes do not have the same internal logic for retrieving and caching manifests. In the OphysManifest, all the different tables are fetched and put into a struct and immediately added to the disk cache, whereas in the EphysManifest they are handled more individually and the final processed tables are initially cached in memory, and only added to the disk cache when the property get method for particular item types are invoked. Is there a particular reason for this (i.e performance considerations during retrieval)?

    • VI: Great that you're analyzing this. Perhaps DM has insights.
    • DM: The EphysManifest handling adheres very closely to the Allen SDK. The OphysManifest handling predates my involvement, but it's also the case that OphysManifest tables are much much smaller, so re-fetching or re-processing portions is less costly. So the additional caching of EPhysManifest might anyway be warranted.
    • EH: I found that the biggest discrepancy was that item tables for the OphysManifest were all lumped together in one struct in the disk cache, whereas the ephys item tables were treated individually. I made the change to treat ophys tables individually as well, and I think this provides some improvements, e.g when fetching ophys sessions, it only fetches the session table where before it fetched the session, experiment and the cell tables (and the cell table is huge). Also, the underlying code is now more "symmetric". Thus, it should now be easier to do the same (post)processing of ophys tables as ephys tables, i.e joining tables and adding counts for subtables.
  • Would it be cleaner to use the OnDemandProperties mixin for memory caching of tables in the Manifest classes?

    • VI: So far OnDemandProperties has only been used for user-facing properties of item object properties. I guess adapting to private properties would be fairly trivial, but would have to analyze wrt to how intertwined/specialized it is for item objects today. Conceptually seems sound, but would tend to prioritize lower if it doesn't prove highly straightforward.
    • EH: This was straight forward, and since item table get-access is public, they are in theory user facing (although users should not have to interact with them directly).
  • What is the purpose of the UpdateManifests method - Why is it called update? It seems more like a clear/reset method...

    • VI: I believe it was intended to allow a fresh check of the data source (whether API or now possibly S3), which is always subject to change (which has occurred from time to time during the project, affecting demos etc). Maybe "Refresh" would have been a better word than "Update", but the implementation is a clear/reset type operation as you say.
    • EH: I changed the name to resetManifest, and added a new method updateManifest that does the reset, but additionally fetches all the manifests again.

General note: I changed some syntax internally in these classes, so now all the tables are referred to as item tables (in general) and session table, unit table etc. in particular. A manifest would then be the collection of all item tables for a dataset. I think this makes things clearer, as before a manifest could be both an individual table, but also the collection of tables (and, as we discussed before, cache maps are also referred to as manifests).

General syntax suggestions:

  • To give the manifests more of a matlab flavour I would suggest to rename the item table properties like ‘ophys_sessions' or ‘ephys_sessions' to ‘Sessions’.
  • Following the same line, internally in the classes (manifest classes, but also others) to consistently refer to objects of the class with the variable name obj instead of various acronyms that are specific to specific classes.

OnDemandProps

  • In the bot.item.internal.mixin.OnDemandProps/fetch_cached method: a) If a property has previously been marked as Unavailable, right now a new attempt at fetching the property will not do anything. Maybe this makes sense in some circumstances, but I wonder if it would make more sense to make another attempt at fetching the property value. For example, I used the OnDemand properties for the Manifest classes, and sometimes there might be aborted file downloads that causes the property not to be cached. In these cases I want to reference the property again and hopefully fetch the value again. Does this make sense?

  • This is may be more from the developer’s point of view, but related to above, if the attempt at fetching the OnDemand property value fails, the error is now caught, but there is no indication to the user that it failed. Would it be better the throw an error if fetching a property fails? As a practical example, in issue #155 someone experienced a bug, but the error message was unrelated to the point of failure because the actual error was caught silently. This makes it also harder to debug such cases.

Caching / API

  • In the bot.internal.cache/CachedApiCall method, there are a lot of optional arguments, like: nPageSize, strFormat, strRMAPrefix, strHost, strScheme. However, none of the calls to this method makes use of them, and for some, like strScheme, strHost and strRMAPrefix, it seems like they are not really variable in the context of the Allen Brain Observatory Api. Was there any past scenario where these were useful, or is there any future scenario where they might be put to use?
    • DM: This was my conservative implementation, as I learned how to use the back-end HTTP API. Future-proofing in case Allen changes the way the API should be accessed?
  • Do you know if there exists documentation for the RMA API that is specific for the brain observatory? Or is the current API calls in matlab derived from the python toolbox?
    • DM: Unknown to me
  • Is there a reason why behavioral data and pupil data are not added as linked files? Are they not available as well_known_files from the API?

Other

Are the visual stimuli used in the 2P dataset the same as those used in the neuropixels dataset? If no, can we find the 2P visual stimuli anywhere (maybe a question for someone at Allen)

VI

  • Are there H5 files in the Visual Coding 2P dataset at this point? The "SessH5" linked file appears defunct in various ophys sessions I've retrieved lately.

    • EH: The files exist in the data repository, and they contain analysis data, more specifically metrics about individual cells (rois). I don't think these data are used anywhere in matlab bot, but I imagine they could be in the future. The autodownload of these files are turned off, so they are not available through bot, and they also don't appear to have any implemented linked properties.
Clone this wiki locally