-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Util #12
Util #12
Conversation
Correcting documentation in setup.py
""" | ||
_x = x.sort_index() | ||
_y = y.sort_index() | ||
if intersect: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about adding a test that set(_x.index) and _x.index are the same length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't even think about that. We shouldn't allow for duplicate ids. Good catch!
A few comments. |
@@ -0,0 +1,225 @@ | |||
import unittest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that none of this files have the copyright notice on top. Should it be added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Few additional comments. Thanks! |
Adding tests for duplicate ids, updating documentation. Adding headers for copyright
# ---------------------------------------------------------------------------- | ||
# Copyright (c) 2016--, gneiss development team. | ||
# | ||
# Distributed under the terms of the Modified BSD License. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
until ete is BSD compatible, it is not appropriate to import ete3 objects into a BSD project...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in #13. Thanks!
This PR is dependent on #13. This project needs to be GPLed before we add any more code in. |
Adding some mutability tests Adding warning about replacing internal node names
This is ready for review/merge. |
|
||
if intersect: | ||
idx = subtableids & submetadataids | ||
idx = sorted(idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the ids to be sorted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because otherwise the pandas dataframe will be scrambled due to the way that sets work.
Sorting them the only way to resolve this issue, for the sake of testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mortonjt if the sorting is just for testing, can you sort them in the test prior doing the assert_frame_equals? Sorting is O(nlogn) so you can avoid this in big datasets unless strictly needed. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just removed the sort and moved over this over to the unittests.
ValueError: | ||
Raised if `tree` and `name` have incompatible sizes. | ||
""" | ||
_tree = tree.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the copy strictly necessary? I'm thinking if it will be okay to just do the renaming inplace - just thinking on the memory footprint of this operations as the tree size increases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The copy isn't strictly necessary. But it is certainly useful. Just added an inplace
parameter here.
It'll be tricky to have this functionality in the other modules, mainly because pandas and skbio don't make filter and shear
operations in place (as far as I'm aware). If you think this is necessary, we can create an issue for this, so that this PR can proceed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I follow with the second part of your comment. With the inplace
parameter my comment is resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet. Then that settles it :)
On Mon, Jul 25, 2016 at 4:56 PM, Jose Navas [email protected]
wrote:
In gneiss/util.py
#12 (comment):
- tree : skbio.TreeNode
Tree object where the leafs correspond to the features.
- names : list, optional
List of labels to rename the tip names. It is assumed that the
names are listed in level ordering, and the length of the list
is at least as long as the number of internal nodes.
- Returns
- skbio.TreeNode
Tree with renamed internal nodes.
- ValueError:
Raised if `tree` and `name` have incompatible sizes.
- """
- _tree = tree.copy()
I don't think I follow with the second part of your comment. With the
inplace parameter my comment is resolved.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/biocore/gneiss/pull/12/files/477d1967bb7ee99eb0d4d40d5225e0a62951388a#r72166846,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AD_a3ZVo6lBTJ2WocPg7ZNETNTnwLhHwks5qZU0pgaJpZM4JPRmM
.
Thanks @mortonjt a couple of comments! |
All comments addressed. Should be good to go after tests pass. |
This is adding in the utility helper functions discussed in code review.
@antgonza @ElDeveloper @josenavas, mind if you could sanity check this PR? Thanks!!
Note that this depends on #13