Util #12

Merged
merged 13 commits into from
Jul 26, 2016

Conversation

mortonjt
Collaborator

@mortonjt mortonjt commented Jul 19, 2016

This adds the utility helper functions discussed in code review.

@antgonza @ElDeveloper @josenavas, would you mind sanity-checking this PR? Thanks!

Note that this depends on #13

@coveralls

Coverage Status

Coverage increased (+0.3%) to 99.18% when pulling 2908c93 on mortonjt:util into 1be0fe7 on biocore:master.

"""
_x = x.sort_index()
_y = y.sort_index()
if intersect:
Contributor

What about adding a test that `set(_x.index)` and `_x.index` have the same length?

Collaborator Author

I didn't even think about that. We shouldn't allow for duplicate ids. Good catch!
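The duplicate-id check suggested above could look roughly like the sketch below. It is an illustration only: `check_unique_ids` is a hypothetical helper name, not the function added in this PR, and the error message is an assumption.

```python
import pandas as pd


def check_unique_ids(index):
    """Raise ValueError if a pandas index contains duplicate ids.

    Hypothetical helper for illustration; not the function added in this PR.
    """
    # A set drops repeats, so a length mismatch means duplicates exist.
    if len(set(index)) != len(index):
        dups = sorted(index[index.duplicated()].unique())
        raise ValueError("Duplicate ids found: %s" % dups)


# Example: the id "b" appears twice, so the check should fail.
x = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "b"])
try:
    check_unique_ids(x.index)
    message = None
except ValueError as err:
    message = str(err)
```

Running the check before any sorting or intersection keeps the failure close to its cause, since set operations on the ids would otherwise silently collapse the repeats.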

@antgonza
Contributor

A few comments.

@@ -0,0 +1,225 @@
import unittest
Member

I realized that none of these files has the copyright notice at the top. Should it be added?

Collaborator Author

👍

@josenavas
Member

A few additional comments. Thanks!

mortonjt added 2 commits July 19, 2016 10:13
Adding tests for duplicate ids, updating documentation.
Adding headers for copyright
# ----------------------------------------------------------------------------
# Copyright (c) 2016--, gneiss development team.
#
# Distributed under the terms of the Modified BSD License.
Member

Until ete is BSD-compatible, it is not appropriate to import ete3 objects into a BSD project...

Collaborator Author

Addressed in #13. Thanks!

@mortonjt
Collaborator Author

This PR is dependent on #13. This project needs to be GPLed before we add any more code in.

@coveralls

Coverage Status

Coverage increased (+0.4%) to 99.219% when pulling d6a037b on mortonjt:util into 1be0fe7 on biocore:master.

Adding some mutability tests
Adding warning about replacing internal node names
@coveralls

Coverage Status

Coverage increased (+0.4%) to 99.265% when pulling 477d196 on mortonjt:util into f79d9a6 on biocore:master.

@mortonjt
Collaborator Author

This is ready for review/merge.


if intersect:
    idx = subtableids & submetadataids
    idx = sorted(idx)
Contributor

Why do we need the ids to be sorted?

Collaborator Author

Because otherwise the pandas dataframe will be scrambled due to the way that sets work.

Sorting them is the only way to resolve this issue, for the sake of testing.

Member

@mortonjt if the sorting is just for testing, can you sort them in the test prior to calling `assert_frame_equal`? Sorting is O(n log n), so you can avoid it on big datasets unless strictly needed. Does that make sense?

Collaborator Author

Just removed the sort and moved it over to the unit tests.
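As a rough sketch of the outcome of this thread: the library code intersects the ids without sorting, and the test sorts the result before comparing. The helper name `match_ids` and the toy data are assumptions made for illustration, and `pandas.testing` assumes a recent pandas release.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def match_ids(table, metadata):
    """Keep only the ids shared by both frames (hypothetical helper)."""
    # Set intersection gives no ordering guarantee, and no sort is done here.
    idx = list(set(table.index) & set(metadata.index))
    return table.loc[idx], metadata.loc[idx]


table = pd.DataFrame({"f1": [1, 2, 3]}, index=["s1", "s2", "s3"])
metadata = pd.DataFrame({"ph": [7.0, 6.5, 8.0]}, index=["s3", "s2", "s4"])

res_table, res_metadata = match_ids(table, metadata)

# In the test, sort by index first so the comparison is order-independent.
exp_table = pd.DataFrame({"f1": [2, 3]}, index=["s2", "s3"])
exp_metadata = pd.DataFrame({"ph": [6.5, 7.0]}, index=["s2", "s3"])
assert_frame_equal(res_table.sort_index(), exp_table)
assert_frame_equal(res_metadata.sort_index(), exp_metadata)
```

Both returned frames are indexed with the same id list, so they stay aligned with each other even though their shared order is arbitrary.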

This was referenced Jul 25, 2016
ValueError:
Raised if `tree` and `name` have incompatible sizes.
"""
_tree = tree.copy()
Member

Is the copy strictly necessary? I'm wondering whether it would be okay to do the renaming in place; I'm thinking about the memory footprint of this operation as the tree size increases.

Collaborator Author

The copy isn't strictly necessary. But it is certainly useful. Just added an inplace parameter here.

It'll be tricky to have this functionality in the other modules, mainly because pandas and skbio don't make filter and shear operations in place (as far as I'm aware). If you think this is necessary, we can create an issue for this, so that this PR can proceed.

Member

I don't think I follow the second part of your comment. With the inplace parameter my comment is resolved.

Collaborator Author

Sweet. Then that settles it :)
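For readers following the thread, here is a minimal sketch of what an `inplace` switch on the renaming helper might look like. It uses a tiny stand-in `Node` class rather than `skbio.TreeNode`, so every name below (`Node`, `levelorder`, `rename_internal_nodes`) is an assumption for illustration and not the code merged in this PR.

```python
import copy
from collections import deque


class Node:
    """Minimal stand-in for a tree node (hypothetical; not skbio.TreeNode)."""

    def __init__(self, name=None, children=None):
        self.name = name
        self.children = children or []

    def is_tip(self):
        return not self.children


def levelorder(root):
    """Yield nodes breadth-first, starting at the root."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(node.children)


def rename_internal_nodes(tree, names, inplace=False):
    """Rename internal nodes in level order, copying unless inplace=True."""
    # Copying protects the caller's tree at the cost of extra memory.
    _tree = tree if inplace else copy.deepcopy(tree)
    internal = [n for n in levelorder(_tree) if not n.is_tip()]
    if len(names) < len(internal):
        raise ValueError("`names` has fewer entries than internal nodes")
    for node, name in zip(internal, names):
        node.name = name
    return _tree


# ((a, b), c) with unnamed internal nodes.
tree = Node(children=[Node(children=[Node("a"), Node("b")]), Node("c")])
renamed = rename_internal_nodes(tree, ["y0", "y1"])
same = rename_internal_nodes(tree, ["z0", "z1"], inplace=True)
```

Defaulting to a copy keeps the function free of side effects, while `inplace=True` lets callers with large trees skip the duplicate.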

On Mon, Jul 25, 2016 at 4:56 PM, Jose Navas wrote:

In gneiss/util.py
#12 (comment):

    tree : skbio.TreeNode
        Tree object where the leafs correspond to the features.
    names : list, optional
        List of labels to rename the tip names.  It is assumed that the
        names are listed in level ordering, and the length of the list
        is at least as long as the number of internal nodes.

    Returns

    skbio.TreeNode
        Tree with renamed internal nodes.
    ValueError:
        Raised if `tree` and `name` have incompatible sizes.
    """
    _tree = tree.copy()


@josenavas
Member

Thanks @mortonjt, a couple of comments!

@coveralls

Coverage Status

Coverage increased (+0.4%) to 99.27% when pulling 371cc16 on mortonjt:util into f79d9a6 on biocore:master.

@mortonjt
Collaborator Author

All comments addressed. Should be good to go after tests pass.

@coveralls

Coverage Status

Coverage increased (+0.4%) to 99.219% when pulling 6d70cf9 on mortonjt:util into f79d9a6 on biocore:master.

@coveralls

Coverage Status

Coverage increased (+0.4%) to 99.219% when pulling b049ced on mortonjt:util into f79d9a6 on biocore:master.

@antgonza antgonza merged commit ff3df22 into biocore:master Jul 26, 2016
@wasade wasade mentioned this pull request Aug 4, 2016
@mortonjt mortonjt deleted the util branch April 6, 2017 11:43