Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide an option to change static method name when it conflicts with instance name #308

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

jslee02
Copy link
Member

@jslee02 jslee02 commented Apr 17, 2020

chimera only print warnings if a static method has the same name with instance methods in the same class (when class.static_methods is used in the template).

However, method overloading for instance and static methods with the same name is not also supported by the native Python. Also, bindings generated using pybind11 cause runtime error.

It would make more sense to either not generating static (or instance) method for the case or change one of the names.

This PR is a possible solution for this issue by providing an option on how to change a static method when the name is conflicting.

Related issue #262.

@jslee02 jslee02 added this to the Chimera 0.2.0 milestone Apr 17, 2020
@jslee02 jslee02 requested a review from psigen April 17, 2020 02:28
@jslee02 jslee02 marked this pull request as ready for review April 17, 2020 05:25
Copy link
Member

@psigen psigen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this approach because it means we are very arbitrarily changing output names. It is very likely that this type of simplistic name mangling will cause problems with methods that include proper names or acronyms.

This type of namespace conflict is also very atypical as it is a sign of poor c++ code design in the first place (it is simply confusing and prone to errors). Do you have some sort of codebase where this is so common that you cannot do this via an override in the configuration file?

If we really must do this (which again, seems like it should not at all be a common problem since it would cause confusion in C++ code as well), then one mechanism to address it would be to create a .static table within the generated bindings and redundantly list the static methods there, then simply always use the instance methods at the top level class. This could be done at the template level.

I would prefer to simply throw a binding-generation-time error here and force users to rename one of the methods in the config file, if it is at all possible. It is certainly reasonable to catch the issue before binding compilation.

@jslee02
Copy link
Member Author

jslee02 commented Apr 26, 2020

I don't like this approach because it means we are very arbitrarily changing output names.

I see your point. Actually that's the motivation of making this feature optional (defaulting to NO_CHANGE) and providing several options on how to change the names (to upper, to lower, to camel, to pascal). I still see this feature is valuable as it doesn't require to change the scope and doesn't need to specify all the individual name overloadings.

Do you have some sort of codebase where this is so common that you cannot do this via an override in the configuration file?

One immediate example is Aikido (StateSpace::State, SE3::State : StateSpace::State), and I believe this pattern is pretty common in C++.

create a .static table within the generated bindings and redundantly list the static methods there, then simply always use the instance methods at the top level class. This could be done at the template level.

This sounds good to me as it doesn't require a code change. Also this wouldn't cause conflict by having a nested class, named static, since static is a C++ keyword. We could add this method to the policy list.

In any case, I think we shouldn't generate invalid bindings by either changing the (default) binding templates (as you suggested) or code by skipping when name conflict is found. It isn't addressed in this PR though.

@psigen
Copy link
Member

psigen commented Apr 26, 2020

What are the situations where simply suppressing the static method in favor of the instance method would not be an appropriate default behavior?

One immediate example is Aikido (StateSpace::State, SE3::State : StateSpace::State), and I believe this pattern is pretty common in C++.

In Aikido, it seems like this is not a conflict, am I missing something? Which method does SE3.State conflict with?

@jslee02
Copy link
Member Author

jslee02 commented Apr 26, 2020

What are the situations where simply suppressing the static method in favor of the instance method would not be an appropriate default behavior?

There isn't, but it would require the user to specify unique names for all the static methods to unsuppress them. Probably, this is because I'm trying to keep the YAML file as small as possible.

In Aikido, it seems like this is not a conflict, am I missing something? Which method does SE3.State conflict with?

Ah, sorry. I'm confused with another issue: #310 Please ignore my comment on it.


This type of namespace conflict is also very atypical as it is a sign of poor c++ code design in the first place (it is simply confusing and prone to errors). Do you have some sort of codebase where this is so common that you cannot do this via an override in the configuration file?

I agree that it may not be a good practice, but it's still a legitimate C++ syntax. I currently don't have a sharable codebase at this moment.

@psigen
Copy link
Member

psigen commented Apr 26, 2020

This sounds good to me as it doesn't require a code change. Also this wouldn't cause conflict by having a nested class, named static, since static is a C++ keyword. We could add this method to the policy list.

Let's see if we can do this. This would potentially be a useful strategy for a couple of other inheritance related issues.


In general, I am fine with any approach that does not attempt to automatically mutate function names, since the output is extremely hard to validate. There are entire libraries centered around doing this conversion (and debate about the correct output of the conversions) and I do not want to take on that scope within Chimera.

Instead, let us either create a .static subtable or, if that's not possible we could append a suffix like _static to the conflicting methods (less desirable, but still better than writing arbitrary name mangling).

@jslee02
Copy link
Member Author

jslee02 commented Apr 26, 2020

There are entire libraries centered around doing this conversion (and debate about the correct output of the conversions) and I do not want to take on that scope within Chimera.

Yeah, I had the same thought initially, but our case can be narrow down just a few cases by the assumption of the valid function name (no space, no special characters but just underscore).

I think we can provide several case conversion options, and if the user doesn't find a best option for them, then we can also provide a regex option for extension. This way the user can do whatever they want. Then finally chimera should skip methods when the modified name by the option (including no_change) is still in conflict (generating a warning).

Instead, let us either create a .static subtable or, if that's not possible we could append a suffix like _static to the conflicting methods (less desirable, but still better than writing arbitrary name mangling).

I think these are good options. Actually I was thinking of adding a prefix or/and a suffix in the next PR.

@psigen
Copy link
Member

psigen commented Apr 26, 2020

I think the static subtable is the best option in the template because we could always put all the static methods in there, and we would know that this would conflict on the off-chance that someone named something _static.

This is better than employing name mangling which is quite a lot of complexity: we are talking about adding an arbitrary regex rewriter for method names just for this one type of conflict that really only exists in very few codebases (I don't have access to yours, but I cannot think of a single one where this is intentionally the case right now)

@jslee02
Copy link
Member Author

jslee02 commented Apr 26, 2020

Sounds fair to me. Let's use the scope-based approach first and revisit name mangling option if we find reasonable use cases. My baseline is a solution that resolves the name conflict with minimal YAML file changes.

I think the static subtable is the best option in the template because we could always put all the static methods in there, and we would know that this would conflict on the off-chance that someone named something _static.

One concern I have with this change is that we cannot programmatically decide (or using configuration YAML) whether to use the static subtable. We may only want to use the static subtable if there are name conflicts.

A possible solution I can think of now is introducing a configuration option (e.g., static_method_scope) whether to use subtable for static methods. If desired, we could take this as a string to be able to use a custom subtable name.

@jslee02 jslee02 marked this pull request as draft April 26, 2020 22:33
@jslee02
Copy link
Member Author

jslee02 commented Apr 27, 2020

Well, I wasn't able to find a way to add a subtable into a class in both Boost.Python and pybind11. 😞

Here is an example code:

class Integer {
public:
    Integer(int val) : m_val(val) {}
    int add(int a) {
        return a + m_val;
    }
    static int add(int a, int b) {
        return a + b;
    }
private:
    int m_val;
};

// pybind11
auto integer = py::class_<Integer>(attr, "Integer")
        .def(py::init<int>())
        .def("add", +[](Integer *self, int a) -> int { return self->add(a); });
auto static_attr = integer.attr("static");
static_attr.def("add",
    +[](int a, int b) -> int { return Integer::add(a, b); });  // no .def() nor .def_static() in static_attr

// Boost.Python
::boost::python::class_<Integer, ::boost::noncopyable>(
    "Integer", boost::python::no_init)
    .def("__init__", ::boost::python::make_constructor(
        +[](int val) -> Integer * { return new Integer(val); },
        ::boost::python::default_call_policies())))
    .def("add", +[](Integer *self, int a) -> int { return self->add(a); }));

::boost::python::object parent_object(
    ::boost::python::scope().attr("Integer").attr("static"));
::boost::python::scope parent_scope(parent_object);
::boost::python::def("add", +[](int a, int b) -> int { return Integer::add(a, b); });

In pybind11, a subtable should be added by adding an attribute to a class using attr() (see the example code). However, adding a method is not allowed to an attribute.

The Boost.Python binding code builds but Python complains when importing the module:

SystemError: initialization of issue262_static_method_boost_python raised unreported exception

@psigen
Copy link
Member

psigen commented May 4, 2020

Hmm, could we create a namespace or class instead of an attr?

@jslee02
Copy link
Member Author

jslee02 commented May 4, 2020

Hmm, could we create a namespace or class instead of an attr?

Yeah, we could create a Python class using a dummy C++ class as something like (but not namespace):

class Integer {
public:
  Integer(int val) : m_val(val) {}
  int add(int a) { return a + m_val; }
  static int add(int a, int b) { return a + b; }

private:
  int m_val;
};
class __dummy__ {};

auto integer = py::class_<Integer>(m, "Integer")
                     .def(py::init<int>())
                     .def("add", +[](Integer *self, int a) -> int {
                       return self->add(a);
                     });
py::class_<__dummy__>(integer, "static")
    .def_static("add",
                +[](int a, int b) -> int { return Integer::add(a, b); });

In Python,

>>> import test_module as t
>>> t.Integer.static.add(1, 2)
3

The question is which static methods should be in .static subtable.

Possible options are:

(1) All the static methods
(2) All the static methods that are not visible (due to name conflicts)
(3) Let the user decide

I'm inclined to (2) (with warnings) because the subtable is just a workaround to avoid simply ignoring all the invisible static methods (instead we generate those static methods in a sub-scope), which we encourage the user to resolve it by either renaming the C++ static method names or override the names in the YAML configuration. For (2), we might need an additional method tag in the class template something like:

(a) {{class.visible_method}}: all the methods except static methods in a name conflict; {{class.invisible_methods}}: all the static methods in a name conflict
(b) {{class.visible_static_method}}: all the static methods except static methods in a name conflict; {{class.invisible_static_methods}}: all the static methods in a name conflict

@psigen
Copy link
Member

psigen commented May 11, 2020

Hmm, I am thinking that (1) might be the clearest convention. Then, users can specifically always access static methods using the .static table if they are in a situation where they must deal with an API where the methods may or may not be visible due to implementation details.

I am worried that we get into situations where adding and removing methods on child classes could change the accessibility of parent's static methods, and using the parent class API if you have a polymorphic class becomes difficult: some child classes will have the .static.foo() reference while others have .foo(). If we do (1), then in this situation, the user could always use .static.foo() which will be present on all child classes. regardless of whether they implement a foo() of their own.

However, now that I'm thinking about it, will pybind11 do this work for us automatically by creating a class reference like __class__, so that we could do: A.__class__.foo()?

Is there a heavy performance or compilation penalty to doing things this way?

@jslee02
Copy link
Member Author

jslee02 commented May 11, 2020

The option (1) also sounds good to me for the reasons you pointed out.

However, now that I'm thinking about it, will pybind11 do this work for us automatically by creating a class reference like __class__, so that we could do: A.__class__.foo()?

Not that I know of.

Is there a heavy performance or compilation penalty to doing things this way?

Honestly, I'm not sure. Probably a few more indirect references?

If we go with (1), I see some sub-options as:

(a) Put static methods always in .static subtable
(b) Use optional boolean configuration whether to put static methods in .static subtable
(c) Use optional string configuration to specify the subtable name. If it's empty or not specified, we don't use the subtable.

I'm thinking of option.{use|enable}_static_method_subtable and option.static_method_subtable_name for (b) and (c), respectively.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants