Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Job's __unicode__ method leads to misleading representations #459

Closed
Voyz opened this issue Sep 17, 2020 · 12 comments
Closed

Job's __unicode__ method leads to misleading representations #459

Voyz opened this issue Sep 17, 2020 · 12 comments

Comments

@Voyz
Copy link

Voyz commented Sep 17, 2020

Expected Behavior

When printing job or converting one to string, I'd expect to see an indication that it is the Job object I'm operating on.

Current Behavior

Job prints its name followed by parenthesis and some properties. If the name is misleading (eg. MagicMock), the job disguises itself as something else.

Steps to Reproduce

task = MagicMock()
job = scheduler.add_job(task, ...)
print(job)
>>> MagicMock (trigger:...)

Context (Environment)

It is confusing and potentially error-prone when debugging or serialising Jobs.

Detailed Description

I suggest that Job should always print Job at the beginning when converting to unicode.

@agronholm
Copy link
Owner

The __repr__() function does in fact do this, and it is the proper way to find out what object you're dealing with when debugging.
As for serialization, I have no clue what difficulties could arise from this.

@Voyz
Copy link
Author

Voyz commented Jan 10, 2021

Thanks for the response. When you say 'the proper way to find out what object you're dealing with', do you have any particular programming or Python guideline/mantra in mind that suggests such pattern?

If I may, in my opinion this is a bad design that is bug prone, as it silently aliases one class for another. One improvement I can think of would be to return something like Job(target=MagicMock, ...) from __repr__.

@agronholm
Copy link
Owner

If in a REPL you typed job instead of print(job), it would print the __repl__() result from an object, which in case of the Job class would be along the lines of <Job (id=...>. The print() function, on the other hand, invokes either __str__() or __unicode__() in the object and the output was meant for use in scheduler.print_job(). You can also call repr(job) to directly get the appropriate output.

@Voyz
Copy link
Author

Voyz commented Jan 11, 2021

I understand what you outline, but politely I'll stay with my opinion in that this is a bad design - I hope you can see that I'm merely trying to aid you in removing technical debt from the library. Once again, if you think this is in some way a standard way or 'proper way' to do this, then please share your resources so we all can understand the reasoning behind such decision.

To support my argument let me bring a couple of pieces from the official Python documentation:

object.__str__(self)

Called by str(object) and the built-in functions format() and print() to compute the “informal” or nicely printable string representation of an object. The return value must be a string object.
This method differs from object.__repr__() in that there is no expectation that __str__() return a valid Python expression: a more convenient or concise representation can be used.
The default implementation defined by the built-in type object calls object.__repr__().

object.__repr__(self)

Called by the repr() built-in function to compute the “official” string representation of an object. If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form <...some useful description...> should be returned. The return value must be a string object. If a class defines __repr__() but not __str__(), then __repr__() is also used when an “informal” string representation of instances of that class is required.
This is typically used for debugging, so it is important that the representation is information-rich and unambiguous.

And just for clarity, let me repeat the highlighted segments:

  • the “informal” or nicely printable string representation of an object
  • the “official” string representation of an object
  • it is important that the representation is information-rich and unambiguous

I think it is pretty clearly outlined that both __str__ and __repr__ are meant to represent the object. Currently, Job.__str__ (and by extension Job.__unicode__ on which there is no documentation) does not return the “informal” or nicely printable string representation of an object, but a representation of an object (or function) that the Job is going to call. Given that APS currently supports __repr__'s definition, but breaks it for __str__ by aliasing the job object behind another object (or a function), I think it is fair to say that it goes against the official documentation and contains an incorrect implementation of the __str__ method. The fact that one could still call repr(job) to get the representation of the Job object does not change that.

Does anything in APS actually rely on the current implementation of __str__ in a way that would make the suggested correction troublesome?

@agronholm
Copy link
Owner

I hope you can see that I'm merely trying to aid you in removing technical debt from the library

APScheduler does have technical debt but this isn't technical debt, it's deliberate design. I honestly do not understand what you think makes this "bug prone". The string representation is only ever used for visual presentation, and its __repr__() output does clearly indicate that it is a Job instance.

I think it is pretty clearly outlined that both str and repr are meant to represent the object. Currently, Job.str (and by extension Job.unicode on which there is no documentation) does not return the “informal” or nicely printable string representation of an object, but a representation of an object (or function) that the Job is going to call.

They didn't define what "informal representation" means, so APScheduler isn't doing anything contrary to that. The current __unicode__() implementation just basically displays the name, trigger and next run time of a job. In your specific case, the output starts with MagicMock because it is the name of the job which, in the absence of an explicitly defined job name, is derived from the target function.

Does anything in APS actually rely on the current implementation of str in a way that would make the suggested correction troublesome?

Yes, BaseScheduler.print_jobs() relies on the current implementation of __str__() / __unicode__().

@Voyz
Copy link
Author

Voyz commented Jan 11, 2021

I honestly do not understand what you think makes this "bug prone".

It introduces confusion as it aliases one object behind another. Confusion is bug prone. Imagine someone developing using APS and needing to print or convert the job to a string. Without carefully analysing the print log or the str() output, the first information displayed to them - the name of the class or function - would be incorrect and could lead them to incorrect conclusions. This is exactly what happened to me and I spent a good while debugging my code before I realised that job was aliasing itself as another object.

Converting to string using str() or printing an object is a common practice used for logging and serialising objects. To expect each developer to first of all realise the inconsistency in APS between __str__ and __repr__, and secondly to use them accordingly is a bad design.

Imagine I write a test that ensures the logging functionality of my app is correct. It may look as follows:

with self.assertLogs(logging.getLogger('my_logger'), level='INFO') as cm:
    self.assertTrue('Job([...]' in ';'.join(cm.output))

while my logging looks along the lines of:

def my_target():
    pass
aps_job = scheduler.add_job(my_target)
logging.getLogger('my_logger').info(aps_job)

One would assume such a test to pass, given that we're passing a Job object directly to the logger. Unfortunately, quite contrarily, due to the aliasing in Job.__str__ the logger will output my_target(trigger:...) and the test will fail.

Yes, BaseScheduler.print_jobs() relies on the current implementation of __str__() / __unicode__().

Unless you're referring to another version of APS, I just went over the print_jobs and can't find anything specifically relying on the current implementation. Job is printed in two places in the following form:

print(u'    %s' % job, file=out)

Both of which would not break if the __str__() / __unicode__() were to change. Could you elaborable on that?

They didn't define what "informal representation" means, so APScheduler isn't doing anything contrary to that.

Sorry, but I don't think you're being correct by taking these words out of context. The sentence is the “informal” or nicely printable string representation of an object. Yes, the 'informal' is not specified, but it is specific that the representation should be of the object, not of another object or function. APS indeed does that, it returns a representation that is not of the object itself.

Imagine there is a red bicycle and a green car parked on a street. Mark asks Ben: "Hey Ben, what vehicles are parked on the street?", to which Ben replies: "A red bicycle and a green bicycle". I think it's clear this is a wrong answer. Even though the properties were given correctly (as it would in APS's case), the representation is incorrect due to name aliasing. Being 'informal' does not mean that Ben can change the meaning of the information he's trying to convey.

@agronholm
Copy link
Owner

would be incorrect and could lead them to incorrect conclusions. This is exactly what happened to me and I spent a good while debugging my code before I realised that job was aliasing itself as another object.

In my 15 years of using Python (and 12 years of maintaining APScheduler) I've yet to hear a complaint about the difference between __str__() and __repr__() of Job. It seems to me that if anybody else drew the wrong conclusions from that, they have at least not complained about it. This whole issue seems to be based on your false assumption that str(obj) and repr(obj) should return identical output – which is the case if __str__() has been left undefined.

Converting to string using str() or printing an object is a common practice used for logging and serialising objects. To expect each developer to first of all realise the inconsistency in APS between str and repr, and secondly to use them accordingly is a bad design.

Why do you think they're separate methods, if not to allow arbitrary output on str(obj)? I've seen plenty of this in the wild, SQLAlchemy being the first example that comes to mind.

Both of which would not break if the str() / unicode() were to change. Could you elaborable on that?

You're wrong about that. The %s in the format specifier means "replace this placeholder with the output of str(obj)". %r, on the other hand, means "replace this placeholder with the output of repr(obj)" but that's not what's in the code. Try running the following snippet:

class Foo:
    def __str__(self):
        return 'This is __str__() of Foo %s' % id(self)

    def __repr__(self):
        return 'Foo()'

print('%s' % Foo())

What is printed on the console?

Imagine there is a red bicycle and a green car parked on a street. Mark asks Ben: "Hey Ben, what vehicles are parked on the street?", to which Ben replies: "A red bicycle and a green bicycle". I think it's clear this is a wrong answer. Even though the properties were given correctly (as it would in APS's case), the representation is incorrect due to name aliasing. Being 'informal' does not mean that Ben can change the meaning of the information he's trying to convey.

Again, you are treating __unicode__() / __str__() as formal representations of the object, which they are not, like the documentation you linked says too. The output can be anything that represents the object, which in this case is the name of the job, followed by the trigger string and the status. If you're at any point relying on this output for debugging, I'm sorry to say but you're doing it wrong.

If it helps, APScheduler 4.0 does not have a separate __str__() method on its Job class and I intend to keep it that way.

@Voyz
Copy link
Author

Voyz commented Jan 12, 2021

If it helps, APScheduler 4.0 does not have a separate str() method on its Job class and I intend to keep it that way.

I wish this was your first response to this issue! If I'm not mistaken you're not really fixing 3.* anymore until 4.0 gets released. This would have saved us all this discussion. However, I think it's only fair if I respond to your points:

In my 15 years of using Python (and 12 years of maintaining APScheduler) I've yet to hear a complaint

Let's be honest, you could say this about half of new issues, and quite frankly such speech doesn't entirely encourage APS's users to speak to you about new issues. To be fair - in my 12 years of using Python, I've never seen anyone aliasing objects like this either, so let's put the experience calling aside if you may.

This whole issue seems to be based on your false assumption that str(obj) and repr(obj) should return identical output

By no means, and I don't think I've ever suggested such assumption in this thread. However, I believe the fact that they can be different does not mean that one could chose any arbitrary output for the str. Would 'qwertyuiop' be a good output too? No, right? Why? Because the output doesn't represent the object. 'Informal' does not equal 'any' or 'arbitrary', and that's the assumption that you're making that with all respect I believe is incorrect.

You're wrong about that. The %s in the format specifier means "replace this placeholder with the output of str(obj)". %r, on the other hand, means "replace this placeholder with the output of repr(obj)" but that's not what's in the code. Try running the following snippet:

Am I wrong? Removing or changing Job.__str__ , would not break the print(u' %s' % job, file=out), there's no dependency on the way job implements __str__ at the moment. It will produce a different output but it is not a breaking dependency. You could remove __unicode__() / __str__() and the library would function perfectly fine. Please elaborate in case I'm missing something.

What is printed on the console?

This is __str__() of Foo 3053925341192

Now let's remove __str__:

class Foo():
    pass

print("%s", Foo())
>>> <__main__.Foo object at 0x000002C70C229808>

Nothing breaks. Please elaborate how does BaseScheduler.print_jobs() rely on the current implementation of __str__() / __unicode__().

Again, you are treating unicode() / str() as formal representations of the object, which they are not, like the documentation you linked says too.

Again, at no point have I treated them as such. Remove the word 'formal' from this sentence and then I agree - I treat these two as the representation of the object - and the documentation indeed tells you that this should be the case.

The output can be anything that represents the object,

Unless you can support this with some sources I'd say this is your own conjecture. The official documentation clearly states the opposite: the “informal” or nicely printable string representation of an object

If you're at any point relying on this output for debugging, I'm sorry to say but you're doing it wrong.

Yet again, unless you can support this, that's your personal conjecture. Just about every result from the first page of googling 'python __str__' teaches to represent the object itself (not any arbitrary output), I can't find a single mention of name aliasing similar to APS's Job's, and about half of them outline that __str__ can indeed be used for debugging. (1, 2, 3, 4, 5, 6 to name a few)

The output can be anything that represents the object, which in this case is the name of the job, followed by the trigger string and the status

Fantastic statement, now tell me: how does any of the following represent an object called 'Job':

MagicMock (trigger:...)
myfunc(trigger:...)
download(trigger:...)
run(trigger:...)
whathaveyou(trigger:...)

One would clearly see that all of these indeed are instances of an object called 'Job', wouldn't they? If this doesn't convince you, the zen of python has something to say on the topic too:

Explicit is better than implicit. - The Job representation is implicit.
Readability counts. - The Job representation is less readable in its current format, than it would be if it was `Job(name:..., trigger:...)
In the face of ambiguity, refuse the temptation to guess. - The Job representation can be ambiguous.

Please, do bring any external sources outlining or explaining the pattern you've implemented, as I'm sure you're not just basing your arguments on personal opinion only.

Lastly: I hope you're enjoying this convo as much as I am 😊 Even though I'm pointing out what I believe to be mistakes in your library I hope you're not taking any of it personally and that this doesn't hurt you in any way. If it isn't the case, then my apologies!

@agronholm
Copy link
Owner

Nothing breaks. Please elaborate how does BaseScheduler.print_jobs() rely on the current implementation of str() / unicode().

A change there would not raise an exception, but the output would be drastically different from what it used to be. To fix that, I would have to move the string formatting to BaseScheduler.print_jobs() itself. If anybody had a custom implementation of print_jobs() in their own scheduler class, this change could cause unexpected results in it. I'm moving APScheduler towards semantic versioning anyway so I don't want to accidentally break anything in the 3.x series.

I think we'll just have to agree to disagree on the the semantics of __str__(). I'm not enjoying this debate one bit and I don't want to continue it.

I'll close this issue now since it doesn't exist in 4.0 and I won't make this change in 3.x.

@Voyz
Copy link
Author

Voyz commented Jan 14, 2021

Cool, thanks for addressing the issue either way! Any idea on when could we expect 4.0? I remember you mentioned a public beta coming out in October last year, is that available somewhere?

@Voyz
Copy link
Author

Voyz commented Feb 12, 2021

hey @agronholm just following up on these two questions I posted last time. Thanks

@agronholm
Copy link
Owner

@Voyz please subscribe to #465 to keep updated on the progress.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants