Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gracefully handle fields whose encoding is unknown to Ruby #580

Open
bpot opened this issue Jul 9, 2013 · 17 comments
Open

Gracefully handle fields whose encoding is unknown to Ruby #580

bpot opened this issue Jul 9, 2013 · 17 comments

Comments

@bpot
Copy link
Contributor

bpot commented Jul 9, 2013

Ruby19.b_value_decode (and probably q_value_decode) may cause an ArgumentError: unknown encoding XXXX exception if they call #force_encoding with an encoding Ruby doesn't know about.

We should either silently fail to decode the field or raise a more specific exception.

@grosser
Copy link
Contributor

grosser commented Jul 21, 2014

👍

@bf4
Copy link
Contributor

bf4 commented Jul 22, 2014

@grosser @bpot Any thoughts which behavior you think is a better default?

@grosser
Copy link
Contributor

grosser commented Jul 22, 2014

(I'm primarily working on the reading mails part)
I'd prefer if mail does it's best to read as much as possible, atm I'm using charlock holmes to guess an encoding if it is unknown, but as far as I am concerned almost anything is better than blowing up from the email reading point of view.

module UnknownEncoding
  # try to guess encoding if we cannot find it
  def q_value_decode(str)
    super
  rescue ArgumentError
    encoding = CharlockHolmes::EncodingDetector.new.detect(str).try(:[], :encoding) || "UTF-8"
    warn "Invalid encoding caught #{$!} using #{encoding}"
    super(str.sub(/^=?.*?\?/, "=?#{encoding}?Q?"))
  end
end

class << Mail::Encodings
  prepend UnknownEncoding
end

@jeremy
Copy link
Collaborator

jeremy commented Jul 23, 2014

We've had to special-case charset -> encoding mappings in other cases too. We'd be better off exposing a uniform way to let users do the lookup and manage fallbacks.

For example, returning 'ascii-8bit' as a fallback encoding, doing charset detection on the decoded content, or raising an ArgumentException subclass.

@grosser
Copy link
Contributor

grosser commented Jul 23, 2014

If the value is later used for headers raising or returning ascii are not very helpful since then the header cannot be parsed and we just end up with other issues, having a detection built in could work, but maybe better to just have some callback in place that can be overwritten.

@jeremy
Copy link
Collaborator

jeremy commented Jul 23, 2014

For most apps, an exception is much better than getting wrongly-encoded data, or a binary string. Both will just cause exceptions when the content is manipulated in any way, but in a much harder to debug way.

For now, 👍 to raising a more specific subclass of ArgumentError, e.g. UnrecognizedCharsetError or ...

For later, give users a better way to manage charset -> Ruby encoding lookup, including failures.

@jmehnle
Copy link

jmehnle commented Sep 26, 2014

I think returning header values in a fallback encoding of ASCII-8BIT or ISO-8859-1 is useful. In my case, it's way more useful than raising an exception (even if it's a more specific exception such as UnrecognizedCharsetError, which is admittedly more useful than raising ArgumentError) because it gives me a semi-decoded header value I can work with. Say, if a header value contains multiple encoded words, and only one of them specifies an invalid encoding, then falling back to ASCII-8BIT or ISO-8859-1 for that one encoded word gives me a header value that has mostly been correctly decoded and only a part of which is garbled. If an exception was raised instead, I'd have nothing.

That said, there could be an option to Mail::Encodings.value_decode that determines whether to use a fallback encoding or raise an exception. Best of both worlds.

@jmehnle
Copy link

jmehnle commented Mar 13, 2015

I've been successfully using this monkey patch for the last 6 months:

# Mail::Encodings.value_decode raises ArgumentError if any encoded word
# specifies an unknown encoding. Monkey-patch Ruby19.{b,q}_value_decode to
# fall back on an ASCII-8BIT encoding.
# Cf. <https://github.com/mikel/mail/issues/580>
###############################################################################

module Mail
  class Ruby19
    class << self
      def b_value_decode_with_fallback_encoding(str)
        b_value_decode_without_fallback_encoding(str)
      rescue ArgumentError
        b_value_decode_without_fallback_encoding(str.sub(/^=\?.*?\?/, "=?ASCII-8BIT?"))
      end

      alias_method_chain :b_value_decode, :fallback_encoding

      def q_value_decode_with_fallback_encoding(str)
        q_value_decode_without_fallback_encoding(str)
      rescue ArgumentError
        q_value_decode_without_fallback_encoding(str.sub(/^=\?.*?\?/, "=?ASCII-8BIT?"))
      end

      alias_method_chain :q_value_decode, :fallback_encoding
    end
  end
end

Is there value in turning this into a pull request?

@grosser
Copy link
Contributor

grosser commented Mar 13, 2015

we have a encoder now, does that take care of it ?

https://github.com/mikel/mail/blob/master/lib/mail/version_specific/ruby_1_9.rb#L5-L29

@jmehnle
Copy link

jmehnle commented Mar 13, 2015

Encoder? Isn't this issue about {b,q}_value_decode?

@grosser
Copy link
Contributor

grosser commented Mar 13, 2015

I think it might be handleable in the encoder, since you see an encoding coming in and can rescue the ArgumentError there ... might be wrong ... can you update to master, reproduce or add a failing test case ?

@jmehnle
Copy link

jmehnle commented Mar 15, 2015

I'll try to reproduce this on master.

In my dev & prod environment I can't update to master, since we're using the mail gem as part of the Rails framework, and we're currently stuck on Rails 3.2, which depends on mail 2.5.4. All I can do is monkey-patch 2.5.4. The reason I'm pushing for this is that I'd rather monkey-patch it in a way that's consistent with any fixes going into master.

@grosser
Copy link
Contributor

grosser commented Mar 15, 2015

just monkey patch the version down :)

gem "mail", :git => '[email protected]:zendesk/mail', :ref =>
'262-as-259'

On Sun, Mar 15, 2015 at 10:25 AM, Julian Mehnle [email protected]
wrote:

I'll try to reproduce this on master.

In my dev & prod environment I can't update to master, since we're using
the mail gem as part of the Rails framework, and we're currently stuck on
Rails 3.2, which depends on mail 2.5.4. All I can do is monkey-patch 2.5.4.
The reason I'm pushing for this is that I'd rather monkey-patch it in a way
that's consistent with any fixes going into master.


Reply to this email directly or view it on GitHub
#580 (comment).

@jmehnle
Copy link

jmehnle commented Mar 19, 2015

@grosser, did you have any compatibility issues with using 2.6 with Rails?

@grosser
Copy link
Contributor

grosser commented Mar 19, 2015

nope, works fine (and we are doing a ton of email ...) :)

@jmehnle
Copy link

jmehnle commented Mar 19, 2015

We're now using our own fork with some fixes pulled in from other forks that haven't been merged into master (most notably https://github.com/okkez/mail/tree/handle-invalid-date), patching the version down to 2.5.999. Apparently this issue (#580) has been resolved in master.

Kind of a ridiculous solution, but like @grosser, we're exposed to a lot of email garbage and need to handle it all in our Rails app. Thanks, @grosser!

@grosser
Copy link
Contributor

grosser commented Mar 19, 2015

Glad that helps, keep making PRs, at least we have a common place for
discussion/patches :)

On Thu, Mar 19, 2015 at 11:54 PM, Julian Mehnle [email protected]
wrote:

We're now using our own fork with some fixes pulled in from other forks
that haven't been merged into master (most notably
https://github.com/okkez/mail/tree/handle-invalid-date), patching the
version down to 2.5.999. Apparently this issue (#580
#580) has been resolved in master.

Kind of a ridiculous solution, but like @grosser
https://github.com/grosser, we're exposed to a lot of email garbage and
need to handle it all in our Rails app. Thanks, @grosser
https://github.com/grosser!


Reply to this email directly or view it on GitHub
#580 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants