-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RMail has invalid byte sequence #22
Comments
Seems to do the trick, but I guess the gem is also outdated...:
|
I switched heliotrope from RMail to the mail gem (https://github.com/mikel/mail). The new Here's the diff for that file : diff --git a/lib/heliotrope/message.rb b/lib/heliotrope/message.rb
index 0c928cd..d87cca9 100644
--- a/lib/heliotrope/message.rb
+++ b/lib/heliotrope/message.rb
@@ -1,10 +1,29 @@
# encoding: UTF-8
-require 'rmail'
+require 'mail'
require 'digest/md5'
require 'json'
require 'timeout'
+module Mail
+ class Message
+
+ # a common interface that matches all the field
+ # IMPORTANT : if not existing, it must return nil
+ def fetch_header field
+ sym = field.to_sym
+ self[sym] ? self[sym].to_s : nil
+ end
+
+ # Make sure the message has valid message ids for the message, and
+ # fetch them
+ def fetch_message_ids field
+ self[field] ? self[field].message_ids || [self[field].message_id] : []
+ end
+
+ end
+end
+
module Heliotrope
class InvalidMessageError < StandardError; end
class Message
@@ -14,43 +33,53 @@ class Message
end
def parse!
- @m = RMail::Parser.read @rawbody
+ @m = Mail.read_from_string @rawbody
- @msgid = find_msgids(decode_header(validate_field(:message_id, @m.header["message-id"]))).first
- ## this next error happens if we have a field, but we can't find a <something> in it
- raise InvalidMessageError, "can't parse msgid: #{@m.header['message-id']}" unless @msgid
+ # Mail::MessageIdField.message_id returns the msgid with < and >, which is not correct
+ unless @m.message_id
+ @m.message_id = "<#{Time.now.to_i}-defaulted-#{munge_msgid @m.header.to_s}@heliotrope>"
+ end
+ @msgid = @m.message_id
@safe_msgid = munge_msgid @msgid
- @from = Person.from_string decode_header(validate_field(:from, @m.header["from"]))
- @date = begin
- Time.parse(validate_field(:date, @m.header["date"])).to_i
- rescue ArgumentError
- #puts "warning: invalid date field #{@m.header['date']}"
- 0
- end
+ @from = Person.from_string @m.fetch_header(:from)
- @to = Person.many_from_string decode_header(@m.header["to"])
- @cc = Person.many_from_string decode_header(@m.header["cc"])
- @bcc = Person.many_from_string decode_header(@m.header["bcc"])
- @subject = decode_header @m.header["subject"]
- @reply_to = Person.from_string @m.header["reply-to"]
+ @sender = begin
+ # Mail::SenderField.sender returns an array, not a String
+ Person.from_string @m.fetch_header(:sender)
+ rescue InvalidMessageError
+ ""
+ end
- @refs = find_msgids decode_header(@m.header["references"] || "")
- in_reply_to = find_msgids decode_header(@m.header["in-reply-to"] || "")
- @refs += in_reply_to unless @refs.member? in_reply_to.first
- @safe_refs = @refs.map { |r| munge_msgid(r) }
+ @date = (@m.date || Time.now).to_time.to_i
+
+ @to = Person.many_from_string(@m.fetch_header(:to))
+ @cc = Person.many_from_string(@m.fetch_header(:cc))
+ @bcc = Person.many_from_string(@m.fetch_header(:bcc))
+ @subject = (@m.subject || "")
+ @reply_to = Person.from_string(@m.fetch_header(:reply_to))
+
+ # same as message_id : we must use message_ids to get them without <
+ # and >
+ begin
+ @refs = @m.fetch_message_ids(:references)
+ in_reply_to = @m.fetch_message_ids(:in_reply_to)
+ rescue Mail::Field::FieldError => e
+ raise InvalidMessageError, e.message
+ end
+ @refs += in_reply_to unless @refs.member?(in_reply_to.first)
+ @safe_refs = @refs.nil? ? [] : @refs.compact.map { |r| munge_msgid(r) }
## various other headers that you don't think we will need until we
## actually need them.
## this is sometimes useful for determining who was the actual target of
## the email, in the case that someone has aliases
- @recipient_email = @m.header["envelope-to"] || @m.header["x-original-to"] || @m.header["delivered-to"]
+ @recipient_email = @m.fetch_header(:envelope_to) || @m.fetch_header(:x_original_to) || @m.fetch_header(:delivered_to)
- @list_id = @m.header["list-id"]
- @list_subscribe = @m.header["list-subscribe"]
- @list_unsubscribe = @m.header["list-unsubscribe"]
- @list_post = @m.header["list-post"] || @m.header["x-mailing-list"]
+ @list_subscribe = @m.fetch_header(:list_subscribe)
+ @list_unsubscribe = @m.fetch_header(:list_unsubscribe)
+ @list_post = @m.fetch_header(:list_post) || @m.fetch_header(:x_mailing_list)
self
end
@@ -75,8 +104,8 @@ class Message
{ :from => (from ? from.to_email_address : ""),
:to => to.map(&:to_email_address),
- :cc => cc.map(&:to_email_address),
- :bcc => bcc.map(&:to_email_address),
+ :cc => (cc || []).map(&:to_email_address),
+ :bcc => (bcc || []).map(&:to_email_address),
:subject => subject,
:date => date,
:refs => refs,
@@ -95,8 +124,8 @@ class Message
end
def direct_recipients; to end
- def indirect_recipients; cc + bcc end
- def recipients; direct_recipients + indirect_recipients end
+ def indirect_recipients; (cc || []) + (bcc || []) end
+ def recipients; (direct_recipients || []) + (indirect_recipients || []) end
def indexable_text
@indexable_text ||= begin
@@ -134,10 +163,7 @@ class Message
end
def has_attachment?
- @has_attachment ||=
- mime_parts("text/plain").any? do |type, fn, id, content|
- fn && (type !~ SIGNATURE_ATTACHMENT_TYPE)
- end
+ @m.has_attachments? # defined in the mail gem
end
def signed?
@@ -159,13 +185,9 @@ private
Digest::MD5.hexdigest msgid
end
- def find_msgids msgids
- msgids.scan(/<(.+?)>/).map(&:first)
- end
-
def mime_part_types part=@m
- ptype = part.header["content-type"] || ""
- [ptype] + (part.multipart? ? part.body.map { |sub| mime_part_types sub } : [])
+ ptype = part.fetch_header(:content_type)
+ [ptype] + (part.multipart? ? part.body.parts.map { |sub| mime_part_types sub } : [])
end
## unnests all the mime stuff and returns a list of [type, filename, content]
@@ -176,14 +198,14 @@ private
def decode_mime_parts part, preferred_type, level=0
if part.multipart?
if mime_type_for(part) =~ /multipart\/alternative/
- target = part.body.find { |p| mime_type_for(p).index(preferred_type) } || part.body.first
+ target = part.body.parts.find { |p| mime_type_for(p).index(preferred_type) } || part.body.parts.first
if target # this can be nil
decode_mime_parts target, preferred_type, level + 1
else
[]
end
else # decode 'em all
- part.body.compact.map { |subpart| decode_mime_parts subpart, preferred_type, level + 1 }.flatten 1
+ part.body.parts.compact.map { |subpart| decode_mime_parts subpart, preferred_type, level + 1 }.flatten 1
end
else
type = mime_type_for part
@@ -204,11 +226,11 @@ private
end
def mime_type_for part
- (part.header["content-type"] || "text/plain").gsub(/\s+/, " ").strip.downcase
+ (part.fetch_header(:content_type) || "text/plain").gsub(/\s+/, " ").strip.downcase
end
def mime_id_for part
- header = part.header["content-id"]
+ header = part.fetch_header(:content_id)
case header
when /<(.+?)>/; $1
else header
@@ -217,8 +239,8 @@ private
## a filename, or nil
def mime_filename_for part
- cd = part.header["Content-Disposition"]
- ct = part.header["Content-Type"]
+ cd = part.fetch_header(:content_disposition)
+ ct = part.fetch_header(:content_type)
## RFC 2183 (Content-Disposition) specifies that disposition-parms are
## separated by ";". So, we match everything up to " and ; (if present).
@@ -229,20 +251,7 @@ private
end
## filename could be RFC2047 encoded
- decode_header(filename).chomp if filename
- end
-
- ## rfc2047-decode a header, convert to utf-8, and normalize whitespace
- def decode_header v
- return "" if v.nil?
-
- v = if Decoder.is_rfc2047_encoded? v
- Decoder.decode_rfc2047 "utf-8", v
- else # assume it's ascii and transcode
- Decoder.transcode "utf-8", "ascii", v
- end
-
- v.gsub(/\s+/, " ").strip
+ filename.chomp if filename
end
CONVERSIONS = {
@@ -255,11 +264,10 @@ private
def mime_content_for mime_part, preferred_type
return "" unless mime_part.body # sometimes this happens. not sure why.
- mt = mime_type_for(mime_part) || "text/plain" # i guess
- content_type = if mt =~ /^(.+);/ then $1.downcase else mt end
- source_charset = if mt =~ /charset="?(.*?)"?(;|$)/i then $1 else "US-ASCII" end
+ content_type = mime_part.fetch_header(:content_type) || "text/plain"
+ source_charset = mime_part.charset || "US-ASCII"
- content = mime_part.decode
+ content = mime_part.decoded
converted_content, converted_charset = if(converter = CONVERSIONS[[content_type, preferred_type]])
send converter, content, source_charset
else |
I noticed the Mail project and wondered what effort it would require - this would be great! Would it be possible to just swap or does messageids and such get garbled? |
I'm afraid the management is completely different, so you'd have to reindex all your mails. Sup works by reading the Message-Id header and do some funny logic on it to please the index. Heliotrope's messages store 2 ids : the real Message-Id, and a munge_msgid which is an MD5 hash of the Message-Id. The reason behind is the same (the index doesn't like funny tokens) but I think it's much more simple, and it looks like William thought so. |
I got something very preliminary going over here: https://github.com/gauteh/sup/tree/rmail-to-mail-remove-iconv-ruby-2-comp, kind of have to do it all at once to get it working. I think some of the stuff should be slightly re-designed in heliotrope style. |
Upstream pull request: matta/rubymail#2 |
Fixed by using rmail-sup. |
* rmail -> rmail-sup, it had a bug in 2.0 encoding and sup is maintaining now sup-heliotrope/sup#22 * permutation -> more_math, author combined libraries * apostrophes got html-encoded
The text was updated successfully, but these errors were encountered: