Well, the basic issue is that something (at times it's been Microsoft; here it appears to be a custom webshop mailer) sends invalid email: a content-type of text/plain, no content-charset, but high-ASCII (8-bit) characters in the supposedly plain-text body. These get passed to the database and are cut off on insertion because they're invalid.
One possible solution is to run everything with a text/plain content-type through mb_convert_encoding, but that doesn't seem particularly desirable. This is in CerberusParser::parseMime():
==============================================
if($info['content-type'] == 'text/plain') {
    $text = mailparse_msg_extract_part_file($section, $full_filename, NULL);
    // Only convert when the sender declared a charset
    if(isset($info['content-charset']) && !empty($info['content-charset'])) {
        if(@mb_check_encoding($text, $info['content-charset'])) {
            $text = mb_convert_encoding($text, LANG_CHARSET_CODE, $info['content-charset']);
        } else {
            $text = mb_convert_encoding($text, LANG_CHARSET_CODE);
        }
    }
    // With no declared charset, $text is appended unconverted
    @$message->body .= $text;
    // ... (rest of the branch omitted in this excerpt)
}
==============================================
Calling mb_convert_encoding($text, LANG_CHARSET_CODE); on these messages does fix them, but we can't know when/where they're going to show up.
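One way to avoid converting every message blindly is to convert only when the part declares no charset and actually contains 8-bit bytes. The following is a minimal sketch of that idea, not the Cerberus implementation: the function name normalize_undeclared_charset is hypothetical, the target defaults to UTF-8 as a stand-in for LANG_CHARSET_CODE, and the ISO-8859-1 fallback is an assumption about what such senders typically emit.

```php
<?php
// Hypothetical helper (not part of Cerberus): normalize a text/plain body
// whose MIME part declared no charset.
function normalize_undeclared_charset($text, $target = 'UTF-8', $fallback = 'ISO-8859-1') {
    // Pure 7-bit ASCII needs no conversion.
    if (!preg_match('/[\x80-\xFF]/', $text)) {
        return $text;
    }
    // Bytes that already form valid UTF-8 are treated as UTF-8.
    if (mb_check_encoding($text, 'UTF-8')) {
        return mb_convert_encoding($text, $target, 'UTF-8');
    }
    // Otherwise assume the single-byte fallback charset (an assumption,
    // since the sender gave no hint).
    return mb_convert_encoding($text, $target, $fallback);
}
```

Messages that are already valid pass through unchanged, so the extra work is limited to the malformed ones. The fallback guess can still be wrong for senders using other single-byte charsets.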
Comment by
Joe Geck [29/Jun/09 10:00 AM]
Hello, this patch seems to solve this bug:
Index: Parser.php
===================================================================
--- Parser.php (revision 928)
+++ Parser.php (working copy)
@@ -88,6 +88,14 @@
}
};
+function check8bit($str) {
+ $l=strlen($str);
+ for ($i=0; $i<$l; $i++)
+ if (ord($str[$i])>=0x80)
+ return True;
+ return False;
+ }
+
class CerberusParser {
const ATTACHMENT_BUCKETS = 100; // hash
@@ -132,7 +140,7 @@
if(empty($info['content-name'])) {
if($info['content-type'] == 'text/plain') {
$text = mailparse_msg_extract_part_file($section, $full_filename, NULL);
-
+
if(isset($info['content-charset']) && !empty($info['content-charset'])) {
$message->body_encoding = $info['content-charset'];
@@ -142,7 +150,9 @@
$text = mb_convert_encoding($text, LANG_CHARSET_CODE);
}
}
-
+ elseif(check8bit($text))
+ $text = "[Warning! Nonstandard characters found!]".mb_convert_encoding($text, LANG_CHARSET_CODE, "ASCII,ISO-8859-1,UTF-8");
+
@$message->body .= $text;
unset($text);
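For comparison, the byte scan that check8bit() performs can also be written as a single regular expression. This is only a sketch: the name has_8bit_bytes is hypothetical, and it treats every byte from 0x80 through 0xFF as 8-bit.

```php
<?php
// Hypothetical variant of the patch's check8bit(); the name is illustrative.
// Returns true when the string contains any byte outside 7-bit ASCII.
function has_8bit_bytes($str) {
    // Without the /u modifier PCRE matches on raw bytes, not code points.
    return preg_match('/[\x80-\xFF]/', $str) === 1;
}
```

It returns false for plain ASCII text and true as soon as one byte at or above 0x80 is present.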
What mailer is generating this? It would be so much easier to just create a valid message.
This is all you need in the mailer:
Content-Type: text/plain; charset=iso-8859-1
Otherwise the message is assumed to be 7-bit, and the high-ASCII characters in it are not properly encoded. This isn't a Cerb issue, and we can't patch for every malformed sender. If this is your own custom mailer, then just fix it on your end.
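For a custom PHP mailer, declaring the charset could look roughly like the sketch below. The helper name plain_text_headers is hypothetical, and the Content-Transfer-Encoding: 8bit line is an assumption that the body is sent as raw 8-bit text rather than quoted-printable or base64.

```php
<?php
// Hypothetical helper for a custom mailer; the function name is illustrative.
// Builds headers that declare the body charset explicitly.
function plain_text_headers($charset = 'iso-8859-1') {
    return implode("\r\n", array(
        'MIME-Version: 1.0',
        'Content-Type: text/plain; charset=' . $charset,
        // Assumption: the body is transmitted as unencoded 8-bit text.
        'Content-Transfer-Encoding: 8bit',
    ));
}

// Usage with PHP's built-in mail(); recipient and subject are placeholders:
// mail('customer@example.com', 'Order confirmation', $body, plain_text_headers());
```

The declared charset has to match the bytes actually written into the body, otherwise the receiving side will still mis-decode the message.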
Unfortunately, this issue goes well beyond our internal order confirmations, and originates from a variety of sources, most outside of our control.
We have dealt with this in our own mailer, but we also see it in our shopping cart's mailer (third-party software; we've alerted them, but it's not high on their list).
We also see it daily in inbound emails from international customers who send into our central customer service bucket.
If their mailing client isn't compliant, Cerb is too strict and dumps the email.
There is a comment above by Dan Hildebrant that suggests:
"Calling mb_convert_encoding($text, LANG_CHARSET_CODE); on these messages does fix them, but we can't know when/where they're going to show up."
If this is the case, wouldn't a Cerb plugin that runs a cleanup script on all emails (regardless of origin) fix the issue?
BTW: we're open to sponsoring the development of said plugin at a reasonable cost.
Comment by
Joe Geck [08/Mar/10 09:12 AM]
clicboutic in town hall was running into the same issue. Another +1 virtual vote.
Fixed in Cerb5