Issue Details

Key: CHD-1000
Type: Bug
Status: Resolved
Resolution: Fixed
Assignee: Unassigned
Reporter: Joe Geck
Votes: 4
Watchers: 6
Cerberus Helpdesk

Bodies of text/plain charset e-mail with blank encoding and special characters get truncated

Created: 22/Dec/08 06:53 PM   Updated: 19/Apr/10 09:42 AM
Fix Version/s: 5.0 RC1

Original Estimate: Unknown Remaining Estimate: Unknown Time Spent: Unknown

Value: 3 - Would Be Nice
Marquee: Mail


 Description   
Summary:
http://forums.cerb4.com/showthread.php?p=8001

As reported in the forums, e-mails are getting cut off at the first special character they contain (the special character itself is lost as well).

(quote)
One of our product names is:

Moiré Gibson Girl Skirt (Moiré is a type of fabric finish)

We have order confirms come into cerb for storage / searchability. When any orders contain that word, the entire order email comes through right up until the special é character, then the rest of the email is missing. We did not have any such trouble with this product name in our cerb 3.5 install.
(end quote)

Note:
This has been verified by Hildy and me.
Sample message attached (see corresponding customer desk for another sample).

Comments
Comment by Dan Hildebrandt [WGM] [22/Dec/08 07:06 PM]
Well, the basic issue is that something (at times it's been Microsoft; this one appears to be a custom webshop mailer) sends invalid email: a content-type of text/plain, no content-charset, but high-ASCII (8-bit) characters in the supposedly plain-text body. These get passed to the database and the text is cut off on insertion because the bytes are invalid.

One possible solution is to run everything with a text/plain content-type through mb_convert_encoding, but that doesn't seem particularly desirable. This is in CerberusParser::parseMime():

==============================================
if($info['content-type'] == 'text/plain') {
    $text = mailparse_msg_extract_part_file($section, $full_filename, NULL);

    if(isset($info['content-charset']) && !empty($info['content-charset'])) {
        if(@mb_check_encoding($text, $info['content-charset'])) {
            $text = mb_convert_encoding($text, LANG_CHARSET_CODE, $info['content-charset']);
        } else {
            $text = mb_convert_encoding($text, LANG_CHARSET_CODE);
        }
    }

    @$message->body .= $text;
==============================================

Calling mb_convert_encoding($text, LANG_CHARSET_CODE); on these messages does fix them, but we can't know when/where they're going to show up.
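
To make the idea above concrete, here is a minimal sketch of a fallback for bodies that arrive without a declared charset. It is not the code that shipped in Cerberus. The function name `normalize_undeclared_body`, the UTF-8 target, and the ISO-8859-1 fallback are all assumptions chosen for illustration. It only touches text that actually contains 8-bit bytes, which sidesteps the "we can't know when they show up" problem without converting every message blindly.

```php
<?php
// Hypothetical helper (not part of Cerberus): normalize a text/plain body
// whose headers did not declare a charset.
function normalize_undeclared_body(
    string $text,
    string $target = 'UTF-8',        // assumed stand-in for LANG_CHARSET_CODE
    string $fallback = 'ISO-8859-1'  // assumed guess for undeclared 8-bit text
): string {
    // Pure 7-bit ASCII needs no conversion.
    if (!preg_match('/[\x80-\xFF]/', $text)) {
        return $text;
    }

    // If the bytes already form valid UTF-8, treat them as UTF-8;
    // otherwise assume the fallback single-byte charset.
    $source = mb_check_encoding($text, 'UTF-8') ? 'UTF-8' : $fallback;

    return mb_convert_encoding($text, $target, $source);
}

// Example: the product name from the report, sent as raw Latin-1 bytes.
echo normalize_undeclared_body("Moir\xE9 Gibson Girl Skirt"), "\n";
```

The fallback charset is still a guess: a sender using Windows-1252 or another single-byte encoding would be converted without error, but some characters could come out wrong. That is arguably still better than a truncated body.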

Comment by Joe Geck [29/Jun/09 10:00 AM]

Comment by Fabio Erri [18/Aug/09 08:53 AM]
Hello, this patch seems to solve this bug:

Index: Parser.php
===================================================================
--- Parser.php (revision 928)
+++ Parser.php (local copy)
@@ -88,6 +88,14 @@
  }
 };
 
+function check8bit($str) {
+ $l=strlen($str);
+ for ($i=0; $i<$l; $i++)
+ if (ord($str[$i])>=0x80)
+ return True;
+ return False;
+ }
+
 class CerberusParser {
     const ATTACHMENT_BUCKETS = 100; // hash
 
@@ -132,7 +140,7 @@
  if(empty($info['content-name'])) {
  if($info['content-type'] == 'text/plain') {
  $text = mailparse_msg_extract_part_file($section, $full_filename, NULL);
-
+
  if(isset($info['content-charset']) && !empty($info['content-charset'])) {
  $message->body_encoding = $info['content-charset'];
 
@@ -142,7 +150,9 @@
  $text = mb_convert_encoding($text, LANG_CHARSET_CODE);
  }
  }
-
+ elseif(check8bit($text))
+ $text = "[Warning! Nonstandard characters found!]".mb_convert_encoding($text, LANG_CHARSET_CODE, "ASCII,ISO-8859-1,UTF-8");
+
  @$message->body .= $text;
 
  unset($text);

Comment by Jeff Standen [WGM] [11/Sep/09 04:29 PM]
What mailer is generating this? It would be so much easier to just create a valid message.

Comment by Jeff Standen [WGM] [11/Sep/09 04:38 PM]
This is all you need in the mailer:
Content-Type: text/plain; charset=iso-8859-1

Otherwise the message is assumed to be 7bit, and any high-ASCII characters in it are not properly encoded. This isn't a Cerb issue, and we can't patch for every malformed sender. If this is your own custom mailer then just fix it on your end.
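
For a sender written in PHP, the header fix could look roughly like the sketch below. The helper name `plain_text_headers` is hypothetical and is not taken from any mailer mentioned in this issue. It assumes the body really is encoded in the charset you pass in; it only decides which headers to declare.

```php
<?php
// Hypothetical helper for a sending script: choose headers that match the body.
// Assumes $body is already encoded in $charset when it contains 8-bit bytes.
function plain_text_headers(string $body, string $charset = 'iso-8859-1'): array {
    // Any byte >= 0x80 means the body is not plain 7-bit ASCII.
    $is8bit = preg_match('/[\x80-\xFF]/', $body) === 1;

    return [
        'MIME-Version'              => '1.0',
        'Content-Type'              => 'text/plain; charset=' . ($is8bit ? $charset : 'us-ascii'),
        'Content-Transfer-Encoding' => $is8bit ? '8bit' : '7bit',
    ];
}

// Print the headers that would accompany the order confirmation from the report.
foreach (plain_text_headers("Moir\xE9 Gibson Girl Skirt") as $name => $value) {
    echo $name, ': ', $value, "\n";
}
```

The returned array can be flattened into header lines or handed to whatever sending routine the mailer already uses.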

Comment by Chris Allen [11/Sep/09 05:19 PM]
Unfortunately, this issue goes well beyond our internal order confirms, and originates from a variety of sources, most outside of our control.

We have dealt with this in our own mailer, but also see it in our shopping cart's mailer (third-party software; we've alerted them, but it's not high on their list).

We also see it daily in inbound emails from international customers who send into our central customer service bucket.

If their mailing client isn't compliant, cerb is too strict and dumps the email.

There is a comment above by Dan Hildebrandt that suggests

"Calling mb_convert_encoding($text, LANG_CHARSET_CODE); on these messages does fix them, but we can't know when/where they're going to show up."

If this is the case, wouldn't a cerb plugin that runs a cleanup script on all emails (regardless of origin) fix the issue?

Comment by Chris Allen [11/Sep/09 05:21 PM]
BTW: we're open to sponsoring the development of said plugin at a reasonable cost.

Comment by Joe Geck [08/Mar/10 09:12 AM]
clicboutic in town hall was running into the same issue. Another +1 virtual vote.

Comment by Jeff Standen [WGM] [19/Apr/10 09:42 AM]
Fixed in Cerb5