 AfterLogic Forum : MailBee.NET Objects
Topic: charset sniffing for e-mail bodies
smigs
Newbie

Joined: 24 February 2014
Location: United Kingdom
Posts: 3
Posted: 24 February 2014 at 3:48am

Is there a way to tell MailBee to interpret a message part using the charset from an HTML meta tag (if there is one), or to have it guess the charset, ignoring what the MIME headers say? I know that when using MailMessage.LoadBodyText() you can set ImportBodyOptions.PreferCharsetFromMetaTag, but I'm downloading these messages via IMAP, so this method isn't used.

The problem is that the sender is producing invalid e-mails whose HTML part has the following headers:

Content-Type: text/html;
     charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

The charset here is specified as iso-8859-1, however the HTML is actually in UTF-16, and there's a META tag to that effect inside the HTML itself. There's also a byte order mark (encoded to =FF=FE in quoted-printable) at the start of the message.

At the moment, when I examine the MailMessage.BodyHtmlText string, it has a null character (\0) after every printable character, plus two extra characters at the start of the message. This is presumably because MailBee has decoded the quoted-printable data to the underlying UTF-16 bytes, but then interpreted those bytes as ISO-8859-1 when converting to the default output encoding (UTF-8?). UTF-16 uses two bytes per character here whereas ISO-8859-1 uses just one, so each character is followed by a NULL 'character' (none of the UTF-16 characters in this message actually use the second byte). The two extra characters at the start are presumably the byte order mark decoded as ISO-8859-1 characters.
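The effect described above can be reproduced without MailBee, using only the .NET System.Text.Encoding classes. The following is a minimal sketch, not what MailBee does internally: the class and method names (MojibakeDemo, DecodeAsLatin1) are made up for illustration. It builds UTF-16LE bytes with a byte order mark and then decodes them as ISO-8859-1.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Hypothetical helper: encodes text as UTF-16LE with a BOM, then decodes
    // the bytes as ISO-8859-1, the way a mislabeled MIME part would be read.
    public static string DecodeAsLatin1(string text)
    {
        byte[] bom = Encoding.Unicode.GetPreamble();   // FF FE
        byte[] body = Encoding.Unicode.GetBytes(text); // two bytes per character for basic text

        byte[] raw = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, raw, 0, bom.Length);
        Buffer.BlockCopy(body, 0, raw, bom.Length, body.Length);

        // ISO-8859-1 maps every byte to exactly one character, so the BOM becomes
        // two visible characters and every zero high byte becomes a '\0'.
        return Encoding.GetEncoding("iso-8859-1").GetString(raw);
    }
}
```

For the input "Hi" the result has six characters: U+00FF and U+00FE (the byte order mark), then 'H', '\0', 'i', '\0', which matches the pattern seen in BodyHtmlText.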


Any thoughts? It seems as though something in MessageParserConfig should be able to handle this, but I'm drawing a blank.
 
Igor
AfterLogic Support


Joined: 24 June 2008
Location: United States
Posts: 6104
Posted: 24 February 2014 at 4:18am

Would it be possible to provide us with a sample mail message for examination? Please make sure it's saved as an EML file, and submit it privately via HelpDesk.

--
Regards,
Igor, AfterLogic Support
 
smigs
Posted: 24 February 2014 at 5:48am

Done
 
smigs
Posted: 24 February 2014 at 8:14am

It looks like there's currently no way to sniff the correct charset from the META tag ahead of time (other than going through the raw byte[] of the MimePart yourself). However, since all the faulty messages I was dealing with had the same problem, I was able to hardcode a correction:

Code:
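// Hardcoded fix for one known-bad sender: re-labels an HTML part declared as
// iso-8859-1 as UTF-16 and re-parses it so the body is decoded correctly.
private void FixWrongMimeBodyHeader(MailMessage original) {
    TextBodyPart html = original.BodyParts.Html;
    // Skip messages without an HTML part; compare the charset name case-insensitively.
    if (html != null && string.Equals(html.Charset, "iso-8859-1", System.StringComparison.OrdinalIgnoreCase))
    {
        // Detach the mislabeled part from the message.
        MimePart mime = html.AsMimePart;
        original.BodyParts.Remove("text/html");

        // Replace the declared charset, then re-parse the raw MIME data.
        mime.Headers.Add("Content-Type", "text/html; charset=\"utf-16\"", true);
        byte[] mimeArray = mime.GetRawData();
        MimePart newMime = MimePart.Parse(mimeArray);
        TextBodyPart newHtml = new TextBodyPart(newMime);
        original.BodyParts.Add(newHtml);
    }
}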
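
For anyone who does want to go through the bytes themselves rather than hardcode the charset, here is a rough sketch of what such sniffing could look like. It is not a MailBee feature: CharsetSniffer and Sniff are hypothetical names, and the input is assumed to be the body bytes after quoted-printable or base64 decoding (how you get those out of MailBee is left out here). It checks for a byte order mark first, then falls back to a charset=... attribute inside a META tag, which only works when the markup itself is in an ASCII-compatible encoding.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class CharsetSniffer
{
    // Hypothetical helper: returns a charset name guessed from the decoded
    // body bytes, or null when neither a BOM nor a META charset is found.
    public static string Sniff(byte[] body)
    {
        if (body == null) return null;

        // 1. A byte order mark is the strongest hint.
        //    (FF FE 00 00 would be UTF-32LE; that case is ignored in this sketch.)
        if (body.Length >= 3 && body[0] == 0xEF && body[1] == 0xBB && body[2] == 0xBF) return "utf-8";
        if (body.Length >= 2 && body[0] == 0xFF && body[1] == 0xFE) return "utf-16le";
        if (body.Length >= 2 && body[0] == 0xFE && body[1] == 0xFF) return "utf-16be";

        // 2. No BOM: scan the first 1024 bytes as ASCII for charset=... in a META tag.
        string head = Encoding.ASCII.GetString(body, 0, Math.Min(body.Length, 1024));
        Match m = Regex.Match(head,
            @"<meta[^>]*charset\s*=\s*[""']?([A-Za-z0-9_\-]+)",
            RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value.ToLowerInvariant() : null;
    }
}
```

The returned name could then replace the hardcoded "utf-16" in the Content-Type header above, keeping the declared charset whenever Sniff returns null.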