[Insert really lame Mime joke here] MimeKit v0.5 (0.6, 0.7, 0.7.1...) released & "Optimization Tips & Tricks used by MimeKit" series kick-off
MimeKit v0.5 has been released on the NuGet Gallery with support for .NET Framework 4.0 in addition to Xamarin.Android and Xamarin.iOS. It is licensed under the MIT/X11 license, so you can use it with very few restrictions. The MIME parser uses a real tokenizer instead of regular expressions and string.Split() to parse and decode headers.
MimeKit can handle rfc2047 encoded-word tokens whose quoted-printable or base64 payloads have been improperly broken apart. It can also handle cases where a multibyte character sequence is split between two encoded-words. ...
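To make the split-multibyte case concrete, here is a minimal sketch (not MimeKit's actual code). It assumes the encoded-words have already been tokenized, share one charset, and all use base64; the `DecodeRun` helper is a hypothetical name made up for illustration. The idea is to join the raw bytes of adjacent encoded-words before converting to a string, so a character whose bytes straddle two words still decodes correctly.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class EncodedWordSketch
{
    // Hypothetical helper: decodes a run of adjacent base64 encoded-word
    // payloads that all declare the same charset. The bytes are concatenated
    // first and charset-decoded once, instead of decoding each word alone.
    public static string DecodeRun(string charset, IEnumerable<string> base64Payloads)
    {
        var bytes = new List<byte>();

        foreach (var payload in base64Payloads)
            bytes.AddRange(Convert.FromBase64String(payload));

        return Encoding.GetEncoding(charset).GetString(bytes.ToArray());
    }
}
```

For example, the UTF-8 bytes of "café" split in the middle of the "é" give the payloads "Y2Fmww==" and "qQ==". Decoding each payload on its own would produce replacement characters, while `EncodedWordSketch.DecodeRun("utf-8", new[] { "Y2Fmww==", "qQ==" })` returns "café".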
One of the goals of MimeKit, other than being the most robust MIME parser, is to be the fastest C# MIME parser this side of the Mississippi. Scratch that: the fastest C# MIME parser in the world.
Seriously, though, I want MimeKit to be as fast and efficient as GMime, my C parser, which is one of the fastest (if not the fastest) MIME parsers out there right now. I don't expect any parser to smoke GMime anytime soon, so using it as a baseline gives me a realistic goal for MimeKit.
Now that you know the why, let's examine the how.
First, I'm using one of those rarely used features of C#: unsafe pointers. While that alone is not all that interesting, it's a cornerstone of one of the main techniques I've used. In C#, the fixed statement (which is how you get a pointer to a managed object) pins the object to a fixed location in memory to prevent the GC from moving that memory around while you operate on that buffer. Keep in mind, though, that telling the GC to pin a block of memory is not free, so you should not use this feature without careful consideration. If you're not careful, using pointers could actually make your code slower. Now that we've got that out of the way...
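As a rough illustration of the fixed statement (a sketch, not code taken from MimeKit), the hypothetical `IndexOfNewLine` helper below pins a byte buffer once and then scans it with raw pointers. It needs to be compiled with unsafe code enabled (the `/unsafe` compiler option).

```csharp
using System;

static class FixedSketch
{
    // Hypothetical helper: returns the index of the first '\n' in
    // buffer[startIndex..endIndex), or -1 if there is none.
    public static unsafe int IndexOfNewLine(byte[] buffer, int startIndex, int endIndex)
    {
        if (buffer == null)
            throw new ArgumentNullException(nameof(buffer));
        if (startIndex < 0 || endIndex > buffer.Length || startIndex > endIndex)
            throw new ArgumentOutOfRangeException();

        // Pin the array for the duration of the scan so the GC cannot move it.
        fixed (byte* start = buffer)
        {
            byte* inptr = start + startIndex;
            byte* inend = start + endIndex;

            while (inptr < inend && *inptr != (byte)'\n')
                inptr++;

            return inptr < inend ? (int)(inptr - start) : -1;
        }
    }
}
```

The buffer is pinned once for the whole scan rather than once per byte or per line, which keeps the pinning cost mentioned above small relative to the work done inside the block.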
MIME is line-based, so a large part of every MIME parser is going to be searching for the next line of input. One of the reasons most MIME parsers (especially C# MIME parsers) are so slow is that they use a ReadLine() approach, and most TextReaders likely use a naive algorithm for finding the end of the current line (on top of all the extra allocating and copying into a string buffer).
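The kind of naive line reader described above might look roughly like the following. This is a sketch for illustration only, not the actual implementation of any particular TextReader; `NaiveReadLineSketch` is a made-up name.

```csharp
using System.IO;
using System.Text;

static class NaiveReadLineSketch
{
    // Reads one character at a time, checks each one against '\r' and '\n',
    // and copies it into a StringBuilder before allocating the final string.
    public static string ReadLine(TextReader reader)
    {
        var line = new StringBuilder();
        int c;

        while ((c = reader.Read()) != -1)
        {
            if (c == '\n')
                return line.ToString();

            if (c == '\r')
            {
                // Treat "\r\n" as a single line terminator.
                if (reader.Peek() == '\n')
                    reader.Read();
                return line.ToString();
            }

            line.Append((char)c);
        }

        // End of input: return the trailing partial line, or null if empty.
        return line.Length > 0 ? line.ToString() : null;
    }
}
```

Every character costs a method call, two comparisons, and a copy, and every line costs at least one string allocation, before the parser has looked at the content at all.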
MimeKit is a C# library which may be used for the creation and parsing of messages using the Multipurpose Internet Mail Extensions (MIME), as defined by the following RFCs:
- 0822: Standard for the Format of ARPA Internet Text Messages
- 1341: MIME (Multipurpose Internet Mail Extensions): Mechanisms for Specifying and Describing the Format of Internet Message Bodies
- 1342: Representation of Non-ASCII Text in Internet Message Headers
- 1521: MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies (Obsoletes rfc1341)
- 1522: MIME (Multipurpose Internet Mail Extensions) Part Two: Message Header Extensions for Non-ASCII Text (Obsoletes rfc1342)
- 1544: The Content-MD5 Header Field
- 1847: Security Multiparts for MIME: Multipart/Signed and Multipart/Encrypted
- 1864: The Content-MD5 Header Field (Obsoletes rfc1544)
- 2015: MIME Security with Pretty Good Privacy (PGP)
- 2045: Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies
- 2046: Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types
- 2047: Multipurpose Internet Mail Extensions (MIME) Part Three: Message Header Extensions for Non-ASCII Text
- 2048: Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures
- 2049: Multipurpose Internet Mail Extensions (MIME) Part Five: Conformance Criteria and Examples
- 2183: Communicating Presentation Information in Internet Messages: The Content-Disposition Header Field
- 2184: MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations
- 2231: MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations (Obsoletes rfc2184)
- 2311: S/MIME Version 2 Message Specification
- 2312: S/MIME Version 2 Certificate Handling
- 2424: Content Duration MIME Header Definition
- 2630: Cryptographic Message Syntax
- 2632: S/MIME Version 3 Certificate Handling
- 2633: S/MIME Version 3 Message Specification
- 2634: Enhanced Security Services for S/MIME
- 2822: Internet Message Format (Obsoletes rfc0822)
- 3156: MIME Security with OpenPGP (Updates rfc2015)
- 3850: S/MIME Version 3.1 Certificate Handling (Obsoletes rfc2632)
- 3851: S/MIME Version 3.1 Message Specification (Obsoletes rfc2633)
- 5322: Internet Message Format (Obsoletes rfc2822)
- 1523: The text/enriched MIME Content-type
- 1872: The MIME Multipart/Related Content-type
- 1927: Suggested Additional MIME Types for Associating Documents
- 2110: MIME E-mail Encapsulation of Aggregate Documents, such as HTML (MHTML)
- 2111: Content-ID and Message-ID Uniform Resource Locators
- 2112: The MIME Multipart/Related Content-type (Obsoletes rfc1872)
- 2387: The MIME Multipart/Related Content-type (Obsoletes rfc2112)
I love this kind of inside-baseball post. Jeffrey talks about his MimeKit project in detail, sharing how he's trying to make it uber-fast, along with his tips and tricks.