
Extract only the text part of an email body using JavaMail, without the HTML content?


In the end I used Jsoup, and it works fine. The trick is that you have to remove the <head> part manually first; Jsoup does the rest.
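A minimal sketch of that approach, assuming jsoup (`org.jsoup:jsoup`) is on the classpath. The class name `HtmlBodyText` and the method `bodyText` are made up for illustration. Instead of cutting `<head>` out of the string by hand, it parses the whole document and reads only the text of `<body>`, which should have the same effect: text inside `<title>` would otherwise show up in `Document.text()`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class HtmlBodyText {

    /**
     * Returns the visible text of an HTML email body (hypothetical helper).
     * Only <body> is read, so <head> content such as <title> never reaches
     * the output, which is what removing <head> by hand achieves.
     */
    public static String bodyText(String html) {
        Document doc = Jsoup.parse(html); // jsoup adds <head>/<body> if they are missing
        return doc.body().text();         // whitespace-normalized text of <body> only
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Newsletter</title></head>"
                + "<body><p>Hello <b>world</b></p></body></html>";
        System.out.println(bodyText(html));
    }
}
```

Keep in mind that `text()` collapses whitespace, so paragraph breaks are lost. That is fine for search or previews, less so if you want to keep the layout of the email.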

However, not all messages contain both HTML and plain-text versions of the message body. If you get only HTML, you'll have to write your own code to process the string and remove the HTML tags, or use another library to parse the HTML and strip the tags.
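If you would rather not add a dependency, one way to write that code yourself is a callback on the JDK's built-in Swing HTML parser (`javax.swing.text.html.parser.ParserDelegator`). This is only a sketch: `HtmlStripper` and its `toText` helper are hypothetical names, the Swing parser only understands HTML 3.2, and the choice to skip `<title>`/`<style>` text and to insert newlines at block boundaries is a rough stand-in for the original layout.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical HTML-to-text converter built on the JDK's Swing parser. */
public class HtmlStripper extends HTMLEditorKit.ParserCallback {
    private final StringBuilder out = new StringBuilder();
    private int skipDepth = 0; // > 0 while inside <title> or <style>

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.TITLE || t == HTML.Tag.STYLE) {
            skipDepth++;
        } else if (t.breaksFlow()) {
            newline(); // block-level start (p, div, ...) starts a new line
        }
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TITLE || t == HTML.Tag.STYLE) {
            skipDepth = Math.max(0, skipDepth - 1);
        } else if (t.breaksFlow()) {
            newline();
        }
    }

    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.BR) {
            out.append('\n'); // empty elements such as <br> arrive here
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (skipDepth == 0) {
            out.append(data); // keep only text outside <title>/<style>
        }
    }

    /** Appends a newline unless the output is empty or already ends with one. */
    private void newline() {
        if (out.length() > 0 && out.charAt(out.length() - 1) != '\n') {
            out.append('\n');
        }
    }

    /** Parses the HTML string and returns its text content. */
    public static String toText(String html) {
        HtmlStripper callback = new HtmlStripper();
        try {
            // true = ignore any charset declared in a <meta> tag
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.out.toString().trim();
    }
}
```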

First of all, you'll want to read the JavaMail FAQ entry that explains how to find the main message body. As written, its code prefers an HTML body over a plain-text body when the message contains both. It should be clear how to reverse that preference.
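A sketch of the reversed preference, modelled loosely on the kind of recursive walk over the MIME tree that the FAQ uses rather than copied from it. It assumes the JavaMail API (`javax.mail`) is on the classpath; `MailText.getText` is a hypothetical name. Inside `multipart/alternative` it returns the first `text/plain` part it finds and only falls back to another alternative (usually HTML) when there is none.

```java
import java.io.IOException;
import javax.mail.BodyPart;
import javax.mail.MessagingException;
import javax.mail.Multipart;
import javax.mail.Part;

public class MailText {

    /**
     * Returns the main text of a part, preferring text/plain over other
     * alternatives inside multipart/alternative. Returns null if no text is found.
     */
    public static String getText(Part p) throws MessagingException, IOException {
        if (p.isMimeType("text/*")) {
            return (String) p.getContent(); // text parts decode to a String
        }
        if (p.isMimeType("multipart/alternative")) {
            Multipart mp = (Multipart) p.getContent();
            String fallback = null; // first usable non-plain alternative, e.g. HTML
            for (int i = 0; i < mp.getCount(); i++) {
                BodyPart bp = mp.getBodyPart(i);
                if (bp.isMimeType("text/plain")) {
                    String s = getText(bp);
                    if (s != null) {
                        return s; // plain text found: stop looking
                    }
                } else if (fallback == null) {
                    fallback = getText(bp); // may recurse into multipart/related
                }
            }
            return fallback;
        }
        if (p.isMimeType("multipart/*")) {
            // multipart/mixed etc.: return the first part that yields text
            Multipart mp = (Multipart) p.getContent();
            for (int i = 0; i < mp.getCount(); i++) {
                String s = getText(mp.getBodyPart(i));
                if (s != null) {
                    return s;
                }
            }
        }
        return null;
    }
}
```

Since `MimeMessage` implements `Part`, you can pass the message itself. When the fallback is used the returned string is still HTML, so it needs tag stripping afterwards.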

OK, thanks. I decided to retrieve the whole message as HTML because it contains more information. I prefer to keep the structure of the emails rather than mess up all the text.

Per RFC 2046, which defines multipart/alternative, the alternatives appear in order of increasing faithfulness to the original content. That means you'll normally find text/plain before text/html. If you prefer text/plain, you can change that code to return as soon as it finds text/plain content; there's no need to keep looking at the other body parts.
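To make the ordering argument concrete, here is a JDK-only model, not JavaMail code: `AlternativeChooser` and its `Alternative` holder are hypothetical stand-ins for a multipart/alternative and its body parts, listed in RFC 2046 order. Preferring plain text means you can stop at the first text/plain match, while preferring the richest format means scanning to the end and keeping the last alternative you can display.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** JDK-only model of choosing among multipart/alternative parts (not the JavaMail API). */
public class AlternativeChooser {

    /** Hypothetical stand-in for one body part: its MIME type and content. */
    public static final class Alternative {
        private final String type;
        private final String body;

        public Alternative(String type, String body) {
            this.type = type;
            this.body = body;
        }

        public String type() { return type; }
        public String body() { return body; }
    }

    /**
     * Prefer plain text. Parts are in RFC 2046 order (least faithful first),
     * so text/plain normally comes first and we can return on the first match.
     */
    public static Optional<Alternative> preferPlain(List<Alternative> parts) {
        for (Alternative a : parts) {
            if (a.type().equals("text/plain")) {
                return Optional.of(a); // no need to look at later parts
            }
        }
        return Optional.empty();
    }

    /**
     * Prefer the richest format we can display. Later parts are more faithful,
     * so we must scan to the end and keep the last supported one.
     */
    public static Optional<Alternative> preferRichest(List<Alternative> parts, Set<String> supported) {
        Alternative best = null;
        for (Alternative a : parts) {
            if (supported.contains(a.type())) {
                best = a; // a later supported part beats an earlier one
            }
        }
        return Optional.ofNullable(best);
    }
}
```

In this model the preference comes from where the loop returns, not from the order of the if/else branches on their own.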

Thanks for the comment, but I can't see why the order matters in the code at the link you posted. Does changing the order of the if-else branches change the preference and the output? Could you explain a little more?
