If its a docx, you could use docx4j (ASL v2). This uses XSLT to create the HTML.
If you wanted an HTML per page, you could do something with the lastRenderedPageBreak tag that Word puts into the docx (assuming you used Word to create it).
Convert Word doc to HTML programmatically in Java - Stack Overflow
XWPFDocument docx = new XWPFDocument(OPCPackage.openOrCreate(new File("hello.docx"))); XWPFWordExtractor wx = new XWPFWordExtractor(docx); String text = wx.getText(); System.out.println("text = "+text);
if (DOCX != HTML && hasNoOtherAnswer()) doDownvote()
t will provide you functionality to get data from DOC file and you can directly operate on it without converting it to DOCX.
I need to use DOCX files. There is a strong requirement, most likely because the software already can process DOCX but some input is still the old DOC format. So converting to HTML and adding a html parser doesn't look like an option
Indeed... All my parsers are already set up for the DOCX format (with the Open XML behind). I do not have time to write the parsers for HTML or text file, hence my question
ms office - Convert DOC file to DOCX with Java - Stack Overflow
# BEGIN Wordpress Browser Cache <IfModule mod_mime.c> AddType text/css .css AddType application/x-javascript .js AddType text/x-component .htc AddType text/html .html .htm AddType text/richtext .rtf .rtx AddType image/svg+xml .svg .svgz AddType text/plain .txt AddType text/xsd .xsd AddType text/xsl .xsl AddType text/xml .xml AddType video/asf .asf .asx .wax .wmv .wmx AddType video/avi .avi AddType image/bmp .bmp AddType application/java .class AddType video/divx .divx AddType application/msword .doc .docx AddType application/vnd.ms-fontobject .eot AddType application/x-msdownload .exe AddType image/gif .gif AddType application/x-gzip .gz .gzip AddType image/x-icon .ico AddType image/jpeg .jpg .jpeg .jpe AddType application/vnd.ms-access .mdb AddType audio/midi .mid .midi AddType video/quicktime .mov .qt AddType audio/mpeg .mp3 .m4a AddType video/mp4 .mp4 .m4v AddType video/mpeg .mpeg .mpg .mpe AddType application/vnd.ms-project .mpp AddType application/x-font-otf .otf AddType application/vnd.oasis.opendocument.database .odb AddType application/vnd.oasis.opendocument.chart .odc AddType application/vnd.oasis.opendocument.formula .odf AddType application/vnd.oasis.opendocument.graphics .odg AddType application/vnd.oasis.opendocument.presentation .odp AddType application/vnd.oasis.opendocument.spreadsheet .ods AddType application/vnd.oasis.opendocument.text .odt AddType audio/ogg .ogg AddType application/pdf .pdf AddType image/png .png AddType application/vnd.ms-powerpoint .pot .pps .ppt .pptx AddType audio/x-realaudio .ra .ram AddType application/x-shockwave-flash .swf AddType application/x-tar .tar AddType image/tiff .tif .tiff AddType application/x-font-ttf .ttf .ttc AddType audio/wav .wav AddType audio/wma .wma AddType application/vnd.ms-write .wri AddType application/vnd.ms-excel .xla .xls .xlsx .xlt .xlw AddType application/zip .zip </IfModule> <IfModule mod_expires.c> ExpiresActive On ExpiresByType text/css A31536000 ExpiresByType application/x-javascript A31536000 ExpiresByType text/x-component A31536000 ExpiresByType text/html A3600 ExpiresByType text/richtext A3600 ExpiresByType image/svg+xml A3600 ExpiresByType text/plain A3600 ExpiresByType text/xsd A3600 ExpiresByType text/xsl A3600 ExpiresByType text/xml A3600 ExpiresByType video/asf A31536000 ExpiresByType video/avi A31536000 ExpiresByType image/bmp A31536000 ExpiresByType application/java A31536000 ExpiresByType video/divx A31536000 ExpiresByType application/msword A31536000 ExpiresByType application/vnd.ms-fontobject A31536000 ExpiresByType application/x-msdownload A31536000 ExpiresByType image/gif A31536000 ExpiresByType application/x-gzip A31536000 ExpiresByType image/x-icon A31536000 ExpiresByType image/jpeg A31536000 ExpiresByType application/vnd.ms-access A31536000 ExpiresByType audio/midi A31536000 ExpiresByType video/quicktime A31536000 ExpiresByType audio/mpeg A31536000 ExpiresByType video/mp4 A31536000 ExpiresByType video/mpeg A31536000 ExpiresByType application/vnd.ms-project A31536000 ExpiresByType application/x-font-otf A31536000 ExpiresByType application/vnd.oasis.opendocument.database A31536000 ExpiresByType application/vnd.oasis.opendocument.chart A31536000 ExpiresByType application/vnd.oasis.opendocument.formula A31536000 ExpiresByType application/vnd.oasis.opendocument.graphics A31536000 ExpiresByType application/vnd.oasis.opendocument.presentation A31536000 ExpiresByType application/vnd.oasis.opendocument.spreadsheet A31536000 ExpiresByType application/vnd.oasis.opendocument.text A31536000 ExpiresByType audio/ogg A31536000 ExpiresByType application/pdf A31536000 ExpiresByType image/png A31536000 ExpiresByType application/vnd.ms-powerpoint A31536000 ExpiresByType audio/x-realaudio A31536000 ExpiresByType image/svg+xml A31536000 ExpiresByType application/x-shockwave-flash A31536000 ExpiresByType application/x-tar A31536000 ExpiresByType image/tiff A31536000 ExpiresByType application/x-font-ttf A31536000 ExpiresByType audio/wav A31536000 ExpiresByType audio/wma A31536000 ExpiresByType application/vnd.ms-write A31536000 ExpiresByType application/vnd.ms-excel A31536000 ExpiresByType application/zip A31536000 </IfModule> <IfModule mod_deflate.c> <IfModule mod_setenvif.c> BrowserMatch ^Mozilla/4 gzip-only-text/html BrowserMatch ^Mozilla/4\.0[678] no-gzip BrowserMatch \bMSIE !no-gzip !gzip-only-text/html BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html </IfModule> <IfModule mod_headers.c> Header append Vary User-Agent env=!dont-vary </IfModule> <IfModule mod_filter.c> AddOutputFilterByType DEFLATE text/css application/x-javascript text/x-component text/html text/richtext image/svg+xml text/plain text/xsd text/xsl text/xml image/x-icon </IfModule> </IfModule> <FilesMatch "\.(css|js|htc|CSS|JS|HTC)$"> <IfModule mod_headers.c> Header set Pragma "public" Header append Cache-Control "public, must-revalidate, proxy-revalidate" </IfModule> FileETag MTime Size <IfModule mod_headers.c> Header set X-Powered-By "W3 Total Cache/0.9.2.4" </IfModule> </FilesMatch> <FilesMatch "\.(html|htm|rtf|rtx|svg|svgz|txt|xsd|xsl|xml|HTML|HTM|RTF|RTX|SVG|SVGZ|TXT|XSD|XSL|XML)$"> <IfModule mod_headers.c> Header set Pragma "public" Header append Cache-Control "public, must-revalidate, proxy-revalidate" </IfModule> FileETag MTime Size <IfModule mod_headers.c> Header set X-Powered-By "W3 Total Cache/0.9.2.4" </IfModule> </FilesMatch> <FilesMatch "\.(asf|asx|wax|wmv|wmx|avi|bmp|class|divx|doc|docx|eot|exe|gif|gz|gzip|ico|jpg|jpeg|jpe|mdb|mid|midi|mov|qt|mp3|m4a|mp4|m4v|mpeg|mpg|mpe|mpp|otf|odb|odc|odf|odg|odp|ods|odt|ogg|pdf|png|pot|pps|ppt|pptx|ra|ram|svg|svgz|swf|tar|tif|tiff|ttf|ttc|wav|wma|wri|xla|xls|xlsx|xlt|xlw|zip|ASF|ASX|WAX|WMV|WMX|AVI|BMP|CLASS|DIVX|DOC|DOCX|EOT|EXE|GIF|GZ|GZIP|ICO|JPG|JPEG|JPE|MDB|MID|MIDI|MOV|QT|MP3|M4A|MP4|M4V|MPEG|MPG|MPE|MPP|OTF|ODB|ODC|ODF|ODG|ODP|ODS|ODT|OGG|PDF|PNG|POT|PPS|PPT|PPTX|RA|RAM|SVG|SVGZ|SWF|TAR|TIF|TIFF|TTF|TTC|WAV|WMA|WRI|XLA|XLS|XLSX|XLT|XLW|ZIP)$"> <IfModule mod_headers.c> Header set Pragma "public" Header append Cache-Control "public, must-revalidate, proxy-revalidate" </IfModule> FileETag MTime Size <IfModule mod_headers.c> Header set X-Powered-By "W3 Total Cache/0.9.2.4" </IfModule> </FilesMatch> # END Wordpress Browser Cache # BEGIN Wordpress Page Cache core <IfModule mod_rewrite.c> RewriteEngine On RewriteBase / RewriteRule ^(.*\/)?Wordpress_rewrite_test$ $1?Wordpress_rewrite_test=1 [L] RewriteCond %{HTTP:Accept-Encoding} gzip RewriteRule .* - [E=Wordpress_ENC:_gzip] RewriteCond %{REQUEST_METHOD} !=POST RewriteCond %{QUERY_STRING} ="" RewriteCond %{HTTP_HOST} =hubtank.com RewriteCond %{REQUEST_URI} !(\/wp-admin\/|\/xmlrpc.php|\/wp-(app|cron|login|register|mail)\.php|\/feed\/|wp-.*\.php|index\.php) [NC,OR] RewriteCond %{REQUEST_URI} (wp-comments-popup\.php|wp-links-opml\.php|wp-locations\.php) [NC] RewriteCond %{HTTP_COOKIE} !(comment_author|wp-postpass|wordpress_\[a-f0-9\]\+|wordpress_logged_in) [NC] RewriteCond %{HTTP_USER_AGENT} !(W3\ Total\ Cache/0\.9\.2\.4) [NC] RewriteCond "%{DOCUMENT_ROOT}/wp-content/Wordpress/pgcache/%{REQUEST_URI}/_index%{ENV:Wordpress_UA}%{ENV:Wordpress_REF}%{ENV:Wordpress_SSL}.html%{ENV:Wordpress_ENC}" -f RewriteRule .* "/wp-content/Wordpress/pgcache/%{REQUEST_URI}/_index%{ENV:Wordpress_UA}%{ENV:Wordpress_REF}%{ENV:Wordpress_SSL}.html%{ENV:Wordpress_ENC}" [L] </IfModule> # END Wordpress Page Cache core # BEGIN WordPress <IfModule mod_rewrite.c> RewriteEngine On RewriteBase / RewriteRule ^index\.php$ - [L] RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_FILENAME} !-d RewriteRule . /index.php [L] </IfModule>
RewriteEngine On RewriteBase / # if IP is NOT 123.45.67.89 RewriteCond %{REMOTE_ADDR} !=123.45.67.89 RewriteCond %{REQUEST_URI} !^/sitefiles/ [NC] RewriteRule ^(.*)$ sitefiles/$1 [L] # else if IP is 123.45.67.89 # BEGIN W3TC Browser Cache <IfModule mod_mime.c> AddType text/css .css AddType text/x-component .htc AddType application/x-javascript .js AddType application/javascript .js2 AddType text/javascript .js3 AddType text/x-js .js4 AddType text/html .html .htm AddType text/richtext .rtf .rtx AddType image/svg+xml .svg .svgz AddType text/plain .txt AddType text/xsd .xsd AddType text/xsl .xsl AddType text/xml .xml AddType video/asf .asf .asx .wax .wmv .wmx AddType video/avi .avi AddType image/bmp .bmp AddType application/java .class AddType video/divx .divx AddType application/msword .doc .docx AddType application/vnd.ms-fontobject .eot AddType application/x-msdownload .exe AddType image/gif .gif AddType application/x-gzip .gz .gzip AddType image/x-icon .ico AddType image/jpeg .jpg .jpeg .jpe AddType application/json .json AddType application/vnd.ms-access .mdb AddType audio/midi .mid .midi AddType video/quicktime .mov .qt AddType audio/mpeg .mp3 .m4a AddType video/mp4 .mp4 .m4v AddType video/mpeg .mpeg .mpg .mpe AddType application/vnd.ms-project .mpp AddType application/x-font-otf .otf AddType application/vnd.ms-opentype .otf AddType application/vnd.oasis.opendocument.database .odb AddType application/vnd.oasis.opendocument.chart .odc AddType application/vnd.oasis.opendocument.formula .odf AddType application/vnd.oasis.opendocument.graphics .odg AddType application/vnd.oasis.opendocument.presentation .odp AddType application/vnd.oasis.opendocument.spreadsheet .ods AddType application/vnd.oasis.opendocument.text .odt AddType audio/ogg .ogg AddType application/pdf .pdf AddType image/png .png AddType application/vnd.ms-powerpoint .pot .pps .ppt .pptx AddType audio/x-realaudio .ra .ram AddType application/x-shockwave-flash .swf AddType application/x-tar .tar AddType image/tiff .tif .tiff AddType application/x-font-ttf .ttf .ttc AddType application/vnd.ms-opentype .ttf .ttc AddType audio/wav .wav AddType audio/wma .wma AddType application/vnd.ms-write .wri AddType application/font-woff .woff AddType application/vnd.ms-excel .xla .xls .xlsx .xlt .xlw AddType application/zip .zip </IfModule> <IfModule mod_expires.c> ExpiresActive On ExpiresByType text/css A31536000 ExpiresByType text/x-component A31536000 ExpiresByType application/x-javascript A31536000 ExpiresByType application/javascript A31536000 ExpiresByType text/javascript A31536000 ExpiresByType text/x-js A31536000 ExpiresByType text/html A3600 ExpiresByType text/richtext A3600 ExpiresByType image/svg+xml A3600 ExpiresByType text/plain A3600 ExpiresByType text/xsd A3600 ExpiresByType text/xsl A3600 ExpiresByType text/xml A3600 ExpiresByType video/asf A31536000 ExpiresByType video/avi A31536000 ExpiresByType image/bmp A31536000 ExpiresByType application/java A31536000 ExpiresByType video/divx A31536000 ExpiresByType application/msword A31536000 ExpiresByType application/vnd.ms-fontobject A31536000 ExpiresByType application/x-msdownload A31536000 ExpiresByType image/gif A31536000 ExpiresByType application/x-gzip A31536000 ExpiresByType image/x-icon A31536000 ExpiresByType image/jpeg A31536000 ExpiresByType application/json A31536000 ExpiresByType application/vnd.ms-access A31536000 ExpiresByType audio/midi A31536000 ExpiresByType video/quicktime A31536000 ExpiresByType audio/mpeg A31536000 ExpiresByType video/mp4 A31536000 ExpiresByType video/mpeg A31536000 ExpiresByType application/vnd.ms-project A31536000 ExpiresByType application/x-font-otf A31536000 ExpiresByType application/vnd.ms-opentype A31536000 ExpiresByType application/vnd.oasis.opendocument.database A31536000 ExpiresByType application/vnd.oasis.opendocument.chart A31536000 ExpiresByType application/vnd.oasis.opendocument.formula A31536000 ExpiresByType application/vnd.oasis.opendocument.graphics A31536000 ExpiresByType application/vnd.oasis.opendocument.presentation A31536000 ExpiresByType application/vnd.oasis.opendocument.spreadsheet A31536000 ExpiresByType application/vnd.oasis.opendocument.text A31536000 ExpiresByType audio/ogg A31536000 ExpiresByType application/pdf A31536000 ExpiresByType image/png A31536000 ExpiresByType application/vnd.ms-powerpoint A31536000 ExpiresByType audio/x-realaudio A31536000 ExpiresByType image/svg+xml A31536000 ExpiresByType application/x-shockwave-flash A31536000 ExpiresByType application/x-tar A31536000 ExpiresByType image/tiff A31536000 ExpiresByType application/x-font-ttf A31536000 ExpiresByType application/vnd.ms-opentype A31536000 ExpiresByType audio/wav A31536000 ExpiresByType audio/wma A31536000 ExpiresByType application/vnd.ms-write A31536000 ExpiresByType application/font-woff A31536000 ExpiresByType application/vnd.ms-excel A31536000 ExpiresByType application/zip A31536000 </IfModule> <IfModule mod_deflate.c> <IfModule mod_setenvif.c> BrowserMatch ^Mozilla/4 gzip-only-text/html BrowserMatch ^Mozilla/4\.0[678] no-gzip BrowserMatch \bMSIE !no-gzip !gzip-only-text/html BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html </IfModule> <IfModule mod_headers.c> Header append Vary User-Agent env=!dont-vary </IfModule> AddOutputFilterByType DEFLATE text/css text/x-component application/x-javascript application/javascript text/javascript text/x-js text/html text/richtext image/svg+xml text/plain text/xsd text/xsl text/xml image/x-icon application/json <IfModule mod_mime.c> # DEFLATE by extension AddOutputFilter DEFLATE js css htm html xml </IfModule> </IfModule> <FilesMatch "\.(css|htc|less|js|js2|js3|js4|CSS|HTC|LESS|JS|JS2|JS3|JS4)$"> FileETag None <IfModule mod_headers.c> Header set Pragma "public" Header append Cache-Control "public" Header unset ETag </IfModule> </FilesMatch> <FilesMatch "\.(html|htm|rtf|rtx|svg|svgz|txt|xsd|xsl|xml|HTML|HTM|RTF|RTX|SVG|SVGZ|TXT|XSD|XSL|XML)$"> FileETag None <IfModule mod_headers.c> Header set Pragma "public" Header append Cache-Control "public" Header unset ETag </IfModule> </FilesMatch> <FilesMatch "\.(asf|asx|wax|wmv|wmx|avi|bmp|class|divx|doc|docx|eot|exe|gif|gz|gzip|ico|jpg|jpeg|jpe|json|mdb|mid|midi|mov|qt|mp3|m4a|mp4|m4v|mpeg|mpg|mpe|mpp|otf|odb|odc|odf|odg|odp|ods|odt|ogg|pdf|png|pot|pps|ppt|pptx|ra|ram|svg|svgz|swf|tar|tif|tiff|ttf|ttc|wav|wma|wri|woff|xla|xls|xlsx|xlt|xlw|zip|ASF|ASX|WAX|WMV|WMX|AVI|BMP|CLASS|DIVX|DOC|DOCX|EOT|EXE|GIF|GZ|GZIP|ICO|JPG|JPEG|JPE|JSON|MDB|MID|MIDI|MOV|QT|MP3|M4A|MP4|M4V|MPEG|MPG|MPE|MPP|OTF|ODB|ODC|ODF|ODG|ODP|ODS|ODT|OGG|PDF|PNG|POT|PPS|PPT|PPTX|RA|RAM|SVG|SVGZ|SWF|TAR|TIF|TIFF|TTF|TTC|WAV|WMA|WRI|WOFF|XLA|XLS|XLSX|XLT|XLW|ZIP)$"> FileETag None <IfModule mod_headers.c> Header set Pragma "public" Header append Cache-Control "public" Header unset ETag </IfModule> </FilesMatch> # END W3TC Browser Cache # BEGIN W3TC CDN <FilesMatch "\.(ttf|ttc|otf|eot|woff|font.css)$"> <IfModule mod_headers.c> Header set Access-Control-Allow-Origin "*" </IfModule> </FilesMatch> # END W3TC CDN # BEGIN W3TC Page Cache core RewriteCond %{HTTPS} =on RewriteRule .* - [E=W3TC_SSL:_ssl] RewriteCond %{SERVER_PORT} =443 RewriteRule .* - [E=W3TC_SSL:_ssl] RewriteCond %{HTTP:Accept-Encoding} gzip RewriteRule .* - [E=W3TC_ENC:_gzip] RewriteCond %{HTTP_COOKIE} w3tc_preview [NC] RewriteRule .* - [E=W3TC_PREVIEW:_preview] RewriteCond %{REQUEST_METHOD} !=POST RewriteCond %{QUERY_STRING} ="" RewriteCond %{REQUEST_URI} \/$ RewriteCond %{HTTP_COOKIE} !(comment_author|wp\-postpass|w3tc_logged_out|wordpress_logged_in|wptouch_switch_toggle) [NC] RewriteCond %{HTTP_USER_AGENT} !(W3\ Total\ Cache/0\.9\.4\.1) [NC] RewriteCond "%{DOCUMENT_ROOT}/wp-content/cache/page_enhanced/%{HTTP_HOST}/%{REQUEST_URI}/_index%{ENV:W3TC_SSL}%{ENV:W3TC_PREVIEW}.html%{ENV:W3TC_ENC}" -f RewriteRule .* "/wp-content/cache/page_enhanced/%{HTTP_HOST}/%{REQUEST_URI}/_index%{ENV:W3TC_SSL}%{ENV:W3TC_PREVIEW}.html%{ENV:W3TC_ENC}" [L] # END W3TC Page Cache core RewriteRule ^index\.php$ - [L] RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_FILENAME} !-d RewriteRule . /index.php [L] # END WordPress
wordpress - Conditional Redirect on Apache htaccess based on IP - Stac...
public static class MIMEAssistant { private static readonly Dictionary<string, string> MIMETypesDictionary = new Dictionary<string, string> { {"ai", "application/postscript"}, {"aif", "audio/x-aiff"}, {"aifc", "audio/x-aiff"}, {"aiff", "audio/x-aiff"}, {"asc", "text/plain"}, {"atom", "application/atom+xml"}, {"au", "audio/basic"}, {"avi", "video/x-msvideo"}, {"bcpio", "application/x-bcpio"}, {"bin", "application/octet-stream"}, {"bmp", "image/bmp"}, {"cdf", "application/x-netcdf"}, {"cgm", "image/cgm"}, {"class", "application/octet-stream"}, {"cpio", "application/x-cpio"}, {"cpt", "application/mac-compactpro"}, {"csh", "application/x-csh"}, {"css", "text/css"}, {"dcr", "application/x-director"}, {"dif", "video/x-dv"}, {"dir", "application/x-director"}, {"djv", "image/vnd.djvu"}, {"djvu", "image/vnd.djvu"}, {"dll", "application/octet-stream"}, {"dmg", "application/octet-stream"}, {"dms", "application/octet-stream"}, {"doc", "application/msword"}, {"docx","application/vnd.openxmlformats-officedocument.wordprocessingml.document"}, {"dotx", "application/vnd.openxmlformats-officedocument.wordprocessingml.template"}, {"docm","application/vnd.ms-word.document.macroEnabled.12"}, {"dotm","application/vnd.ms-word.template.macroEnabled.12"}, {"dtd", "application/xml-dtd"}, {"dv", "video/x-dv"}, {"dvi", "application/x-dvi"}, {"dxr", "application/x-director"}, {"eps", "application/postscript"}, {"etx", "text/x-setext"}, {"exe", "application/octet-stream"}, {"ez", "application/andrew-inset"}, {"gif", "image/gif"}, {"gram", "application/srgs"}, {"grxml", "application/srgs+xml"}, {"gtar", "application/x-gtar"}, {"hdf", "application/x-hdf"}, {"hqx", "application/mac-binhex40"}, {"htm", "text/html"}, {"html", "text/html"}, {"ice", "x-conference/x-cooltalk"}, {"ico", "image/x-icon"}, {"ics", "text/calendar"}, {"ief", "image/ief"}, {"ifb", "text/calendar"}, {"iges", "model/iges"}, {"igs", "model/iges"}, {"jnlp", "application/x-java-jnlp-file"}, {"jp2", "image/jp2"}, {"jpe", "image/jpeg"}, {"jpeg", "image/jpeg"}, {"jpg", "image/jpeg"}, {"js", "application/x-javascript"}, {"kar", "audio/midi"}, {"latex", "application/x-latex"}, {"lha", "application/octet-stream"}, {"lzh", "application/octet-stream"}, {"m3u", "audio/x-mpegurl"}, {"m4a", "audio/mp4a-latm"}, {"m4b", "audio/mp4a-latm"}, {"m4p", "audio/mp4a-latm"}, {"m4u", "video/vnd.mpegurl"}, {"m4v", "video/x-m4v"}, {"mac", "image/x-macpaint"}, {"man", "application/x-troff-man"}, {"mathml", "application/mathml+xml"}, {"me", "application/x-troff-me"}, {"mesh", "model/mesh"}, {"mid", "audio/midi"}, {"midi", "audio/midi"}, {"mif", "application/vnd.mif"}, {"mov", "video/quicktime"}, {"movie", "video/x-sgi-movie"}, {"mp2", "audio/mpeg"}, {"mp3", "audio/mpeg"}, {"mp4", "video/mp4"}, {"mpe", "video/mpeg"}, {"mpeg", "video/mpeg"}, {"mpg", "video/mpeg"}, {"mpga", "audio/mpeg"}, {"ms", "application/x-troff-ms"}, {"msh", "model/mesh"}, {"mxu", "video/vnd.mpegurl"}, {"nc", "application/x-netcdf"}, {"oda", "application/oda"}, {"ogg", "application/ogg"}, {"pbm", "image/x-portable-bitmap"}, {"pct", "image/pict"}, {"pdb", "chemical/x-pdb"}, {"pdf", "application/pdf"}, {"pgm", "image/x-portable-graymap"}, {"pgn", "application/x-chess-pgn"}, {"pic", "image/pict"}, {"pict", "image/pict"}, {"png", "image/png"}, {"pnm", "image/x-portable-anymap"}, {"pnt", "image/x-macpaint"}, {"pntg", "image/x-macpaint"}, {"ppm", "image/x-portable-pixmap"}, {"ppt", "application/vnd.ms-powerpoint"}, {"pptx","application/vnd.openxmlformats-officedocument.presentationml.presentation"}, {"potx","application/vnd.openxmlformats-officedocument.presentationml.template"}, {"ppsx","application/vnd.openxmlformats-officedocument.presentationml.slideshow"}, {"ppam","application/vnd.ms-powerpoint.addin.macroEnabled.12"}, {"pptm","application/vnd.ms-powerpoint.presentation.macroEnabled.12"}, {"potm","application/vnd.ms-powerpoint.template.macroEnabled.12"}, {"ppsm","application/vnd.ms-powerpoint.slideshow.macroEnabled.12"}, {"ps", "application/postscript"}, {"qt", "video/quicktime"}, {"qti", "image/x-quicktime"}, {"qtif", "image/x-quicktime"}, {"ra", "audio/x-pn-realaudio"}, {"ram", "audio/x-pn-realaudio"}, {"ras", "image/x-cmu-raster"}, {"rdf", "application/rdf+xml"}, {"rgb", "image/x-rgb"}, {"rm", "application/vnd.rn-realmedia"}, {"roff", "application/x-troff"}, {"rtf", "text/rtf"}, {"rtx", "text/richtext"}, {"sgm", "text/sgml"}, {"sgml", "text/sgml"}, {"sh", "application/x-sh"}, {"shar", "application/x-shar"}, {"silo", "model/mesh"}, {"sit", "application/x-stuffit"}, {"skd", "application/x-koan"}, {"skm", "application/x-koan"}, {"skp", "application/x-koan"}, {"skt", "application/x-koan"}, {"smi", "application/smil"}, {"smil", "application/smil"}, {"snd", "audio/basic"}, {"so", "application/octet-stream"}, {"spl", "application/x-futuresplash"}, {"src", "application/x-wais-source"}, {"sv4cpio", "application/x-sv4cpio"}, {"sv4crc", "application/x-sv4crc"}, {"svg", "image/svg+xml"}, {"swf", "application/x-shockwave-flash"}, {"t", "application/x-troff"}, {"tar", "application/x-tar"}, {"tcl", "application/x-tcl"}, {"tex", "application/x-tex"}, {"texi", "application/x-texinfo"}, {"texinfo", "application/x-texinfo"}, {"tif", "image/tiff"}, {"tiff", "image/tiff"}, {"tr", "application/x-troff"}, {"tsv", "text/tab-separated-values"}, {"txt", "text/plain"}, {"ustar", "application/x-ustar"}, {"vcd", "application/x-cdlink"}, {"vrml", "model/vrml"}, {"vxml", "application/voicexml+xml"}, {"wav", "audio/x-wav"}, {"wbmp", "image/vnd.wap.wbmp"}, {"wbmxl", "application/vnd.wap.wbxml"}, {"wml", "text/vnd.wap.wml"}, {"wmlc", "application/vnd.wap.wmlc"}, {"wmls", "text/vnd.wap.wmlscript"}, {"wmlsc", "application/vnd.wap.wmlscriptc"}, {"wrl", "model/vrml"}, {"xbm", "image/x-xbitmap"}, {"xht", "application/xhtml+xml"}, {"xhtml", "application/xhtml+xml"}, {"xls", "application/vnd.ms-excel"}, {"xml", "application/xml"}, {"xpm", "image/x-xpixmap"}, {"xsl", "application/xml"}, {"xlsx","application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"}, {"xltx","application/vnd.openxmlformats-officedocument.spreadsheetml.template"}, {"xlsm","application/vnd.ms-excel.sheet.macroEnabled.12"}, {"xltm","application/vnd.ms-excel.template.macroEnabled.12"}, {"xlam","application/vnd.ms-excel.addin.macroEnabled.12"}, {"xlsb","application/vnd.ms-excel.sheet.binary.macroEnabled.12"}, {"xslt", "application/xslt+xml"}, {"xul", "application/vnd.mozilla.xul+xml"}, {"xwd", "image/x-xwindowdump"}, {"xyz", "chemical/x-xyz"}, {"zip", "application/zip"} }; public static string GetMIMEType(string fileName) { //get file extension string extension = Path.GetExtension(fileName).ToLowerInvariant(); if (extension.Length > 0 && MIMETypesDictionary.ContainsKey(extension.Remove(0, 1))) { return MIMETypesDictionary[extension.Remove(0, 1)]; } return "unknown/unknown"; } }
This is based on the filename however. It could be useful for someone, just not the OP, who wanted it done by file contents.
A subset of this list is also convenient for mapping WebImage.ImageFormat back to a mime type. Thanks!
Depending on your goal, you might want to return "application/octet-stream" instead of "unknown/unknown".
Since my edit was rejected I will post here: the extension must be in all lower case, or it won't be found in the dictionary.
@JalalAldeenSaa'd - an even better fix IMHO would be to use StringComparer.OrdinalIgnoreCase to the dictionary constructor. Ordinal comparison is faster than invariant, and you will get rid of .ToLower() and its variations.
c# - Using .NET, how can you find the mime type of a file based on the...
public static class MIMEAssistant { private static readonly Dictionary<string, string> MIMETypesDictionary = new Dictionary<string, string> { {"ai", "application/postscript"}, {"aif", "audio/x-aiff"}, {"aifc", "audio/x-aiff"}, {"aiff", "audio/x-aiff"}, {"asc", "text/plain"}, {"atom", "application/atom+xml"}, {"au", "audio/basic"}, {"avi", "video/x-msvideo"}, {"bcpio", "application/x-bcpio"}, {"bin", "application/octet-stream"}, {"bmp", "image/bmp"}, {"cdf", "application/x-netcdf"}, {"cgm", "image/cgm"}, {"class", "application/octet-stream"}, {"cpio", "application/x-cpio"}, {"cpt", "application/mac-compactpro"}, {"csh", "application/x-csh"}, {"css", "text/css"}, {"dcr", "application/x-director"}, {"dif", "video/x-dv"}, {"dir", "application/x-director"}, {"djv", "image/vnd.djvu"}, {"djvu", "image/vnd.djvu"}, {"dll", "application/octet-stream"}, {"dmg", "application/octet-stream"}, {"dms", "application/octet-stream"}, {"doc", "application/msword"}, {"docx","application/vnd.openxmlformats-officedocument.wordprocessingml.document"}, {"dotx", "application/vnd.openxmlformats-officedocument.wordprocessingml.template"}, {"docm","application/vnd.ms-word.document.macroEnabled.12"}, {"dotm","application/vnd.ms-word.template.macroEnabled.12"}, {"dtd", "application/xml-dtd"}, {"dv", "video/x-dv"}, {"dvi", "application/x-dvi"}, {"dxr", "application/x-director"}, {"eps", "application/postscript"}, {"etx", "text/x-setext"}, {"exe", "application/octet-stream"}, {"ez", "application/andrew-inset"}, {"gif", "image/gif"}, {"gram", "application/srgs"}, {"grxml", "application/srgs+xml"}, {"gtar", "application/x-gtar"}, {"hdf", "application/x-hdf"}, {"hqx", "application/mac-binhex40"}, {"htm", "text/html"}, {"html", "text/html"}, {"ice", "x-conference/x-cooltalk"}, {"ico", "image/x-icon"}, {"ics", "text/calendar"}, {"ief", "image/ief"}, {"ifb", "text/calendar"}, {"iges", "model/iges"}, {"igs", "model/iges"}, {"jnlp", "application/x-java-jnlp-file"}, {"jp2", "image/jp2"}, {"jpe", "image/jpeg"}, {"jpeg", "image/jpeg"}, {"jpg", "image/jpeg"}, {"js", "application/x-javascript"}, {"kar", "audio/midi"}, {"latex", "application/x-latex"}, {"lha", "application/octet-stream"}, {"lzh", "application/octet-stream"}, {"m3u", "audio/x-mpegurl"}, {"m4a", "audio/mp4a-latm"}, {"m4b", "audio/mp4a-latm"}, {"m4p", "audio/mp4a-latm"}, {"m4u", "video/vnd.mpegurl"}, {"m4v", "video/x-m4v"}, {"mac", "image/x-macpaint"}, {"man", "application/x-troff-man"}, {"mathml", "application/mathml+xml"}, {"me", "application/x-troff-me"}, {"mesh", "model/mesh"}, {"mid", "audio/midi"}, {"midi", "audio/midi"}, {"mif", "application/vnd.mif"}, {"mov", "video/quicktime"}, {"movie", "video/x-sgi-movie"}, {"mp2", "audio/mpeg"}, {"mp3", "audio/mpeg"}, {"mp4", "video/mp4"}, {"mpe", "video/mpeg"}, {"mpeg", "video/mpeg"}, {"mpg", "video/mpeg"}, {"mpga", "audio/mpeg"}, {"ms", "application/x-troff-ms"}, {"msh", "model/mesh"}, {"mxu", "video/vnd.mpegurl"}, {"nc", "application/x-netcdf"}, {"oda", "application/oda"}, {"ogg", "application/ogg"}, {"pbm", "image/x-portable-bitmap"}, {"pct", "image/pict"}, {"pdb", "chemical/x-pdb"}, {"pdf", "application/pdf"}, {"pgm", "image/x-portable-graymap"}, {"pgn", "application/x-chess-pgn"}, {"pic", "image/pict"}, {"pict", "image/pict"}, {"png", "image/png"}, {"pnm", "image/x-portable-anymap"}, {"pnt", "image/x-macpaint"}, {"pntg", "image/x-macpaint"}, {"ppm", "image/x-portable-pixmap"}, {"ppt", "application/vnd.ms-powerpoint"}, {"pptx","application/vnd.openxmlformats-officedocument.presentationml.presentation"}, {"potx","application/vnd.openxmlformats-officedocument.presentationml.template"}, {"ppsx","application/vnd.openxmlformats-officedocument.presentationml.slideshow"}, {"ppam","application/vnd.ms-powerpoint.addin.macroEnabled.12"}, {"pptm","application/vnd.ms-powerpoint.presentation.macroEnabled.12"}, {"potm","application/vnd.ms-powerpoint.template.macroEnabled.12"}, {"ppsm","application/vnd.ms-powerpoint.slideshow.macroEnabled.12"}, {"ps", "application/postscript"}, {"qt", "video/quicktime"}, {"qti", "image/x-quicktime"}, {"qtif", "image/x-quicktime"}, {"ra", "audio/x-pn-realaudio"}, {"ram", "audio/x-pn-realaudio"}, {"ras", "image/x-cmu-raster"}, {"rdf", "application/rdf+xml"}, {"rgb", "image/x-rgb"}, {"rm", "application/vnd.rn-realmedia"}, {"roff", "application/x-troff"}, {"rtf", "text/rtf"}, {"rtx", "text/richtext"}, {"sgm", "text/sgml"}, {"sgml", "text/sgml"}, {"sh", "application/x-sh"}, {"shar", "application/x-shar"}, {"silo", "model/mesh"}, {"sit", "application/x-stuffit"}, {"skd", "application/x-koan"}, {"skm", "application/x-koan"}, {"skp", "application/x-koan"}, {"skt", "application/x-koan"}, {"smi", "application/smil"}, {"smil", "application/smil"}, {"snd", "audio/basic"}, {"so", "application/octet-stream"}, {"spl", "application/x-futuresplash"}, {"src", "application/x-wais-source"}, {"sv4cpio", "application/x-sv4cpio"}, {"sv4crc", "application/x-sv4crc"}, {"svg", "image/svg+xml"}, {"swf", "application/x-shockwave-flash"}, {"t", "application/x-troff"}, {"tar", "application/x-tar"}, {"tcl", "application/x-tcl"}, {"tex", "application/x-tex"}, {"texi", "application/x-texinfo"}, {"texinfo", "application/x-texinfo"}, {"tif", "image/tiff"}, {"tiff", "image/tiff"}, {"tr", "application/x-troff"}, {"tsv", "text/tab-separated-values"}, {"txt", "text/plain"}, {"ustar", "application/x-ustar"}, {"vcd", "application/x-cdlink"}, {"vrml", "model/vrml"}, {"vxml", "application/voicexml+xml"}, {"wav", "audio/x-wav"}, {"wbmp", "image/vnd.wap.wbmp"}, {"wbmxl", "application/vnd.wap.wbxml"}, {"wml", "text/vnd.wap.wml"}, {"wmlc", "application/vnd.wap.wmlc"}, {"wmls", "text/vnd.wap.wmlscript"}, {"wmlsc", "application/vnd.wap.wmlscriptc"}, {"wrl", "model/vrml"}, {"xbm", "image/x-xbitmap"}, {"xht", "application/xhtml+xml"}, {"xhtml", "application/xhtml+xml"}, {"xls", "application/vnd.ms-excel"}, {"xml", "application/xml"}, {"xpm", "image/x-xpixmap"}, {"xsl", "application/xml"}, {"xlsx","application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"}, {"xltx","application/vnd.openxmlformats-officedocument.spreadsheetml.template"}, {"xlsm","application/vnd.ms-excel.sheet.macroEnabled.12"}, {"xltm","application/vnd.ms-excel.template.macroEnabled.12"}, {"xlam","application/vnd.ms-excel.addin.macroEnabled.12"}, {"xlsb","application/vnd.ms-excel.sheet.binary.macroEnabled.12"}, {"xslt", "application/xslt+xml"}, {"xul", "application/vnd.mozilla.xul+xml"}, {"xwd", "image/x-xwindowdump"}, {"xyz", "chemical/x-xyz"}, {"zip", "application/zip"} }; public static string GetMIMEType(string fileName) { //get file extension string extension = Path.GetExtension(fileName).ToLowerInvariant(); if (extension.Length > 0 && MIMETypesDictionary.ContainsKey(extension.Remove(0, 1))) { return MIMETypesDictionary[extension.Remove(0, 1)]; } return "unknown/unknown"; } }
This is based on the filename however. It could be useful for someone, just not the OP, who wanted it done by file contents.
A subset of this list is also convenient for mapping WebImage.ImageFormat back to a mime type. Thanks!
Depending on your goal, you might want to return "application/octet-stream" instead of "unknown/unknown".
Since my edit was rejected I will post here: the extension must be in all lower case, or it won't be found in the dictionary.
@JalalAldeenSaa'd - an even better fix IMHO would be to use StringComparer.OrdinalIgnoreCase to the dictionary constructor. Ordinal comparison is faster than invariant, and you will get rid of .ToLower() and its variations.
c# - Using .NET, how can you find the mime type of a file based on the...
public static class MIMEAssistant { private static readonly Dictionary<string, string> MIMETypesDictionary = new Dictionary<string, string> { {"ai", "application/postscript"}, {"aif", "audio/x-aiff"}, {"aifc", "audio/x-aiff"}, {"aiff", "audio/x-aiff"}, {"asc", "text/plain"}, {"atom", "application/atom+xml"}, {"au", "audio/basic"}, {"avi", "video/x-msvideo"}, {"bcpio", "application/x-bcpio"}, {"bin", "application/octet-stream"}, {"bmp", "image/bmp"}, {"cdf", "application/x-netcdf"}, {"cgm", "image/cgm"}, {"class", "application/octet-stream"}, {"cpio", "application/x-cpio"}, {"cpt", "application/mac-compactpro"}, {"csh", "application/x-csh"}, {"css", "text/css"}, {"dcr", "application/x-director"}, {"dif", "video/x-dv"}, {"dir", "application/x-director"}, {"djv", "image/vnd.djvu"}, {"djvu", "image/vnd.djvu"}, {"dll", "application/octet-stream"}, {"dmg", "application/octet-stream"}, {"dms", "application/octet-stream"}, {"doc", "application/msword"}, {"docx","application/vnd.openxmlformats-officedocument.wordprocessingml.document"}, {"dotx", "application/vnd.openxmlformats-officedocument.wordprocessingml.template"}, {"docm","application/vnd.ms-word.document.macroEnabled.12"}, {"dotm","application/vnd.ms-word.template.macroEnabled.12"}, {"dtd", "application/xml-dtd"}, {"dv", "video/x-dv"}, {"dvi", "application/x-dvi"}, {"dxr", "application/x-director"}, {"eps", "application/postscript"}, {"etx", "text/x-setext"}, {"exe", "application/octet-stream"}, {"ez", "application/andrew-inset"}, {"gif", "image/gif"}, {"gram", "application/srgs"}, {"grxml", "application/srgs+xml"}, {"gtar", "application/x-gtar"}, {"hdf", "application/x-hdf"}, {"hqx", "application/mac-binhex40"}, {"htm", "text/html"}, {"html", "text/html"}, {"ice", "x-conference/x-cooltalk"}, {"ico", "image/x-icon"}, {"ics", "text/calendar"}, {"ief", "image/ief"}, {"ifb", "text/calendar"}, {"iges", "model/iges"}, {"igs", "model/iges"}, {"jnlp", "application/x-java-jnlp-file"}, {"jp2", "image/jp2"}, {"jpe", "image/jpeg"}, {"jpeg", "image/jpeg"}, {"jpg", "image/jpeg"}, {"js", "application/x-javascript"}, {"kar", "audio/midi"}, {"latex", "application/x-latex"}, {"lha", "application/octet-stream"}, {"lzh", "application/octet-stream"}, {"m3u", "audio/x-mpegurl"}, {"m4a", "audio/mp4a-latm"}, {"m4b", "audio/mp4a-latm"}, {"m4p", "audio/mp4a-latm"}, {"m4u", "video/vnd.mpegurl"}, {"m4v", "video/x-m4v"}, {"mac", "image/x-macpaint"}, {"man", "application/x-troff-man"}, {"mathml", "application/mathml+xml"}, {"me", "application/x-troff-me"}, {"mesh", "model/mesh"}, {"mid", "audio/midi"}, {"midi", "audio/midi"}, {"mif", "application/vnd.mif"}, {"mov", "video/quicktime"}, {"movie", "video/x-sgi-movie"}, {"mp2", "audio/mpeg"}, {"mp3", "audio/mpeg"}, {"mp4", "video/mp4"}, {"mpe", "video/mpeg"}, {"mpeg", "video/mpeg"}, {"mpg", "video/mpeg"}, {"mpga", "audio/mpeg"}, {"ms", "application/x-troff-ms"}, {"msh", "model/mesh"}, {"mxu", "video/vnd.mpegurl"}, {"nc", "application/x-netcdf"}, {"oda", "application/oda"}, {"ogg", "application/ogg"}, {"pbm", "image/x-portable-bitmap"}, {"pct", "image/pict"}, {"pdb", "chemical/x-pdb"}, {"pdf", "application/pdf"}, {"pgm", "image/x-portable-graymap"}, {"pgn", "application/x-chess-pgn"}, {"pic", "image/pict"}, {"pict", "image/pict"}, {"png", "image/png"}, {"pnm", "image/x-portable-anymap"}, {"pnt", "image/x-macpaint"}, {"pntg", "image/x-macpaint"}, {"ppm", "image/x-portable-pixmap"}, {"ppt", "application/vnd.ms-powerpoint"}, {"pptx","application/vnd.openxmlformats-officedocument.presentationml.presentation"}, {"potx","application/vnd.openxmlformats-officedocument.presentationml.template"}, {"ppsx","application/vnd.openxmlformats-officedocument.presentationml.slideshow"}, {"ppam","application/vnd.ms-powerpoint.addin.macroEnabled.12"}, {"pptm","application/vnd.ms-powerpoint.presentation.macroEnabled.12"}, {"potm","application/vnd.ms-powerpoint.template.macroEnabled.12"}, {"ppsm","application/vnd.ms-powerpoint.slideshow.macroEnabled.12"}, {"ps", "application/postscript"}, {"qt", "video/quicktime"}, {"qti", "image/x-quicktime"}, {"qtif", "image/x-quicktime"}, {"ra", "audio/x-pn-realaudio"}, {"ram", "audio/x-pn-realaudio"}, {"ras", "image/x-cmu-raster"}, {"rdf", "application/rdf+xml"}, {"rgb", "image/x-rgb"}, {"rm", "application/vnd.rn-realmedia"}, {"roff", "application/x-troff"}, {"rtf", "text/rtf"}, {"rtx", "text/richtext"}, {"sgm", "text/sgml"}, {"sgml", "text/sgml"}, {"sh", "application/x-sh"}, {"shar", "application/x-shar"}, {"silo", "model/mesh"}, {"sit", "application/x-stuffit"}, {"skd", "application/x-koan"}, {"skm", "application/x-koan"}, {"skp", "application/x-koan"}, {"skt", "application/x-koan"}, {"smi", "application/smil"}, {"smil", "application/smil"}, {"snd", "audio/basic"}, {"so", "application/octet-stream"}, {"spl", "application/x-futuresplash"}, {"src", "application/x-wais-source"}, {"sv4cpio", "application/x-sv4cpio"}, {"sv4crc", "application/x-sv4crc"}, {"svg", "image/svg+xml"}, {"swf", "application/x-shockwave-flash"}, {"t", "application/x-troff"}, {"tar", "application/x-tar"}, {"tcl", "application/x-tcl"}, {"tex", "application/x-tex"}, {"texi", "application/x-texinfo"}, {"texinfo", "application/x-texinfo"}, {"tif", "image/tiff"}, {"tiff", "image/tiff"}, {"tr", "application/x-troff"}, {"tsv", "text/tab-separated-values"}, {"txt", "text/plain"}, {"ustar", "application/x-ustar"}, {"vcd", "application/x-cdlink"}, {"vrml", "model/vrml"}, {"vxml", "application/voicexml+xml"}, {"wav", "audio/x-wav"}, {"wbmp", "image/vnd.wap.wbmp"}, {"wbmxl", "application/vnd.wap.wbxml"}, {"wml", "text/vnd.wap.wml"}, {"wmlc", "application/vnd.wap.wmlc"}, {"wmls", "text/vnd.wap.wmlscript"}, {"wmlsc", "application/vnd.wap.wmlscriptc"}, {"wrl", "model/vrml"}, {"xbm", "image/x-xbitmap"}, {"xht", "application/xhtml+xml"}, {"xhtml", "application/xhtml+xml"}, {"xls", "application/vnd.ms-excel"}, {"xml", "application/xml"}, {"xpm", "image/x-xpixmap"}, {"xsl", "application/xml"}, {"xlsx","application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"}, {"xltx","application/vnd.openxmlformats-officedocument.spreadsheetml.template"}, {"xlsm","application/vnd.ms-excel.sheet.macroEnabled.12"}, {"xltm","application/vnd.ms-excel.template.macroEnabled.12"}, {"xlam","application/vnd.ms-excel.addin.macroEnabled.12"}, {"xlsb","application/vnd.ms-excel.sheet.binary.macroEnabled.12"}, {"xslt", "application/xslt+xml"}, {"xul", "application/vnd.mozilla.xul+xml"}, {"xwd", "image/x-xwindowdump"}, {"xyz", "chemical/x-xyz"}, {"zip", "application/zip"} }; public static string GetMIMEType(string fileName) { //get file extension string extension = Path.GetExtension(fileName).ToLowerInvariant(); if (extension.Length > 0 && MIMETypesDictionary.ContainsKey(extension.Remove(0, 1))) { return MIMETypesDictionary[extension.Remove(0, 1)]; } return "unknown/unknown"; } }
This is based on the filename however. It could be useful for someone, just not the OP, who wanted it done by file contents.
A subset of this list is also convenient for mapping WebImage.ImageFormat back to a mime type. Thanks!
Depending on your goal, you might want to return "application/octet-stream" instead of "unknown/unknown".
Since my edit was rejected I will post here: the extension must be in all lower case, or it won't be found in the dictionary.
@JalalAldeenSaa'd - an even better fix IMHO would be to use StringComparer.OrdinalIgnoreCase to the dictionary constructor. Ordinal comparison is faster than invariant, and you will get rid of .ToLower() and its variations.
c# - Using .NET, how can you find the mime type of a file based on the...
public static class MIMEAssistant { private static readonly Dictionary<string, string> MIMETypesDictionary = new Dictionary<string, string> { {"ai", "application/postscript"}, {"aif", "audio/x-aiff"}, {"aifc", "audio/x-aiff"}, {"aiff", "audio/x-aiff"}, {"asc", "text/plain"}, {"atom", "application/atom+xml"}, {"au", "audio/basic"}, {"avi", "video/x-msvideo"}, {"bcpio", "application/x-bcpio"}, {"bin", "application/octet-stream"}, {"bmp", "image/bmp"}, {"cdf", "application/x-netcdf"}, {"cgm", "image/cgm"}, {"class", "application/octet-stream"}, {"cpio", "application/x-cpio"}, {"cpt", "application/mac-compactpro"}, {"csh", "application/x-csh"}, {"css", "text/css"}, {"dcr", "application/x-director"}, {"dif", "video/x-dv"}, {"dir", "application/x-director"}, {"djv", "image/vnd.djvu"}, {"djvu", "image/vnd.djvu"}, {"dll", "application/octet-stream"}, {"dmg", "application/octet-stream"}, {"dms", "application/octet-stream"}, {"doc", "application/msword"}, {"docx","application/vnd.openxmlformats-officedocument.wordprocessingml.document"}, {"dotx", "application/vnd.openxmlformats-officedocument.wordprocessingml.template"}, {"docm","application/vnd.ms-word.document.macroEnabled.12"}, {"dotm","application/vnd.ms-word.template.macroEnabled.12"}, {"dtd", "application/xml-dtd"}, {"dv", "video/x-dv"}, {"dvi", "application/x-dvi"}, {"dxr", "application/x-director"}, {"eps", "application/postscript"}, {"etx", "text/x-setext"}, {"exe", "application/octet-stream"}, {"ez", "application/andrew-inset"}, {"gif", "image/gif"}, {"gram", "application/srgs"}, {"grxml", "application/srgs+xml"}, {"gtar", "application/x-gtar"}, {"hdf", "application/x-hdf"}, {"hqx", "application/mac-binhex40"}, {"htm", "text/html"}, {"html", "text/html"}, {"ice", "x-conference/x-cooltalk"}, {"ico", "image/x-icon"}, {"ics", "text/calendar"}, {"ief", "image/ief"}, {"ifb", "text/calendar"}, {"iges", "model/iges"}, {"igs", "model/iges"}, {"jnlp", "application/x-java-jnlp-file"}, {"jp2", "image/jp2"}, {"jpe", "image/jpeg"}, {"jpeg", "image/jpeg"}, {"jpg", "image/jpeg"}, {"js", "application/x-javascript"}, {"kar", "audio/midi"}, {"latex", "application/x-latex"}, {"lha", "application/octet-stream"}, {"lzh", "application/octet-stream"}, {"m3u", "audio/x-mpegurl"}, {"m4a", "audio/mp4a-latm"}, {"m4b", "audio/mp4a-latm"}, {"m4p", "audio/mp4a-latm"}, {"m4u", "video/vnd.mpegurl"}, {"m4v", "video/x-m4v"}, {"mac", "image/x-macpaint"}, {"man", "application/x-troff-man"}, {"mathml", "application/mathml+xml"}, {"me", "application/x-troff-me"}, {"mesh", "model/mesh"}, {"mid", "audio/midi"}, {"midi", "audio/midi"}, {"mif", "application/vnd.mif"}, {"mov", "video/quicktime"}, {"movie", "video/x-sgi-movie"}, {"mp2", "audio/mpeg"}, {"mp3", "audio/mpeg"}, {"mp4", "video/mp4"}, {"mpe", "video/mpeg"}, {"mpeg", "video/mpeg"}, {"mpg", "video/mpeg"}, {"mpga", "audio/mpeg"}, {"ms", "application/x-troff-ms"}, {"msh", "model/mesh"}, {"mxu", "video/vnd.mpegurl"}, {"nc", "application/x-netcdf"}, {"oda", "application/oda"}, {"ogg", "application/ogg"}, {"pbm", "image/x-portable-bitmap"}, {"pct", "image/pict"}, {"pdb", "chemical/x-pdb"}, {"pdf", "application/pdf"}, {"pgm", "image/x-portable-graymap"}, {"pgn", "application/x-chess-pgn"}, {"pic", "image/pict"}, {"pict", "image/pict"}, {"png", "image/png"}, {"pnm", "image/x-portable-anymap"}, {"pnt", "image/x-macpaint"}, {"pntg", "image/x-macpaint"}, {"ppm", "image/x-portable-pixmap"}, {"ppt", "application/vnd.ms-powerpoint"}, {"pptx","application/vnd.openxmlformats-officedocument.presentationml.presentation"}, {"potx","application/vnd.openxmlformats-officedocument.presentationml.template"}, {"ppsx","application/vnd.openxmlformats-officedocument.presentationml.slideshow"}, {"ppam","application/vnd.ms-powerpoint.addin.macroEnabled.12"}, {"pptm","application/vnd.ms-powerpoint.presentation.macroEnabled.12"}, {"potm","application/vnd.ms-powerpoint.template.macroEnabled.12"}, {"ppsm","application/vnd.ms-powerpoint.slideshow.macroEnabled.12"}, {"ps", "application/postscript"}, {"qt", "video/quicktime"}, {"qti", "image/x-quicktime"}, {"qtif", "image/x-quicktime"}, {"ra", "audio/x-pn-realaudio"}, {"ram", "audio/x-pn-realaudio"}, {"ras", "image/x-cmu-raster"}, {"rdf", "application/rdf+xml"}, {"rgb", "image/x-rgb"}, {"rm", "application/vnd.rn-realmedia"}, {"roff", "application/x-troff"}, {"rtf", "text/rtf"}, {"rtx", "text/richtext"}, {"sgm", "text/sgml"}, {"sgml", "text/sgml"}, {"sh", "application/x-sh"}, {"shar", "application/x-shar"}, {"silo", "model/mesh"}, {"sit", "application/x-stuffit"}, {"skd", "application/x-koan"}, {"skm", "application/x-koan"}, {"skp", "application/x-koan"}, {"skt", "application/x-koan"}, {"smi", "application/smil"}, {"smil", "application/smil"}, {"snd", "audio/basic"}, {"so", "application/octet-stream"}, {"spl", "application/x-futuresplash"}, {"src", "application/x-wais-source"}, {"sv4cpio", "application/x-sv4cpio"}, {"sv4crc", "application/x-sv4crc"}, {"svg", "image/svg+xml"}, {"swf", "application/x-shockwave-flash"}, {"t", "application/x-troff"}, {"tar", "application/x-tar"}, {"tcl", "application/x-tcl"}, {"tex", "application/x-tex"}, {"texi", "application/x-texinfo"}, {"texinfo", "application/x-texinfo"}, {"tif", "image/tiff"}, {"tiff", "image/tiff"}, {"tr", "application/x-troff"}, {"tsv", "text/tab-separated-values"}, {"txt", "text/plain"}, {"ustar", "application/x-ustar"}, {"vcd", "application/x-cdlink"}, {"vrml", "model/vrml"}, {"vxml", "application/voicexml+xml"}, {"wav", "audio/x-wav"}, {"wbmp", "image/vnd.wap.wbmp"}, {"wbmxl", "application/vnd.wap.wbxml"}, {"wml", "text/vnd.wap.wml"}, {"wmlc", "application/vnd.wap.wmlc"}, {"wmls", "text/vnd.wap.wmlscript"}, {"wmlsc", "application/vnd.wap.wmlscriptc"}, {"wrl", "model/vrml"}, {"xbm", "image/x-xbitmap"}, {"xht", "application/xhtml+xml"}, {"xhtml", "application/xhtml+xml"}, {"xls", "application/vnd.ms-excel"}, {"xml", "application/xml"}, {"xpm", "image/x-xpixmap"}, {"xsl", "application/xml"}, {"xlsx","application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"}, {"xltx","application/vnd.openxmlformats-officedocument.spreadsheetml.template"}, {"xlsm","application/vnd.ms-excel.sheet.macroEnabled.12"}, {"xltm","application/vnd.ms-excel.template.macroEnabled.12"}, {"xlam","application/vnd.ms-excel.addin.macroEnabled.12"}, {"xlsb","application/vnd.ms-excel.sheet.binary.macroEnabled.12"}, {"xslt", "application/xslt+xml"}, {"xul", "application/vnd.mozilla.xul+xml"}, {"xwd", "image/x-xwindowdump"}, {"xyz", "chemical/x-xyz"}, {"zip", "application/zip"} }; public static string GetMIMEType(string fileName) { //get file extension string extension = Path.GetExtension(fileName).ToLowerInvariant(); if (extension.Length > 0 && MIMETypesDictionary.ContainsKey(extension.Remove(0, 1))) { return MIMETypesDictionary[extension.Remove(0, 1)]; } return "unknown/unknown"; } }
This is based on the filename however. It could be useful for someone, just not the OP, who wanted it done by file contents.
A subset of this list is also convenient for mapping WebImage.ImageFormat back to a mime type. Thanks!
Depending on your goal, you might want to return "application/octet-stream" instead of "unknown/unknown".
Since my edit was rejected I will post here: the extension must be in all lower case, or it won't be found in the dictionary.
@JalalAldeenSaa'd - an even better fix IMHO would be to use StringComparer.OrdinalIgnoreCase to the dictionary constructor. Ordinal comparison is faster than invariant, and you will get rid of .ToLower() and its variations.
c# - Using .NET, how can you find the mime type of a file based on the...
public static class MIMEAssistant { private static readonly Dictionary<string, string> MIMETypesDictionary = new Dictionary<string, string> { {"ai", "application/postscript"}, {"aif", "audio/x-aiff"}, {"aifc", "audio/x-aiff"}, {"aiff", "audio/x-aiff"}, {"asc", "text/plain"}, {"atom", "application/atom+xml"}, {"au", "audio/basic"}, {"avi", "video/x-msvideo"}, {"bcpio", "application/x-bcpio"}, {"bin", "application/octet-stream"}, {"bmp", "image/bmp"}, {"cdf", "application/x-netcdf"}, {"cgm", "image/cgm"}, {"class", "application/octet-stream"}, {"cpio", "application/x-cpio"}, {"cpt", "application/mac-compactpro"}, {"csh", "application/x-csh"}, {"css", "text/css"}, {"dcr", "application/x-director"}, {"dif", "video/x-dv"}, {"dir", "application/x-director"}, {"djv", "image/vnd.djvu"}, {"djvu", "image/vnd.djvu"}, {"dll", "application/octet-stream"}, {"dmg", "application/octet-stream"}, {"dms", "application/octet-stream"}, {"doc", "application/msword"}, {"docx","application/vnd.openxmlformats-officedocument.wordprocessingml.document"}, {"dotx", "application/vnd.openxmlformats-officedocument.wordprocessingml.template"}, {"docm","application/vnd.ms-word.document.macroEnabled.12"}, {"dotm","application/vnd.ms-word.template.macroEnabled.12"}, {"dtd", "application/xml-dtd"}, {"dv", "video/x-dv"}, {"dvi", "application/x-dvi"}, {"dxr", "application/x-director"}, {"eps", "application/postscript"}, {"etx", "text/x-setext"}, {"exe", "application/octet-stream"}, {"ez", "application/andrew-inset"}, {"gif", "image/gif"}, {"gram", "application/srgs"}, {"grxml", "application/srgs+xml"}, {"gtar", "application/x-gtar"}, {"hdf", "application/x-hdf"}, {"hqx", "application/mac-binhex40"}, {"htm", "text/html"}, {"html", "text/html"}, {"ice", "x-conference/x-cooltalk"}, {"ico", "image/x-icon"}, {"ics", "text/calendar"}, {"ief", "image/ief"}, {"ifb", "text/calendar"}, {"iges", "model/iges"}, {"igs", "model/iges"}, {"jnlp", "application/x-java-jnlp-file"}, {"jp2", "image/jp2"}, {"jpe", "image/jpeg"}, {"jpeg", "image/jpeg"}, {"jpg", "image/jpeg"}, {"js", "application/x-javascript"}, {"kar", "audio/midi"}, {"latex", "application/x-latex"}, {"lha", "application/octet-stream"}, {"lzh", "application/octet-stream"}, {"m3u", "audio/x-mpegurl"}, {"m4a", "audio/mp4a-latm"}, {"m4b", "audio/mp4a-latm"}, {"m4p", "audio/mp4a-latm"}, {"m4u", "video/vnd.mpegurl"}, {"m4v", "video/x-m4v"}, {"mac", "image/x-macpaint"}, {"man", "application/x-troff-man"}, {"mathml", "application/mathml+xml"}, {"me", "application/x-troff-me"}, {"mesh", "model/mesh"}, {"mid", "audio/midi"}, {"midi", "audio/midi"}, {"mif", "application/vnd.mif"}, {"mov", "video/quicktime"}, {"movie", "video/x-sgi-movie"}, {"mp2", "audio/mpeg"}, {"mp3", "audio/mpeg"}, {"mp4", "video/mp4"}, {"mpe", "video/mpeg"}, {"mpeg", "video/mpeg"}, {"mpg", "video/mpeg"}, {"mpga", "audio/mpeg"}, {"ms", "application/x-troff-ms"}, {"msh", "model/mesh"}, {"mxu", "video/vnd.mpegurl"}, {"nc", "application/x-netcdf"}, {"oda", "application/oda"}, {"ogg", "application/ogg"}, {"pbm", "image/x-portable-bitmap"}, {"pct", "image/pict"}, {"pdb", "chemical/x-pdb"}, {"pdf", "application/pdf"}, {"pgm", "image/x-portable-graymap"}, {"pgn", "application/x-chess-pgn"}, {"pic", "image/pict"}, {"pict", "image/pict"}, {"png", "image/png"}, {"pnm", "image/x-portable-anymap"}, {"pnt", "image/x-macpaint"}, {"pntg", "image/x-macpaint"}, {"ppm", "image/x-portable-pixmap"}, {"ppt", "application/vnd.ms-powerpoint"}, {"pptx","application/vnd.openxmlformats-officedocument.presentationml.presentation"}, {"potx","application/vnd.openxmlformats-officedocument.presentationml.template"}, {"ppsx","application/vnd.openxmlformats-officedocument.presentationml.slideshow"}, {"ppam","application/vnd.ms-powerpoint.addin.macroEnabled.12"}, {"pptm","application/vnd.ms-powerpoint.presentation.macroEnabled.12"}, {"potm","application/vnd.ms-powerpoint.template.macroEnabled.12"}, {"ppsm","application/vnd.ms-powerpoint.slideshow.macroEnabled.12"}, {"ps", "application/postscript"}, {"qt", "video/quicktime"}, {"qti", "image/x-quicktime"}, {"qtif", "image/x-quicktime"}, {"ra", "audio/x-pn-realaudio"}, {"ram", "audio/x-pn-realaudio"}, {"ras", "image/x-cmu-raster"}, {"rdf", "application/rdf+xml"}, {"rgb", "image/x-rgb"}, {"rm", "application/vnd.rn-realmedia"}, {"roff", "application/x-troff"}, {"rtf", "text/rtf"}, {"rtx", "text/richtext"}, {"sgm", "text/sgml"}, {"sgml", "text/sgml"}, {"sh", "application/x-sh"}, {"shar", "application/x-shar"}, {"silo", "model/mesh"}, {"sit", "application/x-stuffit"}, {"skd", "application/x-koan"}, {"skm", "application/x-koan"}, {"skp", "application/x-koan"}, {"skt", "application/x-koan"}, {"smi", "application/smil"}, {"smil", "application/smil"}, {"snd", "audio/basic"}, {"so", "application/octet-stream"}, {"spl", "application/x-futuresplash"}, {"src", "application/x-wais-source"}, {"sv4cpio", "application/x-sv4cpio"}, {"sv4crc", "application/x-sv4crc"}, {"svg", "image/svg+xml"}, {"swf", "application/x-shockwave-flash"}, {"t", "application/x-troff"}, {"tar", "application/x-tar"}, {"tcl", "application/x-tcl"}, {"tex", "application/x-tex"}, {"texi", "application/x-texinfo"}, {"texinfo", "application/x-texinfo"}, {"tif", "image/tiff"}, {"tiff", "image/tiff"}, {"tr", "application/x-troff"}, {"tsv", "text/tab-separated-values"}, {"txt", "text/plain"}, {"ustar", "application/x-ustar"}, {"vcd", "application/x-cdlink"}, {"vrml", "model/vrml"}, {"vxml", "application/voicexml+xml"}, {"wav", "audio/x-wav"}, {"wbmp", "image/vnd.wap.wbmp"}, {"wbmxl", "application/vnd.wap.wbxml"}, {"wml", "text/vnd.wap.wml"}, {"wmlc", "application/vnd.wap.wmlc"}, {"wmls", "text/vnd.wap.wmlscript"}, {"wmlsc", "application/vnd.wap.wmlscriptc"}, {"wrl", "model/vrml"}, {"xbm", "image/x-xbitmap"}, {"xht", "application/xhtml+xml"}, {"xhtml", "application/xhtml+xml"}, {"xls", "application/vnd.ms-excel"}, {"xml", "application/xml"}, {"xpm", "image/x-xpixmap"}, {"xsl", "application/xml"}, {"xlsx","application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"}, {"xltx","application/vnd.openxmlformats-officedocument.spreadsheetml.template"}, {"xlsm","application/vnd.ms-excel.sheet.macroEnabled.12"}, {"xltm","application/vnd.ms-excel.template.macroEnabled.12"}, {"xlam","application/vnd.ms-excel.addin.macroEnabled.12"}, {"xlsb","application/vnd.ms-excel.sheet.binary.macroEnabled.12"}, {"xslt", "application/xslt+xml"}, {"xul", "application/vnd.mozilla.xul+xml"}, {"xwd", "image/x-xwindowdump"}, {"xyz", "chemical/x-xyz"}, {"zip", "application/zip"} }; public static string GetMIMEType(string fileName) { //get file extension string extension = Path.GetExtension(fileName).ToLowerInvariant(); if (extension.Length > 0 && MIMETypesDictionary.ContainsKey(extension.Remove(0, 1))) { return MIMETypesDictionary[extension.Remove(0, 1)]; } return "unknown/unknown"; } }
This is based on the filename however. It could be useful for someone, just not the OP, who wanted it done by file contents.
A subset of this list is also convenient for mapping WebImage.ImageFormat back to a mime type. Thanks!
Depending on your goal, you might want to return "application/octet-stream" instead of "unknown/unknown".
Since my edit was rejected I will post here: the extension must be in all lower case, or it won't be found in the dictionary.
@JalalAldeenSaa'd - an even better fix IMHO would be to use StringComparer.OrdinalIgnoreCase to the dictionary constructor. Ordinal comparison is faster than invariant, and you will get rid of .ToLower() and its variations.
c# - Using .NET, how can you find the mime type of a file based on the...
<dependency> <groupId>org.docx4j</groupId> <artifactId>docx4j-ImportXHTML</artifactId> <version>3.0.0</version> </dependency>
Note that your input needs to be well-formed XML, so if you have HTML, you'll need to tidy it first (with one of the many java libraries which can do this for you).
i tried this and i get this error : java.lang.NoClassDefFoundError: org/docx4j/org/xhtmlrenderer/render/Box
true, the problem is I'm using the IDE project not maven feature, so i can't get all the working versions of the dependencies :/ please help ?
yes yes, i downloaded the zip and its working fine now, thank you.
java - how to convert HTML to .docx using docx4j? - Stack Overflow
I too was struck at this point. Now I know there is a 3rd party API to convert docx to html works finehttps://code.google.com/p/xdocreport/wiki/XWPFConverterXHTML
You do know this was answered a year ago right?
java - Converting a .docx to html using Apache POI and getting no text...
org.apache.poi.xwpf.converter.core 1.0.5.jar org.apache.poi.xwpf.converter.xhtml 1.0.5.jar
which are compatible with version of
POI 3.10Final
Apache POI 3.11 came out about a month ago, so getting a version compatible with that would be better if you can!
Docx to HTML poi java.lang.NoSuchMethodError - Stack Overflow
Converting rtf to docx content is outside the scope of docx4j.
java - Merging HTML, RTF to Docx using Docx4J - Stack Overflow
Is you find anything which convert both doc and docx into html? Please show me sample code if any?
Convert html to doc in java - Stack Overflow
I've been spending a little time looking into docx4j. It seems to provide nice ways for creating html documents from docx but I can't see anything for the other way round.
At the moment this is still looking like the easiest method as it's just working with jaxb objects (I think).
I have converted docx4j's HTML (as opposed to any old bit of HTML) back to docx. As a result there is a little stuff in docx4j to help you: classes such as org.docx4j.model.properties.run.Bold have constructors which take a CSSValue. Other bits aren't there (eg the code which uses that, the code for converting an HTML table, and the code to import an image).
Convert html to doc in java - Stack Overflow
I am using docx4j to do this and it seems to be working. If you're using Maven you can just add the dependency (but use version 3.0.0) and then use one of the docx4j sample programs called ConvertOutHtml.java. Just change the filepath in ConvertOutHtml.java to point to your file and you should be fine.
apache poi - Unable to convert docx to html using Java - Stack Overflo...
If you want to convert a .docx file to .html then you can't directly read the file as it is a binary file. You can use JODConverter for this. I haven't used this personally but this question is near duplicate of this question.
java - Converting a .docx to html and I am getting unreadable text - S...
@Jason I want to ignore certain formatting in the input docx. As the converted html had some extra spacing or junk characters etc added into it.
As a solution I created a new xslt. For most, it is very similar to the one in the sample but with few minor tweaks. The new xslt now converts the input docx file into a properly formatted(as I need) html for IE8, Mozilla or Chrome.
java - Custom solution in Docx4J for converting Docx to HTML - Stack O...
As you have observed, docx4j doesn't current convert numbered paragraphs into HTML list items.
It would be fairly straightforward to do that, but a bit more effort to wrap the list items in appropriate ol or ul elements.
This commit adds ol or ul elements and list items, provided you set:
SdtWriter.registerTagHandler("HTML_ELEMENT", new SdtToListSdtTagHandler());
and are using XSL for the output method.
Thanks for the response Jason! What would the most straightforward way be to implement this?
As a minor aside, wouldn't a numbered paragraph be better served by being converted to an entry within an <ol> element rather than a <ul>?
When I said "numbered" I mean using numId (ie numbering part) which encompasses both bullets (ul) and numbering (ol).
Hi Jason, lists are now properly being created in the conversion from html->docx, however the conversion of docx->html is still creating <p> elements.