
Get text from a doc/docx file page by page using Apache Tika?


Since POI fundamentally cannot read out page numbers, and Tika is not meant to be a document renderer either, the answer is simply: no, this is not possible.

Tika uses Apache POI to process Word files (both the older binary .doc format and the newer XML-based .docx format).
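
To illustrate why page numbers are not simply there to be read, here is a minimal sketch that uses only the JDK (it is not how Tika or POI work internally). A .docx file is a ZIP archive whose main part, `word/document.xml`, holds the body as a sequence of `w:p` paragraph elements with no enclosing page element; pages only appear once a renderer lays the text out. The class name `DocxParagraphs` and the method `paragraphs` are made up for this example, and the sketch ignores headers, footnotes and other parts of the file.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of Tika or POI.
class DocxParagraphs {
    // Namespace of the WordprocessingML elements (w:p, w:t, ...).
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of every w:p element in word/document.xml, in document order. */
    static List<String> paragraphs(InputStream docx) throws Exception {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (!e.getName().equals("word/document.xml")) {
                    continue;
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Refuse DOCTYPE declarations to avoid XXE on untrusted files.
                factory.setFeature(
                    "http://apache.org/xml/features/disallow-doctype-decl", true);
                // Copy the entry first so the XML parser cannot close the ZIP stream.
                Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(zip.readAllBytes()));
                NodeList paras = doc.getElementsByTagNameNS(W_NS, "p");
                for (int i = 0; i < paras.getLength(); i++) {
                    // A paragraph's text is spread over w:t elements inside its runs.
                    NodeList texts =
                        ((Element) paras.item(i)).getElementsByTagNameNS(W_NS, "t");
                    StringBuilder sb = new StringBuilder();
                    for (int j = 0; j < texts.getLength(); j++) {
                        sb.append(texts.item(j).getTextContent());
                    }
                    result.add(sb.toString());
                }
                break; // only the main document part is of interest here
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java DocxParagraphs.java some-file.docx
        try (InputStream in = new FileInputStream(args[0])) {
            paragraphs(in).forEach(System.out::println);
        }
    }
}
```

The result is a flat list of paragraphs. Nothing in it says where one page ends and the next begins, which is the same limitation that applies to the text POI, and therefore Tika, returns.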

Note