Mammoth
Mammoth converts .docx documents, such as those created by Microsoft Word, to HTML.
Mammoth aims to produce simple and clean HTML by using semantic information in the document,
and ignoring other details.
For instance, Mammoth converts any paragraph with the style Heading1 to h1 elements,
rather than attempting to exactly copy the styling (font, text size, colour, etc.) of the heading.
If you've defined your own styles in your document,
then Mammoth allows you to map those styles to appropriate HTML.
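
As a rough sketch of what such a mapping can look like with the Python package: the style names "Section Title" and "Subsection Title" below are hypothetical stand-ins for custom styles in your own document, and the helper names build_style_map and convert are made up for this example. The style_map argument and the `p[style-name='...'] => ...` line format follow Mammoth's style map syntax. The package is installed separately (pip install mammoth).

```python
# Hypothetical custom Word paragraph styles and the HTML each should become.
CUSTOM_STYLES = {
    "Section Title": "h1:fresh",
    "Subsection Title": "h2:fresh",
}


def build_style_map(styles):
    """Render one style-map line per custom paragraph style."""
    return "\n".join(
        f"p[style-name='{name}'] => {target}"
        for name, target in styles.items()
    )


def convert(path, styles=CUSTOM_STYLES):
    """Convert the .docx file at `path` to HTML using the custom style map."""
    # Imported here so build_style_map stays usable without the package.
    import mammoth

    with open(path, "rb") as docx_file:
        result = mammoth.convert_to_html(
            docx_file, style_map=build_style_map(styles)
        )
    # result.value is the generated HTML; result.messages holds any warnings.
    return result.value, result.messages
```

Calling convert("report.docx"), where report.docx is a placeholder path, would return the HTML together with any messages raised during conversion. Styles not listed in the map fall back to Mammoth's default mappings.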
There's a large mismatch between the structure used by .docx and the structure of HTML, meaning that the conversion is unlikely to be perfect for more complicated documents. Mammoth works best if you only use styles to semantically mark up your document.
The supported platforms are:
- JavaScript, in both the browser and Node.js. Available on npm.
- Python. Available on PyPI.
- WordPress.
- Java/JVM. Available on Maven Central.
- .NET. Available on NuGet.
Demo
Try it out by uploading a .docx file.
This will run Mammoth using the default style mappings,
so it will only expect standard Word styles such as Heading1.