Mammoth converts .docx documents, such as those created by Microsoft Word, to HTML.
Mammoth aims to produce simple and clean HTML by using semantic information in the document,
and ignoring other details.
For instance, Mammoth converts any paragraph with the style "Heading 1" to h1 elements,
rather than attempting to exactly copy the styling (font, text size, colour, etc.) of the heading.
If you've defined your own styles in your document,
then Mammoth allows you to map those styles to appropriate HTML.
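As a rough illustration of such a mapping with the Python package, the sketch below turns a dictionary of style names into a style map and passes it to the conversion. The style name `Section Title` in the usage note is a hypothetical custom style, and `build_style_map` and `convert_with_custom_styles` are illustrative helpers, not part of Mammoth. The `style_map` argument and the `p[style-name='...'] => h1:fresh` syntax are assumed to match Mammoth's documented style map format.

```python
def build_style_map(mappings):
    """Build a style map string from {Word style name: HTML target}.

    Illustrative helper (not part of Mammoth). Each entry becomes one
    line of the form:  p[style-name='Name'] => target
    Style names containing apostrophes are not escaped here.
    """
    return "\n".join(
        "p[style-name='{}'] => {}".format(name, target)
        for name, target in mappings.items()
    )


def convert_with_custom_styles(docx_path, mappings):
    """Convert a .docx file to HTML using custom paragraph style mappings."""
    import mammoth  # third-party package, assumed installed from PyPI

    with open(docx_path, "rb") as docx_file:
        result = mammoth.convert_to_html(
            docx_file, style_map=build_style_map(mappings)
        )
    return result.value  # the generated HTML as a string
```

For example, `convert_with_custom_styles("document.docx", {"Section Title": "h1:fresh"})` would map paragraphs styled as Section Title to h1 elements, while other paragraphs fall back to the default mappings.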
There's a large mismatch between the structure used by .docx and the structure of HTML, meaning that the conversion is unlikely to be perfect for more complicated documents. Mammoth works best if you only use styles to semantically mark up your document.
The supported platforms are:
- Python. Available on PyPI.
- Java/JVM. Available on Maven Central.
- .NET. Available on NuGet.
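For the Python package, a basic conversion might look like the sketch below. It assumes the package is installed from PyPI under the name `mammoth`. The helper names `html_path_for` and `convert_file`, and the choice to write the HTML next to the input file, are illustrative rather than anything Mammoth prescribes.

```python
from pathlib import Path


def html_path_for(docx_path):
    """Return the output path: same folder and name, with an .html suffix."""
    return Path(docx_path).with_suffix(".html")


def convert_file(docx_path):
    """Convert one .docx file to HTML and return the conversion messages."""
    import mammoth  # third-party package, assumed installed from PyPI

    with open(docx_path, "rb") as docx_file:
        result = mammoth.convert_to_html(docx_file)

    # result.value is the generated HTML string
    html_path_for(docx_path).write_text(result.value, encoding="utf-8")
    # result.messages lists warnings raised during conversion
    return result.messages
```

Checking the returned messages after each conversion is a cheap way to spot styles in the document that the default mappings did not recognise.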
Try it out by uploading a .docx file.
This will run Mammoth using the default style mappings,
so it will only expect standard Word styles such as "Heading 1".
Don't have a document handy? Try this example from Microsoft. Headings and images work well, but the table of contents shows that Mammoth still has a way to go!