Example of converting a Word document to HTML and Markdown

Note

The generated site can be viewed at the URL below.

https://pnavaro.gitpages.huma-num.fr/test-eost


Convert Word to Markdown

To save the images embedded in a binary container (docx, epub, or odt), here a Microsoft Word document, to a directory, use the following command. It creates the folder images/media: the media files are extracted from the container and keep their original filenames.

pandoc --extract-media=images -s mydoc.docx -t markdown -o mddoc.md

In a Word document, image files live in a folder called “media” inside the docx container, so the “media” folder is always created. To get a single directory level containing only “media”, extract into the current directory with this command.

pandoc --extract-media=. -s mydoc.docx -t markdown -o mddoc.md
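
When several Word documents sit in the same directory, the command above can be wrapped in a loop. The following is only a sketch under stated assumptions: the helper names stem and convert_all are hypothetical, and each document is given its own media folder (named after the file) so that images from different documents do not overwrite each other.

```shell
#!/bin/sh
# Sketch: convert every .docx in the current directory to Markdown.
# Assumption: one media folder per document, named after the file.

# Hypothetical helper: strip the .docx extension from a file name.
stem() {
    printf '%s\n' "${1%.docx}"
}

# Hypothetical helper: run pandoc once per .docx file.
convert_all() {
    for doc in ./*.docx; do
        # If no .docx file exists, the glob stays unexpanded: skip it.
        [ -e "$doc" ] || continue
        name=$(stem "$(basename "$doc")")
        pandoc --extract-media="$name" -s "$doc" -t markdown -o "$name.md"
    done
}
```

Calling convert_all in a directory containing mydoc.docx should produce mydoc.md, with the images under mydoc/media.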

Convert Word to HTML

To convert a Microsoft Word document to a standalone HTML page, run this command.

pandoc --extract-media=. -s mydoc.docx -t html -c styles.css -o htmldoc.html

To get the desired appearance, define your own styles.css, for example:

html {
   line-height: 1.7;
   font-family: sans-serif;
   font-size: 20px;
   color: #1a1a1a;
   background-color: #fdfdfd;
}

All images are stored in the media directory, as above. The table of contents is rendered as anchor links. Headers and footers are skipped. Page numbering is not preserved: the output is not split into pages but is one continuous document. pandoc has many other options you can experiment with.
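
As one illustration of such options, the sketch below adds pandoc's --toc and --toc-depth flags, which ask pandoc to build its own table of contents from the document headings, and sets the page title through the pagetitle metadata field. The function name docx_to_html and the PANDOC variable are hypothetical. The variable is there only so the command can be previewed by substituting echo for pandoc.

```shell
#!/bin/sh
# Sketch: HTML conversion with a pandoc-generated table of contents.
# Assumption: styles.css sits next to the output file.

# Hypothetical override point: set PANDOC=echo to preview the command.
PANDOC="${PANDOC:-pandoc}"

# Hypothetical helper: docx_to_html INPUT.docx OUTPUT.html
docx_to_html() {
    in=$1
    out=$2
    # Use the file name without its extension as the HTML page title.
    title=$(basename "$in" .docx)
    "$PANDOC" --extract-media=. -s "$in" -t html -c styles.css \
        --toc --toc-depth=2 \
        --metadata pagetitle="$title" \
        -o "$out"
}
```

For example, docx_to_html mydoc.docx htmldoc.html runs the conversion, while setting PANDOC=echo beforehand only prints the arguments that would be passed.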