How to Convert a LaTeX Document to a DOCX File
By Kevin Navarrete-Parra
Introduction
I recently needed to convert .tex documents into Word documents, so I looked for a way to avoid manually copying and pasting from one document to the other (which is a pain). While searching online, I found a few suitable methods, each with its own minor drawbacks.
A simple solution is to open a .pdf file in Word, which converts the document into something editable in the word processor. However, this method often leads to finicky formatting issues that I'd rather avoid. On top of that, something always feels off about those conversions. I can't quite put my finger on it, but it's as if there is a formatting uncanny valley that continually throws me off.
There are other solutions, like tex2rtf, which can convert the base file into a rich text format that Word can read. However, I wasn't a fan of this one for a few reasons. First, the program installation was a bit of a pain. More importantly, the formatting was still a little off!
Pandoc, however, does a pretty good job of converting between the two formats--especially if you have a solid reference doc you can use to ensure the formatting is consistent. Additionally, I was already a little familiar with Pandoc due to my extensive experience with Markdown and RMarkdown, so I figured I'd give it a shot.
Setup
Before you can convert a LaTeX document to a DOCX file, you'll need to have a few things in place. First, you'll need to have Pandoc installed on your machine. If you don't have it installed, you can download it from the Pandoc website.
If you're on a Mac and already have Homebrew installed, then just run:
brew install pandoc
which will run through the process for you automatically.
In addition to the local Pandoc installation, you'll want:
- A LaTeX document to convert;
- A reference .docx file to ensure the formatting is consistent;
- A .bib file if you're using citations in your document; and
- Your terminal open and ready to go.
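Before running anything, it can help to confirm that those files are where you expect them. The following is a minimal sketch rather than part of the original workflow: `check_inputs` is a hypothetical helper name, and the file names in the usage comment are placeholders.

```shell
# check_inputs: hypothetical helper that reports which of the given
# files are missing. Prints one line per missing file and returns
# non-zero if any file is absent.
check_inputs() {
  missing=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Example (placeholder file names):
#   check_inputs paper.tex reference.docx works-cited.bib && echo "ready"
```

If every file exists, the function prints nothing and succeeds, so it can sit in front of the conversion command with `&&`.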
Conversion
Before I get into the specifics, I should note that I'm using a Mac for this process. If you come across any problems on Windows or Linux, you might need to adjust the commands slightly.
To convert my .tex file to a .docx file, I first navigated to the directory I wanted to work from.
cd /path/to/your/directory
Once in, I ran the following command:
pandoc document/name.tex \
--citeproc --bibliography=works-cited.bib \
-o document/name.docx \
--reference-doc "Turabian Style Research paper.docx"
This command does a few things:
- It begins by calling the pandoc command, which will convert the document;
- It specifies the .tex file you'd like to convert, which in this case is a file called name.tex in the document/ directory;
- It tells Pandoc to use the citeproc filter to handle citations, which is necessary if you're using a .bib file;
- It specifies the .bib file you're using for citations using the --bibliography flag;
- It tells Pandoc to output (-o) the file as a .docx file, which will be named name.docx in the document/ directory; and
- It specifies the reference document you'd like to use to ensure the formatting is consistent.
In this case, I'm using a Turabian-style research paper as my reference document, but you could use any .docx file you'd like.
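If you run this conversion often, the command can be wrapped in a small shell function. This is only a sketch under a few assumptions: `tex2docx` and `DRY_RUN` are hypothetical names introduced here, the output is written next to the input with a .docx extension, and the citation and reference-doc flags are added only when the matching file is supplied.

```shell
# tex2docx: hypothetical wrapper around the pandoc call shown above.
# Usage: tex2docx INPUT.tex [BIB_FILE] [REFERENCE_DOCX]
# Set DRY_RUN=1 to print the command (one argument per line) instead
# of running it.
tex2docx() {
  input=$1
  bib=${2:-}
  ref=${3:-}
  output="${input%.tex}.docx"   # document/name.tex -> document/name.docx

  set -- "$input" -o "$output"
  # Only request citation processing when a bibliography is supplied.
  if [ -n "$bib" ]; then
    set -- "$@" --citeproc "--bibliography=$bib"
  fi
  # Only pass a reference document when one is supplied.
  if [ -n "$ref" ]; then
    set -- "$@" "--reference-doc=$ref"
  fi

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' pandoc "$@"
  else
    pandoc "$@"
  fi
}

# Example, mirroring the command above:
#   tex2docx document/name.tex works-cited.bib "Turabian Style Research paper.docx"
```

Building the argument list with `set --` keeps file names with spaces, like the reference document here, intact as single arguments.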
A Note on Complex Tables
One of the issues I've run into with this process is that Pandoc doesn't always do a good job at handling tables--especially complex ones. If you have a table that's more than just a simple grid, you might run into some issues with the conversion. Namely, the table just won't appear in the converted Word document.
After some digging, I found that the best way to handle this issue is to convert the table to an image and then insert the image into the Word document. While this is more of a workaround than a true solution, it gets the job done and ensures that the output document looks the way you want it without too much hassle.
By complex tables, I'm referring to something along these lines, which includes table notes and a resize box command:
\documentclass{article}
\usepackage{threeparttable}
\usepackage{graphicx}
\begin{document}
\begin{table}[th]
\centering
\resizebox{\textwidth}{!}{%
\begin{threeparttable}
\caption{Variable Descriptions}
\label{tab:variables}
\begin{tabular}{l p{5cm} l l l}
\hline
\textbf{Variable} & \textbf{Description} & \textbf{Hypothesis} & \textbf{Direction} & \textbf{Source} \\
\hline
\multicolumn{5}{l}{\emph{Dependent Variable}} \\
Variable A & Placeholder description for variable A & -- & -- & Source 1 \\
\hline
\multicolumn{5}{l}{\emph{Independent Variables}} \\
Variable B & Placeholder description for variable B & H1 & $+$ & Source 1 \\
Variable C & Placeholder description for variable C & H2 & $-$ & Source 1 \\
Variable D & Placeholder description for variable D & H3 & $-$ & Source 2 \\
Variable E & Placeholder description for variable E & H4 & $-$ & Source 3 \\
Variable F & Placeholder description for variable F & H4 & $-$ & Source 3 \\
Variable G & Placeholder description for variable G & H4 & $-$ & Source 3 \\
Variable H & Placeholder description for variable H & H4 & $-$ & Source 3 \\
\hline
\end{tabular}
\begin{tablenotes}
\small
\item[*] Placeholder note content for table details.
\end{tablenotes}
\end{threeparttable}%
}
\end{table}
\end{document}
The first step to converting this table to an image is to extract the table from the original .tex document and save it as a separate .tex file that looks something like this:
\documentclass[border=10pt, varwidth]{standalone}
\usepackage{threeparttable}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{lmodern}
\begin{document}
\begin{table}[th]
\centering
\resizebox{\textwidth}{!}{%
\begin{threeparttable}
\caption{Variable Descriptions}
\label{tab:variables}
\begin{tabular}{l p{5cm} l l l}
\hline
\textbf{Variable} & \textbf{Description} & \textbf{Hypothesis} & \textbf{Direction} & \textbf{Source} \\
\hline
\multicolumn{5}{l}{\emph{Dependent Variable}} \\
Variable A & Placeholder description for variable A & -- & -- & Source 1 \\
\hline
\multicolumn{5}{l}{\emph{Independent Variables}} \\
Variable B & Placeholder description for variable B & H1 & $+$ & Source 1 \\
Variable C & Placeholder description for variable C & H2 & $-$ & Source 1 \\
Variable D & Placeholder description for variable D & H3 & $-$ & Source 2 \\
Variable E & Placeholder description for variable E & H4 & $-$ & Source 3 \\
Variable F & Placeholder description for variable F & H4 & $-$ & Source 3 \\
Variable G & Placeholder description for variable G & H4 & $-$ & Source 3 \\
Variable H & Placeholder description for variable H & H4 & $-$ & Source 3 \\
\hline
\end{tabular}
\begin{tablenotes}
\small
\item[*] Placeholder note content for table details.
\end{tablenotes}
\end{threeparttable}%
}
\end{table}
\end{document}
Make sure to customize your preamble to match the necessary packages and formatting for your table.
In this new .tex document, I used the standalone document class to ensure that the table is the only thing on the page, facilitating the conversion to an image. Once compiled using pdflatex, the table will be saved as a .pdf file.
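The compile step itself might look like the sketch below. The helper name `compile_table` and the path in the usage comment are hypothetical, and the error codes are just one reasonable convention.

```shell
# compile_table: hypothetical helper that compiles a standalone table
# file to PDF next to its source. Returns 2 if the source is missing
# and 127 if pdflatex is not on PATH.
compile_table() {
  src=$1
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 2; }
  command -v pdflatex >/dev/null 2>&1 || {
    echo "pdflatex not found on PATH" >&2
    return 127
  }
  # -halt-on-error stops at the first error instead of prompting;
  # -output-directory keeps the .pdf and auxiliary files beside the source.
  pdflatex -interaction=nonstopmode -halt-on-error \
    -output-directory="$(dirname "$src")" "$src"
}

# Example (placeholder path):
#   compile_table tables/table.tex   # should produce tables/table.pdf
```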
Next, you'll want to use a PDF-to-image converter to convert the .pdf file to a .png file. I found that the imagemagick package works well for this, especially since I can do everything from the command line without needing to open a separate program. If you don't have it installed already, you can install it by running:
brew install imagemagick
If you don't already have the ghostscript package installed, you'll need to install that as well, since ImageMagick relies on it to read .pdf files. You can do so by running:
brew install ghostscript
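To see at a glance which of these tools still need installing, a short check like the following can help. It is a sketch only: `missing_tools` is a hypothetical helper name, and the list in the usage comment assumes the usual executable names (magick for ImageMagick 7, gs for Ghostscript).

```shell
# missing_tools: hypothetical helper that prints the name of every
# listed command that is not on PATH. No output means all were found.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example, using the tools from this section:
#   missing_tools pdflatex magick gs
```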
Once you have imagemagick installed, you can convert the .pdf file to a .png file by running:
magick -density 300 table.pdf -quality 100 table.png
which will output a .png file that you can call in your .tex file instead of the original table.
The magick command is the ImageMagick command line tool, which you can use to convert images from one format to another. The -density flag sets the resolution, in dots per inch, at which the .pdf is rasterized, and the -quality flag sets the output compression (PNG is lossless, so for .png output it affects file size rather than sharpness). You can adjust these values as needed to get the desired output.
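As a rough guide when picking a density, PDF lengths are measured in points (72 per inch), so the pixel size of the output is about points × density / 72. The sketch below works through that arithmetic. `pixels_at_density` is a hypothetical helper, the 468 pt width is an assumed example, and ImageMagick's own rounding may differ by a pixel.

```shell
# pixels_at_density: hypothetical helper estimating the rasterized size
# of a PDF dimension. pixels = points * density / 72, using integer
# division (so the result is rounded down).
pixels_at_density() {
  points=$1
  density=$2
  echo $(( points * density / 72 ))
}

# A 468 pt wide table (6.5 in) rendered with -density 300:
pixels_at_density 468 300   # prints 1950
```

Halving the density to 150 would give roughly 975 pixels across, which is smaller on disk but softer when scaled up in Word.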
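In the main .tex file, the original table can then be replaced with the image:
\documentclass{article}
\usepackage{graphicx}
\begin{document}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{images/table.png}
% A \label only picks up the figure number when it follows a \caption,
% so add a \caption above this line if you cross-reference the table.
\label{fig:foo-table}
\end{figure}
\end{document}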
Make sure to adjust the path to the image directory as needed, of course, and ensure that any cross-references are updated to reflect the new table format.
After doing this, you can run the Pandoc command as usual, and the table will appear in the Word document as an image.
While this is a bit of a workaround, it's a relatively quick process that ensures the table appears in the Word document as you'd like. Moreover, it has a positive externality, depending on how you look at it: it forces you to render tables on their own, meaning that you can fiddle with them in a more controlled environment without having to worry about rendering the rest of the document.[1]
Conclusion
While this is still not a perfect solution, it's a pretty good one. The formatting is consistent, and the process is relatively quick. If you're looking to convert a LaTeX document to a Word document, I'd recommend giving Pandoc a shot. It's a versatile tool that can handle a variety of conversions, and it's relatively easy to use once you get the hang of it. And, most importantly, it doesn't have that weird uncanny valley look that you might get with other conversion methods.
One issue I'm still having is that Pandoc isn't doing a good job converting my LaTeX tables to native Word tables. When I find a solution to this issue, I'll be sure to update this post.
Footnotes
[1] Regardless, I'd still prefer to have the table within the .tex document itself, rather than on its own. However, if you're in the position of needing to convert a LaTeX document to a Word document, this is a solid workaround that gets the job done.