Markdown to HTML Using Python
This page shows how to use Python to convert a Markdown file to an HTML file.
Almost all of my web pages are written in Markdown. Even this one. But I hate to start up some programming IDE or upload my Markdown file to some random website just to convert it to HTML. So, I wrote a couple of simple Python scripts.
These scripts use Python 3 and need one extra package, Python-Markdown. On my Linux / Debian machine, installing it is as simple as:
sudo apt install python3-markdown
The second script on this page also uses glob, but that module is part of the Python standard library, so there is nothing more to install for it.
Simple Python Script
I will break it into parts to explain, and then I will put it all together in the end.
Set up your Python script with the usual stuff:
#!/usr/bin/env python3
import markdown
Then set the input and output files with variables. Yes, you could ask for the file with an input(), but this is supposed to be simple. After that, read the file and call markdown.markdown (lowercase) to convert it:
infile = 'directory/example.md'
outfile = 'directory/example.html'
with open(infile, 'r') as f:
    html = markdown.markdown(f.read())
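If hard-coded paths get tiresome, the file names can come from the command line instead. The following is only a sketch: the parse_paths helper and its argument names are made up for illustration, and it assumes the output should sit next to the input with an .html extension unless a second argument says otherwise.

```python
import argparse
import os


def parse_paths(argv):
    """Return (infile, outfile) from a list of command-line arguments.

    parse_paths is a hypothetical helper, not part of Python-Markdown.
    """
    parser = argparse.ArgumentParser(description="Convert Markdown to HTML")
    parser.add_argument("infile", help="path to the Markdown file")
    parser.add_argument("outfile", nargs="?", help="optional output path")
    args = parser.parse_args(argv)

    outfile = args.outfile
    if outfile is None:
        # Swap the extension: directory/example.md -> directory/example.html
        base, _ext = os.path.splitext(args.infile)
        outfile = base + ".html"
    return args.infile, outfile


# A literal list stands in for the real command line here.
infile, outfile = parse_paths(["directory/example.md"])
print(infile, "->", outfile)
```

In a real script you would pass sys.argv[1:] instead of the literal list.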
This will only convert the scribblings in your Markdown file to HTML.
It will be missing the opening and closing HTML tags your HTML file will need.
So just store this text in string variables. These strings are lazily concatenated without using the + operator (Python joins adjacent string literals inside the parentheses by itself), with \n added for new lines in the final file:
top = (
    "<!DOCTYPE html>\n"
    "<html>\n"
    "\n"
    "<body>\n"
    "\n"
)
bottom = (
    "\n"
    "\n"
    "</body>\n"
    "\n"
    "</html>\n"
)
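To see that the parentheses trick really does build one string, here is a small check. The variable names implicit and explicit exist only for this illustration.

```python
# Adjacent string literals are joined by Python at compile time,
# so no + operator is needed inside the parentheses.
implicit = (
    "<!DOCTYPE html>\n"
    "<html>\n"
    "<body>\n"
)

# The same string built the long way with +
explicit = "<!DOCTYPE html>\n" + "<html>\n" + "<body>\n"

print(implicit == explicit)   # True
print(implicit.count("\n"))   # 3
```

This shortcut only applies to literals. To splice in a variable you still need + (or an f-string).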
And then simply concatenate the parts together and write it all to the output file:
whole = top + html + bottom
print(outfile)
with open(outfile, 'w') as f:
    f.write(whole)
And now the whole thing put together for your cut-paste convenience:
#!/usr/bin/env python3
import markdown

# Input and output paths
infile = 'directory/example.md'
outfile = 'directory/example.html'

# Convert the Markdown text to an HTML fragment
with open(infile, 'r') as f:
    html = markdown.markdown(f.read())

# Opening and closing tags that the fragment is missing
top = (
    "<!DOCTYPE html>\n"
    "<html>\n"
    "\n"
    "<body>\n"
    "\n"
)
bottom = (
    "\n"
    "\n"
    "</body>\n"
    "\n"
    "</html>\n"
)

# Assemble the page and write it out
whole = top + html + bottom
print(outfile)
with open(outfile, 'w') as f:
    f.write(whole)
Simple Python Script for a Whole Website
If you have a whole bunch of markdown files, all of which need to go into a bunch of subdirectories, this might be your script. First the setup...
I like to keep my Markdown in subdirectories that match the HTML subdirectory structure. Here's an example:
working_directory
|
|-- this_markdown_script.py
|
|-- markdown
|   |-- index.md
|   |-- contact.md
|   '-- projects
|       |-- project1.md
|       '-- project2.md
|
'-- website
    |-- index.html
    |-- contact.html
    '-- projects
        |-- project1.html
        '-- project2.html
So what this script will do is walk through the markdown directory, convert every .md file it finds, and place the results in the website directory. The goal is to make updating the website easier: you can just copy-paste the website files to the actual website directory. This script also extracts metadata and sets up a more complete HTML file.
Again, start with the usual Python3 stuff:
#!/usr/bin/env python3
import markdown
import glob
The curveball here is to create an instance of the markdown.Markdown (uppercase) class. And, we will include an extension to acquire the metadata in the markdown file:
md = markdown.Markdown(extensions=['meta'])
Actually, let's add another extension... The base Markdown class converts using the bare minimum of HTML tags. For additional features, you add extensions. To add id attributes to headers and automatically create a table of contents, we will also include the toc extension:
md = markdown.Markdown(extensions=['meta', 'toc'])
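To make the effect of the two extensions concrete, here is a minimal sketch that runs them on a tiny in-memory document. The sample text, title, and date are invented for illustration, and it assumes Python-Markdown is installed as described earlier.

```python
import markdown

# Invented sample text. The metadata block must sit at the very top
# of the document and end with a blank line.
text = (
    "Title: My Page\n"
    "Date: 2024-01-01\n"
    "\n"
    "# Hello\n"
    "\n"
    "Some text.\n"
)

md = markdown.Markdown(extensions=['meta', 'toc'])
html = md.convert(text)

print(html)      # the h1 now carries an id attribute
print(md.Meta)   # keys are lowercased, values are lists of lines
print(md.toc)    # the table of contents as an HTML fragment
```

The metadata lines are consumed by the meta extension and do not appear in the converted HTML. The table of contents is available in md.toc, and it is also inserted wherever a [TOC] marker appears in the Markdown text.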
Then use glob to get a list of the files recursively in the directory structure:
files = glob.glob('markdown/**/*.md', recursive=True)
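The ** pattern is easy to get wrong, so here is a self-contained sketch that builds a throwaway copy of part of the example layout in a temporary directory and runs the same kind of search on it. The temporary directory and placeholder file contents are assumptions made so the example can run anywhere.

```python
import glob
import os
import tempfile

# Build a small copy of the example layout in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "markdown", "projects"))
    names = ("index.md", "contact.md", os.path.join("projects", "project1.md"))
    for name in names:
        with open(os.path.join(tmp, "markdown", name), "w") as f:
            f.write("# placeholder\n")

    pattern = os.path.join(tmp, "markdown", "**", "*.md")

    # With recursive=True, ** matches zero or more directory levels,
    # so files directly inside markdown/ are found as well.
    found = sorted(
        os.path.relpath(path, tmp)
        for path in glob.glob(pattern, recursive=True)
    )

print(found)
```

The recursive parameter expects a boolean, so pass True rather than a string.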
Iterate through the files. For each file, set the fileName variable, stripping the markdown/ directory from the front of the string with [9:] and the .md extension from the end with [:-3]. Then, using the Markdown instance, clear it with reset() (in case data from the previous file is still in there) and call convert() to convert the text:
for file in files:
    fileName = file[9:][:-3]
    outfile = 'website/' + fileName + '.html'
    with open(file, 'r') as f:
        html = md.reset().convert(f.read())
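The slice [9:] depends on the source directory name being exactly nine characters long, slash included. As an alternative sketch (not what the script on this page does), pathlib can compute the same output path without counting characters. The function name output_path and its parameters are hypothetical.

```python
from pathlib import PurePosixPath


def output_path(md_file, src_root="markdown", dst_root="website"):
    """Map markdown/projects/x.md to website/projects/x.html.

    output_path is a hypothetical helper for illustration only.
    """
    # Path of the file relative to the source directory
    relative = PurePosixPath(md_file).relative_to(src_root)
    # Same relative path under the destination, with the extension swapped
    return str(PurePosixPath(dst_root) / relative.with_suffix(".html"))


print(output_path("markdown/index.md"))              # website/index.html
print(output_path("markdown/projects/project1.md"))  # website/projects/project1.html
```

PurePosixPath is used here so the result always has forward slashes. In a real script, pathlib.Path would follow the conventions of whatever system it runs on.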
Extracting metadata is now an option when using a Markdown class instance with extensions=['meta']. It's stored as a Python dictionary in which each key maps to a list of strings (since a value could span more than one line in the Markdown file). To store the value in a single string, iterate through the list:
meta = md.Meta
metaTitle = ''
metaDate = ''
for data in meta['title']:
    metaTitle += data
for data in meta['date']:
    metaDate += data
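Indexing meta['title'] raises a KeyError when a Markdown file has no Title line. A slightly more forgiving sketch uses dict.get with an empty list as the fallback. The dictionary below is hand-written to mimic the shape of md.Meta, and meta_value is a hypothetical helper name. Joining with a space is an assumption here; the loop above joins the lines with nothing in between.

```python
def meta_value(meta, key):
    """Join all lines stored under key, or return '' if the key is missing."""
    return " ".join(meta.get(key, []))


# Hand-written stand-in with the same shape as md.Meta
meta = {
    "title": ["Markdown to HTML", "Using Python"],
    "date": ["2024-01-01"],
}

print(meta_value(meta, "title"))          # Markdown to HTML Using Python
print(meta_value(meta, "date"))           # 2024-01-01
print(repr(meta_value(meta, "author")))   # ''
```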
And again we set up the other parts of the HTML file, but this time note that the metadata is included:
top = (
    "<!DOCTYPE html>\n"
    "<html>\n"
    "\n"
    "<head>\n"
    " <meta name=\"title\" content=\"" + metaTitle + "\">\n"
    " <meta name=\"date\" content=\"" + metaDate + "\">\n"
    " <link rel='stylesheet' type='text/css' href='/mystylesheet.css'>\n"
    "</head>\n"
    "\n"
    "<body>\n"
    " <div class=page>\n"
    " <div class='header'>\n"
    " </div>\n"
    " <div class='main'>\n"
    " <div style='width: 66vw;'>\n"
    " <div>\n"
    "\n"
)
bottom = (
    "\n"
    "\n"
    " </div>\n"
    " </div>\n"
    " </div>\n"
    " <div class='footer'>\n"
    " <a href='/index.html'><button class='button'>Home</button></a>\n"
    " </div>\n"
    " </div>\n"
    "\n"
    "</body>\n"
    "\n"
    "</html>\n"
)
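One thing to watch: the metadata is pasted straight into an HTML attribute, so a title containing a double quote or an ampersand would produce broken markup. A possible safeguard, sketched here with the standard library, is to escape the value first. The sample title is invented, and escape is imported by name because the script already uses html as a variable.

```python
from html import escape

metaTitle = 'Tom & "Jerry"'  # invented sample value

# quote=True (the default) also escapes double and single quotes,
# which is what an attribute value needs.
safe_title = escape(metaTitle, quote=True)

line = ' <meta name="title" content="' + safe_title + '">\n'
print(line, end="")
```

The printed line contains Tom &amp;amp; &amp;quot;Jerry&amp;quot; as the attribute value, which a browser reads back as the original title.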
Again, put it together and then write it all to a file:
whole = top + html + bottom
print(outfile)
with open(outfile, 'w') as f:
    f.write(whole)
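Note that open() will not create missing folders, so writing website/projects/project1.html fails if website/projects does not exist yet. Here is a sketch of one way to handle that, using a temporary directory so it is safe to run anywhere. The write_page name is hypothetical, and the explicit UTF-8 encoding is an assumption about how the files should be saved.

```python
import os
import tempfile


def write_page(outfile, whole):
    """Create the parent directory if needed, then write the page."""
    parent = os.path.dirname(outfile)
    if parent:
        # exist_ok=True makes this a no-op when the folder is already there
        os.makedirs(parent, exist_ok=True)
    with open(outfile, "w", encoding="utf-8") as f:
        f.write(whole)


with tempfile.TemporaryDirectory() as tmp:
    outfile = os.path.join(tmp, "website", "projects", "project1.html")
    write_page(outfile, "<!DOCTYPE html>\n<html>\n</html>\n")
    created = os.path.exists(outfile)

print(created)
```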
And now the whole thing for your cut-paste convenience:
#!/usr/bin/env python3
import markdown
import glob

# One reusable converter with the metadata and table-of-contents extensions
md = markdown.Markdown(extensions=['meta', 'toc'])

# Every .md file under markdown/, including subdirectories
files = glob.glob('markdown/**/*.md', recursive=True)

for file in files:
    # Drop the leading 'markdown/' (9 characters) and the trailing '.md'
    fileName = file[9:][:-3]
    outfile = 'website/' + fileName + '.html'

    # Clear any state left from the previous file, then convert
    with open(file, 'r') as f:
        html = md.reset().convert(f.read())

    # Each metadata value is a list of lines; join them into one string
    meta = md.Meta
    metaTitle = ''
    metaDate = ''
    for data in meta['title']:
        metaTitle += data
    for data in meta['date']:
        metaDate += data

    top = (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "\n"
        "<head>\n"
        " <meta name=\"title\" content=\"" + metaTitle + "\">\n"
        " <meta name=\"date\" content=\"" + metaDate + "\">\n"
        " <link rel='stylesheet' type='text/css' href='/mystylesheet.css'>\n"
        "</head>\n"
        "\n"
        "<body>\n"
        " <div class=page>\n"
        " <div class='header'>\n"
        " </div>\n"
        " <div class='main'>\n"
        " <div style='width: 66vw;'>\n"
        " <div>\n"
        "\n"
    )
    bottom = (
        "\n"
        "\n"
        " </div>\n"
        " </div>\n"
        " </div>\n"
        " <div class='footer'>\n"
        " <a href='/index.html'><button class='button'>Home</button></a>\n"
        " </div>\n"
        " </div>\n"
        "\n"
        "</body>\n"
        "\n"
        "</html>\n"
    )

    # Assemble the page and write it out
    whole = top + html + bottom
    print(outfile)
    with open(outfile, 'w') as f:
        f.write(whole)