Converting HTML documents to PDF format is a common requirement in many web applications. Whether you're generating reports, creating invoices, or archiving web content, Python offers several powerful libraries to handle this conversion. In this guide, we'll explore different approaches and their unique features.
1. WeasyPrint: Simple and Straightforward
WeasyPrint is a popular choice for its simplicity and CSS support. It lays out HTML with its own CSS engine, without a browser, and writes the result as a PDF.
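from weasyprint import HTML

def html_to_pdf(html_content, output_path):
    # Render the HTML string and write the resulting PDF to output_path
    HTML(string=html_content).write_pdf(output_path)

# Read the source HTML; the context manager closes the file afterwards
with open('chatgpt_resume.html', 'r') as file:
    html_content = file.read()

html_to_pdf(html_content, 'weasyprint.pdf')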
Key Features:
- Strong CSS support, including Flexbox and, in recent releases, CSS Grid
- Handles complex layouts well
- No browser required, though it depends on system libraries such as Pango
- Support for @page rules and print-specific styles
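As a rough illustration of the @page support listed above, the sketch below wraps an HTML fragment in a small print stylesheet before handing it to WeasyPrint. The helper names with_page_rules and render_pdf, and the default size and margin, are assumptions made for this example; only HTML(string=...).write_pdf(...) comes from WeasyPrint itself.

```python
def with_page_rules(body_html, size="A4", margin="2cm"):
    """Wrap an HTML fragment in a document whose CSS sets the page box."""
    # Hypothetical helper: the @page rule controls paper size and margins
    css = f"@page {{ size: {size}; margin: {margin}; }}"
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<style>{css}</style></head><body>{body_html}</body></html>"
    )


def render_pdf(body_html, output_path):
    # Imported lazily so with_page_rules can be used without WeasyPrint installed
    from weasyprint import HTML

    HTML(string=with_page_rules(body_html)).write_pdf(output_path)
```

Calling render_pdf("<h1>Invoice</h1>", "invoice.pdf") should then produce a PDF laid out on A4 pages with 2 cm margins, assuming WeasyPrint and its system libraries are installed.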
2. xhtml2pdf: Pure Python Solution
xhtml2pdf is a pure Python library built on ReportLab; it needs no browser or external binary. It's particularly good for simple documents and business reports.
from xhtml2pdf import pisa

def convert_html_to_pdf(html_string, pdf_path):
    # pisa writes the PDF into an open binary file handle
    with open(pdf_path, "wb") as pdf_file:
        pisa_status = pisa.CreatePDF(html_string, dest=pdf_file)
    # pisa_status.err is non-zero when the conversion reported errors
    return not pisa_status.err

with open('chatgpt_resume.html', 'r') as file:
    html_content = file.read()

# Generate PDF
pdf_path = "xhtml2pdf.pdf"
if convert_html_to_pdf(html_content, pdf_path):
    print(f"PDF generated and saved at {pdf_path}")
else:
    print("PDF generation failed")
Key Features:
- Pure Python implementation
- Good for basic HTML and CSS
- Lightweight and easy to deploy
- Reports conversion errors through the status object returned by CreatePDF
3. pdfkit: wkhtmltopdf Wrapper
pdfkit is a Python wrapper for the wkhtmltopdf command-line tool, which renders pages with a WebKit engine and exposes many options for complex web pages.
import pdfkit

def convert_html_to_pdf(html_content, pdf_path):
    try:
        # Requires the wkhtmltopdf binary to be installed and on PATH
        pdfkit.from_string(html_content, pdf_path)
        print(f"PDF generated and saved at {pdf_path}")
    except Exception as e:
        print(f"PDF generation failed: {e}")

with open('chatgpt_resume.html', 'r') as file:
    html_content = file.read()

# Generate PDF
convert_html_to_pdf(html_content, 'pdfkit.pdf')
Key Features:
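- JavaScript execution, within the limits of wkhtmltopdf's older WebKit engine
- Handles many web layouts, though newer CSS features may not render as they do in a current browser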
- Multiple input formats (URL, file, string)
- Customizable PDF options
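To make the last two points concrete, here is a minimal sketch that picks between pdfkit's three entry points (from_string, from_file, from_url) and passes an options dictionary. The helper names build_options and convert, the kind parameter, and the chosen option values are assumptions for illustration; the option keys are wkhtmltopdf command-line flags without their leading dashes.

```python
def build_options(page_size="A4", margin="15mm"):
    """Return a wkhtmltopdf option dict (illustrative values)."""
    return {
        "page-size": page_size,
        "margin-top": margin,
        "margin-right": margin,
        "margin-bottom": margin,
        "margin-left": margin,
        "encoding": "UTF-8",
    }


def convert(source, pdf_path, kind="string"):
    # Imported lazily so build_options can be inspected without pdfkit installed
    import pdfkit

    converters = {
        "string": pdfkit.from_string,  # raw HTML markup
        "file": pdfkit.from_file,      # path to a local .html file
        "url": pdfkit.from_url,        # address of a live page
    }
    return converters[kind](source, pdf_path, options=build_options())
```

For example, convert('chatgpt_resume.html', 'pdfkit_file.pdf', kind='file') would read the HTML from disk instead of from a string, provided wkhtmltopdf is installed.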
4. pyhtml2pdf: Simple HTML to PDF Converter
pyhtml2pdf provides a straightforward way to convert HTML files to PDF. It drives headless Chrome through Selenium, so Chrome must be available on the machine.
import os
from pyhtml2pdf import converter

# converter.convert expects a URL, so build a file:// URL from the absolute path
path = os.path.abspath('chatgpt_resume.html')
converter.convert(f'file:///{path}', 'pyhtml2pdf.pdf')
Key Features:
- Simple API
- Minimal configuration required
- Good for basic conversion needs
5. Pyppeteer: Headless Chrome Automation
Pyppeteer is an unofficial Python port of Puppeteer. It converts pages through headless Chromium, providing high rendering fidelity.
import asyncio
from pyppeteer import launch

async def convert_to_pdf():
    with open('chatgpt_resume.html', 'r') as file:
        html_content = file.read()

    browser = await launch()
    try:
        page = await browser.newPage()
        await page.setContent(html_content)
        await page.pdf({'path': "pyppeteer.pdf", 'format': 'A4'})
    finally:
        # Close the browser even if rendering fails
        await browser.close()

asyncio.run(convert_to_pdf())
Key Features:
- Modern web standards support
- JavaScript execution
- Customizable PDF settings
- Accurate rendering
6. Playwright: Modern Browser Automation
Playwright is a newer alternative that supports multiple browser engines.
import asyncio
from playwright.async_api import async_playwright

async def html_to_pdf(html_content, output_path):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.set_content(html_content)
        await page.pdf(path=output_path)
        await browser.close()

with open('chatgpt_resume.html', 'r') as file:
    html_content = file.read()

output_path = 'playwright.pdf'
asyncio.run(html_to_pdf(html_content, output_path))
print("PDF created successfully!")
Key Features:
- Support for multiple browser engines (PDF export itself works only in headless Chromium)
- Modern web platform features
- Excellent performance
- Rich API for customization
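As one possible illustration of that customization, the sketch below passes a few of the keyword arguments that page.pdf() accepts (format, landscape, print_background, margin). The helper names pdf_options and html_to_styled_pdf and the chosen values are assumptions for this example, not recommended settings.

```python
def pdf_options(output_path, landscape=False):
    """Keyword arguments for page.pdf(); the values are illustrative defaults."""
    return {
        "path": output_path,
        "format": "A4",
        "landscape": landscape,
        "print_background": True,  # keep CSS background colours and images
        "margin": {"top": "1cm", "right": "1cm", "bottom": "1cm", "left": "1cm"},
    }


async def html_to_styled_pdf(html_content, output_path):
    # Imported lazily so pdf_options can be used without Playwright installed
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()  # page.pdf() is Chromium-only
        page = await browser.new_page()
        # Wait for network activity to settle so late-loading assets are included
        await page.set_content(html_content, wait_until="networkidle")
        await page.pdf(**pdf_options(output_path))
        await browser.close()
```

It can be run the same way as the earlier example, for instance with asyncio.run(html_to_styled_pdf(html_content, 'playwright_styled.pdf')).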
Choosing the Right Tool
Each library has its strengths:
- WeasyPrint: Best for documents requiring precise CSS control
- xhtml2pdf: Ideal for simple documents and easy deployment
- pdfkit: Good for web pages with light JavaScript when installing wkhtmltopdf is acceptable
- pyhtml2pdf: Perfect for basic conversion needs
- Pyppeteer: Excellent for modern web applications
- Playwright: Best for cross-browser compatibility and modern features
Consider your specific needs regarding CSS support, JavaScript handling, deployment constraints, and rendering accuracy when choosing a solution.
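The trade-offs above can be condensed into a small decision helper. This is only a simplified sketch of the list in this section: the function name suggest_library, its three flags, and the order of the checks are assumptions, and a real project would weigh more factors.

```python
def suggest_library(needs_javascript=False, needs_advanced_css=False,
                    can_install_browser=True):
    """Map coarse requirements to one of the libraries discussed in this guide."""
    if needs_javascript:
        # Browser-backed tools run page scripts; pdfkit is the fallback
        # when a full browser cannot be installed
        return "playwright" if can_install_browser else "pdfkit"
    if needs_advanced_css:
        # Static pages with print-oriented CSS such as @page rules
        return "weasyprint"
    # Simple documents where easy deployment matters most
    return "xhtml2pdf"


print(suggest_library(needs_advanced_css=True))
```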
Installation Requirements
pip install weasyprint
pip install xhtml2pdf
pip install pdfkit
pip install pyhtml2pdf
pip install pyppeteer
pip install playwright
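Note: Some libraries require additional system dependencies. pdfkit needs the wkhtmltopdf binary installed on your system, WeasyPrint relies on system libraries such as Pango, pyhtml2pdf needs Chrome, Playwright needs its browser binaries (run playwright install chromium after the pip install), and Pyppeteer downloads Chromium the first time it launches.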
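Before picking a library, it can help to check what is already available on a machine. The following sketch uses only the standard library; the function name check_environment and the report keys are assumptions for illustration. It checks whether each package can be imported and whether the wkhtmltopdf executable is on PATH, but it does not verify browsers or other system libraries.

```python
import importlib.util
import shutil

PACKAGES = ("weasyprint", "xhtml2pdf", "pdfkit", "pyhtml2pdf", "pyppeteer", "playwright")


def check_environment(modules=PACKAGES):
    """Report which converter packages are importable and whether wkhtmltopdf is on PATH."""
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    # pdfkit only works when the wkhtmltopdf executable can be found
    report["wkhtmltopdf (binary)"] = shutil.which("wkhtmltopdf") is not None
    return report


if __name__ == "__main__":
    for name, available in check_environment().items():
        print(f"{name}: {'found' if available else 'missing'}")
```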
Conclusion
Python offers a rich ecosystem of HTML-to-PDF conversion tools. Whether you need simple document conversion or complex web page rendering, there's a solution available. Weigh your requirements and deployment constraints when choosing the library for your project.