How to Generate PDFs in Python for Google App Engine

One of my recent projects, built on Google App Engine (GAE) and Python, involved storing form data in the GAE datastore and generating PDF documents that the user can download. Storing the data was the easier part, as Google's datastore API is pretty well documented; the trickier part was converting that data to PDF using Python.

This was especially difficult because GAE does not provide the easy disk-write access that most PDF generation libraries require. To share what I learned, I'm writing this post about how to generate PDFs in Python for Google App Engine.

The solution I came across was, as far as I know, the only workable way of generating PDFs in Python on App Engine. There are about three PDF generation utilities in Python, each differing in its area of usage; the ones imported at the end of this post are ReportLab, xhtml2pdf and pyPdf.

After researching the above three libraries, I figured out that a combination of xhtml2pdf and pyPdf was what I needed. Since I already had the HTML document template ready, I just put placeholders for my form data, like __name__, __occupation__, etc., into it so that I could fill these in before converting to PDF.
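
A minimal sketch of that placeholder filling, written as a small helper. The function name `fill_placeholders` and the sample values are made up for illustration, and escaping the values with `xml.sax.saxutils.escape` is an extra precaution assumed here, so that a stray `<` or `&` in the form data does not break the HTML.

```python
from xml.sax.saxutils import escape


def fill_placeholders(html, values):
    """Replace each __key__ marker in html with its escaped value."""
    for key, value in values.items():
        # escape() turns &, < and > into HTML entities
        html = html.replace('__%s__' % key, escape(value))
    return html


template_html = '<p>__name__ works as __occupation__</p>'
filled = fill_placeholders(template_html, {
    'name': 'Jane Doe',
    'occupation': 'R&D engineer',
})
print(filled)  # <p>Jane Doe works as R&amp;D engineer</p>
```

Keeping the values in one dictionary means a new form field only needs a new key, not another replace() line.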

Now, I could fill in these values from my Python program, but the real challenge was storing the resulting PDF, since Google App Engine does not allow writing to disk. It turns out we don't need to store anything on disk at all. By sending the pisa.CreatePDF() output to a StringIO object, which lives in memory instead of on the filesystem, I could bypass the filesystem entirely!

# Read the HTML template shipped with the app
with open('template.htm', 'r') as f:
    sourceHtml = unicode(f.read(), errors='ignore')
# Fill the placeholders with the form data
sourceHtml = sourceHtml.replace('__name__', sname)
sourceHtml = sourceHtml.replace('__address__', saddress)
sourceHtml = sourceHtml.replace('__occupation__', will.occupation)
packet = StringIO.StringIO()  # in-memory buffer instead of a file on disk
pisa.CreatePDF(sourceHtml, dest=packet)  # render the HTML into the buffer
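
To make the in-memory trick a bit more explicit, here is a rough sketch that wraps the conversion in a function. The name `render_pdf_to_memory` is hypothetical, and the converter is passed in as an argument so the sketch does not depend on any particular library being installed. It assumes the converter behaves like pisa.CreatePDF, that is, it accepts `dest=` and returns a status object with an `err` attribute. It uses `io.BytesIO`, which plays the same role as StringIO for binary data on both Python 2.7 and Python 3.

```python
import io


def render_pdf_to_memory(source_html, create_pdf):
    """Run an HTML-to-PDF converter against an in-memory buffer.

    create_pdf is assumed to follow the pisa.CreatePDF convention: it is
    called as create_pdf(html, dest=file_like) and returns a status object
    whose err attribute is truthy when the conversion failed.
    """
    packet = io.BytesIO()  # lives in memory, nothing touches the disk
    status = create_pdf(source_html, dest=packet)
    if getattr(status, 'err', 0):
        raise ValueError('PDF conversion reported %s error(s)' % status.err)
    packet.seek(0)  # rewind so the next reader starts at the beginning
    return packet
```

With xhtml2pdf this would be called as `packet = render_pdf_to_memory(sourceHtml, pisa.CreatePDF)`. Checking `err` is optional, but it avoids sending a broken PDF to the user without noticing.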

Now, it would have been simple to just call self.response.write(packet.getvalue()) to send this PDF to the user as a download, but in my case I had to merge the generated PDF with another template PDF, which contained things like symbols, images and page numbers that for some reason could not be placed into the HTML document. So I had to create a PdfFileReader object (courtesy of the pyPdf library!) and then merge each page of my generated document with the corresponding page of the template document. Then where do I write this merged output? Any guesses? Another StringIO object! And finally, I write the contents of this StringIO object to self.response, so the user can download it.

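packet.seek(0)  # rewind so the reader starts at the beginning of the buffer
new = PdfFileReader(packet)  # generated pdf
templatePdf = PdfFileReader(open("template.pdf", "rb"))  # template pdf
output = PdfFileWriter()  # writer for the merged pdf
# template.pdf needs at least as many pages as the generated pdf
for i in range(new.getNumPages()):
    page = templatePdf.getPage(i)
    page.mergePage(new.getPage(i))  # draw the generated page over the template page
    output.addPage(page)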

outputStream = StringIO.StringIO()
output.write(outputStream) #write merged output to the StringIO object
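
The merge loop above quietly assumes that template.pdf has at least as many pages as the generated document; otherwise getPage(i) fails partway through. A hedged sketch of the same merge with an explicit check up front is below. The function name `merge_onto_template` is made up, and it only relies on the getNumPages, getPage, mergePage and addPage methods already used above, so it should accept pyPdf's reader and writer objects.

```python
def merge_onto_template(generated, template, writer):
    """Overlay each generated page on the matching template page.

    generated and template are PdfFileReader-like objects, and writer is a
    PdfFileWriter-like object (the same methods as in the loop above).
    """
    needed = generated.getNumPages()
    available = template.getNumPages()
    if available < needed:
        raise ValueError('template has %d page(s), but %d are needed'
                         % (available, needed))
    for i in range(needed):
        page = template.getPage(i)
        page.mergePage(generated.getPage(i))  # generated content goes on top
        writer.addPage(page)
    return writer
```

Failing early like this means the handler can return a proper error page instead of a half-built PDF.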

self.response.headers['Content-Type'] = 'application/pdf'
fname = will.name if mirror == 'n' else will.partner
# Quote the filename so browsers cope with unusual characters in it
self.response.headers['Content-Disposition'] = 'attachment; filename="' + str(fname).replace(' ', '_') + '.pdf"'
self.response.write(outputStream.getvalue())  # send the merged PDF bytes
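
Since the download name is built from user-entered data, it may be worth sanitising it before it goes into the header. The sketch below is one possible way to do that. The helper name `content_disposition` and the rule of keeping only ASCII letters, digits, `_`, `-` and `.` are assumptions made for illustration, not something App Engine requires.

```python
import re


def content_disposition(name, default='document'):
    """Build an attachment header value with a conservative .pdf filename."""
    # Spaces become underscores, as in the handler code above
    cleaned = name.strip().replace(' ', '_')
    # Drop anything that is not an ASCII letter, digit, underscore, hyphen or dot
    cleaned = re.sub(r'[^A-Za-z0-9_.-]', '', cleaned)
    if not cleaned:
        cleaned = default  # fall back when nothing usable is left
    return 'attachment; filename="%s.pdf"' % cleaned


print(content_disposition('Jane Doe'))  # attachment; filename="Jane_Doe.pdf"
```

In the handler this would be used as `self.response.headers['Content-Disposition'] = content_disposition(fname)`.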

Remember to add the libraries below to your project and import them before you do this:

import StringIO
import xhtml2pdf.pisa as pisa
from pyPdf import PdfFileWriter,PdfFileReader
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
from reportlab.pdfbase import pdfmetrics,ttfonts
