PDFs can be hard to work with. The first question is whether the file has embedded, recognizable text. Click and drag over the document to see whether the text highlights. If it does, you are in luck. If it doesn't, the pages are images, and you will have to convert them to text by scanning them with text-recognition (OCR) software. This is time consuming and probably not worth it for this class.

When the text is recognizable, xpdf will convert it, but it can be a little tricky. Its pdftotext tool is run from the command line, so you need to know the basics of how to use the command line - not hard.

Mac users: it may already be on your machine. Try typing 'pdftotext', the path to a .pdf file, and the name of an output .txt file. Then check to see whether the output file was created.

Windows users: a Windows .exe installer can be found here: http://www.compgeom.com/~piyush/scripts/scripts.html. It worked well for me, but you have to find the xpdf directory and work from it on the command line. Then type 'pdftotext' to see whether it launches. To convert a file, type 'pdftotext' followed by the location and name of the PDF file. The output will be a .txt file of the same name in the same directory.

It is also possible to embed the command-line calls in a Python script (if, for example, you want to loop through a bunch of files). The code below worked on a Mac but not on a PC(?). On a Mac, the command-line form is 'pdftotext path/inputfile.pdf outputfile.txt'. On a PC it would be 'pdftotext path/inputfile.pdf' (an inputfile.txt file is created by default).

    import os

    pdf_path = 'Downloads/FOMC20080121confcall.pdf'  # local location of the PDF
    # os.system hands the string to the shell, as if it were typed on the command line
    os.system('pdftotext %s %s' % (pdf_path, 'temp.txt'))
    with open('temp.txt', 'rb') as f:
        text = f.read()  # returns the entire file (as bytes, because of 'rb')

The code below works on a PC. It could also be part of a function that loops through lots of files. TO REPEAT, THIS IS CALLING A PROGRAM THAT WORKS FROM THE COMMAND LINE (os.system does this). YOU JUST NEED TO TELL pdftotext WHERE TO FIND THE FILE TO BE CONVERTED.

    import os

    # %cd is an IPython/Jupyter magic; in a plain script use os.chdir(...) instead
    %cd C:\\Program Files (x86)\\xpdf
    pdf_path = 'C:\\Users\\John\\Desktop\\Programs\\Pyth\\FOMC2\\FOMC20081029meeting.pdf'
    os.system('pdftotext %s' % pdf_path)  # the .txt file is written next to the PDF
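
The notes above mention looping through a bunch of files. Here is a minimal sketch of what that could look like, not a tested recipe for every setup. It swaps os.system for subprocess.run, which takes the command as a list so that paths containing spaces survive intact. The function names build_command and convert_folder, and the exe and run parameters, are made up for illustration; the only thing assumed about pdftotext is the 'pdftotext input.pdf output.txt' form described above.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, txt_path, exe="pdftotext"):
    """Return the argument list for one pdftotext call."""
    # A list (rather than one formatted string) keeps paths that
    # contain spaces from being split into separate arguments.
    return [exe, str(pdf_path), str(txt_path)]


def convert_folder(folder, exe="pdftotext", run=subprocess.run):
    """Call pdftotext on every .pdf in `folder`; return the .txt paths requested."""
    outputs = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        # Same name, same directory, .txt extension
        txt_path = pdf_path.with_suffix(".txt")
        # check=True raises CalledProcessError if pdftotext reports a failure
        run(build_command(pdf_path, txt_path, exe), check=True)
        outputs.append(txt_path)
    return outputs
```

Something like convert_folder('Downloads') would then convert every PDF in that folder. On a PC, if pdftotext is not on the PATH, the exe argument can be given the full location of the program inside the xpdf directory instead of changing into that directory first; the exact file name there is something to check on your own machine.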