Bookmarks
Through experimentation, Haley Boles found that opening the PDFs in Adobe Acrobat Pro reveals “bookmarks” that separate most of the emails within each larger PDF, probably as a by-product of compiling the PDF directly from an email client. The bookmarks are usually titled with the subject line of the email (although we have not used this feature yet). Haley used the bookmarks to divide the larger PDFs into smaller sections that roughly correspond to individual emails (or email chains, if they are threaded). When the emails were primarily being used on Google Drive, working with these smaller sections made analysis easier than wrangling one long PDF. On Google Drive, the naming convention, which includes the bookmark number, is “deq #_Part####”.
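As a rough illustration of the naming convention, the sketch below parses a Drive file name into its two numbers. It assumes that the first `#` is the number of the source PDF and that `####` is a four-digit, zero-padded bookmark number; the function name `parse_part_name`, the `PartName` type, and the optional `.pdf` suffix are illustrative and not part of the project.

```python
import re
from typing import NamedTuple, Optional


class PartName(NamedTuple):
    pdf_number: int       # assumed meaning of the first "#"
    bookmark_number: int  # assumed meaning of "####"


# Assumed pattern: "deq <pdf number>_Part<4-digit bookmark number>",
# optionally followed by ".pdf".
_PART_RE = re.compile(r"^deq\s*(\d+)_Part(\d{4})(?:\.pdf)?$", re.IGNORECASE)


def parse_part_name(name: str) -> Optional[PartName]:
    """Return the PDF and bookmark numbers, or None if the name does not match."""
    match = _PART_RE.match(name.strip())
    if match is None:
        return None
    return PartName(int(match.group(1)), int(match.group(2)))
```

Returning `None` for names that do not match makes it easy to spot files that were named by hand or that fall outside the convention.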
When creating individual text files for each PDF page, we retained the bookmark number in the file naming convention, because it is helpful to know the PDF name, the page number, and the bookmark number. If and when we have to train a machine learning model to find distinct emails, bookmark information can serve as one of its inputs.
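One simple way bookmark information could feed into email detection is to treat all pages that share a PDF name and bookmark number as one candidate email. The sketch below shows that grouping only; it is not the project's method, and the tuple layout, the function name `group_pages_by_bookmark`, and the sample values are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record for one page-level text file:
# (pdf_name, bookmark_number, page_number)
PageKey = Tuple[str, int, int]


def group_pages_by_bookmark(
    pages: Iterable[PageKey],
) -> Dict[Tuple[str, int], List[int]]:
    """Group page numbers by (pdf_name, bookmark_number)."""
    groups: Dict[Tuple[str, int], List[int]] = defaultdict(list)
    for pdf_name, bookmark, page in pages:
        groups[(pdf_name, bookmark)].append(page)
    # Sort pages within each group so they read in document order.
    return {key: sorted(numbers) for key, numbers in groups.items()}


# Made-up sample data, for illustration only.
pages = [("deq 1", 1, 1), ("deq 1", 1, 2), ("deq 1", 2, 3), ("deq 2", 1, 1)]
candidates = group_pages_by_bookmark(pages)
```

Because the bookmarks only roughly correspond to individual emails, groups like these would be a starting point for a model or a manual review rather than a final answer.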
Soon I will organize the pages into a kind of table of contents or outline below.