Skip to content

Data source

Louise-Seamster edited this page May 5, 2020 · 2 revisions

Original data archive:

The dataset was originally made available on Michigan’s website at http://www.michigan.gov/snyder/0,4668,7-277-57577_57657-376716--,00.html, where it is still online as of 3/17/20. It is stored on the state’s website in PDF format, with lengths ranging from one page to ____ pages. The PDF naming convention is according to the name of a state department (presumably responding to the FOIA request represented in the PDF). Some FOIA requests are included in the PDF, but most PDFs do not begin with this documentation. It appears that PDFs may be roughly grouped by email sender, implying that the sequencing of PDF numbers and within-PDF order may correspond with a) FOIA response by department and b) email sender. Department identification and acronyms are in Appendix A.

Additional cached data not in our dataset:

A student in 2018 noticed some additional archived versions of this website that had additional PDFs. Currently, the dataset we are working with has only the data that is currently available on the website, although we have one copy of the additional PDFs (rendered through OCR) available on the Google Drive folder. In a later stage of the process we can add these PDFs into the overall dataset. Many of the cached PDFs are scanned copies of handwritten notes or paper documents with handwriting in the margins, so they will be more complex to analyze anyway.

Clone this wiki locally