Workflow for archiving reports and other sporadically released documents #162
Comments
Are you looking for an automated workflow or a manual one?
With sporadic reports, it would probably be manually curated, although I suppose I could create some Python functions to make it easier to sort and process these files into the archive.
What I think would be nice is to have a small web form where one can submit such documents. On the back end, a function could then download the document and handle the necessary processing. The documents could then be reviewed and added to the archive if approved. (It could even be a Google Form, though I'm not sure whether the document could be downloaded automatically.) I'm not sure it is worth building automatic scraping for sporadic documents, although that would help in not missing them.
I think a web form is a great idea. You can attach files of virtually any size to a Google Form (e.g., a ZIP file with all the documents), so it should be good for mass collection. I don't think I would automate document ingestion, since that could end up producing a lot of duplicates or poorly classified files. I'd probably do it manually. I think the most important thing at this stage is just making sure that these documents are preserved. I'm sure a lot of people have been downloading these to their hard drives and would be willing to upload them if prompted. I'll get to work on drafting a form.
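As a rough illustration of the Python helper functions and the duplicate concern raised above, here is a minimal sketch of a function that files a manually reviewed document into an archive folder and skips byte-identical copies. The function names, the `category` and `date` parameters, and the `<date>_<original name>` naming scheme are all assumptions made for this example; they are not taken from the archival tool.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_into_archive(src: Path, archive_root: Path,
                      category: str, date: str) -> Optional[Path]:
    """Copy src into archive_root/category/ as '<date>_<original name>'.

    Returns the new path, or None if a byte-identical file already
    exists anywhere under archive_root. The layout is hypothetical.
    """
    src_hash = sha256_of(src)
    target_dir = archive_root / category
    target_dir.mkdir(parents=True, exist_ok=True)

    # Duplicate check: compare against every file already archived.
    for existing in archive_root.rglob("*"):
        if existing.is_file() and sha256_of(existing) == src_hash:
            return None

    target = target_dir / f"{date}_{src.name}"
    if target.exists():
        # Same name but different content: leave it for manual review.
        raise FileExistsError(target)
    shutil.copy2(src, target)
    return target
```

Hashing every archived file on each call is slow for a large archive; a stored index of hashes would be the obvious refinement. Classification stays manual here, since the reviewer supplies `category` and `date`.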
@mschoettle I've mocked up a Google Form for data submission. What do you think?
Looks good. The only thing I don't like is that one has to log in with their Google account :) I don't know how many submissions you are expecting, but this could potentially add a lot of work (for you/the CCODWG) to go through them, verify that they are not already archived, etc.
Thanks. Regarding the effort, I'd rather collect the data now so it doesn't get lost. Organizing it can come later... Looks like the sign-in is required because of the file submission part. I'll move that to a separate form, which should remove the sign-in requirement.
@mschoettle It should now be possible to complete the form without signing in. I've moved the file upload to a second form, which is linked to after submission of the initial form.
Just tried it out. Not sure if you intended this, but the second form still requires a login.
- Remove "Contributing" section from README.md, including contribution form (#162)
Missing from the archive are reports such as the PHO daily and weekly epidemiology reports, as well as more sporadically updated reports such as the INSPQ modelling reports mentioned by @mschoettle in #158.
These reports and other documents typically lack stable URLs, which makes it difficult to fit them into the existing workflow of the archival tool. While some of them could probably be incorporated because their URLs are predictable and/or can be scraped reliably, this certainly doesn't apply to the more sporadically released documents. In many cases, the existing file naming structure of the archive may not really apply to them either.
Help will be needed to ensure these documents are preserved for the future.