Workflow for archiving reports and other sporadically released documents #162

jeanpaulrsoucy · 2021-05-08T18:04:19Z

The archive is missing reports such as the PHO daily and weekly epidemiology reports, as well as more sporadically updated reports such as the INSPQ modelling reports mentioned by @mschoettle in #158.

These reports and other documents typically lack stable URLs, which makes it difficult to fit them into the existing workflow of the archival tool. While some of them could probably be incorporated because their URLs are predictable and/or can be scraped predictably, this certainly doesn't apply to the more sporadically released documents. In many cases, the existing file naming structure of the archive may not really apply to them either.

Help will be needed to ensure these documents are preserved for the future.

mschoettle · 2021-05-11T00:26:58Z

Are you looking for an automated workflow or a manual one?

jeanpaulrsoucy · 2021-05-11T02:21:09Z

With sporadic reports, it would probably be manually curated, although I suppose I could create some Python functions to make it easier to sort/process these files into the archive.

mschoettle · 2021-05-22T14:50:04Z

What I think would be nice is to have a small web form where one can submit such documents. On the back end, a function could then download the document and do all the necessary processing. Those documents could then be reviewed and added to the archive if approved. (It could even be a Google Form, but I'm not sure if you could download the document automatically.)

I'm not sure it is worth building automatic scraping for sporadic documents, although that would help avoid missing them.

jeanpaulrsoucy · 2021-05-24T14:02:28Z

I think a web form is a great idea. You can attach files of virtually any size to a Google Form (e.g., a ZIP file with all the documents), so it should be good for mass collection. I don't think I would automate document ingestion; we could end up with a lot of duplicates or poorly classified files this way. I'd probably do it manually. I think the most important thing at this stage is just making sure that these documents are preserved.

I'm sure a lot of people have been downloading these to their hard drive and would be willing to upload if prompted. I'll get to work on drafting a form.

jeanpaulrsoucy · 2021-09-29T00:35:25Z

@mschoettle I've mocked up a Google Form for data submission. What do you think?

https://docs.google.com/forms/d/e/1FAIpQLSeiUd415u_qdqNwNHVEeA_6KCEMRJhXJSL9_9i1UvLDN3LGQA/viewform?usp=sf_link

mschoettle · 2021-09-29T00:47:35Z

Looks good. The only thing I don't like is that one has to log in with their Google account :)

I don't know how many submissions you are expecting, but this could potentially add a lot of work (for you/the CCODWG) to go through them, verify that they are not already archived, etc.

jeanpaulrsoucy · 2021-09-29T01:11:04Z

Thanks. Regarding the effort, I'd rather collect the data now so it doesn't get lost. Organizing it can come later...

Looks like the sign-in is required because of the file submission part. I'll move that to a separate form, which should remove the sign-in requirement.

jeanpaulrsoucy · 2021-09-29T01:23:59Z

@mschoettle It should now be possible to complete the form without signing in. I've moved the file upload to a second form which is linked to after submission of the initial form.

mschoettle · 2021-10-02T18:19:13Z

Just tried it out. Not sure if you intended this but the second form still requires a login.
But it's possible to upload the files somewhere and put that URL in the first form so that's what I did.


jeanpaulrsoucy added the enhancement, help wanted, and new data labels May 8, 2021

jeanpaulrsoucy changed the title ~~Create workflow for archiving reports and other sporadicly updated files~~ Create workflow for archiving reports and other sporadicly released documents May 8, 2021

jeanpaulrsoucy pinned this issue May 8, 2021

jeanpaulrsoucy mentioned this issue May 8, 2021

Archive some INSPQ variant CSVs #158

Closed

jeanpaulrsoucy changed the title ~~Create workflow for archiving reports and other sporadicly released documents~~ Workflow for archiving reports and other sporadicly released documents Feb 10, 2022

jeanpaulrsoucy unpinned this issue Apr 7, 2024

jeanpaulrsoucy mentioned this issue Apr 7, 2024

Project roadmap #298

Open

jeanpaulrsoucy added a commit that referenced this issue Apr 12, 2024

Update README.md

9fa0720

- Remove "Contributing" section from README.md, including contribution form (#162)
