Workflow for archiving reports and other sporadically released documents #162

Open
jeanpaulrsoucy opened this issue May 8, 2021 · 9 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed), new data (Request to add a new dataset)

Comments

@jeanpaulrsoucy
Member

Missing from the archive are reports such as the PHO daily and weekly epidemiology reports, as well as more sporadically updated reports such as the INSPQ modelling reports mentioned by @mschoettle in #158.

These reports and other documents typically lack stable URLs, which makes it difficult to fit them into the existing workflow of the archival tool. Some of them could probably be incorporated because their URLs are predictable or can be scraped reliably, but this certainly doesn't apply to the more sporadically released documents. In many cases, the archive's existing file naming structure may not really apply to them either.

Help will be needed to ensure these documents are preserved for the future.

jeanpaulrsoucy added the enhancement, help wanted, and new data labels on May 8, 2021
jeanpaulrsoucy changed the title from "Create workflow for archiving reports and other sporadically updated files" to "Create workflow for archiving reports and other sporadically released documents" on May 8, 2021
jeanpaulrsoucy pinned this issue on May 8, 2021
@mschoettle
Contributor

Are you looking for an automated workflow or a manual one?

@jeanpaulrsoucy
Member Author

With sporadic reports, it would probably be manually curated, although I suppose I could create some Python functions to make it easier to sort/process these files into the archive.
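A minimal sketch of what such a helper could look like, assuming a hypothetical `<archive root>/<source>/<name>_<date>` layout. The function names, the folder layout, and the date suffix are illustrative assumptions, not the archive's actual conventions.

```python
from datetime import date
from pathlib import Path
import shutil


def archive_path(archive_root: Path, source: str, filename: str, released: date) -> Path:
    """Build a destination like <root>/<source>/<stem>_<YYYY-MM-DD><suffix>.

    The layout is a hypothetical example, not the archive's real structure.
    """
    original = Path(filename)
    dated_name = f"{original.stem}_{released.isoformat()}{original.suffix}"
    return archive_root / source / dated_name


def file_into_archive(src: Path, archive_root: Path, source: str, released: date) -> Path:
    """Copy a manually collected document into the archive without overwriting."""
    dest = archive_path(archive_root, source, src.name, released)
    if dest.exists():
        # Refuse to clobber a document that was already filed under this name.
        raise FileExistsError(f"{dest} is already in the archive")
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # copy2 also preserves the file's timestamps
    return dest
```

The release date is passed in by the curator rather than inferred, which keeps the manual review step in charge of classification.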

@mschoettle
Contributor

What I think would be nice is a small web form where one can submit such documents. On the back end, a function could then download the document and handle the necessary processing. The documents could then be reviewed and added to the archive if approved. (It could even be a Google Form, but I'm not sure whether the document could be downloaded automatically.)

I'm not sure it is worth building automatic scraping for sporadic documents, although that would help in not missing them.

@jeanpaulrsoucy
Member Author

I think a web form is a great idea. You can attach files of virtually any size to a Google Form (e.g., a ZIP file with all the documents), so it should be good for mass collection. I don't think I would automate document ingestion, since that could produce a lot of duplicates or poorly classified files. I'd probably do it manually. I think the most important thing at this stage is just making sure that these documents are preserved.

I'm sure a lot of people have been downloading these to their hard drive and would be willing to upload if prompted. I'll get to work on drafting a form.

@jeanpaulrsoucy
Member Author

@mschoettle I've mocked up a Google Form for data submission. What do you think?

https://docs.google.com/forms/d/e/1FAIpQLSeiUd415u_qdqNwNHVEeA_6KCEMRJhXJSL9_9i1UvLDN3LGQA/viewform?usp=sf_link

@mschoettle
Contributor

Looks good. The only thing I don't like is that one has to log in with their Google account :)

I don't know how many submissions you are expecting, but this could add a lot of work (for you or the CCODWG) to go through them, verify that they are not already archived, and so on.
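One way to lighten the "not already archived" check is to compare file checksums. The sketch below is only an illustration: `sha256_of` and `split_new_and_duplicate` are hypothetical helper names, and it assumes submissions and archived files are available as local files.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def split_new_and_duplicate(submissions, archive_root: Path):
    """Separate submitted files into new ones and byte-identical duplicates."""
    # Checksums of everything already stored under the archive root.
    known = {sha256_of(p) for p in archive_root.rglob("*") if p.is_file()}
    new, duplicates = [], []
    for path in submissions:
        checksum = sha256_of(path)
        if checksum in known:
            duplicates.append(path)
        else:
            new.append(path)
            known.add(checksum)  # also catches repeats within one batch
    return new, duplicates
```

This only catches byte-identical copies. The same report saved with different metadata would still need a manual look.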

@jeanpaulrsoucy
Member Author

Thanks. Regarding the effort, I'd rather collect the data now so it doesn't get lost. Organizing it can come later...

Looks like the sign-in is required because of the file submission part. I'll move that to a separate form, which should remove the sign-in requirement.

@jeanpaulrsoucy
Member Author

@mschoettle It should now be possible to complete the form without signing in. I've moved the file upload to a second form which is linked to after submission of the initial form.

@mschoettle
Contributor

Just tried it out. Not sure if you intended this, but the second form still requires a login.
It is possible to upload the files somewhere else and put that URL in the first form, though, so that's what I did.

jeanpaulrsoucy changed the title from "Create workflow for archiving reports and other sporadically released documents" to "Workflow for archiving reports and other sporadically released documents" on Feb 10, 2022
jeanpaulrsoucy unpinned this issue on Apr 7, 2024
jeanpaulrsoucy added a commit that referenced this issue Apr 12, 2024
- Remove "Contributing" section from README.md, including contribution form (#162)

2 participants