building notebooks into doc #1091

Open

wants to merge 23 commits into main
Conversation

vuillaut
Member

@vuillaut vuillaut commented May 4, 2023

TODO (after this PR)

  1. go through every notebook
  • delete unnecessary ones (e.g. there are at least 3 notebooks on sensitivity calculation)
  • if they can be executed without external input, clear their output (they will then get executed during doc compilation)
  • if they require input and if possible, set the input paths to the available test data (relative paths after download) so the CI can build them
  • if some notebooks still require external input, use papermill to parameterize them (this adds complexity for users as well, so it should be avoided if possible)
  2. set nbsphinx_allow_errors = False to make sure notebooks don't break in the future
  3. organize them into categories to clarify the doc
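Point 2 above would be a small addition to the Sphinx `conf.py`; a sketch, assuming the docs are built with the nbsphinx extension (the option names below are nbsphinx's):

```python
# docs/conf.py -- sketch assuming nbsphinx builds the notebook pages
extensions = ["nbsphinx"]

# Fail the docs build when a notebook cell raises an error, so broken
# notebooks are caught by CI instead of being rendered silently.
nbsphinx_allow_errors = False

# "auto" re-executes only notebooks that have no stored outputs,
# matching the "clear their output" workflow described above.
nbsphinx_execute = "auto"
```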

@vuillaut vuillaut changed the title tying to build notebooks into doc Draft: tying to build notebooks into doc May 4, 2023
@vuillaut vuillaut marked this pull request as draft May 4, 2023 10:39
@vuillaut vuillaut changed the title Draft: tying to build notebooks into doc trying to build notebooks into doc May 4, 2023
@codecov

codecov bot commented May 4, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.50%. Comparing base (e4a5b7a) to head (e0c2b63).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1091      +/-   ##
==========================================
+ Coverage   73.49%   73.50%   +0.01%     
==========================================
  Files         134      134              
  Lines       14210    14210              
==========================================
+ Hits        10443    10445       +2     
+ Misses       3767     3765       -2     


@morcuended
Member

I think we should resume this effort. What about building the notebooks in the documentation, so they are kind of tested, as is done in Gammapy? Right now there are outdated notebooks, which are likely not working with the current lstchain version.

@moralejo
Collaborator

How does that work? How to set the path to the input files? (which should be added to the test data, right?)

@morcuended
Member

morcuended commented Feb 15, 2024

> How does that work? How to set the path to the input files? (which should be added to the test data, right?)

We could use sphinx-gallery

And, yes, if we wanted to use example datasets we would need to add them to the test data.


@vuillaut vuillaut marked this pull request as ready for review February 19, 2024 11:33
@vuillaut
Member Author

ok, coming back to this forgotten PR...
This is now working, adding all notebooks from notebooks/ into the documentation.

Some notes:

  • Going through every notebook and fixing the wrong or outdated ones would be too much work for this PR
  • At the moment most notebooks are rendered as they are and are not recompiled
  • Notebooks whose code-cell outputs are empty (this can be done with `jupyter nbconvert --clear-output --inplace notebook.ipynb`) will be rebuilt during compilation, ensuring they are executable
  • I would propose that people go through the notebooks they know best, fix them if needed, and clear the outputs in dedicated PRs; those notebooks will then be executed, making sure they stay up to date in the future
  • passing parameters to notebooks could be done later using papermill; I would propose to postpone this to another PR

Quick peek at the result:

[Screenshot: the rendered notebooks section of the documentation]

@morcuended
Member

Thanks for the update @vuillaut. I agree with your points.

I'm updating some of the notebooks I showed in the school in a separate PR (the docs are currently failing because of some of them)

@vuillaut
Member Author

> thanks for the update @vuillaut. I agree with your points.
>
> I'm updating some of the notebooks I showed in the school in a separate PR (docs are failing currently because of some of them)

Hi @morcuended
Thanks for the feedback. I have updated some notebooks, mainly for formatting issues; you might want to pull my branch first to avoid notebook conflicts that would be annoying to resolve.

@vuillaut
Member Author

Tests are now passing :)

@vuillaut vuillaut changed the title trying to build notebooks into doc building notebooks into doc Feb 19, 2024
@moralejo
Collaborator

> TODO (after this PR)
>
>   1. go through every notebook
>   • if they require input and if possible, set the input paths to the available test data (relative path after they have been downloaded) for the CI to be able to build them
>   • if there are still notebooks that require external input, use papermill to parameterize them (this will add complexity for users as well, should be avoided if possible)

Hi @vuillaut, can you clarify what is to be done for notebooks that need more input than we can have in the test data? (and why it is needed) Thanks!

@vuillaut
Member Author

> TODO (after this PR)
>
>   1. go through every notebook
>   • if they require input and if possible, set the input paths to the available test data (relative path after they have been downloaded) for the CI to be able to build them
>   • if there are still notebooks that require external input, use papermill to parameterize them (this will add complexity for users as well, should be avoided if possible)

> Hi @vuillaut, can you clarify what is it to be done for notebooks which need more input that we can have in the test data? (and why it is needed). Thanks!

For those, I would suggest (at least as a first step) either:

  • running them once with all the required data and leaving them as they are; they will then show the plots and results, but will be frozen in time and not tested
  • running them with the test data anyway; in this case they get tested during CI, but the output might not be the one you want in the doc (lower statistics, for example)

To get both, we need parameterized notebooks using papermill: we would keep the nice static version for the doc and test with dummy data in the CI, without updating the docs version. But as said, that needs more work and maintenance.
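A papermill-parameterized notebook of this kind would start with a cell tagged `parameters`, whose default values papermill overrides at execution time. A minimal sketch (the file names and variable names below are hypothetical, not from this repo):

```python
# First notebook cell, tagged "parameters" in its cell metadata.
# papermill injects an overriding cell after this one when run as, e.g.:
#   papermill analysis.ipynb out.ipynb -p dl2_file test_data/dl2_small.h5
dl2_file = "path/to/full_statistics_dl2.h5"  # hypothetical default path
n_events = None  # None means: use all events in the file
```

The docs would keep the static version built from the full-statistics defaults, while CI would execute the notebook with the small test file injected.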

@rlopezcoto
Contributor

Hi,

> to run them once with all the required data and leave them as they are - they will then show the plots and results but will be fixed in time and not tested

This is what we had in the past for some notebooks, and the repo got too heavy; that is why we decided to remove outputs from notebooks. We can try to keep the outputs small at the beginning, but they always get bulkier and bulkier, so I would try to avoid this solution.

@vuillaut
Member Author

Hi,

> to run them once with all the required data and leave them as they are - they will then show the plots and results but will be fixed in time and not tested
>
> this is what we had in the past for some notebooks and the repo got too heavy, that is why we decided to remove output from notebooks. We can try to keep output low at the beginning, but it always gets bulkier and bulkier, so I would try to avoid this solution

Note that many notebooks already have outputs, and the weight should not be that large if the notebooks are not updated regularly (a few MB?). But if they are updated, the old outputs stay in the git history and that becomes a problem; I see your point.
For the notebooks that can't be run during CI, we could either:

  • accept that they won't run and will stay without outputs in the doc
  • move them to a dedicated repo (reviving lstchain-extra)

@rlopezcoto
Contributor

> accept that they won't run and stay without outputs in the doc

We have lived with this option up to now and I do not see it as that bad, but I won't oppose the other solution, though it will also require more dedicated work.

@morcuended
Member

Some notebooks that use example data that is not available will look like this in the documentation:

[Screenshot: notebook rendered with an execution error]

@vuillaut
Member Author

> Some notebooks that use example data not available will look like this in the documentation

That is only the current status, because I did not go through all notebooks to identify and fix them:

  • if their output is already computed, they will stay as such and won't show errors
  • if we accept that they won't run and stay without outputs in the doc, we can force them not to be run and not show errors like this one

@morcuended
Member

> that is only the current status because I did not go through all notebooks to identify / fix them
>
> if their output is already computed, they will stay as such and won't show errors
> if we accept that they won't run and stay without outputs in the doc, we can force them to not be run and not show errors like this one

Then, for the time being, I would keep the previously computed outputs, so they are not changed in the docs deployment.
In the long term, I would include all the needed test/example data in the private test_data folder, so notebooks can access it and can be re-run anytime.

@vuillaut
Member Author

Hi
So, how do we move on from there?

@morcuended
Member

Coming back to this issue. I would say that, to have up-to-date working notebooks, they should be built in every doc build (using sphinx-gallery or something like that). Otherwise, we will again end up with very outdated notebooks after some time. It requires more initial work, but is easier to maintain and usable in the long run.

@vuillaut
Member Author

> Coming back to this issue. I would say that to have up-to-date working notebooks they should be produced in every doc build (using sphinx-gallery or something like that). Otherwise, we will end up having again very outdated notebooks after some time. It requires more initial work but is easier to maintain and usable in the long run.

Hi @morcuended
How do you propose to handle notebooks requiring external data?

@morcuended
Member

Maybe adding the needed minimal files to the test dataset

@morcuended morcuended added enhancement New feature or request question Further information is requested labels Jun 21, 2024
@morcuended morcuended mentioned this pull request Sep 17, 2024
@vuillaut
Member Author

Hi @morcuended
I am ready to move forward on this one.
Any suggestion or request?

@morcuended
Member

At least for the high-level analysis, we now have a small DL3 Crab dataset publicly available on Zenodo that we can use for checking that the notebooks work and are up to date.
For other notebooks, I would try to use our test data or MC test files as much as possible to validate them.
I agree with your proposals in the first comment of this PR.

@vuillaut
Member Author

vuillaut commented Sep 24, 2024

@morcuended
All notebooks are currently included in the documentation, but I have disabled their execution (most of them already have outputs, so "they look good", but they are not tested).

I propose to open another PR after this one to enable execution.
It will fail, because some notebooks are simply not up to date, or because they try to access data that is not available. I will list the original authors of each notebook so they can update them accordingly in that PR.

Labels
enhancement New feature or request question Further information is requested

4 participants