
Should there be a PrePARE testing suite? #491

Open

mauzey1 opened this issue May 24, 2019 · 12 comments

@mauzey1
Collaborator

mauzey1 commented May 24, 2019

There are currently tests for the C, Fortran, and Python CMOR API but no tests for PrePARE. This would be helpful in making sure PrePARE is functioning as intended after making changes.

We could have a list of sample NetCDF files, or generate test data with CMOR. Any suggestions?
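For the generate-with-CMOR option, a rough sketch of writing one tiny file with the CMOR Python API (the table directory, input JSON, and variable choice here are all placeholders, not a worked-out test case):

```python
import numpy
import cmor

# Placeholder paths: point at a real CMIP6 table directory and a CMOR
# input JSON carrying the required global attributes.
cmor.setup(inpath="Tables", netcdf_file_action=cmor.CMOR_REPLACE)
cmor.dataset_json("CMOR_input_example.json")
cmor.load_table("CMIP6_Amon.json")

# One time step on a 2x2 grid -- just enough structure for PrePARE to inspect.
time = cmor.axis(table_entry="time", units="days since 2000-01-01",
                 coord_vals=numpy.array([15.5]),
                 cell_bounds=numpy.array([0.0, 31.0]))
lat = cmor.axis(table_entry="latitude", units="degrees_north",
                coord_vals=numpy.array([-45.0, 45.0]),
                cell_bounds=numpy.array([-90.0, 0.0, 90.0]))
lon = cmor.axis(table_entry="longitude", units="degrees_east",
                coord_vals=numpy.array([90.0, 270.0]),
                cell_bounds=numpy.array([0.0, 180.0, 360.0]))

ts = cmor.variable(table_entry="ts", units="K", axis_ids=[time, lat, lon])
cmor.write(ts, numpy.full((1, 2, 2), 288.0))
print(cmor.close(ts, file_name=True))  # prints the path of the file CMOR wrote
```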

@doutriaux1
Collaborator

@mauzey1 I think the tests based off the CV are the PrePARE tests. @taylor13 @dnadeau4 am I right?

@mauzey1
Collaborator Author

mauzey1 commented May 24, 2019

@doutriaux1 Those tests seem to be only testing the CMOR Python interface and CMIP6 tables. I was thinking of tests for the PrePARE.py script.

@doutriaux1
Collaborator

@mauzey1 I guess you're right. @taylor13 any ideas for useful tests we could develop?

@taylor13
Collaborator

Yes, this is a good idea. Not sure I will have time soon to think about this.

@mauzey1
Collaborator Author

mauzey1 commented Jan 3, 2020

@taylor13 @durack1 @sashakames

I would like to get back to this since there are many open issues related to PrePARE (#532, #533, #534, #540, #541, #553). It would be helpful to have continuous-integration tests for PrePARE covering the changes that will be added to it. The run_prepare_tests job currently used in CircleCI only tests the CMIP6 CV, not the PrePARE script.

I think we should make a directory of small NetCDF files with flaws that PrePARE should catch, as well as some that should pass.

There should be a script that runs PrePARE on each test file and captures the stdout and stderr output to check that it matches what we expect.
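A minimal sketch of such a driver, assuming a hypothetical tests/prepare/ layout where every sample .nc file is paired with a .expected text file (the --table_path flag should match however PrePARE is invoked locally):

```python
import glob
import os
import subprocess

# Hypothetical layout: each tests/prepare/*.nc has a matching *.expected
# text file holding the PrePARE output we require (names are placeholders).
TABLE_PATH = "Tables"  # placeholder CMIP6 table directory

failures = []
for nc_file in sorted(glob.glob("tests/prepare/*.nc")):
    result = subprocess.run(
        ["PrePARE", "--table_path", TABLE_PATH, nc_file],
        capture_output=True, text=True)
    with open(os.path.splitext(nc_file)[0] + ".expected") as f:
        expected = f.read().strip()
    if expected not in result.stdout + result.stderr:
        failures.append(nc_file)
        print("MISMATCH for %s (exit code %d)" % (nc_file, result.returncode))

raise SystemExit(1 if failures else 0)
```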

Any suggestions are welcome.

@durack1
Contributor

durack1 commented Jan 3, 2020

@mauzey1 thanks again for raising this, a test suite is a great idea.

To be honest, we have a huge multi-PB archive of CMIP6 files mounted on the css03 hardware, so coming up with a very comprehensive test suite wouldn't be an issue (we have every pathology you've ever thought of somewhere in the ~1 million files). I suppose CircleCI runs on remote systems, right? So we can't mount it directly?

@taylor13
Collaborator

taylor13 commented Jan 3, 2020

Running PrePARE on a million files (if that is what is suggested), especially the very large files in the CMIP6 archive, would seem to be inefficient, and perhaps not even practical.

One way to make incremental progress would be to design a test each time we add or modify a PrePARE check, to determine whether it actually catches and correctly describes the problem. You may have a more ambitious (and comprehensive) testing strategy in mind, which I'd be happy to discuss next week.

I'm not sure the test suite should necessarily hold up moving forward on the issues mentioned above.

@sashakames
Collaborator

Agreed that a million files is too many, though a limit of ~100 representative files is reasonable for a test suite that we expect to run repeatedly and that others can easily run to verify that their installation is working properly.

For starters, the files are here, but we could provide a script to allow others to download them.

As an aside, with regard to testing, I've run with tens of thousands of files via the publisher and could easily continue that.

@taylor13
Collaborator

taylor13 commented Jan 3, 2020

I'm not sure the number of files matters, especially if they are all QC'd files in the CMIP6 archive. How would such files test PrePARE's ability to identify non-compliant files? I think we need to specially construct non-compliant files and then see if PrePARE finds them and provides helpful error messages to users.
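For example, starting from a copy of a compliant file, one required global attribute could be deliberately mangled (a netCDF4 sketch; the file name and attribute choice are just illustrations):

```python
from netCDF4 import Dataset

# Start from a copy of a known-good file and break exactly one thing,
# so each test file exercises a single PrePARE check.
with Dataset("tests/prepare/bad_experiment_id.nc", "a") as ds:
    ds.setncattr("experiment_id", "not-a-real-experiment")
```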

@durack1
Contributor

durack1 commented Jan 3, 2020

Sorry folks, I should have been far clearer. I was not suggesting that the test suite use a million files; rather, I was suggesting that we select a subset of these files that captures known pathologies and build the test suite on that comprehensive pathology archive. If we encounter new pathologies, we increment the test suite by one or more files. Also, for the purposes of the test suite, we could temporally subset the test files to include a single time step, reducing storage footprints and file copy/read times.
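The temporal subsetting could be as simple as the following xarray sketch (paths are placeholders; NCO's `ncks -d time,0,0` would do the same job):

```python
import xarray as xr

# Keep only the first time step to shrink the test file.
ds = xr.open_dataset("full_archive_file.nc")
ds.isel(time=slice(0, 1)).to_netcdf("test_suite/one_timestep.nc")
ds.close()
```

One caveat: xarray may rewrite variable encodings on output, so NCO may be preferable when a test needs byte-for-byte fidelity to the original file.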

@sashakames
Collaborator

Given that many files already published contain various pathologies (due to several causes), I agree with Paul that we can and should source these from the archive.

The related issues are mainly (1) "soft" false negatives, i.e. warnings that should be errors (returning -1 or False), and (2) poor error messages that leave the user scratching their head. Even when these are fixed, the test suite won't guarantee that we don't encounter additional files that produce (1) or (2). Additionally, I haven't yet seen any false positives; I think it's more likely we hear from a user that data fails when they are certain it should pass. Catching these isn't the goal of the general regression test suite, but it is something worth doing long-term.
