Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Handling pagination of repository vulnerabilities #85

Merged
merged 9 commits into from
Oct 5, 2023

Conversation

JoseAngel1196
Copy link
Contributor

@JoseAngel1196 JoseAngel1196 commented Oct 4, 2023

Fixes #7

Description

In this PR, we're addressing pagination for repository vulnerabilities. To put it simply, imagine If we have a substantial number of vulnerabilities for a repository (which may not be ideal), we need to ensure we handle them appropriately. It's also not ideal to skip these vulnerabilities, as we are currently doing today.

I have ensured that these new changes do not break the reporting functionality. I have verified this locally using a test account.

@codecov
Copy link

codecov bot commented Oct 4, 2023

Codecov Report

Merging #85 (b2915cc) into main (205b72b) will decrease coverage by 1.35%.
Report is 1 commits behind head on main.
The diff coverage is 18.75%.

@@            Coverage Diff             @@
##             main      #85      +/-   ##
==========================================
- Coverage   75.41%   74.07%   -1.35%     
==========================================
  Files          15       15              
  Lines         716      729      +13     
==========================================
  Hits          540      540              
- Misses        168      180      +12     
- Partials        8        9       +1     
Flag Coverage Δ
unittests 74.07% <18.75%> (-1.35%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
querying/github.go 76.31% <18.75%> (-9.83%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@JoseAngel1196 JoseAngel1196 marked this pull request as ready for review October 4, 2023 23:57
@JoseAngel1196 JoseAngel1196 requested a review from a team as a code owner October 4, 2023 23:57
Copy link
Contributor

@tarkatronic tarkatronic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, there's a problem with this approach. This will drastically increase the number of calls made to the GraphQL API, which will both slow us down and potentially cause rate limiting. Previously, given n = number of repositories, we would perform n / 100 API calls. Now we perform n + 1 calls. Given a sufficiently large org, such as one containing 500 repositories, we just went from 5 calls to 501. And that's assuming we never actually paginate the vulnerabilities, which is the whole point here!

I think a better way we could accomplish this is through a combination of approaches. In other words, only perform these repo-specific queries when necessary.

Taking another look at this, I believe this could actually be somewhat easy using mostly just the pre-existing data structures. The current flow would work as-is, but then when HasNextPage is true for a repositories vulnerabilities, you would send a new query, utilizing the EndCursor, which would look something like this:

type repoVulnerabilityQuery struct {
  Repository orgRepo `graphql:"repository(name: $repoName, owner: $orgName)"`
}

I think that might be the only new data structure necessary.. and then we would continue to have our n / 100 queries, except in cases where vulnerabilities are paginated.

I feel like I may have rambled a bit in there, so I hope it all makes sense. Please feel free to ask if you have any questions!

@JoseAngel1196
Copy link
Contributor Author

JoseAngel1196 commented Oct 5, 2023

Hmmm, there's a problem with this approach. This will drastically increase the number of calls made to the GraphQL API, which will both slow us down and potentially cause rate limiting. Previously, given n = number of repositories, we would perform n / 100 API calls. Now we perform n + 1 calls. Given a sufficiently large org, such as one containing 500 repositories, we just went from 5 calls to 501. And that's assuming we never actually paginate the vulnerabilities, which is the whole point here!

I think a better way we could accomplish this is through a combination of approaches. In other words, only perform these repo-specific queries when necessary.

Taking another look at this, I believe this could actually be somewhat easy using mostly just the pre-existing data structures. The current flow would work as-is, but then when HasNextPage is true for a repositories vulnerabilities, you would send a new query, utilizing the EndCursor, which would look something like this:

type repoVulnerabilityQuery struct {
  Repository orgRepo `graphql:"repository(name: $repoName, owner: $orgName)"`
}

I think that might be the only new data structure necessary.. and then we would continue to have our n / 100 queries, except in cases where vulnerabilities are paginated.

I feel like I may have rambled a bit in there, so I hope it all makes sense. Please feel free to ask if you have any questions!

Alright Joey, thanks for the feedback. I have addressed the comment, lmk what you think. It was smart thinking about the rate limiting. Will keep this in mind for the future!


nextCursor := repositoryQuery.Repository.VulnerabilityAlerts.PageInfo.EndCursor
log.Info().Any("alertCursor", queryVars["alertCursor"]).Msg("Querying for more vulnerabilities for a repository.")
return gh.processRepoFindings(projects, repositoryQuery.Repository, nextCursor)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have verified this locally, will add some unit test for this in a follow up PR.

Copy link
Contributor

@tarkatronic tarkatronic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I love the recursive approach. And this is so much easier than I had feared in my head!

I just wanted to quickly address these last couple of comments, but aside from that this looks all ready!

querying/github.go Outdated Show resolved Hide resolved
querying/github.go Outdated Show resolved Hide resolved
querying/github.go Outdated Show resolved Hide resolved
Copy link
Contributor

@tarkatronic tarkatronic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing! Wonderful! Love it! Thanks so much for contributing this @JoseAngel1196

@tarkatronic tarkatronic merged commit 0e1fb03 into main Oct 5, 2023
28 of 30 checks passed
@tarkatronic tarkatronic deleted the jose/fixes-7 branch October 5, 2023 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle pagination of repository vulnerabilities
2 participants