feat: Handling pagination of repository vulnerabilities #85

JoseAngel1196 · 2023-10-04T22:15:17Z

Fixes #7

Description

In this PR, we're addressing pagination for repository vulnerabilities. Put simply, if a repository has a substantial number of vulnerabilities (which may not be ideal in itself), we need to handle all of them appropriately rather than skipping the ones that don't fit on the first page, as we do today.

I have ensured that these new changes do not break the reporting functionality. I have verified this locally using a test account.

codecov · 2023-10-04T23:47:09Z

Codecov Report

Merging #85 (b2915cc) into main (205b72b) will decrease coverage by 1.35%.
Report is 1 commit behind head on main.
The diff coverage is 18.75%.

@@            Coverage Diff             @@
##             main      #85      +/-   ##
==========================================
- Coverage   75.41%   74.07%   -1.35%     
==========================================
  Files          15       15              
  Lines         716      729      +13     
==========================================
  Hits          540      540              
- Misses        168      180      +12     
- Partials        8        9       +1

Flag	Coverage Δ
unittests	`74.07% <18.75%> (-1.35%)`	⬇️

Flags with carried forward coverage won't be shown.

Files	Coverage Δ
querying/github.go	`76.31% <18.75%> (-9.83%)`	⬇️

tarkatronic

Hmmm, there's a problem with this approach. This will drastically increase the number of calls made to the GraphQL API, which will both slow us down and potentially cause rate limiting. Previously, given n = number of repositories, we would perform n / 100 API calls. Now we perform n + 1 calls. Given a sufficiently large org, such as one containing 500 repositories, we just went from 5 calls to 501. And that's assuming we never actually paginate the vulnerabilities, which is the whole point here!
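To make the arithmetic above concrete, here is a minimal Go sketch comparing the two call counts. It assumes 100 repositories per page, as the n / 100 figure implies, and rounds up so a final partial page still costs one call; `callsBatched` and `callsPerRepo` are illustrative names, not functions from querying/github.go.

```go
package main

import "fmt"

// reposPerPage is an assumption taken from the n / 100 figure above:
// the org-level query returns 100 repositories per call.
const reposPerPage = 100

// callsBatched counts calls when repositories (with the first page of their
// vulnerability alerts inline) are fetched 100 at a time. Rounding up
// accounts for a final, partially filled page.
func callsBatched(n int) int {
	return (n + reposPerPage - 1) / reposPerPage
}

// callsPerRepo mirrors the review's n + 1 figure: one call to list the
// repositories, then one vulnerability query per repository.
func callsPerRepo(n int) int {
	return n + 1
}

func main() {
	n := 500
	fmt.Printf("batched: %d calls, per-repo: %d calls\n", callsBatched(n), callsPerRepo(n))
}
```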

I think a better way we could accomplish this is through a combination of approaches. In other words, only perform these repo-specific queries when necessary.

Taking another look at this, I believe this could actually be fairly easy using mostly the pre-existing data structures. The current flow would work as-is, but when HasNextPage is true for a repository's vulnerabilities, you would send a new query using the EndCursor, which would look something like this:

type repoVulnerabilityQuery struct {
  Repository orgRepo `graphql:"repository(name: $repoName, owner: $orgName)"`
}

I think that might be the only new data structure necessary, and then we would continue to have our n / 100 queries, except in cases where vulnerabilities are paginated.
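A rough Go sketch of the call count under this hybrid approach: the batched org-level queries stay, and only repositories whose alerts overflow the first page add follow-up repository queries. It assumes 100 repositories per org-level page and a hypothetical `alertsPerPage` of 100 for the vulnerability connection; the real page sizes in querying/github.go may differ.

```go
package main

import "fmt"

// Assumed page sizes: 100 repositories per org-level page (as in n / 100)
// and a hypothetical 100 alerts per vulnerabilityAlerts page.
const (
	reposPerPage  = 100
	alertsPerPage = 100
)

// ceilDiv returns a / b rounded up, for non-negative a and positive b.
func ceilDiv(a, b int) int { return (a + b - 1) / b }

// hybridCalls estimates total GraphQL calls: the batched org queries, plus
// one follow-up repository query per extra page of alerts beyond the first
// page that already came back inline with the org query.
func hybridCalls(alertCounts []int) int {
	calls := ceilDiv(len(alertCounts), reposPerPage)
	for _, alerts := range alertCounts {
		if alerts > alertsPerPage {
			calls += ceilDiv(alerts-alertsPerPage, alertsPerPage)
		}
	}
	return calls
}

func main() {
	counts := make([]int, 500) // 500 repositories, most with few alerts
	counts[0] = 250            // needs 2 extra pages
	counts[1] = 101            // needs 1 extra page
	fmt.Println(hybridCalls(counts))
}
```

With 500 repositories, two of which overflow, this comes to 5 + 2 + 1 = 8 calls rather than 501.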

I feel like I may have rambled a bit in there, so I hope it all makes sense. Please feel free to ask if you have any questions!

JoseAngel1196 · 2023-10-05T18:23:06Z

Alright Joey, thanks for the feedback. I've addressed the comment; let me know what you think. Good call on the rate limiting, I'll keep it in mind for the future!

JoseAngel1196 · 2023-10-05T18:39:58Z

querying/github.go

+
+		nextCursor := repositoryQuery.Repository.VulnerabilityAlerts.PageInfo.EndCursor
+		log.Info().Any("alertCursor", queryVars["alertCursor"]).Msg("Querying for more vulnerabilities for a repository.")
+		return gh.processRepoFindings(projects, repositoryQuery.Repository, nextCursor)


I have verified this locally and will add some unit tests for this in a follow-up PR.
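The recursive shape of this change (process a page, then call `processRepoFindings` again with `EndCursor`, presumably only while `HasNextPage` is true) can be sketched in isolation. It also hints at how the follow-up unit tests might exercise it. This is a minimal Go sketch, not the PR's code: `alertPage`, `fakeAPI`, `queryRepoAlerts`, and `collectAlerts` are hypothetical stand-ins for the real GraphQL types and client.

```go
package main

import "fmt"

// alertPage is a hypothetical stand-in for one page of a repository's
// vulnerabilityAlerts connection: alert IDs plus its PageInfo fields.
type alertPage struct {
	Alerts      []string
	HasNextPage bool
	EndCursor   string
}

// fakeAPI simulates the follow-up repository query: it maps an "after"
// cursor to the page that query would return, and counts calls made.
type fakeAPI struct {
	pages map[string]alertPage
	calls int
}

func (f *fakeAPI) queryRepoAlerts(after string) alertPage {
	f.calls++
	return f.pages[after]
}

// collectAlerts appends the alerts from page; only if HasNextPage is true
// does it issue one more query with EndCursor and recurse on the result.
func collectAlerts(api *fakeAPI, acc []string, page alertPage) []string {
	acc = append(acc, page.Alerts...)
	if !page.HasNextPage {
		return acc
	}
	next := api.queryRepoAlerts(page.EndCursor)
	return collectAlerts(api, acc, next)
}

func main() {
	api := &fakeAPI{pages: map[string]alertPage{
		"c1": {Alerts: []string{"a3", "a4"}, HasNextPage: true, EndCursor: "c2"},
		"c2": {Alerts: []string{"a5"}},
	}}
	// The first page arrives inline with the batched org-level query.
	first := alertPage{Alerts: []string{"a1", "a2"}, HasNextPage: true, EndCursor: "c1"}
	alerts := collectAlerts(api, nil, first)
	fmt.Println(alerts, api.calls) // [a1 a2 a3 a4 a5] 2
}
```

Injecting the query function through a small fake makes it easy to assert both the collected alerts and the number of extra calls, which is the property the review cared about.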

tarkatronic

This is great! I love the recursive approach. And this is so much easier than I had feared in my head!

I just wanted to quickly address these last couple of comments, but aside from that this looks all ready!

querying/github.go

tarkatronic

Amazing! Wonderful! Love it! Thanks so much for contributing this @JoseAngel1196

JoseAngel1196 added 3 commits October 4, 2023 18:15

11d89ed feat: Handling pagination of repository vulnerabilities
312ff5b new updates
d51771a updating test

JoseAngel1196 marked this pull request as ready for review October 4, 2023 23:57

JoseAngel1196 requested a review from a team as a code owner October 4, 2023 23:57

tarkatronic suggested changes Oct 5, 2023

JoseAngel1196 added 3 commits October 5, 2023 14:20

adb2a29 new updates
b0587e7 new updates
afb5491 new updates

JoseAngel1196 commented Oct 5, 2023

JoseAngel1196 requested a review from tarkatronic October 5, 2023 18:40

tarkatronic reviewed Oct 5, 2023

querying/github.go (outdated, resolved)

querying/github.go (outdated, resolved)

2216126 joey's suggestion

JoseAngel1196 requested a review from tarkatronic October 5, 2023 18:53

50f98f8 new updates

tarkatronic reviewed Oct 5, 2023

querying/github.go (outdated, resolved)

b2915cc another one

JoseAngel1196 requested a review from tarkatronic October 5, 2023 20:15

tarkatronic approved these changes Oct 5, 2023

tarkatronic merged commit 0e1fb03 into main Oct 5, 2023
28 of 30 checks passed

tarkatronic deleted the jose/fixes-7 branch October 5, 2023 20:33
