-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EES-5017, EES-5016 Add data set GET query endpoint #4709
Conversation
9407d80
to
ec53169
Compare
@@ -0,0 +1,9 @@ | |||
namespace GovUk.Education.ExploreEducationStatistics.Public.Data.Api.ViewModels; | |||
|
|||
public record ErrorViewModel : Common.ViewModels.ErrorViewModel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand the need for this record, rather than just using the base?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some weaknesses with the common ErrorViewModel
in that its fields are essentially all optional.
This is problematic as it means as this means that the view model is potentially going to be wrong in the generated docs.
There's also some other issues with the current generated docs in terms of required properties. Instead of doing this here, I'm going to try and tackle this in EES-5057
public record ErrorViewModel | ||
{ | ||
/// <summary> | ||
/// The error message. | ||
/// </summary> | ||
public string Message { get; init; } = string.Empty; | ||
public virtual string Message { get; init; } = string.Empty; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand the need for this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per other comment - this is due to issues with the current view model. We ideally want to be able to extend this to create a more correct view model for the public API.
For now, I'm leaving this to implement it in EES-5057 instead.
....Education.ExploreEducationStatistics.Public.Data.Api/Validators/LocationStringValidators.cs
Show resolved
Hide resolved
@@ -29,6 +31,7 @@ public record DataSetQueryResultViewModel | |||
/// <summary> | |||
/// The geographic level relevant to this result's data. | |||
/// </summary> | |||
[JsonConverter(typeof(EnumToEnumValueJsonConverter<GeographicLevel>))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this not work out of the box with the custom schema filter we have for GeographicLevel
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't for Swagger. It's for JSON serialisation 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was definitely a bit tired when reviewing these bits of the code.....
...GovUk.Education.ExploreEducationStatistics.Public.Data.Api/ViewModels/TimePeriodViewModel.cs
Show resolved
Hide resolved
src/GovUk.Education.ExploreEducationStatistics.Public.Data.Api/Services/DataSetQueryParser.cs
Show resolved
Hide resolved
src/GovUk.Education.ExploreEducationStatistics.Public.Data.Api/Services/DataSetQueryService.cs
Outdated
Show resolved
Hide resolved
src/GovUk.Education.ExploreEducationStatistics.Public.Data.Api/Services/DataSetQueryService.cs
Outdated
Show resolved
Hide resolved
src/GovUk.Education.ExploreEducationStatistics.Public.Data.Api/Requests/DataSetQuerySort.cs
Show resolved
Hide resolved
src/GovUk.Education.ExploreEducationStatistics.Public.Data.Scripts/Commands/SeedDataCommand.cs
Outdated
Show resolved
Hide resolved
18866c1
to
0108572
Compare
68dbf4d
to
5c5c709
Compare
This includes upgrading DuckDB.NET to 0.10.1.2 and introduces: - Dapper - for executing queries and mapping results - InterpolatedSql - for writing SQL queries and mitigating SQL injection - MiniProfiler - for performance profiling
…rrectly This changes the `GeographicLevels` to be correctly populated from the `GeographicLevelMeta` instead of the `LocationMetas`. As part of this, we've also updated `DataSetVersion` generator extensions to set the geographic levels meta based on the locations meta in the default cases. This is sensible default to avoid having to explicitly set the geographic levels every time.
…e consistent This updates the location and time period string validator errors so that they're consistent with recent changes to the sort string format (and its error output).
90ae671
to
b8b6484
Compare
{ | ||
var command = duckDbConnection.SqlBuilder( | ||
$""" | ||
SELECT DISTINCT {FiltersTable.Cols.ColumnName:raw} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jack-hive penny for your thoughts as I've gone back and forth on this in my head a few times, but what do you think about this column name?
I'm kinda considering renaming it to FilterId
(filter_id
in the table itself). However, this is a touch confusing given the table itself is called filters.parquet
and we have other id columns too.
Like I said, a bit on the fence about this. It might be fine as-is, but maybe not. Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm......
I'm a bit on the fence too.
I think FilterId
makes more sense in terms of the code - but is less intuitive in terms of the domain context.
Maybe CsvColumnName
might be more explicit and intuitive than ColumnName
. I must admit, it wasn't immediately obvious to me what we meant by ColumnName
when I was first reviewing the code. So might be best we change it.
I think I'd either choose CsvColumnName
or FilterId
This change changes data seeding so that null meta IDs in the data Parquet table are coalesced into `0`. This avoids unexpected behaviour in queries where rows with null meta IDs will not be included in the results correctly.
This changes the location and filter Parquet tables to `filter_options` and `location_options` respectively. As part of this, we've also renamed `filter_options.column_name` to `filter_options.filter_id` to make it more obvious what this is.
This PR adds the data set GET query endpoint -
/api/v1/data-sets/{dataSetId}/query
. This provides functionality to let users make data set queries using query parameters e.g.?indicators=headcount&filters.in=kaNhs,zvUFQ&sorts=geographicLevel|Asc
Note that this doesn't provide test coverage for most of this functionality. This will be covered in EES-5015.
Query parsing
Parsing data set queries to this endpoint roughly proceeds like so:
DataSetGetQueryRequest
)DataSetQueryRequest
(will be used in POST query variant) inDataSetQueryService
DataSetQueryParser
, producing a SQL WHERE fragmentWe've now implemented the majority of the base query criteria parsing functionality to handle the various facet types and their comparators. This doesn't include handling of more complex operators like
and
,or
andnot
. This will be implemented in EES-4722.Dapper and SQL builders
As there is no ORM support for DuckDB, the querying functionality has been implemented using a combination of Dapper and InterpolatedSql.
Dapper itself is very simplistic and only works with raw SQL strings out of the box. This isn't particularly safe in the context of SQL injection, so we've complemented it with InterpolatedSql which provides protection using its
SqlBuilder
API e.g.For real usages of this, check out the various repositories that have been created. We use a modified variant of InterpolatedSql's Dapper integration to make it a bit nicer to build and execute queries.
DuckDB issues
Unfortunately, there were some issues with DuckDB that we've had to work around, primarily around named parameters. Dapper only really works with named parameters, but this seems to cause issues with the DuckDB.NET library that we're using.
When using named parameters, DuckDB.NET would consistently fail to throw exceptions around parameter binding (need to create an issue for this), or memory access violations when debugging.
I'm communicating with the library maintainer to see if there's anything we can do here. For the time-being, I've been able to implement workarounds by wrapping the DuckDB.NET database connection code and forcing the use of positional auto-incrementing parameters (i.e.
?
parameters) in generated SQL.MiniProfiler
To help with debugging and performance, we've added MiniProfiler which lets us profile our code. This is particularly useful for data set queries, where it can help identify performance bottlenecks.
MiniProfiler is enabled via the
MiniProfiler.Enabled
app setting, so it can be disabled or enabled on a per-environment basis.Related changes
GeographicLevels
in the data set meta summary not being populated correctly from a data set version. This now uses theGeographicLevelMeta
instead ofLocationMetas
to populate it.Other changes
ValidationFailure
inFluentValidationActionFilter