Releases: ecmwf-ifs/loki
v0.2.7
What's New
- Experimental Fortran-to-CUDA transpilation demonstrated on CLOUDSC (#328)
- A new `SplitReadWriteTransformation` that allows user-guided GPU optimisation to make loads independent from stores (#329)
- A new `LowerConstantArrayIndices` transformation to pass full arrays instead of constant slices in kernel calls (#348)
- New transformation utilities to introduce loop blocking for driver loops (#362)
- A new string-based substitution mechanism for expressions (#366)
- Refactoring of SCC tests (#353) and transformation utilities (#354)
- And many small improvements and bug fixes (see below)
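
The loop blocking listed above can be illustrated independently of Loki. The following is a minimal, plain-Python sketch of the index arithmetic behind splitting one loop into an outer loop over blocks and an inner loop within each block. The function names `blocked_ranges` and `blocked_sum` are hypothetical and not part of Loki's API; the actual utilities from #362 operate on Loki's IR of Fortran driver loops.

```python
def blocked_ranges(n, block_size):
    """Yield (start, end) half-open index ranges that tile range(n)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, n, block_size):
        # The last block may be shorter than block_size
        yield start, min(start + block_size, n)


def blocked_sum(values, block_size):
    """Sum values with an outer loop over blocks and an inner loop per block."""
    total = 0
    for start, end in blocked_ranges(len(values), block_size):
        for i in range(start, end):
            total += values[i]
    return total
```

The blocked traversal visits every index exactly once, so the result matches the unblocked loop; only the iteration structure changes.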
All Changes
- IR: Automatic sanitisation of tuples in IR constructors by @mlange05 in #350
- Run pytest on macos in GH actions by @reuterbal in #262
- SCC test reshuffle by @mlange05 in #353
- Transformations: Move common SCC utility routines to `utilities` by @mlange05 in #354
- Transformations: Test and fix corner case in get_local_arrays by @mlange05 in #355
- Tools: Disable timeout utility test on MacOS due to sporadic failures by @mlange05 in #356
- Fixed logical evaluation of PRESENT intrinsics on Array variables by @JoeffreyLegaux in #341
- ecWAM regression tests: switch to develop-1.3 branch by @awnawab in #358
- Split reads and writes for certain accumulation patterns by @awnawab in #329
- fix for 'resolve_vector_notation' utility by @MichaelSt98 in #361
- Transformations: Internalise `IdemTransformation` by @mlange05 in #360
- New transformation 'LowerConstantArrayIndices' to allow to … by @MichaelSt98 in #348
- OMNI: Fix dimension range-indexing in frontend by @mlange05 in #363
- Loki-transform: Pass `cuf` option to FilewriteTrafo by @mlange05 in #364
- Filter out globals in `get_local_arrays` by @awnawab in #370
- extend hoist variables functionality by @MichaelSt98 in #357
- Change/fix pipeline for mode 'scc-raw-stack' by @MichaelSt98 in #371
- Minimal padding in pool allocator by @awnawab in #365
- CLOUDSC low-level GPU (transpilation) via Loki (CUF/CUDA) by @MichaelSt98 in #328
- Loop splitting/blocking of block loops by @wertysas in #362
- String-based expression substitution and moar expression tests! by @mlange05 in #366
- SCC: Add vectorisation annotations in SCCRevector and translate in SCCAnnotate by @mlange05 in #359
- Update VERSION to 0.2.7 by @reuterbal in #381
New Contributors
Full Changelog: v0.2.6...v0.2.7
v0.2.6
This is a minor release with a number of housekeeping changes and some new features.
What's new
- We had a dependency on the Pydantic 1.x releases until now, and this release adds support for Pydantic 2. The next release will require Pydantic 2. (#349)
- The InlineTransformation can now inline statement functions (#345)
- A new LoopUnrollTransformation allows explicit unrolling of pragma-annotated loops (#347)
- The Loki IR now supports the `FORALL` statement and construct. However, this feature is only fully supported with the Fparser2 frontend (#210)
- Cray pointers are now represented in the Loki IR as `Intrinsic` nodes (#342)
- Python package installation now also works correctly from tarballs and other non-git versioned installation sources (#344)
- The test base has been cleaned up: all regression tests now use publicly available source branches, and all tests should now create temporary files in test-local temporary directories to avoid littering the source tree (#335, #343)
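
As a rough illustration of what loop unrolling means, the sketch below expands a loop with constant bounds into one statement per iteration. It works on a text template only; the helper `unroll` and the `{i}` placeholder convention are assumptions made for this example and do not reflect how the LoopUnrollTransformation is implemented or invoked.

```python
def unroll(start, stop, body_template):
    """Expand a loop over start..stop (inclusive, Fortran-style) into statements.

    `body_template` uses `{i}` wherever the loop index appears.
    """
    return [body_template.format(i=k) for k in range(start, stop + 1)]


# Equivalent of: DO i = 1, 3 ; a(i) = b(i) + 1 ; END DO
statements = unroll(1, 3, "a({i}) = b({i}) + 1")
```

Each entry of `statements` is the loop body with the index replaced by a literal value, which is the form an unrolled loop takes in the generated code.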
All changes
- Add support of the FORALL statement and construct (fparser/fgen) by @quepas in #210
- Rigorous use of tmp_path in tests by @reuterbal in #343
- DrHookTransformation: Add explicit label renaming by @mlange05 in #346
- Support for representing cray pointers using OFP or FP (fixes #338) by @reuterbal in #342
- Housekeeping on CMakeLists.txt and pyproject.toml by @reuterbal in #344
- BlockIndexInject: Exclude non-target calls from arg_map (fixes #336) by @reuterbal in #337
- Allow inlining of Statement Functions by @MichaelSt98 in #345
- IR: Update to Pydantic >2.0 compatibility by @mlange05 in #349
- Fix DEV_ALLOC_SIZE for ecwam regression and add SCC-HOIST variant by @awnawab in #351
- Update ecwam regression tests to use develop branch by @reuterbal in #335
- Add loop unroll transformation by @Andrew-Beggs-ECMWF in #347
New Contributors
- @Andrew-Beggs-ECMWF made their first contribution in #347
Full Changelog: v0.2.5...v0.2.6
v0.2.5
A minor release adding new transformations and fixing issues in the frontends, handling of derived types, dataflow analysis and transformations.
What's New
- A general `BlockIndexInjectTransformation` that injects the block-index into all array subscripts that have a local rank one less than their declared rank (#303)
- A corresponding, IFS-specific `BlockViewToFieldViewTransformation` to replace per-block view pointers with full field pointers (#303)
- A new `SCCRawStackPipeline` that uses a pool-allocator variant where each use of temporaries is replaced with fixed offsets into a pre-allocated scratch memory (#314, incorporating #201 by @rolfhm)
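
The idea of replacing temporaries with fixed offsets into one pre-allocated scratch buffer can be sketched in a few lines of plain Python. This is only a conceptual illustration: the function `assign_offsets` and the temporary names are hypothetical, and the actual pipeline rewrites Fortran array accesses rather than Python lists.

```python
def assign_offsets(temporaries):
    """Map each temporary name to a fixed start offset in one scratch buffer.

    `temporaries` is a list of (name, size) pairs; sizes are element counts.
    Returns the offset map and the total scratch size required.
    """
    offsets = {}
    cursor = 0
    for name, size in temporaries:
        offsets[name] = cursor
        cursor += size
    return offsets, cursor


# Hypothetical temporaries of a kernel: name and number of elements
offsets, total = assign_offsets([("ztmp1", 8), ("ztmp2", 16), ("zflux", 4)])
scratch = [0.0] * total

# An access to element 2 (0-based) of zflux becomes an access at a fixed offset
scratch[offsets["zflux"] + 2] = 1.5
```

Because every offset is known once the sizes are known, no per-temporary allocation is needed at run time; a single buffer of `total` elements serves all temporaries.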
All Changes
- Block-index injection transformations by @awnawab in #303
- Fix parse failures with REGEX frontend due to white space in declarations by @reuterbal in #323
- `DataFlowAnalysis` bug fixes by @awnawab in #320
- Fix derived type inheritance when parent type is not available (#330) by @reuterbal in #331
- InlineTransformation: Update Scheduler SGraph if marked_inline is activated by @awnawab in #322
- `HoistVariablesAnalysis`: remove unused explicit interfaces after inlining by @awnawab in #319
- Fix Linter warnings for inline calls with interface block imported from header with func.h suffix by @reuterbal in #332
- Add transformation generated imports to driver or after inlining by @awnawab in #321
- Fix wrong classification as StatementFunction in translation to Loki IR by @reuterbal in #327
- get_pragma_parameters: Fix parsing clauses without parentheses in the tail string by @reuterbal in #324
- ProgramUnit.resolve_typebound_var: raise error if top-level parent is not declared by @reuterbal in #325
- Transformations: SCCRawStackPipeline and SCC config-from-file by @mlange05 in #314
Full Changelog: v0.2.4...v0.2.5
v0.2.4
This is a minor maintenance release matching the declaration of Hybrid 2024 Milestone 1.
What's Changed
- Repo reorganisation: Moving transformations by @mlange05 in #296
- Fix: import of private symbols affects the type inference by @quepas in #308
- JIT compilation updates and compatibility with f90wrap v0.2.14 by @reuterbal in #315
- IR: Fix `get_pragma_params` for multiline pragmas by @mlange05 in #313
- Transformations: Remap declaration symbols and adjust imports when inlining by @mlange05 in #311
- Docs: Update to links from static doc pages by @mlange05 in #312
New Contributors
Full Changelog: v0.2.3...v0.2.4
v0.2.3
This is a minor bugfix/maintenance release to resolve some issues around the Loki installation and version number discovery, particularly when installing from a code version that is not under Git version control.
What's Changed
- Fix installation without git checkout by @reuterbal in #302
- Fetch tags in Github workflows by @reuterbal in #305
- Update version number to 0.2.3 by @reuterbal in #306
Full Changelog: v0.2.2...v0.2.3
v0.2.2
This is a feature and bugfix release, which adds new functionality and resolves a number of problems.
What's New
- Loki supports a new, streamlined way of composing transformation pipelines from individual `Transformation` classes. Transformation arguments are shared among transformations, ensuring consistency, e.g., for `Dimension` parameters. Pipelines and transformation arguments can even be constructed purely from the config file, which will become the default for the `loki-transform.py convert` command in the future. See #217 for more details on how this works.
- The pool allocator transformation has a new option to improve compatibility with Cray Compiler Environment 16 on AMD platforms. For that, the pointer arithmetic is removed and `LOC` calls are used directly in the kernel to determine the offset of a temporary in the scratch allocation. See #231 for more details.
- A new `RemoveCodeTransformation` has been added, replacing the `RemoveCallsTransformation` and incorporating the dead code removal. Additionally, it provides a new feature to remove pragma-annotated code sections via `!$loki remove` / `!$loki end remove` (#276).
- Loki's JIT functionality that is used to build and run tests has been amended so that it honours environment variables and no longer depends on `gfortran` exclusively. Instead, environment variables `CC`, `FC`, `F90`, and `LD` are inspected to determine the compile commands to use, and `CFLAGS`, `FCFLAGS`, `F90FLAGS`, and `LDFLAGS` can be used to set corresponding flags. Default values are provided for GNU and NVHPC compilers. With this, it is now possible to run the test suite also on MacOS after installing `gcc` and `gfortran` (e.g., via Homebrew), and setting the environment variables accordingly. Note that Numpy's F2PY, which is used to call Fortran routines from the Python test base, works also with non-GNU compilers (e.g., NVHPC) but requires gcc to compile the C interface routines. Also, not all tests are compatible with NVHPC and test failures are a known issue that will be resolved in the future (#301). See #294 for more details.
- The `parse_expr` utility's functionality has been expanded to support derived types and now underpins the `get_pragma_parameters` utility, providing a vastly expanded functionality for expressions in pragma annotations (#292).
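
The environment-driven compiler selection described above can be pictured with the following minimal sketch. It reads the same variable names listed in the release note, but the function `compile_commands` and the fallback values in `DEFAULTS` are assumptions chosen for illustration; Loki's actual defaults and class layout may differ.

```python
import os

# Assumed fallback values for illustration; Loki's actual defaults may differ
DEFAULTS = {"CC": "gcc", "FC": "gfortran", "F90": "gfortran", "LD": "gfortran"}
FLAG_VARS = {"CC": "CFLAGS", "FC": "FCFLAGS", "F90": "F90FLAGS", "LD": "LDFLAGS"}


def compile_commands(env=None):
    """Return {tool: [command, *flags]} derived from environment variables."""
    env = os.environ if env is None else env
    commands = {}
    for tool, default in DEFAULTS.items():
        # Flags are given as one whitespace-separated string, e.g. "-O2 -g"
        flags = env.get(FLAG_VARS[tool], "").split()
        commands[tool] = [env.get(tool, default)] + flags
    return commands
```

Passing a dictionary instead of relying on `os.environ` keeps the helper easy to test; calling `compile_commands()` without arguments uses the process environment.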
What's Changed
- [CMake] Expose GLOBAL_VAR_OFFLOAD and INCLUDES in loki_transform_target by @awnawab in #264
- Preserve imported statement functions by @awnawab in #251
- Fix codecov by adding CODECOV_TOKEN by @reuterbal in #278
- `cgen`: multiconditional/switch/select case statement by @MichaelSt98 in #267
- Introducing the Pipeline class by @mlange05 in #260
- Alternative stack/pool allocator implementation based on Cray pointers compatible with Cray+AMD stack by @MichaelSt98 in #231
- improved `replace_intrinsics` and added `rename_variables` by @MichaelSt98 in #266
- Revert "DEPENDENCY TRAFO: statement functions included via c-style imports preserved" (#251) by @reuterbal in #282
- `cgen`: return type and var for function(s) by @MichaelSt98 in #269
- Pipeline configuration from file by @mlange05 in #271
- Fixing nested associate scope-parentage tracking after inlining by @mlange05 in #281
- F2C: `DeReferenceTrafo` by @MichaelSt98 in #273
- REGEX frontend: white space and nesting bugfix by @reuterbal in #274
- Preserve import statement functions - take II by @awnawab in #283
- Skip driver routine in `GlobalVariableAnalysis` by @awnawab in #265
- MaskedTransformer: Fix in-place rebuilding of scoped nodes by @mlange05 in #284
- Avoid variable_map in TypedSymbol.get_derived_type_member and verify type information is derived correctly by @reuterbal in #285
- SCCHoist: hoist inline call temporaries and don't hoist statically declared arrays by @awnawab in #268
- Pool allocator: correctly resolve derived type member as block dimension and ignore pointer/allocatable arrays by @awnawab in #249
- Marked region removal and general code removal transformation by @mlange05 in #276
- SCC: make vertical dimension optional by @awnawab in #270
- `SCCBaseTransformation.get_integer_variable` now also checks module imports by @awnawab in #279
- Improve performance of pragma-region attach/detach by using transformers by @mlange05 in #286
- Reorganising test directories by @mlange05 in #287
- [Bugfix] available_frontends: Import pytest locally to make dependency optional by @reuterbal in #290
- `DataflowAnalysis` bugfix: preserve body nesting in `visit_MaskedStatement` by @awnawab in #288
- Loki expression parser based on pymbolic parser by @MichaelSt98 in #272
- F2C: optional case-sensitivity for variables/symbols by @MichaelSt98 in #277
- Transformation to hoist temporaries in kernel language transpilation by @MichaelSt98 in #291
- fix scoping for global var hoisting by @MichaelSt98 in #293
- SCC: Support for bounds aliases and derived type members as bounds by @awnawab in #250
- Consistent, environment-configurable use of Compiler class in JIT compilation by @reuterbal in #294
- Derived-type inheritance by @awnawab in #295
- Improve `parse_expr` and use in `process_dimension_pragmas` by @MichaelSt98 in #292
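
The marked region removal from #276 relies on the `!$loki remove` / `!$loki end remove` pragma pair. The sketch below mimics the effect on raw source text so the semantics are easy to see. It is a text-level approximation with a hypothetical helper name (`strip_marked_regions`); Loki itself performs this on its IR, not on strings.

```python
def strip_marked_regions(source):
    """Drop all lines between '!$loki remove' and '!$loki end remove' markers."""
    kept = []
    removing = False
    for line in source.splitlines():
        # Normalise case and whitespace before comparing against the markers
        marker = " ".join(line.strip().lower().split())
        if marker == "!$loki remove":
            removing = True
        elif marker == "!$loki end remove":
            removing = False
        elif not removing:
            kept.append(line)
    return "\n".join(kept)


src = "a = 1\n!$loki remove\ncall debug(a)\n!$loki end remove\nb = a"
cleaned = strip_marked_regions(src)
```

In this sketch the pragma lines themselves are dropped together with the enclosed statements, so only the code outside the marked region remains in `cleaned`.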
Full Changelog: v0.2.1...v0.2.2
v0.2.1
This is a bugfix release that contains a number of small fixes in transformations and Scheduler.
What's New
- Utility methods have been added to `CallStatement`, which simplify inspecting, validating and converting keyword-arguments to positional arguments (see #235)
- The batch-processing module `loki.bulk` has been renamed to `loki.batch`
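
Converting keyword arguments to positional arguments amounts to reordering them according to the callee's declared parameter list. The following sketch shows that logic on plain Python lists and dictionaries. The function `kwargs_to_args` is hypothetical and is not the `CallStatement` API; it also assumes, for simplicity, that every remaining parameter is supplied as a keyword.

```python
def kwargs_to_args(param_names, args, kwargs):
    """Return a purely positional argument list in the callee's declared order."""
    # Parameters not already covered by positional arguments
    remaining = param_names[len(args):]
    unknown = set(kwargs) - set(remaining)
    if unknown:
        raise ValueError(f"Unexpected or duplicate keywords: {sorted(unknown)}")
    missing = [name for name in remaining if name not in kwargs]
    if missing:
        raise ValueError(f"Missing arguments: {missing}")
    return list(args) + [kwargs[name] for name in remaining]


# Equivalent of: CALL kernel(nlon, b=zb, a=za) with declared order (n, a, b)
positional = kwargs_to_args(["n", "a", "b"], ["nlon"], {"b": "zb", "a": "za"})
```

The validation steps reject keywords that do not match a remaining parameter, which covers both misspelled names and arguments passed twice.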
What's Changed
- kwargs utilities by @MichaelSt98 in #235
- Allow to ignore specific dimensions in "shift to zero indexing" by @MichaelSt98 in #236
- Add 'reverse_traversal=True' to DerivedTypeArgumentsTransformation manifest by @MichaelSt98 in #238
- Create a pid-specific temporary directory and clean it up at the end by @reuterbal in #261
- SCC-HOIST: Hoist variables as `kwargs` (optionally) by @MichaelSt98 in #237
- `GlobalVarHoistTransformation`: fix for functions/inline calls by @MichaelSt98 in #240
- Support colon notation for all dimensions in flatten_arrays by @MichaelSt98 in #239
- Small CMake layer fixes for SL by @awnawab in #248
- Rename `bulk` -> `batch` and create `ir` sub-package by @mlange05 in #258
- SingleColumn: Demote arrays that are not used at all in the body by @mlange05 in #259
- Scheduler: Fix handling of external module procedures by @reuterbal in #263
Full Changelog: v0.2.0...v0.2.1
v0.2.0
This release contains a rewrite of Loki's Scheduler, which is responsible for planning and executing batch transformations across complex source trees. Compared to the original implementation, it is more flexible, enables handling of more dependency types (data dependencies, type dependencies as well as control flow dependencies) and is faster. Additional capabilities of pruning the dependency graph have been added as part of this, and the new discovery mechanism may make changes to Scheduler config files necessary. See the expanded documentation section for more details.
No other changes are included in this release. In case of problems, we advise reporting them as issues and staying on v0.1.7 in the meantime.
What's Changed
- The new Scheduler by @reuterbal in #213
- Update Github actions to get rid of node.js warnings by @reuterbal in #257
Full Changelog: v0.1.7...v0.2.0
v0.1.7
This is the final release of the v0.1 version of Loki. The new v0.2 will become available soon and use a rewritten Scheduler implementation for batch processing.
Most changes in this release are bugfixes, minor improvements or preparatory work for the new Scheduler integration. See below for the full list.
What's new
- The file parsing speed has been improved, which should make the processing of large source trees significantly faster (#229, #241, #242, #245).
- A new set of coding standards checks has been added, corresponding to the new IFS Arpège coding standards. This includes only three rules currently but will be expanded going forward (#247).
- The pool allocator transformation now inserts the argument for the scratch space as integer variables instead of a dedicated derived type. This was found to avoid an allocation on NVIDIA GPUs and yield performance improvements (#214).
What's Changed
- Create separate ModuleWrapTransformation from DependencyTransformation by @reuterbal in #197
- Transformation configuration and SchedulerConfig update by @mlange05 in #191
- Pragma-driven subroutine inlining and associated utilities by @mlange05 in #198
- introduce 'flatten_arrays()' (to overcome pointer hack) by @MichaelSt98 in #199
- Array shadowing bugfix for inliner by @skarppinen in #202
- Fix for SCC-HOIST, regarding wrong (hoisted) argument(s) (indexing) i… by @MichaelSt98 in #206
- Refactored Global Variable Offload by @reuterbal in #207
- Fixes to minor issues related to SCC HOIST by @skarppinen in #211
- introduce flag to allow removing all derived types by @MichaelSt98 in #215
- CMake: Fix DERIVE_ARGUMENT_SHAPE_ARRAY and argument handling by @mlange05 in #217
- Recursive inlining via InlineTransform and associated fixes by @mlange05 in #205
- Minor fixes to frontends and IR nodes by @reuterbal in #212
- Stack/Pool Allocator: pass stack as integer(s) by @MichaelSt98 in #214
- Frontend: Fix bug for multi-line pragmas and add short test by @mlange05 in #221
- assumed size array handling for 'normalize_array_shape_and_access' by @MichaelSt98 in #218
- Nested derived type bug fix by @rolfhm in #192
- C/C++ De/Reference by @MichaelSt98 in #223
- allow F2C transpilation using c_ptr (switch for old behaviour or new … by @MichaelSt98 in #219
- Github actions: Only load SSH_KEY from secrets if available by @reuterbal in #228
- f2c transpile via convert by @MichaelSt98 in #208
- Global var hoisting by @MichaelSt98 in #226
- Allow for limiting `resolve_sequence_association` to procedures that are inlined in loki-transform convert by @skarppinen in #225
- Performance optimisations for frontend parsing/sanitising by @mlange05 in #229
- Enabling SCC-Stack for EC-physics, part 2 by @mlange05 in #222
- Frontend Transformer optimisation by @mlange05 in #241
- FParser: Perform in-place scope attachment during parse by @mlange05 in #242
- Subroutine: Only clone symbol when inferring from allocatable by @mlange05 in #245
- Loki-lint: First 3 rules for new IFS-Arpégé coding standards by @reuterbal in #247
Full Changelog: v0.1.6...v0.1.7
v0.1.6
This release primarily contains bugfix and maintenance changes and a few new features. It is intended as a stable basis before a set of breaking changes is made for the next release. These will primarily relate to Scheduler behaviour and a new, consolidated config file format.
What's new
- A new utility by @rolfhm allows sequence association to be resolved (#173)
- The `disable` property in the Scheduler config now allows wildcards/simple patterns (#194)
- A new utility by @skarppinen allows internal subprograms to be extracted from procedures and converted into standalone procedures (#181)
- `Transformation` classes now have static properties that define how the scheduler should traverse the dependency graph, e.g., forward/reverse traversal, recursion into contained scopes, or file graph traversal (#154)
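
Wildcard matching of routine names against a `disable` list can be sketched with Python's standard `fnmatch` module. The helper `is_disabled`, the example patterns, and the case-insensitive comparison are assumptions for illustration; the exact matching rules used by the Scheduler are not stated in these notes and may differ.

```python
from fnmatch import fnmatch


def is_disabled(name, disable_patterns):
    """Return True if `name` matches any shell-style pattern in the list."""
    # Lower-case both sides, since Fortran names are case-insensitive
    return any(fnmatch(name.lower(), pattern.lower()) for pattern in disable_patterns)


# Hypothetical disable list mixing a wildcard and an exact name
patterns = ["dr_*", "abor1"]
```

With this list, any name starting with `dr_` is excluded from processing, along with the exact name `abor1`, while all other routines remain enabled.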
The full list of changes:
- Fix vector section trimming in driver loop by @awnawab in #169
- TransformInline: Fix rescoping in expression substitution by @mlange05 in #170
- Fix deep-cloning of subroutines and modules (fix #174) by @mlange05 in #175
- Sourcefile does not have a filepath by @joscao in #177
- InlineMember: rename duplicate locals in the body by @rolfhm in #172
- Extract/Improve Polyhedron class by @joscao in #178
- Provide linear algebra utility by @joscao in #179
- add 'kernel_only=True' to RemoveCallsTransformation by @MichaelSt98 in #185
- Display dataflow analysis (if attached) in IR graph by @joscao in #183
- Fix handling of empty files in frontends (fix #186) by @reuterbal in #187
- Transformation utility to fix sequence association by @rolfhm in #173
- Minor transformation fixes by @reuterbal in #188
- Small parsing fixes by @awnawab in #190
- pool_allocator handles range indices by @rolfhm in #193
- Generic enrichment process by @reuterbal in #189
- Allow wildcards in `disable` list for scheduler by @mlange05 in #194
- Utility transformation for creating standalone subroutines from contained subroutines by @skarppinen in #181
- Static Transformation properties (manifest) by @mlange05 in #154
New Contributors
- @skarppinen made their first contribution in #181
Full Changelog: v0.1.5...v0.1.6