From da317d6f0f53c61df342fb043f22dcc1c3585c07 Mon Sep 17 00:00:00 2001 From: Dmitry Romanov Date: Wed, 14 Aug 2024 11:42:58 -0400 Subject: [PATCH] [doc] Some parts from Concepts moved to index page and rewritten. --- docs/concepts.md | 75 -------------------------------------------- docs/index.md | 81 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 81 insertions(+), 75 deletions(-) diff --git a/docs/concepts.md b/docs/concepts.md index f56992b69..7603394bd 100644 --- a/docs/concepts.md +++ b/docs/concepts.md @@ -54,80 +54,5 @@ The lifetime of a `JService` not only spans the time that any `JEventProcessors` `JApplication`, which is helpful for things like writing test cases. -## Design philosophy - -JANA's design philosophy can be boiled down to five values, ordered by importance: - -### Simple to use - -JANA chooses its battles carefully. First and foremost, JANA is about parallelizing computations over data organized -into events. From a 30000-foot view, it should look more like OpenMP or Thread Building Blocks or RaftLib than like ROOT. -Unlike the aforementioned, JANA's vocabulary of abstractions is designed around the needs of physicists rather than -general programmers. However, JANA does not attempt to meet _all_ of the needs of physicists. - -JANA recognizes when coinciding concerns ought to be handled orthogonally. A good example is persistence. JANA does not -seek to provide its own persistence layer, nor does it require the user to commit to a specific dependency such as ROOT -or Numpy or Apache Arrow. Instead, JANA tries to make it feasible for the user to choose their persistence layer independently. -This way, if a collaboration decides they wish to (for instance) migrate from ROOT to Arrow, they have a well-defined migration -path which keeps the core analysis code largely intact. - -In particular, this means minimizing the complexity of the build system and minimizing orchestration. 
Building code -against JANA should require nothing more than implementing certain key interfaces, adding a single path to includes, -and linking against a single library. - -### Well-organized - -While JANA's primary goal is running code in parallel, its secondary goal is imposing an organizing principle on -the users' codebase. This can be invaluable in a large collaboration where members vary in programming skill. Specifically, -JANA organizes processing logic into decoupled units. JFactories are agnostic of how and when their prerequisites are -computed, are only run when actually needed, and cache their results for reuse. Different analyses can coexist in separate -JEventProcessors. Components can be compiled into independent plugins, to be mixed and matched at runtime. All together, -JANA enforces an organizing principle that enables groups to develop and test their code with both freedom and discipline. - - -### Safe - -JANA recognizes that not all of its users are proficient parallel programmers, and it steers users towards patterns which -mitigate some of the pitfalls. Specifically, it provides: - -- **Modern C++ features** such as smart pointers and judicious templating, to discourage common classes of bugs. JANA seeks to -make its memory ownership semantics explicit in the type system as much as possible. - -- **Internally managed locks** to reduce the learning curve and discourage tricky parallelism bugs. - -- **A stable API** with an effort towards backwards-compatibility, so that everybody can benefit from new features -and performance/stability improvements. - - -### Fast - -JANA uses low-level optimizations wherever it can in order to boost performance. - -### Flexible - -The simplest use case for JANA is to read a file of batched events, process each event independently, and aggregate -the results into a new file. However, it can be used in more sophisticated ways. 
- -- Disentangling: Input data is bundled into blocks (each containing an array of entangled events) and we want to -parse each block in order to emit a stream of events (_flatmap_) - -- Software triggers: With streaming data readout, we may want to accept a stream of raw hit data and let JANA -determine the event boundaries. Arbitrary triggers can be created using existing JFactories. (_windowed join_) - -- Subevent-level parallelism: This is necessary if individual events are very large. It may also play a role in -effectively utilizing a GPU, particularly as machine learning is adopted in reconstruction (_flatmap+merge_) - -JANA is also flexible enough to be compiled and run different ways. Users may compile their code into a standalone -executable, into one or more plugins which can be run by a generic executable, or run from a Jupyter notebook. - - -## Comparison to other frameworks - -Many different event reconstruction frameworks exist. The following are frequently compared and contrasted with JANA: - -- [Clara](https://claraweb.jlab.org/clara/) While JANA specializes in thread-level parallelism, Clara - uses node-level parallelism via a message-passing interface. This higher level of abstraction comes with some performance - overhead and significant orchestration requirements. On the other hand, it can scale to larger problem sizes and - support more general stream topologies. JANA is to OpenMP as Clara is to MPI. diff --git a/docs/index.md b/docs/index.md index f5fe1f1f9..3e90ae435 100644 --- a/docs/index.md +++ b/docs/index.md @@ -25,6 +25,87 @@ for(auto t : tracks){ } ``` + +## Design philosophy + +JANA2's design philosophy can be boiled down to five values, ordered by importance: + +### Simple to use + +JANA2 focuses on making parallel computations over event-based\* data simple. +Unlike general-purpose parallelism libraries such as OpenMP, JANA2's vocabulary of abstractions is designed around the needs of physicists rather than +general programmers. 
However, JANA2 does not attempt to meet _all_ of the needs of physicists. + +JANA2 recognizes that some tasks, like data persistence, should be handled separately. +For example, instead of providing its own persistence layer or requiring specific dependencies like ROOT, NumPy, or Apache Arrow, +JANA2 allows users to choose their preferred tools. +This flexibility ensures that if a team wants to switch from one tool to another (e.g., from ROOT to Arrow), +the core analysis code remains largely unaffected. + +To keep things simple, JANA2 minimizes the complexity of its build system and orchestration. +Using JANA2 should be straightforward: implement a few key interfaces, add an include path, and link against a single library. + +?> **Tip** The term `event-based` in JANA2 doesn't strictly refer to _physics_ or _trigger_ events. +In JANA2, `event` is used in a broader computer science context, aligning with the streaming readout paradigm +and supporting concepts like event nesting and sub-event parallelization. + + +### Well-organized + +While JANA's primary goal is running code in parallel, its secondary goal is imposing an organizing principle on the users' codebase. +This can be invaluable in a large collaboration where members vary in programming skill. Specifically, +JANA organizes processing logic into decoupled units. JFactories are agnostic of how and when their prerequisites are +computed, are only run when actually needed, and cache their results for reuse. Different analyses can coexist in separate +JEventProcessors. Components can be compiled into independent plugins, to be mixed and matched at runtime. Altogether, +JANA enforces an organizing principle that enables groups to develop and test their code with both freedom and discipline. + + +### Safe + +JANA recognizes that not all of its users are proficient parallel programmers, and it steers users towards patterns which +mitigate some of the pitfalls. 
Specifically, it provides: + +- **Modern C++ features** such as smart pointers and judicious templating, to discourage common classes of bugs. JANA seeks to +make its memory ownership semantics explicit in the type system as much as possible. + +- **Internally managed locks** to reduce the learning curve and discourage tricky parallelism bugs. + +- **A stable API** with an effort towards backwards-compatibility, so that everybody can benefit from new features +and performance/stability improvements. + + +### Fast + +JANA uses low-level optimizations wherever it can in order to boost performance. + +### Flexible + +The simplest use case for JANA is to read a file of batched events, process each event independently, and aggregate +the results into a new file. However, it can be used in more sophisticated ways. + +- Disentangling: Input data is bundled into blocks (each containing an array of entangled events) and we want to +parse each block in order to emit a stream of events (_flatmap_) + +- Software triggers: With streaming data readout, we may want to accept a stream of raw hit data and let JANA +determine the event boundaries. Arbitrary triggers can be created using existing JFactories. (_windowed join_) + +- Subevent-level parallelism: This is necessary if individual events are very large. It may also play a role in +effectively utilizing a GPU, particularly as machine learning is adopted in reconstruction (_flatmap+merge_) + +JANA is also flexible enough to be compiled and run in different ways. Users may compile their code into a standalone +executable, into one or more plugins which can be run by a generic executable, or run from a Jupyter notebook. + + +## Comparison to other frameworks + +Many different event reconstruction frameworks exist. 
The following are frequently compared and contrasted with JANA: + +- [Clara](https://claraweb.jlab.org/clara/) While JANA specializes in thread-level parallelism, Clara + uses node-level parallelism via a message-passing interface. This higher level of abstraction comes with some performance + overhead and significant orchestration requirements. On the other hand, it can scale to larger problem sizes and + support more general stream topologies. JANA is to OpenMP as Clara is to MPI. + + ## History [JANA](https://halldweb.jlab.org/DocDB/0011/001133/002/Multithreading_lawrence.pdf) (**J**Lab **ANA**lysis framework)