[RNTuple] accessing nested structs is not lazy enough #314

Open
Moelf opened this issue Mar 13, 2024 · 1 comment

Moelf commented Mar 13, 2024

Consider the following top-level field (a column, in the table analogy):

├─ Symbol("AntiKt4TruthDressedWZJetsAux:")  Struct
│                                            ├─ :m  Vector
│                                            │       ├─ :offset  Leaf{UnROOT.Index64}(col=23)
│                                            │       └─ :content  Leaf{Float32}(col=24)
│                                            ├─ :pt  Vector
│                                            │        ├─ :offset  Leaf{UnROOT.Index64}(col=17)
│                                            │        └─ :content  Leaf{Float32}(col=18)
│                                            ├─ :eta  Vector
│                                            │         ├─ :offset  Leaf{UnROOT.Index64}(col=19)
│                                            │         └─ :content  Leaf{Float32}(col=20)
│                                            ├─ :constituentWeights  Vector
│                                            │                        ├─ :offset  Leaf{UnROOT.Index64}(col=29)
│                                            │                        └─ :content  Vector
│                                            │                                      ├─ :offset  Leaf{UnROOT.Index64}(col=30)
│                                            │                                      └─ :content  Leaf{Float32}(col=31)

Currently, when we loop over the events, the access is too "eager":

for evt in rntuple
    evt.var"AntiKt4TruthDressedWZJetsAux:".pt
end

In this case, we only want to read the storage backing the pTs (i.e. RNTuple columns 17 and 18), but in reality we read all the columns of the struct (17, 18, 19, 20, 23, 24, 29, 30, 31) as soon as we evaluate evt.var"AntiKt4TruthDressedWZJetsAux:".
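To make the eager/lazy distinction concrete, here is a minimal self-contained sketch. It does not use UnROOT's actual internals: `ToyTuple`, `read_subfield`, `eager_struct`, and `LazyStruct` are all hypothetical names invented for illustration, and the column numbers are simply copied from the tree above. The idea is that the object returned for the struct field holds only a reference to the storage plus the event index, and a sub-field is read only when `getproperty` is called for it.

```julia
# Hypothetical stand-in for an RNTuple holding one struct field.
struct ToyTuple
    data::Dict{Symbol,Vector{Vector{Float32}}}  # sub-field => per-event values
    cols::Dict{Symbol,Tuple{Int,Int}}           # sub-field => (offset col, content col)
    reads::Vector{Int}                          # log of every column touched
end

# Simulate reading one sub-field of one event, logging the columns involved.
function read_subfield(t::ToyTuple, name::Symbol, i::Int)
    offset_col, content_col = t.cols[name]
    push!(t.reads, offset_col, content_col)
    return t.data[name][i]
end

# Eager: materialize every sub-field as soon as the struct is requested.
function eager_struct(t::ToyTuple, i::Int)
    names = Tuple(keys(t.cols))
    return NamedTuple{names}(map(n -> read_subfield(t, n, i), names))
end

# Lazy: constructing the proxy reads nothing.
struct LazyStruct
    source::ToyTuple
    index::Int
end

# The read happens here, and only for the requested sub-field.
Base.getproperty(s::LazyStruct, name::Symbol) =
    read_subfield(getfield(s, :source), name, getfield(s, :index))

toy() = ToyTuple(
    Dict(:pt  => [Float32[10, 20], Float32[30]],
         :eta => [Float32[0.1, -0.2], Float32[1.5]],
         :m   => [Float32[1, 2], Float32[3]]),
    Dict(:pt => (17, 18), :eta => (19, 20), :m => (23, 24)),
    Int[],
)

eager = toy()
for i in 1:2
    eager_struct(eager, i).pt
end

lazy = toy()
for i in 1:2
    LazyStruct(lazy, i).pt
end

println(sort(unique(eager.reads)))  # [17, 18, 19, 20, 23, 24]
println(sort(unique(lazy.reads)))   # [17, 18]
```

Whether a proxy like this fits the existing event-iteration code is an open question; the sketch only shows that deferring the read to the sub-field access is enough to avoid touching unrelated columns.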

One possible way is to switch to AwkwardArray.jl by @jpivarski and represent the whole RNTuple as one big RecordArray. In theory that works for columnar access (i.e. rntuple.var"AntiKt4TruthDressedWZJetsAux:".pt), but it may not solve our event-iteration problem.

Another possible way is to use StructArrays.jl more smartly. @peremato, did you run into anything like this in EDM4hep.jl? If so, did you find anything that works?

peremato commented Mar 13, 2024

> Another possible way is to use StructArrays.jl more smartly. @peremato, did you run into anything like this in EDM4hep.jl? If so, did you find anything that works?

With EDM4hep, I think I do not have this problem, since the top level is a Vector of POD structs rather than a struct of vectors, as it is in this case. It is true that I read all the fields (I guess), because in the end I really construct an SoA of the container.
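For readers less familiar with the two layouts being contrasted, a small sketch in plain Julia follows. The `Particle` type and its values are hypothetical and are not taken from EDM4hep.jl or UnROOT.

```julia
# Hypothetical POD type, for illustration only.
struct Particle
    pt::Float32
    eta::Float32
end

# Vector of POD structs: every element is a complete Particle.
aos = [Particle(10f0, 0.1f0), Particle(20f0, -0.2f0)]

# Struct of vectors: one independent vector per field.
soa = (pt = Float32[10, 20], eta = Float32[0.1, -0.2])

pts_from_aos = [p.pt for p in aos]  # walks every full Particle
pts_from_soa = soa.pt               # touches only the pt vector
```

In the first layout there is no per-field storage to skip, so reading all fields is the natural cost; in the second, as in the RNTuple struct above, each field could in principle be read on its own.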
