Implement a file-based database for simulation results #273

Open
nelimee opened this issue Jul 25, 2024 · 5 comments
Labels
backend (Issue pertaining to the Python backend (tqec package)), enhancement (New feature or request, may not be in the task flow), future (Issues to track future development that is currently blocked by other issues)

Comments


nelimee commented Jul 25, 2024

Is your feature request related to a problem? Please describe.
The goal of our initiative is to generate graphs such as

[Image: example graph of simulation results]

In the above graph, each point represents:

  • one quantum circuit, generated by tqec for a given value of k, which can be represented as a stim file,
  • a large number of simulations of that circuit performed by Stim.

One problem is that Stim simulations are not free: computing a single point of the above graph can take minutes to hours of computational time.

Currently, we have no clever way of storing such data, meaning that the Stim simulations have to be redone each time we want to generate a new graph.

Describe the solution you'd like

We should have a database-like way of storing simulation data. There are multiple requirements:

  • we should be able to easily retrieve already-existing results,
  • data should be written to disk,
  • we should be able to add new results to existing ones (typically, start a simulation with 1000 shots to check the overall look of the plot and that there are no mistakes, and once obvious mistakes have been corrected, launch 999000 more shots to reduce the error bars),
  • we should be able to remove existing results, but this should be hard to do (i.e., be wary of accidental data loss).

Note that simulation results might be quite large, so optimised storage would be a plus.

@nelimee nelimee added backend Issue pertaining to the Python backend (tqec package) enhancement New feature or request, may not be in the task flow labels Jul 25, 2024
inmzhang commented

We can think about utilizing an existing sampling tool like sinter. But as far as I know, there is currently no API provided by sinter to store the intermediate sampled detectors/observables to files.


nelimee commented Jul 26, 2024

> We can think about utilizing the existing sampling tool like sinter. But as far as I know, currently there is no API provided by sinter to store the intermediate sampled detectors/observables to files.

Yep, the goal of this issue is not the generation of results (which, as you note, will very likely be handled by sinter) but rather the storage of generated results.

Also, even if sinter could store to files, we would need a clear organisation to allow easy retrieval, modification and deletion, so in any case we will need at least helper methods for that.

Note that this looks a lot like the work done by a database, which might be a path to the solution.
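If the database path is taken, one cheap option is `sqlite3`, which ships with Python and already gives on-disk storage, keyed retrieval and incremental updates. The table and column names below are purely illustrative, not part of any tqec design; the upsert shows how a second batch of shots would merge with an existing row:

```python
import sqlite3

# Use a file path instead of ":memory:" for persistent on-disk storage.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS results (
        experiment_hash TEXT NOT NULL,
        k INTEGER NOT NULL,
        noise TEXT NOT NULL,
        shots INTEGER NOT NULL,
        errors INTEGER NOT NULL,
        PRIMARY KEY (experiment_hash, k, noise)
    )
""")

# Upsert: inserting for an existing (experiment_hash, k, noise) key adds
# the new shots/errors to the stored counts instead of failing.
upsert = """
    INSERT INTO results VALUES (?, ?, ?, ?, ?)
    ON CONFLICT (experiment_hash, k, noise)
    DO UPDATE SET shots = shots + excluded.shots,
                  errors = errors + excluded.errors
"""
conn.execute(upsert, ("abc123", 2, "(1/10, 2)", 1000, 12))
conn.execute(upsert, ("abc123", 2, "(1/10, 2)", 999000, 11000))
```

Deletion being "hard to do" then falls out naturally: the helper layer can simply refuse to expose `DELETE` without an explicit confirmation argument.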

afowler commented Jul 26, 2024 via email


nelimee commented Jul 26, 2024

> Craig: can you comment on how Stim/sinter simulation results can be systematically stored so that one could later gather additional data for a plot to improve its statistics or explore a wider range of code distances and error rates?

Whenever I have a task like that, I really follow the database point of view:

  1. I try to find a set of small data points that uniquely identify an "experiment" (in database terms, the primary key),
  2. I try to store in the "experiment" (i.e., the value associated with the primary key) whatever I may need in the future.

In this specific case, I think that the primary key will be composed of:

  1. an algorithmically generated (hash-like) key representing the experiment being benchmarked. For the moment, with the limited use-cases we explicitly target, I guess that we can compute such a hash (or a unique value if we really want to avoid any collision) by only considering:

    • each block identifier ("xzx", "zxz", "xozh", ...),
    • each block position (i.e., the position of its origin, that is uniquely defined for each block).

    These can be directly obtained from the SketchUp file representing the computation and should be:

    1. robust enough in the sense that if the computation does not change, the value should not change,
    2. sensitive enough to avoid representing two different computations with the same value.
  2. the value of k (determining the size of our logical qubits, and hence the code distance),

  3. the noise level, which might be tricky because of its floating-point representation. There are ways around it that I think should be satisfactory for this use case, e.g., representing the noise level e = powerOfTenMantissa * 10**(-negativePowerOfTen) as a tuple (powerOfTenMantissa, negativePowerOfTen), where 0 <= powerOfTenMantissa <= 1 can be represented as a fraction.
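The three-part primary key described above can be sketched as follows. Everything here is an assumption for illustration: the function name, the representation of blocks as (identifier, origin position) pairs, and the exact-fraction encoding of the noise level are not part of any existing tqec API:

```python
import hashlib
from fractions import Fraction


def experiment_key(
    blocks: list[tuple[str, tuple[int, int, int]]],
    k: int,
    noise: float,
) -> tuple[str, int, tuple[str, int]]:
    """Hypothetical primary key: (computation hash, k, exact noise level).

    `blocks` pairs each block identifier ("xzx", "zxz", ...) with its
    origin position; sorting makes the hash insensitive to input order,
    so an unchanged computation always maps to the same value.
    """
    digest = hashlib.sha256()
    for identifier, position in sorted(blocks):
        digest.update(f"{identifier}@{position};".encode())
    # Encode noise = powerOfTenMantissa * 10**(-negativePowerOfTen) exactly,
    # with the mantissa kept as a fraction in (0, 1] to sidestep
    # floating-point comparison issues: e.g. 0.001 -> ("1/10", 2).
    mantissa = Fraction(noise).limit_denominator(10**9)
    negative_power = 0
    while 0 < mantissa < Fraction(1, 10):
        mantissa *= 10
        negative_power += 1
    return (digest.hexdigest(), k, (str(mantissa), negative_power))
```

The hash is deliberately order-insensitive (robustness) while any change to a block identifier or position changes it (sensitivity).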

The stored data will have to include the outputs of Stim simulations (depending on what we need, direct measurements or detection events), and I think some metadata could be attached to such a value, such as:

  • date of data generation,
  • library versions used to generate the data,
  • custom annotations/tags provided by the user (e.g., "confidential", "internal use only", "public") to be able to filter out some data,
  • ...

In terms of format, and because the main data we will store is binary anyway, I do not have any strong preference: it can be anything (a real database, a file/folder-based storage, ...).

afowler commented Jul 26, 2024 via email

@nelimee nelimee added on hold Issues that are on hold for the moment, due to background work not pushed yet or lack of relevance. future Issues to track future development that is currently blocked by other issues. and removed on hold Issues that are on hold for the moment, due to background work not pushed yet or lack of relevance. labels Sep 3, 2024