How to backup scorch online? #1396

darren opened this issue May 12, 2020 · 12 comments
darren commented May 12, 2020

Is there any way to back up a scorch index online, i.e. take a snapshot of the current scorch index without closing it and copy the snapshot to another location?

mschoch commented May 12, 2020

Currently, no. We are aware this is a big limitation.

jkassis commented Aug 18, 2020

#213

jkassis commented Aug 18, 2020

So the current best practice is to close the index and tar/copy the folder from disk?

jkassis commented Aug 19, 2020

@mschoch

mosuka/blast is the only open source project I can find that attempts to use bleve with Raft. The technique used there for snapshotting/backups is to iterate through index items with index.Index.DocIDReaderAll. Is that the right approach for a scorch index? (mosuka/blast#140)

What's your preferred approach to snapshots of a scorch index in the Raft use case? I can think of a few...

  1. Close the index and tar the disk.
  2. Close the index, re-open it, block writes, and stream the results of DocIDReaderAll to disk.
  3. Block writes and stream the results of DocIDReaderAll. Would everything be flushed at this point? Would this be more efficient than a tar operation?
  4. Don't block writes and stream the results of DocIDReaderAll to disk. This is the approach of mosuka/blast, and it is clearly not a point-in-time snapshot/backup.
  5. Use Snapshots. This seems like the preferred approach, but navigating it is beyond me without much more effort...


jkassis commented Aug 19, 2020

Dump looks like it only works for UpsideDownCouchRow... Does it work for scorch?


mschoch commented Aug 19, 2020

Dump was only suitable for debugging; it was never intended for backup (and it had the same limitation of needing exclusive access to the k/v store).

Generally, online backup should be more straightforward to implement for scorch. All of the index data is written to immutable segments, meaning that if you can guarantee they won't go away for a period, you can simply copy them. Now, that is just the raw index data; you also need the index metadata to know which set of files represents a snapshot, along with the deleted bitmaps associated with them. This is the part that is still sub-optimal in scorch, because we're using BoltDB for storing the metadata, and BoltDB comes with limitations on exclusive access.

So, due to that, an out-of-process backup isn't really doable in the short term. But, the existing read/write scorch instance could reasonably have a method to Backup() into some other location.

Unfortunately, it won't be as simple as just adding it to the Reader method, because even though that represents a logical snapshot of the index, some of the segments in it may still be in memory.

Instead, it would probably need to be a new top-level method that finds the most recent fully-persisted snapshot in the root bolt and copies it, along with root.bolt itself, into the new location.

I don't think it's a whole lot of work, roughly:

  • bump ref counts and/or protect files from deletion for the duration of the operation (trickiest part)
  • copy segment files
  • use boltdb's tx.WriteTo(...) to copy metadata
  • release ref counts and/or protected file lists

jkassis commented Aug 19, 2020

Thanks Marty, that's helpful. Would it be possible / convenient to get a snapshot of the current state of the index and then wait for the segments to become persistent before proceeding with the rest of the backup as you described?

jkassis commented Aug 20, 2020

@mschoch Digging deeper... it seems to me that if unsafe_batch is configured to false, calling Reader should produce a snapshot with all segments persisted to disk. Does that make sense? Agree?

mschoch commented Aug 20, 2020

@jkassis

Would it be possible / convenient to get a snapshot of the current state of the index and then wait for the segments to become persistent before proceeding with the rest of the backup as you described?

Unfortunately no. The way Bleve works, when those in-memory segments eventually get persisted, they are introduced as a new snapshot. This simplifies the code in most places, because once you get a snapshot, it's immutable.

It seems to me that if unsafe_batch is configured to false, calling Reader should produce a snapshot with all segments persisted to disk. Does that make sense? Agree?

No, unsafe_batch just affects whether a call to Batch() blocks until that data is on disk. In a live system, at the moment you call Reader() there could still be a segment in memory.

jkassis commented Aug 21, 2020

@mschoch Thanks again for your helpful response. I decided to go white box and thoroughly reviewed the Scorch.Batch, introducerLoop, and introduceSegment. I updated the README.md and renamed some variables to make it easier to read.

PR is here: #1452

Near the end of that review... I ran into the Scorch.rootPersisted channel. This seems promising. A client that wants a backup could stop write traffic to the index while waiting for the rootPersisted channel to close. That should solve Part A... finding a persisted snapshot. Agree?

I'm still looking to see where that is closed. Maybe I'll find it tomorrow.

At that point, the Backup function follows your outline.

  • bump the ref counts on the IndexSnapshot.
From your comment above, it sounds like you feel that might not be enough. Yes/no?
  • access meta-data for the snapshot
  • stream files
  • stream metadata
  • release ref counts

When recreating the DB from the backup, however... I'm worried that perhaps not all the required metadata will be rehydrated. Should I save off more than the snapshot data? Or is that it?

This might not be a fully online backup per se, since we need to stop the world until the current root is persisted, but it might be a good place to start. Just demonstrating backup/recovery would be a win IMO, and this might be just enough for me.

jkassis commented Aug 21, 2020

I'm working on a PR now. It doesn't look for a persisted snapshot... it just writes snapshot data to a writer.

mschoch commented Aug 21, 2020

I'm doing my best, but I cannot provide feedback at the pace you're moving forward.

I think you're going down the wrong path in trying to block updates; it should not be necessary. Aside from the uninteresting case of a brand-new index, there is always at least one fully-persisted snapshot on disk. Backup should just add a ref count to that and/or prevent its files from deletion (we have some additional maps used for that), and then back up that already-on-disk snapshot.
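The ref-count/protect-from-deletion idea can be sketched with a simple guard. The names here are hypothetical; scorch's real accounting (per-snapshot ref counts plus protected-file maps) is more involved:

```go
package main

import (
	"fmt"
	"sync"
)

// fileGuard is a minimal sketch of the "prevent its files from
// deletion" idea: a backup AddRefs the snapshot's files, and the
// cleanup path skips any file whose count is non-zero.
type fileGuard struct {
	mu   sync.Mutex
	refs map[string]int
}

func newFileGuard() *fileGuard {
	return &fileGuard{refs: map[string]int{}}
}

func (g *fileGuard) AddRef(files ...string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	for _, f := range files {
		g.refs[f]++
	}
}

func (g *fileGuard) DecRef(files ...string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	for _, f := range files {
		if g.refs[f] > 0 {
			g.refs[f]--
		}
	}
}

// Deletable reports whether the cleanup path may remove f right now.
func (g *fileGuard) Deletable(f string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.refs[f] == 0
}

func main() {
	g := newFileGuard()
	g.AddRef("seg1.zap")                 // backup begins
	fmt.Println(g.Deletable("seg1.zap")) // false
	g.DecRef("seg1.zap")                 // backup done
	fmt.Println(g.Deletable("seg1.zap")) // true
}
```

With this in place, the backup never races the merger/persister deleting obsolete segments: it pins the latest persisted snapshot's files, copies them, then releases.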

jkassis mentioned this issue Nov 24, 2020