v2.0.0 Proposal #1495

Open
3 tasks done
mschoch opened this issue Nov 10, 2020 · 11 comments

Comments

@mschoch
Contributor

mschoch commented Nov 10, 2020

Last year Bleve adopted Go modules and released v1.0.0. Since that time, we have made several incremental fixes and improvements, but the time has come to release v2.0.0. This issue will describe the scope and impact of this change.

The code for these changes can be reviewed here: #1494

Why?

Releasing a new major version brings with it some pain for users, as it implies some breaking changes to the project's APIs. We do not take this lightly, but we also believe it is sometimes appropriate for the health of the project.

When we released v1.0.0 we were interested in making a release which was largely compatible with the previous pre-modules versions of Bleve. With this in mind, we limited the number of things we moved around. This led to an unfortunate side effect we did not properly appreciate at the time. Although we broke the zap segment code out into a separate module (allowing major versions to represent file format changes at that level), the bleve and zap modules continued to depend on one another. Go modules allow circular dependencies, provided that the underlying packages themselves do not contain circular dependencies. As we later found out, even though this is allowed, it creates significant problems when making releases. Right now, making any release of zap or bleve takes approximately 2 hours. During this period some things may not work, and any mistake made along the way often requires starting over.

Further, we know several of you out there are required to maintain your own fork of Bleve, and you all have to navigate these same issues on your own. This situation has become unacceptable, and fixing it is the primary motivation of the 2.0.0 release. Fixing this requires moving some additional pieces around, which is then a breaking change, hence requiring this be 2.0.0 following the semantic versioning guidelines.

Summary of Changes

  • Remove circular dependency between Bleve and Zap modules
  • Make Scorch and Zap v15 the default index/segment type when using the New() method
  • New option to disable freq/norm information for a field
  • Types corrected for MatchQueryOperatorOr and MatchQueryOperatorAnd (see #1410, "MatchOperator should be that type instead of int")
  • Deprecate upsidedown index format and all key/value adapters
  • Deprecate HTTP sub-package
  • Deprecate the bleve sub-commands: bulk, create, index, dump
  • Deprecate the config sub-package

Changes in Detail

Bleve/Zap Circular Dependency

See the Why? section above to understand the motivation for this change.

Bleve obviously has to depend on scorch and zap, as these are the primary implementations. So the fix to this problem was to rearrange the "shared" interface packages to be their own separate modules. This allows zap to refer to these shared packages, mostly containing interfaces and support structures, without referring back to all of bleve.

The release candidates for these API modules can be found here:

Further, we needed new versions of zap which would use these modules, and not bleve. Unfortunately, in zap, we primarily use the major version to indicate file format compatibility. However, since some of the shared data structures have moved, it wasn't possible for them to remain compatible with bleve 1.x and 2.x at the same time. The solution we arrived at was to clone the zap repository and name it zapx. All of the major version branches are intact, and the segment file format version numbers remain the same; the only difference is that these are compatible with bleve 2.x. We can continue to make bug fixes for bleve 1.x in the zap repository.

We understand this is confusing, and we hope to not need to use this trick again, now that we better appreciate the problems of circular module dependencies.

Going forward, our plan is for the bleve_index_api and scorch_segment_api to remain at 1.x for a long time. If changes are required, they will be done in a backwards compatible way, using the same techniques we used prior to Go modules. Essentially, we will limit changes to new top-level identifiers and discoverable optional interfaces. By doing this, we enable both Bleve and Zap to introduce new major versions independently.
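
As a rough illustration of the "discoverable optional interface" technique, here is a minimal sketch using only the standard library. The names Index, StatsReporter, basicIndex and richIndex are hypothetical stand-ins invented for this example; they are not the real bleve_index_api types. The idea is that the stable interface never changes, and new capabilities are added as separate interfaces that callers discover with a type assertion.

```go
package main

import "fmt"

// Index is a hypothetical stand-in for a stable, published v1 interface.
// It is never modified after release.
type Index interface {
	Name() string
}

// StatsReporter is a hypothetical optional interface added later.
// Adding it is a new top-level identifier, so existing code keeps compiling.
type StatsReporter interface {
	DocCount() int
}

// basicIndex implements only the original interface.
type basicIndex struct{}

func (basicIndex) Name() string { return "basic" }

// richIndex also implements the optional interface.
type richIndex struct{ docs int }

func (richIndex) Name() string    { return "rich" }
func (r richIndex) DocCount() int { return r.docs }

// describe discovers the optional capability at run time via a type assertion
// and falls back gracefully when the implementation does not offer it.
func describe(idx Index) string {
	if sr, ok := idx.(StatsReporter); ok {
		return fmt.Sprintf("%s: %d docs", idx.Name(), sr.DocCount())
	}
	return fmt.Sprintf("%s: doc count unavailable", idx.Name())
}

func main() {
	fmt.Println(describe(basicIndex{}))
	fmt.Println(describe(richIndex{docs: 3}))
}
```

Older implementations that know nothing about the optional interface continue to satisfy Index unchanged, which is what allows the API module to stay at 1.x.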

New() defaults to Scorch/Zap v15

Early on in the 1.x planning, we recognized that the default behaviors of methods in the API like New() were essentially part of the API, as programs would break if they were changed. With that in mind, we left upsidedown/boltdb the default index type in v1.0.0.

However, this has led to considerable confusion, as new users routinely run into performance issues, and our first recommendation is to use scorch/zap.

Bleve 2.0.0 will be configured for New() to create a scorch index using zap v15, as this represents the best implementation at this time.

As always, we recommend production applications use the NewUsing(...) method, and explicitly configure the type/version they expect.

Option to Disable Freq/Norm

The index mapping has always offered ways to turn optional parts of the index, like term-vectors and doc-values, on or off. However, we have now extended this to allow disabling the frequency and norm information itself, offering additional space savings for indexes that do not require it. Currently zap v15 only offers a limited size reduction, but we hope newer formats may take better advantage of this setting.

Deprecate Index Type Upsidedown and Related Key/Value Storage Adapters

The upsidedown index format was part of the original design of bleve. It has served us well for a long time, and while it has not been the primary focus of developers for some time, we have continued to support its use. However, a significant gap has emerged in supporting the key/value stores over time. In some cases, compilation has been broken, or it only works with specific versions that are not documented.

Further, as we continue to advance the scorch index, it becomes increasingly difficult to support all functionality on all index types. We understand the importance of supporting indexes over time, and we feel we have done that with upsidedown. We are not removing support yet, but now is the time for users to find a path forward. If you have not yet tried scorch, you should. If you have use cases that are not well supported by scorch, now is the time to speak up and let us know. We may remove support for upsidedown in the future.

For most users, reindexing the original data with scorch is the recommended solution. However, if any users wish to explore index conversion, reach out to us.

Known Limitations of Scorch

There are some known limitations of scorch we hope to fix before removing support for upsidedown:

  • Does not support in-memory only mode
  • Does not allow for online-backup
  • Does not allow for multi-process simultaneous read-only access with active writer (works with upsidedown/rocksdb)

Deprecate HTTP sub-package

Bleve has included a set of HTTP handlers to wrap each of the basic indexing/search operations. In practice, these handlers are not useful on their own. Any real application must extend these with their own security/business logic, and thus these serve only as examples.

Based on this, we have decided to deprecate their use, and they may be removed in the future.

Deprecate Bleve Command Line Sub-Commands: bulk, create, index, dump

The bulk, create, and index commands are all extremely limited because they assume, and only work with, the default index mapping. This represents a very narrow use case, and thus these commands are not useful in practice.

The dump command is only implemented by the upsidedown index type (scorch has its own debug mechanisms inside the scorch sub-command).

For these reasons these sub-commands are being deprecated and may be removed in the future.

Deprecate the config sub-package

The config package offered several blevex imports hidden behind a common set of build tags. These have been removed immediately as they contribute to the circular imports issue (bleve->blevex->bleve). Applications that relied on these build-tag-hidden imports should be able to easily copy/paste these files into their own code.

The config sub-package also offers a convenience mechanism for bulk-importing many common bleve components. This remains for now, but is deprecated, and may be removed in the future.

Additional Considerations

Blevex

The blevex package contains several addons to bleve. Many of these are K/V store adapters which will be deprecated, but we are updating them to continue working for now. The proposed changes to update them for bleve 2.x are here:

blevesearch/blevex#50

Remaining Work

  • Each of the API interfaces will be tagged v1.0.0 and where required point to v1.0.0 of each other
  • All dependent repos will be updated to point to v1.0.0 of the APIs
  • There will be a final additional commit added to the PR before merging. This will declare the module version as 2, and will update all the internal references to bleve to use the correct version.

Summary of API Changes

bleve top-level

  • ErrorUnknownStorageType has been moved to the index/upsidedown package
  • Index.Document() method return type changed to use bleve_index_api Document interface
  • Index.Advanced() method now returns 2 values instead of 3. The k/v store is no longer returned. If one needs this, one must type assert the internal index to upsidedown, and then access its Advanced() method.

bleve/analysis

  • TokenLocation moved to bleve_index_api
  • TokenFreq moved to bleve_index_api
  • TokenFrequencies moved to bleve_index_api

bleve/document

  • Document.ID string changed to func() string
  • IndexingOptions moved to bleve_index_api
  • Field Option() method return type updated for other type move
  • Field Analyze() method changed to not return anything (previous return values accessible via new methods)
  • All field implementations updated to satisfy Field interface changes
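
To illustrate what the Document.ID change means for calling code, here is a minimal standard-library sketch. The Document interface and memDocument type below are hypothetical stand-ins written for this example, not the real bleve types. Go interfaces can only declare methods, not fields, so exposing the ID through a method is what lets documents be passed around behind an interface.

```go
package main

import "fmt"

// Document is a hypothetical stand-in for an interface-based document type.
// An interface cannot expose a field, so the ID is exposed as a method.
type Document interface {
	ID() string
}

// memDocument is a hypothetical concrete implementation that keeps
// its identifier in an unexported field.
type memDocument struct {
	id string
}

// ID satisfies the Document interface.
func (d *memDocument) ID() string { return d.id }

// logLine accepts any Document. Code that previously read a field
// (doc.ID) would now call the method (doc.ID()) instead.
func logLine(d Document) string {
	return fmt.Sprintf("indexed document %s", d.ID())
}

func main() {
	fmt.Println(logLine(&memDocument{id: "doc-1"}))
}
```

For most callers the migration is likely a mechanical change from a field access to a method call.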

bleve/index/scorch

  • Scorch.Update() method now takes bleve_index_api Document interface
  • Advanced method removed
  • Analyze method no longer exported
  • ResetPlugins renamed ResetSegmentPlugins
  • RegisterPlugin renamed RegisterSegmentPlugin
  • RegisterSegmentPlugin now works with scorch package's SegmentPlugin interface
  • IndexSnapshot.FieldDictOnly() method removed
  • IndexSnapshot.Document() now returns bleve_index_api Document interface
  • IndexSnapshot.DocumentVisitFieldTerms() removed
  • SegmentSnapshot.VisitDocument now takes scorch_segment_api StoredFieldValueVisitor

bleve/index/upsidedown

  • Analyze method no longer exported
  • IndexReader.Document() now returns bleve_index_api Document interface
  • IndexReader.DocumentVisitFieldTerms no longer exported
  • UpsideDownCouch.UpdateWithAnalysis method now takes bleve_index_api Document interface

bleve/mapping

  • FieldMapping.Options() now returns bleve_index_api FieldIndexingOptions

bleve/search/highlight

  • Highlighter.BestFragmentInField now takes bleve_index_api Document interface
  • Highlighter.BestFragmentsInField now takes bleve_index_api Document interface
  • Simple Highlighter updated to satisfy interface change

bleve/search/query

  • MatchQueryOperatorOr constant changed to type MatchQueryOperator
  • MatchQueryOperatorAnd constant changed to type MatchQueryOperator
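
The benefit of giving these constants a named type can be shown with a small standard-library sketch. Operator, OperatorOr, OperatorAnd and describeMatch are hypothetical names invented for this example and only mirror the shape of the change; they are not the bleve identifiers. With a named type, a plain int variable can no longer be passed where an operator is expected without an explicit conversion.

```go
package main

import "fmt"

// Operator is a hypothetical named type standing in for a typed query operator.
type Operator int

// The constants carry the Operator type rather than being plain ints.
const (
	OperatorOr Operator = iota
	OperatorAnd
)

// String gives the operator a readable form for logs and errors.
func (o Operator) String() string {
	switch o {
	case OperatorOr:
		return "or"
	case OperatorAnd:
		return "and"
	default:
		return fmt.Sprintf("unknown(%d)", int(o))
	}
}

// describeMatch only accepts an Operator, so the compiler rejects int variables.
func describeMatch(field string, op Operator) string {
	return fmt.Sprintf("match on %q with operator %s", field, op)
}

func main() {
	fmt.Println(describeMatch("title", OperatorAnd))

	// var raw int = 1
	// describeMatch("title", raw)           // does not compile: int is not Operator
	// describeMatch("title", Operator(raw)) // an explicit conversion is required
}
```

Code that stored these constants in int variables would need an explicit conversion after such a change, which is why it counts as a breaking change.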

bleve_index_api

  • AnalysisResult removed
  • AnalysisWork changed to func()
  • Batch.IndexOps now map[string]Document (this package's Document interface)
  • Batch.Update() now takes this package's Document interface
  • DocumentFieldTermVisitor renamed DocValueVisitor
  • Index.Update() takes this package's Document interface
  • Index.Stats() method removed
  • Index.Advanced() method removed
  • Index.Analyze() method removed
  • Index.DumpAll() method removed
  • Index.DumpDoc() method removed
  • Index.DumpFields() method removed
  • Regexp interface removed
  • IndexReaderOnly interface removed
  • FieldTerms type removed

scorch_segment_api

  • ParseRegexp no longer exported
  • LiteralPrefix no longer exported
  • IncrementBytes no longer exported
  • Plugin interface removed
  • DocumentFieldValueVisitor renamed StoredFieldValueVisitor
  • NewUnadornedPostingsIteratorFromBitmap removed
  • UnadornedPostingIteratorBitmap removed
  • NewUnadornedPostingsIteratorFrom1Hit removed
  • UnadornedPostingIterator1Hit removed
  • UnadornedPosting removed
  • EmptySegment removed
  • EmptyDictionary removed
  • EmptyDictionaryIterator removed
  • EmptyPostingsList removed
  • EmptyPostingsIterator removed
  • AnEmptyPostingsIterator removed
  • MaxVarintSize removed
  • IntMin removed
  • IntMax removed
  • EncodeUvarintAscending removed
  • DecodeUvarintAscending removed
  • MemUvarintReader removed
  • NewMemUvarintReader removed
  • ErrMemUvarintReaderOverflow removed
  • Segment.VisitDocument() renamed VisitStoredFields
  • TermDictionary.Iterator() removed
  • TermDictionary.PrefixIterator() removed
  • TermDictionary.RangeIterator() removed
  • TermDictionary.OnlyIterator() removed
  • DocumentFieldTermVisitable renamed DocValueVisitable
  • DocumentFieldTermVisitable.VisitDocumentFieldTerms renamed DocValueVisitable.VisitDocValues

Frequently Asked Questions

I'm confused, what are scorch and upsidedown?

Bleve supports plugging in different index implementations (conforming to a common interface). The original index implementation offered by bleve was named upsidedown. This particular implementation also supports performing storage using pluggable key/value storage implementations.

Scorch is a newer index implementation, developed almost 3 years ago now. Scorch performs its own storage in files on the filesystem and does not use key/value storage.

Scorch has been used successfully in production by multiple companies for several years now. It has performance advantages and is the focus of current and future development inside bleve. We have continued to support upsidedown indexes, as many users have these indexes deployed in production. But, we can no longer provide adequate support, and we encourage all users to migrate to scorch at this time.

Should I upgrade to Bleve v2.0.0 or use Bluge?

  • Is your application in production? If yes, you should upgrade to Bleve v2.0.0 as Bluge is still in developer preview.
  • Are you basically happy with Bleve? If yes, you should upgrade to Bleve v2.0.0 as for most users the changes required are minimal.
  • Are you unhappy with Bleve? Then you should take a look at Bluge and see if its changes address your concerns.

@mschoch mschoch changed the title v2.0.0 Proposal (DRAFT) v2.0.0 Proposal Jan 5, 2021
@amnonbc
Contributor

amnonbc commented Jan 6, 2021

It would be useful to have some guidance about whether it is best to upgrade to v2.0.0, or to bluge.

Thanks,
Amnon

@mschoch
Contributor Author

mschoch commented Jan 6, 2021

@amnonbc I have added an FAQ section to the issue description and included a response to this there.
