Yatao Li edited this page Aug 7, 2017 · 1 revision

id: LINQ
title: Language-Integrated Query
permalink: /docs/manual/DataAccess/LINQ.html

Cell Enumeration

Cell selectors let us select all cells of a specific type in the local memory storage, exposed as an IEnumerable<CellType> or an IEnumerable<CellType_Accessor>. This interface provides only basic enumeration. By itself, an IEnumerable<T> is nothing more than a source that yields elements one after another, much like iterating over a whole database with a cursor in other database systems. It provides no indexer, so we cannot fetch an element by its position, and it has no rewind facility, so going back to revisit an element is impossible.
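To make the cursor analogy concrete, here is a minimal sketch in plain C# that uses nothing from GE. The hypothetical `FakeNodeSelector` iterator stands in for a cell selector such as `Global.LocalStorage.Node_Selector()`, and the enumerator is driven by hand to show that `MoveNext()` and `Current` are the only ways to reach the data. It also shows that an iterator written with `yield return` cannot be rewound: its `Reset()` throws `NotSupportedException`.

```csharp
using System.Collections.Generic;

public static class CursorDemo
{
    // Hypothetical stand-in for a GE cell selector: an iterator that
    // yields one element at a time and keeps no random-access buffer.
    public static IEnumerable<int> FakeNodeSelector()
    {
        yield return 10;
        yield return 20;
        yield return 30;
    }

    // Drives the enumerator manually, the way a foreach loop does.
    // There is no indexer: each MoveNext() advances the cursor by one
    // element until the source is exhausted.
    public static List<int> ReadForwardOnly()
    {
        var seen = new List<int>();
        using (IEnumerator<int> cursor = FakeNodeSelector().GetEnumerator())
        {
            while (cursor.MoveNext())
                seen.Add(cursor.Current);
        }
        return seen;
    }

    // Compiler-generated iterators do not support rewinding:
    // calling Reset() on them throws NotSupportedException.
    public static bool CanRewind()
    {
        using (IEnumerator<int> cursor = FakeNodeSelector().GetEnumerator())
        {
            cursor.MoveNext();
            try
            {
                cursor.Reset();
                return true;
            }
            catch (System.NotSupportedException)
            {
                return false;
            }
        }
    }
}
```

A GE selector may be implemented differently internally; the sketch only illustrates the forward-only contract of IEnumerable<T> that the paragraph describes.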

Enumerable Collection Operators

Custom logic can be performed on the cells while iterating through them. The .NET Framework provides a set of static methods for querying enumerable collections; for a complete list of query methods, refer to the System.Linq.Enumerable documentation on MSDN.

With the extension methods provided by System.Linq.Enumerable, we can use the cell selectors to manipulate data in a succinct style. Instead of writing data processing logic in a foreach loop, we can use the query interfaces to extract and aggregate information in a declarative way. For example, instead of writing:

var sum = 0;
foreach(var n in Global.LocalStorage.Node_Selector())
    sum += n.val;

We can simply write:

var sum = Global.LocalStorage.Node_Selector().Sum(n=>n.val);

Or:

var sum = Global.LocalStorage.Node_Selector().Select(n=>n.val).Sum();
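The three forms above compute the same value. The following self-contained sketch checks that on ordinary .NET collections; the `Node` class and the in-memory `Node_Selector()` method are hypothetical stand-ins for the TSL-generated cell type and the GE selector, so only the standard `Enumerable.Sum` and `Enumerable.Select` calls are real library APIs here.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a TSL cell type with an integer field `val`.
public class Node
{
    public int val;
    public Node(int v) { val = v; }
}

public static class SumDemo
{
    // In-memory substitute for Global.LocalStorage.Node_Selector().
    public static IEnumerable<Node> Node_Selector()
    {
        return new List<Node> { new Node(3), new Node(4), new Node(5) };
    }

    // Imperative version: the running total is explicit intermediate state.
    public static int SumWithLoop()
    {
        var sum = 0;
        foreach (var n in Node_Selector())
            sum += n.val;
        return sum;
    }

    // Declarative version: Enumerable.Sum with a selector hides the accumulator.
    public static int SumWithSelector() => Node_Selector().Sum(n => n.val);

    // Declarative version: project first, then sum the projected values.
    public static int SumWithProjection() => Node_Selector().Select(n => n.val).Sum();
}
```

All three methods return 12 for the sample data; the declarative versions simply leave the accumulator to the library.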

This keeps the code free of intermediate state (e.g., the sum variable in the foreach loop above) and of implementation details. In GE, the query execution engine can also apply certain optimizations automatically to leverage the indexes defined in TSL. Specifically, it inspects the filters, extracts the substring queries, and redirects them to the corresponding substring query interfaces generated by the TSL compiler. The basic rules of expression rewriting are as follows:

  • Select operators are not allowed to return accessors.

  • For a Where operator, if there is an invocation of String.Contains on a string field of a cell and the field is indexed, the invocation is sent to the inverted index module as a substring query.

  • If a string container field (such as a list of strings or an array of strings) is marked as indexed, the TSL compiler generates extension methods ContainerType.Contains that accept the same parameters as String.Contains. Invocations of these methods are also executed as inverted index queries.
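GE's actual rewriting code is not shown on this page. As a rough illustration of what "inspecting the filters" can involve, the sketch below uses the standard `System.Linq.Expressions.ExpressionVisitor` to find `String.Contains` calls on a field inside a predicate expression. The `Movie` type and the `SubstringQueryCollector` class are hypothetical; this is not GE's implementation, and it does not consult any index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical stand-in for a TSL cell type whose `Name` field is indexed.
public class Movie
{
    public string Name = "";
    public int Year;
}

// Walks a predicate and records each `field.Contains("literal")` call,
// i.e. the candidates that a rewriter could redirect to a substring index.
public class SubstringQueryCollector : ExpressionVisitor
{
    public List<KeyValuePair<string, string>> Found = new List<KeyValuePair<string, string>>();

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        // Match only the String.Contains(string) overload.
        bool isStringContains =
            node.Method.DeclaringType == typeof(string)
            && node.Method.Name == "Contains"
            && node.Arguments.Count == 1
            && node.Arguments[0].Type == typeof(string);

        // Record (field name, substring) when called on a field with a literal argument.
        if (isStringContains
            && node.Object is MemberExpression field
            && node.Arguments[0] is ConstantExpression literal)
        {
            Found.Add(new KeyValuePair<string, string>(field.Member.Name, (string)literal.Value));
        }

        // Keep walking so nested calls are visited too.
        return base.VisitMethodCall(node);
    }

    // Convenience entry point: collect substring queries from a filter.
    public static List<KeyValuePair<string, string>> Collect(Expression<Func<Movie, bool>> filter)
    {
        var collector = new SubstringQueryCollector();
        collector.Visit(filter);
        return collector.Found;
    }
}
```

For the filter `m => m.Name.Contains("star") && m.Year > 2000`, the collector reports the single pair `Name`/`star` and ignores the `Year` comparison. A real rewriter would also have to decide how matches combine under `&&` and `||`; that logic is omitted here.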

{% comment %} Note: This subsection covers some system implementation details, and you can safely skip it on a first reading.

GE translates a query on a selector into an action performed on every cell of a specific type. Logically, this is not much different from implementing the logic imperatively. However, when it finds certain patterns in the query expression, GE rewrites the expression for optimization.

Let's view a query expression as a chain S->E_1->E_2->...->E_n, where S denotes a selector and E_i denotes a query operator (a method from System.Linq.Enumerable). Let P denote the first Select operator in the chain. GE ignores all query operators after P, because after a Select operator the data is projected into something that is not defined in TSL (projecting an accessor to an accessor is not allowed), and is thus not covered by any substring index defined in TSL. Now let W_1, ..., W_m denote all the Where operators before P (not necessarily consecutive). These are all the conditional filters applied to the native cells (without projection into other types), so we combine them into the conjunction W_1 && W_2 && ... && W_m and regard this expression as a whole. GE then examines the expression and aggregates the String.Contains invocations on cell fields into an expression tree. All expressions under a NOT operator are ignored, because running a substring query and then taking its complement set would usually yield too many results to process, in which case it is better to skip the rewriting. {% endcomment %}

Language-Integrated Query (LINQ)

LINQ is a convenient way to query a data collection. Its expressive power is equivalent to that of the extension methods provided by the System.Linq.Enumerable class; LINQ is simply more convenient to use. The following example shows a LINQ query in GE next to its imperative equivalent:

/*==========================  LINQ version ==============================*/ 
var result = from node in Global.LocalStorage.Node_Accessor_Selector()
             where node.color == Color.Red && node.degree > 5             
             select node.CellID.Value;                                    
/*==========================  Imperative version ========================*/
var result = Global.LocalStorage.Node_Accessor_Selector()                      
            .Where(  node => node.color == Color.Red && node.degree > 5 )
            .Select( node => node.CellID.Value  );
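The equivalence of the two versions can be checked with ordinary LINQ to Objects. In this sketch, the `Color` enum, the `Node` class, and the in-memory `Nodes` list are hypothetical stand-ins for the TSL-defined types and for `Global.LocalStorage.Node_Accessor_Selector()`. Because the stand-in `CellID` is a plain `long`, the `.Value` from the original example is dropped.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the TSL-defined enum and cell type.
public enum Color { Red, Green }

public class Node
{
    public long CellID;
    public Color color;
    public int degree;
}

public static class SyntaxDemo
{
    // In-memory substitute for Global.LocalStorage.Node_Accessor_Selector().
    static readonly List<Node> Nodes = new List<Node>
    {
        new Node { CellID = 1, color = Color.Red,   degree = 7 },
        new Node { CellID = 2, color = Color.Green, degree = 9 },
        new Node { CellID = 3, color = Color.Red,   degree = 2 },
        new Node { CellID = 4, color = Color.Red,   degree = 6 },
    };

    // Query syntax: the compiler rewrites this into Where(...).Select(...).
    public static List<long> WithQuerySyntax() =>
        (from node in Nodes
         where node.color == Color.Red && node.degree > 5
         select node.CellID).ToList();

    // Extension-method syntax: the same calls written out explicitly.
    public static List<long> WithMethodSyntax() =>
        Nodes.Where(node => node.color == Color.Red && node.degree > 5)
             .Select(node => node.CellID)
             .ToList();
}
```

Both methods return the IDs 1 and 4 for the sample data.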

Both versions compile to the same code: each clause of the LINQ expression is mapped one-to-one onto the corresponding method of the System.Linq.Enumerable class. LINQ, however, lets us write cleaner code. For example, an imperative equivalent of the following LINQ expression would require a nested lambda expression:

 var positive_feedbacks = from user in Global.LocalStorage.User_Accessor_Selector()
                          from comment in user.comments
                          where comment.rating == Rating.Excellent
                          select new 
                          {
                            uid = user.CellID,
                            pid = comment.ProductID
                          };
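For comparison, here is a sketch of the nested-lambda form of the `positive_feedbacks` query, written against in-memory data. The `Rating`, `Comment`, and `User` types are hypothetical stand-ins for the TSL-defined types (in GE, `user.comments` would be reached through an accessor), and the anonymous type is replaced by a named tuple so that the method can return it. The second `from` clause becomes `SelectMany`, and the inner lambda has to capture `user` from the outer one.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the TSL-defined types in the example.
public enum Rating { Poor, Good, Excellent }

public class Comment
{
    public long ProductID;
    public Rating rating;
}

public class User
{
    public long CellID;
    public List<Comment> comments = new List<Comment>();
}

public static class FeedbackDemo
{
    // Imperative equivalent of the two-`from` query: SelectMany flattens
    // the per-user sequences produced by the nested lambda.
    public static List<(long uid, long pid)> PositiveFeedbacks(IEnumerable<User> users) =>
        users.SelectMany(user =>
                 user.comments
                     .Where(comment => comment.rating == Rating.Excellent)
                     // The inner lambda still needs `user` from the outer one.
                     .Select(comment => (uid: user.CellID, pid: comment.ProductID)))
             .ToList();
}
```

The query-syntax version hides this nesting, which is the readability gain the paragraph refers to.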

Parallel LINQ (PLINQ)

PLINQ (see MSDN) is a parallel implementation of LINQ. It runs a query on multiple processors simultaneously whenever possible. Calling AsParallel() on a selector turns it into a parallel enumerable container that works with PLINQ.
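The effect of AsParallel() can be seen with standard PLINQ on an in-memory range. This sketch does not use GE: `Enumerable.Range` stands in for a cell object selector (per the limitations below, PLINQ does not work with accessor selectors), and the query itself is an arbitrary example.

```csharp
using System.Linq;

public static class PlinqDemo
{
    // Sequential LINQ: sum of the multiples of 3 in 1..n.
    public static long SequentialSum(int n) =>
        Enumerable.Range(1, n).Where(x => x % 3 == 0).Sum(x => (long)x);

    // The same query after AsParallel(): PLINQ partitions the source,
    // runs Where/Sum on several threads, and combines the partial sums.
    public static long ParallelSum(int n) =>
        Enumerable.Range(1, n).AsParallel().Where(x => x % 3 == 0).Sum(x => (long)x);
}
```

Aggregates such as Sum give the same result either way. For queries that return sequences, PLINQ does not preserve the source order unless AsOrdered() is requested.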

{% comment %} However, due to the limitations described below, PLINQ cannot be used over cell accessors natively, so the AsParallel() interface of cell accessor selectors is overridden and returns a Trinity.Linq.PLINQWrapper that delays an unsupported PLINQ query to the next query operator (until it is supported). {% endcomment %}

Limitations

IEnumerable<T> has a limitation: IDisposable elements are not disposed of during enumeration. However, disposing of a cell accessor after use is crucial in GE; a cell accessor that is never disposed leaves its target cell locked permanently.

This led to a design decision in GE: a cell accessor is actively disposed as soon as the user code finishes using it in the enumeration loop. As a result, user code must not capture the value or reference of an accessor during an enumeration and store it somewhere for later use. Because the reference is destroyed and the value is invalidated immediately after the loop body, any operation on the stored value or reference will cause data corruption or a system crash. This is the root cause of the following limitations:

  • The Select operator cannot return cell accessors, because each accessor is disposed as soon as the loop body finishes with it.

  • LINQ operators that cache elements (such as join and group by) are not supported.

  • PLINQ caches some elements and then distributes them to multiple cores, so it does not work with cell accessors. It does work with cell object selectors, though.

  • Although an enumeration does not block the whole database, it does take trunk-level locks. Compound LINQ selectors with join operations are not supported, because the inner loop would try to acquire a trunk lock that the outer loop already holds.
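The disposal behavior behind these limitations can be imitated in a few lines of plain C#. This is a minimal sketch under stated assumptions, not GE's code: `FakeAccessor` is a hypothetical disposable stand-in for a cell accessor, and `AccessorSelector` is a hypothetical iterator that disposes each element once the consumer moves on, mimicking the "dispose after the loop body" rule described above. It shows that a plain foreach never disposes elements, that projecting a value out inside the loop is safe, and that caching the elements themselves (as Select-returning-accessors, join, or group by would) leaves only disposed objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a cell accessor: usable only until disposed.
public class FakeAccessor : IDisposable
{
    private readonly int value;
    public bool IsDisposed { get; private set; }
    public FakeAccessor(int value) { this.value = value; }

    public int Value
    {
        get
        {
            if (IsDisposed) throw new ObjectDisposedException(nameof(FakeAccessor));
            return value;
        }
    }

    public void Dispose() => IsDisposed = true;
}

public static class DisposalDemo
{
    // Plain IEnumerable<T>: foreach never disposes the elements it yields.
    public static bool ForeachLeavesElementsUndisposed()
    {
        var items = new List<FakeAccessor> { new FakeAccessor(1), new FakeAccessor(2) };
        foreach (var item in items) { _ = item.Value; }
        return items.All(i => !i.IsDisposed);
    }

    // Hypothetical selector: each accessor is disposed when the consumer
    // asks for the next element (or when enumeration stops early).
    public static IEnumerable<FakeAccessor> AccessorSelector()
    {
        for (int i = 1; i <= 3; i++)
        {
            var accessor = new FakeAccessor(i);
            try { yield return accessor; }
            finally { accessor.Dispose(); }
        }
    }

    // Safe: the value is read while the accessor is still alive.
    public static List<int> ProjectValues() =>
        AccessorSelector().Select(a => a.Value).ToList();

    // Unsafe: storing the accessors themselves keeps only disposed objects.
    public static List<FakeAccessor> CacheAccessors() =>
        AccessorSelector().ToList();
}
```

Reading `Value` from any element returned by `CacheAccessors()` throws `ObjectDisposedException` in this sketch; in GE, the corresponding misuse is what leads to data corruption or a crash.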
