Possible concurrency issues with caches? #2117

Open
JamalQasem opened this issue Oct 2, 2024 · 8 comments

@JamalQasem

We are using SqlKata/Dapper (2.1.35) to implement a Web API for a Postgres database with
around 20 tables. The API contains methods to query and manipulate data. It does not use
stored procedures. We test the repository methods of the API with around
200 unit tests, which run transactions against a test database and roll them back instead of committing.

Recently one of our tests started failing with the following error message:
Npgsql.PostgresException : 23503: insert or update on table "___" violates foreign key constraint "___"

We then annotated the offending test case with

[TestFixture, RequiresThread(ApartmentState.STA)]

and the problem vanished. All our repository methods rely on functions like

    protected async Task<IEnumerable<TEntity>> QueryAsync(Query query, IDbTransaction transaction, CancellationToken token)
    {
        var command = CompileQuery(query, transaction, token);
        return await transaction.Connection!.QueryAsync<TEntity>(command).ConfigureAwait(false);
    }

    private async Task<int> ExecuteAsync(Query query, IDbTransaction transaction, CancellationToken token)
    {
        var command = CompileQuery(query, transaction, token);
        return await transaction.Connection!.ExecuteAsync(command).ConfigureAwait(false);        
    }

We first checked that the queries use different connections from the connection pool and experimented
with the command flag NoCache, to no avail.
We finally decided to serialize access using a semaphore.
This, together with a call to SqlMapper.PurgeQueryCache(), made the failing test pass again under multithreading.

The current implementation looks as follows:

    protected async Task<IEnumerable<TEntity>> QueryAsync(Query query, IDbTransaction transaction, CancellationToken token)
    {
        await _semaphore.WaitAsync().ConfigureAwait(false);
        try
        {
            var command = CompileQuery(query, transaction, token);
            return await transaction.Connection!.QueryAsync<TEntity>(command).ConfigureAwait(false);
        }
        finally
        {
            SqlMapper.PurgeQueryCache();
            _semaphore.Release();
        }
    }
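(The _semaphore field is not shown above; presumably it is a single-slot SemaphoreSlim shared by the repository, along the lines of:)

    // Assumed declaration of the field used above: admits one caller at a time.
    private static readonly SemaphoreSlim _semaphore = new(1, 1);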

We suspect that we somehow manage to use different statements which map
to the same cache Identity, so that concurrently running queries may use
a wrong cache entry. This leaves us with two unanswered questions:

  • Did anyone run into the same issue?
  • Is there a way to completely avoid caching?
@mgravell
Member

mgravell commented Oct 2, 2024

Hmm; this is odd. There are ways to bypass all the ref-emit caching; the one I'd say "try first", however, is the DapperAOT work. This is a little hard to opine on in this specific case because of your Query type, which is an unknown to me. The other option is the NoCache flag, which you say you have already been using.
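(For readers following along: the NoCache flag is passed per command via CommandDefinition; a minimal sketch, where the SQL, the transaction/token variables, and the User result type are purely illustrative:)

    // Minimal sketch: opt a single command out of Dapper's query cache.
    // "transaction", "token" and the User POCO are illustrative placeholders.
    var command = new CommandDefinition(
        "select id, name from users",
        transaction: transaction,
        flags: CommandFlags.NoCache,
        cancellationToken: token);
    var rows = await transaction.Connection!.QueryAsync<User>(command);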

However, I wonder whether we've ruled out simpler causes, such as the data genuinely violating FK rules for data reasons, unrelated to Dapper.

@goerch mentioned this issue Oct 3, 2024
@JamalQasem
Author

Thank you very much for your swift reply.

Hmm; this is odd. There are ways to bypass all the ref-emit caching; the one I'd say "try first", however, is the DapperAOT work.

As far as we know, DapperAOT does not support all of the functionality that Dapper provides.

This is a little hard to opine on in this specific case because of your Query type, which is an unknown to me.

The Query type in use originates from the SqlKata query builder; it generates a SQL command string which is injected into the
Dapper command definition like so:

    var statement = _compiler.Compile(query).ToString();
    return new CommandDefinition(statement, transaction: transaction,
            cancellationToken: token, flags: flags);
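(Presumably this is the body of the CompileQuery helper referenced in the earlier snippets; a reconstructed sketch, with the signature and the flags parameter assumed:)

    // Assumed reconstruction of the CompileQuery helper used by QueryAsync/ExecuteAsync above.
    private CommandDefinition CompileQuery(Query query, IDbTransaction transaction,
        CancellationToken token, CommandFlags flags = CommandFlags.Buffered)
    {
        var statement = _compiler.Compile(query).ToString();
        return new CommandDefinition(statement, transaction: transaction,
            cancellationToken: token, flags: flags);
    }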

The other option is the NoCache flag, which you say you have already been using.

NoCache seems to work for us with PR #2118.

However, I wonder whether we've ruled out simpler causes, such as the data genuinely violating FK rules for data reasons, unrelated to Dapper.

The FK violation is the result of the previous call to MultiMapAsync silently returning wrong data.

@goerch
Contributor

goerch commented Oct 5, 2024

We suspect that we somehow manage to use different statements which map
to the same cache Identity so that concurrently running queries may use
a wrong cache entry.

@mgravell: we were able to reduce the failing scenario to two test cases running the same statement concurrently. Unfortunately these seem to have identical Dapper Identity hash codes although the associated DbDataReaders are different. It looks to me as if SqlMapper's GetColumnHash() computation doesn't incorporate the actual reader? We would be grateful for advice on how to proceed.

@JamalQasem
Author

Unfortunately these seem to have identical Dapper Identity hash codes although the associated DbDataReaders are different. It looks to me as if SqlMapper's GetColumnHash() computation doesn't incorporate the actual reader?

A change like

        private static int GetColumnHash(DbDataReader reader, int startBound = 0, int length = -1)
        {
            unchecked
            {
                int max = length < 0 ? reader.FieldCount : startBound + length;
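                // Proposed change: mix the current thread's hash code into the seed so that
                // readers on different threads can never share a cached deserializer.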
                int hash = -37 * (Thread.CurrentThread.GetHashCode() + startBound) + max;
                for (int i = startBound; i < max; i++)
                {
                    object tmp = reader.GetName(i);
                    hash = (-79 * ((hash * 31) + (tmp?.GetHashCode() ?? 0))) + (reader.GetFieldType(i)?.GetHashCode() ?? 0);
                }
                return hash;
            }
        } 

passes the Dapper test suite and seems to solve our original problem.

@mgravell
Member

mgravell commented Oct 7, 2024

@JamalQasem incorporating a thread identifier into the hash simply masks the problem - it artificially forces more lookup misses by only allowing requests on the same thread to ever equate - however, if there is an equality collision problem, it does nothing to prevent that problem when it occurs on the same thread. So: the worst of every outcome, even if it passes tests.

However: I am very open to us needing to radically rework the check here. Identity already incorporates a number of high-level checks including the SQL, parameter type, result type, and even connection-string metadata (in case of different databases having different schemas). The column-hash here is intended as a final smoke test along the lines of "did somebody run a DDL command on the database and change the schema in between two commands?". But: if that isn't working well in a scenario: let's explore that and fix it.

So: can I get more info about the scenario here? In particular, because Identity already includes those aforementioned checks, the only way I can imagine this happening is with a multi-purpose stored procedure that returns radically different shapes for different inputs, i.e. where MyProc @Mode=1, @Name='abc', @Id=123 returns radically different columns to MyProc @Mode=2,@Name='abc',@Id=123, and where by some complete fluke: the column hash comes back as identical.

The "real" solution here is to radically rework the way the column lookup works: we've already done that work over in AOT, which is why I say that if possible, "move to AOT" may be a quick fix. It is stupendously hard to fully rework that into Dapper "vanilla" (see #1909), however, we might be able to do some compromise that fixes the underlying problem. For example, instead of a column hash, we could store a string of the column type data for "new" records, compare against that full string (ideally without actually allocating a string each test time, but that's an implementation detail - either via forwards crawl, or "generate to a buffer, compare buffers").

So: @goerch - I want to explore this, but to do that: do you have a concrete example I can look at, that fails? In particular, is my "SP that returns different things in different contexts" guess even remotely correct? This would also seem to be consistent with your mention of "the same statement concurrently"; again, one would normally expect that the same statement, with the same input and output types, is going to have the same fundamental shape. If that expectation is wrong: we can fix it.

@goerch
Contributor

goerch commented Oct 7, 2024

@mgravell: many thanks for your support!

do you have a concrete example I can look at, that fails?

We are currently lucky(?) enough that our problem occurs rather deterministically, which is not a given in multithreaded environments. We'll next try to reduce and anonymize the tests, which will need a bit of time because some infrastructure components are involved.

@goerch
Contributor

goerch commented Oct 8, 2024

@mgravell: we have published a reduced test case which hopefully helps to reproduce the issue.

@goerch
Contributor

goerch commented Oct 15, 2024

Unfortunately our analysis so far seems to be incomplete or wrong: even with full serialization and purging of the cache, we started to see the problem in our production code. I'm out of my depth trying to investigate the details of Dapper deserialization. Our workaround for now is to replace the needed Dapper functionality with a simple implementation based on ADO.NET and basic reflection.
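(For context, a minimal illustration of what such a reflection-based replacement can look like; this is a sketch, not the actual workaround code: columns are matched to writable properties by name, with no caching at all:)

    using System;
    using System.Collections.Generic;
    using System.Data.Common;
    using System.Reflection;
    using System.Threading;
    using System.Threading.Tasks;

    static class MiniMapper
    {
        // Illustrative replacement for Dapper's QueryAsync<T>: no caching,
        // columns are mapped to writable properties by case-insensitive name.
        public static async Task<List<T>> QueryAsync<T>(
            DbConnection connection, string sql, DbTransaction? transaction = null,
            CancellationToken token = default) where T : new()
        {
            using var command = connection.CreateCommand();
            command.CommandText = sql;
            command.Transaction = transaction;

            var results = new List<T>();
            using var reader = await command.ExecuteReaderAsync(token).ConfigureAwait(false);

            // Resolve the column-to-property mapping once per reader, by name.
            var props = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);
            var map = new PropertyInfo?[reader.FieldCount];
            for (int i = 0; i < reader.FieldCount; i++)
            {
                var name = reader.GetName(i);
                map[i] = Array.Find(props, p => p.CanWrite &&
                    string.Equals(p.Name, name, StringComparison.OrdinalIgnoreCase));
            }

            while (await reader.ReadAsync(token).ConfigureAwait(false))
            {
                var item = new T();
                for (int i = 0; i < reader.FieldCount; i++)
                {
                    if (map[i] is { } prop && !await reader.IsDBNullAsync(i, token).ConfigureAwait(false))
                    {
                        prop.SetValue(item, reader.GetValue(i));
                    }
                }
                results.Add(item);
            }
            return results;
        }
    }

This trades Dapper's compiled materializers for straightforward per-row reflection, which is slower but has no shared state between concurrent queries.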
