
Proposal for dealing with info and root params in resolvers #1848

Closed

patrick91 opened this issue Apr 26, 2022 · 14 comments

Comments

@patrick91
Member

Just summarising the discussion around how we can approach info and root/self params in resolvers.

Other than the potential root/self confusion (which we can discuss again in another issue), we don't have many issues with the current approach. One remaining issue is naming: right now you can't have any params called info, root or self without resorting to root_: Annotated[str, strawberry.argument(name="root")]. Fortunately this hasn't been a big problem (as far as I can tell from not having seen any issues about it on GitHub).
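For reference, the current workaround looks roughly like this (the field and argument names here are made up):

from typing import Annotated

import strawberry


@strawberry.type
class Query:
    @strawberry.field
    def greet(
        self,
        # Exposed as `root` in the GraphQL schema; the trailing underscore
        # avoids clashing with the reserved parameter name in Python.
        root_: Annotated[str, strawberry.argument(name="root")],
    ) -> str:
        return f"Hello {root_}"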

The other issue is that we have some complicated logic for passing arguments to a resolver; removing it would be nice and might also improve performance (although that's not the only thing that needs improving).

We have a couple of proposals to improve how we deal with arguments:

Proposal 1: using types to tell what the arguments are

let's say that we have this resolver:

def get_friend_names(user: Root[User], info: Info) -> list[str]:
    return ["Marco"]

Root and Info are two Strawberry types used to tell Strawberry how the root value and the info value should be passed.

We could make this more magic by detecting the type of root, for example:

def get_friend_names(user: User, info: Info) -> list[str]:
    return ["Marco"]

@strawberry.type
class User:
    friends: list[str] = strawberry.field(resolver=get_friend_names)

Here Root is redundant, because the field definition already gives us all the information needed to determine which parameter receives the root value.
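A rough sketch of what that annotation-based detection could look like (Root and the helper function here are hypothetical, not existing Strawberry API):

import inspect
from typing import Generic, TypeVar, get_origin

T = TypeVar("T")


class Root(Generic[T]):
    """Hypothetical marker type: Root[User] tags the root parameter."""


class Info:
    """Hypothetical stand-in for Strawberry's Info type."""


def classify_parameters(resolver, owner_type):
    """Classify resolver parameters as root/info from annotations alone."""
    root_name = info_name = None
    for name, param in inspect.signature(resolver).parameters.items():
        annotation = param.annotation
        if annotation is Info:
            info_name = name
        elif get_origin(annotation) is Root:
            root_name = name
        elif annotation is owner_type and root_name is None:
            # The "magic" variant: a parameter annotated with the owning
            # type is assumed to receive the root value.
            root_name = name
    return root_name, info_name

The last branch is exactly where the ambiguity raised in the comments below comes from: a second parameter annotated with the owning type would be indistinguishable from the root.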

Pros

  1. encourage typing everything in your resolvers
  2. feels less magic

Cons

  1. implementation might be even more complex

Proposal 2: using contextvars

the example above would look like this:

from strawberry.context import context

def get_friend_names() -> list[str]:
    user = context.get("root")
    info = context.get("info")
    return ["Marco"]

Pros

  1. the implementation should be cleaner
  2. resolver signatures feel less magic (no special parameters)

Cons

  1. fetching values from ambient state feels more magic
  2. might have some performance impact
  3. harder to type: root is contextual, so you'd need to use a cast
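A minimal sketch of how such a context object could be backed by contextvars (all names here are hypothetical, this is not existing Strawberry API):

from contextvars import ContextVar, copy_context

_resolver_context: ContextVar[dict] = ContextVar("strawberry_resolver_context")


class _ContextProxy:
    """Dict-like view over the ContextVar for the current resolver call."""

    def get(self, key):
        return _resolver_context.get()[key]


context = _ContextProxy()


def call_resolver(resolver, root, info):
    """What the framework side could do: run each resolver in its own context."""

    def run():
        _resolver_context.set({"root": root, "info": info})
        return resolver()

    return copy_context().run(run)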

In any case, before making any decision we should try @skilkis's approach of compiling resolvers, so that we don't pay the overhead of argument handling (and the rest of the magic) all the time.

My suggestion would be:

  1. Implement proposal 1; since we already use this approach, we can refine it (and also apply it to directives)
  2. Improve performance by compiling the resolvers
  3. Try contextvars and see how they impact performance and how the API feels

Related issues:

#1672
#374
#1713

@skilkis
Contributor

skilkis commented Apr 26, 2022

@patrick91 I've pushed PR #1853 as a proof of concept. This PR extends #1666 with commit 9e76389. The compilation happens in resolver.py and is implemented as follows:

# Excerpt from StrawberryResolver in resolver.py; imports added for context.
import textwrap
import types
from functools import cached_property
from typing import Any, Callable, Dict, List

INFO_ARG = "{python_name}=info"
SELF_ARG = "self"
ROOT_ARG = "{python_name}=self"
DYNAMIC_ARGS = "*args"

RESOLVER_SRC = """
def compiled_resolver(self, info, args, kwargs):
    return resolver({arguments})
"""

@cached_property
def compiled_resolver(
    self: StrawberryResolver,
) -> Callable[[StrawberryType, Info, List[Any], Dict[str, Any]], Any]:
    args = []
    kwargs = []

    if self.self_parameter:
        args.append(self.SELF_ARG)
    if self.root_parameter:
        kwargs.append(self.ROOT_ARG.format(python_name=self.root_parameter.name))
    if self.info_parameter:
        kwargs.append(self.INFO_ARG.format(python_name=self.info_parameter.name))

    args.append("*args")
    kwargs.append("**kwargs")

    arguments = ", ".join((*args, *kwargs))
    src = textwrap.dedent(self.RESOLVER_SRC.format(arguments=arguments))

    module_code = compile(src, filename="<compiled_resolver>", mode="exec")
    function_code = module_code.co_consts[0]
    return types.FunctionType(
        function_code, globals={"resolver": self._unbound_wrapped_func}
    )
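For illustration, for a resolver whose root parameter is named user and whose info parameter is named info (and which has no self parameter), the template above would generate:

def compiled_resolver(self, info, args, kwargs):
    # `self` here is the root value: ROOT_ARG maps the resolver's root
    # parameter onto the wrapper's first argument.
    return resolver(*args, user=self, info=info, **kwargs)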

Note: I've avoided using AST to quickly demonstrate resolver compilation 😊

@BryceBeagle
Member

We could make this more magic by detecting the type of root, for example:

Is this feasible? Wouldn't it be possible for a resolver to accept an argument of the same type as the root that just isn't the root?

def two_args_one_root(user: Root[User], other_user: User, info: Info) -> list[str]:
    return ["Marco"]


def one_arg_not_root(other_user: User, info: Info) -> list[str]:
    return ["Marco"]

@bellini666
Member

Hey guys,

Although I'm fine with both approaches, I'm going to defend proposal 2 here, since I was the one who proposed it :)

Other than the benefits that @patrick91 listed, another interesting one is the possibility of accessing the context from non-GraphQL parts of the code. For example, one could access the context's current user inside an ORM model method.

Regarding the cons:

  1. might have some performance impact

The contextvars documentation states that ContextVar.get() has O(1) complexity.

Also, I just noticed that our resolvers already run in separate contexts, because each one is scheduled using asyncio.create_task, which calls copy_context() when no context is passed to it (https://github.com/python/cpython/blob/main/Modules/_asynciomodule.c#L2022).
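As a standalone illustration of that behaviour (not Strawberry code): a value set before asyncio.create_task is visible inside the task, because the task runs in a copy of the calling context:

import asyncio
from contextvars import ContextVar

current_user: ContextVar[str] = ContextVar("current_user")


async def resolver_like():
    # create_task snapshots the caller's context, so the value set
    # before the task was created is visible here.
    return current_user.get()


async def main():
    current_user.set("Marco")
    print(await asyncio.create_task(resolver_like()))  # prints: Marco


asyncio.run(main())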

Also, I did a simple benchmark to test how much overhead calling copy_context and setting a ContextVar to a complex type would add:

import dataclasses
import timeit
from contextvars import ContextVar, copy_context
from typing import Optional


def foo(some_var: int):
    a = some_var**2
    assert a


without_ctxvars = timeit.timeit(lambda: foo(10), number=100_000)


@dataclasses.dataclass
class SomeType:
    a: str
    b: str


x: ContextVar[Optional[SomeType]] = ContextVar("x", default=None)


def bar(some_var: int):
    x.set(SomeType(a="foo", b="bar"))
    a = some_var**2
    assert a


with_ctxvars = timeit.timeit(lambda: copy_context().run(bar, 10), number=100_000)

print("without ctxvars", with_ctxvars)
print("with ctxvars", with_ctxvars)
print("diff", with_ctxvars - without_ctxvars)

The results were:

without ctxvars 0.02429396600928157
with ctxvars 0.06831309996778145
diff 0.04401913395849988

Here we see a ~0.044s difference between foo and bar after calling each function 100k times.

Also, I would say that this is probably faster than having to inspect each resolver at runtime, but I don't know how it would compare with the compiled solution from @skilkis.

  1. Harder to type, root is contextual, you'd need to use a cast

Using the proof of concept I wrote, the main difference is that the typing would need to be defined at the variable level and not at the parameter level. For example, instead of:

class MyCustomRoot:
    ...

def some_resolver(info: Info, root: MyCustomRoot):
    root  # this is a MyCustomRoot instance
    info  # this is an Info instance

We would do:

class MyCustomRoot:
    ...

def some_resolver():
    ctx: Context[Info, MyCustomRoot] = context
    ctx.root  # this is a MyCustomRoot instance
    ctx.info  # this is an Info instance

We could expose a function called get_context instead of exposing context directly, so that passing and typing ctx would make more sense, like:

class MyCustomRoot:
    ...

def some_resolver():
    ctx: Context[Info, MyCustomRoot] = get_context()
    ctx.root  # this is a MyCustomRoot instance
    ctx.info  # this is an Info instance
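One way the typing side of that get_context idea could look (a sketch, all names hypothetical):

from contextvars import ContextVar
from dataclasses import dataclass
from typing import Any, Generic, TypeVar

InfoT = TypeVar("InfoT")
RootT = TypeVar("RootT")


@dataclass
class Context(Generic[InfoT, RootT]):
    info: InfoT
    root: RootT


_context: ContextVar["Context[Any, Any]"] = ContextVar("strawberry_context")


def get_context() -> "Context[Any, Any]":
    """Return the Context for the current resolver execution."""
    return _context.get()

The annotation ctx: Context[Info, MyCustomRoot] = get_context() then narrows the types for the reader, though the assignment itself is unchecked, which is effectively the cast mentioned in the cons above.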

@skilkis
Contributor

skilkis commented May 4, 2022

@bellini666 thanks for your in-depth write-up and the performance benchmarks! I like your approach of adding type annotations to the contextvars as well! For completeness, I've added an approximation of the compiled approach and rewritten the benchmark a bit to make it similar to what we currently do when parsing resolver arguments. This is what I ran on my machine:

import dataclasses
import timeit
from contextvars import ContextVar, copy_context
from typing import Any


class Info:
    pass


def foo():
    info = Info()
    root = object()
    kwargs = {}
    # The truthiness checks below stand in for the parameter inspection
    # that currently decides whether to pass info/root to a resolver.
    if foo_resolver:
        kwargs["info"] = info
    if foo_resolver:
        kwargs["root"] = root
    return foo_resolver(**kwargs)


def foo_resolver(info, root):
    i = info
    r = root


without_ctxvars = timeit.timeit(lambda: foo(), number=100_000)


def foo_compiled():
    info = Info()
    root = object()
    return compiled_resolver(info, root)


def compiled_resolver(info, root):
    foo_resolver(info, root)


without_ctxvars_compiled = timeit.timeit(lambda: foo_compiled(), number=100_000)


@dataclasses.dataclass
class Context:
    info: Info
    root: Any


ctx: ContextVar[Context] = ContextVar("Context")


def bar():
    ctx.set(Context(info=Info(), root=object()))
    return bar_resolver()


def bar_resolver():
    current_context = ctx.get()
    i = current_context.info
    root = current_context.root


with_ctxvars = timeit.timeit(lambda: copy_context().run(bar), number=100_000)

print("without ctxvars", without_ctxvars)
print("without ctxvars compiled", without_ctxvars_compiled)
print("with ctxvars", with_ctxvars)
print(
    "diff (with_ctxvars vs. without_ctxvars_compiled)",
    with_ctxvars - without_ctxvars_compiled,
)
print(
    "times faster (without_ctxvars_compiled vs. with_ctxvars)",
    with_ctxvars / without_ctxvars_compiled,
)

The result of the above is as follows for my machine (Python 3.7):

without ctxvars 0.0455207
without ctxvars compiled 0.02955300000000001
with ctxvars 0.08794679999999999
diff (with_ctxvars vs. without_ctxvars_compiled) 0.05839379999999998
times faster (without_ctxvars_compiled vs. with_ctxvars) 2.9759009237640837

I added the last print statement to highlight that with the compiled approach we could have served ~3x more requests. Even though contextvars is brilliant and implemented with O(1) time complexity, it adds overhead to set and get items compared to simply passing them. From a pure performance standpoint I would still go with the compiled approach at the moment, but I think it is wise to test both solutions in a non-synthetic benchmark. There are of course other things to consider too, such as maintainability and ease of use, rather than looking at this from a performance standpoint alone 😊

@skilkis
Contributor

skilkis commented May 23, 2022

@patrick91 @bellini666 when would be a good time to discuss these proposals? I think we can make quick progress on this if we have a chat. I'm available this week from Wednesday onwards.

@patrick91
Member Author

hi @skilkis, I was going to send a message about this, as I've been working on related ideas for a talk. Other than the potential performance implications of contextvars, having explicit parameters will make future optimisation even easier.

We can chat about this on a voice call on Discord if you want. I'm going back to London tomorrow, and I should be available on the 26th and 27th :)

@bellini666
Member

@skilkis @patrick91 the 26th/27th both work for me, waiting for time suggestions :)

@patrick91
Member Author

I've created a doodle here: https://www.when2meet.com/?15742118-kmPJs

@skilkis
Contributor

skilkis commented May 24, 2022

I've filled in the doodle, hopefully we can find a suitable time for all of us! 😊

@patrick91
Member Author

I've created an event here: https://discord.gg/XRRHgBUY?event=978459358583201792

It should be at 16.00 London time, hopefully I didn't screw up the time :D

@skilkis
Contributor

skilkis commented May 27, 2022

@patrick91 @bellini666 In preparation for our discussion today I've made a benchmark of runtime compilation of Python functions to see if there is any performance penalty. Although it will presumably add to start-up time, at runtime it makes no difference whether Python runs code generated by compile or code defined in a source file. The glories of an interpreted language 😄

[Benchmark plots (compile_performance_dracula / compile_performance_dufte): runtime of fn vs. compiled fn vs. compiled & optimized fn over the number of calls]

As discussed in PR #1853, the approach would be to compile a wrapper function that always takes self, info, and root, and selectively passes these on to the original resolver function. This keeps a user's code debuggable. An alternative is to rewrite a user's resolver functions to add self, info, and root via AST manipulation; however, it is not yet known whether the function would remain debuggable after doing so.

On another note, since this might add to the start-up time of Strawberry on a large code-base, it might be possible to leverage py_compile to rewrite entire source files in one go. However, this then requires overriding the byte-code hash to trick Python into thinking that the rewritten source code is identical to the original. For this purpose we could use the UNCHECKED_HASH flag.
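For reference, py_compile already exposes that mode in the standard library (the file name below is made up):

import py_compile

# Compile a (hypothetically rewritten) source file to a .pyc whose embedded
# source hash is flagged as unchecked, so Python keeps using the cached
# byte-code even though it no longer matches the source on disk.
py_compile.compile(
    "my_resolvers.py",  # hypothetical rewritten module
    invalidation_mode=py_compile.PycInvalidationMode.UNCHECKED_HASH,
)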

Benchmarking Code
"""Demonstrate that runtime-compiled bytecode is just as fast as compile-time."""

import textwrap
import types
from typing import Any, Dict

import matplotx
import perfplot
from matplotlib import pyplot as plt


def resolver(x):
    return x


def foo(x):
    resolver(x)


def compile_function(
    source: str,
    globals: Dict[str, Any],
    optimize: int = -1,
):
    dedented_source = textwrap.dedent(source)

    module_code = compile(
        dedented_source, filename="<compiled_resolver>", mode="exec", optimize=optimize
    )
    function_code = module_code.co_consts[0]
    return types.FunctionType(function_code, globals=globals)


function_str = "def foo_compiled(x): resolver(x)"  # Same as :py:func:`foo` above
foo_compiled = compile_function(function_str, globals=globals())
foo_compiled_optimized = compile_function(function_str, globals=globals(), optimize=2)


def test_kernel(n, fn):
    for _ in range(n):
        fn(n)


results = perfplot.bench(
    setup=lambda n: n,
    kernels=[
        lambda n: test_kernel(n, foo),
        lambda n: test_kernel(n, foo_compiled),
        lambda n: test_kernel(n, foo_compiled_optimized),
    ],
    labels=["fn", "compiled fn", "compiled & optimized fn"],
    n_range=[2 ** k for k in range(1, 20)],
    xlabel="$n_{calls}$",
    equality_check=None,
)
for style in ("dufte", "dracula"):
    plt.style.use(getattr(matplotx.styles, style))
    results.save(
        f"compile_performance_{style}.svg", transparent=False, bbox_inches="tight"
    )
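One quick way to sanity-check the equivalence (not part of the benchmark above): the source-defined and runtime-compiled functions disassemble to the same bytecode:

import dis


def ops(fn):
    return [(i.opname, i.argrepr) for i in dis.get_instructions(fn)]


# The runtime-compiled function executes the exact same opcodes as the
# source-defined one, which is why the timings are indistinguishable.
assert ops(foo) == ops(foo_compiled)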

@patrick91
Member Author

We just had our meeting on this; we'll be trying the approach I highlighted in the first post:

My suggestion would be:

  1. Implement proposal 1; since we already use this approach, we can refine it (and also apply it to directives)
  2. Improve performance by compiling the resolvers
  3. Try contextvars (in future) and see how they impact performance and how the API feels

😊

@patrick91
Member Author

I'll close this (see the previous comment)

@Speedy1991
Contributor

Hi guys, sorry to reactivate this old discussion, but is there a follow-up issue/PR? It looks like this went out of scope?
