Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproducing the NEB Tm calculator behavior in pydna #237

Open
BjornFJohansson opened this issue Jun 25, 2024 · 28 comments
Open

Reproducing the NEB Tm calculator behavior in pydna #237

BjornFJohansson opened this issue Jun 25, 2024 · 28 comments

Comments

@BjornFJohansson
Copy link
Owner

BjornFJohansson commented Jun 25, 2024

As per the 4th Pydna meeting, it was generally agreed that Pydna and tools that depend on it would benefit from being able to simulate modern commercial Tm calculators offline.

I have compiled data around this here

The NEB.ipynb notebook contain attempts to replicate the NEB Tm calculator simply by selecting parameters in the biopython Tm module. This was successful for the standard Taq polymerase with the buffer composition stated by NEB.

I was not able to guess the result for the Q5 polymerase. A contributing factor is the secrecy of the buffer composition.

The NEB logics seems contained in the file NEB/NEB_website/NEB Tm Calculator_files/main-3d92a74abb.js

I formatted the code with an online js formatter in the file main-3d92a74abb_formatted.js.

The variables below seem important, but I was not able to follow the logics.

r.TmCalc.prototype.nnBr
r.TmCalc.prototype.nnSa
r.TmCalc.prototype.nnde
r.TmCalc.prototype.nndhds
r.TmCalc.prototype.nnloop
r.TmCalc.prototype.nnbulge
r.TmCalc.prototype.nntmm
r.TmCalc.prototype.R=1.987,
r.TmCalc.prototype.dSBr
r.TmCalc.prototype.dSSa
r.TmCalc.prototype.dHBr
r.TmCalc.prototype.dHSa
r.TmCalc.prototype.init
r.TmCalc.prototype.saltCorrect
r.TmCalc.prototype.setCt
r.TmCalc.prototype.setMonosalt
r.TmCalc.prototype.setDisalt
r.TmCalc.prototype.setDMSO
r.TmCalc.prototype.buildPairMap

Some other people have worked on similar issues here and here

I think this is worth spending some time, since interestingly, the Thermofisher Tm calculator here gives similar, but not identical results and claims to use similar but not identical thermodynamic data.

@manulera
Copy link
Collaborator

@BjornFJohansson @hiyama341 , it would be great if you could provide:

  • The place where you could see the javascript code
  • An example request to the endpoint

@BjornFJohansson
Copy link
Owner Author

BjornFJohansson commented Jun 27, 2024

Nearest Neighbor Method explained: https://www.youtube.com/watch?v=jy5h9NwBEgk

@BjornFJohansson
Copy link
Owner Author

BjornFJohansson commented Jun 28, 2024

Minimal example extracted from teemi. The results are identical to the web interface.

import json
import requests

# NEB example primers:
primers = """\
AGCGGATAACAATTTCACACAGGA
GTAAAACGACGGCCAGT
AGCGGATAACAATTTCAC
AGCGGATAAGGGCAATTTCAC
GTAAAACGACGGCCA""".splitlines()

conc=0.5
prodcode="q5-0"
url = "https://tmapi.neb.com/tm/batch"

for primer in primers:
    seqpairs = [[primers]]
    input_ = {"seqpairs": seqpairs, "conc": conc, "prodcode": prodcode}
    headers = {"content-type": "application/json"}
    res = requests.post(url, data=json.dumps(input_), headers=headers)
    r = json.loads(res.content)
    
    if r["success"]:
        for row in r["data"]:
            print(row["tm1"])
    else:
        print("request failed")
        print(r["error"][0])
        
# Result  
# 66
# 62
# 57
# 65
# 59

@MarkusPiotrowski
Copy link

I had settings for Biopython's MeltingTemp to simulate Q5 and Phusion Tm calculations on the NEB website some time ago (at that time, calculations for Q5 and Phusion gave different results), but obviously they changed some settings in their calculator.

@BjornFJohansson
Copy link
Owner Author

Yes, I get wildly different results right now. I would like to mimic the behaviour of the NEB calculator, the source code is not easy to follow though.

@manulera
Copy link
Collaborator

I had a look at the js code and it's compiled by babel, so I don't think it is possible to extract the source code to reverse-engineer without spending a lot of time. Some possible solutions to consider @BjornFJohansson @hiyama341:

  • Find a first approximation using the default tm calculator, then call the api
  • Use asynchronous calls to the API when running several primers in parallel

They would both take some thinking, but they seem feasable, so it's a matter of how much is this slowing you down.

manulera added a commit that referenced this issue Jul 16, 2024
@manulera
Copy link
Collaborator

manulera commented Jul 16, 2024

@hiyama341 I went ahead and made a first version using the mentioned approximation (first with the default pydna, then with the one from neb, and it's faster):

dev_bjorn...issue_237_optimisation

The file example.py compares the two approaches for finding a primer and you can see that the approximation is faster and uses fewer requests. This is what it prints. Let me know what you think.

Also, I could not use the teemi functions to make the request with "normal" headers. I was getting 403 response from the server. I had to mimmic a browser request to be able to use the API. Not sure if you encounter these problems. I could not even run their curl example. See the headers here.

making a request
making a request
making a request
making a request
making a request
result atgtcgtATGaaaccgttatcgatcatatgtGcgaaatgtcgcgcgtcatctacgtatcatcgatctactTAAacgtgta
--- 1.4502029418945312 seconds ---
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
making a request
result atgtcgtATGaaaccgttatcgatcatatgtGcgaaatgtcgcgcgtcatctacgtatcatcgatctactTAAacgtgta
--- 6.762638092041016 seconds ---

@hiyama341
Copy link
Collaborator

@manulera and @BjornFJohansson great work!

Yeah, two days ago I started getting the 403 response from the server. They must have changed something cause it was working just fine before. And thanks I'm gonna try to add the headers to teemi and see if it works.

Regarding the workload, maybe we can discuss on the 5th pydna meeting whether it is worth to reverse-engineer the primer calc function. Personally, I think it would be useful to have something fast and stable when we are calculating/making hundreds of primers but let's discuss next week.

@BjornFJohansson
Copy link
Owner Author

I agree with @hiyama341 on this one. Localized Tm code would be very useful and I suspect would convince potential users.

@manulera
Copy link
Collaborator

I also agree that ideally we would want the code in the library and not rely on an external API that may cut access like they just did.

The issue is that it's not as simple as "translating the javascript code into python". The javascript code we can access from their website is bundled code. This is generated by javascript libraries like Babel, which basically make a single javascript file with all the code for the page, without dependencies on external files or libraries, and maximizing browser compatibility. This means that the source code that people in NEB wrote looks very different (and much more understandable) that what is in that file which is not meant to be read by humans, so reverse-engineering would take a lot of time and knowledge of javascript.

@MarkusPiotrowski
Copy link

Why don't you use Biopython's MeltingTemp and find the correct settings to simulate NEB's TM calculator? Isn't Biopython a library that is likely to be used together with PyDNA?

@manulera
Copy link
Collaborator

manulera commented Jul 17, 2024

Hi @MarkusPiotrowski NEB's calculator gives different values depending on the polymerase that will be used, and overall does a more complex calculation from what the others said. When I was doing PCR I was never too concerned with exact Tm calculations but it seems some people trust NEB's calculator over anything else and would exclude tools that use other Tm calculations.

@MarkusPiotrowski
Copy link

@manulera I'm the author of Biopython's MeltingTemp. There are several options that are usually applied for calculating Tm's with the nearest neighbor thermodynamic method:

  1. Initialization (different methods are used in literature).
  2. "Zipping", again different tables with thermodynamic data are available.
  3. Salt correction. There are different methods and, of course, you can set different values.

I would wonder if NEB uses a "more sophisticated" method and if their results can't be simulated with Biopython's MeltingTemp. I guess it's a question of finding the right parameters, and there are numerous possibilities for them. And of course they may have different parameters for different enzymes.

Their own help page gives some details (if they stick to them and is still up-to-date): https://tmcalculator.neb.com/#!/help
E.g. they write that they use different salt correction method for Phusion.

@BjornFJohansson
Copy link
Owner Author

BjornFJohansson commented Jul 17, 2024

@MarkusPiotrowski @manulera @hiyama341 Please have a look at this comparison Biopython vs NEB Seems like there is a ~7 degree difference for a large list of primers. I replicated the behaviour as best I could given the sometimes incomplete information given by NEB.

@BjornFJohansson
Copy link
Owner Author

In this repo they somehow run the NEB Tm Calculator from Python.

https://github.com/kalekundert/ligrna

@MarkusPiotrowski
Copy link

@BjornFJohansson Thanks for the pointers tho these pages.

For Taq polymerase I now managed to simulate NEB's Tm calculator with Biopython's MeltingTemp. In NEB's calculator, they obviously do not correct for Mg ions. So the Mg2+ concentrations and dNTPs concentrations are irrelevant for their calculation. Thus, we need to remove them from the call to Tm_NN (when Mg is 0 [default], dNTPs are not considered). Then it fits nicely:

import Bio
from Bio.SeqUtils import MeltingTemp as mt

primers = (
    'AGCGGATAACAATTTCACACAGGA',
    'GTAAAACGACGGCCAGT',
    'AGCGGATAAGGGCAATTTCAC',
    'GTAAAACGACGGCCA',
    )

NEBtms = (58, 54, 57, 50) # from NEB Tm calculator

for temp, primer in zip(NEBtms, primers):
    tm = mt.Tm_NN(
        primer,
        nn_table=mt.DNA_NN3,
        Na=0,  # mM
        K=50,  # mM
        Tris=10,  # mM
        # Mg=1.5,
        dnac1=200,  # nM
        dnac2=0,  # nM
        dNTPs=0.8,  # mM
        saltcorr=6,
    )
    print(
        f'Primer: {primer}, Tm: NEB: {temp}, Biopython: {round(tm)}'
        f' delta: {round(tm - temp, 2)}'
    )

Result:

Primer: AGCGGATAACAATTTCACACAGGA, Tm: NEB: 58, Biopython: 58 delta: -0.39
Primer: GTAAAACGACGGCCAGT, Tm: NEB: 54, Biopython: 54 delta: -0.04
Primer: AGCGGATAAGGGCAATTTCAC, Tm: NEB: 57, Biopython: 57 delta: -0.05
Primer: GTAAAACGACGGCCA, Tm: NEB: 50, Biopython: 50 delta: 0.36

The deltas are of course because NEB's calculator already gives rounded values.

As I said, one need to find the 'correct' settings. I'll go on to check for Phusion and Q5.

@MarkusPiotrowski
Copy link

MarkusPiotrowski commented Jul 18, 2024

I can't find the composition of Phusions HF buffer, however, looking on the assay conditions and some assumption, I would suggest the following settings:

# For Phusion, use this settings
tm = mt.Tm_NN(
        primer,
        nn_table=mt.DNA_NN3,
        Na=0,  # mM
        K=50,  # mM
        Tris=25,  # mM
        Mg=1.5,
        dnac1=200,  # nM
        dnac2=0,  # nM
        dNTPs=0.8,  # mM
        saltcorr=1,
)

These give values that derive from the rounded NEB data by about 0.2 °C.

As written in their Help page, for Phusion they use a different salt correction method and, obviously, there they include Mg and dNTPs concentrations in their calculation.

@MarkusPiotrowski
Copy link

These settings fit for Q5:

tm = mt.Tm_NN(
        primer,
        nn_table=mt.DNA_NN3,
        Na=0,  # mM
        K=50,  # mM
        Tris=10,  # mM
        Mg=1.5,
        dnac1=200,  # nM
        dnac2=0,  # nM
        dNTPs=0.8,  # mM
        saltcorr=6,
    )

@BjornFJohansson
Copy link
Owner Author

Thank you all for your contributions. Ill look into it asap.

@MarkusPiotrowski
Copy link

Let me add a little bit to the discussion:

While I could provide settings to simulate quite well the Tm values of NEB's Tm calculator (see above, fine for Taq and Q5, some small deviation with Phusion sometimes), I wonder if this is a goal you want to go for.
Why is NEB the source you want to trust? As already mentioned in the introduction by @BjornFJohansson , FisherScientifics' Tm calculator gives largely (3-4 °C) different results for Phusion than NEB's calculator. These calculators may also change (well, they changed in the past), you never know if they do the calculation correct (if you just want to simulate you may want to ignore this), and it's hard to find the correct parameters. And these may also change in the future.

When I rewrote Biopython's MeltingTemp code some years ago, I also compared the results to other calculators (mostly to check if I do the math correctly), but I focused on established academic tools, at that time this was MELTING and Primer3 or Primer3Plus. Online calculators from companies were hard to compare with because except some general statements ("using nearest neighbor.... according to...."), detailed information about the exact usage of the algorithm (which thermodynamic tables do they use, how do they calculate the initialization, which salt correction algorithm do they use) were sparse.

BTW, this source https://github.com/kalekundert/ligrna (mentioned by @BjornFJohansson ) seems to contain the pre-2016 code? (it mentions Breslauer, that is not mentioned in the recent pages).

Also, I browsed through the Javascript and while I can find the calculations and published constants, I don't find the actual buffer salt concentrations.

@manulera
Copy link
Collaborator

Hi @MarkusPiotrowski thanks for testing, it is very appreciated.

Elaborating a bit more on why we were trying to match NEB's results: As far as I understand it, the reason for wanting to reproduce the NEB's calculations is not so much to achieve the most "thermodynamically correct" results. Instead, from what @hiyama341 was saying, some researchers trust NEB over other calculators because they have been using it for a long time, so it's more an issue of "habit" rather than accuracy. Because of this, they would not trust/use a tool that uses other calculators and this could be an issue for some of the tools built on top of pydna that we are working on, namely teemi and ShareYourCloning.

@MarkusPiotrowski
Copy link

@manulera Thank you for the explanation. You know your users better than I, so you can probably be convinced that they prefer NEB's calculator. Some years ago we had a Promega (now Thermo) freezer in the neighboring department, so we got our enzymes usually from them, and at that time I would have preferred to use their calculator.

What I would sum up from my testing and also by taking a look at the Javascript code from NEB's calculator:
I don't see any magic in the JS code and it should be possible to use Biopython's MeltingTemp to simulate their results. Interestingly, the code contains much more thermodynamic tables and formulas than they claim to use (in fact, I wouldn't wonder if they use Biopython's MeltingTemp as template).

Above you have the settings that work well. Maybe some hints:

  • Na, K and (Tris/2) sum up to give one 'monovalent' salt concentration. Mg may be added to that as follows: 120*([Mg2+] - [dNTPs])^0.5, but only if Mg > dNTPS
  • This means that you don't need to know the exact composition of the buffer. As long as you know the thermodynamic data set and the salt correction formula (and this we do), you could run a test loop with just different numbers for Na until you got the smallest deviation from the NEB results for a data set of primers.
  • If you want to simulate other calculators, you should know that many of them use the primer concentration in a wrong way. Often they use [primer]/4, but this is only true if you were hybridizing primers to themselves ([primer1] = [primer2]). This can be mimicked by Biopython's MeltingTemp if you set dnac1 = dnac2 = primer/2

So I would suggest, to use a small wrapper around Biopython's Meltingtemp and supply your users with different predefined sets of settings, e.g., NEB_Taq, NEB_Phusion, NEB_Q5 etc.

@MarkusPiotrowski
Copy link

@BjornFJohansson There is an error in https://github.com/BjornFJohansson/tm/blob/master/NEB%20Phusion/NEB%20Phusion.ipynb . You gave the dNTPs in µmolar instead of millimolar. Thus free Mg becomes 0 and ist not longer taken into consideration for calculations.

@BjornFJohansson
Copy link
Owner Author

Thanks, Ill look into it. I am compiling results for the NEB Phusion which worked perfectly.

@manulera
Copy link
Collaborator

By the way @MarkusPiotrowski, we will meet this Thursday 25th of July at 14:00 UK time to discuss some issues of pydna, including this one. If you want to join the discussion, your insight would be appreciated! Below a Teams link for that:

https://teams.microsoft.com/l/meetup-join/19%3ameeting_MTk0N2UxNTktODQ4Yy00OGQyLWFiNmEtNzZhNTZhMjMyNzVk%40thread.v2/0?context=%7b%22Tid%22%3a%221faf88fe-a998-4c5b-93c9-210a11d9a5c2%22%2c%22Oid%22%3a%22e1394e78-cf0f-492d-9394-b06f66175536%22%7d

@BjornFJohansson
Copy link
Owner Author

BjornFJohansson commented Jul 23, 2024

Dear all, Some updated comparisons between neb calculator and the different combinations of biopython options. Specifically, I made a sum of least squares minimization as suggested using Scipy.minimize for the Q5 polymerase.

https://github.com/BjornFJohansson/tm/blob/c38c6c07e73ba2f8036b19293a5a58af905db5e8/NEB%20Q5/NEB%20Q5.py

I get close, but not as close as with some of the others (thanks, @MarkusPiotrowski for the help).

There seems to be some creative mixtures of thermodynamic data in the neb tm calculator source code. The ones I could find are extracted to Python in the NEB_website folder.

Salt correction seems to be the same as biopython 7 (Owczarzy 2008) and not Owczarzy 2004 as documented.

@manulera
Copy link
Collaborator

manulera commented Jul 31, 2024

Hello, I was working on #250 and I noticed that sometimes longer primers give lower Tms, is this normal?

from pydna.tm import tm_default
print(tm_default("agatcgactatctacttatgcatctta"))  # 58.90561619071127
print(tm_default("agatcgactatctacttatgcatctt"))  # 59.1327553388262

BjornFJohansson pushed a commit that referenced this issue Aug 8, 2024
@manulera manulera changed the title Expanding the oligo melting temperature calculations Reproducing the NEB Tm calculator behavior in pydna Sep 5, 2024
@hiyama341
Copy link
Collaborator

Hi all,
I didn't mention this but I tried recreating this with brute force but couldn't get closer than Björn.
Good idea to put it on the Hackathon list @manulera :)

manulera added a commit that referenced this issue Oct 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants