Predicting protein-protein interactions with AlphaFold2 on the ColabFold website (with PyMOL instructions)

崔耳又又

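AlphaFold2, developed by DeepMind, is a deep-learning breakthrough in bioinformatics that predicts the three-dimensional structure of proteins with accuracy approaching that of experimental methods. ColabFold lowers the barrier to using AlphaFold2: researchers can run protein structure and interaction predictions online, easily and free of charge, without needing complex computing resources of their own. Below I describe in detail how to use ColabFold.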

Requirements:

A VPN, and the PyMOL software (https://pymol.org/)

1 Open the ColabFold online notebook

https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb

2 Enter the amino acid sequences in the red input box shown below

图片1.png

The format is protein 1:protein 2, with the two sequences separated by a colon. Using my two genes as an example:

LPEQVAPYLPKVAEKAKEPGVVRLFGVNLMENTNNAAAPTAGNASAGAGETSARVAGSVEGSGQLSAFSKVTKVANESPREIQSQQNNAGRNRVKVQMHGNAVGRAVDLASLDGYEGLTSELEQMFEIKDIKQNFKVAFTDNEGDTMKVGDDPWMEFCRMVRKIVIYPIEDDKNMDPRQTSVLAAAPDPDPKANL:MEESSMKREAMPRLLDLIPDEKEWSLRGGAPRQGRSKNTGFGSDEDEKLELKLGLPGLVQEEPAASSREKRVHQESPALSLGYPPKHSTATTGAKRGFLDTVEAKAQGYEKEQARAAACGKELAVEENTAAVGERKKYERLSLAVKDLFHGFLQVQRDPSKVERTQQGADEKIFSQLLDGSGEYTLVYEDGEGDRMLVGDVPWNVFVSTAKRLRVLRSSELSHGLIGATPERAANG
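Before pasting, it can help to check the query string for stray whitespace or unexpected characters. The following is a minimal sketch, not part of ColabFold: the helper name `build_query` is hypothetical, and it assumes the sequences contain only the 20 standard amino acid letters.

```python
# Hypothetical helper (not part of ColabFold): joins chains into the
# "protein1:protein2" format and rejects unexpected characters.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: standard residues only


def build_query(*chains):
    """Return the colon-separated query string for the sequence input box."""
    cleaned = []
    for index, seq in enumerate(chains, start=1):
        # Remove spaces and line breaks left over from copy-paste.
        seq = "".join(seq.split()).upper()
        if not seq:
            raise ValueError(f"chain {index} is empty")
        bad = sorted(set(seq) - VALID_RESIDUES)
        if bad:
            raise ValueError(f"chain {index} contains unexpected characters: {bad}")
        cleaned.append(seq)
    return ":".join(cleaned)
```

Calling `build_query(seq1, seq2)` with the two sequences gives a single string that can be pasted into the input box.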

3 Enter the job name in the input box shown below

图片7.png

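4 Click Runtime > Run all; the run takes 1-2 hours

5 When the run finishes, a download link pops up automatically. rank001 is the highest-confidence structure; for the example data it corresponds to model4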

图片2.png
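After unzipping the download, the rank001 file can also be picked out by name. This is a minimal sketch that assumes the result files carry `rank_001` and `model_4` style fragments in their names; the exact naming may differ between ColabFold versions, and the function names here are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumption: file names contain "..._rank_001_..._model_4_..." fragments.
RANK_PATTERN = re.compile(r"_rank_?(\d+)_.*?model_(\d+)")


def parse_rank_and_model(filename: str) -> Optional[Tuple[int, int]]:
    """Return (rank, model number) parsed from a result file name, or None."""
    match = RANK_PATTERN.search(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def top_ranked_pdb(filenames: Iterable[str]) -> Optional[str]:
    """Return the .pdb file name with the lowest rank number, or None."""
    ranked = []
    for name in filenames:
        if not name.endswith(".pdb"):
            continue
        parsed = parse_rank_and_model(name)
        if parsed is not None:
            ranked.append((parsed[0], name))
    return min(ranked)[1] if ranked else None
```

In practice the list of names would come from something like `os.listdir()` on the unzipped result folder.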

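6 On the ColabFold page, look up the pTM (predicted Template Modeling score) and ipTM (interface predicted Template Modeling score) values reported for model4, the model with the highest pLDDT (predicted Local Distance Difference Test). A sum of the two values greater than 0.75 indicates a high likelihood of interaction; for the example file the sum is 0.82.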

图片3.png
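The check in step 6 can be sketched in a few lines. The code below assumes the per-model scores JSON file in the ColabFold download exposes `ptm` and `iptm` keys; treat that as an assumption and check your own file. The function names are hypothetical. The sum and the 0.75 threshold follow the rule of thumb stated above; for comparison the sketch also reports 0.8 × ipTM + 0.2 × pTM, the weighted confidence AlphaFold-Multimer uses to rank models.

```python
import json


def load_scores(json_text):
    """Extract (pTM, ipTM) from the text of a scores JSON file.

    Assumption: the file has top-level "ptm" and "iptm" keys.
    """
    data = json.loads(json_text)
    return float(data["ptm"]), float(data["iptm"])


def assess_interaction(ptm, iptm, threshold=0.75):
    """Apply the article's rule of thumb: pTM + ipTM > threshold."""
    total = ptm + iptm
    # Weighted confidence used by AlphaFold-Multimer for model ranking,
    # reported here only for comparison.
    weighted = 0.8 * iptm + 0.2 * ptm
    return {
        "sum": total,
        "weighted": weighted,
        "likely_interaction": total > threshold,
    }
```

To use it, read the scores file for the rank001 model with `open(path).read()`, pass the text to `load_scores`, and pass the two values to `assess_interaction`.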

7 Import the PDB file for rank001 into PyMOL via File > Open

The default view looks like this:

8 Enter the following commands in order to separate the two proteins

select seleA, chain A
select seleB, chain B
# A and B are the chain IDs of the two proteins

9 The colors of the two proteins can be set separately in the upper-right panel

图片4.png

图片5.png

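After the colors are set, the result looks like this:

10 Show the interaction region

Load the InterfaceResidues.py script first, then call it from the PyMOL command line:

run InterfaceResidues.py
interfaceResidues your_project_name, chain A, chain B
# your_project_name is your own project name, i.e. the name of the object loaded in PyMOL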

Link to the InterfaceResidues.py script:

https://pymolwiki.org/index.php/InterfaceResidues

Contents of the InterfaceResidues.py script:

from pymol import cmd, stored

def interfaceResidues(cmpx, cA='c. A', cB='c. B', cutoff=1.0, selName="interface"):
    """
    interfaceResidues -- finds 'interface' residues between two chains in a complex.

    PARAMS
        cmpx
            The complex containing cA and cB

        cA
            The first chain in which we search for residues at an interface
            with cB

        cB
            The second chain in which we search for residues at an interface
            with cA

        cutoff
            The difference in area OVER which residues are considered
            interface residues.  Residues whose dASA from the complex to
            a single chain is greater than this cutoff are kept.  Zero
            keeps all residues.

        selName
            The name of the selection to return.

    RETURNS
        * A selection of interface residues is created and named
            depending on what you passed into selName
        * An array of values is returned where each value is:
            ( modelName, residueNumber, dASA )

    NOTES
        If you have two chains that are not from the same PDB that you want
        to complex together, use the create command like:
            create myComplex, pdb1WithChainA or pdb2withChainX
        then pass myComplex to this script like:
            interfaceResidues myComplex, c. A, c. X

        This script calculates the area of the complex as a whole.  Then,
        it separates the two chains that you pass in through the arguments
        cA and cB, alone.  Once it has this, it calculates the difference
        and any residues ABOVE the cutoff are called interface residues.

    AUTHOR:
        Jason Vertrees, 2009.
    """
    # Save user's settings, before setting dot_solvent
    oldDS = cmd.get("dot_solvent")
    cmd.set("dot_solvent", 1)

    # set some string names for temporary objects/selections
    tempC, selName1 = "tempComplex", selName+"1"
    chA, chB = "chA", "chB"

    # operate on a new object & turn off the original
    cmd.create(tempC, cmpx)
    cmd.disable(cmpx)

    # remove cruft and irrelevant chains
    cmd.remove(tempC + " and not (polymer and (%s or %s))" % (cA, cB))

    # get the area of the complete complex
    cmd.get_area(tempC, load_b=1)
    # copy the areas from the loaded b to the q, field.
    cmd.alter(tempC, 'q=b')

    # extract the two chains and calc. the new area
    # note: the q fields are copied to the new objects
    # chA and chB
    cmd.extract(chA, tempC + " and (" + cA + ")")
    cmd.extract(chB, tempC + " and (" + cB + ")")
    cmd.get_area(chA, load_b=1)
    cmd.get_area(chB, load_b=1)

    # update the chain-only objects w/the difference
    cmd.alter( "%s or %s" % (chA,chB), "b=b-q" )

    # The calculations are done.  Now, all we need to
    # do is to determine which residues are over the cutoff
    # and save them.
    stored.r, rVal, seen = [], [], []
    cmd.iterate('%s or %s' % (chA, chB), 'stored.r.append((model,resi,b))')

    cmd.enable(cmpx)
    cmd.select(selName1, 'none')
    for (model,resi,diff) in stored.r:
        key=resi+"-"+model
        if abs(diff)>=float(cutoff):
            if key in seen: continue
            else: seen.append(key)
            rVal.append( (model,resi,diff) )
            # expand the selection here; I chose to iterate over stored.r instead of
            # creating one large selection b/c if there are too many residues PyMOL
            # might crash on a very large selection.  This is pretty much guaranteed
            # not to kill PyMOL; but, it might take a little longer to run.
            cmd.select( selName1, selName1 + " or (%s and i. %s)" % (model,resi))

    # this is how you transfer a selection to another object.
    cmd.select(selName, cmpx + " in " + selName1)
    # clean up after ourselves
    cmd.delete(selName1)
    cmd.delete(chA)
    cmd.delete(chB)
    cmd.delete(tempC)
    # show the selection
    cmd.enable(selName)

    # reset users settings
    cmd.set("dot_solvent", oldDS)

    return rVal

cmd.extend("interfaceResidues", interfaceResidues)
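When `interfaceResidues` is called from Python rather than from the PyMOL command line, it returns a list of `(model, residue number, dASA)` tuples. The sketch below groups such a list by chain object; the helper name `summarize_interface` is hypothetical, and it assumes plain integer residue numbers (no insertion codes), which is what AlphaFold2 output normally contains.

```python
from collections import defaultdict


def summarize_interface(residues):
    """Group (model, resi, dASA) tuples by model, sorted by residue number.

    Assumption: resi is an integer-like string such as "42".
    """
    by_model = defaultdict(list)
    for model, resi, dasa in residues:
        # The script keeps residues whose |dASA| passes the cutoff,
        # so the absolute value is reported here.
        by_model[model].append((int(resi), abs(float(dasa))))
    return {model: sorted(items) for model, items in by_model.items()}
```

With the script above, the model names in the returned tuples are the temporary objects `chA` and `chB`, so the result has one list of residues per chain.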

The selection named interface is the interaction region; set its color

图片6.png

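Final result, with the interaction region shown in yellow:

11 Show the interacting residues

Click Display > Sequence to show the sequence

The yellow regions of the sequence are the interacting amino acids (yellow being the interface color set in the previous step)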
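Steps 7 to 11 can also be collected into a PyMOL command script instead of being typed and clicked one by one. The sketch below only generates the text of such a script. The function name `build_pml`, the default colors for the two chains, and the file paths are illustrative assumptions; the interface selection name `interface` is the default `selName` of the script above.

```python
def build_pml(pdb_path, object_name, chain_a="A", chain_b="B",
              color_a="skyblue", color_b="salmon",
              interface_color="yellow",
              script_path="InterfaceResidues.py"):
    """Return the text of a PyMOL script covering steps 7 to 11."""
    lines = [
        # Step 7: load the rank001 structure under a known object name.
        f"load {pdb_path}, {object_name}",
        # Step 8: one named selection per chain.
        f"select seleA, chain {chain_a}",
        f"select seleB, chain {chain_b}",
        # Step 9: color the two proteins (colors are arbitrary choices).
        f"color {color_a}, seleA",
        f"color {color_b}, seleB",
        # Step 10: load the script, compute and color the interface.
        f"run {script_path}",
        f"interfaceResidues {object_name}, chain {chain_a}, chain {chain_b}",
        f"color {interface_color}, interface",
        # Step 11: show the sequence viewer.
        "set seq_view, 1",
    ]
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file, for example `steps.pml`, and executed inside PyMOL with `@steps.pml`.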
