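AlphaFold2, developed by DeepMind, is a deep-learning breakthrough in bioinformatics that predicts the three-dimensional structure of proteins with accuracy approaching that of experimental methods. ColabFold lowers the barrier to using AlphaFold2: researchers can predict protein structures and protein-protein interactions online, easily and free of charge, without needing substantial local computing resources. Below I describe how to use ColabFold in detail.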
Requirements:
A VPN (if Google Colab is not directly reachable from your network)
PyMOL (https://pymol.org/)
1 Open the ColabFold notebook online
https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb
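2 Enter the amino acid sequences in the input box highlighted in red (the query_sequence field)
Format: protein 1:protein 2 (the two sequences separated by a colon)
Using the protein sequences of my two genes as an example: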
LPEQVAPYLPKVAEKAKEPGVVRLFGVNLMENTNNAAAPTAGNASAGAGETSARVAGSVEGSGQLSAFSKVTKVANESPREIQSQQNNAGRNRVKVQMHGNAVGRAVDLASLDGYEGLTSELEQMFEIKDIKQNFKVAFTDNEGDTMKVGDDPWMEFCRMVRKIVIYPIEDDKNMDPRQTSVLAAAPDPDPKANL:MEESSMKREAMPRLLDLIPDEKEWSLRGGAPRQGRSKNTGFGSDEDEKLELKLGLPGLVQEEPAASSREKRVHQESPALSLGYPPKHSTATTGAKRGFLDTVEAKAQGYEKEQARAAACGKELAVEENTAAVGERKKYERLSLAVKDLFHGFLQVQRDPSKVERTQQGADEKIFSQLLDGSGEYTLVYEDGEGDRMLVGDVPWNVFVSTAKRLRVLRSSELSHGLIGATPERAANG
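Pasting long sequences by hand makes it easy to leave in a line break or a stray character. The sketch below is one possible way to assemble the "protein 1:protein 2" string before pasting it into ColabFold. The function name build_query and the restriction to the 20 standard amino acid letters are my own assumptions for illustration, not part of ColabFold.

```python
# Hypothetical helper (not part of ColabFold): clean and join sequences.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: standard 20 only


def build_query(*sequences):
    """Join protein sequences with ':' after basic validation."""
    cleaned = []
    for i, seq in enumerate(sequences, start=1):
        # Drop spaces and line breaks, then normalize to upper case.
        seq = "".join(seq.split()).upper()
        if not seq:
            raise ValueError(f"sequence {i} is empty")
        bad = sorted(set(seq) - VALID_RESIDUES)
        if bad:
            raise ValueError(f"sequence {i} has unexpected characters: {bad}")
        cleaned.append(seq)
    return ":".join(cleaned)


# Two short made-up fragments, only to show the call.
print(build_query("MEES SMK", "lpeqvap"))
```

The returned string can be pasted directly into the sequence input box.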
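3 Enter a project name in the input box shown below
4 Click Runtime > Run all; the run takes 1-2 hours
5 When the run finishes, a download link appears automatically. rank001 is the highest-confidence structure; for the example data it corresponds to model 4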
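6 On the ColabFold page, find the pTM (predicted TM-score) and ipTM (interface predicted TM-score) values reported for model 4, the model with the highest pLDDT (predicted Local Distance Difference Test). If the two values add up to more than 0.75, an interaction is considered likely; for the example data the value is 0.82.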
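The check in step 6 is simple arithmetic, so it can be scripted. The sketch below computes the plain sum used in this tutorial and, for comparison, the weighted score 0.8 × ipTM + 0.2 × pTM that AlphaFold-Multimer uses to rank its own models; which of the two the 0.75 cutoff should apply to is a judgment call. The function names, the example values, and the assumption that the ColabFold scores JSON stores the values under the keys "ptm" and "iptm" are mine, so check them against your own output files.

```python
import json


def summarize_confidence(ptm, iptm, cutoff=0.75):
    """Return the plain sum and the AlphaFold-Multimer weighted score."""
    plain_sum = ptm + iptm
    weighted = 0.8 * iptm + 0.2 * ptm
    return {
        "sum": round(plain_sum, 3),
        "weighted": round(weighted, 3),
        "sum_passes": plain_sum > cutoff,
        "weighted_passes": weighted > cutoff,
    }


def load_scores(path):
    """Read pTM and ipTM from a scores JSON file.

    Assumption: the file has top-level "ptm" and "iptm" keys.
    """
    with open(path) as handle:
        scores = json.load(handle)
    return scores["ptm"], scores["iptm"]


# Made-up values, for illustration only.
print(summarize_confidence(ptm=0.60, iptm=0.55))
```

With a real run, the values could be read with load_scores and passed to summarize_confidence; a model that passes the plain sum but not the weighted score deserves a closer look.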
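7 Load the PDB file for rank001 into PyMOL via File > Open
The default view looks like this:
8 Enter the following commands in turn to place the two proteins in separate selections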
select seleA, chain A
select seleB, chain B
# A and B are the chain IDs of the two proteins
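Before selecting by chain, it can help to confirm which chain IDs the PDB file actually contains. The following is a minimal sketch in plain Python (no PyMOL required) that counts residues per chain by reading the fixed-width ATOM records. The helper atom_line and the tiny demo structure are made up for illustration; with a real file you would pass its text contents instead.

```python
def atom_line(serial, name, resn, chain, resi, x, y, z):
    """Format one PDB ATOM record (fixed-width columns), for the demo only."""
    return ("ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
            % (serial, name, resn, chain, resi, x, y, z, 1.0, 0.0))


def residues_per_chain(pdb_text):
    """Count residues per chain, using one CA atom per residue."""
    counts = {}
    for line in pdb_text.splitlines():
        # Atom name sits in columns 13-16, the chain ID in column 22.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21]
            counts[chain_id] = counts.get(chain_id, 0) + 1
    return counts


# Made-up three-residue structure: two residues in chain A, one in chain B.
demo = "\n".join([
    atom_line(1, "N", "MET", "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, "CA", "MET", "A", 1, 1.5, 0.0, 0.0),
    atom_line(3, "CA", "GLU", "A", 2, 5.3, 0.0, 0.0),
    atom_line(4, "CA", "LEU", "B", 1, 9.0, 0.0, 0.0),
])
print(residues_per_chain(demo))
```

For a two-protein prediction you would expect two chain IDs whose residue counts match the lengths of the two input sequences.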
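9 Set the color of each protein separately in the panel at the upper right
After setting the colors, the view looks like this:
10 Show the interaction region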
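First load the script by entering run InterfaceResidues.py on the command line, then enter the following command:
interfaceResidues your_project_name, chain A, chain B
# your_project_name is the PyMOL object name of the loaded complex (derived from your project's file name)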
Link to the InterfaceResidues.py script:
https://pymolwiki.org/index.php/InterfaceResidues
Contents of InterfaceResidues.py:
from pymol import cmd, stored


def interfaceResidues(cmpx, cA='c. A', cB='c. B', cutoff=1.0, selName="interface"):
    """
    interfaceResidues -- finds 'interface' residues between two chains in a complex.

    PARAMS
        cmpx
            The complex containing cA and cB

        cA
            The first chain in which we search for residues at an interface
            with cB

        cB
            The second chain in which we search for residues at an interface
            with cA

        cutoff
            The difference in area OVER which residues are considered
            interface residues.  Residues whose dASA from the complex to
            a single chain is greater than this cutoff are kept.  Zero
            keeps all residues.

        selName
            The name of the selection to return.

    RETURNS
        * A selection of interface residues is created and named
          depending on what you passed into selName
        * An array of values is returned where each value is:
          ( modelName, residueNumber, dASA )

    NOTES
        If you have two chains that are not from the same PDB that you want
        to complex together, use the create command like:
            create myComplex, pdb1WithChainA or pdb2withChainX
        then pass myComplex to this script like:
            interfaceResidues myComplex, c. A, c. X

        This script calculates the area of the complex as a whole.  Then,
        it separates the two chains that you pass in through the arguments
        cA and cB, alone.  Once it has this, it calculates the difference
        and any residues ABOVE the cutoff are called interface residues.

    AUTHOR:
        Jason Vertrees, 2009.
    """
    # Save user's settings, before setting dot_solvent
    oldDS = cmd.get("dot_solvent")
    cmd.set("dot_solvent", 1)

    # set some string names for temporary objects/selections
    tempC, selName1 = "tempComplex", selName + "1"
    chA, chB = "chA", "chB"

    # operate on a new object & turn off the original
    cmd.create(tempC, cmpx)
    cmd.disable(cmpx)

    # remove cruft and irrelevant chains
    cmd.remove(tempC + " and not (polymer and (%s or %s))" % (cA, cB))

    # get the area of the complete complex
    cmd.get_area(tempC, load_b=1)
    # copy the areas from the loaded b to the q field.
    cmd.alter(tempC, 'q=b')

    # extract the two chains and calc. the new area
    # note: the q fields are copied to the new objects
    # chA and chB
    cmd.extract(chA, tempC + " and (" + cA + ")")
    cmd.extract(chB, tempC + " and (" + cB + ")")
    cmd.get_area(chA, load_b=1)
    cmd.get_area(chB, load_b=1)

    # update the chain-only objects w/the difference
    cmd.alter("%s or %s" % (chA, chB), "b=b-q")

    # The calculations are done.  Now, all we need to
    # do is to determine which residues are over the cutoff
    # and save them.
    stored.r, rVal, seen = [], [], []
    cmd.iterate('%s or %s' % (chA, chB), 'stored.r.append((model,resi,b))')

    cmd.enable(cmpx)
    cmd.select(selName1, 'none')
    for (model, resi, diff) in stored.r:
        key = resi + "-" + model
        if abs(diff) >= float(cutoff):
            if key in seen:
                continue
            else:
                seen.append(key)
            rVal.append((model, resi, diff))
            # expand the selection here; I chose to iterate over stored.r instead of
            # creating one large selection b/c if there are too many residues PyMOL
            # might crash on a very large selection.  This is pretty much guaranteed
            # not to kill PyMOL; but, it might take a little longer to run.
            cmd.select(selName1, selName1 + " or (%s and i. %s)" % (model, resi))

    # this is how you transfer a selection to another object.
    cmd.select(selName, cmpx + " in " + selName1)
    # clean up after ourselves
    cmd.delete(selName1)
    cmd.delete(chA)
    cmd.delete(chB)
    cmd.delete(tempC)
    # show the selection
    cmd.enable(selName)

    # reset users settings
    cmd.set("dot_solvent", oldDS)

    return rVal


cmd.extend("interfaceResidues", interfaceResidues)
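The resulting selection, interface, is the interaction region; set its color
Final result, with the interaction region in yellow:
11 Show the interacting residues
Click Display > Sequence to show the sequence
The residues highlighted in yellow in the sequence are the interacting amino acids (yellow being the interface color set in the previous step)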
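To get the interacting residues as a text list rather than reading them off the sequence viewer, a rough cross-check can be done outside PyMOL. Note that the sketch below does not reproduce InterfaceResidues.py: that script uses the change in solvent-accessible area, whereas this one uses a simpler distance criterion (any pair of atoms from the two chains within a cutoff). The 4.0 Å cutoff, the function names, and the demo coordinates are assumptions chosen for illustration, so the two methods may not return identical residue sets.

```python
import math


def atom_line(serial, name, resn, chain, resi, x, y, z):
    """Format one PDB ATOM record (fixed-width columns), for the demo only."""
    return ("ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
            % (serial, name, resn, chain, resi, x, y, z, 1.0, 0.0))


def parse_atoms(pdb_text):
    """Return (chain, residue number, (x, y, z)) for every ATOM record."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((line[21], int(line[22:26]), xyz))
    return atoms


def interface_by_distance(pdb_text, chain_a="A", chain_b="B", cutoff=4.0):
    """Residues with at least one atom within `cutoff` of the other chain."""
    atoms = parse_atoms(pdb_text)
    side_a = [a for a in atoms if a[0] == chain_a]
    side_b = [a for a in atoms if a[0] == chain_b]
    hits = set()
    # Brute-force all atom pairs; fine for two proteins of a few hundred residues.
    for _, resi_a, xyz_a in side_a:
        for _, resi_b, xyz_b in side_b:
            if math.dist(xyz_a, xyz_b) <= cutoff:
                hits.add((chain_a, resi_a))
                hits.add((chain_b, resi_b))
    return sorted(hits)


# Made-up coordinates: only A1 and B1 are close enough to count as in contact.
demo = "\n".join([
    atom_line(1, "CA", "MET", "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, "CA", "GLU", "A", 2, 20.0, 0.0, 0.0),
    atom_line(3, "CA", "LEU", "B", 1, 3.0, 0.0, 0.0),
    atom_line(4, "CA", "PRO", "B", 2, 40.0, 0.0, 0.0),
])
print(interface_by_distance(demo))
```

With a real prediction, pass the text of the rank001 PDB file instead of demo and compare the returned residue numbers with the yellow region in the sequence viewer.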