
Analysis of Codon Usage Bias in the Genome of Trichoderma reesei

YANG Xin, QIN Lina, JIANG Xianzhang

Citation: YANG Xin, QIN Lina, JIANG Xianzhang. Analysis of Codon Usage Bias in the Genome of Trichoderma reesei[J]. Science and Technology of Food Industry, 2022, 43(6): 141−149. (in Chinese with English abstract). doi: 10.13386/j.issn1002-0306.2021110338.


Funding: National Natural Science Foundation of China (31800060); Natural Science Foundation of Fujian Province (2019I0009, 2020J01177).
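    About the author:

    YANG Xin (1997−), female, master's student; research interest: molecular genetics of filamentous fungi; E-mail: 649492494@qq.com

    Corresponding author:

    JIANG Xianzhang (1980−), male, PhD, associate professor; research interest: industrial microbiology; E-mail: jiangxz@fjnu.edu.cn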

  • CLC number: Q939.9


  • Abstract: Codon usage bias differs to varying degrees among species. To characterize codon usage bias in Trichoderma reesei, the main industrial strain for cellulase production, we analyzed the coding regions (open reading frames) of 9352 T. reesei genes. The GC content of 97% of T. reesei genes was 50% to 68%, and the average GC3 content was 70.4%. Neutrality and ENC-plot analyses showed that codon usage in T. reesei is mainly shaped by selection pressure. Correlation analysis showed that genomic GC content was significantly correlated with GC1, GC2 and GC3 (P<0.05), and that the effective number of codons (ENC) was significantly correlated with GC3 (P<0.05). In addition, 22 of the 24 codons most frequently used by T. reesei end in G or C. We further identified 21 preferred codons and 4 optimal codons (CUC, GCC, CGC and GGC) for highly expressed genes. Codon usage frequencies in T. reesei differed little from those in Trichoderma longibrachiatum and Neurospora crassa, but differed considerably from those in Saccharomyces cerevisiae. This study provides a theoretical basis for codon optimization in T. reesei and supports the development of T. reesei as an efficient host for gene expression and as a chassis cell for synthetic biology.
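  • Genetic information is encoded in triplet codons. Because the genetic code is degenerate, most amino acids are encoded by 2 to 6 synonymous codons. Species differ in which synonymous codons they use for the same amino acid and how often they use them; this phenomenon is called codon usage bias [1]. Many factors influence codon usage, including natural selection (gene expression level [2], RNA abundance [3], gene length [4-5], translation initiation signals and protein structure [6]), mutation pressure (GC content, mutation frequency and pattern), and random genetic drift [7-9]. Genome-wide studies of codon usage patterns are important for understanding the basic features of molecular organization in genomes. To date, research on codon usage bias has focused mainly on model species, including the model fungus Saccharomyces cerevisiae, the model bacterium Escherichia coli and the model plant Arabidopsis thaliana [10-11]; filamentous fungi have received comparatively little attention.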

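    The filamentous fungus T. reesei is an industrial producer of cellulases and hemicellulases, valued for its undemanding cultivation requirements, good stability, safety and non-toxicity, and high enzyme yields. In food processing, pretreating agricultural products with cellulase loosens and softens plant tissue and reduces nutrient losses. Beyond the food industry, T. reesei is also used to produce bioethanol [12] and industrial enzymes, giving it broad biotechnological value. Of the roughly 243 commercial enzyme products currently made by microbial fermentation, 30 are produced with T. reesei as the host, and 21 of these are recombinant products used in feed and technical applications such as textiles, pulp and paper [13-17]. T. reesei is therefore an important subject of research.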

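    The genome of T. reesei strain QM6a has been sequenced [18], providing a foundation for molecular biology research on this filamentous fungus. In this study, we analyzed the nucleotide composition and codon usage bias of the coding sequences in the T. reesei genome to identify the factors that shape its codon usage. The results help clarify the mechanisms of molecular evolution in this species and provide a theoretical basis for raising heterologous gene expression in T. reesei through codon optimization.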

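    Genome data for T. reesei QM6a were obtained from the public database of the Joint Genome Institute (JGI, http://genome.jgi.doe.gov/portal/), genome project ID 1184794. On the Galaxy bioinformatics platform (https://usegalaxy.org/), Fasta Statistics was used to compute summary statistics for the T. reesei QM6a sequences, Filter sequences by length to filter the CDSs, and cusp to analyze the GC content of codons. Sequences were processed with the biopython-1.79 module in Python 3.9, the codon usage of each CDS was analyzed with CodonW 1.4.2, and statistics and plots were produced in Origin 9.0.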

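    CDS sequences were downloaded from JGI. Because codon counts computed from short sequences are not biologically meaningful [19], CDSs shorter than 300 bp were removed with Galaxy's Filter sequences by length tool, and the remaining 9352 CDSs were used for further analysis.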
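
    The length filter can be reproduced outside Galaxy. The sketch below is a plain-Python stand-in, not the authors' Galaxy workflow: the helper names (read_fasta, filter_by_length) are hypothetical, and it assumes a FASTA file of in-frame CDSs. With Biopython installed, Bio.SeqIO.parse(handle, "fasta") could replace the hand-written reader.

```python
import io

MIN_LENGTH = 300  # bp; CDSs shorter than this are discarded, as in the text


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length=MIN_LENGTH):
    """Keep only CDSs whose length is at least min_length bp."""
    return [(header, seq) for header, seq in records if len(seq) >= min_length]


# Toy input: a 9-bp CDS that is dropped and a 336-bp CDS that is kept.
fasta = io.StringIO(">short\nATGAAATAA\n>long\nATG" + "GCC" * 110 + "TAA\n")
kept = filter_by_length(read_fasta(fasta))
print([header for header, _ in kept])  # ['long']
```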

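    Using Galaxy's cusp tool, we calculated the overall GC content of each gene and the proportions of G or C at the first, second and third codon positions, denoted GC, GC1, GC2 and GC3, respectively. GC3 in particular has a strong influence on codon usage bias.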
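
    A minimal sketch of the GC, GC1, GC2 and GC3 calculation, assuming an in-frame CDS and counting every codon including the stop codon (how cusp was configured here is not stated, so this is an assumption). The function names are hypothetical.

```python
def split_codons(cds):
    """Split an in-frame CDS into codons, ignoring a trailing partial codon."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def gc_fraction(bases):
    """Fraction of bases that are G or C (0.0 for an empty input)."""
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0


def gc_by_position(cds):
    """Return GC, GC1, GC2 and GC3 (as fractions) for one CDS."""
    codons = split_codons(cds)
    return {
        "GC": gc_fraction("".join(codons)),
        "GC1": gc_fraction([c[0] for c in codons]),
        "GC2": gc_fraction([c[1] for c in codons]),
        "GC3": gc_fraction([c[2] for c in codons]),
    }


print(gc_by_position("ATGGCCTAA"))  # GC = 4/9, GC1 = 1/3, GC2 = 1/3, GC3 = 2/3
```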

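    The neutrality plot is a method for examining codon usage patterns. We analyzed the GC content at the first, second and third codon positions (GC1, GC2 and GC3, respectively). GC12 is the mean of GC1 and GC2, and GC12 is plotted against GC3 in the neutrality plot. If the correlation between GC12 and GC3 is statistically significant and the slope of the regression line is close to 1, mutation bias is taken to be the main factor shaping codon usage. Conversely, selection against mutation bias leads to a narrow distribution of GC content and a lack of correlation between GC12 and GC3 [20].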
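
    The neutrality plot reduces to an ordinary least-squares fit of GC12 on GC3. The sketch below uses hypothetical helper names and toy per-gene values chosen only to make the arithmetic easy to follow; it is not the authors' Origin workflow.

```python
def gc12(gc1, gc2):
    """GC12 is the mean of the first- and second-position GC contents."""
    return (gc1 + gc2) / 2


def neutrality_fit(gc3_values, gc12_values):
    """Least-squares fit GC12 = slope * GC3 + intercept; returns (slope, intercept, R^2)."""
    n = len(gc3_values)
    if n < 2 or n != len(gc12_values):
        raise ValueError("need equally long GC3 and GC12 lists with at least two genes")
    mean_x = sum(gc3_values) / n
    mean_y = sum(gc12_values) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3_values)
    syy = sum((y - mean_y) ** 2 for y in gc12_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3_values, gc12_values))
    if sxx == 0 or syy == 0:
        raise ValueError("GC3 or GC12 does not vary across genes")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Toy values (illustrative only): per-gene (GC1, GC2, GC3) fractions.
genes = [(0.50, 0.40, 0.50), (0.55, 0.45, 0.60), (0.60, 0.50, 0.70), (0.65, 0.55, 0.80)]
gc3 = [g3 for _, _, g3 in genes]
gc12_values = [gc12(g1, g2) for g1, g2, _ in genes]
slope, intercept, r2 = neutrality_fit(gc3, gc12_values)
print(round(slope, 3), round(intercept, 3), round(r2, 3))
```

    A slope near 1 would point to mutation bias and a slope near 0 to selection, which is how the slope is read in the text.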

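    The effective number of codons (ENC) is a useful estimate of absolute codon bias and measures the overall codon usage bias of a gene. Overall GC content, and GC3 in particular, often reflects the strength of directional mutation. The ENC-plot, with ENC on the y-axis and GC3 on the x-axis, is widely used to determine whether codon usage in genes is influenced by mutation or by selection [21]. If points fall near the expected curve, mutation is the main force determining codon usage; if points fall well below the curve, selection is the main force.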
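
    CodonW reports ENC and GC3s directly; the sketch below re-implements Wright's formulas [21] in plain Python to show what those numbers mean. Observed ENC is built from the homozygosity F of each amino acid, averaged within degeneracy classes (2-, 3-, 4- and 6-fold), and the expected curve uses GC3s, the GC content at synonymous third positions. The handling of missing classes (substituting the mean of F2 and F4 for Ile, otherwise returning None) is a simplifying assumption, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code in the DNA alphabet ("*" = stop), TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def expected_enc(gc3s):
    """Expected ENC if codon usage reflected GC3s composition alone (Wright 1990)."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def observed_enc(cds):
    """Wright's ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61 (None if undefined)."""
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    families = defaultdict(list)                 # amino acid -> counts of its codons
    for codon, aa in CODE.items():
        if aa != "*":
            families[aa].append(counts.get(codon, 0))
    f_by_class = defaultdict(list)               # degeneracy class -> F per amino acid
    for aa_counts in families.values():
        k, n = len(aa_counts), sum(aa_counts)
        if k == 1 or n < 2:
            continue                             # Met/Trp uninformative; F needs n >= 2
        sum_p2 = sum((x / n) ** 2 for x in aa_counts)
        f_by_class[k].append((n * sum_p2 - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # substitute when Ile is absent
    enc = 2.0
    for k, n_aa in {2: 9, 3: 1, 4: 5, 6: 3}.items():
        if mean_f.get(k, 0) <= 0:
            return None                          # too few codons for a stable estimate
        enc += n_aa / mean_f[k]
    return min(enc, 61.0)


# Extreme cases: one codon per amino acid (maximal bias) vs. equal use of all synonyms.
first_codon = {}
for codon, aa in CODE.items():
    if aa != "*":
        first_codon.setdefault(aa, codon)
biased = "".join(c * 3 for c in first_codon.values())
uniform = "".join(c * 1000 for c, aa in CODE.items() if aa != "*")
print(observed_enc(biased), observed_enc(uniform), expected_enc(0.5))  # 20.0 61.0 60.5
```

    The two extremes reproduce the 20 to 61 range of ENC; a gene whose observed ENC sits well below expected_enc(GC3s) is the "below the curve" case described above.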

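    Relationships among variables and samples can be explored by multivariate statistical analysis. Correlations were computed with the Pearson correlation coefficient, and the significance of each coefficient was assessed with a two-tailed test. Correlation analysis reveals the main factors shaping codon usage patterns and the associations among the variables [22].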
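
    A minimal sketch of the Pearson coefficient and its two-tailed test. The exact test uses a t distribution with n − 2 degrees of freedom (scipy.stats.pearsonr implements it); to stay within the standard library, this sketch assumes n is in the thousands, as it is here, and approximates t by a standard normal. The function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need two equally long samples with n >= 3")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def two_tailed_p_large_n(r, n):
    """Approximate two-tailed p for H0: rho = 0.

    t = r * sqrt((n - 2) / (1 - r^2)) has n - 2 degrees of freedom; for
    thousands of genes it is effectively standard normal, so
    2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)) is used as the p-value.
    """
    if abs(r) >= 1:
        return 0.0
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))


# The ENC vs. GC3 coefficient reported in Table 1, with the 9352 genes analyzed.
print(two_tailed_p_large_n(-0.8904, 9352) < 0.001)  # True
```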

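    We calculated the nucleotide composition at the third codon position (A3, U3, C3 and G3) and analyzed the AT bias [A3/(A3+U3)] and the GC bias [G3/(G3+C3)]. The PR2-plot places the AT bias on the y-axis and the GC bias on the x-axis [23]. If nucleotide composition were the only factor affecting synonymous codon usage, A and U, and likewise G and C, would be used at equal frequencies, so both biases would equal 0.5 (the centre of the plot).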
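
    The two PR2 coordinates can be computed per gene as below. Restricting the count to the eight fourfold-degenerate codon boxes (where any third base gives the same amino acid) is a common convention in PR2 analyses and an assumption here; the text speaks of the third position in general, so a flag allows both. The function and constant names are hypothetical.

```python
# First two bases of the eight fourfold-degenerate codon boxes:
# Ala GCN, Gly GGN, Pro CCN, Thr ACN, Val GTN, Leu CTN, Ser TCN, Arg CGN.
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CT", "TC", "CG"}


def pr2_coordinates(cds, fourfold_only=True):
    """Return (GC bias, AT bias) = (G3/(G3+C3), A3/(A3+T3)) for one CDS."""
    seq = cds.upper().replace("U", "T")
    third = [seq[i + 2] for i in range(0, len(seq) - 2, 3)
             if not fourfold_only or seq[i:i + 2] in FOURFOLD_PREFIXES]
    a, t, g, c = (third.count(base) for base in "ATGC")
    gc_bias = g / (g + c) if g + c else float("nan")
    at_bias = a / (a + t) if a + t else float("nan")
    return gc_bias, at_bias  # x and y coordinates of the PR2 plot


# Codons GCA GCT GGG GGC GGC ATG; ATG is skipped when fourfold_only=True.
print(pr2_coordinates("GCAGCTGGGGGCGGCATG"))
print(pr2_coordinates("GCAGCTGGGGGCGGCATG", fourfold_only=False))
```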

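    Relative synonymous codon usage (RSCU) was analyzed with CodonW. The RSCU of a codon is its observed count relative to the count expected if all synonymous codons for that amino acid were used equally; it removes the effect of amino acid composition on codon usage. It is calculated as: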

    $$\mathrm{RSCU}_{ij} = \frac{x_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}}$$

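    where $x_{ij}$ is the number of occurrences of the j-th codon for the i-th amino acid, and $n_i$ is the number of synonymous codons for the i-th amino acid (1 to 6). Without codon usage bias, the RSCU of a codon equals 1; an RSCU greater than 1 means the codon is used more often than expected, i.e. it is preferred. Because RSCU is easy to compute and reflects codon preference directly, it is the standard measure of bias in most codon usage analyses.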
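
    The RSCU formula above translates directly into code. This is a minimal sketch, not CodonW's implementation: it assumes the standard genetic code in the DNA alphabet and treats the three stop codons as one family, as Table 2 does with "TER". The helper names are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code in the DNA alphabet ("*" = stop), TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def count_codons(cds):
    """Count in-frame codons of one CDS (a trailing partial codon is ignored)."""
    cds = cds.upper().replace("U", "T")
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))


def rscu(codon_counts):
    """RSCU_ij = x_ij / ((1/n_i) * sum_j x_ij) for every codon of every amino acid."""
    families = defaultdict(list)              # amino acid -> its synonymous codons
    for codon, aa in CODE.items():
        families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue                          # amino acid absent: RSCU undefined
        expected = total / len(codons)        # count if all synonyms were used equally
        for c in codons:
            values[c] = codon_counts.get(c, 0) / expected
    return values


# Tyr and stop-codon counts taken from Table 2 (DNA alphabet: T instead of U).
table2_counts = {"TAT": 39376, "TAC": 86733, "TAA": 2434, "TAG": 2514, "TGA": 4232}
for codon, value in sorted(rscu(table2_counts).items()):
    print(codon, round(value, 2))
```

    After rounding, the printed values match the Tyr and stop-codon RSCU entries of Table 2 (0.62, 1.38, 0.80, 0.82, 1.38).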

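    ENC ranges from 20 to 61. The lower a gene's ENC, the stronger its overall codon usage bias and, generally, the higher its expression level [24]. Using ENC as the bias criterion, we therefore took the 10% of genes at each extreme of the ENC distribution to build high- and low-expression gene sets, and analyzed the codons whose ΔRSCU (RSCU in the high-expression set minus RSCU in the low-expression set) exceeded 0.08 [25-26].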
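
    The high/low-set comparison can be sketched as follows, assuming each gene has already been reduced to its ENC value and a Counter of codons (for example with CodonW output). Pooling codon counts within each set before computing RSCU is an assumption about the procedure, and the function name preferred_codons is hypothetical.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def rscu(codon_counts):
    """RSCU of each codon: observed count divided by the mean count of its family."""
    families = defaultdict(list)
    for codon, aa in CODE.items():
        families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total:
            for c in codons:
                values[c] = codon_counts.get(c, 0) * len(codons) / total
    return values


def preferred_codons(genes, fraction=0.10, threshold=0.08):
    """genes: list of (enc, Counter of codons) pairs, one per gene.

    The lowest-ENC fraction is pooled as the high-expression set and the
    highest-ENC fraction as the low-expression set; codons whose
    RSCU(high) - RSCU(low) exceeds threshold are returned with that ΔRSCU.
    """
    ranked = sorted(genes, key=lambda gene: gene[0])   # ascending ENC
    k = max(1, int(len(ranked) * fraction))
    high = sum((counts for _, counts in ranked[:k]), Counter())
    low = sum((counts for _, counts in ranked[-k:]), Counter())
    rscu_high, rscu_low = rscu(high), rscu(low)
    shared = rscu_high.keys() & rscu_low.keys()
    return {c: rscu_high[c] - rscu_low[c] for c in sorted(shared)
            if rscu_high[c] - rscu_low[c] > threshold}


# Toy data (illustrative only): ten genes that use only alanine codons.
genes = [(30.0, Counter({"GCC": 10}))]
genes += [(40.0 + i, Counter({"GCT": 2, "GCC": 2})) for i in range(8)]
genes.append((58.0, Counter({"GCC": 1, "GCT": 1, "GCA": 1, "GCG": 1})))
print(preferred_codons(genes))  # {'GCC': 3.0}
```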

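    The Codon Usage tool of the Bioinformatics.org online platform (http://www.bioinformatics.org) was used to calculate the usage frequency of each codon in T. reesei [27]. Codon usage frequencies of T. longibrachiatum (same genus), the filamentous model fungus N. crassa and the model yeast S. cerevisiae were calculated with CodonW and compared with those of T. reesei.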
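
    A minimal sketch of a pooled codon usage frequency. It assumes, as in the common Codon Usage Database convention, that frequencies are expressed per 1,000 codons over all CDSs of a species; the unit behind the published table is not stated, and the function name is hypothetical.

```python
from collections import Counter


def usage_per_thousand(cds_list):
    """Codon usage frequency per 1,000 codons, pooled over all CDSs."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        counts.update(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    total = sum(counts.values())
    return {codon: 1000 * n / total for codon, n in sorted(counts.items())}


freqs = usage_per_thousand(["ATGGCCTAA", "ATGGCGTAA"])
print(freqs)
```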

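    Codon usage bias for any single codon type is strongly affected by the overall nucleotide composition of the genome [28]. We therefore first used Galaxy to analyze the GC composition of the coding sequences (CDSs) in the T. reesei genome. The GC, GC1, GC2 and GC3 contents of 97%, 96%, 18% and 37% of genes, respectively, fell between 50% and 68% (Fig. 1). The mean GC content was 58.1%, and the mean GC contents at the three codon positions (GC1, GC2 and GC3) were 58.9%, 45.0% and 70.4%, respectively. One-way ANOVA showed highly significant differences in GC content among the three positions (P<0.001), in the order GC3>GC1>GC2. The third position therefore has the highest GC content, which indicates that GC3 is an important contributor to codon usage bias and that the third codon position in T. reesei is under relatively strong selection pressure. Overall, the nucleotide composition analysis shows that T. reesei genes prefer G/C-ending codons over A/U-ending codons.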

    Figure  1.  Distribution of the GC contents

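    Neutrality analysis is a useful way to reveal the relationship between GC12 and GC3. To examine the relationships among the three codon positions, we constructed a neutrality plot (GC12 versus GC3) for the coding sequences of the T. reesei genome. GC12 and GC3 were not correlated (R²=0.0009) and the slope of the regression line was close to 0 (Fig. 2), indicating that codon usage in T. reesei has been little affected by directional mutation pressure and that codon bias is caused mainly by selection pressure.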

    Figure  2.  Neutrality plot of T. reesei

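    ENC is widely used to measure the level of codon bias in individual genes. To clarify the relationship between nucleotide composition and codon bias in T. reesei sequences, we plotted ENC against GC3s to explore the main features of codon usage among genes. As shown in Fig. 3, the observed ENC values of most genes fall below the expected ENC curve, indicating that codon usage in T. reesei is mainly affected by selection pressure, consistent with the neutrality plot.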

    Figure  3.  Relationship between the ENC and GC3 in T. reesei

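    To estimate the difference between observed and expected ENC values more precisely, we calculated (ENCexp−ENCobs)/ENCexp. As shown in Fig. 4, the distribution of this ratio peaks between 0 and 0.1, indicating that the ENC values of most genes differ only slightly from those expected from GC3. Thus, the observed ENC of most genes is close to the GC3-based expectation, although some genes have a much lower observed ENC.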
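
    The ratio plotted in Fig. 4 can be sketched as follows, reusing Wright's expected-ENC curve with GC3s as input. The bin width of 0.1 mirrors the 0 to 0.1 peak mentioned above, but the binning used for the figure is an assumption, and the toy ENC values are illustrative only.

```python
import math
from collections import Counter


def expected_enc(gc3s):
    """Expected ENC if codon usage reflected GC3s composition alone (Wright 1990)."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def enc_ratio(enc_obs, gc3s):
    """(ENCexp - ENCobs) / ENCexp for one gene; positive values lie below the curve."""
    enc_exp = expected_enc(gc3s)
    return (enc_exp - enc_obs) / enc_exp


def ratio_histogram(ratios, width=0.1):
    """Count ratios in bins [k*width, (k+1)*width), keyed by the lower bin edge."""
    bins = Counter(math.floor(r / width) for r in ratios)
    return {round(k * width, 10): n for k, n in sorted(bins.items())}


# Toy genes (illustrative only): observed ENC values, all with GC3s = 0.5.
ratios = [enc_ratio(obs, 0.5) for obs in (61.0, 58.0, 57.0, 50.0)]
print(ratio_histogram(ratios))  # {-0.1: 1, 0.0: 2, 0.1: 1}
```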

    Figure  4.  Frequency distribution of the effective number of codons (ENC) ratio

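    Correlation analysis among GC content, ENC and codon number (CN) in the coding regions of the T. reesei genome (Table 1) showed that GC Total was highly significantly correlated with GC1, GC2 and GC3 (P<0.001), whereas GC3 was not significantly correlated with GC1 or GC2, indicating that base composition at the third position differs considerably from that at the first and second positions. ENC was only weakly correlated with GC1 and GC2 but highly significantly correlated with GC3 and GC Total (P<0.001), indicating that base composition at different codon positions affects the effective number of codons. CN was not significantly correlated with GC1, GC2, GC3 or GC Total, suggesting that CN has little influence on ENC and ruling out an effect of overly short sequences on the subsequent analyses.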

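    Table 1. Correlation analysis of gene-related parameters
    | Parameter | GC1 | GC2 | GC3 | GC Total |
    |---|---|---|---|---|
    | GC2 | 0.0293 | | | |
    | GC3 | 0.2164 | −0.1271 | | |
    | GC Total | 0.5335*** | 0.3577*** | 0.8178*** | |
    | ENC | −0.2017 | 0.0601 | −0.8904*** | −0.7549*** |
    | CN | 0.0242 | −0.0805 | −0.1001 | −0.1078 |
    Note: *** P<0.001. GC1, GC2 and GC3, GC content at the first, second and third positions of all codons in a gene; GC Total, overall GC content of the codons; ENC, effective number of codons; CN, number of codons.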

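    To examine whether biased codon choice is restricted to highly biased protein-coding genes, we used a PR2-plot to analyze the relationship between purines and pyrimidines at the third position within the amino acid codon families of the 64 codons [29]. If codon usage were determined entirely by mutation, G and C, and likewise A and T, would be used at equal frequencies. However, Fig. 5 shows that G and C are used more frequently than A and T in T. reesei, indicating that, besides nucleotide composition, codon usage in T. reesei is influenced by other factors such as selection pressure.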

    Figure  5.  Parity Rule 2 (PR2)-plot analysis

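    To determine the pattern of synonymous codon usage and the degree of preference for C/G-ending codons, we performed an RSCU analysis, calculated the RSCU values (Table 2) and drew a stacked RSCU plot (Fig. 6). Of the 24 most frequently used codons, 22 (UUC, CUG, AUC, AUG, GUC, UCC, CCC, ACC, GCC, UAG, CAC, CAG, AAC, AAG, GAC, GAG, UCG, UGG, CGC, AGC, AGG and GGC) end in C or G (13 C-ending and 9 G-ending), and the remaining 2 (UAA and AGA) end in A; none of the preferred codons ends in U. These results indicate that nucleotide composition plays an indispensable role in shaping codon usage in T. reesei.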

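    Table 2. Relative synonymous codon usage (RSCU) in the protein-coding regions of T. reesei
    | Amino acid | Codon | Count | RSCU | Amino acid | Codon | Count | RSCU |
    |---|---|---|---|---|---|---|---|
    | Phe | UUU | 65798 | 0.77 | Tyr | UAU | 39376 | 0.62 |
    | | UUC | 105228 | 1.23 | | UAC | 86733 | 1.38 |
    | Leu | UUA | 8336 | 0.12 | TER | UAA | 2434 | 0.80 |
    | | UUG | 54088 | 0.77 | | UAG | 2514 | 0.82 |
    | | CUU | 52375 | 0.75 | His | CAU | 39404 | 0.70 |
    | | CUC | 141837 | 2.02 | | CAC | 73860 | 1.30 |
    | | CUA | 20852 | 0.30 | Gln | CAA | 52849 | 0.55 |
    | | CUG | 143771 | 2.05 | | CAG | 138662 | 1.45 |
    | Ile | AUU | 69292 | 0.94 | Asn | AAU | 42059 | 0.52 |
    | | AUC | 130569 | 1.77 | | AAC | 121050 | 1.48 |
    | | AUA | 21446 | 0.29 | Lys | AAA | 41032 | 0.36 |
    | Met | AUG | 103232 | 1.00 | | AAG | 183878 | 1.64 |
    | Val | GUU | 51310 | 0.71 | Asp | GAU | 93342 | 0.68 |
    | | GUC | 137774 | 1.91 | | GAC | 180585 | 1.32 |
    | | GUA | 18067 | 0.25 | Glu | GAA | 86554 | 0.60 |
    | | GUG | 80828 | 1.12 | | GAG | 203315 | 1.40 |
    | Ser | UCU | 53294 | 0.83 | Cys | UGU | 14149 | 0.49 |
    | | UCC | 86641 | 1.35 | | UGC | 43486 | 1.51 |
    | | UCA | 41082 | 0.64 | TER | UGA | 4232 | 1.38 |
    | | UCG | 78192 | 1.22 | Trp | UGG | 68452 | 1.00 |
    | Pro | CCU | 58318 | 0.81 | Arg | CGU | 26927 | 0.55 |
    | | CCC | 103910 | 1.45 | | CGC | 88930 | 1.81 |
    | | CCA | 45925 | 0.64 | | CGA | 48497 | 0.99 |
    | | CCG | 79130 | 1.10 | | CGG | 50189 | 1.02 |
    | Thr | ACU | 41210 | 0.61 | Ser | AGU | 24809 | 0.39 |
    | | ACC | 94630 | 1.41 | | AGC | 100916 | 1.57 |
    | | ACA | 44135 | 0.66 | Arg | AGA | 32260 | 0.66 |
    | | ACG | 88131 | 1.31 | | AGG | 47231 | 0.96 |
    | Ala | GCU | 82619 | 0.76 | Gly | GGU | 44800 | 0.55 |
    | | GCC | 187406 | 1.73 | | GGC | 173318 | 2.14 |
    | | GCA | 68205 | 0.63 | | GGA | 59183 | 0.73 |
    | | GCG | 95792 | 0.88 | | GGG | 46294 | 0.57 |
    Note: TER, stop codon. RSCU values ≥1 were set in bold in the published table.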
    Figure  6.  Stacked plot of RSCU in T. reesei

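    Genes were ranked by ENC, and the 10% at each extreme were used to build the high- and low-expression gene sets; RSCU and ΔRSCU values were then calculated for both sets (Table 3). The 21 codons marked with asterisks are the preferred codons of highly expressed genes. Apart from the stop codon UAA, all of them end in C or G, indicating that codon usage in T. reesei is biased toward C- or G-ending synonymous codons. In addition, four codons, CUC, GCC, CGC and GGC, are the optimal codons of highly expressed T. reesei genes.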

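    Table 3. Optimal codons in T. reesei
    | Amino acid | Codon | H count | H RSCU | L count | L RSCU | ΔRSCU | Amino acid | Codon | H count | H RSCU | L count | L RSCU | ΔRSCU |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|---|
    | Phe | UUU | 3524 | 0.53 | 7108 | 0.89 | −0.36 | Tyr | UAU | 958 | 0.20 | 5522 | 0.94 | −0.74 |
    | | UUC* | 9737 | 1.47 | 8883 | 1.11 | 0.36 | | UAC** | 8835 | 1.80 | 6252 | 1.06 | 0.74 |
    | Leu | UUA | 75 | 0.02 | 2329 | 0.33 | −0.31 | TER | UAA** | 449 | 1.46 | 193 | 0.63 | 0.83 |
    | | UUG | 1166 | 0.24 | 7593 | 1.09 | −0.85 | | UAG | 176 | 0.57 | 310 | 1.01 | −0.44 |
    | | CUU | 1241 | 0.25 | 8020 | 1.15 | −0.90 | His | CAU | 718 | 0.19 | 6238 | 1.03 | −0.84 |
    | | CUC*** | 14466 | 2.96 | 9918 | 1.42 | 1.54 | | CAC** | 6842 | 1.81 | 5819 | 0.97 | 0.84 |
    | | CUA | 256 | 0.05 | 4161 | 0.60 | −0.55 | Gln | CAA | 1082 | 0.19 | 8116 | 0.89 | −0.70 |
    | | CUG** | 12109 | 2.48 | 9749 | 1.40 | 1.08 | | CAG** | 10546 | 1.81 | 10083 | 1.11 | 0.70 |
    | Ile | AUU | 3628 | 0.65 | 7445 | 1.02 | −0.37 | Asn | AAU | 876 | 0.15 | 6230 | 0.86 | −0.71 |
    | | AUC** | 12703 | 2.29 | 9975 | 1.37 | 0.92 | | AAC** | 11157 | 1.85 | 8278 | 1.14 | 0.71 |
    | | AUA | 307 | 0.06 | 4449 | 0.61 | −0.55 | Lys | AAA | 855 | 0.10 | 6821 | 0.74 | −0.64 |
    | Met | AUG | 7219 | 1.00 | 9316 | 1.00 | 0.00 | | AAG** | 16815 | 1.90 | 11721 | 1.26 | 0.64 |
    | Val | GUU | 2093 | 0.36 | 6594 | 1.03 | −0.67 | Asp | GAU | 2606 | 0.28 | 11788 | 0.95 | −0.67 |
    | | GUC** | 16087 | 2.80 | 8932 | 1.40 | 1.40 | | GAC** | 15982 | 1.72 | 13077 | 1.05 | 0.67 |
    | | GUA | 250 | 0.04 | 3374 | 0.53 | −0.49 | Glu | GAA | 2057 | 0.21 | 11364 | 0.87 | −0.66 |
    | | GUG | 4567 | 0.79 | 6666 | 1.04 | −0.25 | | GAG** | 17144 | 1.79 | 14830 | 1.13 | 0.66 |
    | Ser | UCU | 1997 | 0.50 | 6686 | 1.07 | −0.57 | Cys | UGU | 204 | 0.11 | 2923 | 0.79 | −0.68 |
    | | UCC** | 8553 | 2.14 | 6236 | 1.00 | 1.14 | | UGC** | 3613 | 1.89 | 4522 | 1.21 | 0.68 |
    | | UCA | 1042 | 0.26 | 6179 | 0.99 | −0.73 | TER | UGA | 295 | 0.96 | 415 | 1.36 | −0.40 |
    | | UCG | 4870 | 1.22 | 6284 | 1.01 | 0.21 | Trp | UGG | 4676 | 1.00 | 7218 | 1.00 | 0.00 |
    | Pro | CCU | 2256 | 0.49 | 6510 | 1.02 | −0.53 | Arg | CGU | 1332 | 0.44 | 3380 | 0.71 | −0.27 |
    | | CCC** | 10969 | 2.40 | 5986 | 0.94 | 1.46 | | CGC*** | 10626 | 3.51 | 5323 | 1.11 | 2.40 |
    | | CCA | 682 | 0.15 | 6996 | 1.10 | −0.95 | | CGA | 1347 | 0.45 | 5491 | 1.15 | −0.70 |
    | | CCG | 4383 | 0.96 | 6060 | 0.95 | 0.01 | | CGG | 2474 | 0.82 | 4373 | 0.92 | −0.10 |
    | Thr | ACU | 1580 | 0.33 | 5457 | 0.88 | −0.55 | Ser | AGU | 375 | 0.09 | 4155 | 0.67 | −0.58 |
    | | ACC** | 10733 | 2.21 | 6408 | 1.03 | 1.18 | | AGC** | 7152 | 1.79 | 7940 | 1.27 | 0.52 |
    | | ACA | 1086 | 0.22 | 6555 | 1.06 | −0.84 | Arg | AGA | 521 | 0.17 | 5215 | 1.09 | −0.92 |
    | | ACG | 6020 | 1.24 | 6373 | 1.03 | 0.21 | | AGG | 1857 | 0.61 | 4888 | 1.02 | −0.41 |
    | Ala | GCU | 3879 | 0.47 | 9349 | 1.00 | −0.53 | Gly | GGU | 2787 | 0.44 | 5157 | 0.75 | −0.31 |
    | | GCC*** | 22460 | 2.69 | 10879 | 1.16 | 1.53 | | GGC*** | 19097 | 3.04 | 10070 | 1.46 | 1.58 |
    | | GCA | 1488 | 0.18 | 9613 | 1.03 | −0.85 | | GGA | 1550 | 0.25 | 7121 | 1.03 | −0.78 |
    | | GCG | 5518 | 0.66 | 7523 | 0.81 | −0.15 | | GGG | 1655 | 0.26 | 5264 | 0.76 | −0.50 |
    Note: H, high-expression gene set (the 10% of genes with the lowest ENC); L, low-expression gene set (the 10% with the highest ENC); ΔRSCU = RSCU(H) − RSCU(L); TER, stop codon. * ΔRSCU>0.3; ** ΔRSCU>0.5; *** ΔRSCU>1.5.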

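    We compared the codon usage frequencies of T. reesei with those of T. longibrachiatum (same genus), the filamentous model fungus N. crassa and the model yeast S. cerevisiae (Table 4), where R/L, R/N and R/S denote the ratio of each codon's usage frequency in T. reesei to that in T. longibrachiatum, N. crassa and S. cerevisiae, respectively. Between T. reesei and S. cerevisiae, 34 codons (53.1%) had ratios ≥2.0 or ≤0.5; the ratios between T. reesei and T. longibrachiatum were almost all close to 1; and 6 codons (9.4%) had ratios between T. reesei and N. crassa ≥1.5 or ≤0.67. Codon usage bias in the filamentous fungus T. reesei therefore differs considerably from that of the model yeast S. cerevisiae, but relatively little from those of T. longibrachiatum and the filamentous model fungus N. crassa. N. crassa is often used as a model fungus for studying lignocellulose degradation; because the two species have similar codon usage patterns, codon bias is unlikely to be a major obstacle when related N. crassa genes are expressed in T. reesei.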
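
    The ratio screen behind these counts is simple to reproduce. The sketch below takes four rows of Table 4 as input and applies the thresholds named in the text; the function name is hypothetical, and ratios recomputed from the rounded frequencies may differ from the published ratios in the last digit.

```python
# Four rows of Table 4 (codon usage frequencies; R = T. reesei, N = N. crassa,
# S = S. cerevisiae). Only a subset is reproduced here for illustration.
TABLE4_ROWS = {
    "UUA": {"R": 1.77, "N": 2.83, "S": 26.33},
    "CUC": {"R": 30.08, "N": 26.23, "S": 5.73},
    "CUG": {"R": 30.49, "N": 18.31, "S": 10.87},
    "AUG": {"R": 21.90, "N": 21.46, "S": 20.78},
}


def divergent_codons(rows, other, upper, lower):
    """Codons whose frequency ratio R/other is >= upper or <= lower."""
    flagged = {}
    for codon, freq in rows.items():
        ratio = freq["R"] / freq[other]
        if ratio >= upper or ratio <= lower:
            flagged[codon] = round(ratio, 2)
    return flagged


print(divergent_codons(TABLE4_ROWS, "S", 2.0, 0.5))   # R/S thresholds from the text
print(divergent_codons(TABLE4_ROWS, "N", 1.5, 0.67))  # R/N thresholds from the text
```

    Run over all 64 rows, the share of flagged codons gives the percentages quoted above (for example 34/64 for R/S).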

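    Table 4. Comparison of codon usage between T. reesei and other species
    | Amino acid | Codon | R | L | N | S | R/L | R/N | R/S |
    |---|---|---|---|---|---|---|---|---|
    | Phe | UUU | 13.96 | 13.38 | 12.27 | 26.97 | 1.04 | 1.14 | 0.52 |
    | | UUC | 22.32 | 23.03 | 21.43 | 18.45 | 0.97 | 1.04 | 1.21 |
    | Leu | UUA | 1.77 | 1.74 | 2.83 | 26.33 | 1.02 | 0.63 | 0.07 |
    | | UUG | 11.47 | 11.64 | 15.42 | 27.00 | 0.99 | 0.74 | 0.42 |
    | | CUU | 11.11 | 11.53 | 14.34 | 12.63 | 0.96 | 0.77 | 0.88 |
    | | CUC | 30.08 | 30.39 | 26.23 | 5.73 | 0.99 | 1.15 | 5.25 |
    | | CUA | 4.42 | 4.39 | 6.05 | 13.50 | 1.01 | 0.73 | 0.33 |
    | | CUG | 30.49 | 30.52 | 18.31 | 10.87 | 1.00 | 1.67 | 2.80 |
    | Ile | AUU | 14.70 | 14.19 | 13.91 | 30.16 | 1.04 | 1.06 | 0.49 |
    | | AUC | 27.69 | 28.15 | 26.01 | 16.84 | 0.98 | 1.06 | 1.64 |
    | | AUA | 4.55 | 4.51 | 4.23 | 18.36 | 1.01 | 1.08 | 0.25 |
    | Met | AUG | 21.90 | 22.05 | 21.46 | 20.78 | 0.99 | 1.02 | 1.05 |
    | Val | GUU | 10.88 | 10.91 | 14.03 | 21.63 | 1.00 | 0.78 | 0.50 |
    | | GUC | 29.22 | 29.39 | 24.13 | 11.29 | 0.99 | 1.21 | 2.59 |
    | | GUA | 3.83 | 3.88 | 5.54 | 12.18 | 0.99 | 0.69 | 0.31 |
    | | GUG | 17.14 | 17.23 | 15.80 | 11.02 | 0.99 | 1.08 | 1.55 |
    | Ser | UCU | 11.30 | 11.04 | 12.24 | 23.49 | 1.02 | 0.92 | 0.48 |
    | | UCC | 18.38 | 18.47 | 20.01 | 14.18 | 1.00 | 0.92 | 1.30 |
    | | UCA | 8.71 | 8.65 | 9.57 | 18.89 | 1.01 | 0.91 | 0.46 |
    | | UCG | 16.58 | 16.66 | 14.94 | 8.86 | 1.00 | 1.11 | 1.87 |
    | Pro | CCU | 12.37 | 12.26 | 15.60 | 13.33 | 1.01 | 0.79 | 0.93 |
    | | CCC | 22.04 | 21.69 | 22.34 | 6.94 | 1.02 | 0.99 | 3.17 |
    | | CCA | 9.74 | 9.46 | 12.82 | 17.52 | 1.03 | 0.76 | 0.56 |
    | | CCG | 16.78 | 17.18 | 15.11 | 5.32 | 0.98 | 1.11 | 3.15 |
    | Thr | ACU | 8.74 | 8.49 | 11.41 | 19.98 | 1.03 | 0.77 | 0.44 |
    | | ACC | 20.07 | 19.92 | 24.74 | 12.40 | 1.01 | 0.81 | 1.62 |
    | | ACA | 9.36 | 8.92 | 11.14 | 17.75 | 1.05 | 0.84 | 0.53 |
    | | ACG | 18.69 | 18.84 | 13.92 | 8.23 | 0.99 | 1.34 | 2.27 |
    | Ala | GCU | 17.52 | 17.40 | 20.92 | 20.14 | 1.01 | 0.84 | 0.87 |
    | | GCC | 39.75 | 39.19 | 35.26 | 12.25 | 1.01 | 1.13 | 3.24 |
    | | GCA | 14.47 | 14.08 | 13.00 | 16.26 | 1.03 | 1.11 | 0.89 |
    | | GCG | 20.32 | 21.15 | 17.47 | 6.34 | 0.96 | 1.16 | 3.21 |
    | Tyr | UAU | 8.35 | 8.30 | 8.62 | 18.90 | 1.01 | 0.97 | 0.44 |
    | | UAC | 18.40 | 18.30 | 16.94 | 14.39 | 1.01 | 1.09 | 1.28 |
    | TER | UAA | 0.52 | 0.42 | 0.54 | 1.00 | 1.24 | 0.96 | 0.52 |
    | | UAG | 0.53 | 0.47 | 0.56 | 0.49 | 1.13 | 0.95 | 1.09 |
    | His | CAU | 8.36 | 8.47 | 9.84 | 13.68 | 0.99 | 0.85 | 0.61 |
    | | CAC | 15.67 | 15.63 | 14.75 | 7.61 | 1.00 | 1.06 | 2.06 |
    | Gln | CAA | 11.21 | 11.14 | 17.29 | 26.50 | 1.01 | 0.65 | 0.42 |
    | | CAG | 29.41 | 29.36 | 25.47 | 12.34 | 1.00 | 1.15 | 2.38 |
    | Asn | AAU | 8.92 | 8.78 | 10.69 | 36.10 | 1.02 | 0.83 | 0.25 |
    | | AAC | 25.67 | 25.54 | 26.34 | 24.40 | 1.01 | 0.97 | 1.05 |
    | Lys | AAA | 8.70 | 8.79 | 11.94 | 42.22 | 0.99 | 0.73 | 0.21 |
    | | AAG | 39.00 | 38.72 | 38.66 | 30.42 | 1.01 | 1.01 | 1.28 |
    | Asp | GAU | 19.80 | 19.98 | 24.31 | 37.77 | 0.99 | 0.81 | 0.52 |
    | | GAC | 38.30 | 38.20 | 32.21 | 20.01 | 1.00 | 1.19 | 1.91 |
    | Glu | GAA | 18.36 | 18.45 | 23.16 | 45.43 | 1.00 | 0.79 | 0.40 |
    | | GAG | 43.12 | 43.18 | 41.78 | 19.59 | 1.00 | 1.03 | 2.20 |
    | Cys | UGU | 3.00 | 3.06 | 3.52 | 8.18 | 0.98 | 0.85 | 0.37 |
    | | UGC | 9.22 | 9.32 | 7.45 | 5.00 | 0.99 | 1.24 | 1.85 |
    | TER | UGA | 0.90 | 0.83 | 0.76 | 0.63 | 1.08 | 1.18 | 1.42 |
    | Trp | UGG | 14.52 | 14.45 | 13.31 | 10.49 | 1.00 | 1.09 | 1.38 |
    | Arg | CGU | 5.71 | 5.87 | 8.57 | 6.26 | 0.97 | 0.67 | 0.91 |
    | | CGC | 18.86 | 18.53 | 17.34 | 2.70 | 1.02 | 1.09 | 7.00 |
    | | CGA | 10.29 | 10.56 | 7.18 | 3.14 | 0.97 | 1.43 | 3.28 |
    | | CGG | 10.65 | 11.13 | 8.52 | 1.91 | 0.96 | 1.25 | 5.58 |
    | Ser | AGU | 5.26 | 5.21 | 8.90 | 14.63 | 1.01 | 0.59 | 0.36 |
    | | AGC | 21.40 | 21.16 | 17.79 | 10.17 | 1.01 | 1.20 | 2.10 |
    | Arg | AGA | 6.84 | 6.81 | 8.00 | 20.94 | 1.00 | 0.86 | 0.33 |
    | | AGG | 10.02 | 10.26 | 11.83 | 9.62 | 0.98 | 0.85 | 1.04 |
    | Gly | GGU | 9.50 | 9.83 | 17.50 | 22.50 | 0.97 | 0.54 | 0.42 |
    | | GGC | 36.76 | 36.08 | 28.47 | 9.90 | 1.02 | 1.29 | 3.71 |
    | | GGA | 12.55 | 12.68 | 13.82 | 11.24 | 0.99 | 0.91 | 1.12 |
    | | GGG | 9.82 | 10.18 | 11.45 | 6.23 | 0.96 | 0.86 | 1.58 |
    Note: R, L, N and S are the codon usage frequencies of T. reesei, T. longibrachiatum, N. crassa and S. cerevisiae, respectively; R/L, R/N and R/S are the corresponding frequency ratios; TER, stop codon. In the published table, R/N ratios ≥1.50 or ≤0.67 were underlined and R/S ratios ≥2.00 or ≤0.50 were set in bold.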

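    When recombinant proteins are expressed heterologously, codon usage bias strongly affects protein expression levels. Codon frequencies in DNA sequences correlate positively with the abundance of the corresponding tRNAs in a species, and tRNA concentrations determine how many amino acids are available (as aminoacyl-tRNAs) for translation elongation, which in turn affects the efficiency of protein synthesis [30]. Protein expression levels are therefore closely related to codon usage bias: rare codons tend to slow translation and can even cause translation errors. Codon optimization is thus one of the most important means of increasing protein expression.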

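    Some mutant strains of T. reesei, an industrial cellulase producer, can secrete up to 100 g/L of protein under fermentation conditions [31-32]. This makes T. reesei an excellent host for heterologous protein expression, and studying its codon usage bias is of both theoretical and industrial importance. In this study, the GC3 content of the coding regions (70.4%) shows that the T. reesei genome is GC-rich and that its overall codon usage is biased toward C- and G-ending codons. During evolution, strong mutation pressure from A(T) to G(C) raises the probability that the third codon base is G(C) [33]. Of the 24 codons used most frequently in T. reesei, 22 end in G or C. Analysis of the codon usage pattern showed that codon usage bias in T. reesei is shaped mainly by selection pressure, with natural selection playing a major role. By building high- and low-expression gene sets from ENC differences, we identified 21 preferred codons and 4 optimal codons (CUC, GCC, CGC and GGC) for highly expressed genes.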

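    Comparing the codon usage frequencies of T. reesei with those of other fungi showed that its codon usage bias differs considerably from that of yeast. This may help explain why many T. reesei genes cannot be expressed heterologously in Pichia pastoris, whereas the T. reesei genes Cel5A and Cel6A were expressed successfully in P. pastoris after codon optimization [34,35]. The codon usage bias of T. reesei differs only slightly from that of N. crassa; consistent with this, T. reesei genes without any codon optimization have been expressed successfully in N. crassa and complemented the deletion phenotypes of the corresponding N. crassa genes [36]. These examples show how strongly codon usage bias affects gene expression. This systematic analysis of codon usage bias in T. reesei can guide codon optimization when foreign genes are expressed in T. reesei or in closely related species.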

  • [1]

    IKEMURA T. Codon usage and tRNA content in unicellular and multicellular organisms[J]. Molecular Biology and Evolution,1985,2:13−34.

    [2]

    DAS S, ROYMONDAL U, SAHOO S. Analyzing gene expression from relative codon usage bias in yeast genome: A statistical significance and biological relevance[J]. Gene,2009,443(1-2):121−131. doi: 10.1016/j.gene.2009.04.022

    [3]

    MORIYAMA E N. Codon usage bias and tRNA abundance in Drosophila[J]. Journal of Molecular Evolution,1997,45(5):514−523. doi: 10.1007/PL00006256

    [4]

    DURET L, MOUCHIROUD D. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis[J]. Proceedings of the National Academy of Sciences of the United States of America,1999,96(8):4482−4487. doi: 10.1073/pnas.96.8.4482

    [5]

    MORIYAMA E. Gene length and codon usage bias in Drosophila melanogaster, Saccharomyces cerevisiae and Escherichia coli[J]. Nucleic Acids Research,1998,26(19):3188.

    [6]

    GUPTA S K, MAJUMDAR S, BHATTACHARYA T K, et al. Studies on the relationships between the synonymous codon usage and protein secondary structural units[J]. Biochemical and Biophysical Research Communications,2000,269(3):692−696. doi: 10.1006/bbrc.2000.2351

    [7]

    SHARP P M, BAILES E, GROCOCK R J, et al. Variation in the strength of selected codon usage bias among bacteria[J]. Nucleic Acids Research,2005,33(4):1141−1153. doi: 10.1093/nar/gki242

    [8]

    SHARP P M, TUOHY T M, MOSURSKI K R. Codon usage in yeast: Cluster analysis clearly differentiates highly and lowly expressed genes[J]. Nucleic Acids Research,1986,14(13):5125−5143. doi: 10.1093/nar/14.13.5125

    [9]

    ZHOU T, SUN X, LU Z. Synonymous codon usage in environmental chlamydia UWE25 reflects an evolutional divergence from pathogenic chlamydiae[J]. Gene,2006,368:117−125. doi: 10.1016/j.gene.2005.10.035

    [10]

    CHIAPELLO H, LISACEK F, CABOCHE M, et al. Codon usage and gene function are related in sequences of Arabidopsis thaliana[J]. Gene,1998,209(1):1−38.

    [11]

    SHARP P M, COWE E, HIGGINS D G, et al. Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable within-species diversity[J]. Nucleic Acids Research,1988,16(17):8207−8211. doi: 10.1093/nar/16.17.8207

    [12]

    BISCHOF R H, RAMONI J, SEIBOTH B. Cellulases and beyond: the first 70 years of the enzyme producer Trichoderma reesei[J]. Microbial Cell Factories,2016,15(1):106. doi: 10.1186/s12934-016-0507-6

    [13]

    ZIELIŃSKA D, SZENTNER K, WAśKIEWICZ A, et al. Production of nanocellulose by enzymatic treatment for application in polymer composites[J]. Materials,2021,14(9):2124. doi: 10.3390/ma14092124

    [14]

    KUMAR M R, KUMARAN M D, BALASHANMUGAM P. Production of cellulase enzyme by Trichoderma reesei Cefl9 and its application in the production of bio-ethanol[J]. Pakistan Journal of Pharmaceutical Sciences,2014,17(5):735−739.

    [15]

    PEIL S, BECKERS S J, FISCHER J, et al. Biodegradable, lignin-based encapsulation enables delivery of Trichoderma reesei with programmed enzymatic release against grapevine trunk diseases[J]. Materials Today Bio,2020,7:100061. doi: 10.1016/j.mtbio.2020.100061

    [16]

    LICHTENBERG J, PEREZ C E, MADSEN K, et al. Safety evaluation of a novel muramidase for feed application[J]. Regulatory Toxicology and Pharmacology,2017,89:57−69. doi: 10.1016/j.yrtph.2017.07.014

    [17]

    LANDOWSKI C P, HUUSKONEN A, WAHL R, et al. Enabling low cost biopharmaceuticals: a systematic approach to delete proteases from a well-known protein production host Trichoderma reesei[J]. PLoS One,2015,10(8):e0134723. doi: 10.1371/journal.pone.0134723

    [18]

    MARTINEZ D, BERKA R M, HENRISSAT B, et al. Genome sequencing and analysis of the biomass-degrading fungus Trichoderma reesei (syn. Hypocrea jecorina)[J]. Nature Biotechnology,2008,26(5):553−560. doi: 10.1038/nbt1403

    [19]

    ROSENBERG M S, SUBRAMANIAN S, KUMAR S. Patterns of transitional mutation biases within and among mammalian genomes[J]. Molecular Biology and Evolution,2003,20(6):988−993. doi: 10.1093/molbev/msg113

    [20]

    SUEOKA N. Directional mutation pressure and neutral molecular evolution[J]. Proceedings of the National Academy of Sciences of the United States of America,1988,85(8):2653−2657. doi: 10.1073/pnas.85.8.2653

    [21]

    WRIGHT F. The 'effective number of codons' used in a gene[J]. Gene,1990,87(1):23−29. doi: 10.1016/0378-1119(90)90491-9

    [22]

    LIU H, RUI H, ZHANG H, et al. Analysis of synonymous codon usage in Zea mays[J]. Molecular Biology Reports,2010,37(2):677−684. doi: 10.1007/s11033-009-9521-7

    [23]

    SUEOKA N. Near homogeneity of PR2-bias fingerprints in the human genome and their implications in phylogenetic analyses[J]. Journal of Molecular Evolution,2001,53(4-5):469−476. doi: 10.1007/s002390010237

    [24]

    FUGLSANG A. The 'effective number of codons' revisited[J]. Biochemical and Biophysical Research Communications,2004,317(3):957−964. doi: 10.1016/j.bbrc.2004.03.138

    [25]

    LIU Q P, XUE Q Z. Codon usage in the chloroplast genome of rice (Oryza sativa L. ssp. japonica)[J]. Acta Agronomica Sinica,2004,30(12):1220−1224. (in Chinese)

    [26]

    ZHANG W J, JIE Z, LI Z F, et al. Comparative analysis of codon usage patterns among mitochondrion, chloroplast and nuclear Genes in Triticum aestivum L[J]. Journal of Integrative Plant Biology,2007,49(2):246−254. doi: 10.1111/j.1744-7909.2007.00404.x

    [27]

    FAN S H, GUO A G, SHAN L W, et al. Analysis of genetic code preference in Arabidopsis thaliana[J]. Progress in Biochemistry and Biophysics,2003,30(2):221−225. (in Chinese). doi: 10.3321/j.issn:1000-3282.2003.02.012

    [28]

    JENKINS G M, HOLMES E C. The extent of codon usage bias in human RNA viruses and its evolutionary origin[J]. Virus Research,2003,92(1):1−7. doi: 10.1016/S0168-1702(02)00309-X

    [29]

    SUEOKA N. Translation-coupled violation of Parity Rule 2 in human genes is not the cause of heterogeneity of the DNA G+C content of third codon position[J]. Gene,1999,238(1):53−58. doi: 10.1016/S0378-1119(99)00320-0

    [30]

    GUSTAFSSON C, GOVINDARAJAN S, MINSHULL J. Codon bias and heterologous protein expression[J]. Trends in Biotechnology,2004,22(7):346−353. doi: 10.1016/j.tibtech.2004.04.006

    [31]

    CHERRY J R, FIDANTSEF A L. Directed evolution of industrial enzymes: an update[J]. Current Opinion in Biotechnology,2003,14(4):438−443. doi: 10.1016/S0958-1669(03)00099-5

    [32]

    VISSER H, JOOSTEN V, PUNT P J, et al. Development of a mature fungal technology and production platform for industrial enzymes based on a Myceliophthora thermophila isolate, previously known as Chrysosporium lucknowense C1[J]. Industrial Biotechnology,2011,7(3):214−223. doi: 10.1089/ind.2011.7.214

    [33]

    NOVEMBRE J A. Accounting for background nucleotide composition when measuring codon usage bias[J]. Molecular Biology and Evolution,2002,19(8):1390−1394.

    [34]

    SUN F F, BAI R, YANG H, et al. Heterologous expression of codon optimized Trichoderma reesei Cel6A in Pichia pastoris[J]. Enzyme and Microbial Technology,2016,92:107−116. doi: 10.1016/j.enzmictec.2016.07.004

    [35]

    BAI R H, ZHANG Y B, WANG C D, et al. Gene optimization and efficient expression of Trichoderma reesei Cel5A in Pichia pastoris[J]. Chinese Journal of Biotechnology,2016,32(10):1381−1394. (in Chinese)

    [36]

    XIONG Y, WU V W, LUBBE A, et al. A fungal transcription factor essential for starch degradation affects integration of carbon and nitrogen metabolism[J]. PLoS Genetics,2017,13(5):e1006737. doi: 10.1371/journal.pgen.1006737

Publication history
  • Received: 2021-11-28
  • Published online: 2022-01-17
  • Issue date: 2022-03-14
