[1] YANG Shuai, GUO Maozu, ZHAO Lingling, et al. The method of 100-kernel weight related genes mining in maize mixed with genetic algorithm and XGboost [J]. CAAI Transactions on Intelligent Systems, 2022, 17(1): 170-180. [doi:10.11992/tis.202105005]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 17
Issue: 2022, No. 1
Page number: 170-180
Column: Artificial Intelligence Deans' Forum
Publication date: 2022-01-05
- Title: The method of 100-kernel weight related genes mining in maize mixed with genetic algorithm and XGboost
- Author(s): YANG Shuai (1, 2); GUO Maozu (1, 2); ZHAO Lingling (3); LI Yang (1, 2)
  1. School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China;
  2. Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing 100044, China;
  3. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Keywords: genetic algorithm; eXtreme gradient boosting; machine learning; maize; transcriptome analysis; 100-kernel weight; Gene Ontology; Kyoto Encyclopedia of Genes and Genomes
- CLC: TP391
- DOI: 10.11992/tis.202105005
- Abstract: RNA-Seq transcriptome sequencing data have a high feature dimension, so traditional methods for finding phenotype-related genes require substantial computing resources. Moreover, differential expression analysis yields a large set of candidate genes, and further screening depends on existing prior knowledge. This paper proposes GA-XGBoost, a transcriptome analysis method that combines a genetic algorithm with XGBoost and uses machine learning to narrow the range of candidate genes for subsequent analysis. Comparative experiments and follow-up analysis of gene associations with the 100-kernel weight trait were carried out on a set of high-quality maize data. Compared with XGBoost models trained directly on all genes or on the differentially expressed genes, the XGBoost model trained on the candidate genes selected by the proposed method achieved the lowest MSE in predicting the 100-kernel weight of maize. Compared with the 1542 differentially expressed genes obtained from differential expression analysis, GA-XGBoost reduced the number of candidate genes to 48, a reduction of about 31 times, indicating that the proposed method can effectively improve the capability and efficiency of transcriptome data analysis.
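The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of genetic-algorithm wrapper selection: each individual is a bit mask over genes, and its fitness is the validation MSE of a model trained on the selected genes. Everything here is an assumption made for illustration: the synthetic data, the function names (`make_data`, `subset_mse`, `ga_select`), the GA settings, and above all the fitness model, which is a simple one-variable least-squares fit standing in for XGBoost so that the example runs with the Python standard library alone. It is not the authors' GA-XGBoost implementation.

```python
import random

def make_data(n_samples, n_genes, informative, rng):
    """Synthetic 'expression' matrix; the trait depends only on `informative` genes."""
    X = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [sum(row[g] for g in informative) + rng.gauss(0, 0.1) for row in X]
    return X, y

def subset_mse(mask, X_train, y_train, X_val, y_val):
    """Fitness stand-in (NOT XGBoost): validation MSE of a 1-D least-squares
    fit of the trait on the mean expression of the selected genes."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return float("inf")  # an empty gene set cannot predict anything

    def score(row):
        return sum(row[i] for i in idx) / len(idx)

    s = [score(r) for r in X_train]
    mean_s, mean_y = sum(s) / len(s), sum(y_train) / len(y_train)
    var = sum((v - mean_s) ** 2 for v in s)
    if var == 0:
        return float("inf")
    slope = sum((v - mean_s) * (t - mean_y) for v, t in zip(s, y_train)) / var
    intercept = mean_y - slope * mean_s
    errors = [(slope * score(r) + intercept - t) ** 2 for r, t in zip(X_val, y_val)]
    return sum(errors) / len(errors)

def ga_select(fitness, n_genes, rng, pop_size=30, generations=40, mutation_rate=0.05):
    """Evolve gene masks that minimise `fitness`; returns the best mask found."""
    pop = [[rng.random() < 0.5 for _ in range(n_genes)] for _ in range(pop_size)]
    pop[0] = [True] * n_genes  # seed with the full gene set as a baseline
    for _ in range(generations):
        ranked = sorted(pop, key=fitness)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0][:]]           # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [(not bit) if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = children
    return min(pop, key=fitness)

rng = random.Random(0)
X, y = make_data(100, 20, informative=[0, 1, 2], rng=rng)
X_train, y_train, X_val, y_val = X[:60], y[:60], X[60:], y[60:]

def fitness(mask):
    return subset_mse(mask, X_train, y_train, X_val, y_val)

best = ga_select(fitness, n_genes=20, rng=rng)
print("selected genes:", [i for i, bit in enumerate(best) if bit])
print("validation MSE:", fitness(best), "vs all genes:", fitness([True] * 20))
```

Because the initial population contains the all-genes mask and the best individual is carried over unchanged each generation, the returned subset is never worse on the validation split than using every gene. To move closer to the setting the abstract describes, `subset_mse` could be replaced by a function that trains a gradient-boosted regressor on the selected columns and returns its validation MSE.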