Publications
2024
- Learning genome-wide interactions of intrinsically disordered proteins with DNA using U-DisCo. Hongwei Tu, Yang Zhang, and Jian Ma. Under Review, 2024.
Proteins are essential regulators of cellular processes. Intrinsically disordered proteins (IDPs), despite lacking stable tertiary structures under physiological conditions, play crucial yet often underexplored roles in biological processes. With recent experimental advances like DisP-seq for probing IDP-DNA binding, there is a pressing need for efficient, interpretable computational methods to identify sequence determinants of IDP-DNA interactions and analyze their cooperative effects on gene regulation. To address this, we develop U-DisCo, a novel deep learning model that predicts base-resolution IDP-DNA binding profiles directly from DNA sequences. Leveraging a U-Net architecture, U-DisCo captures both local base-level interactions and long-range dependencies up to 20 kilobases with high accuracy and computational efficiency, outperforming the baseline BPNet. By incorporating ATAC-seq data, U-DisCo enables robust cross-cell type predictions as a multimodal framework. U-DisCo identified key IDP-binding motifs, revealing distinct interaction patterns and cooperative behaviors across different IDPs. Interestingly, we observed short-range interactions for motifs like AP-2 and EWS-FLI1 (single GGAA motif), while others exhibited independent, enhancer-like functions. Further analysis revealed that some IDPs favored certain strand orientations, suggesting their involvement in specific regulatory mechanisms. Overall, U-DisCo is the first computational approach to explore multiple IDPs within a single cell type, offering a versatile framework for studying IDP-mediated gene regulation and genome-wide regulatory elements.
@article{tu2024learning,
  title = {Learning genome-wide interactions of intrinsically disordered proteins with DNA using U-DisCo},
  author = {Tu, Hongwei and Zhang, Yang and Ma, Jian},
  journal = {Under Review},
  year = {2024},
}
- RotNet: A Rotationally Invariant Graph Neural Network for Quantum Mechanical Calculations. Hongwei Tu, Yanqiang Han, Zhilong Wang, An Chen, Kehao Tao, Simin Ye, Shiwei Wang, Zhiyun Wei, and Jinjin Li. Small Methods, 2024.
Deep learning has proven promising in biological and chemical applications, aiding in accurate predictions of properties such as atomic forces, energies, and material band gaps. Traditional methods with rotational invariance, one of the most crucial physical laws for predictions made by machine learning, have relied on Fourier transforms or specialized convolution filters, leading to complex model design and reduced accuracy and efficiency. However, models without rotational invariance exhibit poor generalization ability across datasets. Addressing this contradiction, this work proposes a rotationally invariant graph neural network, named RotNet, for accurate and accelerated quantum mechanical calculations that can overcome the generalization deficiency caused by rotations of molecules. RotNet ensures rotational invariance through an effective transformation and learns distance and angular information from atomic coordinates. Benchmark experiments on three datasets (protein fragments, electronic materials, and QM9) demonstrate that the proposed RotNet framework outperforms popular baselines and generalizes well to spatial data with varying rotations. The high accuracy, efficiency, and fast convergence of RotNet suggest that it has tremendous potential to significantly facilitate studies of protein dynamics simulation and materials engineering while maintaining physical plausibility.
@article{tu2024rotnet,
  title = {RotNet: A Rotationally Invariant Graph Neural Network for Quantum Mechanical Calculations},
  author = {Tu, Hongwei and Han, Yanqiang and Wang, Zhilong and Chen, An and Tao, Kehao and Ye, Simin and Wang, Shiwei and Wei, Zhiyun and Li, Jinjin},
  journal = {Small Methods},
  volume = {8},
  number = {1},
  pages = {2300534},
  year = {2024},
  publisher = {Wiley Online Library},
  doi = {10.1002/smtd.202300534},
  dimensions = {true}
}
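The RotNet abstract notes that the model learns distance and angular information from atomic coordinates. The sketch below only illustrates why such features are unaffected by rotation. It is not the RotNet implementation, and the abstract does not specify the transformation RotNet actually uses. The helper names (`rotate`, `angle`, `invariant_features`) and the three-atom toy geometry are assumptions made for illustration.

```python
import math

def rotate(point, theta_z, theta_x):
    """Rotate a 3D point about the z-axis, then about the x-axis (radians)."""
    x, y, z = point
    cz, sz = math.cos(theta_z), math.sin(theta_z)
    x, y = cz * x - sz * y, sz * x + cz * y
    cx, sx = math.cos(theta_x), math.sin(theta_x)
    y, z = cx * y - sx * z, sx * y + cx * z
    return (x, y, z)

def angle(a, b, c):
    """Angle at vertex b formed by the points a-b-c, in radians."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))

def invariant_features(coords):
    """Pairwise distances followed by all bond-style angles."""
    n = len(coords)
    dists = [math.dist(coords[i], coords[j])
             for i in range(n) for j in range(i + 1, n)]
    angles = [angle(coords[i], coords[j], coords[k])
              for j in range(n)
              for i in range(n)
              for k in range(i + 1, n)
              if i != j and k != j]
    return dists + angles

# Hypothetical three-atom geometry (illustrative values only).
molecule = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
rotated = [rotate(p, 0.7, -1.2) for p in molecule]

original_features = invariant_features(molecule)
rotated_features = invariant_features(rotated)

# Raw coordinates differ after rotation, but the features do not.
coords_changed = any(
    not math.isclose(a, b, abs_tol=1e-9)
    for p, q in zip(molecule, rotated)
    for a, b in zip(p, q)
)
features_match = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(original_features, rotated_features)
)
```

A model fed raw coordinates would see `molecule` and `rotated` as different inputs, whereas one fed these distances and angles would see the same input for both.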
2022
- Clustered tree regression to learn protein energy change with mutated amino acid. Hongwei Tu, Yanqiang Han, Zhilong Wang, and Jinjin Li. Briefings in Bioinformatics, 2022.
Accurate and effective prediction of mutation-induced protein energy change remains a great challenge and is of great interest in computational biology. However, high resource consumption and insufficient structural information of proteins severely limit the experimental techniques and structure-based prediction methods. Here, we design a structure-independent protocol to accurately and effectively predict the mutation-induced protein folding free energy change with only sequence, physicochemical and evolutionary features. The proposed clustered tree regression protocol is capable of effectively exploiting the inherent data patterns by integrating unsupervised feature clustering by K-means and supervised tree regression using XGBoost, thus enabling fast and accurate protein predictions with different mutations, with an average Pearson correlation coefficient of 0.83 and an average root-mean-square error of 0.94 kcal/mol. The proposed sequence-based method not only eliminates the dependence on protein structures, but also has potential applications in protein predictions with rare structural information.
@article{tu2022clustered,
  title = {Clustered tree regression to learn protein energy change with mutated amino acid},
  author = {Tu, Hongwei and Han, Yanqiang and Wang, Zhilong and Li, Jinjin},
  journal = {Briefings in Bioinformatics},
  volume = {23},
  number = {6},
  pages = {bbac374},
  year = {2022},
  publisher = {Oxford University Press},
  doi = {10.1093/bib/bbac374},
  dimensions = {true}
}
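The abstract above reports performance as a Pearson correlation coefficient and a root-mean-square error. As a minimal sketch of how these two metrics are computed, the snippet below defines both in plain Python. The function names and the toy values are assumptions for illustration; they are not data or code from the paper.

```python
import math

def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)

def rmse(predicted, observed):
    """Root-mean-square error, in the same units as the inputs."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

# Toy energy changes in kcal/mol (illustrative, not from the paper):
# every prediction is offset from the observation by a constant 0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

r = pearson_r(predicted, observed)
error = rmse(predicted, observed)
```

With a constant offset, the correlation is perfect while the error is nonzero, which is why the two metrics are usually reported together: correlation measures how well the predictions track the trend, and RMSE measures how far they are from the observed values.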