LIUDUS® Platform

Over the years, HaploX has developed a range of bioinformatics software tools, machine learning models and data libraries. Together they form the data processing infrastructure that allows complex sequencing data to be processed efficiently, errors to be removed, and low-frequency mutations to be accurately detected and interpreted.

Bioinformatics Software

01

Fastp

To improve data preprocessing, HaploX developed Fastp, an all-in-one preprocessing tool that integrates most features of commonly used tools for quality control, adapter trimming, and per-read quality pruning and filtering. Fastp is written in C++ with multi-threading support, which makes it substantially faster. In published validation studies, Fastp was two to five times faster than other major preprocessing tools while providing similar or better data filtering quality and mutation detection specificity.
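
One common form of per-read quality pruning is sliding-window trimming. The sketch below illustrates the general idea in Python; it is not Fastp's implementation (Fastp is written in C++), and the function names, window size and thresholds are assumptions chosen for illustration.

```python
from __future__ import annotations


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def trim_by_sliding_window(seq: str, quality: str,
                           window: int = 4,
                           min_mean_q: float = 20.0) -> tuple[str, str]:
    """Scan from the 5' end and cut the read at the first window whose
    mean quality falls below min_mean_q (hypothetical defaults)."""
    scores = phred_scores(quality)
    for start in range(max(len(scores) - window + 1, 0)):
        if sum(scores[start:start + window]) / window < min_mean_q:
            return seq[:start], quality[:start]
    return seq, quality


def passes_filter(seq: str, min_length: int = 15, max_n: int = 5) -> bool:
    """Keep a read only if it is long enough and has few ambiguous bases."""
    return len(seq) >= min_length and seq.count("N") <= max_n


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
print(trim_by_sliding_window("ACGTACGTAC", "IIIIII####"))  # ('ACGTA', 'IIIII')
```

In this sketch the read is cut at the start of the first failing window, so one high-quality base next to the low-quality tail is also dropped; real tools differ in how they handle that boundary.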


Fastp was published in Bioinformatics in September 2018. It had more than 6,600 citations as of the Latest Practicable Date and ranks among the five most cited scientific publications authored by Chinese researchers in the last five years. Fastp is available as an open-source tool on GitHub.

02

Gencore

Gencore is an efficient bioinformatics tool developed by HaploX for removing redundant sequencing information. It provides fast, memory-efficient PCR duplicate removal and consensus read generation for NGS data with or without unique molecular identifiers (UMIs), along with informative statistical reports to facilitate quality control and downstream analysis.


PCR duplicates are sequence reads resulting from sequencing multiple PCR copies of the same DNA fragment. Most analysis pipelines remove them because they may contain PCR errors that contribute to false-positive mutations. However, most existing bioinformatics tools for PCR duplicate removal either cannot handle UMI-barcoded data, or are slow and memory-intensive and lack the statistical reporting needed to inform quality control and downstream analysis.
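
The idea of duplicate removal with consensus generation can be sketched as follows. This is a simplified Python illustration, not Gencore's actual algorithm: the `Read` type, the grouping key (mapping coordinates plus UMI) and the per-position majority vote are assumptions made for the example.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record for illustration."""
    chrom: str
    start: int
    end: int
    umi: str   # empty string when the library has no UMIs
    seq: str


def consensus(seqs: list[str]) -> str:
    """Majority vote at each position; assumes equal-length sequences."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def collapse_duplicates(reads: list[Read]) -> list[Read]:
    """Group reads that share coordinates and UMI, then emit one
    consensus read per group."""
    groups: dict[tuple, list[Read]] = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.start, read.end, read.umi)].append(read)
    return [
        Read(chrom, start, end, umi, consensus([m.seq for m in members]))
        for (chrom, start, end, umi), members in groups.items()
    ]


reads = [
    Read("chr7", 100, 104, "AAT", "ACGT"),
    Read("chr7", 100, 104, "AAT", "ACGT"),
    Read("chr7", 100, 104, "AAT", "ACTT"),  # likely PCR or sequencing error
    Read("chr7", 100, 104, "GGC", "ACGT"),  # different UMI: a distinct molecule
]
for read in collapse_duplicates(reads):
    print(read.umi, read.seq)
```

Reads with the same coordinates but different UMIs are kept apart, which is why UMI-aware tools can distinguish true duplicates from independent molecules that happen to share a position.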


Gencore was published in December 2019 in BMC Bioinformatics and is available as an open-source tool on GitHub.

03

MutScan

MutScan is a high-performance bioinformatics tool for the detection and visualization of target mutations. It is designed to improve detection sensitivity and offer efficient validation for target mutations.


In contrast to conventional multi-step mutation detection pipelines, MutScan can directly detect target mutations from raw FASTQ files using a string searching algorithm with high error tolerance to increase sensitivity. MutScan can then validate the detected mutations, or mutations identified by conventional pipelines, via an HTML report for each mutation. From this report, users can evaluate the confidence of a mutation using multiple metrics, such as the number of supporting reads, the quality scores of the bases at the mutation point and the rate of duplicated reads.
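
Error-tolerant string searching on raw reads can be illustrated with a small Python sketch. It is not MutScan's algorithm: the mismatch-counting approach, the function names and the rule that the mutated base itself must match exactly are assumptions made for this example.

```python
from __future__ import annotations


def find_tolerant(read: str, pattern: str, mut_index: int,
                  max_mismatches: int = 1) -> int:
    """Return the first offset where `pattern` (a mutation-spanning
    sequence) occurs in `read` with at most `max_mismatches`
    substitutions, requiring an exact match at `mut_index`.
    Returns -1 if there is no such offset."""
    for offset in range(len(read) - len(pattern) + 1):
        window = read[offset:offset + len(pattern)]
        if window[mut_index] != pattern[mut_index]:
            continue  # the mutated base itself must be present
        mismatches = sum(1 for a, b in zip(window, pattern) if a != b)
        if mismatches <= max_mismatches:
            return offset
    return -1


def count_supporting_reads(reads: list[str], pattern: str, mut_index: int,
                           max_mismatches: int = 1) -> int:
    """Count reads that contain the mutant pattern within the tolerance."""
    return sum(
        1 for read in reads
        if find_tolerant(read, pattern, mut_index, max_mismatches) >= 0
    )


pattern = "ACGTTGCA"   # hypothetical mutant sequence; index 4 is the mutated base
reads = [
    "GGACGTTGCAGG",    # exact match
    "GGACCTTGCAGG",    # one sequencing error next to the mutation
    "GGACGTAGCAGG",    # mutated base absent
]
print(count_supporting_reads(reads, pattern, mut_index=4))  # 2
```

Tolerating a small number of mismatches lets reads with nearby sequencing errors still count as support, which is one way such a search can raise sensitivity.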


MutScan was published in BMC Bioinformatics in January 2018 and is available as an open-source tool on GitHub.

04

GeneFuse

GeneFuse is a fast and sensitive bioinformatics tool for the detection and visualization of target gene fusions.


GeneFuse is designed with a clinical focus. Most gene fusion tools rely on an alignment step in which sequencing reads are aligned to a reference genome using mapping tools. GeneFuse instead detects gene fusions by scanning raw sequencing data directly. This distinctive feature gives GeneFuse higher sensitivity and specificity by avoiding the false-positive and false-negative results that arise from misalignment, which is common for sequencing reads containing fusions.


GeneFuse was published in the International Journal of Biological Sciences in May 2018 and is available as an open-source tool on GitHub.

05

FineMSI

FineMSI is HaploX's novel, patented bioinformatics tool for analyzing microsatellite loci and determining MSI status.


It computes the earth mover's distance (EMD), a mathematical measure of dissimilarity between distributions, between the distributions of MSI-high data and MSI-low data. The EMD value indicates the degree of MSI.
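
For one-dimensional histograms on integer bins, the EMD equals the sum of absolute differences between the two cumulative distributions. The Python sketch below shows that calculation on repeat-length histograms; the input format, function names and example counts are hypothetical and do not represent FineMSI's implementation or data.

```python
from __future__ import annotations


def normalize(counts: dict[int, int]) -> dict[int, float]:
    """Convert read counts per repeat length into a probability distribution."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("histogram is empty")
    return {length: n / total for length, n in counts.items()}


def earth_movers_distance(p_counts: dict[int, int],
                          q_counts: dict[int, int]) -> float:
    """1-D EMD between two histograms keyed by integer repeat length."""
    p, q = normalize(p_counts), normalize(q_counts)
    lo = min(min(p), min(q))
    hi = max(max(p), max(q))
    emd = 0.0
    cdf_diff = 0.0
    # Stop before `hi`: both cumulative distributions reach 1 there.
    for length in range(lo, hi):
        cdf_diff += p.get(length, 0.0) - q.get(length, 0.0)
        emd += abs(cdf_diff)
    return emd


# Hypothetical repeat-length histograms at a single microsatellite locus.
stable = {10: 100}            # all reads show 10 repeat units
shifted = {8: 100}            # all reads show 8 repeat units
print(earth_movers_distance(stable, shifted))  # 2.0
```

A larger value means more probability mass has to be moved further to turn one distribution into the other, so a stronger shift in repeat lengths gives a larger EMD.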


FineMSI can interrogate substantially more informative microsatellite loci than existing gold standard methods. In in-house validation studies, FineMSI demonstrated greater sensitivity and specificity than a widely used NGS-based MSI method, indicating the potential of FineMSI as an accurate method for MSI status determination.

Machine Learning

Recognizing the power of machine learning to solve complicated image denoising and target recognition problems, HaploX has established a machine learning-based workflow, trained on our extensive real-world data, that identifies and eliminates background noise from sequencing data with minimal manual intervention. This improves our ability to reveal low-frequency cancer mutations, which translates into more sensitive and specific genetic tests.


For example, HaploX developed a patented software tool, MrBam, for variant noise filtering, by training machine learning models on extensive data of background noise and false-positive mutation sites. HaploX has also applied machine learning to classification problems, exemplified by TCRnodseek, a machine learning-based model that classifies pulmonary nodules as malignant or benign for early lung cancer detection.

01

TCRnodseek

To enable more accurate classification of benign and malignant pulmonary nodules, we developed TCRnodseek jointly with Sichuan Cancer Hospital. TCRnodseek uses a support vector machine and integrates T-cell receptor (TCR) characteristics and clinical information. In 99 individuals with indeterminate pulmonary nodules, TCRnodseek correctly classified most malignant and benign nodules, with a sensitivity of 76%, a specificity of 91%, an accuracy of 84% and an AUC of 0.8. The clinical results were published in October 2022 in Signal Transduction and Targeted Therapy.

Luo, H., Zu, R., Huang, Z. et al. Characteristics and significance of peripheral blood T-cell receptor repertoire features in patients with indeterminate lung nodules. Sig Transduct Target Ther 7, 348 (2022). https://doi.org/10.1038/s41392-022-01169-7
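
For readers less familiar with these metrics, the Python sketch below shows how sensitivity, specificity and accuracy are derived from a confusion matrix. The counts are made up purely for illustration and are not taken from the TCRnodseek study.

```python
from __future__ import annotations


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard definitions, treating 'malignant' as the positive class."""
    return {
        "sensitivity": tp / (tp + fn),             # malignant nodules correctly flagged
        "specificity": tn / (tn + fp),             # benign nodules correctly cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, NOT the study's data.
metrics = classification_metrics(tp=8, fp=1, tn=9, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```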

Data Libraries

01

Data Libraries

Leveraging the real-world sequencing data accumulated through our long-standing molecular diagnostics services, HaploX has developed several proprietary mutation knowledge data libraries to facilitate mutation interpretation, including:


1. HapKnow with annotations for over 1.4 million tumor somatic mutations;


2. HapHeal with annotations for over 1.3 million hereditary mutations.


These data libraries are used together in HapReport, our proprietary report interpretation system, to facilitate automated reporting of mutations with concise interpretations of their clinical significance.