HaploX's proprietary molecular technologies cover all the major steps within the wet lab NGS workflow to allow high-yield, high-quality nucleic acid isolation and sequencing library construction with minimized sequencing errors.
01
CUBE-ctDNA
CUBE-ctDNA is HaploX's proprietary single-molecule barcoding technology for improving the signal-to-noise ratio in ultra-deep sequencing. During library preparation, it attaches unique molecular identifiers (UMIs) to both ends of each input double-stranded DNA molecule. Based on its UMIs, every sequence read produced after PCR amplification and sequencing can be traced back to the original DNA molecule. True mutations should be present in most reads tagged with the same UMIs, whereas false mutations such as PCR and sequencing errors would be present in only a fraction of the reads that share those UMIs. Moreover, because UMIs are attached to both strands of the DNA, CUBE-ctDNA can leverage the sequence complementarity of double-stranded DNA to filter out false mutations that are detected on one strand of the original molecule but not the other.
With CUBE-ctDNA, we reduced NGS background noise from an industry level of 0.1% to 0.00058% in preclinical validation studies. This allows us to conduct ultra-deep sequencing with a ctDNA limit of detection (LoD) as low as 0.005%.
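The UMI-based filtering described above can be illustrated with a minimal Python sketch. It is not HaploX's implementation: the function names, the input layout (UMI pair, strand, base at one genomic position) and the 60% agreement threshold are all assumptions made for illustration. Bases are assumed to be already expressed relative to the reference forward strand, so the two strands of one molecule can be compared directly.

```python
from collections import Counter, defaultdict


def consensus_base(bases, min_fraction=0.6):
    """Return the majority base if it reaches min_fraction of the reads, else None."""
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_fraction else None


def duplex_call(reads, min_fraction=0.6):
    """Call one base per original molecule at a single genomic position.

    reads: iterable of (umi_pair, strand, base) tuples (hypothetical layout).
    A molecule is reported only if both strands were observed and their
    per-strand consensus bases agree.
    """
    families = defaultdict(lambda: defaultdict(list))
    for umi_pair, strand, base in reads:
        families[umi_pair][strand].append(base)

    calls = {}
    for umi_pair, strands in families.items():
        if set(strands) != {"+", "-"}:
            continue  # only one strand seen: no duplex support
        plus = consensus_base(strands["+"], min_fraction)
        minus = consensus_base(strands["-"], min_fraction)
        if plus is not None and plus == minus:
            calls[umi_pair] = plus
    return calls


if __name__ == "__main__":
    reads = (
        # Molecule 1: both strands agree on T (a candidate true mutation).
        [(("AAC", "GTT"), "+", "T")] * 3 + [(("AAC", "GTT"), "-", "T")] * 2
        # Molecule 2: T on one strand only, so it is filtered out.
        + [(("CCA", "TGA"), "+", "T")] * 3 + [(("CCA", "TGA"), "-", "C")] * 2
        # Molecule 3: a single discordant read is outvoted within its family.
        + [(("GGA", "TTC"), "+", b) for b in "CCTC"] + [(("GGA", "TTC"), "-", "C")] * 2
    )
    print(duplex_call(reads))
```

Requiring agreement between strands is what distinguishes this duplex scheme from single-strand UMI consensus: an error introduced on one strand before or during amplification can dominate its own read family, but it is unlikely to appear at the same position on the complementary strand.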
02
Effentration®
DNA extraction is a critical first step to the success of NGS-based variant detection. To that end, HaploX developed Effentration® for effective cfDNA extraction. Using lysis, binding and washing solutions with proprietary formulations, Effentration® can effectively enrich low-abundance cfDNA and remove undesirable nucleic acid contaminants.
In our preclinical validation studies, Effentration® consistently achieved a greater cfDNA isolation efficiency and a higher average sequencing depth than a competing DNA extraction product from an international molecular diagnostics company.
03
HAPCap®
During library preparation, the starting DNA from biopsy samples must be PCR-amplified to yield enough DNA for sequencing. Because ctDNA usually represents a small fraction of normally occurring cfDNA, conventional PCR amplification inevitably reduces the probability of detecting ctDNA, as PCR favors amplification of the dominant cfDNA types. To address this problem, HaploX developed HAPCap® for ctDNA enrichment during DNA capture. As illustrated in the diagram, HAPCap® first uses emulsion PCR, in which input DNA molecules are amplified in physically separated micro-volume water-in-oil droplets. It then hybridizes ctDNA-specific, biotin-labeled DNA probes to ctDNA and captures the probe-bound ctDNA using streptavidin magnetic beads. With HAPCap®, PCR amplification bias can be effectively removed, so that low-abundance ctDNA is captured and selectively amplified, which enhances the sensitivity of mutation detection in ctDNA.
HAPCap® Technology
Over the years, HaploX has developed a number of bioinformatics software tools, machine learning models and data libraries. Together, these form the data processing infrastructure that allows complex sequencing data to be processed efficiently, errors to be removed, and low-frequency mutations to be accurately detected and interpreted.
Bioinformatics Software
01
Fastp
To improve data preprocessing, HaploX developed Fastp, which integrates most features of commonly used tools for quality control, adapter trimming, and per-read quality pruning and filtering to achieve all-in-one preprocessing. Fastp is written in C++ with multi-threading support, which enables it to run substantially faster. In published validation studies, Fastp was shown to be two to five times faster than other major preprocessing tools while providing similar or better data filtering quality and mutation detection specificity.
Fastp was published in Bioinformatics in September 2018. As of the Latest Practicable Date, it had more than 6,600 citations and ranked among the top five most-cited science publications authored by Chinese researchers in the last five years. Fastp is available as an open-source tool on GitHub.
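To make "per-read quality pruning and filtering" concrete, the following Python sketch shows one common way such a step can work: sliding-window tail trimming followed by a length and low-quality-base filter. It is a simplified illustration, not Fastp's C++ implementation; the function names and all thresholds are assumptions chosen for the example, and quality strings are assumed to use Phred+33 encoding.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def trim_tail(seq, quality, window=4, min_mean_q=20):
    """Cut the read at the first window whose mean quality falls below min_mean_q."""
    scores = phred_scores(quality)
    for start in range(max(len(scores) - window + 1, 0)):
        if sum(scores[start:start + window]) / window < min_mean_q:
            return seq[:start], quality[:start]
    return seq, quality


def passes_filter(seq, quality, min_length=15, low_q=15, max_low_q_fraction=0.4):
    """Keep reads that are long enough and do not contain too many low-quality bases."""
    if len(seq) < min_length:
        return False
    scores = phred_scores(quality)
    low = sum(1 for s in scores if s < low_q)
    return low / len(scores) <= max_low_q_fraction


if __name__ == "__main__":
    seq = "ACGTACGTACGTACGTACGT"
    qual = "I" * 16 + "#" * 4  # 'I' is Q40, '#' is Q2
    trimmed_seq, trimmed_qual = trim_tail(seq, qual)
    print(trimmed_seq, passes_filter(trimmed_seq, trimmed_qual))
```

Doing trimming and filtering in a single pass over each read, as sketched here, is the general idea behind all-in-one preprocessing: each read is decoded once rather than once per tool.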
02
Gencore
Gencore is an efficient bioinformatics tool developed by HaploX for removing redundant sequencing information. It can provide fast, memory-efficient PCR duplicate removal and consensus read generation for NGS data with or without UMIs, with informative statistical reports to facilitate quality control and downstream analysis.
PCR duplicates are sequence reads resulting from sequencing multiple PCR copies of the same DNA fragment. They are removed by most analysis pipelines as they may contain PCR errors contributing to false-positive mutations. However, most existing bioinformatics tools for PCR duplicate removal either cannot handle UMI-barcoded data or are slow, memory-intensive and lack statistical results reporting that informs quality control and downstream analysis.
Gencore was published in December 2019 in BMC Bioinformatics and is available as an open-source tool on GitHub.
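The idea of duplicate removal with consensus generation can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Gencore's algorithm: reads are represented as hypothetical dictionaries with mapping coordinates, an optional UMI and a sequence, all reads in a group are assumed to have equal length, and the consensus is a simple per-column majority vote.

```python
from collections import Counter, defaultdict


def collapse_duplicates(reads, use_umi=True):
    """Group reads that likely derive from one fragment and merge each group.

    reads: list of dicts with keys chrom, start, end, umi, seq (hypothetical layout).
    Returns (consensus_by_group, stats). Ties in the majority vote are resolved
    in favor of the base seen first.
    """
    groups = defaultdict(list)
    for read in reads:
        umi = read["umi"] if use_umi else None
        groups[(read["chrom"], read["start"], read["end"], umi)].append(read["seq"])

    consensus = {
        key: "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))
        for key, seqs in groups.items()
    }
    stats = {
        "input_reads": len(reads),
        "consensus_reads": len(consensus),
        "duplication_rate": 1 - len(consensus) / len(reads) if reads else 0.0,
    }
    return consensus, stats


if __name__ == "__main__":
    reads = [
        {"chrom": "chr7", "start": 100, "end": 104, "umi": "AAC", "seq": "ACGT"},
        {"chrom": "chr7", "start": 100, "end": 104, "umi": "AAC", "seq": "ACGT"},
        {"chrom": "chr7", "start": 100, "end": 104, "umi": "AAC", "seq": "ACTT"},
        {"chrom": "chr7", "start": 100, "end": 104, "umi": "GGT", "seq": "ACGT"},
    ]
    consensus, stats = collapse_duplicates(reads)
    print(consensus)
    print(stats)
```

Setting `use_umi=False` in this sketch falls back to purely positional grouping, which mirrors the "with or without UMIs" distinction: without UMIs, distinct molecules that share the same coordinates are merged into one group.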
03
MutScan
MutScan is a high-performance bioinformatics tool for the detection and visualization of target mutations. It is designed to improve detection sensitivity and offer efficient validation for target mutations.
In contrast to conventional multi-step mutation detection pipelines, MutScan can detect target mutations directly from raw FASTQ files using a string searching algorithm with high error tolerance to increase sensitivity. MutScan can then validate the detected mutations, or mutations identified by conventional pipelines, via an HTML report for each mutation. From this report, users can evaluate the confidence of a mutation using multiple metrics, such as the number of supporting reads, the quality scores of the bases at the mutation point and the rate of duplicated reads.
MutScan was published in BMC Bioinformatics in January 2018 and is available as an open-source tool on GitHub.
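A minimal Python sketch of error-tolerant string searching is given below. It is not MutScan's algorithm: it uses a plain Hamming-distance scan, and the function names, the example pattern and the one-mismatch tolerance are assumptions for illustration. The pattern is the mutant allele with its flanking sequence, and the base at the mutation position must match exactly so that wild-type reads are not counted as support.

```python
def hamming_within(a, b, max_mismatches):
    """True if equal-length strings a and b differ at no more than max_mismatches positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def supports_mutation(read, pattern, mut_index, max_mismatches=1):
    """Scan a raw read for the mutant pattern, tolerating a few sequencing errors.

    pattern: flanking sequence containing the mutant base at position mut_index.
    """
    k = len(pattern)
    for start in range(len(read) - k + 1):
        window = read[start:start + k]
        if window[mut_index] != pattern[mut_index]:
            continue  # the mutated base itself must be present
        if hamming_within(window, pattern, max_mismatches):
            return True
    return False


def count_supporting_reads(reads, pattern, mut_index, max_mismatches=1):
    """Number of reads that support the target mutation."""
    return sum(supports_mutation(r, pattern, mut_index, max_mismatches) for r in reads)


if __name__ == "__main__":
    pattern = "ACGTTGCA"  # hypothetical mutant context; wild type would be ACGTAGCA
    reads = [
        "GGACGTTGCATT",  # exact mutant match
        "GGACCTTGCATT",  # mutant with one sequencing error in the flank
        "GGACGTAGCATT",  # wild type
    ]
    print(count_supporting_reads(reads, pattern, mut_index=4))
```

Tolerating mismatches in the flank is what raises sensitivity here: a read carrying the mutation plus an unrelated sequencing error still counts as support, whereas an exact-match search would discard it.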
04
GeneFuse
GeneFuse is a fast and sensitive bioinformatics tool for the detection and visualization of target gene fusions.
GeneFuse is designed with a clinical focus. In contrast to most gene fusion tools, which rely on an alignment step in which sequencing reads are aligned to a reference genome using mapping tools, GeneFuse is designed to detect gene fusions by scanning raw sequencing data directly. This distinctive feature allows GeneFuse to have a higher sensitivity and specificity by circumventing common false positive and false negative issues that arise due to misalignment, which often happens for sequencing reads containing fusions.
GeneFuse was published in the International Journal of Biological Sciences in May 2018 and is available as an open-source tool on GitHub.
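The alignment-free idea can be illustrated with a toy Python sketch that checks whether a single read starts in one gene and ends in another. This is not GeneFuse's algorithm: it assumes exact matching, a single strand orientation, short made-up sequences and seed k-mers that occur only once per gene, and every name in it is hypothetical.

```python
def find_fusion(read, gene_a, gene_b, k=8):
    """Return (breakpoint_in_a, breakpoint_in_b) if the read joins gene_a to gene_b.

    The first k bases of the read seed a match in gene_a and the last k bases
    seed a match in gene_b. The gene_a match is extended base by base, and the
    rest of the read must then match gene_b exactly. If the junction has
    microhomology, the reported breakpoint is the rightmost valid one.
    """
    if len(read) < 2 * k:
        return None
    pos_a = gene_a.find(read[:k])
    pos_b = gene_b.find(read[-k:])
    if pos_a < 0 or pos_b < 0:
        return None

    # Extend the gene_a match to the right for as long as bases agree.
    n = 0
    while n < len(read) and pos_a + n < len(gene_a) and read[n] == gene_a[pos_a + n]:
        n += 1

    rest = read[n:]
    b_start = pos_b + k - len(rest)
    if len(rest) < k or b_start < 0:
        return None  # too little of the read is left to support gene_b
    if gene_b[b_start:b_start + len(rest)] == rest:
        return pos_a + n, b_start
    return None


if __name__ == "__main__":
    gene_a = "AAAACCCCGGGGTTTTACGT"  # made-up sequences for illustration
    gene_b = "TGCATGCAGTCAGTCAATAT"
    fusion_read = gene_a[6:14] + gene_b[5:15]
    print(find_fusion(fusion_read, gene_a, gene_b))
    print(find_fusion(gene_a[2:20], gene_a, gene_b))
```

Because the read is compared against the two candidate genes directly, the sketch never has to decide where a chimeric read "belongs" in a whole-genome reference, which is the misalignment problem described above.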
05
FineMSI
FineMSI is HaploX's novel, patented bioinformatics tool for analyzing microsatellite loci and determining MSI status.
It evaluates the earth mover's distance (EMD), a mathematical measure of dissimilarity, between the distributions of MSI-high data and MSI-low data, with the EMD value signifying the degree of MSI.
FineMSI can interrogate substantially more informative microsatellite loci than existing gold standard methods. In in-house validation studies, FineMSI demonstrated greater sensitivity and specificity than a widely used NGS-based MSI method, indicating the potential of FineMSI as an accurate method for MSI status determination.
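For one-dimensional data such as repeat-length counts at a microsatellite locus, the EMD between two normalized histograms equals the sum of absolute differences between their cumulative distributions. The Python sketch below computes it for two hypothetical repeat-length histograms. How FineMSI constructs its distributions and thresholds is not described here, so the inputs and function names are assumptions for illustration only.

```python
def normalize(hist):
    """Convert a {repeat_length: read_count} histogram into probabilities."""
    total = sum(hist.values())
    return {length: count / total for length, count in hist.items()}


def emd_1d(hist_p, hist_q):
    """Earth mover's distance between two histograms over integer repeat lengths.

    In one dimension with unit spacing, EMD is the sum over lengths of
    |CDF_p(length) - CDF_q(length)|.
    """
    p, q = normalize(hist_p), normalize(hist_q)
    lengths = range(min(min(p), min(q)), max(max(p), max(q)) + 1)
    cumulative_diff = 0.0
    distance = 0.0
    for length in lengths:
        cumulative_diff += p.get(length, 0.0) - q.get(length, 0.0)
        distance += abs(cumulative_diff)
    return distance


if __name__ == "__main__":
    stable = {10: 100}            # hypothetical locus with a single repeat length
    shifted = {8: 50, 10: 50}     # half of the reads show a 2-unit contraction
    print(emd_1d(stable, shifted))
```

In this example, half of the probability mass has to move by two repeat units, so the distance is 0.5 × 2 = 1.0. A larger shift, or a larger shifted fraction, yields a larger EMD, which is why the value can serve as a graded measure of instability.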
Machine Learning
Recognizing the power of machine learning to solve complicated image denoising and target recognition problems, HaploX has established a machine learning-based workflow, trained on our extensive real-world data, to identify and eliminate background noise from sequencing data with minimal manual intervention, thus improving our ability to reveal low-frequency cancer mutations, which translates into more sensitive and specific genetic tests.
For example, HaploX developed MrBam, a patented software tool for variant noise filtering, by training machine learning models on extensive data of background noise and false-positive mutation sites. HaploX has also applied machine learning to classification problems, as exemplified by TCRnodseek, a machine learning-based model that classifies pulmonary nodules as malignant or benign for early lung cancer detection.
01
TCRnodseek
To enable more accurate classification of benign and malignant pulmonary nodules, we developed TCRnodseek jointly with Sichuan Cancer Hospital. TCRnodseek uses a support vector machine and integrates TCR characteristics with clinical information. Based on the results in 99 individuals with indeterminate pulmonary nodules, TCRnodseek correctly classified most malignant and benign nodules, with a sensitivity of 76%, a specificity of 91%, an accuracy of 84% and an AUC of 0.8. The clinical results were published in October 2022 in Signal Transduction and Targeted Therapy.
Luo, H., Zu, R., Huang, Z. et al. Characteristics and significance of peripheral blood T-cell receptor repertoire features in patients with indeterminate lung nodules. Sig Transduct Target Ther 7, 348 (2022). https://doi.org/10.1038/s41392-022-01169-7
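For readers less familiar with the reported metrics, the Python sketch below shows how sensitivity, specificity and accuracy are derived from a binary classifier's predictions. The labels in the example are made up purely to exercise the function and are not data from the study; the function name is likewise hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute sensitivity, specificity and accuracy for binary labels.

    Labels: 1 = malignant (positive), 0 = benign (negative).
    Assumes at least one positive and one negative example in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # share of malignant nodules detected
        "specificity": tn / (tn + fp),  # share of benign nodules correctly cleared
        "accuracy": (tp + tn) / len(y_true),
    }


if __name__ == "__main__":
    # Made-up labels for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(classification_metrics(y_true, y_pred))
```

Sensitivity and specificity are reported separately because they capture different clinical costs: a missed malignant nodule lowers sensitivity, while an unnecessary follow-up on a benign nodule lowers specificity.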
Data Libraries
01
Data Libraries
Leveraging the real-world sequencing data accumulated through our long-standing molecular diagnostics services, HaploX has developed several proprietary mutation knowledge data libraries for facilitating mutation interpretation, including:
1. HapKnow with annotations for over 1.4 million tumor somatic mutations;
2. HapHeal with annotations for over 1.3 million hereditary mutations.
These data libraries are used collectively in our HapReport, our proprietary report interpretation system, to facilitate automated reporting of mutations with concise interpretations of their clinical significance.
HaploX has developed in-house an end-to-end IT infrastructure that serves as the backbone of our digital, automated and smart laboratory operations.
01
HapLab®
HaploX’s proprietary molecular technologies and in-house developed bioinformatics tools under LIUDUS® are embedded within HapLab®, a laboratory information management system (LIMS) that enables information-based, closed-loop management of our end-to-end NGS workflow.
HapLab® comprises four core modules:
1. Order management module for systematic accession and progress tracking of clinical samples to ensure sample integrity and proper chain of custody;
2. Wet lab workflow management module for automating project resourcing and organization of experimental workflow and data to reduce the need for user intervention, increase wet lab operation efficiency and maintain data integrity;
3. Bioinformatics analysis module for operationalizing complex multi-step bioinformatics workflow with automatic parallel analysis to enhance analysis efficiency;
4. Report interpretation module for integrating multidimensional information from patients to facilitate data management and data mining at a patient level (e.g., data on multiple variables for the same patient) and at a population level (e.g., associations between various patient-level variables for a particular indication).
02
HapYun®
HapYun® is our proprietary one-stop cloud platform that drives efficient and scalable data management, storage, analysis and delivery.
HapYun® is built with a rich library of modular, out-of-the-box pipelines and tools for diverse molecular diagnostics applications. This enables efficient, customizable analysis of high-throughput genomics data at scale, with automatic task execution, real-time operation logs and performance monitoring.