Clustering and Curation of Electropherograms: An Efficient Method for Analysing Large Cohorts of Glycomic Profiles in Tracking the Effects of Multivariate Parameters in Bioprocessing Operations

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/17248

Title:

Clustering and Curation of Electropherograms: An Efficient Method for Analysing Large Cohorts of Glycomic Profiles in Tracking the Effects of Multivariate Parameters in Bioprocessing Operations

Journal Title:

Beilstein Journal of Organic Chemistry

DOI:

10.3762/bjoc.16.176

Publication URL:

https://doi.org/10.3762/bjoc.16.176

Authors:

Walsh Ian, Choo Matthew S F, Chiin Sim Lyn, Mak Amelia, Tay Shi Jie, Rudd Pauline M, Yuansheng Yang, Choo Andre, Ying Swan Ho, Nguyen-Khuong Terry

Keywords:

capillary electrophoresis, clustering, data analysis, electropherogram, glycosylation, monoclonal antibodies, peak picking, process development

Publication Date:

27 August 2020

Citation:

Walsh, I.; Choo, M. S. F.; Chiin, S. L.; Mak, A.; Tay, S. J.; Rudd, P. M.; Yuansheng, Y.; Choo, A.; Swan, H. Y.; Nguyen-Khuong, T. Beilstein J. Org. Chem. 2020, 16, 2087–2099. doi:10.3762/bjoc.16.176

Abstract:

The accurate assessment of antibody glycosylation during bioprocessing requires the high-throughput generation of large amounts of glycomics data. This allows bioprocess engineers to identify critical process parameters that control the glycosylation critical quality attributes. Advances in protocols for capillary electrophoresis-laser-induced fluorescence (CE-LIF) measurements of antibody N-glycans have increased the potential for generating large datasets of N-glycosylation values for assessment. With large cohorts of CE-LIF data, peak picking and peak area calculation remain obstacles to fast and accurate quantitation, despite the presence of internal and external standards to reduce misalignment for the qualitative analysis. These problems are often due to fluctuations introduced by varying process conditions, which result in heterogeneous peak shapes. Additionally, co-eluting glycans can produce non-Gaussian peaks in some process conditions and not in others. Here, we describe an approach to quantitatively and qualitatively curate large-cohort CE-LIF glycomics data. For glycan identification, a previously reported method based on internal triple standards is used. To determine relative glycan quantities, our method uses a clustering algorithm to ‘divide and conquer’ highly heterogeneous electropherograms into similar groups, making it easier to define peaks manually. Open-source software is then used to determine the areas of the manually defined peaks. We successfully applied this semi-automated method to a dataset (containing 391 glycoprofiles) of monoclonal antibody biosimilars from a bioreactor optimization study. The key advantage of this computational approach is that all runs can be analyzed simultaneously with high accuracy in glycan identification and quantitation, and there is no theoretical limit to the scale of this method.
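
The 'divide and conquer' idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract names neither the clustering algorithm nor the open-source software, so the sketch assumes single-linkage agglomerative clustering on a correlation distance and trapezoidal integration over hand-chosen windows. All function names, the distance threshold, the synthetic Gaussian traces, and the peak windows are hypothetical and chosen only for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length traces."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def cluster_traces(traces, max_distance=0.1):
    """Single-linkage agglomerative clustering on 1 - Pearson correlation.

    Assumed algorithm (not from the source). Clusters are merged until the
    closest remaining pair is farther apart than max_distance.
    Returns a list of clusters, each a sorted list of trace indices.
    """
    def dist(i, j):
        return 1.0 - pearson(traces[i], traces[j])

    clusters = [[i] for i in range(len(traces))]
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = min(dist(i, j) for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        if best[0] > max_distance:
            break
        _, p, q = best
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]
    return clusters


def relative_areas(time, signal, windows):
    """Trapezoidal area inside each (start, end) window, as % of the summed areas."""
    areas = []
    for start, end in windows:
        area = 0.0
        for k in range(len(time) - 1):
            if time[k] >= start and time[k + 1] <= end:
                area += 0.5 * (signal[k] + signal[k + 1]) * (time[k + 1] - time[k])
        areas.append(area)
    total = sum(areas)
    return [100.0 * a / total for a in areas] if total > 0 else areas


def synthetic_trace(time, peaks, width=0.2):
    """Sum of Gaussian peaks given as (centre, height) pairs; illustrative data only."""
    return [
        sum(h * math.exp(-((t - c) ** 2) / (2 * width ** 2)) for c, h in peaks)
        for t in time
    ]


# Four synthetic electropherograms with two distinct peak layouts.
time = [i * 0.05 for i in range(200)]
traces = [
    synthetic_trace(time, [(3.0, 10), (6.0, 5)]),
    synthetic_trace(time, [(3.0, 9), (6.0, 6)]),
    synthetic_trace(time, [(4.5, 8), (7.5, 8)]),
    synthetic_trace(time, [(4.5, 7), (7.5, 9)]),
]

clusters = cluster_traces(traces)

# Peak windows are defined once per cluster (by hand), then reused for every
# trace in that cluster.
windows_per_cluster = [[(2.0, 4.0), (5.0, 7.0)], [(3.5, 5.5), (6.5, 8.5)]]
for members, windows in zip(clusters, windows_per_cluster):
    for idx in members:
        print(idx, [round(v, 1) for v in relative_areas(time, traces[idx], windows)])
```

The point of the design is that peak windows are drawn once per group of similar traces rather than once per run, so the manual effort grows with the number of clusters instead of the number of electropherograms.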

License type:

http://creativecommons.org/licenses/by/4.0/

Funding Info:

The authors thank the Agency for Science, Technology and Research (A*STAR), Singapore for supporting this study (SSF Project Grant A1818g0025).

URI:

https://oar.a-star.edu.sg/communities-collections/articles/17248

ISSN:

1860-5397

Collections:

Bioprocessing Technology Institute
