Identification of B-cell lymphoma subsets by plasma protein profiling using recombinant antibody microarrays

Leukemia Research, 6, 38, pages 682 - 690


B-cell lymphoma (BCL) heterogeneity represents a key issue, often making the classification and clinical management of these patients challenging. In this pilot study, we outlined the first resolved view of BCL disease heterogeneity on the protein level by deciphering disease-associated plasma biomarkers, specific for chronic lymphocytic leukemia, diffuse large B-cell lymphoma, follicular lymphoma, and mantle cell lymphoma, using recombinant antibody microarrays targeting mainly immunoregulatory proteins. The results showed the BCLs to be heterogeneous, and revealed potential novel subgroups of each BCL. In the case of diffuse large B-cell lymphoma, we also indicated a link between the novel subgroups and survival.

Abbreviations: BCL - B-cell lymphoma, CLL - chronic lymphocytic leukemia, DLBCL - diffuse large B-cell lymphoma, FC - fold change, FL - follicular lymphoma, IGHV - immunoglobulin heavy-chain variable, IHC - immunohistochemistry, MCL - mantle cell lymphoma, N - population controls, NHL - non-Hodgkin lymphoma, scFv - single-chain fragment variable.

Keywords: B-cell lymphoma, Plasma protein profiling, Biomarker, Disease heterogeneity, MCL, FL, CLL, DLBCL.

1. Introduction

Non-Hodgkin lymphoma (NHL) is the most common malignant hematological disorder, and B-cell lymphomas (BCLs) make up 85% of cases [1]. This heterogeneous disease group ranges from indolently growing tumors, e.g. follicular lymphoma (FL) and chronic lymphocytic leukemia (CLL), to aggressive malignancies, e.g. diffuse large B-cell lymphoma (DLBCL) and mantle cell lymphoma (MCL) [2]. However, each of these entities is also heterogeneous with regard to clinical presentation and outcome, which often makes clinical management of these patients difficult [2]. Recent technological advances, mainly in gene expression profiling, have shed some light on this heterogeneity, revealing multiple subsets that correlate with diverse outcomes and that may allow more efficient, personalized treatments in the future [3], [4], [5], [6], and [7]. In the case of DLBCL, two or three subgroups stemming from cellular origin have been suggested, which differ in disease severity: germinal center B-cell like, activated B-cell like, and type 3 [3] and [4]. CLL can in turn be subdivided into two broad subgroups by the immunoglobulin heavy-chain variable (IGHV) mutational status, while FL can be classified into grades according to the proportion of centroblasts present, and can sometimes transform into the more aggressive DLBCL [5], [6], and [7]. Although potentially powerful as prognostic markers, genetic markers and gene expression profiles suffer from the drawback that they cannot readily be introduced into today's routine clinical practice due to technical issues [8]. Classification of lymphomas has traditionally been performed on tumor tissue, by microscopic studies of cell morphology along with immunophenotyping by immunohistochemistry (IHC). Common markers are cell surface membrane proteins, such as CD5, CD19, CD20, CD10, CD45, and CD3, along with the presence, or absence, of intracellular proteins, such as BCL-2, BCL-6, Cyclin D1, and SOX-11 [9], [10], [11], and [12]. 
The IHC approach has the advantage of revealing both subcellular localization and distribution of proteins; however, throughput is a key bottleneck [13]. Hence, additional means of deciphering heterogeneity among BCLs in a high-throughput manner, preferably targeting a non-invasive sample format such as plasma, are needed.

In this pilot study, we attempted to decipher a first resolved view of BCL disease heterogeneity on the protein level by identifying BCL-associated plasma biomarker signatures, specific for CLL, FL, DLBCL, and MCL, using our in-house designed recombinant antibody microarrays. The array set-up was based on 159 antibodies targeting 66 unique proteins, mainly immunoregulatory analytes [14] and [15], anticipated to reflect the molecular pathogenesis of BCLs. The results showed the BCLs to be highly heterogeneous, and revealed potential novel subgroups of each BCL studied based on plasma protein signatures. Furthermore, in the case of the aggressive DLBCLs, we also indicated a possible link between the newly discovered subgroups and survival.

2. Materials and methods

2.1. Clinical samples

In total, de-identified plasma samples from 218 subjects were collected from the SCALE (Scandinavian Lymphoma Etiology) study [16]. Briefly, this population-based case–control study encompassed residents 18–74 years old, living in Denmark from June 1, 2000, to August 30, 2002, and in Sweden from October 1, 1999, to April 15, 2002, and samples were collected from 157 hospital clinics in the two countries. Control subjects were randomly sampled from updated population registers and frequency-matched on sex and age (in 10-year intervals) to the expected distribution of NHL case patients in each country. For the present analysis, a sample of patients diagnosed with CLL (n = 40), DLBCL (n = 58), FL (n = 40), and MCL (n = 40) who had not yet initiated treatment for lymphoma, together with population controls (n = 40), was selected (Table 1). The patient subsets and controls were randomly selected within matched strata by sex, age group, and Ann Arbor stage (patients only). Patients with DLBCL were additionally selected in two equally sized groups based on prognosis: one group with lymphoma-specific death occurring within 18 months (short survival group) and one with patients surviving at least 4 years (long survival group). Even though the DLBCL patients were not originally matched to the other patient subgroups and controls, age, sex, and stage distributions were similar, although female patients and early stages were better represented among the DLBCLs.

Table 1 Demographic data of the patients included in the study.

| Parameter | DLBCL total a | DLBCL short a | DLBCL long a | CLL | MCL | FL | N |
|---|---|---|---|---|---|---|---|
| No. | 54 | 28 | 26 | 30 | 39 | 38 | 40 |
| Gender (male:female) | 31:23 | 16:12 | 15:11 | 20:10 | 28:11 | 28:10 | 27:13 |
| Age at diagnosis | 62 (46–74) | 62 (47–73) | 61 (46–74) | 64 (46–73) | 63 (46–73) | 62 (45–74) | 63 (46–74) |
| 3-year overall survival (%) | 48% b | 0% | 100% | 83% | 51% | 68% | |
| Mutational status (no. mutated:unmutated) | | | | 17:8 c | | | |
| Ann Arbor stage (1:2:3:4) | 6:13:13:19 d | 3:6:7:10 d | 3:7:6:9 d | | 2:3:8:26 | 5:1:9:22 e | |

a Data on GC/non-GC not at hand.

b Samples chosen to match number survived vs. deceased.

c Data missing for 5 CLL samples.

d Data missing for 3 DLBCL samples, 2 DLBCL short and 1 DLBCL long.

e Data missing for 1 FL sample.

Information regarding misdiagnosis of 9 CLL samples was received after laboratory work was completed; hence, these samples were analyzed on antibody microarrays, but were excluded from the data analysis. In addition, 1 CLL sample, 1 MCL sample, 2 FL samples and 4 DLBCL samples, were excluded from the data analysis due to high background and low signal-to-noise ratios. Hence, 201 samples were included in the data analysis. This change did not impair the degree of matching (data not shown). All samples were aliquoted and stored at −80 °C.

2.2. Labeling of plasma samples

The plasma-EDTA samples were labeled with biotin at a biotin:protein molar ratio of 15:1 using EZ-Link Sulfo-NHS-LC-Biotin (Pierce, Rockford, IL), according to a previously described protocol [17], with one modification: EDTA dipotassium salt dihydrate (Sigma–Aldrich, St. Louis, USA) was added to the labeling buffer to a concentration of 4 mM in order to avoid clotting. The samples were aliquoted and stored at −20 °C prior to use.

2.3. Production and purification of single-chain fragment variable (scFv)

One hundred and fifty-nine human recombinant scFv antibody fragments, directed against 66 different proteins mainly involved in immunoregulation and expected to reflect the pathogenesis of B-cell lymphomas, were selected from a large phage display library [18] (Table 2). The library has been genetically constructed around a single, constant scaffold, VH3-23 – VλL1-47, known to display excellent structural and functional properties [19]. The specificity, affinity (normally in the nM range), and on-chip functionality of these phage display derived scFv antibodies were ensured by using (i) stringent phage-display selection and screening protocols [18], (ii) multiple clones (1–4) per target, and (iii) a molecular design adapted for microarray applications [20]. In addition, the specificity of several of the antibodies has previously been validated using well-characterized, standardized serum samples (with known levels of the targeted analytes) and orthogonal methods, such as mass spectrometry (affinity pull-down experiments), ELISA, MesoScaleDiscovery (MSD) assay, and cytometric bead assay, as well as spiking and blocking experiments [21], [22], [23], [24], [25], [26], [27], [28], and [29]. Notably, the reactivity of some antibodies might be lost, since the biotin label used to enable detection of the sample could block binding to the antibodies (epitope masking). We have bypassed this problem, as in this study, by frequently including more than one antibody against the same protein, directed against different epitopes [20].

Table 2 Summary of proteins analyzed by the microarray.

| Antigen (no. of clones) | Antigen (no. of clones) |
|---|---|
| Angiomotin (2) | IL-7 (2) |
| Apo-A1 (3) | IL-8 (3) * |
| β-Galactosidase (1) | IL-9 (3) |
| Bruton tyrosine kinase BTK (1) | IL-10 (3) * |
| C1 esterase inhibitor (4) | IL-11 (3) |
| C1q (1) * | IL-12 (4) * |
| C1s (1) | IL-13 (4) * |
| C3 (6) * | IL-16 (3) |
| C4 (4) * | IL-18 (3) |
| C5 (3) * | Integrin α-10 (1) |
| CD40 (4) | Integrin α-11 (1) |
| CD40 ligand (1) | IFN-γ (3) |
| Cholera toxin subunit B (control) (1) | LDL (2) |
| Cystatin C (4) | Leptin (1) |
| Digoxin (control) (1) | Lewis x (2) |
| Eotaxin (3) | Lewis y (1) |
| Factor B (4) * | MCP-1 (9) * |
| GLP-1 (1) | MCP-3 (3) |
| GLP-1 R (1) | MCP-4 (3) |
| GM-CSF (3) | Mucin-1 (1) |
| HLA-DR (1) | Procathepsin W (1) |
| ICAM (1) | Properdin (1) * |
| IgM (5) | PSA (1) |
| IL-1α (3) * | RANTES (3) |
| IL-1β (3) | Sialyl Lewis x (1) |
| IL-1ra (3) | TGF-β1 (3) |
| IL-2 (3) | TM peptide (1) |
| IL-3 (3) | TNF-α (3) |
| IL-4 (4) * | TNF-β (4) * |
| IL-5 (3) * | Tyrosine protein kinase JAK3 (1) |
| IL-6 (4) * | VEGF (3) * |

* The specificity of all antibodies, selected from phage display libraries, was ensured using stringent selection and screening protocols. In addition, extra control of the specificity was performed for all antibodies marked with a *, targeting pure analytes and/or well-characterized serum samples using antibody microarray analysis and/or orthogonal methods, such as mass spectrometry (affinity pull-down experiments), ELISA, protein array, and MSD, as well as blocking/spiking experiments [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], and [29].

All scFv antibodies were produced in 100 ml Escherichia coli cultures and purified from expression supernatants using affinity chromatography on Ni2+-NTA agarose (Qiagen, Hilden, Germany). ScFvs were eluted using 250 mM imidazole, extensively dialyzed against PBS (pH 7.4), and stored at 4 °C until use. The protein concentration was determined by measuring the absorbance at 280 nm (average 610 μg/ml, range 40–2100 μg/ml). The degree of purity and integrity of the scFv antibodies was evaluated by 10% SDS-PAGE (Invitrogen, Carlsbad, CA, USA).

2.4. Fabrication and processing of antibody microarrays

For production of the antibody microarrays, we used a setup previously optimized and validated [17], [21], and [23]. Briefly, the scFv microarrays were fabricated using a noncontact printer (sciFLEXARRAYER S11, Scienion AG, Berlin, Germany). The antibodies were spotted with one drop at each position (300 pl) onto black polymer MaxiSorb microarray slides (NUNC A/S, Roskilde, Denmark), resulting in an average amount of 1.4 fmol scFv per spot (range 0.5–4.2 fmol). Eight replicates of each scFv clone were arrayed to ensure adequate statistics.

In total, 208 antibodies and controls were printed per slide, orientated in 4 × 4 subarrays with 13 × 8 (replicates) spots per subarray. For handling of the arrays, we used a recently optimized protocol [17]. Briefly, the slides were manually blocked in 5% (w/v) fat-free milk powder (Semper AB, Sundbyberg, Sweden) in PBS, and then placed in a Protein Array Workstation (Perkin Elmer Life and Analytical Sciences) for automated handling. The slides were washed with 0.5% (v/v) Tween-20 in PBS. The biotinylated plasma sample was diluted 1:2 (resulting in a total sample dilution of 1:90) in 1% (w/v) fat-free milk powder and 1% (v/v) Tween-20 in PBS (PBS-MT) prior to incubation on the array. The arrays were visualized with 1 μg/ml Alexa-647 conjugated streptavidin diluted in PBS-MT. Finally, the arrays were dried under a stream of nitrogen gas and scanned with a confocal microarray scanner (ScanArray Express, Perkin Elmer Life and Analytical Sciences) at 5–10 μm resolution, using three different scanner settings. The ScanArray Express software V4.0 (Perkin Elmer Life and Analytical Sciences) was used to quantify the intensity of each spot, using the fixed circle method. The local background was subtracted, and the two highest and the two lowest of the eight replicates were automatically excluded to compensate for possible local defects; thus, each data point represents the mean value of the remaining four replicates.
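The replicate handling described above can be illustrated with a minimal sketch. The function name `replicate_mean` and the intensity values are hypothetical and chosen only for illustration; the actual quantification was done in the ScanArray Express software.

```python
def replicate_mean(intensities, n_trim=2):
    """Mean of the replicates left after dropping the n_trim highest and n_trim lowest."""
    if len(intensities) <= 2 * n_trim:
        raise ValueError("need more replicates than are trimmed")
    kept = sorted(intensities)[n_trim:len(intensities) - n_trim]
    return sum(kept) / len(kept)


# Eight hypothetical background-subtracted replicate intensities for one clone.
# The values 5 and 9000 mimic local defects (a missing spot, a dust speck).
replicates = [1020, 980, 5, 1000, 9000, 1010, 990, 1005]
signal = replicate_mean(replicates)
print(signal)
```

With eight replicates and two trimmed from each end, the reported value is the mean of the four central intensities, so a single defective spot does not shift the data point.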

2.5. Data normalization

Only non-saturated spots were used for analysis of the data. Chip-to-chip normalization of the data sets was performed using a semiglobal normalization approach [28] and [30], conceptually similar to the normalization developed for DNA microarrays. Thus, the coefficient of variation was first calculated for each analyte and ranked. The 15% of the analytes that displayed the lowest coefficient of variation values over all samples were identified, corresponding to 24 analytes, and used to calculate a chip-to-chip normalization factor. The normalization factor N_i was calculated by the formula N_i = S_i/μ, where S_i is the sum of the signal intensities for the 24 analytes for sample i and μ is the sum of the signal intensities for the 24 analytes averaged over all samples. Each data set generated from one sample was divided by the normalization factor N_i. For the intensities, log2 values were used in the analysis.
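A minimal sketch of this normalization is given below, under stated assumptions: the function name `semiglobal_normalize`, the toy intensities, and the rounding rule used to turn the 15% fraction into a count are illustrative and not taken from the original analysis scripts.

```python
import math
import statistics


def semiglobal_normalize(data, fraction=0.15):
    """data maps sample name -> list of intensities (same analyte order for all samples).

    Returns (log2 of the normalized data, normalization factors, reference analyte indices).
    """
    samples = list(data)
    n_analytes = len(data[samples[0]])
    # Coefficient of variation of each analyte over all samples.
    cvs = []
    for j in range(n_analytes):
        values = [data[s][j] for s in samples]
        cvs.append(statistics.stdev(values) / statistics.mean(values))
    # Reference set: the fraction of analytes with the lowest CV.
    # Rounding to the nearest integer is an assumption (0.15 * 159 gives 24).
    n_ref = max(1, round(fraction * n_analytes))
    reference = sorted(range(n_analytes), key=lambda j: cvs[j])[:n_ref]
    # S_i: summed reference signal per sample; mu: S_i averaged over samples.
    sums = {s: sum(data[s][j] for j in reference) for s in samples}
    mu = statistics.mean(list(sums.values()))
    factors = {s: sums[s] / mu for s in samples}  # N_i = S_i / mu
    log2_data = {s: [math.log2(v / factors[s]) for v in data[s]] for s in samples}
    return log2_data, factors, reference


# Hypothetical data: analytes 0 and 1 differ between chips only by a chip-wide
# scale factor, while analytes 2 and 3 also vary between samples.
chips = {
    "chip1": [100.0, 200.0, 50.0, 400.0],
    "chip2": [200.0, 400.0, 400.0, 100.0],
    "chip3": [50.0, 100.0, 10.0, 800.0],
}
log2_data, factors, reference = semiglobal_normalize(chips, fraction=0.5)
```

In this toy case the two stable analytes are selected as the reference set, and after division by N_i their values coincide across the three chips, which is the intended effect of the chip-to-chip correction.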

2.6. Data analysis

The 201 samples were divided into five groups based on clinical diagnosis. In order to classify the samples, we used a support vector machine (SVM), a supervised learning method, in R [31]. The supervised classification was performed using a linear kernel, and the cost of constraints was set to 1, which is the default value in the R function SVM, and no attempt was made to tune it. This absence of parameter tuning was chosen to avoid overfitting. The SVM was trained using a leave-one-out cross-validation procedure. Briefly, the training sets were generated in an iterative process in which the samples were excluded one by one. The SVM was then asked to blindly classify each left-out sample as belonging to either group, and to assign an SVM decision value, which is the signed distance to the hyperplane. No filtering of the data was done before training the SVM, i.e. all antibodies used on the microarray were included in the analysis. Further, a receiver operating characteristic (ROC) curve was constructed using the SVM decision values, and the area under the curve (AUC) was calculated. AUC values were interpreted as 0.5–0.6 = poor; 0.6–0.7 = fair; 0.7–0.8 = intermediate; 0.8–0.9 = good; 0.9–1.0 = excellent. Significantly up- or down-regulated plasma proteins (p < 0.005) were identified using the Wilcoxon test. In order to visualize the heterogeneity of a sample cohort, an unsupervised hierarchical clustering method was applied. Briefly, data from all the samples within each patient group were mean centered before being hierarchically clustered and visualized as heat maps using Cluster and TreeView [32]. Samples were also visualized using a principal component analysis (PCA) software program (Qlucore Omics Explorer, Lund, Sweden). In order to further evaluate the cluster data, a cluster validity algorithm was designed and applied, resulting in a measure of the number of subgroups each patient dataset is composed of. 
To this end, the Davies–Bouldin index (DBI) was calculated, defined as the ratio between the within-cluster scatter and the between-cluster separation [33] . Hence, the lower the value of the index, the better the separation between the clusters. Each patient group was divided into different possible numbers of clusters according to its dendrogram, and the DBI was compared for each number. The number of clusters resulting in the lowest DBI value was interpreted as the most representative number of clusters in each patient dataset.
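The selection of the number of clusters by the lowest DBI can be sketched as follows. This uses the standard Davies–Bouldin definition with Euclidean distances and centroid-based scatter, which is an assumption about details not stated here; the points and the candidate partitions are hypothetical stand-ins for cutting a dendrogram at different levels.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of vectors)."""
    cents = [centroid(c) for c in clusters]
    # Within-cluster scatter: mean distance of the members to their centroid.
    scatter = [sum(math.dist(p, cen) for p in c) / len(c)
               for c, cen in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case ratio of combined scatter to centroid separation.
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k


# Two hypothetical, well-separated groups of samples in two dimensions.
group_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
group_b = [(10, 10), (10, 11), (11, 10), (11, 11)]
# Candidate partitions, as if a dendrogram were cut into 2 or 3 clusters.
candidates = {
    2: [group_a, group_b],
    3: [group_a[:2], group_a[2:], group_b],
}
dbi = {k: davies_bouldin(parts) for k, parts in candidates.items()}
best_k = min(dbi, key=dbi.get)
```

Splitting a compact group into two pieces raises the index, because the pieces lie close together relative to their scatter, so the two-cluster partition is preferred in this toy example.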

In order to identify panels of antibodies with the most discriminatory power between groups, a cross-validated backward elimination strategy was applied, as described previously [34] . Briefly, the strategy involved identifying members (antibodies) recognizing orthogonal patterns in the dataset, and removing members which did not contribute to the discriminatory power, in an iterative manner, resulting in a list with a minimal number of members which discriminate the two groups most efficiently. This biomarker signature is unlike a list that includes biomarkers only on the basis of e.g. low p values.
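The general idea of backward elimination can be sketched as below. This is not the published procedure of [34]: as a loudly stated assumption, a nearest-centroid classifier scored by leave-one-out accuracy stands in for the SVM-based criterion, and the function names and data are hypothetical.

```python
import math


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given features."""
    correct = 0
    for i in range(len(X)):
        cents = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            cents[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
        point = [X[i][f] for f in features]
        pred = min(cents, key=lambda lab: math.dist(point, cents[lab]))
        correct += pred == y[i]
    return correct / len(X)


def backward_eliminate(X, y):
    """Greedily drop features as long as the cross-validated score does not get worse."""
    features = list(range(len(X[0])))
    score = loo_accuracy(X, y, features)
    while len(features) > 1:
        # Score every single-feature removal and keep the least harmful one.
        trials = [(loo_accuracy(X, y, [f for f in features if f != g]), g)
                  for g in features]
        best_score, drop = max(trials)
        if best_score < score:
            break
        features.remove(drop)
        score = best_score
    return features, score


# Hypothetical data: feature 0 separates the two groups, feature 1 is noise.
X = [[1.0, 5.0], [1.2, -4.0], [0.9, 3.0], [1.1, -6.0],
     [3.0, 4.0], [3.2, -5.0], [2.9, 6.0], [3.1, -3.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected, final_score = backward_eliminate(X, y)
```

The noisy feature is removed because the classification is at least as good without it, which mirrors the point made above: the resulting panel is defined by its contribution to discrimination, not by individual p values.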

In addition, survival analysis was performed for patients diagnosed with DLBCL. A Kaplan–Meier plot was constructed and p-values were determined using the Log Rank test. In addition, confounding effects were checked by Cox proportional-hazards regression analysis, where age, Ann Arbor stage, nationality and sex were used as covariates.
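For readers unfamiliar with the Kaplan–Meier estimator, a minimal sketch follows. The function name and the follow-up data are hypothetical, and the log-rank test and Cox regression are not covered by this example.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    Returns a list of (time, survival probability) at each time with a death.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the fraction of at-risk patients surviving this time point.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times in years: three deaths and two censored patients.
times = [0.3, 0.5, 1.2, 4.0, 4.5]
events = [1, 1, 1, 0, 0]
curve = kaplan_meier(times, events)
```

Censored patients leave the risk set without lowering the curve, which is why the estimate stays at its last value once only censored observations remain.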

2.7. Validation of array data

A human Th1/Th2 10-plex MSD (Meso Scale Discovery, Gaithersburg, MD, USA) assay was run in an attempt to validate the antibody microarray results (differentially expressed analytes), focusing on CLL and DLBCL. The entire patient cohort of DLBCL (n = 54) and two out of three newly discovered subgroups of CLL, CLLa (n = 11) and CLLb (n = 12), were profiled using MSD. DLBCL was chosen for validation because the samples had been collected to allow prognostic analysis, which made it an interesting group to validate. In addition, each sample group to be validated needed to comprise at least two subgroups, so that differential expression could be validated. Once DLBCL had been chosen, CLLa and CLLb together matched the number of remaining samples that could be analyzed with the MSD assay. In addition, three patient samples were run in duplicate to assess reproducibility.

In brief, each well of the MSD 96-well plate had been pre-functionalized with antibodies against IFN-γ, IL-1β, IL-2, IL-4, IL-5, IL-8, IL-10, IL-12p70, IL-13, and TNF-α in spatially distinct electrode spots. The assay was run according to the protocol provided by the manufacturer, with the exception that the sample was incubated overnight at 4 °C instead of 2 h at room temperature, for increased sensitivity. The electrochemiluminescence-based readout was performed in an MSD SECTOR® instrument. The limit of detection was defined as 2.5 times the standard deviation of the zero point in the standard curve.

The MSD data was then compared to the corresponding antibody microarray data, for antibodies with matching specificities which also were among the (most) differentially expressed analytes in the targeted comparisons.

3. Results

3.1. Differential protein expression profiling of B-cell lymphoma plasma proteomes

In this study, we set out to identify plasma biomarker signatures associated with B-cell lymphomas, specifically MCL, FL, CLL, and DLBCL. To this end, we performed differential protein expression profiling of 218 plasma samples (Table 1) using our recombinant antibody microarray platform, mainly targeting immunoregulatory analytes (Table 2). A total of 201 samples were included in the subsequent data analysis (see Supplementary Material and Methods). A representative image of an antibody microarray is shown in Fig. 1A. The results showed that adequate spot morphologies, dynamic signal intensities, and low non-specific background binding were obtained. First, the reproducibility of the assay was assessed in terms of coefficient of determination (R²). The intra-assay reproducibility (spot-to-spot variation) was assessed by analyzing the eight replicate spots, resulting in an R² value of 0.98 (Fig. 1B), while the inter-assay reproducibility (array-to-array variation) was assessed by analyzing the same sample on different arrays, giving an R² value of 0.97 (Fig. 1B).


Fig. 1 Differential protein expression profiling of four B-cell lymphomas, CLL, FL, MCL, and DLBCL, using recombinant antibody microarrays. (A) A representative scanned image of a recombinant antibody microarray hybridized with plasma from an MCL patient, containing in total 208 probes and controls orientated in 4 × 4 subarrays. A zoomed image of a representative subarray with 8 (replicates) × 13 spots per subarray. (B) Reproducibility in terms of coefficient of determination (R²). The intra-array reproducibility (spot-to-spot variation) was based on 159 antibodies and 8 replicates. The inter-array reproducibility (array-to-array variation) was based on 2 samples analyzed on 4 independent arrays using 159 antibodies. (C) Classification (ROC AUC values) of N vs. the combined cohort of all BCLs, N vs. each individual BCL, and each individual BCL vs. each individual BCL. (D) Classification of DLBCL short survival (<1.5 years) vs. DLBCL long survival (>4 years). Differentially expressed analytes are shown (FC = fold change, green – down-regulated maximum 0.4×, red – up-regulated maximum 1.5×). (E) Classification of DLBCL short survival (<1.5 years, <1 year, or <0.5 year) vs. DLBCL long survival (>4 years). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

Subsequently, in order to investigate whether the different BCLs could be distinguished from the normal group (N) and from each other, an SVM LOO cross-validation was run using all antibodies (unfiltered data). Comparing the combined cohort of all BCLs vs. N resulted in a fair classification, as illustrated by a ROC AUC value of 0.61 (Fig. 1C). If instead each BCL was compared separately with N, ROC AUC values in the same range were obtained, 0.55–0.68 (Fig. 1C). Finally, comparing the different BCLs with each other also resulted in ROC AUC values in the poor to fair range, 0.5–0.63 (Fig. 1C).
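As a rough illustration of how decision values from a cross-validation translate into the ROC AUC values and categories quoted here, the sketch below computes the AUC with the rank-based (Mann–Whitney) formulation and labels it using the scale from Section 2.6. The function names and decision values are hypothetical; treating each lower bound as inclusive, and labeling anything below 0.6 as poor, are assumptions, since the published ranges share their end points.

```python
def roc_auc(decision_values, labels):
    """AUC as the probability that a positive sample outscores a negative one.

    labels: 1 for the positive group, 0 for the negative group; ties count as 0.5.
    """
    pos = [d for d, lab in zip(decision_values, labels) if lab == 1]
    neg = [d for d, lab in zip(decision_values, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("both groups must contain at least one sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def interpret_auc(auc):
    """Map an AUC to the verbal scale used in the text (lower bounds inclusive)."""
    for upper, label in [(0.6, "poor"), (0.7, "fair"), (0.8, "intermediate"), (0.9, "good")]:
        if auc < upper:
            return label
    return "excellent"


# Hypothetical decision values for three positive and three negative samples.
decisions = [0.9, 0.2, 0.8, 0.35, 0.1, 0.3]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(decisions, labels)
category = interpret_auc(auc)
```

Here seven of the nine positive-negative pairs are ordered correctly, giving an AUC of 7/9, which falls in the intermediate band.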

Next, protein expression profiling was performed on the DLBCL samples, grouped according to survival, in an attempt to identify a molecular pattern indicating prognosis. To this end, an SVM LOO cross-validation was run using all antibodies (unfiltered data) to compare the groups; short-term survivors (deceased <1.5 years after diagnosis) vs. long-term survivors (alive >4 years after diagnosis) ( Table 1 ). As seen in Fig. 1 D, 11 analytes were found to be significantly up- or down-regulated (p < 0.01) between the two survival groups, and the classification was found to be fair (ROC AUC of 0.61). Comparing the long-term survivors with refined groups of reduced survival time, <1 year (ROC AUC 0.69) and <0.5 year (ROC AUC 0.68), slightly improved the classification ( Fig. 1 E).

In conclusion, the data showed that poor to fair classification of the different BCLs, reflecting either diagnosis or survival (only DLBCL), was obtained by plasma protein profiling. The limited power of the classification might be explained by (i) suboptimal range of candidate biomarkers targeted, (ii) too heterogeneous groups to enable discrimination, and/or (iii) that sample handling (collection) was not standardized.

3.2. Investigation of disease heterogeneity – identification of novel subgroups

In order to investigate the degree of heterogeneity of each BCL on the protein level, the recombinant antibody microarray data was analyzed by unsupervised hierarchical clustering and visualized by heat maps using Cluster and TreeView (Fig. 2). The dendrograms showed large sample heterogeneity within each disease, as each BCL sample set clustered into several potential subgroups. To obtain a quantitative (unbiased) measure of how many subgroups each sample set should be interpreted as, a cluster validity algorithm involving the Davies–Bouldin index (DBI) was devised and applied. Based on the DBI analysis, the data showed that CLL was more appropriately viewed as 3 subgroups (denoted CLLa, CLLb and CLLc), FL as 2 (denoted FLa and FLb), MCL as 3 (denoted MCLa, MCLb and MCLc), and DLBCL as 2 (denoted DLBCLa and DLBCLb) (Fig. 2). To rule out the potential influence of confounding factors, the novel subgroups were cross-checked with technical parameters (batch of labeling and day of run) as well as sample data (collection site, age, and gender), and no correlations to these parameters could be found (data not shown). Hence, the results indicated that each BCL displayed heterogeneous protein expression profiles, enabling novel subgroups to be identified. This disease heterogeneity might also explain the impaired classification of the original BCL groups observed above (cf. Fig. 1 and Fig. 2).


Fig. 2 Disease heterogeneity among CLL, FL, MCL, and DLBCL, visualized by unsupervised hierarchical clustering using all antibodies, i.e. unfiltered data, and corresponding heat maps. The subgroups, denoted a (red blocks), b (blue blocks), and c (green blocks), were defined using a cluster validity algorithm based on the Davies–Bouldin index. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

3.3. Validation of antibody microarray data

To validate the antibody microarray results, an orthogonal method, a 10-plex cytokine assay (MSD) (Supplementary Fig. S1A), was used to profile CLLa vs. CLLb and DLBCLa vs. DLBCLb. The MSD data were compared to the corresponding antibody microarray data, in those cases where antibodies with matching specificities were found to be among the most differentially expressed analytes (p < 2.4 × 10⁻⁵ for CLLa vs. CLLb, and p < 3.2 × 10⁻⁴ for DLBCLa vs. DLBCLb). These included IL-5 (3), IL-8 (1) and IL-8 (3) for CLLa vs. CLLb (Supplementary Fig. S1B), and IL-5 (3), IL-8 (2) and IL-4 (4) for DLBCLa vs. DLBCLb (Supplementary Fig. S1C). In general, the MSD data were found to display a larger spread than the microarray data (Supplementary Figs. S1B and S1C). In all but one comparison (IL-5 DLBCL), the results of the MSD assay agreed well with those of the microarray assay, indicating that the data could be validated.

Furthermore, the above microarray data was also evaluated by comparing the array binding pattern for multiple antibody clones targeting the same antigen, but different epitopes. Representative data (fold changes) are shown for differentially expressed (p < 0.005) analytes (Supplementary Fig. S1D), focusing on the same ten analytes as targeted by the MSD assay. The results showed that all but one (IL-4 clone 2) of the clones directed against the same antigen displayed a similar pattern of up-/down-regulation, further supporting the observed patterns. That a set of antibody clones did not indicate a significant change in expression levels of the targeted marker in this comparison could be explained by differences in (i) epitope specificity (epitope masking due to labeling), (ii) affinity, and/or (iii) antibody concentration.

3.4. Further characterization of the novel subgroups

To further study the novel disease subgroups, protein expression profiling using SVM LOO cross-validation was performed to compare all the subgroups within each BCL. Comparing the novel subgroups resulted in excellent classification in all cases, as illustrated by ROC AUC values in the range of 0.94–1.0 ( Fig. 3 ), and numerous differentially expressed analytes (p < 0.005) were also identified (Supplementary Fig. S2). Thus, the results further supported the notion of significant disease heterogeneity within each BCL based on the plasma protein profile.


Fig. 3 Classification of the novel subgroups using SVM LOO cross-validation. ROC AUC values are stated, along with condensed non-redundant analyte lists composed of the analytes which distinguish the subgroups most efficiently as determined using a backward elimination strategy. (A) CLLa vs. CLLb, CLLa vs. CLLc, and CLLb vs. CLLc, (B) FLa vs. FLb, (C) DLBCLa vs. DLBCLb, and (D) MCLa vs. MCLb, MCLa vs. MCLc, and MCLb vs. MCLc.

In order to define a condensed list with those markers that contributed most to the above classifications (as opposed to the list of markers based on p-values, indicating whether the markers are differentially expressed (Supplementary Fig. S2)), a cross-validation backward elimination strategy was adopted ( Fig. 3 ). The results showed that condensed biomarker lists composed of 11–23 analytes, including a variety of proteins, such as complement proteins, T-helper (TH)1 cytokines, TH2 cytokines, chemokines, enzymes, and membrane proteins were identified. Hence, we have deciphered the first short candidate plasma protein signatures, involving a range of different types of proteins, capable of resolving the disease heterogeneity and classifying novel disease subgroups of each BCL.

Next, we investigated whether the novel subgroups could be distinguished from (i) the normal group (N) and (ii) the combined cohort of all other BCLs, by running an SVM LOO cross-validation (Supplementary Fig. S3). The comparisons resulted in intermediate to excellent classification (ROC AUC of 0.79–0.95) of one subgroup of each BCL (CLLa, DLBCLa, FLa, and MCLc) vs. N as well as the combined BCL cohort, while all the other subgroups showed poor to intermediate classification (ROC AUC of 0.57–0.76) in the corresponding comparisons. In conclusion, the data again highlighted the disease heterogeneity on the plasma protein level, and further explained the impaired classification of the original BCL groups observed above (cf. Fig. 1C and Supplementary Fig. S3).

3.5. Prognosis of DLBCL according to subgroup

Finally, the newly identified subgroups were crosschecked with clinical data, such as staging (Ann Arbor), IGHV mutational status (only CLL), and survival in an attempt to explain the observed disease heterogeneity. While no correlation between clinical data and the subgroups of CLL, FL, and MCL could be detected (data not shown), a correlation between survival and the subgroups DLBCLa and DLBCLb was observed ( Fig. 4 ).


Fig. 4 Mapping of clinical data (survival) onto the DLBCL subgroups, DLBCLa and DLBCLb. (A) Subdivision of DLBCL patients according to unsupervised hierarchical clustering using unfiltered data and a cluster validity algorithm, onto which survival data was mapped. (B) Distribution of DLBCL patients according to SVM decision values based on unfiltered data, onto which survival data was mapped, along with a heat map showing the top 15 most differentially expressed analytes (green – downregulated, red – upregulated). (C) Distribution of DLBCL patients as visualized by principal component analysis (PCA) using unfiltered data, onto which survival data was mapped. (D) A Kaplan–Meier plot demonstrating overall survival of the two DLBCL subgroups, DLBCLa and DLBCLb. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

In more detail, when mapping the survival data onto the DLBCL subgroups defined by hierarchical clustering, the results showed that 6 of 7 patients with short survival (<0.5 year) clustered in DLBCLa, corresponding to 33% of all DLBCLa patients (Fig. 4A). In fact, 78% of all DLBCLa patients were deceased within 1.5 years, compared to only 39% of the DLBCLb patients. Next, the survival data was mapped onto the plotted decision values from the SVM LOO cross-validation of DLBCLa vs. DLBCLb (Fig. 4B). Although a different algorithm was used to display the subgroups, the same distribution of survival time vs. subgroup was observed. Furthermore, similar results were obtained when survival data was mapped onto the subgroup data, visualized using a third approach, principal component analysis (PCA) (Fig. 4C). Finally, a Kaplan–Meier plot was constructed correlating the patient subgroup to survival (Fig. 4D). A difference in survival between DLBCLa and DLBCLb was confirmed by the log-rank test (p = 0.001), thus further supporting a link between survival and the plasma protein profiles. The potential confounding effect(s) of other factors (age, sex, Ann Arbor status, and nationality) were tested using Cox regression analysis, indicating that the subgroups constituted an independent risk factor (DLBCLa vs. DLBCLb Hazard Ratio 3.53, p = 0.004), thus further supporting the observed correlation between survival and DLBCL subgroups.

4. Discussion

B-cell lymphomas are heterogeneous diseases [3], [4], [5], [6], and [7], reflected by tumors with different genetic abnormalities, clinical features, response to treatment, and prognosis [2] . In this pilot study, we have for the first time outlined potential disease heterogeneity among CLL, DLBCL, FL, and MCL, on the plasma protein level using recombinant antibody microarrays. The results indicated that each BCL displayed heterogeneous plasma protein expression profiles, enabling novel subgroups to be identified.

In previous studies, the heterogeneity among BCLs has predominantly been studied using gene expression profiling, revealing the existence of multiple (>2) subgroups for some of the BCLs, such as in DLBCL and CLL [3], [4], [5], and [6]. In parallel studies, the heterogeneity has also been described in terms of IGHV mutational status (CLL) [5] , different cell compositions of the tumors (FL), or different morphologies and immunophenotypes, a key factor for subtyping in all BCLs [2], [7], and [9]. The correlation between these known subgroups and our observations on the protein level could, however, only be evaluated for CLL (IGHV mutational status) due to lack of detailed subtype data for the other groups, a limitation that will be addressed in future efforts. In the case of CLL, the IGHV mutational status and the observed subgroups did not correlate, indicating that the observed differences in plasma protein profiles reflected other features.

Studies addressing the plasma proteome in order to decipher BCL heterogeneity and define BCL-associated multiplex protein panels are, to the best of our knowledge, so far scarce.

Albeit being limited to targeting 66 unique proteins in 201 plasma samples, this represents one of the largest studies so far. In one parallel study, multiplex protein expression profiles of MCL tumor tissue extracts from 7 patients were studied using conventional antibody microarrays targeting mainly high- to intermediate-abundance proteins [35]. The data showed that the patients could be divided into two distinct groups, 1 vs. 6 patients, and even the 6 patients that were grouped together displayed significant differences in protein expression, thus supporting our findings of disease heterogeneity on the protein level for MCL.

Notably, the observed BCL heterogeneity could not be explained by potential confounding factors, such as technical assay parameters and basic sample parameters. In addition, we also validated the observed binding patterns (specificities) for a small set of the antibodies using an orthogonal method, again supporting the relevance of our findings. In an attempt to explain the observed disease heterogeneity, we mapped the clinical parameters at hand, including stage (all but CLL) and survival, onto the array data. One correlation was observed, linking the novel subgroups of DLBCL (DLBCLa vs. DLBCLb) with survival, indicating that the multiplexed plasma protein profile might be associated with prognosis. In accordance, the condensed 23-biomarker panel deciphered as most important for classifying DLBCLa vs. DLBCLb, determined using the backward elimination process, was found to contain markers such as IL-10 and HLA-DR/DP (Fig. 3C), which have previously been indicated as prognostic markers, but only in a single-marker context [36] and [37].

A biomarker can be a key member of a multiplex panel for classification, but when viewed alone, it might not be significantly (p < 0.05) differentially expressed, since such panels are designed to contain markers providing as orthogonal information as possible. In the same manner, the most differentially expressed proteins might not necessarily be included in a multiplexed panel for classification since they provide redundant information; however, they could still provide valuable biological information concerning the molecular differences between the subgroups studied. Consequently, the most differentially expressed (based on p values) plasma proteins for DLBCLa vs. DLBCLb were also deciphered. The top 15 most differentially expressed markers (p ≤ 1.65 × 10⁻⁷) for DLBCLa vs. DLBCLb were found to include proteins such as complement proteins, chemokines, and cytokines (Fig. 4B). Notably, these proteins have all been implicated in the molecular pathogenesis of lymphomas [38], [39], and [40], but they have to the best of our knowledge not yet been indicated as prognostic markers for DLBCL. Although not among the top 15 differentially expressed proteins, IL-10 was identified to be de-regulated (p = 0.006), while HLA-DR/DP was not (p = 0.7), again highlighting the fact that a biomarker will provide different (biological) information depending on how and in what context it was deciphered.
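
The point that a marker can be weak on its own yet valuable within a panel can be illustrated with a small Python sketch. Everything in it is hypothetical: the two markers, their values, and the helper best_threshold_accuracy are invented for illustration and are not study data. A simple cut-off rule is used as a stand-in for the SVM classifier of the study, purely to keep the example short.

```python
def best_threshold_accuracy(values_a, values_b):
    """Best accuracy achievable by a single cut-off on one score."""
    n_total = len(values_a) + len(values_b)
    best = 0.0
    for cut in sorted(set(values_a) | set(values_b)):
        # Rule 1: call a sample "a" when its score is <= cut.
        correct = (sum(v <= cut for v in values_a)
                   + sum(v > cut for v in values_b))
        # Rule 2 is the mirror image, so its accuracy is the complement.
        best = max(best, correct / n_total, 1 - correct / n_total)
    return best


# Hypothetical (marker 1, marker 2) levels for two subgroups. Both markers
# rise together from sample to sample; only their difference separates
# the subgroups.
group_a = [(1.0, 1.5), (2.0, 2.5), (3.0, 3.5), (4.0, 4.5), (5.0, 5.5)]
group_b = [(1.2, 0.7), (2.2, 1.7), (3.2, 2.7), (4.2, 3.7), (5.2, 4.7)]

# Each marker viewed alone.
acc_marker1 = best_threshold_accuracy([m1 for m1, _ in group_a],
                                      [m1 for m1, _ in group_b])
acc_marker2 = best_threshold_accuracy([m2 for _, m2 in group_a],
                                      [m2 for _, m2 in group_b])

# Both markers combined into one score (marker 2 minus marker 1).
acc_combined = best_threshold_accuracy([m2 - m1 for m1, m2 in group_a],
                                       [m2 - m1 for m1, m2 in group_b])

print(acc_marker1, acc_marker2, acc_combined)
```

In this toy setting each marker alone classifies only slightly better than chance, whereas the combined score separates the two groups completely. This mirrors the situation in which a panel member does not reach significance in a single-marker test.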

Turning to the other BCLs, examination of the condensed multiplexed plasma protein signatures providing the most discriminatory power between the subgroups reflecting disease heterogeneity revealed a complex pattern of altered levels of, for example, TH1 (e.g. IL-2, IFN-γ, and TNF-α) and TH2 (e.g. IL-4, IL-5, IL-6, and IL-10) cytokines, chemokines (e.g. MCP-1, MCP-3, MCP-4, and eotaxin), and complement proteins (C3, C4, and C5). While many of these analytes have been implicated in the molecular pathogenesis of BCLs [38], [39], and [40], validation and interpretation of these differences in a biological (and clinical) context, with the aim of explaining the observed disease heterogeneities, will require additional efforts targeting large independent sets of BCL samples with full clinical documentation. This pilot study should therefore be viewed as a first step toward deciphering BCL heterogeneity on the plasma protein level, while also demonstrating the potential of multiplexed protein profiling techniques, such as affinity proteomics, in studying BCL at the molecular level. Protein-based biomarker panels are powerful, and might be more readily introduced into today's clinical routine practice as compared with gene-based biomarker panels [8].

Taken together, disease heterogeneity is a common problem within the field of biomarker research [41] . By targeting a selected set of a priori defined immunoregulatory analytes, we have outlined the first resolved view of BCL disease heterogeneity on the protein level. By extending the range of potential markers beyond immunoregulatory proteins in future efforts, we might be able to improve the resolution of the observed disease heterogeneity on the molecular level even further. This might help to shed further light on and explain the underlying disease biology, and thereby have direct implications for diagnosis, prognosis, as well as tailoring of therapy.

Conflict of interest

The authors declare no competing financial interests.

Role of the funding source

This research was funded by grants from VINNOVA and the Foundation of Strategic Research (Strategic Center for Translational Cancer Research – CREATE Health ( www.createhealth.lth.se )). The SCALE study sample collection phase was funded by the National Cancer Institute. SCALE Sweden was further funded by the Swedish Cancer Society (2009/659, 2012/774), the Stockholm County Council (20110209), and the Strategic Research Program in Epidemiology at Karolinska Institute.


No writing assistance was used.

Contributors: C.W., C.K.A.B., M.J., and K.E.S. designed the study. F.P. performed the experiments and F.P., M.O., and C.W. analyzed the data. All authors were involved in data interpretation. C.W. supervised the work. K.E.S., H.H., and R.R. provided clinical data. F.P. and C.W. wrote the paper. All authors participated in revising the article for important intellectual content and approved the final version of the submitted manuscript.

Appendix A. Supplementary data

The following are the supplementary data to this article:

References


  • [1] O. Thaunat, E. Morelon, T. Defrance. AmBvalent: anti-CD20 antibodies unravel the dual role of B cells in immunopathogenesis. Blood. 2010;116:515-521
  • [2] K.R. Shankland, J.O. Armitage, B.W. Hancock. Non-Hodgkin lymphoma. Lancet. 2012;380:848-857
  • [3] A.A. Alizadeh, M.B. Eisen, R.E. Davis, C. Ma, I.S. Lossos, A. Rosenwald, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000;403:503-511
  • [4] A. Rosenwald, G. Wright, W.C. Chan, J.M. Connors, E. Campo, R.I. Fisher, et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med. 2002;346:1937-1947
  • [5] F. Fais, F. Ghiotto, S. Hashimoto, B. Sellars, A. Valetto, S.L. Allen, et al. Chronic lymphocytic leukemia B cells express restricted sets of mutated and unmutated antigen receptors. J Clin Invest. 1998;102:1515-1525
  • [6] R.N. Damle, T. Wasil, F. Fais, F. Ghiotto, A. Valetto, S.L. Allen, et al. IgV gene mutation status and CD38 expression as novel prognostic indicators in chronic lymphocytic leukemia: presented in part at the 40th Annual Meeting of The American Society of Hematology, held in Miami Beach, FL, December 4–8, 1998. Blood. 1999;94:1840-1847
  • [7] K.S.J. Elenitoba-Johnson, R.D. Gascoyne, M.S. Lim, M. Chhanabai, E.S. Jaffe, M. Raffeld. Homozygous deletions at chromosome 9p21 involving p16 and p15 are associated with histologic progression in follicle center lymphoma. Blood. 1998;91:4677-4685
  • [8] Prognostic markers in diffuse large B-cell lymphoma. Leuk Lymphoma. 2010;51:1588-1589
  • [9] A. Freedman. Follicular lymphoma: 2012 update on diagnosis and management. Am J Hematol. 2012;87:988-995
  • [10] J.M. Vose. Mantle cell lymphoma: 2012 update on diagnosis, risk-stratification, and clinical management. Am J Hematol. 2012;87:604-609
  • [11] H. Tilly, U. Vitolo, J. Walewski, M.G. da Silva, O. Shpilberg, M. André, et al. Diffuse large B-cell lymphoma (DLBCL): ESMO clinical practice guidelines for diagnosis, treatment and follow-up. Ann Oncol. 2012;23:vii78-vii82
  • [12] G. Dighiero, T.J. Hamblin. Chronic lymphocytic leukaemia. Lancet. 2008;371:1017-1029
  • [13] H.A. Idikio. Immunohistochemistry in diagnostic surgical pathology: contributions of protein life-cycle, use of evidence-based methods and data normalization on interpretation of immunohistochemical stains. Int J Clin Exp Pathol. 2009;3:169-176
  • [14] C.A.K. Borrebaeck, C. Wingren. Design of high-density antibody microarrays for disease proteomics: key technological issues. J Proteomics. 2009;72:928-935
  • [15] C. Wingren, A. Sandstrom, R. Segersvard, A. Carlsson, R. Andersson, M. Lohr, et al. Identification of serum biomarker signatures associated with pancreatic cancer. Cancer Res. 2012;72:2481-2490
  • [16] M. Melbye, K.E. Smedby, T. Lehtinen, K. Rostgaard, B. Glimelius, L. Munksgaard, et al. Atopy and risk of non-Hodgkin lymphoma. J Natl Cancer Inst. 2007;99:158-166
  • [17] A. Carlsson, O. Persson, J. Ingvarsson, B. Widegren, L. Salford, C.A.K. Borrebaeck, et al. Plasma proteome profiling reveals biomarker patterns associated with prognosis and therapy selection in glioblastoma multiforme patients. Proteomics Clin Appl. 2010;4:591-602
  • [18] E. Soderlind, L. Strandberg, P. Jirholt, N. Kobayashi, V. Alexeiva, A.-M. Aberg, et al. Recombining germline-derived CDR sequences for creating diverse single-framework antibody libraries. Nat Biotechnol. 2000;18:852-856
  • [19] S. Ewert, T. Huber, A. Honegger, A. Pluckthun. Biophysical properties of human antibody variable domains. J Mol Biol. 2003;325:531-553
  • [20] C.K. Borrebaeck, C. Wingren. U. Korf (Ed.) Protein microarrays (Humana Press, 2011) 247-262
  • [21] J. Ingvarsson, A. Larsson, A.G. Sjöholm, L. Truedsson, B. Jansson, C.A.K. Borrebaeck, et al. Design of recombinant antibody microarrays for serum protein profiling: targeting of complement proteins. J Proteome Res. 2007;6:3527-3536
  • [22] M. Kristensson, K. Olsson, J. Carlson, B. Wullt, G. Sturfelt, C.A.K. Borrebaeck, et al. Design of recombinant antibody microarrays for urinary proteomics. Proteomics Clin Appl. 2012;6:291-296
  • [23] C. Wingren, J. Ingvarsson, L. Dexlin, D. Szul, C.A.K. Borrebaeck. Design of recombinant antibody microarrays for complex proteome analysis: choice of sample labeling-tag and solid support. Proteomics. 2007;7:3055-3065
  • [24] J. Persson, M. Bäckström, H. Johansson, K. Jirström, G.C. Hansson, M. Ohlin. Molecular evolution of specific human antibody against MUC1 mucin results in improved recognition of the antigen on tumor cells. Tumor Biol. 2009;30:221-231
  • [25] E. Gustavsson, S. Ek, J. Steen, M. Kristensson, C. Älgenäs, M. Uhlén, et al. Surrogate antigens as targets for proteome-wide binder selection. New Biotechnol. 2011;28:302-311
  • [26] A. Carlsson, D.M. Wuttge, J. Ingvarsson, A.A. Bengtsson, G. Sturfelt, C.A.K. Borrebaeck, et al. Serum protein profiling of systemic lupus erythematosus and systemic sclerosis using recombinant antibody microarrays. Mol Cell Proteomics. 2011;10
  • [27] L. Dexlin-Mellby, A. Sandström, M. Centlow, S. Nygren, S.R. Hansson, C.A.K. Borrebaeck, et al. Tissue proteome profiling of preeclamptic placenta using recombinant antibody microarrays. Proteomics Clin Appl. 2010;4:794-807
  • [28] J. Ingvarsson, C. Wingren, A. Carlsson, P. Ellmark, B. Wahren, G. Engström, et al. Detection of pancreatic cancer using antibody microarray-based serum protein profiling. Proteomics. 2008;8:2211-2219
  • [29] F. Pauly, L. Dexlin-Mellby, S. Ek, M. Ohlin, N. Olsson, K. Jirstrom, et al. Protein expression profiling of formalin-fixed paraffin-embedded tissue using recombinant antibody microarrays. J Proteome Res. 2013;12:5943-5953
  • [30] A. Carlsson, C. Wingren, J. Ingvarsson, P. Ellmark, B. Baldertorp, M. Fernö, et al. Serum proteome profiling of metastatic breast cancer using recombinant antibody microarrays. Eur J Cancer. 2008;44:472-480
  • [31] R. Ihaka, R.R. Gentleman. A language for data analysis and graphics. J Comput Graph Stat. 1996;5:299-314
  • [32] M.B. Eisen, P.T. Spellman, P.O. Brown, D. Botstein. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998;95:14863-14868
  • [33] D.L. Davies, D.W. Bouldin. A cluster separation measure. IEEE Trans Pattern Anal Machine Intell. 1979;PAMI-1:224-227
  • [34] A. Carlsson, C. Wingren, M. Kristensson, C. Rose, M. Fernö, H. Olsson, et al. Molecular serum portraits in patients with primary breast cancer predict the development of distant metastases. Proc Natl Acad Sci USA. 2011;108:14252-14257
  • [35] I.M. Ghobrial, D.J. McCormick, S.H. Kaufmann, A.A. Leontovich, D.A. Loegering, N.T. Dai, et al. Proteomic analysis of mantle-cell lymphoma by protein microarray. Blood. 2005;105:3722-3730
  • [36] I.S. Lossos, D. Morgensztern. Prognostic biomarkers in diffuse large B-cell lymphoma. J Clin Oncol. 2006;24:995-1007
  • [37] A.M. Perry, Z. Mitrovic, W.C. Chan. Biological prognostic markers in diffuse large B-cell lymphoma. Cancer Control. 2012;19:214-226
  • [38] M.J. Rutkowski, M.E. Sughrue, A.J. Kane, S.A. Mills, A.T. Parsa. Cancer and the complement cascade. Mol Cancer Res. 2010;8:1453-1465
  • [39] L.M. Toney, G. Cattoretti, J.A. Graf, T. Merghoub, P.-P. Pandolfi, R. Dalla-Favera, et al. BCL-6 regulates chemokine gene transcription in macrophages. Nat Immunol. 2000;:214-220
  • [40] K.L. Edlefsen, O. Martínez-Maza, M.M. Madeleine, L. Magpantay, D.K. Mirick, K.J. Kopecky, et al. Cytokines in serum in relation to future non-Hodgkin lymphoma risk: evidence for associations by histologic subtype. Int J Cancer. 2014;
  • [41] G. Wallstrom, K.S. Anderson, J. LaBaer. Biomarker discovery for heterogeneous diseases. Cancer Epidemiol Biomarkers Prev. 2013;22:747-755


a Department of Immunotechnology, Lund University, Lund, Sweden

b CREATE Health, Lund University, Lund, Sweden

c Department of Medicine Solna, Clinical Epidemiology Unit, Karolinska Institutet, Stockholm, Sweden

d Department of Oncology, Skåne University Hospital, Lund, Sweden

e Department of Epidemiology Research, Statens Serum Institute, Copenhagen, Denmark

f Computational Biology & Biological Physics, Department of Astronomy and Theoretical Physics, Lund University, Lund, Sweden

g Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden

* Corresponding author at: Department of Immunotechnology and CREATE Health, Lund University, Medicon Village, SE-22381 Lund, Sweden. Tel.: +46 46 2224323; fax: +46 46 2224200.