This article provides a comprehensive analysis of the comparative performance of various PD-L1 immunohistochemistry (IHC) assays, a critical predictive biomarker for immune checkpoint inhibitor response. We explore the foundational biology of the PD-1/PD-L1 axis and its clinical significance in immuno-oncology. The review details methodological aspects of major FDA-approved and laboratory-developed tests, including clones 22C3, 28-8, SP263, and SP142, alongside their respective scoring systems such as Tumor Proportion Score (TPS) and Combined Positive Score (CPS). We address key challenges including pre-analytical variables, inter-assay variability, and interpretation discrepancies, offering evidence-based optimization strategies. Finally, we synthesize validation requirements and comparative performance data from recent studies and meta-analyses, providing a robust framework for researchers and drug development professionals to navigate the complex PD-L1 testing landscape and advance biomarker-driven immunotherapy.
The programmed cell death protein 1 (PD-1) and its ligand (PD-L1) pathway represents a critical immune checkpoint that tumors exploit to evade host immune surveillance. Under physiological conditions, this pathway maintains self-tolerance and prevents excessive immune activation during inflammatory responses [1] [2]. However, cancer cells subvert this regulatory mechanism by overexpressing PD-L1, which engages PD-1 receptors on activated T cells, transmitting inhibitory signals that suppress T cell proliferation, cytokine production, and cytotoxic function [2] [3]. This interaction effectively creates an immunosuppressive tumor microenvironment (TME), allowing tumors to escape immune destruction, a process fundamental to cancer progression and metastasis [4] [2].
The significance of the PD-1/PD-L1 axis in oncology is underscored by the clinical success of immune checkpoint inhibitors (ICIs) that block this interaction. These therapies reinstate anti-tumor immunity by preventing PD-L1-mediated T cell suppression, leading to durable responses across multiple cancer types [1] [5]. Consequently, accurate assessment of PD-L1 expression has emerged as a crucial predictive biomarker for patient selection, driving the development and validation of various immunohistochemistry (IHC) assays for PD-L1 detection [2] [6]. This guide provides a comprehensive comparison of these analytical tools, examining their performance characteristics within the framework of comparative IHC assay research.
The PD-1/PD-L1 axis suppresses T cell function through precise biochemical mechanisms. Upon PD-L1 binding to PD-1, the immunoreceptor tyrosine-based switch motif (ITSM) within the PD-1 cytoplasmic domain recruits phosphatases, primarily SHP-2 (and occasionally SHP-1) [5]. These phosphatases dephosphorylate key signaling molecules downstream of the T cell receptor (TCR), including components of the PI3K/AKT/mTOR and RAS/MEK/ERK pathways [1] [5]. This signaling inhibition results in reduced T cell proliferation, diminished cytokine production (e.g., IL-2, IFN-γ, TNF-α), impaired cytolytic activity, and ultimately promotes T cell exhaustion or apoptosis [1] [2] [3].
Cancer cells achieve persistent immune suppression through several mechanisms:
The following diagram illustrates the core signaling mechanism through which PD-1/PD-L1 engagement inhibits T cell activation:
The immunosuppressive function of PD-L1 extends beyond tumor cells themselves. Myeloid-derived suppressor cells (MDSCs), regulatory T cells (Tregs), and certain dendritic cell populations within the TME also express PD-L1, contributing to the overall immune-suppressive landscape [4]. Furthermore, metabolic competition within the TME, particularly through aerobic glycolysis leading to lactic acid accumulation, creates an acidic environment that directly inhibits T cell function and enhances PD-L1-mediated suppression [4]. This multifaceted immunosuppressive network highlights why PD-L1 has become such a critical therapeutic target and biomarker in immuno-oncology.
Multiple PD-L1 IHC assays have been developed as companion diagnostics for immune checkpoint inhibitors. The substantial analytical variability between these assays presents significant challenges for clinical implementation and comparative research. A comprehensive meta-analysis of 22 studies encompassing 376 assay comparisons revealed critical differences in diagnostic accuracy when assays are used interchangeably for purposes other than their originally intended clinical application [6].
Table 1: Diagnostic Accuracy of PD-L1 IHC Assays at TPS ≥1% Cut-off (All Tissue Models)
| Comparator Assay | Reference Assay | Sensitivity (95% CI) | Specificity (95% CI) | Intended Clinical Purpose |
|---|---|---|---|---|
| 28-8 PharmDx | 22C3 PharmDx | 0.85 (0.82-0.88) | 0.93 (0.91-0.95) | Nivolumab therapy |
| SP142 | 22C3 PharmDx | 0.76 (0.71-0.81) | 0.95 (0.93-0.97) | Atezolizumab therapy |
| SP263 | 22C3 PharmDx | 0.91 (0.88-0.93) | 0.93 (0.91-0.95) | Durvalumab therapy |
| 73-10 | 22C3 PharmDx | 0.97 (0.94-0.99) | 0.85 (0.81-0.89) | Research use |
Data adapted from meta-analysis by Røge et al. (2020) [6]
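The sensitivity and specificity figures in the table above come from comparing a comparator assay's positive/negative calls with those of the reference assay on the same cases. A minimal sketch of that calculation follows, assuming paired binary calls at a single cut-off. The function names, the example data, and the use of a Wilson score interval are illustrative assumptions; the meta-analysis may have pooled studies and derived its confidence intervals differently.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_accuracy(comparator_calls, reference_calls):
    """Sensitivity and specificity of a comparator assay against a reference.

    Both arguments are equal-length sequences of booleans, True meaning
    PD-L1 positive at the chosen cut-off (for example TPS >= 1%).
    """
    if len(comparator_calls) != len(reference_calls):
        raise ValueError("call lists must have the same length")
    pairs = list(zip(comparator_calls, reference_calls))
    tp = sum(1 for c, r in pairs if c and r)
    fn = sum(1 for c, r in pairs if not c and r)
    tn = sum(1 for c, r in pairs if not c and not r)
    fp = sum(1 for c, r in pairs if c and not r)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need reference-positive and reference-negative cases")
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical paired calls for 20 cases (not data from the meta-analysis).
reference = [True] * 10 + [False] * 10
comparator = [True] * 8 + [False] * 2 + [False] * 9 + [True] * 1
result = diagnostic_accuracy(comparator, reference)
print(result["sensitivity"], result["specificity"])
```

Treating one assay as the reference is a design choice that mirrors the table: the comparator is judged only by how well it reproduces the reference assay's calls, not by clinical outcome.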
The 22C3 assay demonstrates particularly strong clinical utility in predicting response to combination chemoimmunotherapy. A prospective study of 70 NSCLC patients revealed that PD-L1 classification by the 22C3 assay showed superior correlation with therapeutic response compared to 28-8 and SP142 assays. Patients with TPS ≥50% by 22C3 had significantly longer progression-free survival, while the other assays showed no significant differences in objective response rate or survival [7].
PD-L1 expression is evaluated using different scoring algorithms depending on the cancer type and therapeutic context:
Table 2: Clinical Cut-off Thresholds for PD-L1 IHC Assays in NSCLC
| Assay | Antibody Clone | ICI Drug | TPS ≥1% | TPS ≥50% | Scoring System |
|---|---|---|---|---|---|
| 22C3 PharmDx | 22C3 | Pembrolizumab | First-line monotherapy | First-line monotherapy | TPS |
| 28-8 PharmDx | 28-8 | Nivolumab | Second-line therapy | Not applicable | TPS |
| SP142 | SP142 | Atezolizumab | Not applicable | First-line combination | TC/IC |
| SP263 | SP263 | Durvalumab | Not applicable | First-line combination | TPS |
Data synthesized from multiple clinical guidelines [8] [2] [6]
The subjective nature of PD-L1 scoring introduces significant variability in clinical practice. A 2025 study evaluating 51 NSCLC cases scored by six pathologists demonstrated moderate interobserver agreement (Fleiss' kappa = 0.558) for TPS <1% and almost perfect agreement (Fleiss' kappa = 0.873) for TPS ≥50% [9]. Intraobserver consistency was notably higher, with Cohen's kappa ranging from 0.726 to 1.0, indicating that individual pathologists maintain consistent scoring standards over time [9]. This variability is particularly problematic for cases near critical clinical decision thresholds (TPS 1% and 50%), emphasizing the need for standardized training and quality assurance programs.
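To make the interobserver statistic concrete, the following sketch computes Fleiss' kappa for several raters after dichotomizing each rater's TPS at a cut-off. The helper names and the three-case example are hypothetical, and dichotomizing at one cut-off is an assumption about how a per-threshold kappa could be obtained; the cited study may have used a different procedure.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of cases, each a list of one label per rater."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_cases == 0 or n_raters < 2:
        raise ValueError("need at least one case and two raters")
    if any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number of ratings")
    categories = sorted({label for case in ratings for label in case})
    counts = [Counter(case) for case in ratings]
    # Observed agreement: mean proportion of agreeing rater pairs per case.
    per_case = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_case) / n_cases
    # Chance agreement from the overall share of each category.
    shares = [sum(c[cat] for c in counts) / (n_cases * n_raters) for cat in categories]
    p_expected = sum(s * s for s in shares)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1 - p_expected)


def dichotomize(tps_by_case, cutoff):
    """Turn per-rater TPS values (percent) into True/False calls at a cut-off."""
    return [[score >= cutoff for score in case] for case in tps_by_case]


# Hypothetical TPS readings (percent) from three raters on three cases.
tps_readings = [[0, 0.5, 0], [60, 55, 70], [40, 55, 45]]
kappa_50 = fleiss_kappa(dichotomize(tps_readings, cutoff=50))
print(round(kappa_50, 2))  # 0.55 for this toy data
```

The third case, where one rater crosses the 50% line and two do not, is what pulls kappa below 1; this is the near-threshold disagreement the paragraph above describes.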
Digital pathology and artificial intelligence (AI) algorithms represent promising approaches to standardize PD-L1 scoring. However, current AI systems show inconsistent performance compared to expert pathologists. In comparative studies, AI algorithms demonstrated fair agreement (Fleiss' kappa = 0.354) for uPath software and substantial agreement (Fleiss' kappa = 0.672) for Visiopharm application at the 50% TPS cutoff when measured against median pathologist scores [9]. Notably, AI systems tend to overestimate PD-L1 positivity, particularly at lower expression thresholds, which could significantly impact patient selection for immunotherapy [9].
The following diagram outlines a typical experimental workflow for comparing pathologist and AI-based PD-L1 scoring:
Robust validation of PD-L1 IHC assays requires standardized methodologies. The following experimental protocol outlines key steps for comparative performance studies:
Sample Preparation and Staining:
Digital Pathology and AI Analysis:
Statistical Analysis:
Table 3: Key Research Reagents for PD-L1 IHC Assay Development and Validation
| Reagent/Material | Specifications | Research Function | Example Products |
|---|---|---|---|
| Primary Antibodies | Clones: 22C3, 28-8, SP263, SP142 | PD-L1 epitope detection | Dako 22C3, Ventana SP263 |
| IHC Detection System | Automated platforms with optimized protocols | Signal amplification and detection | Ventana Benchmark, Dako Autostainer |
| Tissue Controls | Cell lines with known PD-L1 expression | Assay validation and quality control | Commercial control slides |
| Digital Pathology System | High-resolution slide scanners | Whole slide imaging for analysis | Ventana DP200, 3DHISTECH PANNORAMIC 1000 |
| AI Analysis Software | Deep learning algorithms | Automated TPS calculation | Visiopharm, uPath (Roche) |
| Statistical Packages | Diagnostic accuracy analysis | Data analysis and agreement metrics | R, SPSS, Stata |
The evolving landscape of PD-L1 detection emphasizes several critical areas for development. First, standardization of pre-analytical factors, tissue processing, and scoring criteria remains essential for reducing inter-laboratory variability [6]. Second, integrating multi-omics approaches with PD-L1 expression data may improve predictive accuracy for immunotherapy response [2] [10]. Finally, refinement of AI algorithms through larger training datasets and validation studies is necessary to achieve performance parity with expert pathologists, particularly for borderline cases [9].
Emerging technologies like liquid biopsy for soluble PD-L1 detection and multiplexed immunofluorescence for spatial analysis of the tumor immune microenvironment represent promising complementary approaches to traditional IHC [10]. However, IHC-based PD-L1 assessment remains the cornerstone for patient selection in immune checkpoint inhibitor therapy, underscoring the continued importance of rigorous comparative performance studies across different assay platforms.
For researchers conducting comparative studies, the evidence suggests that developing properly validated laboratory-developed tests for specific clinical purposes is preferable to substituting FDA-approved companion diagnostics with assays developed for different purposes [6]. This purpose-driven approach ensures that PD-L1 detection assays maintain sufficient diagnostic accuracy (sensitivity and specificity ≥90%) for their intended clinical application, ultimately optimizing patient selection and therapeutic outcomes in immuno-oncology.
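As a small illustration of the 90% criterion, the sketch below applies it to the point estimates reported earlier for comparators against 22C3 at the TPS ≥1% cut-off. The dictionary and function names are hypothetical, and checking point estimates only (ignoring confidence intervals and the intended clinical purpose) is a simplification, not the meta-analysis method.

```python
# Point estimates (sensitivity, specificity) versus 22C3 at TPS >= 1%,
# copied from the diagnostic accuracy table earlier in this guide.
ACCURACY_VS_22C3 = {
    "28-8 PharmDx": (0.85, 0.93),
    "SP142": (0.76, 0.95),
    "SP263": (0.91, 0.93),
    "73-10": (0.97, 0.85),
}


def meets_accuracy_threshold(sensitivity, specificity, threshold=0.90):
    """True when both sensitivity and specificity reach the threshold."""
    return sensitivity >= threshold and specificity >= threshold


passing = [
    assay
    for assay, (sens, spec) in ACCURACY_VS_22C3.items()
    if meets_accuracy_threshold(sens, spec)
]
print(passing)  # ['SP263']
```

On these point estimates only SP263 clears both thresholds, which is consistent with the caution against swapping assays without purpose-specific validation.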
Programmed Death-Ligand 1 (PD-L1) expression has emerged as the most widely adopted predictive biomarker for patient selection in immune checkpoint inhibitor (ICI) therapy. Despite its widespread clinical implementation, PD-L1 testing presents significant challenges, including assay variability, tissue heterogeneity, and imperfect predictive accuracy [11] [12]. The correlation between PD-L1 expression and treatment response varies considerably across cancer types and testing platforms, necessitating a comprehensive understanding of comparative assay performance to optimize clinical decision-making. This guide provides an objective comparison of predominant PD-L1 immunohistochemistry (IHC) assays, detailing their technical specifications, performance characteristics, and clinical utility to inform researchers and drug development professionals.
The biological rationale for PD-L1 as a biomarker stems from its fundamental role in immune checkpoint regulation. The PD-1/PD-L1 axis constitutes a critical immunosuppressive pathway that tumors exploit to evade host immune surveillance [13]. PD-L1 expressed on tumor cells or antigen-presenting cells binds to PD-1 receptors on T-cells, transmitting an inhibitory signal that suppresses T-cell activation, proliferation, and cytokine production [13]. This interaction effectively dampens antitumor immunity, allowing cancer cells to survive and proliferate.
Immune checkpoint inhibitors targeting the PD-1/PD-L1 axis disrupt this interaction, thereby reactivating the cytotoxic potential of T-cells and restoring antitumor immune responses [14]. The expression level of PD-L1 in tumor tissue theoretically correlates with the degree of pathway dependency, making it a mechanistically plausible biomarker for predicting ICI response [12]. However, the dynamic regulation of PD-L1 expression and the complexity of the tumor immune microenvironment contribute to the limitations observed with PD-L1 as a standalone biomarker [12] [15].
Diagram 1: PD-1/PD-L1 Signaling Pathway and Inhibitor Mechanism. This diagram illustrates how PD-L1 binding to PD-1 suppresses T-cell function and how immune checkpoint inhibitors block this interaction to restore antitumor immunity. The binding of PD-L1 on tumor cells to PD-1 on T-cells transduces inhibitory signals that suppress T-cell activation. ICIs block this interaction, preventing immune suppression and restoring T-cell-mediated cancer cell killing [13].
Four major PD-L1 IHC assays have been developed as companion diagnostics for specific ICIs across various cancer indications. Understanding their analytical performance and interchangeability is crucial for both clinical practice and trial design.
Table 1: Companion Diagnostic PD-L1 Assays and Their Clinical Applications
| Assay Name | Primary Associated Drug | Key Cancer Indications | Scoring System | Market Share (2025) |
|---|---|---|---|---|
| PD-L1 IHC 22C3 pharmDx (Agilent) | Pembrolizumab | NSCLC, HNSCC, Gastric, Esophageal, Cervical | TPS, CPS | 50.4% [16] |
| PD-L1 IHC 28-8 pharmDx (Agilent) | Nivolumab | NSCLC, Malignant Melanoma | TPS | ~15% (inferred) |
| VENTANA PD-L1 (SP142) Assay (Roche) | Atezolizumab | TNBC, Urothelial, NSCLC | IC, TC | ~15% (inferred) |
| VENTANA PD-L1 (SP263) Assay (Roche) | Durvalumab | Urothelial, NSCLC | TPS, CPS | ~20% (inferred) |
Abbreviations: TPS (Tumor Proportion Score), CPS (Combined Positive Score), IC (Immune Cell), TC (Tumor Cell), NSCLC (Non-Small Cell Lung Cancer), HNSCC (Head and Neck Squamous Cell Carcinoma), TNBC (Triple-Negative Breast Cancer)
A systematic comparability study evaluating four standardized PD-L1 assays in hepatocellular carcinoma demonstrated significant differences in analytical performance [17]. The 22C3, 28-8, and SP263 assays showed comparable sensitivity in detecting PD-L1 expression, while the SP142 assay was consistently the least sensitive across multiple scoring methods [17]. Inter-assay agreement measured by intraclass correlation coefficients was 0.646 for tumor proportion score and 0.780 for combined positive score, indicating moderate to good concordance [17].
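For readers who want to reproduce this kind of inter-assay statistic, here is a minimal sketch of an intraclass correlation coefficient computed from a samples-by-assays score matrix. The cited study does not state which ICC form it used; the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), is assumed here, and the function name and example scores are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores is a list of rows, one per sample, each holding one continuous
    score per assay (for example TPS in percent).
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two samples and two assays")
    if any(len(row) != k for row in scores):
        raise ValueError("every sample needs one score per assay")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: samples, assays, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical TPS values (percent) for four samples stained with four assays.
example_scores = [
    [0, 1, 0, 0],
    [5, 10, 5, 1],
    [60, 70, 65, 30],
    [90, 95, 90, 80],
]
print(round(icc_2_1(example_scores), 3))
```

The absolute-agreement form penalizes an assay that reads systematically lower than the others, which is the behavior of interest when asking whether assays are interchangeable at fixed cut-offs.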
The 22C3 assay demonstrated the strongest correlation with immune-related gene mRNA signatures, closely followed by 28-8 and SP263 assays [17]. This suggests that these three assays may provide more biologically relevant measurements of the tumor immune microenvironment compared to the SP142 assay.
Table 2: Analytical Performance Comparison of PD-L1 Assays
| Performance Metric | 22C3 | 28-8 | SP263 | SP142 |
|---|---|---|---|---|
| Sensitivity in Tumor Cells | High | High | High | Low [17] |
| Inter-Rater Reliability (TPS) | Excellent (ICC: 0.946) | Excellent (ICC: 0.946) | Excellent (ICC: 0.946) | Good (ICC: 0.946) [17] |
| Inter-Rater Reliability (CPS) | Good (ICC: 0.809) | Good (ICC: 0.809) | Good (ICC: 0.809) | Lower reliability [17] |
| Correlation with Immune Gene Signatures | Strongest | Strong | Strong | Weaker [17] |
| Sample Misclassification at CPS ≥1 | Low | Low | Low | Up to 18% [17] |
ICC: Intraclass Correlation Coefficient
The typical experimental workflow for PD-L1 assay comparison studies involves several critical steps to ensure valid and reproducible results:
Sample Selection: Consecutive sections from surgically resected tumor specimens (e.g., hepatocellular carcinoma, NSCLC) are preferred to minimize tissue heterogeneity bias [17]. Sample sizes of approximately 55 patients provide sufficient statistical power for initial comparability assessments [17].
Staining Protocol: Identical tumor samples are stained with four standardized PD-L1 assays (22C3, 28-8, SP142, SP263) using automated IHC platforms according to manufacturer specifications [17]. Consistent tissue processing and handling are maintained across all assays.
Pathologist Assessment: Multiple pathologists (typically ≥5) independently evaluate PD-L1 expression using standardized scoring systems [17]. For tumor cells, the Tumor Proportion Score (TPS) is calculated as the percentage of viable tumor cells showing partial or complete membrane staining. The Combined Positive Score (CPS) accounts for both tumor cells and immune cells [14].
Statistical Analysis: Inter-assay concordance is evaluated using intraclass correlation coefficients (ICC) for continuous scores [17]. Cohen's kappa or similar metrics assess categorical agreement at clinically relevant cutoffs (e.g., 1%, 50%). Misclassification rates are calculated relative to consensus scores.
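The categorical part of the statistical analysis step can be sketched as follows: Cohen's kappa between two sets of calls at a cut-off, and a misclassification rate relative to consensus scores. All names and data here are hypothetical, and defining misclassification as any call that differs from the consensus call at the same cut-off is an assumption about the study's definition.

```python
def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length sequences of categorical calls."""
    calls_a, calls_b = list(calls_a), list(calls_b)
    n = len(calls_a)
    if n == 0 or n != len(calls_b):
        raise ValueError("need two non-empty sequences of equal length")
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    labels = set(calls_a) | set(calls_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(
        (calls_a.count(label) / n) * (calls_b.count(label) / n) for label in labels
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


def positive_calls(scores, cutoff):
    """Dichotomize continuous scores (e.g. TPS in percent) at a cut-off."""
    return [score >= cutoff for score in scores]


def misclassification_rate(assay_scores, consensus_scores, cutoff):
    """Share of samples whose call at the cut-off differs from consensus."""
    assay = positive_calls(assay_scores, cutoff)
    consensus = positive_calls(consensus_scores, cutoff)
    if len(assay) != len(consensus) or not assay:
        raise ValueError("need two non-empty score lists of equal length")
    return sum(a != c for a, c in zip(assay, consensus)) / len(assay)


# Hypothetical TPS values (percent) for one assay and the consensus score.
assay_tps = [0.5, 2, 60, 40]
consensus_tps = [2, 3, 55, 55]
for cutoff in (1, 50):
    print(cutoff, misclassification_rate(assay_tps, consensus_tps, cutoff))
```

Reporting both metrics is useful because kappa corrects for chance agreement, while the misclassification rate maps directly onto how many patients would change treatment category.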
To establish biological relevance, PD-L1 protein expression levels are correlated with mRNA signatures of immune-related genes using platforms like NanoString [17]. This validation step helps determine which assays most accurately reflect the underlying tumor immune microenvironment.
While PD-L1 remains the most widely validated biomarker, its limitations have prompted development of multi-parameter assessment approaches. A systematic review of 2,490 NSCLC patients across 13 studies demonstrated that combining PD-L1 with tumor-infiltrating lymphocytes (TILs) provided superior predictive value compared to either biomarker alone [15]. The hazard ratio for improved overall survival was 0.42 (95% CI: 0.31-0.56) for the combination approach, significantly outperforming PD-L1 alone (HR: 0.81) or TILs alone (HR: 0.77) [15].
Multiplex immunohistochemistry/immunofluorescence (mIHC/IF) has emerged as a promising technology, demonstrating the highest sensitivity (0.76) among biomarker testing modalities in a network meta-analysis of 5,322 patients [18]. This approach allows simultaneous evaluation of multiple immune cell populations and their spatial relationships within the tumor microenvironment.
Table 3: Emerging Biomarker Technologies for Immunotherapy Response Prediction
| Technology Platform | Mechanism | Performance Characteristics | Advantages | Limitations |
|---|---|---|---|---|
| Multiplex IHC/IF [18] | Simultaneous detection of multiple immune markers | Sensitivity: 0.76, Specificity: 0.65 [18] | Spatial context preservation, comprehensive immune profiling | Technical complexity, standardization challenges |
| Exosomal PD-L1 [19] | Detection of PD-L1 on circulating extracellular vesicles | Correlates with systemic immunosuppression | Minimally invasive, dynamic monitoring | Standardization issues, clinical interpretation evolving |
| Tumor Mutational Burden [18] | Quantification of total mutations | Specificity: 0.90 in gastrointestinal tumors [18] | Pan-cancer applicability, objective measurement | Cost, cutoff variability across cancer types |
| Gene Expression Profiling [18] | Transcriptomic signatures of immune activation | Predictive for NSCLC | Comprehensive immune status assessment | Computational complexity, validation requirements |
| Combined PD-L1 + TMB [18] | Dual assessment of expression and mutation burden | Sensitivity: 0.89 [18] | Improved patient selection | Increased cost and tissue requirements |
Table 4: Essential Research Reagents for PD-L1 Biomarker Investigation
| Reagent Category | Specific Examples | Research Application | Technical Considerations |
|---|---|---|---|
| Primary Antibodies | 22C3, 28-8, SP142, SP263 clones [17] | PD-L1 detection by IHC | Clone-specific epitope recognition affects staining intensity and patterns |
| Automated IHC Platforms | Dako Autostainer, VENTANA BenchMark [16] | Standardized staining procedures | Platform-specific antigen retrieval and detection systems |
| Digital Pathology Systems | Whole slide scanners with AI algorithms [16] | Quantitative image analysis | Enable consistent scoring and reduce inter-observer variability |
| Multiplex Detection Kits | Multiplex IHC/IF panels [18] | Simultaneous detection of multiple immune markers | Require spectral unmixing and specialized analysis software |
| RNA Analysis Platforms | NanoString PanCancer Immune Panel [17] | Immune-related gene expression profiling | Correlates protein expression with transcriptomic signatures |
| Positive Control Tissues | Cell line arrays with known PD-L1 expression [17] | Assay validation and quality control | Ensure staining consistency across experimental batches |
PD-L1 remains an essential but imperfect predictive biomarker for immune checkpoint inhibitor response. The 22C3, 28-8, and SP263 assays demonstrate substantial analytical concordance, suggesting potential interchangeability in certain contexts, while the SP142 assay shows distinct performance characteristics [17]. Beyond single-parameter PD-L1 assessment, integrated approaches combining PD-L1 with TILs [15], multiplex immunofluorescence [18], or circulating biomarkers [19] show promise for improved patient stratification. As the PD-L1 biomarker testing market continues to expand (projected to grow from USD 777.2 million in 2025 to USD 1,700 million by 2035 [16]), standardization, validation, and implementation of novel technologies will be critical for advancing precision immuno-oncology. Future directions should prioritize multi-institutional validation studies and the development of clinically implementable frameworks that address both biological complexity and practical deployment challenges [11].
The advent of immune checkpoint inhibitors (ICIs) targeting the programmed cell death 1 (PD-1) and programmed death-ligand 1 (PD-L1) axis has fundamentally transformed oncology therapeutics, particularly for non-small cell lung cancer (NSCLC) and other advanced malignancies [20]. These therapies function by blocking the PD-1/PD-L1 pathway, thereby restoring the host's antitumor immunity and enabling T-cell-mediated destruction of cancer cells [21]. However, clinical benefits are not universal across all patients, creating an imperative for predictive biomarkers to identify individuals most likely to respond to treatment.
PD-L1 immunohistochemistry (IHC) has emerged as the foremost biomarker for patient selection in PD-1/PD-L1 immunotherapy [20] [22]. Its clinical implementation, however, is complicated by the development of multiple, distinct PD-L1 IHC assays, each with unique characteristics and regulatory statuses. These assays fall into two primary categories: companion diagnostics (CDx), which are essential for therapeutic decision-making as mandated by regulatory labeling, and complementary diagnostics, which provide informative data but are not strictly required for drug administration [23] [22]. This distinction carries significant implications for clinical practice, clinical trial design, and laboratory operations.
This guide provides an objective comparison of the performance characteristics of approved PD-L1 assays, detailing their analytical protocols, clinical validation data, and appropriate applications within the framework of precision immuno-oncology.
Within the context of PD-1/PD-L1 inhibitors, the regulatory classification of a diagnostic assay directly reflects its role in clinical decision-making:
Table 1: FDA-Approved PD-L1 Immunohistochemistry Assays and Their Status
| Assay (Clone) | Platform | Primary Associated Drug(s) | Diagnostic Status | Example Indication(s) |
|---|---|---|---|---|
| PD-L1 IHC 22C3 pharmDx | Dako | Pembrolizumab | Companion Diagnostic | NSCLC (TPS ≥1%), HNSCC (CPS ≥1) |
| PD-L1 IHC 28-8 pharmDx | Dako | Nivolumab | Complementary Diagnostic | NSCLC, Melanoma |
| VENTANA PD-L1 (SP142) | Ventana | Atezolizumab | Complementary Diagnostic | NSCLC (TC ≥50% or IC ≥10%), TNBC |
| VENTANA PD-L1 (SP263) | Ventana | Durvalumab | Complementary Diagnostic | Urothelial Carcinoma |
The clinical application of PD-L1 assays is governed by specific scoring systems and cutoffs established in pivotal clinical trials. These scoring methods are not uniform, adding a layer of complexity to their interpretation.
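The two most common scores can be written out as simple ratios. The sketch below assumes cell counts are already available from a pathologist or an image-analysis tool; the function names are hypothetical, and the default minimum of 100 viable tumor cells is a commonly cited adequacy requirement that should be checked against the specific assay's interpretation manual.

```python
def tumor_proportion_score(positive_tumor_cells, viable_tumor_cells,
                           min_viable_cells=100):
    """TPS: percentage of viable tumor cells with membrane PD-L1 staining."""
    if viable_tumor_cells < min_viable_cells:
        # Assumed adequacy rule; confirm against the assay's own criteria.
        raise ValueError("too few viable tumor cells to score")
    if not 0 <= positive_tumor_cells <= viable_tumor_cells:
        raise ValueError("positive tumor cells must lie between 0 and the total")
    return 100.0 * positive_tumor_cells / viable_tumor_cells


def combined_positive_score(positive_tumor_cells, positive_immune_cells,
                            viable_tumor_cells, min_viable_cells=100):
    """CPS: PD-L1 positive tumor and immune cells per 100 viable tumor cells.

    The denominator is viable tumor cells only, so the raw ratio can exceed
    100; the reported score is capped at 100.
    """
    if viable_tumor_cells < min_viable_cells:
        raise ValueError("too few viable tumor cells to score")
    if positive_tumor_cells < 0 or positive_immune_cells < 0:
        raise ValueError("cell counts cannot be negative")
    raw = 100.0 * (positive_tumor_cells + positive_immune_cells) / viable_tumor_cells
    return min(raw, 100.0)


# Hypothetical counts from one slide.
print(tumor_proportion_score(30, 200))        # 15.0
print(combined_positive_score(30, 20, 200))   # 25.0
```

Because CPS adds stained immune cells to the numerator but not the denominator, the same slide can fall below a TPS cut-off yet exceed a CPS cut-off, which is one reason the two scores are not interchangeable.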
The landscape of approvals is dynamic. For instance, the FDA has expanded the approval of drugs like GSK's Jemperli (dostarlimab) based on new clinical data, while occasionally rescinding previous approvals, such as the withdrawal of atezolizumab's accelerated approval for triple-negative breast cancer (TNBC) [21] [24].
A critical question in pathology is whether the various FDA-approved PD-L1 assays are analytically equivalent and thus potentially interchangeable. Multiple studies have investigated the concordance between these assays, with findings indicating that performance is highly dependent on the clinical context and the specific clones being compared.
A landmark meta-analysis of diagnostic accuracy published in Modern Pathology evaluated interchangeability based on sensitivity and specificity for specific clinical purposes. The analysis, which included 376 assay comparisons from 22 studies, concluded that replacing an FDA-approved CDx with another assay developed for a different purpose is not advisable without proper validation. For a laboratory to substitute an approved assay, it is preferable to develop and validate a Laboratory Developed Test (LDT) for the same specific clinical purpose [6].
Recent evidence from a 2025 study in Scientific Reports focusing on clear cell renal cell carcinoma (ccRCC) underscores the challenges in concordance. This study evaluated four FDA-approved assays (22C3, 28-8, SP142, SP263) and found significant disparities, particularly with the SP142 assay, which showed remarkably lower PD-L1 positivity in immune cells (2.1%) compared to the others (~15%) [25]. The 28-8 assay demonstrated the highest pairwise concordance with other assays, while the SP142 assay was deemed an outlier.
Table 2: Comparative Diagnostic Performance from Key Studies
| Assay (Clone) | Reported Sensitivity (Range) | Reported Specificity (Range) | Key Concordance Findings | Notable Limitations |
|---|---|---|---|---|
| 22C3 | Varies by study and cutoff | Varies by study and cutoff | High concordance with 28-8 and SP263 in NSCLC; κ: 0.52 with 28-8 (IC) in ccRCC [25] [6] | Gold standard for pembrolizumab, limiting cross-assay use |
| 28-8 | Varies by study and cutoff | Varies by study and cutoff | Shows highest agreement with other assays in ccRCC; moderate concordance with 22C3 & SP263 [25] [6] | Lower positivity in some tumor types (e.g., ccRCC) |
| SP142 | Varies by study and cutoff | Varies by study and cutoff | Consistently lower positivity rates; poor concordance with other assays (κ: 0.16 with 28-8 in ccRCC) [25] | Unique scoring algorithm focusing on IC; high inter-assay variability |
| SP263 | Varies by study and cutoff | Varies by study and cutoff | Good concordance with 22C3 and 28-8 in NSCLC and ccRCC [6] | Can show higher positivity than SP142 |
| mIHC/IF | 0.76 (95% CI: 0.57-0.89) [26] | Lower than MSI | Superior sensitivity in network meta-analysis; high predictive efficacy for NSCLC [26] | Complex methodology, not yet standardized for clinical use |
| MSI | Lower than mIHC/IF | 0.90 (95% CI: 0.85-0.94) [26] | Highest specificity and Diagnostic Odds Ratio (DOR: 6.79); highly efficacious in GI tumors [26] | Limited to tumors with MMR deficiency |
Beyond analytical concordance, the ultimate value of a biomarker lies in its ability to predict patient response to therapy. A network meta-analysis (NMA) published in Frontiers in Immunology in 2023 compared the predictive value of various biomarkers for anti-PD-1/PD-L1 monotherapy across 49 studies [26].
This analysis revealed that multiplex immunohistochemistry/immunofluorescence (mIHC/IF) exhibited the highest sensitivity (0.76) for predicting response, suggesting it is highly effective at identifying patients who will benefit from treatment. In contrast, microsatellite instability (MSI) testing showed the highest specificity (0.90) and a high diagnostic odds ratio (6.79), making it excellent at ruling in response, particularly in gastrointestinal tumors [26].
The analysis also highlighted that combined biomarker approaches, such as PD-L1 IHC combined with tumor mutational burden (TMB), could significantly improve predictive sensitivity (0.89), underscoring the potential of multi-analyte strategies to outperform single-analyte tests like PD-L1 IHC alone [26].
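The diagnostic odds ratio mentioned above combines sensitivity and specificity into a single number. A minimal sketch of the relationship follows, with hypothetical function names. The input pair (0.76, 0.65) is the mIHC/IF sensitivity and specificity quoted elsewhere in this guide, used only as an illustration; a pooled DOR from a network meta-analysis is estimated across studies and need not equal the value computed from pooled point estimates.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity/specificity."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly in (0, 1)")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR = LR+ / LR-: odds of a positive test in responders vs non-responders."""
    lr_positive, lr_negative = likelihood_ratios(sensitivity, specificity)
    return lr_positive / lr_negative


# Illustrative inputs only; see the caveat in the paragraph above.
dor = diagnostic_odds_ratio(0.76, 0.65)
print(round(dor, 2))  # 5.88
```

A DOR of 1 means the test does not discriminate at all; higher values indicate better discrimination, with high specificity contributing mainly through a large positive likelihood ratio.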
This protocol is based on the methodology used in the 2025 ccRCC concordance study [25].
Objective: To evaluate the diagnostic concordance of four FDA-approved PD-L1 IHC assays (22C3, 28-8, SP142, SP263) in clear cell renal cell carcinoma.
Materials and Reagents:
Methodology:
This protocol is derived from the 2022 study validating the E1L3N antibody against the 22C3 companion diagnostic [27].
Objective: To assess the concordance and predictive value of the E1L3N antibody compared to the FDA-approved 22C3 assay in predicting pembrolizumab response in NSCLC.
Materials and Reagents:
Methodology:
The field of PD-L1 detection is rapidly evolving beyond traditional IHC to address its limitations, such as tumor heterogeneity and the dynamic nature of PD-L1 expression.
Table 3: Key Reagents and Materials for PD-L1 Assay Research
| Item / Reagent | Function / Application | Example Specification / Notes |
|---|---|---|
| FFPE Tissue Sections | Substrate for IHC staining; preserves tissue morphology and antigenicity. | 4-5 μm thickness; use of TMAs allows high-throughput, standardized analysis [25] [27]. |
| FDA-Approved Antibody Clones (22C3, 28-8, SP142, SP263) | Primary antibodies for specific detection of PD-L1 in IHC. | Each clone is validated for a specific automated platform (Dako or Ventana); protocol deviations are not permitted [25] [6]. |
| Alternative Antibodies (e.g., E1L3N) | Research-use-only (RUO) antibodies for assay development and validation. | Must be rigorously benchmarked against a clinical-grade assay for concordance and predictive value [27]. |
| Automated IHC Staining Systems | Ensure standardized, reproducible staining conditions. | Dako Autostainer Link 48 (for 22C3/28-8); Ventana Benchmark series (for SP142/SP263); Leica BOND-MAX [25] [27]. |
| mIHC/IF Staining Kits | Enable simultaneous detection of multiple markers on a single tissue section. | Require specialized imaging systems (e.g., multispectral scanners) and advanced bioinformatics for data analysis [26]. |
| RNA Sequencing Kits | Profiling gene expression and identifying structural variants of PD-L1. | Used to explore mechanisms behind discordant IHC results, such as PD-L1 3'-UTR disruption [25]. |
The following diagram illustrates the logical workflow for selecting and interpreting PD-L1 assays in clinical research, integrating key decision points regarding assay type, scoring systems, and biomarker combinations.
The landscape of predictive biomarkers for immune checkpoint inhibitor (ICI) therapy has expanded significantly beyond PD-L1 immunohistochemistry (IHC). This comparison guide objectively evaluates the performance characteristics, clinical utility, and technical considerations of PD-L1 IHC against other established biomarkers: tumor mutational burden (TMB), microsatellite instability (MSI), and mismatch repair deficiency (dMMR). We synthesize experimental data from recent network meta-analyses and clinical studies, providing structured comparisons of sensitivity, specificity, and predictive value across solid tumors. The analysis reveals that while each biomarker has distinct strengths, multiplex IHC/immunofluorescence (mIHC/IF) demonstrates superior predictive performance, and combined biomarker approaches significantly enhance prediction accuracy for anti-PD-1/PD-L1 therapy response.
The development of immune checkpoint inhibitors has revolutionized cancer treatment, creating an urgent need for reliable predictive biomarkers to identify patients most likely to respond. PD-L1 expression detected via immunohistochemistry was the first FDA-approved companion diagnostic for PD-1/PD-L1 checkpoint inhibitors but has limitations regarding variable predictive value across tumor types and methodological standardization issues [26] [28]. This has driven the exploration and validation of additional biomarkers, including tumor mutational burden (TMB), microsatellite instability (MSI), and mismatch repair deficiency (dMMR).
These biomarkers reflect different aspects of tumor immunobiology. PD-L1 IHC measures protein expression of an immune checkpoint ligand in the tumor microenvironment. TMB quantifies the total number of mutations in the tumor genome, theorized to increase neoantigen production and immunogenicity. MSI and dMMR represent functional deficits in DNA repair machinery that lead to hypermutation. Understanding their comparative performance characteristics, technical requirements, and clinical applications is essential for optimizing treatment selection and advancing personalized cancer immunotherapy [26] [29].
This review compares these biomarkers within the broader context of PD-L1 IHC assay performance, presenting structured experimental data and methodological protocols to guide researchers and drug development professionals.
Programmed Death-Ligand 1 (PD-L1) is an immune checkpoint protein expressed on tumor cells and various immune cells. Its interaction with PD-1 receptors on T cells inhibits T-cell activation, enabling tumor immune evasion. PD-L1 IHC detects the presence of this protein in tumor tissue, with various scoring systems including Tumor Proportion Score (TPS) and Combined Positive Score (CPS) [28]. The biological rationale suggests tumors expressing PD-L1 may be more dependent on this pathway for immune escape and thus more susceptible to PD-1/PD-L1 blockade.
TMB measures the total number of somatic mutations per megabase of interrogated genomic sequence. High TMB is hypothesized to increase neoantigen formation, enhancing tumor immunogenicity and T-cell recognition. When checkpoint inhibition is applied, these highly mutated tumors may generate more robust anti-tumor immune responses [29]. TMB is typically assessed using next-generation sequencing (NGS) panels or whole-exome sequencing.
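The TMB definition above reduces to a simple ratio. The following is a minimal sketch, assuming the mutation count and the number of interrogated bases are already known; the function names and the example panel are hypothetical, and the 10 mut/Mb default is the high-TMB threshold used in the prevalence data later in this review.

```python
def tumor_mutational_burden(somatic_mutations: int, covered_bases: int) -> float:
    """Return TMB as somatic mutations per megabase of interrogated sequence."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    if somatic_mutations < 0:
        raise ValueError("somatic_mutations cannot be negative")
    return somatic_mutations / (covered_bases / 1_000_000)


def is_tmb_high(tmb: float, cutoff: float = 10.0) -> bool:
    """Flag TMB-high at or above the cutoff (10 mut/Mb by default)."""
    return tmb >= cutoff


# Hypothetical example: 15 somatic mutations over a 1.2 Mb panel.
tmb = tumor_mutational_burden(15, 1_200_000)
print(f"TMB = {tmb:.1f} mut/Mb, TMB-high: {is_tmb_high(tmb)}")
```

Real pipelines also filter germline variants and known drivers before counting, which this sketch does not attempt.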
dMMR results from functional defects in DNA mismatch repair proteins (MLH1, MSH2, MSH6, PMS2), typically detected by IHC showing loss of protein expression. MSI represents the phenotypic consequence of dMMR: widespread insertion/deletion mutations at microsatellite regions throughout the genome, detected by PCR or NGS [30] [31]. This hypermutated state generates abundant frameshift-derived neoantigens, creating highly immunogenic tumors particularly responsive to immune checkpoint blockade [30].
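The two calls described above can be sketched as simple rules. This is an illustration only: the function names are hypothetical, and the PCR rule (MSI-H when at least two of five markers are unstable) is a commonly used convention assumed here rather than one stated in the text.

```python
MMR_PROTEINS = ("MLH1", "MSH2", "MSH6", "PMS2")
PCR_PANEL_SIZE = 5  # assumption: a five-marker PCR panel


def mmr_status(retained: dict) -> str:
    """Call dMMR when IHC shows loss of expression of any MMR protein."""
    missing = [p for p in MMR_PROTEINS if p not in retained]
    if missing:
        raise ValueError(f"missing IHC result for: {', '.join(missing)}")
    return "pMMR" if all(retained[p] for p in MMR_PROTEINS) else "dMMR"


def msi_status(unstable_markers: int) -> str:
    """Assumed convention: >=2 unstable markers MSI-H, 1 MSI-L, 0 MSS."""
    if not 0 <= unstable_markers <= PCR_PANEL_SIZE:
        raise ValueError("unstable_markers must be between 0 and 5")
    if unstable_markers >= 2:
        return "MSI-H"
    return "MSI-L" if unstable_markers == 1 else "MSS"


def is_discordant(mmr: str, msi: str) -> bool:
    """True when exactly one of the two tests indicates repair deficiency."""
    return (mmr == "dMMR") != (msi == "MSI-H")


# Hypothetical case: MLH1/PMS2 loss on IHC but only one unstable PCR marker.
ihc = {"MLH1": False, "MSH2": True, "MSH6": True, "PMS2": False}
print(mmr_status(ihc), msi_status(1), is_discordant(mmr_status(ihc), msi_status(1)))
```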
The following diagram illustrates the biological relationships between these biomarkers and their connection to immunotherapy response:
Diagram 1: Biological relationships between predictive biomarkers and their detection methods. This map illustrates how dMMR drives MSI and high TMB, leading to neoantigen generation and T-cell infiltration, which subsequently induces PD-L1 expression as an immune evasion mechanism. The dotted lines indicate detection methodologies for each biomarker.
A recent network meta-analysis (NMA) comparing different predictive biomarker testing assays for PD-1/PD-L1 checkpoint inhibitors provides comprehensive performance data across 49 studies covering 5,322 patients [26] [18]. The analysis evaluated seven biomarker approaches: PD-L1 IHC, TMB, gene expression profiling (GEP), MSI, multiplex IHC/immunofluorescence (mIHC/IF), other IHC/hematoxylin-eosin staining, and combined assays.
Table 1: Diagnostic accuracy of predictive biomarkers for anti-PD-1/PD-L1 therapy response
| Biomarker | Sensitivity (95% CI) | Specificity (95% CI) | Diagnostic Odds Ratio (95% CI) | Superiority Index |
|---|---|---|---|---|
| mIHC/IF | 0.76 (0.57-0.89) | 0.67 (0.47-0.82) | 5.09 (1.35-13.90) | 2.86 |
| MSI | 0.44 (0.30-0.60) | 0.90 (0.85-0.94) | 6.79 (3.48-11.91) | 2.59 |
| PD-L1 IHC | 0.54 (0.45-0.62) | 0.76 (0.68-0.83) | 3.83 (2.56-5.56) | 1.98 |
| TMB | 0.45 (0.35-0.56) | 0.77 (0.68-0.84) | 2.71 (1.69-4.17) | 1.56 |
| GEP | 0.63 (0.46-0.77) | 0.65 (0.47-0.79) | 3.21 (1.26-7.14) | 1.82 |
| Combined PD-L1 IHC + TMB | 0.89 (0.82-0.94) | 0.53 (0.42-0.64) | 7.94 (4.20-14.49) | 3.52 |
Among single-biomarker assays, mIHC/IF exhibited the highest sensitivity (0.76), while MSI showed the highest specificity (0.90) and diagnostic odds ratio (6.79). Combining PD-L1 IHC with TMB markedly improved sensitivity (0.89) compared to either biomarker alone and yielded the highest diagnostic odds ratio overall (7.94), at the cost of lower specificity (0.53) [26].
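For readers who want to relate the columns of Table 1, the sketch below computes likelihood ratios and a diagnostic odds ratio (DOR) from a single sensitivity/specificity pair. It uses the standard textbook formulas. The pooled DORs in the table come from the meta-analytic model, so values recomputed this way from pooled point estimates are only approximate and are not expected to match exactly.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Return (LR+, LR-) for a binary test."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = LR+ / LR-, the odds of a positive test in responders vs non-responders."""
    lr_pos, lr_neg = likelihood_ratios(sensitivity, specificity)
    return lr_pos / lr_neg


# Point estimates for MSI from Table 1 (sensitivity 0.44, specificity 0.90).
print(f"Approximate DOR for MSI: {diagnostic_odds_ratio(0.44, 0.90):.2f}")
```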
Biomarker performance varies significantly across cancer types, reflecting differences in tumor immunobiology and oncogenic drivers.
Table 2: Biomarker performance across different tumor types
| Tumor Type | Optimal Biomarker(s) | Key Findings | Supporting Evidence |
|---|---|---|---|
| Non-small cell lung cancer (NSCLC) | mIHC/IF, Other IHC&HE | mIHC/IF demonstrated high predictive efficacy | [26] |
| Gastrointestinal tumors | PD-L1 IHC, MSI | MSI shows high specificity (0.90) and DOR (6.79) | [26] |
| Colorectal cancer | dMMR/MSI | High prevalence of dMMR (8.7-26.8%) and MSI (8.5-21.9%) | [32] |
| Endometrial cancer | dMMR/MSI, TMB | High prevalence of dMMR (8.7-26.8%) and MSI (8.5-21.9%) | [32] |
| Esophageal squamous cell carcinoma | PD-L1, TMB | 54% PD-L1+, 57% TMB-H, but only 1% MSI-H | [33] |
| Anal squamous cell carcinoma | PD-L1 | 64.25% expressed PD-L1; PD-L1-high associated with longer treatment duration | [34] |
| Cervical, Bladder/Urothelial, Lung, Skin cancers | TMB | Low dMMR/MSI prevalence (<5%) but high TMB-H (23.7-52.6%) | [32] |
A comprehensive scoping review and meta-analysis of 3890 papers provides population-level prevalence data for these biomarkers [32]:
Table 3: Pan-cancer prevalence of predictive biomarkers
| Biomarker | Pooled Overall Prevalence | High Prevalence Cancers | Low Prevalence Cancers |
|---|---|---|---|
| dMMR | 2.9% | Endometrial (8.7-26.8%), Colorectal (8.7-26.8%), Small Bowel (8.7-26.8%), Gastric (8.7-26.8%) | Cervical, Esophageal, Bladder/Urothelial, Lung, Skin (<5%) |
| MSI | 2.7% | Endometrial (8.5-21.9%), Colorectal (8.5-21.9%), Small Bowel (8.5-21.9%), Gastric (8.5-21.9%) | Cervical, Esophageal, Bladder/Urothelial, Lung, Skin (<5%) |
| High TMB (≥10 mut/Mb) | 14.0% | Cervical (23.7-52.6%), Esophageal (23.7-52.6%), Bladder/Urothelial (23.7-52.6%), Lung (23.7-52.6%), Skin (23.7-52.6%) | Other cancer types (generally <5%) |
Standard Protocol:
NGS-Based Approaches:
Dual-Modality Approach:
The following diagram illustrates the typical testing workflow for these biomarkers:
Diagram 2: Biomarker testing workflow from tumor sample to result interpretation. This flowchart illustrates the parallel and sometimes overlapping methodologies for different biomarkers, highlighting how IHC, NGS, and PCR approaches generate complementary predictive information from tumor samples.
Despite theoretical equivalence, significant discrepancies exist between dMMR IHC and MSI PCR results. A large comparative study of 703 cases found a 19.3% overall discrepancy rate, with particularly high rates (60.9%) in dMMR versus MSI-high comparisons [31]. This discordance appears to be independent of tumor type and is not fully explained by technical factors such as tumor percentage.
Potential contributors to discordance include:
Multiple challenges complicate TMB measurement standardization:
The coefficient of variation (CV) of the TMB estimate is inversely proportional to the square root of both panel size and TMB level, so halving the CV requires a four-fold increase in panel size [29].
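The square-root relationship can be illustrated with a simple counting model. The sketch below assumes, purely for illustration, that the number of detected mutations is Poisson-distributed with mean equal to TMB times panel size; under that assumption the CV is one over the square root of the expected count. The function name and example values are hypothetical.

```python
import math


def tmb_cv(tmb_per_mb: float, panel_size_mb: float) -> float:
    """CV of a TMB estimate under an assumed Poisson mutation count."""
    expected_mutations = tmb_per_mb * panel_size_mb
    if expected_mutations <= 0:
        raise ValueError("TMB and panel size must both be positive")
    return 1.0 / math.sqrt(expected_mutations)


# Same tumor (10 mut/Mb) measured on panels of increasing size.
for size_mb in (0.5, 1.0, 2.0, 4.0):
    print(f"{size_mb:>4} Mb panel: CV = {tmb_cv(10, size_mb):.2f}")
```

Going from the 1 Mb to the 4 Mb panel halves the CV in this model, which is the four-fold rule stated in the text.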
PD-L1 testing faces several methodological challenges:
Table 4: Key research reagent solutions for predictive biomarker analysis
| Category | Specific Products/Platforms | Application | Key Features |
|---|---|---|---|
| IHC Platforms | Ventana BenchMark ULTRA | Automated PD-L1 and MMR protein staining | Standardized staining with FDA-approved protocols |
| PD-L1 Antibody Clones | SP263, 22C3, SP142, 28-8 | PD-L1 protein detection | Companion diagnostics for specific therapeutics |
| MMR IHC Antibodies | MLH1 (M1), MSH2 (G219-1129), MSH6 (SP93), PMS2 (A16-4) | dMMR detection | Ventana ready-to-use monoclonal antibodies |
| NGS Panels | FoundationOne CDx, MSK-IMPACT, TSO500 | TMB measurement, MSI detection | Comprehensive genomic profiling with validated TMB calculation |
| MSI PCR Kits | Promega MSI Analysis System Version 1.2 | MSI status determination | Five mononucleotide markers with pentanucleotide controls |
| DNA Extraction Kits | Qiagen AllPrep DNA/RNA FFPE Kit | Nucleic acid isolation from FFPE | Simultaneous DNA/RNA extraction from challenging samples |
| Analysis Software | quanTIseq, Immune Cell Abundance | Tumor microenvironment quantification | Computational deconvolution of immune cell populations |
This comprehensive comparison reveals that each predictive biomarker for ICI response has distinct strengths and limitations. PD-L1 IHC provides direct measurement of the therapeutic target but suffers from heterogeneity and dynamic regulation. TMB offers a quantitative measure of tumor immunogenicity with pan-cancer applicability but requires standardization. dMMR/MSI identifies a biologically distinct tumor subset with exceptional response rates but limited prevalence across cancers.
The emerging paradigm favors integrated biomarker approaches rather than reliance on single markers. Combined PD-L1 IHC with TMB significantly enhances sensitivity [26], while multiplex IHC/IF technologies provide spatial context that improves predictive power. Future directions should focus on standardizing measurement approaches, validating combinatorial biomarker algorithms, and developing novel methodologies that capture the complexity of tumor-immune interactions across diverse cancer types.
Programmed Death-Ligand 1 (PD-L1) expression testing via immunohistochemistry (IHC) serves as a critical predictive biomarker for immune checkpoint inhibitor (ICI) therapy in non-small cell lung cancer (NSCLC). However, substantial variability in PD-L1 testing methodologies and interpretation significantly impacts patient selection and subsequent treatment outcomes. This variability stems from multiple factors including pre-analytical conditions, choice of IHC assays, pathologist interpretation, and tumor biological characteristics. Understanding these sources of variability and their clinical consequences is essential for optimizing personalized immunotherapy approaches. This guide systematically compares the performance of different PD-L1 assessment methods, evaluates their impact on treatment decisions, and provides evidence-based recommendations for reducing variability in clinical practice.
The PD-1/PD-L1 pathway represents a critical immune checkpoint mechanism that tumors exploit to evade host immune surveillance. PD-L1, expressed on tumor cells and resident immune cells, binds to PD-1 receptors on activated T-cells, thereby inhibiting T-cell receptor signaling and suppressing cytotoxic T-cell function [35]. This interaction leads to reduced T-cell proliferation, decreased cytokine production, and diminished cytotoxic activity, ultimately facilitating tumor immune escape [35]. Immune checkpoint inhibitors targeting this pathway, including anti-PD-1 and anti-PD-L1 antibodies, block this interaction, restoring antitumor immunity and enabling T-cell-mediated tumor cell killing [36].
The following diagram illustrates the PD-1/PD-L1 signaling pathway and mechanism of immune checkpoint inhibition:
Diagram 1: PD-1/PD-L1 signaling pathway and immune checkpoint inhibition mechanism. Tumor cells express PD-L1 which binds to PD-1 on T-cells, leading to T-cell inhibition. Immune checkpoint inhibitors block this interaction, restoring T-cell function.
Multiple technical factors contribute to variability in PD-L1 testing results. Pre-analytical conditions including specimen age significantly impact PD-L1 detectability, with longer storage times associated with reduced detection rates [37]. A comprehensive meta-analysis of 92 studies demonstrated that PD-L1 detectability declines with increasing specimen age, while consistency improves when data are pooled from multiple laboratories [37]. Additionally, different IHC assays utilizing various antibody clones (e.g., 22C3, SP263, 28-8, SP142) demonstrate varying sensitivities and specificities, leading to interpretation discrepancies particularly at lower expression thresholds [38] [37].
Biological characteristics of tumors introduce significant variability in PD-L1 assessment. Tumor heterogeneity, both spatial and temporal, represents a major challenge, with studies demonstrating only approximately 30% concordance in PD-L1 expression between paired primary tumors and metastatic lymph nodes [39]. Intrapatient variation in PD-L1 expression can be substantial, with major increases (ΔTPS ≥ +50%) and decreases (ΔTPS ≤ -50%) observed in 9.7% and 8.0% of cases, respectively [40]. Furthermore, intervening ICI therapy is associated with decreased PD-L1 expression, while acquired copy number losses of CD274, PDCD1LG2, and JAK2 genes are strongly associated with major decreases in PD-L1 expression [40].
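The major-change categories above can be expressed as a small classifier. This is a sketch under the assumption that the change in TPS is measured in percentage points between two samples from the same patient; the function name and the label for changes below the threshold are hypothetical.

```python
def classify_tps_change(tps_earlier: float, tps_later: float) -> str:
    """Classify the change in TPS (percentage points) between paired samples."""
    for value in (tps_earlier, tps_later):
        if not 0 <= value <= 100:
            raise ValueError("TPS must be between 0 and 100")
    delta = tps_later - tps_earlier
    if delta >= 50:
        return "major increase"
    if delta <= -50:
        return "major decrease"
    return "no major change"  # hypothetical label; the source defines only the two major categories


# Hypothetical paired samples: (earlier TPS, later TPS).
for pair in [(5, 70), (80, 10), (40, 60)]:
    print(pair, "->", classify_tps_change(*pair))
```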
Subjective interpretation of PD-L1 expression represents a significant source of variability. Studies demonstrate moderate interobserver agreement among pathologists at the TPS <1% cutoff (Fleiss' kappa 0.558) and almost perfect agreement at TPS ≥50% (Fleiss' kappa 0.873) [9]. Intraobserver consistency is generally higher, with Cohen's kappa ranging from 0.726 to 1.0 [9]. This variability has direct clinical implications: among patients with multiple PD-L1 assessments before ICI therapy, those whose samples all showed PD-L1 ≥1% achieved improved objective response rates and progression-free survival compared with those with discordant results (at least one sample with PD-L1 <1% and another with PD-L1 ≥1%) [40].
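As a reminder of what the kappa statistics above measure, the sketch below computes Cohen's kappa for two readings of the same slides: observed agreement corrected for the agreement expected by chance. The ratings are invented for illustration and are not data from the cited study. Fleiss' kappa generalizes the same idea to more than two raters.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters assigning categorical labels to the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Invented example: 1 = TPS >= 50%, 0 = TPS < 50%, ten slides read twice.
first_read = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
second_read = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0]
print(f"kappa = {cohens_kappa(first_read, second_read):.2f}")
```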
Table 1: Key Factors Contributing to PD-L1 Testing Variability
| Factor Category | Specific Factors | Impact on Variability | Clinical Consequences |
|---|---|---|---|
| Pre-analytical | Specimen age, fixation methods, specimen type (biopsy vs. resection) | Reduced PD-L1 detectability with longer storage; concordance issues between sample types | False-negative results leading to potential exclusion from beneficial therapy |
| Analytical | IHC assay platform, antibody clone (22C3, SP263, 28-8, SP142), staining platforms | Different sensitivities and specificities; inter-assay discordance particularly at low expression levels | Inaccurate patient stratification and potential treatment assignment errors |
| Tumor Biological | Spatial heterogeneity, temporal changes, genomic alterations (CD274, JAK2) | Major expression changes (ΔTPS ≥ +50% or ≤ -50%) in ~18% of cases; heterogeneity-driven sampling errors | Discordant treatment responses; acquired resistance mechanisms |
| Interpretation | Pathologist experience, interobserver variability, scoring criteria | Moderate agreement at TPS <1% (kappa 0.558); better agreement at TPS ≥50% (kappa 0.873) | Inconsistent treatment thresholds affecting patient selection |
The emergence of artificial intelligence (AI) algorithms for PD-L1 scoring presents both opportunities and challenges for standardizing assessment. When comparing pathologists to AI algorithms, pathologists demonstrate higher consistency at critical Tumor Proportion Score (TPS) cutoffs in NSCLC [9]. In a comparative study of 51 NSCLC cases, pathologists showed moderate interobserver agreement for TPS <1% (Fleiss' kappa 0.558) and almost perfect agreement for TPS ≥50% (Fleiss' kappa 0.873) [9]. Intraobserver consistency was high, with Cohen's kappa ranging from 0.726 to 1.0 [9].
Comparisons between AI algorithms and median pathologist scores showed fair agreement for uPath software (Fleiss' kappa 0.354) and substantial agreement for the Visiopharm application (Fleiss' kappa 0.672) at the 50% TPS cutoff [9]. These results indicate that while AI tools show promise, they cannot yet fully replace expert human evaluation in critical clinical decision-making contexts and require further refinement to match pathologist reliability [9].
Quantitative continuous scoring (QCS) approaches represent an innovative alternative to traditional semi-quantitative assessment. PD-L1 QCS utilizes computer vision systems for granular cell-level quantification of PD-L1 staining intensity in digitized whole slide images [41]. This methodology captures the percentage of tumor cells with medium to strong staining intensity (PD-L1 QCS-PMSTC) and classifies patients with ≥0.575% as biomarker-positive [41].
In the MYSTIC trial, visual PD-L1 scoring (TPS â¥50%) resulted in a hazard ratio of 0.69 (CI 0.46-1.02) with a 29.7% prevalence of biomarker-positive patients for durvalumab versus chemotherapy [41]. With PD-L1 QCS-PMSTC, a similar hazard ratio of 0.62 (CI 0.46-0.82) was obtained with an increased prevalence of 54.3% [41]. This demonstrates that quantitative approaches can identify broader patient populations who may benefit from ICI therapy while maintaining similar treatment effects.
Liquid biopsy approaches utilizing circulating tumor cells (CTCs) offer an alternative to tissue-based PD-L1 assessment that captures tumor heterogeneity across multiple metastatic sites. Quantitative microscopic evaluation of PD-L1 and HLA I expression on CTCs from NSCLC patients demonstrates heterogeneity in expression patterns and shows promising clinical value in predicting progression-free survival in response to PD-L1 targeted therapies [39].
The analytical validation of exclusion-based sample preparation technology for CTC analysis demonstrates high precision and accuracy, confirming compatibility for clinical laboratory implementation [39]. This approach addresses spatial and temporal heterogeneity limitations of tissue biopsies and enables serial monitoring of PD-L1 expression dynamics during treatment.
Table 2: Comparison of PD-L1 Assessment Method Performance
| Assessment Method | Agreement/Performance Metrics | Advantages | Limitations |
|---|---|---|---|
| Pathologist Visual Scoring | Interobserver: TPS <1% (kappa 0.558), TPS ≥50% (kappa 0.873); intraobserver: kappa 0.726-1.0 [9] | Clinical standard; expert interpretation; handles complex morphology | Subjectivity; moderate agreement at low expression levels; fatigue |
| AI Algorithm (uPath) | Fair agreement with pathologists (kappa 0.354) at TPS ≥50% [9] | Quantitative; rapid processing; reduces labor intensiveness | Lower agreement than pathologists; requires manual tumor area selection |
| AI Algorithm (Visiopharm) | Substantial agreement with pathologists (kappa 0.672) at TPS ≥50% [9] | Better agreement profile; automated analysis potential | Still requires refinement for clinical decision-making |
| Quantitative Continuous Scoring (QCS) | HR 0.62 vs chemotherapy; identifies 54.3% as biomarker-positive vs 29.7% with visual scoring [41] | Continuous scoring; identifies more potential responders; granular intensity measurement | Computational complexity; requires validation across platforms |
| CTC-Based Analysis | Heterogeneous expression patterns; predicts PFS to PD-L1 therapy [39] | Captures spatial heterogeneity; enables serial monitoring; minimally invasive | Technical challenges in rare cell capture; not yet standardized |
PD-L1 testing variability directly influences patient selection for ICI therapy and subsequent treatment patterns. Real-world evidence from 507 patients with metastatic NSCLC demonstrated increasing PD-L1 testing rates from 86% in 2017 to 100% in 2020, reflecting growing recognition of its clinical importance [42]. However, treatment selection varied significantly based on PD-L1 expression levels and histomolecular subtypes.
In patients with nonsquamous NSCLC without actionable genomic alterations, ICI-chemotherapy combinations were the most common first-line regimens except in the PD-L1 â¥50% category, where ICI monotherapy was most frequently administered [42]. Use of chemotherapy decreased while ICI-chemotherapy combinations increased from 2017 to 2020 across all histomolecular groups [42]. These patterns demonstrate how PD-L1 expression levels directly guide therapeutic decisions, with testing variability potentially leading to substantial deviations from optimal treatment pathways.
Testing variability ultimately impacts clinical outcomes, including overall survival (OS). For all patients with metastatic NSCLC in the real-world study, median OS was 25.0 months (95% CI, 19.1-28.3), with significant variation by histomolecular cohort: 14.3 months for squamous NSCLC, 25.3 months for nonsquamous NSCLC with no actionable genomic alteration, not reached for KRAS G12C-mutated NSCLC, and 27.7 months for nonsquamous NSCLC with other genomic alterations [42].
The clinical consequence of PD-L1 expression variation is further highlighted by studies showing that among patients with multiple PD-L1 assessments before ICI, cases where all samples had PD-L1 ≥1% achieved improved objective response rate and progression-free survival compared to cases with discordant results [40]. Additionally, when the most proximal sample before ICI therapy showed PD-L1 ≥1%, patients had longer median PFS compared to cases where the most proximal sample was PD-L1 <1% [40]. This underscores the critical impact of temporal testing variability on treatment outcomes.
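The grouping used in these analyses can be sketched as follows, assuming each patient has a chronologically ordered list of TPS values obtained before ICI therapy. The function name and return format are hypothetical; the 1% cutoff is the one used in the text.

```python
def serial_pdl1_pattern(tps_values: list, cutoff: float = 1.0) -> tuple:
    """Summarize serial pre-treatment PD-L1 results.

    tps_values must be ordered from oldest to most recent, so the last entry
    is the sample most proximal to the start of ICI therapy.
    Returns (pattern, most_proximal_positive).
    """
    if not tps_values:
        raise ValueError("at least one TPS value is required")
    calls = [tps >= cutoff for tps in tps_values]
    if all(calls):
        pattern = "all positive"
    elif not any(calls):
        pattern = "all negative"
    else:
        pattern = "discordant"
    return pattern, calls[-1]


# Hypothetical patients with two or three pre-treatment samples.
for history in ([10, 2], [0, 5, 60], [0, 0.5]):
    print(history, "->", serial_pdl1_pattern(history))
```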
The following protocol represents a standardized approach for tissue-based PD-L1 immunohistochemical analysis:
Sample Preparation: Use freshly cut 4-μm-thick formalin-fixed paraffin-embedded (FFPE) sections from tissue specimens containing at least 100 tumor cells [9]. Ensure appropriate fixation times (6-72 hours in 10% neutral buffered formalin) to prevent antigen degradation.
Staining Procedure: Apply validated PD-L1 antibody clones (e.g., SP263, 22C3, 28-8) according to manufacturer protocols on automated staining platforms such as the BenchMark ULTRA [9]. Include appropriate positive and negative controls with each staining run.
Scoring Methodology: Evaluate PD-L1 staining only on tumor cells, considering any intensity of either partial or complete membranous staining as positive [9]. Record the percentage of positively stained tumor cells as follows: 0%, 1%, 5%, 10%, and up to 100% in 10% increments [9]. For digital pathology, scan slides with resolution of at least 0.25 μm/pixel on slide scanners such as PANORAMIC1000 or Ventana DP200 [9].
Quality Assurance: Implement regular proficiency testing and laboratory comparison programs to minimize inter-laboratory variability. Adhere to CAP-PLQC guidelines for validation and ongoing quality control.
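The scoring step of this protocol can be sketched in a few lines, assuming tumor cell counts are available from manual or digital enumeration. The function names are hypothetical; the 100-cell adequacy requirement and the 1% and 50% cutoffs are those cited in this review.

```python
def tumor_proportion_score(positive_tumor_cells: int, viable_tumor_cells: int,
                           min_tumor_cells: int = 100) -> float:
    """TPS: percentage of viable tumor cells with membranous PD-L1 staining."""
    if viable_tumor_cells < min_tumor_cells:
        raise ValueError("specimen not evaluable: fewer than the minimum number of tumor cells")
    if not 0 <= positive_tumor_cells <= viable_tumor_cells:
        raise ValueError("positive cell count must be between 0 and the viable cell count")
    return 100.0 * positive_tumor_cells / viable_tumor_cells


def tps_category(tps: float) -> str:
    """Assign the TPS to the clinically used strata."""
    if tps < 1:
        return "<1%"
    if tps < 50:
        return "1-49%"
    return ">=50%"


# Hypothetical specimen: 30 PD-L1-positive cells among 200 viable tumor cells.
tps = tumor_proportion_score(30, 200)
print(f"TPS = {tps:.0f}% ({tps_category(tps)})")
```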
For quantitative continuous scoring of PD-L1 expression:
Image Acquisition: Scan PD-L1-stained slides at high resolution (minimum 0.25 μm/pixel) using whole slide scanners. Ensure uniform focus and illumination across entire slide [41].
Image Analysis: Apply computer vision systems for granular cell-level quantification of PD-L1 staining intensity. Define positive cells as having PD-L1 membrane staining intensity ≥40 (on a 0-255 scale) [41]. Calculate the percentage of tumor cells meeting this intensity threshold.
Biomarker Classification: Classify samples as biomarker-positive where ≥0.575% of tumor cells demonstrate medium to strong staining intensity (PD-L1 QCS-PMSTC) [41]. This threshold optimizes identification of patients likely to benefit from ICI therapy.
Validation: Compare QCS results with pathologist-derived tumor proportion scores at ≥1% and ≥50% cutoffs to ensure concordance. Validate against clinical outcomes from relevant trials when possible.
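The classification logic of the QCS steps can be sketched as below. It assumes that cell segmentation has already produced one membrane intensity value per tumor cell on the 0-255 scale; the function names and example data are hypothetical, and the sketch applies the ≥0.575% cutoff stated earlier for PD-L1 QCS-PMSTC. The computer vision stage itself is out of scope here.

```python
def qcs_pmstc(membrane_intensities: list, intensity_cutoff: int = 40) -> float:
    """Percentage of tumor cells at or above the staining intensity cutoff."""
    if not membrane_intensities:
        raise ValueError("no tumor cells were quantified")
    if any(not 0 <= value <= 255 for value in membrane_intensities):
        raise ValueError("intensities must be on the 0-255 scale")
    positive = sum(1 for value in membrane_intensities if value >= intensity_cutoff)
    return 100.0 * positive / len(membrane_intensities)


def is_qcs_positive(pmstc_percent: float, cutoff_percent: float = 0.575) -> bool:
    """Biomarker-positive when the PMSTC percentage reaches the cutoff."""
    return pmstc_percent >= cutoff_percent


# Hypothetical slide: 1,000 tumor cells, 10 of them with clear membrane staining.
cells = [0] * 990 + [120] * 10
score = qcs_pmstc(cells)
print(f"QCS-PMSTC = {score:.2f}% -> positive: {is_qcs_positive(score)}")
```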
The following workflow diagram illustrates the standardized protocol for PD-L1 assessment:
Diagram 2: Standardized workflow for PD-L1 immunohistochemical analysis. The process includes pre-analytical, analytical, and post-analytical phases to ensure consistent and reliable PD-L1 assessment.
Table 3: Key Research Reagent Solutions for PD-L1 Assessment
| Reagent/Platform | Manufacturer | Function | Application Notes |
|---|---|---|---|
| VENTANA PD-L1 (SP263) Assay | Ventana Medical Systems/Roche | IHC detection of PD-L1 expression | Used on BenchMark ULTRA platform; validated for NSCLC [9] |
| pharmDx 22C3 Anti-PD-L1 | Agilent/Dako | IHC detection of PD-L1 expression | FDA-approved companion diagnostic for pembrolizumab [38] |
| uPath PD-L1 Software | Roche | Digital image analysis for PD-L1 scoring | IVDD-certified; requires manual tumor area selection [9] |
| PD-L1 Lung Cancer TME App | Visiopharm | AI-based digital scoring of PD-L1 | Research-use-only; shows substantial agreement with pathologists [9] |
| ExtractMax System | Gilson and Salus | Automated circulating tumor cell isolation | Enables high-yield CTC capture for liquid biopsy approaches [39] |
| PANORAMIC1000 Slide Scanner | 3DHISTECH | Whole slide image digitization | 0.25μm/pixel resolution for high-quality digital pathology [9] |
| Ventana DP200 Slide Scanner | Roche Diagnostics | Whole slide image digitization | Compatible with uPath software platform [9] |
PD-L1 testing variability significantly impacts patient selection and treatment outcomes in NSCLC immunotherapy. Multiple factors contribute to this variability, including pre-analytical conditions, assay selection, tumor biological characteristics, and interpretation differences. The clinical consequences are substantial, affecting treatment choices and ultimately survival outcomes. Emerging approaches including artificial intelligence algorithms, quantitative continuous scoring, and liquid biopsy methods offer potential pathways to reduce variability and improve patient stratification. Standardization of testing protocols, implementation of quality assurance programs, and adoption of validated computational approaches will be essential to minimize variability and optimize personalized immunotherapy approaches. Future research should focus on integrating multiple biomarkers including tumor mutational burden and HLA expression to complement PD-L1 testing and improve predictive accuracy for immune checkpoint inhibitor response.
The advent of immune checkpoint inhibitors (ICIs) has revolutionized cancer treatment, harnessing the body's immune system to combat malignant cells. The interaction between programmed death 1 (PD-1) on T cells and its ligand (PD-L1) on tumor cells constitutes a critical mechanism for immune escape, making the PD-1/PD-L1 pathway a prime therapeutic target [43]. Accurate assessment of PD-L1 expression levels through immunohistochemistry (IHC) has become an essential component of precision oncology, enabling identification of patients most likely to benefit from ICI therapy [44].
Four PD-L1 IHC assays (22C3, 28-8, SP263, and SP142) have received approval from the U.S. Food and Drug Administration (FDA) as companion or complementary diagnostics for various ICIs across multiple cancer types [25] [44]. These assays were developed independently, utilizing different antibody clones, staining platforms, and scoring algorithms, which has resulted in challenges regarding their concordance and interchangeability in clinical practice [25] [45]. This comparison guide provides a detailed, evidence-based analysis of these four major assays, focusing on their technical specifications, analytical performance, and clinical utility across different tumor types to inform researchers, scientists, and drug development professionals.
The four FDA-approved PD-L1 assays employ distinct antibody clones, detection platforms, and scoring criteria, leading to differences in PD-L1 positivity rates and interpretation.
Table 1: Technical Specifications of FDA-Approved PD-L1 Assays
| Assay Clone | Platform | Primary Scoring Method | Key FDA-Approved Indications | Tumor Cell vs. Immune Cell Staining Emphasis |
|---|---|---|---|---|
| 22C3 | Dako Link 48 | Tumor Proportion Score (TPS) | NSCLC, HNSCC, GCC, ESCC, UC | Primarily tumor cells |
| 28-8 | Dako Link 48 | Tumor Proportion Score (TPS) | NSCLC, RCC, MCC | Primarily tumor cells |
| SP263 | Ventana Benchmark | Tumor Cell Staining | NSCLC, UC | Balanced tumor and immune cells |
| SP142 | Ventana Benchmark | Immune Cell (IC) Score | TNBC, UC | Emphasis on immune cells |
The 22C3 pharmDx assay employs the Tumor Proportion Score (TPS), defined as the percentage of viable tumor cells showing partial or complete membranous PD-L1 staining relative to all viable tumor cells [45] [46]. This assay serves as a companion diagnostic for pembrolizumab in multiple malignancies including non-small cell lung cancer (NSCLC) [44].
The 28-8 pharmDx assay similarly utilizes TPS scoring and is approved as a complementary diagnostic for nivolumab in NSCLC and other cancers [25]. It shares the Dako platform with the 22C3 assay but employs a different antibody clone.
The SP263 assay on the Ventana platform assesses the percentage of tumor cells with any membranous PD-L1 staining of any intensity [45] [44]. It has received Conformité Européenne in vitro diagnostic (CE-IVD) designation as a companion diagnostic for multiple immunotherapeutic agents including durvalumab, pembrolizumab, cemiplimab-rwlc, and atezolizumab in NSCLC [44].
The SP142 assay employs a distinct scoring system that evaluates PD-L1 expression on both tumor cells (TC) and tumor-infiltrating immune cells (IC) [25] [47]. The IC score represents the percentage of tumor area occupied by PD-L1-stained immune cells [47]. This assay is clinically validated for identifying triple-negative breast cancer (TNBC) patients eligible for atezolizumab therapy using an IC ≥1% cutoff [47].
Multiple studies have investigated the concordance between different PD-L1 assays across various cancer types, with findings demonstrating substantial variability depending on tumor histology and scoring systems.
In NSCLC, the SP263 and 22C3 assays demonstrate high concordance rates, suggesting potential interchangeability for clinical decision-making.
Table 2: Assay Concordance in NSCLC (IMpower010 Study) [45]
| Comparison | Threshold | Concordance Rate | Kappa Statistic | Clinical Outcome Concordance |
|---|---|---|---|---|
| SP263 vs 22C3 | ≥1% (Positive) | 83% | Not reported | Comparable DFS benefit with atezolizumab |
| SP263 vs 22C3 | ≥50% (High) | 92% | Not reported | Comparable DFS benefit with atezolizumab |
The phase III IMpower010 study demonstrated that the SP263 and 22C3 assays showed high concordance at both the PD-L1-positive (≥1%) and PD-L1-high (≥50%) thresholds in early-stage NSCLC [45]. Importantly, the disease-free survival benefit of adjuvant atezolizumab compared with best supportive care was comparable between assays for both PD-L1-positive and PD-L1-high subgroups, indicating similar predictive value [45].
A recent comprehensive study evaluating all four FDA-approved PD-L1 assays in clear cell renal cell carcinoma revealed substantial differences in analytical performance.
Table 3: PD-L1 Positivity and Concordance in Clear Cell Renal Cell Carcinoma [25]
| Assay Clone | Tumor Cell Positivity | Immune Cell Positivity | Concordance with 28-8 (κ statistic) | Prognostic Significance for CSS |
|---|---|---|---|---|
| 22C3 | Very low | 14.7% | 0.52 | Significantly worse |
| 28-8 | Very low | 16.1% | Reference | Significantly worse |
| SP142 | Very low | 2.1% | 0.16 | Not significant |
| SP263 | Very low | 15.0% | 0.46 | Significantly worse |
This study of 286 ccRCC tissue samples demonstrated remarkably low PD-L1 expression in tumor cells across all assays [25]. When assessing immune cell PD-L1 expression, the 28-8 assay showed moderate pairwise concordance with the 22C3 (κ=0.52) and SP263 (κ=0.46) assays, but poor concordance with the SP142 assay (κ=0.16) [25]. Patients with PD-L1 expression in immune cells evaluated using the 22C3, 28-8, and SP263 assays showed significantly worse cancer-specific survival (CSS), establishing prognostic value for these three assays in RCC [25].
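The κ values above summarize pairwise agreement beyond chance. As a minimal sketch of how such a statistic is obtained for two assays scored as positive or negative on the same samples, the function below computes Cohen's kappa from a 2x2 table of paired calls. The function name and the counts in the example are hypothetical and are not taken from the cited study.

```python
def cohens_kappa(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for two binary assays, from paired 2x2 counts."""
    n = both_pos + a_only + b_only + both_neg
    if n == 0:
        raise ValueError("no paired samples")
    # Observed agreement: fraction of samples where both assays give the same call.
    observed = (both_pos + both_neg) / n
    # Marginal positivity rates of each assay.
    a_pos = (both_pos + a_only) / n
    b_pos = (both_pos + b_only) / n
    # Agreement expected by chance if the two assays were independent.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical paired counts (illustration only).
kappa = cohens_kappa(both_pos=20, a_only=10, b_only=10, both_neg=60)
print(f"kappa = {kappa:.2f}")
```

Kappa can be low even when raw agreement looks high, which is typical when one category dominates, as with the low positivity rates reported here.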
The IMpassion130 study provided insights into assay comparability in TNBC, where the SP142 assay is clinically validated as a companion diagnostic for atezolizumab.
Table 4: Assay Performance in Triple-Negative Breast Cancer (IMpassion130) [47]
| Assay | Scoring Method | PD-L1+ Prevalence | Concordance with SP142 | Clinical Benefit with Atezolizumab |
|---|---|---|---|---|
| SP142 | IC ≥1% | 46.4% | Reference | Significant PFS and OS benefit |
| SP263 | IC ≥1% | 74.9% | 69.2% | Driven by double-positive cases |
| 22C3 | IC ≥1% | 73.1% | 68.7% | Driven by double-positive cases |
| 22C3 | CPS ≥1 | 80.9% | Not reported | Not significant for single-positive cases |
This analysis revealed that the SP263 and 22C3 assays identified substantially more patients as PD-L1-positive (IC ≥1%) compared to the SP142 assay [47]. The analytical concordance between SP142 and the other assays was approximately 69%, indicating suboptimal interchangeability [47]. Importantly, the improved efficacy of atezolizumab plus nab-paclitaxel was primarily driven by patients identified as PD-L1-positive by both SP142 and the alternative assay ("double-positive" cases), rather than those positive only by SP263 or 22C3 ("single-positive" cases) [47].
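The double-positive and single-positive groupings can be made concrete with a short sketch. It assumes each sample carries a binary call from SP142 and from one alternative assay at the same cutoff. The label "SP142-only", the function names, and the example data are hypothetical additions for illustration.

```python
from collections import Counter


def classify_pair(sp142_positive, other_positive):
    """Label one sample by its paired SP142 / alternative-assay calls."""
    if sp142_positive and other_positive:
        return "double-positive"
    if other_positive:
        # Positive only by the alternative assay (SP263 or 22C3).
        return "single-positive"
    if sp142_positive:
        return "SP142-only"
    return "double-negative"


def summarize(pairs):
    """Return per-group counts and the overall agreement between the two assays."""
    if not pairs:
        raise ValueError("no paired samples")
    counts = Counter(classify_pair(a, b) for a, b in pairs)
    agreement = (counts["double-positive"] + counts["double-negative"]) / len(pairs)
    return counts, agreement


# Hypothetical paired calls: (SP142 call, alternative-assay call).
pairs = [(True, True), (False, True), (False, True), (False, False), (True, False)]
counts, agreement = summarize(pairs)
print(dict(counts), f"overall agreement = {agreement:.0%}")
```

Stratifying outcome analyses by these groups is what allows a study to ask whether the additional patients flagged by a more sensitive assay actually benefit.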
Standardized experimental protocols are essential for ensuring reproducible and reliable PD-L1 testing across different laboratories and assay platforms.
For surgical pathology specimens, fixation in 10% neutral buffered formalin with fixation times ranging from 3-30 hours (depending on specimen size) is recommended [46]. For cytology specimens, aspirates can be collected directly into methanol-water fixative (CytoLyt), with residual material used to create cell blocks via either plasma-thrombin or Histogel methods before formalin fixation and paraffin embedding [46]. All specimens should be processed on automated tissue processors using standard laboratory programs based on tissue size, with sections cut at 4-micron thickness for staining [46].
The 22C3 and 28-8 assays are performed on the Dako Autostainer Link 48 platform using the manufacturer's specified reagents and protocols [25] [46]. The SP263 and SP142 assays are performed on the Ventana Benchmark platform following the manufacturer's instructions [25] [47]. Proper control samples must be included in each run to ensure staining quality and interpretation accuracy [46].
Scoring of PD-L1 expression requires trained pathologists who are proficient in the specific scoring algorithm for each assay [47]. For the 22C3, 28-8, and SP263 assays, scoring focuses primarily on tumor cell membranous staining, while the SP142 assay requires additional assessment of immune cell staining [25] [47]. Pathologists should undergo specific training for each assay and scoring system, with ongoing quality assurance and proficiency testing to maintain consistency [47].
The PD-1/PD-L1 axis represents a critical immune checkpoint pathway in cancer biology. PD-L1, encoded by the CD274 gene, is expressed on the surface of tumor cells and tumor-infiltrating immune cells [43] [44]. Interaction between PD-L1 and its receptor PD-1 on T cells leads to inhibition of T cell proliferation, reduced cytokine secretion, and induction of apoptosis in antigen-specific T cells, ultimately resulting in immune escape and tumor progression [43] [44]. The four PD-L1 assays discussed in this guide detect the PD-L1 protein expressed on tumor cells and/or immune cells, enabling identification of patients most likely to respond to ICIs that block this immunosuppressive pathway [44].
Table 5: Key Research Reagents for PD-L1 Immunohistochemistry
| Reagent/Material | Function | Application Notes |
|---|---|---|
| Formalin-fixed, paraffin-embedded (FFPE) tissue | Preserves tissue architecture and antigen integrity | Standard 10% neutral buffered formalin; fixation time 3-30 hours depending on specimen size [46] |
| Cytology cell blocks | Alternative substrate for PD-L1 testing | Prepared from residual cytology material using plasma-thrombin or Histogel method [46] |
| PD-L1 antibody clones (22C3, 28-8, SP263, SP142) | Specific detection of PD-L1 protein | Each clone has distinct binding epitopes and staining characteristics [25] |
| Dako Autostainer Link 48 | Automated IHC staining platform | Optimized for 22C3 and 28-8 pharmDx assays [45] [46] |
| Ventana Benchmark series | Automated IHC staining platform | Optimized for SP263 and SP142 assays [45] [47] |
| Specific detection kits | Signal amplification and visualization | Platform-specific detection systems required for each assay [47] |
| Control tissues | Quality assurance | Positive and negative controls essential for validating staining quality [46] |
The four major FDA-approved PD-L1 assays (22C3, 28-8, SP263, and SP142) demonstrate variable analytical and clinical performance across different cancer types. In NSCLC, the SP263 and 22C3 assays show high concordance and comparable predictive value for ICI benefit, suggesting potential interchangeability in this setting [45]. In contrast, significant disparities exist among all four assays in clear cell renal cell carcinoma, particularly for the SP142 assay, which shows notably lower immune cell positivity and poor concordance with other assays [25]. The SP142 assay remains unique in its emphasis on immune cell staining, which is particularly relevant in specific cancer types such as triple-negative breast cancer [47].
These findings highlight the critical importance of considering assay-specific characteristics when interpreting PD-L1 expression results in both research and clinical settings. The ongoing development of harmonization protocols and artificial intelligence-assisted scoring platforms may help reduce inter-assay variability and improve the accuracy of PD-L1 as a predictive biomarker for immune checkpoint inhibitor therapy [44].
The advent of immune checkpoint inhibitors has established PD-L1 immunohistochemistry (IHC) as a critical predictive biomarker in oncology, making the standardization and reliability of automated staining platforms a cornerstone of modern cancer diagnostics and drug development [48] [49]. Platforms such as the Dako Autostainer Link 48, Ventana BenchMark ULTRA, and Leica BOND-III are integral to performing these complex assays. However, the comparative performance of these systems, influenced by their unique chemistries, protocols, and sensitivities, directly impacts the accuracy of patient selection for therapy. This guide objectively compares these leading platforms, framing the analysis within the broader thesis of PD-L1 assay standardization and providing researchers with the experimental data necessary to inform their analytical and clinical decisions.
The Dako Autostainer Link 48 (Agilent), Ventana BenchMark ULTRA (Roche), and Leica BOND-III represent the leading technologies in automated IHC and ISH (in situ hybridization) staining. Each system employs a distinct approach to automation, reagent management, and the staining process itself, which contributes to its unique performance profile.
The table below summarizes the core specifications and technological approaches of each platform:
Table 1: Key Specifications of Automated Staining Platforms
| Feature | Dako Autostainer Link 48 | Ventana BenchMark ULTRA | Leica BOND-III |
|---|---|---|---|
| Manufacturer | Agilent Technologies | Roche Ventana | Leica Biosystems |
| Staining Principle | Capillary gap (Coverplate) [50] | Puddle (Liquid Coverslip) [51] | Puddle (Covertile) [52] |
| Typical Slide Capacity | 48 slides [53] | 30 slides [51] | 30 slides [52] |
| Workflow | Batch-based | Single-piece, continuous access [51] | Batch-based with 3 independent trays [52] |
| Key Technology | Semi-automated (separate antigen retrieval) [54] | Individually controlled slide heater pads [51] | Patented Covertile system for low reagent volume [52] |
| Reagent System | Open | Largely closed with bar-coded testpacks [50] | Open with real-time level alerts [52] |
| Assay Menu Flexibility | High | Broad (250+ ready-to-use assays) [51] | High, with Novocastra reagents [52] |
The analytical performance of these platforms is paramount, particularly for standardized companion diagnostics. Studies have directly compared their output for key biomarkers like PD-L1 and Ki-67, revealing significant differences in assay sensitivity and inter-instrument concordance.
PD-L1 IHC is a primary diagnostic for immunotherapy, but multiple FDA-approved assays with different antibodies exist, leading to challenges in harmonization. Research shows that the analytical sensitivity of these assays varies substantially by platform.
A multi-institutional study using a standardized PD-L1 Index TMA and quantitative digital analysis found that FDA-approved assays could be grouped by analytic sensitivity. The Ventana SP263 assay was found to be the most sensitive, followed by the Agilent 22C3 and 28-8 assays, while the Ventana SP142 assay was analytically ten times less sensitive than the SP263 assay [48]. This lower sensitivity of the SP142 assay was confirmed in another study, which noted it failed to detect low levels of PD-L1 in cell lines that were distinguished by other assays [49]. Critically, the assays for 22C3, 28-8, SP263, and a laboratory-developed test using E1L3N were highly similar and consistent across multiple laboratory sites for a given platform [49].
Ki-67 is a proliferation marker with well-documented inter-laboratory heterogeneity. A comparative study of Ki-67 IHC laboratory-developed tests (LDTs) on different platforms highlighted significant variability.
Table 2: Analytical Comparison of Ki-67 IHC Laboratory-Developed Tests
| Platform & Antibody Clone | Sensitivity at 20% Cutoff (%) [55] | Specificity at 20% Cutoff (%) [55] | Key Finding |
|---|---|---|---|
| Dako Autostainer Link 48 (MIB-1) | 24.8 | 99.5 | High specificity but low sensitivity vs. reference assay. |
| Leica BOND-III (K2) | 25.1 | 100.0 | Performance nearly identical to Dako AS48 LDT. |
| Ventana BenchMark ULTRA (30-9) | 99.3 | 53.6 | High sensitivity but markedly lower specificity. |
This data demonstrates that the choice of platform and antibody clone combination can drastically alter the classification of samples, as seen with the Ventana 30-9 clone, which showed high sensitivity but low specificity compared to the reference test [55].
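To illustrate how sensitivity and specificity at a fixed cutoff are derived, the sketch below dichotomizes paired Ki-67 scores from a test assay and a reference assay at 20%. Treating a score at or above the cutoff as positive is an assumption, as are the function name and the example scores; none of them come from the cited study.

```python
def sensitivity_specificity(test_scores, reference_scores, cutoff=20.0):
    """Sensitivity and specificity of a test assay against a reference at one cutoff.

    Scores are percentages (0-100). A score >= cutoff counts as positive (assumption).
    """
    if len(test_scores) != len(reference_scores):
        raise ValueError("score lists must be paired and of equal length")
    tp = fp = tn = fn = 0
    for test, ref in zip(test_scores, reference_scores):
        test_pos, ref_pos = test >= cutoff, ref >= cutoff
        if ref_pos and test_pos:
            tp += 1
        elif ref_pos:
            fn += 1
        elif test_pos:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference-positive and one reference-negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical paired Ki-67 scores (percent positive nuclei).
test = [25, 10, 30, 5, 18, 40]
reference = [30, 25, 10, 5, 22, 45]
sens, spec = sensitivity_specificity(test, reference)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

An assay that systematically stains fewer cells than the reference will show the low-sensitivity, high-specificity pattern in the table, and one that stains more cells will show the reverse.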
Furthermore, a specific study comparing the FDA-approved Ki-67 IHC MIB-1 pharmDx assay on the Dako Omnis versus its reagents used with an optimized protocol on the more widely available Dako Autostainer Link 48 (AS48) showed that high concordance (90.3% overall agreement) is achievable between instruments from the same manufacturer [53]. This suggests that reagent and protocol optimization are as critical as the choice of instrument.
For some diagnostics, the staining pattern is as important as the intensity. In diagnosing Hyalinising Trabecular Tumour (HTT) of the thyroid, a specific cell membrane-positive reaction for MIB-1 (Ki-67) is a crucial criterion. A 2024 study investigated the ability of different automated platforms to replicate this pattern, which is routinely achieved with manual staining.
Table 3: Performance of Automated Platforms for MIB-1 Membrane Staining in HTT
| Platform | Optimal Conditions | Staining Outcome for Membrane Pattern |
|---|---|---|
| Dako Autostainer Link 48 | Antigen retrieval with pH 9.0 at room temperature (RT) | Most stable and strongest membrane staining [54]. |
| Ventana BenchMark ULTRA | CC1 (pH 8.5) retrieval; primary antibody incubation at RT | Significantly stronger membrane staining at RT than at 37°C [54]. |
| Leica BOND-III | ER1 (pH 6.0) retrieval at RT | Weak-to-moderate membrane staining; weaker with pH 9.0 [54]. |
| Dako Omnis | Antigen retrieval at pH 9.0; incubation at 32°C | Weak-to-moderate positive membrane staining [54]. |
This study concluded that the Dako Autostainer Link 48 was the most stable platform for this particular application, closely mimicking manual staining conditions. It also highlighted that slight adjustments in protocol parameters, such as antigen retrieval pH and incubation temperature, are critical for success on automated systems and are not universally optimal across platforms [54].
To ensure objectivity, the data cited in this guide are derived from rigorous, published experimental methodologies. The key protocols are summarized below for researcher reference.
This study quantified inter-assay and inter-laboratory variation using a standardized Index Tissue Microarray (TMA) [49].
This study measured the concordance of the FDA-approved Ki-67 IHC MIB-1 pharmDx assay across two Dako instruments [53].
Figure 1: Experimental Workflow for Platform Comparison Studies. This diagram outlines the core methodologies used in the key studies cited to generate comparable data on staining platform performance.
The performance of an automated staining system is dependent on an integrated set of reagents and materials. The following table details essential components referenced in the featured studies.
Table 4: Essential Research Reagents and Materials for Automated IHC
| Item | Function | Example in Context |
|---|---|---|
| Index Tissue Microarray (TMA) | A standardized slide containing multiple tissue or cell line cores for simultaneous staining, enabling inter-laboratory and inter-assay comparison [49]. | Used with isogenic cell lines expressing a PD-L1 dynamic range to objectively compare assay sensitivity [49]. |
| FDA-Approved IHC Assay Kits | Complete reagent sets (primary antibody, detection system) validated for a specific diagnostic purpose on a designated platform. | PD-L1 IHC 22C3 PharmDx (for Dako platforms) [49]; PD-L1 IHC SP263 Assay (for Ventana platforms) [48] [49]. |
| Validated Antibody Clones | The specific monoclonal antibody that binds the target epitope. Different clones can have varying performance. | Ki-67 clones MIB-1, K2, and 30-9 show different sensitivity/specificity profiles on Dako, Leica, and Ventana platforms, respectively [55]. |
| Epitope Retrieval Solutions | Buffered solutions used to reverse formaldehyde cross-linking and expose hidden antigenic epitopes. pH is critical. | Dako Target Retrieval Solution (pH 6.0 or 9.0) [54]; Ventana Cell Conditioning Solution (CC1, pH ~8.5) [54]. |
| Detection System | A series of reagents that generate a visible signal (chromogenic or fluorescent) at the site of antibody binding. | Bond Polymer Refine Detection (Leica) [54]; EnVision FLEX (Dako) [53] [54]; UltraView DAB (Ventana) [54]. |
| NIST-Traceable Calibrators | Synthetic calibrators (e.g., peptide-coated microbeads) with a known number of molecules/bead, used to standardize assay sensitivity and reproducibility [48]. | Tool for standardizing the biochemical aspect of PD-L1 IHC assays, ensuring week-to-week reproducibility of stain intensity [48]. |
The choice between the Dako Autostainer Link 48, Ventana BenchMark ULTRA, and Leica BOND-III systems involves critical trade-offs. The data indicates that no single platform is universally superior; instead, the optimal instrument depends on the specific application and required context.
For standardized companion diagnostics like PD-L1, the platform is often predetermined by the FDA-approved assay. However, researchers must be aware of the inherent sensitivity differences between assays (e.g., SP263 vs. SP142) [48] [49]. For laboratory-developed tests (LDTs) like Ki-67, the platform and antibody clone selection will profoundly impact the results, as evidenced by the significant variability in sensitivity and specificity [55]. Furthermore, for highly specialized staining patterns like MIB-1 membrane staining in HTT, platform-specific protocol optimization is not just beneficial but essential, with some systems offering more stable performance than others [54].
Therefore, the path to reliable and reproducible IHC data lies in understanding the technical nuances of these automated systems, rigorously validating each assay on the chosen platform, and implementing standardization tools like index TMAs and NIST-traceable calibrators to ensure analytical precision across experiments and laboratories [48] [49].
The advent of immune checkpoint inhibitors (ICIs) targeting the PD-1/PD-L1 axis has revolutionized cancer treatment, making accurate assessment of PD-L1 expression a critical component of companion diagnostics. As of 2025, the U.S. Food and Drug Administration (FDA) has approved 12 PD-L1 companion diagnostics for immunotherapies, each utilizing different scoring methods and thresholds [56]. Two scoring systems have emerged as fundamental to this evaluation: the Tumor Proportion Score (TPS) and the Combined Positive Score (CPS). These quantitative immunohistochemistry (IHC) scoring methods guide therapeutic decisions across multiple cancer types, including non-small cell lung cancer (NSCLC), gastric cancer, and head and neck squamous cell carcinoma (HNSCC). The selection between TPS and CPS has significant implications for patient selection, as it directly influences eligibility for specific ICIs. This guide provides a comprehensive comparison of these systems, examining their technical specifications, clinical applications, performance characteristics, and implementation protocols to support researchers and drug development professionals in optimizing PD-L1 detection strategies.
Tumor Proportion Score (TPS) is defined as the percentage of viable tumor cells exhibiting partial or complete PD-L1 membrane staining relative to all viable tumor cells in the sample [57] [56]. The calculation excludes immune cells and stromal elements, focusing exclusively on neoplastic cells. The formula is expressed as:
TPS = (Number of PD-L1 positive tumor cells ÷ Total number of viable tumor cells) × 100
Combined Positive Score (CPS) represents a more comprehensive metric that quantifies PD-L1 expression across both tumor and immune compartments [57] [58] [56]. CPS is calculated as the number of PD-L1-positive cells (tumor cells, lymphocytes, and macrophages) divided by the total number of viable tumor cells, multiplied by 100:
CPS = (Number of PD-L1 positive cells [tumor cells, lymphocytes, macrophages] ÷ Total number of viable tumor cells) × 100
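The two formulas translate directly into code. The following is a minimal sketch that assumes cell counts are already available, for example from manual counting or an image-analysis tool. The function and parameter names are illustrative, and capping CPS at 100 follows the reporting convention of a 0-100 range.

```python
def tumor_proportion_score(pdl1_pos_tumor_cells, viable_tumor_cells):
    """TPS: percentage of viable tumor cells with PD-L1 membrane staining."""
    if viable_tumor_cells <= 0:
        raise ValueError("at least one viable tumor cell is required")
    if not 0 <= pdl1_pos_tumor_cells <= viable_tumor_cells:
        raise ValueError("positive tumor cells must be between 0 and the viable total")
    return 100.0 * pdl1_pos_tumor_cells / viable_tumor_cells


def combined_positive_score(pos_tumor_cells, pos_lymphocytes, pos_macrophages,
                            viable_tumor_cells):
    """CPS: PD-L1+ tumor cells, lymphocytes, and macrophages per 100 viable tumor cells."""
    if viable_tumor_cells <= 0:
        raise ValueError("at least one viable tumor cell is required")
    if min(pos_tumor_cells, pos_lymphocytes, pos_macrophages) < 0:
        raise ValueError("cell counts cannot be negative")
    raw = 100.0 * (pos_tumor_cells + pos_lymphocytes + pos_macrophages) / viable_tumor_cells
    # The numerator includes immune cells, so the raw ratio can exceed 100; cap it.
    return min(raw, 100.0)


# Hypothetical counts from one specimen.
print(tumor_proportion_score(30, 200))           # tumor cells only
print(combined_positive_score(30, 10, 10, 200))  # tumor plus immune cells
```

Note that both scores share the same denominator, so for a given specimen CPS is never lower than TPS expressed as a number.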
Table 1: Fundamental Characteristics of TPS and CPS Scoring Systems
| Characteristic | Tumor Proportion Score (TPS) | Combined Positive Score (CPS) |
|---|---|---|
| Cells assessed | Tumor cells only | Tumor cells, lymphocytes, and macrophages |
| Scoring range | 0-100% | 0-100 (the raw ratio can exceed 100 when many immune cells stain, but the reported score is capped at 100) |
| Key components | PD-L1+ tumor cells, total viable tumor cells | PD-L1+ tumor cells, PD-L1+ immune cells, total viable tumor cells |
| Excluded elements | Immune cells, stromal cells, necrotic areas | Necrotic areas, non-viable tumor cells |
| Primary clinical context | NSCLC, first-line pembrolizumab monotherapy | Gastric cancer, HNSCC, urothelial carcinoma |
Both scoring systems employ specific thresholds that trigger therapeutic implications across different cancer types. For TPS, the most significant cut-point is ≥50% for first-line pembrolizumab monotherapy in metastatic NSCLC, while ≥1% may indicate benefit in other contexts [41] [56]. For CPS, multiple thresholds exist across indications: ≥1 for gastric cancer (pembrolizumab), ≥5 for gastric cancer (nivolumab in CheckMate-649), and ≥10 for esophageal cancer (pembrolizumab in KEYNOTE-590) [59] [60]. The RATIONALE-305 trial introduced yet another metric called Tumor Area Positivity (TAP), which quantifies both tumor and immune cell staining, with ≥5% defining positivity for tislelizumab benefit in gastric cancer [60]. This proliferation of scoring systems and thresholds underscores the importance of assay-specific biomarker validation in immunotherapy trials.
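Because the same numeric score can fall on different sides of different cutoffs, it can help to keep thresholds in an explicit lookup keyed by indication, agent, and metric. The sketch below encodes only the cutoffs named in the paragraph above. The data structure, keys, and function name are hypothetical and are meant to illustrate the bookkeeping, not to serve as a decision tool.

```python
from typing import NamedTuple


class Cutoff(NamedTuple):
    cancer: str
    agent: str
    metric: str       # "TPS", "CPS", or "TAP"
    threshold: float  # score is positive when >= threshold


# Cutoffs as listed in the text; keys are illustrative labels.
CUTOFFS = [
    Cutoff("NSCLC", "pembrolizumab", "TPS", 50),
    Cutoff("gastric", "pembrolizumab", "CPS", 1),
    Cutoff("gastric", "nivolumab", "CPS", 5),
    Cutoff("esophageal", "pembrolizumab", "CPS", 10),
    Cutoff("gastric", "tislelizumab", "TAP", 5),
]


def meets_cutoff(cancer, agent, metric, score):
    """Return True if the score reaches the listed cutoff for this combination."""
    for c in CUTOFFS:
        if (c.cancer, c.agent, c.metric) == (cancer, agent, metric):
            return score >= c.threshold
    raise KeyError(f"no cutoff listed for {cancer}/{agent}/{metric}")


# The same CPS of 4 is positive for one gastric cancer cutoff and negative for another.
print(meets_cutoff("gastric", "pembrolizumab", "CPS", 4))
print(meets_cutoff("gastric", "nivolumab", "CPS", 4))
```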
The comparability of different PD-L1 assays has been extensively studied to determine potential interchangeability in clinical practice. A comprehensive comparability study of immunohistochemical assays for PD-L1 detection in hepatocellular carcinoma demonstrated that the 22C3, 28-8, and SP263 assays exhibited comparable sensitivity in detecting PD-L1 expression, whereas the SP142 assay was consistently the least sensitive across both TPS and CPS evaluations [17]. The inter-assay agreement, measured by intraclass correlation coefficients (ICC), was 0.646 for TPS and 0.780 for CPS, indicating superior concordance for the combined scoring system [17]. This enhanced agreement with CPS likely stems from its incorporation of multiple cell types, which may mitigate tumor heterogeneity effects and staining interpretation variability.
The inter-rater reliability also differs between scoring systems. In the hepatocellular carcinoma study, the overall ICC among five pathologists was 0.946 for TPS and 0.809 for CPS, suggesting that pathologists demonstrate greater consistency when evaluating tumor cells alone compared to the more complex assessment required for CPS [17]. This reliability gap highlights the challenging nature of immune cell quantification in the tumor microenvironment, particularly with certain assays like SP142, where pathologists were less reliable in scoring CPS compared to TPS [17]. Importantly, up to 18% of samples were misclassified by individual pathologists compared to consensus scoring at the CPS ≥1 cutoff, emphasizing the clinical impact of this variability [17].
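A misclassification rate of this kind can be computed by dichotomizing each reader's scores and the consensus scores at the cutoff and counting discordant calls. The sketch below assumes one score per sample from a single reader and from the consensus. The function name and example values are hypothetical.

```python
def misclassification_rate(reader_scores, consensus_scores, cutoff=1.0):
    """Fraction of samples where a reader's call at the cutoff differs from consensus."""
    if len(reader_scores) != len(consensus_scores) or not reader_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    discordant = sum(
        (reader >= cutoff) != (consensus >= cutoff)
        for reader, consensus in zip(reader_scores, consensus_scores)
    )
    return discordant / len(reader_scores)


# Hypothetical CPS values for five samples.
reader = [0, 2, 5, 0.5, 10]
consensus = [1, 2, 0, 0, 10]
print(f"misclassified at CPS >= 1: {misclassification_rate(reader, consensus):.0%}")
```

Disagreements concentrate near the cutoff, so a reader can have a high ICC on continuous scores and still misclassify a meaningful share of borderline cases.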
The relative clinical utility of TPS versus CPS varies significantly across cancer types, reflecting fundamental differences in tumor biology and immune microenvironment composition.
In gastric cancer, CPS has emerged as the dominant biomarker across multiple pivotal trials. The KEYNOTE-859 trial established CPS ≥1 as the threshold for pembrolizumab approval in advanced gastric cancer, demonstrating significant survival benefits (median OS: 13.0 months vs 11.4 months; HR=0.74) [60]. This advantage was more pronounced in the CPS ≥10 subgroup (median OS 15.7 months vs 11.8 months; HR=0.65) [60]. Similarly, the CheckMate-649 trial led to nivolumab approval in HER2-negative advanced gastric cancer with CPS ≥5 [60]. The integration of HER2 status with PD-L1 scoring is particularly relevant in gastric cancer, as the KEYNOTE-811 trial demonstrated that adding pembrolizumab to trastuzumab and chemotherapy significantly improved objective response rates (74.4% vs 51.9%) in HER2-positive gastric cancer, with superior overall survival in PD-L1 CPS ≥1 patients (20.1 months vs 15.7 months; HR=0.79) [60].
In NSCLC, TPS remains the established biomarker in many contexts, particularly for pembrolizumab monotherapy in patients with TPS ≥50% [41] [56]. However, emerging quantitative approaches are revealing new dimensions of PD-L1 assessment. The PD-L1 Quantitative Continuous Scoring (QCS) system identifies NSCLC patients more likely to benefit from durvalumab by capturing the percentage of tumor cells with medium to strong staining intensity [41]. This continuous scoring method demonstrated a hazard ratio of 0.62 (CI 0.46-0.82) with a biomarker-positive prevalence of 54.3%, compared to visual TPS scoring which resulted in a hazard ratio of 0.69 (CI 0.46-1.02) with a 29.7% prevalence [41].
In HNSCC, both scoring systems are utilized, with CPS ≥1 determining first-line pembrolizumab eligibility based on the KEYNOTE-689 trial, while TPS ≥50% guides second-line treatment [57]. A study examining PD-L1 expression across different specimen types in HNSCC found significant discrepancies in both CPS and TPS between biopsy and surgical resection specimens (p<0.01), as well as between resection and metastatic lymph nodes (p<0.01) [57]. This heterogeneity underscores the importance of standardized specimen selection for PD-L1 assessment regardless of scoring system.
Table 2: Clinical Applications of TPS and CPS Across Cancer Types
| Cancer Type | Preferred Scoring System | Key Therapeutic Thresholds | Supporting Clinical Trials |
|---|---|---|---|
| Non-small cell lung cancer (NSCLC) | TPS (primary) | TPS ≥50% (first-line pembrolizumab monotherapy) | KEYNOTE-024, MYSTIC |
| Gastric cancer | CPS | CPS ≥1 (pembrolizumab), CPS ≥5 (nivolumab) | KEYNOTE-859, CheckMate-649, KEYNOTE-811 |
| Head and neck squamous cell carcinoma (HNSCC) | Both | CPS ≥1 (first-line), TPS ≥50% (second-line) | KEYNOTE-689, CHECKMATE-141 |
| Esophageal cancer | CPS | CPS ≥10 (pembrolizumab) | KEYNOTE-590 |
| Angiosarcoma | TPS | TPS ≥1% (emerging biomarker) | Investigational |
Robust PD-L1 scoring requires strict adherence to standardized experimental protocols across pre-analytical, analytical, and post-analytical phases. For the widely used PD-L1 IHC 22C3 pharmDx assay, tissue sections are cut at 4μm thickness from formalin-fixed paraffin-embedded (FFPE) blocks and processed on automated staining platforms such as the Ventana BenchMark ULTRA or Dako Autostainer Link 48 [57] [58]. Appropriate positive and negative controls must be included in each run, with multi-tissue blocks containing tonsil and placenta tissues often serving as positive controls [56]. The entire process for 204 tissue sections from a HNSCC study was automated to ensure consistency [57].
For novel assay development, such as the PD-L1 CAL10 assay (Leica Biosystems) currently in development, feasibility studies compare performance to established assays like the SP263 (Ventana) on the Benchmark Ultra staining system [56]. These comparability studies require careful attention to inclusion criteria, typically encompassing both resection and biopsy specimens from relevant cancer types (e.g., 60-70% adenocarcinomas and 30-40% squamous cell carcinomas for NSCLC) with representation of both primary and metastatic sites [56]. Such standardization enables reliable assessment of diagnostic concordance, with the CAL10 assay demonstrating a lower bound of the 95% CI of overall percent agreement of 86.2% at ≥50% TPS cutoff and 94.0% at ≥1% TPS cutoff compared to SP263 [56].
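Reporting the lower bound of a confidence interval, rather than the point estimate, guards against overstating agreement in small comparability cohorts. The sketch below computes overall percent agreement (OPA) with a Wilson score interval. The source does not state which interval method the CAL10 study used, so the choice of Wilson is an assumption, and the counts in the example are hypothetical.

```python
import math


def wilson_interval(agreements, n, z=1.959964):
    """Wilson score interval for a proportion (default z gives a two-sided 95% interval)."""
    if n <= 0 or not 0 <= agreements <= n:
        raise ValueError("need n > 0 and 0 <= agreements <= n")
    p = agreements / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical comparison: 90 of 100 paired samples get the same call from both assays.
lower, upper = wilson_interval(agreements=90, n=100)
print(f"OPA = 90.0%, 95% CI lower bound = {lower:.1%}, upper bound = {upper:.1%}")
```

An acceptance criterion is then usually phrased against the lower bound, so a larger cohort is needed to pass the same threshold with the same observed agreement.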
Advanced computational approaches are addressing the challenges of manual PD-L1 assessment through automated scoring pipelines. A typical AI-based workflow for CPS quantification in gastric cancer chains several deep learning models in sequence, covering patch classification, segmentation, and cell detection [58].
This integrated pipeline demonstrated strong concordance with expert pathologists' consensus in internal validation (Cohen's kappa = 0.782) and maintained robust performance in external cohorts (Cohen's kappa = 0.737) [58].
For rare cancers like angiosarcoma, where training data is limited, specialized pipelines such as PEERCE leverage pre-trained generalist models and fine-tuning approaches to achieve strong TPS prediction performance (correlation coefficients of 0.83-0.93 with pathologist assessment) despite limited annotated data [61]. In this context, AI assistance serves as a valuable "second opinion," with pathologists updating their TPS scores in cases of strong disagreement, thereby improving diagnostic accuracy [61].
Digital vs. Traditional Pathology Workflows for PD-L1 Scoring
Table 3: Essential Research Reagents and Platforms for PD-L1 Scoring
| Category | Specific Products/Platforms | Research Application | Key Characteristics |
|---|---|---|---|
| IHC Assays | PD-L1 IHC 22C3 pharmDx (Agilent/Dako) | Gold standard for TPS/CPS assessment | FDA-approved companion diagnostic; used with Autostainer Link platforms |
| IHC Assays | PD-L1 IHC SP263 (Ventana) | Comparative assay validation | Comparable sensitivity to 22C3; used on Benchmark Ultra system |
| IHC Assays | PD-L1 CAL10 (Leica Biosystems) | Novel assay development | Lower 95% CI bound of OPA vs SP263 of 86.2% at ≥50% TPS; BOND-III platform |
| Staining Platforms | Ventana BenchMark ULTRA | Automated IHC staining | Standardized processing for SP263 and SP142 assays |
| Staining Platforms | Dako Autostainer Link 48 | Automated IHC staining | Optimized for 22C3 pharmDx assay |
| Staining Platforms | Leica BOND-III | Automated IHC staining | Development platform for CAL10 assay |
| Digital Pathology | Philips Intellisite Pathology Solution | Whole slide imaging | High-resolution TIFF file export for bioimage analysis |
| Digital Pathology | 3DHISTECH PANNORAMIC 1000 | Whole slide imaging | 40× magnification scanning (0.25μm/pixel) for AI analysis |
| Digital Pathology | Aperio GT 450 | Whole slide imaging | Digital read concordance with glass slides (94% at ≥1% TPS) |
| Bioimage Analysis | QuPath | Open-source bioimage analysis | Manual annotation of tumor cells, immune cells, PD-L1+ populations |
| Bioimage Analysis | PEERCE Library | AI-assisted TPS prediction | Angiosarcoma-focused; open-source pipeline for rare cancers |
| Bioimage Analysis | Custom AI Pipelines (MobileNet-v2, U-Net, YOLO) | Automated CPS quantification | Integrated patch classification, segmentation, and cell detection |
The comparative analysis of TPS and CPS scoring systems reveals a complex landscape where clinical utility is highly context-dependent, varying by cancer type, therapeutic agent, and specific clinical setting. While TPS offers simplicity and superior inter-rater reliability, particularly in NSCLC, CPS provides a more comprehensive assessment of the tumor immune microenvironment that has proven valuable in gastrointestinal cancers and HNSCC. The ongoing development of quantitative continuous scoring systems and AI-assisted pipelines promises to address current limitations in inter-observer variability and specimen-related heterogeneity [41] [58]. Furthermore, the emergence of novel assessment metrics like Tumor Area Positivity (TAP) suggests that the evolution of PD-L1 scoring is ongoing, with future systems potentially incorporating spatial relationships and multiplexed biomarker information [60]. For researchers and drug development professionals, selection between TPS and CPS must be guided by specific cancer indications, available tissue specimens, and the growing arsenal of computational tools that enhance scoring precision and reproducibility.
Laboratory-developed tests are in vitro diagnostic products designed, manufactured, and used within a single clinical laboratory [62]. Historically, the U.S. Food and Drug Administration exercised enforcement discretion over LDTs, but the regulatory landscape has undergone significant changes. In 2024, the FDA announced a final rule to phase out its general enforcement discretion approach, though this rule was subsequently vacated by a federal court in 2025 [63] [64]. This regulatory uncertainty forms the critical backdrop against which laboratories must develop and implement LDTs, particularly for complex applications such as PD-L1 immunohistochemistry testing in cancer.
The significance of LDTs is particularly pronounced in specialized domains like PD-L1 detection for immunotherapy selection. With different immune checkpoint inhibitors linked to specific companion diagnostic assays, laboratories face practical challenges in offering multiple commercial tests [65]. LDTs provide a vital pathway for laboratories to expand testing capabilities using existing platforms, thereby increasing patient access to essential predictive biomarkers without being constrained by proprietary instrument systems [65] [66].
Table 1: Key Differences Between LDTs and Commercial IVDs
| Aspect | Laboratory-Developed Tests (LDTs) | Commercial IVDs |
|---|---|---|
| Regulatory Oversight | CLIA certification, CAP inspections [67] | FDA premarket review (510(k), PMA) [68] [62] |
| Development Flexibility | Rapid adaptation and modification capabilities [66] | Fixed design without modifications allowed [6] |
| Content Control | Laboratory controls target selection and relevance [66] | Manufacturer determines content |
| Implementation Timeline | Relatively quick development and validation [66] | Lengthy development and regulatory review |
| Cost Structure | Lower cost per test [66] | Higher development costs incorporated into pricing |
| Technical Support | Laboratory self-sufficient | Manufacturer-provided technical support [66] |
| Test Consolidation | Multiple analytes possible in single test [66] | Typically focused on specific analytes |
| Distribution Scope | Limited to developing laboratory | Broad distribution across multiple laboratories [66] |
LDTs offer several distinct advantages that make them particularly valuable in specialized clinical and research settings. They provide laboratories with direct control over test content, enabling the selection of specific and relevant targets tailored to patient populations [66]. This flexibility extends to the ability to rapidly develop and modify tests in response to emerging clinical needs, which proved crucial during public health emergencies such as the COVID-19 and mpox outbreaks [67]. Additionally, LDTs enable test consolidation, allowing multiple analytes to be measured in a single test, which can provide more comprehensive data per sample and potentially accelerate diagnostic processes [66].
Commercial IVDs, in contrast, benefit from established quality systems with design and manufacturing controls required for FDA clearance [66]. They offer clinical validity demonstrated through extensive validation studies, and users have access to manufacturer technical support for troubleshooting [66]. The broad distribution of commercial tests across multiple laboratories generates substantial collective data that can reinforce confidence in test performance [66].
Table 2: Performance Comparison of PD-L1 Immunohistochemistry Assays
| Assay Type | Comparison | TPS ≥1% Cutoff | TPS ≥50% Cutoff | Study Details |
|---|---|---|---|---|
| LDT (22C3 on VENTANA) | vs. 22C3 DAKO | 94.6% OPA [65] | 91.8% OPA [65] | 85 NSCLC cases [65] |
| LDT (22C3 on VENTANA) | vs. SP263 VENTANA | 95.0% OPA [65] | 93.8% OPA [65] | 85 NSCLC cases [65] |
| Commercial (22C3 DAKO) | vs. SP263 VENTANA | 91.8% OPA [65] | 96.5% OPA [65] | 85 NSCLC cases [65] |
| Novel CAL10 (Under Development) | vs. SP263 VENTANA | ≥94.0% OPA (95% CI) [56] | ≥86.2% OPA (95% CI) [56] | 136 NSCLC samples [56] |
A comprehensive meta-analysis of 22 publications encompassing 376 assay comparisons revealed crucial insights about PD-L1 assay interchangeability [6]. The analysis established that for a testing laboratory unable to use an FDA-approved companion diagnostic, developing a properly validated LDT for the same purpose as the original PD-L1 FDA-approved immunohistochemistry companion diagnostic is preferable to replacing it with another FDA-approved companion diagnostic developed for a different purpose [6].
This research further determined that LDTs can achieve diagnostic accuracy meeting the clinically acceptable threshold of ≥90% sensitivity and specificity for stated clinical applications when properly validated [6]. However, the performance of LDTs shows greater variability compared to FDA-approved assays due to differences in immunohistochemistry protocol conditions across laboratories, even when using the same primary antibody and automated instrument platform [6].
The development and validation of PD-L1 LDTs follow rigorous experimental protocols to ensure analytical reliability. In a study comparing laboratory-developed and commercial PD-L1 assays, researchers implemented a systematic approach [65]:
Tissue Sample Selection: The study utilized 85 non-small cell lung carcinoma cases from surgical resections, with patient ages ranging from 40 to 88 years. The cohort included both adenocarcinoma and squamous cell carcinoma subtypes to represent the spectrum of NSCLC [65].
Assay Configuration: The LDT was developed using 22C3 antibody on the VENTANA BenchMark ULTRA platform, contrasting with the commercial 22C3 pharmDx Assay designed for the Dako Autostainer Link 48 platform. This cross-platform application required careful optimization of staining conditions [65].
Staining and Evaluation: Triplicate glass slides were stained for each case, including the target stain, H&E for morphological reference, and appropriate negative control isotypes. Staining intensity was quantitatively assessed, with the 22C3 Dako assay producing more intense membrane staining compared to both Ventana platform assays [65].
Statistical Analysis: Overall percent agreement (OPA) was calculated for key clinical cutoffs (TPS ≥1% and ≥50%), with concordance determined through direct comparison of paired sample results across platforms [65].
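As a minimal sketch of the agreement calculation described above, the following Python snippet dichotomizes paired TPS values at a cutoff and reports the share of cases on which two assays agree. The function name and the five paired scores are hypothetical illustrations, not data or code from the cited study.

```python
from typing import Sequence


def overall_percent_agreement(
    tps_a: Sequence[float], tps_b: Sequence[float], cutoff: float
) -> float:
    """Return OPA (%) between two assays after dichotomizing TPS at `cutoff`."""
    if len(tps_a) != len(tps_b) or not tps_a:
        raise ValueError("need two non-empty, equally sized, paired score lists")
    # A case is concordant when both assays fall on the same side of the cutoff.
    concordant = sum((a >= cutoff) == (b >= cutoff) for a, b in zip(tps_a, tps_b))
    return 100.0 * concordant / len(tps_a)


# Hypothetical paired TPS values (%) for five cases (illustrative only).
ldt_scores = [0, 2, 10, 55, 80]
cdx_scores = [0, 0, 15, 45, 90]

opa_1 = overall_percent_agreement(ldt_scores, cdx_scores, cutoff=1)
opa_50 = overall_percent_agreement(ldt_scores, cdx_scores, cutoff=50)
```

In this toy example one case is discordant at each cutoff, which shows how a single borderline score near 1% or 50% can shift the agreement estimate in a small cohort.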
A more recent development study of the novel PD-L1 CAL10 assay on the BOND-III platform demonstrates advanced validation methodologies [56]:
Study Design: The feasibility analysis included 136 formalin-fixed paraffin-embedded NSCLC tissue samples, with case selection following strict inclusion criteria requiring 60-70% adenocarcinomas, 30-40% squamous cell carcinomas, and representation of both primary and metastatic sites [56].
Pre-screening Protocol: All cases were pre-characterized using the BOND RTU PD-L1 (73-10) clone to establish baseline PD-L1 expression across the 0-100% TPS range [56].
Digital Pathology Integration: After traditional manual assessment, CAL10-stained glass slides were scanned using the Aperio GT 450 scanner to generate whole slide images. Pathologists re-evaluated the digital images after a 4-month washout period to assess concordance between manual and digital reading modalities [56].
Statistical Framework: A one-sided, exact non-inferiority test for a single proportion with a 0.05 type 1 error rate was applied to demonstrate non-inferiority of the CAL10 assay to the SP263 comparator [56].
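A one-sided exact test for a single proportion can be sketched with the binomial distribution, assuming the proportion of interest is the agreement rate between the two assays and that it is compared against a pre-specified performance threshold. The threshold of 0.85 and the counts below are hypothetical placeholders; the study's actual margin is not stated here.

```python
from math import comb


def exact_noninferiority_p_value(agreements: int, n: int, threshold: float) -> float:
    """One-sided exact binomial p-value for H0: p <= threshold vs H1: p > threshold."""
    if not 0 <= agreements <= n or not 0.0 < threshold < 1.0:
        raise ValueError("invalid counts or threshold")
    # Upper-tail probability of observing at least `agreements` successes
    # when the true agreement rate equals the threshold.
    return sum(
        comb(n, k) * threshold**k * (1.0 - threshold) ** (n - k)
        for k in range(agreements, n + 1)
    )


# Hypothetical numbers: 132 of 136 paired reads agree; 0.85 is an assumed
# performance goal, not the threshold used in the cited study.
p_value = exact_noninferiority_p_value(agreements=132, n=136, threshold=0.85)
non_inferior = p_value < 0.05  # 0.05 type 1 error rate, as in the text
```

Rejecting the null hypothesis at the 0.05 level supports the claim that the true agreement rate exceeds the chosen threshold; it does not by itself show that the two assays are interchangeable.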
Diagram 1: PD-1/PD-L1 Signaling Pathway and Immunotherapy Mechanism. This diagram illustrates how tumor cells expressing PD-L1 interact with PD-1 receptors on T-cells to suppress immune response, and how immune checkpoint inhibitors block this interaction to restore anti-tumor immunity.
Table 3: Key Research Reagent Solutions for PD-L1 Assay Development
| Reagent/Platform | Function | Application Notes |
|---|---|---|
| Primary Antibodies (22C3, SP263, CAL10) | Specific PD-L1 epitope binding | Clone selection affects staining intensity and interpretation [65] [56] |
| Automated Staining Platforms (Dako Link 48, VENTANA BenchMark, BOND-III) | Standardized assay execution | Platform choice affects staining patterns; cross-platform development requires optimization [65] [56] |
| Detection Systems | Signal amplification and visualization | Platform-specific detection chemistry impacts sensitivity [6] |
| Tissue Controls (Tonsil, Placenta) | Assay performance verification | Multi-tissue blocks used for process control [56] |
| Digital Pathology Scanners (Aperio GT 450) | Whole slide imaging for analysis | Enables digital read concordance studies [56] |
| NIST Standard Reference Material 1934 | Metrological traceability | Enables quantitative comparison across different PD-L1 assays [69] |
Diagram 2: LDT Development and Implementation Workflow. This flowchart outlines the key phases in developing and implementing laboratory-developed tests, from initial concept through post-market monitoring, highlighting critical validation and verification steps.
The regulatory environment for LDTs remains dynamic and complex. The FDA's 2024 final rule sought to establish a four-year phaseout of enforcement discretion, citing concerns about modern LDTs being used more widely for critical healthcare decisions despite variable performance [62]. However, the 2025 federal court decision vacating this rule affirmed that LDTs constitute services rather than devices, placing them outside FDA medical device authorities [63] [64].
This regulatory uncertainty necessitates strategic planning for laboratories developing PD-L1 LDTs. The fundamental framework should include proper validation following Clinical and Laboratory Standards Institute protocols, ongoing performance monitoring, and rigorous adherence to CLIA requirements [68]. Furthermore, laboratories should implement comprehensive quality management systems that address pre-analytical, analytical, and post-analytical phases of testing [67].
Future developments in PD-L1 testing will likely focus on improved standardization through reference materials traceable to NIST standards, which enable quantitative comparison across different PD-L1 IHC assays [69]. Additionally, the integration of digital pathology and artificial intelligence for scoring may reduce inter-pathologist variability, which has been identified as a significant factor in PD-L1 assessment [65] [6].
The development and implementation of LDTs for PD-L1 detection represent a critical capability for modern clinical laboratories, particularly in the context of evolving regulatory frameworks and the need for accessible cancer immunotherapy biomarkers. When properly validated, LDTs demonstrate performance comparable to commercial assays: the meta-analysis evidence indicates that they can achieve the clinically acceptable threshold of ≥90% sensitivity and specificity [6]. The strategic implementation of PD-L1 LDTs requires careful attention to analytical validation, clinical concordance studies, and ongoing quality management to ensure reliable patient results for immunotherapy selection.
The accurate assessment of programmed death-ligand 1 (PD-L1) expression through immunohistochemistry (IHC) is a critical predictive biomarker for patient selection in immune checkpoint inhibitor therapy. The reliability of this biomarker, however, is profoundly influenced by pre-analytical variables, particularly the type of tissue specimen analyzed. In clinical practice and research, pathologists and researchers encounter diverse specimen types including surgical resections, biopsies, and cytology cell blocks, each with distinct structural properties and technical challenges. Understanding how PD-L1 expression varies across these different specimen types, and the implications for assay performance, is essential for accurate treatment stratification in oncology, particularly for non-small cell lung cancer (NSCLC) and head and neck squamous cell carcinoma (HNSCC). This guide objectively compares PD-L1 testing performance across different tissue specimens, supported by experimental data and detailed methodologies from recent studies.
Comparative studies have demonstrated significant heterogeneity in PD-L1 expression when measured across different specimen types from the same patients. This variability presents substantial challenges for consistent biomarker interpretation and patient selection.
Table 1: Comparison of PD-L1 Expression Across Specimen Types in HNSCC
| Specimen Type | TPS Comparison | CPS Comparison | Statistical Significance | Study Details |
|---|---|---|---|---|
| Preoperative Biopsy | Lower than resection | Lower than resection | p < 0.01 for both TPS and CPS | 68 HNSCC cases; 22C3 assay [57] |
| Surgical Resection | Reference standard | Reference standard | Reference for comparisons | Digital analysis with QuPath [57] |
| Metastatic Lymph Node | Lower than resection | Lower than resection | p < 0.01 for both TPS and CPS | Same patients, triple sampling [57] |
| Biopsy vs. Lymph Node | No significant difference | No significant difference | Not statistically significant | Despite different origins [57] |
The observed discrepancies highlight the impact of tumor heterogeneity and sample representation on PD-L1 assessment. Surgical resections provide more comprehensive tumor sampling, potentially capturing the full spectrum of PD-L1 expression patterns, while biopsies and metastatic deposits may only represent subsets of the tumor biology [57]. This has direct implications for clinical trial design and diagnostic accuracy, as specimen type may influence patient eligibility for immunotherapy.
The variability in PD-L1 expression across different specimen types can directly impact treatment decisions, particularly when using standardized scoring cutoffs.
Table 2: Impact of Specimen Type on PD-L1 Scoring and Clinical Implications
| Factor | Impact on PD-L1 Scoring | Potential Clinical Consequence |
|---|---|---|
| Tumor Heterogeneity | Significant expression variability between biopsy/resection/lymph nodes | Possible misclassification of PD-L1 status [57] |
| Specimen Adequacy | Small biopsies may have <100 viable tumor cells | Compromised TPS accuracy [41] |
| Tissue Processing | Varying staining intensity across processors | Inter-laboratory variability [70] |
| Scoring Method | Digital vs. visual assessment differences | Altered patient classification [41] |
The evidence suggests that PD-L1 expression is not uniform across different tumor sites or sampling timepoints, reflecting dynamic changes in the tumor microenvironment [57]. This supports the practice of testing the most recent specimen available, as it best represents the current tumor biology that will encounter the therapeutic agent.
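To make the scoring arithmetic behind these specimen effects concrete, the sketch below computes TPS and CPS from cell counts using their standard definitions and flags specimens with fewer than 100 viable tumor cells. The function names, the default minimum, and the example counts are illustrative assumptions rather than part of any cited protocol.

```python
def tumor_proportion_score(positive_tumor_cells: int, viable_tumor_cells: int) -> float:
    """TPS (%): PD-L1-positive viable tumor cells over all viable tumor cells."""
    if viable_tumor_cells <= 0:
        raise ValueError("no viable tumor cells to score")
    return 100.0 * positive_tumor_cells / viable_tumor_cells


def combined_positive_score(
    positive_tumor_cells: int, positive_immune_cells: int, viable_tumor_cells: int
) -> float:
    """CPS: PD-L1-positive tumor and immune cells per 100 viable tumor cells, capped at 100."""
    if viable_tumor_cells <= 0:
        raise ValueError("no viable tumor cells to score")
    raw = 100.0 * (positive_tumor_cells + positive_immune_cells) / viable_tumor_cells
    return min(raw, 100.0)


def is_adequate(viable_tumor_cells: int, minimum_cells: int = 100) -> bool:
    """Flag specimens below the viable-tumor-cell minimum (assumed here to be 100)."""
    return viable_tumor_cells >= minimum_cells


# Hypothetical small biopsy: 80 viable tumor cells, 8 of them PD-L1 positive,
# plus 12 PD-L1-positive lymphocytes or macrophages.
tps = tumor_proportion_score(8, 80)
cps = combined_positive_score(8, 12, 80)
adequate = is_adequate(80)
```

Because both scores share the viable tumor cell count as denominator, a small biopsy magnifies the effect of every miscounted cell, which is one reason adequacy is checked before a score is reported.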
A rigorous experimental design was employed to directly compare PD-L1 expression across different specimen types from the same patients, eliminating inter-patient variability [57].
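Materials and Methods: The study analyzed 68 HNSCC cases in which a preoperative biopsy, a surgical resection, and a metastatic lymph node were available from the same patient. PD-L1 was stained with the 22C3 assay, and TPS and CPS were determined by digital analysis with QuPath [57].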
This comprehensive approach enabled direct comparison of PD-L1 expression in matched specimens, providing unique insights into spatial heterogeneity while controlling for assay variability through standardized staining and analysis protocols.
Figure 1: Experimental workflow for multi-specimen PD-L1 comparison study
Advanced computational pathology approaches have been developed to address limitations of visual PD-L1 scoring, particularly for heterogeneous specimen types.
PD-L1 Quantitative Continuous Scoring (QCS) Protocol:
This methodology demonstrated that digital scoring could identify patient populations with comparable survival benefits to visual scoring but with increased prevalence (54.3% vs. 29.7%), potentially allowing more patients to benefit from immunotherapy [41].
Given the limitations of PD-L1 assessment alone, particularly with variable specimen types, research has explored combination biomarker strategies to improve predictive accuracy.
Table 3: Combined PD-L1 and TILs Biomarker Performance
| Biomarker Combination | PFS Improvement (HR) | OS Improvement (HR) | Number of Studies |
|---|---|---|---|
| PD-L1 alone | 0.67 (CI: 0.49-0.90) | Not significant | 8 evaluable studies [15] |
| TILs alone | Not significant | Not significant | 8 evaluable studies [15] |
| PD-L1 + TILs combined | 0.39 (CI: 0.27-0.57) | 0.42 (CI: 0.31-0.56) | 7 of 7 studies showed benefit [15] |
The synergistic effect of combining PD-L1 with tumor-infiltrating lymphocytes (TILs) suggests that comprehensive assessment of the tumor immune microenvironment may compensate for limitations of individual biomarker assessment in specific specimen types [15].
The impact of pre-analytical variables on PD-L1 testing consistency across different specimen types cannot be overstated. A systematic evaluation of tissue processing demonstrated significant technical variability:
Tissue Processing Evaluation Protocol:
Key Findings: Tissue processor C demonstrated 50.7% artifact incidence, and SP142 PD-L1 staining was considered inadequate for evaluation in 29.2% of cases after processing with this system, highlighting how technical processing variables interact with different antibody clones to affect result reliability [70].
Figure 2: Factors influencing PD-L1 expression variability and clinical impact
Table 4: Essential Research Reagent Solutions for PD-L1 Specimen Studies
| Reagent/Platform | Function | Application Notes |
|---|---|---|
| PD-L1 IHC 22C3 pharmDx | Companion diagnostic assay | FDA-approved for pembrolizumab; high concordance with SP263 [56] [71] |
| PD-L1 IHC SP263 Assay | Complementary diagnostic antibody | Ventana platform; comparable to CAL10 in development [56] |
| Ventana BenchMark ULTRA | Automated staining platform | Standardized IHC processing; reduces technical variability [57] |
| QuPath Bioimage Analysis | Digital pathology platform | Open-source solution for CPS/TPS calculation [57] |
| Aperio GT 450 Scanner | Whole slide imaging | High-resolution digitization for quantitative analysis [56] |
| FFPE Tissue Blocks | Specimen preservation | Standard material for PD-L1 IHC; enables archival studies [57] |
| NIST Traceable Calibrators | Assay standardization | Quantitative comparison across laboratories and assays [72] |
The comparative analysis of PD-L1 testing across different specimen types reveals substantial technical and biological challenges that impact biomarker reliability. Surgical resection specimens generally provide the most comprehensive assessment of PD-L1 expression but are not always available in advanced disease settings. Biopsies and metastatic lymph node specimens demonstrate significant variability in PD-L1 expression compared to matched resections, potentially leading to different treatment classifications. The integration of digital pathology solutions and standardized processing protocols can mitigate some variability, while combination biomarker approaches incorporating TILs may provide more robust predictive value across diverse specimen types. Researchers and clinicians should prioritize specimen quality standardization and consider implementing computational pathology solutions to improve consistency in PD-L1 assessment across different tissue specimens.
In the era of precision oncology, the accurate assessment of biomarkers like Programmed Death-Ligand 1 (PD-L1) through immunohistochemistry (IHC) is crucial for identifying patients eligible for immune checkpoint inhibitor therapy. Pre-analytical factors, encompassing all procedures from tissue collection to antigen retrieval, represent critical variables that significantly influence the reliability and reproducibility of IHC results. Variations in fixation timing, tissue processing protocols, and storage conditions can profoundly affect epitope preservation, potentially leading to false-negative or false-positive interpretations that directly impact therapeutic decisions. For predictive biomarkers such as PD-L1, which guide treatment with agents like pembrolizumab in non-small cell lung cancer (NSCLC) and triple-negative breast cancer (TNBC), standardized pre-analytical workflows are not merely recommendations but essential components of quality assurance in pathology practice [73] [74].
The complexity of PD-L1 as a biomarker, with its dynamic expression patterns and multiple FDA-approved companion diagnostic assays, further underscores the necessity of controlling pre-analytical variables. Evidence indicates that suboptimal tissue handling can alter staining intensity and distribution, compromising the clinical utility of this critical biomarker. This guide systematically examines the experimental evidence quantifying the effects of key pre-analytical factors on PD-L1 IHC performance, providing researchers and drug development professionals with evidence-based protocols to ensure analytical validity in both research and diagnostic contexts.
The interval between tissue resection and formalin fixation, termed cold ischemia time, represents one of the most critical pre-analytical variables affecting IHC quality. A systematic investigation into fixation delays utilized lung resection specimens from NSCLC tumors larger than 4 cm, collecting ten samples per case subjected to different fixation protocols [75]. Researchers created tissue microarrays (TMAs) from these samples and stained them with 20 different antibodies, including PD-L1 (clones 22C3 and E1L3N), scoring for staining quality and intensity using a standardized scoring system.
The experimental design included samples with delayed fixation (1 hour, 6 hours, 24 hours, 48 hours, and 96 hours) alongside standard fixation controls (0 hours delay) and prolonged fixation samples (2 days, 4 days, 7 days). This comprehensive approach allowed for direct comparison of fixation timing effects across multiple biomarkers relevant to lung cancer diagnosis and treatment [75].
Table 1: Effects of Fixation Delay on IHC Quality in NSCLC Tissue
| Parameter Assessed | Findings with Delayed Fixation | Statistical Significance |
|---|---|---|
| TMA Core Loss | 35% core loss vs. 27% in prolonged fixation | p<0.01 for multiple markers |
| Tissue Quality Deterioration | Significant reduction in interpretable cores | Score 5 (poor quality) increased |
| PD-L1 Expression | Reduction in immunoreactivity | Significant decrease |
| Cytokeratin Markers | Reduced expression of CK7, CAM 5.2, Keratin MNF116 | Significant decrease |
| Diagnostic Markers | Reduced TTF-1, Napsin A, CK5/6 expression | Significant decrease |
The findings demonstrated that delayed fixation negatively affected tissue morphology and antigen preservation, with samples experiencing fixation delays showing significant loss of TMA cores on glass slides and deterioration of tissue quality [75]. This resulted in measurable reduction in expression levels across multiple immunohistochemical markers, including those with diagnostic relevance (cytokeratins, TTF-1) and predictive value (PD-L1). In contrast, prolonged fixation (up to 7 days) showed no significant adverse effects on IHC performance, suggesting that extended formalin exposure is less detrimental than delayed fixation [75].
The degradation of protein epitopes during delayed fixation occurs through multiple mechanisms. Tissue ischemia triggers enzymatic degradation pathways, including protease and phosphatase activation, which modify protein structure and compromise antibody binding sites. Additionally, oxidative damage and pH changes in non-fixed tissues can alter protein conformation, particularly affecting phosphorylation-dependent epitopes. The variation in sensitivity to ischemia across different markers reflects differences in epitope stability and the specific structural requirements for antibody recognition [73].
The duration of formalin-fixed paraffin-embedded (FFPE) tissue block storage represents an underappreciated pre-analytical variable with significant implications for PD-L1 IHC validity. A recent investigation examined 63 triple-negative breast cancer cases with PD-L1 testing using the 22C3 pharmDx assay, evaluating immunoreactivity decline relative to storage duration [74]. The study employed a retrospective design, repeating PD-L1 IHC on the same FFPE blocks after varying storage intervals and comparing results with baseline assessments conducted at initial diagnosis.
Table 2: PD-L1 Immunoreactivity Decline with FFPE Block Storage
| Storage Duration | Percentage Showing Decreased Staining | False-Negative Risk |
|---|---|---|
| <1 year | 0% | Minimal |
| 1-2 years | 11% | Low |
| 2-3 years | 13% | Moderate |
| ≥3 years | 50% | High |
The results demonstrated a striking time-dependent decline in PD-L1 immunoreactivity, with 50% of initially PD-L1-positive cases showing significantly reduced staining after three or more years of storage at room temperature [74]. This decline has direct clinical implications, as false-negative PD-L1 results could inappropriately exclude patients from potentially beneficial immune checkpoint inhibitor therapy. The study also identified associations between PD-L1 positivity and higher Ki67 proliferation index and nuclear grade, suggesting that particularly aggressive tumors might be disproportionately affected by storage-related false negatives [74].
The degradation of protein epitopes during FFPE block storage results from multiple molecular processes. Protein oxidation, hydrolysis, and continued cross-linking reactions gradually modify antigenic structures, compromising antibody binding affinity. Environmental factors such as storage temperature, humidity, and paraffin quality further influence degradation rates. The particular vulnerability of PD-L1 epitopes to storage conditions underscores the need for standardized archival protocols, especially for retrospective studies utilizing historical tissue blocks [74].
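For retrospective cohorts, the storage bands reported above can be turned into a simple triage helper. This is a hedged sketch: the function name is hypothetical, and assigning exact boundary ages (for example, a block stored for exactly 2 years) to the upper band is an assumption, since the source bands do not specify it.

```python
def storage_decline_fraction(years_in_storage: float) -> float:
    """Fraction of cases with decreased PD-L1 staining reported for the storage band."""
    if years_in_storage < 0:
        raise ValueError("storage duration cannot be negative")
    if years_in_storage < 1:
        return 0.0
    if years_in_storage < 2:  # 1-2 years
        return 0.11
    if years_in_storage < 3:  # 2-3 years
        return 0.13
    return 0.50  # 3 years or more


# Hypothetical archive: block identifiers mapped to storage duration in years.
archive = {"block_A": 0.5, "block_B": 2.4, "block_C": 5.0}

# Flag blocks whose band shows any reported decline, so results can be interpreted
# with caution or a fresher specimen can be requested.
flagged = sorted(b for b, yrs in archive.items() if storage_decline_fraction(yrs) > 0)
```

The fractions come from a single 63-case TNBC series stained with the 22C3 pharmDx assay, so they are better read as a prompt for caution than as calibrated risk estimates for other tumor types or clones.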
While standard PD-L1 IHC remains widely used for predicting response to immune checkpoint inhibitors, emerging evidence suggests that alternative methodologies may offer superior predictive performance. A comprehensive network meta-analysis compared seven different testing methodologies for predicting response to PD-1/PD-L1 inhibitors, analyzing 144 diagnostic index tests from 49 studies encompassing 5,322 patients [76].
Table 3: Comparative Performance of PD-L1 Detection Methodologies
| Methodology | Headline Accuracy Metric | Estimate (Reported Interval) | Diagnostic Odds Ratio | Best Application Context |
|---|---|---|---|---|
| Multiplex IHC/IF (mIHC/IF) | Sensitivity | 0.76 (0.57-0.89) | 5.09 | Pan-cancer |
| MSI | Specificity | 0.90 (0.85-0.94) | 6.79 | Gastrointestinal tumors |
| PD-L1 IHC + TMB | Sensitivity | 0.89 (0.82-0.94) | Not reported | NSCLC |
| Standard PD-L1 IHC | Sensitivity and specificity | Variable by clone | Lower than alternatives | Companion diagnostic context |
The analysis revealed that multiplex IHC/immunofluorescence (mIHC/IF) exhibited the highest sensitivity (0.76) and second-highest diagnostic odds ratio (5.09), suggesting superior overall performance in predicting response to anti-PD-1/PD-L1 therapy [76]. Microsatellite instability (MSI) status demonstrated the highest specificity (0.90) and diagnostic odds ratio (6.79), particularly in gastrointestinal tumors. Notably, the combination of PD-L1 IHC with tumor mutational burden (TMB) significantly improved sensitivity to 0.89, indicating that integrated biomarker approaches may outperform single-analyte tests [76].
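The diagnostic odds ratio summarizes sensitivity and specificity in a single number: the odds of a positive test in responders divided by the odds of a positive test in non-responders. The sketch below applies the standard formula to hypothetical inputs; it does not reproduce the pooled estimates of the meta-analysis, which come from a network model rather than from one pair of values.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = [sens / (1 - sens)] / [(1 - spec) / spec]."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    positive_odds_in_responders = sensitivity / (1.0 - sensitivity)
    positive_odds_in_non_responders = (1.0 - specificity) / specificity
    return positive_odds_in_responders / positive_odds_in_non_responders


# Hypothetical test with 80% sensitivity and 75% specificity.
example_dor = diagnostic_odds_ratio(0.80, 0.75)
```

A DOR of 1 means the test does not discriminate at all, and larger values indicate better discrimination. The same DOR can arise from very different sensitivity and specificity trade-offs, so it is best read alongside both.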
Recent advancements in computational pathology have introduced artificial intelligence (AI) approaches for PD-L1 assessment directly from hematoxylin and eosin-stained histological slides. Deep learning algorithms can predict PD-L1 expression patterns while reducing the interobserver variability associated with manual scoring methods like Tumor Proportion Score and Combined Positive Score [77]. These AI-driven tools offer potential for standardized, reproducible PD-L1 assessment while potentially mitigating some pre-analytical challenges through pattern recognition capabilities that may be less affected by subtle epitope degradation.
Table 4: Essential Research Reagents for Pre-analytical Studies
| Reagent/Category | Specific Examples | Research Function |
|---|---|---|
| Fixatives | 10% Neutral Buffered Formalin, Zinc Formalin | Tissue preservation and antigen stabilization |
| PD-L1 Antibody Clones | 22C3, 28-8, SP142, SP263 | Detection of PD-L1 expression with clone-specific characteristics |
| IHC Detection Systems | Polymer-based detection, chromogenic substrates | Signal amplification and visualization |
| Tissue Processing Reagents | Ethanol, xylene, low-melt paraffin | Tissue dehydration, clearing, and embedding |
| Antigen Retrieval Solutions | Citrate buffer (pH 6.0), EDTA/TRIS (pH 9.0) | Epitope unmasking through heat-induced methods |
| Control Materials | Cell line arrays, multitissue blocks | Assay validation and quality control |
Diagram 1: The PD-1/PD-L1 signaling pathway and its relationship to pre-analytical factors in IHC detection. This pathway illustrates how T-cell-derived IFN-γ induces PD-L1 expression on tumor cells, leading to T-cell inhibition upon binding. Immune checkpoint inhibitors block this interaction, restoring anti-tumor immunity. Pre-analytical variables directly impact PD-L1 detection accuracy, which informs treatment decisions.
Diagram 2: Comprehensive experimental workflow for assessing pre-analytical variables in PD-L1 IHC. This workflow systematically evaluates the impact of fixation delay, storage duration, and antigen retrieval methods while controlling for consistent fixative type, storage conditions, and antibody clones. The standardized approach enables quantitative comparison of staining quality across experimental conditions.
The cumulative evidence from controlled studies demonstrates that pre-analytical factors, particularly fixation delay and storage duration, significantly impact PD-L1 IHC reliability, with potential consequences for patient selection in immunotherapy. Fixation delays exceeding 6-12 hours and FFPE block storage beyond three years substantially reduce immunoreactivity, potentially leading to false-negative results that exclude patients from beneficial treatments. Conversely, prolonged fixation appears less detrimental, while emerging methodologies like multiplex IHC/IF and AI-assisted analysis show promise for enhanced predictive performance.
For researchers and drug development professionals, these findings underscore the necessity of implementing standardized pre-analytical protocols across institutions. Specific recommendations include minimizing cold ischemia time to under 1 hour when possible, establishing storage duration limits for FFPE blocks used in biomarker studies, and adopting multiplexed approaches where feasible. Future efforts should focus on developing stabilization technologies resistant to pre-analytical variability and establishing universal quality metrics for tissue processing in predictive biomarker analysis. Through rigorous attention to pre-analytical variables, the field can improve the reproducibility and clinical utility of PD-L1 immunohistochemistry in precision oncology.
The accurate assessment of programmed death-ligand 1 (PD-L1) expression via immunohistochemistry (IHC) is a critical component of precision oncology, serving as a primary biomarker for predicting responses to immune checkpoint inhibitors (ICIs) [20]. However, the diagnostic landscape is complicated by the existence of multiple commercially available PD-L1 IHC assays and scoring algorithms, leading to significant challenges in standardization and interpretation. Two major sources of variability, inter-assay differences and inter-observer disagreement, directly impact the reliability of PD-L1 testing and, consequently, patient selection for immunotherapy.
Inter-assay variability arises from the use of different antibody clones, staining platforms, and scoring criteria, which can yield discordant results for the same tumor sample [78]. Simultaneously, inter-observer variability reflects the challenges pathologists face in consistently interpreting IHC stains, particularly for complex scoring systems that evaluate both tumor and immune cells [79]. This guide objectively compares the performance of major PD-L1 assays, summarizes key experimental data on concordance, and details the methodologies used to generate this evidence, providing researchers and clinicians with a clear framework for evaluating assay performance in a regulatory and research context.
Data from multiple comparability studies reveal consistent patterns of performance and concordance among the most widely used PD-L1 IHC assays. The findings are synthesized in the tables below.
Table 1: Inter-Assay Analytical Concordance Across Multiple Tumor Types
| Assay Comparison | Tumor Type | Scoring Method | Concordance Level | Key Findings |
|---|---|---|---|---|
| 22C3 vs 28-8 vs SP263 | Triple-Negative Breast Cancer (TNBC) | IC-score ≥1% / CPS ≥1 | Good Agreement (κ 0.68-0.74 for 22C3/28-8) [80] | 22C3, 28-8, and SP263 showed comparable positivity rates; SP263 was not interchangeable with others for all scores [80]. |
| 22C3 vs 28-8 vs SP263 | Hepatocellular Carcinoma (HCC) | TPS / CPS | Highly Concordant [17] | These three assays demonstrated high concordance, suggesting potential interchangeability [17]. |
| SP142 vs others | TNBC and HCC | IC-score / CPS | Lower Sensitivity & Concordance [80] [17] | SP142 was consistently the least sensitive assay, with lower positivity rates and concordance with other assays [80] [17]. |
| CAL10 vs SP263 | Non-Small Cell Lung Cancer (NSCLC) | TPS ≥1% / ≥50% | High Concordance (OPA ≥94% at TPS ≥1%) [56] | The novel CAL10 assay demonstrated comparable performance to the SP263 assay, meeting pre-defined concordance targets [56]. |
Table 2: Inter-Observer Agreement for PD-L1 Scoring
| Assay | Tumor Type | Scoring Method & Cut-off | Inter-Observer Agreement | Intra-Observer Agreement |
|---|---|---|---|---|
| Four Assays (SP142, SP263, 22C3, 28-8) | TNBC | IC-score ≥1% / CPS ≥1 | Good to Excellent (κ 0.73-0.78) [80] | Not Reported |
| SP142 | Breast Cancer (Various Subtypes) | IC-score ≥1% | Substantial (Fleiss κ 0.654-0.655) [79] | Substantial to Almost Perfect (κ 0.667-0.956) [79] |
| SP142 | Breast Cancer (Various Subtypes) | IC-score (Continuous) | ICC: Good to Excellent (Overall) [79] | ICC: Good to Excellent [79] |
| Four Assays (SP142, SP263, 22C3, 28-8) | HCC | TPS / CPS | Good to Excellent (ICC TPS: 0.946; CPS: 0.809) [17] | Not Reported |
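Several of the agreement values in Table 2 are Fleiss' kappa statistics, which generalize chance-corrected agreement to more than two raters. As a rough illustration of how such a value is obtained, the sketch below implements the standard Fleiss formula in plain Python. The function name `fleiss_kappa` and the example calls are invented for illustration and are not taken from the cited studies.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for categorical calls.

    ratings: one list per case, holding one call per rater
             (every case must be scored by the same number of raters).
    """
    if not ratings:
        raise ValueError("at least one case is required")
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("each case needs the same number (>= 2) of raters")

    category_totals = Counter()
    per_case_agreement = 0.0
    for case in ratings:
        counts = Counter(case)
        category_totals.update(counts)
        # Share of rater pairs that agree on this case.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        per_case_agreement += agreeing / (n_raters * (n_raters - 1))

    observed = per_case_agreement / n_cases
    total_calls = n_cases * n_raters
    # Agreement expected by chance from the overall category frequencies.
    expected = sum((t / total_calls) ** 2 for t in category_totals.values())
    if expected == 1.0:
        return float("nan")  # all calls identical: kappa is undefined
    return (observed - expected) / (1.0 - expected)


# Invented example: three pathologists, four cases, IC-score >= 1% calls.
example_calls = [
    ["pos", "pos", "pos"],
    ["neg", "neg", "neg"],
    ["pos", "pos", "neg"],
    ["neg", "neg", "neg"],
]
print(round(fleiss_kappa(example_calls), 3))
```

A single split vote among four cases is enough to pull kappa well below 1, which is one reason agreement statistics are sensitive to how many cases sit near the cut-off.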
The comparative data presented above are derived from rigorously designed studies. The methodologies of the most comprehensive investigations are detailed below.
A 2021 study directly compared four clinically relevant PD-L1 assays in a cohort of 104 triple-negative breast cancer resection specimens [80].
A 2023 study specifically evaluated the inter- and intra-observer agreement of the SP142 assay in a multi-subtype breast cancer cohort [79].
The relationships between different assays and the sources of variability in PD-L1 testing can be visualized through the following diagrams.
Diagram: Assay Variability and Concordance Factors. This diagram illustrates the primary factors contributing to inter-assay and inter-observer variability in PD-L1 testing, and how they influence the final, harmonized result.
Diagram: Typical PD-L1 Comparability Study Workflow. This diagram outlines the standard workflow for a PD-L1 assay comparability study, from sample processing to statistical analysis of concordance.
Table 3: Essential Research Reagents and Platforms for PD-L1 IHC
| Item | Function in PD-L1 Research | Examples / Notes |
|---|---|---|
| Antibody Clones | Primary antibodies that specifically bind to the PD-L1 epitope. Different clones have varying sensitivities and specificities. | 22C3, 28-8 (Agilent); SP142, SP263 (Roche); CAL10 (Leica) [80] [78] [56]. |
| Automated Staining Platforms | Automated IHC instruments that standardize the staining process to reduce technical variability. | DAKO Autostainer Link 48 (for 22C3, 28-8); VENTANA Benchmark Ultra (for SP142, SP263); BOND-III (for CAL10) [80] [56]. |
| Digital Pathology System | High-resolution slide scanners and software for creating, storing, and analyzing whole slide images (WSIs). | Enables blinded, remote review by multiple pathologists and facilitates computational analysis [80] [79] [56]. |
| Positive Control Tissues | Tissues with known PD-L1 expression levels used to validate staining run performance. | Multi-tissue blocks containing tonsil and placenta are commonly used [56]. Cell line blocks can also serve as controls [79]. |
| Statistical Analysis Tools | Software for calculating concordance metrics and reliability statistics. | R or SPSS software for calculating Fleiss' Kappa, Intraclass Correlation Coefficient (ICC), and Overall Percent Agreement (OPA) [80] [79] [56]. |
The accurate assessment of programmed death-ligand 1 (PD-L1) expression via immunohistochemistry (IHC) represents a critical cornerstone in predicting response to immune checkpoint blockade (ICB) therapy. However, this process is fundamentally complicated by extensive spatial and temporal tumor heterogeneity, which introduces significant variability in biomarker interpretation. These heterogeneity issues manifest as varying PD-L1 expression patterns across different tumor regions, between primary and metastatic sites, and throughout disease progression and treatment. Consequently, limited biopsy samples may fail to capture the complete immunological landscape of a tumor, potentially leading to misclassification of a patient's PD-L1 status and suboptimal treatment decisions. This comparative analysis examines how spatial heterogeneity, assay concordance, and novel detection methodologies influence the reliability of PD-L1 testing, providing researchers and drug development professionals with a structured evaluation of current challenges and emerging solutions in the field of comparative immunohistochemistry assay performance.
Spatial heterogeneity of PD-L1 expression presents a substantial obstacle for reliable biomarker evaluation in esophageal squamous cell carcinoma (ESCC) and other solid tumors. Research demonstrates that PD-L1 expression exhibits significant intratumoral spatial heterogeneity, which can render limited biopsy samples unrepresentative of the overall tumor PD-L1 status [81].
A prospective observational study employed rigorous methodology to quantify spatial heterogeneity in treatment-naïve ESCC patients. The experimental protocol involved:
Multi-region Sampling: For cohort 1 (n=30), four distinct tumor regions larger than 3mm each were sampled using endoscopic biopsy forceps from surgically resected tumors: proximal tumor region (A), distal tumor region (B), surface mid-region (C), and tumor center (D), with regions selected at least 0.5cm apart while avoiding areas of severe necrosis [81].
Complete Tumor Specimen Analysis: For cohort 2 (n=4), the largest longitudinal section along the midline of completely resected tumors was divided into 3 mm × 3 mm regions for comprehensive analysis, with regions included only if they comprised at least one-third tissue area and tumor cells accounted for at least one-third of the tissue area [81].
PD-L1 Quantification: PD-L1 expression was calculated using Combined Positive Score (CPS), defined as the number of PD-L1 stained cells (tumor cells, lymphocytes, macrophages) divided by the total number of viable tumor cells multiplied by 100, with a minimum requirement of 100 viable tumor cells per evaluation area [81].
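The CPS definition above translates directly into a small calculation. The following is a minimal sketch, assuming per-region cell counts are already available from a pathologist or image-analysis tool. The function names and the example counts are hypothetical. Capping the score at 100 follows the usual CPS convention and is not stated in the study description above.

```python
def combined_positive_score(pdl1_tumor_cells, pdl1_lymphocytes, pdl1_macrophages,
                            viable_tumor_cells, min_tumor_cells=100):
    """CPS for one evaluation area, following the definition given in the text."""
    if viable_tumor_cells < min_tumor_cells:
        # The study required at least 100 viable tumor cells per area.
        raise ValueError("region not evaluable: too few viable tumor cells")
    positive_cells = pdl1_tumor_cells + pdl1_lymphocytes + pdl1_macrophages
    score = 100.0 * positive_cells / viable_tumor_cells
    return min(score, 100.0)  # conventional cap (assumption, see lead-in)


def max_cps_across_regions(regions):
    """Highest CPS among evaluable regions; each region is a dict of counts."""
    scores = []
    for region in regions:
        try:
            scores.append(combined_positive_score(**region))
        except ValueError:
            continue  # skip regions that do not meet the cell-count minimum
    if not scores:
        raise ValueError("no evaluable regions")
    return max(scores)


# Invented counts for two biopsy regions of the same tumor.
regions = [
    {"pdl1_tumor_cells": 12, "pdl1_lymphocytes": 5, "pdl1_macrophages": 3,
     "viable_tumor_cells": 200},
    {"pdl1_tumor_cells": 1, "pdl1_lymphocytes": 0, "pdl1_macrophages": 1,
     "viable_tumor_cells": 150},
]
print(max_cps_across_regions(regions))  # 10.0
```

In this toy case the two regions give a CPS of 10.0 and roughly 1.3, so a single biopsy from the second region alone would place the same tumor in a much lower CPS range.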
Table 1: Spatial Heterogeneity of PD-L1 Expression in ESCC
| Assessment Method | Regional Discordance Rate | Key Findings | Reduction Strategy |
|---|---|---|---|
| Multi-region biopsy (4 regions) | Significant regional discordance observed | Limited biopsies often unrepresentative of bulk tumor | Multi-region sampling (3 regions) |
| Complete tumor specimens | Variation across normalized regions | Heterogeneity reduced with sufficiently high CPS | Maximum CPS from multiple regions |
| Correlation with T-cells | N/A | CPS positively correlated with CD8+/CD4+ T-cell density | Standardized biopsy strategy |
The spatial distribution of PD-L1 expression demonstrates a significant correlation with the tumor immune microenvironment. Quantitative analyses of CD8+ and CD4+ T-cell infiltration densities, performed using immunohistochemistry on serial sections from the same field of view used for CPS assessment, revealed positive correlations between CPS and T-cell densities [81]. This relationship underscores the biological connection between immune cell infiltration and PD-L1 upregulation, while simultaneously highlighting how spatial heterogeneity in immune cell distribution can consequently drive heterogeneous PD-L1 expression patterns.
The concordance between different FDA-approved PD-L1 immunohistochemistry assays varies significantly across cancer types, presenting challenges for standardized biomarker implementation. A comprehensive evaluation of four PD-L1 assays in clear cell renal cell carcinoma (ccRCC) revealed substantial differences in detection capabilities and prognostic value [25].
The methodological approach for comparative assay assessment included:
Tissue Microarray Construction: Researchers constructed TMAs from 286 ccRCC tissue samples, enabling standardized evaluation across multiple specimens under uniform conditions [25].
Parallel IHC Staining: Each sample was evaluated using four FDA-approved PD-L1 assays: 22C3, 28-8, SP142, and SP263, with strict adherence to the specific evaluation criteria established for each assay in their respective clinical trials [25].
Assessment Criteria: Evaluation included PD-L1 expression in tumor cells (TC), immune cells (IC), and combined scores where applicable, following manufacturer specifications and clinical trial protocols [25].
Concordance Analysis: Pairwise concordance was assessed using kappa statistics, with prognostic correlation evaluated through cancer-specific survival analysis [25].
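For readers less familiar with the kappa statistic used in the concordance step, the sketch below computes Cohen's kappa for two assays scored on the same cores. It assumes binary positive/negative calls. The function name and the ten example calls are invented, and the cited study's exact software and settings are not described here.

```python
def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two paired lists of categorical calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(calls_a)
    # Observed agreement: fraction of samples with identical calls.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement from each assay's marginal call frequencies.
    labels = set(calls_a) | set(calls_b)
    expected = sum((calls_a.count(label) / n) * (calls_b.count(label) / n)
                   for label in labels)
    if expected == 1.0:
        return float("nan")  # both assays give one identical call throughout
    return (observed - expected) / (1.0 - expected)


# Invented calls for ten TMA cores (1 = PD-L1 positive, 0 = negative).
assay_x = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
assay_y = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(round(cohens_kappa(assay_x, assay_y), 3))
```

Here the two assays agree on 8 of 10 cores, yet kappa is only moderate because most of that agreement is on negatives, which chance alone would often produce. This is relevant when positivity rates are low, as with tumor cell staining in ccRCC.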
Table 2: Performance Comparison of FDA-Approved PD-L1 Assays in ccRCC
| Assay | PD-L1+ in Tumor Cells | PD-L1+ in Immune Cells | Concordance (κ with 28-8) | Prognostic Value |
|---|---|---|---|---|
| 22C3 | 18.9% | 14.7% | 0.52 (TC) | Worse CSS with IC+ |
| 28-8 | 2.1% | 16.1% | Reference | Worse CSS with IC+ |
| SP142 | 2.1% | 2.1% | 0.16 (IC) | Limited prognostic value |
| SP263 | 15.0% | 15.0% | 0.46 (IC) | Worse CSS with combined+ |
The comparative analysis revealed that PD-L1 expression in tumor cells was generally low across all assays in ccRCC, while expression in immune cells showed greater variability, approximately 15% for most assays except SP142, which demonstrated remarkably low positivity [25]. The 28-8 assay showed the highest agreement with other assays, while SP142 was deemed unsuitable for concordance evaluation due to its exceptionally low detection rate [25]. Critically, patients with PD-L1 expression in immune cells assessed using 22C3, 28-8, and SP263 assays showed significantly worse cancer-specific survival, highlighting the clinical implications of assay selection [25].
The following diagram illustrates the experimental workflow for comparative PD-L1 assay evaluation:
Diagram 1: PD-L1 Assay Comparison Workflow. This workflow illustrates the experimental protocol for evaluating four FDA-approved PD-L1 assays using tissue microarrays from clear cell renal cell carcinoma patients.
Advanced computational approaches are being developed to address the challenges posed by tumor heterogeneity in ICB response prediction. A novel heterogeneity-optimized machine learning framework demonstrates how addressing multimodal distribution in cancer data can enhance prediction accuracy [82].
The methodological framework involves:
Heterogeneity-Aware Clustering: Application of K-means clustering (K=2) to stratify patients into biologically distinct "hot-tumor" and "cold-tumor" subgroups based on multimodal tumor data, outperforming hierarchical clustering and DBSCAN alternatives [82].
Subtype-Specific Modeling: Development of separate predictive models for each subgroup (a support vector machine for hot-tumor subtypes and a random forest for cold-tumor subtypes), utilizing seven heterogeneity-associated biomarkers to overcome unimodal distribution assumptions [82].
Validation: The framework demonstrated enhanced ICB response prediction across melanoma, NSCLC, and pan-cancer datasets, achieving a mean accuracy gain of at least 1.24% compared to 11 baseline methods, with consistent performance in independent external validation cohorts [82].
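The stratify-then-model idea described above can be sketched in a few lines. The example below is only an illustration of the pattern: it uses a plain-Python K-means with K=2 and a deterministic start, two invented features per patient instead of the seven biomarkers, and simple threshold rules as stand-ins for the support vector machine and random forest. None of the names, data, or thresholds come from the cited framework.

```python
import math


def kmeans_two_clusters(points, max_iterations=50):
    """Lloyd's algorithm with K=2, started from the first point and the
    point farthest from it (a deterministic choice made for this sketch)."""
    first = points[0]
    farthest = max(points, key=lambda p: math.dist(p, first))
    centroids = [first, farthest]
    labels = []
    for _ in range(max_iterations):
        new_labels = [
            0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


def route_and_predict(sample, centroids, models):
    """Assign a new sample to its nearest cluster, then apply that cluster's model."""
    distances = [math.dist(sample, c) for c in centroids]
    cluster = distances.index(min(distances))
    return cluster, models[cluster](sample)


# Invented, normalized features: (immune infiltration score, PD-L1 score).
patients = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.7),
            (0.1, 0.2), (0.2, 0.1), (0.15, 0.25)]
labels, centroids = kmeans_two_clusters(patients)

# Placeholder per-cluster models (hypothetical threshold rules).
models = [
    lambda s: s[1] >= 0.75,  # stand-in for the "hot-tumor" model
    lambda s: s[1] >= 0.22,  # stand-in for the "cold-tumor" model
]
print(labels, route_and_predict((0.88, 0.85), centroids, models))
```

The design point is that each subgroup gets its own decision rule, so a feature value that signals response in one subgroup need not carry the same meaning in the other.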
Exosomal PD-L1 (exo-PD-L1) has emerged as a promising solution to spatial sampling limitations, offering a systemic, minimally invasive biomarker that captures immune status across tumor sites [83].
The biogenesis and function of exosomal PD-L1 involves:
Formation: Exo-PD-L1 originates from plasma membrane endocytosis, forming early endosomes that develop into multivesicular bodies (MVBs) containing intraluminal vesicles, which are released as exosomes (30-150nm) upon MVB fusion with the cellular membrane [83].
Function: Exo-PD-L1 retains membrane topology, enabling PD-1 binding on T cells and systemic immunosuppression through inhibition of PI3K-AKT and MAPK pathways, restriction of T cell proliferation, and promotion of T cell exhaustion and senescence [83].
Regulation: Interferon-gamma significantly stimulates exo-PD-L1 release as an adaptive immune evasion mechanism, with levels varying by cancer type (lower in "cold" tumors like ovarian cancer, higher in "hot" tumors like melanoma and NSCLC) [83].
The following diagram illustrates the mechanism of exosomal PD-L1 biogenesis and function:
Diagram 2: Exosomal PD-L1 Biogenesis and Function. This diagram illustrates the formation and immunosuppressive mechanism of exosomal PD-L1, from cytokine-stimulated biogenesis to systemic T-cell inhibition.
Table 3: Key Research Reagent Solutions for PD-L1 Heterogeneity Studies
| Reagent/Assay | Manufacturer | Primary Function | Application Context |
|---|---|---|---|
| VENTANA PD-L1 (SP263) Assay | Roche Diagnostics | PD-L1 IHC detection | Companion diagnostic for multiple ICIs; demonstrated cost-effectiveness in NSCLC [84] |
| Dako PD-L1 IHC 22C3 pharmDx | Agilent Technologies | PD-L1 IHC detection | Companion diagnostic for pembrolizumab; used in comparative concordance studies [25] |
| Dako PD-L1 IHC 28-8 pharmDx | Agilent Technologies | PD-L1 IHC detection | Used in comparative assays; showed highest concordance with other tests [25] |
| VENTANA PD-L1 (SP142) Assay | Roche Diagnostics | PD-L1 IHC detection | Demonstrated low positivity in ccRCC; limited prognostic value [25] |
| Tissue Microarray Platform | Multiple vendors | High-throughput tissue analysis | Standardized evaluation of multiple specimens under uniform conditions [25] |
| Exosome Isolation Kits | Multiple vendors | Exo-PD-L1 enrichment | Liquid biopsy approach for systemic PD-L1 assessment [83] |
| Single-cell RNA Sequencing Kits | 10X Genomics, etc. | Tumor microenvironment deconvolution | Identification of immune cell subtypes and spatial ecotypes [85] |
The comparative analysis of PD-L1 immunohistochemistry assays reveals that tumor heterogeneity presents substantial challenges for reliable biomarker assessment, with spatial variation significantly impacting the accuracy of PD-L1 evaluation in esophageal squamous cell carcinoma and other solid tumors. The concordance between FDA-approved assays varies considerably across cancer types, with the SP263 and 22C3 assays generally demonstrating better performance characteristics compared to the SP142 assay in clear cell renal cell carcinoma. Emerging solutions including heterogeneity-optimized computational frameworks, exosomal PD-L1 detection, and standardized multi-region sampling protocols offer promising approaches to overcome these limitations. For researchers and drug development professionals, these findings emphasize the critical importance of selecting appropriate detection methodologies, implementing rigorous sampling strategies, and interpreting PD-L1 expression results within the context of tumor heterogeneity to optimize immunotherapy prediction and patient stratification.
Programmed death-ligand 1 (PD-L1) immunohistochemistry (IHC) is a critical predictive biomarker for identifying patients eligible for immune checkpoint inhibitor therapy across multiple cancer types, including non-small cell lung cancer (NSCLC) and gastric cancer [58] [71]. The accurate assessment of PD-L1 expression via the Combined Positive Score (CPS) or Tumor Proportion Score (TPS) is essential for optimal patient selection. However, the existence of multiple automated staining platforms, different antibody clones, and varied staining protocols introduces significant analytical variability that can compromise the reliability and reproducibility of PD-L1 scoring [71]. This comparison guide objectively evaluates the performance of different automated platforms and staining protocols, providing researchers and drug development professionals with experimental data to optimize their PD-L1 detection systems.
The landscape of automated PD-L1 IHC testing is dominated by two major staining platforms with their associated assays. The Dako Autostainer Link 48 platform runs the 22C3 and 28-8 assays, while the Ventana BenchMark series supports the SP263 and SP142 assays [71] [7]. Each assay was initially developed as a companion diagnostic for specific immune checkpoint inhibitors, creating practical challenges for laboratories that may need to run multiple tests.
Multiple studies have systematically evaluated the concordance between different PD-L1 IHC assays to determine their potential interchangeability. The evidence reveals a complex pattern of agreement and divergence.
Table 1: Interassay Concordance in NSCLC PD-L1 Evaluation
| Compared Assays | Concordance Level | Key Findings | Clinical Implications |
|---|---|---|---|
| 22C3 vs 28-8 vs SP263 | High agreement | Properly validated LDTs show strong correlation | Potential for interchangeability with proper validation |
| SP142 vs others | Lower concordance | Consistently shows different staining characteristics | Not recommended as substitute for other assays |
| All assays with cut-offs | Decreased concordance | Particularly problematic at 1% clinical decision threshold | Hampers interchangeability; requires platform-specific validation |
A systematic review of 27 studies concluded that while high agreement exists between 22C3, 28-8, and SP263 assays, concordance decreases significantly when applying clinical cut-offs, particularly at the critical 1% threshold used for treatment decisions [71]. This finding highlights the challenges in assay interchangeability despite overall staining similarity.
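To make the cut-off effect concrete, the sketch below dichotomizes paired TPS readings from two assays at the 1% and 50% thresholds and reports the fraction of cases classified identically. The TPS values are invented so that the two assays track each other closely on a continuous scale, and the function name is hypothetical.

```python
def agreement_at_cutoff(tps_a, tps_b, cutoff):
    """Fraction of cases that two assays place on the same side of a TPS cut-off."""
    if len(tps_a) != len(tps_b) or not tps_a:
        raise ValueError("need two non-empty lists of equal length")
    same_side = sum((a >= cutoff) == (b >= cutoff) for a, b in zip(tps_a, tps_b))
    return same_side / len(tps_a)


# Invented TPS readings (percent) for eight tumors stained with two assays.
tps_assay_a = [0, 0.5, 1, 2, 30, 55, 80, 0]
tps_assay_b = [0, 1, 0.5, 3, 35, 60, 75, 0]

print(agreement_at_cutoff(tps_assay_a, tps_assay_b, 1))   # 0.75
print(agreement_at_cutoff(tps_assay_a, tps_assay_b, 50))  # 1.0
```

The two discordant cases differ by only half a percentage point, but both straddle the 1% threshold, which illustrates why small staining differences matter most at the lowest cut-off.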
The prospective study by Katayama et al. directly compared three different anti-PD-L1 antibodies (22C3, 28-8, and SP142) in 70 patients with advanced NSCLC treated with combined chemoimmunotherapy [7]. This research revealed that PD-L1 expression levels determined using the 22C3 assay showed the highest correlation with therapeutic response, successfully stratifying patients based on progression-free survival, while the other assays did not reveal remarkable differences in objective response rate or survival [7].
The choice of staining protocol affects more than technical performance; it directly affects clinical outcomes. In the Katayama study, only the 22C3 assay significantly differentiated patient outcomes: those with TPS ≥50% showed significantly longer progression-free survival than those with TPS <50% [7]. This finding underscores how staining protocol selection can influence the quality of clinical decision-making.
Staining protocol optimization must also consider the impact on interpretation consistency. The systematic review (PMC7318295) found that, while interobserver concordance is generally high for all assays, agreement decreases significantly at the 1% cut-off [71]. This is particularly problematic in clinical practice, as discordance between pathologists at this threshold may result in eligible patients being denied valuable treatment options.
Table 2: Quantitative Performance Metrics of Automated PD-L1 Scoring Systems
| Evaluation Metric | Manual Scoring Performance | AI-Assisted Scoring Performance | Improvement |
|---|---|---|---|
| Interobserver Agreement (ICC) | 62% | 74% | +12% |
| Agreement in Challenging Cases (CPS <20) | 19% | 62% | +43% |
| Classification Accuracy | 75% | 88% | +13% |
| Sensitivity | 78% | 96% | +18% |
| Positive Predictive Value | 87% | 88% | +1% |
Recent advances in artificial intelligence (AI) have demonstrated potential to mitigate staining and interpretation variability. A clinical evaluation of the DiaKwant PD-L1 algorithm showed that AI assistance significantly improved interobserver agreement among pathologists, particularly in challenging cases with CPS <20 where ICC improved from 19% to 62% [86]. This demonstrates how computational approaches can complement staining protocol optimization.
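The accuracy, sensitivity, and positive predictive value reported in Table 2 are all derived from a 2×2 comparison of scoring calls against a reference classification. A minimal sketch of that arithmetic follows. The function name and the counts are invented for illustration and do not reproduce the cited evaluation.

```python
def scoring_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity, and PPV from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0 or (tp + fn) == 0 or (tp + fp) == 0:
        raise ValueError("counts do not allow all three metrics to be computed")
    return {
        "accuracy": (tp + tn) / total,   # all correct calls / all calls
        "sensitivity": tp / (tp + fn),   # reference positives that were found
        "ppv": tp / (tp + fp),           # positive calls that were correct
    }


# Invented counts: 100 cases scored against a reference classification.
print(scoring_metrics(tp=40, fp=10, fn=10, tn=40))
```

Because sensitivity depends on false negatives and PPV on false positives, a tool can raise one substantially while leaving the other almost unchanged, which matches the pattern in the table.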
To ensure fair comparison across different automated platforms, researchers have established standardized evaluation methodologies. The typical workflow involves parallel staining of identical tissue samples across different platforms, followed by blinded assessment by multiple pathologists, often supplemented with computational tools.
For AI-assisted evaluation of staining protocol performance, advanced computational frameworks have been developed. These typically employ a multi-stage approach that combines machine learning models for tissue classification, segmentation, and cell detection.
The deep learning framework described in PMC12499557 uses a multi-stage pipeline in which Vision Transformer-based models achieve a 97.54% F1-score in tumor patch classification, while modified DeepLabV3+ architectures attain an 83.47% Dice Similarity Coefficient in tumor region segmentation [87]. The approach shows a high correlation (0.96) with pathologist-derived TPS, providing a methodological foundation for standardized evaluation of staining protocols.
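The two headline metrics quoted for this pipeline, the F1-score and the Dice Similarity Coefficient, are simple overlap measures. The sketch below shows how each is computed on toy inputs. The function names, the tiny masks, and the patch labels are invented, and treating two empty masks as perfect overlap is a convention chosen for this sketch.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks given as nested 0/1 lists."""
    intersection = pred_total = true_total = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            intersection += 1 if (p and t) else 0
            pred_total += 1 if p else 0
            true_total += 1 if t else 0
    if pred_total + true_total == 0:
        return 1.0  # both masks empty (convention assumed here)
    return 2.0 * intersection / (pred_total + true_total)


def f1_score(pred_labels, true_labels):
    """F1 = 2TP / (2TP + FP + FN) for binary patch labels (1 = tumor)."""
    tp = sum(1 for p, t in zip(pred_labels, true_labels) if p and t)
    fp = sum(1 for p, t in zip(pred_labels, true_labels) if p and not t)
    fn = sum(1 for p, t in zip(pred_labels, true_labels) if not p and t)
    if tp + fp + fn == 0:
        return 1.0  # no positives predicted or present (convention assumed here)
    return 2.0 * tp / (2 * tp + fp + fn)


# Toy 2x3 segmentation masks and five patch-level labels.
predicted_mask = [[1, 1, 0], [0, 1, 0]]
reference_mask = [[1, 0, 0], [0, 1, 1]]
print(round(dice_coefficient(predicted_mask, reference_mask), 3))
print(round(f1_score([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]), 3))
```

For binary data the two formulas are algebraically the same quantity; the different names reflect whether the units being compared are pixels in a mask or classified patches.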
Optimizing staining protocols requires specific reagents and platforms designed for automated IHC staining. The following table details essential solutions for researchers conducting comparative studies of automated PD-L1 detection platforms.
Table 3: Essential Research Reagent Solutions for PD-L1 IHC Optimization
| Reagent/Platform | Manufacturer | Primary Function | Application Notes |
|---|---|---|---|
| PD-L1 IHC 22C3 pharmDx | Dako/Agilent | Companion diagnostic for pembrolizumab | Optimized for Dako Autostainer Link 48 platform |
| PD-L1 IHC 28-8 pharmDx | Dako/Agilent | Complementary diagnostic for nivolumab | Shows high concordance with 22C3 and SP263 |
| VENTANA PD-L1 (SP263) | Ventana/Roche | Companion diagnostic for durvalumab | CE marked for pembrolizumab and nivolumab identification |
| VENTANA PD-L1 (SP142) | Ventana/Roche | Complementary diagnostic for atezolizumab | Known for lower tumor cell staining intensity |
| BenchMark Special Stains | Ventana/Roche | Automated special stains platform | Enables customizable protocols with ready-to-use reagents |
| Dako Autostainer Link 48 | Dako/Agilent | Automated IHC staining platform | Standardized staining for 22C3 and 28-8 assays |
| Hydrogen Peroxide (10%) | Various | Melanin bleaching agent | Critical for pigmented specimens; use at 60°C for 25 min |
| Alkaline Phosphatase (AP) | Various | Chromogenic detection | Superior contrast for melanin-rich specimens vs. DAB |
The BenchMark Special Stains system exemplifies platform optimization, offering fully automated baking, deparaffinization, and staining with independent slide heating and customizable protocols to minimize technical variability [88]. For challenging specimens such as melanin-rich cytology samples, optimized bleaching protocols using 10% hydrogen peroxide at 60°C for 25 minutes significantly improve visualization without compromising cellular morphology or antigenicity [89].
Optimization of staining protocols across different automated platforms requires careful consideration of multiple performance factors. The experimental data demonstrates that while high concordance exists between 22C3, 28-8, and SP263 assays, significant differences emerge at clinical decision thresholds, particularly the 1% cut-off critical for treatment eligibility. The 22C3 assay shows particular utility for predicting response to chemoimmunotherapy in NSCLC patients. Integration of AI-assisted scoring systems significantly improves interobserver concordance, especially in challenging cases with low PD-L1 expression levels. Future optimization efforts should focus on standardizing pre-analytical variables, validating platform-specific cut-offs, and incorporating computational tools to maximize staining consistency and scoring reproducibility across different automated platforms.
The accurate assessment of Programmed Death-Ligand 1 (PD-L1) expression through immunohistochemistry (IHC) is essential for identifying patients with cancer who may benefit from immune checkpoint inhibitor therapy [56] [71]. However, borderline cases (samples with PD-L1 expression levels near established clinical cut-offs) and challenging staining patterns present significant interpretation difficulties for pathologists [56] [41]. These challenges contribute to substantial interobserver variability and may affect patient treatment eligibility [77] [71].
This guide objectively compares the performance of various PD-L1 IHC assays and emerging technologies in addressing these complex scenarios, providing researchers and drug development professionals with experimental data and methodologies to enhance assay selection and interpretation protocols.
Multiple studies have systematically evaluated the concordance between different PD-L1 IHC assays, particularly focusing on samples with expression levels near critical clinical thresholds. The following table summarizes key comparative performance data from recent studies.
Table 1: Comparative Performance of PD-L1 IHC Assays in NSCLC
| Compared Assays | TPS Cut-off | Overall Percent Agreement (OPA) | Lower Bound of 95% CI | Sample Size (N) | Reference |
|---|---|---|---|---|---|
| CAL10 vs. SP263 | ≥50% | >86.2%* | 86.2% | 136 | [56] |
| CAL10 vs. SP263 | ≥1% | >94.0%* | 94.0% | 136 | [56] |
| 22C3 vs. 28-8 vs. SP263 | Various | High agreement | Not specified | Systematic Review | [71] |
| 22C3 vs. 28-8 vs. SP263 | ≥1% | Lower concordance | Not specified | Systematic Review | [71] |
Note: Exact OPA values not provided in source; lower bounds of 95% confidence intervals reported instead [56].
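Because Table 1 reports lower confidence bounds rather than point estimates, it may help to see how such a bound relates to an agreement count. The sketch below computes a two-sided 95% Wilson score interval for an overall percent agreement. The source does not state which interval method the study used, so the Wilson method, the function names, the agreement count of 130 out of 136, and the 85% target are all assumptions made for illustration.

```python
import math


def wilson_interval(agreements, n, z=1.959963984540054):
    """Two-sided Wilson score interval for a proportion (default z: about 95%)."""
    if n <= 0 or not 0 <= agreements <= n:
        raise ValueError("need n > 0 and 0 <= agreements <= n")
    p = agreements / n
    denominator = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denominator
    return max(0.0, center - half_width), min(1.0, center + half_width)


def meets_target(agreements, n, target_lower_bound):
    """True if the lower confidence bound reaches a pre-defined target."""
    lower, _ = wilson_interval(agreements, n)
    return lower >= target_lower_bound


# Hypothetical result: 130 of 136 samples classified identically by both assays.
lower, upper = wilson_interval(130, 136)
print(round(lower, 3), round(upper, 3), meets_target(130, 136, 0.85))
```

Judging an assay by the lower bound rather than the point estimate is the more conservative choice, since a small study with high observed agreement can still have a wide interval.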
A systematic review of 27 studies confirmed high interassay concordance for the 22C3, 28-8, and SP263 assays, while properly validated laboratory-developed tests (LDTs) also demonstrated strong agreement with these standardized assays [71]. However, concordance decreases significantly when applying cut-offs, particularly at the 1% threshold, potentially impacting interchangeability in clinical practice [71].
Fundamental differences in analytical sensitivity between PD-L1 assays contribute to discordance in borderline cases. A survey of 41 laboratories utilizing PD-L1 calibrators traceable to National Institute of Standards and Technology (NIST) Standard Reference Material 1934 revealed that the four FDA-cleared PD-L1 assays represent three distinct levels of analytical sensitivity [72].
Table 2: Metrological Characteristics of PD-L1 Assays
| Assay Characteristic | Finding | Impact on Borderline Cases |
|---|---|---|
| Analytic Sensitivity | Varies between assays | Samples near cut-offs may be classified differently |
| Lower Limit of Detection (LOD) | Assay-dependent | Explains positive/negative discrepancies between assays |
| Dynamic Range | Disparate between some assays | Previous harmonization attempts unsuccessful for certain assays |
| LDT Performance | Some indistinguishable from predicate devices | Proper validation critical for reliable results |
This metrological approach explains why previous attempts to harmonize certain PD-L1 assays proved unsuccessful: their dynamic ranges were too disparate and did not adequately overlap [72]. The implementation of standardized calibrators traceable to NIST standards represents an important transition for companion diagnostic testing that could improve patient stratification and test harmonization [72].
A recently published feasibility study exemplifies a rigorous methodological approach for comparing novel PD-L1 assays with established platforms [56].
Sample Characteristics and Inclusion Criteria:
Staining and Reading Procedures:
Digital Pathology Integration:
Statistical Analysis:
Emerging methodologies address borderline cases through granular, continuous scoring systems that move beyond traditional categorical thresholds [41].
Sample Processing and Quality Control:
Quantitative Continuous Scoring Workflow:
Diagram 1: Quantitative continuous scoring workflow.
Biomarker Development and Validation:
Statistical Analysis for Cut-point Identification:
Recent advancements in artificial intelligence (AI) have transformed PD-L1 assessment approaches, particularly for borderline and difficult-to-interpret cases [77]. AI-driven models, especially deep learning algorithms, can predict PD-L1 expression directly from hematoxylin and eosin-stained histological slides, demonstrating high accuracy in estimating PD-L1 expression and predicting responses to immune checkpoint inhibitors across various cancer types [77].
These computational approaches reduce subjectivity associated with manual scoring methods such as Tumor Proportion Score (TPS) and Combined Positive Score (CPS) [77]. Furthermore, integrating AI with multimodal data, including genomics, radiomics, and real-world clinical data, can enhance predictive accuracy and improve patient stratification for immunotherapy [77].
Color variation in IHC-stained images, caused by differences in stain operator protocols, exposure times, and slide scanner specifications, significantly impacts feature extraction and interpretation [90]. A novel color normalization technique based on sparse stain separation and self-sparse fuzzy clustering has been developed specifically for breast cancer IHC-stained images [90].
Methodology and Validation:
This approach adapts techniques previously used for H&E stained images to IHC staining, despite the difference in perceived colors (two in H&E vs. three in IHC), by developing a structure-preserved normalization method specifically optimized for IHC images [90].
The strategic selection of chromogens and development of multiplexing approaches can significantly enhance interpretation of complex staining patterns [91].
Traditional Chromogen Options:
Next-Generation Chromogen Technology: Ventana's DISCOVERY chromogens, based on fluorophores to allow unique color generation and narrow-range light absorption, improve compatibility for in situ hybridization (ISH) and IHC multiplexing [91]. These include:
Translucent Chromogens for Co-localization: The availability of translucent chromogens (Purple, Yellow, Teal) enables visualization of overlapping targets in brightfield IHC or ISH multiplexed assays [91]. When biomarkers of interest are present in the same sub-cellular compartment, these chromogens allow color shifts that indicate co-localization:
Table 3: Key Research Reagent Solutions for PD-L1 IHC Assay Development
| Reagent/Material | Function/Application | Examples/Specifications |
|---|---|---|
| Primary Antibodies | Clone-specific binding to PD-L1 epitopes | CAL10, SP263, 22C3, 28-8, SP142 [56] [71] |
| Staining Platforms | Automated processing of IHC assays | BOND-III (Leica), Benchmark Ultra (Ventana) [56] |
| Detection Systems | Signal amplification and visualization | HRP or AP-based systems with chromogenic substrates [91] [92] |
| Chromogens | Produce visible precipitate at antigen sites | DAB (brown), AP Red, DISCOVERY series (Purple, Yellow, Teal) [91] |
| Counterstains | Provide architectural context and contrast | Hematoxylin (blue nuclei), Nuclear Fast Red (red nuclei), DAPI (fluorescent) [93] |
| Tissue Controls | Assay validation and quality control | Multi-tissue blocks with tonsil, placenta, known positive/negative samples [56] |
| Digital Pathology Systems | Whole slide imaging and computational analysis | Aperio GT 450 scanner, digital image analysis algorithms [56] [41] |
| NIST-Traceable Calibrators | Standardization and harmonization across laboratories | Reference materials with defined unit traceable to SRM 1934 [72] |
| Color Normalization Tools | Standardize color variation between staining batches | Sparse stain separation, self-sparse fuzzy clustering algorithms [90] |
Borderline cases and difficult-to-interpret staining patterns in PD-L1 IHC testing present significant challenges for researchers and drug development professionals. The comparative data and experimental protocols presented in this guide demonstrate that while conventional assays show reasonable concordance, emerging technologies (including quantitative continuous scoring, AI-driven computational pathology, standardized calibration, and advanced chromogen strategies) offer promising avenues for improved objectivity and precision.
The integration of these approaches into standardized laboratory practice requires careful validation and consideration of technical parameters, but holds substantial potential for enhancing patient stratification in immunotherapy. As the field evolves, the adoption of metrologically rigorous tools and computational methods will be crucial for addressing the complexities of PD-L1 expression assessment in borderline cases.
The accurate assessment of biomarker expression through Immunohistochemistry (IHC) has become fundamentally important in the era of targeted therapies and immunotherapy, particularly for biomarkers such as PD-L1. In precision oncology, IHC assays directly influence patient selection for specific treatments, making their rigorous validation a critical component of both diagnostic accuracy and drug development. The College of American Pathologists (CAP) guidelines establish a comprehensive framework for analytical validation and ongoing quality assurance of IHC assays to ensure reproducible, reliable, and clinically meaningful results. This comparative performance analysis examines the validation principles and regulatory requirements for IHC assays, with a specific focus on PD-L1 detection, which serves as a cornerstone for immune checkpoint inhibitor therapy. The complex landscape of PD-L1 detection, characterized by multiple antibody clones, different staining platforms, and varying scoring algorithms, creates an imperative for standardized validation approaches that maintain scientific rigor while accommodating necessary methodological diversity [94] [95].
The validation of IHC assays extends beyond technical performance to encompass clinical utility, as evidenced by the direct linkage between PD-L1 expression levels and patient response to immunotherapies such as atezolizumab, durvalumab, and pembrolizumab. Clinical trials have consistently demonstrated that patients with higher PD-L1 expression (as determined by validated IHC assays) derive greater benefit from these treatments, highlighting the critical importance of accurate detection and quantification [96] [95]. Within this context, CAP guidelines provide the foundational principles for establishing assay reliability, while regulatory bodies like the FDA oversee the approval process for companion diagnostics, creating a multi-layered governance structure for IHC assay validation.
The CAP guidelines for IHC assay validation emphasize a systematic approach to establishing analytical performance characteristics before clinical implementation. These guidelines mandate verification that the assay consistently performs according to stated specifications and intended use. Although the CAP guideline documents themselves are not cited directly here, the foundational principles referenced in the context of PD-L1 assay development and commercialization include the requirements for specificity, sensitivity, precision, and reproducibility [97] [94]. These analytical parameters form the cornerstone of IHC validation, ensuring that staining patterns accurately reflect true antigen expression rather than technical artifacts.
CAP guidelines further require extensive documentation of pre-analytical, analytical, and post-analytical factors that could impact assay performance. Pre-analytical variables include tissue collection, fixation time, processing methods, and antigen retrieval techniques. Analytical factors encompass antibody clone selection, dilution, incubation times, and detection systems. Post-analytical considerations involve scoring methodologies, pathologist training, and result interpretation criteria. This comprehensive approach acknowledges that robust validation must address the entire testing workflow rather than focusing exclusively on the antibody-antigen interaction [97]. For companion diagnostics, CAP guidelines align with FDA requirements, mandating more stringent validation protocols compared to laboratory-developed tests (LDTs), including defined performance thresholds for sensitivity and specificity against clinical endpoints.
The regulatory landscape for IHC assays operates through two primary pathways: FDA-approved companion diagnostics and Laboratory Developed Tests (LDTs). FDA-approved companion diagnostics, such as the Ventana SP263 assay, undergo the most rigorous validation process, requiring evidence of clinical utility obtained through large-scale clinical trials that directly link test results to therapeutic response [97] [94]. These assays must demonstrate robust reproducibility across multiple laboratories and staining platforms, with predefined scoring criteria that maintain consistency in interpretation.
In contrast, LDTs validated according to CAP guidelines implement a similarly structured analytical validation framework but may lack the extensive clinical outcome data required for FDA approval. The 2024 FDA laboratory-developed test rules are expected to significantly impact this landscape, potentially requiring between $566 million and $3.56 billion in compliance expenditures across the industry according to market analyses [97]. This regulatory evolution reflects increasing recognition of the critical role that standardized biomarker detection plays in patient care, particularly for biomarkers like PD-L1 where expression thresholds directly influence treatment decisions.
The performance of PD-L1 IHC assays varies considerably depending on the antibody clone utilized, with different clones exhibiting distinct staining characteristics and interpretation criteria. Recent developments have focused on identifying cost-effective alternatives to commercially approved assays without compromising analytical performance.
Table 1: Comparative Performance of PD-L1 Antibody Clones in Lung Adenocarcinoma
| Antibody Clone | Comparator Clone | Overall Accuracy | Kappa Value | Key Characteristics |
|---|---|---|---|---|
| 3E2 (Novel) | 28-8 | 90.1% | 0.797 | Cost-effective alternative; potential for 30-60% cost savings |
| 3E2 (Novel) | E1L3N | 69.8% | 0.401 | Moderate agreement |
| 3E2 (Novel) | SP263 | 55.4% | 0.262 | Low agreement |
| SP263 (FDA-approved) | N/A | Reference standard | N/A | High reliability but with significant cost barriers |
A 2025 study evaluating the novel 3E2 clone demonstrated its strong agreement with the established 28-8 clone, with an overall accuracy of 90.1% and a kappa value of 0.797, indicating high concordance [94]. This performance profile suggests that the 3E2 clone may represent a viable alternative in resource-constrained settings, potentially expanding access to PD-L1 testing. However, the same study revealed substantially lower agreement with other clones, particularly the SP263 assay, highlighting the significant variability that can exist between different antibody clones despite targeting the same biomarker [94].
Beyond analytical performance, the clinical utility of PD-L1 assays is ultimately determined by their ability to predict treatment response to immune checkpoint inhibitors. Clinical trials have established consistent correlations between PD-L1 expression levels and therapeutic outcomes across multiple cancer types.
Table 2: Clinical Performance of PD-L1 Assays in Predicting Immunotherapy Response
| Clinical Trial | Therapeutic Agent | PD-L1 Threshold | Objective Response Rate | Overall Survival Benefit |
|---|---|---|---|---|
| BIRCH | Atezolizumab | TC3 or IC3 | 34% | Median OS: 23.5 months |
| BIRCH | Atezolizumab | TC2 or IC2 | 18% | Similar OS trend |
| ATLANTIC | Durvalumab | ≥90% | 30.9% | 1-year OS: 50.8% |
| ATLANTIC | Durvalumab | ≥25% | 16.4% | 1-year OS: 47.7% |
| ATLANTIC | Durvalumab | <25% | 7.5% | 1-year OS: 34.5% |
| CheckMate 057 | Nivolumab | PD-L1 positive | 19% | Median OS: 12.2 months |
The BIRCH trial demonstrated that patients with the highest PD-L1 expression (TC3 or IC3) achieved an objective response rate of 34% with atezolizumab monotherapy, compared to 18% for those with intermediate expression (TC2 or IC2) [96]. Similarly, the ATLANTIC trial showed a clear gradient of response based on PD-L1 expression levels, with the highest expression cohort (≥90%) achieving an objective response rate of 30.9%, compared to just 7.5% in the low/negative expression group [96]. These findings underscore the critical importance of accurate PD-L1 quantification in identifying patients most likely to benefit from immunotherapy, thereby validating the clinical utility of properly validated IHC assays.
Robust experimental protocols are essential for meaningful comparison of IHC assay performance. The evaluation of the novel 3E2 PD-L1 antibody clone followed a rigorous methodology that can serve as a template for comparative validation studies [94]:
Sample Cohort Selection: The study utilized 101 formalin-fixed, paraffin-embedded (FFPE) lung adenocarcinoma tissue samples obtained from surgical resections or biopsies between May 2018 and November 2021. Key inclusion criteria included pathological confirmation of lung adenocarcinoma and absence of preoperative chemotherapy, radiotherapy, or targeted therapy, which could potentially alter PD-L1 expression patterns.
Immunohistochemical Staining Protocol: The experimental workflow involved parallel staining of consecutive tissue sections from the same patient blocks using four different antibody clones: the experimental 3E2 clone, Abcam 28-8, CST E1L3N, and Ventana SP263. This direct comparison on adjacent sections from the same tissue blocks minimized pre-analytical variables and enabled direct comparison of staining performance. The protocol included standardized antigen retrieval conditions, antibody incubation times, and detection systems to ensure methodological consistency across compared clones.
Quantitative Analysis and Concordance Assessment: Stained slides were digitally scanned and quantitatively analyzed using image analysis software. Statistical measures included Bland-Altman plots for assessing agreement between continuous measurements, calculation of overall accuracy, and Cohen's kappa coefficient for categorical concordance. The kappa statistic is particularly important in validation studies as it measures inter-rater agreement beyond what would be expected by chance alone, with values above 0.75 generally considered excellent agreement [94].
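The three statistics named above can be sketched in a few lines of standard-library Python. This is an illustration of the textbook definitions, not the analysis code used in the cited study; the function names, the 1.96 multiplier for 95% limits of agreement, and the example data are assumptions made for the sketch.

```python
from collections import Counter
from statistics import mean, stdev


def overall_accuracy(calls_a, calls_b):
    """Fraction of cases where two assays give the same categorical call."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a == b for a, b in zip(calls_a, calls_b)) / len(calls_a)


def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(calls_a)
    p_observed = overall_accuracy(calls_a, calls_b)
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    # Chance agreement from the marginal call frequencies of each assay
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both assays gave one identical category for every case
    return (p_observed - p_expected) / (1.0 - p_expected)


def bland_altman(scores_a, scores_b):
    """Bias and 95% limits of agreement for paired continuous scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical positive/negative calls from two clones on ten cases
clone_a = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos", "neg", "neg"]
clone_b = ["pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos", "neg", "pos"]
accuracy = overall_accuracy(clone_a, clone_b)   # 0.8
kappa = cohen_kappa(clone_a, clone_b)           # about 0.58
```

In this toy example, 80% raw agreement corresponds to a kappa of only about 0.58, which illustrates why the two figures are reported side by side.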
Clinical Correlation: To establish clinical validity, the study included survival analysis of 16 stage III-IV lung adenocarcinoma patients who received immunotherapy, correlating PD-L1 expression levels (as determined by the 3E2 clone) with overall survival. This critical step connects analytical performance to clinical outcomes, demonstrating that PD-L1 expression ≥5% as detected by 3E2 was associated with significantly better survival (p=0.021), mirroring results obtained with the established 28-8 clone (p=0.019) [94].
While conventional IHC detects single biomarkers, multiplex IHC represents an advanced methodology enabling simultaneous detection of multiple biomarkers on a single tissue section. This technology employs tyramide signal amplification (TSA) and antibody stripping techniques to sequentially label multiple antigens without cross-reactivity [98]. The validation of multiplex IHC assays requires additional considerations beyond conventional IHC, including:
Validation of Antibody Stripping Efficiency: Protocols must demonstrate complete removal of primary and secondary antibodies between staining rounds without damaging subsequent epitopes or causing tissue loss.
Spectral Unmixing Validation: For fluorescent multiplex IHC, rigorous validation of spectral unmixing algorithms is essential to ensure that signals from different fluorophores are accurately distinguished without bleed-through or crossover.
Spatial Analysis Validation: Multiplex IHC enables spatial analysis of cellular interactions within the tumor microenvironment. Validation of spatial analysis algorithms requires demonstration of accuracy in cell identification, segmentation, and spatial relationship quantification.
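As one illustration of what "spatial relationship quantification" can mean in practice, the sketch below computes nearest-neighbor distances between two cell populations from their centroid coordinates. It assumes cell detection and phenotyping have already produced (x, y) centroids; the function names, the coordinate units, and the choice of metric are hypothetical and are not drawn from the cited multiplex platforms.

```python
from math import dist


def nearest_neighbor_distances(source_cells, target_cells):
    """For each source cell, distance to the closest target cell."""
    if not target_cells:
        raise ValueError("target_cells must not be empty")
    return [min(dist(s, t) for t in target_cells) for s in source_cells]


def fraction_within(source_cells, target_cells, radius):
    """Share of source cells with at least one target cell within `radius`."""
    distances = nearest_neighbor_distances(source_cells, target_cells)
    if not distances:
        raise ValueError("source_cells must not be empty")
    return sum(d <= radius for d in distances) / len(distances)


# Hypothetical centroids (arbitrary units): tumor cells and immune cells
tumor = [(0, 0), (10, 0)]
immune = [(3, 4), (10, 5)]
distances = nearest_neighbor_distances(tumor, immune)  # [5.0, 5.0]
```

Validating such a metric would involve checking it against manually annotated regions, since errors in cell identification or segmentation propagate directly into the distances.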
The implementation of multiplex IHC offers significant advantages for tumor microenvironment analysis, including preserved spatial relationships between different cell types, reduced tissue consumption, and comprehensive immunoprofiling from limited sample material [98]. However, these benefits come with increased validation complexity, particularly regarding antibody panel optimization and computational analysis validation.
IHC Assay Validation Workflow and Key Components
The validation and implementation of robust IHC assays for PD-L1 detection requires specific reagents and materials that ensure reproducibility, accuracy, and compliance with regulatory standards. The following toolkit encompasses essential components drawn from the cited sources:
Table 3: Essential Research Reagent Solutions for IHC Assay Validation
| Reagent/Material | Function | Examples/Characteristics |
|---|---|---|
| Primary Antibody Clones | PD-L1 Epitope Binding | 3E2, 28-8, SP263, 22C3; Specific clone selection affects staining intensity and pattern |
| Detection System | Signal Amplification | Polymer-based systems; HRP-conjugated secondaries; TSA amplification for multiplex IHC |
| Automated Staining Platforms | Standardization | Ventana Benchmark ULTRA; Leica Bond RX; Enable batch processing and protocol consistency |
| Antigen Retrieval Reagents | Epitope Exposure | Citrate-based (pH 6.0) or EDTA/TRIS-based (pH 9.0) buffers; Optimization required for each antibody |
| Positive Control Tissues | Assay Validation | Placental tissue; tonsil; Known PD-L1 expressing cell lines; Essential for daily run validation |
| Digital Pathology Systems | Quantification and Analysis | Slide scanners with image analysis software; Enable standardized scoring and algorithm deployment |
| Multiplex IHC Reagents | Simultaneous Multi-Marker Detection | Celnovte mIHC kits; Opal systems; Allow TSA-based sequential staining with antibody stripping |
The selection of appropriate primary antibody clones represents perhaps the most critical decision in IHC assay development. Studies comparing the novel 3E2 clone with established assays demonstrated that while cost-effective alternatives exist, thorough validation against reference standards is essential [94]. The growing adoption of multiplex IHC reagents, such as those in the Celnovte product line, reflects the increasing demand for simultaneous evaluation of multiple biomarkers within the spatial context of the tumor microenvironment [98]. These systems typically include enzyme-labeled polymers, TSA dyes across various fluorescence channels (e.g., CM480 to CM780 series), and DAPI nuclear counterstains to facilitate comprehensive tissue analysis.
For automated staining platforms, integration of pre-optimized reagent kits that minimize protocol adjustments enhances reproducibility across laboratories. The consistent performance of these systems depends heavily on standardized antigen retrieval methods and calibrated detection chemistry that maintain lot-to-lot consistency. This is a particular challenge: one cited source reports that approximately two-thirds of commercial antibodies fail basic specificity testing, forcing laboratories to implement costly internal validation procedures [97]. This underscores the importance of obtaining reagents from manufacturers with robust quality control systems, such as those with GMP manufacturing conditions and ISO9001/ISO13485 certifications [98].
The validation of IHC assays for PD-L1 detection represents a dynamic interface between diagnostic pathology, regulatory science, and clinical oncology. CAP guidelines and regulatory requirements establish essential frameworks for ensuring assay reliability, but practical implementation requires careful navigation of technical and economic challenges. The comparative analysis presented herein demonstrates that while cost-effective alternatives to commercial assays continue to emerge (such as the 3E2 clone with its potential for 30-60% cost savings), their adoption must be guided by rigorous validation against both analytical standards and clinical outcomes [94].
The evolving landscape of IHC validation is increasingly shaped by advanced methodologies such as multiplex staining and AI-assisted workflows that enhance quantification objectivity and reproducibility. The integration of digital pathology platforms with cloud-based AI algorithms, as exemplified by systems like Roche's navify digital pathology, creates opportunities for standardized analysis while simultaneously introducing new validation considerations for computational components [97]. Furthermore, the growing emphasis on companion diagnostic approvals continues to raise the validation bar, requiring increasingly robust evidence of clinical utility tied directly to therapeutic response [97] [95].
As immunotherapy treatment paradigms expand across cancer types, the principles of IHC assay validation will continue to evolve in complexity and importance. The fundamental requirement remains unchanged: ensuring that diagnostic assays provide reproducible, accurate, and clinically meaningful results that optimally guide patient care. Through adherence to structured validation frameworks, implementation of comprehensive experimental protocols, and utilization of quality-controlled reagents, laboratories can navigate this challenging landscape while advancing the field of precision cancer diagnostics.
The advent of immune checkpoint inhibitors (ICIs) has revolutionized the treatment landscape for advanced cancers, notably non-small cell lung cancer (NSCLC). Programmed death-ligand 1 (PD-L1) expression on tumor cells, as detected by immunohistochemistry (IHC), serves as a primary predictive biomarker for patient selection in anti-PD-1/PD-L1 therapies [6] [71]. However, a significant challenge has emerged in clinical practice: for each distinct ICI, a unique, corresponding PD-L1 IHC assay was developed and validated within its specific clinical trial [71]. This has resulted in a proliferation of companion and complementary diagnostics (e.g., the Dako 22C3 and 28-8 assays, and the Ventana SP263 and SP142 assays), each with its own protocol, scoring algorithm, and approved clinical purpose [6].
This multiplicity of tests is not practically or economically feasible for most pathology laboratories to implement simultaneously, given constraints of cost, tissue availability, and staining platforms [71]. Consequently, the central question of whether these various PD-L1 IHC assays are "interchangeable" has become a critical area of investigation. This guide objectively compares the performance of standardized PD-L1 assays and laboratory-developed tests (LDTs), framing the discussion within the broader thesis of comparative performance and diagnostic accuracy for PD-L1 detection. It synthesizes evidence from key meta-analyses and systematic reviews to provide researchers, scientists, and drug development professionals with a data-driven resource for understanding assay concordance and its implications for clinical practice and biomarker development.
Data from large-scale meta-analyses and systematic reviews provide the most robust evidence regarding assay comparability. The findings are summarized in the tables below.
Table 1: Key Findings from Meta-Analyses and Systematic Reviews on PD-L1 Assay Interchangeability
| Study Focus | Included Studies & Samples | Key Findings on Interassay Concordance | Key Findings on Interobserver Concordance | Conclusion on Interchangeability |
|---|---|---|---|---|
| Meta-Analysis of Diagnostic Accuracy [6] | 22 studies; 376 assay comparisons; primarily NSCLC. | For specific clinical purposes (e.g., TPS ≥1% or ≥50%), replacing an FDA-approved CDx with another CDx developed for a different purpose often resulted in diagnostic sensitivity/specificity <90%. Properly validated LDTs could be a better alternative. | Not the primary focus. | Assays are not automatically interchangeable for a purpose they were not clinically validated for. A purpose-based approach is essential. |
| Systematic Review of Comparability [71] | 27 studies; sample sizes 15-713 NSCLC specimens. | High analytical concordance between 22C3, 28-8, and SP263 assays, and properly validated LDTs. Lower concordance observed in comparisons involving the SP142 assay. Concordance decreased when using clinical cut-offs. | High interobserver agreement for all assays/LDTs, but lower agreement at the 1% cut-off compared to the 50% cut-off. | Interchangeability is hampered by the use of cut-offs. Discordance at the 1% cut-off may deny patients treatment. |
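The purpose-based comparison summarized in the table above amounts to dichotomizing both assays at a clinical cut-off and treating the approved assay as the reference. The sketch below shows that calculation under stated assumptions: paired TPS values per specimen, a "greater than or equal to" cut-off rule, and a 90% acceptance level taken from the figure quoted in the table. The function names and example values are illustrative, not taken from the meta-analysis.

```python
def diagnostic_accuracy(reference_tps, candidate_tps, cutoff):
    """Sensitivity and specificity of a candidate assay versus a reference
    assay after dichotomizing both at the same TPS cut-off (in percent)."""
    tp = fp = tn = fn = 0
    for ref, cand in zip(reference_tps, candidate_tps):
        ref_pos, cand_pos = ref >= cutoff, cand >= cutoff
        if ref_pos and cand_pos:
            tp += 1
        elif ref_pos:
            fn += 1
        elif cand_pos:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def meets_acceptance(sensitivity, specificity, minimum=0.90):
    """True only if both metrics reach the acceptance level."""
    return sensitivity >= minimum and specificity >= minimum


# Hypothetical paired TPS values (percent) for eight specimens
reference = [0, 0.5, 1, 5, 40, 55, 60, 80]
candidate = [0, 1, 0, 10, 55, 45, 70, 90]
sens_1, spec_1 = diagnostic_accuracy(reference, candidate, cutoff=1)
sens_50, spec_50 = diagnostic_accuracy(reference, candidate, cutoff=50)
```

Running the same paired data through two cut-offs makes the table's point concrete: a candidate assay can look acceptable for one clinical purpose and fall short for another.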
Table 2: Comparative Performance of Different PD-L1 Assay Clones
| Assay Clone | Associated Drug(s) | Comparability with 22C3, 28-8, SP263 | Key Limitations & Considerations |
|---|---|---|---|
| 22C3 | Pembrolizumab | High overall agreement [71]. | FDA-approved as a companion diagnostic. Interchangeability with others is context-dependent [6]. |
| 28-8 | Nivolumab | High overall agreement [71]. | FDA-approved as a complementary diagnostic. |
| SP263 | Durvalumab | High overall agreement [71]. | CE-marked for use with pembrolizumab and nivolumab, indicating recognized similarity [71]. |
| SP142 | Atezolizumab | Lower concordance consistently observed [71]. | Tends to report lower Tumor Cell (TC) staining percentages, leading to more false negatives if used interchangeably [71]. |
Table 3: Pathologist vs. Artificial Intelligence (AI) Performance in PD-L1 Scoring [9]
| Scoring Method | Interobserver Agreement (TPS ≥50%) | Interobserver Agreement (TPS <1%) | Intraobserver Consistency | Agreement with Median Pathologist Score (TPS ≥50%) |
|---|---|---|---|---|
| Pathologists (Light Microscopy/WSI) | Almost perfect (Fleiss' kappa = 0.873) | Moderate (Fleiss' kappa = 0.558) | High (Cohen's kappa: 0.726 - 1.0) | (Reference Standard) |
| AI Algorithm (uPath - Roche) | Not Reported | Not Reported | Not Reported | Fair (Fleiss' kappa = 0.354) |
| AI Algorithm (Visiopharm) | Not Reported | Not Reported | Not Reported | Substantial (Fleiss' kappa = 0.672) |
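Fleiss' kappa, reported in the table above, extends chance-corrected agreement to more than two raters. The following sketch implements the standard formula for the case where every slide is scored by the same number of raters; the input layout (one list of category labels per slide) and the example data are assumptions for illustration, not the format used in the cited study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for `ratings`, a list with one entry per slide, each
    entry being the list of category labels assigned by the raters."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each slide needs the same number (>= 2) of ratings")
    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]
    # Per-slide agreement: proportion of rater pairs that agree
    per_slide = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_slide) / n_subjects
    # Chance agreement from the overall share of each category
    shares = [sum(c[k] for c in counts) / (n_subjects * n_raters)
              for k in categories]
    p_expected = sum(s * s for s in shares)
    if p_expected == 1.0:
        return 1.0  # only one category was ever used
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical calls at a TPS >= 50% cut-off: three raters, four slides
calls = [
    ["high", "high", "high"],
    ["low", "low", "low"],
    ["high", "high", "low"],
    ["low", "low", "high"],
]
kappa = fleiss_kappa(calls)  # about 0.33
```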
The conclusions drawn in the cited meta-analyses and reviews rely on rigorous experimental and methodological protocols. This section details the core methodologies employed by the primary studies that contributed to these aggregated findings.
The meta-analysis by Munari et al. (2020) followed a structured, pre-defined protocol [6]:
The individual studies included in the systematic reviews, such as the Blueprint project, typically employed the following experimental workflow [71]:
The study by Leithner et al. (2025) provides a template for comparing pathologist and AI performance [9]:
The following diagrams, generated using Graphviz, illustrate the core experimental workflow for assay comparison studies and the logical decision pathway for determining assay interchangeability in a clinical context.
The following table details essential materials and reagents used in PD-L1 IHC testing and comparative research.
Table 4: Essential Reagents and Materials for PD-L1 IHC Research
| Item Name | Function/Description | Example Specifics |
|---|---|---|
| FDA-Approved Companion Diagnostic Kits | Standardized, regulatory-cleared tests for determining patient eligibility for specific drugs. | PD-L1 IHC 22C3 PharmDx (Dako/Agilent) for pembrolizumab; PD-L1 IHC 28-8 PharmDx (Dako/Agilent) for nivolumab [71]. |
| Ventana PD-L1 (SP263) Assay | A standardized assay widely used in comparison studies and approved for use with durvalumab. | Often shows high concordance with 22C3 and 28-8 [71]. Used on Ventana BenchMark ULTRA platform [9]. |
| Laboratory-Developed Test (LDT) Antibodies | In-house validated tests using commercially available antibodies, offering potential flexibility and cost savings. | Clones like E1L3N, SP142, etc. Must be properly validated against a reference standard for a specific clinical purpose [6] [71]. |
| Automated IHC Staining Platforms | Instruments that automate the staining process to improve reproducibility and reduce variability. | Dako Omnis or Link 48 (for Dako assays); Ventana BenchMark ULTRA or XT (for Ventana assays) [71]. Platform differences can contribute to variability. |
| Whole-Slide Scanners & Digital Pathology | Hardware for digitizing glass slides, enabling remote review, archival, and use with AI algorithms. | Scanners like the Ventana DP200 (Roche) or PANORAMIC1000 (3DHISTECH) are used to create whole-slide images for pathologist and AI review [9]. |
| AI-Powered Image Analysis Software | Algorithms designed to automatically detect tumor regions and quantify PD-L1 TPS, reducing subjectivity. | Commercial applications include Roche uPath and Visiopharm PD-L1 Lung Cancer TME [9]. Performance compared to pathologists is an active research area. |
Immunohistochemistry (IHC) assays for programmed death-ligand 1 (PD-L1) expression serve as critical companion diagnostics for immune checkpoint inhibitor therapies across multiple malignancies. The comparative performance of these assays varies significantly across different tumor types due to biological differences in PD-L1 expression patterns, scoring algorithms, and tissue microenvironment characteristics. This review systematically evaluates the technical performance of major PD-L1 IHC assays (including the Dako 22C3, Ventana SP263, Ventana SP142, and emerging assays) across three major cancer types: non-small cell lung cancer (NSCLC), head and neck squamous cell carcinoma (HNSCC), and urothelial carcinoma (UC). Understanding these assay-specific performance characteristics is essential for appropriate test selection in both clinical trials and routine practice, ensuring accurate patient stratification for immunotherapy.
In NSCLC, multiple studies have demonstrated strong concordance between the 22C3, SP263, and 28-8 assays, while the SP142 assay consistently shows lower sensitivity for tumor cell staining.
Table 1: PD-L1 Assay Concordance in NSCLC (Tumor Cell Scoring)
| Compared Assays | Statistical Measure | 1% Cut-off | 25% Cut-off | 50% Cut-off | Study |
|---|---|---|---|---|---|
| SP263 vs 22C3 | Kappa (κ) | 0.71 | 0.75 | 0.81 | Ring Study [99] |
| SP263 vs 22C3 | Overall Percent Agreement | - | - | Lower bound 95% CI: 86.2% | CAL10 Development Study [56] |
| Various Assays & LDTs | Systematic Review Conclusion | High agreement between 22C3, 28-8, SP263; Lower concordance with SP142 | - | - | Systematic Review [71] |
The Ring Study, an international comparison conducted across multiple centers, demonstrated almost perfect agreement between SP263 and 22C3 at the 50% cut-off (κ=0.81), which was superior to the agreement at lower cut-offs [99]. This high level of concordance supports potential interchangeability between these assays in NSCLC, particularly at the clinically relevant 50% threshold for pembrolizumab monotherapy. A recent developmental study for the novel CAL10 assay demonstrated comparable performance to the SP263 assay, with the lower bound of the 95% confidence interval for overall percent agreement reaching 86.2% at the ≥50% tumor proportion score (TPS) cut-off [56].
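Verbal labels such as "almost perfect", "substantial", "moderate", and "fair" follow the widely used Landis and Koch bands for kappa. The helper below is a small sketch of that mapping; how values falling exactly on a band edge are assigned is a choice made here for illustration, and other conventions (for example, treating values above 0.75 as excellent) exist.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to its Landis and Koch agreement band."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values quoted for SP263 vs 22C3 at the 1%, 25%, and 50% cut-offs
labels = [landis_koch_label(k) for k in (0.71, 0.75, 0.81)]
# -> ["substantial", "substantial", "almost perfect"]
```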
Table 2: PD-L1 Assay Performance in HNSCC
| Performance Aspect | Findings | Assay | Study |
|---|---|---|---|
| Interobserver Agreement (25% cut-off) | κ=0.60 to 0.82 | SP263 | Ring Study [99] |
| Interobserver Agreement (50% cut-off) | κ=0.64 to 0.90 | SP263 | Ring Study [99] |
| 73-10 Clone Positivity Rate | 79% in HNSCC vs 3% in normal mucosa | 73-10 | High-Sensitivity Study [100] |
| Specimen Type Discrepancies | Significant differences in CPS/TPS between biopsy and resection (p<0.01) | 22C3 | HNSCC Specimen Comparison [57] |
The Ring Study demonstrated that the performance of the SP263 assay in HNSCC is comparable across five different countries, indicating robust international consistency [99]. Recent research has highlighted the impact of specimen type on PD-L1 scoring in HNSCC. A comprehensive study of 68 HNSCC cases found significant discrepancies in both combined positive score (CPS) and TPS between preoperative biopsy and surgical resection specimens (p<0.01), as well as between surgical resection and metastatic lymph nodes (p<0.01) [57]. This heterogeneity emphasizes the challenges in PD-L1 assessment in HNSCC and suggests that specimen type must be considered when interpreting results.
Emerging data on the high-sensitivity 73-10 clone demonstrates its potential utility in HNSCC, with a 79% positivity rate (using tumor cell score â¥1%) in HNSCC compared to only 3% in normal oral mucosa [100]. This clone also correlated with high CD4+ tumor-infiltrating lymphocytes and served as an independent prognostic factor for overall survival, disease-specific survival, and progression-free survival.
Urothelial carcinoma presents unique challenges for PD-L1 assessment due to the importance of immune cell staining and considerable discordance between assays.
Table 3: PD-L1 Assay Performance in Urothelial Carcinoma
| Performance Aspect | Findings | Assays Compared | Study |
|---|---|---|---|
| Interobserver Agreement (TC, 25% cut-off) | κ=0.68 to 0.91 | SP263 | Ring Study [99] |
| Immune Cell Scoring Concordance | Poor to substantial (κ=-0.04 to 0.76) | SP263 | Ring Study [99] |
| Immune Cell Scoring Correlation | Low (CCC=0.10 to 0.68) | SP263 | Ring Study [99] |
| Biological Basis of Discordance | SP142 preferentially detects PD-L1 on dendritic cells | SP142 vs 22C3 | IMvigor130 Analysis [101] |
| Clinical Outcome Association | "22C3-only positive" associated with worse outcomes | SP142 vs 22C3 | IMvigor130 Analysis [101] |
The Ring Study highlighted particular challenges in UC, with low concordance for immune cell staining across different cut-offs (1%, 5%, 10%, and 25%), which may significantly impact treatment decisions [99]. A detailed analysis of specimens from the IMvigor130 trial revealed that discordance between the SP142 and 22C3 assays stems from their detection of biologically distinct PD-L1 expression patterns [101]. The SP142 assay preferentially detects PD-L1-expressing dendritic cells, which are associated with more favorable outcomes to immune checkpoint blockade, while cases positive only by the 22C3 assay (associated with tumor cell-dominant PD-L1 expression) correlated with worse outcomes.
A smaller comparative study of 24 UC cases found generally good concordance among the three antibodies (SP142, SP263, and 22C3), though it noted consistent underestimation of PD-L1 expression by the SP142 clone compared to the others [102].
The Ring Study implemented a rigorous multicenter design to assess PD-L1 assay performance across NSCLC, HNSCC, and UC [99]. Excisional specimens from each cancer type were assayed using the Ventana SP263 platform at three sites in six countries (Australia, Brazil, Korea, Mexico, Russia, and Taiwan). All stained slides were rotated to two other sites for interobserver scoring. In the NSCLC cohort, the same tissue samples were also assessed with the Dako 22C3 pharmDx assay for direct comparison. PD-L1 immunopositivity was scored according to approved algorithms: the percentage of PD-L1-expressing tumor cells for SP263 and tumor proportion score for 22C3. Statistical analysis included kappa statistics for categorical agreement and concordance correlation coefficients for continuous measures.
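The two agreement statistics named above can be sketched in a few lines of Python. This is a minimal illustration, not the Ring Study's analysis code: the function names and the example calls are hypothetical, Cohen's kappa is shown for a single pair of raters, and Lin's concordance correlation coefficient is computed with population (1/n) variances.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical calls on the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty lists of calls")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal counts per category.
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n**2
    if expected == 1.0:
        return 1.0  # both raters used one identical category for every case
    return (observed - expected) / (1.0 - expected)


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired continuous scores."""
    if len(x) != len(y) or not x:
        raise ValueError("need two equal-length, non-empty lists of scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Penalizes both poor correlation and systematic shifts between readers.
    return 2 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)


# Made-up calls at a single cut-off from two hypothetical sites.
site_1 = ["pos", "pos", "neg", "neg"]
site_2 = ["pos", "neg", "neg", "neg"]
print(cohens_kappa(site_1, site_2))  # 0.5
```

Unlike a Pearson correlation, the concordance correlation coefficient drops when one reader scores systematically higher than another, which is why it suits continuous percentage scores.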
The developmental study for the Leica Biosystems PD-L1 CAL10 assay employed a feasibility design comparing the novel assay to the established SP263 assay in NSCLC samples [56]. The study included 136 formalin-fixed paraffin-embedded NSCLC tissue samples with case characteristics reflecting real-world diversity: 76 resection specimens, 23 biopsies, 88 adenocarcinomas, 43 squamous cell carcinomas, and one large cell carcinoma. Cases were pre-screened and pre-characterized using the BOND RTU PD-L1 (73-10) clone to ensure representation across the full TPS range (0-100%). Staining was performed on the BOND-III system for CAL10 and the Benchmark Ultra system for SP263, with appropriate controls. Two pathologists independently read randomized, anonymized slide sets, with statistical analysis focusing on overall percent agreement with predefined non-inferiority targets.
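A non-inferiority check on overall percent agreement can be illustrated as follows. This is a sketch under stated assumptions: the source does not say which confidence interval method or which target values were used, so the Wilson score interval and the 0.85 target below are illustrative choices, and all function names are hypothetical.

```python
import math


def count_agreements(calls_a, calls_b):
    """Return (number of concordant cases, total cases) for paired calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two equal-length, non-empty lists of calls")
    return sum(a == b for a, b in zip(calls_a, calls_b)), len(calls_a)


def wilson_lower_bound(successes, n, z=1.96):
    """Lower limit of the two-sided Wilson score interval (z=1.96 for 95%)."""
    p = successes / n
    centre = p + z**2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (centre - margin) / (1 + z**2 / n)


def meets_noninferiority(calls_a, calls_b, target=0.85):
    """True if the OPA lower bound reaches the (assumed) predefined target."""
    agree, n = count_agreements(calls_a, calls_b)
    return wilson_lower_bound(agree, n) >= target
```

Comparing the lower confidence bound, rather than the point estimate, against the target is what makes the criterion conservative for small sample sets.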
Advanced computational methods for PD-L1 assessment are emerging as alternatives to visual scoring. The PD-L1 Quantitative Continuous Scoring (QCS) system utilizes computer vision for granular cell-level quantification of PD-L1 staining intensity in digitized whole slide images [41]. This approach derives a biomarker capturing the percentage of tumor cells with medium to strong staining intensity (PD-L1 QCS-PMSTC), classifying patients with ≥0.575% as biomarker-positive. When validated in the MYSTIC trial (768 whole slide images), this quantitative method achieved a hazard ratio of 0.62 (CI 0.46-0.82) for durvalumab versus chemotherapy, with a substantially increased biomarker-positive prevalence of 54.3% compared to 29.7% for visual TPS ≥50% scoring.
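The QCS-PMSTC decision rule described above reduces to a proportion and a threshold. The sketch below assumes per-cell membrane intensities are already available on some normalized scale; the intensity cutoff that separates weak from medium staining is not given in the source, so `medium_intensity_cutoff` is a hypothetical parameter, as are the function names. Only the 0.575% positivity threshold comes from the text.

```python
def qcs_pmstc(tumor_cell_intensities, medium_intensity_cutoff):
    """Percentage of tumor cells with medium-to-strong membrane staining."""
    if not tumor_cell_intensities:
        raise ValueError("no tumor cells were detected")
    n_medium_strong = sum(
        intensity >= medium_intensity_cutoff for intensity in tumor_cell_intensities
    )
    return 100.0 * n_medium_strong / len(tumor_cell_intensities)


def is_biomarker_positive(pmstc_percent, threshold=0.575):
    """Apply the >=0.575% rule reported for PD-L1 QCS-PMSTC."""
    return pmstc_percent >= threshold


# Made-up intensities: 1 of 200 cells is above the assumed cutoff (0.5%).
cells = [0.1] * 199 + [0.9]
score = qcs_pmstc(cells, medium_intensity_cutoff=0.5)
print(score, is_biomarker_positive(score))  # 0.5 False
```

Because the threshold sits well below 1%, the rule depends on counting very large numbers of cells per slide, which is practical for image analysis but not for visual estimation.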
Another study employed open-source bioimage analysis using QuPath software to manually annotate four distinct cell populations: tumor cells, immune cells, PD-L1-expressing tumor cells, and PD-L1-expressing immune cells in HNSCC samples [57]. This digital approach enabled precise quantification of CPS and TPS across 204 tissue sections from 68 patients, revealing significant differences between specimen types that might be challenging to detect with visual scoring alone.
The PD-1/PD-L1 pathway represents a critical immune checkpoint mechanism that regulates T-cell-mediated immunity. Under normal physiological conditions, the interaction between PD-L1 on antigen-presenting cells and PD-1 on T-cells maintains immune tolerance and prevents excessive inflammation [56]. Tumor cells exploit this mechanism by overexpressing PD-L1, which engages PD-1 on tumor-infiltrating T-cells, leading to T-cell exhaustion and immune evasion [56] [100]. Immune checkpoint inhibitors, including anti-PD-1 (pembrolizumab, nivolumab) and anti-PD-L1 (atezolizumab, durvalumab) antibodies, block this interaction, restoring T-cell-mediated anti-tumor immunity [56] [101]. The accurate assessment of PD-L1 expression through IHC assays is therefore essential for identifying patients most likely to benefit from these therapies.
The standardized workflow for comparative PD-L1 IHC assay evaluation begins with formalin-fixed paraffin-embedded tissue blocks sectioned at 4 µm thickness [57] [100]. Sections undergo automated IHC staining on dedicated platforms (Dako, Ventana, or Leica systems) using validated protocols for each antibody clone [56] [71]. Following staining, slides are digitized using whole slide scanners (Aperio GT 450, Philips Intellisite, or PANNORAMIC systems) to enable both visual assessment by pathologists and computational analysis [56] [57] [41]. Evaluation incorporates multiple scoring systems: Tumor Proportion Score (percentage of viable tumor cells that are PD-L1-positive), Combined Positive Score (number of PD-L1-positive cells, including tumor cells, lymphocytes, and macrophages, divided by the total number of viable tumor cells and multiplied by 100), and Immune Cell scoring (percentage of tumor area occupied by PD-L1-positive immune cells) [57] [101]. Statistical analysis focuses on concordance metrics (kappa statistics, overall percent agreement, concordance correlation coefficients), interobserver variability, and correlation with clinical outcomes [99] [71].
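The two cell-count-based scores in this workflow can be written out explicitly. The following is a minimal sketch with hypothetical function names; it assumes cell counts are already available, and it caps CPS at 100, which is the usual convention because the numerator can exceed the denominator.

```python
def tumor_proportion_score(pdl1_pos_tumor_cells, viable_tumor_cells):
    """TPS: percentage of viable tumor cells showing PD-L1 membrane staining."""
    if viable_tumor_cells <= 0:
        raise ValueError("viable_tumor_cells must be positive")
    return 100.0 * pdl1_pos_tumor_cells / viable_tumor_cells


def combined_positive_score(pdl1_pos_tumor_cells, pdl1_pos_immune_cells,
                            viable_tumor_cells):
    """CPS: PD-L1+ tumor and immune cells per 100 viable tumor cells, max 100."""
    if viable_tumor_cells <= 0:
        raise ValueError("viable_tumor_cells must be positive")
    positives = pdl1_pos_tumor_cells + pdl1_pos_immune_cells
    return min(100.0, 100.0 * positives / viable_tumor_cells)


# Made-up counts: 30 PD-L1+ tumor cells and 20 PD-L1+ immune cells
# among 200 viable tumor cells.
print(tumor_proportion_score(30, 200))       # 15.0
print(combined_positive_score(30, 20, 200))  # 25.0
```

The example shows why CPS is always at least as high as TPS for the same specimen: immune cells add to the numerator while the denominator stays the same.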
Table 4: Key Reagents and Platforms for PD-L1 IHC Research
| Category | Specific Product | Research Application | Performance Notes |
|---|---|---|---|
| Commercial Assays | PD-L1 IHC 22C3 pharmDx (Dako) | Companion diagnostic for pembrolizumab | High concordance with SP263 in NSCLC [99] [71] |
| Commercial Assays | PD-L1 IHC SP263 (Ventana) | Companion diagnostic for durvalumab | Comparable performance across tumor types [99] |
| Commercial Assays | PD-L1 IHC SP142 (Ventana) | Complementary diagnostic for atezolizumab | Lower tumor cell sensitivity; preferential dendritic cell staining [101] [71] |
| Development Assays | PD-L1 CAL10 (Leica) | Novel assay in development | Comparable to SP263 in NSCLC [56] |
| Development Assays | PD-L1 73-10 (Leica) | High-sensitivity detection | Superior sensitivity in HNSCC [100] |
| Staining Platforms | Dako Autostainer Link | 22C3 assay platform | - |
| Staining Platforms | Ventana Benchmark Ultra | SP263, SP142 assay platform | - |
| Staining Platforms | Leica BOND-III | CAL10, 73-10 assay platform | - |
| Digital Analysis Tools | QuPath | Open-source bioimage analysis | Cell classification and scoring [57] |
| Digital Analysis Tools | HALO | Commercial image analysis | Multiplex analysis and co-localization [101] |
| Digital Analysis Tools | Aperio GT 450 | Whole slide imaging | Digital slide creation [56] |
The comparative performance of PD-L1 IHC assays varies significantly across NSCLC, HNSCC, and urothelial carcinoma, reflecting biological differences in PD-L1 expression patterns and tumor microenvironment characteristics. In NSCLC, the 22C3, SP263, and 28-8 assays demonstrate strong concordance, supporting potential interchangeability with proper validation, while the SP142 assay shows consistently lower tumor cell sensitivity. HNSCC displays significant PD-L1 heterogeneity across different specimen types, complicating clinical assessment. Urothelial carcinoma presents unique challenges due to biologically meaningful discordance between assays, with the SP142 assay preferentially detecting dendritic cell PD-L1 expression associated with better outcomes. Emerging technologies including high-sensitivity clones (73-10, CAL10), digital pathology platforms, and quantitative continuous scoring systems show promise for improving the accuracy and reproducibility of PD-L1 assessment. Optimal PD-L1 testing requires careful consideration of tumor type, specimen characteristics, scoring systems, and clinical context to ensure appropriate patient selection for immunotherapy.
The assessment of Programmed Death-Ligand 1 (PD-L1) expression via immunohistochemistry (IHC) has become a critical predictive biomarker in oncology, guiding treatment decisions for immune checkpoint inhibitors across multiple cancer types including non-small cell lung cancer (NSCLC), gastric cancer, and bladder cancer [103] [58] [104]. The prevailing clinical standard involves visual scoring by pathologists, primarily using the Tumor Proportion Score (TPS) or Combined Positive Score (CPS). However, this manual approach suffers from significant interobserver variability and subjectivity [105] [41]. The emergence of artificial intelligence (AI) algorithms for digital pathology promises to enhance the accuracy, standardization, and efficiency of PD-L1 scoring. This article provides a comprehensive comparison of pathologist-based and AI-based PD-L1 scoring methodologies, examining their respective performance characteristics, technical approaches, and implications for clinical practice and drug development.
Table 1: Comparative Performance of Pathologists and AI Algorithms in PD-L1 Scoring
| Assessment Method | Cancer Type | Metric | Performance Value | Reference Standard |
|---|---|---|---|---|
| Pathologists (Interobserver) | NSCLC | Fleiss' Kappa (TPS ≥50%) | 0.873 (almost perfect) | Median pathologist scores [103] |
| Pathologists (Interobserver) | NSCLC | Fleiss' Kappa (TPS <1%) | 0.558 (moderate) | Median pathologist scores [103] |
| Pathologists (Intraobserver) | NSCLC | Cohen's Kappa Range | 0.726 - 1.0 | Self-consistency [103] |
| Visiopharm AI Algorithm | NSCLC | Fleiss' Kappa (TPS ≥50%) | 0.672 (substantial) | Median pathologist scores [103] |
| uPath (Roche) AI Algorithm | NSCLC | Fleiss' Kappa (TPS ≥50%) | 0.354 (fair) | Median pathologist scores [103] |
| Deep Learning Model (Lunit) | NSCLC | Spearman Correlation | 0.925 | Pathologist consensus [106] |
| YOLO-based AI Pipeline | Gastric Cancer | Cohen's Kappa (CPS) | 0.782 | Expert pathologist consensus [58] |
| QuPath (AI-SAI Protocol) | Bladder Cancer | Cohen's Kappa | 0.86 | Manual assessment [104] |
| QuPath (AI-WSI Protocol) | Bladder Cancer | Cohen's Kappa | 0.65 | Manual assessment [104] |
Table 2: Predictive Performance for Immunotherapy Response
| Scoring Method | Cancer Type | Predictive Metric | Performance | Clinical Context |
|---|---|---|---|---|
| Pathologist TPS (≥50%) | NSCLC (MYSTIC Trial) | Hazard Ratio (PFS) | 0.69 (CI 0.46-1.02) | Durvalumab vs Chemotherapy [41] |
| PD-L1 QCS-PMSTC (≥0.575%) | NSCLC (MYSTIC Trial) | Hazard Ratio (PFS) | 0.62 (CI 0.46-0.82) | Durvalumab vs Chemotherapy [41] |
| AI-Powered TPS (Lunit) | NSCLC | Hazard Ratio (PFS, TPS <1%) | 2.38 (CI 1.69-3.35) | ICI treatment [106] |
| Pathologist TPS | NSCLC | Hazard Ratio (PFS, TPS <1%) | 1.62 (CI 1.23-2.13) | ICI treatment [106] |
| AI Spatial Biomarker | NSCLC | Hazard Ratio (PFS) | 5.46 | ICI treatment [107] |
| PD-L1 TPS Alone | NSCLC | Hazard Ratio (PFS) | 1.67 | ICI treatment [107] |
Traditional pathologist assessment of PD-L1 expression typically follows standardized protocols. In a comparative study involving 51 SP263-stained NSCLC cases, six pathologists (five pulmonary pathologists and one in training) scored slides using both light microscopy and whole-slide images (WSI) [103] [9]. The evaluation was performed with a washout period of at least one month between light microscopy and digital scoring to minimize recall bias, following CAP-PLQC guidelines [9]. Pathologists evaluated only tumor cells, considering any intensity of either partial or complete membranous staining as positive. The percentage of positively stained tumor cells was recorded categorically: 0%, 1%, 5%, 10%, and up to 100% in 10% increments [9]. This methodology reflects real-world clinical practice where pathologists visually estimate the proportion of positive tumor cells relative to all viable tumor cells.
AI approaches demonstrate considerable diversity in their technical implementation. Most systems employ a multi-stage pipeline combining computer vision and deep learning architectures:
Dual-Network Architecture for Tumor Region Identification: The gastric cancer CPS system developed by [58] employs a pipeline that first identifies tumor regions using a combination of MobileNet for patch-level classification and U-Net for pixel-level segmentation. This dual-network approach enhances the accuracy of tumor region delineation, which is particularly challenging in some cancer types.
Cell Detection and Classification: Following tumor region identification, a YOLO-based cell detection model computes PD-L1 expression on different cell types for CPS calculation [58]. This model performs triple-task recognition: detection of PD-L1+ tumor cells, PD-L1- tumor cells in tumor regions, and detection of PD-L1+ immune cells in associated non-tumor regions.
Quantitative Continuous Scoring (QCS): A more advanced approach presented by [41] involves quantitative continuous scoring of PD-L1 expression intensity at a granular cell level. This system captures the percentage of tumor cells with medium to strong staining intensity (PD-L1 QCS-PMSTC) and classifies patients with ≥0.575% as biomarker-positive. Unlike traditional binary classification (positive/negative), QCS measures continuous membrane staining intensity values, allowing for more precise patient stratification [41].
Whole-Slide Image Analysis: Most AI systems operate on digitized whole-slide images. The QuPath software comparison study evaluated two protocols: Selected Area Interpretation (AI-SAI) and Whole Slide Imaging (AI-WSI) [104]. AI-SAI demonstrated stronger agreement with manual assessment (κ=0.86) compared to AI-WSI (κ=0.65), suggesting that focused analysis on representative regions may sometimes outperform whole-slide analysis, particularly in bladder cancer cases with high PD-L1-positive tumor cell content [104].
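The final aggregation step of such a multi-stage pipeline, turning per-cell detections into a slide-level CPS, might look like the sketch below. The `Detection` structure, the class labels, and the region flag are hypothetical stand-ins for the outputs of the segmentation and detection networks; the counting rule follows the description above (tumor cells counted inside tumor regions, PD-L1+ immune cells counted in associated non-tumor regions) and is not taken from the published implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str             # hypothetical classes: "tc_pos", "tc_neg", "ic_pos"
    in_tumor_region: bool  # True if the cell lies inside the segmented tumor mask


def cps_from_detections(detections):
    """Aggregate cell-level detections into a Combined Positive Score."""
    counts = Counter()
    for det in detections:
        if det.label in ("tc_pos", "tc_neg") and det.in_tumor_region:
            counts[det.label] += 1  # tumor cells only count inside tumor regions
        elif det.label == "ic_pos" and not det.in_tumor_region:
            counts[det.label] += 1  # immune cells from associated non-tumor regions
    viable_tumor_cells = counts["tc_pos"] + counts["tc_neg"]
    if viable_tumor_cells == 0:
        raise ValueError("no tumor cells detected inside tumor regions")
    positives = counts["tc_pos"] + counts["ic_pos"]
    return min(100.0, 100.0 * positives / viable_tumor_cells)
```

Restricting tumor cell counts to the segmented tumor mask is what makes the accuracy of the upstream region identification matter: detections outside the mask are simply ignored in the denominator.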
AI PD-L1 Scoring Workflow
Table 3: AI Algorithm Architectures for PD-L1 Scoring
| Algorithm Component | Architecture | Function | Application Example |
|---|---|---|---|
| Cell Detection | YOLOv5/YOLO-based | Localizes and classifies individual cells | PD-L1 positive/negative tumor cell detection [58] [105] |
| Tissue Segmentation | U-Net | Pixel-level segmentation of tumor regions | Delineating tumor vs. non-tumor areas [58] |
| Patch Classification | MobileNet-v2 | Patch-level classification of tumor regions | Identifying tumor-containing patches [58] |
| Foundation Model | Vision Transformer (ViT) | Generates image embeddings for classification | FGFR alteration prediction in bladder cancer [107] |
| Intensity Quantification | Custom computer vision | Measures continuous membrane staining intensity | PD-L1 QCS-PMSTC scoring [41] |
Advanced AI systems now demonstrate capabilities beyond simple replication of pathologist scoring:
Spatial Biomarkers: Researchers from Stanford University developed an AI spatial biomarker that analyzes interactions between tumor cells, fibroblasts, T-cells, and neutrophils [107]. This five-feature model achieved a hazard ratio of 5.46 for progression-free survival in NSCLC patients treated with immune checkpoint inhibitors, significantly outperforming PD-L1 tumor proportion scoring alone (HR=1.67) [107].
Molecular Prediction: Foundation models trained on large WSI datasets can predict molecular alterations directly from H&E-stained slides. Johnson & Johnson's MIA:BLC-FGFR algorithm predicts FGFR alterations in non-muscle invasive bladder cancer with 80-86% AUC, demonstrating strong concordance with traditional molecular testing [107].
Multimodal Integration: Researchers from UCSF and Artera validated a multimodal AI biomarker that combines H&E images with clinical variables (age, Gleason grade, PSA) to predict prostate cancer outcomes after radical prostatectomy [107]. This integration of image-based AI with clinical data improves prognostic tools for personalized treatment strategies.
Scoring Approach Comparison
Table 4: Key Research Reagent Solutions for PD-L1 Assessment Studies
| Reagent/Platform | Type | Primary Function | Application Notes |
|---|---|---|---|
| PD-L1 IHC Assays (22C3, 28-8, SP263, SP142) | Immunohistochemistry Antibodies | Detection of PD-L1 protein expression | Different clones have specific scoring guidelines and regulatory approvals [103] [58] |
| Whole Slide Scanners (PANNORAMIC 1000, Ventana DP200) | Digital Pathology Hardware | Digitization of pathology slides for AI analysis | Resolution (0.25-0.475 μm/pixel) critical for cell-level analysis [9] [58] |
| QuPath Software | Open-source Digital Pathology Platform | AI-powered cell detection and classification | Offers both Selected Area and Whole Slide analysis protocols [104] |
| Visiopharm PD-L1 Lung Cancer TME | Commercial AI Application | Automated TPS scoring for NSCLC | Demonstrated substantial agreement with pathologists (κ=0.672) [103] |
| uPath Software (Roche) | Commercial Digital Pathology Platform | PD-L1 image analysis for the SP263 clone | IVDD-certified for TPS ≥50% classification [103] [9] |
| BenchMark ULTRA Platform | Automated Staining System | Standardized IHC staining procedure | Ensures consistent staining quality for both manual and digital assessment [9] [104] |
The comparative analysis of pathologist versus AI algorithm scoring for PD-L1 assessment reveals a complex landscape where both approaches offer complementary strengths. Pathologists demonstrate higher consistency, particularly at the critical TPS ≥50% cut-off, with almost perfect interobserver agreement (κ=0.873) [103]. However, AI algorithms show promising capabilities in quantitative continuous scoring, spatial biomarker analysis, and reducing interobserver variability, particularly in challenging low-expression cases [41] [106].
The emerging paradigm appears to be a collaborative approach where AI algorithms handle quantitative cell detection and classification tasks while pathologists provide contextual interpretation and oversight. This synergistic model leverages the computational power and consistency of AI with the clinical expertise and integrative reasoning of human pathologists. As AI technologies continue to evolve, with advancements in foundation models, multimodal integration, and spatial analysis, they hold significant potential to enhance the precision and predictive power of PD-L1 assessment in both clinical practice and oncology drug development.
For researchers and drug development professionals, the selection between pathologist-based and AI-based scoring approaches should consider factors including the specific cancer type, required throughput, available computational resources, and regulatory requirements. Hybrid models that combine the strengths of both approaches likely represent the future of precision immuno-oncology.
Programmed Death-Ligand 1 (PD-L1) immunohistochemistry (IHC) has emerged as a critical predictive biomarker for immune checkpoint inhibitor (ICI) therapy in non-small cell lung cancer (NSCLC) and other malignancies [108] [109]. The Tumor Proportion Score (TPS), which represents the percentage of viable tumor cells exhibiting partial or complete membranous PD-L1 staining, serves as a fundamental scoring metric for determining patient eligibility for immunotherapy [9]. Clinically relevant TPS cut-offs (≥1% and ≥50%) have been established through pivotal clinical trials and are integrated into treatment guidelines worldwide, directly influencing therapeutic decision-making [9] [71].
The development of multiple PD-L1 IHC assays, each with distinct antibody clones and staining platforms, has created significant challenges for pathology laboratories [65] [71]. With limited resources and tissue availability, laboratories face practical difficulties in implementing all commercially available assays, fueling interest in their potential interchangeability [65] [71]. This comprehensive review synthesizes evidence from key concordance studies to evaluate the analytical comparability of various PD-L1 assays at these critical clinical thresholds, providing researchers and clinicians with evidence-based guidance for assay selection and implementation.
Multiple studies have systematically evaluated the analytical concordance between different PD-L1 IHC assays, with consistent findings across various study designs and geographic regions. The evidence demonstrates that while several assays show high agreement, notable exceptions exist that impact their potential interchangeability.
Table 1: Interassay Concordance at Clinically Relevant TPS Cut-offs
| Assay Comparison | Sample Size | % Agreement at ≥1% TPS | % Agreement at ≥50% TPS | Statistical Measure | Study |
|---|---|---|---|---|---|
| 22C3 vs 28-8 | 144 | 82.2% | 91.6% | Overall Agreement | [108] |
| 22C3 vs SP263 | 473 | >91% (for first-line treatment criteria) | >91% (for first-line treatment criteria) | Positive/Negative Agreement | [110] |
| 22C3 vs SP142 | 127 vs 132 | Lower than 22C3/28-8 comparison | Lower than 22C3/28-8 comparison | Cohen's Kappa | [108] |
| CAL10 vs SP263 | 136 | ≥94.0% (lower bound of 95% CI) | ≥86.2% (lower bound of 95% CI) | Overall Percent Agreement | [111] |
| 22C3 vs 28-8 vs SP263 (Blueprint Phase 2) | 81 | Highly comparable | Highly comparable | Intraclass Correlation | [109] |
The Blueprint (BP) Phase 2 study, a pivotal academic and professional society collaboration, provided compelling evidence regarding assay comparability. This comprehensive analysis of 81 real-world lung cancer specimens evaluated five trial-validated PD-L1 assays (22C3, 28-8, SP142, SP263, and 73-10) and demonstrated that 22C3, 28-8, and SP263 assays showed highly comparable staining characteristics for tumor cell PD-L1 expression [109]. In contrast, the SP142 assay exhibited consistently lower sensitivity for detecting PD-L1 expression on tumor cells, while the 73-10 assay demonstrated higher sensitivity compared to other assays [109].
A systematic review published in 2020, which analyzed 27 qualified studies, corroborated these findings, noting that "decrease in concordance... is seen with use of cut-offs, which hampers interchangeability of PD-L1 immunohistochemistry assays" [71]. This observation is clinically relevant as it highlights the challenges in applying binary classifications (positive/negative) to continuous biological variables, particularly near the critical threshold values.
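The effect of cut-offs on concordance can be made concrete with a small illustration. The paired scores below are invented for demonstration only and do not come from any cited study; the function names are likewise hypothetical. The two hypothetical assays never differ by more than 7 percentage points, yet dichotomizing at a threshold converts small differences near that threshold into outright disagreements.

```python
def dichotomize(tps_values, cutoff):
    """Convert continuous TPS values into positive/negative calls at a cut-off."""
    return [value >= cutoff for value in tps_values]


def percent_agreement_at_cutoff(tps_a, tps_b, cutoff):
    """Overall percent agreement between two assays after dichotomization."""
    if len(tps_a) != len(tps_b) or not tps_a:
        raise ValueError("need two equal-length, non-empty lists of scores")
    calls_a, calls_b = dichotomize(tps_a, cutoff), dichotomize(tps_b, cutoff)
    agree = sum(a == b for a, b in zip(calls_a, calls_b))
    return 100.0 * agree / len(calls_a)


# Illustrative, made-up paired TPS values (%) for eight cases.
assay_a = [0, 0.5, 2, 10, 45, 55, 80, 95]
assay_b = [0, 2, 1, 12, 52, 48, 85, 90]

print(percent_agreement_at_cutoff(assay_a, assay_b, cutoff=1))   # 87.5
print(percent_agreement_at_cutoff(assay_a, assay_b, cutoff=50))  # 75.0
```

Every discordant call in this example comes from a case whose continuous scores straddle the threshold, which is the pattern the review describes for values near the critical cut-offs.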
Beyond analytical concordance, the relationship between assay performance and predictive value for treatment response is paramount. A 2024 prospective study directly compared three different PD-L1 assays (22C3, 28-8, and SP142) for predicting response to combined chemoimmunotherapy in 70 patients with advanced NSCLC [7]. This investigation revealed that PD-L1 expression determined using the 22C3 assay showed stronger correlation with therapeutic response than either the 28-8 or SP142 assays [7]. Specifically, patients with TPS ≥50% as determined by the 22C3 assay had significantly longer progression-free survival compared to those with TPS <50%, while the other assays did not reveal remarkable differences in objective response rate or progression-free survival [7].
The concordance studies employed rigorous methodological approaches to ensure valid and reproducible comparisons:
Sample Preparation and Staining Protocols: Most studies utilized formalin-fixed, paraffin-embedded (FFPE) tissue sections from NSCLC specimens, with consecutive sections to minimize variability due to tumor heterogeneity [110] [109]. The staining procedures followed manufacturers' recommended protocols for each assay: 22C3 and 28-8 assays were typically performed on Dako Autostainer Link 48 platforms, while SP142 and SP263 assays used Ventana BenchMark ULTRA staining systems [108] [110] [109]. The 73-10 assay protocol was developed by Dako/Agilent for avelumab clinical trials [109].
Scoring Methodologies: PD-L1 expression was assessed by experienced pathologists, often with specific training in PD-L1 interpretation [110] [109]. The Tumor Proportion Score was recorded either as a continuous variable (0-100%) or categorized using clinically relevant cut-offs (<1%, 1-49%, ≥50%) [108] [109]. Some studies incorporated digital pathology platforms for scoring validation [9] [109], and the Blueprint Phase 2 study demonstrated high concordance between glass slide and digital image scoring (Pearson correlation >0.96) [109].
Statistical Analysis: Concordance was evaluated using various statistical measures including overall percent agreement, Cohen's kappa coefficient (κ), intraclass correlation coefficient (ICC), and Fleiss' kappa for multiple raters [108] [9] [109]. The kappa statistic interpretation typically followed established guidelines: <0.20 poor, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 good, and 0.81-1.00 excellent agreement [108].
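For more than two raters, Fleiss' kappa generalizes the chance-corrected agreement idea. The sketch below implements the standard formula from a table of per-case category counts and adds a helper that applies the interpretation bands quoted above from [108]; the function names are hypothetical, and other passages in this article use the alternative labels "substantial" and "almost perfect" for the two highest bands.

```python
def fleiss_kappa(count_table):
    """Fleiss' kappa; count_table[i][j] = raters assigning case i to category j."""
    n_cases = len(count_table)
    n_raters = sum(count_table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in count_table):
        raise ValueError("every case needs the same number (>= 2) of ratings")
    n_categories = len(count_table[0])
    # Overall proportion of all ratings that fall in each category.
    p_j = [sum(row[j] for row in count_table) / (n_cases * n_raters)
           for j in range(n_categories)]
    # Per-case agreement: fraction of rater pairs that agree on that case.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in count_table]
    p_bar = sum(p_i) / n_cases
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        return 1.0  # all ratings fell in a single category
    return (p_bar - p_e) / (1.0 - p_e)


def interpret_kappa(kappa):
    """Map kappa to the bands quoted in the text (poor ... excellent)."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "excellent"
```

For a dichotomized cut-off such as TPS ≥50%, each row of `count_table` would hold two numbers: how many pathologists called the case negative and how many called it positive.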
Figure 1: Experimental Workflow for PD-L1 Assay Concordance Studies
Table 2: Essential Materials for PD-L1 Concordance Research
| Category | Specific Reagents/Platforms | Research Function | Key Characteristics |
|---|---|---|---|
| Commercial Assays | PD-L1 IHC 22C3 pharmDx (Dako/Agilent) | Companion diagnostic for pembrolizumab | TPS scoring; Dako Link 48 platform |
| Commercial Assays | PD-L1 IHC 28-8 pharmDx (Dako/Agilent) | Complementary diagnostic for nivolumab | TPS scoring; Dako Link 48 platform |
| Commercial Assays | VENTANA PD-L1 (SP142) Assay (Ventana/Roche) | Complementary diagnostic for atezolizumab | TC/IC scoring; lower tumor cell sensitivity |
| Commercial Assays | VENTANA PD-L1 (SP263) Assay (Ventana/Roche) | Companion diagnostic for durvalumab | Comparable to 22C3/28-8; Ventana platform |
| Commercial Assays | PD-L1 IHC 73-10 Assay (Dako/Agilent) | Developed for avelumab trials | Higher sensitivity; Dako platform |
| Laboratory-Developed Tests | Cross-platform LDTs (e.g., 22C3 on Ventana) | Increase testing accessibility | Requires proper validation [65] |
| Staining Platforms | Dako Autostainer Link 48 | Platform for 22C3, 28-8, 73-10 | Closed system with optimized protocols |
| Staining Platforms | VENTANA BenchMark ULTRA | Platform for SP142, SP263 | Automated staining with integrated detection |
| Staining Platforms | BOND-III System (Leica) | Platform for novel assays (e.g., CAL10) | Emerging platform for PD-L1 testing [111] |
| Digital Pathology | Whole Slide Imaging Scanners | Digital scoring and archiving | Enables remote assessment and AI applications [9] |
| Digital Pathology | AI-Powered Analysis Software | Automated TPS scoring | Shows promise but requires refinement [9] |
Beyond interassay differences, interobserver variability among pathologists represents another dimension of complexity in PD-L1 testing. Recent studies evaluating both manual and digital scoring demonstrate that interobserver agreement is generally higher at the ≥50% TPS cut-off (almost perfect agreement, Fleiss' kappa 0.873) compared to the ≥1% TPS cut-off (moderate agreement, Fleiss' kappa 0.558) [9]. This finding has significant clinical implications, as discordance at the 1% threshold may affect patient eligibility for certain immunotherapies.
Intraobserver consistency typically remains high (Cohen's kappa 0.726-1.0) [9], suggesting that individual pathologists generally maintain consistent scoring approaches. The Blueprint Phase 2 study further confirmed very strong reliability among pathologists in tumor cell PD-L1 scoring across all assays (overall ICC = 0.86-0.93), though reliability for immune cell scoring was considerably lower (overall ICC = 0.18-0.19) [109].
The landscape of PD-L1 testing continues to evolve with the development of novel assays and technological approaches. The recently developed PD-L1 CAL10 assay (Leica Biosystems) demonstrated strong concordance with the SP263 assay in a feasibility study, with overall percent agreement lower bounds of 94.0% at ≥1% TPS and 86.2% at ≥50% TPS [111]. This performance suggests that properly validated novel assays can achieve comparable results to established platforms.
Artificial intelligence applications in PD-L1 scoring show promise but currently exhibit limitations. Comparative studies between pathologists and AI algorithms reveal that while some AI tools demonstrate substantial agreement with median pathologist scores (Fleiss' kappa 0.672), their performance remains less consistent than expert human evaluation, particularly at critical clinical decision-making thresholds [9].
Figure 2: PD-1/PD-L1 Signaling Pathway and Clinical Measurement
The comprehensive analysis of PD-L1 IHC assay concordance studies reveals a consistent pattern: the 22C3, 28-8, and SP263 assays demonstrate high analytical comparability for tumor cell scoring at both the 1% and 50% TPS thresholds, suggesting potential interchangeability in clinical practice [108] [109] [71]. In contrast, the SP142 assay shows systematically lower sensitivity for tumor cell staining, while the 73-10 assay exhibits higher sensitivity compared to other assays [109].
These findings have significant implications for diagnostic laboratories and clinical practice. The demonstrated concordance between 22C3, 28-8, and SP263 assays provides flexibility for laboratories in selecting testing platforms based on available infrastructure and resource considerations. However, the predictive superiority of the 22C3 assay for response to chemoimmunotherapy in one prospective study suggests that not all analytically comparable assays may be clinically equivalent [7].
Future directions in PD-L1 testing should focus on standardizing scoring criteria, reducing interobserver variability through enhanced training and potentially artificial intelligence integration, and validating laboratory-developed tests that increase accessibility while maintaining analytical performance. As immunotherapy continues to evolve across cancer types, ensuring accurate, reproducible, and accessible PD-L1 testing remains paramount for optimizing patient selection and treatment outcomes.
The comparative performance of PD-L1 IHC assays reveals a complex landscape where no single assay universally outperforms others, but significant differences in analytical performance and diagnostic accuracy exist. Key takeaways include the critical importance of rigorous validation per CAP guidelines, the moderate-to-poor interchangeability of many assays, particularly at clinically crucial cut-offs, and the persistent challenge of inter-observer variability. Future directions must focus on harmonizing scoring systems, standardizing pre-analytical conditions, and integrating artificial intelligence to enhance reproducibility. The emergence of AI algorithms shows promise but currently lacks the consistency of expert pathologists, highlighting a need for further refinement. For researchers and drug developers, these findings underscore the necessity of fit-for-purpose assay selection and validation to ensure reliable patient stratification in clinical trials and, ultimately, optimize immunotherapy outcomes. The evolution of PD-L1 testing will continue to be pivotal in advancing precision immuno-oncology.