The Challenge: Variable Expressivity

Medicine faces a fundamental puzzle that has resisted decades of research: identical genetic changes produce dramatically different clinical outcomes.

Consider 22q11.2 deletion syndrome, caused by the most common chromosomal microdeletion in humans. Nearly every affected individual is missing the same stretch of chromosome 22. Yet:

  • Some develop severe congenital heart defects; others have none
  • Some experience profound immune deficiency; others show minimal impact
  • 25-30% develop schizophrenia—but 70-75% do not
  • 50-80× elevated risk of systemic lupus erythematosus (SLE)—but only a fraction actually develop it

This phenomenon—variable expressivity—represents one of the deepest unsolved problems in medicine. Current single-system approaches (cardiology alone, immunology alone, psychiatry alone) explain only a fraction of this variance.

The Core Insight

What if outcome variance becomes predictable when we analyze multiple biological systems together rather than separately? What cross-system signatures might distinguish patients destined for severe disease from those who will remain healthy?

Limitations of Current Approaches

Modern medicine has made extraordinary progress through specialization. Cardiologists have mapped the heart's electrical system. Immunologists have catalogued immune cell subtypes. Neuroscientists have traced neural circuits. But this specialization comes at a cost: insights from one specialty rarely inform another.

The Silo Problem

  • Cardiology studies measure cardiac function but rarely assess immune status or brain connectivity
  • Immunology studies measure cytokines but rarely assess cardiac function or psychiatric symptoms
  • Psychiatry studies measure symptoms but rarely assess immune function or cardiac biomarkers

This creates a systematic blind spot: cross-system patterns that might predict outcomes remain invisible because no single study measures all relevant systems.

The Literature Fragmentation Problem

Even when relevant findings exist, they are scattered across specialty journals, using different terminologies, study designs, and patient populations. A cardiologist studying congenital heart defects might be unaware of immunology findings in the same patient population—and vice versa.

Our Approach: Cross-System Invariant Analysis

Cross-System Invariant Analysis (CSIA) addresses these limitations through systematic integration of findings across research domains. The methodology has four core components:

The CSIA Pipeline: Literature Synthesis → Pattern Detection → Hypothesis Generation → Prospective Validation

1. Systematic Literature Synthesis

We conduct comprehensive reviews across traditionally siloed domains—integrating cardiology, immunology, neurology, psychiatry, gastroenterology, and other fields. For each condition, we map:

  • Epidemiological associations (what co-occurs?)
  • Mechanistic pathways (what might explain co-occurrence?)
  • Temporal sequences (what comes first?)
  • Modifier effects (what changes outcomes?)

2. Computational Pattern Detection

Our methodology employs computational analysis to identify statistical regularities across biological systems. When cardiac, immune, and neurological measurements are analyzed together rather than separately, certain patterns emerge that correlate with clinical outcomes.
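
As a rough illustration of this idea (not the actual CSIA codebase), the Python sketch below builds a synthetic cohort in which the outcome depends on an interaction between cardiac and immune measurements, then compares single-system models against a combined model. All feature names and data are hypothetical.

```python
# Toy illustration (not the CSIA codebase): synthetic cohort where the outcome
# depends on a cardiac-immune interaction, so single-system models underperform
# a combined model. All feature names and data are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500

cardiac = rng.normal(size=(n, 2))  # e.g., standardized cardiac measurements
immune = rng.normal(size=(n, 2))   # e.g., standardized immune markers
neuro = rng.normal(size=(n, 2))    # e.g., standardized neurological measures

# Outcome driven by a cross-system interaction plus noise.
risk = cardiac[:, 0] * immune[:, 0] + 0.5 * neuro[:, 0]
outcome = (risk + rng.normal(scale=0.5, size=n)) > 0.5

def mean_auc(features):
    """5-fold cross-validated AUC for a logistic regression on the given features."""
    return cross_val_score(LogisticRegression(max_iter=1000), features, outcome,
                           cv=5, scoring="roc_auc").mean()

combined = np.hstack([cardiac, immune, neuro, cardiac[:, :1] * immune[:, :1]])
print("cardiac only:          ", round(mean_auc(cardiac), 2))
print("immune only:           ", round(mean_auc(immune), 2))
print("combined + interaction:", round(mean_auc(combined), 2))
```

In this toy setup the combined model with an explicit cross-system interaction term reaches a noticeably higher AUC than either single-system model; CSIA looks for analogous gains in real multi-system data.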

Important Clarification

The specific patterns we identify are empirically observed correlations. We make no claims about underlying mechanisms. The "why" behind these patterns remains an open scientific question requiring further investigation.

3. Hypothesis Generation

Observed cross-system patterns generate specific, testable hypotheses. Each hypothesis specifies the following elements (see the sketch after this list):

  • Population: Which patients? (e.g., "children with 22q11.2DS")
  • Exposure/Predictor: What cross-system signature? (e.g., "elevated inflammatory markers + specific cardiac abnormality + early thymic dysfunction")
  • Outcome: What clinical result? (e.g., "development of autoimmune disease within 5 years")
  • Effect size: How strong? (e.g., "odds ratio > 3.0")
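
One way to keep these four elements explicit is to capture each hypothesis in a small data structure. The sketch below is illustrative only; the field names and example values (taken from the bullets above) are not part of any released tooling.

```python
# Illustrative structured form of the hypothesis template above; field names
# and example values are hypothetical, not a project API.
from dataclasses import dataclass

@dataclass
class CrossSystemHypothesis:
    population: str        # which patients
    predictor: str         # the cross-system signature
    outcome: str           # the clinical result
    min_odds_ratio: float  # effect size considered clinically meaningful
    follow_up_years: int   # time window in which the outcome is assessed

example = CrossSystemHypothesis(
    population="children with 22q11.2DS",
    predictor="elevated inflammatory markers + specific cardiac abnormality "
              "+ early thymic dysfunction",
    outcome="development of autoimmune disease",
    min_odds_ratio=3.0,
    follow_up_years=5,
)
```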

4. Prospective Validation

All hypotheses are pre-registered before data analysis. This prevents post-hoc rationalization and ensures that findings can be independently replicated or refuted. We specify:

  • Primary and secondary outcomes
  • Statistical analysis plan
  • Success and failure criteria
  • Required sample sizes

What We Claim—and What We Do Not

  • We DO claim that cross-system analysis reveals statistical patterns invisible to single-system approaches. We do NOT claim that these patterns represent proven causal mechanisms.
  • We DO claim that certain cross-system signatures correlate with clinical outcomes. We do NOT claim that we understand why these correlations exist.
  • We DO claim that our hypotheses are testable and falsifiable. We do NOT claim that they have been fully validated.
  • We DO claim that literature synthesis reveals underexplored connections. We do NOT claim that we have discovered anything fundamentally new.
  • We DO claim that cross-system thinking may improve risk stratification. We do NOT claim that our methods are ready for clinical implementation.

Epistemological Commitment
"We present correlations, not causations; hypotheses, not proofs; risk factors, not destinies; possibilities, not certainties."

Statistical Framework

Pattern Detection Criteria

For a cross-system correlation to generate a formal hypothesis, it must meet the following criteria (a simple checklist sketch follows the list):

Hypothesis Generation Criteria
  1. Literature support: At least two independent studies suggest the relationship
  2. Biological plausibility: Known pathways could mechanistically connect the systems
  3. Effect size: Estimated effect size suggests clinical relevance (OR > 1.5 or similar)
  4. Testability: Clear outcome that can be measured in existing or planned cohorts
  5. Specificity: The prediction can be distinguished from the null hypothesis with calculable statistical power
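
A minimal checklist mirroring these five criteria might look like the sketch below. The thresholds (two supporting studies, OR > 1.5) come from the list above; the class and function names are illustrative, not part of any released tooling.

```python
# Hedged sketch: a checklist mirroring the five hypothesis-generation criteria.
# Thresholds come from the text; names are illustrative only.
from dataclasses import dataclass

@dataclass
class CandidatePattern:
    supporting_studies: int       # independent studies suggesting the relationship
    biologically_plausible: bool  # known pathways could connect the systems
    estimated_odds_ratio: float   # estimated effect size
    outcome_measurable: bool      # outcome measurable in existing or planned cohorts
    power_calculable: bool        # statistical power can be calculated for the prediction

def generates_formal_hypothesis(p: CandidatePattern) -> bool:
    """Return True only if all five hypothesis-generation criteria are met."""
    return (p.supporting_studies >= 2
            and p.biologically_plausible
            and p.estimated_odds_ratio > 1.5
            and p.outcome_measurable
            and p.power_calculable)
```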

Pre-Registration Standards

All hypotheses are pre-registered on public platforms (e.g., OSF Registries, ClinicalTrials.gov) before any validation data are examined. Pre-registration includes (see the sketch after this list):

  • Complete hypothesis statement
  • Planned sample and data sources
  • Complete statistical analysis plan
  • Primary and secondary outcomes
  • Planned sensitivity analyses
  • Clear success/failure criteria
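
As a sketch of what such a registration record could contain, the dictionary below mirrors the checklist; every value is a placeholder, not an actual registration entry.

```python
# Illustrative pre-registration record mirroring the checklist above.
# All values are placeholders, not a real registration.
preregistration = {
    "registry": "OSF Registries",  # or ClinicalTrials.gov
    "hypothesis": "<complete hypothesis statement>",
    "sample_and_data_sources": "<planned cohort(s) and data sources>",
    "statistical_analysis_plan": "<complete analysis plan>",
    "primary_outcome": "<primary outcome>",
    "secondary_outcomes": ["<secondary outcome 1>", "<secondary outcome 2>"],
    "sensitivity_analyses": ["<planned sensitivity analysis>"],
    "success_criteria": "<e.g., effect size above a pre-specified threshold>",
    "failure_criteria": "<e.g., effect size below threshold in an adequately powered cohort>",
}
```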

Multiple Testing Correction

Given that CSIA generates multiple hypotheses, we apply appropriate corrections for multiple comparisons (a brief example follows the list):

  • Bonferroni correction for confirmatory analyses
  • False Discovery Rate (FDR) control for exploratory analyses
  • Replication requirement: Key findings must replicate in independent cohorts
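
For concreteness, the snippet below applies both corrections named above to a set of made-up p-values using statsmodels; the numbers are illustrative only.

```python
# Example of the two corrections named above, applied to made-up p-values.
# Requires statsmodels; numbers are illustrative only.
from statsmodels.stats.multitest import multipletests

p_values = [0.001, 0.008, 0.020, 0.041, 0.300]

# Bonferroni: strict family-wise error control, used for confirmatory analyses.
bonf_reject, bonf_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")

# Benjamini-Hochberg: false discovery rate control, used for exploratory screening.
fdr_reject, fdr_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")

print("Bonferroni:", bonf_adjusted.round(3), bonf_reject)
print("FDR (BH):  ", fdr_adjusted.round(3), fdr_reject)
```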

Evidence Grading

We use a modified GRADE framework to communicate the strength of evidence for each finding (a brief encoding sketch follows the list below):

Evidence Quality Levels

  • High: multiple RCTs or large cohort studies with consistent results. Interpretation: very confident that the true effect is close to the estimate.
  • Moderate: well-designed cohort or case-control studies. Interpretation: moderately confident; the true effect is likely close to the estimate.
  • Low: case series or observational data with limitations. Interpretation: limited confidence; the true effect may differ from the estimate.
  • Hypothesis: cross-system pattern identified but not yet validated. Interpretation: a theoretical prediction requiring prospective testing.
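
If findings are tagged programmatically, the evidence tiers above can be encoded as a small enumeration; the sketch below is purely illustrative.

```python
# Purely illustrative: the evidence tiers above encoded as an enumeration,
# e.g., for tagging findings in analysis outputs.
from enum import Enum

class EvidenceLevel(Enum):
    HIGH = "multiple RCTs or large cohort studies with consistent results"
    MODERATE = "well-designed cohort or case-control studies"
    LOW = "case series or observational data with limitations"
    HYPOTHESIS = "cross-system pattern identified but not yet validated"
```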

Example: The TLR9 Convergence Model

To illustrate our methodology, consider how we developed the TLR9 convergence model for 22q11.2DS-associated lupus:

Step 1: Literature Synthesis

We integrated findings from three traditionally separate domains:

  • Immunology: TLR9 mediates autoreactive B-cell activation in lupus (Leadbetter et al., 2002)
  • 22q11.2DS epidemiology: 50-80× elevated lupus risk (Crowley et al., 2018)
  • 22q11.2DS genetics: DGCR8 haploinsufficiency affects miRNA processing (Sellier et al., 2014)

Step 2: Pattern Detection

Cross-system analysis revealed that DGCR8 and TLR9 pathways converge through miRNA regulation of TLR9 expression—a connection not previously emphasized in either the 22q11.2DS or lupus literature.

Step 3: Hypothesis Generation

Pre-Registered Hypothesis
"In patients with 22q11.2DS, elevated TLR9 signaling (measured by proxy markers) will predict development of autoimmune disease with an odds ratio > 3.0 within a 5-year follow-up period."
Status: Pre-registered, awaiting validation cohort
Falsification criterion: an odds ratio below 1.5 in an adequately powered cohort would refute the hypothesis
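
Whether a given cohort is "adequately powered" for this hypothesis can be estimated by simulation. The sketch below assumes a 30% prevalence of the exposure signature and a 10% baseline 5-year risk purely for illustration; these are not estimates from the 22q11.2DS literature, and the real analysis plan would use cohort-specific figures.

```python
# Rough simulation-based power estimate: how often would Fisher's exact test
# detect a true OR of 3.0 for a binary cross-system signature at a given cohort
# size? Exposure prevalence and baseline risk are illustrative assumptions.
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(1)

def simulated_power(n, odds_ratio=3.0, exposure_prev=0.30,
                    baseline_risk=0.10, sims=500, alpha=0.05):
    base_odds = baseline_risk / (1 - baseline_risk)
    exposed_risk = base_odds * odds_ratio / (1 + base_odds * odds_ratio)
    hits = 0
    for _ in range(sims):
        exposed = rng.random(n) < exposure_prev
        disease = rng.random(n) < np.where(exposed, exposed_risk, baseline_risk)
        table = [[int(np.sum(exposed & disease)), int(np.sum(exposed & ~disease))],
                 [int(np.sum(~exposed & disease)), int(np.sum(~exposed & ~disease))]]
        _, p_value = fisher_exact(table)
        hits += p_value < alpha
    return hits / sims

for n in (100, 200, 400):
    print(f"n = {n}: estimated power ~ {simulated_power(n):.2f}")
```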

Step 4: Validation Planning

This hypothesis can be tested in existing 22q11.2DS longitudinal cohorts with stored biosamples. We have specified the analysis plan and are seeking data access collaborations.

Transparency Commitment

Scientific credibility requires transparency. We commit to:

  • Open pre-registration: all hypotheses are publicly registered before validation
  • Methods disclosure: complete description of analytical approaches
  • Negative results: refuted hypotheses are published alongside confirmed ones
  • Data availability: data and code are shared where permissions allow

Limitations We Acknowledge

  • Correlation ≠ causation: Cross-system patterns identify correlations that require mechanistic investigation
  • Literature bias: Our synthesis is limited by what has been published and may reflect publication bias
  • Hypothesis inflation: Pattern detection may generate false positives; hence the requirement for prospective validation
  • Implementation gap: Even validated findings require clinical implementation research before changing practice

Join the Validation Effort

CSIA generates hypotheses. Validation requires data. We seek collaborators with:

  • Longitudinal cohorts: patient populations with multi-system data collected over time
  • Biobank access: stored samples that can be analyzed for cross-system markers
  • Domain expertise: specialists who can evaluate biological plausibility and clinical relevance