Data Sources
Swanson's Apple synthesizes biomedical associations from six curated databases to generate hypotheses. Each source contributes a different type of evidence linking diseases, genes, and compounds.
CTD (NC State University)
Manually curated associations between chemicals, genes/proteins, and diseases drawn from peer-reviewed literature. One of the most comprehensive hand-curated biomedical resources available.
Role: Primary source for both Disease–Gene (B↔A) and Chemical–Gene (C↔A) edges. Forms the backbone of most hypotheses in the engine.
Open Targets (EMBL-EBI, Wellcome Sanger Institute, GSK, Pfizer, Takeda)
Integrates genetic, somatic, and functional genomics evidence linking diseases to molecular targets, drawing on GWAS, rare-variant, and gene-expression data.
Role: Expands Disease–Gene (B↔A) edges with genetic evidence, broadening the set of disease-target links beyond manual curation.
ChEMBL (EMBL-EBI)
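Manually curated bioactivity database for drug-like compounds. Records the highest development phase reached by each of its more than two million compounds in the max_phase field; max_phase=4 marks a compound that has reached approval, which the engine treats as FDA-approved.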
Role: Identifies FDA-approved drugs among compounds in the database. Powers the “FDA Approved” filter on the main results table.
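As a rough illustration of how such a filter could work, the sketch below keeps only compounds whose max_phase equals 4. The record layout, compound names, and the `is_approved` helper are hypothetical; only the `max_phase` field name and the value 4 come from the description above.

```python
# Hypothetical compound records; a real pipeline would load these from ChEMBL.
compounds = [
    {"name": "compound_P", "max_phase": 4},
    {"name": "compound_Q", "max_phase": 2},
    {"name": "compound_R", "max_phase": None},  # phase unknown
]

def is_approved(record):
    """Return True when the record's max_phase is 4 (assumed approval marker)."""
    phase = record.get("max_phase")
    # Treat a missing phase as "not approved" rather than raising an error.
    return phase is not None and float(phase) == 4.0

approved_names = [c["name"] for c in compounds if is_approved(c)]
print(approved_names)
```

Comparing after a `float` conversion is a defensive choice in this sketch, so the check works whether the phase arrives as an integer, a decimal, or a numeric string.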
DGIdb (Washington University in St. Louis)
Aggregates drug-gene interactions from roughly 30 sources including DrugBank, PharmGKB, and ChEMBL, providing a broad view of known pharmacological relationships.
Role: Adds Drug–Gene (C↔A) edges, expanding the set of known drug-target relationships used to generate compound hypotheses.
DrugCentral (University of New Mexico / NIH)
Clinically active drug-target pairs curated from FDA drug labels, the WHO essential medicines list, and the biomedical literature. Emphasizes interactions of established clinical significance.
Role: Adds Drug–Protein (C↔A) edges with high clinical confidence, grounded in FDA-approved indications and active pharmacology.
DISEASES (Novo Nordisk Foundation Center for Protein Research, University of Copenhagen)
Text-mined and curated disease-gene associations derived from the biomedical literature at scale. Applies confidence scoring to filter associations by evidence quality.
Role: Adds Disease–Gene (B↔A) edges from large-scale literature mining, complementing the manually curated sources with broader coverage.
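One way to picture how confidence-filtered, text-mined edges can complement curated ones is sketched below. The rows, the score scale, the `MIN_CONFIDENCE` cutoff, and the `merge_edges` helper are all assumptions made for illustration; the engine's actual threshold and merge logic are not described here.

```python
from collections import defaultdict

# Hypothetical (disease, gene, confidence) rows standing in for text-mined output.
text_mined = [
    ("disease_X", "GENE1", 3.2),
    ("disease_X", "GENE3", 0.8),
    ("disease_Y", "GENE2", 4.1),
]
# Hypothetical manually curated disease-gene edges.
curated = {("disease_X", "GENE1")}

MIN_CONFIDENCE = 2.0  # assumed cutoff, not the engine's real setting

def merge_edges(curated, text_mined, min_confidence):
    """Union curated and sufficiently confident text-mined edges, keeping provenance."""
    provenance = defaultdict(set)
    for edge in curated:
        provenance[edge].add("curated")
    for disease, gene, score in text_mined:
        if score >= min_confidence:
            provenance[(disease, gene)].add("text_mined")
    return dict(provenance)

edges = merge_edges(curated, text_mined, MIN_CONFIDENCE)
for edge, sources in sorted(edges.items()):
    print(edge, sorted(sources))
```

Keeping the set of contributing sources per edge makes it easy to see which links rest on curation, on text mining, or on both.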
How the sources combine
Each hypothesis connects a disease (B) to a compound (C) through a shared gene or protein target (A). The B↔A edges come from CTD, Open Targets, and DISEASES; the C↔A edges come from CTD, DGIdb, and DrugCentral. ChEMBL tags which compounds are FDA-approved. The ABC algorithm surfaces pairs where the B–C connection has little or no existing literature, which is the basis of Swanson-style literature-based discovery.
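The combination step can be sketched as a join over shared targets. This is a minimal illustration under stated assumptions, not the engine's implementation: the edge sets, the `known_pairs` set, and the `abc_hypotheses` function are hypothetical, and "little or no existing literature" is simplified to a binary check against known pairs.

```python
from collections import defaultdict

# Hypothetical toy edges; real edges would come from the sources listed above.
disease_gene = {            # B-A edges
    ("disease_X", "GENE1"),
    ("disease_X", "GENE2"),
    ("disease_Y", "GENE2"),
}
compound_gene = {           # C-A edges
    ("compound_P", "GENE1"),
    ("compound_Q", "GENE2"),
}
# B-C pairs assumed to be already reported in the literature.
known_pairs = {("disease_X", "compound_P")}

def abc_hypotheses(disease_gene, compound_gene, known_pairs):
    """Map each novel (disease, compound) pair to the set of targets they share."""
    compounds_by_gene = defaultdict(set)
    for compound, gene in compound_gene:
        compounds_by_gene[gene].add(compound)

    shared_targets = defaultdict(set)
    for disease, gene in disease_gene:
        for compound in compounds_by_gene.get(gene, ()):
            # Skip pairs whose direct link is already known.
            if (disease, compound) not in known_pairs:
                shared_targets[(disease, compound)].add(gene)
    return dict(shared_targets)

hypotheses = abc_hypotheses(disease_gene, compound_gene, known_pairs)
for pair, genes in sorted(hypotheses.items()):
    print(pair, sorted(genes))
```

In this toy data, disease_X and compound_P share GENE1 but are dropped because their link is already known, while compound_Q surfaces as a candidate for both diseases through GENE2.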