BACKGROUND: A data warehouse is a repository that centralizes and integrates data from disparate systems to provide easy access to consistent, historical data. Integrating disparate source systems into one centralized location can enable rapid identification of more robust research cohorts and support data-driven decision making. The objective of the Miami Cancer Institute (MCI) Oncology Data Warehouse (ODW) is to collect and organize data from clinical records, research, and administrative systems to support information retrieval, business intelligence, and analytics for high-level decision making about oncology patients. The design, architecture, and implementation align with industry best practices, including Data Governance, Enterprise Data Modeling, and Metadata Management.

METHODS: We integrated structured and unstructured data from disparate sources into one centralized, query-optimized data model known as the ODW. The ODW is modeled as a star schema, with fact tables and conformed dimension tables, and expands into a galaxy schema (fact constellation) whose facts and dimensions can snowflake into other data models as needed. Each fact table represents a subject area (e.g., pathology) and is directly related to the conformed dimension tables through surrogate and foreign keys. Conformed dimensions represent the attributes associated with the subject area (e.g., date of encounter). Source data are extracted, transformed, and loaded (ETL) automatically from different databases into a set of tables. The ETL code performs incremental loads at regular, prescribed intervals into two parallel storage areas: a relational database management system (RDBMS) and a Big Data file storage system.
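To make the star-schema and incremental-load ideas concrete, here is a minimal sketch, not the ODW's actual implementation. It uses Python's built-in `sqlite3` module as a stand-in for the RDBMS. All table and column names (`dim_patient`, `dim_date`, `fact_pathology`, `etl_watermark`), the YYYYMMDD date key, and the timestamp-watermark strategy for detecting new rows are hypothetical assumptions chosen for illustration. The sample records are invented.

```python
import sqlite3

# Hypothetical star schema: conformed dimensions with surrogate keys,
# one subject-area fact table, and a watermark table for incremental ETL.
DDL = """
CREATE TABLE dim_patient (
    patient_key INTEGER PRIMARY KEY,   -- surrogate key
    source_mrn  TEXT UNIQUE NOT NULL   -- natural key from the source system
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,     -- surrogate key in YYYYMMDD form
    full_date TEXT UNIQUE NOT NULL
);
CREATE TABLE fact_pathology (
    pathology_key    INTEGER PRIMARY KEY,
    patient_key      INTEGER NOT NULL REFERENCES dim_patient(patient_key),
    date_key         INTEGER NOT NULL REFERENCES dim_date(date_key),
    source_report_id TEXT UNIQUE NOT NULL,
    diagnosis        TEXT
);
CREATE TABLE etl_watermark (
    source_name    TEXT PRIMARY KEY,
    last_loaded_at TEXT NOT NULL       -- newest updated_at already loaded
);
"""


def patient_key_for(conn, mrn):
    """Return the surrogate key for a source MRN, creating the row if needed."""
    conn.execute("INSERT OR IGNORE INTO dim_patient (source_mrn) VALUES (?)", (mrn,))
    return conn.execute(
        "SELECT patient_key FROM dim_patient WHERE source_mrn = ?", (mrn,)
    ).fetchone()[0]


def date_key_for(conn, iso_date):
    """Return a YYYYMMDD surrogate key for an ISO date, creating the row if needed."""
    key = int(iso_date.replace("-", ""))
    conn.execute(
        "INSERT OR IGNORE INTO dim_date (date_key, full_date) VALUES (?, ?)",
        (key, iso_date),
    )
    return key


def incremental_load(conn, source_rows, source_name="pathology"):
    """Load only rows whose updated_at is newer than the stored watermark.

    Timestamps are compared as strings, which is valid only because every
    value is assumed to use the same ISO-8601 format.
    """
    row = conn.execute(
        "SELECT last_loaded_at FROM etl_watermark WHERE source_name = ?",
        (source_name,),
    ).fetchone()
    watermark = row[0] if row else ""
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]
    for r in new_rows:
        # Upsert keyed on the source report ID so a revised report updates in place.
        conn.execute(
            """INSERT INTO fact_pathology
                   (patient_key, date_key, source_report_id, diagnosis)
               VALUES (?, ?, ?, ?)
               ON CONFLICT(source_report_id) DO UPDATE SET
                   patient_key = excluded.patient_key,
                   date_key = excluded.date_key,
                   diagnosis = excluded.diagnosis""",
            (patient_key_for(conn, r["mrn"]), date_key_for(conn, r["report_date"]),
             r["report_id"], r["diagnosis"]),
        )
    if new_rows:
        conn.execute(
            "INSERT OR REPLACE INTO etl_watermark VALUES (?, ?)",
            (source_name, max(r["updated_at"] for r in new_rows)),
        )
    conn.commit()
    return len(new_rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    batch = [  # invented sample records
        {"report_id": "R1", "mrn": "M001", "report_date": "2019-03-15",
         "diagnosis": "adenocarcinoma", "updated_at": "2019-03-15T10:00:00"},
        {"report_id": "R2", "mrn": "M002", "report_date": "2019-03-16",
         "diagnosis": "benign", "updated_at": "2019-03-16T09:00:00"},
    ]
    print(incremental_load(conn, batch))  # first run loads both rows
    print(incremental_load(conn, batch))  # rerun finds nothing newer
    # Star join: each fact row resolved through its dimension tables.
    for mrn, day, dx in conn.execute(
        """SELECT p.source_mrn, d.full_date, f.diagnosis
           FROM fact_pathology f
           JOIN dim_patient p ON p.patient_key = f.patient_key
           JOIN dim_date d ON d.date_key = f.date_key
           ORDER BY d.full_date"""
    ):
        print(mrn, day, dx)
```

A high-water-mark timestamp is only one way to detect changed source rows. Change data capture and source-side audit logs are common alternatives, and the abstract does not state which approach the ODW uses.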

RESULTS: An interdisciplinary team of physicians, engineers, scientists, and subject matter experts at the Miami Cancer Institute of Baptist Health South Florida has designed, developed, and implemented the ODW with information originating from multiple data sources, including Electronic Medical Record (EMR) systems, Financial Systems, Clinical Trial Management Systems, Tumor Registries, Biospecimen Repositories, Pathology synoptic reports and archives, and Next Generation Sequencing services. Structurally, it is a subject-oriented, integrated collection of data leveraging conformed dimensions. The ODW can connect to most business intelligence (e.g., Tableau) and statistical (e.g., SAS) tools for automated or static report development.

CONCLUSION: The growing ODW enables physicians, clinical management teams, and medical analysts to systematically mine and review patients' molecular, genomic, and associated clinical or administrative information, and to identify patterns that may influence treatment decisions and potential outcomes. By implementing an innovative combination of technology tools and methods, we organized enterprise information about oncology patients that can be used for clinical decision support and precision medicine use cases.

Presented At:

14th Annual BHSF Research Conference