Data Science

Our Data Science team employs innovative approaches for obtaining, storing, searching, and analyzing data. We use Natural Language Processing, Machine Learning, Data Mining, Geo-spatial Analysis, and Predictive Analytics to extract knowledge from clinical or research data and present it in an intuitive, user-friendly fashion. Our distributed Big Data infrastructure allows us to expand our projects regardless of the volume, variety, and complexity of the data. We provide instant search and data access on structured and unstructured documents, with smart features such as auto-suggestion, typo correction, and result highlighting.

Text Processing

We develop Natural Language Processing algorithms to parse and analyze blocks of text from medical records or other text reports, extract information, and transform it into a structured form for further use. We design Information Retrieval applications that provide instant, advanced, and customizable querying and ranking capabilities on full-text documents.

Our team designs cross-platform, web-based querying systems that make searching and analyzing data easier and faster. The flexible user interface, designed by our Custom Application Development team, lets users see results instantly as they type and change options, and save and export results in multiple formats. We also offer multilingual full-text search with spellchecking, autocomplete, highlighting, and advanced customizable ranking capabilities.

We also apply computational linguistic techniques from Natural Language Processing to extract clinical information from notes and reports. Such information is critical for clinical decision-making but is often not readily available in a discrete format for analysis.


  • Applying advanced ranking algorithms to identify and promote valuable keywords and boost important fields
  • Developing all-in-one querying systems that enable retrieval of both structured and unstructured data in one system
  • Providing a high-quality semantic search experience by expanding queries with synonyms, hyponyms, and ontology-related concepts, correcting typos, auto-suggesting keywords, and identifying negations
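
As a rough illustration of the query expansion, typo correction, and negation detection listed above, here is a minimal Python sketch. It is not our production implementation: the vocabulary, synonym table, negation cues, and function names are all hypothetical, and the negation check is a simple window-based heuristic rather than a full clinical NLP method.

```python
import difflib
import re

# Hypothetical vocabulary, synonym table, and negation cues (illustration only).
VOCABULARY = ["fracture", "asthma", "fever", "seizure"]
SYNONYMS = {"fracture": ["break", "broken bone"], "fever": ["pyrexia"]}
NEGATION_CUES = ("no", "denies", "without", "negative for")


def correct_typo(term, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary term, or the lowercased term if none is close."""
    term = term.lower()
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else term


def expand_query(term):
    """Correct the term, then append any synonyms known for the corrected form."""
    corrected = correct_typo(term)
    return [corrected] + SYNONYMS.get(corrected, [])


def is_negated(sentence, term, window=3):
    """True if a negation cue occurs within `window` words before the term."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if term not in words:
        return False
    idx = words.index(term)
    preceding = " ".join(words[max(0, idx - window):idx])
    return any(
        re.search(rf"\b{re.escape(cue)}\b", preceding) for cue in NEGATION_CUES
    )


print(expand_query("fracure"))
print(is_negated("Patient denies fever or cough", "fever"))
```

A real system would draw synonyms and hyponyms from a curated ontology rather than a hand-written dictionary.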

Case Studies

NEISS database search engine: The National Electronic Injury Surveillance System (NEISS) collects data on consumer product-related injuries occurring in the United States. We designed a query system that searches more than 7 million injury records from approximately 100 hospitals in less than a second. Users can apply multiple complex Boolean queries, look for keywords and phrases in clinical notes, and save their queries for later reuse.
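
To give a sense of how Boolean keyword queries over free-text records can work, the following Python sketch builds a small inverted index and combines AND, OR, and NOT conditions. It is a simplified illustration only: the sample records are invented, the function names are hypothetical, and this is not the engine used for the NEISS system.

```python
import re
from collections import defaultdict


def build_index(records):
    """Map each lowercase token to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(record_id)
    return index


def search(index, all_ids, must=(), any_of=(), must_not=()):
    """Return ids matching every `must` term, at least one `any_of` term
    (if any are given), and none of the `must_not` terms."""
    result = set(all_ids)
    for term in must:
        result &= index.get(term, set())
    if any_of:
        result &= set().union(*(index.get(t, set()) for t in any_of))
    for term in must_not:
        result -= index.get(term, set())
    return result


# Invented sample records, for illustration only.
records = {
    1: "Fell from bicycle, fractured wrist",
    2: "Burn from stove while cooking",
    3: "Fell from ladder, bruised knee",
}
index = build_index(records)
print(search(index, records, must=["fell"], must_not=["ladder"]))
```

Because each term lookup is a set operation on precomputed postings, query time depends on the number of matching records rather than on the total size of the text.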

Discovery Informatics Gateway: DIG, an open-source search engine, offers a Google-like experience for querying unstructured data from Electronic Medical Records. Its easy-to-use, interactive user interface can be used by researchers to perform cohort identification, as well as by clinicians to find individuals who meet certain criteria.

I2B2: Nationwide Children’s Hospital established an i2b2 (Informatics for Integrating Biology & the Bedside) application for sleep disorder cohort identification using natural language processing techniques. Discrete data were collected from semi-structured sleep study reports by syntactically and semantically parsing sentences containing diagnosis, procedure, and medication information. The system proved more efficient than traditional manual chart review and enabled search capabilities that were previously not possible.

We leverage Machine Learning algorithms to search for patterns in data and to train automated classifiers for various data mining, prediction, and decision support tasks.

Data Mining & Predictive Analytics

We employ state-of-the-art techniques and theories drawn from Artificial Intelligence and statistics to extract knowledge from data and automate tasks. Our systems are integrated with knowledge systems, machine learning systems, and high-performance computing systems to optimize their efficiency. We use statistical analysis and well-known evaluation metrics to measure and demonstrate the efficiency of our systems.


  • Train systems to automate tasks that would otherwise be done by people
  • Extract meaning and semantics from text
  • Collect, model, and analyze your data to support decision making
  • Discover patterns in large data sets
  • Convert data into a cleaner, more readable format
  • Visualize information
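
As an example of the well-known evaluation metrics mentioned above, the sketch below computes precision, recall, and F1 score for a binary classifier in plain Python. The labels are invented for illustration, and the function name is hypothetical; the specific metrics used on any given project may differ.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Invented labels: 1 = record belongs to the cohort, 0 = it does not.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision reports how many flagged records were correct, while recall reports how many true cases were found; F1 balances the two.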

Case Study

Race/ethnicity identification: At Nationwide Children’s Hospital, we provide care and services tailored to our patients' needs. We developed a system that identifies patients' race/ethnicity and preferred spoken language, which enables us to provide translators at patient visits and to be aware of potential risks and diagnoses for patients.

We integrate geographic data and provide geo-spatial analysis to gain better insights, especially into research problems with geo-spatial components.

Geographic Information Systems

We offer spatial data integration services to geocode your data so that it can be analyzed spatially and plotted on a digital map for decision-making. Using Esri’s GIS technologies, we develop spatial data analytical approaches to optimize health and human services. With Esri maps and spatial analysis, you can prioritize spending, optimize service locations, and identify vulnerable populations. Our goal is to help you achieve better outcomes for patients, stakeholders, and the public.


  • Understand the impact of working and living environments on personal health risks.
  • Communicate through maps and raise awareness of environmental and social hazards.
  • Explore disease and hazard data to test hypotheses about causes and outcomes.
  • Track the spread of diseases and plan steps to combat and contain them.
  • Manage risk factors, hazards, and demographic information through a holistic spatial view.
  • Plan health service locations to make them most accessible.
  • Analyze clusters of illness among your patients to determine potential environmental causes or spread patterns.
  • Optimize the routing and resource allocation for emergency events.
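
Once data has been geocoded, even simple distance calculations can support questions such as which service location is most accessible to a patient. The Python sketch below uses the haversine great-circle formula to find the nearest site. It is an illustration under stated assumptions, not part of any Esri product: the coordinates and site names are invented, the function names are hypothetical, and straight-line distance is only a rough stand-in for real travel time.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_site(patient, sites):
    """Return the name of the site closest to the patient's (lat, lon)."""
    return min(sites, key=lambda name: haversine_km(*patient, *sites[name]))


# Invented coordinates, for illustration only.
sites = {"Clinic A": (40.1, -83.0), "Clinic B": (41.0, -83.0)}
patient = (40.0, -83.0)
print(nearest_site(patient, sites))
```

A production analysis would typically use road-network travel times and a GIS platform rather than straight-line distance.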