Big Data

RISI has deployed an institutional Hadoop cluster to provide an infrastructure for big data solutions in storage and analytics. We design algorithms and develop scalable and reliable applications to store and analyze large amounts of structured and unstructured data using distributed computing.

We store and analyze large volumes of structured, unstructured, and genomic data on distributed, fault-tolerant, highly available systems. Our team uses machine learning, natural language processing, and data mining algorithms to analyze big datasets in a timely fashion and provides access to the results in the cloud. We support real-time streaming of information from high-throughput systems by storing and analyzing the data in a distributed environment. Our technology allows data to be stored in SQL, NoSQL, graph, and schema-free formats on distributed file systems.


  • Amazon Web Services (AWS) setup
  • Storing large amounts of information on fault-tolerant, highly available systems in tabular, graph, or NoSQL databases
  • Designing algorithms to extract and analyze information in big databases
  • Storing and analyzing high-volume streaming data in real time
  • Developing and refining algorithms to automate research data workflows

Case Studies

Big data storage and access management (for example, genomic and clinical data): we provide solutions to easily store and manage terabytes of data. High-throughput instruments generate large volumes of data. Solutions currently under evaluation include cloud-based storage, distributed data management on a Hadoop-like system, and traditional high-performance computing.

Genome Archiving Communication System: Researchers at Nationwide Children’s Hospital complete a first-of-its-kind project to evaluate a large-scale genomic data management system on the scale of up to one million genomes.