SODAR Introduction

SODAR stands for the System for Omics Data Access and Retrieval. It is a centralized system for providing access to raw data, results, and metadata in omics research projects. SODAR is aimed at researchers and project owners who want to manage their experiment data according to the FAIR principles: findable, accessible, interoperable, and reusable.

SODAR is developed by the Core Unit Bioinformatics at the Berlin Institute of Health.

SODAR aims to provide the following capabilities for managing omics research data:

  • Project-based access control and data encapsulation

  • Modeling and management of study design metadata

  • Large-scale data storage

  • Linking files to metadata

  • Management and validation of file uploads

  • Tools for aiding project data management

  • Integrating data with third-party tools

SODAR can be accessed either through a web-based graphical user interface or programmatically through REST APIs.
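For programmatic use, a client sends HTTP requests that carry an access token. The sketch below only builds such a request with the Python standard library and does not send it. The server URL, the endpoint path `project/api/list` and the `Authorization: token <token>` header scheme are assumptions for illustration; check the SODAR API documentation for the real endpoints, authentication scheme and any API version header.

```python
import urllib.request
from urllib.parse import urljoin

# Hypothetical values for illustration only: replace with your own SODAR
# server URL, a real endpoint path from the API docs and your own token.
SODAR_URL = "https://sodar.example.org/"
API_PATH = "project/api/list"  # assumed endpoint path
TOKEN = "your-access-token"


def build_api_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request.

    Assumes token authentication via an "Authorization: token <token>"
    header; the exact scheme is an assumption of this sketch.
    """
    url = urljoin(base_url, path)
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


req = build_api_request(SODAR_URL, API_PATH, TOKEN)
print(req.full_url)  # https://sodar.example.org/project/api/list
# To actually send the request: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to inspect headers before any token leaves the machine.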

Data Workflow

The SODAR data workflow involves the following elements:

Projects and Categories

In SODAR, data and user access are structured into projects, which exist under categories. Projects contain study modeling, file data and other project-related functionality. A category can be thought of as a project that holds no data itself but can contain other categories or projects.
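The category and project hierarchy is a tree in which only categories may have children. A minimal sketch of that rule, using hypothetical class and field names (this is not SODAR's actual data model):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A category or project in a hypothetical project tree."""

    title: str
    kind: str  # "CATEGORY" or "PROJECT" (labels assumed for this sketch)
    children: List["Node"] = field(default_factory=list, repr=False)
    # Excluded from repr/eq to avoid recursing through parent <-> child links.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add_child(self, child: "Node") -> "Node":
        # Only categories may contain other categories or projects.
        if self.kind != "CATEGORY":
            raise ValueError(f"{self.title!r} is a project and cannot contain children")
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> str:
        # Walk up the parent links to build a breadcrumb-style path.
        parts = []
        node: Optional[Node] = self
        while node is not None:
            parts.append(node.title)
            node = node.parent
        return " / ".join(reversed(parts))


root = Node("Cancer Research", "CATEGORY")
sub = root.add_child(Node("Lymphoma", "CATEGORY"))
proj = sub.add_child(Node("RNA-seq Study", "PROJECT"))
print(proj.path())  # Cancer Research / Lymphoma / RNA-seq Study
```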

Sample Sheets

Sample sheets contain the sample, process and material metadata for project studies. They are modeled according to the ISA model standard as investigations, studies and assays. One SODAR project can contain one investigation with one or more studies.
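The containment rules above (one investigation per project, one or more studies per investigation, assays under studies) can be expressed as a small data structure. This is an illustrative sketch with hypothetical class names; the file names follow the ISA-Tab convention of `s_*.txt` study files and `a_*.txt` assay files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    file_name: str  # ISA-Tab assay file, e.g. "a_*.txt"


@dataclass
class Study:
    file_name: str  # ISA-Tab study file, e.g. "s_*.txt"
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The investigation of a SODAR project holds one or more studies.
        if not self.studies:
            raise ValueError("an investigation must contain at least one study")


@dataclass
class Project:
    title: str
    investigation: Investigation  # exactly one investigation per project


project = Project(
    "Example project",
    Investigation(
        "Example investigation",
        [Study("s_example.txt", [Assay("a_example_rnaseq.txt")])],
    ),
)
print(len(project.investigation.studies))  # 1
```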

Large Scale Data Storage

The actual sample data files for studies and assays are stored in a distributed file system built on iRODS. These files can be anything from binary alignment map (BAM) files to reports and log files. The files can be accessed through the relevant sample sheet metadata in SODAR.
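iRODS clients such as the iCommands read their connection settings from an `irods_environment.json` file, by default located in `~/.irods/`. SODAR can generate this file automatically; the sketch below only illustrates the kind of content involved. It includes just the core connection keys, and the host, zone and user values are hypothetical. A real deployment may need further settings such as the authentication scheme or SSL options.

```python
import json


def make_irods_environment(host: str, zone: str, user: str, port: int = 1247) -> str:
    """Return minimal irods_environment.json content as a JSON string.

    1247 is the default iRODS port; other values here are placeholders.
    """
    env = {
        "irods_host": host,
        "irods_port": port,
        "irods_user_name": user,
        "irods_zone_name": zone,
    }
    return json.dumps(env, indent=2)


print(make_irods_environment("irods.example.org", "exampleZone", "jdoe"))
```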

Landing Zones

Uploading new sample data is done through landing zones, which are temporary user-specific file areas with write access. Once uploads are prepared, SODAR validates the files and moves them into the read-only sample data repository.
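The validate-then-move step can be pictured as a transaction: either every file reaches the repository or none does. SODAR performs this on iRODS; the sketch below is only a local-filesystem analogue with a hypothetical function name and a placeholder validation rule (rejecting empty files). It rolls back already-moved files if any move fails.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def move_landing_zone(zone: Path, repo: Path) -> List[Path]:
    """Validate all files in `zone`, then move them into `repo` atomically.

    If a move fails, files already moved are returned to the zone so the
    repository never receives a partial upload.
    """
    files = sorted(p for p in zone.rglob("*") if p.is_file())
    # Validation step (placeholder rule for this sketch): no empty files.
    for f in files:
        if f.stat().st_size == 0:
            raise ValueError(f"validation failed: {f.name} is empty")

    moved = []  # (source, destination) pairs, kept for rollback
    try:
        for f in files:
            dest = repo / f.relative_to(zone)
            if dest.exists():
                raise FileExistsError(f"{dest} already exists in the repository")
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(dest))
            moved.append((f, dest))
    except Exception:
        # Roll back: return already-moved files to the landing zone.
        for src, dest in reversed(moved):
            shutil.move(str(dest), str(src))
        raise
    # Mark moved files read-only, mirroring the read-only repository.
    for _, dest in moved:
        dest.chmod(0o444)
    return [dest for _, dest in moved]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        zone, repo = Path(tmp, "zone"), Path(tmp, "repo")
        (zone / "sample1").mkdir(parents=True)
        (zone / "sample1" / "reads.bam").write_bytes(b"example")
        repo.mkdir()
        print(move_landing_zone(zone, repo))
```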

Notable Features

  • Accessibility
    • User access via one or multiple LDAP/AD services, Single Sign-On via SAML and/or local accounts

    • Access tokens can be generated for REST API use

    • UUIDs and permanent URLs for all relevant objects in the system

  • iRODS Integration
    • Automated iRODS environment file generation

    • WebDAV for mounting iRODS as a network drive and web-based browsing of files, supporting random file access

  • Sample Sheets
    • Sample sheet import from existing ISA-Tab TSV files

    • Sample sheet generation from templates

    • Sample sheet editing and version control

    • Sample sheet export in ISA-Tab and Excel formats

    • Automated iRODS shortcut generation for BAM/CRAM/VCF files

    • Automated Integrative Genomics Viewer (IGV) session file generation and merging

    • Track hub management for UCSC Genome Browser integration

  • Landing Zones
    • Automated validation of file uploads

    • Transactions with rollback for file transfers, preventing invalid or incomplete data from being entered into projects

    • Automated generation of expected iRODS collections for standardized data structures

  • Other Features
    • Searching for sources, samples and files in iRODS

    • Timeline application for enhanced event logging and providing audit trails
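Automated upload validation can include checking each file against a checksum. The sketch below assumes, for illustration, that every data file is uploaded together with a `<filename>.md5` companion file holding its MD5 hex digest (either bare or in `md5sum` output format). The function names are hypothetical; consult the SODAR manual for the conventions its landing zone validation actually enforces.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    # Stream the file in chunks so large data files are not read into memory at once.
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_checksums(zone: Path) -> List[str]:
    """Return a list of problems; an empty list means all files passed."""
    problems = []
    for f in sorted(zone.rglob("*")):
        if not f.is_file() or f.suffix == ".md5":
            continue
        md5_file = f.with_name(f.name + ".md5")
        if not md5_file.exists():
            problems.append(f"{f.name}: missing checksum file")
            continue
        # Take the first token, accepting both a bare digest and
        # md5sum-style "<digest>  <filename>" lines.
        tokens = md5_file.read_text().split()
        if not tokens:
            problems.append(f"{f.name}: empty checksum file")
        elif tokens[0].lower() != md5_of(f):
            problems.append(f"{f.name}: checksum mismatch")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        zone = Path(tmp)
        (zone / "reads.bam").write_bytes(b"example data")
        (zone / "reads.bam.md5").write_text(md5_of(zone / "reads.bam") + "  reads.bam\n")
        print(validate_checksums(zone))  # → []
```

Returning a list of problems rather than raising on the first failure lets a user fix every invalid upload in one pass.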