Programmatic Access

Overview

There are several ways to access the metadata and data on the 4DN data portal. The most straightforward and intuitive ways to do so are via the search and browse functionality on the web site.

The web site also facilitates downloading files individually or in bulk from data sets of interest.

However, as you become more familiar with the data sets available from the portal, you may want to access metadata and data programmatically. Additionally, while we provide workbooks and easy-to-use tools for data submission, some advanced users may wish to submit data programmatically.

The sections below explain some of the options available.

DCIC utils

For Python users, the 4DN-DCIC develops and maintains dcicutils, a package of utilities with useful functions for accessing 4DN metadata from the portal. You can install it with: pip install dcicutils

To get started with this package, and for information on useful functions for accessing metadata, see the documentation here.

If you've already installed the package and want to see examples of how to use it, look here.
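As a rough illustration, the sketch below wraps two dcicutils calls for fetching a single item and running a search. It assumes that `ff_utils.get_metadata` and `ff_utils.search_metadata` accept a `key` dictionary with `key`, `secret`, and `server` entries (the same fields as the keypairs.json entry described under Downloading Files); the wrapper names `get_item` and `find_items` and the search query are hypothetical, so check the dcicutils documentation for the exact signatures.

```python
"""Sketch: fetch 4DN portal metadata through dcicutils (assumed call signatures)."""
from typing import Any, Dict, List


def get_item(identifier: str, key: Dict[str, str]) -> Dict[str, Any]:
    """Return metadata for one item, e.g. a file accession such as '4DNFIIA7E3HL'.

    `key` is a dict with 'key', 'secret' and 'server' entries (assumed format).
    """
    # Imported lazily so this module can be loaded without dcicutils installed.
    from dcicutils import ff_utils

    return ff_utils.get_metadata(identifier, key=key)


def find_items(query: str, key: Dict[str, str]) -> List[Dict[str, Any]]:
    """Run a portal search, e.g. 'search/?type=FileProcessed' (hypothetical query)."""
    from dcicutils import ff_utils

    return ff_utils.search_metadata(query, key=key)
```

With `key` set to your own access key, `get_item("4DNFIIA7E3HL", key)` should return that file's metadata as a dictionary.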

If you don't use Python, see the REST API section below.

REST API

Downloading Files

File download from the 4DN data portal now requires authentication, even if the file is public. Anyone can create an account, including people outside the 4DN Network; for more information, see the Account Creation page. After logging in, you can download files from the web portal as usual; to download files from the command line, follow the process described below.

First, create an access key if you don't already have one. Note that access keys created for JupyterHub can't be used for the rest of the portal.

To create a new access key, log in to the data portal, click on your account in the upper right, and select Profile from the dropdown menu. Near the bottom of the page there is a button to add an access key. Save the key and secret; typically this is done by creating a file called keypairs.json in your home directory with the following contents (replacing the x’s with your key and secret):

{
    "default": {
        "key": "XXXXXXXX",
        "secret": "xxxxxxxxxxxxxxxx",
        "server": "https://data.4dnucleome.org"
    }
}
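If you script against the portal, a small helper can read this file instead of hard-coding credentials. The sketch below is hypothetical: the name `load_keypair`, the choice to fail loudly on missing fields, and the trailing-slash cleanup are assumptions, and it expects exactly the file layout shown above.

```python
"""Sketch: read an access key entry from keypairs.json (layout shown above)."""
import json
from pathlib import Path
from typing import Dict, Optional, Union

REQUIRED_FIELDS = ("key", "secret", "server")


def load_keypair(
    path: Optional[Union[str, Path]] = None, profile: str = "default"
) -> Dict[str, str]:
    """Return the {'key', 'secret', 'server'} entry stored under `profile`."""
    if path is None:
        path = Path.home() / "keypairs.json"  # location suggested above
    with open(path, encoding="utf-8") as fh:
        entries = json.load(fh)
    if profile not in entries:
        raise KeyError(f"no '{profile}' entry in {path}")
    entry = entries[profile]
    missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
    if missing:
        raise ValueError(f"'{profile}' entry is missing: {', '.join(missing)}")
    # Strip a trailing slash so paths can be appended as server + "/...".
    return {
        "key": entry["key"],
        "secret": entry["secret"],
        "server": entry["server"].rstrip("/"),
    }
```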

Once the access key is created and stored, a file can be downloaded with curl (-L follows redirects; -O saves the file under its remote name):

curl -O -L --user <key>:<secret> <download-url>
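The same download can be scripted without curl. The sketch below uses only Python's standard library, and the function names are hypothetical: it sends the key and secret as HTTP Basic credentials (what `--user` does) and lets urllib follow redirects (what `-L` does). The credentials are attached with `add_unredirected_header`, so they are not forwarded to redirect targets; curl likewise withholds them from other hosts unless `--location-trusted` is given. Whether a given download URL redirects is an assumption here.

```python
"""Sketch: download a portal file with HTTP Basic auth (standard library only)."""
import base64
import shutil
import urllib.request
from pathlib import Path
from typing import Union


def build_download_request(url: str, key: str, secret: str) -> urllib.request.Request:
    """Create a GET request carrying the access key as Basic credentials."""
    token = base64.b64encode(f"{key}:{secret}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    # Unredirected headers go on this request only, not on followed redirects.
    request.add_unredirected_header("Authorization", f"Basic {token}")
    return request


def download_file(url: str, key: str, secret: str, dest: Union[str, Path]) -> Path:
    """Stream `url` to `dest`; urllib follows HTTP redirects automatically."""
    dest = Path(dest)
    request = build_download_request(url, key, secret)
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all in memory
    return dest
```

Copying in chunks matters here because processed files can be several gigabytes.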

For more information on file downloads, see this page.

DRS API

The data portal implements version 1.0 of the Data Repository Service (DRS) API.

This standard was developed and adopted by the Global Alliance for Genomics and Health (GA4GH), and its implementation has been encouraged by the Common Fund Data Ecosystem.

To summarize the specification, the DRS API provides a generic interface to data repositories so that data consumers, including workflow systems, can access data in a single, standard way regardless of where it’s stored and how it’s managed.

DRS requests use 4DN file accessions as unique identifiers and can be made with either of the following URI formats:

drs://data.4dnucleome.org/files-processed/4DNFIIA7E3HL/@@drs

drs://data.4dnucleome.org/ga4gh/drs/v1/objects/4DNFIIA7E3HL
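Both URI forms carry the file accession, so a client can recover it and build an HTTPS request URL. The sketch below assumes that the `/ga4gh/drs/v1/objects/{id}` path (the standard DRS v1 route, also used in the access example further down this page) is served over HTTPS on the same host; the function names are hypothetical.

```python
"""Sketch: turn either 4DN DRS URI form into an HTTPS DRS object URL."""
from urllib.parse import urlsplit


def accession_from_drs_uri(uri: str) -> str:
    """Return the accession from a drs://<host>/... URI in either form shown above."""
    parts = urlsplit(uri)
    if parts.scheme != "drs":
        raise ValueError(f"not a drs:// URI: {uri}")
    # Drop empty segments and the '@@drs' view name; the accession is then last.
    segments = [s for s in parts.path.split("/") if s and s != "@@drs"]
    if not segments:
        raise ValueError(f"no object id in {uri}")
    return segments[-1]


def drs_object_url(uri: str) -> str:
    """Map a drs:// URI to https://<host>/ga4gh/drs/v1/objects/<accession>."""
    host = urlsplit(uri).netloc
    return f"https://{host}/ga4gh/drs/v1/objects/{accession_from_drs_uri(uri)}"
```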

Either request returns a JSON object that follows the API specification:

    {
        "id": "/files-processed/4DNFIIA7E3HL/",
        "created_time": "2022-02-25T20:40:38.976850+00:00",
        "drs_id": "4DNFIIA7E3HL",
        "self_uri": "drs://data.4dnucleome.org/files-processed/4DNFIIA7E3HL/@@drs",
        "access_methods": [
            {
                "access_url": {
                    "url": "https://data.4dnucleome.org/4DNFIIA7E3HL/@@download"
                },
                "type": "https"
            },
            {
                "access_url": {
                    "url": "http://data.4dnucleome.org/4DNFIIA7E3HL/@@download"
                },
                "type": "http"
            }
        ],
        "description": "ATAC-seq signal fold change",
        "size": 2345462672,
        "aliases": [
            "4692b7c1-addf-47aa-b4cb-e30dc65a38f6"
        ],
        "checksums": [
            {
                "checksum": "73a2635fe8382a28339370a22b84de48",
                "type": "md5"
            }
        ],
        "version": "73a2635fe8382a28339370a22b84de48",
        "updated_time": "2022-05-03T21:45:53.756034+00:00"
    }
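This response contains what a client needs to fetch and check a file: the `access_methods` list and an md5 entry under `checksums`. The sketch below, with hypothetical helper names, picks the https access URL and verifies a downloaded copy against the listed md5. It assumes the field names match the example response.

```python
"""Sketch: pick a download URL from a DRS object and verify the file's md5."""
import hashlib
from pathlib import Path
from typing import Any, Dict, Union


def pick_access_url(drs_object: Dict[str, Any], preferred: str = "https") -> str:
    """Return the access_url of the first method whose type is `preferred`."""
    for method in drs_object.get("access_methods", []):
        if method.get("type") == preferred and "access_url" in method:
            return method["access_url"]["url"]
    raise LookupError(f"no '{preferred}' access method in {drs_object.get('id')}")


def expected_md5(drs_object: Dict[str, Any]) -> str:
    """Return the md5 value listed under 'checksums'."""
    for entry in drs_object.get("checksums", []):
        if entry.get("type") == "md5":
            return entry["checksum"]
    raise LookupError("no md5 checksum in DRS object")


def md5_matches(path: Union[str, Path], expected: str, chunk_size: int = 1 << 20) -> bool:
    """Hash the file in 1 MiB chunks (files can be several GB) and compare."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.lower()
```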

A request that specifies the access method returns a URL that can be used to download the data, e.g.:

https://data.4dnucleome.org/ga4gh/drs/v1/objects/4DNFIIA7E3HL/access/https

Returns:

    {
        "url": "https://data.4dnucleome.org/4DNFIIA7E3HL/@@download"
    }

Access to the data via the access_url respects the authentication and authorization settings for that data; therefore, attempts to download data to which you do not have access will fail.
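The access request above can be scripted as well. The sketch below calls the `/access/{access_id}` endpoint with the access key sent as HTTP Basic credentials, and it converts a 401 or 403 response into a `PermissionError` so that denied access is reported clearly. That the DRS endpoints accept the same credentials as file downloads is an assumption, as are the function names and the injectable `opener` parameter (included only so the logic can be tested without network access).

```python
"""Sketch: resolve a DRS access URL with Basic auth, reporting denied access clearly."""
import base64
import json
import urllib.error
import urllib.request
from typing import Any, Optional


def drs_access_endpoint(accession: str, access_id: str = "https",
                        host: str = "data.4dnucleome.org") -> str:
    """Build https://<host>/ga4gh/drs/v1/objects/<accession>/access/<access_id>."""
    return f"https://{host}/ga4gh/drs/v1/objects/{accession}/access/{access_id}"


def resolve_access_url(accession: str, key: str, secret: str,
                       access_id: str = "https",
                       opener: Optional[Any] = None) -> str:
    """Return the 'url' field of the access response; raise PermissionError on 401/403."""
    token = base64.b64encode(f"{key}:{secret}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(drs_access_endpoint(accession, access_id))
    request.add_unredirected_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    opener = opener or urllib.request.build_opener()
    try:
        with opener.open(request) as response:
            payload = json.load(response)
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            raise PermissionError(f"access to {accession} denied (HTTP {err.code})") from err
        raise
    return payload["url"]
```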