April 12th, 2021 at 6:18pm
Downloading Released Files from AWS Open Data
As part of the AWS Open Data program, the 4DN-DCIC hosts publicly released data in an open S3 bucket. Some basic information on the data and access provided by 4DN via this program can be found here. Accessing data that resides in this bucket via the mechanisms described in the previous sections works as expected and should be transparent to you. However, if you do not wish to create an account on the 4DN data portal but still want to download specific data files along with informative metadata, you can use the following mechanism.
Follow the guide to selecting files and generate a metadata.tsv file for your files of interest, as described in the section above.
If there is a value in the open_data_url column for a file you have selected, then that file lives in the open data bucket and can be directly accessed using that URL:
e.g. curl -O <open_data_url>
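To fetch several files at once, the lookup in the open_data_url column can be scripted. The following is a minimal Python sketch, not an official 4DN tool. It assumes that metadata.tsv is tab-separated, that any lines starting with # are comments, and that the first remaining line is the header row. The helper names and the file_accession column in the sample are hypothetical.

```python
import csv
import os
import urllib.request
from urllib.parse import urlsplit


def open_data_urls(tsv_text, column="open_data_url"):
    """Return the non-empty values found in `column` of a tab-separated table."""
    # Assumption: '#' lines are comments and the first remaining line is the header.
    lines = [
        line for line in tsv_text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    reader = csv.DictReader(lines, delimiter="\t")
    # Rows with an empty or missing value are not in the open data bucket.
    return [
        row[column].strip()
        for row in reader
        if (row.get(column) or "").strip()
    ]


def download(url, dest_dir="."):
    """Save `url` under its own file name, similar to `curl -O`."""
    filename = os.path.basename(urlsplit(url).path)
    destination = os.path.join(dest_dir, filename)
    urllib.request.urlretrieve(url, destination)
    return destination


# Illustrative table; the comment line and second row are made up.
sample = (
    "# hypothetical comment line\n"
    "file_accession\topen_data_url\n"
    "4DNFIFISE78E\thttps://4dn-open-data-public.s3.amazonaws.com/"
    "fourfront-webprod/files/34067f2a-0586-44a8-adf8-d4336db309c5/"
    "4DNFIFISE78E.fastq.gz\n"
    "HYPOTHETICAL1\t\n"
)

urls = open_data_urls(sample)
```

With a real file, you would read metadata.tsv into a string, pass it to open_data_urls, and call download on each returned URL. If the header in your file uses a different column name, pass it as the column argument.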
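If you wish to use the AWS command line tools instead, you will need to convert the open data URL into an s3:// location: the bucket name is the part of the hostname before .s3.amazonaws.com, and the path after the hostname stays unchanged.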
For example:
https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/files/34067f2a-0586-44a8-adf8-d4336db309c5/4DNFIFISE78E.fastq.gz
must be converted to:
s3://4dn-open-data-public/fourfront-webprod/files/34067f2a-0586-44a8-adf8-d4336db309c5/4DNFIFISE78E.fastq.gz
to be accessed with the CLI.
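The conversion can be automated when many files are involved. The sketch below is one possible way to do it in Python. It assumes every open data URL has the form https://<bucket>.s3.amazonaws.com/<key>, as in the example above, and the function name is hypothetical.

```python
from urllib.parse import urlsplit

# Hostname suffix used by the URL form shown in the example above.
S3_HOST_SUFFIX = ".s3.amazonaws.com"


def https_to_s3_uri(url):
    """Convert https://<bucket>.s3.amazonaws.com/<key> to s3://<bucket>/<key>."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.endswith(S3_HOST_SUFFIX):
        raise ValueError(f"not a recognised S3 URL: {url}")
    bucket = host[: -len(S3_HOST_SUFFIX)]
    if not bucket or not parts.path.strip("/"):
        raise ValueError(f"missing bucket or key in: {url}")
    # parts.path already starts with '/', so it follows the bucket name directly.
    return f"s3://{bucket}{parts.path}"


s3_uri = https_to_s3_uri(
    "https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/files/"
    "34067f2a-0586-44a8-adf8-d4336db309c5/4DNFIFISE78E.fastq.gz"
)
```

The resulting value can be passed to a command such as aws s3 cp <s3_uri> . and, because the bucket is public, the AWS CLI's --no-sign-request option should allow the copy without AWS credentials. URLs in other S3 styles (for example, with a region in the hostname) are rejected rather than guessed at.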