Data Access
Table of Contents
- Data access guide
- Globus based transfer
- Imperial College Research Data Store based transfer
- Illumina Basespace Sequence Hub based file transfer (Discontinued)
- Data access via iRODS server (Discontinued)
- Access QC report pages
- List of resources
- Change logs
Data access guide
Currently, Globus is our preferred mode of data transfer.
Update: We stopped using the iRODS server for data distribution in October 2022.
See our new data access guide: Accessing data from IGF
Globus based transfer
Imperial College’s Research Data Store (RDS) is now linked to Globus, which allows you to:
- Transfer large volumes of data between the RDS, your personal computer and Globus-accessible storage at other institutions
- Share RDS project allocation data with selected third parties, without requiring them to have a College account (Globus identity required)
See our new slides on Globus transfer: globus data transfer
Requirements
For users from Imperial College London
We require your College username (e.g., username@ic.ac.uk) for this mode of data sharing. Please note that data sharing will fail if you provide an alternative username (e.g., the following will not work: your.fullname@imperial.ac.uk or user.name@ic.ac.uk). For more details, please see Imperial College’s guidelines for Globus data transfer: Transferring data to other sites with Globus
For users without Imperial College account
Please send us the email address linked to your Globus account.
Globus transfer process
- We will create a new Globus collection after receiving the user’s request and copy existing data to it.
- The new collection will be shared with the user’s Globus account (i.e., email address or username). We can add more than one user to the same Globus collection.
- Users need to follow the Globus or Imperial College documentation to transfer the data to their preferred storage location.
- Files from any new sequencing run or analysis will be added to the existing Globus collection directory.
- Users need to transfer each new batch of data separately after receiving an email notification from us.
- Each file in the Globus collection directory will be removed 30 days after its creation.
- We may remove old Globus collections without prior notice once all files have been removed from the collection directory.
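The 30-day retention rule above can be worked through with simple date arithmetic, which helps when planning when to pull each batch. This is a sketch under stated assumptions: the file names and dates are hypothetical, and it assumes a file disappears exactly 30 days after creation, whereas the facility's actual cleanup job may run on its own schedule.

```python
from datetime import datetime, timedelta

# Retention period from the transfer process above: files are removed 30 days after creation.
RETENTION = timedelta(days=30)


def removal_date(created: datetime) -> datetime:
    """Return the date on which a file created at `created` is due for removal."""
    return created + RETENTION


def files_due_for_removal(files: dict[str, datetime], now: datetime) -> list[str]:
    """Return the sorted names of files whose 30-day window has passed at `now`."""
    return sorted(name for name, created in files.items() if now >= removal_date(created))


if __name__ == "__main__":
    # Hypothetical file names and creation dates, for illustration only.
    files = {
        "run1/sample_A_R1.fastq.gz": datetime(2024, 1, 1),
        "run2/sample_B_R1.fastq.gz": datetime(2024, 1, 20),
    }
    now = datetime(2024, 2, 5)
    for name, created in sorted(files.items()):
        print(name, "removed on", removal_date(created).date())
    print("already removed:", files_due_for_removal(files, now))
```

In this example the first file (created 1 January) is due on 31 January and is already gone by 5 February, while the second remains available until 19 February.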
Imperial College Research Data Store based transfer
Please note: We can only use RDS transfer at the very end of the project, once we have data from all sequencing runs and analysis pipelines.
Imperial College now offers a new central service for storing large volumes of research data. Please follow these steps to set up a new storage volume for your sequencing project:
Step 1: Read the Research Data Store documentation (and wiki page) and set up a new allocation for your project. We also have a few slides on setting up an RDS project allocation.
Step 2: Once the allocation is available, add Imperial BRC Genomics Facility (username: igf) as a new member of the RDS project.
Step 3: Let IGF know the path of your new RDS storage on the HPC.
Step 4: We will copy the data to the top level of the storage using the layout RDS_PATH/live/PROJECT_NAME.
Step 5 (IMPORTANT): Remove the IGF user from the RDS allocation once all sequencing runs are finished and the data transfer is complete.
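The RDS_PATH/live/PROJECT_NAME layout from Step 4 tells you where to look for your data once the copy is done. The sketch below just composes that path; the function name, the example allocation path and the project name are hypothetical placeholders, not values from IGF.

```python
from pathlib import PurePosixPath


def rds_destination(rds_path: str, project_name: str) -> PurePosixPath:
    """Build RDS_PATH/live/PROJECT_NAME, the layout described in Step 4."""
    # Reject empty names or names containing "/" so the result stays one level under live/.
    if not project_name or "/" in project_name:
        raise ValueError(f"invalid project name: {project_name!r}")
    return PurePosixPath(rds_path) / "live" / project_name


if __name__ == "__main__":
    # Hypothetical allocation path and project name, for illustration only.
    print(rds_destination("/rds/general/project/example_alloc", "EXAMPLE_PROJECT"))
```

PurePosixPath is used because it only manipulates the path string and never touches the file system, so the sketch runs anywhere.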
How to remove IGF user from the RDS allocation
Follow these steps for removing IGF user from the RDS allocation:
- Log in to the RCS self-service portal using your Imperial College credentials
- Click Research Data Storage projects in the left panel
- Click the correct “rds-xyz” ID to open the “Membership” info for the selected RDS project
- Check whether you have admin privileges for this project (i.e., whether the Admin? column shows yes)
- Go to the row for user ‘igf’, tick the checkbox in the Remove? column, and click the Update button at the bottom of the page
Illumina Basespace Sequence Hub based file transfer (Discontinued)
Please note: We can only transfer fastq files via BaseSpace.
Fastq files from the sequencing runs can be uploaded to Illumina BaseSpace Sequence Hub on request. The following information is required for this mode of data transfer:
- Your BaseSpace account email (an existing account or a new free basic subscription account)
- Confirmation regarding the sample consent type
BaseSpace configuration:
- We use the BaseSpace London region for data upload and sharing
- API server: https://api.euw2.sh.basespace.illumina.com
- Data can be downloaded via the BaseSpace CLI or a web browser
Data access via iRODS server (Discontinued)
A local installation of an iRODS server is used for data handover to users. A copy of the data is kept on this server only for a limited time and is automatically removed after the data transfer deadline. Access to this server is restricted by Imperial College’s firewall: users can access it only when connected to the College network (directly or via VPN).
Command line file transfer
Steps for setting up iRODS client in HPC CX1
Please follow these steps to set up the iRODS client on the HPC for the first time:
- Create the directory .irods under your home directory (e.g., mkdir -p ~/.irods)
- Create the iRODS environment file ~/.irods/irods_environment.json
- Copy one of the following configurations into this file (replace YOUR_IGF_USERNAME or YOUR_HPC_USERNAME with your actual username) and validate the file format using JSONLint
Authentication Type: Standard
If the authentication type is Standard, use your IGF login password to set up your iRODS account on the HPC. You should receive the account credentials in a separate email from IGF.
{
"irods_host": "eliot.med.ic.ac.uk",
"irods_port":1247,
"irods_default_resource": "woolfResc",
"irods_user_name": "YOUR_IGF_USERNAME",
"irods_zone_name": "igfZone"
}
Authentication Type: PAM
If the authentication type is PAM, use your Imperial login credentials to set up your iRODS account on the HPC.
{
"irods_host": "eliot.med.ic.ac.uk",
"irods_port":1247,
"irods_default_resource": "woolfResc",
"irods_user_name": "YOUR_HPC_USERNAME",
"irods_zone_name": "igfZone",
"irods_ssl_ca_certificate_file": "/apps/irods/certs/igf-chain.pem",
"irods_ssl_ca_certificate_path": "/apps/irods/certs",
"irods_ssl_verify_server": "cert",
"irods_authentication_scheme": "PAM"
}
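The setup steps above (create ~/.irods, write irods_environment.json, validate it) can also be scripted. This is an illustrative sketch only: the settings are copied from the two configurations above, Python's json module stands in for JSONLint as the syntax check, and the function names are hypothetical. It refuses to overwrite an existing environment file unless asked to.

```python
import json
import tempfile
from pathlib import Path

# Settings shared by the Standard and PAM configurations above.
BASE_SETTINGS = {
    "irods_host": "eliot.med.ic.ac.uk",
    "irods_port": 1247,
    "irods_default_resource": "woolfResc",
    "irods_zone_name": "igfZone",
}
# Extra settings that only the PAM configuration uses.
PAM_SETTINGS = {
    "irods_ssl_ca_certificate_file": "/apps/irods/certs/igf-chain.pem",
    "irods_ssl_ca_certificate_path": "/apps/irods/certs",
    "irods_ssl_verify_server": "cert",
    "irods_authentication_scheme": "PAM",
}
REQUIRED_KEYS = set(BASE_SETTINGS) | {"irods_user_name"}


def write_irods_env(home: Path, username: str, pam: bool = False, overwrite: bool = False) -> Path:
    """Create HOME/.irods/irods_environment.json (equivalent to mkdir -p ~/.irods plus the file)."""
    irods_dir = home / ".irods"
    irods_dir.mkdir(parents=True, exist_ok=True)
    env_file = irods_dir / "irods_environment.json"
    if env_file.exists() and not overwrite:
        raise FileExistsError(f"{env_file} already exists")
    settings = {**BASE_SETTINGS, "irods_user_name": username}
    if pam:
        settings.update(PAM_SETTINGS)
    env_file.write_text(json.dumps(settings, indent=2) + "\n")
    return env_file


def validate_irods_env(env_file: Path) -> dict:
    """Parse the file and check required keys; invalid JSON raises json.JSONDecodeError (a ValueError)."""
    settings = json.loads(env_file.read_text())
    if not isinstance(settings, dict):
        raise ValueError("top-level JSON value must be an object")
    missing = REQUIRED_KEYS - set(settings)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return settings


if __name__ == "__main__":
    # Demo in a throwaway directory; pass Path.home() instead to target your real home directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_irods_env(Path(tmp), "YOUR_HPC_USERNAME", pam=True)
        print(validate_irods_env(path)["irods_authentication_scheme"])
```

Writing the file with json.dumps guarantees valid syntax, so the validation step mainly matters for files edited by hand.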
Steps for command line transfer in HPC CX1
Step 1: Load the iRODS module (e.g., module load irods/4.2.0)
Step 2: Set up your iRODS account using the command iinit and enter your password
Step 3: Download data using the command-line tool iget (e.g., iget -Pr /igfZone/home/USERNAME/PROJECT_NAME/PATH; -P shows progress and -r downloads recursively)
Step 3.1: Download fastq data: iget -Pr /igfZone/home/USERNAME/PROJECT_NAME/fastq
Step 3.2: Download analysis data: iget -Pr /igfZone/home/USERNAME/PROJECT_NAME/analysis
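Steps 3.1 and 3.2 run the same iget command on two sub-collections, so they can be wrapped in a small helper. This sketch assumes iget is on PATH (after module load) and that iinit has already been run; the helper names are hypothetical, and it defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

ZONE_HOME = "/igfZone/home"  # collection root used in the commands above


def iget_command(username: str, project: str, subdir: str) -> list[str]:
    """Build `iget -Pr` for one project sub-collection (-P: show progress, -r: recursive)."""
    return ["iget", "-Pr", f"{ZONE_HOME}/{username}/{project}/{subdir}"]


def download(username: str, project: str, subdirs=("fastq", "analysis"), dry_run: bool = True) -> list[list[str]]:
    """Print each command; unless dry_run, also execute it, stopping at the first failure."""
    cmds = [iget_command(username, project, s) for s in subdirs]
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if iget exits with a non-zero status.
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    download("USERNAME", "PROJECT_NAME")  # placeholders, as in the steps above
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems if a path ever contains spaces.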
Access QC report pages
QC report pages for the raw and analysed data files are available on our FTP site at URLs of the form http://eliot.med.ic.ac.uk/report/project/PROJECTNAME. Use the same login credentials to access these pages. For more details, please see the Project QC Report section of the help page.
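The URL format above can be filled in programmatically, for example when generating links for several projects. A minimal sketch; the helper name is hypothetical, and percent-encoding is a defensive assumption in case a project name contains characters that are not URL-safe.

```python
from urllib.parse import quote

# Base URL taken from the format above; PROJECTNAME is replaced by your project name.
REPORT_BASE = "http://eliot.med.ic.ac.uk/report/project/"


def report_url(project_name: str) -> str:
    """Return the QC report URL for a project, percent-encoding any unsafe characters."""
    if not project_name:
        raise ValueError("project name must not be empty")
    return REPORT_BASE + quote(project_name, safe="")


if __name__ == "__main__":
    print(report_url("PROJECTNAME"))  # placeholder name, as in the format above
```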
You can access these pages from a mobile device if you are connected to the Imperial-WPA Wi-Fi network.
List of resources
Change logs
- None