Copying Data to the Nodes

If you have a job that reads the same data file many times, or makes many “random” accesses to a data file, it may be more efficient to keep that data locally on a node than to compete with other users for access to the file server.

Each node has almost 1TB of space mounted on /tmp. This /tmp space is local to each node.

So you could copy your data to the node, access it locally, and then, when your job is done, copy the results back to your home directory.

Note: If what your program does is read a file strictly sequentially just once, this copy is unlikely to help.

There are a few options for doing the copy.

1) Do it directly in your script.

cp /home/username/mydata.fastq /tmp
... Run your process on the data in /tmp ...
rm /tmp/mydata.fastq

(In practice, you would use mktemp to get a unique name and avoid clashes.)
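
As a rough illustration of the mktemp suggestion, the sketch below copies a file into a uniquely named directory under /tmp, runs a command on the local copy, and removes the directory afterwards. The function name stage_and_run and the stage.XXXXXX prefix are made up for this example; adapt them to your own job script.

```shell
# stage_and_run SRC CMD [ARGS...]
# Copies SRC into a unique directory under /tmp, runs CMD with the path of
# the local copy appended as its last argument, then removes the directory.
# (Hypothetical helper, not something provided by the cluster.)
stage_and_run() {
    local src="$1" workdir rc
    shift
    # mktemp -d creates a directory with a unique name, so two jobs never clash.
    workdir="$(mktemp -d /tmp/stage.XXXXXX)" || return 1
    if ! cp "$src" "$workdir/"; then
        rm -rf "$workdir"
        return 1
    fi
    "$@" "$workdir/$(basename "$src")"
    rc=$?
    # Clean up the local copy whether or not the command succeeded.
    rm -rf "$workdir"
    return "$rc"
}

# Example (wc -l stands in for your real program):
#   stage_and_run /home/username/mydata.fastq wc -l
```

Note that if the job is killed before the function finishes, the directory stays in /tmp and has to be removed by hand.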

Be careful if you have multiple copies of your script running on a node: you could be copying the data multiple times.
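
One possible way to avoid repeated copies is to let the jobs on a node share a single copy and serialize the copy step with a lock. The sketch below assumes the flock command from util-linux is available on the nodes; the function name stage_once and the shared directory are hypothetical.

```shell
# stage_once SRC DEST_DIR
# Copies SRC into DEST_DIR unless a copy is already there, and prints the
# path of the local copy. A lock file makes concurrent callers wait for the
# first copy to finish instead of copying again.
# (Hypothetical helper; assumes flock from util-linux is installed.)
stage_once() {
    local src="$1" dest_dir="$2" dest
    dest="$dest_dir/$(basename "$src")"
    mkdir -p "$dest_dir" || return 1
    (
        flock 9 || exit 1
        if [ ! -e "$dest" ]; then
            # Copy under a temporary name, then rename, so that a visible
            # file is always a complete one.
            cp "$src" "$dest.partial" && mv "$dest.partial" "$dest"
        fi
    ) 9>"$dest.lock" || return 1
    echo "$dest"
}

# Example (the shared directory name is an assumption):
#   local_copy="$(stage_once /home/username/mydata.fastq /tmp/username_shared)"
```

With a shared copy, cleanup needs care: removing the file while another of your jobs is still reading it would break that job.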

2) Use Secure Copy (SCP) or Secure FTP (SFTP)

For a detailed explanation, refer to How to use SCP and SFTP to securely transfer files.

  • Copy files with SCP - To copy the local file filename to the directory /tmp on the remote server at 192.168.1.3:
scp filename username@192.168.1.3:/tmp/
  • Copy files with SFTP - To transfer the local file /etc/filename to /tmp on the remote server:
$ sftp username@192.168.1.3
sftp> put /etc/filename /tmp/
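
If you need to upload several files without typing each put by hand, sftp can read its commands from a batch file via the -b option. The sketch below only generates such a batch file; the function name write_sftp_batch and the file names are made up, and batch mode requires non-interactive authentication (for example SSH keys), which is assumed to be set up already.

```shell
# write_sftp_batch BATCH_FILE REMOTE_DIR FILE...
# Writes one "put" line per local file into BATCH_FILE.
# Paths containing spaces are not handled in this sketch.
# (Hypothetical helper for illustration only.)
write_sftp_batch() {
    local batch_file="$1" remote_dir="$2" f
    shift 2
    : >"$batch_file" || return 1
    for f in "$@"; do
        printf 'put %s %s\n' "$f" "$remote_dir" >>"$batch_file"
    done
}

# Example (hypothetical file names):
#   write_sftp_batch upload.batch /tmp/ sample1.fastq sample2.fastq
#   sftp -b upload.batch username@192.168.1.3
```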

3) Use Globus

Use Globus to transfer files through a graphical interface or to transfer very large files.

Globus is a web-based file transfer application that allows resilient, unattended file transfers between two Globus endpoints. Start the transfer and Globus ensures it completes successfully and sends an email when the transfer is done. Globus may be preferable to SCP or SFTP when transferring very large files because it does so unattended, in the background, with status checking and fault tolerance.

- How to use Globus in the BRC cluster

There are two ways to use the “Globus Connect Personal” client in the BRC cluster. The steps below explain the text mode version. This requires both your web browser and a Unix terminal connected to the BRC cluster.

1) Load the latest Globus module

module load globuspersonal/3.2.2

2) Set up the client

globusconnectpersonal -setup

The program will print a URL that you need to copy and paste into the browser on your personal computer. In your browser, follow the instructions to log in and authenticate with your Globus account.

At some point, it will show a page with an authorization code. Copy the code from the web browser and paste it into the Linux SSH terminal window at the prompt 'Enter the auth code: '.

It will then ask for a name for your new Globus endpoint; the prompt says “Input a value for the Endpoint Name: ”.

You can choose any name that makes sense when referring to the BRC cluster. We recommend entering: BRC Cluster.

The program will then exit and return to the Linux command line.

3) Using the endpoint to transfer files

At this point you can start the client at any time by running:

module load globuspersonal/3.2.2
globusconnectpersonal -start

While the client is running on the BRC cluster, you can access and transfer your files from the Globus web interface by searching for the endpoint in the search bar. You can type BRC Cluster, or navigate until you see “Your Collections” and choose the BRC Cluster endpoint.

When transferring large amounts of data, it is better to keep the Globus Connect client running in the background by executing:

nohup globusconnectpersonal -start &
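
A variant of the command above that also keeps the client's output in a log file might look like the sketch below. The function name gcp_start_background and the default log path are made up for this example; the -status and -stop options mentioned in the comments are part of Globus Connect Personal for Linux as far as we know, but check them against the version installed on the cluster.

```shell
# gcp_start_background [LOGFILE]
# Starts the Globus Connect Personal client in the background, sends its
# output to LOGFILE, and prints the process ID of the client.
# (Hypothetical wrapper; the default log path is an assumption.)
gcp_start_background() {
    local logfile="${1:-$HOME/globusconnectpersonal.log}"
    if ! command -v globusconnectpersonal >/dev/null 2>&1; then
        echo "globusconnectpersonal not found; load the module first" >&2
        return 1
    fi
    nohup globusconnectpersonal -start >"$logfile" 2>&1 &
    echo "$!"
}

# Example:
#   module load globuspersonal/3.2.2
#   gcp_start_background
#   globusconnectpersonal -status   # check whether the client is connected
#   globusconnectpersonal -stop     # stop the client when transfers are done
```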

Remember that the setup only needs to be done once. After that, you can simply start the client.

Please refer to the resources below to learn how to transfer files using Globus.

copying_data_to_the_nodes.txt · Last modified: 2024/10/17 13:33 by yrchemut