Understanding Tier-3 storage

The Tier-3 offers several kinds of storage, and it is important to understand how to use each of them to best advantage.

User home directories /t3home/${USER}

/t3home/${USER}: This is your home directory. It is relatively small (10 GB), but it is backed up daily. Use it for your code, important documents, configuration files, etc. Do not use it for I/O-intensive operations: this file system is shared by all users on all nodes and can easily get overloaded by I/O requests, which typically results in delays (e.g. an ls may block for a few seconds before you get the output).

Checking quota

[feichtinger@t3ui01 ~]$ quota -s -f /t3home
Disk quotas for user feichtinger (uid 3896):
     Filesystem   space   quota   limit   grace   files   quota   limit   grace
t3nfs:/t3/home1  66444K  10240M  11264M            5137   4295m   4295m

Accessing snapshots: You have access to a number of snapshots of the file system. Each snapshot preserves the contents of the file system at the point in time when it was taken. You can list the snapshots by listing the hidden .snapshot directory in your home directory.

[feichtinger@t3ui01 ~]$ ls /t3home/${USER}/.snapshot
daily.2020-10-26_0010  daily.2020-10-27_0010  monthly.2020-10-01_0010  weekly.2020-10-25_0015
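
The snapshot names above encode their date, so a plain sorted listing puts the newest daily.* snapshot last. The following is a minimal bash sketch for copying a file back from the newest daily snapshot. It assumes that each snapshot mirrors the layout of your home directory (so .snapshot/<name>/<relative path> holds the old version of <relative path>); the function name restore_latest_daily is hypothetical and not a Tier-3 tool.

```bash
# Hypothetical helper: copy a file back from the newest daily.* snapshot.
# Assumes snapshot names sort chronologically (daily.YYYY-MM-DD_HHMM)
# and that each snapshot mirrors the layout of the home directory.
restore_latest_daily() {
    local snapdir=$1    # e.g. /t3home/${USER}/.snapshot
    local relpath=$2    # path relative to your home directory
    local dest=$3       # where to put the restored copy
    local latest
    # Glob results are sorted, so the last match is the newest snapshot.
    latest=$(printf '%s\n' "$snapdir"/daily.* | tail -n 1)
    if [[ ! -e "$latest/$relpath" ]]; then
        echo "not found in ${latest##*/}: $relpath" >&2
        return 1
    fi
    # -p keeps the original timestamps and permissions.
    cp -p -- "$latest/$relpath" "$dest"
}
```

For example, restore_latest_daily /t3home/${USER}/.snapshot analysis/config.yaml ~/config.yaml.restored (the file names are placeholders) restores the copy next to the current file instead of overwriting it, so you can compare both before deciding.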

User working area /work/${USER}

The Tier-3 offers a larger storage area for each user (several hundred GB) under the /work directory. This area provides snapshots and is backed up nightly. Even though it is implemented on a powerful storage server with SSDs, you should not use it for very intensive I/O. For intensive read I/O, use the distributed storage of the SE.

To check your current usage and the total available space, use the following command. It displays a file with a usage summary for all users.

[feichtinger@t3ui01 ~]$ cat /work/USER_QUOTAS
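
Since the file covers all users, it can help to filter for your own entry. This is a small bash sketch that assumes each user's entry is on a line containing the username as a separate word; the exact layout of USER_QUOTAS is not described here, and the function name my_work_usage is hypothetical.

```bash
# Hypothetical helper: show only your own line of the /work usage summary.
# Assumption: each user's entry is on a line containing the username.
my_work_usage() {
    local quota_file=${1:-/work/USER_QUOTAS}
    # -w matches whole words only, so "alice" does not match "alicex".
    grep -w -- "$USER" "$quota_file"
}
```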

You can list the available snapshots by explicitly accessing the hidden .zfs pseudo directory within the /work area.

[feichtinger@t3ui01 feichtinger]$ ls -a /work/.zfs/snapshot
.                      snap-20210201-215535        snap-daily-20210310-063001
..                     snap-daily-20210306-063001  snap-daily-20210311-063001
base-copy              snap-daily-20210307-063001  snap-daily-20210312-063001
snap-2021-01-26T18:59  snap-daily-20210308-063001
snap-2021-01-29T23:41  snap-daily-20210309-063001

Note that the .zfs directory is not a normal directory, and some GUI clients may not correctly display it. Use a terminal session in order to work with it.
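
To find out which snapshots still hold a given file, you can loop over the entries of .zfs/snapshot from a terminal. The following is a minimal bash sketch, assuming (as is usual for ZFS) that each snapshot directory mirrors the layout of /work, so that /work/.zfs/snapshot/<snapshot>/${USER}/<path> is the snapshotted copy of /work/${USER}/<path>. The function name list_work_versions is hypothetical, and stat -c is GNU coreutils syntax.

```bash
# Hypothetical helper: print every snapshot that contains a given file,
# with the file's size in bytes and its modification time in that snapshot.
# Assumes each snapshot mirrors the /work directory layout.
list_work_versions() {
    local snaproot=$1   # e.g. /work/.zfs/snapshot
    local relpath=$2    # path relative to /work, e.g. ${USER}/results/out.root
    local snap
    # The trailing slash restricts the glob to directories; hidden
    # entries such as "." and ".." are not matched.
    for snap in "$snaproot"/*/; do
        snap=${snap%/}
        if [[ -f "$snap/$relpath" ]]; then
            printf '%s\t%s\t%s\n' "${snap##*/}" \
                "$(stat -c %s "$snap/$relpath")" \
                "$(stat -c %y "$snap/$relpath")"
        fi
    done
}
```

Running list_work_versions /work/.zfs/snapshot ${USER}/results/out.root (a placeholder path) lets you pick the snapshot whose size and date match the version you want, and then copy it back with cp.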

The storage element (SE) user area /pnfs/psi.ch/cms/trivcat/store/user/${USER}

The SE is the main large-scale storage available to you. It can be accessed via different protocols, which are available through a number of tools. The SE allows you to transfer large files between sites, and it also provides efficient file access for analysis jobs. The test-dCacheProtocols script available on the Tier-3 tests your access through these protocols (refer to the account setup page).

You can access the SE directly through the NFS 4.1 protocol, like a normal local file system, under the path /pnfs/psi.ch/cms/trivcat/store/user/${USER}. On the user interface nodes the file system is mounted in read/write mode, so you can copy files into your area and create new ones. This is what you want to use if your jobs on the worker nodes should read, e.g., numpy datasets directly from the SE with native Python. On the worker nodes, however, the file system can only be accessed in read-only mode through NFS. If you want to write files from the worker nodes, create them in /scratch and then copy the whole files to the SE using gfal-copy or xrdcp.
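
The choice between NFS and xrdcp described above can be wrapped in a small bash sketch. It assumes that a non-writable SE directory means you are on a node with the read-only NFS mount (e.g. a worker node) and then falls back to xrdcp via the in-cluster xroot access point root://t3dcachedb03.psi.ch:1094 given further below. The function name se_put is hypothetical.

```bash
# Hypothetical helper: put a finished local file into your SE user area.
# On user interface nodes the NFS mount is read/write, so plain cp works;
# where it is not writable (e.g. read-only NFS on worker nodes), fall back
# to xrdcp through the in-cluster xroot access point.
se_put() {
    local src=$1       # local file, e.g. in /scratch/${USER}
    local relpath=$2   # target path relative to your SE user area
    local nfs_area=/pnfs/psi.ch/cms/trivcat/store/user/${USER}
    local xroot=root://t3dcachedb03.psi.ch:1094
    if [[ -w "$nfs_area" ]]; then
        cp -- "$src" "$nfs_area/$relpath"
    else
        xrdcp "$src" "$xroot$nfs_area/$relpath"
    fi
}
```

In both branches the file is written once as a whole, which fits the SE's restriction that files cannot be modified after creation.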

Do not run commands like du or find on this file system. The area is very large and contains millions of files; such commands can take hours and put a heavy load on the file system.

NOTE: even though it feels like a normal file system that can be reached via a standard path, the underlying storage is not a standard file system and is not fully POSIX compliant. One of the most pronounced differences is that files are immutable, i.e. you cannot modify a file or append to it once it has been created. You can, however, delete a file and then create a new one with the same name. So, for example, you will not be able to edit a file in place with a text editor on that file system. These limitations are not relevant for typical analysis use cases, where you copy whole files and have them read by your jobs.
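
The delete-and-recreate pattern can be sketched in bash as follows: edit a local copy (e.g. in /tmp), then remove the file on the SE and write the edited copy under the same name. This is a minimal sketch for the read/write NFS mount on the user interface nodes; the function name replace_se_file is hypothetical.

```bash
# Hypothetical helper: "update" a file on the SE NFS mount.
# Files there cannot be modified in place, so the old file is deleted
# first and the new content is written as a fresh file with the same name.
replace_se_file() {
    local newfile=$1   # edited local copy, e.g. in /tmp or /scratch
    local target=$2    # file on the SE NFS mount
    rm -f -- "$target" || return 1
    cp -- "$newfile" "$target"
}
```

A typical sequence is to copy the file to /tmp, edit it there, and then call replace_se_file with the edited copy and the original SE path. Note that the replacement is not atomic: between rm and cp the file briefly does not exist, so avoid doing this while jobs are reading it.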

A note on the available access protocols

WAN transfers (from outside the Tier-3 network): xroot, WebDAV (davs). These protocols and the associated shell tools like gfal-*, xrdcp, and xrdfs are useful for copying whole files between sites. The old protocols gsiftp and SRM are being decommissioned at almost all sites; do not use them anymore. The Tier-3's xrootd access point reachable from the outside is root://t3se01.psi.ch:1094. Don't use this external xroot access point for local transfers within the Tier-3, since outside connections are subject to different policies. Examples of transfer commands:
```
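# Download a file from the Tier-3 SE via the external xroot access point
xrdcp root://t3se01.psi.ch:1094/store/user/feichtinger/source /tmp/targetfile
# Upload a local file to the Tier-3 SE via WebDAV over HTTPS (davs)
gfal-copy /tmp/srcfile davs://t3se01.psi.ch:2880/store/temp/user/MYUSER/tgtfile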
```
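
When fetching several files from outside the Tier-3, the single xrdcp command above can be put in a loop. This bash sketch assumes the /store/... path form used in the external example; it copies each whole file into a local directory, continues past failed transfers, and fails overall if any transfer failed. The function name fetch_from_t3 is hypothetical.

```bash
# Hypothetical helper: download several files from the Tier-3 SE
# through the external xroot access point (for use outside the Tier-3).
fetch_from_t3() {
    local destdir=$1; shift
    local endpoint=root://t3se01.psi.ch:1094
    local path failed=0
    mkdir -p -- "$destdir" || return 1
    for path in "$@"; do
        # Copy each whole file; keep going if one transfer fails.
        if ! xrdcp "$endpoint$path" "$destdir/${path##*/}"; then
            echo "transfer failed: $path" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every transfer succeeded.
    (( failed == 0 ))
}
```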
LAN transfers (from nodes inside the Tier-3): NFS, xroot. These protocols allow efficient random access to the files (don't use the ancient dcap protocol anymore). To use xroot, your application needs to support this protocol. If it does not, e.g. if you want to analyze numpy files with native Python, use NFS, which is conceptually the easiest to use, since it mostly looks and behaves like a normal mounted file system where you can use cp. ATTENTION: The xroot access point you should use within the cluster is root://t3dcachedb03.psi.ch:1094. Examples for xrootd and NFS access:
```
# Copy a file from the SE via the in-cluster xroot access point
xrdcp root://t3dcachedb03.psi.ch:1094/pnfs/psi.ch/cms/trivcat/store/user/feichtinger/source /tmp/targetfile

# Copy the same file through the NFS mount
cp /pnfs/psi.ch/cms/trivcat/store/user/feichtinger/source /tmp/targetfile
```
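
Because the external and internal access points use different path prefixes, a small mapping function can avoid mixing them up. The bash sketch below is inferred from the examples in this section (external: root://t3se01.psi.ch:1094/store/...; internal xroot and NFS: /pnfs/psi.ch/cms/trivcat/store/...); the function name t3_se_location and the mode names lan, wan and nfs are hypothetical.

```bash
# Hypothetical helper: turn a /store/... path into the location to use
# for a given access mode, following the endpoints shown above.
t3_se_location() {
    local mode=$1 path=$2   # path like /store/user/${USER}/file.root
    case "$mode" in
        lan) echo "root://t3dcachedb03.psi.ch:1094/pnfs/psi.ch/cms/trivcat$path" ;;
        wan) echo "root://t3se01.psi.ch:1094$path" ;;
        nfs) echo "/pnfs/psi.ch/cms/trivcat$path" ;;
        *)   echo "unknown mode: $mode (use lan, wan or nfs)" >&2; return 1 ;;
    esac
}
```

For example, inside the cluster xrdcp "$(t3_se_location lan /store/user/${USER}/file.root)" /scratch/${USER}/file.root (file.root is a placeholder) uses the in-cluster access point, while the same call with wan would only be appropriate from outside the Tier-3.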

Node local /scratch file system /scratch/${USER}

Each node, whether worker node or user interface, has a /scratch area. This is where you should perform tasks requiring intensive I/O operations. Your batch jobs should produce files in this area on the local node and only at the end of the job move the whole file to its final destination, e.g. the SE.
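
The scratch-then-stage-out pattern for a batch job can be sketched in bash as below. This is an illustration under stated assumptions, not an official job template: run_payload is a hypothetical placeholder for your own program (it must write the file named by its argument), the output is copied with xrdcp to a full root:// URL on the in-cluster access point, and the per-job directory is removed at the end whether or not the job succeeded.

```bash
# Hypothetical batch-job skeleton: do all heavy I/O in node-local /scratch
# and move only the finished file to the SE at the very end.
run_job_with_scratch() {
    local scratch_base=$1   # normally /scratch/${USER}
    local se_target=$2      # root:// URL of the output file on the SE
    local workdir rc
    mkdir -p -- "$scratch_base" || return 1
    workdir=$(mktemp -d "$scratch_base/job.XXXXXX") || return 1
    # All intensive I/O happens on the node-local disk.
    ( cd "$workdir" && run_payload output.root )
    rc=$?
    if (( rc == 0 )); then
        # Transfer the finished file to its final target in one go.
        xrdcp "$workdir/output.root" "$se_target"
        rc=$?
    fi
    # Remove the per-job directory so it does not fill the node's /scratch.
    rm -rf -- "$workdir"
    return "$rc"
}
```

On a worker node you would define run_payload and then call run_job_with_scratch /scratch/${USER} root://t3dcachedb03.psi.ch:1094/pnfs/psi.ch/cms/trivcat/store/user/${USER}/output.root. Using a fresh mktemp directory per job keeps concurrent jobs on the same node from overwriting each other's files.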

Backup policies

/t3home: Files are backed up daily and are available through snapshots
/work: Files are backed up nightly. /work resides on a RAIDZ1 (similar to RAID 5) storage configuration
/pnfs (Storage element): Files are not backed up, but they reside on high-quality storage with some redundancy.

Recovering files from snapshots for /t3home and /work is described in this article.

Watch out!

THERE ARE NO BACKUPS of a node's /scratch areas!