Practice 4 - Data Management on Grid
Date: 06.03.2013
Deadline: 24.03.2013
- Review the lecture slides: PDF
- Exercise 4.1: Grid Data experiment
References for documentation
The referenced documents and web sites contain supplementary information for the practice.
Glossary
- Grid acronyms and definitions can be found at EGI Glossary.
Documentation
Getting started
Read about data management in ARC and the LFC user tutorial slides in the referenced documents.
Log in to the ATIGrid machine:
ssh atigrid.mt.ut.ee
Start a grid session:
arcproxy -S balticgrid
Data management
Grid middleware supports many different protocols for accessing data.
The ARC user guide says:
The following transfer protocols and metadata servers are supported:
- ftp - ordinary File Transfer Protocol (FTP)
- gsiftp - GridFTP, the Globus®-enhanced FTP protocol with security, encryption, etc. developed by The Globus Alliance [2]
- http - ordinary Hyper-Text Transfer Protocol (HTTP) with PUT and GET methods using multiple streams
- https - HTTP with SSL v3
- httpg - HTTP with Globus® GSI
- ldap - ordinary Lightweight Data Access Protocol (LDAP)
- srm - Storage Resource Manager (SRM) service
- lfc - LFC catalog and indexing service of EGEE gLite
- file - local to the host file name with a full path
The most common way of accessing Grid Storage Elements is GSIFTP:
$ arcls -l gsiftp://se.grid.eenet.ee/storage/balticgrid
The GSIFTP protocol offers the functionality of FTP with added support for GSI. It is responsible for secure file transfers to and from Storage Elements. With a GridFTP client, however, you have to remember exactly which Storage Element you uploaded your data to and where the replicas are.
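Every data URL starts with one of the protocol names listed above. As a minimal sketch (the function name `describe_url` is made up for this illustration and is not part of ARC), such a URL can be split into protocol, host and path with the Python standard library:

```python
from urllib.parse import urlsplit

# Protocol names taken from the ARC user guide list above.
SUPPORTED_SCHEMES = {"ftp", "gsiftp", "http", "https", "httpg",
                     "ldap", "srm", "lfc", "file"}

def describe_url(url):
    """Split a data URL into (protocol, host, path) and check the protocol."""
    parts = urlsplit(url)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported protocol: {parts.scheme!r}")
    return parts.scheme, parts.hostname, parts.path

print(describe_url("gsiftp://se.grid.eenet.ee/storage/balticgrid"))
```

The host part is exactly what you have to remember yourself when you use GSIFTP without a file catalogue.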
LCG File Catalogue (LFC)
Users and applications need to locate files (or replicas) on the Grid. The File Catalogue is the service which maintains mappings between LFN(s), GUID and SURL(s). The LCG File Catalogue (LFC) is the File Catalogue adopted by gLite 3.1, but other file catalogues are in use.
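To make the LFN, GUID and SURL mapping concrete, here is a toy in-memory model. It is only an illustration of the idea and not the LFC API; the class `ToyCatalogue`, its methods and the `example.org` SURLs are all hypothetical.

```python
import uuid

class ToyCatalogue:
    """Toy model: several LFNs -> one GUID -> several SURLs (replicas)."""

    def __init__(self):
        self.guid_of = {}    # LFN -> GUID
        self.replicas = {}   # GUID -> list of SURLs

    def register(self, lfn, surl):
        """Register a new file under an LFN with its first replica."""
        guid = str(uuid.uuid4())
        self.guid_of[lfn] = guid
        self.replicas[guid] = [surl]
        return guid

    def add_replica(self, lfn, surl):
        """Record another physical copy of an already registered file."""
        self.replicas[self.guid_of[lfn]].append(surl)

    def add_alias(self, new_lfn, existing_lfn):
        """Give the same file a second logical name."""
        self.guid_of[new_lfn] = self.guid_of[existing_lfn]

    def locate(self, lfn):
        """Return all known replicas for a logical file name."""
        return list(self.replicas[self.guid_of[lfn]])

cat = ToyCatalogue()
guid = cat.register("/grid/demo/data.txt", "srm://se1.example.org/store/abc")
cat.add_replica("/grid/demo/data.txt", "srm://se2.example.org/store/abc")
cat.add_alias("/grid/demo/alias.txt", "/grid/demo/data.txt")
print(cat.locate("/grid/demo/alias.txt"))
```

The point of the catalogue is that you only need the logical name; the lookup tells you where the physical copies are.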
On atigrid you can use several commands to manage data:
- arcls, arccp, ... commands are for data management at the ARC UI
- lfc-* commands are for LFC
- lcg-* commands are for copying files to an SE and registering them in the LFC service, and for downloading the files
- for more commands, read the manuals
To use LFC be sure that you have set the following environment variables.
echo $LCG_GFAL_INFOSYS; echo $LCG_CATALOG_TYPE; echo $LFC_HOST
bdii.balticgrid.org:2170
lfc
lfc.balticgrid.org
If any of these values is different or not set, set it again.
Setting environment variables in bash:
export LCG_GFAL_INFOSYS=bdii.balticgrid.org:2170
export LCG_CATALOG_TYPE=lfc
export LFC_HOST=lfc.balticgrid.org
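If you prefer to check the three variables from a script, the following sketch compares the current environment with the values given above. The helper name `wrong_settings` is made up for this example.

```python
import os

# Expected values as given in this practice.
EXPECTED = {
    "LCG_GFAL_INFOSYS": "bdii.balticgrid.org:2170",
    "LCG_CATALOG_TYPE": "lfc",
    "LFC_HOST": "lfc.balticgrid.org",
}

def wrong_settings(env=None):
    """Return {name: current value} for variables that are unset or differ."""
    env = os.environ if env is None else env
    return {name: env.get(name)
            for name, want in EXPECTED.items()
            if env.get(name) != want}

for name, value in wrong_settings().items():
    print(f"{name} is {value!r}, expected {EXPECTED[name]!r}")
```

No output means that all three variables are set as expected.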
All supported VO catalogues are located in the top-level catalogue /grid/. To access the files of the balticgrid VO, use:
lfc-ls -l /grid/balticgrid
Our course directory:
lfc-ls -l /grid/balticgrid/BGCC2013/
Please create your personal directory in the course directory (BGCC2013):
lfc-mkdir /grid/balticgrid/BGCC2013/Firstname_Middlenames_Lastname
Check if it was successful:
lfc-ls -l /grid/balticgrid/BGCC2013
You can set your personal directory as your LFC home, so that you do not have to use absolute file paths all the time:
export LFC_HOME=/grid/balticgrid/BGCC2013/Firstname_Middlenames_Lastname
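The effect of LFC_HOME can be pictured as simple path joining: a relative logical file name is interpreted relative to the home directory, while an absolute one is left alone. The sketch below only models that behaviour under this assumption; `resolve_lfn` is a hypothetical helper, not a function of the LFC client.

```python
import posixpath

def resolve_lfn(name, lfc_home):
    """Model how a (possibly relative) LFN maps to an absolute catalogue path."""
    path = name[len("lfn:"):] if name.startswith("lfn:") else name
    if path.startswith("/"):
        return posixpath.normpath(path)           # already absolute
    return posixpath.normpath(posixpath.join(lfc_home, path))

home = "/grid/balticgrid/BGCC2013/Firstname_Middlenames_Lastname"
print(resolve_lfn("lfn:text_file.txt", home))
print(resolve_lfn("lfn:/grid/balticgrid/BGCC2013/Lab4/Lab4_data.test", home))
```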
Create a file for testing:
echo "write here something" > text_file.txt
Upload it to a Storage Element and register it in LFC:
lcg-cr --vo balticgrid file://$PWD/text_file.txt -l lfn:text_file.txt -d se.grid.eenet.ee
The output of the command is the file's GUID.
Sometimes you need only one file from another branch of the LFC file tree (e.g. from the Lab4 directory). In that case you can create a symbolic link to the file, so that you do not have to use absolute paths:
$ lfc-ln -s /grid/balticgrid/BGCC2013/Lab4/Lab4_data.test Lab4_data_symlink.test
Check the result:
lfc-ls -l
To download a file from LFC:
lcg-cp --vo balticgrid lfn:text_file.txt file://$PWD/text_file_copy.txt
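If you want to call the upload and download commands from a Python script, one possible approach is to build the argument lists first and then hand them to `subprocess.run`. The sketch below only builds and prints the commands; it uses exactly the options shown above, and the helper names `lcg_cr_argv` and `lcg_cp_argv` are made up for this example.

```python
import os
import shlex

def lcg_cr_argv(vo, local_path, lfn, se):
    """Arguments for copying a local file to an SE and registering it in LFC."""
    return ["lcg-cr", "--vo", vo,
            "file://" + os.path.abspath(local_path),
            "-l", "lfn:" + lfn,
            "-d", se]

def lcg_cp_argv(vo, lfn, local_path):
    """Arguments for downloading a file registered in LFC."""
    return ["lcg-cp", "--vo", vo,
            "lfn:" + lfn,
            "file://" + os.path.abspath(local_path)]

upload = lcg_cr_argv("balticgrid", "text_file.txt", "text_file.txt", "se.grid.eenet.ee")
download = lcg_cp_argv("balticgrid", "text_file.txt", "text_file_copy.txt")
print(shlex.join(upload))
print(shlex.join(download))
```

On the ATIGrid machine the lists could then be executed with `subprocess.run(upload, check=True)`; passing a list instead of a single string avoids shell quoting problems with unusual file names.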
Exercise 4.1 - Grid Data experiment
In LFC, the directory /grid/balticgrid/BGCC2013/Lab4/ contains 12 files named Lab4_data(0-11), about 12 GB of data in total. One of the files contains your certificate's Common Name (CN is your name without diacritical marks).
Prepare 12 grid jobs, each of which takes one of the files from the Storage Element and searches it for your name (CN).
The script you run cannot access the input file on the grid Storage Element directly. The job description has to download it from the LFC (hint: inputFiles).
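Writing 12 nearly identical job descriptions by hand is error-prone, so they can be generated. The sketch below is only a starting point under several assumptions that you must check yourself: the data files are assumed to be named Lab4_data0 ... Lab4_data11, the `lfc://host/path` URL form is assumed to be accepted by your ARC client, and the script name `find_cn.py`, the job names and the output file names are hypothetical.

```python
LFC_HOST = "lfc.balticgrid.org"               # same host as in LFC_HOST above
LAB_DIR = "/grid/balticgrid/BGCC2013/Lab4"    # course data directory
SCRIPT = "find_cn.py"                         # hypothetical script name

def make_xrsl(index):
    """Return the text of one xRSL job description for data file number index."""
    data = f"Lab4_data{index}"                 # assumed naming of the 12 files
    url = f"lfc://{LFC_HOST}{LAB_DIR}/{data}"  # assumed lfc:// URL form
    lines = [
        f'&(executable="{SCRIPT}")',
        f' (arguments="{data}")',
        # An empty URL means "upload the local file with this name".
        f' (inputFiles=("{SCRIPT}" "")("{data}" "{url}"))',
        # An empty URL here means "keep the file for later download".
        ' (outputFiles=("result.txt" ""))',
        ' (stdout="stdout.txt")',
        ' (stderr="stderr.txt")',
        f' (jobName="W4_{index}")',
    ]
    return "\n".join(lines) + "\n"

# File name -> job description text; write these out before submitting.
JOBS = {f"W4.{i}.xrsl": make_xrsl(i) for i in range(12)}
print(JOBS["W4.0.xrsl"])
```

Check one generated description against the ARC documentation before submitting all twelve.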
The result should be written to a text file:
- In which file and on which row was your CN? (Hint: names begin with capital letters, and there might not be a space before or after the name; see Python re.match.)
- What is the name of the machine your grid job was running on? (Hint: uname -a)
- How much time did running your script take? (In seconds; hint: Python time.time)
You should copy your result file to your home directory at LFC (/grid/balticgrid/BGCC2013/Firstname_Middlenames_Lastname).
For debugging your scripts there is a smaller file, /grid/balticgrid/BGCC2013/Lab4/Lab4_data.test, which contains the names of the practical session supervisors: "Hardi Teder", "Pelle Jakovits" and "Ilja Kromonov".
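The core of the search script could look like the sketch below. It is one possible approach, not the reference solution: the function names and the layout of the result file are made up, and it uses `re.search` rather than `re.match`, because `re.match` only matches at the beginning of a line while the name may be embedded in other text. `platform.node()` is used here as a stand-in for the host name that `uname -a` prints.

```python
import platform
import re
import time

def find_name(path, name):
    """Return the 1-based row numbers of all lines that contain name."""
    pattern = re.compile(re.escape(name))
    rows = []
    # Read line by line so that a large file never has to fit in memory.
    with open(path, "r", errors="replace") as handle:
        for row, line in enumerate(handle, start=1):
            if pattern.search(line):
                rows.append(row)
    return rows

def run(path, name, result_path="result.txt"):
    """Search one file and write file name, rows, machine and run time."""
    start = time.time()
    rows = find_name(path, name)
    elapsed = time.time() - start
    with open(result_path, "w") as out:
        out.write(f"file: {path}\n")
        out.write(f"rows: {rows}\n")
        out.write(f"machine: {platform.node()}\n")
        out.write(f"seconds: {elapsed:.3f}\n")
    return rows
```

For a first test, download Lab4_data.test and call, for example, `run("Lab4_data.test", "Pelle Jakovits")`, then inspect result.txt.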
Deliverables: the W4.*.xrsl and Python script files, and a W4_comments.txt file with the answers to the red questions, your comments and the command-line outputs that prove you have run the exercises. Your LFC home directory also has to contain the result file.
Also read the Grid practical exercise solution format page to learn how to finalize and upload your solution.