Basics of Grid and Cloud Computing (MTAT.08.011), spring 2012/13

Practice 4 - Data Management on Grid

Date: 06.03.2013
Deadline: 24.03.2013

  • Review the lecture slides: PDF
  • Exercise 4.1: Grid Data experiment

References for documentation

The documents and web sites listed here contain supplementary information for this practice.

Glossary

  • Grid acronyms and definitions can be found at EGI Glossary.

Documentation

  • NorduGrid ARC documentation
  • The NorduGrid ARC User Guide
  • LFC User tutorial

Getting started

Read about data management in ARC and go through the LFC user tutorial slides in the documents listed above.

Log in to the ATIGrid machine:

ssh atigrid.mt.ut.ee

Start a grid session:

arcproxy -S balticgrid

Data management

Grid middleware supports many different protocols for accessing data.

The ARC User Guide says:

The following transfer protocols and metadata servers are supported:

  • ftp - ordinary File Transfer Protocol (FTP)
  • gsiftp - GridFTP, the Globus® -enhanced FTP protocol with security, encryption, etc. developed by The Globus Alliance [2]
  • http - ordinary Hyper-Text Transfer Protocol (HTTP) with PUT and GET methods using multiple streams
  • https - HTTP with SSL v3
  • httpg - HTTP with Globus® GSI
  • ldap - ordinary Lightweight Directory Access Protocol (LDAP)
  • srm - Storage Resource Manager (SRM) service
  • lfc - LFC catalog and indexing service of EGEE gLite
  • file - local to the host file name with a full path
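The protocol in the list above is simply the scheme part of a data URL. The following minimal sketch uses Python's standard urllib.parse to pick it out; the helper name protocol_of and the SUPPORTED set are made up for this illustration and are not part of ARC.

```python
from urllib.parse import urlparse

# Schemes copied from the list above.
SUPPORTED = {"ftp", "gsiftp", "http", "https", "httpg",
             "ldap", "srm", "lfc", "file"}


def protocol_of(url):
    """Return the scheme of a data URL; raise ValueError if it is not listed."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in SUPPORTED:
        raise ValueError("unsupported protocol: %r" % scheme)
    return scheme


if __name__ == "__main__":
    print(protocol_of("gsiftp://se.grid.eenet.ee/storage/balticgrid"))
```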

The most common way of accessing Grid Storage Elements is GSIFTP:

arcls -l gsiftp://se.grid.eenet.ee/storage/balticgrid

The GSIFTP protocol offers the functionality of FTP with added support for GSI (Grid Security Infrastructure). It is responsible for secure file transfers to and from Storage Elements. With a GridFTP client, however, you have to remember exactly which Storage Element you uploaded your data to and where the replicas are.

LCG File Catalogue (LFC)

Users and applications need to locate files (or replicas) on the Grid. The File Catalogue is the service that maintains the mappings between LFNs (Logical File Names), GUIDs (Grid Unique Identifiers) and SURLs (Storage URLs). The LCG File Catalogue (LFC) is the File Catalogue adopted by gLite 3.1, but other file catalogues are also in use.
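To make the mapping idea concrete, here is a toy in-memory model, not the real LFC: one or more logical names point to a single GUID, and the GUID points to the storage URLs of all replicas. The class name, the host names under example.org and the use of uuid4 for GUIDs are assumptions made only for this sketch.

```python
import uuid


class ToyCatalogue:
    """In-memory stand-in for a file catalogue: LFN -> GUID -> SURLs."""

    def __init__(self):
        self.lfn_to_guid = {}
        self.guid_to_surls = {}

    def register(self, lfn, surl):
        """Register a replica for lfn and return the GUID of the file."""
        guid = self.lfn_to_guid.get(lfn)
        if guid is None:
            guid = str(uuid.uuid4())
            self.lfn_to_guid[lfn] = guid
            self.guid_to_surls[guid] = []
        if surl not in self.guid_to_surls[guid]:
            self.guid_to_surls[guid].append(surl)
        return guid

    def alias(self, existing_lfn, new_lfn):
        """Make a second logical name point to the same GUID."""
        self.lfn_to_guid[new_lfn] = self.lfn_to_guid[existing_lfn]

    def replicas(self, lfn):
        """Return the storage URLs of all replicas known for lfn."""
        return list(self.guid_to_surls[self.lfn_to_guid[lfn]])


if __name__ == "__main__":
    cat = ToyCatalogue()
    cat.register("lfn:demo.txt", "srm://se-a.example.org/demo.txt")
    cat.register("lfn:demo.txt", "srm://se-b.example.org/demo.txt")
    print(cat.replicas("lfn:demo.txt"))
```

With such a catalogue the user only needs to remember the logical name; the replica locations are looked up.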

On atigrid you can use several commands to manage data:

  • arcls, arccp, ... are for data management from the ARC UI.
  • lfc-* commands work with the LFC.
  • lcg-* commands copy files to an SE and register them in the LFC service, and download files.
  • For more commands, read the manuals.

To use LFC, make sure that the following environment variables are set:

echo $LCG_GFAL_INFOSYS; echo $LCG_CATALOG_TYPE; echo $LFC_HOST
bdii.balticgrid.org:2170
lfc
lfc.balticgrid.org

If any of them has a different value or is not set, set it again.

Setting environment variables in bash:

export LCG_GFAL_INFOSYS=bdii.balticgrid.org:2170
export LCG_CATALOG_TYPE=lfc
export LFC_HOST=lfc.balticgrid.org
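The same check can be done from a script before it calls any lfc-* or lcg-* command. This is a minimal sketch; the function name check_lfc_env is made up here, and the expected values are the ones shown above.

```python
import os

# Expected values, copied from the export lines above.
EXPECTED = {
    "LCG_GFAL_INFOSYS": "bdii.balticgrid.org:2170",
    "LCG_CATALOG_TYPE": "lfc",
    "LFC_HOST": "lfc.balticgrid.org",
}


def check_lfc_env(environ=None):
    """Return {name: actual_value} for every variable that is wrong or unset."""
    environ = os.environ if environ is None else environ
    problems = {}
    for name, expected in EXPECTED.items():
        actual = environ.get(name)  # None when the variable is not set
        if actual != expected:
            problems[name] = actual
    return problems


if __name__ == "__main__":
    for name, actual in check_lfc_env().items():
        print("%s is %r, expected %r" % (name, actual, EXPECTED[name]))
```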

The catalogues of all supported VOs are located under the top-level directory /grid/. To access the files of the balticgrid VO, use:

lfc-ls -l /grid/balticgrid

Our course directory:

lfc-ls -l /grid/balticgrid/BGCC2013/

Please create your personal directory in the course directory (BGCC2013):

lfc-mkdir /grid/balticgrid/BGCC2013/Firstname_Middlenames_Lastname

Check if it was successful:

lfc-ls -l /grid/balticgrid/BGCC2013

You can set your personal directory as your LFC home, so that you do not have to use absolute paths all the time:

export LFC_HOME=/grid/balticgrid/BGCC2013/Firstname_Middlenames_Lastname

Create a file for testing:

echo "write here something" > text_file.txt

Upload it to the Storage Element and register it in LFC:

lcg-cr --vo balticgrid file://$PWD/text_file.txt -l lfn:text_file.txt -d se.grid.eenet.ee

The command prints the file's GUID.
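The upload step can also be scripted. The sketch below only rebuilds the lcg-cr command line shown above and, in upload, runs it with subprocess. It assumes a Unix host where lcg-cr is on the PATH and a valid proxy exists; the helper names lcg_cr_command and upload are made up for this example.

```python
import os
import subprocess


def lcg_cr_command(local_path, lfn, se="se.grid.eenet.ee", vo="balticgrid"):
    """Build the argument list for the lcg-cr call shown above."""
    # abspath plays the role of $PWD in the shell version.
    source = "file://" + os.path.abspath(local_path)
    return ["lcg-cr", "--vo", vo, source, "-l", "lfn:" + lfn, "-d", se]


def upload(local_path, lfn):
    """Run lcg-cr and return what it prints (the GUID of the new file)."""
    output = subprocess.check_output(lcg_cr_command(local_path, lfn))
    return output.decode().strip()


if __name__ == "__main__":
    # Only print the command; call upload() on a machine with grid tools.
    print(" ".join(lcg_cr_command("text_file.txt", "text_file.txt")))
```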

Sometimes you need only one file from another branch of the LFC file tree (e.g. from the Lab4 directory). In that case you can create a symbolic link to the file, so that you do not have to use absolute paths:

lfc-ln -s /grid/balticgrid/BGCC2013/Lab4/Lab4_data.test Lab4_data_symlink.test

Check the result: lfc-ls -l

To download a file from LFC:

lcg-cp --vo balticgrid lfn:text_file.txt file://$PWD/text_file_copy.txt

Exercise 4.1 - Grid Data experiment

The LFC directory /grid/balticgrid/BGCC2013/Lab4/ contains 12 files named Lab4_data(0-11), about 12 GB of data in total. One of the files contains the Common Name (CN) of your certificate; the CN is your name without diacritical marks.

Prepare 12 grid jobs, each of which takes one of the files from the Storage Element and tries to find your name (CN) in it.

The script you run cannot access the input file in the grid Storage Element directly. The job description has to download it from the LFC (hint: inputFiles).
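Writing 12 nearly identical job descriptions by hand is error-prone, so they can be generated. The sketch below is a starting point and makes several assumptions that you should check against the ARC User Guide and the actual directory listing: the data files are assumed to be named Lab4_data0 to Lab4_data11, the lfc:// URL form is assumed, and the script name find_cn.py and output name result.txt are hypothetical.

```python
LFC_HOST = "lfc.balticgrid.org"
LAB_DIR = "/grid/balticgrid/BGCC2013/Lab4"


def make_xrsl(index, script="find_cn.py"):
    """Return an xRSL job description for data file number index."""
    data_name = "Lab4_data%d" % index  # assumed file naming
    # Assumed URL form: lfc://<LFC host>/<path in the catalogue>
    data_url = "lfc://%s%s/%s" % (LFC_HOST, LAB_DIR, data_name)
    lines = [
        "&",
        '(jobName="W4.%d")' % index,
        '(executable="%s")' % script,
        '(arguments="%s")' % data_name,
        # An empty source means: upload the local file with the same name.
        '(inputFiles=("%s" "")("%s" "%s"))' % (script, data_name, data_url),
        # An empty destination means: keep the file for later retrieval.
        '(outputFiles=("result.txt" ""))',
        '(stdout="stdout.txt")',
        '(stderr="stderr.txt")',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_xrsl(0))
```

Saving the text returned by make_xrsl(i) to W4.i.xrsl for i from 0 to 11 would give file names that match the W4.*.xrsl pattern asked for in the deliverables.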

The result should be written to a text file:

  • In which file and on which row your CN was found (hint: names begin with capital letters, and there might be no space before or after the name; see Python re.match).
  • The name of the machine your grid job ran on (hint: uname -a).
  • How much time it took to run your script (in seconds; hint: Python time.time).
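The three items above can be collected by one small script. The following is a rough sketch rather than a reference solution. It uses re.search instead of re.match, because re.match only matches at the start of a string while the name may appear anywhere in a row. platform.node() stands in for the host name that uname prints. The function names and the report layout are made up for this example.

```python
import platform
import re
import time


def find_name(lines, name):
    """Return the 1-based row numbers of the lines that contain name."""
    pattern = re.compile(re.escape(name))
    return [row for row, line in enumerate(lines, start=1)
            if pattern.search(line)]


def make_report(file_name, lines, name):
    """Search lines for name and return a short text report."""
    start = time.time()
    rows = find_name(lines, name)
    elapsed = time.time() - start
    return "file: %s\nrows: %s\nhost: %s\nseconds: %.3f\n" % (
        file_name, rows, platform.node(), elapsed)


if __name__ == "__main__":
    sample = ["abcXyz", "qqHardi Tederzz", "PelleJakovits"]
    print(make_report("sample", sample, "Hardi Teder"))
```

For a real data file, pass an open file object as lines (for example open(path, errors="replace")), so that the file is read row by row instead of being loaded into memory at once.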

Copy your result file to your home directory in LFC (/grid/balticgrid/BGCC2013/Firstname_Middlenames_Lastname).

For debugging your scripts there is a smaller file, /grid/balticgrid/BGCC2013/Lab4/Lab4_data.test, which contains the names of the practical session supervisors: "Hardi Teder", "Pelle Jakovits" and "Ilja Kromonov".

Deliverables: the W4.*.xrsl files, the Python script files and a W4_comments.txt file with the answers to the red questions, your comments and the command-line output that proves you have run the exercises. Your LFC home directory must also contain the result file.

Also read the Grid practical exercise solution format page to learn how to finalize and upload your solution.
