How to run Keras on GPU
Getting an account for EENet grid
Go to https://taat.grid.ee/ and choose "Log in via: TAAT" in the top right corner. NB! Only people studying at Estonian universities can create this account. After logging in, you will at some point be prompted for SSH keys. Upload your public key.
Installing Python Virtualenv
EENet provides Python 2.6 by default, but Python 2.7 is available as well. As of January 2016, Python 2.7 is still a safer bet than Python 3.
Download and unpack Virtualenv:
curl -O https://pypi.python.org/packages/source/v/virtualenv/virtualenv-13.1.2.tar.gz
tar xzvf virtualenv-13.1.2.tar.gz
cd virtualenv-13.1.2
Create and activate a new virtual environment using Python 2.7:
python2.7 virtualenv.py ~/venv_keras
source ~/venv_keras/bin/activate
Running python --version should now report Python 2.7.
Installing Theano
pip install git+https://github.com/Theano/Theano.git
NumPy, SciPy and six are installed along with Theano as its dependencies. Installing SciPy can take some time.
To verify, run python -c 'import theano'. If it doesn't give any errors, you are fine.
Installing Keras
git clone https://github.com/fchollet/keras.git
cd keras
python setup.py install
I suggest installing Keras from a local folder, because you will probably want to check out the examples and occasionally peek into the source code to figure out errors.
To verify, run python -c 'import keras'. No errors means it works.
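The two import checks above can be combined into one small script. This is only a sketch: the helper names can_import and report are made up for illustration, and a clean import only shows that the package is installed, not that the GPU is usable.

```python
import importlib


def can_import(module_name):
    """Return True if module_name imports without ImportError (hypothetical helper)."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


def report(module_names):
    """Map each module name to whether it could be imported."""
    return {name: can_import(name) for name in module_names}


# Example use inside the activated virtualenv:
#     for name, ok in sorted(report(["theano", "keras"]).items()):
#         print("%-8s %s" % (name, "OK" if ok else "MISSING"))
```

The code runs on both Python 2.7 and Python 3, so it can be reused if you switch interpreters later.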
Running Theano on GPU
Create a file ~/.theanorc with the following contents:
[global]
floatX = float32
device = gpu0

[nvcc]
fastmath = True
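If you set up several accounts or machines, the same file can be generated from Python. The sketch below assumes nothing beyond the standard library; write_theanorc and THEANO_SETTINGS are illustrative names, not part of Theano.

```python
try:
    import configparser  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2.7

# Sections and options copied from the ~/.theanorc contents shown above.
THEANO_SETTINGS = [
    ("global", [("floatX", "float32"), ("device", "gpu0")]),
    ("nvcc", [("fastmath", "True")]),
]


def write_theanorc(path, settings=THEANO_SETTINGS):
    """Write the given sections and options to an INI file at path (hypothetical helper)."""
    config = configparser.RawConfigParser()
    config.optionxform = str  # keep the capital X in floatX instead of lowercasing it
    for section, options in settings:
        config.add_section(section)
        for key, value in options:
            config.set(section, key, value)
    with open(path, "w") as handle:
        config.write(handle)


# Example use (overwrites any existing file):
#     import os
#     write_theanorc(os.path.expanduser("~/.theanorc"))
```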
If you now run python -c 'import theano', you should get an error:
FATAL: Module nvidia not found.
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu0 is not available (error: Unable to get the number of gpus available: no CUDA-capable device is detected)
The problem is that you are logged in to the gateway server juur.grid.eenet.ee. GPUs are attached to compute nodes such as idu41. To run a command on one of the GPU nodes you need to use SLURM. For example:
srun --partition=gpu --gres=gpu:1 --constraint=K20 python -c 'import theano'
Using gpu device 0: Tesla K20m (CNMeM is disabled)
The command line options used above:
--partition=gpu - restricts execution to nodes in the gpu partition. Use the sinfo command to see the list of partitions and their restrictions.
--gres=gpu:1 - asks for a node with 1 GPU.
--constraint=K20 - restricts selection to Tesla K20 GPUs only (faster).
--mem=12000 - sometimes you may want to increase the default memory limit given to jobs. The value is in megabytes, so 12000 means about 12 GB.
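When jobs are launched from a Python driver script, the options listed above can be assembled programmatically. The following is a minimal sketch. build_srun_command is a hypothetical helper, and its defaults simply mirror the flags used in this guide.

```python
def build_srun_command(command, partition="gpu", gpus=1, constraint="K20", mem_mb=None):
    """Return an srun argument list that runs command on a GPU node (hypothetical helper)."""
    args = ["srun", "--partition=%s" % partition, "--gres=gpu:%d" % gpus]
    if constraint:
        # Restrict the job to nodes that have the named feature, for example K20.
        args.append("--constraint=%s" % constraint)
    if mem_mb is not None:
        # srun interprets a bare number for --mem as megabytes.
        args.append("--mem=%d" % mem_mb)
    return args + list(command)


# Example use on the gateway server:
#     import subprocess
#     subprocess.call(build_srun_command(["python", "-c", "import theano"]))
```

Passing the command as a list avoids shell quoting problems with arguments such as 'import theano'.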
NB! You must activate the virtualenv with source ~/venv_keras/bin/activate every time before running Keras on GPU.
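Forgetting to activate the environment is easy, so a script can check for it at startup. This is a sketch with illustrative function names. It relies on sys.real_prefix, which the classic virtualenv used in this guide sets, and falls back to comparing sys.base_prefix with sys.prefix for newer tools.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Classic virtualenv sets sys.real_prefix; venv and newer virtualenv
    # releases make sys.base_prefix differ from sys.prefix instead.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def require_virtualenv():
    """Exit with a hint if no virtual environment is active (hypothetical helper)."""
    if not in_virtualenv():
        sys.exit("Virtualenv is not active; run: source ~/venv_keras/bin/activate")
```

Calling require_virtualenv() at the top of a training script makes the job fail immediately with a readable message instead of with an import error from the system Python.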
Tips and tricks
Writing out those long srun commands can quickly get tedious. To simplify, you can create aliases in your shell startup file:
alias srungpu="srun --partition=gpu --gres=gpu:1 --constraint=K20"
alias nrungpu="nohup srun --partition=gpu --gres=gpu:1 --constraint=K20 --mem=12000"
After logging out and in again, you can run a script simply by prefixing the command with srungpu.
For longer jobs, use the nohup command, which keeps the process running after you log out. For example, to run a Keras example script overnight:
nrungpu python ~/keras/examples/lstm_text_generation.py >lstm_text_generation.log &
>lstm_text_generation.log redirects the command output to a file, and & puts the process in the background.
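A comparable effect can be had from a Python launcher script. The sketch below is not what nohup does internally. It assumes a POSIX system and starts the command in a new session so that it is not tied to the login terminal. start_detached is an illustrative name.

```python
import os
import subprocess


def start_detached(command, log_path):
    """Start command in its own session, writing stdout and stderr to log_path."""
    with open(os.devnull, "rb") as devnull, open(log_path, "wb") as log_file:
        process = subprocess.Popen(
            command,
            stdin=devnull,             # the job must not wait for terminal input
            stdout=log_file,           # same role as >lstm_text_generation.log
            stderr=subprocess.STDOUT,  # keep error messages in the same log
            preexec_fn=os.setsid,      # POSIX only: new session, detached from the terminal
        )
    # The child holds its own copies of the file descriptors, so closing ours is safe.
    return process


# Example use, with the srun flags from this guide:
#     start_detached(
#         ["srun", "--partition=gpu", "--gres=gpu:1", "--constraint=K20",
#          "python", os.path.expanduser("~/keras/examples/lstm_text_generation.py")],
#         "lstm_text_generation.log",
#     )
```

The function returns immediately. The returned object's pid attribute can be kept if you want to check on the launcher process later.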
To see the list of active SLURM jobs, run squeue:
 JOBID PARTITION     NAME     USER ST        TIME NODES NODELIST(REASON)
558938       gpu   python hpc_tamb  R       13:43     1 idu40
558782       gpu     bash  hpc_kuz  R  7-08:39:48     1 idu39
558778       gpu     bash  hpc_kuz  R  7-11:16:14     1 idu41
558925      long EUWEST_6    elmer  R     5:34:05     1 idu38
558928      long EESTI1_3    elmer  R     5:29:35     1 idu04
558927      long EESTI1_2    elmer  R     5:29:36     1 idu12
558923      long EESTI1_6    elmer  R    11:33:00     1 idu11
558920      long        R hpc_tane  R    14:14:06     1 idu29
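To pick your own jobs out of a long listing, the job table above can be parsed with a few lines of Python. This is a rough sketch that assumes the default column layout shown above and job names without spaces. parse_squeue and jobs_for_user are illustrative names.

```python
def parse_squeue(text):
    """Parse a whitespace-separated job listing into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    jobs = []
    for line in lines[1:]:
        # The last column may contain spaces, so split at most len(header) - 1 times.
        fields = line.split(None, len(header) - 1)
        jobs.append(dict(zip(header, fields)))
    return jobs


def jobs_for_user(jobs, user):
    """Return the jobs whose USER column equals user."""
    return [job for job in jobs if job.get("USER") == user]


# A few rows copied from the listing in this guide.
SAMPLE = """\
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
558938 gpu python hpc_tamb R 13:43 1 idu40
558782 gpu bash hpc_kuz R 7-08:39:48 1 idu39
558778 gpu bash hpc_kuz R 7-11:16:14 1 idu41
"""

# Example use:
#     for job in jobs_for_user(parse_squeue(SAMPLE), "hpc_kuz"):
#         print(job["JOBID"], job["NODELIST(REASON)"])
```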
To cancel a running job, use scancel <JOBID>, where <JOBID> is the number shown in the first column of the job list.