We will compile and run a simple program that uses coarrays. On Linux, the following installation steps should work:
1. Install gfortran
Use the package manager; on Ubuntu, for example, sudo apt-get install gfortran should work.
2. Install OpenMPI for Coarray Fortran
With gfortran, Coarray Fortran uses MPI internally (through the OpenCoarrays library). sudo apt-get install libcaf-openmpi-3 should install the OpenMPI part of Coarray Fortran and pull in the OpenMPI libraries themselves as dependencies. If compilation fails with an error about a missing MPI library, it may be necessary to install OpenMPI separately.
3. Install Coarray Fortran binaries
Try installing the package open-coarrays-bin, which provides the caf and cafrun commands. If this does not work, follow the instructions at https://askubuntu.com/questions/1277932/cannot-install-open-coarrays-bin-for-gfortran-on-ubuntu-20-04
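A quick way to check the installation before moving on is a minimal program in which every image reports its index. This is only a sketch and not part of the exercise; the program name hello_images and the file name hello.f90 are assumptions.

```fortran
program hello_images
  implicit none
  ! Every image executes this statement independently,
  ! so the output lines may appear in any order.
  print *, "Hello from image", this_image(), "of", num_images()
end program hello_images
```

If the file is saved as hello.f90, caf hello.f90 -o hello followed by cafrun -n 4 ./hello should print one line per image.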
Once this is done, it should be possible to compile and run the following program. Read it first and make sure you understand what it does.
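program main
  implicit none
  integer, parameter :: blocks_per_image = 2**16
  integer, parameter :: block_size = 2**10
  real, dimension(block_size) :: x, y
  integer :: in_circle[*]   ! coarray: every image has its own copy, readable by the others
  integer :: i, n_circle, n_total
  real :: step, xfrom
  ! Total number of sample points over all images. With 32-bit default
  ! integers this overflows for 32 or more images.
  n_total = blocks_per_image * block_size * num_images()
  ! Each image samples x in its own strip [xfrom, xfrom + step) of [0, 1).
  step = 1./real(num_images())
  xfrom = (this_image() - 1) * step
  in_circle = 0
  do i=1, blocks_per_image
     call random_number(x)
     call random_number(y)
     ! Count the points of this block that fall inside the unit circle.
     in_circle = in_circle + count((xfrom + step * x)**2 + y**2 < 1.)
  end do
  ! Wait until every image has finished counting.
  sync all
  if (this_image() == 1) then
     ! Image 1 adds its own count and the counts fetched from the other images.
     n_circle = in_circle
     do i=2, num_images()
        n_circle = n_circle + in_circle[i]
     end do
     print *,"pi/4 is approximately", real(n_circle)/real(n_total), "exact", atan(1.)
  end if
end program main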
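As a possible variation, the manual summation on image 1 can be replaced by the collective subroutine co_sum from Fortran 2018. The sketch below assumes a compiler that supports co_sum (recent gfortran versions with coarrays enabled do). The program name montecarlo_cosum and the smaller blocks_per_image are assumptions chosen to make a quick test run.

```fortran
program montecarlo_cosum
  implicit none
  integer, parameter :: blocks_per_image = 2**10   ! assumed: smaller than above, for a quick run
  integer, parameter :: block_size = 2**10
  real, dimension(block_size) :: x, y
  integer :: in_circle        ! an ordinary variable; co_sum does not need a coarray
  integer :: i, n_total
  real :: step, xfrom

  n_total = blocks_per_image * block_size * num_images()
  step = 1. / real(num_images())
  xfrom = (this_image() - 1) * step

  in_circle = 0
  do i = 1, blocks_per_image
     call random_number(x)
     call random_number(y)
     in_circle = in_circle + count((xfrom + step * x)**2 + y**2 < 1.)
  end do

  ! Sum in_circle over all images; the total is stored on image 1.
  ! Every image must make this call. On the other images,
  ! in_circle is undefined afterwards.
  call co_sum(in_circle, result_image=1)

  if (this_image() == 1) then
     print *, "pi/4 is approximately", real(in_circle) / real(n_total), "exact", atan(1.)
  end if
end program montecarlo_cosum
```

Because all images take part in the collective call, no separate sync all is needed before image 1 prints the result.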
Compile the program with: caf montecarlo.f90 -o montecarlo
Run it with: cafrun -n 4 ./montecarlo (the -n option sets the number of images, here 4).
Alternatively, compile a single-image version with: gfortran -fcoarray=single montecarlo.f90 -o montecarlo. This version can be run directly with ./montecarlo.
You can measure how long the execution takes with the time command (just put time before the run command). Since the program uses a fixed number of samples per image, increasing the number of images should not reduce the "real" time (wall-clock time from start to end), as long as each image has its own core. The "user" time (CPU time summed over all cores) should go up, and the result should become more precise.
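To see the per-image wall-clock time from inside a program, the sampling loop can be bracketed with calls to the intrinsic system_clock. This is only a sketch; the program name time_block, the variable names, and the small workload are assumptions.

```fortran
program time_block
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: n_blocks = 2**10      ! assumed small workload for a quick test
  integer, parameter :: block_size = 2**10
  real, dimension(block_size) :: x, y
  integer(int64) :: t_start, t_end, ticks_per_second
  integer :: i, hits

  ! Read the clock and its tick rate before the work starts.
  call system_clock(t_start, ticks_per_second)
  hits = 0
  do i = 1, n_blocks
     call random_number(x)
     call random_number(y)
     hits = hits + count(x**2 + y**2 < 1.)
  end do
  ! Read the clock again once the work is done.
  call system_clock(t_end)

  ! Each image reports its own elapsed wall-clock time.
  print *, "image", this_image(), "hits", hits, "seconds", &
       real(t_end - t_start) / real(ticks_per_second)
end program time_block
```

With a fixed workload per image, the reported seconds should stay roughly the same as images are added, provided each image runs on its own core.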