All svn account holders need to familiarize themselves with the following rules.
Rules for contributing your code to be included in the SCHISM package
Here are some rules for preparing your own code:
- Name all your parallel source files with the suffix .F90;
- No spaces between “#” (pre-processor directives) and if/else/endif;
- Try to use the I/O channel number directly, e.g. read(61,... instead of assigning the number to a variable (e.g. read(ich,...). This makes it easier for others to search for channel-number conflicts.
Sharing I/O channels in SCHISM
You need to exercise caution when dealing with parallel I/O, especially for writing. For writing outputs, you’d generally let only one process (e.g. rank 0) do the job.
If you do need to have all processes write, e.g. debug messages, you’d then use channel 12 (see below).
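A minimal sketch of both patterns (assuming the usual SCHISM conventions: 'myrank' from the message-passing module, channel 16 pointing to mirror.out on rank 0, and channel 12 pointing to the per-rank outputs/nonfatal_xxxx files):

```fortran
!Sketch only; 'myrank', 'it' and 'ne' stand in for the usual SCHISM variables
!Rank-0-only output (e.g. run progress to mirror.out on channel 16):
if(myrank==0) write(16,*) 'completed time step ',it

!Per-rank debug output: every rank may write to channel 12 unguarded,
!since each rank's channel 12 points to its own outputs/nonfatal_xxxx file
write(12,*) 'rank ',myrank,': local element count = ',ne
```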
Below are all I/O channel numbers used in different sub-models of SCHISM (bold numbers mean they have been used by other sub-models and you should avoid using them). Please add your numbers as soon as you change the code so as to prevent others from using them. A good way to find out whether a number is available is to issue the following command from src/:
grep "61" */*.F90    # looks for '61'
- Hydro/: Channels between 8 and 200 are used by various codes for I/O. In particular:
- 12: this channel is initialized by each process to point to the file outputs/nonfatal_xxxx, where “xxxx” is the process ID. Therefore it’s very useful for debugging purposes; you can use it anywhere
- 101 to 100+noutput (inclusive): reserved for global outputs (including tracers from sediment, EcoSim, and ICM, as well as WWM);
- 201-250: non-standard outputs (e.g. at sidecenters, prism centers);
- 251 to 259: reserved for station outputs
- 10, 31, 32: used for one-off I/O – can be used by other sub-models as long as you close them immediately after use in your part of the code
- 16: this channel points to mirror.out (on rank 0), the main message output for info about the run. You should use it with if(myrank==0)
- 1100 to 1200: handles for inputs/outputs etc
- 600: outputting some messages
- 301 to 323: reading channels for non-point source inputs for ICM
- 5 and 10 (both for temporary reading)
If you need to profile some parts of the code, you can use the custom timer: timer_ns(). Here are the steps to set up a new timer:
- In Core/schism_glbl.F90: increase the dimension of timer_ns() for your number of timers needed;
- Insert the following segments around the portion of the code you want to profile:
cwtmp2=mpi_wtime() !start of timer
timer_ns(id)=timer_ns(id)+mpi_wtime()-cwtmp2 !end timing this section
('id' is the index of your timer)
- Compile with USE_TIMER turned on in Make.defs.<your env>
- The outputs can be found in outputs/nonfatal_*; search for 'Custom timers in hours:'. You can collect them together with:
grep "Custom timers in hours: <id>" outputs/nonfatal_* > mytimer.<id>
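The steps above can be sketched as follows (the timer index 5 and the profiled routine are hypothetical, chosen only for illustration; 'cwtmp2' and 'timer_ns' are the variables from Core/schism_glbl.F90):

```fortran
!Sketch only: guard the timing calls so the code still compiles
!without USE_TIMER; note no space after '#' per the rules above
#ifdef USE_TIMER
      cwtmp2=mpi_wtime() !start of timer
#endif

      call my_expensive_routine() !hypothetical section being profiled

#ifdef USE_TIMER
      timer_ns(5)=timer_ns(5)+mpi_wtime()-cwtmp2 !end timing this section
#endif
```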
On many clusters, more sophisticated profilers exist. Check your local system.
Making sense of MPI inside SCHISM
If you are working on the code you may be confused by the exchange routines inside SCHISM. When should you use them?
The first thing you need to remember when writing MPI code with domain decomposition is that a rank (or MPI 'process') only has info in its 'augmented' (=resident + ghost) domain, and knows absolutely nothing outside this region.
For example, suppose you want to compute an average at each node of the sub-domain over its ball of surrounding elements:
do i=1,np !not 'npa'
  sum1=0
  sum_a=0
  do j=1,nne(i)
    ie=indel(j,i)
    sum1=sum1+stokes(ie)*area(ie)
    sum_a=sum_a+area(ie)
  enddo !j
  stokes_nd(i)=sum1/sum_a
enddo !i
Notice that you'd use 'np' instead of 'npa' (=np+npg; the augmented domain) here. For any resident node 'P', indel() is well defined (because, again, a rank has info in its augmented domain, including the ghost zone), and so the loops make sense. As long as all ranks have the same info in each other's ghost zones (which is the purpose of the exchange routines), 'stokes_nd' at an interface node (e.g. 'P') will be the same across ranks.
However, for a ghost node 'Q', some of the surrounding elements lie outside the augmented domain (and in this case the corresponding entries of indel() are actually negative!), so the same operation would generate erroneous values at ghost nodes. Therefore you cannot use 'npa' in the outer loop.
Now, after this loop is executed, what happens to the ghost nodes like 'Q'? Since they are excluded, the 'stokes_nd' will be wrong there. Fortunately, some neighboring ranks have the correct values for these nodes, which are resident nodes in those neighboring ranks; e.g., 'Q' is a resident node of Rank 0. So now different ranks will have different values at some nodes, and this is to be avoided. In order to make sure each rank has correct (and up-to-date) values in its augmented domain, you need to follow this loop with an exchange routine. Remember that for the exchange routines to work, you need to define the exchanged array in the augmented domain:
allocate(stokes_nd(npa)) !not 'np'
in order to allow each rank to receive info in the ghost zone.
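After the resident-node loop, a call to the appropriate exchange routine fills in the ghost zone. A minimal sketch, assuming the 2D nodal exchange routine is named exchange_p2d (check the routine names in your version of Core/schism_msgp.F90):

```fortran
!Sketch only: exchange routine names may differ in your version
allocate(stokes_nd(npa)) !defined over the augmented domain, not just 'np'
!... resident-node loop computing stokes_nd(1:np) goes here ...
call exchange_p2d(stokes_nd) !fills ghost-zone values from neighboring ranks
```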
The above is just one example of when exchanges are needed. Usually this involves some kind of queries into a neighborhood, but beware that there are other circumstances where exchanges are necessary. The most important thing to remember is, again,
A rank only has info in its 'augmented' domain, and knows absolutely nothing outside this region.
In MPI code, it's crucial to make sure all ranks have identical values in the overlapping zones.
Now let's consider an example where no exchanges are needed. Say you want to simply go through all elements and update some arrays:
do i=1,nea !not 'ne'
Since all ranks have the info ('cde' and 'area') locally available, 'qdl_e' will be correctly calculated even inside the ghost zones, and its values there are the same across ranks (since 'cde' and 'area' are the same across ranks in the overlapping zones). So in this case you'd use 'nea' instead of 'ne' in the loop. Of course, there is nothing wrong with using 'ne' in the loop followed by an exchange routine, but doing so would be less efficient and incur unnecessary communication cost.
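A minimal sketch of this exchange-free pattern (the update formula is hypothetical, chosen only to mirror the variable names in the text):

```fortran
!Sketch only: 'cde', 'area' and 'qdl_e' are element arrays defined over
!the augmented domain (1:nea); the update formula is a hypothetical example
do i=1,nea !not 'ne'; each rank updates its ghost elements with local info
  qdl_e(i)=cde(i)*area(i)
enddo !i
!no exchange call needed: ghost values are already identical across ranks
```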