Difference between revisions of "Debugging after a run crashes"

Latest revision as of 09:43, 30 June 2014

The best way to find the reason for a crash is to visualize the surface velocity with ACE/xmvis6. Usually you will see large or noisy velocities somewhere in the domain, which can give you hints about the cause (e.g., a problem with the forcing).

Sometimes you want to visualize the problem right before the crash. You cannot use autocombine_MPI_elfe.pl for this, because the last stack of output is incomplete. Instead, use the core Fortran combine program (e.g., combine_output6) to combine the incomplete stack directly. Follow the instructions in the header of combine_output6.f90 to prepare the inputs and run it, then visualize the combined outputs with xmvis6.
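Before combining, it can help to know how far the run got. The earlier revision of this page notes that mirror.out writes "TIME STEP= " after each completed step. The following is a minimal Python sketch, not part of the model distribution, that pulls the last completed step out of the log text and estimates which stack is incomplete. The function names are hypothetical, the exact layout of the log line is an assumption (only the "TIME STEP=" prefix followed by an integer is relied on), and the stack numbering assumes that stack k holds steps (k-1)*ihfskip+1 through k*ihfskip.

```python
import re

# Assumption: each completed step is logged on a line containing
# "TIME STEP=" followed by the integer step number.
STEP_PATTERN = re.compile(r"TIME STEP=\s*(\d+)")


def last_completed_step(log_text):
    """Return the last step number found in the log text, or None."""
    steps = STEP_PATTERN.findall(log_text)
    return int(steps[-1]) if steps else None


def incomplete_stack(last_step, ihfskip):
    """Estimate the index of the stack being written at the crash.

    Assumes stack k holds steps (k-1)*ihfskip+1 .. k*ihfskip,
    where ihfskip is the stack size (in time steps) from param.in.
    """
    return last_step // ihfskip + 1


# Illustrative log excerpt (made up for this sketch, not real output).
sample_log = "TIME STEP=   1004;  TIME= 0.0\nTIME STEP=   1005;  TIME= 0.0\n"
step = last_completed_step(sample_log)
print(step, incomplete_stack(step, ihfskip=1000))
```

In practice you would pass the contents of mirror.out (for example, `open("mirror.out").read()`) instead of the sample string, and use the ihfskip value from your own param.in.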

@@ Line 1: / Line 1: @@
 The best way to find out the reason for a crash is to visualize the surface velocity with ACE/xmvis6. Usually you'll see some large/noisy velocity somewhere, which may give you some hints on forcing etc.
-Sometimes you want to visualize the problem right before the crash. Here is the way using the hotstart option.
+Sometimes you want to visualize the problem right before the crash. You cannot use autocombine_MPI_elfe.pl as the last stack of output is incomplete. But you can use the core FORTRAN combine script (e.g., combine_output6) to directly combine an incomplete stack. Just follow the instruction in the header of combine_output6.f90 to prepare the inputs and run. Then visualize the combined outputs with xmvis6.
-Suppose you run crashed right after time step it=1005 (you can find out this in mirror.out; note that "TIME STEP= " is written AFTER a step is completed), and the closest hotstart output (in outputs/) has a step of 900.
-First save any outputs that may be overwritten upon ihot=2:
-<UL>
-  <LI>mv mirror.out mirror.out.0
-  <LI>mv hotstart.in hotstart.in.0
-  <LI>mv outputs outputs.0
-  <LI>mkdir outputs
-</UL>
-....
-The third move is necessary as we are going to change the stack size (ihfskip).
-Combine hotstart outputs at it=900 using combine_hotstart*.f90 to generate a new hotstart.in, and then move it to the same dir as hgrid.gr3.
-Then set ihot=2 in param.in. Also set nspool and ihfskip, and hotout_write to 1005. Start the run with same number of CPUs. Occasionally, the hotstarted run will crash at a different step, say 1006, and if this is the case, reset nspool and ihfskip, and hotout_write to 1006 and redo it. The 2nd time should work.
-You'll see 2 stacks coming out after the crash. Combine the 1st stack and then viz.
