Merged
Changes from 26 commits
Commits
85 commits
856821c
Add pre/post callback function args to advance_state
thomasgibson May 28, 2021
72ceaf6
Pass checkpoint functions as pre-step callbacks
thomasgibson May 28, 2021
2814826
Add healthcheck callback and exceptions
thomasgibson May 28, 2021
25dd51c
Use healthcheck callbacks in examples
thomasgibson May 28, 2021
7ee673c
Fix docstring for sim_healthcheck
thomasgibson May 28, 2021
9dba4ff
Document callback signature and return state
thomasgibson Jun 4, 2021
f1b3e4f
Test the basic healthcheck callback
thomasgibson Jun 4, 2021
faba167
Remove EOS copy-pasta
thomasgibson Jun 4, 2021
0b6df95
Minor refactoring and allow callbacks to terminate the simulation
thomasgibson Jun 5, 2021
b656079
Fix exception docs
thomasgibson Jun 5, 2021
2acdf73
Expand documentation of callbacks
thomasgibson Jun 5, 2021
5405604
Simplify sim_checkpoint and modularize callbacks
thomasgibson Jun 5, 2021
ca7c37b
Update steppers
thomasgibson Jun 6, 2021
da65cd2
Write short test for comparison callback
thomasgibson Jun 6, 2021
2523fbd
Renaming: {cfd_healthcheck,StepperCrashError} -> {sim_healthcheck,Syn…
thomasgibson Jun 7, 2021
8284d8a
Simplify drivers; move exception handling into sim_checkpoint
thomasgibson Jun 7, 2021
00d29a0
Merge branch 'main' into thg/callbacks
thomasgibson Jun 8, 2021
cba127b
Fix callback tests
thomasgibson Jun 8, 2021
5c091bd
Merge branch 'thg/callbacks' of github.com:illinois-ceesd/mirgecom in…
thomasgibson Jun 8, 2021
4c938fa
Clean up sim_checkpoint function
thomasgibson Jun 8, 2021
0c2e986
Update simutils/health/checkpoint - move most to driver, simplify
MTCam Jun 11, 2021
169b92d
Merge branch 'main' into mrgmain
MTCam Jun 11, 2021
15bdf3d
Port examples into place, slight adjustment/cleanup/corrections for s…
MTCam Jun 11, 2021
68302b4
Update tests to match the facilities.
MTCam Jun 11, 2021
8b506e1
Twiddle to test both pressure and mass for first bad state.
MTCam Jun 11, 2021
5043f86
Merge branch 'main' into recallbacks
MTCam Jun 11, 2021
51ade9b
Sharpen the error sync.
MTCam Jun 11, 2021
6833902
Update error handling to match review suggestion by @thomasgibson.
MTCam Jun 11, 2021
080513e
Update error handling per @thomasgibson review.
MTCam Jun 11, 2021
70ff6dc
Sharpen the handling of post-stepping (exceptional) io per @majosm.
MTCam Jun 12, 2021
cccff7f
Sharpen the valid ranges specific to each example, and remove default…
MTCam Jun 12, 2021
1772b86
Correct call to naninf in test.
MTCam Jun 12, 2021
a9b7990
Use verbs for function names.
MTCam Jun 12, 2021
100df77
Use new name for visfile util, switch autoignition and vortex over to…
MTCam Jun 13, 2021
6f7ef04
Clean up logging a bit, attempt to use nlog to control logging interval.
MTCam Jun 13, 2021
4376604
Merge branch 'main' into recallbacks
MTCam Jun 15, 2021
2d37e96
Merge branch 'main' into recallbacks
MTCam Jun 15, 2021
c74b9f5
Merge branch 'main' into recallbacks
MTCam Jun 19, 2021
1ca252e
Merge branch 'main' into recallbacks
MTCam Jun 23, 2021
4c94a7d
Fix merge errors
MTCam Jun 23, 2021
6e1e913
Merge branch 'main' into recallbacks
MTCam Jun 25, 2021
fa9aab6
Merge branch 'main' into recallbacks
MTCam Jun 25, 2021
3e34bcc
Merge branch 'main' into recallbacks
MTCam Jun 29, 2021
7f38ed1
Merge branch 'main' into recallbacks
MTCam Jun 30, 2021
ffb35b8
Merge branch 'main' into recallbacks
MTCam Jun 30, 2021
293d28f
Refactor autoignit just a bit
MTCam Jul 2, 2021
8443a66
Refactor autoignition example per discussions.
MTCam Jul 3, 2021
d2b2c68
Bring examples up-to-date with current stepper API
MTCam Jul 3, 2021
9a5fcf3
Add DT health check to autoignition.
MTCam Jul 3, 2021
de48784
Modernize lump example
MTCam Jul 6, 2021
a5e5fc4
remove unused exceptions module, for now
MTCam Jul 6, 2021
42ea719
remove unused exceptions module, for now
MTCam Jul 6, 2021
0d74f9c
fix api/signature and doc
MTCam Jul 6, 2021
ae90d9f
Update stepper API to current
MTCam Jul 7, 2021
d092b82
Modernize more examples.
MTCam Jul 7, 2021
ff489e1
Fix simutil import
MTCam Jul 7, 2021
aa07513
Fix up steppers to call get_timestep properly
MTCam Jul 7, 2021
1464b9b
Fix up inviscid_sim_timestep to properly consider t_remaining
MTCam Jul 7, 2021
3dfdb21
Modernize examples.
MTCam Jul 7, 2021
926e76a
Correct misnamed restart parameter.
MTCam Jul 7, 2021
cd95207
Merge branch 'main' into recallbacks
MTCam Jul 7, 2021
82eb938
Reduce number of steps to speed up CI.
MTCam Jul 8, 2021
02b37d8
Unify call signatures of user-defined utilities.
MTCam Jul 8, 2021
7d6c988
Correct some dumb issues with the examples.
MTCam Jul 8, 2021
ab06ff7
Correct errant call-sites to new utility signatures.
MTCam Jul 8, 2021
9c63136
Call logmgr_set_time on restart.
MTCam Jul 8, 2021
1b4df75
Deprecate logmgr, dim, eos args to stepper.
MTCam Jul 8, 2021
c857e06
Correct mistake in input restart file naming.
MTCam Jul 8, 2021
bdd4d90
Correct mistake in input restart file naming.
MTCam Jul 8, 2021
1a7e8e3
Correct import location for logmgr_set_time
MTCam Jul 8, 2021
6a7d20c
Document dt, deprecate get_timestep arg
MTCam Jul 8, 2021
ea98b59
Update examples to use exceptions to clean up error handling, evict g…
MTCam Jul 9, 2021
d7bca23
Merge branch 'main' into recallbacks
MTCam Jul 9, 2021
72505bc
Correct finished check, return a proper constant dt result, some othe…
MTCam Jul 9, 2021
544a63f
Merge branch 'yupdayt-recallbacks' into recallbacks
MTCam Jul 9, 2021
3c3bbd5
Correct and enhance the restart processing to demonstrate change-of-o…
MTCam Jul 9, 2021
41f78ba
Update restart logic, add missing final dump advice.
MTCam Jul 9, 2021
5c3ab55
Use built-in exceptions instead of custom ones.
MTCam Jul 9, 2021
c544ddd
Satisfy pylint to raise an actual named exception inside try.
MTCam Jul 9, 2021
810ab46
Fix up the documentation to have the correct signature for callbacks
MTCam Jul 9, 2021
549a8dd
Massage exceptions a bit so that the error messages are a little more…
MTCam Jul 9, 2021
d51755a
Merge branch 'exceptions-massage' into recallbacks
MTCam Jul 9, 2021
af009c3
Rearrange function def order for consistency and less "tism" activation.
MTCam Jul 9, 2021
5306e30
Tweak exception handling to our liking.
MTCam Jul 9, 2021
5bd30d8
Add syncing utility and use it in examples.
MTCam Jul 9, 2021
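The first commits in this list add pre/post callback arguments to `advance_state` (856821c) and pass the checkpoint functions as pre-step callbacks (72ceaf6). The updated examples rely on a simple contract: the callback receives `(step, t, dt, state)` before each step and returns the state. The toy loop below is a sketch under that assumption; `euler_step` and this `advance_state` are hypothetical stand-ins, not mirgecom's implementations, and the `dt` clamp only mirrors the intent described in 1464b9b ("properly consider t_remaining").

```python
def euler_step(state, t, dt, rhs):
    """Forward Euler stand-in for a real timestepper such as rk4_step."""
    return state + dt * rhs(t, state)


def advance_state(rhs, timestepper, get_timestep, state, t, t_final,
                  pre_step_callback=None):
    """Toy stepping loop (hypothetical, not mirgecom's implementation).

    The optional pre-step callback sees (step, t, dt, state) before each
    step and returns the (possibly modified) state.
    """
    istep = 0
    while t < t_final:
        # Never step past t_final: clamp dt to the remaining time.
        dt = min(get_timestep(state), t_final - t)
        if pre_step_callback is not None:
            state = pre_step_callback(istep, t, dt, state)
        state = timestepper(state=state, t=t, dt=dt, rhs=rhs)
        t += dt
        istep += 1
    return istep, t, state


seen = []


def my_checkpoint(step, t, dt, state):
    seen.append(step)  # record which steps the callback was invoked on
    return state


# Integrate dy/dt = -y from y(0) = 1 to t = 1 with a fixed dt of 0.25.
nsteps, t_end, y = advance_state(
    rhs=lambda t, y: -y, timestepper=euler_step,
    get_timestep=lambda state: 0.25, state=1.0, t=0.0, t_final=1.0,
    pre_step_callback=my_checkpoint)
```

Because the callback runs before each step, it never sees the state at `t_final` (here `seen` stops at step 3), so a driver that wants output of the final state has to request it explicitly after the loop.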
2 changes: 1 addition & 1 deletion doc/support/tools.rst
@@ -1,6 +1,6 @@
Random Pile'o'Tools
===================

.. automodule:: mirgecom.exceptions
.. automodule:: mirgecom.simutil

.. automodule:: mirgecom.utils
82 changes: 53 additions & 29 deletions examples/autoignition-mpi.py
@@ -39,10 +39,8 @@
 from mirgecom.euler import euler_operator
 from mirgecom.simutil import (
     inviscid_sim_timestep,
-    sim_checkpoint,
     check_step,
-    generate_and_distribute_mesh,
-    ExactSolutionMismatch
+    generate_and_distribute_mesh
 )
 from mirgecom.io import make_init_message
 from mirgecom.mpi import mpi_entry_point
@@ -52,6 +50,7 @@
 from mirgecom.boundary import AdiabaticSlipBoundary
 from mirgecom.initializers import MixtureInitializer
 from mirgecom.eos import PyrometheusMixture
+
 import cantera
 import pyrometheus as pyro

@@ -81,6 +80,7 @@ def main(ctx_factory=cl.create_some_context, use_leap=False):
     constant_cfl = False
     nstatus = 1
     nviz = 5
+    nhealth = 1
     rank = 0
     checkpoint_t = current_t
     current_step = 0
@@ -91,7 +91,6 @@ def main(ctx_factory=cl.create_some_context, use_leap=False):
     timestepper = rk4_step
     box_ll = -0.005
     box_ur = 0.005
-    error_state = False
     debug = False

     from mpi4py import MPI
@@ -223,25 +222,56 @@ def my_rhs(t, state):
                 + eos.get_species_source_terms(state))

     def my_checkpoint(step, t, dt, state):
-        reaction_rates = eos.get_production_rates(state)
-        viz_fields = [("reaction_rates", reaction_rates)]
-        return sim_checkpoint(discr, visualizer, eos, cv=state,
-                              vizname=casename, step=step,
-                              t=t, dt=dt, nstatus=nstatus, nviz=nviz,
-                              constant_cfl=constant_cfl, comm=comm,
-                              viz_fields=viz_fields)
-
-    try:
-        (current_step, current_t, current_state) = \
-            advance_state(rhs=my_rhs, timestepper=timestepper,
-                          checkpoint=my_checkpoint,
-                          get_timestep=get_timestep, state=current_state,
-                          t=current_t, t_final=t_final)
-    except ExactSolutionMismatch as ex:
-        error_state = True
-        current_step = ex.step
-        current_t = ex.t
-        current_state = ex.state
+        from mirgecom.simutil import check_step
+        do_status = check_step(step=step, interval=nstatus)
+        do_viz = check_step(step=step, interval=nviz)
+        do_health = check_step(step=step, interval=nhealth)

Collaborator

This will need some additional logic to get a final dump when the final step isn't a multiple of the check interval.

Something like

def my_checkpoint(step, t, dt, state):
    done = t >= t_final
    do_status = done or check_step(step=step, interval=nstatus)
    do_viz = done or check_step(step=step, interval=nviz)
    do_health = done or check_step(step=step, interval=nhealth)

    # ... stuff ...

    return state, done

?
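A self-contained sketch of the suggested `done` logic, assuming the callback is invoked after each step (which is what makes `t >= t_final` reachable) and returns `(state, done)` as in the snippet above. The local `check_step` is a stand-in whose semantics are assumed (zero means every step, negative means never); the callback factory and the driver loop are hypothetical.

```python
def check_step(step, interval):
    """Stand-in for mirgecom.simutil.check_step (semantics assumed)."""
    if interval == 0:
        return True   # zero: act on every step
    if interval < 0:
        return False  # negative: never act
    return step % interval == 0


def make_checkpoint(t_final, nviz, dumps):
    """Build a post-step callback that records which steps write output."""
    def my_checkpoint(step, t, dt, state):
        done = t >= t_final
        # Always dump on the final call, even off the nviz cadence.
        do_viz = done or check_step(step=step, interval=nviz)
        if do_viz:
            dumps.append(step)
        return state, done
    return my_checkpoint


# Hypothetical driver: forward Euler for dy/dt = -y, callback after each step.
dumps = []
checkpoint = make_checkpoint(t_final=1.0, nviz=3, dumps=dumps)
step, t, dt, state, done = 0, 0.0, 0.25, 1.0, False
while not done:
    state = state - dt * state
    t += dt
    step += 1
    state, done = checkpoint(step, t, dt, state)
```

Without the `done` term only step 3 would be written, and the state at `t_final` (step 4) would never be dumped.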

Member Author

Please see what you think about the force argument added in (70ff6dc). I like the force argument because it is a directive as opposed to the passive/automatic done = t >= t_final.
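The `force` alternative can be sketched the same way: the driver, not the callback, decides that a final dump is needed and says so explicitly. How 70ff6dc actually wires `force` is not shown on this page; only the keyword name comes from the comment above, and the callback factory, driver loop, and stand-in `check_step` are hypothetical.

```python
def check_step(step, interval):
    """Stand-in for mirgecom.simutil.check_step (semantics assumed)."""
    if interval == 0:
        return True
    if interval < 0:
        return False
    return step % interval == 0


def make_checkpoint(nviz, dumps):
    """Build a pre-step callback that records which steps write output."""
    def my_checkpoint(step, t, dt, state, force=False):
        # force is a directive from the caller; no end-of-run detection here.
        if force or check_step(step=step, interval=nviz):
            dumps.append(step)
        return state
    return my_checkpoint


nviz = 3
dumps = []
checkpoint = make_checkpoint(nviz=nviz, dumps=dumps)
step, t, dt, t_final, state = 0, 0.0, 0.25, 1.0, 1.0
while t < t_final:
    state = checkpoint(step, t, dt, state)  # pre-step call
    state = state - dt * state              # forward Euler for dy/dt = -y
    t += dt
    step += 1

# After the loop, the driver forces a dump if the last step missed the cadence.
if not check_step(step, nviz):
    state = checkpoint(step, t, dt, state, force=True)
```

The post-loop guard has the same shape as the `if not check_step(current_step, nviz)` check the updated autoignition driver already performs after `advance_state` returns.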

Collaborator

I'm OK with either for now, but I think the done version moves us a little closer to #257. Currently in that PR, checkpoint is called after the timestep (so there is no final checkpoint call in the driver that you could force), and also checkpoint is what decides whether the simulation is finished or not (so done will be present anyway).

+
+        if do_status or do_viz or do_health:
+            dv = eos.dependent_vars(state)
+            reaction_rates = eos.get_production_rates(state)
+            io_fields = [
+                ("cv", state),
+                ("dv", dv),
+                ("reaction_rates", reaction_rates)
+            ]
Comment thread
majosm marked this conversation as resolved.
Outdated
+
+        if do_status: # This is bad, logging already completely replaces this
Comment thread
MTCam marked this conversation as resolved.
Outdated
+            from mirgecom.io import make_status_message
+            status_msg = make_status_message(discr=discr, t=t, step=step, dt=dt,
+                                             cfl=current_cfl, dependent_vars=dv)
+            if rank == 0:
+                logger.info(status_msg)
+
+        errors = 0
+        if do_health:
+            from mirgecom.simutil import check_naninf_local, check_range_local
+            if check_naninf_local(discr, "vol", dv.pressure) \
+               or check_range_local(discr, "vol", dv.pressure):
+                errors = 1
+                message = "Invalid pressure data found.\n"
+        errors = discr.mpi_communicator.allreduce(errors, op=MPI.SUM)
+        if errors > 0:
+            if rank == 0:
+                logger.info("Fluid solution failed health check.")
+            logger.info(message) # do this on all ranks
Comment thread
majosm marked this conversation as resolved.
Outdated
+
+        if do_viz or errors > 0:
+            from mirgecom.simutil import sim_visualization
+            sim_visualization(discr, io_fields, visualizer, vizname=casename,
+                              step=step, t=t, overwrite=True)
Comment thread
MTCam marked this conversation as resolved.
Outdated
+
+        if errors > 0:
+            a = 1/0
+            print(f"{a=}")
Comment thread
thomasgibson marked this conversation as resolved.
Outdated
+
+        return state
+
+    current_step, current_t, current_state = \
+        advance_state(rhs=my_rhs, timestepper=timestepper,
+                      pre_step_callback=my_checkpoint,
+                      get_timestep=get_timestep, state=current_state,
+                      t=current_t, t_final=t_final, eos=eos, dim=dim)

     if not check_step(current_step, nviz): # If final step not an output step
         if rank == 0:
@@ -250,12 +280,6 @@ def my_checkpoint(step, t, dt, state):
                       dt=(current_t - checkpoint_t),
                       state=current_state)

-    if current_t - t_final < 0:
-        error_state = True
-
-    if error_state:
-        raise ValueError("Simulation did not complete successfully.")
-

 if __name__ == "__main__":
     logging.basicConfig(level=logging.INFO)
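In the updated callbacks each rank checks its own pressure data, the error count is summed with `allreduce`, and the `a = 1/0` placeholder then aborts the run by deliberately dividing by zero; later commits in this PR move the examples to real exceptions (ea98b59, 5c3ab55). A minimal sketch of the same collective pattern raising a built-in `RuntimeError`, assuming plain Python floats in place of DOF arrays. `health_check` is a hypothetical helper, and `allreduce` is injected so the sketch runs without MPI; with mpi4py one could pass `lambda n: comm.allreduce(n, op=MPI.SUM)`.

```python
import math


def health_check(local_pressure, p_min, p_max, allreduce=lambda n: n):
    """Raise RuntimeError on every rank if any rank holds bad pressure data.

    allreduce sums an int over all ranks; the default assumes a single rank.
    """
    # NaN and inf fail isfinite; finite values must also lie in [p_min, p_max].
    local_bad = any(not math.isfinite(p) or not (p_min <= p <= p_max)
                    for p in local_pressure)
    # Every rank participates in the reduction, so all ranks agree on nbad.
    nbad = allreduce(int(local_bad))
    if nbad > 0:
        raise RuntimeError(
            f"Fluid solution failed health check on {nbad} rank(s).")
```

A rank whose own data is fine still raises when another rank reports a problem, so no rank is left waiting in a later collective call that its failed peers will never reach.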
83 changes: 62 additions & 21 deletions examples/lump-mpi.py
@@ -39,9 +39,7 @@
 from mirgecom.euler import euler_operator
 from mirgecom.simutil import (
     inviscid_sim_timestep,
-    sim_checkpoint,
-    generate_and_distribute_mesh,
-    ExactSolutionMismatch
+    generate_and_distribute_mesh
 )
 from mirgecom.io import make_init_message
 from mirgecom.mpi import mpi_entry_point
@@ -81,6 +79,7 @@ def main(ctx_factory=cl.create_some_context, use_leap=False):
     boundaries = {BTAG_ALL: PrescribedBoundary(initializer)}
     constant_cfl = False
     nstatus = 1
+    nhealth = 1
     nviz = 1
     rank = 0
     checkpoint_t = current_t
@@ -131,21 +130,66 @@ def my_rhs(t, state):
                               boundaries=boundaries, eos=eos)

     def my_checkpoint(step, t, dt, state):
-        return sim_checkpoint(discr, visualizer, eos, cv=state,
-                              exact_soln=initializer, vizname=casename, step=step,
-                              t=t, dt=dt, nstatus=nstatus, nviz=nviz,
-                              exittol=exittol, constant_cfl=constant_cfl, comm=comm)
-
-    try:
-        (current_step, current_t, current_state) = \
-            advance_state(rhs=my_rhs, timestepper=timestepper,
-                          checkpoint=my_checkpoint,
-                          get_timestep=get_timestep, state=current_state,
-                          t=current_t, t_final=t_final)
-    except ExactSolutionMismatch as ex:
-        current_step = ex.step
-        current_t = ex.t
-        current_state = ex.state
+        from mirgecom.simutil import check_step
+        do_status = check_step(step=step, interval=nstatus)
+        do_viz = check_step(step=step, interval=nviz)
+        do_health = check_step(step=step, interval=nhealth)
+
+        if do_status or do_viz or do_health:
+            from mirgecom.simutil import compare_fluid_solutions
+            dv = eos.dependent_vars(state)
+            exact_mix = initializer(x_vec=nodes, eos=eos, t=t)
+            component_errors = compare_fluid_solutions(discr, state, exact_mix)
+            resid = state - exact_mix
+            io_fields = [
+                ("cv", state),
+                ("dv", dv),
+                ("exact_mix", exact_mix),
+                ("resid", resid)
+            ]
+
+        if do_status: # This is bad, logging already completely replaces this
+            from mirgecom.io import make_status_message
+            status_msg = make_status_message(discr=discr, t=t, step=step, dt=dt,
+                                             cfl=current_cfl, dependent_vars=dv)
+            status_msg += (
+                "\n------- errors="
+                + ", ".join("%.3g" % en for en in component_errors))
+            if rank == 0:
+                logger.info(status_msg)
+
+        errors = 0
+        if do_health:
+            from mirgecom.simutil import check_naninf_local, check_range_local
+            if check_naninf_local(discr, "vol", dv.pressure) \
+               or check_range_local(discr, "vol", dv.pressure):
+                errors = 1
+                message = "Invalid pressure data found.\n"
+            if np.max(component_errors) > exittol:
+                errors = errors + 1
+                message += "Solution errors exceed tolerance.\n"
+        errors = discr.mpi_communicator.allreduce(errors, op=MPI.SUM)
+        if errors > 0:
+            if rank == 0:
+                logger.info("Fluid solution failed health check.")
+            logger.info(message) # do this on all ranks
+
+        if do_viz or errors > 0:
+            from mirgecom.simutil import sim_visualization
+            sim_visualization(discr, io_fields, visualizer, vizname=casename,
+                              step=step, t=t, overwrite=True)
+
+        if errors > 0:
+            a = 1/0
+            print(f"{a=}")
+
+        return state
+
+    current_step, current_t, current_state = \
+        advance_state(rhs=my_rhs, timestepper=timestepper,
+                      pre_step_callback=my_checkpoint,
+                      get_timestep=get_timestep, state=current_state,
+                      t=current_t, t_final=t_final, eos=eos, dim=dim)

     # if current_t != checkpoint_t:
     if rank == 0:
@@ -154,9 +198,6 @@ def my_checkpoint(step, t, dt, state):
                   dt=(current_t - checkpoint_t),
                   state=current_state)

-    if current_t - t_final < 0:
-        raise ValueError("Simulation exited abnormally")
-

 if __name__ == "__main__":
     logging.basicConfig(format="%(message)s", level=logging.INFO)
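The lump and mixture callbacks also fail the health check when `np.max(component_errors) > exittol`, where `component_errors` comes from `compare_fluid_solutions`. The norm that function uses is not shown in this diff; the sketch below assumes a relative max-norm error per conserved component, with the components passed as a plain list of NumPy arrays. `component_errors` and `exceeds_tolerance` here are hypothetical helpers, not mirgecom APIs.

```python
import numpy as np


def component_errors(state, exact):
    """Relative max-norm error of each component (assumed definition)."""
    errs = []
    for comp, ref in zip(state, exact):
        scale = np.max(np.abs(ref))
        err = np.max(np.abs(comp - ref))
        # Fall back to the absolute error when the exact component is zero.
        errs.append(err / scale if scale > 0 else err)
    return np.array(errs)


def exceeds_tolerance(state, exact, exittol):
    """Mirror the driver's test: fail if any component error exceeds exittol."""
    return bool(np.max(component_errors(state, exact)) > exittol)


exact = [np.ones(4), np.full(4, 2.0)]         # e.g. mass and energy
state = [np.full(4, 1.001), np.full(4, 2.0)]  # 0.1% error in the first component
```

Taking the maximum over components means a single badly resolved field is enough to trip the check.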
90 changes: 62 additions & 28 deletions examples/mixture-mpi.py
@@ -39,9 +39,7 @@
 from mirgecom.euler import euler_operator
 from mirgecom.simutil import (
     inviscid_sim_timestep,
-    sim_checkpoint,
-    generate_and_distribute_mesh,
-    ExactSolutionMismatch
+    generate_and_distribute_mesh
 )
 from mirgecom.io import make_init_message
 from mirgecom.mpi import mpi_entry_point
@@ -78,6 +76,7 @@ def main(ctx_factory=cl.create_some_context, use_leap=False):
     current_t = 0
     constant_cfl = False
     nstatus = 1
+    nhealth = 1
     nviz = 1
     rank = 0
     checkpoint_t = current_t
@@ -89,7 +88,6 @@ def main(ctx_factory=cl.create_some_context, use_leap=False):
     timestepper = rk4_step
     box_ll = -5.0
     box_ur = 5.0
-    error_state = 0

     from mpi4py import MPI
     comm = MPI.COMM_WORLD
@@ -152,24 +150,66 @@ def my_rhs(t, state):
                               boundaries=boundaries, eos=eos)

     def my_checkpoint(step, t, dt, state):
-        global checkpoint_t
-        checkpoint_t = t
-        return sim_checkpoint(discr, visualizer, eos, cv=state,
-                              exact_soln=initializer, vizname=casename, step=step,
-                              t=t, dt=dt, nstatus=nstatus, nviz=nviz,
-                              exittol=exittol, constant_cfl=constant_cfl, comm=comm)
-
-    try:
-        (current_step, current_t, current_state) = \
-            advance_state(rhs=my_rhs, timestepper=timestepper,
-                          checkpoint=my_checkpoint,
-                          get_timestep=get_timestep, state=current_state,
-                          t=current_t, t_final=t_final)
-    except ExactSolutionMismatch as ex:
-        error_state = 1
-        current_step = ex.step
-        current_t = ex.t
-        current_state = ex.state
+        from mirgecom.simutil import check_step
+        do_status = check_step(step=step, interval=nstatus)
+        do_viz = check_step(step=step, interval=nviz)
+        do_health = check_step(step=step, interval=nhealth)
+
+        if do_status or do_viz or do_health:
+            from mirgecom.simutil import compare_fluid_solutions
+            dv = eos.dependent_vars(state)
+            exact_mix = initializer(x_vec=nodes, eos=eos, t=t)
+            component_errors = compare_fluid_solutions(discr, state, exact_mix)
+            resid = state - exact_mix
+            io_fields = [
+                ("cv", state),
+                ("dv", dv),
+                ("exact_mix", exact_mix),
+                ("resid", resid)
+            ]
+
+        if do_status: # This is bad, logging already completely replaces this
+            from mirgecom.io import make_status_message
+            status_msg = make_status_message(discr=discr, t=t, step=step, dt=dt,
+                                             cfl=current_cfl, dependent_vars=dv)
+            status_msg += (
+                "\n------- errors="
+                + ", ".join("%.3g" % en for en in component_errors))
+            if rank == 0:
+                logger.info(status_msg)
+
+        errors = 0
+        if do_health:
+            from mirgecom.simutil import check_naninf_local, check_range_local
+            if check_naninf_local(discr, "vol", dv.pressure) \
+               or check_range_local(discr, "vol", dv.pressure):
+                errors = 1
+                message = "Invalid pressure data found.\n"
+            if np.max(component_errors) > exittol:
+                errors = errors + 1
+                message += "Solution errors exceed tolerance.\n"
+        errors = discr.mpi_communicator.allreduce(errors, op=MPI.SUM)
+        if errors > 0:
+            if rank == 0:
+                logger.info("Fluid solution failed health check.")
+            logger.info(message) # do this on all ranks
+
+        if do_viz or errors > 0:
+            from mirgecom.simutil import sim_visualization
+            sim_visualization(discr, io_fields, visualizer, vizname=casename,
+                              step=step, t=t, overwrite=True)
+
+        if errors > 0:
+            a = 1/0
+            print(f"{a=}")
+
+        return state
+
+    current_step, current_t, current_state = \
+        advance_state(rhs=my_rhs, timestepper=timestepper,
+                      pre_step_callback=my_checkpoint,
+                      get_timestep=get_timestep, state=current_state,
+                      t=current_t, t_final=t_final, eos=eos, dim=dim)

     if current_t != checkpoint_t: # This check because !overwrite
         if rank == 0:
@@ -178,12 +218,6 @@ def my_checkpoint(step, t, dt, state):
                       dt=(current_t - checkpoint_t),
                       state=current_state)

-    if current_t - t_final < 0:
-        error_state = 1
-
-    if error_state:
-        raise ValueError("Simulation did not complete successfully.")
-

 if __name__ == "__main__":
     logging.basicConfig(format="%(message)s", level=logging.INFO)
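The removed mixture-mpi.py callback used `global checkpoint_t` to record when it last wrote a checkpoint. In a function nested inside `main()`, `global` rebinds a module-level name, so `main()`'s own `checkpoint_t` never changes; `nonlocal` is the statement that rebinds a variable of the enclosing function. The two hypothetical functions below isolate the difference.

```python
def main_with_global():
    checkpoint_t = 0.0

    def my_checkpoint(t):
        global checkpoint_t  # binds a module-level name, not main's local
        checkpoint_t = t

    my_checkpoint(1.0)
    return checkpoint_t      # main's local is untouched


def main_with_nonlocal():
    checkpoint_t = 0.0

    def my_checkpoint(t):
        nonlocal checkpoint_t  # rebinds the enclosing function's variable
        checkpoint_t = t

    my_checkpoint(1.0)
    return checkpoint_t
```

In the lines shown in this diff, nothing reassigns `checkpoint_t` after `checkpoint_t = current_t`, so the `current_t != checkpoint_t` guard effectively asks whether the run advanced at all. If the guard is meant to say "the last step was not already written", a `nonlocal` update inside the callback is one way to get that behavior.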