std::bad_alloc when running large-scale simulations

When I run a large-scale simulation on n nodes with 2*n MPI tasks (I tested n = 2 and n = 32), it fails with the following error:

[2020-08-11 01:02:31.422] [mesh::0] [error] /scratch/07277/lyowsn/mpm/include/mesh.tcc #575: std::bad_alloc

[2020-08-11 01:02:31.423] [mesh::0] [error] /scratch/07277/lyowsn/mpm/include/mesh.tcc #575: std::bad_alloc

I then tested n = 2 and n = 32 with 1 task per node. That works: the cells and particles are generated correctly.

[2020-08-11 16:04:45.252] [MPMExplicit] [info] Rank 0 Read cells: 305387 ms
[2020-08-11 16:13:47.481] [MPMExplicit] [info] Rank 1 Generate particles: 546929 ms
[2020-08-11 16:14:01.833] [MPMExplicit] [info] Rank 0 Generate particles: 556580 ms

This is because we copy the grid data for each MPI task, so running 2 tasks per node doubles the mesh memory needed on each node. This case has 4977079 mesh nodes and 4875000 cells.
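
To make the scaling explicit, here is a rough back-of-the-envelope sketch in C++. The `MeshFootprint` struct, the function name, and the per-entity byte counts are all hypothetical placeholders, not values measured from the mpm code; the only point is that the per-node requirement grows linearly with the number of tasks per node when every task holds a full copy of the mesh.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-entity footprints in bytes. These are NOT measured from
// the mpm code; replace them with values from a heap profiler for a real build.
struct MeshFootprint {
  std::uint64_t bytes_per_node;  // one mesh node (coordinates, nodal data, ...)
  std::uint64_t bytes_per_cell;  // one cell (connectivity, element data, ...)
};

// Bytes needed on ONE compute node when every MPI task placed on that
// compute node keeps its own full copy of the mesh.
std::uint64_t replicated_mesh_bytes(std::uint64_t mesh_nodes,
                                    std::uint64_t mesh_cells,
                                    std::uint64_t tasks_per_node,
                                    const MeshFootprint& fp) {
  assert(tasks_per_node > 0);
  const std::uint64_t one_copy =
      mesh_nodes * fp.bytes_per_node + mesh_cells * fp.bytes_per_cell;
  return one_copy * tasks_per_node;
}
```

Calling `replicated_mesh_bytes(4977079, 4875000, 2, fp)` returns exactly twice the value for 1 task per node, whatever footprint `fp` is assumed. That is consistent with the 1-task-per-node runs fitting in memory while the 2-tasks-per-node runs do not.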

Thanks @yliang-sn! Could you please paste the entire output up to the first step?

This is the error output for 2 nodes with 2 tasks per node.

TACC:  Starting up job 6224031 
TACC:  Starting parallel tasks... 
[2020-08-11 00:48:10.190] [MPMBase] [warning] #1067: Damping parameters are undefined [json.exception.out_of_range.403] key 'damping_factor' not found 
[2020-08-11 00:48:10.190] [MPMBase] [warning] /scratch/07277/lyowsn/mpm/include/solvers/mpm_base.tcc #71: Damping is not specified, using none as default
[2020-08-11 00:48:10.190] [MPMBase] [warning] #1067: Damping parameters are undefined [json.exception.out_of_range.403] key 'damping_factor' not found 
[2020-08-11 00:48:10.190] [MPMExplicit] [info] MPM analysis type MPMExplicit3D
[2020-08-11 00:48:10.191] [MPMBase] [warning] /scratch/07277/lyowsn/mpm/include/solvers/mpm_base.tcc #71: Damping is not specified, using none as default
[2020-08-11 00:48:10.191] [MPMExplicit] [info] MPM analysis type MPMExplicit3D
[2020-08-11 00:48:10.190] [MPMBase] [warning] #1067: Damping parameters are undefined [json.exception.out_of_range.403] key 'damping_factor' not found 
[2020-08-11 00:48:10.190] [MPMBase] [warning] #1067: Damping parameters are undefined [json.exception.out_of_range.403] key 'damping_factor' not found 
[2020-08-11 00:48:10.191] [MPMBase] [warning] /scratch/07277/lyowsn/mpm/include/solvers/mpm_base.tcc #71: Damping is not specified, using none as default
[2020-08-11 00:48:10.191] [MPMBase] [warning] /scratch/07277/lyowsn/mpm/include/solvers/mpm_base.tcc #71: Damping is not specified, using none as default
[2020-08-11 00:48:10.191] [MPMExplicit] [info] MPM analysis type MPMExplicit3D
[2020-08-11 00:48:10.191] [MPMExplicit] [info] MPM analysis type MPMExplicit3D
[2020-08-11 00:50:34.478] [MPMExplicit] [info] Rank 1 Read nodes: 144287 ms
[2020-08-11 00:50:34.478] [MPMExplicit] [warning] #749: Entity sets are undefined Entity set JSON not found 
[2020-08-11 00:50:34.478] [MPMExplicit] [warning] #775: Euler angles are undefined Euler angles JSON not found 
[2020-08-11 00:50:34.706] [MPMExplicit] [info] Rank 0 Read nodes: 144515 ms
[2020-08-11 00:50:34.707] [MPMExplicit] [warning] #749: Entity sets are undefined Entity set JSON not found 
[2020-08-11 00:50:34.707] [MPMExplicit] [warning] #775: Euler angles are undefined Euler angles JSON not found 
[2020-08-11 00:50:35.504] [MPMExplicit] [warning] #878: Friction conditions are undefined Friction constraints JSON not found 
[2020-08-11 00:50:35.705] [MPMExplicit] [warning] #878: Friction conditions are undefined Friction constraints JSON not found 
[2020-08-11 00:50:45.651] [MPMExplicit] [info] Rank 3 Read nodes: 155459 ms
[2020-08-11 00:50:45.652] [MPMExplicit] [warning] #749: Entity sets are undefined Entity set JSON not found 
[2020-08-11 00:50:45.652] [MPMExplicit] [warning] #775: Euler angles are undefined Euler angles JSON not found 
[2020-08-11 00:50:45.806] [MPMExplicit] [info] Rank 2 Read nodes: 155614 ms
[2020-08-11 00:50:45.807] [MPMExplicit] [warning] #749: Entity sets are undefined Entity set JSON not found 
[2020-08-11 00:50:45.807] [MPMExplicit] [warning] #775: Euler angles are undefined Euler angles JSON not found 
[2020-08-11 00:50:46.664] [MPMExplicit] [warning] #878: Friction conditions are undefined Friction constraints JSON not found 
[2020-08-11 00:50:46.798] [MPMExplicit] [warning] #878: Friction conditions are undefined Friction constraints JSON not found 
[2020-08-11 00:55:44.862] [MPMExplicit] [warning] #903: Cell entity sets are undefined Cell entity sets JSON not found 
[2020-08-11 00:55:44.862] [MPMExplicit] [info] Rank 1 Read cells: 309357 ms
[2020-08-11 00:55:47.805] [MPMExplicit] [warning] #903: Cell entity sets are undefined Cell entity sets JSON not found 
[2020-08-11 00:55:47.805] [MPMExplicit] [info] Rank 0 Read cells: 312100 ms
[2020-08-11 00:55:55.914] [MPMExplicit] [warning] #903: Cell entity sets are undefined Cell entity sets JSON not found 
[2020-08-11 00:55:55.915] [MPMExplicit] [info] Rank 3 Read cells: 309250 ms
[2020-08-11 00:55:59.127] [MPMExplicit] [warning] #903: Cell entity sets are undefined Cell entity sets JSON not found 
[2020-08-11 00:55:59.127] [MPMExplicit] [info] Rank 2 Read cells: 312328 ms
[2020-08-11 01:02:31.422] [mesh::0] [error] /scratch/07277/lyowsn/mpm/include/mesh.tcc #575: std::bad_alloc

[2020-08-11 01:02:31.423] [mesh::0] [error] /scratch/07277/lyowsn/mpm/include/mesh.tcc #575: std::bad_alloc

[2020-08-11 01:02:31.426] [mesh::0] [error] /scratch/07277/lyowsn/mpm/include/mesh.tcc: #1692 Generating particle failed
[2020-08-11 01:02:31.426] [MPMExplicit] [info] Rank 1 Generate particles: 406563 ms
[2020-08-11 01:02:31.427] [mesh::0] [error] /scratch/07277/lyowsn/mpm/include/mesh.tcc: #1692 Generating particle failed
[2020-08-11 01:02:31.427] [MPMExplicit] [info] Rank 0 Generate particles: 403622 ms
[2020-08-11 01:02:43.122] [mesh::0] [error] /scratch/07277/lyowsn/mpm/include/mesh.tcc #575: std::bad_alloc

[2020-08-11 01:02:43.122] [mesh::0] [error] /scratch/07277/lyowsn/mpm/include/mesh.tcc #575: std::bad_alloc

[2020-08-11 01:02:43.126] [mesh::0] [error] /scratch/07277/lyowsn/mpm/include/mesh.tcc: #1692 Generating particle failed
[2020-08-11 01:02:43.126] [MPMExplicit] [info] Rank 2 Generate particles: 403998 ms
[2020-08-11 01:02:43.126] [mesh::0] [error] /scratch/07277/lyowsn/mpm/include/mesh.tcc: #1692 Generating particle failed
[2020-08-11 01:02:43.126] [MPMExplicit] [info] Rank 3 Generate particles: 407211 ms
[2020-08-11 01:03:42.624] [IOMeshAscii] [error] Read particles cells: std::bad_alloc
[2020-08-11 01:03:42.625] [IOMeshAscii] [error] Read particles cells: std::bad_alloc
[2020-08-11 01:03:48.725] [mesh::0] [error] /scratch/07277/lyowsn/mpm/include/mesh.tcc #1325: Couldn't find key.

[2020-08-11 01:03:48.728] [MPMExplicit] [warning] #930: Particle cells are undefined Particle cells are not properly assigned to particles 
[2020-08-11 01:03:48.850] [mesh::0] [error] /scratch/07277/lyowsn/mpm/include/mesh.tcc #1325: Couldn't find key.

[2020-08-11 01:03:48.852] [MPMExplicit] [warning] #930: Particle cells are undefined Particle cells are not properly assigned to particles 
[2020-08-11 01:03:54.294] [IOMeshAscii] [error] Read particles cells: std::bad_alloc
[2020-08-11 01:03:54.296] [IOMeshAscii] [error] Read particles cells: std::bad_alloc
[2020-08-11 01:04:00.450] [mesh::0] [error] /scratch/07277/lyowsn/mpm/include/mesh.tcc #1325: Couldn't find key.

[2020-08-11 01:04:00.452] [MPMExplicit] [warning] #930: Particle cells are undefined Particle cells are not properly assigned to particles 
[2020-08-11 01:04:00.456] [mesh::0] [error] /scratch/07277/lyowsn/mpm/include/mesh.tcc #1325: Couldn't find key.

[2020-08-11 01:04:00.459] [MPMExplicit] [warning] #930: Particle cells are undefined Particle cells are not properly assigned to particles 
[2020-08-11 01:04:01.786] [particle3d::1269361] [error] /scratch/07277/lyowsn/mpm/include/particles/particle.tcc #295: std::bad_alloc

This is the successful output for 32 nodes with 1 task per node. I deleted some lines because of the character limit.

TACC:  Starting up job 6192884 
TACC:  Starting parallel tasks... 
[2020-08-06 03:58:55.571] [MPMBase] [warning] #1067: Damping parameters are undefined [json.exception.out_of_range.403] key 'damping_factor' not found 
[2020-08-06 03:58:55.572] [MPMBase] [warning] #1067: Damping parameters are undefined [json.exception.out_of_range.403] key 'damping_factor' not found 
[2020-08-06 03:58:55.573] [MPMBase] [warning] #1067: Damping parameters are undefined [json.exception.out_of_range.403] key 'damping_factor' not found 
[2020-08-06 03:58:55.572] [MPMBase] [warning] #1067: Damping parameters are undefined [json.exception.out_of_range.403] key 'damping_factor' not found 
[2020-08-06 03:58:55.572] [MPMBase] [warning] /scratch/07277/lyowsn/mpm/include/solvers/mpm_base.tcc #71: Damping is not specified, using none as default
[2020-08-06 03:58:55.573] [MPMBase] [warning] /scratch/07277/lyowsn/mpm/include/solvers/mpm_base.tcc #71: Damping is not specified, using none as default
[2020-08-06 03:58:55.573] [MPMBase] [warning] /scratch/07277/lyowsn/mpm/include/solvers/mpm_base.tcc #71: Damping is not specified, using none as default
[2020-08-06 03:58:55.573] [MPMBase] [warning] /scratch/07277/lyowsn/mpm/include/solvers/mpm_base.tcc #71: Damping is not specified, using none as default
[2020-08-06 03:58:55.572] [MPMExplicit] [info] MPM analysis type MPMExplicit3D
[2020-08-06 03:58:55.571] [MPMBase] [warning] #1067: Damping parameters are undefined [json.exception.out_of_range.403] key 'damping_factor' not found 
[2020-08-06 03:58:55.573] [MPMBase] [warning] #1067: Damping parameters are undefined [json.exception.out_of_range.403] key 'damping_factor' not found 
[2020-08-06 03:58:55.573] [MPMExplicit] [info] MPM analysis type MPMExplicit3D
[2020-08-06 03:58:55.575] [MPMExplicit] [info] MPM analysis type MPMExplicit3D
[2020-08-06 03:58:55.577] [MPMBase] [warning] /scratch/07277/lyowsn/mpm/include/solvers/mpm_base.tcc #71: Damping is not specified, using none as default
[2020-08-06 03:58:55.576] [MPMExplicit] [info] MPM analysis type MPMExplicit3D
[2020-08-06 03:58:55.576] [MPMExplicit] [info] MPM analysis type MPMExplicit3D
[2020-08-06 03:58:55.577] [MPMExplicit] [info] MPM analysis type MPMExplicit3D
[2020-08-06 03:58:55.577] [MPMExplicit] [info] MPM analysis type MPMExplicit3D
[2020-08-06 03:58:55.578] [MPMBase] [warning] #1067: Damping parameters are undefined [json.exception.out_of_range.403] key 'damping_factor' not found 
[2020-08-06 03:58:55.577] [MPMExplicit] [info] MPM analysis type MPMExplicit3D as default
[2020-08-06 03:58:55.585] [MPMExplicit] [info] MPM analysis type MPMExplicit3D
[2020-08-06 03:58:55.584] [MPMExplicit] [info] MPM analysis type MPMExplicit3D
[2020-08-06 03:58:55.584] [MPMBase] [warning] #1067: Damping parameters are undefined [json.exception.out_of_range.403] key 'damping_factor' not found 
[2020-08-06 03:58:55.585] [MPMBase] [warning] /scratch/07277/lyowsn/mpm/include/solvers/mpm_base.tcc #71: Damping is not specified, using none as default
[2020-08-06 03:58:55.585] [MPMExplicit] [info] MPM analysis type MPMExplicit3D
[2020-08-06 04:01:12.682] [MPMExplicit] [info] Rank 16 Read nodes: 137106 ms
[2020-08-06 04:01:12.683] [MPMExplicit] [warning] #749: Entity sets are undefined Entity set JSON not found 
[2020-08-06 04:01:12.683] [MPMExplicit] [warning] #775: Euler angles are undefined Euler angles JSON not found 
[2020-08-06 04:01:12.716] [MPMExplicit] [info] Rank 11 Read nodes: 137142 ms
[2020-08-06 04:01:12.716] [MPMExplicit] [warning] #749: Entity sets are undefined Entity set JSON not found 
[2020-08-06 04:01:12.716] [MPMExplicit] [warning] #775: Euler angles are undefined Euler angles JSON not found 
[2020-08-06 04:01:15.857] [MPMExplicit] [info] Rank 15 Read nodes: 140279 ms
[2020-08-06 04:01:15.857] [MPMExplicit] [warning] #749: Entity sets are undefined Entity set JSON not found 
[2020-08-06 04:01:15.857] [MPMExplicit] [warning] #775: Euler angles are undefined Euler angles JSON not found 
[2020-08-06 04:01:16.396] [MPMExplicit] [info] Rank 7 Read nodes: 140818 ms
[2020-08-06 04:01:16.397] [MPMExplicit] [warning] #749: Entity sets are undefined Entity set JSON not found 
[2020-08-06 04:01:16.397] [MPMExplicit] [warning] #775: Euler angles are undefined Euler angles JSON not found 
[2020-08-06 04:01:16.616] [MPMExplicit] [info] Rank 10 Read nodes: 141036 ms
[2020-08-06 04:01:16.617] [MPMExplicit] [warning] #749: Entity sets are undefined Entity set JSON not found 
[2020-08-06 04:01:16.617] [MPMExplicit] [warning] #775: Euler angles are undefined Euler angles JSON not found 
[2020-08-06 04:01:16.837] [MPMExplicit] [warning] #878: Friction conditions are undefined Friction constraints JSON not found 
[2020-08-06 04:01:17.368] [MPMExplicit] [warning] #878: Friction conditions are undefined Friction constraints JSON not found 
[2020-08-06 04:01:17.600] [MPMExplicit] [warning] #878: Friction conditions are undefined Friction constraints JSON not found 
[2020-08-06 04:01:18.488] [MPMExplicit] [info] Rank 25 Read nodes: 142906 ms
[2020-08-06 04:01:18.488] [MPMExplicit] [warning] #749: Entity sets are undefined Entity set JSON not found 
[2020-08-06 04:01:18.488] [MPMExplicit] [warning] #775: Euler angles are undefined Euler angles JSON not found 
[2020-08-06 04:01:19.496] [MPMExplicit] [warning] #878: Friction conditions are undefined Friction constraints JSON not found 
[2020-08-06 04:01:20.025] [MPMExplicit] [info] Rank 29 Read nodes: 144448 ms
[2020-08-06 04:01:20.025] [MPMExplicit] [warning] #749: Entity sets are undefined Entity set JSON not found 
[2020-08-06 04:01:20.025] [MPMExplicit] [warning] #775: Euler angles are undefined Euler angles JSON not found 
[2020-08-06 04:01:20.143] [MPMExplicit] [info] Rank 20 Read nodes: 144558 ms
[2020-08-06 04:01:20.144] [MPMExplicit] [warning] #749: Entity sets are undefined Entity set JSON not found 
[2020-08-06 04:01:20.144] [MPMExplicit] [warning] #775: Euler angles are undefined Euler angles JSON not found 
[2020-08-06 04:01:20.336] [MPMExplicit] [info] Rank 5 Read nodes: 144761 ms
[2020-08-06 04:01:20.336] [MPMExplicit] [warning] #749: Entity sets are undefined Entity set JSON not found 
[2020-08-06 04:01:20.336] [MPMExplicit] [warning] #775: Euler angles are undefined Euler angles JSON not found 
[2020-08-06 04:01:21.038] [MPMExplicit] [warning] #878: Friction conditions are undefined Friction constraints JSON not found 
[2020-08-06 04:01:21.148] [MPMExplicit] [warning] #878: Friction conditions are undefined Friction constraints JSON not found 
[2020-08-06 04:01:21.291] [MPMExplicit] [warning] #878: Friction conditions are undefined Friction constraints JSON not found 
[2020-08-06 04:06:07.701] [MPMExplicit] [warning] #903: Cell entity sets are undefined Cell entity sets JSON not found 
[2020-08-06 04:06:07.701] [MPMExplicit] [info] Rank 17 Read cells: 293964 ms
[2020-08-06 04:15:08.087] [MPMExplicit] [info] Rank 17 Generate particles: 540385 ms
[2020-08-06 04:21:01.063] [MPMExplicit] [info] Rank 17 Locate particles: 352976 ms
[2020-08-06 04:26:19.590] [MPMExplicit] [info] Rank 17 Read volume, velocity and stresses: 318526 ms
[2020-08-06 04:26:19.590] [MPMExplicit] [warning] #1047: Particle sets are undefined Particle entity set JSON not found 
[2020-08-06 04:26:19.590] [MPMExplicit] [warning] #992: Particle velocity constraints are undefined Particle velocity constraints JSON not found 
[2020-08-06 04:26:19.590] [MPMExplicit] [info] Rank 17 Create particle sets: 318526 ms
[2020-08-06 04:26:19.590] [MPMExplicit] [warning] No particle surface traction is defined for the analysis
[2020-08-06 04:26:19.590] [MPMExplicit] [warning] No concentrated nodal force is defined for the analysis
[2020-08-06 04:26:23.159] [MPMExplicit] [info] Rank 28 Read volume, velocity and stresses: 318822 ms
[2020-08-06 04:26:23.159] [MPMExplicit] [warning] #1047: Particle sets are undefined Particle entity set JSON not found  
[2020-08-06 04:27:45.698] [MPMExplicit] [info] Rank 31 Create particle sets: 318831 ms
[2020-08-06 04:27:45.698] [MPMExplicit] [warning] No particle surface traction is defined for the analysis
[2020-08-06 04:27:45.698] [MPMExplicit] [warning] No concentrated nodal force is defined for the analysis
[2020-08-06 04:27:55.156] [MPMExplicit] [info] Rank 0, Domain decomposition started

[2020-08-06 04:27:55.154] [MPMExplicit] [info] Rank 14, Domain decomposition started

[2020-08-06 04:27:55.154] [MPMExplicit] [info] Rank 13, Domain decomposition starte

[2020-08-06 04:31:45.457] [MPMExplicit] [info] Rank 7, Domain decomposition: 230301 ms
[2020-08-06 04:31:48.673] [MPMExplicit] [info] Rank 4, Domain decomposition: 233516 ms
[2020-08-06 04:33:02.919] [MPMExplicit] [info] Step: 0 of 1000000.

[2020-08-06 04:33:06.019] [MPMExplicit] [info] Rank 12, Domain decomposition: 310864 ms
[2020-08-06 04:33:17.998] [MPMExplicit] [info] Rank 21, Domain decomposition: 322841 ms
[2020-08-06 04:35:19.024] [MPMExplicit] [info] Rank 20, Domain decomposition: 443868 ms
[2020-08-06 04:38:58.958] [MPMExplicit] [info] Rank 11, Domain decomposition: 663801 ms
[2020-08-06 04:39:48.217] [MPMExplicit] [info] Rank 16, Domain decomposition: 713062 ms
[2020-08-06 04:41:38.477] [MPMExplicit] [info] Rank 2, Domain decomposition: 823319 ms
[2020-08-06 04:43:07.796] [MPMExplicit] [info] Step: 1 of 1000000.

Thanks @yliang-sn, it looks like the std::bad_alloc happens when creating the cells, since we copy the entire mesh. We will find a better way to handle the meshes.
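
One possible direction, sketched here only as an illustration and not as the fix the project adopted, is for each rank to own a contiguous block of cell indices rather than the whole mesh. The function name `owned_cell_range` is hypothetical, and a real implementation would also need halo/ghost cells and the matching mesh nodes, which this sketch ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Half-open range [first, second) of cell indices owned by `rank` when
// `ncells` cells are split into near-equal contiguous blocks over `nranks`.
// Hypothetical helper for illustration; not part of the mpm code base.
std::pair<std::uint64_t, std::uint64_t> owned_cell_range(std::uint64_t ncells,
                                                         std::uint64_t rank,
                                                         std::uint64_t nranks) {
  assert(nranks > 0 && rank < nranks);
  const std::uint64_t base = ncells / nranks;   // minimum cells per rank
  const std::uint64_t extra = ncells % nranks;  // first `extra` ranks get one more
  const std::uint64_t begin = rank * base + std::min(rank, extra);
  const std::uint64_t count = base + (rank < extra ? 1 : 0);
  return {begin, begin + count};
}
```

With the 4875000 cells of this case split over 64 ranks, each rank would own 76171 or 76172 cells instead of all 4875000, so the per-task cell storage would shrink roughly in proportion to the number of ranks.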