Submitting with justin

This repo holds a justin jobscript and a Python script to make metadata for H2 and H4 VLE beam line simulations. The simulation output files are then used as input to the LArSoft-based ProtoDUNE-SP/HD/VD event generators.

The actual simulation uses G4beamline and configurations created by the CERN beam expert in charge of the two beam lines. The G4beamline binaries, the configurations, and the associated cross-section data can be found in /exp/dune/data/users/calcuttj/G4Beamline/H4_G4BL_DUNE/Jake_testing/ as the following tarballs, respectively:

  g4bl.tar.gz 
  pack.tar.gz
  Geant4Data.tar.gz

For now, these must be uploaded to the cvmfs-based Rapid Code Distribution Service (RCDS; see the documentation on using RCDS on the grid with jobsub) for use in justin jobs. In the future, they will be distributed on cvmfs for ease of use.

Submitting with justin

Preparation

Note: you must use an SL7 container in order to use ups to set up justin.

Set up ups etc: source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh

Set up justin: setup justin

Make sure your token is active: htgettoken -a htvaultprod.fnal.gov -i dune

Make a tarball of the metadata script and upload it to RCDS: tar -cf input.tar make_g4bl_metadata.py; input_dir=$(justin-cvmfs-upload input.tar)

Upload each of the g4bl-related tarballs:

jake_dir=/exp/dune/data/users/calcuttj/G4Beamline/H4_G4BL_DUNE/Jake_testing/
g4bl_dir=$(justin-cvmfs-upload $jake_dir/g4bl.tar.gz)
pack_dir=$(justin-cvmfs-upload $jake_dir/pack.tar.gz)
g4data_dir=$(justin-cvmfs-upload $jake_dir/Geant4Data.tar.gz)

Jobscript arguments

Several arguments can be provided to the justin jobscript by setting environment variables. When running an interactive test or submitting via justin, these are passed to the job's environment using the following syntax: --env ENVVAR=VALUE

Full examples of this syntax are shown below. The following table lists all of the environment variables that can be configured for the job. The last 4 entries correspond to the tarballs which have been uploaded and unpacked to cvmfs; for now, these must be provided. The rest of the variables have defaults, so they don't have to be provided.

Name	Description	Default
POLARITY	Polarity of beam line ("+" or "-")	+
BEAMLINE	Which beam line to use ("H4" or "H2")	H4
CENTRALP	Momentum setting (in GeV/c) of VLE -- i.e. entering NP02/4	1.0
NPART	Number of particles to send at the target (at least 100)	100
G4DATA_DIR	Location of unpacked G4Data tarball	--
G4BL_DIR	Location of unpacked g4bl tarball	--
PACK_DIR	Location of unpacked "pack" tarball	--
INPUT_DIR	Location of unpacked input tarball (contains make_g4bl_metadata.py)	--
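
As a rough illustration of how these variables map onto the --env ENVVAR=VALUE syntax, the Python sketch below assembles the flags from a dictionary and raises an error if one of the four tarball locations is missing. The helper name build_env_args and the placeholder paths are hypothetical and are not part of this repo; the jobscript itself applies the defaults from the table when a variable is not passed.

```python
# Hypothetical helper (not part of this repo): turn a settings dict into
# the "--env NAME=VALUE" arguments described above.

# The four variables that have no default in the table.
REQUIRED = ("G4DATA_DIR", "G4BL_DIR", "PACK_DIR", "INPUT_DIR")
# Variables with defaults in the jobscript; these may be omitted.
OPTIONAL = ("POLARITY", "BEAMLINE", "CENTRALP", "NPART")


def build_env_args(settings):
    """Return a flat argument list like ['--env', 'BEAMLINE=H2', ...]."""
    missing = [name for name in REQUIRED if not settings.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    unknown = [name for name in settings if name not in REQUIRED + OPTIONAL]
    if unknown:
        raise ValueError("unknown variables: " + ", ".join(unknown))
    args = []
    for name, value in settings.items():
        args += ["--env", f"{name}={value}"]
    return args


# Placeholder paths; in practice these come from justin-cvmfs-upload.
example = {
    "G4DATA_DIR": "/placeholder/g4data",
    "G4BL_DIR": "/placeholder/g4bl",
    "PACK_DIR": "/placeholder/pack",
    "INPUT_DIR": "/placeholder/input",
    "BEAMLINE": "H2",
    "NPART": 1000,
}
print(" ".join(build_env_args(example)))
```

Checking for the required variables before submission is only a convenience: a job launched without them would otherwise fail on the worker node rather than at the command line.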

Example -- Interactive Test

The following runs an interactive test of the jobscript, sending 1000 particles at the H2 target using positive polarity settings and a central momentum of 1.0 GeV/c (the defaults for POLARITY and CENTRALP):

   justin-test-jobscript --jobscript g4beamline_justin.jobscript --monte-carlo 1 \
                         --env G4DATA_DIR=$g4data_dir/ --env G4BL_DIR=$g4bl_dir/ \
                         --env PACK_DIR=$pack_dir/ --env INPUT_DIR=$input_dir/  \
                         --env BEAMLINE=H2 --env NPART=1000

This took about 2 minutes to run during a single test on one of the dunegpvms. When the test finishes, justin shows the temporary working directory where the output files can be found. Note that the actual directory will be different every time, as a random hash is used to create distinct locations.

  ====End of jobscript execution====
/tmp/justin-test-jobscript.0mv8dl/home/workspace:
total 472
-rw------- 1 calcuttj dune 176696 Mar  6 16:38 g4bloutput.txt
-rw------- 1 calcuttj dune 283635 Mar  6 16:38 H2_v27c_1GeV_1_20250306T223828Z_000001.root
-rw------- 1 calcuttj dune    534 Mar  6 16:38 H2_v27c_1GeV_1_20250306T223828Z_000001.root.json
-rw------- 1 calcuttj dune      7 Mar  6 16:38 justin-processed-pfns.txt

These temp directories are periodically cleaned up, so if you produce anything you'd like to continue using for testing, copy the output to your data directory /exp/dune/data/users/${USER}/

Example -- Full Submission

The above command can be easily translated into a workflow submission by adding a few more arguments, i.e. --rss-mib, --max-distance, --output-pattern, and --wall-seconds. Consult the justin tutorial to learn what these mean and which other options are available.

justin simple-workflow --jobscript g4beamline_justin.jobscript --monte-carlo 100 \
                       --env G4DATA_DIR=$g4data_dir/ --env G4BL_DIR=$g4bl_dir/ \
                       --env PACK_DIR=$pack_dir/ --env INPUT_DIR=$input_dir/  \
                       --env BEAMLINE=H2 --env NPART=100 --rss-mib 3999 \
                       --max-distance 30 \
                       --output-pattern "*root:https://fndcadoor.fnal.gov:2880/dune/scratch/users/calcuttj/g4beamline_prod/H2_test_full/" \
                       --wall-seconds 3600

Merging Files Produced in Justin

The tape storage used by DUNE works best with files on the order of 1-10 GB. A file produced by this G4BL simulation with 100K POT is on the order of 30 MB, so the files need to be merged. Included in this repo are a Python merging script and a justin jobscript to facilitate this. The Python script also contains routines, intended to be used interactively, which check that no inputs are duplicated in the merging (as of writing, this has not yet occurred, but it is still important to check).

To submit a merging workflow request, run the following:

  tar -cf merge.tar merge_g4bl.py #Make sure merge.tar is up to date
  merge_dir=`justin-cvmfs-upload merge.tar` #Upload to CVMFS
  justin simple-workflow --jobscript g4bl_merge.jobscript --env MERGE_DIR=$merge_dir \
                         --env DATASET={prior dataset} \
                         --monte-carlo {N Outputs} --env LIMIT={N Inputs} --max-distance 30 --rss-mib 3999 \
                         --output-pattern "*root:{output dataset}" \
                         --scope {output scope} --lifetime-days=90

The necessary (justin) arguments and environment variables are described below:

Name	Type	Description
DATASET	Env. Var	Input dataset for merging
MERGE_DIR	Env. Var	Similar to above, the uploaded merge.tar
LIMIT	Env. Var	Number of files merged per output file
--monte-carlo	Argument	Number of output files
--output-pattern	Argument	Expression for matching output files & the target output dataset
--scope	Argument	Target output scope -- Use "usertests" for testing. For production, use ehn1-beam-np04 for the H4-VLE Beamline (PDHD/NP04) and ehn1-beam-np02 for the H2-VLE Beamline (PDVD/NP02) -- CHECK WITH STEVE TIMM THAT THIS IS AVAILABLE

For 30 MB initial output files, we need to merge 100 files together to reach 3 GB (a suitable size for storage). If the simulation step produced 1000 outputs, we would need 10 merged files to accomplish this (LIMIT = 100, --monte-carlo = 10).
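
The same arithmetic can be written as a small Python sketch, which may help when the input file size or the number of simulation outputs differs from this example. The function name plan_merge and its parameters are hypothetical and not part of this repo; sizes are in MB, with 3 GB taken as 3000 MB as in the text above.

```python
import math


def plan_merge(n_inputs, input_size_mb, target_size_mb):
    """Return (limit, n_outputs) for a merging workflow.

    limit     -> value for the LIMIT environment variable
                 (number of input files merged per output file)
    n_outputs -> value for --monte-carlo (number of output files)
    """
    # Number of inputs needed to reach the target size, at least one.
    limit = max(1, round(target_size_mb / input_size_mb))
    # Round up so that leftover inputs still get an output file.
    n_outputs = math.ceil(n_inputs / limit)
    return limit, n_outputs


# The case described above: 1000 outputs of 30 MB merged into 3 GB files.
print(plan_merge(1000, 30, 3000))  # -> (100, 10)
```

When the number of inputs is not a multiple of LIMIT, rounding up means the last merged file would be smaller than the others; whether that is acceptable is a judgment call for the person submitting the workflow.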

Checking Merged Files

The merged files have the inputs to the merging step listed as their parents in metacat. This can be used to check that merged files do not include duplicate inputs. Check this using:

  python merge_g4bl.py check_parents --dataset {merged dataset}

This will print out the number of input files and the number of unique input files. If these differ, that is an indication of duplicated inputs, and the issue causing this should be diagnosed. As of writing, this has not happened, but it is worth checking.
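
The comparison behind this check can be sketched in a few lines of Python. This is not the implementation in merge_g4bl.py and it does not query metacat; it assumes the parent lists have already been retrieved into a dictionary, and the function name and file names are made up for illustration.

```python
from collections import Counter


def count_parents(parents_by_file):
    """Compare total and unique parent counts across merged files.

    parents_by_file maps a merged file name to the list of its parent
    (input) file names. Returns (n_total, n_unique, duplicates).
    """
    counts = Counter(
        parent
        for parents in parents_by_file.values()
        for parent in parents
    )
    n_total = sum(counts.values())
    n_unique = len(counts)
    # Any parent seen more than once was merged twice.
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    return n_total, n_unique, duplicates


# Made-up example in which input_2.root appears in two merged files.
example = {
    "merged_0.root": ["input_0.root", "input_1.root", "input_2.root"],
    "merged_1.root": ["input_2.root", "input_3.root"],
}
n_total, n_unique, duplicates = count_parents(example)
print(n_total, n_unique, duplicates)  # -> 5 4 ['input_2.root']
```

Listing the duplicated names, rather than only the two counts, gives a starting point for diagnosing which merge jobs overlapped.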
