

A short USER-GUIDE for QYMSYM  Version 0.1
==========================================
Content: 0) Introduction
         1) Getting the code
         2) Installing and Compiling
         3) Units for the code
         4) The Parameter file param.in
         5) Input and Output Data
         6) Running qymsym
         7) List of routines and subroutines
         8) Other information

Last Edit: 11/05/2010 by ACQ for cuda 3

++++++++++++++++++
0) Introduction:
---------------

QYMSYM is a GPU accelerated parallel 2nd order hybrid symplectic 
integrator for integration of planetary systems.
The integrator allows close approaches between particles.  The method
is described by Moore & Quillen (2010; see citation and preprint
below) and is similar to the algorithms developed by Chambers (1999) and
Duncan et al. (1998), but is GPU accelerated.
The code works in double precision.
Citations:
Moore, A. & Quillen, A. C. 2010,
    submitted to New Astronomy,
    "QYMSYM: A GPU-accelerated Hybrid Symplectic Integrator that
    permits close encounters",
    The preprint is available at:
    http://arXiv.org/abs/1007.3458
Chambers, J. E. 1999, Monthly Notices of the Royal Astronomical Society, 
   304, 793-799, "A Hybrid Symplectic Integrator that Permits Close 
   Encounters Between Massive Bodies"
Duncan, M. J., Levison, H. F., & Lee, M. H. 1998,
   Astronomical Journal, 116, 2067-2077, "A Multiple Time Step 
   Symplectic Algorithm for Integrating Close Encounters"

The code is suitable for integration of more than 256 massive bodies within
the context of a symplectic planetary system integrator.
If you are only interested in integrating a few bodies, I would recommend
instead Eric Ford et al.'s gpuswarm code.  This code is also not suitable for
integrating N equal-mass bodies (I would recommend a regular
N-body code for that).
You will need a double precision and CUDA capable GPU.

Contact us with questions and comments:
aquillen at pas dot rochester dot edu
and 
alexander dot moore6 at gmail dot com

++++++++++++++++++

1) Getting the Code:
--------------------
The Code is posted at http://astro.pas.rochester.edu/~aquillen/qymsym/

+++++++++++++++++++


2) Installing and Compiling:
---------------------------
Unpack the tarball with 
> tar -xzvf qymsym01.tgz
Everything goes into a directory called qymsym01

go into that directory
>cd qymsym01

Edit the Makefile for your system
  Check the make file to see if cuda, nvcc and cudpp and their includes are 
  in the right places for your system.  Edit it so they are.

Make the executables
>make 

Three executables should have been made: qymsym, mkdisk, and toorb.
qymsym is the integrator
mkdisk makes a disk initial conditions file
toorb turns snap files into orbital elements

Requirements for compiling:  CUDA 3. 
The cudpp library must be available (usually comes with NVIDIA's CUDA_SDK), 
You need a double precision and CUDA capable GPU

The code currently inputs and outputs in ASCII.
You must compile with nvcc with flag -arch sm_13 for double precision. 
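For reference, a hand-compile of one of the CUDA files would look something
like the following (the paths are illustrative placeholders; the Makefile
contains the real include and library locations for your system):

```shell
# Illustrative only: compile one kernel in double precision.
# $CUDA_SDK is a placeholder for wherever your SDK (and cudpp) lives.
nvcc -arch sm_13 -I$CUDA_SDK/common/inc -c kep_kernel.cu -o kep_kernel.o
```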


------------------------
The executables:
 qymsym            the integrator itself
                   uses parameter file param.in, see format below
                   the parameter file currently has to have this name
 mkdisk            makes an isotropic dispersion particle disk 
                   user chooses radial range, power law, 
                   and inclination dispersion
                   parameters for this also given in param.in
                   with additional planets specified by their orbital elements
 toorb             output orbital elements in ascii from output files
                   runs from the command line

++++++++++++++++++++++++++++++

3) Code units:
--------------
  Time is in units such that a planet of semi-major axis a=1 has period 2*pi.
  Mass is given in units of the central star.
  We often divide the time by 2*pi and then plot in units of the initial orbital
  period of the innermost planet.

++++++++++++++++++++++++++++++

4) The Parameter file param.in
------------------------------

 param.in          This parameter file is needed for executables 
                   mkdisk and qymsym  
                   This file must be called param.in right now;
                   its format is described below

There is one line per parameter and the order is important (sorry!).
Each line of the parameter file is shown below marked with > and followed by
some description.  The actual parameter file has no >.
There is an example param.in file with this code.
>a1     char* outroot       in and out file name root
  The first parameter on first line is the root name for data files
  With root a1, the initial conditions file (generated by mkdisk) is called a1.asc
  snapshots will be called a1_0000.asc, a1_0001.asc etc...
  For their format see the next section below.
>1024   int numBodies       total number of particles integrated
  If this is not an integer multiple of the number of threads per block,
  the number of bodies integrated will be somewhat higher
>512    int numBodiesMassive total number of massive particles
  if this is not an integer multiple of the number of threads per block
  then the number of massive bodies will be somewhat higher
>0      device              which GPU device is the code running on
>0.02   double deltaTime    timestep for each symplectic integration step
       units of time are such that 2pi = 1 orbit for a planet
       with semi-major axis a=1,  
       Large timesteps have worse energy conservation
       0.02 seems to be a good choice
>4      int niter           number of iterations and dataouts 
       This is the number of output snapshots
>100    int ndt             number of timesteps per iteration or data output
        ndt*deltaTime is the time between dataouts or output snaps
>1.0    double damping      not used at moment
>1e-10  double soft2_int    interaction step softening squared (eps2)
   Used by the interaction step --- should be very small and is nearly
   irrelevant since when particles get close they are put into the
   Hermite integrator
>1e-10  double soft2_her    Hermite integrator softening^2
   used in Hermite integrator.  Should not be too small (below 1e-10) 
   unless you have added code to deal with planetary collisions
>2.0    double hillfac_int  hillfactor used in integration (K)  (a_H)
       used for K function flipping terms from the interaction Hamiltonian 
       to the Keplerian one.   Code is pretty insensitive to this.
>2.5    double hillfac_col  hillfactor used in encounter detection (a_E)
       Keep in range 1.5-4.  If it is large there will be many encounters.
>4.00   double amin         min semimajor axis for the disk 
       For generating a disk of low mass particles
>4.5    double amax         max semimajor axis for the disk
>0.0    double gamma        powerlaw distn p(a) ~ a^gamma
       Distribution for semi-major axis of generated disk      
>0.01   double ibar         sqrt(<i^2>) also sets sqrt(<e^2>)
       Inclination and eccentricity distribution in generated disk
>1.0e-9 double m            mass for small particles in generated disk
          masses are given in units of the central mass
>0.0    flag1               not yet used
>0.0    flag2               not yet used
>0.0    flag3               not yet used
>0.0    flag4               not yet used
>4      int nplanets        followed by their masses and orbital elements
Here is where you put in really massive objects.
Units of mass: divided by central star.
The format is m, semi-major axis,  eccentricity, inclination, 
longnode, argperi, meananomaly.  Angles are in radians.
>1e-2  1.0  0.2 0.0 0.0 0.1 0.0   #m a e i longnode argperi meananom
>1e-2  1.30 0.2 0.0 0.0 1.0 0.0   #m a e i longnode argperi meananom
>5e-3  3.00 0.2 0.0 0.0 1.5 0.0   #m a e i longnode argperi meananom
>5e-3  3.40 0.2 0.0 0.0 1.5 0.5   #m a e i longnode argperi meananom

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

5) Input and Output Data
------------------------------

Input initial conditions file syntax (generated with executable mkdisk
   and using parameters in param.in)
   If the root name (first parameter in param.in) is a1, the initial 
    conditions file should be called a1.asc
   The first line should start with # and then have an integer followed by 
     a floating point number
   The integer is the number of particles in your integration 
        (including the central star)
   The floating point number is the simulation time 
     It is not in orbital periods; 2*pi = 1 period for a planet at semi-major axis a=1
   For the initial conditions file the initial time is 0.0
   The remaining lines in this initial conditions file are 7 numbers each, 
    one for each particle: mass x y z vx vy vz
   The coordinate system is Cartesian. The center of mass position 
     and momentum are subtracted prior to integration

--------------------------------------------------------
Output snap files:
  If the root name (first parameter in param.in) is a1, the 
  output files are called a1_0000.asc, a1_0001.asc, ....
  They have the same format as the input file
  The difference between a1_0000.asc and a1.asc is the center of mass 
  subtraction, i.e., a1_0000.asc contains the particle positions etc.
  after center of mass subtraction but before any integration 
  A final file is output at the end of the last timestep

-----------------------------------------------
## we have turned off the feature listed below as it slows down the code:
encounter lists and other information are currently put into the file
"coll.txt"
At each time (in periods) encounter lists are printed;
when planet-planet encounters are taking place "PP" is printed.
Each encounter list starts with the number of objects in the list (np)

+++++++++++++++++++++++++++++++++++++++++++++++++++++++


6) Running qymsym
--------------------

Edit the parameter file param.in, then make an initial conditions file
with a few planets and some extra low mass particles:
  >mkdisk
If the root file name in param.in was a1 you should now have a file
called a1.asc, which qymsym will look for.

Running the executable qymsym to do integrations:
  > qymsym      
     with no command line arguments starts the simulation
     the simulation will look for param.in and an initial conditions file
     that should have been generated by mkdisk
     If the root name (first parameter in param.in) is a1, the initial 
     conditions file should be called a1.asc and qymsym will look for this file
  > qymsym 50    
     the command line now has a number
     The simulation will restart using a1_0050.asc as an initial conditions file

qymsym will output energies at every output stage (just like Mercury)
qymsym will also output encounter lists in a file called coll.txt

If you would like orbital elements for data out a1_0003.asc 
   > toorb a1_0003.asc a1_0003.oe
now a1_0003.oe will contain orbital elements for the particles

+++++++++++++++++++

7) List of routines and subroutines
----------------------------------


executables with main():
 nbody.cpp         the integrator itself -> qymsym
 mkdisk.cpp        makes an isotropic dispersion particle disk 
                   parameters for this also given in param.in
                   with additional planets specified by their orbital elements
 toorb.cpp         output orbital elements in ascii from output files
                   runs from the command line

.h files:
 nbody.h           forward declarations of routines
 kepcart.h         not used for integrator but general purpose
 doub.h            double3, double4, accjrk struct or typedefs 
                   possible source of problems as double3 is now defined 
                   in CUDA 3

CUDA kernels:
 nbody_kernel.cu    all force pairs based on L. Nyland's routine (SDK 1.1;
                    also see GPU Gems 3)
                    includes interaction step or kick step of integrator
                    in double precision
 enn_kernel.cu      energy vector all potential energy pairs 
                    use to compute total energy
                    in double precision
 drift_kernel.cu    contains parallel reduction sum needed for energy sum
                    (contains Mark Harris's routine reduce6 from SDK 1.1)
                    does the drift step all particles on GPU
                    in double precision
 kep_kernel.cu      does Keplerian steps using fg functions and universal 
                    variable Kepler's equation for all particles on GPU 
                    in double precision
                    follows recipe by Prussing and Conway
 her_kernel.cu      accelerations and jerks computed on GPU for use with a 
                    Hermite integrator 
                    in double precision
 nn_kernel.cu       find nearest neighbor on GPU
 sweep1_kernel.cu   all pairs search for close encounters on GPU
                    in single precision
 
Other subroutines:
 kepcart.cpp       Cartesian to/from orbital element converter
 sread.cpp         reads the input parameter file; very ugly, no real parsing yet
 coord.cpp         coordinate transformation routines 
                   helio/barycentric to/from Cartesian
 mklist.cpp        make sublists of encounters from a list of interacting pairs
 hermitep.cpp      Hermite integrator for encounters
 colltest.cpp      Calls integration kernels in sequence to do a timestep
 iostuff.cpp       I/O of snap files, originally for the H5Part format, 
                   now in ASCII

 bodysystemcuda.cu calling cuda subroutines for memory allocation
                   and integration kernel routines  
                   (kepstep,interactionstep,driftstep, and energy)
 mkpairlist.cu     calls sweep1_kernel.cu kernel to find pairs of particles 
                   in encounters
 gravher.cu        calls her_kernel on GPU, also allocates mem for it
 scanpass.cu       used to call cudpp routines; called by mkpairlist.cu
                   used in encounter detection

Makefile           very simple bare makefile


++++++++++++++++++++++++

8) Other information:
----------------------

Ways to screw up:
  softenings should not be zero
  soft2_int should not be too small (not less than about 10^-12)
  if hillfac_col is too small (like 0.5) it will hurt energy conservation
  if hillfac_col is too big (above about 4) things screw up
    as there are too many encounters
  the range 1.5-4 gives good energy conservation

-----------------------------------------------------
Things the programmer might want to know about:

in colltest.cpp
#define PPHERM 1  // if 1 put all particles in Hermite integrator 
                  // when there is a planet planet encounter

in nbody.h
#define npmax 10   // maximum number of planets in siminfo struct

Hard set in Hermite integrator hermitep.cpp
#define NPSWITCH 50   // use gpu to do gravity if this many particles or more
#define ETA  0.001  // affects accuracy of Hermite integrator timestep
#define ETAS 0.001  // affects initial timestep choice of Hermite integrator
#define MAXT 1e5  // a large time step
#define MINDT 1.0e-9 // smallest time step

in mkpairlist.cu
#define NTHREADS 128   // hardset,  may want to change this someday or pass

in gravher.cu 
#define NTHREADS 128   // hardset,  may want to change this someday or pass

in nbody.cpp
#define NTHREADS 128  // hardset
                      //  may want to change this someday or pass

Loop unrolling is an issue with the number of threads used for the
interaction step (nbody_kernel.cu)

kep_kernel.cu uses quite a bit of global memory (lack of registers on gpu?)

------------------------------------------------------------------

Things the planetary person might want to know about: 

    Collisions not yet implemented
    Binaries or moons can form 
    Massless particles can run in it but will never collide

--------------------------------------------------------
Copyright notices:
     Two routines from early versions of NVIDIA's CUDA SDK (V1.1) are used 
     in this code (a modified reduce6 and a modified version of nbody).  
     We have made efforts to appropriately cite the authors in the code.
     Citations are also here:  
   Fast N-body Simulation with CUDA,
     Nyland, L., Harris, M., & Prins, J. 2008, Chap. 31
     in GPU Gems 3, edited by Hubert Nguyen,
     2008, Addison-Wesley, Upper Saddle River, NJ, page 677
   Parallel Prefix Sum (Scan) with CUDA,
     Harris, M., Sengupta, S., & Owens, J. D. 2008, Chap. 39
     in GPU Gems 3, edited by Hubert Nguyen,
     2008, Addison-Wesley, Upper Saddle River, NJ, page 851

     This code is provided "as is" without implied warranty of any kind. 
     Users and possessors of this source code 
     are hereby granted a nonexclusive, royalty-free license to use this code.


