.. _halo_finding:

Halo Finding
============
.. sectionauthor:: Stephen Skory <sskory@physics.ucsd.edu>

There are two methods of finding particle haloes in yt. The recommended and default method is HOP,
described in `Eisenstein and Hut (1998) <http://adsabs.harvard.edu/abs/1998ApJ...498..137E>`_.
A basic friends-of-friends (FoF) halo finder (e.g. `Efstathiou et al. (1985) <http://adsabs.harvard.edu/abs/1985ApJS...57..241E>`_)
is also implemented; however, at this time it should be considered experimental.

HOP
---

The version of HOP used in yt is an upgraded version of the `publicly available HOP code 
<http://cmb.as.arizona.edu/~eisenste/hop/hop.html>`_. Support for 64-bit floats and integers has been
added, as well as parallel analysis through spatial decomposition. HOP builds groups in this fashion:

  1. Estimates the local density at each particle using a smoothing kernel.
  2. Builds chains of linked particles by 'hopping' from one particle to its densest neighbor.
     A particle which is its own densest neighbor is the end of the chain.
  3. Groups together all chains that share the same densest particle.
  4. Includes, links together, or discards groups depending on the user-supplied overdensity
     threshold parameter. The default is 160.0.

Please see the `HOP method paper <http://adsabs.harvard.edu/abs/1998ApJ...498..137E>`_ 
for full details.
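
The first three steps above can be illustrated with a short, self-contained sketch. This is not the
HOP code used in yt: the function name ``hop_groups``, its parameters, and the top-hat density
estimate are assumptions made for illustration. Periodic boundaries are ignored, and step 4 (the
threshold-based merging and discarding of groups) is omitted.

.. code-block:: python

  import math

  def hop_groups(points, n_neighbors=2, radius=0.5):
      """Toy version of HOP steps 1-3; returns {chain-end index: [member indices]}."""
      n = len(points)
      dist = [[math.dist(p, q) for q in points] for p in points]

      # Step 1 (simplified): top-hat density, the number of particles within `radius`.
      density = [sum(1 for d in row if d <= radius) for row in dist]

      # Step 2: each particle hops to the densest of itself and its nearest neighbors.
      # Ties are broken by lowest index so that chains can never loop.
      densest = []
      for i in range(n):
          others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
          candidates = [i] + others[:n_neighbors]
          densest.append(max(candidates, key=lambda j: (density[j], -j)))

      # Step 3: follow each chain to its end and group particles by that end point.
      groups = {}
      for i in range(n):
          end = i
          while densest[end] != end:
              end = densest[end]
          groups.setdefault(end, []).append(i)
      return groups

  # Two well-separated clumps of four particles each (hypothetical positions).
  clump = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
  points = clump + [(x + 5.0, y + 5.0) for x, y in clump]
  print(sorted(hop_groups(points).values()))

Each clump ends up as one group because every chain inside it terminates on the same particle.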

Friends-of-Friends
------------------

The version of FoF in yt is based on the `publicly available FoF code <http://www-hpcc.astro.washington.edu/tools/fof.html>`_ from the University of Washington. Like HOP,
FoF supports parallel analysis through spatial decomposition. FoF is much simpler than HOP:

  1. From the total number of particles, and the volume of the region, the average
     inter-particle spacing is calculated.
  2. Pairs of particles closer together than some fraction of the average inter-particle spacing
     (the default is 0.2) are linked together. Particles can be paired with more than one other particle.
  3. The final groups are formed from the networks of particles linked together by friends, hence the name.
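
The steps above can be sketched in a few lines with a union-find structure. This is an illustrative
sketch, not the FoF code used in yt: the function name ``fof_groups`` and its arguments are
assumptions, the pair search is a simple O(N^2) loop, and periodic boundaries are ignored.

.. code-block:: python

  import math

  def fof_groups(points, volume, link_fraction=0.2):
      """Toy friends-of-friends grouping; returns a sorted list of index lists."""
      n = len(points)
      dim = len(points[0])

      # Step 1: mean inter-particle spacing from the number density.
      spacing = (volume / n) ** (1.0 / dim)
      link_length = link_fraction * spacing

      parent = list(range(n))

      def find(i):
          # Walk up to the root of i's group, shortening the path on the way.
          while parent[i] != i:
              parent[i] = parent[parent[i]]
              i = parent[i]
          return i

      # Step 2: link every pair closer than the linking length.
      for i in range(n):
          for j in range(i + 1, n):
              if math.dist(points[i], points[j]) < link_length:
                  parent[find(i)] = find(j)

      # Step 3: particles sharing a root form one group.
      groups = {}
      for i in range(n):
          groups.setdefault(find(i), []).append(i)
      return sorted(groups.values())

  # Hypothetical particles in a unit cube. Particles 0 and 2 are too far apart
  # to be linked directly, but both are friends of particle 1.
  points = [(0.10, 0.1, 0.1), (0.15, 0.1, 0.1), (0.22, 0.1, 0.1),
            (0.80, 0.8, 0.8), (0.85, 0.8, 0.8)]
  print(fof_groups(points, volume=1.0))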

.. warning:: The FoF halo finder in yt is not thoroughly tested! It is probably fine to use, but you
   are strongly encouraged to check your results against the data for errors.

Running HaloFinder
------------------

Running HOP on a dataset is straightforward:

.. code-block:: python

  from yt.mods import *
  pf = load("data0001")
  halo_list = HaloFinder(pf)

Running FoF is similar:

.. code-block:: python

  from yt.mods import *
  pf = load("data0001")
  halo_list = FOFHaloFinder(pf)

Halo Data Access
----------------

``halo_list`` is a list of ``Halo`` class objects ordered by decreasing halo size. A ``Halo`` object
has convenient ways to access halo data. This loop will print the location of the center of mass
for each halo found:

.. code-block:: python

  for halo in halo_list:
      print(halo.center_of_mass())

All the methods are:

  * .center_of_mass() - the center of mass for the halo.
  * .maximum_density() - the maximum density in "HOP" units.
  * .maximum_density_location() - the location of the maximum density particle in the HOP halo.
  * .total_mass() - the mass of the halo in Msol (not Msol/h).
  * .bulk_velocity() - the velocity of the center of mass of the halo in simulation units.
  * .maximum_radius() - the distance from the center of mass to the most distant particle in the halo
    in simulation units.
  * .get_size() - the number of particles in the halo.
  * .get_sphere() - returns an EnzoSphere object using the center of mass and maximum radius.

.. note:: For FoF the maximum density value is meaningless and is set to -1 by default. For FoF
   the maximum density location will be identical to the center of mass location.

The command

.. code-block:: python

  halo_list.write_out("HaloAnalysis.out")

will output the results of HOP or FoF to a text file named ``HaloAnalysis.out``. The file contains
each of the data values listed above except for .get_sphere().

For each halo, the data for the particles in the halo can be accessed like this:

.. code-block:: python

  for halo in halo_list:
      print(halo["particle_index"])
      print(halo["particle_position_x"])  # in simulation units

Parallel Halo Analysis
----------------------

Both the HOP and FoF halo finders can run in parallel using spatial decomposition. To run them
in parallel, it is helpful to understand how the decomposition works.

Below, the first plot (i) is a simplified depiction of three haloes labeled 1, 2 and 3:

.. image:: ParallelHaloFinder.png
   :width: 500

Halo 3 is twice reflected around the periodic boundary conditions.

In (ii), the volume has been sub-divided into four equal subregions, A, B, C and D, shown with
dotted lines. Notice that halo 2 is now in two different subregions, C and D, and that halo 3 is
now in three, A, B and D. If the halo finder is run on these four separate subregions, halo 1 is
identified as a single halo, but haloes 2 and 3 are split up into multiple haloes, which is incorrect.
The solution is to give each subregion padding to oversample into neighboring regions.

In (iii), subregion C has oversampled into the other three regions, with the periodic boundary conditions taken
into account, shown by dot-dashed lines. The other subregions oversample in a similar way.

The halo finder is then run on each padded subregion independently and simultaneously.
By oversampling like this, haloes 2 and 3 will both be enclosed fully in at least one subregion and
identified completely.

Haloes identified with centers of mass inside the padded part of a subregion are thrown out, eliminating
the problem of halo duplication. The centers for the three haloes are shown with stars. Halo 1 will
belong to subregion A, 2 to C and 3 to B.
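
The padding and ownership rules can be sketched as follows. The helper names, the two-dimensional
coordinates chosen for the subregion, and the unit period are assumptions made for illustration;
this is not yt's internal implementation.

.. code-block:: python

  def periodic_offset(x, center, period=1.0):
      """Signed offset of x from center, wrapped into [-period/2, period/2)."""
      return (x - center + 0.5 * period) % period - 0.5 * period

  def in_padded_subregion(pos, left, right, padding, period=1.0):
      """True if pos lies inside the subregion once it is grown by `padding` per side."""
      for x, lo, hi in zip(pos, left, right):
          mid = 0.5 * (lo + hi)
          half_width = 0.5 * (hi - lo) + padding
          if abs(periodic_offset(x, mid, period)) > half_width:
              return False
      return True

  def owns_halo(center_of_mass, left, right):
      """True if the center of mass lies in the unpadded subregion (half-open bounds)."""
      return all(lo <= c < hi for c, lo, hi in zip(center_of_mass, left, right))

  # Hypothetical subregion covering one quarter of a unit square.
  left, right = (0.0, 0.0), (0.5, 0.5)

  # A particle just across the periodic boundary is still picked up by the padding.
  print(in_padded_subregion((0.99, 0.25), left, right, padding=0.02))

  # A halo centered in the padding is seen by this subregion but not kept by it.
  print(in_padded_subregion((0.51, 0.25), left, right, padding=0.02))
  print(owns_halo((0.51, 0.25), left, right))

Using half-open bounds in the ownership test means a center of mass that falls exactly on a
boundary is claimed by only one subregion.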

Parallel HaloFinder padding
^^^^^^^^^^^^^^^^^^^^^^^^^^^

To run with parallel halo finding, there is a slight modification to the script:

.. code-block:: python

  from yt.mods import *
  pf = load("data0001")
  halo_list = HaloFinder(pf, padding=0.02)
  # --or--
  halo_list = FOFHaloFinder(pf, padding=0.02)

The ``padding`` parameter is in simulation units and defaults to 0.02. It sets how much padding
is added to each of the six sides of a subregion. This value should be two to three times the size
of the largest expected halo in the simulation. It is unlikely, of course, that the largest object
in the simulation will be on a subregion boundary, but there is no way of knowing before the halo
finder is run.

In general, a little bit of padding goes a long way. Too much padding only slows down the analysis;
it neither improves nor changes the answer.
It may be worth your time to run the parallel halo finder at a few paddings to
find the right amount, especially if you're analyzing many similar datasets.
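
One way to organize such a comparison is sketched below. The helper ``smallest_stable_padding`` and
the callable ``run_finder`` are hypothetical names, not part of yt. ``run_finder`` stands in for
whatever you use to run the halo finder at a given padding and summarize the result, for example
the number of haloes found. Agreement with the largest padding tried is only a heuristic.

.. code-block:: python

  def smallest_stable_padding(paddings, run_finder):
      """Return the smallest padding that reproduces the result of the largest one tried."""
      ordered = sorted(paddings)
      results = [run_finder(p) for p in ordered]
      reference = results[-1]
      for padding, result in zip(ordered, results):
          if result == reference:
              return padding

  # Stand-in for real runs: made-up halo counts that stop changing at padding 0.02.
  fake_counts = {0.005: 95, 0.01: 95, 0.02: 100, 0.04: 100}
  print(smallest_stable_padding(fake_counts, fake_counts.get))

If only the largest padding matches itself, the result has not yet settled and a larger padding
should be tried.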
