JMC Notes 2010 Aug 26
pulse finding code from ~2003
code is in pulse.match.tar.gz and was set up for
	compiling under linux
A. pulse.match:
	original code from mid-1990s did hierarchical smoothing + thresholding
	1.  find all samples above threshold and report
	2.  add adjacent samples and decimate by 2 
	3.  find all samples above threshold with threshold raised by sqrt(2)
	4.  repeat steps 2 and 3 up to requested number of levels
		(i.e. we often used 8 levels, so the maximum number of
		smoothed samples is 2^{8-1} = 128) 
	
	note that smoothing filters in the different levels have
	coefficients that follow Pascal's triangle:

	1 -> 1 1 -> 1 2 1 -> 1 3 3 1  etc.

	note also code reports _samples_ that are above threshold;
	in general, there can be multiple samples per _event_.
	So the reported values cannot be interpreted as events.

B. Friends of friends (referred to as "sift" in the code):
	It could be called friends of friends or called a "cluster"
		algorithm since it identified clusters of events
		that are part of the same event.
	This was written by me in 2003 to address the samples/event
	issue of A and used for the Crab GP analysis in
	Cordes et al. 2004, ApJ

	The algorithm works like this:
	1. Set a fairly low threshold, like 3 sigma.
	2. Find all samples that are above this threshold.
	   For these samples, find those that are contiguous
	   or separated by no more than ncluster samples from each
	   other  (i.e. gaps are allowed between samples up to
	   ncluster in length).  The code currently has ncluster=2
	   hardwired. This can/should be made an input variable.   

	   For all samples that are so identified as part of the same
           burst, find sum of all the samples and the mean sample number
	   of all the samples (weighted by amplitude).   Also find the
	   maximum amplitude of all the samples in the cluster and
	   find an effective width (in samples) that is the sum of all
	   samples divided by the maximum amplitude. 
	   These are reported along with the local off-pulse mean
	   and rms (these are calculated iteratively by excluding
	   all samples that are above threshold). 

	   All of this happens in zwrpul_sift.c.
	   The write statement to the output file (XXXX.sift.dat,
	   renamed in some of my analysis to pulse.list.YYYY) is:

	   fprintf(fd,"%2d %2d %2d %9d %8.2f %8.2f %8.2f %8.2f %5d %8.3f %8.3f\n",
                 ndm, nsubband,  ns, iimax,
                 amax,  pulse[imax].mean, pulse[imax].rms,
                 suma, nsum, wgp, tsum-iimax);

	   where 
		ndm = number of dispersion channel (e.g. if we were
		      doing lots of trial DMs in a search;  for the Crab
		      I used two DMs: zero and the Crab's DM)
		nsubband = subband number if we process separately
		ns = level of smoothing (if we were doing pulse.match;
	             this format is an extension of what was used for
		     pulse.match so for sift.dat ns=0 (no smoothing).
		iimax = sample number in file of maximum in cluster
		amax = maximum amplitude of cluster in sigma units
			(i.e. this is S/N = peak / off-pulse-rms)
		pulse[imax].mean = local off-pulse mean (we analyzed
			the time series in blocks and this is the block mean)
		pulse[imax].rms = rms in block
		suma = sum of all samples in cluster
		nsum = number of samples in cluster
		wgp = suma/amax = effective width of cluster in samples
		tsum-iimax = mean pulse time minus iimax (i.e. an asymmetric
			pulse would have a mean different from iimax)

	The fof algorithm worked very well on the Crab by finding
	just single clusters associated with any given giant pulse.
	This might be different with high-resolution data where the
	emission in a given pulse might comprise multiple peaks.
	How multiple peaks are reported will depend on ncluster,
	so there is some control.  You could also put a cap
	on wgp that any GP is allowed to have, though this 
	wouldn't be bullet proof.

Examples: 
	In pulse.list.B0531+21.52304.023 there is an event record:

	1  0  0    386118 11030.49   940.73    14.14 29969.84    77    2.717   -0.156
	This was the largest event we saw in one hour of 430 MHz data
	at Arecibo.  It has S/N = 11030.49.   There is a plot of
	this pulse in Cordes et al. 2004