MOOSE redesign for messaging.
Goals:
1. Cleanup.
2. Handle multithreading and MPI from the ground up.

Why do this?
It is a huge amount of work to refactor a large existing code base. This is
needed here, and was in fact anticipated, for two reasons:
- We needed the experience of building a fairly complete, functioning system
	to know what the underlying API must do.
- The original parallel stuff was a hack.

This redesign does a lot of things differently from the earlier MOOSE messaging.
- Introduces a buffer-based data transfer mechanism, with a fill/empty
	cycle to replace the earlier function-call mechanism. The fill/empty
	cycle is needed for multithreading and also works better with multinode
	data traffic.
- All Elements are now assumed to be array Elements, so indexing is built into
	messaging from the ground up.
- There is a split between synchronous messaging (exact amount of data
	transferred on every timestep) and asynchronous messaging (variable
	amounts of data transferred). This split is at the data transfer
	level but the message specification should be nearly the same.
- Field access function specification is cleaner.
- Separation of function specification from messaging. This means that any
	message can be used as a communication line for any function call 
	between two Elements. This gives an enormous simplification to message
	design. However, it entails:
- Runtime type-checking of message data, hopefully in a very efficient way.
	As message setup is itself runtime, and arbitrary functions can sit
	on the message channels, it turns out to be very hard to
	do complete checking at compile or setup time.
- Wildcard info merged into messages.
- Three-tier message manipulation hierarchy, for better introspection and
	relating to higher-level setup calls. These are
	Msg: Lowest level, manages info between two Elements e1 and e2. 
		Deals with the index-level connectivity for the Element arrays.
	Conn: Mid level. Manages a set of Msgs that together make a messaging
		unit that takes a function call/data from source to a set of
		destination Elements and their array entries. Has introspection
		info. Is a MOOSE field.
	Map: High level. Manages a set of Conns that together handle a
		conceptual group. Equivalent to an anatomical projection from
		one set of neurons to another in the brain. Equivalent to what
		the 'createmap' function would generate. Has introspection.
		Is a MOOSE Element.

- Field access and function calls now go through the messaging interface. Not
	necessarily the fastest way to do it, but simplifies life in a 
	multinode/multithreaded system, and reduces the complexity of the
	overall interface.


------------------------------------------------------------------------------
Control flow.

Developer design:
Every object being scheduled has to provide a 'process' and a 'reinit' function,
which are set up as a shared message, as per the following interface:

static DestFinfo process( 
	"process", "description", new ProcOpFunc< Class>( &Class::process ) );
static DestFinfo reinit( 
	"reinit", "description", new ProcOpFunc< Class>( &Class::reinit ) );
static Finfo* procShared[] = { &process, &reinit };
static SharedFinfo( "proc", "description", procShared, 
	sizeof( procShared ) / sizeof( const Finfo* ) );

These functions are called specially by the scheduler, and bypass the regular
message queuing system.

During Reinit, the initial conditions are set and state variables are sent out
(see below). Reinit happens just at the start of the calculation.
During Process, first all pending messages are cleared, then internal updates
happen and data is sent out. Process is repeated once each timestep for each
Element.
Process even within the same timestep, can be strictly ordered by clock tick.
In other words, the messages are handled and the internal updates are done
within one clock tick, before the next is begun.

Supposing you have two classes A and B. Class A has the state variables of
the calculation, and class B computes rates of change and sends these back.
We need to put these on separate clock ticks. When running, this
is the sequence your classes should follow:

REINIT:
A reinits and sends out data, typically the state variables.

PROCESS:
Stage 0:
        B handle msgs
        B send update
Stage 1:
        A handle msgs
        A send update.



1. Scheduling.
Shell::start( double runtime )
	Sets up thread stuff
	Configures clocks for barriers, threads, etc.
	Inits threads with Clock::threadStartFunc
	Clock::threadStartFunc
		Clock::tStart
			Sets up grouping for current thread
			TickPtr::advance
				Tick::advance on all ticks.
					pthread_barrier_wait
					On first thread only: Qinfo::mergeQ
						Puts local data into InQ
					pthread_barrier_wait
					Qinfo::readQ: handle msgs in InQ
					Call process on all scheduled Elements.
				pthread_barrier_wait
			sorts TickPtrs.
	pthread_exit
join threads
clean up


As before, we have a hierarchy of Shell, Clock, and Tick.
The Tick now becomes an ArrayField of the Clock. This means that the clock
can manipulate Ticks directly, which is much easier to understand. However,
Ticks still look like Elements to the rest of the system and have their
own messages to all scheduled objects.
The TickPtr is an intermediate wrapper for Ticks that helps to sort them
efficiently.
The Tick object is the one that connects to and calls operations on target
Elements.
Ticks call the 'process' function on Elements. This bypasses the queueing
system, but uses the regular field definition system. There is a somewhat 
specialised OpFunc: ProcOpFunc, which handes type conversions for calls 
emerging from Ticks. This is the sequence:
Tick::advance
Iterate over all outgoing Process messages.
	call Msg::process( procinfo, fid )
		call e2->process( procinfo, fid );
			call dataHandler->process( p, elm, fid )
				Get the OpFunc corresponding to fid.
				cast to ProcOpFuncBase, check.
				Scan over the elm array indices
				suitable for current thread.
					call opFunc->proc( obj, eref, p)
						Typecasts obj
						calls obj->process( eref, p )
						

Ticks also call the global Qinfo::readQ() function. This handles all
asynchronous message passing including spike events and setup calls.
The old reinit/reset/resched is gone, it will be a function call triggered 
through clearQ.

2. ASYNCHRONOUS MESSAGING
Async messages work at two levels. First, the developer sees the creation and
calls to messages within MOOSE objects. Second, the data transfer happens
under the surface through queues that span threads and nodes.

2.1 Data structures used for messages within MOOSE objects.
Messages are called through the static const instances of SrcFinfos.
As described elsewhere, SrcFinfos are initialized along with other Element
fields at static initialization time.
SrcFinfo< T1, T2...>( name, description, slot );
	The slot here is a BindIndex, which looks up which Message and Function
	will be called by SrcFinfo::send.
	Slot 0 is reserved for parent->child messages
	Slot 1 is reserved for element->msgSpec messages
	Slots 2 onward are used for data-transfer messages.
	For now each data-transfer slot is predefined.
	Later, if there is a proliferation of costly slot definitions, it
	should be possible to predefine slots only for those SrcFinfos which
	need to be executed efficiently. Other slots will have to generate an
	entry on the fly.

2.1.1 Element-level messaging data structures.
The Element carries two messaging data structures:
	vector< MsgId > m_; // Handles incoming messages
	vector< vector< MsgFuncBinding > > msgBinding_; // Handles outgoing msgs

The first of these is just to keep track of what comes in to the Element.
The msgBinding_ is indexed by the slot described above, which is of type
BindIndex. Each slot refers to a vector of MsgFuncBinding objects, which are
basically pairs of MsgIds and FuncIds. 
When send executes, the code iterates through this vector of MsgFuncBinding
objects, adding each MsgId/FuncId pair into the outgoing queue.

2.1.2 Shared messages and messages with many distinct target classes/functions
There are often cases where a pair of objects must exchange multiple kinds
of information. When this happens, we want to use a single Msg between the
two, to keep track of the trajectory of the data, but we want to send
many kinds of data along this Msg. 
	Example 1: when getting a field, the querying object needs to tell 
	the responder what information to send, and the responder must send
	back the field value. This is a single Msg, but different kinds of
	data travel in different directions along it in this case.  
	To accomplish this, the querying object sets up an entry in its
	msgBinding_ vector. The responding object in this special case
	just puts the other end of the Msg in its m_ vector.
	- Execution works as follows: Querying object calls send< FuncId >
	where the FuncId identifies the 'get_field' function on the responder.
	The get_field function is registered automagically by the Field< T >
	field definition as a GetOpFunc. The operation done by this function
	is to get the field value, and send it right back along the querying
	Msg using somewhat low-level call to Qinfo::addSpecificTargetToQ.
	Example 2. A molecule might send a display object both its xyz
	coordinates, and also its state. It might do so at different times,
	called by different SrcFinfo::send functions.
	Here we have more than one kind of data go from the source
	to the target object.
	- To accomplish this, the two SrcFinfos have distinct slot values,
	pointing to distinct msgBinding_ entries. The MsgFuncBindings point
	to the same MsgId but distinct FuncIds for the respective SrcFinfos. 
	Example 3. A molecule sends its conc to a reaction, and the reaction
	sends back the change in conc on each timestep.
	- This is really a sync message problem. If done using async calls,
	the SrcFinfos on either side of the Msg would point to their own
	msg slots, each referring to a BindIndex entry. We don't really worry
	about what the m_ vector does, it isn't too costly for this to
	maintain the Msg at one or both sides of the msg, but it isn't
	necessary either. Since the Shared Msg setup will use a single call,
	it is not hard to do it efficiently. Since the high-level details of
	the Msg are stored in the MsgSpec, we don't worry too much about
	traversal either.

2.2 Calls to messages within MOOSE objects.
To send a message, one just calls
	srcFinfoInstance->send( Eref current, ProcInfo p, arg1, arg2 ... )


Scheduling:
clearQ: Manages the event queue for the async messaging. Here are the calls:

Tick::advance(): // Calls Qinfo::clearQ on the global event queues.
Qinfo::clearQ: marches through the queue, each entry of which identifies the
	operative message and hence target Element. Also identifies function
	to call, and the arguments.
Msg::exec: specialized for each Msg subclass. Calls func on all target entries.

Still to do: Sort out flow control to interface threads such as graphics
and console. Presumably these would go through other Jobs. There is some
trickiness in how they fill in queues for the regular objects to clear.

Functioning:
This is how asynchronous messaging works.
slot s = Slot1< double >(Conn#, Init<TargetType>Cinfo()->funcInfo("funcName"));
	This slot constructor does an RTTI check on the func type, and if OK,
	loads in the funcIndex. If bad, emits warning and stores a safety
	function.
	Limitation: handling multiple types of targets, whose Functions are
	not just inherited variations. e.g., conc -> plot and xview.
	To get round it: these are simply distinct Connections.
s->send( Eref e, double arg ); // Uses s to send the data in a typesafe way.
	Converts the funcId and argument into a char buffer.
void Eref::asend( Slot s, char* arg, unsigned int size); 
	//Looks up Conn from Slot, passes it the buffer with funcId and args.
void Conn::asend( funcIndex, char* arg, unsigned int size); 
	// Goes through all Msgs with buffer. 
void Element::addToQ( FuncId funcId, MsgId msgId, const char* arg, unsigned int size ); 
	Puts MsgId, funcId and data on queue. May select Q based on thread.
	Multi-node Element needs a table to look up for MsgId, pointing to 
	off-node targets for that specific MsgId. At this time it also inserts 
	data transfer requests to those targets into the postmaster buffers.
	Looks up opFunc, which is also an opportunity to do type checking in 
	case lookup shows a problem.
OpFunc Element::getOpFunc( funcId ); // asks Cinfo for opFunc.
OpFunc cinfo::getOpFunc( funcId );
<Need to clear up arguments of functions above>
At this point all the info is sitting in the asyncQueue of the target Element.
Eventually the scheduling gets round to calling clearQ as discussed above.
<Need to put in handling of synaptic input>


SYNCHRONOUS MESSAGING
Scheduling:
process: Manages synchronous messaging, which is similar to the traditional
	GENESIS messaging in that objects transfer fixed amounts of data every
	timestep. Here are the calls:
Tick::process(); // Calls Process on targets specified by Conn.
Conn::process( const ProcInfo* p ); // Calls Process on all Msgs.
Msg::process( const ProcInfo* p ); // Calls Process on e1.
Element::process( const ProcInfo* p );// Calls Process on all Data objects.
	Partitions among threads.
virtual void Data::process( const ProcInfo* p, Eref e ); 
	Asks e to look up or sum the incoming data buffers, does computation,
	sends out other messages.  This involves dumping data directly into 
	the target sync buffers. Note that the memory locations are fixed 
	ahead of time and are distinct, so this can be done simultaneously 
	from multiple threads.

Functioning:

Here are the calls that the Elements/Data use to send out data:

<Here Slot means an index to the hard-coded location for the buffer, different
from the new Slot class above. Will need to clear up nomenclature.>
void ssend1< double >( Eref e, Slot s, double arg ); // Calls Eref::send.
Eref::send1( Slot slot, double arg); // Calls Element::send
Element::send1( Slot slot, unsigned int eIndex, double arg ); 
	Puts data into buffer: sendBuf_[ slot + i * numSendSlots_ ] = arg;
Note that we assume all sync data transfer is doubles. Probably a good
assumption, may help with data alignment.

Here are some of the calls that Elements use to examine the dumped data:
double Eref::oneBuf( Slot slot ); 
	Looks up value in sync buffer at slot:
	return *procBuf_[ procBufRange_[ slot + eIndex * numRecvSlots_ ] ];
double Eref::sumBuf( Slot slot ); 
	Sums up series of values in sync buffer at slot. Uses similar operation
	except that it iterates through the offset variable:
		offset = procBufRange_.begin() + slot + i * numRecvSlots_;
double Eref::prdBuf( Slot slot, double v ); 
	multiples v into series of values in sync buffer at slot, similar
	fast iteration through buffer using offset.

< To clarify: How to set up these buffers when defining messages >


SPORADIC FUNCTION CALLS
Case 1: Existing message between Elements, using all targets:
Slot* s = Slot1< double >(...)
s->send( Eref e, [args] )
Case 2: Message does not exist
bool set< double >( Id tgt, FieldId f, double val );
bool set< double >( Id tgt, const string& fieldname, double val );
Both of these functions create a temporary object and add a temporary message.
<More details to come>
Case 3: Message exists, but want to send to a single target:
s->sendTo( Eref me, Id tgt, [args] );

=============================================================================
FIELD ACCESS. Updated 18 Apr 2010.
Field and function access is routed through the Shell object, /root. 
The Shell is present on all nodes. 
For the coder, the field access functions have a templated front-end described
	below. This is more efficient than the string interface functions.
For the parser talking to the Shell, the field/func access functions go through
	string-ified arguments void doSet( Id, DataId, field, args) and
	string doGet( Id, DataId, field )

These functions are mediated by the SetGet<N>< Type > class templates. 
N can be any number of arguments, from 0 to 5.
Each provides one consistent function:
SetGet1< double >::set( const Eref& dest, const string& field, A arg )

For N = 1 we also have the following functions.
SetGet1< double >::strSet( const Eref& dest, const string& field, 
	const string& val )
string SetGet1< double >::strGet( const Eref& dest, const string& field )
SetGet1< double >::setVec( const Eref& dest, const string& field, 
	const vector< A >& arg )
We would like to also make a function:
SetGet1< double >::getVec( const Eref& dest, const string& field, 
	vector< A >& arg )

In addition, the Field< Type > template is derived from SetGet1< A > and
is designed to interface with the automatically generated functions for
field access from ValueFinfo: set_<fieldname> and get_<fieldname>
It provides type-specific functions:
bool Field< double >::set( const Eref& dest, const string& field, double arg )
double Field< double >::get( const Eref& dest, const string& field )


Inner functioning.
Set works as follows:

Func		Src		Dest			Args
Master::innerDispatchSet
		requestSet	Worker: handleSet	Id, DataId, FuncId, 
							Prepacked Buffer
Worker::handleSet
		lowLevelSet	Target::set_field	None: hacks in the data.
		ack		Master::handleAck	

				-------------
Get works as follows.

Func		Src		Dest			Args
Master::innerDispatchGet
		requestGet	Worker::handleGet	Id, DataId, FuncId

Worker::handleGet on non_tgt node
		ack		Master::handleAck
Worker::handleGet on tgt node
		lowLevelGet	Target::get_field::	Eref, buf
				GetOpFunc

Target::get_field::GetOpFunc::fieldOp
		Hacked func 	Worker::recvGet		node, status, 
		to add directly from RetFunc		PrepackedBuffer
		into Q
		
Worker::recvGet
		Puts the data into the buffer
		calls handleAck to complete the operation.

Field access is set up using ValueFinfo and its variants.
ValueFinfo: uses 
	setFunc of form  void ( T::*setFunc )( F )
	getFunc of form  F ( T::*getFunc )() const
to access fields. Wraps these functions into OpFunc1 and GetOpFunc. Generates
functions named "set_" + fieldname and "get_" + fieldname for the Finfo array.

ReadOnlyValueFinfo: Similar, only doesn't use the setFunc.


Internals for SetGet access to fields.
Sets:
1. DestFinfos provide templated functions with different types and # of args.
	SetGet< ArgTypes >::set( const Eref& dest, const string& field, Args)
	This munges the arguments into a char* array.
2. Shell::innerSet( Eref, funcId, val, size ) (taking over SetGet::iSetInner)
	This creates a message to the target Eref:Finfo, and passes the data.
	This message is predefined
		- Not deleted/recreated for different targets, just changes
			the target
		- Avoid running up the MsgIds for this Msg. It is always the
			same on all nodes.
		- Avoid having different nodes end up with different MsgIds
			due to creation on local node.
	2.1 When running on multiple nodes, this first sends out the data
	to all workers and the owner of the target object has to deal with it.
	2.2 The owner shell then passes back an ack to the master node.
3. The target function sends back an ack to the calling Shell? Need this to 
	ensure serial operation. But it will mess up Destfinfos.

Gets:
1. ValueFinfos provide the proper combination of a handler and a return
	function that sends the return value back.
	Field< Type > currently handles the return op, not sure why.
	Type Field< Type >::get just passes data on to the Shell::innerGet
2. Shell::innerGet( Eref, DataId, requestFid, handleFid, field )
	replaces SetGet::iGet in doing field validation, setting up msgs,
	and sending request. Now it polls till the return comes in.
3. Shell::handleGet deals with the return value, just bunging it into a buffer
	on the Shell object. We don't know yet what the type is.
4. Shell::innerGet again, having waited till the data came home. Passes
	data back to the Field< Type >::get.
5. Field< Type >::get now has a buffer and knows how to deal with it, converts
	back to the desired type and returns.
Note that we can probably replace all of the Field stuff with SetGet1< Type >.

We have a small problem here when handling array fields that also need to 
refer to the parent object. For example, when dt values on clock ticks are
modified the parent clock needs to do some re-sorting.

A related problems is to set up zombies. In all these cases, things would be
easier if we always had access to the full DataId as one of the function
arguments, as in UpFunc. This is untidy for regular function calls.
For that matter we might want to use yet more information arguments, such
as the target Element and the Qinfo, as in Epfuncs.

------------------------------------------------------------------------------
Setup
FIELD DEFINITION
Similar to earlier MOOSE, have an initClassInfo function for each class,
that is meant to be called in order at static initialization. The ordering
is ensured by having the static initializers call the static initializers
of their own base classes. 

With initClassInfo, set up a static array of finfos.  Entries like:
new Finfo( async1< Reac, double, &Reac::setKf >, "Kf" )
Function handling is cleaner in at least three ways:
- It does not require static typecasting of functions, 
- It does static typecasting of the char buffer within a templated function
- It uses member functions of the Data class directly, rather than static
	functions.

Note that a single Finfo might refer to more than one OpFunc. For example,
a ValueFinfo will typically define a set func and a get func.
A LookupFinfo will define a set func with index, a get func with index, a
	get func to return size of table, and a get func to return the whole
	table.

<Still need to fully define how the Finfos control runtime message setup>

ValueFinfo: Stores functions for set and get operations on a specific field.
LookupFinfo: Stores functions for set with index, get with index, a
	get to return size of table, and a get to return the whole table.
DestFinfo: Stores a single function.
SrcFinfo: Defines origin of a message. Manages the send and sendTo operations.
	Typed. Provides index for FuncId lookup.
Note that the Finfos do NOT map one-to-one to the funcs.

The Finfo may have multiple functions in it. Each function is defined using
an Ftype that provides 3 functions: 
constructor: Loads in the function of the form &Reac::setKf.
checkSlot( const Slot* s): Does RTTI check on the slot to ensure compatibility.
op( Eref e, const char* buf): converts the data entry and buffer and calls func.

MESSAGE SETUP
bool add( Element* src, const string& srcField,
	Element* dest, const string& destField );
	Decides if fields are for simple or shared messages.
Case 1: Simple Msg:
bool addMsgToFunc( Element* src, const Finfo* finfo, Element* dest, FuncId fid )
	Makes a new Msg, with Src and Dest Elements. 
Msg::Msg( src, dest ): constructor registers the new Msg on the src and dest:
	Element::addMsg( m )
	This part needs thought, to decide what kind of Msg to make. 
Conn::add( m): To put the Msg on the Conn.
src->addConn : To put the conn on the Src.
src->addTargetFunc: To put the target Func info on the Src.

Case 2: Shared Msg.
addSharedMsg( Element* src, const Finfo* f1, Element* dest, const Finfo* f2 )
Yet to implement.

------------------------------------------------------------------------------
Tree operations.
Move:
Shell::doMove( Id orig, Id newParent )
This function moves the object orig to the new parent NewParent. It does so
by deleting the old parent->child Msg and creating a new one.

Shell::doCopy( Id orig, Id newParent, string newName, unsigned int n, bool copyExtMsgs)
This function creates a copy of the 'orig' object and all its descendants.
It renames the object to 'newName'.
This copy includes all parameters and state variables. The copy can be multiple,
so that the single original now becomes an array of size n. If this happens
then the DataHandlers of the copy have to be upgraded to the next higher
dimension. Likewise, Messages must also be upgraded.
The copy always includes the messages that were within the copied tree. If the
copeExtMsgs field is True then all messages are copied.
Present status: only n = 1 and copyExtMsgs = 0 are currently supported.
Tested with all current implementations of messaging.
Not tested on multiplen nodes.

Shell::doDelete( Id i )
Destroys Id i, all its messages, and all its children.

------------------------------------------------------------------------------
OpFunc lookup
Every function used by MOOSE has a unique FuncId, consistent across nodes. 
This is assigned at static initialization time when the Cinfo constructor
scans the FinfoArray. 
Managed in Cinfo::funcs_[FuncId]
Assigned in Cinfo::init using Finfo::registerOpFuncs

There is a static global vector of OpFuncs in the Cinfo class that has entries
for every function so defined. The FuncId is the index into this vector.
Additionally, each Cinfo instance has a vector of OpFuncs applicable to itself,
again indexed by FuncId. Invalid FuncIds for the class have a zero pointer.

FuncId 0 is a dummy function.

------------------------------------------------------------------------------
Solvers
Solvers take over operation of a number of regular objects. This is done
by replacing the data and class info of the original, with special
'Zombie' versions provided by the solver. Typically the data provided is
a wrapper for the data of the solver itself, and the class info is a
set of access functions that act on the data, with exactly the same interface
as the original class. The messages of the zombified Element are left intact,
with the possible exception of the 'process' message.

To do zombification, the Element provides several handy functions:
MsgId findCaller( fid ): Finds the first Msg that calls the specified fid.
getInputMsgs( vector< MsgId >, Fid ): Finds all Msgs that call the fid.
getOutputs< vector< Id >, const SrcFinfo* ): Finds all target Ids of the Src
getInputs< vector< Id >, const DestFinfo* ): Finds all src Ids of the Dest
zombieSwap( const Cinfo*, DataHandler* ): Replaces Cinfo and data on Element.

The solver data class should provide utility functions for the Zombies to
covert object Ids to lookup indices into the solver.


Original Data is replaced completely with a solved version
The Element uses a replacement Cinfo to handle this, so that the operation:
	OpFunc Element::getOpFunc( funcId ); // asks Cinfo for opFunc.
is now redone. This handles all field access and async messages.
< Need to work out what to do with sync messages >







Slot::sendTo
	- Sets up a temporary buffer for the argument and target index.
	- only it doesn't use it. Could replace the target index stuff
		done below in Conn::tsend.
Eref::tsend
	- Creates a Qinfo with the useSendTo flag
	- Element looks up a Conn based on ConnId.
Conn::tsend
	- Looks for a Msg with a target Element matching the target Id.
	- Appends the target index after the arg in the buffer.
	- Updates the Qinfo to indicate the new size
Msg::addToQ
	- the Qinfo gets the msgId of the target
Element::addToQ
Qinfo::addToQ called with the queue vector from the Element.
	- copies the Qinfo and then the arg into the queue.

...................... There it sits till Element::clearQ	

Element::clearQ
	Marches through buffer
Element::execFunc
	- Extracts Qinfo
	- Looks up func and Msg
	- If useSendTo:
		OpFunc::op Executes the function.
	- else:
		Msg::exec
			- Sorts out which target to use. 
				Unnecessary as it is on it
		Opfunc::op: Executes the function.
		

------------------------------------------------------------------------------
Parallelism and grouping.
After some analysis, it seems like many parallel simulations will have a 
lumpy connectivity. That is, there will be groups with high internal
connectivity. Different groups and stray other elements have sparser
connectivity. For example, a brain region (e.g., OB) will be highly connected
internally, and much more loosely connected externally. So the OB would form
one group, and perhaps the Piriform another group.

Multithreading and MPI-based decomposition are very similar as one scales up:
in both cases we need buffers for data transfer. Up to a point it makes sense
to amalgamate all MPI based communication for each node even if it has many 
threads internally. This is the main asymmetry.

------------------------------------------------------------------------------

MULTITHREADING.
Async:
Setup:
Nothing special
Send:

Each thread has a separate section of the Queue on each target Element. So
there is never any need for individual mutexes.
clearQ: Three possible levels of separation.
	- Separate at the level of Msgs for large Msgs handling lots of
	targets. Separate chunks of the targets could be assigned to
	different threads. Needs that no other threads are modifying target.
	- Separate at the level of Elements; so that each Element is on a
	different thread. Need to set up enough Elements to balance this
	properly. Can we set up multiple Elements on the same node with the
	same Id? This would work well for cases where there are huge numbers
	of Elements.
	- Separate at the level of solvers. Works where one big solver handles
	lots of Elements. This becomes very solver-specific, and we would
	have to define rules for how the solver is allowed to do this.
Sync:
No problems because the memory locations are distinct for all data transfer.

Data transfer of 'group' queues, from perspective of each thread.
        - During process, put off-group stuff into off-group queues.
                - on-node other threads; and off-node data each have queues.
        - During process, put in-group data into own 'output' queue.
        - When Process is done, consolidate all in-group 'output' queues.
        - Send consolidated in-group queue to all nodes in group
        - off-group, on-node queues are handled by their owner threads.
        - Send off-group, off-node queues to target nodes.
        - Receive consolidated queues from on-group nodes.
                [further consolidate?]
        - Receive mythread input queue from off-group, on-node threads
        - Recieve anythread input queues from off-group off-node
                [Consolidate input queues ?]
        - Iterate through consolidated queue for in-group, on-node.
        - Iterate through consolidated queue for in-group, off-node.
        - Iterate through input queue for off-group, on-node
        - Iterate through input queue for off-group, off-node.
                - Each thread will have to pick subset of entries to handle.

Data transfer of 'non-group' queues, from perspective of each thread:
        - During process, put off-group stuff into off-group queues.
                - on-node other threads; and off-node data each have queues.
        - During process, put own stuff into own input queue.
        - off-group, on-node queues are handled by their owner threads.
        - Send off-group, off-node queues to target nodes.
        - Receive mythread input queue from off-group, on-node threads
        - Recieve anythread input queues from off-group off-node
                [Consolidate input queues ?]
        - Iterate through input queue for off-group, on-node
        - Iterate through input queue for off-group, off-node.
                - Each thread will have to pick subset of entries to handle.

.........................................................................
Setting up multithread scheduling.
For comparison, here is regular scheduling
Shell::start:
	calls Clock::threadStartFunc( void* threadInfo ), no threads.
Clock::threadStartFunc( void* threadInfo ) is the static thread function.
	calls Clock::Start in non-thread version, with args in threadInfo
Clock::Start:
	If just one tickPtr: advance through the tickPtr for the entire time.
	If multiple tickPtrs (i.e., many dts):
		sort the tickPtrs
		loop while tickPtr[0].nextTime < endTime
			advance tickPtr0 till nextTime
			sort again
			update nextTime.
	*---*

Multithread scheduling: Assume group-wise queue handling for now.
Shell::start:
	Makes barrier. Spawns child threads, with threadStartFunc.
	 Waits for them to join.
Clock::threadStartFunc( void* threadInfo ) calls tStart in threaded version.
Clock::tStart()
	if just one tickPtr, advance through it for the entire time.
	TickPtr::advance: loop while nextTime_ < endTime. In this loop,
		call all the Ticks (at different stages)
		Need to cleanly handle advancing of p->currTime and nextTime_,
		which are used by all threads. So this must be made thread-safe.
		

		Tick::advance(): 
			Qinfo::clearQ( threadId )
				Note that here there is a Q shared among
				all threads, and it is treated as readonly.
			Conn::process.
				Puts outgoing data into thread-local output Q.
			Condition: when all other threads are done, the
				Qs are either merged or put into a frame-buffer
				so that all threads can now handle entire
				set as readonly.
			Or, Barrier: Ensure all threads are done
			Thread 0: Merge Qs into a frame-buffer, set as readonly
			Barrier: Ensure thread0 is done before proceeding.
			Equivalently, the entry into the clearQ routine could
			be the synchronization point.
			
	If multiple tickPtrs:		
		
	

	
	*---*

Qinfo::readQ( threadId )
	Reads the data from threadId. It is entirely readonly, and is thread-
	safe. Different threads can use the same threadId.

Qinfo::clearQ( threadId )
	Clears the data from threadId; that is, zeroes it out. 

The design sequence is that for 

------------------------------------------------------------------------------

MPI:
Async:
Setup: need to figure out off-node targets. See below for send.

Send:
Msg has been preconfigured to add to the regular Q, the outgoing Q, or both.
Shouldn't this be separate Msgs? But it is to the same Element, just to
a different buffer on it.

ClearQ:
On-node stuff is as usual.
Postmasters (or Elements themselves) clear out off-node Qs in a straightforward
way, since everything is now serialized.
Once these arrive on the target node, they simply get dumped into the incoming
Q of the target Element.


------------------------------------------------------------------------------

Managing space, specially in chemical models.

Conceptually, each Compartment is enclosed by one or more Geometry
objects. There may or may not be an adjacent Compartment on the other
side. The Compartment has a vector of Boundary objects, which are
managed in a FieldElement. 
The geometry/adjacency relationships are defined by messages from the
Boundary FieldElement.
Compartments also have a 'size' and 'dimension'. I haven't yet worked
out a general way to get this from an arbitrary set of Geometries
around them, but in principle it can be obtained from this.
Again, in principle, the size may change during a simulation.

