MFS interface, capabilities and limitations:
============================================
To use MFS, enable the "CONFIG_MOSIX_FS" option in the kernel configuration,
then mount it using:
	mount -t mfs {any_name} {mount_point} [-o dfsa={n}]
This gives access to nearly all files throughout the openMosix cluster,
with the root of each node available via {mount_point}/{node_number}/.
The following sub-directories are also available:

/{mount_point}/here/
	The current node where your process runs
/{mount_point}/home/
	Your home node
/{mount_point}/magic/
	The current node, when used by the "creat" system call (or by "open"
	with the "O_CREAT" flag); otherwise, the last node on which an MFS
	magical file was successfully created (this is very useful for creating
	temporary files and then immediately unlinking them)
/{mount_point}/lastexec/
	The node on which the process last issued a successful "execve"
	system-call.
/{mount_point}/selected/
	The node selected by writing a node number into "/proc/self/selected",
	either by your process itself or by one of its ancestors before it
	forked this process (the selection is inherited).
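
As an illustration, the helper below builds such paths from a node number or
one of the sub-directory names above.  It is a minimal POSIX sh sketch, not
part of openMosix: the function name "mfs_path" and the MFS_MNT variable
(defaulting to "/mfs") are assumptions.

```sh
#!/bin/sh
# Hypothetical helper: print the MFS path of a file on a given node number,
# or under one of the special sub-directories (here, home, magic, lastexec,
# selected).  MFS_MNT is an assumed variable holding the mount point.
mfs_path() {
    where=$1
    rel=${2#/}                  # drop one leading '/' from the path part
    mnt=${MFS_MNT:-/mfs}
    case $where in
        here|home|magic|lastexec|selected) ;;   # special sub-directories
        ''|*[!0-9]*)                            # empty, or not all digits
            echo "mfs_path: '$where' is not a node number or MFS sub-directory" >&2
            return 1 ;;
    esac
    printf '%s/%s/%s\n' "$mnt" "$where" "$rel"
}
```

For example, "cat $(mfs_path 3 /etc/hosts)" would read "/etc/hosts" on node #3.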

You may also wish to have MFS mounted automatically by adding the following
line to "/etc/fstab":
        cluster    /{mount_point}    mfs    defaults    0 0
or for DFSA use:
        cluster    /{mount_point}    mfs    dfsa=1      0 0

Once CONFIG_MOSIX_FS is configured in the kernel and openMosix has been
configured (see "man setpe"), other nodes can access this node's file-system
even without the above mount.  To disallow MFS access to this node, write a
"1" to "/proc/hpc/admin/nomfs" (to re-allow it, write a "0").
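
A small POSIX sh wrapper around this control file might look as follows.  It
is only a sketch: the function name and the NOMFS_CTL override (which lets it
be tried against an ordinary file on a system without openMosix) are
hypothetical; only the "/proc/hpc/admin/nomfs" path and its 0/1 values come
from the text above.

```sh
#!/bin/sh
# Sketch: disallow (1) or re-allow (0) MFS access to this node.
# NOMFS_CTL is a hypothetical override of the control-file path.
mfs_set_nomfs() {
    ctl=${NOMFS_CTL:-/proc/hpc/admin/nomfs}
    case $1 in
        0|1) ;;
        *) echo "usage: mfs_set_nomfs 0|1" >&2
           return 2 ;;
    esac
    if [ ! -w "$ctl" ]; then
        echo "mfs_set_nomfs: cannot write $ctl (not root, or no openMosix?)" >&2
        return 1
    fi
    echo "$1" > "$ctl"
}
```

For example, "mfs_set_nomfs 1" (run by the Super-User) closes this node to
MFS, and "mfs_set_nomfs 0" reopens it.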

MFS was also designed to run under DFSA, which allows processes, wherever they
happen to run at each moment, to access the node holding the files/directories
they require directly, bypassing their "DEPUTY" in most cases.  This is even
more efficient when the required files are on the same node as the process,
since the process can then serve itself without using the network.
To use MFS with DFSA, make sure that the mount-point is the same on all nodes,
then mount (or remount) MFS with the "-o dfsa={n}" option, where {n} is in the
range 1-8 and identical on all nodes in the cluster.
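
The DFSA mount can be scripted along these lines, using the mount syntax given
at the top of this document.  This is a sketch under assumptions: the function
name, the DRY_RUN switch and the choice of "cluster" as {any_name} (borrowed
from the fstab example) are illustrative, and the script only checks the 1-8
range locally; it cannot verify that every node uses the same {n} and
mount-point.

```sh
#!/bin/sh
# Sketch: mount MFS with DFSA, following
#   mount -t mfs {any_name} {mount_point} -o dfsa={n}
# DRY_RUN (hypothetical) prints the command instead of running it.
mfs_mount_dfsa() {
    mnt=$1
    n=$2
    case $n in
        [1-8]) ;;               # a single digit from 1 to 8
        *) echo "mfs_mount_dfsa: dfsa value must be 1-8, got '$n'" >&2
           return 2 ;;
    esac
    if [ -n "${DRY_RUN:-}" ]; then
        echo "mount -t mfs cluster $mnt -o dfsa=$n"
    else
        mount -t mfs cluster "$mnt" -o "dfsa=$n"
    fi
}
```

Running the same call (for example "mfs_mount_dfsa /mfs 1") from a common
startup script on every node is one way to keep {n} and the mount-point
identical across the cluster.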

Users and Groups:
-----------------
MFS assumes that all user and group IDs throughout the cluster have
equivalent access rights, so you should not use MFS on clusters with
heterogeneous user/group schemes.  MFS also allows the Super-User to access
all files throughout the cluster, but this is implied anyway by the security
requirements of openMosix (see "man mosix").  If most of your cluster uses the
same scheme but some nodes do not, you may either configure MFS only in the
kernels of the nodes that share the scheme, or, on the other nodes, write a
"1" to "/proc/hpc/admin/nomfs" during node startup (before openMosix is
configured) and not mount MFS there.

Temporary files:
----------------
The "here", "magic", "lastexec" and "selected" directories are designed
to provide easier access to temporary files, helping programs create their
temporary files on the node where they run.  With many programs, you can
make use of these directories without recompiling, by setting the "TMPDIR"
environment variable.

The most conservative, but safest, approach, which can be applied to all
programs, is:
	setenv TMPDIR "/{mount_point}/selected/tmp"
In this case, your shell (or the calling script) should run
	echo `cat /proc/self/where` > /proc/self/selected
before calling the program.
(Note that "cp" cannot be used here, since only the shell itself may modify
its own "selected"; "echo" works because it is built into most shells.)
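
For Bourne-style shells (sh, bash), where "setenv" is not available, the same
"selected" approach might be written as below.  This is a hedged sketch:
"run_with_selected_tmp" and MFS_MNT are hypothetical names, and the /proc
checks simply skip the selection step on systems without openMosix.

```sh
#!/bin/sh
# Sketch of the "selected" approach for Bourne-style shells.
# MFS_MNT is an assumed variable naming the MFS mount point.
MFS_MNT=${MFS_MNT:-/mfs}

run_with_selected_tmp() {
    # Select the node the shell runs on now.  "echo" is a shell built-in and
    # the redirection is opened by the shell, so it is the shell's own
    # /proc/self/selected that is written; "cat" only reads "where".
    if [ -r /proc/self/where ] && [ -w /proc/self/selected ]; then
        echo "$(cat /proc/self/where)" > /proc/self/selected
    fi
    # Run the program with TMPDIR pointing at the selected node.
    TMPDIR="$MFS_MNT/selected/tmp" "$@"
}
```

For example, "run_with_selected_tmp sort -o out.txt in.txt" runs "sort" with
its temporary files under "/mfs/selected/tmp".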

The next, slightly less conservative approach, which is still safe for
programs that do not pass file-names to their children as arguments of
"exec", is:
	setenv TMPDIR "/{mount_point}/lastexec/tmp"
	(or "env TMPDIR=/{mount_point}/lastexec/tmp program [args]")

An even less conservative, but more powerful, approach can be used for
programs that either create only one MFS file, or unlink their temporary
files as soon as they are created.  For such programs:
	setenv TMPDIR "/{mount_point}/magic/tmp"
	(or "env TMPDIR=/{mount_point}/magic/tmp program [args]")

Finally, programs that are locked to a particular node may use:
	setenv TMPDIR "/{mount_point}/here/tmp"
	(or "env TMPDIR=/{mount_point}/here/tmp program [args]")
Note that this approach is not 100% safe: even while locked, a program may
still migrate back to its home-node if/when the node where it runs is being
shut down for a reboot.

Of course, when designing a new program to run with MFS,
all the above methods can be freely mixed.
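
When mixing methods, a helper such as the following can map each approach to
its TMPDIR.  It is a POSIX sh sketch; "mfs_tmpdir", "run_mfs_tmp" and MFS_MNT
are hypothetical names.  The "selected" approach still requires the shell to
write "/proc/self/selected" first, as described above; the helper only builds
the path.

```sh
#!/bin/sh
# Sketch: map each temporary-file approach to its TMPDIR, from the most
# conservative ("selected") to the least ("here").
mfs_tmpdir() {
    mnt=${MFS_MNT:-/mfs}
    case $1 in
        selected|lastexec|magic|here) printf '%s/%s/tmp\n' "$mnt" "$1" ;;
        *) echo "mfs_tmpdir: unknown approach '$1'" >&2
           return 1 ;;
    esac
}

# Run a program with TMPDIR set for the given approach.
run_mfs_tmp() {
    if [ $# -lt 2 ]; then
        echo "usage: run_mfs_tmp APPROACH PROGRAM [ARGS...]" >&2
        return 2
    fi
    approach=$1
    shift
    dir=$(mfs_tmpdir "$approach") || return 1
    TMPDIR=$dir "$@"
}
```

For example, "run_mfs_tmp magic sort -o out.txt in.txt" would be suitable for
a program that unlinks its temporary files immediately.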

Interpretation of symbolic-links:
---------------------------------
The following non-trivial interpretation of symbolic links found within MFS
was designed to provide uniform access between links created locally and
via MFS, especially by scripts and makefiles that use `pwd` as part of
symbolic links:

The rule is that when a symbolic link begins with a '/', it refers to the
root of the file-system's node, not the home-node!
Similarly, "/.." (or any combination with ".." that calls for the parent of
the file-system's root) refers to the file-system's root again, rather than to
the MFS mount-point.

One of the implications is that a symbolic link is never allowed to cross nodes.
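
To make the rule concrete, the function below models how a symbolic link
that starts with '/' is mapped onto the root of the node that holds it, with
".." clamped at that root so the result can never leave the node.  It is an
illustrative POSIX sh model, not the kernel's implementation; the function
name is hypothetical, and relative links are deliberately not handled.

```sh
#!/bin/sh
# Model of the MFS rule for '/'-rooted link contents.  The body runs in a
# subshell, so the IFS and "set -f" changes do not leak to the caller.
mfs_resolve_link() (
    root=${1%/}                 # node root, e.g. /mfs/2 for node #2
    target=$2                   # the link's contents
    case $target in
        /*) ;;
        *) echo "mfs_resolve_link: only links starting with '/' are modelled" >&2
           exit 1 ;;
    esac
    IFS=/                       # split the target on '/'
    set -f                      # do not glob the path components
    out=
    for comp in $target; do
        case $comp in
            ''|.) ;;                    # skip empty and "." components
            ..) out=${out%/*} ;;        # parent; at the node root "" stays ""
            *) out=$out/$comp ;;
        esac
    done
    printf '%s%s\n' "$root" "${out:-/}"
)
```

For example, with node #2's root at "/mfs/2", the link contents
"/../../etc/hosts" map to "/mfs/2/etc/hosts", never to the MFS mount-point.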

Excluded files:
---------------
The following may not be accessed via the MFS file-system:
* nodes that excluded themselves (by writing "1" to "/proc/hpc/admin/nomfs").
* special files (anything other than regular files, directories or symbolic
  links).
* the "proc" file-system.
* any subdirectory of the recursive MFS mount-point, with the exception
  of symbolic links that start with '/', point to the same node, and do
  so only once.

Examples:
assume that there are 3 nodes in the cluster and that, on node #2:
1) MFS is mounted on "/mfs"
2) "/usr/src/linux_here" is a symbolic link to "/mfs/2/usr/src/linux"
3) "/usr/src/local_linux" is a symbolic link to "../../mfs/2/usr/src/linux"
4) "/usr/src/other_linux" is a symbolic link to "/mfs/3/usr/src/linux"
5) "/usr/src/mfs_linux" is a symbolic link to "/mfs/2/mfs/2/usr/src/linux"

then the following are accessible:

/mfs/2/usr/src/linux
/mfs/2/etc/hosts
/mfs/2/mfs
/mfs/2/usr/src/linux_here
/usr/src/local_linux
/usr/src/other_linux

but the following are not (and will result in a "Permission denied" error):

/mfs/2/dev/tty6			(special character device)
/mfs/2/proc/hpc		("proc" file system)
/mfs/2/mfs/2/tmp	(to prevent infinite recursion and confusing the shell)
/mfs/2/usr/src/local_linux	(symbolic-link does not start with '/')
/mfs/2/usr/src/other_linux	(symbolic-link pointing to another node)
/usr/src/mfs_linux		(symbolic-link pointing to local node twice)

(Note, however, that symbolic links can still be read with "lstat" and
"readlink", regardless of their contents.)

Garbage Collection:
-------------------
When either a client node or part of the network crashes, a garbage-collection
mechanism will eventually clean up the references to held files or
directories on the serving node(s).  However, it may take up to an hour
before the server(s) finally give up the connection, during which time the
serving node(s) will not be able to un-mount the file-system(s) involved.

The Super-User may still force an un-mount in 3 ways:
1) disable MFS by writing a "1" to "/proc/hpc/admin/nomfs".
2) un-configure openMosix by running "setpe -off".
3) write the name of a file or directory to be released to
   "/proc/hpc/admin/mfskill".  If the name is of a directory, all files and
   sub-directories under it will be released as well (with the possible
   exception of files being actively accessed at that very moment).  Thus
   writing '/' releases everything, but is very disruptive to users, so it
   is better to write the name of the mount-point of the file-system that
   you wish to un-mount.
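
Option 3 can be wrapped as follows.  It is a sketch under assumptions: the
function name and the MFSKILL_CTL override are hypothetical, refusing '/' is
a deliberate safety choice rather than an MFS rule, and since the text does
not say whether the kernel accepts a trailing newline, the name is written
without one.

```sh
#!/bin/sh
# Sketch: release everything held under PATH via mfskill, so that the
# file-system can then be un-mounted.  MFSKILL_CTL is a hypothetical override.
mfs_release() {
    ctl=${MFSKILL_CTL:-/proc/hpc/admin/mfskill}
    path=$1
    case $path in
        '') echo "usage: mfs_release PATH" >&2
            return 2 ;;
        /)  echo "mfs_release: refusing to release '/' (disruptive to all users)" >&2
            return 2 ;;
    esac
    printf '%s' "$path" > "$ctl"
}
```

For example, "mfs_release /export && umount /export", where "/export" stands
for the mount-point of the file-system you wish to un-mount.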

Functionality limitations:
--------------------------
* Mandatory file-locking is not supported.
* The F_NOTIFY fcntl option is not supported.
* Voluntary file-locking only operates among processes of the same home-node
  (and since it is not supported by DFSA, it always requires DEPUTY
  assistance on the home-node).
* file-ioctl is currently only supported for the EXT2 file-system.
* mmap of MFS files only supports private mappings (MAP_PRIVATE), and the
  file must have been opened with read permission.
  The actual implementation of "mmap" and "execve" does not use demand-paging;
  instead, it reads in the relevant text/data from the file before proceeding.
* Every effort was made to avoid giving the same inode-number to different
  files, and in most cases this succeeds, but it cannot be totally guaranteed
  with only 32-bit inode-numbers and the large potential number of files on
  numerous nodes and devices within each node.  Priority is given to ensuring
  that files on any particular node do not get the same inode-numbers, but even
  this cannot be absolutely guaranteed when some of the files are on NFS (or
  other file-systems that use the full 32-bit space for inode numbers).  To
  identify an inode most accurately, use the raw "stat" ("fstat"/"lstat")
  system-call as provided by the kernel, before its results are filtered by
  the compatibility library: it provides the node-number in the "__unused1"
  field, the device-number in the "__unused2" field and the local inode-number
  in the "__unused3" field (these fields are currently always 0 for non-MFS
  files).  In the "stat64"/"lstat64"/"fstat64" system-calls, the node-number
  can be found in "__pad0[2-3]", the device-number in "__pad0[4-5]" and the
  local inode-number in "__pad0[6-9]".
