
Design: Process abstraction

gerdstolpmann edited this page May 2, 2016 · 6 revisions

Current

At the moment, there is no real process abstraction in the omake sources, though there is some very interesting shell functionality.

There are two ways of launching processes:

  • as command pipeline for executing rules
  • from the omake DSL for getting the output of a command: $(shell ...)

Commands may be written in omake itself, or may be external commands. At the moment, omake guarantees parallel execution of pipeline commands and that the commands are really connected with pipes (generally no temp files, except for a few hardcoded special cases, like reading in dependency files). This is no problem when the command is an external command. When the command is an internal command, omake has to spawn a second runner: a forked subprocess on Unix, and a second thread on Win32.

Also, the pipelines are not necessarily executed on the local host. There can be remote command relays (for distributing the build load to several hosts).

Next

Generally, I'd like to simplify things a lot. I don't think that omake profits much from real parallelism, so let's drop it. The complications of parallelism: on Unix, forking the omake process is a very expensive operation, in particular when omake has already allocated a lot of memory. On Win32, omake tries to emulate "fork" with the help of threads, but this emulation is cumbersome and error-prone. The alternative is to resort to temporary files where needed. Analysis of this:

  • An external-only pipeline ext1 | ext2 | ... | extN can still run in parallel with the caller. The commands are connected with pipes, as is the input/output of the whole pipeline with the caller. No difference from what is done at the moment.
  • When there is a single internal command in the pipeline ext1 | int1 | ... | extN, we suspend the execution of the caller and run this internal command instead (in the omake process). This internal command is connected with pipes to the external commands, and runs in parallel to them. Any output produced by the pipeline is diverted into a temp file. If the input of the pipeline is not redirected, we are done. If the input of the pipeline is redirected, we handle the whole thing as if the pipeline were of the form caller | ext1 | int1 | ... | extN. The caller's part is done when the file handle writing to the pipeline is closed - this triggers the execution of the rest of the pipeline. (=> We need the ability to catch the event of closing file handles.)
  • When there are several internal commands, as in ext1 | int1 | ... | intM | extN: assume that the caller never writes into the pipeline (because we can transform this case into a longer pipeline caller | ...). We break the pipeline up into sequential pieces consisting of at most one internal command and any number of external commands. Sequential piece k writes into a temporary file, and piece k+1 reads from this temp file.
  • When an internal command intk is running, it is possible that this command invokes another sub-pipeline. This means our process model needs to be hierarchical.

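The splitting step above can be sketched in OCaml. Everything here (the `cmd` type, `split_pipeline`) is a hypothetical stand-in for illustration, not part of the omake sources: a new sequential piece starts after each internal command.

```ocaml
(* Hypothetical stand-ins: a pipeline element is either external or internal. *)
type cmd = Ext of string | Int of string

(* Split a mixed pipeline into sequential pieces, each containing at most
   one internal command (which, if present, ends its piece). Piece k writes
   into a temp file that piece k+1 reads. *)
let split_pipeline (cmds : cmd list) : cmd list list =
  let rec go piece acc = function
    | [] -> List.rev (List.rev piece :: acc)
    | (Int _ as c) :: rest -> go [] (List.rev (c :: piece) :: acc) rest
    | (Ext _ as c) :: rest -> go (c :: piece) acc rest
  in
  (* drop an empty trailing piece, e.g. when the pipeline ends in an Int *)
  List.filter (fun p -> p <> []) (go [] [] cmds)
```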
Difficulty: When an external command is run, it inherits a lot of file descriptors from omake. We need to be careful here, and set the close-on-exec flag (FD_CLOEXEC) where needed.
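On the Unix side, OCaml's Unix module already exposes the close-on-exec flag. A minimal sketch of a pipe helper (the name pipe_cloexec is made up here):

```ocaml
(* Create a pipe whose two descriptors are marked close-on-exec right away,
   so that external commands spawned later do not inherit them. *)
let pipe_cloexec () =
  let rd, wr = Unix.pipe () in
  Unix.set_close_on_exec rd;
  Unix.set_close_on_exec wr;
  (rd, wr)
```

Descriptors that a particular child is supposed to use for stdin/stdout/stderr would then have the flag cleared again (Unix.clear_close_on_exec) just before being passed down.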

Process abstraction

A process may be an internal process or an external process. The latter guys are already implemented by the OS. What is an internal process? We have:

  • An identifier. Because our processes are hierarchical, so are our IDs. The initial process has ID [0]. The children have IDs [0;0], [0;1], etc. The children of [0;1] have IDs [0;1;0], [0;1;1], etc.
  • A current working directory (use an OS file descriptor for saving cwds, there is fchdir).
  • Environment variables.
  • File handles for stdin, stdout, stderr. Generally our abstraction needs to provide our own omake-handles, with our own read/write impl and our own close. Need the ability to catch closed handles.
  • State, and the ability to clone the state for creating child processes, or cloning/modifying the state.

A process has the ability to spawn a pipeline, made from subprocesses.
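The hierarchical ID scheme described above is straightforward to sketch; child_id and the per-process counter are hypothetical illustrations, not part of any existing code:

```ocaml
(* A hierarchical process ID, e.g. [0;1;0] = first child of the second
   child of the initial process [0]. *)
type id = int list

(* Each process keeps a counter handing out suffixes to its children. *)
let child_id (parent : id) (counter : int ref) : id =
  let n = !counter in
  incr counter;
  parent @ [n]
```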

An attempt at an interface:

type 't int_runnable             (* internal runnable (command) with state 't *)
type ext_runnable                (* external runnable *)
type 't process                  (* internal process, self-view *)
type process_handle              (* handle for a child process, internal or external *)
type process_status = ...        (* see Unix.process_status *)

type path_anchor =
 | Cwd
 | Absolute of path
type path_walk = path_anchor * string list

type file_handle

(* need file operations *)
val open_in : ... -> file_handle
val open_out : ... -> file_handle
val close : file_handle -> unit
val with_handle : (Unix.file_descr -> 'a -> 'b) -> file_handle -> 'a -> 'b    (* be pragmatic *)
val wrap : Unix.file_descr -> file_handle   (* be pragmatic *)
val connect_in : unit -> file_handle    (* writable handle pushing data to stdin of child *)
val connect_out : unit -> file_handle   (* readable handle pulling data from stdout of child *)
val connect_err : unit -> file_handle   (* readable handle pulling data from stderr of child *)

(* The "connectors" are either implemented with temporary files or with pipes *)

val block_until_readable : file_handle -> unit
val block_until_writable : file_handle -> unit

val read : file_handle -> bytes -> ...
val write : file_handle -> bytes -> ...
(* read/write functions for convenience *)


val ext_runnable : cwd:path_walk -> env:... -> command:(string * string array) ->
       ext_runnable

val int_runnable : cwd:path_walk -> env:... -> command:('t process -> process_status) ->
       description:string ->
       't int_runnable

(* in the following: 's = the state type of the current process. 't = the type of the child *)

type 's spawner
val ext_spawn : ext_runnable -> 's spawner
val int_spawn : ('s -> 't) -> 't int_runnable -> 's spawner
(* Spawn a child by copying the current state of type 's with the 's->'t function *)

val redirect_stdin : 's spawner -> file_handle -> 's spawner
val redirect_stdout : 's spawner -> file_handle -> 's spawner
val redirect_stderr : 's spawner -> file_handle -> 's spawner

(* e.g. Assign stdin of subprocess to a file:

let fh = open_in "file"
let sp2 = redirect_stdin sp1 fh
let ph = execute sp2 state
close fh
let ps = wait ph

Read from stdin of a subprocess:

let fh = connect_in()
let sp2 = redirect_stdin sp1 fh
let ph = execute sp2 state
let ps = wait ph
read fh buf ...
close fh
*)

val pipeline : 's spawner list -> 's spawner   (* | *)
val sequence : 's spawner list -> 's spawner   (* ; *)
val and_then : ...   (* && *)
val or_else : ...    (* || *)

val execute : 's spawner -> 's -> process_handle   (* & *)

val on_termination : process_handle -> (unit -> unit) -> unit
val on_waiting : (unit -> bool) -> unit
val wait : process_handle -> process_status
val wait_on_any : unit -> (process_handle * process_status) option
 (* None = no subprocess running *)

val state : 't process -> 't
val pid : _ process -> int list
val description : _ process -> string

val current_pid : unit -> int list
val current_description : unit -> string
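For illustration, the intended semantics of the and_then (&&) and or_else (||) combinators can be sketched over plain exit statuses. Everything here (the local process_status type, the thunk-list signature) is an assumption about the sketch above, not the final API:

```ocaml
(* Local stand-in mirroring Unix.process_status; WEXITED 0 = success. *)
type process_status = WEXITED of int | WSIGNALED of int

let succeeded = function WEXITED 0 -> true | _ -> false

(* && semantics: run each step in turn, stop at the first failure.
   The empty sequence succeeds (WEXITED 0). *)
let and_then (steps : (unit -> process_status) list) : process_status =
  List.fold_left
    (fun st step -> if succeeded st then step () else st)
    (WEXITED 0) steps

(* || semantics: stop at the first success.
   The empty sequence fails (WEXITED 1). *)
let or_else (steps : (unit -> process_status) list) : process_status =
  List.fold_left
    (fun st step -> if succeeded st then st else step ())
    (WEXITED 1) steps
```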

Caveats

  • Deadlocks are possible, e.g. waiting on a process while the waiter must still deliver data to that process
  • file handles may be part of the process state, and may be inherited by children. Some aspects of the handles are per process, however, in particular the file position. Not sure how to handle this, but I guess we need a file position per handle and process
  • don't allow select() on file handles that are used for connecting processes
  • when waiting for a process, all handles writing into this process must already be closed
  • we can take care of the handles created with our version of open_* and with connect_*. However, wrapped plain Unix.file_descr values are limited. E.g. if a socket is assigned to stdin, we support this only when all runnables are external.

Channel abstraction

In the omake DSL there is a channel abstraction. This abstraction must be easily reducible to this API.
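As a plausibility check that such a reduction is easy, here is a minimal buffered input channel layered on a raw read primitive standing in for the file_handle read above; all names and the record layout are hypothetical:

```ocaml
(* A read channel over a raw read primitive of type (bytes -> int -> int),
   i.e. "fill up to len bytes into buf, return the count, 0 at EOF". *)
type in_channel_sketch = {
  read : Bytes.t -> int -> int;
  mutable buf : Bytes.t;
  mutable pos : int;    (* next unread byte in buf *)
  mutable len : int;    (* number of valid bytes in buf *)
}

let make read = { read; buf = Bytes.create 4096; pos = 0; len = 0 }

(* Refill the buffer when exhausted; None = end of input. *)
let input_char_opt ch =
  if ch.pos >= ch.len then begin
    ch.len <- ch.read ch.buf (Bytes.length ch.buf);
    ch.pos <- 0
  end;
  if ch.len <= 0 then None
  else begin
    let c = Bytes.get ch.buf ch.pos in
    ch.pos <- ch.pos + 1;
    Some c
  end
```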

Job control

osh supports some job control, like foreground and background jobs. Unsure what to do here; probably drop this feature. omake is a build utility, not a shell replacement.