Java Fibers API Explained
Fibers are, then, what we call Java's planned user-mode threads. This section will list the requirements of fibers and explore some design questions and options. It is not meant to be exhaustive, but merely present an outline of the design space and provide a sense of the challenges involved.
In terms of basic capabilities, fibers must run an arbitrary piece of Java code, concurrently with other threads (lightweight or heavyweight), and allow the user to await their termination, namely, join them. Obviously, there must be mechanisms for suspending and resuming fibers, similar to LockSupport's park/unpark. We would also want to obtain a fiber's stack trace for monitoring/debugging, as well as its state (suspended/running), etc. In short, because a fiber is a thread, it will have a very similar API to that of heavyweight threads, represented by the Thread class. With respect to the Java memory model, fibers will behave exactly like the current implementation of Thread. While fibers will be implemented using JVM-managed continuations, we may also want to make them compatible with OS continuations, like Google's user-scheduled kernel threads.
There are a few capabilities unique to fibers: we want a fiber to be scheduled by a pluggable scheduler (either fixed at the fiber's construction, or changeable when it is paused, e.g. with an unpark method that takes a scheduler as a parameter), and we'd like fibers to be serializable (discussed in a separate section).
In general, the fiber API will be nearly identical to that of Thread as the abstraction is the same, and we'd also like code that has so far run in kernel threads to run in fibers with little or no modification. This immediately suggests two design options (sketched below):
- Represent fibers as a Fiber class, and factor out the common API for Fiber and Thread into a common super-type, provisionally called Strand. Thread-implementation-agnostic code would be programmed against Strand, so that Strand.currentStrand would return a fiber if the code is running in a fiber, and Strand.sleep would suspend the fiber if the code is running in a fiber.
- Use the same Thread class for both kinds of threads — user-mode and kernel-mode — and choose an implementation as a dynamic property set in a constructor or a setter called prior to invoking start.
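To make the two options concrete, here is a rough, purely illustrative sketch. Strand.currentStrand and Strand.sleep are the names used in the text above; the sleep signature and the setter in the second option are hypothetical placeholders, not anything the design has committed to:

// Option 1 (illustrative): a common Strand super-type for fibers and kernel threads.
Strand s = Strand.currentStrand();   // a Fiber if running in a fiber, a Thread otherwise
Strand.sleep(100);                   // suspends only the fiber when called inside one

// Option 2 (illustrative): one Thread class, with the implementation chosen dynamically.
Thread t = new Thread(() -> { /* task */ });
t.setThreadImplementation(/* user-mode, i.e. fiber */);   // hypothetical setter, called before start()
t.start();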
A separate Fiber class might allow us more flexibility to deviate from Thread, but would also present some challenges. Because a user-mode scheduler does not have direct access to CPU cores, assigning a fiber to a core is done by running it in some worker kernel thread, and so every fiber has an underlying kernel thread, at least while it is scheduled to a CPU core, although the identity of the underlying kernel thread is not fixed, and may change if the scheduler decides to schedule the same fiber to a different worker kernel thread. If the scheduler is written in Java — as we want — every fiber even has an underlying Thread instance. If fibers are represented by the Fiber class, the underlying Thread instance would be accessible to code running in a fiber (e.g. with Thread.currentThread or Thread.sleep), which seems inadvisable.
If fibers are represented by the same Thread class, a fiber's underlying kernel thread would be inaccessible to user code, which seems reasonable but has a number of implications. For one, it would require more work in the JVM, which makes heavy use of the Thread class and would need to be aware of a possible fiber implementation. For another, it would limit our design flexibility. It also creates some circularity when writing schedulers, which need to implement threads (fibers) by assigning them to threads (kernel threads). This means that we would need to expose the continuation of the fiber (represented by a Thread) for use by the scheduler.
Because fibers are scheduled by Java schedulers, they need not be GC roots, as at any given time a fiber is either runnable, in which case a reference to it is held by its scheduler, or blocked, in which case a reference to it is held by the object on which it is blocked (e.g. a lock or an IO queue), so that it can be unblocked.
Another relatively major design decision concerns thread locals. Currently, thread-local data is represented by the (Inheritable)ThreadLocal class(es). How do we treat thread-locals in fibers? Crucially, ThreadLocals have two very different uses. One is associating data with a thread context. Fibers will probably need this capability, too. Another is to reduce contention in concurrent data structures with striping. That use abuses ThreadLocal as an approximation of a processor-local (more precisely, a CPU-core-local) construct. With fibers, the two different uses would need to be clearly separated, as now a thread-local over possibly millions of threads (fibers) is not a good approximation of processor-local data at all. This requirement for a more explicit treatment of thread-as-context vs. thread-as-an-approximation-of-processor is not limited to the actual ThreadLocal class, but applies to any class that maps Thread instances to data for the purpose of striping. If fibers are represented by Threads, then some changes would need to be made to such striped data structures. In any event, it is expected that the addition of fibers would necessitate adding an explicit API for accessing processor identity, whether precisely or approximately.
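As a purely illustrative sketch of the distinction (the class and field names here are made up):

class ThreadLocalUses {
    // Use 1: thread-as-context. Data is associated with whichever thread (or fiber) runs the code.
    static final ThreadLocal<String> requestId = new ThreadLocal<>();

    // Use 2: thread-as-approximation-of-processor. Mutable state is striped across threads to
    // reduce contention. With millions of fibers, one stripe per "thread" no longer
    // approximates one stripe per CPU core.
    static final ThreadLocal<long[]> counterStripe = ThreadLocal.withInitial(() -> new long[1]);
}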
An important feature of kernel threads is timeslice-based preemption (which will be called forceful, or forced, preemption here, for brevity). A kernel thread that computes for a while without blocking on IO or synchronization will be forcefully preempted after some time. While at first glance this seems to be an important design and implementation issue for fibers — and, indeed, we may decide to support it; JVM safepoints should make it easy — it is not only unimportant, but having this feature would make little difference at all (so it is best to forgo it). The reason is as follows: unlike kernel threads, the number of fibers may be very large (hundreds of thousands or even millions). If many fibers require so much CPU time that they need to be forcefully preempted often, then, as the number of threads exceeds the number of cores by several orders of magnitude, the application is under-provisioned by orders of magnitude, and no scheduling policy will help. If many fibers need to run long computations infrequently, then a good scheduler will work around this by assigning fibers to available cores (i.e. worker kernel threads). If a few fibers need to run long computations frequently, then it is better to run that code in heavyweight threads; while different thread implementations provide the same abstraction, there are times where one implementation is better than the other, and it is not necessary for our fibers to be preferable to kernel threads in every circumstance.
A real implementation challenge, however, may be how to reconcile fibers with internal JVM code that blocks kernel threads. Examples range from hidden code, like loading classes from disk, to user-facing functionality, such as synchronized and Object.wait. As the fiber scheduler multiplexes many fibers onto a small set of worker kernel threads, blocking a kernel thread may take out of commission a significant portion of the scheduler's available resources, and should therefore be avoided.
On one extreme, each of these cases will need to be made fiber-friendly, i.e., block only the fiber rather than the underlying kernel thread if triggered by a fiber; on the other extreme, all cases may continue to block the underlying kernel thread. In between, we may make some constructs fiber-blocking while leaving others kernel-thread-blocking. There is good reason to believe that many of these cases can be left unchanged, i.e. kernel-thread-blocking. For example, class loading occurs frequently only during startup and only very infrequently afterwards, and, as explained above, the fiber scheduler can easily schedule around such blocking. Many uses of synchronized only protect memory access and block for extremely short durations — so short that the issue can be ignored altogether. We may even decide to leave synchronized unchanged, and encourage those who surround IO access with synchronized and block frequently in this way, to change their code to make use of the j.u.c constructs (which will be fiber-friendly) if they want to run it in fibers; the same goes for the use of Object.wait, which isn't common in modern code anyway (or so we believe at this point), as modern code tends to use j.u.c.
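As a purely illustrative example of that kind of change (assuming, as the text suggests, that the j.u.c locks become fiber-friendly while synchronized might be left kernel-thread-blocking):

import java.io.BufferedReader;
import java.io.IOException;
import java.util.concurrent.locks.ReentrantLock;

class GuardedIo {
    private final Object monitor = new Object();
    private final ReentrantLock lock = new ReentrantLock();

    // Blocking IO inside synchronized: if synchronized is left unchanged, running this in a
    // fiber would keep the underlying worker kernel thread blocked while waiting for input.
    String readUnderMonitor(BufferedReader in) throws IOException {
        synchronized (monitor) {
            return in.readLine();
        }
    }

    // The same guard expressed with a j.u.c lock, which the text expects to be fiber-friendly.
    String readUnderLock(BufferedReader in) throws IOException {
        lock.lock();
        try {
            return in.readLine();
        } finally {
            lock.unlock();
        }
    }
}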
In any event, a fiber that blocks its underlying kernel thread will trigger some system event that can be monitored with JFR/MBeans.
While fibers encourage the use of ordinary, simple and natural synchronous blocking code, it is very easy to adapt existing asynchronous APIs, turning them into fiber-blocking ones. Suppose that a library exposes this asynchronous API for some long-running operation, foo, which returns a String:
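A sketch of what such an API might look like; the exact signature is an assumption, with asyncFoo and FooCompletion being the names used in the surrounding text:

// Illustrative: starts the long-running operation foo and returns immediately;
// the String result (or an error) is delivered later through the callback.
void asyncFoo(FooCompletion callback);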
where the callback, or completion handler, FooCompletion is defined like so:
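Again a sketch; the success/failure method names and the FooException type are assumptions:

interface FooCompletion {
    void success(String result);          // called when the operation completes with a result
    void failure(FooException exception); // called when the operation fails
}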
We will provide an async-to-fiber-blocking construct that may look something like this:
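One possible shape for such a construct, purely as a sketch: _AsyncToBlocking, _Fiber and their park/unpark operations are hypothetical names, and park/unpark are assumed to have LockSupport-like permit semantics, so an unpark that arrives before park is not lost:

abstract class _AsyncToBlocking<T, E extends Throwable> {
    private volatile _Fiber fiber;
    private volatile T result;
    private volatile E exception;

    protected void _complete(T result) {
        this.result = result;
        fiber.unpark();               // resume the fiber waiting in run()
    }

    protected void _fail(E exception) {
        this.exception = exception;
        fiber.unpark();
    }

    // Subclasses call the underlying asynchronous API here, registering a
    // completion handler that eventually calls _complete or _fail.
    abstract void registerAsync();

    public T run() throws E {
        fiber = _Fiber.current();     // the fiber invoking the blocking operation
        registerAsync();              // kick off the asynchronous operation
        _Fiber.park();                // blocks only this fiber, not its kernel thread
        if (exception != null) throw exception;
        return result;
    }
}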
We can then create a blocking version of the API by first defining the following class:
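Continuing the sketch, a subclass bridges the FooCompletion callback to the adapter (illustrative names only):

class _BlockingFoo extends _AsyncToBlocking<String, FooException> implements FooCompletion {
    @Override void registerAsync()                 { asyncFoo(this); }    // the library's asyncFoo from above
    @Override public void success(String result)   { _complete(result); }
    @Override public void failure(FooException ex) { _fail(ex); }
}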
which we then use to wrap the asynchronous API with a synchronous version:
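The blocking, fiber-friendly version of the operation then becomes (sketch):

String syncFoo() throws FooException {
    return new _BlockingFoo().run();   // blocks the calling fiber until foo completes or fails
}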
We can include such ready integrations for common asynchronous classes, such as CompletableFuture.
Continuations
The motivation for adding continuations to the Java platform is for the implementation of fibers, but continuations have some other interesting uses, and so it is a secondary goal of this project to provide continuations as a public API. The utility of those other uses is, however, expected to be much lower than that of fibers. In fact, continuations don't add expressivity on top of that of fibers (i.e., continuations can be implemented on top of fibers).
In this document and everywhere in Project Loom, the word continuation will mean a delimited continuation (also sometimes called a coroutine). Here we will think of delimited continuations as sequential code that may suspend (itself) and resume (be resumed by a caller). Some may be more familiar with the point of view that sees continuations as objects (usually subroutines) representing "the rest" or "the future" of a computation. The two describe the very same thing: a suspended continuation is an object that, when resumed or "invoked", carries out the rest of some computation.
A delimited continuation is a sequential sub-program with an entry point (like a thread), which we'll call simply the entry point (in Scheme, this is the reset point), which may suspend or yield execution at some point, which we'll call the suspension point or the yield point (the shift point in Scheme). When a delimited continuation suspends, control is passed outside of the continuation, and when it is resumed, control returns to the last yield point, with the execution context up to the entry point intact. There are many ways to present delimited continuations, but to Java programmers, the following rough pseudocode would explain it best:
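A rough sketch of such pseudocode, matching the numbered steps described below (continuation and suspend are placeholder constructs, not real Java):

void foo() {        // (2) the continuation's entry point
    ...
    bar();
    ...
}

void bar() {
    ...
    suspend();      // (3) the yield point: control passes back outside the continuation
    ...             // (5) execution resumes here on the next invocation
}

void main() {
    c = continuation(foo);   // (0) create the continuation with entry point foo
    c.run();                 // (1) invoke it; runs foo, then bar, until the suspension at (3)
    c.run();                 // (4) invoke it again; control returns to the line at (5)
}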
A continuation is created (0), whose entry point is foo; it is then invoked (1), which passes control to the entry point of the continuation (2), which then executes until the next suspension point (3) inside the bar subroutine, at which point the invocation (1) returns. When the continuation is invoked again (4), control returns to the line following the yield point (5).
The continuations discussed here are "stackful", as the continuation may block at any nested depth of the call stack (in our example, inside the function bar, which is called by foo, which is the entry point). In contrast, stackless continuations may only suspend in the same subroutine as the entry point. Also, the continuations discussed here are non-reentrant, meaning that any invocation of the continuation may change the "current" suspension point. In other words, the continuation object is stateful.
The main technical mission in implementing continuations — and indeed, of this entire project — is adding to HotSpot the ability to capture, store and resume callstacks not as part of kernel threads. JNI stack frames will likely not be supported.
As continuations serve as the basis for fibers, if continuations are exposed as a public API, we will need to support nested continuations, meaning code running inside a continuation must be able to suspend not only the continuation itself, but an enclosing one (e.g., suspend the enclosing fiber). For example, a common use for continuations is in the implementation of generators. A generator exposes an iterator, and the code running inside the generator produces another value for the iterator every time it yields. It should therefore be possible to write code like this:
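For instance, something along these lines, where _Fiber and _Generator are hypothetical classes built on nested (scoped) continuations, and produce yields a value to the generator's iterator:

// Illustrative only: the generator's continuation yields values to the iterator,
// while sleep suspends the *enclosing* fiber's continuation, not just the generator's.
new _Fiber(() -> {
    Iterable<Integer> gen = new _Generator<Integer>(produce -> {
        produce.accept(1);
        _Fiber.sleep(100);   // suspends the enclosing fiber, across the generator scope
        produce.accept(2);
        _Fiber.sleep(100);
        produce.accept(3);
    });
    for (int x : gen) {
        System.out.println("Got: " + x);
    }
}).start();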
In the literature, nested continuations that allow such behavior are sometimes called "delimited continuations with multiple named prompts", but we'll call them scoped continuations. See this blog post for a discussion of the theoretical expressivity of scoped continuations (for those interested: continuations are a "general effect", and can be used to implement any effect — e.g. assignment — even in a pure language that has no other side-effect; this is why, in some sense, continuations are the fundamental abstraction of imperative programming).
Code running inside a continuation is not expected to have a reference to the continuation, and the scopes normally have some fixed names (so suspending scope A would suspend the innermost enclosing continuation of scope A). However, the yield point provides a mechanism to pass information from the code to the continuation instance and back. When a continuation suspends, no try/finally blocks enclosing the yield point are triggered (i.e., code running in a continuation cannot detect that it is in the process of suspending).
One of the reasons for implementing continuations as a construct independent of fibers (whether or not they are exposed as a public API) is a clear separation of concerns. Continuations, therefore, are not thread-safe, and none of their operations creates cross-thread happens-before relations. Establishing the memory visibility guarantees necessary for migrating continuations from one kernel thread to another is the responsibility of the fiber implementation.
A rough outline of a possible API is presented below. Continuations are a very low-level primitive that will only be used by library authors to build higher-level constructs (just as java.util.Stream implementations leverage Spliterator). It is expected that classes making use of continuations will have a private instance of the continuation class, or even, more likely, of a subclass of it, and that the continuation instance will not be directly exposed to consumers of the construct.
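A sketch of such an outline, inferred from the run/suspend description that follows; the class and method shapes, including the _Scope type, are assumptions:

import java.util.function.Consumer;

class _Continuation {
    // Creates a continuation in the given scope, whose entry point is the given body.
    public _Continuation(_Scope scope, Runnable body);

    // Runs the continuation from its entry point, or from its last yield point.
    // Returns true if the continuation has terminated, false if it has suspended.
    public boolean run();

    // Suspends the innermost enclosing continuation of the given scope. The ccc callback may
    // inject information into that continuation instance; the instance is also returned, so
    // the code at the yield point can query information from it.
    public static _Continuation suspend(_Scope scope, Consumer<_Continuation> ccc);
}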
The run method returns true when the continuation terminates, and false if it suspends. The suspend method allows passing information from the yield point to the continuation (using the ccc callback that can inject information into the continuation instance it is given), and back from the continuation to the suspension point (using the return value, which is the continuation instance itself, from which information can be queried).
To demonstrate how easily fibers can be implemented in terms of continuations, here is a partial, simplistic implementation of a _Fiber class representing a fiber. As you'll note, most of the code maintains the fiber's state, to ensure it doesn't get scheduled more than once concurrently:
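What follows is a sketch along those lines rather than a definitive implementation: a fiber built on the hypothetical _Continuation/_Scope API above, scheduled on an Executor, with an atomic state machine guarding against concurrent scheduling:

import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;

class _Fiber {
    private enum State { NEW, LEASED, RUNNABLE, PAUSED, DONE }

    private static final _Scope _FIBER_SCOPE = new _Scope();   // hypothetical scope object

    private final _Continuation cont;
    private final Executor scheduler;
    private final AtomicReference<State> state = new AtomicReference<>(State.NEW);

    private final Runnable task = () -> {
        while (true) {
            boolean done = cont.run();              // run until termination or suspension
            if (done) { state.set(State.DONE); return; }
            // The continuation suspended (parked). If an unpark arrived while we were running,
            // the state is LEASED: loop and run again; otherwise record that we are paused.
            State prev = state.getAndUpdate(s -> s == State.LEASED ? State.RUNNABLE : State.PAUSED);
            if (prev != State.LEASED) return;       // stay paused until unpark() reschedules us
        }
    };

    _Fiber(Runnable body, Executor scheduler) {
        this.scheduler = scheduler;
        this.cont = new _Continuation(_FIBER_SCOPE, body);
    }

    void start() {
        if (!state.compareAndSet(State.NEW, State.RUNNABLE))
            throw new IllegalStateException("already started");
        scheduler.execute(task);
    }

    // Called by code running inside the fiber; suspends the underlying continuation,
    // returning control to task, which decides whether to pause or run again.
    // (A fuller version would expose static current()/park() helpers via a ThreadLocal.)
    void park() {
        _Continuation.suspend(_FIBER_SCOPE, c -> {});
    }

    // Moves a paused fiber back to its scheduler. If the fiber is still running or queued,
    // leave it a "lease" so it loops around instead of pausing, avoiding a lost wakeup.
    void unpark() {
        State prev = state.getAndUpdate(s -> s == State.PAUSED   ? State.RUNNABLE
                                           : s == State.RUNNABLE ? State.LEASED
                                           : s);
        if (prev == State.PAUSED) scheduler.execute(task);
    }
}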
Schedulers
As mentioned above, work-stealing schedulers like ForkJoinPool are particularly well-suited to scheduling threads that tend to block often and communicate over IO or with other threads. Fibers, however, will have pluggable schedulers, and users will be able to write their own (the SPI for a scheduler can be as simple as that of Executor). Based on prior experience, it is expected that ForkJoinPool in asynchronous mode can serve as an excellent default fiber scheduler for most uses, but we may want to explore one or two simpler designs as well, such as a pinned scheduler that always schedules a given fiber to a specific kernel thread (which is assumed to be pinned to a processor).
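As a sketch of how simple such a pinned scheduler could be if the scheduler SPI is just Executor (an assumption stated above):

import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Every fiber handed to this scheduler runs on one and the same worker kernel thread.
class PinnedScheduler implements Executor {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    @Override
    public void execute(Runnable fiberTask) {
        worker.execute(fiberTask);   // always the same kernel thread
    }
}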
Unwind-and-Invoke
Unlike continuations, the contents of the unwound stack frames are not preserved, and there is no need for any object reifying this construct.
TBD
Additional Challenges
While the main motivation for this goal is to make concurrency easier/more scalable, a thread implemented by the Java runtime, and over which the runtime has more control, has other benefits. For example, such a thread could be paused and serialized on one machine and then deserialized and resumed on another. This is useful in distributed systems where code could benefit from being relocated closer to the data it accesses, or in a cloud platform offering function-as-a-service, where the machine instance running user code could be terminated while that code awaits some external event, and later resumed on another instance, possibly on a different physical machine, thus making better use of available resources and reducing costs for both host and client. A fiber would then have methods like parkAndSerialize and deserializeAndUnpark.
As we want fibers to be serializable, continuations should be serializable as well. If they are serializable, we might as well make them cloneable, as the ability to clone continuations actually adds expressivity (as it allows going back to a previous suspension point). It is, however, a very serious challenge to make continuation cloning useful enough for such uses, as Java code stores a lot of information off-stack, and to be useful, cloning would need to be "deep" in some customizable way.
Other Approaches
An alternative to fibers as a solution to concurrency's simplicity-vs.-performance problem is known as async/await; it has been adopted by C# and Node.js, and will likely be adopted by standard JavaScript. Continuations and fibers dominate async/await in the sense that async/await is easily implemented with continuations (in fact, it can be implemented with a weak form of delimited continuations known as stackless continuations, which don't capture an entire call stack but only the local context of a single subroutine), but not vice versa.
While implementing async/await is easier than full-blown continuations and fibers, that solution falls far short of addressing the problem. While async/await makes code simpler and gives it the appearance of normal, sequential code, like asynchronous code it still requires significant changes to existing code, explicit support in libraries, and does not interoperate well with synchronous code. In other words, it does not solve what's known as the "colored function" problem.