Java 7 -- Fork/Join framework

What is Fork/Join?

The Fork/Join framework, introduced in Java 7, is a framework for executing tasks in parallel. The idea is to split a large task into smaller tasks, which can themselves be split further. The result of each small task is computed separately, and the results are then combined; the combined result is the result of the large task. The idea is very similar to MapReduce. Splitting only works if the subtasks are independent of each other, so that they can run in parallel without affecting one another.

  • ForkJoinPool

ForkJoinPool is the task scheduler of the Fork/Join framework. Like ThreadPoolExecutor, it is an ExecutorService that manages its own pool of worker threads, and it provides three ways to hand it a task:

  1. execute: runs the given task asynchronously; nothing is returned to the caller;
  2. invoke, invokeAll: run the given task(s), wait until they complete, and then return the result(s);
  3. submit: runs the given task asynchronously and immediately returns a Future (for a ForkJoinTask, the task itself) from which the result can be retrieved later;
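The three ways of handing work to a pool can be sketched as follows. Square and Hello are hypothetical tasks made up for this illustration; execute, invoke and submit are the real ForkJoinPool methods. Because execute does not wait, the "hello" line may be printed before or after the other lines.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.TimeUnit;

class PoolMethodsDemo {

    // Hypothetical task with a result: returns n * n.
    static class Square extends RecursiveTask<Integer> {
        private final int n;

        Square(int n) {
            this.n = n;
        }

        @Override
        protected Integer compute() {
            return n * n;
        }
    }

    // Hypothetical task without a result: just prints a message.
    static class Hello extends RecursiveAction {
        @Override
        protected void compute() {
            System.out.println("hello from " + Thread.currentThread().getName());
        }
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        ForkJoinPool pool = new ForkJoinPool();

        // execute: fire and forget, returns void.
        pool.execute(new Hello());

        // invoke: runs the task and blocks the caller until the result is ready.
        int nine = pool.invoke(new Square(3));
        System.out.println("invoke -> " + nine);

        // submit: returns the task (a Future) at once; get() waits for the result.
        ForkJoinTask<Integer> future = pool.submit(new Square(4));
        System.out.println("submit -> " + future.get());

        // Let already submitted tasks (such as Hello) finish, then stop the pool.
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```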
  • ForkJoinTask

ForkJoinTask is the base class of the tasks that run in a ForkJoinPool. Instead of extending it directly, you normally extend one of its two subclasses:

  1. RecursiveAction: for subtasks that do not return a result;
  2. RecursiveTask<V>: for subtasks that return a result of type V;
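A minimal RecursiveAction sketch, assuming a hypothetical DoubleAll task that doubles every element of an int array in place: compute() returns nothing, and the work shows up as a side effect on the array. The two halves touch disjoint index ranges, which is the independence between subtasks the framework relies on. The threshold of 4 is an arbitrary value chosen so that even a ten-element array gets split.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical RecursiveAction: doubles every element of a[lo, hi) in place.
class DoubleAll extends RecursiveAction {
    private static final int THRESHOLD = 4; // deliberately tiny so the demo splits
    private final int[] a;
    private final int lo;
    private final int hi;

    DoubleAll(int[] a, int lo, int hi) {
        this.a = a;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            // Small enough: do the work directly.
            for (int i = lo; i < hi; i++) {
                a[i] *= 2;
            }
            return;
        }
        int mid = (lo + hi) >>> 1;
        DoubleAll left = new DoubleAll(a, lo, mid);
        DoubleAll right = new DoubleAll(a, mid, hi);
        left.fork();     // run the left half asynchronously
        right.compute(); // do the right half in the current thread
        left.join();     // wait for the left half; there is no result to combine
    }

    static void doubleAll(int[] a) {
        new ForkJoinPool().invoke(new DoubleAll(a, 0, a.length));
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        doubleAll(data);
        System.out.println(Arrays.toString(data)); // [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
    }
}
```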

Fork/Join in practice

The following small example computes 1 + 2 + ... + 1,000,000,000. Each task adds up a range of at most about 1,000 numbers directly; larger ranges are split into smaller tasks that are processed in parallel. The example also compares the running time of a plain serial loop with that of the Fork/Join version.

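import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Save as SumTask.java. The class is not called ForkJoinTask so that it does
// not shadow java.util.concurrent.ForkJoinTask, the superclass of RecursiveTask.
public class SumTask extends RecursiveTask<Long> {

    private static final long MAX = 1000000000L;  // add up 1..MAX
    private static final long THRESHOLD = 1000L;  // ranges with end - start <= THRESHOLD are summed directly
    private final long start;
    private final long end;

    public SumTask(long start, long end) {
        this.start = start;
        this.end = end;
    }

    public static void main(String[] args) {
        test();
        System.out.println("--------------------");
        testForkJoin();
    }

    // Serial baseline: one thread adds 1..MAX in a plain loop.
    private static void test() {
        System.out.println("test");
        long start = System.currentTimeMillis();
        // sum is a boxed Long, so every += unboxes it and boxes the new total.
        Long sum = 0L;
        for (long i = 1L; i <= MAX; i++) {
            sum += i;
        }
        System.out.println(sum);
        System.out.println(System.currentTimeMillis() - start + "ms");
    }

    // Parallel version: the pool runs SumTask, which splits the range recursively.
    private static void testForkJoin() {
        System.out.println("testForkJoin");
        long start = System.currentTimeMillis();
        // The no-argument constructor sets parallelism to the number of available processors.
        ForkJoinPool forkJoinPool = new ForkJoinPool();
        Long sum = forkJoinPool.invoke(new SumTask(1, MAX));
        System.out.println(sum);
        System.out.println(System.currentTimeMillis() - start + "ms");
    }

    @Override
    protected Long compute() {
        if (end - start <= THRESHOLD) {
            // Small enough: add the numbers in this range directly.
            long sum = 0;
            for (long i = start; i <= end; i++) {
                sum += i;
            }
            return sum;
        }
        // Too large: split the range into two halves.
        long mid = (start + end) / 2;
        SumTask left = new SumTask(start, mid);
        SumTask right = new SumTask(mid + 1, end);

        left.fork();                      // schedule the left half asynchronously
        long rightSum = right.compute();  // compute the right half in the current thread
        return left.join() + rightSum;    // wait for the left half and combine
    }

}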

The task has to return a result, so it extends RecursiveTask<Long> and implements the compute() method. compute() first checks whether the range is within the threshold (end - start <= 1000). If so, it adds the numbers directly and returns the sum; otherwise it splits the range into two subtasks. fork() schedules a subtask to run asynchronously in the pool; when a worker thread runs that subtask, its own compute() method decides in the same way whether to split it further or add it up directly. join() waits for a subtask to finish and returns its result, and the partial sums are added to give the result of the parent task.
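ForkJoinTask also offers the static invokeAll(t1, t2), which forks both subtasks and returns once both are done, as an alternative to calling fork() and join() by hand. The sketch below applies it to a hypothetical ArraySum task that sums a slice of a long[]; the class name and the sum() helper are illustrative only, and creating a fresh pool per call just keeps the example short.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical task: sums values[lo, hi), splitting with invokeAll.
class ArraySum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1000; // same cutoff idea as in the article
    private final long[] values;
    private final int lo;
    private final int hi;

    ArraySum(long[] values, int lo, int hi) {
        this.values = values;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            long sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        ArraySum left = new ArraySum(values, lo, mid);
        ArraySum right = new ArraySum(values, mid, hi);
        // invokeAll forks the subtasks and returns once both have completed,
        // so the join() calls below return immediately.
        invokeAll(left, right);
        return left.join() + right.join();
    }

    static long sum(long[] values) {
        return new ForkJoinPool().invoke(new ArraySum(values, 0, values.length));
    }

    public static void main(String[] args) {
        long[] values = new long[100_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;
        }
        System.out.println(sum(values)); // 5000050000
    }
}
```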

Program output:

test
500000000500000000
4992ms
--------------------
testForkJoin
500000000500000000
508ms

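In this run the Fork/Join version took about a tenth of the time of the serial loop, which shows the benefit of running the work in parallel. Part of the gap, however, comes from test() accumulating into a boxed Long: every sum += i unboxes the total and boxes the new value, so a serial loop over a primitive long would be much faster, and the real speedup is smaller than these numbers suggest. Timings also depend on the number of cores and on JIT warm-up.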

Even so, Fork/Join should not be used blindly:

  1. If tasks are split too finely, the cost of creating and scheduling a huge number of tiny tasks can outweigh the gain, and the pool may add extra worker threads to compensate for threads waiting in join(), which can seriously degrade system performance;
  2. If the recursion is very deep, the call stack can overflow (StackOverflowError);
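To get a feel for how the threshold drives the number of tasks and the recursion depth, the hypothetical SplitStats helper below walks the same split rule as compute() in the example (a range is a leaf when end - start <= threshold, otherwise it is halved at mid = (start + end) / 2) without running any tasks. For the range 1..1,000,000,000 with a threshold of 1000 it counts 2^20 = 1,048,576 leaf tasks, all at depth 20. Because each split halves the range, the depth grows only with log2 of (range size / threshold); splitting off small pieces one at a time instead would make the depth grow linearly and could easily overflow the stack.

```java
// Hypothetical helper, not part of Fork/Join: applies the split rule of
// compute() (leaf when end - start <= threshold, otherwise split at
// mid = (start + end) / 2) and records how many leaves and how deep it goes.
class SplitStats {
    long leaves;
    int maxDepth;

    private void walk(long start, long end, long threshold, int depth) {
        if (end - start <= threshold) {
            leaves++;
            maxDepth = Math.max(maxDepth, depth);
            return;
        }
        long mid = (start + end) / 2;
        walk(start, mid, threshold, depth + 1);
        walk(mid + 1, end, threshold, depth + 1);
    }

    static SplitStats of(long start, long end, long threshold) {
        SplitStats stats = new SplitStats();
        stats.walk(start, end, threshold, 0);
        return stats;
    }

    public static void main(String[] args) {
        long[] thresholds = {1000L, 100000L, 10000000L};
        for (long t : thresholds) {
            SplitStats s = SplitStats.of(1, 1000000000L, t);
            System.out.println("threshold " + t + ": " + s.leaves
                    + " leaf tasks, depth " + s.maxDepth);
        }
    }
}
```

Raising the threshold cuts the task count sharply, at the price of fewer chunks available for balancing work across cores.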


Posted on Fri, 12 Nov 2021 05:54:52 -0500 by jasonla