问题描述

如果父子任务共用一个线程池,可能会因为子任务进队列,导致父任务一直等待,出现线程卡死

样例代码

private static final Executor POOL = new ThreadPoolExecutor(3, 100, 1, TimeUnit.MINUTES, new ArrayBlockingQueue<>(100));

 public static void main(String[] args) {
    parentTask();
 }

private static void parentTask() {
    CountDownLatch parentLatch = new CountDownLatch(3);
    for (int i = 0; i < 3; i++) {
        int finalI = i;
        POOL.execute(() -> {
            System.out.println("开始执行父任务" + finalI);
            childTask(finalI);
            System.out.println("结束执行父任务" + finalI);
            parentLatch.countDown();
        });
    }
    parentLatch.await();
    System.err.println("全部任务执行完成");
}

private static void childTask(int i) {
    CountDownLatch childLatch = new CountDownLatch(3);
    for (int j = 0; j < 3; j++) {
        int finalJ = j;
        POOL.execute(() -> {
            System.out.println("开始执行父任务" + i + "的子任务" + finalJ);
            System.out.println("执行中");
            System.out.println("结束执行父任务" + i + "的子任务" + finalJ);
            childLatch.countDown();
        });
    }
    childLatch.await();
}

执行,输出了三条 log 后卡死:

开始执行父任务0
开始执行父任务1
开始执行父任务2

父任务线程状态为 WAITING,阻塞在 parentLatch.await()

"main" #1 prio=5 os_prio=0 tid=0x000001198ac32000 nid=0x3010 waiting on condition [0x0000000419aff000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000007425cf698> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
	at me.xxx.Test.parentTask(Test.java:80)
	at me.xxx.Test.main(Test.java:22)

子任务线程状态为 WAITING,阻塞在 childLatch.await()

"pool-1-thread-1" #15 prio=5 os_prio=0 tid=0x000001494886d800 nid=0xb44 waiting on condition [0x0000002fcf2fe000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <merged>(a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
	at me.xxx.Test.childTask(Test.java:96)
	at me.xxx.Test.lambda$parentTask$0(Test.java:75)
	at me.xxx.Test$$Lambda$1/589873731.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

问题原因

Java 线程池的使用顺序:

  1. 核心线程
  2. 队列
  3. 最大线程
  4. 拒绝策略

父任务占满了核心线程,子任务进队列,但未超过队列长度,于是一直等待空闲的核心线程,所以不会执行到子任务的 ` childLatch.countDown()`,出现卡死

规避方案

业务中关键逻辑,使用独自的线程池,避免父子任务线程池共用,尽量做到线程池隔离