Processing GB-scale files quickly

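A memory-mapped file maps a region of virtual memory byte for byte onto a file. Once a file is mapped, you can treat it as if it had already been read into memory in full and access it like one very large array, which is usually much faster than reading it through a stream.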
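To make the "very large array" idea concrete, here is a minimal sketch that maps a small scratch file and reads one byte by index. It uses Java's FileChannel and MappedByteBuffer; the class name MappedArrayDemo, the helper byteAt, and the temp file are made up for illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedArrayDemo {

    // Hypothetical helper: returns the byte at `index` by reading through a read-only mapping.
    // Assumes the file is no larger than Integer.MAX_VALUE bytes.
    static byte byteAt(Path file, int index) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Absolute get: random access by position, much like array[index]
            return buffer.get(index);
        }
    }

    public static void main(String[] args) throws IOException {
        // Scratch file created only for this demo
        Path file = Files.createTempFile("mapped-demo", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[] {10, 20, 30, 40});

        System.out.println(byteAt(file, 2)); // prints 30
    }
}
```

No explicit read call appears anywhere: the operating system pages the file contents in as the buffer is touched.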

MappedByteBuffer

In Java, memory mapping is done mainly through MappedByteBuffer, which is usually obtained from a FileChannel:

public MappedByteBuffer map(MapMode mode, long position, long size) throws IOException

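Parameters

mode is the file mapping mode. There are three: READ_ONLY, READ_WRITE, and PRIVATE

position is the position in the file at which the mapped region starts

size is the size of the region to map. It must be non-negative and no greater than Integer.MAX_VALUE; if the file is larger than that, map it as several separate regions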
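The "several regions" approach for oversized files might look like the sketch below, which walks a file window by window and sums its bytes. The class name ChunkedMapping, the method sumBytes, and the chunkSize parameter are assumptions made for this example, not part of the original article.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChunkedMapping {

    // Sums every byte of the file (as an unsigned value) by mapping one window at a time.
    // chunkSize is a hypothetical parameter: it must be positive and at most Integer.MAX_VALUE.
    static long sumBytes(Path file, long chunkSize) throws IOException {
        if (chunkSize <= 0 || chunkSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("chunkSize must be in 1..Integer.MAX_VALUE");
        }
        long sum = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long fileSize = channel.size();
            for (long position = 0; position < fileSize; position += chunkSize) {
                // The last window may be shorter than chunkSize
                long length = Math.min(chunkSize, fileSize - position);
                MappedByteBuffer window =
                        channel.map(FileChannel.MapMode.READ_ONLY, position, length);
                while (window.hasRemaining()) {
                    sum += window.get() & 0xFF;
                }
            }
        }
        return sum;
    }
}
```

For a real multi-gigabyte file, a call such as sumBytes(path, 1L << 30) would use 1 GiB windows; the window size here is only a tuning choice, as long as each window stays within the Integer.MAX_VALUE limit.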

Comparing read speed

I had no large file on my machine, so I compressed IntelliJ IDEA into a zip; the resulting archive is 811.7 MB

FileInputStream

import java.io.FileInputStream;

private static final String PATH = "/Users/victor/Downloads/IntelliJ IDEA.zip";

public static void main(String[] args) throws Exception {
    // Time one full pass over the file, in milliseconds
    var t = System.currentTimeMillis();
    fileInputStream();
    System.out.println(System.currentTimeMillis() - t);
}

// Reads the file one byte at a time with no buffering
private static void fileInputStream() throws Exception {
    try (var is = new FileInputStream(PATH)) {
        while (is.read() != -1) { }
    }
}

Result: I ran out of patience before it printed anything. Far too slow.

BufferedInputStream

import java.io.BufferedInputStream;
import java.io.FileInputStream;

private static final String PATH = "/Users/victor/Downloads/IntelliJ IDEA.zip";

public static void main(String[] args) throws Exception {
    // Time one full pass over the file, in milliseconds
    var t = System.currentTimeMillis();
    bufferedInputStream();
    System.out.println(System.currentTimeMillis() - t);
}

// Reads one byte at a time, but each read() is served from an in-memory buffer
private static void bufferedInputStream() throws Exception {
    try (var is = new BufferedInputStream(new FileInputStream(PATH))) {
        while (is.read() != -1) { }
    }
}

Result: 3987 ms

MappedByteBuffer

import java.nio.channels.FileChannel;
import java.nio.file.Paths;

private static final String PATH = "/Users/victor/Downloads/IntelliJ IDEA.zip";

public static void main(String[] args) throws Exception {
    // Time one full pass over the file, in milliseconds
    var t = System.currentTimeMillis();
    mappedByteBuffer();
    System.out.println(System.currentTimeMillis() - t);
}

// Maps the whole file read-only and touches every byte by index.
// A single map() call only works for files up to Integer.MAX_VALUE bytes.
private static void mappedByteBuffer() throws Exception {
    try (var fileChannel = FileChannel.open(Paths.get(PATH))) {
        var size = fileChannel.size();
        var mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, size);
        for (var i = 0; i < size; i++) {
            mappedByteBuffer.get(i);
        }
    }
}

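Result: 1610 ms

In this test MappedByteBuffer finished in 1610 ms against 3987 ms for BufferedInputStream, roughly 2.5 times faster, while unbuffered FileInputStream did not finish in a reasonable time.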

Notes