Reading and Writing Files with NIO & Reading and Writing HDFS Files with Buffered Streams

Reading and writing files with NIO

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public void nioTest() {
    String path = "D:\\BaiduYunDownload\\access_2013_05_30.log";
    String outputPath = "D:\\BaiduYunDownload\\access.log";
    File outputFile = new File(outputPath);
    // Create the output directory if it is missing. mkdirs() must be called on the
    // parent directory of the output file, not on the input file.
    File parentFile = outputFile.getParentFile();
    if (parentFile != null && !parentFile.exists()) {
        parentFile.mkdirs();
    }
    int capacity = 1024;
    ByteBuffer buffer = ByteBuffer.allocate(capacity);
    // try-with-resources closes the streams and their channels even if the copy fails,
    // and avoids a NullPointerException when a file cannot be opened.
    // FileOutputStream creates the output file itself, so createNewFile() is not needed.
    try (FileInputStream fis = new FileInputStream(path);
         FileOutputStream fos = new FileOutputStream(outputFile);
         FileChannel fileChannel = fis.getChannel();
         FileChannel outputFileChannel = fos.getChannel()) {
        while (fileChannel.read(buffer) != -1) {
            // Switch the buffer from being filled to being drained.
            buffer.flip();
            // A single write() is not guaranteed to drain the whole buffer.
            while (buffer.hasRemaining()) {
                outputFileChannel.write(buffer);
            }
            // Make the whole buffer available for the next read.
            buffer.clear();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

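General idea:

Create one channel for the input file stream and one for the output file stream, set the buffer size, and read bytes from the input channel into the buffer. Then call flip() to switch the buffer from being filled to being drained, write the buffered bytes to the output channel, clear the buffer, and repeat until the input is exhausted.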
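To make the fill, flip, drain, clear cycle more concrete, the following minimal sketch records the buffer's position and limit at each step. The class name BufferStateDemo and the put()/get() calls standing in for channel reads and writes are assumptions made for illustration only; they are not part of the original code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class BufferStateDemo {
    // Returns {position, limit} recorded after filling, after flip(), and after clear().
    static int[] trace() {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        // Filling the buffer: this is what channel.read(buffer) does.
        buffer.put((byte) 1).put((byte) 2).put((byte) 3);
        int[] state = new int[6];
        state[0] = buffer.position(); // 3 bytes have been put
        state[1] = buffer.limit();    // limit is still the capacity, 8
        buffer.flip();                // limit = old position, position = 0
        state[2] = buffer.position();
        state[3] = buffer.limit();
        // Draining the buffer: this is what channel.write(buffer) does.
        while (buffer.hasRemaining()) {
            buffer.get();
        }
        buffer.clear();               // position = 0, limit = capacity
        state[4] = buffer.position();
        state[5] = buffer.limit();
        return state;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(trace())); // prints [3, 8, 0, 3, 0, 8]
    }
}
```

Without flip(), the write would start at position 3 and send the unused tail of the buffer instead of the three bytes that were just read.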

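Special case:

When the text contains Chinese characters, the chosen capacity may cause a read to stop in the middle of a character (only half of it is read), which in theory should produce garbled Chinese. However, several test cases run under JDK 1.8 showed no garbled text. This is consistent with how the copy works: the bytes are written out exactly as they were read and are never decoded, so a split character is reassembled in the output file. The problem can only appear when each buffer is decoded into text on its own. If garbled Chinese does appear, the following code can be tried: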

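Charset charset = Charset.forName("GBK");
// Decodes the bytes between position and limit, so call buffer.flip() first.
CharBuffer charBuffer = charset.decode(buffer);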

If that still does not work, the garbled text has to be fixed by stitching the split Chinese characters back together.
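One possible way to do the stitching, sketched here under assumptions, is to let a CharsetDecoder keep the incomplete trailing bytes of one chunk and prepend them to the next. The class ChunkedDecodeDemo, the method decodeInChunks, and the use of a byte array in place of a file channel are hypothetical names and simplifications; UTF-8 is used in the usage example only so that it runs on any JRE, and GBK can be passed instead.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

class ChunkedDecodeDemo {
    // Decodes data in chunks of chunkSize bytes; bytes of a character that is
    // split across two chunks are carried over instead of being decoded alone.
    static String decodeInChunks(byte[] data, int chunkSize, Charset charset) {
        if (data.length == 0) {
            return "";
        }
        CharsetDecoder decoder = charset.newDecoder();
        // A few spare bytes leave room for the carried-over partial character.
        ByteBuffer buffer = ByteBuffer.allocate(chunkSize + 4);
        CharBuffer chars = CharBuffer.allocate(chunkSize + 4);
        StringBuilder text = new StringBuilder();
        int offset = 0;
        while (offset < data.length) {
            int n = Math.min(chunkSize, data.length - offset);
            buffer.put(data, offset, n);   // stands in for channel.read(buffer)
            offset += n;
            buffer.flip();
            boolean endOfInput = offset == data.length;
            decoder.decode(buffer, chars, endOfInput);
            chars.flip();
            text.append(chars);
            chars.clear();
            buffer.compact();              // keep undecoded bytes for the next round
        }
        decoder.flush(chars);
        chars.flip();
        text.append(chars);
        return text.toString();
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587 access log"; // two Chinese characters plus ASCII
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
        // A chunk size of 2 splits every 3-byte character across chunks.
        System.out.println(original.equals(decodeInChunks(bytes, 2, StandardCharsets.UTF_8)));
    }
}
```

The key design choice is compact() rather than clear(): clear() would discard the first half of a split character, while compact() moves it to the front of the buffer so the next chunk completes it.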

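Why is NIO fast at reading files?

The usual explanation is that NIO is non-blocking: when a read is issued, the call returns the data if the kernel I/O operation has completed, and otherwise returns immediately without data instead of waiting.

A blocking read, by contrast, makes the calling thread wait until the data is ready, which costs time. Note, however, that non-blocking mode only applies to selectable channels such as sockets. FileChannel is always blocking, so for the file copy above the speed-up more likely comes from moving data in whole buffer-sized blocks rather than one byte at a time.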

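Further notes:

NIO is not only non-blocking but also synchronous. Client connections are registered with a multiplexer (Selector), the multiplexer polls them for I/O events, and when a channel has an I/O request ready, a thread is started to handle it; the application still performs the read or write itself. In the asynchronous model, the I/O is carried out in the background, and a thread is started to handle the result only after a completion notification arrives.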
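The behaviour described above can be observed with a small sketch. It assumes a java.nio.channels.Pipe as a stand-in for a client connection so that it runs without any network setup; the class name NonBlockingPipeDemo and the method demo are hypothetical. A non-blocking read returns 0 immediately when nothing is ready, and the Selector reports when the channel becomes readable.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

class NonBlockingPipeDemo {
    // Returns {bytes read before any data was written, bytes read after the selector fired}.
    static int[] demo() {
        try {
            Pipe pipe = Pipe.open();
            try (Pipe.SourceChannel source = pipe.source();
                 Pipe.SinkChannel sink = pipe.sink();
                 Selector selector = Selector.open()) {
                source.configureBlocking(false);
                ByteBuffer buffer = ByteBuffer.allocate(16);
                // Nothing has been written yet: the call returns at once with 0 bytes.
                int emptyRead = source.read(buffer);
                // Register with the multiplexer and ask to be told about readable data.
                source.register(selector, SelectionKey.OP_READ);
                sink.write(ByteBuffer.wrap("hi".getBytes(StandardCharsets.UTF_8)));
                // select() waits until at least one registered channel is ready.
                selector.select();
                // The application thread still performs the read itself (synchronous I/O).
                int readyRead = source.read(buffer);
                return new int[] {emptyRead, readyRead};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("before data: " + result[0] + ", after select: " + result[1]);
    }
}
```

In a real server the registered channels would be SocketChannel instances, and one selector thread would typically hand ready channels over to worker threads.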

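The difference between clear() and compact()

clear() makes the whole buffer available for writing again: it sets position to 0 and limit to capacity without actually erasing the bytes, so any unread data will be overwritten. compact() discards only the bytes that have already been read: it copies the unread bytes to the start of the buffer and sets position just after them, so new data is appended behind them.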
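The difference can be checked with a short sketch. The class ClearVsCompactDemo and its helper methods are hypothetical names chosen for illustration; the buffer holds four bytes of which two have already been read.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class ClearVsCompactDemo {
    // Builds a buffer holding 10, 20, 30, 40 in which 10 and 20 were already read.
    static ByteBuffer halfRead() {
        ByteBuffer buffer = ByteBuffer.allocate(4);
        buffer.put(new byte[] {10, 20, 30, 40});
        buffer.flip();
        buffer.get(); // reads 10
        buffer.get(); // reads 20
        return buffer;
    }

    // Returns {position, limit, byte at index 0} after clear().
    static int[] afterClear() {
        ByteBuffer buffer = halfRead();
        buffer.clear();
        // The bytes are not erased, but the next fill starts at index 0,
        // so the unread 30 and 40 are effectively lost.
        return new int[] {buffer.position(), buffer.limit(), buffer.get(0)};
    }

    // Returns {position, limit, byte at index 0, byte at index 1} after compact().
    static int[] afterCompact() {
        ByteBuffer buffer = halfRead();
        buffer.compact();
        // The unread 30 and 40 were moved to the front; the next fill starts at index 2.
        return new int[] {buffer.position(), buffer.limit(), buffer.get(0), buffer.get(1)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(afterClear()));   // prints [0, 4, 10]
        System.out.println(Arrays.toString(afterCompact())); // prints [2, 4, 30, 40]
    }
}
```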

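Why use the traditional approach, buffered streams, to read HDFS files?

Reading and writing HDFS files goes through FSDataInputStream and FSDataOutputStream. Unlike FileInputStream and FileOutputStream, neither of these streams offers a getChannel() method, so the FileChannel-based code above cannot be applied to them directly and buffered streams are used instead.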
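As a hedged aside, java.nio.channels.Channels.newChannel can wrap any InputStream or OutputStream as a channel, which would allow the same buffer loop to be reused. The sketch below uses in-memory streams as stand-ins for the HDFS streams (an assumption, so that it runs without a Hadoop cluster); the class StreamChannelCopy is a hypothetical name, and whether this brings any speed benefit on HDFS is not something the tests described here cover.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

class StreamChannelCopy {
    // Copies all bytes from in to out through channel adapters; closes both streams.
    static void copy(InputStream in, OutputStream out) {
        try (ReadableByteChannel source = Channels.newChannel(in);
             WritableByteChannel sink = Channels.newChannel(out)) {
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            while (source.read(buffer) != -1) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    sink.write(buffer);
                }
                buffer.clear();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "line one\nline two\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // With Hadoop on the classpath, the arguments could instead be the streams
        // returned by fileSystem.open(...) and fileSystem.create(...).
        copy(new ByteArrayInputStream(data), out);
        System.out.println(out.size() + " bytes copied");
    }
}
```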

Code for reading and writing HDFS files with buffered streams

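import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// getFileSystem() and url are defined elsewhere in the surrounding class.
public void readAndWriteHdfs() throws IOException {
    Path inputPath = new Path(url + "/access.log");
    Path outputPath = new Path(url + "/test.log");
    // try-with-resources closes everything in reverse order, even when an exception is thrown.
    try (FileSystem fileSystem = getFileSystem();
         FSDataInputStream fsDataInputStream = fileSystem.open(inputPath);
         BufferedReader bufferedReader = new BufferedReader(
                 new InputStreamReader(fsDataInputStream, StandardCharsets.UTF_8));
         FSDataOutputStream fsDataOutputStream = fileSystem.create(outputPath);
         BufferedWriter bufferedWriter = new BufferedWriter(
                 new OutputStreamWriter(fsDataOutputStream, StandardCharsets.UTF_8))) {
        String content;
        while ((content = bufferedReader.readLine()) != null) {
            // Write the decoded line directly. Converting it to bytes and back with
            // new String(bytes) would use the platform default charset and could garble it.
            bufferedWriter.write(content);
            // readLine() strips the line terminator, so add it back.
            bufferedWriter.write("\n");
        }
    }
}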

When reading line by line, a line break has to be appended to the end of every line that was read: "\n" on Unix and "\r\n" on Windows.
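The effect of readLine() on line breaks can be seen in the following sketch, which uses in-memory readers and writers instead of HDFS streams (an assumption for the sake of a self-contained example; LineCopyDemo and copyLines are hypothetical names). System.lineSeparator() or BufferedWriter.newLine() can be used when the separator of the current platform is wanted rather than a fixed one.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

class LineCopyDemo {
    // Copies the input line by line, ending every line with the given separator.
    static String copyLines(String input, String lineSeparator) {
        StringWriter out = new StringWriter();
        try (BufferedReader reader = new BufferedReader(new StringReader(input));
             BufferedWriter writer = new BufferedWriter(out)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() returns the line without "\n", "\r" or "\r\n".
                writer.write(line);
                writer.write(lineSeparator);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer has been closed and flushed by now.
        return out.toString();
    }

    public static void main(String[] args) {
        // Mixed Windows and Unix line endings are normalized to the chosen separator.
        String result = copyLines("first\r\nsecond\nthird", "\n");
        System.out.println(result.equals("first\nsecond\nthird\n")); // prints true
    }
}
```

Note that this approach always ends the output with a line break, even if the last line of the input had none.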

If the file is read byte by byte instead of line by line, garbled Chinese may appear; I have not found a solution for this yet.
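One approach that may help, offered here as a sketch rather than a verified fix for the HDFS case, is to read characters instead of bytes: an InputStreamReader keeps the incomplete bytes of a multi-byte character internally until the rest arrives, so even very small reads never split a character. The class CharChunkReadDemo and the in-memory input stream are assumptions made so that the example runs without Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class CharChunkReadDemo {
    // Reads the stream as UTF-8 text in chunks of chunkChars characters.
    static String readAll(InputStream in, int chunkChars) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] chunk = new char[chunkChars];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                // Only the first n entries of chunk are valid for this round.
                text.append(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587 log"; // two Chinese characters plus ASCII
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
        // Reading one character at a time still yields intact Chinese characters.
        System.out.println(original.equals(readAll(new ByteArrayInputStream(bytes), 1)));
    }
}
```

If the goal is only to copy the file, copying the raw bytes without decoding them also avoids the problem, since no character boundaries are involved.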