技术探索总结 > linux > linux_io

linux_io

IO操作两个阶段

再说一下IO发生时涉及的对象和步骤。以read函数举例，对于一个networkIO会涉及到两个系统对象，一个是调用这个IO的process (or thread)，另一个就是系统内核(kernel)。当一个read操作发生时，它会经历两个阶段：

（1）等待数据准备(Waitingfor the data to be ready)

（2）将数据从内核拷贝到进程中(Copyingthe data from the kernel to the process)

记住这两点很重要，因为这些IO Model的区别就是在两个阶段上各有不同的情况。是否阻塞说的是第一个阶段，即等待数据准备阶段是否会阻塞，而是否同步说的是第二阶段，即将数据从内核拷贝到进程这个真实的IO Operation操作阶段是否阻塞。

FileObserver基于inotify

Looper基于epoll

https://stackoverflow.com/questions/17207809/difference-between-inotify-and-epoll

inotify

inotify_init(void) creates inotify instance to read events from
inotify_add_watch(int fd, const char * path, int mask) returns a watch fd around the file node behind the path
inotify_rm_watch(int fd, int wd) stops watching for events on fd

epoll

epoll_create(void) creates epoll object
epoll_ctl(int epfd, int op, int fd, struct epoll_event * event) sets up events to watch
epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout); blocks until event happens

The biggest difference is that epoll can be used for ANY fd. This means it’s good for watching all types of ways to communicate data. Sockets, IPC, files, printers.. anything. inotify is for filesystems only.

However, because inotify is specific to filesystems, you can receive notifications on a wide array of filesystem-specific attributes, such as file attributes and the file being read. These things are not possible via epoll.

In fact, inotify returns a file descriptor - which means you can use epoll to determine which inotify FD’s you should call read on. So the two go hand in hand to some extent.

http://en.wikipedia.org/wiki/Inotify

https://jvns.ca/blog/2017/06/03/async-io-on-linux--select--poll--and-epoll/

select、poll、epoll之间的区别(搜狗面试)

select，poll，epoll

select，poll，epoll都是IO多路复用的机制。I/O多路复用就通过一种机制，可以监视多个描述符，一旦某个描述符就绪（一般是读就绪或者写就绪），能够通知程序进行相应的读写操作。但select，poll，epoll本质上都是同步I/O，因为他们都需要在读写事件就绪后自己负责进行读写，也就是说这个读写过程是阻塞的，而异步I/O则无需自己负责进行读写，异步I/O的实现会负责把数据从内核拷贝到用户空间。

epoll跟select都能提供多路I/O复用的解决方案。在现在的Linux内核里有都能够支持，其中epoll是Linux所特有，而select则应该是POSIX所规定，一般操作系统均有实现

graph LR
水平触发-->报告了fd后,没有被处理,那么下次poll时会再次报告该fd
边沿触发-->只会通知你一次,直到该文件描述符上出现第二次可读写事件才会通知你

Select

select本质上是通过设置或者检查存放fd标志位的数据结构来进行下一步处理。这样所带来的缺点是：

select的几大缺点：

（1）每次调用select，都需要把fd集合从用户态拷贝到内核态，这个开销在fd很多时会很大，当套接字比较多的时候，每次select()都要通过遍历FD_SETSIZE个Socket来完成调度,不管哪个Socket是活跃的,都遍历一遍。这会浪费很多CPU时间。如果能给套接字注册某个回调函数，当他们活跃时，自动完成相关操作，那就避免了轮询，这正是epoll做的。

（2）同时每次调用select都需要在内核遍历传递进来的所有fd，这个开销在fd很多时也很大，需要维护一个用来存放大量fd的数据结构，这样会使得用户空间和内核空间在传递该结构时复制开销大

**（3）**单个进程可监视的fd数量被限制，即能监听端口的大小有限。一般来说这个数目和系统内存关系很大，具体数目可以cat /proc/sys/fs/file-max察看。32位机默认是1024个。64位机默认是2048.

Poll

poll本质上和select没有区别，它将用户传入的数组拷贝到内核空间，然后查询每个fd对应的设备状态，如果设备就绪则在设备等待队列中加入一项并继续遍历，如果遍历完所有fd后没有发现就绪设备，则挂起当前进程，直到设备就绪或者主动超时，被唤醒后它又要再次遍历fd。这个过程经历了多次无谓的遍历。

它没有最大连接数的限制，原因是它是基于链表来存储的，但是同样有一个缺点：

1、大量的fd的数组被整体复制于用户态和内核地址空间之间，而不管这样的复制是不是有意义，它的开销随着文件描述符数量的增加而线性增大。

2、poll还有一个特点是“水平触发”，如果报告了fd后，没有被处理，那么下次poll时会再次报告该fd。

select & poll区别1

poll支持的事件类型更多更详细:

They both call a lot of the same functions. One thing that the book mentioned in particular is that poll returns a larger set of possible results for file descriptors like POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP | POLLERR while select just tells you “there’s input / there’s output / there’s an error”.

select translates from poll’s more detailed results (like POLLWRBAND) into a general “you can write”. You can see the code where it does this in Linux 4.10 here.

select & poll区别2

Select 单个进程可监视的fd数量被限制,32位机默认是1024个,64位机默认是2048.

select & poll区别3

描述fd集合的方式不同，poll使用pollfd结构而不是select的fd_set结构

android新版本源码中runSelectLoop方法内部实现从select改为poll

Epoll

Here are the steps to using epoll:

Call epoll_create to tell the kernel you’re gong to be epolling! It gives you an id back
Call epoll_ctl to tell the kernel file descriptors you’re interested in updates about. Interestingly, you can give it lots of different kinds of file descriptors (pipes, FIFOs, sockets, POSIX message queues, inotify instances, devices, & more), but not regular files. I think this makes sense – pipes & sockets have a pretty simple API (one process writes to the pipe, and another process reads!), so it makes sense to say “this pipe has new data for reading”. But files are weird! You can write to the middle of a file! So it doesn’t really make sense to say “there’s new data available for reading in this file”.
Call epoll_wait to wait for updates about the list of files you’re interested in.

epoll既然是对select和poll的改进，就应该能避免上述的三个缺点。那epoll都是怎么解决的呢？在此之前，我们先看一下epoll和select和poll的调用接口上的不同，select和poll都只提供了一个函数——select或者poll函数。而epoll提供了三个函数，epoll_create,epoll_ctl和epoll_wait，epoll_create是创建一个epoll句柄；epoll_ctl是注册要监听的事件类型；epoll_wait则是等待事件的产生。

　　对于第一个缺点，epoll的解决方案在epoll_ctl函数中。每次注册新的事件到epoll句柄中时（在epoll_ctl中指定EPOLL_CTL_ADD），会把所有的fd拷贝进内核，而不是在epoll_wait的时候重复拷贝。epoll保证了每个fd在整个过程中只会拷贝一次。

　　对于第二个缺点，epoll的解决方案不像select或poll一样每次都把current轮流加入fd对应的设备等待队列中，而只在epoll_ctl时把current挂一遍（这一遍必不可少）并为每个fd指定一个回调函数，当设备就绪，唤醒等待队列上的等待者时，就会调用这个回调函数，而这个回调函数会把就绪的fd加入一个就绪链表）。epoll_wait的工作实际上就是在这个就绪链表中查看有没有就绪的fd（利用schedule_timeout()实现睡一会，判断一会的效果，和select实现中的第7步是类似的）。

　　对于第三个缺点，epoll没有这个限制，它所支持的FD上限是最大可以打开文件的数目，这个数字一般远大于2048,举个例子,在1GB内存的机器上大约是10万左右，具体数目可以cat /proc/sys/fs/file-max察看,一般来说这个数目和系统内存关系很大。