Disk Input / Output



Filesystem I/O - Physical layout of the hard disk drive

This page examines the on-disk data structures of a UNIX-type filesystem and the steps involved in a disk read or write.
A UNIX filesystem on disk is divided into the following areas:

  • Bootblock
  • Superblock - (Note: use the fstyp -v /dev/dsk/* command to report contents)
  • Inode List - (Note: use the newfs -i or mkfs command to change the number of inodes)
  • Data blocks


Inodes

Each inode contains the following information:

  • Type of file and its permissions, etc
  • Number of hard links to the file
  • GID
  • UID
  • Byte size
  • Array of block addresses: The first few block addresses point directly at data blocks. The remaining block addresses point at indirect blocks, which hold arrays of pointers to further data blocks. Each inode contains 12 direct block pointers and 3 indirect block pointers.
  • Generation number (incremented each time the inode is re-used)
  • Access time stamp
  • Modification time stamp
  • Change time stamp
  • Number of sectors
  • Shadow inode location (used for ACLs [Access Control Lists])
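
The reach of the 12 direct and 3 indirect block pointers can be worked out with a little arithmetic. The sketch below is illustrative only: the 8 KB block size and 4-byte block address are assumptions (typical UFS values, not figures from this page), the three indirect pointers are taken to be single, double and triple indirect, and the function name is hypothetical.

```python
# Sketch: how many data blocks a single inode can address.
# ASSUMPTIONS (not stated on this page): 8 KB blocks, 4-byte block addresses,
# and indirect pointers arranged as single, double and triple indirect.

DIRECT_POINTERS = 12   # direct block pointers per inode
INDIRECT_LEVELS = 3    # indirect block pointers per inode


def addressable_blocks(block_size=8192, pointer_size=4):
    """Return the number of data blocks reachable from one inode."""
    pointers_per_block = block_size // pointer_size
    total = DIRECT_POINTERS
    # An indirect pointer at level n reaches pointers_per_block ** n data blocks.
    for level in range(1, INDIRECT_LEVELS + 1):
        total += pointers_per_block ** level
    return total


if __name__ == "__main__":
    blocks = addressable_blocks()
    print("addressable blocks:", blocks)
    print("addressable bytes :", blocks * 8192)
```

Real filesystems impose other limits on file size, so the result describes only the reach of the pointer structure, not a guaranteed maximum file size.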


A data transfer to or from disk consists of the following components:

  • I/O bus access: If the bus is busy, the request is queued by the driver. The information is reported by iostat -x (wait and %w) and sar -d (avwait).
  • Bus transfer time: Arbitration time (which device gets to use the bus), time to transfer the command (usually ~ 1.5 ms), and data transfer time (in the case of a write).
  • Seek time: Time for the head to move to the proper cylinder. Average seek times are reported by hard drive manufacturers.
  • Rotation time: Time for the correct sector to rotate under the head. This is usually calculated as 1/2 the time for a disk rotation. Rotation speeds (in RPM) are reported by hard drive manufacturers.
  • ITR time: Time required for a transfer between the hard drive's cache and the device media, governed by the drive's Internal Throughput Rate (ITR). The ITR is the limiting factor for sequential I/O, and is reported by the hard drive manufacturer.
  • Reconnection time: After the data has been moved to/from the hard drive's internal cache, a connection with the host adapter must be completed. This is similar to the arbitration/command transfer time discussed above.
  • Interrupt time: Time for the completion interrupt to be processed. This is very hard to measure, but high interrupt rates on the CPUs associated with this system board may be an indication of problems.
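
The components above can be added together for a rough per-request estimate. The following is a minimal sketch, not a measurement method: the function names are hypothetical, the drive figures in the example are placeholders, and the reconnection time is assumed to equal the ~1.5 ms command transfer time because the text only says the two are similar. Interrupt time is left out because it is hard to measure.

```python
# Sketch: rough per-request service time from the components listed above.
# All drive figures are placeholders; use the manufacturer's data sheet and
# sar/iostat measurements for real values.


def rotational_latency_ms(rpm):
    """Average rotational latency: half of one full rotation, in ms."""
    ms_per_rotation = 60_000.0 / rpm
    return ms_per_rotation / 2.0


def media_transfer_ms(nbytes, itr_mb_per_s):
    """Time to move nbytes between the drive cache and the media at the ITR."""
    return nbytes / (itr_mb_per_s * 1_000_000.0) * 1000.0


def service_time_ms(rpm, avg_seek_ms, nbytes, itr_mb_per_s,
                    bus_ms=1.5, reconnect_ms=1.5, queue_wait_ms=0.0):
    """Sum the components; reconnect_ms=1.5 is an assumption (see lead-in)."""
    return (queue_wait_ms + bus_ms + avg_seek_ms
            + rotational_latency_ms(rpm)
            + media_transfer_ms(nbytes, itr_mb_per_s)
            + reconnect_ms)


if __name__ == "__main__":
    # Hypothetical drive: 7200 RPM, 8.5 ms average seek, 40 MB/s ITR.
    print(round(rotational_latency_ms(7200), 2), "ms rotational latency")
    print(round(service_time_ms(7200, 8.5, 8192, 40.0), 2), "ms per 8 KB request")
```

For small random requests the seek and rotation terms dominate, while for large sequential requests the media transfer term grows with the request size, which is why the ITR matters most for sequential I/O.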

The disk's ITR rating and internal cache size can be critical when tuning maxcontig (maximum contiguous I/O size). Note: maxphys and maxcontig must be tuned at the same time. The unit of measurement for maxphys is bytes; maxcontig is in blocks.

maxcontig can be changed via the mkfs, newfs or tunefs commands.
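
Because the two tunables use different units, it helps to convert maxcontig to bytes before comparing it with maxphys. The sketch below assumes an 8 KB filesystem block size (check the real value for the filesystem in question) and assumes the goal is for one maxcontig-sized cluster to fit in a single transfer of at most maxphys bytes. The function names are hypothetical.

```python
# Sketch: compare maxcontig (in blocks) with maxphys (in bytes).
# ASSUMPTION: 8 KB filesystem blocks; pass block_size explicitly otherwise.


def cluster_bytes(maxcontig_blocks, block_size=8192):
    """Size in bytes of one maxcontig-sized cluster."""
    return maxcontig_blocks * block_size


def fits_in_one_transfer(maxcontig_blocks, maxphys_bytes, block_size=8192):
    """True if a full cluster is no larger than maxphys."""
    return cluster_bytes(maxcontig_blocks, block_size) <= maxphys_bytes


def maxcontig_for(maxphys_bytes, block_size=8192):
    """Largest maxcontig (in blocks) whose cluster fits within maxphys."""
    return maxphys_bytes // block_size


if __name__ == "__main__":
    maxphys = 1024 * 1024  # example value: 1 MB
    print("maxcontig matching maxphys:", maxcontig_for(maxphys))
    print("16 blocks fit:", fits_in_one_transfer(16, maxphys))
```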


The use of direct I/O

Large sequential I/O can cause performance problems due to excessive use of the memory page cache. One way to avoid this problem is to use direct I/O on filesystems where large sequential I/Os are common.

Direct I/O is a mechanism for bypassing the memory page cache altogether. It is enabled by the directio() function or by the forcedirectio option to mount.

VxFS enables direct I/O for large sequential operations. It determines which operations are "large" by comparing them to the vxtunefs parameter discovered_direct_iosz (default 256KB).

One problem that can emerge is that large sequential I/Os handed to VxFS as several smaller operations will still be cached, because each operation falls below the threshold. This can be a particular problem in OLTP environments. It can be alleviated by reducing discovered_direct_iosz to a level that prevents caching of the smaller operations. A case study of this problem is discussed on the Sun web site.
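
The effect of splitting a large transfer into smaller operations can be illustrated with a small model. This is a simplified sketch, not VxFS code: the function names are hypothetical, and it assumes that only requests strictly larger than discovered_direct_iosz bypass the cache.

```python
# Sketch: which pieces of a large sequential transfer end up in the page cache.
# ASSUMPTION: a request goes direct only if it is strictly larger than the
# discovered_direct_iosz threshold.

KB = 1024
DEFAULT_DISCOVERED_DIRECT_IOSZ = 256 * KB  # default quoted above


def uses_direct_io(request_bytes, threshold=DEFAULT_DISCOVERED_DIRECT_IOSZ):
    """True if a single request is large enough to bypass the cache."""
    return request_bytes > threshold


def split_request(total_bytes, chunk_bytes):
    """Split one logical transfer into the chunk sizes actually issued."""
    full, rest = divmod(total_bytes, chunk_bytes)
    return [chunk_bytes] * full + ([rest] if rest else [])


def cached_bytes(total_bytes, chunk_bytes,
                 threshold=DEFAULT_DISCOVERED_DIRECT_IOSZ):
    """Bytes of the transfer that still pass through the page cache."""
    return sum(chunk for chunk in split_request(total_bytes, chunk_bytes)
               if not uses_direct_io(chunk, threshold))


if __name__ == "__main__":
    total = 1024 * KB
    print("one 1 MB request, cached bytes:", cached_bytes(total, total))
    print("64 KB chunks, default threshold:", cached_bytes(total, 64 * KB))
    print("64 KB chunks, 32 KB threshold  :", cached_bytes(total, 64 * KB, 32 * KB))
```

In this model a 1 MB transfer issued as 64 KB operations is cached in full at the default threshold, and bypasses the cache only once the threshold is lowered below the chunk size.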
