The Open Group Base Specifications Issue 6
IEEE Std 1003.1-2001
Copyright © 2001 The IEEE and The Open Group, All Rights reserved.

NAME

file - determine file type

SYNOPSIS

[UP] [Option Start] file [-dhi][-M file][-m file] file ... [Option End]

DESCRIPTION

The file utility shall perform a series of tests on each specified file in an attempt to classify it:

  1. If the file is not a regular file, its file type shall be identified. The file types directory, FIFO, socket, block special, and character special shall be identified as such. Other implementation-defined file types may also be identified.

  2. If the file is a regular file, and:

    1. The file is zero-length, it shall be identified as an empty file.

    2. The file is not zero-length, file shall examine an initial segment of the file and shall make a guess at identifying its contents or whether it is an executable binary file. (The answer is not guaranteed to be correct.)

If file does not exist, cannot be read, or its file status could not be determined, the output shall indicate that the file was processed, but that its type could not be determined.

If file is a symbolic link, by default the link shall be resolved and file shall test the type of file referenced by the symbolic link.

OPTIONS

The file utility shall conform to the Base Definitions volume of IEEE Std 1003.1-2001, Section 12.2, Utility Syntax Guidelines.

The following options shall be supported by the implementation:

-d
Apply any default system tests to the file.
-h
When a symbolic link is encountered, identify the file as a symbolic link. If -h is not specified and file is a symbolic link that refers to a nonexistent file, file shall identify the file as a symbolic link, as if -h had been specified.
-i
If a file is a regular file, do not attempt to classify the type of the file further, but identify the file as specified in the STDOUT section, using a <type> string that contains the string "regular file" .
-M  file
Specify the name of a file containing tests that shall be applied to a file in order to classify it (see the EXTENDED DESCRIPTION). No default system tests shall be applied.
-m  file
Specify the name of a file containing tests that shall be applied to a file in order to classify it (see the EXTENDED DESCRIPTION).

If multiple instances of the -m, -d, or -M options are specified, the concatenation of the tests specified, in the order specified, shall be the set of tests that are applied. If a -M option is specified, no tests other than those specified using the -d, -M, and -m options shall be applied to the file. If neither the -d nor -M options are specified, any default system tests shall be applied after any tests specified using the -m option.

OPERANDS

The following operand shall be supported:

file
A pathname of a file to be tested.

STDIN

Not used.

INPUT FILES

The file can be any file type.

ENVIRONMENT VARIABLES

The following environment variables shall affect the execution of file:

LANG
Provide a default value for the internationalization variables that are unset or null. (See the Base Definitions volume of IEEE Std 1003.1-2001, Section 8.2, Internationalization Variables for the precedence of internationalization variables used to determine the values of locale categories.)
LC_ALL
If set to a non-empty string value, override the values of all the other internationalization variables.
LC_CTYPE
Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as opposed to multi-byte characters in arguments and input files).
LC_MESSAGES
Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard error and informative messages written to standard output.
NLSPATH
[XSI] [Option Start] Determine the location of message catalogs for the processing of LC_MESSAGES . [Option End]

ASYNCHRONOUS EVENTS

Default.

STDOUT

In the POSIX locale, the following format shall be used to identify each operand, file specified:

"%s: %s\n", <file>, <type>

The values for <type> are unspecified, except that in the POSIX locale, if file is identified as one of the types listed in the following table, <type> shall contain (but is not limited to) the corresponding string. Each space shown in the strings shall be exactly one <space>.

Table: File Utility Output Strings

If file is a:

<type> shall contain the string:

Directory

directory

FIFO

fifo

Socket

socket

Block special

block special

Character special

character special

Executable binary

executable

Empty regular file

empty

Symbolic link

symbolic link to

ar archive library (see ar)

archive

Extended cpio format (see pax)

cpio archive

Extended tar format (see ustar in pax)

tar archive

Shell script

commands text

C-language source

c program text

FORTRAN source

fortran program text

If file is identified as a symbolic link (see -h), the following alternative output format shall be used:

"%s: %s %s\n", <file>, <type>, <contents of link>"

If the file named by the file operand does not exist or cannot be read, the string "cannot open" shall be included as part of the <type> field, but this shall not be considered an error that affects the exit status. If the type of the file named by the file operand cannot be determined, the string "data" shall be included as part of the <type> field, but this shall not be considered an error that affects the exit status.

STDERR

The standard error shall be used only for diagnostic messages.

OUTPUT FILES

None.

EXTENDED DESCRIPTION

A file specified as an option-argument to the -m or -M options shall contain one test per line, which shall be applied to the file. If the test succeeds, the message field of the line shall be printed and no further tests shall be applied, with the exception that tests on immediately following lines beginning with a single '>' character shall be applied.

Each line shall be composed of the following four <blank>-separated fields:

offset
An unsigned number (optionally preceded by a single '>' character) specifying the offset, in bytes, of the value in the file that is to be compared against the value field of the line. If the file is shorter than the specified offset, the test shall fail.

If the offset begins with the character '>' , the test contained in the line shall not be applied to the file unless the test on the last line for which the offset did not begin with a '>' was successful. By default, the offset shall be interpreted as an unsigned decimal number. With a leading 0x or 0X, the offset shall be interpreted as a hexadecimal number; otherwise, with a leading 0, the offset shall be interpreted as an octal number.

type
The type of the value in the file to be tested. The type shall consist of the type specification characters c , d , f , s , and u , specifying character, signed decimal, floating point, string, and unsigned decimal, respectively.

The type string shall be interpreted as the bytes from the file starting at the specified offset and including the same number of bytes specified by the value field. If insufficient bytes remain in the file past the offset to match the value field, the test shall fail.

The type specification characters d , f , and u can be followed by an optional unsigned decimal integer that specifies the number of bytes represented by the type. The type specification character f can be followed by an optional F , D , or L , indicating that the value is of type float, double, or long double, respectively. The type specification characters d and u can be followed by an optional C , S , I , or L , indicating that the value is of type char, short, int, or long, respectively.

The default number of bytes represented by the type specifiers d , f , and u shall correspond to their respective C-language types as follows. If the system claims conformance to the C-Language Development Utilities option, those specifiers shall correspond to the default sizes used in the c99 utility. Otherwise, the default sizes shall be implementation-defined.

For the type specifier characters d and u , the default number of bytes shall correspond to the size of a basic integer type of the implementation. For these specifier characters, the implementation shall support values of the optional number of bytes to be converted corresponding to the number of bytes in the C-language types char, short, int, or long. These numbers can also be specified by an application as the characters C , S , I , and L , respectively. The byte order used when interpreting numeric values is implementation-defined, but shall correspond to the order in which a constant of the corresponding type is stored in memory on the system.

For the type specifier f , the default number of bytes shall correspond to the number of bytes in the basic double precision floating-point data type of the underlying implementation. The implementation shall support values of the optional number of bytes to be converted corresponding to the number of bytes in the C-language types float, double, and long double. These numbers can also be specified by an application as the characters F , D , and L , respectively.

All type specifiers, except for s , can be followed by a mask specifier of the form &number. The mask value shall be AND'ed with the value of the input file before the comparison with the value field of the line is made. By default, the mask shall be interpreted as an unsigned decimal number. With a leading 0x or 0X, the mask shall be interpreted as an unsigned hexadecimal number; otherwise, with a leading 0, the mask shall be interpreted as an unsigned octal number.

The strings byte, short, long, and string shall also be supported as type fields, being interpreted as dC , dS , dL , and s , respectively.

value
The value to be compared with the value from the file.

If the specifier from the type field is s or string, then interpret the value as a string. Otherwise, interpret it as a number. If the value is a string, then the test shall succeed only when a string value exactly matches the bytes from the file.

If the value is a string, it can contain the following sequences:

\character
The backslash-escape sequences as specified in the Base Definitions volume of IEEE Std 1003.1-2001, Table 5-1, Escape Sequences and Associated Actions ( '\\' , '\a' , '\b' , '\f' , '\n' , '\r' , '\t' , '\v' ). The results of using any other character, other than an octal digit, following the backslash are unspecified.
\octal
Octal sequences that can be used to represent characters with specific coded values. An octal sequence shall consist of a backslash followed by the longest sequence of one, two, or three octal-digit characters (01234567). If the size of a byte on the system is greater than 9 bits, the valid escape sequence used to represent a byte is implementation-defined.

By default, any value that is not a string shall be interpreted as a signed decimal number. Any such value, with a leading 0x or 0X, shall be interpreted as an unsigned hexadecimal number; otherwise, with a leading zero, the value shall be interpreted as an unsigned octal number.

If the value is not a string, it can be preceded by a character indicating the comparison to be performed. Permissible characters and the comparisons they specify are as follows:

=
The test shall succeed if the value from the file equals the value field.
<
The test shall succeed if the value from the file is less than the value field.
>
The test shall succeed if the value from the file is greater than the value field.
&
The test shall succeed if all of the set bits in the value field are set in the value from the file.
^
The test shall succeed if at least one of the set bits in the value field is not set in the value from the file.
x
The test shall succeed if the file is large enough to contain a value of the type specified starting at the offset specified.
message
The message to be printed if the test succeeds. The message shall be interpreted using the notation for the printf formatting specification; see printf() . If the value field was a string, then the value from the file shall be the argument for the printf formatting specification; otherwise, the value from the file shall be the argument.

EXIT STATUS

The following exit values shall be returned:

 0
Successful completion.
>0
An error occurred.

CONSEQUENCES OF ERRORS

Default.


The following sections are informative.

APPLICATION USAGE

The file utility can only be required to guess at many of the file types because only exhaustive testing can determine some types with certainty. For example, binary data on some implementations might match the initial segment of an executable or a tar archive.

Note that the table indicates that the output contains the stated string. Systems may add text before or after the string. For executables, as an example, the machine architecture and various facts about how the file was link-edited may be included.

EXAMPLES

Determine whether an argument is a binary executable file:

file "$1" | grep -Fq executable &&
    printf "%s is executable.\n" "$1"

RATIONALE

The -f option was omitted because the same effect can (and should) be obtained using the xargs utility.

Historical versions of the file utility attempt to identify the following types of files: symbolic link, directory, character special, block special, socket, tar archive, cpio archive, SCCS archive, archive library, empty, compress output, pack output, binary data, C source, FORTRAN source, assembler source, nroff/ troff/ eqn/ tbl source troff output, shell script, C shell script, English text, ASCII text, various executables, APL workspace, compiled terminfo entries, and CURSES screen images. Only those types that are reasonably well specified in POSIX or are directly related to POSIX utilities are listed in the table.

Historical systems have used a "magic file" named /etc/magic to help identify file types. Because it is generally useful for users and scripts to be able to identify special file types, the -m flag and a portable format for user-created magic files has been specified. No requirement is made that an implementation of file use this method of identifying files, only that users be permitted to add their own classifying tests.

In addition, three options have been added to historical practice. The -d flag has been added to permit users to cause their tests to follow any default system tests. The -i flag has been added to permit users to test portably for regular files in shell scripts. The -M flag has been added to permit users to ignore any default system tests.

The historical -c option was omitted as not particularly useful to users or portable shell scripts. In addition, a reasonable implementation of the file utility would report any errors found each time the magic file is read.

The historical format of the magic file was the same as that specified by the Rationale in the ISO POSIX-2:1993 standard for the offset, value, and message fields; however, it used less precise type fields than the format specified by the current normative text. The new type field values are a superset of the historical ones.

The following is an example magic file:

0  short     070707              cpio archive
0  short     0143561             Byte-swapped cpio archive
0  string    070707              ASCII cpio archive
0  long      0177555             Very old archive
0  short     0177545             Old archive
0  short     017437              Old packed data
0  string    \037\036            Packed data
0  string    \377\037            Compacted data
0  string    \037\235            Compressed data
>2 byte&0x80 >0                  Block compressed
>2 byte&0x1f x                   %d bits
0  string    \032\001            Compiled Terminfo Entry
0  short     0433                Curses screen image
0  short     0434                Curses screen image
0  string    <ar>                System V Release 1 archive
0  string    !<arch>\n__.SYMDEF  Archive random library
0  string    !<arch>             Archive
0  string    ARF_BEGARF          PHIGS clear text archive
0  long      0x137A2950          Scalable OpenFont binary
0  long      0x137A2951          Encrypted scalable OpenFont binary

The use of a basic integer data type is intended to allow the implementation to choose a word size commonly used by applications on that architecture.

FUTURE DIRECTIONS

None.

SEE ALSO

ar , ls , pax

CHANGE HISTORY

First released in Issue 4.

Issue 6

This utility is marked as part of the User Portability Utilities option.

Options and an EXTENDED DESCRIPTION are added as specified in the IEEE P1003.2b draft standard.

IEEE PASC Interpretations 1003.2 #192 and #178 are applied.

End of informative text.


UNIX ® is a registered Trademark of The Open Group.
POSIX ® is a registered Trademark of The IEEE.
[ Main Index | XBD | XCU | XSH | XRAT ]