$Id: Release-Notes-1.0.txt,v 1.1.2.8 1996/05/20 03:25:08 wessels Exp $

Release Notes for version 1.0 of the Squid cache.

TABLE OF CONTENTS:

    Private Objects
    Proper parsing of HTTP reply codes
    Support for If-Modified-Since GET
    Improvements to the access log
    Metadata reloads in the background
    Unlinking swap files on restart and the -U option
    Changes to debugging
    New Access Control scheme
    Changes to cache shutdown
    Using SIGHUP to reconfigure the cache
    ftpget server
    Assigning weights to cache neighbors
    Converting 'cache/log' from cached-1.4.pl3
    Notes on stoplists vs. ttl_pattern
    SIGUSR1 now rotates log files


Private Objects
==============================================================================

The Squid cache uses the notions of ``private'' and ``public'' objects.
An object can start out as being private, but may later be given
public status.  Private objects are associated with only a single
client whereas a public object may be sent to multiple clients at the
same time.  When the cache finishes retrieving an object, if the
object is private it will be ejected from the cache.  Only public
objects are saved on disk.

There are a few ways to determine whether an object should be private
or public.  One is the request method.  Only URLs requested with the
``GET'' method can be public.  Another way is by examining the URL
string.  URLs which match one of the stoplist entries will always be
private objects.  Usually this includes ``cgi-bin'' scripts.

A third way is by checking the HTTP request and reply headers.  For
example, if the request includes user authentication information, then
the object should never be made public.  Additionally, some HTTP
replies such as ``401 Unauthorized'' should also never be made public.

For these reasons, Squid starts all objects out as private and changes
them to public only after the HTTP reply headers have been read.

Unfortunately, this causes some problems with the UDP-based Internet
Cache Protocol (ICP) used to query neighboring caches.
Specifically, when an ICP reply packet is received, it contains only
the object URL, which is not sufficient to locate private objects in
the cache metadata.

To get the additional information needed to locate private objects, we
decided to use the ``reqnum'' field of the ICP packet.  This is an
acceptable solution, except that as implemented in cached-1.4.pl3 and
earlier, all ICP replies have the reqnum field reset to zero!

Squid will make use of private objects until it notices that one of
its neighbors is sending ICP replies with the reqnum field set to
zero.  It will then only use private keys for objects which are not
going to be queried for via ICP.  These include objects in the
stoplist and If-Modified-Since requests.


Proper parsing of HTTP reply codes
==============================================================================

Squid parses HTTP replies to extract the reply code.  The codes are
used to determine which objects should be cached, which should be
ejected, and which should be negative-cached.

See HTTP-codes.txt for a list of HTTP response codes, and how they are
cached.

The HTTP codes are now logged to "access.log" in the native format
(i.e. with 'emulate_httpd_log off').


Support for If-Modified-Since GET
==============================================================================

Squid supports IMS GET retrievals, but not through any neighbor
caches.  Whenever an IMS GET request is received, Squid will bypass
the cache hierarchy and fetch the object on its own.


Improvements to the access log
==============================================================================

The "access.log" file has been improved in a number of ways.  There is
now only one log entry per client request and the size is always
correct.
The format is now

    timestamp elapsed src-address type/code size method URL

    - timestamp:    The time at which the request completed, with
                    millisecond resolution
    - elapsed:      Elapsed time of the request, in milliseconds
    - src-address:  IP address of the requesting client
    - type:         An indication of how the request was handled by
                    the cache.  These are described further below
    - code:         The HTTP reply code when available.  For ICP
                    requests this is always "000".  If the reply code
                    was not given, it will be logged as "555".
    - size:         For TCP requests, the amount of data written to
                    the client.  For UDP requests, the size of the
                    request.  (in bytes)
    - method:       The request method (GET, POST, etc).
    - URL:          The URL of the request

Access Log Types:

    "TCP_" refers to requests on the HTTP port (3128)

    TCP_HIT         A valid copy of the requested object was in the cache
    TCP_MISS        The requested object was not in the cache
    TCP_EXPIRED     The object was in the cache, but it had expired
    TCP_REFRESH     The user forced a refresh ("reload")
    TCP_IFMODSINCE  An If-Modified-Since GET request.
    TCP_SWAPFAIL    The object was believed to be in the cache, but
                    could not be accessed.
    TCP_DENIED      Access was denied for this request

    "UDP_" refers to requests on the ICP port (3130)

    UDP_HIT         A valid copy of the requested object was in the cache
    UDP_MISS        The requested object was not in the cache
    UDP_DENIED      Access was denied for this request


Metadata reloads in the background
==============================================================================

Upon restart, Squid automatically loads cache metadata in the
background, so it is able to service new requests immediately.  As
new objects are added, there may be some "clashes" with old objects
using the same swap file on disk.  In these cases you may see a
message in the cache logfile about "Active clash".  This means the old
object has been discarded since it was replaced by a new object.
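
The seven-field access.log format listed under "Improvements to the
access log" is easy to split mechanically.  The following is only a
sketch, not part of Squid: the names AccessLogEntry and
parse_access_log_line are made up for illustration, the sample line
is invented, and we assume the timestamp is written as seconds with a
fractional part.

```python
from typing import NamedTuple


class AccessLogEntry(NamedTuple):
    """One line of access.log in the native format (hypothetical type)."""
    timestamp: float   # assumed: seconds, with a fractional part
    elapsed_ms: int    # elapsed time of the request, in milliseconds
    src_address: str   # IP address of the requesting client
    log_type: str      # e.g. TCP_HIT, TCP_MISS, UDP_DENIED
    code: str          # HTTP reply code, "000" for ICP, "555" if unknown
    size: int          # bytes
    method: str        # GET, POST, etc.
    url: str


def parse_access_log_line(line: str) -> AccessLogEntry:
    """Split 'timestamp elapsed src-address type/code size method URL'."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    timestamp, elapsed, src, type_code, size, method, url = fields
    # The fourth field carries both the log type and the reply code.
    log_type, _, code = type_code.partition("/")
    return AccessLogEntry(float(timestamp), int(elapsed), src,
                          log_type, code, int(size), method, url)


# Invented sample line, for illustration only.
SAMPLE = "832567508.123 250 128.138.1.10 TCP_MISS/200 4096 GET http://www.foo.org/"
entry = parse_access_log_line(SAMPLE)
print(entry.log_type, entry.code, entry.size)
```

The reply code is kept as a string so that the special values "000"
and "555" keep their leading zeros and stay easy to compare.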

Unlinking swap files on restart and the -U option
==============================================================================

When the cache reloads object metadata from disk, some of the objects
will be expired or otherwise invalid.  In the interest of speed, these
invalid objects will not be removed from the filesystem by default.
They will eventually be overwritten by new objects as they enter the
cache and get saved to disk.

The -U option can be used to actually remove the invalid objects from
disk.  In addition, the -z option will not cause 'rm -rf [0-9][0-9]'
to be executed unless the -U option is also given.

When swap files are not removed during restart, the internal counters
for disk space taken will not match the actual disk space used.  If
you have a large cache or plenty of extra disk space, this should not
be a problem.  However, if space is an issue, you may want to use the
-U option at the cost of a slower restart.


Changes to debugging
==============================================================================

Squid has a flexible debugging scheme.  You can enable more debugging
for certain functions and less for others.  For example, if you needed
to figure out why your access controls were behaving strangely, you
could enable debugging for section 28 at level 9.

Currently, each section corresponds to a separate source code file:

    main.c:             Section 1
    cache_cf.c:         Section 3
    errorpage.c:        Section 4
    comm.c:             Section 5
    disk.c:             Section 6
    fdstat.c:           Section 7
    filemap.c:          Section 8
    ftp.c:              Section 9
    gopher.c:           Section 10
    http.c:             Section 11
    icp.c:              Section 12
    icp_lib.c:          Section 13
    ipcache.c:          Section 14
    neighbors.c:        Section 15
    objcache.c:         Section 16
    proto.c:            Section 17
    stat.c:             Section 18
    stmem.c:            Section 19
    store.c:            Section 20
    tools.c:            Section 21
    ttl.c:              Section 22
    url.c:              Section 23
    wais.c:             Section 24
    mime.c:             Section 25
    connect.c:          Section 26
    send-announce.c:    Section 27
    acl.c:              Section 28

Debugging levels are set in the configuration file with the
'debug_options' line.
For example:

    debug_options ALL,1 28,9 22,5


New Access Control scheme
==============================================================================

The old IP-based access controls have been replaced with a much more
flexible scheme.  First you must define a set of access control lists.
There are seven types of lists:

    'src'       client IP address
    'method'    method of the request (eg, GET, POST)
    'proto'     protocol of the request (eg HTTP, WAIS)
    'domain'    domain of the URL request (eg .foo.org)
    'port'      port number of the URL request (eg 80, 21)
    'time'      time-of-day and day-of-week
                format: [SMTWHFA] [hh:mm-hh:mm]
    'pattern'   regular expression matching on the URL-path

After the access lists have been defined, you can then combine them in
a way to allow or deny access.  For example, your cache might be
configured to accept requests from both inside and outside of your
organization.  In that case you'd probably want to allow internal
clients to access anything, but limit outside access to only sites
within your organization.  It could be done like this:

    acl ourclients src 128.138.0.0/255.255.0.0 198.117.213.0/24
    acl ourservers domain .whatsamattu.edu
    http_access deny !ourclients !ourservers
    http_access allow ourclients

If you wanted to limit FTP requests to off-peak hours, you could use:

    acl daytime time MTWHF 08:00-17:00
    acl FTP proto FTP
    http_access deny FTP daytime

Any of the access list types can accept multiple values on the same
line, except for 'time'.

Multiple values of an 'acl' definition are treated with OR logic.
Multiple ACLs of an 'http_access' are treated with AND logic.  That
is, all ACLs must match for the 'allow' or 'deny' to take effect.

The order of the 'http_access' lines is important.  When a line
matches, any following lines are not considered at all.

'icp_access' is the same as 'http_access' but it applies to the ICP
port.  However, it is not yet fully implemented.  It is only able to
check 'src' and 'method' ACLs.
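
The matching rules above (OR within an 'acl', AND across the ACLs of
one 'http_access' line, first matching line wins) can be modelled in
a few lines of Python.  This is a minimal sketch for reasoning about
rule order, not Squid's implementation: src_value, domain_value,
term_matches and check_access are hypothetical helpers, a request is
reduced to a dict with 'src' and 'host' keys, and the function
returns None when no line matches because these notes do not say what
Squid does in that case.

```python
import ipaddress


def src_value(network):
    """One value of a 'src' acl: true if the client is in the network."""
    net = ipaddress.ip_network(network)
    return lambda req: ipaddress.ip_address(req["src"]) in net


def domain_value(suffix):
    """One value of a 'domain' acl: true if the URL host ends with suffix."""
    return lambda req: req["host"].endswith(suffix)


# acl ourclients src 128.138.0.0/255.255.0.0 198.117.213.0/24
# acl ourservers domain .whatsamattu.edu
ACLS = {
    "ourclients": [src_value("128.138.0.0/255.255.0.0"),
                   src_value("198.117.213.0/24")],
    "ourservers": [domain_value(".whatsamattu.edu")],
}

# http_access deny !ourclients !ourservers
# http_access allow ourclients
RULES = [
    ("deny", ["!ourclients", "!ourservers"]),
    ("allow", ["ourclients"]),
]


def term_matches(term, acls, req):
    """Multiple values of one acl are ORed; a leading '!' negates."""
    matched = any(test(req) for test in acls[term.lstrip("!")])
    return not matched if term.startswith("!") else matched


def check_access(rules, acls, req):
    """Return the action of the first line whose ACLs all match."""
    for action, terms in rules:
        if all(term_matches(term, acls, req) for term in terms):
            return action   # later lines are not considered
    return None             # assumption: behaviour here is not documented


print(check_access(RULES, ACLS, {"src": "128.138.1.1", "host": "www.example.com"}))     # allow
print(check_access(RULES, ACLS, {"src": "10.0.0.1", "host": "www.example.com"}))        # deny
print(check_access(RULES, ACLS, {"src": "10.0.0.1", "host": "www.whatsamattu.edu"}))    # None
```

The third request shows why line order and a final catch-all matter:
an outside client asking for one of our servers matches neither line,
so the outcome depends on whatever the cache does when nothing
matches.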

Changes to cache shutdown
==============================================================================

Squid attempts to implement a "nice shutdown" upon receipt of a
SIGTERM signal.  Rather than simply breaking all current connections,
it waits a configurable number of seconds for active requests to
complete.  The default 'shutdown_lifetime' value is 30 seconds.

As soon as the SIGTERM is received, the incoming HTTP socket is closed
so that no further requests will be accepted.


Using SIGHUP to reconfigure the cache
==============================================================================

Sending the squid process a HUP signal will prompt it to re-read its
configuration file.  Before it can be reconfigured, it must make sure
that all active connections are closed.  For this purpose squid
pretends to do a shutdown as described above, i.e. it will wait up to
'shutdown_lifetime' seconds (30 by default) for active requests to
complete before re-reading the configuration file.


ftpget server
==============================================================================

The ftpget program has been modified to act as a server for FTP
requests.  You may now notice that an "ftpget -S" process is always
present while the cache is running.  The benefit of using an ftpget
server is that the cache process (which may be very large) no longer
needs to fork itself for FTP requests.


Assigning weights to cache neighbors
==============================================================================

Squid allows you to assign weights to parent caches.  The weights are
used to calculate the ``first miss parent''.  The weight is specified
in the 'options' field of the 'cache_host' line.  For example:

    cache_host big.foo.org parent 3128 3130 weight=5

The weight must be a non-zero integer.  It is used as a divisor to
calculate a weighted round-trip-time (RTT).  Higher weights will cause
a parent to have a ``better'' RTT.

Weights are only involved when all parent caches return MISS.
Squid still fetches an object from the first parent or neighbor to
reply with a HIT, regardless of any weight values.


Converting 'cache/log' from cached-1.4.pl3
==============================================================================

Squid uses a slightly different format for the 'cache/log' file.  In
particular, the words 'FILE:' and 'URL:' have been removed from each
line.  To save your on-disk cache, you will need to convert this log
file before starting Squid.  To do that, use a simple awk command:

    mv log log.old
    awk '{print $2,$4,$5,$6,$7}' < log.old > log


Notes on stoplists vs. ttl_pattern
==============================================================================

You can use the stoplists ('http_stop', etc) in the configuration file
to prevent objects from being cached.  Using a 'ttl_pattern' with the
TTL set to zero will also prevent objects from being saved.

There is one important difference between these two methods, however.
Squid never makes ICP queries for objects which match the stoplists.
Instead, the object will be fetched directly (unless on the other side
of a firewall).

We recommend that you use the stoplist for cgi-bin scripts and use the
ttl_pattern rules to prevent caching of normal objects.


SIGUSR1 now rotates log files
==============================================================================

In order to be more consistent with other daemon programs, SIGHUP is
used to reconfigure the running process.  This means that we needed to
change the signal used to rotate the log files.  We now use SIGUSR1 to
rotate the logs.