Amazon.co.uk Review
This is a blue O'Reilly book, packed to the gunwales with information of interest to people in a hurry to optimise their systems and resolve difficulties. It's easy to locate the passage you need via the index or through the table of contents, and most entries provide a great mix of how-to material (in the form of input-and-output listings) and explanatory text (expert commentary, often with notes on applicable variations). If there's a command, option, or configuration parameter associated with NIS or NFS, you'll find documentation of it here. --David Wall
Topics covered: Network File System (NFS) and Network Information System (NIS) for UNIX machines, especially Solaris (through version 8) and Linux (through version 2.2). Automounting, security, diskless workstations, and performance tuning are among the many details the authors address.
Product Description
A modern computer system that's not part of a network is even more of an anomaly today than it was when we published the first edition of this book in 1991. But however widespread networks have become, managing a network and getting it to perform well can still be a problem. Managing NFS and NIS, in a new edition based on Solaris 8, is a guide to two tools that are absolutely essential to distributed computing environments: the Network Filesystem (NFS) and the Network Information System (formerly called the "yellow pages" or YP).
The Network Filesystem, developed by Sun Microsystems, is fundamental to most Unix networks. It lets systems ranging from PCs and Unix workstations to large mainframes access each other's files transparently, and is the standard method for sharing files between different computer systems.
As popular as NFS is, it's a "black box" for most users and administrators. Updated for NFS Version 3, Managing NFS and NIS offers detailed access to what's inside, including:
- How to plan, set up, and debug an NFS network
- Using the NFS automounter
- Diskless workstations
- PC/NFS
- A new transport protocol for NFS (TCP/IP)
- New security options (IPSec and Kerberos V5)
- Diagnostic tools and utilities
- NFS client and server tuning
NFS isn't really complete without its companion, NIS, a distributed database service for managing the most important administrative files, such as the passwd file and the hosts file. NIS centralizes administration of commonly replicated files, allowing a single change to the database rather than requiring changes on every system on the network.
If you are managing a network of Unix systems, or are thinking of setting up a Unix network, you can't afford to overlook this book.
About the Author
Mike Eisler graduated from the University of Central Florida with a master's degree in computer science in 1985. His first exposure to NFS and NIS came while working for Lachman Associates, Inc., where he was responsible for porting NFS and NIS to System V platforms. He later joined Sun Microsystems, Inc., where he was responsible for projects such as NFS server performance, NFS/TCP, WebNFS, NFS secured with Kerberos V5, NFS Version 4, and JavaCard security. Mike has authored or coauthored several Request for Comments documents for the Internet Engineering Task Force, relating to NFS and security. He is currently a Technical Director at Network Appliance, Inc.
Ricardo Labiaga is a staff engineer at Sun Microsystems, Inc., where he concentrates on networking and wireless technologies. Ricardo spent 8 years in the Solaris NFS group at Sun, where he worked on a variety of development projects with a primary focus on automounting and the NFS server. Ricardo is responsible for implementing significant functionality and performance enhancements to the automounter, as well as leading the NFS Server Logging design team. He holds a master of science degree in computer engineering from The University of Texas at El Paso.
Hal Stern is a technical consultant with Sun Microsystems, where he specializes in networking, performance tuning, and kernel hacking. Hal earned a Bachelor of Science degree from Princeton University in 1984. Before joining Sun, Hal was a member of the technical staff at Polygen Corporation, developing UNIX-based molecular modelling and chemical information system products. Hal also worked on the Massive Memory Machine project as a member of the Research Staff in Princeton University's Department of Computer Science. His interests include large installation system administration, virtual memory management systems, performance, local and wide-area networking, interactive graphics, applications in financial services, cosmology, and the history of science. Hal is active in the Sun User's Group and has served on the advisory trustee board of the Princeton Broadcasting Service for seven years. Hal and his wife Toby live in Burlington, Massachusetts. At home, Hal enjoys carpentry, jazz music, cooking, and watching the stock market.
Excerpted from Managing NFS & NIS by Hal Stern, Mike Eisler, Ricardo Labiaga. Copyright © 2001. Reprinted by permission. All rights reserved.
In this chapter:
Duplicate ARP Replies
Renegade NIS Server
Boot Parameter Confusion
Incorrect Directory Content Caching
Incorrect Mountpoint Permissions
Asynchronous NFS Error Messages
This chapter consists of case studies in network problem analysis and debugging, ranging from Ethernet addressing problems to a machine posing as an NIS server in the wrong domain. It is a bridge between the formal discussion of NFS and NIS tools and their use in performance analysis and tuning. The case studies presented here walk through debugging scenarios, but they should also give you an idea of how the various tools work together.
When debugging a network problem, it's important to think about its potential causes and use them to start ruling out other factors. For example, if your attempts to bind to an NIS server are failing, you can test the network with ping, the health of the ypserv processes with rpcinfo, and finally the binding itself with ypset. Working your way up through the protocol layers ensures that you don't miss a low-level problem that is posing as a higher-level failure. In keeping with that advice, we'll start by looking at a network-layer problem.
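The bottom-up order described above can be sketched as a small driver that runs one check per layer and stops at the first failure. This is a minimal sketch, assuming each check is a hypothetical zero-argument function returning True on success; in practice each one would wrap the tool the text names (ping, rpcinfo, ypset), and that wrapping is not shown here.

```python
from typing import Callable, List, Optional, Tuple

# A layer is a label plus a hypothetical check that returns True on success.
Layer = Tuple[str, Callable[[], bool]]


def first_failing_layer(layers: List[Layer]) -> Optional[str]:
    """Run checks from the lowest layer upward; return the first that fails.

    Stopping at the first failure keeps a low-level fault (for example, no
    network connectivity) from being misreported as an NIS binding problem.
    Returns None when every layer passes.
    """
    for name, check in layers:
        if not check():
            return name
    return None


if __name__ == "__main__":
    # Simulated results: the network answers, ypserv does not, so the
    # binding check is never run.
    layers = [
        ("network (ping)", lambda: True),
        ("ypserv (rpcinfo)", lambda: False),
        ("binding (ypset)", lambda: True),
    ]
    print(first_failing_layer(layers))  # prints "ypserv (rpcinfo)"
```

Ordering the list from the physical network upward is the whole design: a failure reported for a higher layer is only meaningful once every layer beneath it has passed.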
Duplicate ARP Replies
ARP misinformation was briefly mentioned in , and this story showcases some of the baffling effects it creates. A network of two servers and ten clients suddenly began to run very slowly, with the following symptoms:
· Some users attempting to start a document processing application were waiting 10 to 30 minutes for the application's window to appear, while those on well-behaved machines waited a few seconds. The executables resided on a file server and were NFS mounted on each client. Every machine in the group experienced these delays over a period of a few days, although not all at the same time.
· Machines would suddenly "go away" for several minutes. Clients would stop seeing their NFS and NIS servers, producing streams of messages like:
NFS server muskrat not responding still trying
or:
ypbind: NIS server not responding for domain "techpubs"; still trying
The local area network with the problems was joined to the campus-wide backbone via a bridge. An identical network of machines, running the same applications with nearly the same configuration, was operating without problems on the far side of the bridge. We were assured of the health of the physical network by two engineers who had verified physical connections and cable routing.
The very sporadic nature of the problem -- and the fact that it resolved itself over time -- pointed toward a problem with ARP request and reply mismatches. This hypothesis neatly explained the extraordinarily slow loading of the application: a client machine trying to read the application executable would do so by issuing NFSv2 requests over UDP. To send the UDP packets, the client would ARP the server, randomly get the wrong reply, and then be unable to use that entry for several minutes. When the ARP table entry had aged and was deleted, the client would again ARP the server; if the correct ARP response was received then the client could continue reading pages of the executable. Every wrong reply received by the client would add a few minutes to the loading time.
There were several possible sources of the ARP confusion, so to isolate the problem, we forced a client to ARP the server and watched what happened to the ARP table:
# arp -d muskrat
muskrat (139.50.2.1) deleted
# ping -s muskrat
PING muskrat: 56 data bytes
[No further output from ping]
By deleting the ARP table entry and then directing the client to send packets to muskrat, we forced an ARP of muskrat from the client. ping timed out without receiving any ICMP echo replies, so we examined the ARP table and found a surprise:
# arp -a | fgrep muskrat
le0 muskrat 255.255.255.255 08:00:49:05:02:a9
Since muskrat was a Sun workstation, we expected its Ethernet address to begin with 08:00:20 (the prefix assigned to Sun Microsystems), not the 08:00:49 prefix used by Kinetics gateway boxes. The next step was to figure out how the wrong Ethernet address was ending up in the ARP table: was muskrat lying in its ARP replies, or had we found a network imposter?
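The vendor check in this step can be automated: take the MAC address from an arp -a line and compare its first three octets (the vendor prefix) against known prefixes. This is a minimal sketch, assuming the column layout shown in the listing above with the MAC address in the last field; the two-entry prefix table holds only the values given in the text, and the function names are illustrative.

```python
from typing import NamedTuple

# Vendor prefixes taken from the text; a real tool would load the full
# IEEE registry instead of this two-entry table.
KNOWN_OUIS = {
    "08:00:20": "Sun Microsystems",
    "08:00:49": "Kinetics",
}


class ArpEntry(NamedTuple):
    device: str
    host: str
    mask: str
    mac: str


def parse_arp_line(line: str) -> ArpEntry:
    """Split one arp -a line into device, host, mask and MAC address.

    The MAC is taken from the last column, so an optional flags column
    between the mask and the MAC is skipped.
    """
    fields = line.split()
    return ArpEntry(fields[0], fields[1], fields[2], fields[-1])


def oui(mac: str) -> str:
    """Return the first three octets, zero-padded (some tools print 8:0:20)."""
    octets = [f"{int(part, 16):02x}" for part in mac.split(":")]
    return ":".join(octets[:3])


def vendor(mac: str) -> str:
    return KNOWN_OUIS.get(oui(mac), "unknown")


if __name__ == "__main__":
    entry = parse_arp_line("le0 muskrat 255.255.255.255 08:00:49:05:02:a9")
    # muskrat is a Sun workstation, so any non-Sun prefix is suspect.
    if vendor(entry.mac) != "Sun Microsystems":
        print(f"{entry.host}: unexpected vendor {vendor(entry.mac)} ({entry.mac})")
```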
Using a network analyzer, we repeated the ARP experiment and watched the ARP replies as they came back. We saw two distinct replies: the correct one from muskrat, followed by an invalid reply from the Kinetics FastPath gateway. The root of this problem was that the Kinetics box had been configured using the IP broadcast address 0.0.0.0, allowing it to answer all ARP requests.
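What the analyzer revealed, two different Ethernet addresses answering for one IP address, is straightforward to look for in a list of captured ARP replies. This is a minimal sketch, assuming the replies have already been extracted from a capture as (IP address, MAC address) pairs; the capture format and the function name are hypothetical, and because muskrat's real Ethernet address is not given in the text, 08:00:20:aa:bb:cc below is a placeholder.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def conflicting_replies(replies: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return each IP address that more than one MAC address claimed.

    Repeated replies from the same MAC are harmless and are not reported;
    MAC addresses are lowercased so case differences don't create conflicts.
    """
    claims: Dict[str, Set[str]] = defaultdict(set)
    for ip, mac in replies:
        claims[ip].add(mac.lower())
    return {ip: macs for ip, macs in claims.items() if len(macs) > 1}


if __name__ == "__main__":
    captured = [
        ("139.50.2.1", "08:00:20:aa:bb:cc"),  # placeholder for muskrat's reply
        ("139.50.2.1", "08:00:49:05:02:a9"),  # the Kinetics FastPath reply
    ]
    for ip, macs in conflicting_replies(captured).items():
        print(ip, sorted(macs))  # prints 139.50.2.1 ['08:00:20:aa:bb:cc', '08:00:49:05:02:a9']
```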
The last update to the ARP table is the one that "sticks," so the wrong Ethernet address was overwriting the correct ARP table entry. The Kinetics FastPath was located on the other side of the bridge, virtually guaranteeing that its replies would be the last to arrive, delayed by their transit over the bridge. When muskrat was heavily loaded, however, it was slow to reply to the ARP request, so its response arrived last and the correct entry survived; this race is why the symptoms came and went. Reconfiguring the Kinetics FastPath to use a proper IP address and network mask cured the problem.
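The race can be modeled with a toy ARP cache in which every reply overwrites the entry for its IP address and entries age out after a fixed interval. This is a minimal sketch, assuming a 300-second aging interval and made-up arrival times; it illustrates last-writer-wins behavior, not the ARP code of any particular kernel, and the placeholder address for muskrat is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ArpCache:
    """Toy ARP cache: the latest reply always wins; entries expire after ttl."""
    ttl: float  # assumed aging interval in seconds
    entries: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def on_reply(self, ip: str, mac: str, now: float) -> None:
        # No comparison with an existing entry: the last reply "sticks".
        self.entries[ip] = (mac, now)

    def lookup(self, ip: str, now: float) -> Optional[str]:
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, stored_at = entry
        if now - stored_at >= self.ttl:
            # Aged out: drop it so the next packet triggers a fresh ARP request.
            del self.entries[ip]
            return None
        return mac


def resolve(replies: List[Tuple[float, str]], ip: str, ttl: float = 300.0) -> Optional[str]:
    """Apply (arrival_time, mac) replies in arrival order; return the cached MAC."""
    if not replies:
        return None
    cache = ArpCache(ttl)
    for arrival, mac in sorted(replies):
        cache.on_reply(ip, mac, arrival)
    return cache.lookup(ip, max(t for t, _ in replies))


if __name__ == "__main__":
    MUSKRAT = "08:00:20:aa:bb:cc"  # placeholder; the real address is not given
    KINETICS = "08:00:49:05:02:a9"
    # Usual case: the Kinetics reply is delayed by the bridge and arrives last.
    print(resolve([(0.001, MUSKRAT), (0.005, KINETICS)], "139.50.2.1"))
    # muskrat heavily loaded: its reply arrives last and the correct entry sticks.
    print(resolve([(0.010, MUSKRAT), (0.005, KINETICS)], "139.50.2.1"))
```

Once a wrong entry is cached, the client keeps sending to the impostor until the entry ages out, which is how each bad reply could add minutes to loading an executable over NFS.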