Networked Software Development with PERFORCE

PERFORCE ...

The Fast Software Configuration Management System

Networked Software Development

SCM over the Internet and Intranets


Introduction

Most commercial software configuration management (SCM) systems now support the corporate LAN (local area network). That is, any user who is connected via the LAN can transparently access the SCM repository as if it were on his or her desktop workstation. Few commercial SCM systems, however, can extend that transparent support to the slower WAN (wide area network). Because of their reliance on the high speed and low latency of LANs, most SCM systems provide no support for WAN use at all, and those that do offer a not-so-transparent solution. Typically, local users have first-class access to the repository, while remote users must either deal with a read-only replica or face massive source code integration down the road.

But modern corporations and their software development organizations are becoming increasingly dispersed. After a corporate acquisition, a key development team may be across the country or around the world, connected by the corporate intranet; telecommuting developers often have their powerful development environments sitting on their desktops at home, connecting via ISDN lines back to the central office; and some fledgling startups have no corporate headquarters at all, but instead collaborate entirely over the Internet (essentially a worldwide public WAN).

For these organizations, Networked Software Development is a necessity. In this paper we discuss how PERFORCE -- The Fast SCM System -- can handle such environments, using WANs such as corporate intranets or the Internet. The basic proposition is that PERFORCE is not hampered by the low speed and high latency of such long-haul networks. We also describe the pieces needed to make operating over such networks secure.

The Problem With WANs

The two metrics of a network's performance are its bandwidth and its latency. Bandwidth is simply the number of bits per second that can be moved from one computer to the next, and ranges nowadays from 14.4 Kb/s (kilobits/sec) over modems to over 100 Mb/s (megabits/sec) over fiber. Latency is the delay in getting each bit from one end to the other, and ranges from under 1ms for Ethernet to the better part of a second (or more) on the Internet. The following tables give a rough comparison of the common physical networks.

LAN Performance

Network   Bandwidth   Latency   Cost
Ether     10Mb/s      1ms       $
Fiber     100Mb/s     1ms       $$$

WAN Performance

Network   Bandwidth   Latency   Cost
Modem     28.8Kb/s    300ms     $
ISDN      64Kb/s      20ms      $$
T1        1Mb/s       >1ms      $$$
T3        40Mb/s      >1ms      $$$$
ATM       100Mb/s     >1ms      $$$$$

In the world of the LAN, the 10Mb/s Ethernet reigns as king. Few users would tolerate anything less in their working environment. On the WAN side, however, much of the world is connected at T1 speeds or lower.

It should be noted that the latency over WANs is determined both by the underlying technology and by the distances between the computers. Even an ATM network cannot overcome the speed of electricity through wires or microwaves through space. London and California are at least 30ms apart even with the fastest of networks. In practice, due to the storing, forwarding, and switching that packets typically endure when going over long networks, the latency of most WANs is 100ms or more.
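The 30ms floor follows from simple arithmetic. The sketch below assumes a great-circle distance of roughly 8,600 km between London and California and a signal moving at the speed of light in a vacuum. Both figures are illustrative assumptions rather than measurements; slower propagation in real cables and indirect routing can only increase the result.

```python
# Lower bound on one-way latency from distance alone.
# Assumed figures: ~8,600 km between London and California, and
# propagation at the vacuum speed of light unless a factor is given.

SPEED_OF_LIGHT_KM_PER_S = 299_792.458


def min_one_way_latency_ms(distance_km: float, velocity_factor: float = 1.0) -> float:
    """Return the minimum one-way delay in milliseconds.

    velocity_factor is the fraction of the speed of light at which the
    signal actually travels (about 0.67 is a commonly quoted value for fiber).
    """
    if distance_km < 0 or not 0 < velocity_factor <= 1:
        raise ValueError("need distance_km >= 0 and 0 < velocity_factor <= 1")
    return distance_km / (SPEED_OF_LIGHT_KM_PER_S * velocity_factor) * 1000.0


if __name__ == "__main__":
    print(f"vacuum: {min_one_way_latency_ms(8600):.1f} ms")
    print(f"fiber:  {min_one_way_latency_ms(8600, 0.67):.1f} ms")
```

With these assumptions the vacuum figure comes out just under 30ms, and any realistic velocity factor pushes it higher.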

The problem with WANs is that their long latency fouls up the network file system (NFS) protocols that lie beneath the architecture of most commercial SCM systems. These systems achieve their network transparency by relying on the fact that NFS makes remote disks seem local. In fact, some SCM systems have no network support per se, but are dependent solely upon access to network disks. Network file systems themselves are built upon the assumption that the network itself is fast: NFS tries to give the appearance of having local disks, and the NFS protocols assume that the network will answer at disk speeds. When the network's response arrives too late, NFS falls down, pure and simple. As NFS falls, so does the SCM system.

Some NFS implementations (NetBEUI on NT, NFS over TCP on 4.4 BSD) have overcome the problems of working over slower WANs, but they still cannot deliver the data any faster than the network. SCM systems are applications that expect fast access to the disks: they typically do disk-intensive operations like scanning directories or opening many, many files. Remote disks mounted over slow WANs cannot deliver the data at the needed rate, and the resultant performance of the SCM system is more or less intolerable.

The bottom line is that SCM solutions like Atria's ClearCase, Continuus/CM, Platinum's CCC/Harvest, Intersolv's PVCS, MKS Source Integrity, and Microsoft's SourceSafe all rely on NFS for their transparency, and none of them offers its normal solution for WANs. Atria alone has a WAN offering, called Multisite, but it is markedly different from what their LAN product supports. It basically supports read-only replicas that are synchronized (by the user) over the WAN.

PERFORCE Network Support

Like the SCM solutions mentioned above, PERFORCE has an architecture that works over a network. But PERFORCE has two specific design features that make it considerably less reliant on the performance of the network. These design features are:
1. Local files but centralized metadata.

A critical component of PERFORCE's design locates data so as to minimize network use. The repository metadata lives on the central server, so that global state is all in one, easy-to-access location. On the other hand, the files that a user works on are copied onto the local disk for fast access. This means that operations such as editing, compiling, searching, etc. all run at local disk speed without requiring any network interaction.

Only explicit PERFORCE operations make use of the network. When copying files between the user's client host and the central server, PERFORCE uses its global metadata to transfer only what is necessary. When reporting, the PERFORCE server has fast, local access to all metadata, so that only distilled information needs to be sent back over the network to the client.

2. TCP/IP-based message queuing protocol.

When file transfers do occur, in order to synchronize the client with the repository or to submit files to the repository, PERFORCE uses a streaming, message queuing protocol built directly atop TCP/IP. It does not rely on NFS access, which is built upon remote procedure calls (RPCs) and unreliable datagrams (UDP/IP).

The difference between TCP/IP-based streams and UDP-based NFS is significant. With PERFORCE, the latency of the network affects PERFORCE once or twice each time the client command is invoked by the user. Thus over a network with a 100ms latency the user might notice an extra delay of 200ms to half a second after typing the PERFORCE client command. This delay is almost always tolerable and usually unnoticeable. With the NFS access required by all other SCM solutions, the latency of the network affects each of the file I/O operations, which can easily number in the hundreds for something as simple as a file checkout. The result is impractical, as users won't wait minutes to check out a single file.
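The contrast can be put into numbers with a very rough cost model. The sketch below counts only the delay caused by latency and ignores bandwidth entirely; the figure of 300 remote I/O operations is an assumed stand-in for "hundreds", not a measurement of any particular SCM system.

```python
# Rough model of the delay that network latency alone adds to one command.
# The operation counts are illustrative assumptions, not measurements.

def added_delay_s(latency_hits: int, latency_ms: float) -> float:
    """Seconds of delay when a command pays the network latency `latency_hits` times."""
    if latency_hits < 0 or latency_ms < 0:
        raise ValueError("latency_hits and latency_ms must be non-negative")
    return latency_hits * latency_ms / 1000.0


LATENCY_MS = 100.0  # the WAN latency used in the text above

# Streaming protocol: latency is paid once or twice per command.
streamed = added_delay_s(2, LATENCY_MS)

# Remote file system: latency is paid on every file I/O operation
# (300 operations assumed for a single checkout).
per_file_io = added_delay_s(300, LATENCY_MS)

if __name__ == "__main__":
    print(f"streaming protocol:   {streamed:.1f} s")
    print(f"per-operation access: {per_file_io:.1f} s")
```

Under these assumptions the streaming case costs a fraction of a second while the per-operation case costs tens of seconds, and the gap widens in direct proportion to the latency.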

Because a PERFORCE user normally works against the local disk, he or she is only aware of the speed of the network when performing explicit SCM operations. PERFORCE operations are geared to minimize the effects of network latency, leaving only network bandwidth a real factor in the performance of PERFORCE.
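The first design feature above (local files, centralized metadata) can be illustrated with a small model of how a server might decide what to transfer. This is a hypothetical sketch: the dictionaries, the function name plan_sync, and the file names are all invented for illustration and do not describe PERFORCE's actual data structures or protocol.

```python
# Hypothetical model of "transfer only what is necessary".
# The server is assumed to record, for each client, which revision of each
# file that client already has, next to the newest revision in the repository.

def plan_sync(head_revs: dict, client_has: dict):
    """Return (files to send, files to remove) for one client.

    head_revs  maps file path -> newest revision number in the repository.
    client_has maps file path -> revision number currently on the client.
    """
    to_send = sorted(
        path for path, rev in head_revs.items() if client_has.get(path) != rev
    )
    to_remove = sorted(path for path in client_has if path not in head_revs)
    return to_send, to_remove


# Invented example data.
head = {"src/main.c": 7, "src/util.c": 3, "doc/readme": 2}
have = {"src/main.c": 7, "src/util.c": 2, "src/old.c": 1}

send, remove = plan_sync(head, have)

if __name__ == "__main__":
    print("send:  ", send)
    print("remove:", remove)
```

Because the comparison runs entirely against metadata held on the server, no directory scans or file reads cross the network; only the files that actually differ need to be transferred.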

PERFORCE's requirements for network bandwidth are at their highest when transferring files from the repository to the client or vice versa. Basically, the network can limit the file transfer rate. How long the user has to wait then depends on the number and size of files being transferred. For change submissions, the files being transferred are the ones changed by the user and are often fairly small in number. When the client is synchronizing with the repository, pulling down the combined efforts of the other clients of the repository, the number of files can grow quite large.

Note that the network is not always the limiting factor: at network speeds over 1Mb/s (e.g. T1, T3, Ethernet) it is often the client's disk that limits the file transfers. This happens not because the quantity of data is so large, but because writing new files on a client's disk implies creating directory entries. On most operating systems (including UNIX) directory updates are handled synchronously. This improves the integrity of the on-disk data in the case of a power failure or system crash, but carries quite a performance penalty.

Non-file interactions with the repository, such as reporting on changes or editing client specifications, result in very little data transfer and are usually not affected by network bandwidth.

Empirical reports from users support the notion that if the network is 1Mb/s or faster, the interactions with PERFORCE appear to be the same as if the client and server were on the same fast LAN. At 64Kb/s (ISDN speeds), the speed of the network is noticeable when transferring files but not for reporting or other operations. At 28.8Kb/s (modem speeds), all operations are affected but PERFORCE continues to be usable.
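A back-of-the-envelope calculation shows why file transfers are where slow links become visible. The sketch below uses the nominal bandwidths from the tables earlier in this paper and an assumed 1,000,000 bytes of changed files; it ignores protocol overhead, compression, latency, and client disk speed, so the results are lower bounds only.

```python
# Idealized transfer time: payload size divided by nominal link bandwidth.
# The payload size is an assumption chosen for illustration.

def transfer_time_s(size_bytes: int, bits_per_second: float) -> float:
    """Seconds needed to move size_bytes over a link of the given bandwidth."""
    if size_bytes < 0 or bits_per_second <= 0:
        raise ValueError("size must be >= 0 and bandwidth must be > 0")
    return size_bytes * 8 / bits_per_second


# Nominal bandwidths from the LAN and WAN tables (Kb = 1,000 bits).
LINKS = {
    "Modem": 28_800,
    "ISDN": 64_000,
    "T1": 1_000_000,
    "Ether": 10_000_000,
}

PAYLOAD_BYTES = 1_000_000  # assumed size of the files being synchronized

if __name__ == "__main__":
    for name, bps in LINKS.items():
        print(f"{name:6s} {transfer_time_s(PAYLOAD_BYTES, bps):8.1f} s")
```

A report or a client specification edit, by contrast, moves only a few hundred or a few thousand bytes, which is why those operations remain quick at ISDN speeds.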

The conclusion is that PERFORCE can perform SCM over any TCP/IP network, be it the fast LAN, the slower corporate WAN, or the (even slower) Internet.

Three Levels of Security

Once it becomes possible to open up access to the SCM repository to all users on the network, the most immediate concern will be security. The Internet, of course, is not a safe place for the software assets of any company, but even the corporate intranet can be a hostile environment for small groups operating in a large company.

This section discusses the measures that are necessary to protect software developed over the network. The severity of the measure depends on the sensitivity of the software being developed and the hostility of the network.

Level One - PERFORCE Protections

PERFORCE has a built-in mechanism for performing simple host authentication, based on TCP/IP host addresses. Called protections and managed by the protect command, this mechanism allows the PERFORCE repository administrator to limit the hosts that can access the repository. For a corporate intranet, this is often sufficient. It has the advantage that it can selectively enable read or write access to any parts of the repository for any collection of hosts. This makes it easy to give users on certain hosts the ability to look at data in the repository but not modify it. For example, the technical support organization might be permitted to look at source code or software developers may have read-only access to the documentation.

The PERFORCE protections mechanism can also limit access based on user name, but PERFORCE does not authenticate the user name, and so this is not recommended as a security measure. It is instead a safety device to keep users from accidentally modifying data in the repository. For example, if a code line is frozen for a release, that part of the repository may allow read/write access only to users in the product release organization.
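The idea of host-based protections can be modeled in a few lines. The rule table below is hypothetical: its layout, the host patterns, and the path prefixes are invented for illustration and are not the format used by the protect command.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table: (access level, host address pattern, path prefix).
# It models the concept only, not the actual protections syntax.
RULES = [
    ("write", "10.1.2.*", "src/"),  # developers' hosts may modify source
    ("read", "10.1.3.*", "src/"),   # technical support hosts may read source
    ("read", "10.1.2.*", "doc/"),   # developers' hosts may read documentation
]

LEVELS = {"read": 1, "write": 2}    # write access implies read access


def allowed(host: str, path: str, wanted: str) -> bool:
    """True if some rule grants `host` at least `wanted` access to `path`."""
    return any(
        fnmatchcase(host, pattern)
        and path.startswith(prefix)
        and LEVELS[level] >= LEVELS[wanted]
        for level, pattern, prefix in RULES
    )


if __name__ == "__main__":
    print(allowed("10.1.3.7", "src/main.c", "read"))
    print(allowed("10.1.3.7", "src/main.c", "write"))
```

A host that matches no rule gets no access at all, which mirrors the way the mechanism limits the set of hosts that can reach the repository.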

The PERFORCE protections mechanism is not hardened against denial of service attacks (where unauthorized users repeatedly attempt to connect, just to use resources), and it does not encrypt the data passed between the client and server. Thus it is intended for use within corporations where espionage is not a concern. For more adverse environments, firewalls and/or encryption with strong authentication are required.

Level Two - Firewalls

If a company intends to operate in a hostile network environment, either by being connected to the Internet or by being part of a large corporate intranet without reasonable control over who has access to the network, firewalls may be in order. A firewall is a barrier between the trusted LAN and the untrusted wider network. Because PERFORCE uses plain TCP/IP connections, configuring a firewall to permit the authorized PERFORCE use is fairly straightforward. In addition to any other access, the firewall needs only to pass PERFORCE connections (usually port 1666) through from trusted hosts to the PERFORCE repository machine.
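The rule a firewall needs for PERFORCE is simple enough to state as a predicate. The sketch below is not the configuration language of any real firewall product; the trusted network and the repository address are assumed placeholder values, and only port 1666 comes from the text above.

```python
import ipaddress

# Assumed placeholder values for illustration only.
TRUSTED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
REPOSITORY_HOST = ipaddress.ip_address("10.0.0.5")
PERFORCE_PORT = 1666  # the usual PERFORCE port


def permits(src: str, dst: str, dst_port: int) -> bool:
    """True if a connection from src to dst:dst_port should pass the firewall.

    Only connections from trusted hosts to the PERFORCE port on the
    repository machine are passed; everything else is refused.
    """
    source = ipaddress.ip_address(src)
    return (
        ipaddress.ip_address(dst) == REPOSITORY_HOST
        and dst_port == PERFORCE_PORT
        and any(source in network for network in TRUSTED_NETWORKS)
    )


if __name__ == "__main__":
    print(permits("192.0.2.17", "10.0.0.5", 1666))
    print(permits("198.51.100.9", "10.0.0.5", 1666))
```

Because every PERFORCE operation travels over this one TCP connection, no additional ports or protocols need to be opened for file access.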

PERFORCE will work with any properly configured firewall. PERFORCE has been tested and is known to work with the freely available firewall toolkit FWTK from TIS (Trusted Information Systems). TIS also has a commercial product that they support. PERFORCE works well with the FWTK's "plug gateways", which allow a bastion host to handle and forward all connections that come through the firewall.

A firewall itself is fair protection against hostile networks, but doesn't help when sensitive data must travel over those hostile networks.

Level Three - Encryption and Strong Authentication

If a company plans to develop sensitive software over its corporate network or plans to involve the Internet at all in software development, then maximum security measures must be taken: encryption and strong authentication. Strong authentication is a means to verify that the TCP/IP connection coming into the PERFORCE repository server is from a trusted user on a trusted host, not just from someone who has gained physical access to the network. Such authentication mechanisms use public key cryptography. Encryption is simply the scrambling of data so that people snooping on the network can't make use of it.

PERFORCE has been tested and is known to work with SSH (Secure Shell), a free package that supports remote login using strong authentication and encryption. To use PERFORCE with ssh, the user must have login access (via ssh) to the PERFORCE repository machine. When logging in, the user passes a special flag to ssh to turn on a TCP/IP connection proxy. This proxy passes the PERFORCE connection from the client down the secure, encrypted pipe used by ssh to the server.
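As a sketch of what such a session might look like, the helper below builds an ssh command line that uses the -L local port forwarding flag, together with a P4PORT setting that points the PERFORCE client at the local end of the tunnel. The user name, host name, and function names are hypothetical, and the exact flag and variable should be checked against the ssh and PERFORCE versions in use.

```python
# Build the pieces of an ssh-tunneled PERFORCE session.
# The user, host, and function names are hypothetical examples.

def tunnel_command(user: str, server: str,
                   local_port: int = 1666, remote_port: int = 1666) -> list:
    """Return an ssh argument list that forwards local_port to the server.

    -L local_port:localhost:remote_port asks ssh to listen on local_port
    on the client machine and relay each connection, through the encrypted
    session, to remote_port on the machine being logged in to.
    """
    return ["ssh", "-L", f"{local_port}:localhost:{remote_port}", f"{user}@{server}"]


def client_environment(local_port: int = 1666) -> dict:
    """Environment setting that directs the PERFORCE client into the tunnel."""
    return {"P4PORT": f"localhost:{local_port}"}


if __name__ == "__main__":
    print(" ".join(tunnel_command("alice", "p4server.example.com")))
    print(client_environment())
```

While the ssh login stays open, PERFORCE commands run on the client connect to the local port, and ssh carries the traffic to the repository machine inside the encrypted connection.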

In practice, a combination of the above mechanisms will be needed to allow authorized PERFORCE connections while securing the LAN from the hostile, wider network. The secure shell must be used to provide the authentication and encryption, the firewall must be used to block non-secure shell connections, and PERFORCE protections can be used to limit access to selected parts of the repository. The result is a PERFORCE repository that can be securely accessed from anywhere on the Internet or corporate intranet.

Conclusion

Networked Software Development is a real need in many companies right now and is likely to be the norm in the future. PERFORCE has the ability now to support networked software development, and when used in conjunction with a few tools (versions of which are freely available), PERFORCE can make the Internet or corporate intranet as viable a development environment as the LAN.
Copyright 1996 Perforce Software. Comments to info@perforce.com.
Last updated: September 18, 1996