Draft Version 0.9, December 3, 2005
This paper presents the principles underlying software design and the major design practices.
Software design varies in formality according to an organization's culture and the size of a project. Sometimes design work is very informal and is done at the keyboard as the program code is typed in (McConnell, 1993); in this case the source code serves as the design. At other times, design work is part of a more extensive software engineering process in which other activities, such as requirements analysis and design inspections, support the design work. In this case the design is written down using a design notation, possibly with a specialized program, and is usually put under version control.
In all cases software design is a "problem-solving task" and a "highly-creative process" (Budgen, 2003, pp. 18, 32). Budgen makes the case that design is not an analytical process and that it is unlike the scientific approach to problem solving (ibid. p. 32). Consequently, there may be several acceptable design solutions (ibid. p. 19).
On the other hand, scientific methods applied to physical processes can reduce a problem to smaller parts, and an analysis of the accumulated data will show a convergence that points to a single solution (ibid. p. 20). Software design rarely converges; in fact, the opposite is typical. This is because software design problems have the characteristics of "wicked" problems (ibid. p. 19). Some characteristics of "wicked" problems are:
Given that several acceptable design solutions may exist, the goal of design becomes finding the best solution. The best solution for a software design is the least complex solution (Yourdon and Constantine, 1979). Such a design minimizes the greatest cost factors of software development: implementation, maintenance, and modification. Finding the "best solution" requires a way to quantify quality in design; that is, we need the attributes of quality and a means of assessing them (Budgen, 2003, p. 75).
Budgen argues that the underpinning principles of design quality have changed little since their introduction in the 1970s (ibid. p. 445). However, the pace of change in software technologies has increased, and the demand for faster delivery has put pressure on software professionals to find faster ways of doing things. Budgen finds this to be a major influence in the rise of agile methods (ibid. p. 443). Another influence on design practice is market strategy, in which competition for market space and other non-technical factors influence a design. Budgen concludes, however, that the most important new idea in software design is the emergence of architectural style (ibid. p. 446).
Each design methodology includes a set of graphical symbols for representing and archiving design data. There is no theoretical model that describes design notations, nor are there any "empirical investigations" that can be used to determine when to use a particular form or how a particular form should be used (ibid. p. 445). Even the relatively new Unified Modeling Language (UML), which has wide support, leaves plenty of ambiguity in a design: you cannot guarantee that two programmers will produce the same code from a UML diagram.
This paper attempts to show what software design is and how it is done. We start by stating that design is simply "a problem-solving task" and that a software design describes how requirements (e.g. in a requirements specification) are to be met (ibid. p. 18).
Probably the best description of design as it applies to software is found in Budgen (ibid.). He assembles work from both inside and outside the software industry to show how design is used to solve problems, and how software design differs from design in other engineering disciplines.
Budgen builds a convincing case that software design is different from, and more difficult than, design in other engineering disciplines, and he summarizes the differences in four main characteristics (ibid. p. 16):
Clearly, we need tools that can improve the situation. Budgen argues that the most practicable tools available are (ibid. p. 18):
Missing from the list is the CASE tool. While CASE tools may have the capability to check a design for "consistency and completeness," these tools are not yet capable of assessing the degree to which a design addresses the requirements of a system (ibid. p. 447).
Design models have a fundamental role in software design (ibid. p. 23, paraphrased):
There are two design models. Each design method belongs to one of the two models:
A design method - whether graphical or mathematical - consists of two major components (ibid. p. 34):
In addition to these two major components, most design methods include a "set of heuristics" that guide process activities for specific types of problems (ibid. p. 34).
Representations are useful in their ability to convey domain knowledge by illustrating concepts such as information flow and operations. Software design processes are useful in their ability to guide tasks (despite a lack of theoretical underpinnings that explain why this is). Such tasks include:
One of the main benefits of design representations is that they serve as a tool for knowledge transfer. Design documents also 1) record design decisions and (ideally) the rationale behind them, and 2) serve as a reference during construction and maintenance. Software designs are typically written down for large, more formal projects but not for most smaller projects, and this is the source of many problems.
When the design is not available in hard copy or electronic form, there is no vehicle for knowledge transfer. When new employees have questions about an application's design, they must rely on current employees for answers. There is a tendency to lose track of design decisions when they are not written down. This increases maintenance cost. When a system's design is not available, testing is hampered because there is no document specifying functionality that should be tested.
Recording of design decisions usually does not (but should) include the rationale behind the decision (ibid. p. 39). Budgen notes that the design audit (a.k.a. design inspection) encourages recording of design rationale.
Systems that evolve require more forethought in the design stage. Budgen cites the work of Lehman and Ramil (2002) who found that there are generally two kinds of systems (ibid. p. 58):
Even formal specifications can change from time to time. Parnas (1979) argues that a design should be based on the assumption that the product will need to be changed at some point.
A single person may be responsible for a design, or a team may share the responsibility. What does it take for a person or team to be successful, or even exceptional, at design? Budgen cites work by Curtis et al. (1988), who found that exceptional designers possess three characteristics (ibid. p. 32):
One skill that is often missing in exceptional designers: good programming skills.
When design is performed by a team, coordinating design tasks and dividing up the product are additional complications (Budgen, 2003, p. 40). There are two kinds of team organization:
Research in the psychology of teams has found:
In order to measure design quality we need to know what the quality factors are and we need methods to measure them. Budgen described a group of quality factors (ibid. p. 70):
Measures of design quality, to which the above quality factors are applied, include:
Simplicity of design means it includes nothing more than what is essential. Simplicity is a measure of maintainability and testability, and to a lesser degree reliability and possibly efficiency (ibid. p. 76). Design simplicity is usually described in terms of complexity:
These ideas are examined more thoroughly in other work. For example, Yourdon and Constantine (1979) describe factors of complexity, what they call "complexity in human terms" (p. 73): 1) size of a module; 2) number of decision-making statements; 3) span of data elements (e.g. between uses of a variable); 4) span of control flow (e.g. between an entry point and exit point). In summary, Yourdon and Constantine find that complexity is determined by:
Yourdon and Constantine state that their discussion of complexity focuses on intermodular interfaces (ibid. p. 74). Furthermore, they confine their definition of "amount of information" to the number of arguments passed in a call to a routine, that is, to what crosses the intermodular interface.
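To make "amount of information" concrete, the following Python sketch counts the parameters in a routine's signature. Python is used only for illustration, since the paper itself contains no code; the names `interface_width`, `rectangle_area`, and `rectangle_area_report` are hypothetical, and treating parameter count as the whole measure is a simplifying assumption that ignores the accessibility factors Yourdon and Constantine also consider.

```python
import inspect


def interface_width(routine) -> int:
    """Return the number of parameters in a routine's signature.

    A crude proxy for the "amount of information" crossing the interface;
    it ignores how easy each argument is to understand.
    """
    return len(inspect.signature(routine).parameters)


def rectangle_area(width: float, height: float) -> float:
    return width * height


def rectangle_area_report(width: float, height: float, units: str,
                          precision: int, verbose: bool) -> str:
    # Same core calculation, but the caller must now understand and
    # supply three more arguments.
    text = f"{round(width * height, precision)} {units}^2"
    return f"area = {text}" if verbose else text


print(interface_width(rectangle_area))         # 2
print(interface_width(rectangle_area_report))  # 5
```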
Yourdon and Constantine describe "accessibility of the information" as applying to the use of the interface: what a programmer must know to interpret the code. They give four factors (ibid. p. 76):
Modularity of design means the extent to which it is divided into components that
Modularity is measured in terms of coupling and cohesion. Modularity is a measure of maintainability, testability, and possibly usability and reliability (Budgen 2003, p. 77).
Coupling is a measure of the type and strength of connections between modules. Highly coupled modules have strong interconnections; loosely coupled modules have weak interconnections; uncoupled modules have none. A programmer who is coding, debugging, or modifying one of two (or more) highly coupled modules is more likely to find that changes are required in the other modules as well. Coupling is a relative measure indicated by four factors (Yourdon and Constantine 1979, p. 86), listed here in decreasing magnitude of effect on coupling:
From a quality perspective, we are concerned mostly with the type of information flow. Yourdon and Constantine (ibid.) describe forms of coupling in detail but their work is less accessible today because their examples use machinery like paper tape readers and punch cards as input, punched tape as output, and they refer to language constructs that are no longer mainstream. McConnell's work (1993) is much more accessible.
There is no standardized empirical measure of coupling, and names of coupling forms vary from one author to another. However, Budgen states that knowledge of the presence of particular forms of coupling is more useful to the designer than the extent of any particular form (2003, p. 78).
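The difference between data coupling and stamp coupling (McConnell's simple-data and data-structure coupling) is easiest to see side by side. The following is a hypothetical Python illustration, not an example taken from the cited sources; the `Employee` record and both pay routines are invented for the purpose.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    hourly_rate: float
    hours_worked: float
    address: str


# Data coupling: only the elementary values the routine needs cross
# the interface.
def gross_pay(hourly_rate: float, hours_worked: float) -> float:
    return hourly_rate * hours_worked


# Stamp (data-structure) coupling: the whole record crosses the interface
# although only two fields are used, so the routine now depends on the
# layout of Employee.
def gross_pay_from_record(employee: Employee) -> float:
    return employee.hourly_rate * employee.hours_worked


ada = Employee("Ada", 40.0, 37.5, "12 Main St")
print(gross_pay(ada.hourly_rate, ada.hours_worked))  # 1500.0
print(gross_pay_from_record(ada))                    # 1500.0
```

Both calls return the same value; the difference is that `gross_pay_from_record` must change if `Employee` is restructured, even though it uses only two of its fields.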
Table of coupling forms in decreasing order of desirability (Sources: Yourdon and Constantine (1979), Martin and McClure (1985), McConnell (1993), Budgen (2003)).
|Yourdon and Constantine||Martin & McClure||McConnell||Budgen|
|Data coupling or input-output coupling||Data coupling||Simple-data coupling||Data coupling|
|Not mentioned||Stamp coupling||Data-structure coupling||Stamp coupling|
|Control coupling - activating||Control coupling||Control coupling||Control coupling - activating|
|Control coupling - co-ordinating||Control coupling||Control coupling||Control coupling - co-ordinating|
|Common-environment coupling||Common coupling||Global-data coupling||Common-environment coupling|
|Not mentioned||Content coupling||Content coupling||Not mentioned|
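Two of the less desirable forms in the table, control coupling and common-environment (global-data) coupling, can be sketched the same way. The routines below are hypothetical Python examples assumed for illustration only: in the first, a flag argument steers the callee's logic; in the second, two routines communicate through a shared module-level variable rather than through parameters.

```python
# Control coupling: the caller passes a flag that selects the callee's
# internal logic, so the caller must know how the callee works inside.
def format_total(values: list[float], as_percentage: bool) -> str:
    total = sum(values)
    if as_percentage:
        return f"{total * 100:.1f}%"
    return f"{total:.2f}"


# Common-environment (global-data) coupling: the two routines below
# communicate through a shared module-level variable, not through
# parameters, so each depends on how and when the other uses it.
_current_tax_rate = 0.0


def set_tax_rate(rate: float) -> None:
    global _current_tax_rate
    _current_tax_rate = rate


def price_with_tax(price: float) -> float:
    return price * (1 + _current_tax_rate)


print(format_total([0.25, 0.5], as_percentage=True))   # 75.0%
print(format_total([0.25, 0.5], as_percentage=False))  # 0.75
set_tax_rate(0.25)
print(price_with_tax(10.0))                            # 12.5
```

The result of `price_with_tax` depends on whether, and with what value, `set_tax_rate` was called earlier, a dependency that cannot be seen in either routine's parameter list.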
McConnell (1993) gives a thorough as well as accessible description of coupling, from a programmer's point of view (p. 87). In order of decreasing coupling:
Cohesion is a measure of the functional relatedness of the components of a module (Yourdon and Constantine 1979, p. 106; Budgen 2003, p. 78). Cohesion is directly related to a module's relation to the problem solution (Yourdon and Constantine 1979, p. 106); as cohesion increases, overall complexity decreases.
Table of cohesion forms in decreasing order of desirability. McConnell places temporal cohesion above procedural cohesion (1993, p. 83). Sources: Yourdon and Constantine (1979), Martin and McClure (1985), McConnell (1993), Budgen (2003).
|Yourdon and Constantine||Martin & McClure||McConnell||Budgen|
Cohesion levels are determined by seven distinct associative principles. In order of increasing cohesion they are:
Elements (lines of code) in a coincidentally cohesive module have no relationship to one another. This typically occurs as the result of modularizing existing code to separate out redundant code (Yourdon and Constantine 1979, p. 109).
Yourdon and Constantine (ibid.) take an example where a programmer following structured techniques ends up with duplicated code in many places. To reduce the duplication and create a more compact program, the programmer replaces the duplicate code with function calls, and puts the code in a separate module or modules. These modules and the functions within have no relationship to each other, and are coincidentally cohesive.
Why is this worse than leaving duplicated code alone? It depends, but two problems are common. The modules created to reduce duplicated code are often not suited for reuse in other applications. They can also become a maintenance problem, e.g. when the functionality of the routine is not the same each time it is called.
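A hypothetical Python sketch of the situation Yourdon and Constantine describe: three unrelated statements that happened to appear together in several places are extracted into one routine. The name `misc_step` and its contents are assumptions for illustration; the point is that no single verb-and-object name could describe what it does.

```python
# Coincidental cohesion: three statements extracted only because the same
# sequence was duplicated in several places. They share no purpose: one
# cleans a name, one converts a temperature, one pads an invoice number.
def misc_step(name: str, celsius: float, invoice_no: int) -> tuple[str, float, str]:
    cleaned_name = name.strip().title()
    fahrenheit = celsius * 9 / 5 + 32
    padded_invoice = str(invoice_no).zfill(6)
    return cleaned_name, fahrenheit, padded_invoice


print(misc_step("  ada lovelace ", 100.0, 42))  # ('Ada Lovelace', 212.0, '000042')
```

If one caller later needs a different invoice width, changing `misc_step` also changes the behavior for every other caller, which is the maintenance problem noted above.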
While the elements of a logically-cohesive routine have some relationship to each other, the whole does not perform a function. The elements have a closer tie to the problem solution than coincidentally cohesive elements: they perform logically similar operations (ibid. p. 114).
McConnell (1993) describes an example of logical cohesion where the operations in a function are separated by some control structure, and the set that executes is determined by a control flag that's passed in the parameter list.
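A minimal Python sketch of the control-flag pattern described above, followed by one possible refactoring. The record formats and the names `read_record`, `read_customer`, and `read_order` are hypothetical.

```python
# Logical cohesion: one routine holds several logically similar operations
# (each "reads a record"), and a control flag chooses which one runs.
def read_record(source: str, kind: str) -> dict:
    if kind == "customer":
        name, city = source.split(",")
        return {"name": name, "city": city}
    elif kind == "order":
        order_id, amount = source.split(",")
        return {"id": int(order_id), "amount": float(amount)}
    raise ValueError(f"unknown record kind: {kind}")


# One possible refactoring: a separate routine per operation, so no flag
# is needed and each routine does a single, nameable job.
def read_customer(source: str) -> dict:
    name, city = source.split(",")
    return {"name": name, "city": city}


def read_order(source: str) -> dict:
    order_id, amount = source.split(",")
    return {"id": int(order_id), "amount": float(amount)}


print(read_record("Ada,London", "customer"))  # {'name': 'Ada', 'city': 'London'}
print(read_order("7,19.5"))                   # {'id': 7, 'amount': 19.5}
```

The `kind` flag in `read_record` is also an instance of the control coupling listed in the coupling table.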
Temporally cohesive elements are related by time: for example, all of the elements execute to perform some operation in a given time period, such as start-up or shutdown. This is also a form of logical cohesion, because the elements are related to each other.
Yourdon and Constantine found that whenever you have logical cohesion without temporal cohesion, such code is usually "tricky, obscure, or clumsy code which is difficult to maintain and modify" (1979, p. 116).
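A hypothetical start-up routine in Python showing temporal cohesion: the statements belong together only because they all run when the program starts. The dictionary keys and the tax-rate value are invented for illustration.

```python
# Temporal cohesion: these statements are grouped only because they all
# run at start-up; each one serves a different concern.
def initialize_application(state: dict) -> None:
    state["open_files"] = []   # file handling
    state["error_count"] = 0   # error reporting
    state["page_number"] = 1   # report formatting
    state["tax_rate"] = 0.07   # business rules (illustrative value)


app_state: dict = {}
initialize_application(app_state)
print(app_state)  # {'open_files': [], 'error_count': 0, 'page_number': 1, 'tax_rate': 0.07}
```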
Procedurally-cohesive operations are related by elements of processing such as looping or a decision process; a simple sequence of steps also relates elements of processing.
Yourdon and Constantine describe a design process where a flowchart of a process is used to determine where it should be separated into routines and modules. The results varied widely and often tended to produce lower cohesion. They discovered that higher forms of cohesion resulted when data relationships were separated from control features (ibid. p. 117).
Procedural cohesion typically results when using models of a process to determine code module structure. Modules that have procedural cohesion typically contain only part of a function, an entire function and parts of other functions, parts of several functions, or even a single function (ibid. p. 118). Sometimes a procedurally-cohesive module that performs a distinct task is acceptable, in terms of ease of maintenance and ties to problem structure.
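The following hypothetical Python routine illustrates procedural cohesion as defined above: its two activities are grouped because they share a loop, not because they contribute to one function, and no data flows from one activity to the other. The `session` counter is an invented example.

```python
# Procedural cohesion: two activities share a loop (a control structure)
# but not a purpose, and no data passes from one activity to the other.
def format_orders_and_tick(order_lines: list[str], session: dict) -> list[str]:
    formatted = []
    for line in order_lines:
        formatted.append(line.upper())   # activity 1: format an order line
        session["ticks_left"] -= 1       # activity 2: count down a timeout
    return formatted


session = {"ticks_left": 5}
print(format_orders_and_tick(["2 widgets", "1 gadget"], session))  # ['2 WIDGETS', '1 GADGET']
print(session["ticks_left"])                                       # 3
```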
Communicational cohesion is the lowest type of cohesion where processing elements have a structural relationship to the problem. McConnell (1993) gives a description where all operations make use of the same data and the order in which those operations occur is not important. There is no single name that clearly describes the function.
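A hypothetical Python sketch of communicational cohesion: both operations read the same employee record and could run in either order, yet they do not form a single function, which is why the name needs an "and". The field names are assumed for illustration.

```python
# Communicational cohesion: both operations use the same input record and
# could run in either order, but together they are not one function.
def summarize_and_validate(employee: dict) -> tuple[str, list[str]]:
    summary = f"{employee['name']} ({employee['department']})"

    problems = []
    if not employee["name"]:
        problems.append("missing name")
    if employee["salary"] < 0:
        problems.append("negative salary")

    return summary, problems


record = {"name": "Ada", "department": "R&D", "salary": -1}
print(summarize_and_validate(record))  # ('Ada (R&D)', ['negative salary'])
```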
Sequential cohesion describes the situation where the flow of data proceeds sequentially through a set of functions; or, where operations on data proceed sequentially through a specific order and the data is shared from step to step, and the operations don't make a complete function.
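A hypothetical Python sketch of sequential cohesion: each step's output feeds the next step, but the routine covers only part of a larger task. The comma-separated line format is an assumption for the example.

```python
# Sequential cohesion: each step's output is the next step's input, but the
# routine covers only part of a larger task (pricing an order); applying tax
# and formatting the invoice would happen elsewhere.
def parse_and_total(line: str) -> float:
    fields = line.split(",")                                 # step 1: split raw text
    quantity, unit_price = int(fields[1]), float(fields[2])  # step 2: convert fields
    return quantity * unit_price                             # step 3: compute subtotal


print(parse_and_total("widget,3,2.5"))  # 7.5
```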
All forms of cohesion discussed so far can result from flowchart design, which allows confusion of data flow with control flow (Yourdon and Constantine, 1979 p. 126). This happens because a single step on a flowchart may represent code from an entire function, part of a function, a separate program such as a DLL, etc.
Functional cohesion is the highest level of cohesion. It describes the case where a function does one thing and fulfills what its name says it does (ibid. p. 127). This brings up the subject of routine names: in the context of cohesion, a routine name should be a clear verb-and-object combination.
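By contrast with the forms above, a functionally cohesive routine can be described completely by a verb-and-object name. The two hypothetical Python routines below each do one thing; the tax rate and currency format are illustrative assumptions.

```python
# Functional cohesion: each routine does exactly one thing, and a plain
# verb-and-object name describes all of it.
def compute_sales_tax(subtotal: float, rate: float) -> float:
    return subtotal * rate


def format_currency(amount: float) -> str:
    return f"${amount:,.2f}"


print(compute_sales_tax(200.0, 0.25))  # 50.0
print(format_currency(1234.5))         # $1,234.50
```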
A problem with the structured concepts of coupling and cohesion is that they cannot be objectively assessed (Yourdon and Constantine p. 132; Budgen p. 78) (though work continues in an effort to change this - see for example Bieman and Kang, 1998, or Kramer and Kaindl, 2004).
Information hiding means to hide details of data structures inside a module. Only routines that are part of the module have direct access to the data structures; routines that are not part of the module must call routines within the module to work with the data structures.
The term "information hiding" may have originated in the work of Parnas (1972), who describes a module as "a responsibility assignment" (p. 1054), which is "characterized by its knowledge of a design decision which it hides from all others. Its interface or definition was chosen to reveal as little as possible about its inner workings" (ibid. p. 1056).
Information hiding is related to the quality factors reliability and maintainability (Budgen p. 79).
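A minimal Python sketch of Parnas's idea: the hypothetical `SymbolTable` module hides one design decision, how entries are stored, behind three routines, so callers never touch the storage directly. Python does not enforce this; the leading underscore on `_entries` is only a naming convention, so the hiding here is by agreement rather than by the language.

```python
class SymbolTable:
    """Hypothetical module that hides one design decision: how entries
    are stored. Callers use define/lookup/is_defined and never see the
    dictionary, so it could be replaced (say, by a sorted list) without
    changing any caller."""

    def __init__(self) -> None:
        self._entries: dict[str, int] = {}  # internal by convention only

    def define(self, name: str, value: int) -> None:
        self._entries[name] = value

    def lookup(self, name: str) -> int:
        return self._entries[name]

    def is_defined(self, name: str) -> bool:
        return name in self._entries


table = SymbolTable()
table.define("x", 3)
print(table.lookup("x"))      # 3
print(table.is_defined("y"))  # False
```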
The product of software design, also known as implementation design, is the technical specification. This document or set of documents usually includes flowcharts presenting the process model, data flow, and process flow cycles, as well as a form-level hierarchy (i.e. input forms and dialogs). The technical specification describes a program's operating environment, interfaces, functions, and modules.
Several documents fall into the category of design specifications. The particular documents you create will depend on the design method you use.
Diagrams give us a way to communicate complex ideas: what we are capable of thinking depends on the language we use for thinking (Martin and McClure, 1985 p. 109). Diagrams extend our vocabulary, and different kinds of diagrams extend our vocabulary in different ways.
Diagrams are tools that communicate information about a system and are thus particularly useful for (ibid., paraphrased):
As better, more rigorous methods for specifying systems are created, new diagramming methods will be needed (ibid. p. 110).
Diagrams are used in four areas of design (ibid. p. 111):
Many of the early diagramming techniques have fallen into disuse. While some have been supplanted by newer UML techniques, others have no counterpart in UML. It is worth discussing these, if only briefly, because they are still useful. Martin and McClure's review of structured diagramming techniques concludes with a recommendation of the best of the techniques before UML and object-oriented methods appeared (ibid. p. 396). They recommend:
All design methodologies have the same goals, but some methods work better on certain kinds of problems than other methods. Often you will choose a design method that complements the programming environment. Most languages in common use today have object-oriented (OO) capabilities.
Design principles developed in the area of structured design are not made obsolete by principles developed in the area of object-oriented (OO) design. Structured design concepts such as cohesion and coupling describe fundamental characteristics of all software that has routines or modules.
Structured design is based on theories developed by Bohm and Jacopini (1966; Yourdon and Constantine, 1979, p. 73). Yourdon and Constantine describe structured design as "a collection of guidelines for distinguishing between good designs and bad designs, and a collection of techniques, strategies, and heuristics that generally leads to good designs..." (p. 15).
In the quest for the least complex design, they present a body of evidence that shows that the least complicated design consists of small modules that are relatively independent of each other but that relate easily to the application (ibid. p. 29). Two principles of structured design help focus the design effort to achieve these characteristics:
The unit of structured design is the function. Structured methodologies apply strategies and tools that help you
Structured design emphasizes two approaches:
Structured design is concerned with relationships between pieces of a system. We refer to these pieces as modules, though the term 'module' does not necessarily indicate a group of functions. Some qualities of modularity are apparent only in the relationship between modules. Yourdon and Constantine introduce the concepts of coupling (p. 85) and cohesion (p. 106) for this purpose. These concepts are the theoretical basis of structured design and the focus of this section.
Bieman, J. M. and Kang, B-K, 1998, Measuring Design-level Cohesion: IEEE Transactions on Software Engineering, v. 24, no. 2, pp. 111-124.
Bohm, Corrado and Jacopini, Giuseppe, 1966, Flow Diagrams, Turing Machines and Languages with Only Two Formation Rules: Communications of the ACM, v. 9, no. 5, pp. 366-371.
Budgen, David, 2003, Software Design (2nd ed.): Pearson Education Limited/Addison-Wesley.
Kramer, S., and Kaindl, H., 2004, Coupling and Cohesion Metrics for Knowledge-Based Systems Using Frames and Rules: ACM Transactions on Software Engineering and Methodology, v. 13, no. 3, pp. 332-358.
Martin, David and Estrin, Gerald, 1967, Models of Computations and Systems - Evaluation of Vertex Probabilities in Graph Models of Computations: Journal of the ACM, v. 14, no. 2, pp. 281-299.
Martin, J., and McClure, C., 1985, Structured Techniques For Computing: Prentice-Hall.
McConnell, Steve, 1993, Code Complete: A Practical Handbook of Software Construction: Microsoft Press.
Meyer, Bertrand, 1997, Object-Oriented Software Construction: Prentice-Hall, 1254 pages.
Parnas, D. L., 1972, On The Criteria To Be Used In Decomposing Systems Into Modules: Communications of the ACM, v. 15, no. 12, pp. 1053-1058.
Yourdon, Edward and Constantine, Larry L., 1979, Structured Design: Fundamentals of a Discipline of Computer Program and Systems Design: Prentice-Hall, 473 pages.
Zhu, Hong, 2005, Software Design Methodology: From Principles to Architectural Styles: Butterworth-Heinemann, 368 pages.