Info OOP Review

From CSDMS
Revision as of 18:05, 8 October 2008 by Huttone (talk | contribs) (Create object oriented programming review page)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Review of Object-Oriented Programming Concepts

To understand component-based programming, which is the subject of the next section, it is necessary to first understand the basic concepts of object-oriented programming. Readers already familiar with object-oriented programming may wish to skip to the next section. This section provides only a brief discussion of object-oriented programming but builds up the key concepts in essentially the same way that they were developed historically. Readers interested in learning more can find a wealth of materials online by typing any term highlighted here into a search engine or an online encyclopedia like Wikipedia.

Data Types

When you stop to think about it, it is amazing how quickly programming languages have evolved during the relatively short time that computers have been around. To get a sense of the timespan involved, note that Fortran (The IBM Mathematical Formula Translation System) was developed in the 1950s by IBM as an alternative to assembly language for scientific computing. ALGOL (Algorithmic Language) followed in the mid 1950s and Lisp was introduced in 1958. BASIC (Beginners All-purpose Symbolic Instruction Code) was developed for use by non-science students at Dartmouth in 1964. The first personal or home computers (RadioShack TRS-80, Commodore PET and Apple II) became available in 1977 and were usually equipped with BASIC.

These early languages allowed users to work with different types of data or data types, such as boolean or logical (true/false), integers and floating-point (or decimal) numbers. These are now referred to as primitive data types (or basic types) and are available in every computer language. Early on, it was possible to store either a single number or an array of numbers as long as every number in the array had the same primitive type. Soon after, data types to store textual characters (letters, digits, symbols and white space) and arrays of characters or strings were introduced. Fortran also introduced a separate data type for complex numbers, which are simply pairs of decimal numbers (for the real and imaginary parts).

As computer programs became more complex, two important themes began to emerge, and these led to the introduction of new computer languages and to the eventual extension of existing languages. One theme was that of structured or modular programming, in which the goal was to streamline code design in ways that made it easier to understand, extend and maintain. This meant breaking large programs into smaller units and doing away with (or at least recommending against) goto statements in favor of for, while and repeat loops. A second major theme had to do with allowing programmers to create their own composite data types (or user-defined types) from the primitive data types. For example, Pascal was introduced in 1970 and allowed users to define their own data types called records. Records are collections of variables that may have different data types but that can be grouped together under a single name and treated as a single unit. In Pascal, the individual variables in the record are referred to as fields. Similarly, C was introduced in 1972 at Bell Labs and allowed users to define their own data types called structures, with the component variables being called members. The classic example of the kind of data that can be stored effectively in a record is employee data. Here, the individual fields might include things like the employee's name, title, date of birth, salary, address, phone number and so on. For each of these fields, there is a particular primitive type that is appropriate for storing the data, such as a string for the name and a floating-point number for the salary. Just as with the primitive types, it is then possible to work with arrays of records (or structures). Such an array could store the data for all employees that work for a particular company. With records and structures a new, simple and yet powerful syntax was introduced which is still in use today. In this syntax, a dot is placed between the record name and the field name, as in "employee.name", "circle.radius" or "csdms.colorado.edu".

Object-oriented programming can be viewed as a natural and yet very powerful extension of the concept of composite data types. The key idea is to bundle not only related items of data but also one or more functions that act on this data, into a single, user-defined type. Just as the terms structure (in C) and record (in Pascal) refer to user-defined "bundles of data" (and the keywords struct and record are used in code when defining these new data types), the term class is used in most (if not all) object-oriented languages to refer to this new type of bundle which can potentially contain both data and associated functions. (Note that a class doesn't need to have any functions, and in a sense even the primitive data types can be viewed as trivial classes.) You define a class just as you would define a new data type, and then individual "variables" that have this data type are referred to as objects or members of the class. In C++, the terms member data and member function are used to refer to data items and functions that are associated with a class. The analogous terms in Java are fields (as in Pascal) and methods. For example, you might define a class called SongWriter, which has fields called "number_of_songs" and "biggest_hit" and methods called "sing" and "write". You could then instantiate two objects of the SongWriter type with names like DonnieIris and Shakira. In C++, a structure (struct) is the same as a class, except that the members of a structure are all public by default while the members of a class are all private by default.

Footnote: Technically, the syntax of an object-oriented programming language only makes it look as though the functions are bundled with the data. One can think of the member functions as only having one instance, which is part of the class definition rather than part of each individual object. Regardless of how this illusion is actually achieved, however, this bundling allows the data and a set of associated functions to be accessed under a single name, and with a similar syntax.

The first object-oriented language is considered to be Simula 67, which was standardized in 1967. It introduced objects, classes, subclasses, virtual methods, coroutines and also featured garbage collection. It wasn't until twelve years later, in 1979, that the language "C with Classes" was developed. This new language, which was renamed to C++ in 1983, was directly influenced by Simula 67 and played a key role in helping object-oriented programming to become as widespread as it is today. Python, Java and C# are some of the most popular object-oriented languages today, but were introduced much more recently (1991, 1995 and 2001, respectively). Languages that are not object-oriented, such as C and Fortran, are often referred to as procedural languages. Keep in mind, however, that the member functions of a class are still just subroutines or procedures.

Finish this section with short discussion about: Typecasting, return type, interfaces, need for glue code

UML Class Diagrams

UML stands for Unified Modeling Language and is essentially a set of rules for creating "box diagrams" or "blueprints" that show class attributes and relationships between classes (such as inheritance) in an object-oriented application. You can learn more about UML at: [1]. An example of a fairly complicated UML diagram is given in Appendix B: UML Class Diagram for OpenMI.

Namespaces and Scope

Package naming convention, reverse URL, to ensure a unique namespace: e.g.

  • gov.cca.Services,
  • org.oms.model.components,
  • Oatc.OpenMI.Sdk.Backbone,
  • edu.colorado.csdms.utils (or org.csdms.utils ??),

Information Hiding and Encapsulation

Access levels: private/public/friendly/protected

  • public: a field, method, or class that is accessible to every class.
  • protected: a field, method, or class that is accessible to the class itself, subclasses, and all classes in the same package or directory.
  • friendly: a field, method, or class that is accessible to the class itself and to all classes in the same package or directory. Note that friendly is not a separate keyword. A field or method is declared friendly by virtue of the absence of any other access modifiers.
  • private: a field or method that is accessible only to the class in which it is defined. Note that a class cannot be declared private as a whole.

Given the fundamental idea of classes and their associated data and functions, there are a number of other concepts that follow more or less naturally. These are important but are not necessarily critical to understanding what component-based programming is all about. The interested reader may wish to look up the following terms in Wikipedia: inheritance, polymorphism, overloading, accessors (setters and getters), constructors and destructors.

Inheritance

Polymorphism and Overloading