
Contributions to the GNU Compiler Collection. (Free Software Foundation goals for the operating system)

Publication: IBM Systems Journal
Publication Date: 01-JUL-05

The GNU Compiler Collection (GCC) is an optimizing compiler for the GNU project that is capable of generating code for a variety of platforms and that supports a number of languages, computer architectures, and operating systems. (1-3) It is one of the most visible aspects of the Free Software Foundation (FSF) GNU project. The goal of the GNU Project is to create a UNIX **-style operating system composed of free software.

The GCC compiler can be configured to generate code for more than 30 different computer architectures. Many architecture configurations are designed to support multiple operating systems, including the GNU system and GNU/Linux **. The primary languages available with the compiler are C, C++, Fortran, Java **, Objective C, and Ada. Runtime libraries for the languages are also included in the compiler suite. Support for additional languages is currently in various stages of development.

The GNU Project includes an assembler, linker, and other object file tools, commonly called "binutils," the GDB debugger, and glibc (GNU C library). Together, these components provide a software development environment or "tool chain."

GCC structure

GCC was one of the first components of the GNU Project. Richard Stallman initially tried to extend the Pastel compiler, developed by the Lawrence Livermore National Laboratory, but needed to rewrite the compiler from scratch due to technical limitations of the Pastel compiler.

The compiler was initially targeted at the common microprocessors of the late 1980s, such as the Motorola 68000, and was ported to other CISC (complex instruction set computer) processors, such as the Intel 80386. GCC initially parsed source code one statement at a time, focusing on local optimizations. One of the important optimizing phases from the earliest versions of GCC is a phase called combine that operates as a generalized peephole optimizer, reducing multiple instructions into single, more powerful instructions (see "GNU back end").
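The idea behind a combining peephole pass can be illustrated with a small sketch. The code below is not GCC's combine phase; it is a minimal illustration in Python that assumes a hypothetical three-field instruction tuple and a hypothetical target with a fused multiply-add instruction ("madd"). It merges a multiply followed by a dependent add into one instruction, and it assumes without checking that the intermediate result is not used elsewhere.

```python
from typing import List, Tuple

# Hypothetical instruction format: (opcode, destination, operands).
Insn = Tuple[str, str, Tuple[str, ...]]


def combine(insns: List[Insn]) -> List[Insn]:
    """Merge 'mul' followed by a dependent 'add' into a single 'madd'.

    Assumption: the multiply result is dead after the add. A real
    combiner must confirm this from data-flow information first.
    """
    out: List[Insn] = []
    i = 0
    while i < len(insns):
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if (nxt is not None and cur[0] == "mul" and nxt[0] == "add"
                and cur[1] in nxt[2]):
            # The add operand that is not the multiply result.
            other = [op for op in nxt[2] if op != cur[1]]
            if len(other) == 1:
                out.append(("madd", nxt[1], cur[2] + (other[0],)))
                i += 2
                continue
        out.append(cur)
        i += 1
    return out


program = [
    ("mul", "t1", ("a", "b")),
    ("add", "r", ("t1", "c")),
    ("sub", "s", ("r", "d")),
]
combined = combine(program)
```

Here the first two instructions collapse into one "madd" while the unrelated subtract is left untouched.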

Recent improvements have expanded the compiler's view of the program to focus on one function at a time, the translation unit, or the whole program. These changes allow more aggressive optimizations, including inter-procedural analysis. Other recent enhancements include the addition of a Static Single Assignment (SSA) design with basic SSA-based scalar optimizations, high-level loop transformations, and vectorization.
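As a rough illustration of what SSA form means, the following Python sketch renames variables in straight-line code so that each name is assigned exactly once. It is a simplification under stated assumptions: the statement format is hypothetical, there is no control flow, and therefore no phi functions are needed; it is not how GCC builds SSA.

```python
from typing import Dict, List, Tuple

# Hypothetical statement format: (target, operand names).
Stmt = Tuple[str, List[str]]


def to_ssa(stmts: List[Stmt]) -> List[Stmt]:
    """Give every assignment a fresh version of its target name.

    Names that are used before any assignment are treated as inputs
    and keep version 0.
    """
    version: Dict[str, int] = {}
    out: List[Stmt] = []
    for target, operands in stmts:
        # Uses refer to the most recent version of each operand.
        new_ops = [f"{name}_{version.get(name, 0)}" for name in operands]
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}_{version[target]}", new_ops))
    return out


# x = a + b; x = x + c; y = x + a
ssa = to_ssa([("x", ["a", "b"]), ("x", ["x", "c"]), ("y", ["x", "a"])])
```

After renaming, the second assignment to x defines x_2 and reads x_1, which makes the definition reaching each use explicit.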

The compiler phases for GCC 4.0 first parse the input program into an intermediate representation called GENERIC. GENERIC is expanded and lowered into an SSA form called GIMPLE. The compiler optimizes the SSA form and then removes the SSA names. The program statements are translated to a different intermediate language called register transfer language (RTL), which directly corresponds to the instruction set of the target processor (i.e., the "target instruction set"). RTL optimizations that require details about the target processor instructions are applied, such as instruction scheduling, software pipelining, and register allocation.
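The lowering step can be pictured as flattening a nested expression tree into a sequence of simple statements, each with one operator and a temporary result. The Python sketch below shows this under assumptions of our own: the tuple-based tree, the temporary naming scheme, and the function name lower are all hypothetical and do not reflect GCC's internal data structures.

```python
from itertools import count
from typing import Iterator, List, Tuple, Union

# Hypothetical expression tree: a variable name or (operator, left, right).
Expr = Union[str, Tuple[str, "Expr", "Expr"]]
# Lowered statement: (result, operator, left operand, right operand).
Stmt = Tuple[str, str, str, str]


def lower(expr: Expr, out: List[Stmt], temps: Iterator[int]) -> str:
    """Append three-address statements for expr to out; return its result name."""
    if isinstance(expr, str):
        return expr
    op, lhs, rhs = expr
    left = lower(lhs, out, temps)
    right = lower(rhs, out, temps)
    tmp = f"t{next(temps)}"
    out.append((tmp, op, left, right))
    return tmp


# a + (b * c)
stmts: List[Stmt] = []
result = lower(("+", "a", ("*", "b", "c")), stmts, count(1))
```

The multiplication is emitted first into t1, and the addition then reads t1, so every statement refers only to names or temporaries.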

RTL is designed to correspond to valid target instructions. The RTL instruction codes themselves are independent of the target, but the subset of codes used for each target match the machine instructions of the target. Other than missing register numbers and memory offsets, RTL transformations are intended to convert a valid instruction stream into another valid instruction stream (i.e., sequence of instructions).

After all optimizations have been applied, the RTL instruction stream is output as a file in assembly language appropriate for the target system. GCC does not have an integrated assembler and does not generate an object file directly for any target. An external assembler, possibly the GNU Assembler, creates an object file, and an external linker, possibly the GNU Linker, binds the executable or shared object. The operating system may use the GNU C Library to provide an interface to system services.

The GCC compiler is written in the C language, and the source code is composed of files common to all targets and files with specific information about the target architecture, target system, and target file format--the latter referred to as the machine description. Some of the files in the machine description affect the way the common parts of the compiler behave (e.g., the size of data types, size of registers, register allocation order, etc.). Other files are used by programs within the GCC build process to create machine-generated files that interface with the common parts of GCC to describe the target instructions and output format.
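The split between common code and target-specific parameters can be sketched as follows. This Python fragment is only an analogy: the TargetDescription record, its fields, and the two example targets are hypothetical and are not taken from any GCC machine description. It shows common routines whose results change with the data-type sizes and register allocation order supplied by the target.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class TargetDescription:
    """Hypothetical per-target parameters consulted by common code."""
    name: str
    type_sizes: Dict[str, int]     # size of each data type in bytes
    register_bits: int             # width of a general register
    alloc_order: Tuple[str, ...]   # preferred register allocation order


def array_bytes(target: TargetDescription, elem_type: str, length: int) -> int:
    """Common code: storage for an array depends on the target's type sizes."""
    return target.type_sizes[elem_type] * length


def pick_register(target: TargetDescription, in_use: Set[str]) -> Optional[str]:
    """Common code: take the first free register in the target's order."""
    for reg in target.alloc_order:
        if reg not in in_use:
            return reg
    return None  # no register free; the value would have to be spilled


# Two made-up targets that differ only in their parameters.
target32 = TargetDescription("demo32", {"int": 4, "long": 4}, 32, ("r3", "r4", "r5"))
target64 = TargetDescription("demo64", {"int": 4, "long": 8}, 64, ("r3", "r4", "r5"))
```

The same call, array_bytes(target, "long", 10), yields a different size on each made-up target without any change to the common routine.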

GCC development

GCC evolved through the efforts of a worldwide group of developers, including members of industry, academia, and independent consultants. As with many other free-software and open-source projects, the hierarchy of developers strives to achieve a meritocracy. A core set of developers provides most of the technical leadership, and a steering committee provides the political leadership and interface to the FSF.

Developers collaborate in a decentralized fashion with informal collaboration, setting design goals and avoiding duplicated effort. The majority of communications and technical decisions occur in public forums such as mailing lists and chat rooms. The GCC source code is available in revision control systems on publicly accessible servers.

All GCC developers are required to have copyright assignments on file with the FSF. After that documentation is on file, changes offered by a developer for inclusion in GCC can be considered. Patches are mailed to public mailing lists and reviewed for coding style, design, and implementation correctness by senior developers with authority to approve patches for various components. Documentation for the GCC project explains the development plan and other criteria of the project. The coding style follows the GNU coding conventions and GCC extensions.

GCC includes an extensive and growing test suite to help maintain the quality of the compiler. All patches are supposed to be tested with the complete test suite, and authors are expected to certify that a proposed patch did not generate any new test suite failures.

To maintain the quality criteria for GCC, releases should create no test suite regressions on important target platforms. Because of the large number of GCC targets (architectures, operating systems, file formats, etc.), some regressions do occur. The lack of complete coverage testing and unit testing in the current design is one of the major limitations in the GCC testing procedures.
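The regression criterion described above amounts to comparing test results before and after a change. A minimal Python sketch, assuming results are available as a mapping from test name to "PASS" or "FAIL" (the test names below are invented, and this is not the GCC test harness):

```python
from typing import Dict, List


def new_failures(baseline: Dict[str, str], patched: Dict[str, str]) -> List[str]:
    """Return tests that passed in the baseline run but fail after the patch."""
    return sorted(
        name
        for name, outcome in patched.items()
        if outcome == "FAIL" and baseline.get(name) == "PASS"
    )


# Invented test names, for illustration only.
baseline = {"loop-1": "PASS", "vect-2": "FAIL", "inline-3": "PASS"}
patched = {"loop-1": "FAIL", "vect-2": "FAIL", "inline-3": "PASS"}
regressions = new_failures(baseline, patched)
```

A test that was already failing before the patch, such as "vect-2" here, is not counted as a regression.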

GCC legal issues

Free software, a concept originated by Richard Stallman to encompass the GNU Project, refers to the freedom of users and developers to use, modify, redistribute, and distribute modified versions of the software. (4) Free software commonly refers to software distributed under the terms of one of the GNU General Public Licenses (GPL). Open-source software refers to a broader set of possible licenses.

Although the GPL applies to the GCC and the GNU tool chain, building an application using the GCC does not affect the software license of the application itself. Proprietary applications can be built using GCC.

Use of GCC

Use of GCC has become pervasive throughout the software industry because of its flexibility. It is able to generate applications for many proprietary and open-source UNIX ** operating systems, as well as OpenVMS **, z/OS *, Microsoft Windows **, VxWorks **, and others.

GCC has been available for AIX * on the POWER * platform and MVS * on the S/390 * platform for over ten years. In addition to its use by IBM customers on AIX and in software enablement for embedded processors, GCC has also been used for many research projects and prototypes; for example, experimental work with the PowerPC* instruction set and the 64-bit XCOFF file format.

Customers frequently use GCC instead of proprietary compilers because of its portability. GCC itself provides language extensions, but the extensions are consistent across all systems; therefore, customers do not have to worry that they will use a compiler feature that locks them into a particular system. The GPL ensures that the customer always has access to the source code of the compiler and libraries to perform any development or maintenance. A customer's decision to use GCC often depends on a few primary factors, including performance, portability, and service.

Overview

In this paper, we describe several of our contributions to GCC. IBM has made additions to GCC which encompass all phases of the compiler--the front end, optimizations in tree and RTL intermediate representations, and the back end. The specific details of each contribution are outside the scope of this paper; the interested reader is referred to the actual code and documentation, which is freely available at http://gcc.gnu.org. This paper does not cover all contributions to GCC made by IBM developers, but rather describes some projects in an attempt to focus on our experience with GCC and its limitations and potential. In the following sections we describe a new front end, some optimizations, and a new back end.

PL8 FRONT END

This subsection describes the development of firmware for the PL8 and IBM zSeries * systems, and the technical issues arising from this effort.

PL8 and IBM zSeries firmware development

The term "firmware" refers to the software layer between hardware and the operating system. Firmware functionality includes I/O path management, I/O load balancing, recovery from hardware and firmware errors, and some system management functions, which, in other computer systems, are typically implemented in operating-system layers.

Firmware development requires low-level programming, as firmware has many interfaces to hardware registers and to assembler-written routines. The firmware implements low-level services that require accessing specific addresses and dealing with individual bits or words smaller than a byte. PL8, which basically is a subset of PL/I, supports these requirements by use of appropriate declarations, which is considered a strength of the language. (5) The language has been used for firmware development since the early 1980s with an old compiler that has not been maintained for years. However, there have been significant enhancements made to the zSeries architecture, including additions to its instruction set, improved pipeline structures, and an extension to 64 bits. Some firmware internal structures were strongly geared to 64-bit implementation, which the original PL8 compiler could not provide. The original compiler was also inherently tied to the library and build environment on VM/CMS as its only execution platform.

PL8-front-end technical issues

Given GCC's modular structure and the fact that GCC already had a back end generating S/390 code (see "The zSeries back end" and Reference 6), an obvious approach was to implement PL8 again as an additional GCC front end. The language was extended to support 64-bit data types, and its rules concerning memory layout were adapted. The GCC framework also suggested a few language modifications.

In contrast to most other GCC front ends, the PL8 front end is well-suited for two-pass compiling. This is because PL8 allows forward references to declarations. The two-pass approach also simplifies certain other translations. The first pass does lexical and syntactic analysis, which is implemented using the compiler-generating tools Flex and Bison, respectively. Its output is a front-end internal representation of the input program which is an attributed syntax tree. (7) Tree nodes are implemented as records with fields containing data or pointers to other tree nodes. Whenever possible, the GCC predefined tree nodes are used to represent PL8 constructs. For example, this is done for if, do while, and do until statements. More elaborate statements, such as select and PL8 counting loops, have no direct correspondence to any existing nodes; they are thus first translated into front-end specific nodes, as are most of PL8's declarations, namely the attributes based, offset, and redefines.
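The reason forward references favor a two-pass design can be shown with a small sketch. The Python code below is an illustration under our own assumptions, not the PL8 front end: the Node record, the node kinds "block", "decl", and "ref", and the type name "integer" are hypothetical. The first traversal records every declaration; the second resolves each reference, including references that appear before the declaration in the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical tree node: a record with data fields and child pointers."""
    kind: str                          # "block", "decl", or "ref"
    name: str = ""
    type_name: Optional[str] = None    # set on decls; filled in on refs
    children: List["Node"] = field(default_factory=list)


def collect_declarations(root: Node) -> Dict[str, Optional[str]]:
    """First traversal: record the type of every declared name."""
    table: Dict[str, Optional[str]] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.kind == "decl":
            table[node.name] = node.type_name
        stack.extend(node.children)
    return table


def resolve_references(root: Node, table: Dict[str, Optional[str]]) -> List[str]:
    """Second traversal: annotate refs with types; return undeclared names."""
    unresolved: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.kind == "ref":
            if node.name in table:
                node.type_name = table[node.name]
            else:
                unresolved.append(node.name)
        stack.extend(node.children)
    return unresolved


# The reference to "counter" precedes its declaration in program order.
use = Node("ref", "counter")
program = Node("block", children=[
    use,
    Node("decl", "counter", "integer"),
    Node("ref", "missing"),
])
symbols = collect_declarations(program)
undeclared = resolve_references(program, symbols)
```

A single-pass design would have to either reject the early use of "counter" or patch it up later; with two traversals the reference is resolved naturally.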

In pass two the compiler works on the data structures generated by pass one and performs semantic checks. These include type compatibility checks to verify that variables are assignable. Implicit type conversions are inserted where the PL8 language definition allows the assignment of variables with different types. Range checks are generated for array accesses, and for all accesses to based variables through offsets. Pass two also carries out some optimizations, such as constant folding (8) and an elimination of range checks, which deletes a check if it can determine...
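Constant folding, one of the optimizations named above, replaces an operation on compile-time constants with its result. The Python sketch below illustrates the technique on a hypothetical tuple-based expression tree; it is not the PL8 front end's implementation, and it ignores overflow and the integer widths of the target.

```python
import operator
from typing import Tuple, Union

# Hypothetical expression tree: an int constant, a variable name,
# or (operator, left, right).
Expr = Union[int, str, Tuple[str, "Expr", "Expr"]]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr: Expr) -> Expr:
    """Evaluate subtrees whose operands are all integer constants."""
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = fold(lhs), fold(rhs)
    if isinstance(lhs, int) and isinstance(rhs, int) and op in OPS:
        return OPS[op](lhs, rhs)
    return (op, lhs, rhs)


# x + (4 * 8): only the constant subtree can be folded.
partly_folded = fold(("+", "x", ("*", 4, 8)))
# (2 * 3) - 1: the whole expression is constant.
fully_folded = fold(("-", ("*", 2, 3), 1))
```

Folding proceeds bottom-up, so a constant subtree is reduced even when the enclosing expression still depends on a variable.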

NOTE: All illustrations and photos have been removed from this article.


