Alex Sellink and Chris Verhoef
University of Amsterdam,
Programming Research Group
Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
alex@wins.uva.nl, x@wins.uva.nl
We discuss an approach that explores the use of scaffolding of source code to facilitate its renovation. We show that scaffolding is a useful paradigm for software renovation. We designed syntax and semantics for scaffolding, that enables all relevant applications of scaffolding. The automatic generation of extensions to a normal grammar, so that the resulting extension grammar can parse code with scaffolding, is discussed. We used the scaffolding paradigm itself to implement the generation process, thereby showing that our approach towards scaffolding is also useful in software development. Finally, we discuss real-world applications of scaffolding for software renovation, in both our own work and work from people in the reengineering IT industry.
Categories and Subject Description:
D.2.6 [Software Engineering]: Programming Environments--Interactive;
D.2.7 [Software Engineering]: Distribution and Maintenance--Restructuring;
D.3.4. [Processors]: Parsing.
Additional Key Words and Phrases: Reengineering, System renovation, Software renovation factories, Language description development, Grammar reengineering, Scaffolding, Computer aided language engineering (CALE).
A very common concept in the development of systems is the use of code, data, or entire programs that are built for debugging or tracing purposes, but never intended to be in the final product. This technology is also known as scaffolding . Knuth [24, p. 189] mentions that the ``most effective debugging techniques seem to be those which are designed and built into the program itself--many of today's best programmers will devote nearly half of there programs to facilitating the debugging process on the other half; the first half, which usually consists of fairly straightforward routines, will eventually be thrown away, but the net result is a surprising gain in productivity.'' While Knuth focuses on programming in the small, these figures are also mentioned by people discussing programming in the large. For instance, Brooks [12, p. 148] wrote in 1975 that it is ``not unreasonable for there to be half as much code in scaffolding as there is in the [final] product.'' It will be not a surprise that in books on good coding practices, scaffolding is included. See for instance, [3] and [28]. Also in the cost estimation area the phenomenon of code that is not in the final product has been noted [20, p. 18].
The scaffolding around a building provides access to components that workers couldn't otherwise reach. Similarly, software scaffolding gives programmers access to parts that they can otherwise not reach [3]. Often such scaffolding code shows intermediate results in complex calculations and/or manipulations of data. So, scaffolding code is included to understand software. Since scaffolding is usually removed when a software product is put into production, it seems natural to us to bring back such knowledge in the source code while renovating it. Such knowledge can be control-flow or data-flow information, or more specific information. We observed that for automated renovation of software, the source code manipulations are so complex, that intermediate results of calculations are mandatory in order to keep track of the modification process. We have found it extremely useful to include such scaffolding in the source text so that easy and immediate inspection and/or modification of the results is viable. This inspection is not only meant for humans but also for automated transformations to the software. For example, in a first pass, a program is analyzed and the results of the analysis are put in a scaffolding so that we can see that the analysis provides the right information. Then in a second pass, the program can be transformed using the results of the analysis. Recall that scaffolding for development can be parsed by the compiler, for it is built in the programs. In our case scaffolding can also be parsed. However, our scaffolding normally is not part of the language that is used in the source programs. This means that we have to incorporate the scaffolding inside the existing language so that we can parse both the original code plus the scaffolding.
We have developed a systematic approach to the use of scaffolding in software renovation. We developed tools that turn a given context-free grammar into a grammar that incorporates scaffolding. Those tools are part of our Factory Generator. This is a piece of software that generates from a context-free grammar, an architecture that we call a generic software renovation factory for that language. A generic software renovation factory enables rapid development of analysis and transformation components that can use scaffolding: to add scaffolding to the source code, to analyze this scaffolding, and to use the analysis results for making the necessary transformations.
We realize that complex transformations that we carry out using scaffolding can also be implemented using many other transformation systems. In fact, everything we do can also be implemented in raw machine language. In our opinion, the relevant issue is not whether it is possible to implement renovation problems using technologies like REFINE [34], COSMOS [15], the ASF+SDF Meta-Environment [23], RainCode [33], Elegant [2,32], TXL [14], or still other systems. The relevant issue is whether it can be done in a convenient way. Our approach emphasizes how transformations can be implemented as easy as possible. Let us illustrate our point with an example. In the paper [26] a complex migration from a proprietary language to procedural C++ is carried out. One of the issues that is mentioned in this paper is that ``it has become apparent from this project, that the transformation logic should be as modular and localized as possible (i.e. different transformation programs for each language construct)''. We contacted Kostas Kontogiannis [25] about this project, and he told us that he will never ever implement such tools again. The reason was that implementing such transformations was a too complex task, and it was very difficult to keep track of the activities. We believe that scaffolding contributes to making transformations more easy, just like scaffolding also makes development more easy.
We start with a COBOL example containing a typical scaffolding. The syntax for scaffolding is to use the keyword SCAFFOLD with text brackets, and then use a self-defined type (e.g. WINDOW), and again text brackets containing DATA (in this case a variable YY of type Data-name).
COMPUTE XX = ZZ - 80.
SCAFFOLD [ WINDOW [ YY : Data-name ] ]
COMPUTE AGE = YY - 80.
COMPUTE XX = ZZ - 80.
IF YY > 50
COMPUTE AGE = YY - 80
ELSE
COMPUTE AGE = YY + 100 - 80.
We stress that this kind of scaffolding has an important use. Best-in-class Y2K analysis engines, like COSMOS [15,17,21], keep track of line and column information. This lexically oriented information can be used to make simple safe changes to the analyzed software like the addition of scaffolding. Then using a context-free change engine, the actual changes can be safely made.
GO *+24 GO LAB
S1 S1
S2 ---> S2
S3 LAB S3
S4 S4
In order to perform such a transformation, four questions need to be answered.
Scaffolding can be used to keep the mental steps in this process as small as necessary. Let us show how we can solve this problem with the aid of scaffolding. First we scaffold the code by providing the byte size of all the instructions. This answers the first question, and puts the result in the code, ready for human inspection.
GO *+24
S1 SCAFFOLD [ SIZE [ 8 ] ]
S2 SCAFFOLD [ SIZE [ 16 ] ]
S3 SCAFFOLD [ SIZE [ 8 ] ]
S4 SCAFFOLD [ SIZE [ 4 ] ]
The problem of providing the cumulative distance between statements is now reduced to simple addition of numbers. So we use the scaffolding code itself, and we produce the results of this calculation a new type of scaffolding (for the sake of clarity we omit the SIZE scaffolding):
GO *+24
S1 SCAFFOLD [ CUM_DIST [ 0 ] ]
S2 SCAFFOLD [ CUM_DIST [ 8 ] ]
S3 SCAFFOLD [ CUM_DIST [ 24 ] ]
S4 SCAFFOLD [ CUM_DIST [ 32 ] ]
At which statement the byte size equals 24 is now visible in the code for inspection. Moreover, the scaffolding can simply be matched by a transformation, so we can now easily add the label and replace the byte addressing:
GO LAB
S1 SCAFFOLD [ CUM_DIST [ 0 ] ]
S2 SCAFFOLD [ CUM_DIST [ 8 ] ]
LAB S3 SCAFFOLD [ CUM_DIST [ 24 ] ]
S4 SCAFFOLD [ CUM_DIST [ 32 ] ]
Finally we can remove the scaffolding:
GO LAB
S1
S2
LAB S3
S4
This example illustrates that problems can be decomposed into very small and understandable steps. Moreover, intermediate results of the calculations can be inspected by humans and used by transformations to make the necessary changes.
Before we continue, we mention that all modules are easily regenerated if the input grammar changes. Note that grammar changes are common in the reengineering world. For each new project often some changes to the grammar are necessary. We have elaborated on such issues in another paper and the interested reader is encouraged to read more in [9] on this topic. For now it is important to realize that everything has been generated from the input grammar so that changes to the input do not imply that the entire architecture has to be redone by hand.
In Figure 1 we depicted important parts of an extended grammar within the context of a generic software renovation factory architecture. First of all we have a library of modules that are language independent. This does not necessarily need to be a fixed library. To give the reader an idea, in such a library, we define a number of often used data types such as Booleans and Integers that are useful in code analyses and that can fruitfully be used as conditions in code transformations. When necessary, new data types can be added effortlessly. Furthermore, the simple scaffolding language itself is defined in the library. Its syntax and semantics are discussed in detail in Section 4. Since we wish to analyze and transform scaffolding itself, we generated for the scaffolding language an analysis and transformation framework as discussed in [10]. When the scaffolding language needs to be adapted to serve some specific need that we have not anticipated, we can easily regenerate its analysis and transformation framework.
In the second box of Figure 1, we generate from the input grammar a number of basic language specific modules. We mention a context-free grammar for dealing with lists and separated lists. Those are very useful when reengineering code (as has been shown in several other papers, for instance, [37]). Some other issues are of a more technical nature, such as a module containing all the sort declarations and such.
In the third box of Figure 1 we actually extend the input grammar with scaffolding syntax. Moreover, we isolate the list and separated list constructs. Since in this paper we focus on scaffolding we will elaborate a little bit more on that aspect and postpone discussions like list isolation to another paper. We proceed to present an input grammar fragment of COBOL. The grammar rule is in SDF [18] (Syntax Definition Formalism). The MOVE statement as constructed in [7] looks as follows.
"MOVE" Corr A-exp "TO" Data-name-p -> Statement
The sort name Corr stands for a possible occurrence of the keyword CORRESPONDING, the sort name A-exp is an arithmetical expression, and the sort name Data-name-p is a list of one or more data names. Of course, the sort name Statement represents a COBOL statement. It is our intention to be able to parse not only the COBOL code (possibly containing CICS/SQL/etc), but also scaffolding code as we have seen in the previous section. A sample of code that we might want to analyze or transform (and thus parse) is:
SCAFFOLD [ WINDOW [ TMP : Data-name ] ]
MOVE 19 TO TMP.
This code might have been produced by another tool, for instance a Y2K analysis engine. Or the code has been produced by hand by a Y2K analyist, working in a Y2K factory, such as COSMOS 2000 [15]. When we wish to further analyze and/or transform this code, we need to be able to parse the above code. This means that we have to adapt the above grammar rule so that the intermediate scaffolding code is taken into account. We inject in a structured way in the input grammar an extra or fresh sort that will take care of this. We will now show the changed grammar so that we can parse the scaffolding in the above fragment.
X-s "MOVE" Corr A-exp X-s "TO" Data-name-p -> Statement
What happened here is that before each terminal we have added a
fresh sort name X-s. This stands for zero or more occurrences of
sort X. This sort X is defined in the library. The
X stands for extension, or extended. Scaffolding is of sort X so
now we can parse the fragment. The reader might have expected another
name for this sort, like Scaffolding-s or something similar.
We put in an extra layer for the sort introduction for a good reason.
Sort introduction does not only serve scaffolding but also other purposes.
We mention that sort introduction also enables parsing code including
its comments. For, if we wish to transform code containing comments,
it is not a good idea to remove those comments. Consequently, comments
are also of sort X, so that we can deal with comments as well.
We refer the reader to [8] where transformations that include
comments are being discussed in detail. Since scaffolding as well as
comments can occur at virtually any location in code, their introduction in
a grammar is obtained in the same way as with scaffolding. Therefore,
we have chosen to not use Scaffolding-s as sort name. Other types
of extensions to grammars that one may wish to distinguish from comments
are compiler directives, such as the compiler directives that are used in
CHILL [19] with their typical diamond notation (<>
).
Since they can occur also at virtually every location in the code,
they are comparable to comments, but we must be able to separate them
from comments.
The above introduction with X-s might give rise to the idea that the introduction transformation is a trivial one. This is not the case. First of all we cannot just "prefix" every sort name with X-s, for this would give a truly ambiguous grammar. Therefore, we only prefix the terminals. We make the same modifications to all other context-free rules. So also for the production rule that defines Corr. With this procedure we cover almost everything. We miss the sorts that are lexically defined and context-free used. They are in fact, terminals in disguise. They do obviously not have a context-free definition and will be missed unless we take precautions. Those sorts need sort introduction as well. For instance a Data-name-p is a COBOL identifier. Thereto we generate a module that takes care of the prefixing of lexically defined sorts that are used context-free. For example, for Data-name-p we rename the original name in the lexical syntax by Data-name-lex-p and we define a context-free rule in a specially created module with the following form:
X-s Data-name-lex-p -> Data-name-p
Once we have interspersed the input grammar with the extra sort X-s, we call this an extended grammar, or an X grammar. In Section 5 we will come back on how we generate X grammars.
We process with the fourth box of Figure 1. For a start, as was seen in the example COBOL fragment, the scaffolding itself makes use of constructs of the COBOL language. In this case we mean the COBOL variable TMP. We have experienced that it is useful to have access to the entire language inside the scaffolding. Since we still wish to have the scaffolding language as fixed as possible, we put in this language an extra layer to deal with this phenomenon: all the code that can appear as information in scaffolding is known as DATA. We automatically generate a module that will make any sort of the input grammar also to be of sort DATA. For instance this means that for the above COBOL grammar rule we generate:
Corr ":Corr" -> DATA
A-exp ":A-exp" -> DATA
Data-name-p ":Data-name-p" -> DATA
Statement ":Statement" -> DATA
The :Data-name-p part is present for typing reasons: then we can see that TMP is of type Data-name-p. In the fourth box we also generate a native pattern language. This is a language that enables us to write patterns that look as much as possible like the original code. We kindly refer the reader to an extensive treatment of native pattern languages in paper [37]. Furthermore, we generate some auxiliary technical functionality that makes the construction of tools more easy. Finally, we generate for the X grammar a generic analysis and transformation framework as discussed in great detail in [10]. The analysis and transformation framework for scaffolding is directly imported by the language dependent analysis and transformation framework.
In the last box of Figure 1, we depicted that we can build tools. When we have this complete architecture, we are in a position to construct renovation tools, like analysis and transformation components. Moreover we can use scaffolding in them. Since the focus in this paper is not on tools, but on scaffolding and its use we will not discuss tools in detail. In many other papers, we focus more on tools, and the reader is kindly invited to consult the various papers of the authors.
imports Data
exports
sorts COMMENT SCAFFOLD X X-s
context-free syntax
SCAFFOLD -> X
COMMENT -> X
X* -> X-s
"SCAFFOLD" "[" DATA-s "]" -> SCAFFOLD
variables
"X" [0-9]* -> X
"X*" [0-9]* -> X *
"X+" [0-9]* -> X +
hiddens
variables
"d*" [0-9]+ -> DATA *
The only structure in this module is that the extension sort X is either of sort SCAFFOLD or of sort COMMENT. We have not implemented what COMMENT is, for this is language dependent and defined in a syntax module of language L if we want to renovate code written in L . Here we know that whatever the syntax of the comments is, it is also of sort X. The form of the scaffolding is partly defined. We chose to use prefix notation using the keyword SCAFFOLD and using text-brackets. The contents of the scaffolding is however language dependent. Inside a scaffolding everything is a list of zero or more objects of type DATA. We declare variables of sort X, and their list variants: X*, X+ zero resp. one or more occurrences of X. Finally we have hidden variables that are used locally to express the semantics. We give the semantics of the above syntax in an ASF module [4]:
equations
[1] SCAFFOLD [ d*1 ] SCAFFOLD [ d*2 ] =
SCAFFOLD [ d*1 d*2 ]
[2] X*1 SCAFFOLD [ ] X*2 = X*1 X*2
Equation [1] expresses that two consecutive occurrences of SCAFFOLD reduce to one. In equation [2] we express that an empty SCAFFOLD can be omitted. Note that X*1, X*2 are lists of arbitrary SCAFFOLDs or COMMENTs. The d* variables represent arbitrary lists of type DATA. We proceed to present the DATA syntax and semantics. We simplified the definition slightly for the sake of clarity.
imports LayoutChars
exports
sorts DATA DATA-s SCAFFOLD-TYPE
lexical syntax
[A-Z_]+ -> SCAFFOLD-TYPE
context-free syntax
SCAFFOLD-TYPE "[" DATA-s "]" -> DATA
DATA* -> DATA-s
hiddens
variables
"d*" [0-9]* -> DATA *
"d" [0-9]* -> DATA
"st" -> SCAFFOLD-TYPE
We import some LayoutChars, which we will not discuss since it just describes layout. As can be seen we define a SCAFFOLD-TYPE lexically. This means that we can define our own types of scaffolding. Since they are necessary for many and diverse tasks, it is not a good idea to invent a fixed number of scaffold types. We have chosen to use upper-case and underscores for SCAFFOLD-TYPEs; this choice is completely arbitrary and can be changed at wish. Subsequently, we define the language independent parts of the sort DATA. DATA can either be typed using a SCAFFOLD-TYPE and the text brackets, or untyped. We noted earlier that in order to be able to reason about language dependent issues like source code fragments in the scaffolding we generated modules so that all the sort names are of type DATA. The form of this syntax has been shown earlier. Finally, we define a few hidden variables in order to express the semantics of the above syntax. The semantics is very simple and discussed below.
equations
[1] d*1 d d*2 d d*3 = d*1 d d*2 d*3
[2] d*1 st [ ] d*2 = d*1 d*2
[3] d*1 st [ d*4 ] d*2 st [ d*5 ] d*3 =
d*1 st [ d*4 d*5 ] d*2 d*3
Equation [1] expresses that double occurring data is removed. Equation [2] implies that scaffold-types that are empty can be removed. Finally, equation [3] collects data of the same type.
In this paper, we add one more result: we generate extended grammars from context-free grammars. In this section we elaborate in more detail on an example of the use of scaffolding: our Factory Generator. The Factory Generator generates most of the abovementioned issues including extended grammars from context-free grammars. In Figure 2 we depicted the import graph of the modules that form the Factory Generator. Note that the dashed rectangles are an exploded view of the boxes of Figure 1. This might be confusing, for, in Figure 1 we were discussing an architecture containing extended grammars themselves. Indeed this is the case, and the reason that this architecture comes back in our Factory Generator is that it uses scaffolding as well (as already announced in Section 2). So our Factory Generator contains scaffolding and generates extension grammars to handle scaffolding. Seen in this light it is not a surprise that the Factory Generator is bootstrapped. For, a grammar is described in a formalism that has itself a grammar. In our case the grammars are described in SDF [18]. We generate not only syntax but also semantics for the scaffolding language (as we have discussed briefly). The semantics is expressed in ASF [4]. So the input of the Factory Generator is an SDF specification of a language L , and its output is an ASF+SDF specification. We call this output an L -factory or a software renovation factory for L , or just a (software renovation) factory if no confusion about the language can arise.
We proceed to discuss Figure 2. In the library, we have reused standard library ASF modules for Booleans and Integers. We implemented the X grammar and the Data grammar that have been discussed in Section 4. The other files in the library on top of the Data and X have been generated with an earlier version of the Factory Generator. We could call that part of the library an X-factory. The next three dashed rectangles are all generated. Of course, when we started to implement this, we first had to make one extended grammar by hand. This is the grammar in the third dashed box: we reused the ASF+SDF grammar, called Asdf, from the ASF+SDF Meta-Environment and we carried out the transformation by hand (this resulted in the modules X-TERM, X-Asf, X-Sdf, X-Asdf. In those modules we introduced all the extensions at the right locations and we isolated the list constructs. The process was iterative: each iteration we implemented the minimal part by hand so that we could generate a full new version of the Factory Generator. After a few iterations, we reached a fixed point, meaning that when we fed the Asdf specification to the Factory Generator, the generated modules were exactly the ones in the three boxes on top of the library. We implemented on top of the generated architecture about 100 small tools with the functionality that they generate for any grammar for a language L written in SDF, an L -factory.
It is out of scope for this paper to discuss all the tools that we implemented. We restrict ourselves to discussing the tools that introduce the extended grammar. We take a very simple input grammar to clarify the process of introducing the extensions. Consider the simple SDF grammar below.
lexical syntax
[A-Z] -> Character
[0-9]+ -> Number
Character+ -> Word
context-free syntax
Word+ "." -> Sentence
"paragraph" Number "." Sentence+ -> Paragraph
The above grammar describes characters, numbers, words, sentences, and how to combine this into a paragraph. When we enter the process, the grammar has already been changed to isolate list constructions. The reason why we do this is technical: it enables more easy development of renovation components. We show only the changed syntax part, and not a number of extra modules that have been created during the process:
lexical syntax
[A-Z] -> Character
[0-9]+ -> Number
Character+ -> Word
context-free syntax
Word-p "." -> Sentence
"paragraph" Number "." Sentence-p -> Paragraph
On this grammar fragment we will apply six tools that eventually will introduce the extensions. We will focus on the use of scaffolding while doing this.
SCAFFOLD [ DEF_LEX_SORTS [
Character :Id-lex
Number :Id-lex
Word :Id-lex ]]
As can be seen in the input grammar, indeed all the sorts that have been defined lexically, are extracted in this phase, and stored in a typed scaffolding.
SCAFFOLD [ USED_CF_SORTS [
Word :Id-lex
Sentence :Id-lex
Word-s :Id-lex
Sentence-s :Id-lex
Word-p :Id-lex
Number :Id-lex
Sentence-p :Id-lex ]]
Note that a two extra sorts with a -s postfix are defined. They represent zero or more lists. They are added in an extra module so that we can also construct tools matching zero or more statements. Also now, nothing has changed to the grammar fragment itself.
SCAFFOLD [ LEX_CF_SORTS [
Word :Id-lex
Number :Id-lex ]]
context-free syntax
X-s Word-lex -> Word
X-s Number-lex -> Number
In Section 4 we defined that X-s is zero or more occurrences of X. This implies that all the original programs can still be parsed. Also that all comment of these programs is now recognized, and moreover, scaffolding is allowed and can be parsed. We are not ready yet.
lexical syntax
[A-Z] -> Character
[0-9]+ -> Number-lex
Character+ -> Word-lex
And indeed the sort Character that is not used context-free is not changed. Now the final step.
context-free syntax
Word-p X-s "." -> Sentence
X-s "paragraph" Number X-s "." Sentence-p -> Paragraph
As an example of what we call an infinitesimal small transformation, we provide the implementation of the last transformation.
imports Asdf-Toolbasis
exports
context-free syntax
"Add-X" -> TRANSFORM
equations
[1] Add-X_CfElem(Lit1) = X-s Lit1
The SDF part imports the entire Factory Generator by way of
Asdf-Toolbasis. Since this generator is itself a factory, we have access
to generic transformation components as discussed in [10].
The only thing we need to do to build a tool that can use all the
generic functionality is to declare it being of sort TRANSFORM.
After that, we have for each sort of the Asdf-factory a tool
available. We use Add-X_CfElem
, which is the tool that gives
access to arbitrary context-free elements. The variable Lit1
matches an arbitrary literal (a terminal). When such a literal is matched
it is replaced by X-s followed by the same literal, as
expressed in equation [1].
This section did not only illustrate the generation process for extended grammars, but it also showed how scaffolding can be used during development of transformation and analysis tools. Such tools are typical for the implementation of software reengineering tasks.
We are also implementing a software renovation factory for Ericsson Software Technology AB. Using CALE (Computer Aided Language Engineering) technology [39] we first extracted and migrated about 20 grammars of their proprietary language for programming real-time switching systems (called SSL, short for Switching System Language). The resulting overall grammar contains about 3000 production rules. This specification has been input for the Factory Generator. We have generated a first version of an SSL-factory.
It takes about five hours to generate an entire COBOL-factory. It takes about 50 hours to generate an SSL-factory. Both times are measured in an interpreted environment. We also note that our Factory Generator not only generates the scaffolding, but an entire software renovation factory architecture. We can speed up the process considerably when we compile our ASF specifications to C. We refer to [5], for a recent paper on the compilation and memory management of this compiler, and to [31] for on line benchmarks. We have not yet compiled our specifications, since we do not generate new factories quite often. In large reengineering companies it is not uncommon to have a weekly release of software renovation factories. For such situations it is of course a better idea to make use of a compiled Factory Generator. Since we still make small changes to the generation process while becoming more and more experienced with scaffolding as we implemented it, we consider it more convenient to use an interpreted environment, to enable rapid development.
First we wish to show that it is not easy to move a paragraph to another place without changing the semantics of the program. Suppose our goal is to move PAR-2 to another location. The first-order approximation of this transformation is:
PAR-1. Sentence1 PAR-1. Sentence1
PAR-2. Sentence2 --> PERFORM PAR-2
PAR-3. Sentence3 PAR-3. Sentence3
In this example we assume that PAR-2 is now on another location where it can be called, like has been done in the modified code. Suppose that PAR-1 is called somewhere. Then the above transformation is not correct. For, in the original code when PAR-1 is performed, Sentence1 is executed and when the paragraph is finished, the control-flow goes back to the statement below the PERFORM PAR-1. In the transformed code both Sentence1 and Sentence2 are executed. So a precondition for moving paragraphs is that the previous paragraph is not performed somewhere in the program. We will proceed with the steps that are necessary to safely move paragraphs to other locations.
SCAFFOLD [ NUMBER_OF_GO [ n1 :Integer ] ]
PAR-1. Sentence1
SCAFFOLD [ NUMBER_OF_GO [ 0 :Integer ] ]
PAR-2. Sentence2
SCAFFOLD [ NUMBER_OF_GO [ n2 :Integer ] ]
PAR-3. Sentence3
Furthermore, we need to know the label of the next paragraph. Therefore, a simple analysis tool takes care of that scaffold information (we omit the other scaffolding for the sake of clarity):
PAR-1. Sentence1
SCAFFOLD [ NUMBER_OF_GO [ 0 :Integer ]
NEXT_PAR [ PAR-3 :Label ] ]
PAR-2. Sentence2
PAR-3. Sentence3
We also need to know whether the previous paragraph is being performed. So there is an analysis tool that adds this, too, to the scaffolding:
PAR-1. Sentence1
SCAFFOLD [ NUMBER_OF_GO [ 0 :Integer ]
NEXT_PAR [ PAR-3 :Label ]
PERFORMED [ FALSE :Boolean ] ]
PAR-2. Sentence2
PAR-3. Sentence3
We also want to know the paragraphs that jump to PAR-2. We want to know this, so that we can change those GO TO statements to PERFORMs plus another GO TO statement to the next paragraph.
PAR-1. Sentence1
SCAFFOLD [ NUMBER_OF_GO [ 0 :Integer ]
NEXT_PAR [ PAR-3 :Label ]
PERFORMED [ FALSE :Boolean ]
GO_FROM [ PAR-Q :Label ] ]
PAR-2. Sentence2
PAR-3. Sentence3
In the above example PAR-Q is a paragraph on an entirely different location in the source program. The scaffolding expresses that in that paragraph a GO TO PAR-2 is made.
TB_MOVED
shorthand for to be moved.
PAR-1. Sentence1
SCAFFOLD [ NUMBER_OF_GO [ 0 :Integer ]
NEXT_PAR [ PAR-3 :Label ]
PERFORMED [ FALSE :Boolean ]
GO_FROM [ PAR-Q :Label ]
TB_MOVED [ PAR-2 :Label ] ]
PAR-2. Sentence2
PAR-3. Sentence3
Then in another step we collect the TB_MOVED
at the top of the
program so that in a next step a transformation can use this information
to move all the paragraphs that are contained in the scaffolding
TB_MOVED
.
PAR-Q. Statement1* PAR-Q. Statement1*
IF Condition1 --> IF Condition1
GO TO PAR-2. PERFORM PAR-2
GO TO PAR-3.
Note that we turn a GO TO in another GO TO. Therefore, we call it GO TO shifting. The idea of the algorithm is that we turn difficult GO TOs into simple ones so that eventually they all will be eliminated. See [36] for details.
PAR-1. Sentence1
SCAFFOLD [ NUMBER_OF_GO [ 0 :Integer ]
NEXT_PAR [ PAR-3 :Label ]
PERFORMED [ FALSE :Boolean ]
GO_FROM [ PAR-1 :Label ]
MOVE_PAR [ PAR-2. Sentence2 :Paragraph ] ]
PAR-3. Sentence3
We do this, since now we can eliminate GO TOs in the newly created code (for details we refer to [36]). When we have new jump instructions removed, we can reiterate the above steps and scaffold more paragraphs, and so on.
SECTION A.
PAR-1. Sentence1
PERFORM PAR-2.
PAR-3. Sentence3
BAR SECTION.
BAR-PARAGRAPH.
STOP RUN.
A-SUBROUTINES SECTION.
PAR-2. Sentence2
As can be seen, the above code is functionally equivalent to the original code. The difference is that using this algorithm we can slowly separate the business logic from the control logic.
A-SUBROUTINES SECTION. A-SUBROUTINES SECTION.
PAR-2. Sentence2 --> PAR-2. Sentence2
A-SUBROUTINES SECTION. PAR-4. Sentence4
PAR-4. Sentence4
As can be seen, all the individual steps are quite simple, but the resulting transformation is complex. Moreover, we make extensive use of scaffolding information. These transformations have all the same pattern: first we analyze the code, then we make calculations on the analysis. This leads to a conclusion, and we have the right information to carry out a number of transformations.
01 COPY X. 03 FOO PIC 99
there is a good chance that this layout is not recovered by the unparser. A common way of customers to checking the changes to the software is to run simple lexical comparison tools like diff on the original code and the transformed code. In order to avoid differences caused by the unparser, it is possible to scaffold the above code fragment with precise column information, so that during postprocessing of the code the layout idiosyncrasies will be put back in the text.