The second task was to reduce the load placed on the C stack by the interpreter to enable the interpreter to run in environments with limited stack space. In the case of our customer this would be a router.
To this end three different techniques were used:
For such types we reduced their default size either through an existing preprocessor macro determining their size, or by introducing such a macro and proceeding as before.
An example of such a type is Tcl_DString. Its base size
is
on typical 32bit systems, i.e. 12
bytes. However it also allocates a static character buffer as
part of the structure to handle small strings without having
to allocate space on the heap. The macro controlling the size
of this buffer is
TCL_DSTRING_STATIC_SIZE. Its
default value is 200, driving the size of the whole structure
to 212 bytes.
Table 3 shows the macros introduced in this way.
Macro | Default size | Meaning, Location of usage |
TCL_DSTRING_STATIC_SIZE | 200 | Default static buffer in Tcl_DString |
TCL_FMT_STATIC_FLOATBUFFER_SZ | 320 | generic/tclCmdAH.c, line 1978 |
TCL_FMT_STATIC_VALIDATE_LIST | 16 | generic/tclScan.c, line 266 |
TCL_FOREACH_STATIC_ARGS | 9 | generic/tclCmdAH.c, line 1735 |
TCL_FOREACH_STATIC_LIST_SZ | 4 | generic/tclCmdAH.c, line 1739 |
TCL_FOREACH_STATIC_VARLIST_SZ | 5 | generic/tclCompCmds.c, line 621 |
TCL_RESULT_APPEND_STATIC_LIST_SZ | 16 | generic/tclResult.c, line 458 |
generic/tclStringObj.c, line 1199 | ||
TCL_MERGE_STATIC_LIST_SZ | 20 | generic/tclListObj.c, line 1009 |
generic/tclUtil.c, line 831 | ||
TCL_PROC_STATIC_CLOCALS | 20 | generic/tclExecute.c, line 6272 |
TCL_PROC_STATIC_ARGS | 20 | generic/tclProc.c, line 751 |
TCL_INVOKE_STATIC_ARGS | 20 | generic/tclBasic.c, line 1782 |
generic/tclBasic.c, line 1857 | ||
generic/tclBasic.c, line 2951 | ||
generic/tclParse.c, line 1332 | ||
TCL_EVAL_STATIC_VARCHARS | 30 | generic/tclParse.c, line 1166 |
TCL_STATS_COUNTERS | 10 | generic/tclHash.c, line 306 |
generic/tclLiteral.c, line 898 | ||
TCL_LSORT_STATIC_MERGE_BUCKETS | 30 | generic/tclCmdIL.c, line 2680 |
An example of the usage of these macros can be found in the file generic/tclCompCmds.c, see the function TclCompileIncrCmd:
int TclCompileIncrCmd(interp, parsePtr, envPtr) Tcl_Interp *interp; Tcl_Parse *parsePtr; CompileEnv *envPtr; { Tcl_Token *varTokenPtr, *incrTokenPtr; TEMP (Tcl_Parse) elemParse; ... STRING (160, buffer); NEWTEMP (Tcl_Parse,elemParse); NEWSTR (160, buffer); envPtr->maxStackDepth = 0; if ((parsePtr->numWords != 2) && (parsePtr->numWords != 3)) { ... RELTEMP(elemParse); return TCL_ERROR; } ... }
Figure 4 shows side by side the two implementations for allocating temporary variables on either the stack or the heap. If the macro TCL_STRUCT_ON_HEAP is defined during compilation temporary variables will be allocated on the heap.
This new executor was named NRE, for ``non-recursive engine''.
To identify the hot spots of stack usage, and from there the variables and data structures causing them, we semi-automatically instrumented a copy of the base sources with code monitoring the entrance to and returns from all functions in the interpreter and also recording the the location of the stack pointer for each call.
This instrumentation was only semi-automatic because the script used to insert the code was very simple and did not recognize all possible C syntax for return statements and function calls. About 25 % of the instrumented code had to be reworked manually to get the instrumentation code to work properly in them. There were two big offenders in this regard.
if (...) return X;
causing the instrumentation script to change the semantics of the instrumented code.
foo () { ... return bar (); }
These were rewritten to
foo () { ... X = bar (); return X; }
allowing the insertion of the instrumentation code between the called function and the return of the caller itself.
The inserted code manages a logical duplicate of the C call stack on the heap and uses that to record both the names of the active functions and the locations of the cpu stack pointer for them. By comparing the location of the stack pointer before the call of a function F and its location inside, it was possible for us to deduce the amount of C stack space consumed by F.
This method also records the amount of stack required for function linkage, i.e. stack management done by the C compiler. This is no true disadvantage as this amount is usually small, and more important, constant. Because of the latter this does not disturb the ranking of functions when sorting them by the amount of stack they require.
As the execution of the inserted code causes a noticeable slowdown of the interpreter additional processing beyond the comparison of the stack pointer is not done. Instead we write the information about the association between functions and required stack directly to a log file. After each run a number of external scripts is used to postprocess the logged data. The result is a sorted list where the functions consuming the most stack are shown at the top. Thus identifying the hot spots to look at.
Instead of trying to devise a Tcl application which exercises all parts of the interpreter we simply executed the testsuite that comes with the Tcl core to generate the stack traces.