Article Posted To comp.lang.tcl

Subject:      nested arrays redux (plus patch for 8.0a1)
From:         loverso@osf.org (John Robert LoVerso)
Date:         20 Jan 1997
Organization: Open Group Research Institute, Cambridge MA
Newsgroups:   comp.lang.tcl
Message-Id:   <5c059s$p6l@paperboy.osf.org>

Introduction

Back in September there was a discussion on "nested arrays". I posted some test cases showing what did and did not work. John Ousterhout, concerned that the existing partial ability was not clean and therefore not really a feature, removed the ability to create them as of 8.0a1 (he originally planned to do so in 7.6, but at my request he held off until the major version number changed).

This note is a follow-up to that discussion. In particular, I include the patches needed to correctly support the previous "nested array" ability in 8.0a1. My humble request is that the Tcl team include this minor change in 8.0a2.

Background

As background and to avoid confusion, let me restate what existed in releases up to and including 7.6. In particular, there are two completely different mechanisms that both look like "nested arrays".
  1. global variables with array-like names ("array-like names")

    These exist because some commands (notably "array" and "global") do not parse variable names to extract the two parts of the array notation a(x). Hence, you can create a variable whose name looks like an array element:

    	% array set a(x) {a 1 b 2}
    	% set a(y) 1
    	1
    	% array names a
    	y
    	% array get a
    	y 1
    	% array names a(x)
    	a b
    
    a(x) is a global variable unrelated to array a - it just looks like it is an element of a. "global a" will bring array a into scope, but not a(x). You need a separate "global a(x)" to do that.

    You can still create the array element and not affect the global variable:

    	% set a(x) 4
    	4
    	% array get a
    	x 4 y 1
    
  2. array elements that are upvar'd and then assigned to as arrays. ("upvar'd arrays")

    These exist because once an array element is upvar'd into a scalar, it can then be accessed as an array itself. The only way to utilize these "hidden" arrays is to "bring them into scope" with an upvar command.

    	% upvar 0 b(x) y
    	% array set y {a 1 b 2}
    	% set y(a)
    	1
    
    There is an array associated with b(x), but you can only access it via the scalar variable "y". These exist alongside other array elements just fine:

    	% set b(y) 1
    	1
    	% array names b
    	x y
    
To avoid confusion, I will call type 1 "array-like names" and type 2 "upvar'd arrays".
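
To make the distinction concrete, here is a minimal C sketch of the split that a name-parsing command performs. It is not the Tcl core's code: split_var_name is a hypothetical helper, and the rule it applies (a '(' somewhere in the name and ')' as the last character) is assumed from the scan in the "array set" check in the patch at the end of this note.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the Tcl sources: split "name" into an
 * array part and an element part the way a name-parsing command would.
 * Returns 1 and fills part1/part2 when the name contains a '(' and its
 * last character is ')'. Returns 0 and copies the whole name into part1
 * otherwise. Returns -1 if either buffer is too small.
 */
int
split_var_name(const char *name, char *part1, size_t size1,
               char *part2, size_t size2)
{
    const char *open;
    size_t len, len1, len2;

    assert(name != NULL && part1 != NULL && part2 != NULL);
    len = strlen(name);
    open = strchr(name, '(');
    if (open == NULL || name[len - 1] != ')') {
        /* No array syntax: the whole string is one variable name. */
        if (len + 1 > size1 || size2 < 1) {
            return -1;
        }
        memcpy(part1, name, len + 1);
        part2[0] = '\0';
        return 0;
    }
    len1 = (size_t) (open - name);
    len2 = len - len1 - 2;      /* drop the '(' and the trailing ')' */
    if (len1 + 1 > size1 || len2 + 1 > size2) {
        return -1;
    }
    memcpy(part1, name, len1);
    part1[len1] = '\0';
    memcpy(part2, open + 1, len2);
    part2[len2] = '\0';
    return 1;
}
```

Under this rule "a(x)" splits into array "a" and element "x", while a command that skips the split sees the single name "a(x)". A name with a second index, such as "a(x)(c)", splits into array "a" and element "x)(c", so no spelling reaches an element of a nested array directly.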

Problems

Both of these mechanisms contain substantial problems:
  1. Variables with "array-like names" are only possible because some code doesn't correctly parse a variable name into array and element. That is, until the addition of "array get" and "array set", these weren't even possible.

    Code that does parse a variable name (basically, almost all of the Tcl core) will not be able to access these. That is, commands such as "trace", "info exists", and "upvar" will not work on such an array, or an element thereof.

    Going back to the example, if you say "trace var a(x) rw trace-func", then the trace is applied to the element x of array a, not to the "array-like named" array a(x). Hence, "array set a(x) {c 3}" is not traceable. If you think about it, you'll see why: you need to use two array indices to name the traced variable.

    As for the other commands, when given "a(x)", the command will always work on element x of array a, not the variable "a(x)" with the array-like name.

    	% array set a(y) {a 1 b 2}
    	% info exists a(y)
    	0
    	% upvar 0 a(y) yy
    	% info exists yy		;# looking for element y of array a
    	0
    
  2. "upvar'd arrays" are only possible because the code to create an array didn't check that it wasn't already an array element. Normally, you wouldn't think this possible, but the power of the "upvar" construct makes it so. These have been possible for a long time (at least since I became an active Tcl user about 3-4 years ago).

    There are several problems with them.

    Neither "info exists" nor "array exists" works with them:

    	% info exists b(x)
    	0
    	% array exists b(x)
    	0
    
    But "array names" will find them:

    	% array names b
    	x y
    
    One way to test for an "upvar'd array" is to iterate over the "array names" output, and test for those that don't exist (!!).

    Finally, "array get" botches them badly:

    	% array get b
    	x 0xd4 y 1
    
    On the plus side, "upvar'd arrays" work normally with every other Tcl command: you can use "set", "trace", and "upvar" on them and their elements, as long as you bring them into scope with an upvar. For example:

    	% upvar 0 b(x) xx
    	% set xx(a)
    	1
    

My Fixes

The change that John added in 8.0a1 makes the "upvar'd array" impossible to create, while leaving the (in my humble opinion) less useful "global variable with array-like name". If anything, I think he should remove both "odd" language quirks, rather than just one. But I'd rather he restore the "upvar'd array".

John said two things to me:

> I don't think that nested arrays are even close to working in a
> reasonable fashion (are you sure that they can't generate core
> dumps right now?) so I don't think this is the solution.

> By all means feel free to take a stab at this. If you come up with
> something clean and simple (both in its interface and its
> implementation) we'll consider it for inclusion in the core.

This note is my stab at fixing the problems. These were my goals:

  1. Restore the "upvar'd array" style of nested array. This means that array elements themselves may become arrays.
  2. Fix "array exists" and "info exists" to correctly detect such variables.
  3. Correct "array get" and "array set" to manipulate nested arrays.

The changes below are exceptionally simple and straightforward. The fixes in the "array" command involved just including the TCL_PART1_NOT_PARSED flag when looking up variables. The fixes to the "info exists" command involved removing complicated code that was duplicated from LookupVar(), and then calling into the existing function.
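
The tclInt.h part of the patch is easiest to see in isolation. The sketch below is a model built on assumptions, not the real header: the bit values are illustrative and the three function names are hypothetical, but the two flag updates follow the old and new bodies of TclSetVarArrayElement in the patch.

```c
/* Illustrative bit values only; the real definitions live in tclInt.h. */
#define VAR_ARRAY          0x2
#define VAR_ARRAY_ELEMENT  0x40

/* Flag update as in 8.0a1: marking an element also clears the array bit. */
unsigned
set_array_element_old(unsigned flags)
{
    return (flags & ~(unsigned) VAR_ARRAY) | VAR_ARRAY_ELEMENT;
}

/* Flag update after the patch: the array bit, if set, is left alone. */
unsigned
set_array_element_new(unsigned flags)
{
    return flags | VAR_ARRAY_ELEMENT;
}

/* Nonzero when a variable is both an array and an element of an array. */
int
is_nested_array(unsigned flags)
{
    return (flags & VAR_ARRAY) != 0 && (flags & VAR_ARRAY_ELEMENT) != 0;
}
```

With the old update a variable can never carry both bits at once. With the new one, is_nested_array() can be true for an element that has itself become an array.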

I also include a test module to self-test this ability. It exercises almost all of the paths in the changes, and it gives me high assurance that this cannot generate core dumps.

One existing test breaks because of this change, upvar-8.8, which was added to check that upvar'd arrays aren't possible. 8-}

There is one (unintended) side effect of this change. By fixing the "array" command for "upvar'd arrays", I also eliminated the code path that allowed the creation of variables with "array-like names". Really, this was unintentional!

With these changes, the "array" command now allows "array names", "array exists", and "array get" to work on "upvar'd arrays". That is:

	% upvar 0 b(x) y
	% array set y {a 1 b 2}
	% set y(a)
	1
	% info exists b(x)
	1
	% array names a
	% set b(y) 1
	% array names b
	x y
	% array names b(x)
	a b
	% array get b(x)
	a 1 b 2

There are two things to note, however:
  1. "array set" does not work on "upvar'd arrays":
    	% array set b(x) {z 1 z 2}
    	cannot set into nested array
    
    This is for the same reason that variables with "array-like names" cannot do traces: you end up with two array indices, and it is just not supported (not without much more work, at least). You can "array set" onto the upvar'd variable (as you see above).

  2. "array get" does not return any information about elements that are "upvar'd arrays".
    	% array names b
    	x y
    	% array get b
    	y 1
    
    This is because the element isn't something you can feed back to "array set". In 7.6, this was the one case that could cause a core dump.
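
The "array get" behavior in item 2 can be modeled in a few lines of C. This is a sketch under assumptions, not the patched tclVar.c: Elem, array_get_filter, and the flag values are hypothetical stand-ins, and only the skip condition (undefined or array) is taken from the patch.

```c
#include <stddef.h>

/* Illustrative stand-ins for the Tcl variable flags. */
#define VAR_ARRAY      0x2
#define VAR_UNDEFINED  0x8

/* Hypothetical model of one array element. */
typedef struct {
    const char *name;    /* element name */
    const char *value;   /* string value; unused for nested arrays */
    unsigned flags;      /* VAR_* bits describing the element */
} Elem;

/*
 * Copy into out[] pointers to the elements that "array get" would report:
 * defined elements that are not themselves arrays. out[] must have room
 * for n pointers. Returns the number of elements kept.
 */
size_t
array_get_filter(const Elem *elems, size_t n, const Elem **out)
{
    size_t i, kept = 0;

    for (i = 0; i < n; i++) {
        if ((elems[i].flags & VAR_UNDEFINED)
                || (elems[i].flags & VAR_ARRAY)) {
            continue;   /* nothing here that "array set" could take back */
        }
        out[kept++] = &elems[i];
    }
    return kept;
}
```

For an array b holding a nested array x and a scalar y with value 1, the filter keeps only y, which matches the "y 1" output shown above.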

These are very minor problems when weighed against the usefulness of this feature. Fixing them is not hard, but it takes slightly more work than the patch below. I'll be glad to do it. But my first priority is to get the general ability back into Tcl 8.0.

Thanks,
John

Patches

*** tclVar.c	1997/01/17 21:51:21	1.1
--- tclVar.c	1997/01/20 04:07:26
***************
*** 74,75 ****
--- 74,121 ----
   *
+  * TclLookupVar --
+  *
+  *	Locate a variable given its name(s).
+  *
+  * Results:
+  *	The return value is a pointer to the variable structure indicated by
+  *	part1 and part2, or NULL if the variable couldn't be found. If the
+  *	variable is found, *arrayPtrPtr is filled in with the address of the
+  *	variable structure for the array that contains the variable (or NULL
+  *	if the variable is a scalar). If the variable can't be found or
+  *	some other error occurs, NULL is returned and an error message is
+  *	left in interp->result if TCL_LEAVE_ERR_MSG is set in flags.
+  *	(The result isn't put in interp->objResult because
+  *	this procedure is used by so many string-based routines.)
+  *
+  *	Note: it's possible for the variable returned to be VAR_UNDEFINED.
+  *	For example, the variable might be a global that has been unset but
+  *	is still referenced by a procedure, or a variable that has been unset
+  *	but is only being kept in existence (if VAR_UNDEFINED) by a trace.
+  */
+ 
+ Var *
+ TclLookupVar(interp, part1, part2, flags, arrayPtrPtr)
+     Tcl_Interp *interp;		/* Interpreter to use for lookup. */
+     char *part1;		/* If part2 isn't NULL, this is the name of
+ 				 * an array. Otherwise, if the
+ 				 * TCL_PART1_NOT_PARSED flag bit is set this
+ 				 * is a full variable name that could
+ 				 * include a parenthesized array element. If
+ 				 * TCL_PART1_NOT_PARSED isn't present, then
+ 				 * this is the name of a scalar variable. */
+     char *part2;		/* Name of element within array, or NULL. */
+     int flags;			/* Only TCL_GLOBAL_ONLY, TCL_LEAVE_ERR_MSG,
+ 				 * and TCL_PART1_NOT_PARSED bits matter. */
+     Var **arrayPtrPtr;		/* If the name refers to an element of an
+ 				 * array, *arrayPtrPtr gets filled in with
+ 				 * address of array variable. Otherwise
+ 				 * this is set to NULL. */
+ {
+     return LookupVar(interp, part1, part2, flags, "lookup", 
+ 	/* create */0, arrayPtrPtr);
+ }
+ 
+ /*
+  *----------------------------------------------------------------------
+  *
   * LookupVar --
***************
*** 302,304 ****
  
!     if (TclIsVarUndefined(varPtr) && !TclIsVarArrayElement(varPtr)) {
  	if (!(create & CRT_PART1)) {
--- 348,350 ----
  
!     if (TclIsVarUndefined(varPtr)) {
  	if (!(create & CRT_PART1)) {
***************
*** 1495,1497 ****
  
!     if (TclIsVarUndefined(arrayPtr) && !TclIsVarArrayElement(arrayPtr)) {
  	TclSetVarArray(arrayPtr);
--- 1541,1543 ----
  
!     if (TclIsVarUndefined(arrayPtr)) {
  	TclSetVarArray(arrayPtr);
***************
*** 2677,2679 ****
  
!     varPtr = LookupVar(interp, argv[2], (char *) NULL, /*flags*/ 0,
              /*msg*/ 0, /*create*/ 0, &arrayPtr);
--- 2723,2725 ----
  
!     varPtr = LookupVar(interp, argv[2], (char *) NULL, TCL_PART1_NOT_PARSED,
              /*msg*/ 0, /*create*/ 0, &arrayPtr);
***************
*** 2779,2781 ****
  	    varPtr2 = (Var *) Tcl_GetHashValue(hPtr);
! 	    if (TclIsVarUndefined(varPtr2)) {
  		continue;
--- 2825,2827 ----
  	    varPtr2 = (Var *) Tcl_GetHashValue(hPtr);
! 	    if (TclIsVarUndefined(varPtr2) || TclIsVarArray(varPtr2)) {
  		continue;
***************
*** 2864,2866 ****
  	    && (length >= 2)) {
! 	char **valueArgv;
  	int valueArgc, i, result;
--- 2910,2912 ----
  	    && (length >= 2)) {
! 	char **valueArgv, *p;
  	int valueArgc, i, result;
***************
*** 2880,2881 ****
--- 2926,2947 ----
  	    goto setDone;
+ 	}
+ 	/*
+ 	 * Enforce temporary restriction until a little more work is done.
+ 	 * Cannot just test for !arrayPtr, as it is NULL when no array exists
+ 	 * yet.  Otherwise, "array set x(c) {a 1 b 2}" will effectively
+ 	 * do "set x(c) 1; set x(c) 2", which is not the intended effect.
+ 	 */
+ 	for (p = argv[2];  *p != '\0';  p++) {
+ 	    if (*p == '(') {
+ 		do {
+ 		    p++;
+ 		} while (*p != '\0');
+ 		p--;
+ 		if (*p == ')') {
+ 		    interp->result = "cannot set into nested array";
+ 		    result = TCL_ERROR;
+ 		    goto setDone;
+ 		}
+ 		break;
+ 	    }
  	}
*** tclInt.h	1997/01/20 03:32:46	1.1
--- tclInt.h	1997/01/20 03:41:26
***************
*** 253,255 ****
  #define TclSetVarArrayElement(varPtr) \
!     (varPtr)->flags = ((varPtr)->flags & ~VAR_ARRAY) | VAR_ARRAY_ELEMENT
  
--- 253,255 ----
  #define TclSetVarArrayElement(varPtr) \
!     (varPtr)->flags |= VAR_ARRAY_ELEMENT
  
***************
*** 1177,1178 ****
--- 1177,1181 ----
  			    int argc, char **argv)) ;
+ EXTERN Var *		TclLookupVar _ANSI_ARGS_((Tcl_Interp *interp,
+ 			    char *part1, char *part2, int flags,
+ 			    Var **arrayPtrPtr));
  EXTERN int		TclNeedSpace _ANSI_ARGS_((char *start, char *end));
*** tclCmdIL.c	1997/01/17 22:30:05	1.1
--- tclCmdIL.c	1997/01/20 03:39:29
***************
*** 255,308 ****
  	/*
! 	 * The code below handles the special case where the name is for
! 	 * an array: Tcl_GetVar will reject this since you can't read
! 	 * an array variable without an index.
  	 */
- 
  	if (p == NULL) {
! 	    Tcl_HashEntry *hPtr;
! 	    Var *varPtr = NULL;
! 
! 	    if (strchr(argv[2], '(') != NULL) {
! 		noVar:
  		iPtr->result = "0";
  		return TCL_OK;
- 	    }
- 	    if (iPtr->varFramePtr == NULL) {
- 		hPtr = Tcl_FindHashEntry(&iPtr->globalTable, argv[2]);
- 		if (hPtr != NULL) {
- 		    varPtr = (Var *) Tcl_GetHashValue(hPtr);
- 		}
- 	    } else {
- 		CallFrame *varFramePtr = iPtr->varFramePtr;
- 		int localVarCt = varFramePtr->procPtr->numCompiledLocals;
- 		Var *localVarPtr;
- 		
- 		for (i = 0, localVarPtr = varFramePtr->compiledLocals;
- 		        i < localVarCt;
- 			i++, localVarPtr++) {
- 		    if (strcmp(argv[2], localVarPtr->name) == 0) {
- 			varPtr = localVarPtr;
- 			break;
- 		    }
- 		}
- 		if ((varPtr == NULL)
- 		        && (varFramePtr->varTablePtr != NULL)) {
- 		    hPtr = Tcl_FindHashEntry(varFramePtr->varTablePtr,
- 					     argv[2]);
- 		    if (hPtr != NULL) {
- 			varPtr = (Var *) Tcl_GetHashValue(hPtr);
- 		    }
- 		}
- 	    }
- 	    if (varPtr == NULL) {
- 		goto noVar;
- 	    }
- 	    while (varPtr->flags & VAR_LINK) {
- 		varPtr = varPtr->value.linkPtr;
- 	    }
- 	    if (varPtr->flags & VAR_UNDEFINED) {
- 		goto noVar;
- 	    }
- 	    if (!(varPtr->flags & VAR_ARRAY)) {
- 		goto noVar;
  	    }
--- 255,268 ----
  	/*
! 	 * Tcl_GetVar read the variable and generated a read trace.
! 	 * However, in the case where argv[2] is an array, Tcl_GetVar
! 	 * will not return a value.  So, retry just the lookup.
! 	 * By using TCL_PART1_NOT_PARSED, we handle nested arrays, too.
  	 */
  	if (p == NULL) {
! 	    Var *varPtr, *arrayPtr;
! 	    varPtr = TclLookupVar((Tcl_Interp *) iPtr, argv[2], 0,
! 				   TCL_PART1_NOT_PARSED, &arrayPtr);
! 	    if (!varPtr || TclIsVarUndefined(varPtr)) {
  		iPtr->result = "0";
  		return TCL_OK;
  	    }
*** /dev/null	Mon Jan 20 09:48:13 1997
--- nested-array.test	Sun Jan 19 23:02:07 1997
***************
*** 0 ****
--- 1,119 ----
+ # Test nested (upvar) arrays
+ #
+ #
+ if {[string compare test [info procs test]] == 1} then {source defs}
+ 
+ catch {unset m n __a __b}
+ test nestarr-1.1 {nested array set} {
+     upvar 0 m(c) __b
+     array set __b {
+ 	a 5
+ 	b 6
+     }
+ } {}
+ test nestarr-1.2 {array was created} {
+     list [catch {set m 1} msg] $msg
+ } {1 {can't set "m": variable is array}}
+ test nestarr-1.3 {normal array set succeeds} {
+     array set m {
+ 	a 1
+ 	b(1) 2
+     }
+ } {}
+ test nestarr-1.4 {nested array with array-like name} {
+     upvar 0 m(b(2)) __a
+     array set __a {
+ 	a 3
+ 	b 4
+     }
+ } {}
+ 
+ test nestarr-2.1 {array names returns all} {
+     array names m
+ } {b(1) b(2) a c}
+ test nestarr-2.2 {array get skips nested arrays} {
+     array get m
+ } {b(1) 2 a 1}
+ test nestarr-2.3 {array names of nested works} {
+     array names m(b(2))
+ } {a b}
+ test nestarr-2.4 {array get of nested works} {
+     array get m(b(2))
+ } {a 3 b 4}
+ test nestarr-2.5 {array get of upvar'd nested works} {
+     array get __a
+ } {a 3 b 4}
+ test nestarr-2.6 {array get of nested works} {
+     array get m(c)
+ } {a 5 b 6}
+ test nestarr-2.7 {array get of upvar'd nested works} {
+     array get __b
+ } {a 5 b 6}
+ 
+ test nestarr-3.1 {info exists of nested} {
+     info exists m(c)
+ } 1
+ test nestarr-3.2 {info exists of nested} {
+     info exists m(b(2))
+ } 1
+ test nestarr-3.3 {info exists of upvar'd nested} {
+     info exists __b
+ } 1
+ test nestarr-3.4 {array exists} {
+     array exists m
+ } 1
+ test nestarr-3.5 {array exists of nested} {
+     array exists m(c)
+ } 1
+ test nestarr-3.6 {array exists of nested} {
+     array exists m(b(2))
+ } 1
+ test nestarr-3.7 {array exists of upvar'd nested} {
+     array exists __b
+ } 1
+ 
+ test nestarr-4.1 {nested array element can't be changed} {
+     list [catch {array set m {b(2) x}} msg] $msg
+ } {1 {can't set "m(b(2))": variable is array}}
+ 
+ test nestarr-5.1 {array set restriction} {
+     list [catch {array set new(x) {b(2) x}} msg] $msg
+ } {1 {cannot set into nested array}}
+ 
+ if 0 {
+     puts "\ntest unparsed-name nesting"
+ 
+     array set n(1) {
+ 	a 1
+ 	b(1) 2
+     }
+     upvar 0 n(2) __c
+     array set n(2) {
+ 	c 3
+ 	d(1) 4
+     }
+     upvar 0 n(2) __d
+     set __d(x) xx
+     puts n=[array get n]
+     puts n(1)=[array get n(1)]
+     puts n(2)=[array get n(2)]
+     puts __c=[array get __c]
+     puts __d=[array get __d]
+     puts IE.n(1)=[info exists n(1)]
+     puts IE.n(2)=[info exists n(2)]
+     puts IE.__c=[info exists __c]
+     puts IE.__d=[info exists __d]
+     puts AE.n(1)=[array exists n(1)]
+     puts AE.n(2)=[array exists n(2)]
+     puts AE.__c=[array exists __c]
+     puts AE.__d=[array exists __d]
+ 
+     puts "\nglobals=[info globals]"
+ 
+     puts "\ndebug..."
+     catch {array D D}
+     puts AE.c=[array exists m(c)]
+     puts IE.c=[info exists m(c)]
+ }
+ 
+ concat