checkcorrectness function for dict #8687

StephenVavasis · 2014-10-15T04:01:57Z

This proposed unexported function, checkcorrectness, accompanies a proposed change that I just made to the documentation. As noted in a recent discussion on the julia user's group, https://groups.google.com/forum/#!topic/julia-users/bXyLf5v3uBE, the Dict indexing structure is corrupted if a mutable key gets changed by the user. The purpose of this function is to look for errors in a Dict in case the user inadvertently clobbers the index.

StefanKarpinski · 2014-10-15T04:57:06Z

How about calling this fsck!? The bang part is disputable, but I'd argue since it changes the observable contents of the structure, it's warranted. I guess we could have this as a generic function that is used to fix up data structures that might somehow become internally inconsistent.

JeffBezanson · 2014-10-15T14:35:17Z

This is a nice piece of code that seems to work well, but I'm really not sure how useful it is to have in Base. First, I just don't find this problem comes up very often. Second, this is a marginal kind of solution: you have to know that this function exists, and you have to obtain a reference to a suspected-invalid dictionary, and manually use this function. Having done that, you aren't much closer to finding the actual source of the bug. It seems like a strange kind of thing to burden the standard library with.

StefanKarpinski · 2014-10-15T14:51:42Z

I feel the same way. I can't imagine writing a program that would use this. If some value is getting mutated that is being used as a Dict key, I would copy the keys on insertion or extraction to avoid that instead.

StephenVavasis · 2014-10-15T15:55:36Z

Jeff and Stefan,

The point is that the dict keys might be changing unintentionally, i.e.,
the user did not notice that a statement modifying one data structure was
clobbering another one. How would such a user debug his/her program to
find the source of the collision? One way would be to call the
checkcorrectness function in various spots in his/her code, and then the
first time it fails, he/she would have a clue which statement was
clobbering which data structure.

In the near future Julia will get more complicated data structures (such
as the ones I am currently writing) with mutable keys inside the index.
An unintended collision like this will very likely occur for some users of
those structures, so there ought to be a toolbox available for verifying
data structure correctness.

Having said that, I agree that the toolbox does not necessary have to be
part of Base. It is desirable, however, to have it in the same
source-code file (dict.jl) since anyone later who changes dict.jl will
notice that he/she should also change the correctness checker.

On Wed, 15 Oct 2014, Stefan Karpinski wrote:

I feel the same way. I can't imagine writing a program that would use this. If some value is getting mutated that is
being used as a Dict key, I would copy the keys on insertion or extraction to avoid that instead.

—
Reply to this email directly or view it onGitHub.[7881863__eyJzY29wZSI6Ik5ld3NpZXM6QmVhY29uIiwiZXhwaXJlcyI6MTcyOTAwMzkxMywiZGF0YSI6eyJpZCI6NDU4MjE1NTJ9fQ==--b7
762059dfdae4540443cd36b0e00ab1b4a7c1f2.gif]

JeffBezanson · 2014-10-15T16:10:02Z

I don't really see what kind of code ends up mutating dictionary keys. FWIW I agree that eliminating this problem (e.g by having immutable versions of all data structures, and only allowing those as dict keys) would be ideal, but in practice using a value as a dict key makes it very unlikely to be subject to mutation.

kmsquire · 2014-10-15T16:29:21Z

How about putting this in DataStructures.jl? There are already other Dict
implementations in there.

On Wednesday, October 15, 2014, Jeff Bezanson notifications@github.com
wrote:

I don't really see what kind of code ends up mutating dictionary keys.
FWIW I agree that eliminating this problem (e.g by having immutable
versions of all data structures, and only allowing those as dict keys)
would be ideal, but in practice using a value as a dict key makes it very
unlikely to be subject to mutation.

—
Reply to this email directly or view it on GitHub
#8687 (comment).

eschnett · 2014-10-15T16:48:17Z

Complex data structures often have invariants, which are boolean
expressions that must evaluate to true. Which data structure has what
invariant, of course, depends on its implementation. I often have a
function invariant for data structures, and checking invariants is then
done via @assert invariant(x). Almost my definition, the invariant should
go near the implementation. In heavy debug mode, one would check the
invariant in the beginning and at the end of each public routine.
Invariants are good for debugging the implementation of a data structure
itself, they help when one modifies the implementation of a data structure,
and they can (apparently) also help detecting problems if the using code
doesn't behave as expected.

If Base.Dict requires that keys don't change, and if this is not enforced
by the implementation for performance reasons, then having a public
Base.invariant routine would be convenient.

In some cases, such invariants can even help the compiler produce better
code, as the compiler can then be taught certain properties of the data
structure that are otherwise only known implicitly. I'm not sure that LLVM
is there yet.

-erik

On Wed, Oct 15, 2014 at 12:29 PM, Kevin Squire notifications@github.com
wrote:

How about putting this in DataStructures.jl? There are already other Dict
implementations in there.

On Wednesday, October 15, 2014, Jeff Bezanson notifications@github.com
wrote:

I don't really see what kind of code ends up mutating dictionary keys.
FWIW I agree that eliminating this problem (e.g by having immutable
versions of all data structures, and only allowing those as dict keys)
would be ideal, but in practice using a value as a dict key makes it
very
unlikely to be subject to mutation.

Reply to this email directly or view it on GitHub
#8687 (comment).

Reply to this email directly or view it on GitHub
#8687 (comment).

Erik Schnetter schnetter@cct.lsu.edu
http://www.perimeterinstitute.ca/personal/eschnetter/

JeffBezanson · 2014-10-15T16:54:10Z

That is all true; I guess it's reasonable to have a standard name for checking complex invariants.

StephenVavasis · 2014-10-15T18:06:07Z

Jeff,

You challenged me to come up with a realistic example of how a user could
accidentally clobber a dict by modifying mutable keys. I have not ever
encountered such an example; I am a relative Julia newbie and have not
written anything complicated enough yet for this to happen.

However, I can at least dream up a scenario where a user could make this
mistake.

Consider writing a compiler for a language L with a complicated system of
types. Suppose furthermore that in this language L, unlike Julia, types
are considered identical if they have the same bits representation, even
if they are declared separately. (There is a name for this kind of type
system that I can't remember.)

The author of this compiler represents each type as a Julia Array{Any,1}
with nested arrays of Any for compound data structures.

In order to recognize whether two types are the same, the author uses a
Dict where the key is the Array{Any,1} representation of the type. The
author now parses all the types in the program this way.

Now he/she wishes to compactify the set of types that have been defined
in the L source file by identifying common substructures and merging these
into a single type.

In this case, the compiler-writer might try the following: loop over the
types T in the Dict, and recursively for each component T' of T, look it
up, and if it already appears in the dict, then delete T from the dict and
reinsert it with some kind of index for the subcomponent replacing the
Array{Any,1} representation of the component T'.

The problem with this loop is that if the compiler modifies a type T, and
T happens to be a subcomponent of a different type T'' that is elsewhere
in the dictionary (such that the array of T is actually nested inside the
array of T''), then this also changes T'' and thus clobbers the
dictionary.

A rather convoluted example, but maybe it meets with your challenge!

On Wed, 15 Oct 2014, Jeff Bezanson wrote:

That is all true; I guess it's reasonable to have a standard name for checking complex invariants.

—
Reply to this email directly or view it onGitHub.[7881863__eyJzY29wZSI6Ik5ld3NpZXM6QmVhY29uIiwiZXhwaXJlcyI6MTcyOTAxMTI2MSwiZGF0YSI6eyJpZCI6NDU4MjE1NTJ9fQ==--3f
99b35a72931f9e117d2b6da7239f7b60bc18c5.gif]

JeffBezanson · 2014-10-15T20:36:16Z

That strikes me as a case where an object is mutated in such a way that it remains equal to its old value. No self-respecting compiler writer would ever genuinely mutate a type :)

In any case, I think all we need to decide here is whether we want a generic function for checking complex invariants. If so, then this function is probably appropriate.

tkelman · 2014-10-16T03:26:30Z

Such structure-checking functions come up in plenty of other use cases - checking for duplicate or unsorted entries in a sparse matrix for example. Often they have application-specific names, but not always. For example there is currently an isvalid for strings and indices, but that function also has some undocumented methods for various Cholmod types in linalg/cholmod.jl.

vtjnash · 2019-06-27T16:09:20Z

I think it's sufficiently clear this hasn't been necessary from the passage of time

StefanKarpinski mentioned this pull request Oct 15, 2014

New discussion about dicts with mutable keys #8685

Closed

jiahao force-pushed the master branch from cdde4df to 7fdc860 Compare October 28, 2014 04:20

MikeInnes force-pushed the master branch from 5c60996 to b1c3df3 Compare November 14, 2014 17:07

vtjnash closed this Jun 27, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

checkcorrectness function for dict #8687

checkcorrectness function for dict #8687

StephenVavasis commented Oct 15, 2014

StefanKarpinski commented Oct 15, 2014

JeffBezanson commented Oct 15, 2014

StefanKarpinski commented Oct 15, 2014

StephenVavasis commented Oct 15, 2014

JeffBezanson commented Oct 15, 2014

kmsquire commented Oct 15, 2014

eschnett commented Oct 15, 2014

JeffBezanson commented Oct 15, 2014

StephenVavasis commented Oct 15, 2014

JeffBezanson commented Oct 15, 2014

tkelman commented Oct 16, 2014

vtjnash commented Jun 27, 2019

checkcorrectness function for dict #8687

checkcorrectness function for dict #8687

Conversation

StephenVavasis commented Oct 15, 2014

StefanKarpinski commented Oct 15, 2014

JeffBezanson commented Oct 15, 2014

StefanKarpinski commented Oct 15, 2014

StephenVavasis commented Oct 15, 2014

JeffBezanson commented Oct 15, 2014

kmsquire commented Oct 15, 2014

eschnett commented Oct 15, 2014

JeffBezanson commented Oct 15, 2014

StephenVavasis commented Oct 15, 2014

JeffBezanson commented Oct 15, 2014

tkelman commented Oct 16, 2014

vtjnash commented Jun 27, 2019