Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

checkcorrectness function for dict #8687

Closed
wants to merge 1 commit into from
Closed

checkcorrectness function for dict #8687

wants to merge 1 commit into from

Conversation

StephenVavasis
Copy link
Contributor

This proposed unexported function, checkcorrectness, accompanies a proposed change that I just made to the documentation. As noted in a recent discussion on the julia user's group, https://groups.google.com/forum/#!topic/julia-users/bXyLf5v3uBE, the Dict indexing structure is corrupted if a mutable key gets changed by the user. The purpose of this function is to look for errors in a Dict in case the user inadvertently clobbers the index.

This proposed unexported function, checkcorrectness, accompanies a proposed change that I just made to the documentation.  As noted in a recent discussion on the julia user's group, https://groups.google.com/forum/#!topic/julia-users/bXyLf5v3uBE, the Dict indexing structure is corrupted if a mutable key gets changed by the user.  The purpose of this function is to look for errors in a Dict in case the user inadvertently clobbers the index.
@StefanKarpinski
Copy link
Member

How about calling this fsck!? The bang part is disputable, but I'd argue since it changes the observable contents of the structure, it's warranted. I guess we could have this as a generic function that is used to fix up data structures that might somehow become internally inconsistent.

@JeffBezanson
Copy link
Member

This is a nice piece of code that seems to work well, but I'm really not sure how useful it is to have in Base. First, I just don't find this problem comes up very often. Second, this is a marginal kind of solution: you have to know that this function exists, and you have to obtain a reference to a suspected-invalid dictionary, and manually use this function. Having done that, you aren't much closer to finding the actual source of the bug. It seems like a strange kind of thing to burden the standard library with.

@StefanKarpinski
Copy link
Member

I feel the same way. I can't imagine writing a program that would use this. If some value is getting mutated that is being used as a Dict key, I would copy the keys on insertion or extraction to avoid that instead.

@StephenVavasis
Copy link
Contributor Author

Jeff and Stefan,

The point is that the dict keys might be changing unintentionally, i.e.,
the user did not notice that a statement modifying one data structure was
clobbering another one. How would such a user debug his/her program to
find the source of the collision? One way would be to call the
checkcorrectness function in various spots in his/her code, and then the
first time it fails, he/she would have a clue which statement was
clobbering which data structure.

In the near future Julia will get more complicated data structures (such
as the ones I am currently writing) with mutable keys inside the index.
An unintended collision like this will very likely occur for some users of
those structures, so there ought to be a toolbox available for verifying
data structure correctness.

Having said that, I agree that the toolbox does not necessary have to be
part of Base. It is desirable, however, to have it in the same
source-code file (dict.jl) since anyone later who changes dict.jl will
notice that he/she should also change the correctness checker.

On Wed, 15 Oct 2014, Stefan Karpinski wrote:

I feel the same way. I can't imagine writing a program that would use this. If some value is getting mutated that is
being used as a Dict key, I would copy the keys on insertion or extraction to avoid that instead.


Reply to this email directly or view it onGitHub.[7881863__eyJzY29wZSI6Ik5ld3NpZXM6QmVhY29uIiwiZXhwaXJlcyI6MTcyOTAwMzkxMywiZGF0YSI6eyJpZCI6NDU4MjE1NTJ9fQ==--b7
762059dfdae4540443cd36b0e00ab1b4a7c1f2.gif]

@JeffBezanson
Copy link
Member

I don't really see what kind of code ends up mutating dictionary keys. FWIW I agree that eliminating this problem (e.g by having immutable versions of all data structures, and only allowing those as dict keys) would be ideal, but in practice using a value as a dict key makes it very unlikely to be subject to mutation.

@kmsquire
Copy link
Member

How about putting this in DataStructures.jl? There are already other Dict
implementations in there.

On Wednesday, October 15, 2014, Jeff Bezanson notifications@github.com
wrote:

I don't really see what kind of code ends up mutating dictionary keys.
FWIW I agree that eliminating this problem (e.g by having immutable
versions of all data structures, and only allowing those as dict keys)
would be ideal, but in practice using a value as a dict key makes it very
unlikely to be subject to mutation.


Reply to this email directly or view it on GitHub
#8687 (comment).

@eschnett
Copy link
Contributor

Complex data structures often have invariants, which are boolean
expressions that must evaluate to true. Which data structure has what
invariant, of course, depends on its implementation. I often have a
function invariant for data structures, and checking invariants is then
done via @assert invariant(x). Almost my definition, the invariant should
go near the implementation. In heavy debug mode, one would check the
invariant in the beginning and at the end of each public routine.
Invariants are good for debugging the implementation of a data structure
itself, they help when one modifies the implementation of a data structure,
and they can (apparently) also help detecting problems if the using code
doesn't behave as expected.

If Base.Dict requires that keys don't change, and if this is not enforced
by the implementation for performance reasons, then having a public
Base.invariant routine would be convenient.

In some cases, such invariants can even help the compiler produce better
code, as the compiler can then be taught certain properties of the data
structure that are otherwise only known implicitly. I'm not sure that LLVM
is there yet.

-erik

On Wed, Oct 15, 2014 at 12:29 PM, Kevin Squire notifications@github.com
wrote:

How about putting this in DataStructures.jl? There are already other Dict
implementations in there.

On Wednesday, October 15, 2014, Jeff Bezanson notifications@github.com
wrote:

I don't really see what kind of code ends up mutating dictionary keys.
FWIW I agree that eliminating this problem (e.g by having immutable
versions of all data structures, and only allowing those as dict keys)
would be ideal, but in practice using a value as a dict key makes it
very
unlikely to be subject to mutation.

Reply to this email directly or view it on GitHub
#8687 (comment).

Reply to this email directly or view it on GitHub
#8687 (comment).

Erik Schnetter schnetter@cct.lsu.edu
http://www.perimeterinstitute.ca/personal/eschnetter/

@JeffBezanson
Copy link
Member

That is all true; I guess it's reasonable to have a standard name for checking complex invariants.

@StephenVavasis
Copy link
Contributor Author

Jeff,

You challenged me to come up with a realistic example of how a user could
accidentally clobber a dict by modifying mutable keys. I have not ever
encountered such an example; I am a relative Julia newbie and have not
written anything complicated enough yet for this to happen.

However, I can at least dream up a scenario where a user could make this
mistake.

Consider writing a compiler for a language L with a complicated system of
types. Suppose furthermore that in this language L, unlike Julia, types
are considered identical if they have the same bits representation, even
if they are declared separately. (There is a name for this kind of type
system that I can't remember.)

The author of this compiler represents each type as a Julia Array{Any,1}
with nested arrays of Any for compound data structures.

In order to recognize whether two types are the same, the author uses a
Dict where the key is the Array{Any,1} representation of the type. The
author now parses all the types in the program this way.

Now he/she wishes to compactify the set of types that have been defined
in the L source file by identifying common substructures and merging these
into a single type.

In this case, the compiler-writer might try the following: loop over the
types T in the Dict, and recursively for each component T' of T, look it
up, and if it already appears in the dict, then delete T from the dict and
reinsert it with some kind of index for the subcomponent replacing the
Array{Any,1} representation of the component T'.

The problem with this loop is that if the compiler modifies a type T, and
T happens to be a subcomponent of a different type T'' that is elsewhere
in the dictionary (such that the array of T is actually nested inside the
array of T''), then this also changes T'' and thus clobbers the
dictionary.

A rather convoluted example, but maybe it meets with your challenge!

On Wed, 15 Oct 2014, Jeff Bezanson wrote:

That is all true; I guess it's reasonable to have a standard name for checking complex invariants.


Reply to this email directly or view it onGitHub.[7881863__eyJzY29wZSI6Ik5ld3NpZXM6QmVhY29uIiwiZXhwaXJlcyI6MTcyOTAxMTI2MSwiZGF0YSI6eyJpZCI6NDU4MjE1NTJ9fQ==--3f
99b35a72931f9e117d2b6da7239f7b60bc18c5.gif]

@JeffBezanson
Copy link
Member

That strikes me as a case where an object is mutated in such a way that it remains equal to its old value. No self-respecting compiler writer would ever genuinely mutate a type :)

In any case, I think all we need to decide here is whether we want a generic function for checking complex invariants. If so, then this function is probably appropriate.

@tkelman
Copy link
Contributor

tkelman commented Oct 16, 2014

Such structure-checking functions come up in plenty of other use cases - checking for duplicate or unsorted entries in a sparse matrix for example. Often they have application-specific names, but not always. For example there is currently an isvalid for strings and indices, but that function also has some undocumented methods for various Cholmod types in linalg/cholmod.jl.

@vtjnash
Copy link
Member

vtjnash commented Jun 27, 2019

I think it's sufficiently clear this hasn't been necessary from the passage of time

@vtjnash vtjnash closed this Jun 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants