Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] add isDefault; more general than isNil, isEmpty etc #13526

Closed
wants to merge 8 commits into from

Conversation

timotheecour
Copy link
Member

@timotheecour timotheecour commented Feb 28, 2020

this is an often requested feature [1], the lack of which forces some packages to define their own flavors eg:

  • in nimterop.globals:
template nBl(s: typed): untyped {.used.} = (s.len != 0)
template Bl(s: typed): untyped {.used.} = (s.len == 0)

isDefault is more useful than isEmpty

IMO isDefault is the more useful abstraction in the majority of cases, and is more general than isNil, or something like isEmpty (which has been proposed a few times) etc.

  • isNil and isEmpty don't cover important use cases, eg:
type Kind = enum kUnknown, kRed, kGreen
doAssert kUnknown.isDefault
  • the following semantic differences illustrate that isEmpty is usually the wrong abstraction:
      doAssert not @[0].isDefault # ok
      doAssert [0].isDefault # whereas [0].len != 0 and hence isEmpty would differ
      doAssert not "".cstring.isDefault # whereas "".cstring.len == 0, ditto
      doAssert not newTable[int,int]().isDefault # whereas newTable[int,int]().len == 0, ditto
  • isEmpty is always false for array[N, T] variables (with N>0), so it's not particularly useful here.

  • for cstring, most algorithms I can think of care about whether a variable is nil or not, not whether it's len (c_strlen) is == 0

  • isDefault is automatically defined for any type that has == defined

  • whereas it's much harder to define isEmpty generically, let alone doing so efficiently eg many types would need a custom definition (Tables, etc)
    eg: the naive

proc isEmpty[T](a: T): bool = when compiles(a.len): a.len == 0 else: handleDifferently(a)`

is not efficient for some types where computing len is O(n) or expensive (eg linked lists, intsets, packedjson, c_strlen for cstring, etc)

  • the semantics of isEmpty aren't always clear in particular with reference types (eg is "".cstring empty or not?); it would always give rise to debates about what's considered empty, unlike isDefault.

semantics

  • isDefault doesn't try to be too clever about what is considered "default", instead it just uses system.default(T) for that.
  • However, optimizations are allowed, so long semantics are preserved, hence the overloads I added for seq and string. More overloads can be added in other modules as needed, when optimization dictates it. Note that such overloads should rarely be needed, because most types would often have effcient == defined
  • note that isDefault is NOT binary equality, eg 0.0 and -0.0 aren't binary equal, but they're semantically (==) equal, hence isDefault(-0.0) is true
  • likewise, isDefault(initTable[int,int]()) is true despite initTable[int,int]() not being binary equal to t2 with var t2: Table[int,int]: what matters is semantic equality
  • but isDefault(newTable[int,int]()) is false and isDefault(t3) is true with t3: TableRef[int,int]

Examples 1:

see runnableExamples in PR

Example 2: enables DRY code

# main.nim
import mymod
doAssert Foo().bar.isDefault # this PR
# doAssert Foo().bar == Bar.default # doesn't compile, Bar private
doAssert Foo().bar == default(type(Foo().bar)) # not DRY

# mymod.nim
type Bar = ref object
type Foo* = ref object
  bar*: Bar

[1] requests for a similar feature

I've seen it a few times already in gitter eg
*

I always used if str != "", but using len makes more sense in terms of performance
Alexander Ivanov @alehander92 Apr 03 2019 08:35
i am not sure if this is slower if it is, we should optimize it
there was this idea of a geneirc isEmpty or something like that what happened

Alexander Ivanov @alehander92 Nov 08 2018 02:03
can we add isEmpty to the stdlib

From IRC (bridge bot) @FromIRC Nov 08 2018 03:04
Araq @alehander42: I want it too. for packedjson 'len == 0' is expensive. On the other hand, we have TR macros for that, maybe we should use them more

From IRC (bridge bot) @FromIRC Sep 22 2018 02:36
Araq isEmpty is important for Nim's future though, should write a blog post...

note

  • I've also added notDefault as sugar over not isDefault to avoid using not isDefault (same rationale as isnot vs is); a few ppl have mentioned the need

alehander92 maybe not isEmpty looks strange ? I think I prefer it compared to len > 0 still
Araq well I don't want to write if not x.isEmpty: for y in x.unwrap():

name-wise, notDefault is preferable to isAny because it's self documenting, whereas isAny would be counter-intuitive for things like doAssert not [0].isAny (see above discussion about isEmpty)

  • we could move it out of system but then please suggest which module; IMO it should be in system because that's also where default is defined, and it avoids having to import a heavy dependency like sugar etc for something that ought to be common

@Araq
Copy link
Member

Araq commented Feb 28, 2020

RFC process would be appreciated. I personally prefer isEmpty because it's shorter. The opposite would be isFilled.

Copy link
Contributor

@dom96 dom96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense for enums but apart from that I am skeptical, at the very least [0].isDefault returning true doesn't make sense to me.

Whatever is decided for the semantics of isDefault, an isEmpty should also exist for string and possibly cstring and seq (and probably other collection types). In fact, why don't we just define it for any type that defines a len proc?

@Araq
Copy link
Member

Araq commented Feb 28, 2020

why don't we just define it for any type that defines a len proc?

Yeah, that nails it.

@timotheecour
Copy link
Member Author

timotheecour commented Feb 28, 2020

RFC process would be appreciated.

I've added [RFC] in the title; having this discussion here or under https://github.com/nim-lang/RFCs/issues doesn't make a big difference but I can add a reference issue that simply forwards to here if needed

I personally prefer isEmpty because it's shorter.

it's not about the name it's about the underlying semantics, see top post for important differences

Whatever is decided for the semantics of isDefault, an isEmpty should also exist for string
why don't we just define it for any type that defines a len proc?

let's say this:

template isEmpty*[T](a: T): bool = a.len == 0

isEmpty (negated as isAny or isFilled), could be useful to have as well, however I've argued that isDefault is even more useful, see detailed explanation I gave in top post, please read it carefully if you haven't. So I'm not against adding isEmpty but I'm against having only isEmpty and not isDefault

isEmpty opens a can of worms especially for reference/pointer types eg:

  • does isEmpty((ptr int).default) compile? according to "must have len" definition, it shouldn't => less applicable; likewise with enum's
  • isEmpty("".cstring) is true or false? this will confuse ppl ; according to len definition, it should be true, but then isEmpty(cstring.default) would also be true so it doesn't help distinguishing whether the input is nil or not;

most of the time when you deal with ptr types (eg cstring), the 1st thing you care about is whether they're nil, not whether a,len == 0 (which can be important but only after nil checking).

As I mentioned above, you'll also have to overload isEmpty for certain types, otherwise performance will be bad as shown below, for any type where len is not O(1);
Note that isDefault does not have this issue, the complexity of isDefault is typically O(1), where the constant depends on the type, not the runtime value, eg for arra[N,T] complexity of isDefault is N = O(1) (fixed at CT, so it's a constant); but as examples below show, complexity of isEmpty can be O(n) where n depends on runtime value (ie, unbounded).

isEmpty has bad performance with cstring

  import times
  proc main()=
    let n = 100_000_000
    var a = newString(n).cstring
    for i in 0..<n: a[i]='x'
    template test(fun) =
      let t = cpuTime()
      let n2 = 1000
      for i in 0..<n2:
        doAssert not fun(a)
      echo astToStr(fun) & ": " & $(cpuTime() - t)
    test isEmpty
    test isDefault
  main()

isEmpty: 6.141948
isDefault: 1.899999999999125e-05

isEmpty has bad performance with intsets

import intsets
import times
proc main()=
  var a: Intset
  let n = 1_000_000
  for i in 0..<n: a.incl i
  template test(fun) =
    let t = cpuTime()
    let n2 = 1000
    for i in 0..<n2:
      doAssert not fun(a)
    echo cpuTime() - t
  test isEmpty
  test isDefault
main()
isEmpty: 9.339441000000001
isDefault: 4.500000000007276e-05

at the very least [0].isDefault returning true

that's desirable and follows semantics that are really simple to explain: checks whether it's equal to its default value. It says nothing about length.
Note that (as I mentioned in top post), for arrays (array[N,T] with N>0, isEmpty would always be false (ie, not very useful in runtime code); whereas isDefault is useful in runtime code.

@timotheecour timotheecour changed the title add isDefault; more general than isNil, isEmpty etc [RFC] add isDefault; more general than isNil, isEmpty etc Feb 28, 2020
@Araq
Copy link
Member

Araq commented Feb 28, 2020

Well obviously isEmpty should be specialized for IntSets and cstring. That's not an argument against it.

@dom96
Copy link
Contributor

dom96 commented Mar 6, 2020

isEmpty doesn't make sense for pointer types IMO. In fact, it's dangerous as it hides ptr usage (which should be made very explicit as much as possible in our code to avoid memory corruption), so I would argue against introducing it.

@Araq
Copy link
Member

Araq commented Mar 7, 2020

In my non-existing, alternative RFC isEmpty wouldn't be a thing for pointers.

@timotheecour
Copy link
Member Author

timotheecour commented Mar 7, 2020

Well obviously isEmpty should be specialized for IntSets and cstring.
In my non-existing, alternative RFC isEmpty wouldn't be a thing for pointers.

isEmpty then probably also shouldn't be a thing for cstring, which, although not a pointer/ptr, is similar to ptr char;
likewise with ref types which have the issues I described in top post, eg var t1: TableRef[int,int] and var t2 = newTableRef[int, int]() are both "empty" (don't contain elements) yet one is nil and the other is not, so the isEmpty test isn't really what matters first in practice. So that leaves object types (not ptr/ref/pointer) that have a len property.

isDefault by contrast works with any type that defines ==, doesn't need to be specialized for performance issues, and is usually the very first thing that one should test when dealing with an instance of a type (whether it's a cstring, a string, a ptr / ref / object, Table, TableRef, enum, etc); and it has obvious semantics derived from its definition.

@Araq
Copy link
Member

Araq commented Mar 9, 2020

But writing isDefault(x) instead of isEmpty(x) obfuscates the code, I usually don't want to test for default-ness, but for "is something in this collection". It misses the point because it's more general, too general I would argue.

@timotheecour
Copy link
Member Author

timotheecour commented Mar 10, 2020

I don't think there's anything wrong with "too general" in this case, but if you care about having something more restrictive we just may need both isDefault and isEmpty then; there are many valid use cases for isDefault that can't be done with isEmpty

here's a plausible version of isEmpty that works generically on object types with a len property
It can be implemented using concepts or (preferably IMO) enableif PR #12048 as follows:

I verified that both the concept version and the enableif version work, as shown below, and just added a test for that in enableif PR #12048

template isDefault*[T](a: T): bool =
  a == default(type(a))

when false:
  # pending `enableif` PR https://github.com/nim-lang/Nim/pull/12048
  template isEmpty*[T: object|seq|string|set](a: T): bool {.enableif: a.len is int .} =
    a.len == 0
else:
  # use concepts
  type HasLen = concept x
    x is object|seq|string|set
    x.len is int

  template isEmpty*[T: HasLen](a: T): bool =
    a.len == 0
      # still inefficient for types where `len` is not O(1) so needs specializations

type Foo = object
  x: int

type Foo2 = object
  x: int

proc len(a: Foo2): int = a.x

doAssert not isEmpty(@[""])
doAssert isEmpty(@[""].type.default)
doAssert not isEmpty("abc")
doAssert isEmpty("")
doAssert not compiles(isEmpty(Foo()))
doAssert not isEmpty(Foo2(x: 2))
doAssert isEmpty(Foo2())

##
## edge cases
##

## edge case 1:
type Foo3 = object
  capacity: int
  initialized: bool
  len: int

doAssert isEmpty(Foo3(capacity: 10, initialized: true, len: 0))

## edge case 2:
doAssert not compiles(isEmpty([1,2]))
  # this is an edge case; it's result depends on type, not runtime value,
  # so having it not compile is sensible; there are arguments in opposite direction

## edge case 3:
doAssert not compiles isEmpty((x: "foo", len: 0)) # should tuples be supported?

## for ref objects, `isEmpty` must use a deref; `isDefault` can be used both
## on the ref (checks nil-ness) and the deref (checks empty-ness)
import tables
var t = newTable[int, int]()

doAssert not compiles(t.isEmpty)
doAssert not t.isDefault

doAssert t[].isEmpty
doAssert t[].isDefault
t[1]=1
doAssert not t[].isEmpty
doAssert not t[].isDefault

var x: set[int8]
doAssert x.isEmpty
doAssert x.isDefault

@krux02
Copy link
Contributor

krux02 commented Mar 15, 2020

Why don't you just use x == default(T) instead?

@timotheecour
Copy link
Member Author

timotheecour commented Mar 15, 2020

Why don't you just use x == default(T) instead?

it's explained in top post.
You often don't have T defined when x is some expression, so you'd need:

foo[2].bar() == default(type(foo[2].bar())) # not DRY
vs:
foo[2].bar().isDefault

and "knowing" that bar() is of some type Bar doesn't always help either, eg for generics:

foo[2].bar() == default(Bar[string, float])
vs:
foo[2].bar().isDefault

or when Bar is private, you'd be forced to use foo[2].bar() == default(type(foo[2].bar())) # not DRY
isDefault avoids that.

And writing something like:

let x = foo[2].bar()
x == type(default(T))

can change the semantics, eg cause a copy, which is obviously worse.

@krux02
Copy link
Contributor

krux02 commented Mar 16, 2020

Why again do you need this to be in system.nim? Why don't you just declare template isDefault(arg: untyped): bool = arg == default(typeof(arg)) locally to your project and move on?

@timotheecour
Copy link
Member Author

Why don't you just declare template isDefault(arg: untyped): bool = arg == default(typeof(arg)) locally to your project and move on?

the need for such sugar is often request feature (see top post, eg can we add isEmpty to the stdlib + similar) and I've been trying to argue in this RFC that isDefault is the right (more general) abstraction to use instead of alternative (not so well defined, see above arguments on semantics) isEmpty

it's a trivial function, but one that can be expected to be used a lot, so it makes sense to defined in stdlib (right where default(T) is defined ideally, which is in system.nim) instead of having user code define it (in subtly different ways, see template nBl example I gave above)

if system.nim is no good, can you suggest another module? sugar would make sense but carries along dependencies (macros, typetraits, underscored_calls)

And if no module is suitable, I'll go ahead and close this PR (but I don't think adding isEmtpy carries its weight as an alternative proposal)

@krux02
Copy link
Contributor

krux02 commented Mar 18, 2020

@timotheecour unlike you, I don't see the use case for isDefault. I simply don't see any code examples that would significantly be simplified because af a generic isDefault. All I see is yet again more bloat in system. That doesn't mean that isDefault isn't useful, but it means that isDefault is probably better placed in your project, not in the core project that everybody gets.

@Araq
Copy link
Member

Araq commented Mar 18, 2020

Sorry, rejected before we all spend even more time arguing about it.

@timotheecour
Copy link
Member Author

timotheecour commented Dec 2, 2020

I'd like to revive this in light of #16191 (comment) with the following modification:

  • put isDefault not in system.nim but in some other (small-dependency, ie lowlevel) module (so it can be imported without adding much dependencies)
  • maybe, specialize it for pointer-like types (ptr, pointer, cstring, ref, procvar etc) so that it still works with the meaning of isNil in those cases even if == is specialized (eg for ref types)

@Araq
Copy link
Member

Araq commented Dec 3, 2020

Adding a 3rd variant when there is no consensus on the existing 2 variants isn't all that helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants