[RFC] add `isDefault`; more general than isNil, isEmpty etc #13526

timotheecour · 2020-02-28T05:30:46Z

this is an often requested feature [1], the lack of which forces some packages to define their own flavors eg:

in nimterop.globals:

template nBl(s: typed): untyped {.used.} = (s.len != 0)
template Bl(s: typed): untyped {.used.} = (s.len == 0)

`isDefault` is more useful than `isEmpty`

IMO isDefault is the more useful abstraction in the majority of cases, and is more general than isNil, or something like isEmpty (which has been proposed a few times) etc.

isNil and isEmpty don't cover important use cases, eg:

type Kind = enum kUnknown, kRed, kGreen
doAssert kUnknown.isDefault

the following semantic differences illustrate that isEmpty is usually the wrong abstraction:

      doAssert not @[0].isDefault # ok
      doAssert [0].isDefault # whereas [0].len != 0 and hence isEmpty would differ
      doAssert not "".cstring.isDefault # whereas "".cstring.len == 0, ditto
      doAssert not newTable[int,int]().isDefault # whereas newTable[int,int]().len == 0, ditto

isEmpty is always false for array[N, T] variables (with N>0), so it's not particularly useful here.
for cstring, most algorithms I can think of care about whether a variable is nil or not, not whether it's len (c_strlen) is == 0
isDefault is automatically defined for any type that has == defined
whereas it's much harder to define isEmpty generically, let alone doing so efficiently eg many types would need a custom definition (Tables, etc)
eg: the naive

proc isEmpty[T](a: T): bool = when compiles(a.len): a.len == 0 else: handleDifferently(a)`

is not efficient for some types where computing len is O(n) or expensive (eg linked lists, intsets, packedjson, c_strlen for cstring, etc)

the semantics of isEmpty aren't always clear in particular with reference types (eg is "".cstring empty or not?); it would always give rise to debates about what's considered empty, unlike isDefault.

semantics

isDefault doesn't try to be too clever about what is considered "default", instead it just uses system.default(T) for that.
However, optimizations are allowed, so long semantics are preserved, hence the overloads I added for seq and string. More overloads can be added in other modules as needed, when optimization dictates it. Note that such overloads should rarely be needed, because most types would often have effcient == defined
note that isDefault is NOT binary equality, eg 0.0 and -0.0 aren't binary equal, but they're semantically (==) equal, hence isDefault(-0.0) is true
likewise, isDefault(initTable[int,int]()) is true despite initTable[int,int]() not being binary equal to t2 with var t2: Table[int,int]: what matters is semantic equality
but isDefault(newTable[int,int]()) is false and isDefault(t3) is true with t3: TableRef[int,int]

Examples 1:

see runnableExamples in PR

Example 2: enables DRY code

# main.nim
import mymod
doAssert Foo().bar.isDefault # this PR
# doAssert Foo().bar == Bar.default # doesn't compile, Bar private
doAssert Foo().bar == default(type(Foo().bar)) # not DRY

# mymod.nim
type Bar = ref object
type Foo* = ref object
  bar*: Bar

[1] requests for a similar feature

I've seen it a few times already in gitter eg
*

I always used if str != "", but using len makes more sense in terms of performance
Alexander Ivanov @alehander92 Apr 03 2019 08:35
i am not sure if this is slower if it is, we should optimize it
there was this idea of a geneirc isEmpty or something like that what happened

Alexander Ivanov @alehander92 Nov 08 2018 02:03
can we add isEmpty to the stdlib

From IRC (bridge bot) @FromIRC Nov 08 2018 03:04
Araq @alehander42: I want it too. for packedjson 'len == 0' is expensive. On the other hand, we have TR macros for that, maybe we should use them more

From IRC (bridge bot) @FromIRC Sep 22 2018 02:36
Araq isEmpty is important for Nim's future though, should write a blog post...

note

I've also added notDefault as sugar over not isDefault to avoid using not isDefault (same rationale as isnot vs is); a few ppl have mentioned the need

alehander92 maybe not isEmpty looks strange ? I think I prefer it compared to len > 0 still
Araq well I don't want to write if not x.isEmpty: for y in x.unwrap():

name-wise, notDefault is preferable to isAny because it's self documenting, whereas isAny would be counter-intuitive for things like doAssert not [0].isAny (see above discussion about isEmpty)

we could move it out of system but then please suggest which module; IMO it should be in system because that's also where default is defined, and it avoids having to import a heavy dependency like sugar etc for something that ought to be common

Araq · 2020-02-28T09:51:31Z

RFC process would be appreciated. I personally prefer isEmpty because it's shorter. The opposite would be isFilled.

dom96

This makes sense for enums but apart from that I am skeptical, at the very least [0].isDefault returning true doesn't make sense to me.

Whatever is decided for the semantics of isDefault, an isEmpty should also exist for string and possibly cstring and seq (and probably other collection types). In fact, why don't we just define it for any type that defines a len proc?

Araq · 2020-02-28T10:04:04Z

why don't we just define it for any type that defines a len proc?

Yeah, that nails it.

timotheecour · 2020-02-28T11:54:37Z

RFC process would be appreciated.

I've added [RFC] in the title; having this discussion here or under https://github.com/nim-lang/RFCs/issues doesn't make a big difference but I can add a reference issue that simply forwards to here if needed

I personally prefer isEmpty because it's shorter.

it's not about the name it's about the underlying semantics, see top post for important differences

Whatever is decided for the semantics of isDefault, an isEmpty should also exist for string
why don't we just define it for any type that defines a len proc?

let's say this:

template isEmpty*[T](a: T): bool = a.len == 0

isEmpty (negated as isAny or isFilled), could be useful to have as well, however I've argued that isDefault is even more useful, see detailed explanation I gave in top post, please read it carefully if you haven't. So I'm not against adding isEmpty but I'm against having only isEmpty and not isDefault

isEmpty opens a can of worms especially for reference/pointer types eg:

does isEmpty((ptr int).default) compile? according to "must have len" definition, it shouldn't => less applicable; likewise with enum's
isEmpty("".cstring) is true or false? this will confuse ppl ; according to len definition, it should be true, but then isEmpty(cstring.default) would also be true so it doesn't help distinguishing whether the input is nil or not;

most of the time when you deal with ptr types (eg cstring), the 1st thing you care about is whether they're nil, not whether a,len == 0 (which can be important but only after nil checking).

As I mentioned above, you'll also have to overload isEmpty for certain types, otherwise performance will be bad as shown below, for any type where len is not O(1);
Note that isDefault does not have this issue, the complexity of isDefault is typically O(1), where the constant depends on the type, not the runtime value, eg for arra[N,T] complexity of isDefault is N = O(1) (fixed at CT, so it's a constant); but as examples below show, complexity of isEmpty can be O(n) where n depends on runtime value (ie, unbounded).

`isEmpty` has bad performance with `cstring`

  import times
  proc main()=
    let n = 100_000_000
    var a = newString(n).cstring
    for i in 0..<n: a[i]='x'
    template test(fun) =
      let t = cpuTime()
      let n2 = 1000
      for i in 0..<n2:
        doAssert not fun(a)
      echo astToStr(fun) & ": " & $(cpuTime() - t)
    test isEmpty
    test isDefault
  main()

isEmpty: 6.141948
isDefault: 1.899999999999125e-05

`isEmpty` has bad performance with `intsets`

import intsets
import times
proc main()=
  var a: Intset
  let n = 1_000_000
  for i in 0..<n: a.incl i
  template test(fun) =
    let t = cpuTime()
    let n2 = 1000
    for i in 0..<n2:
      doAssert not fun(a)
    echo cpuTime() - t
  test isEmpty
  test isDefault
main()

isEmpty: 9.339441000000001
isDefault: 4.500000000007276e-05

at the very least [0].isDefault returning true

that's desirable and follows semantics that are really simple to explain: checks whether it's equal to its default value. It says nothing about length.
Note that (as I mentioned in top post), for arrays (array[N,T] with N>0, isEmpty would always be false (ie, not very useful in runtime code); whereas isDefault is useful in runtime code.

Araq · 2020-02-28T13:34:21Z

Well obviously isEmpty should be specialized for IntSets and cstring. That's not an argument against it.

dom96 · 2020-03-06T10:49:27Z

isEmpty doesn't make sense for pointer types IMO. In fact, it's dangerous as it hides ptr usage (which should be made very explicit as much as possible in our code to avoid memory corruption), so I would argue against introducing it.

Araq · 2020-03-07T08:23:24Z

In my non-existing, alternative RFC isEmpty wouldn't be a thing for pointers.

timotheecour · 2020-03-07T09:17:22Z

Well obviously isEmpty should be specialized for IntSets and cstring.
In my non-existing, alternative RFC isEmpty wouldn't be a thing for pointers.

isEmpty then probably also shouldn't be a thing for cstring, which, although not a pointer/ptr, is similar to ptr char;
likewise with ref types which have the issues I described in top post, eg var t1: TableRef[int,int] and var t2 = newTableRef[int, int]() are both "empty" (don't contain elements) yet one is nil and the other is not, so the isEmpty test isn't really what matters first in practice. So that leaves object types (not ptr/ref/pointer) that have a len property.

isDefault by contrast works with any type that defines ==, doesn't need to be specialized for performance issues, and is usually the very first thing that one should test when dealing with an instance of a type (whether it's a cstring, a string, a ptr / ref / object, Table, TableRef, enum, etc); and it has obvious semantics derived from its definition.

Araq · 2020-03-09T12:39:57Z

But writing isDefault(x) instead of isEmpty(x) obfuscates the code, I usually don't want to test for default-ness, but for "is something in this collection". It misses the point because it's more general, too general I would argue.

timotheecour · 2020-03-10T01:37:32Z

I don't think there's anything wrong with "too general" in this case, but if you care about having something more restrictive we just may need both isDefault and isEmpty then; there are many valid use cases for isDefault that can't be done with isEmpty

here's a plausible version of isEmpty that works generically on object types with a len property
It can be implemented using concepts or (preferably IMO) enableif PR #12048 as follows:

I verified that both the concept version and the enableif version work, as shown below, and just added a test for that in enableif PR #12048

template isDefault*[T](a: T): bool =
  a == default(type(a))

when false:
  # pending `enableif` PR https://github.com/nim-lang/Nim/pull/12048
  template isEmpty*[T: object|seq|string|set](a: T): bool {.enableif: a.len is int .} =
    a.len == 0
else:
  # use concepts
  type HasLen = concept x
    x is object|seq|string|set
    x.len is int

  template isEmpty*[T: HasLen](a: T): bool =
    a.len == 0
      # still inefficient for types where `len` is not O(1) so needs specializations

type Foo = object
  x: int

type Foo2 = object
  x: int

proc len(a: Foo2): int = a.x

doAssert not isEmpty(@[""])
doAssert isEmpty(@[""].type.default)
doAssert not isEmpty("abc")
doAssert isEmpty("")
doAssert not compiles(isEmpty(Foo()))
doAssert not isEmpty(Foo2(x: 2))
doAssert isEmpty(Foo2())

##
## edge cases
##

## edge case 1:
type Foo3 = object
  capacity: int
  initialized: bool
  len: int

doAssert isEmpty(Foo3(capacity: 10, initialized: true, len: 0))

## edge case 2:
doAssert not compiles(isEmpty([1,2]))
  # this is an edge case; it's result depends on type, not runtime value,
  # so having it not compile is sensible; there are arguments in opposite direction

## edge case 3:
doAssert not compiles isEmpty((x: "foo", len: 0)) # should tuples be supported?

## for ref objects, `isEmpty` must use a deref; `isDefault` can be used both
## on the ref (checks nil-ness) and the deref (checks empty-ness)
import tables
var t = newTable[int, int]()

doAssert not compiles(t.isEmpty)
doAssert not t.isDefault

doAssert t[].isEmpty
doAssert t[].isDefault
t[1]=1
doAssert not t[].isEmpty
doAssert not t[].isDefault

var x: set[int8]
doAssert x.isEmpty
doAssert x.isDefault

krux02 · 2020-03-15T16:43:16Z

Why don't you just use x == default(T) instead?

timotheecour · 2020-03-15T20:54:10Z

Why don't you just use x == default(T) instead?

it's explained in top post.
You often don't have T defined when x is some expression, so you'd need:

foo[2].bar() == default(type(foo[2].bar())) # not DRY
vs:
foo[2].bar().isDefault

and "knowing" that bar() is of some type Bar doesn't always help either, eg for generics:

foo[2].bar() == default(Bar[string, float])
vs:
foo[2].bar().isDefault

or when Bar is private, you'd be forced to use foo[2].bar() == default(type(foo[2].bar())) # not DRY
isDefault avoids that.

And writing something like:

let x = foo[2].bar()
x == type(default(T))

can change the semantics, eg cause a copy, which is obviously worse.

krux02 · 2020-03-16T00:13:03Z

Why again do you need this to be in system.nim? Why don't you just declare template isDefault(arg: untyped): bool = arg == default(typeof(arg)) locally to your project and move on?

timotheecour · 2020-03-18T11:41:01Z

Why don't you just declare template isDefault(arg: untyped): bool = arg == default(typeof(arg)) locally to your project and move on?

the need for such sugar is often request feature (see top post, eg can we add isEmpty to the stdlib + similar) and I've been trying to argue in this RFC that isDefault is the right (more general) abstraction to use instead of alternative (not so well defined, see above arguments on semantics) isEmpty

it's a trivial function, but one that can be expected to be used a lot, so it makes sense to defined in stdlib (right where default(T) is defined ideally, which is in system.nim) instead of having user code define it (in subtly different ways, see template nBl example I gave above)

if system.nim is no good, can you suggest another module? sugar would make sense but carries along dependencies (macros, typetraits, underscored_calls)

And if no module is suitable, I'll go ahead and close this PR (but I don't think adding isEmtpy carries its weight as an alternative proposal)

krux02 · 2020-03-18T12:13:35Z

@timotheecour unlike you, I don't see the use case for isDefault. I simply don't see any code examples that would significantly be simplified because af a generic isDefault. All I see is yet again more bloat in system. That doesn't mean that isDefault isn't useful, but it means that isDefault is probably better placed in your project, not in the core project that everybody gets.

Araq · 2020-03-18T12:28:29Z

Sorry, rejected before we all spend even more time arguing about it.

timotheecour · 2020-12-02T22:44:00Z

I'd like to revive this in light of #16191 (comment) with the following modification:

put isDefault not in system.nim but in some other (small-dependency, ie lowlevel) module (so it can be imported without adding much dependencies)
maybe, specialize it for pointer-like types (ptr, pointer, cstring, ref, procvar etc) so that it still works with the meaning of isNil in those cases even if == is specialized (eg for ref types)

Araq · 2020-12-03T15:52:52Z

Adding a 3rd variant when there is no consensus on the existing 2 variants isn't all that helpful.

timotheecour added 8 commits February 27, 2020 21:16

add isDefault; more general than isNil, isEmpty etc

37ae1b2

fixup

02a590b

fixup

9a19131

use since

167c944

fix test

2840e06

add notDefault

df61046

use a template instead for isDefault

db34143

fixup

5c0f0db

dom96 reviewed Feb 28, 2020

View reviewed changes

timotheecour changed the title ~~add isDefault; more general than isNil, isEmpty etc~~ [RFC] add isDefault; more general than isNil, isEmpty etc Feb 28, 2020

Araq closed this Mar 18, 2020

timotheecour mentioned this pull request Mar 18, 2020

misc curated list of unfixed closed issues & PR's timotheecour/Nim#32

Open

This was referenced Mar 31, 2020

fix #13797 #13812

Merged

Thinking about default "not nil" #6638

Open

timotheecour mentioned this pull request Apr 17, 2020

fix https://github.com/nim-lang/RFCs/issues/211: var a: DateTime compiles and is usable #14002

Merged

timotheecour mentioned this pull request Dec 2, 2020

revive isDefault PR timotheecour/Nim#413

Open

timotheecour mentioned this pull request Jan 1, 2021

std/lists: Various changes to lists (RFC #303) #16536

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] add `isDefault`; more general than isNil, isEmpty etc #13526

[RFC] add `isDefault`; more general than isNil, isEmpty etc #13526

timotheecour commented Feb 28, 2020 •

edited

Loading

Araq commented Feb 28, 2020

dom96 left a comment

Araq commented Feb 28, 2020

timotheecour commented Feb 28, 2020 •

edited

Loading

Araq commented Feb 28, 2020

dom96 commented Mar 6, 2020

Araq commented Mar 7, 2020

timotheecour commented Mar 7, 2020 •

edited

Loading

Araq commented Mar 9, 2020

timotheecour commented Mar 10, 2020 •

edited

Loading

krux02 commented Mar 15, 2020

timotheecour commented Mar 15, 2020 •

edited

Loading

krux02 commented Mar 16, 2020

timotheecour commented Mar 18, 2020

krux02 commented Mar 18, 2020 •

edited

Loading

Araq commented Mar 18, 2020

timotheecour commented Dec 2, 2020 •

edited

Loading

Araq commented Dec 3, 2020

[RFC] add isDefault; more general than isNil, isEmpty etc #13526

[RFC] add isDefault; more general than isNil, isEmpty etc #13526

Conversation

timotheecour commented Feb 28, 2020 • edited Loading

isDefault is more useful than isEmpty

semantics

Examples 1:

Example 2: enables DRY code

[1] requests for a similar feature

note

Araq commented Feb 28, 2020

dom96 left a comment

Choose a reason for hiding this comment

Araq commented Feb 28, 2020

timotheecour commented Feb 28, 2020 • edited Loading

isEmpty has bad performance with cstring

isEmpty has bad performance with intsets

Araq commented Feb 28, 2020

dom96 commented Mar 6, 2020

Araq commented Mar 7, 2020

timotheecour commented Mar 7, 2020 • edited Loading

Araq commented Mar 9, 2020

timotheecour commented Mar 10, 2020 • edited Loading

krux02 commented Mar 15, 2020

timotheecour commented Mar 15, 2020 • edited Loading

krux02 commented Mar 16, 2020

timotheecour commented Mar 18, 2020

krux02 commented Mar 18, 2020 • edited Loading

Araq commented Mar 18, 2020

timotheecour commented Dec 2, 2020 • edited Loading

Araq commented Dec 3, 2020

[RFC] add `isDefault`; more general than isNil, isEmpty etc #13526

[RFC] add `isDefault`; more general than isNil, isEmpty etc #13526

timotheecour commented Feb 28, 2020 •

edited

Loading

`isDefault` is more useful than `isEmpty`

timotheecour commented Feb 28, 2020 •

edited

Loading

`isEmpty` has bad performance with `cstring`

`isEmpty` has bad performance with `intsets`

timotheecour commented Mar 7, 2020 •

edited

Loading

timotheecour commented Mar 10, 2020 •

edited

Loading

timotheecour commented Mar 15, 2020 •

edited

Loading

krux02 commented Mar 18, 2020 •

edited

Loading

timotheecour commented Dec 2, 2020 •

edited

Loading