
Unicode tables as constants #4516

Closed
akzhan opened this issue Jun 5, 2017 · 21 comments

akzhan commented Jun 5, 2017

The Unicode tables are currently stored as a series of proc calls that build singleton tables at runtime, on demand.

This looks like overengineering that doubles the memory used and slows down startup.
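For context, a minimal sketch of the pattern being described (simplified names and data, not the actual stdlib source): the table is a memoized class variable filled by a series of `put` calls the first time it is accessed.

```crystal
module Unicode
  @@upcase_ranges : Array({Int32, Int32, Int32})?

  private def self.put(array, *values) : Nil
    array << values
  end

  def self.upcase_ranges
    @@upcase_ranges ||= begin
      data = Array({Int32, Int32, Int32}).new(131)
      put(data, 97, 122, -32)
      put(data, 181, 181, 743)
      # ... ~130 more calls ...
      data
    end
  end
end
```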

akzhan commented Jun 5, 2017

Useful article about constants placement: https://randomascii.wordpress.com/2017/01/08/add-a-const-here-delete-a-const-there/

asterite commented Jun 5, 2017

It's not overengineering; there's no way to put that in the read-only segment in Crystal. But if you find a way, please send a pull request.

akzhan commented Jun 5, 2017

OK, but what prevents us from simply setting up the Unicode tables directly, instead of a lot of put calls?

Something like:

module Unicode
  UpcaseRanges = Array({Int32, Int32, Int32}).new(
    data...
  )
end

RX14 commented Jun 5, 2017

@akzhan hash table "constants" are still constructed at startup, just not lazily. The compiler would need to assume a hash table implementation to construct a constant hash table at compile-time, and I think such assumptions are wrong.
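A minimal illustration of this (hypothetical data, not the actual stdlib tables):

```crystal
# A Hash literal assigned to a constant is not baked into the binary; the
# compiler emits code that builds it when the program starts.
SPECIAL_CASES = {
  0x00DF => {0x0053, 0x0053}, # ß upcases to "SS"
}

# ...which is lowered to roughly:
# SPECIAL_CASES = begin
#   hash = Hash(Int32, {Int32, Int32}).new
#   hash[0x00DF] = {0x0053, 0x0053}
#   hash
# end
```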

akzhan commented Jun 5, 2017

Understood. I'll close this issue and think about the read-only segment later.

@akzhan akzhan closed this as completed Jun 5, 2017
akzhan commented Jun 5, 2017

Actually no, I still need an answer to another question:

Why not

    @@upcase_ranges ||= [
      {97, 122, -32},
      {181, 181, 743},

instead of

    @@upcase_ranges ||= begin
      data = Array({Int32, Int32, Int32}).new(131)
      put(data, 97, 122, -32)
      put(data, 181, 181, 743)

@akzhan akzhan reopened this Jun 5, 2017
RX14 commented Jun 5, 2017

@akzhan Array is the same:

See here and here

akzhan commented Jun 5, 2017

Anyway, using an array literal looks more consistent.

RX14 commented Jun 5, 2017

@akzhan
Copy link
Contributor Author

akzhan commented Jun 5, 2017

Looks like a hack.

Wouldn't it be more correct to get rid of the extra allocations when forming a constant expression?

asterite commented Jun 5, 2017

It's a hack and I know it. But if I use a literal (which will boil down to exactly the same code in release mode) it will take like 10 seconds or more to compile a simple "hello world" program. This is something that needs to be fixed in the compiler's codegen phase. If someone wants to fix it and then remove the hack, please go ahead :-)

bew commented Jun 5, 2017

@asterite what's the problem, why does it take 10 seconds for simple programs?

asterite commented Jun 5, 2017

Run --emit llvm-ir on this code:

fun testfun : Void
  [
    {1, 2, 3},
    {1, 2, 3},
    {1, 2, 3},
  ]
end

You will see something like this:

define void @testfun() #0 !dbg !7144 {
alloca:
  %__temp_213 = alloca %"Tuple(Int32, Int32, Int32)"*, !dbg !7146
  %capacity = alloca i32, !dbg !7146
  %ary = alloca %"Array(Tuple(Int32, Int32, Int32))"*, !dbg !7146
  %0 = alloca %"Tuple(Int32, Int32, Int32)", !dbg !7147
  %1 = alloca %"Tuple(Int32, Int32, Int32)", !dbg !7148
  %2 = alloca %"Tuple(Int32, Int32, Int32)", !dbg !7148
  br label %entry

That is, one alloca per tuple. In the Unicode data there are lots of tuples, maybe 100+ per method. LLVM is known to be very slow when compiling functions with lots of allocas, and the generated functions are very big. I exaggerated a bit with 10 seconds; the time for me goes from 0.2s to 0.7s, which is an unacceptable slowdown in compile times.

The codegen phase needs to be smarter about this (avoid allocas, generate less code), but for now this workaround works well.

akzhan commented Jun 5, 2017

Thanks for the explanation. I'll add a reference to this.

@akzhan akzhan closed this as completed Jun 5, 2017
akzhan added a commit to akzhan/crystal that referenced this issue Jun 5, 2017
for explanation of the hack.
S-YOU commented Jan 16, 2018

I'm not trying to say we should go backwards, but this file is generated anyway.
So instead of emitting only a .cr file, I think you could keep the data in a C array, load it as a lib from Crystal, and cast it to Int32* on the Crystal side; the exact array sizes are known at generation time. The .o file is also cacheable, so there would be no performance hit during compilation either.
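A hedged sketch of this approach (file names, symbol names, and sizes here are hypothetical; this is not how the stdlib currently works): the generator would also emit a C file containing the table plus a tiny accessor, compile it once to an object file, and Crystal would only bind the accessor.

```crystal
@[Link(ldflags: "#{__DIR__}/unicode_tables.o")]
lib LibUnicodeTables
  # C side (illustrative):
  #   static const int upcase_ranges[131][3] = { {97, 122, -32}, ... };
  #   const int *unicode_upcase_ranges(void) { return &upcase_ranges[0][0]; }
  fun unicode_upcase_ranges : Int32*
end

# Reinterpret the flat ints as read-only triples; nothing is built at runtime,
# and the data itself lives in the object file's read-only section.
UPCASE_RANGES = begin
  ptr = LibUnicodeTables.unicode_upcase_ranges.as(Pointer({Int32, Int32, Int32}))
  Slice.new(ptr, 131, read_only: true)
end
```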

HertzDevil commented Jul 13, 2021

For primitive integers there is a hack: encoding the entire contents of the array as a non-UTF8 String, then constructing a read-only Slice out of it. For example:

POWERS_OF_10 = Int32[1, 10, 100, 1000, 10000, 100000]

# the above becomes:
POWERS_OF_10 = begin
  __temp = "\x01\x00\x00\x00\x0A\x00\x00\x00\x64\x00\x00\x00\xE8\x03\x00\x00\x10\x27\x00\x00\xA0\x86\x01\x00".to_unsafe
  ::Slice.new(__temp.unsafe_as(::Pointer(Int32)), 6, read_only: true)
end
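A quick sanity check of the hack above (hedged: this assumes the POWERS_OF_10 definition from the previous snippet). Reads behave like any Slice read, while writes are rejected at runtime because the slice was marked read-only:

```crystal
POWERS_OF_10[3]         # => 1000
POWERS_OF_10.read_only? # => true
POWERS_OF_10[3] = 7     # raises at runtime instead of mutating the string data
```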

It can be done entirely with a macro:

macro const_data(value)
  {%
    value.raise "expected Call, got #{value.class_name.id}" unless value.is_a?(Call)
    value.name.raise "expected call to `[]`, got #{value.name}" unless value.name == "[]"
    value.receiver.raise "expected Path, got #{value.receiver.class_name.id}" unless value.receiver.is_a?(Path)

    int_type = value.receiver.resolve?
    raise "expected primitive integer type, got #{int_type}" unless Int::Primitive.union_types.includes?(int_type)
    escapes = [] of String
    bytes = [] of String
    hex = "0123456789ABCDEF".chars.map(&.id)

    # add padding to account for alignment (string header occupies 12 bytes)
    if int_type == Int64 || int_type == UInt64
      escapes << "\\x00\\x00\\x00\\x00"
    end

    value.args.each do |x|
      raise "Invalid #{int_type}: #{x}" unless x >= int_type.constant(:MIN) && x <= int_type.constant(:MAX)
      bytes.clear
      if int_type == Int8 || int_type == UInt8
        bytes << "\\x#{hex[(x >> 4) & 0xF]}#{hex[x & 0xF]}"
      elsif int_type == Int16 || int_type == UInt16
        bytes << "\\x#{hex[(x >> 4) & 0xF]}#{hex[x & 0xF]}"
        bytes << "\\x#{hex[(x >> 12) & 0xF]}#{hex[(x >> 8) & 0xF]}"
      elsif int_type == Int32 || int_type == UInt32
        bytes << "\\x#{hex[(x >> 4) & 0xF]}#{hex[x & 0xF]}"
        bytes << "\\x#{hex[(x >> 12) & 0xF]}#{hex[(x >> 8) & 0xF]}"
        bytes << "\\x#{hex[(x >> 20) & 0xF]}#{hex[(x >> 16) & 0xF]}"
        bytes << "\\x#{hex[(x >> 28) & 0xF]}#{hex[(x >> 24) & 0xF]}"
      elsif int_type == Int64 || int_type == UInt64
        bytes << "\\x#{hex[(x >> 4) & 0xF]}#{hex[x & 0xF]}"
        bytes << "\\x#{hex[(x >> 12) & 0xF]}#{hex[(x >> 8) & 0xF]}"
        bytes << "\\x#{hex[(x >> 20) & 0xF]}#{hex[(x >> 16) & 0xF]}"
        bytes << "\\x#{hex[(x >> 28) & 0xF]}#{hex[(x >> 24) & 0xF]}"
        bytes << "\\x#{hex[(x >> 36) & 0xF]}#{hex[(x >> 32) & 0xF]}"
        bytes << "\\x#{hex[(x >> 44) & 0xF]}#{hex[(x >> 40) & 0xF]}"
        bytes << "\\x#{hex[(x >> 52) & 0xF]}#{hex[(x >> 48) & 0xF]}"
        bytes << "\\x#{hex[(x >> 60) & 0xF]}#{hex[(x >> 56) & 0xF]}"
      else
        raise "BUG: unsupported int_type #{int_type}"
      end

      unless IO::ByteFormat::SystemEndian == IO::ByteFormat::LittleEndian
        bytes = (0...bytes.size).map { |i| bytes[-i - 1] }
      end
      escapes << bytes.join("").id
    end
  %}

  begin
    %ptr = "{{ escapes.join("").id }}".to_unsafe
    {% if int_type == Int64 || int_type == UInt64 %} %ptr += (-String::HEADER_SIZE) % 8 {% end %}
    ::Slice.new(%ptr.unsafe_as(::Pointer({{ int_type }})), {{ value.args.size }}, read_only: true)
  end
end

Example usage:

{% begin %}
  SQUARES = const_data Int64[{% for i in 1..10000 %}{{ i * i }}, {% end %}]
{% end %}

puts "okay" if SQUARES.sum == 10000_i64 * 10001_i64 * 20001_i64 // 6 # okay

Compiler stats with the const_data macro:

$ crystal --version
Crystal 1.0.0 [dd40a2442] (2021-03-22)

LLVM: 10.0.0
Default target: x86_64-unknown-linux-gnu

$ crystal run --stats const_data.cr
Parse:                             00:00:00.001120836 (   1.00MB)
Semantic (top level):              00:00:00.782978076 ( 106.66MB)
Semantic (new):                    00:00:00.001563506 ( 106.66MB)
Semantic (type declarations):      00:00:00.022377438 ( 106.66MB)
Semantic (abstract def check):     00:00:00.011341638 ( 106.66MB)
Semantic (ivars initializers):     00:00:00.029802889 ( 106.66MB)
Semantic (cvars initializers):     00:00:00.090018114 ( 106.66MB)
Semantic (main):                   00:00:00.208435125 ( 106.72MB)
Semantic (cleanup):                00:00:00.004821887 ( 106.72MB)
Semantic (recursive struct check): 00:00:00.000940188 ( 106.72MB)
Codegen (crystal):                 00:00:00.368542466 ( 106.72MB)
Codegen (bc+obj):                  00:00:01.396642477 ( 106.72MB)
Codegen (linking):                 00:00:00.198957887 ( 106.72MB)

Codegen (bc+obj):
 - no previous .o files were reused
Execute:                           00:00:00.008469669 ( 106.72MB)

$ crystal run --stats --release const_data.cr
Parse:                             00:00:00.001280861 (   1.00MB)
Semantic (top level):              00:00:00.715491060 ( 106.66MB)
Semantic (new):                    00:00:00.001425830 ( 106.66MB)
Semantic (type declarations):      00:00:00.022352583 ( 106.66MB)
Semantic (abstract def check):     00:00:00.007131136 ( 106.66MB)
Semantic (ivars initializers):     00:00:00.024142182 ( 106.66MB)
Semantic (cvars initializers):     00:00:00.087466321 ( 106.66MB)
Semantic (main):                   00:00:00.214100435 ( 106.72MB)
Semantic (cleanup):                00:00:00.004929722 ( 106.72MB)
Semantic (recursive struct check): 00:00:00.000990324 ( 106.72MB)
Codegen (crystal):                 00:00:00.321377562 ( 106.72MB)
Codegen (bc+obj):                  00:00:12.400422522 ( 106.72MB)
Codegen (linking):                 00:00:00.114224412 ( 106.72MB)

Codegen (bc+obj):
 - no previous .o files were reused
Execute:                           00:00:00.007889040 ( 106.72MB)

Without the macro:

$ crystal run --stats const_data.cr
Parse:                             00:00:00.001196452 (   1.00MB)
Semantic (top level):              00:00:00.280059839 (  59.15MB)
Semantic (new):                    00:00:00.001538629 (  59.15MB)
Semantic (type declarations):      00:00:00.035312029 (  59.15MB)
Semantic (abstract def check):     00:00:00.009662214 (  59.15MB)
Semantic (ivars initializers):     00:00:00.037343606 (  59.15MB)
Semantic (cvars initializers):     00:00:00.199617703 (  75.21MB)
Semantic (main):                   00:00:00.365238564 ( 107.33MB)
Semantic (cleanup):                00:00:00.006895654 ( 107.33MB)
Semantic (recursive struct check): 00:00:00.001133926 ( 107.33MB)
Codegen (crystal):                 00:00:00.466290108 ( 123.33MB)
Codegen (bc+obj):                  00:00:01.999441506 ( 123.33MB)
Codegen (linking):                 00:00:00.203274394 ( 123.33MB)

Codegen (bc+obj):
 - no previous .o files were reused
Execute:                           00:00:00.010151760 ( 123.33MB)

$ crystal run --stats --release const_data.cr
Parse:                             00:00:00.001651610 (   1.00MB)
Semantic (top level):              00:00:00.286388482 (  58.66MB)
Semantic (new):                    00:00:00.001524574 (  58.66MB)
Semantic (type declarations):      00:00:00.027061158 (  58.66MB)
Semantic (abstract def check):     00:00:00.009273951 (  58.66MB)
Semantic (ivars initializers):     00:00:00.035160314 (  58.66MB)
Semantic (cvars initializers):     00:00:00.195522580 (  74.72MB)
Semantic (main):                   00:00:00.339564279 ( 106.84MB)
Semantic (cleanup):                00:00:00.006341359 ( 106.84MB)
Semantic (recursive struct check): 00:00:00.001118258 ( 106.84MB)
Codegen (crystal):                 00:00:00.423288175 ( 122.84MB)
Codegen (bc+obj):^C

$ # I gave up after around 15 minutes

A read-only Slice is much safer than a StaticArray, can be freely passed around, and saves all the runtime costs associated with Array#<<. For floating-point types and other non-primitive types the story is obviously very different.

straight-shoota commented:
Looks like a good use case for #2886 🙈

HertzDevil commented:
Even that alone wouldn't allow floating-point arrays, unfortunately. The macro language is simply not expressive enough to convert 1.0_f32 into "\x00\x00\x80\x3F".
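For comparison, the conversion is trivial at runtime (a hedged sketch, only to illustrate that the limitation is specific to the macro language):

```crystal
# Runtime conversion is easy with IO::ByteFormat; the macro language has no
# equivalent way to reinterpret a float literal as its IEEE 754 bytes.
io = IO::Memory.new
io.write_bytes(1.0_f32, IO::ByteFormat::LittleEndian)
io.to_slice # => Bytes[0x00, 0x00, 0x80, 0x3F]
```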

straight-shoota commented:
Yeah, I wasn't referring to that hurdle but to the concept in general.

The macro expressiveness limitation would not exist if this just became a compiler feature.

asterite commented:
I think in general it would be nice to have const/immutable data structures that, if assigned to a constant, would generate static data in the executable. Maybe with a new feature.

HertzDevil commented Jul 14, 2021

This should be fairly easy for aggregate types without inner pointers (e.g. StaticArray and Tuple) since the infrastructure is already there in LLVM::Context#const_struct. Slightly more complex initializers like String::CHAR_TO_DIGIT could be worked around by constructing the array literal in the macro language (this is also necessary for the hack above):

{% begin %}
  {%
    table = [] of Int8
    (0...256).each { table << -1 }
    (0x30..0x39).each { |i| table[i] = i - 0x30 }
    (0x41..0x5a).each { |i| table[i] = i - 0x41 + 10 }
    (0x61..0x7a).each { |i| table[i] = i - 0x61 + 10 }
  %}

  @[ReadOnly]
  CHAR_TO_DIGIT = Int8.static_array({{ table.splat }})
{% end %}
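Usage of the macro-built table would then look like this (a hedged sketch; the indices follow the construction above):

```crystal
CHAR_TO_DIGIT['7'.ord] # => 7
CHAR_TO_DIGIT['f'.ord] # => 15
CHAR_TO_DIGIT['~'.ord] # => -1 (not a digit)
```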

IMO the main problem is deciding which constant expressions are allowed here. For such a feature to be useful it needs to support a wider set of expressions than MathInterpreter.

For Array and Hash, again I'm not sure how to approach them.

straight-shoota pushed a commit that referenced this issue Feb 22, 2025
We use a special `put` function instead of an array literal for populating large `Array` constants (see #4516), since this eliminates the individual `alloca`s for each array element. However, this is only true for simple literals, and tuple literals are not one of them. The following snippet:

```crystal
def put(array : Array, values) : Nil
  array << values
end

N = 2

def foo
  data = Array({Int32, Int32, Int32}).new(N)
  {% for _ in 0...N %}
    put(data, {1, 2, 3})
  {% end %}
end

foo
```

produces:

```llvm
define internal void @"*foo:Nil"() #0 {
alloca:
  %data = alloca ptr, align 8
  %0 = alloca %"Tuple(Int32, Int32, Int32)", align 8
  %1 = alloca %"Tuple(Int32, Int32, Int32)", align 8
  br label %entry

entry:                                            ; preds = %alloca
  %2 = call ptr @"*Array(Tuple(Int32, Int32, Int32))@array(T)::new<Int32>:Array(Tuple(Int32, Int32, Int32))"(i32 844, i32 2)
  store ptr %2, ptr %data, align 8
  %3 = load ptr, ptr %data, align 8
  %4 = getelementptr inbounds %"Tuple(Int32, Int32, Int32)", ptr %0, i32 0, i32 0
  store i32 1, ptr %4, align 4
  %5 = getelementptr inbounds %"Tuple(Int32, Int32, Int32)", ptr %0, i32 0, i32 1
  store i32 2, ptr %5, align 4
  %6 = getelementptr inbounds %"Tuple(Int32, Int32, Int32)", ptr %0, i32 0, i32 2
  store i32 3, ptr %6, align 4
  %7 = load %"Tuple(Int32, Int32, Int32)", ptr %0, align 4
  call void @"*put<Array(Tuple(Int32, Int32, Int32)), Tuple(Int32, Int32, Int32)>:Nil"(ptr %3, %"Tuple(Int32, Int32, Int32)" %7)
  %8 = load ptr, ptr %data, align 8
  %9 = getelementptr inbounds %"Tuple(Int32, Int32, Int32)", ptr %1, i32 0, i32 0
  store i32 1, ptr %9, align 4
  %10 = getelementptr inbounds %"Tuple(Int32, Int32, Int32)", ptr %1, i32 0, i32 1
  store i32 2, ptr %10, align 4
  %11 = getelementptr inbounds %"Tuple(Int32, Int32, Int32)", ptr %1, i32 0, i32 2
  store i32 3, ptr %11, align 4
  %12 = load %"Tuple(Int32, Int32, Int32)", ptr %1, align 4
  call void @"*put<Array(Tuple(Int32, Int32, Int32)), Tuple(Int32, Int32, Int32)>:Nil"(ptr %8, %"Tuple(Int32, Int32, Int32)" %12)
  ret void
}

; Function Attrs: uwtable
define internal void @"*put<Array(Tuple(Int32, Int32, Int32)), Tuple(Int32, Int32, Int32)>:Nil"(ptr %array, %"Tuple(Int32, Int32, Int32)" %values) #0 {
alloca:
  %values1 = alloca %"Tuple(Int32, Int32, Int32)", align 8
  br label %entry

entry:                                            ; preds = %alloca
  store %"Tuple(Int32, Int32, Int32)" %values, ptr %values1, align 4
  %0 = load %"Tuple(Int32, Int32, Int32)", ptr %values1, align 4
  %1 = call ptr @"*Array(Tuple(Int32, Int32, Int32))@array(T)#<<<Tuple(Int32, Int32, Int32)>:Array(Tuple(Int32, Int32, Int32))"(ptr %array, %"Tuple(Int32, Int32, Int32)" %0)
  ret void
}
```

whereas this:

```crystal
def put(array : Array, *values) : Nil
  array << values
end

N = 2

def foo
  data = Array({Int32, Int32, Int32}).new(N)
  {% for _ in 0...N %}
    put(data, 1, 2, 3)
  {% end %}
end

foo
```

produces:

```llvm
; Function Attrs: uwtable
define internal void @"*foo:Nil"() #0 {
alloca:
  %data = alloca ptr, align 8
  br label %entry

entry:                                            ; preds = %alloca
  %0 = call ptr @"*Array(Tuple(Int32, Int32, Int32))@array(T)::new<Int32>:Array(Tuple(Int32, Int32, Int32))"(i32 844, i32 2)
  store ptr %0, ptr %data, align 8
  %1 = load ptr, ptr %data, align 8
  call void @"*put<Array(Tuple(Int32, Int32, Int32)), Int32, Int32, Int32>:Nil"(ptr %1, i32 1, i32 2, i32 3)
  %2 = load ptr, ptr %data, align 8
  call void @"*put<Array(Tuple(Int32, Int32, Int32)), Int32, Int32, Int32>:Nil"(ptr %2, i32 1, i32 2, i32 3)
  ret void
}

; Function Attrs: uwtable
define internal void @"*put<Array(Tuple(Int32, Int32, Int32)), Int32, Int32, Int32>:Nil"(ptr %array, i32 %__temp_1436, i32 %__temp_1437, i32 %__temp_1438) #0 {
alloca:
  %values = alloca %"Tuple(Int32, Int32, Int32)", align 8
  %0 = alloca %"Tuple(Int32, Int32, Int32)", align 8
  br label %entry

entry:                                            ; preds = %alloca
  %1 = getelementptr inbounds %"Tuple(Int32, Int32, Int32)", ptr %0, i32 0, i32 0
  store i32 %__temp_1436, ptr %1, align 4
  %2 = getelementptr inbounds %"Tuple(Int32, Int32, Int32)", ptr %0, i32 0, i32 1
  store i32 %__temp_1437, ptr %2, align 4
  %3 = getelementptr inbounds %"Tuple(Int32, Int32, Int32)", ptr %0, i32 0, i32 2
  store i32 %__temp_1438, ptr %3, align 4
  %4 = load %"Tuple(Int32, Int32, Int32)", ptr %0, align 4
  store %"Tuple(Int32, Int32, Int32)" %4, ptr %values, align 4
  %5 = load %"Tuple(Int32, Int32, Int32)", ptr %values, align 4
  %6 = call ptr @"*Array(Tuple(Int32, Int32, Int32))@array(T)#<<<Tuple(Int32, Int32, Int32)>:Array(Tuple(Int32, Int32, Int32))"(ptr %array, %"Tuple(Int32, Int32, Int32)" %5)
  ret void
}
```

The `alloca` and `getelementptr` instructions are moved into the `put` function. If we set `N = 10000` instead, this subtle change could reduce the bitcode generation phase's time for this snippet by as much as 0.7s on my Windows machine.
kojix2 pushed a commit to kojix2/crystal that referenced this issue Feb 23, 2025
…al-lang#15495)
