Skip to content

Commit

Permalink
GH-44575: [C#] Replace LINQ expression with for loop (#44576)
Browse files Browse the repository at this point in the history
For code which repeatedly access columns by name, this LINQ expression can form part of the hot path. This PR replaces the LINQ with the equivalent for loop, and should preserve all existing behaviour ([return -1 in the event of no match](https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1.indexof?view=net-8.0#system-collections-generic-list-1-indexof(-0))).

I ran a quick benchmark to validate the speedup

```cs
[MemoryDiagnoser]
public class ColumnIndexerBenchmark
{
    private readonly RecordBatch _batch;

    public ColumnIndexerBenchmark()
    {
        var builder = new Schema.Builder();
        builder
            .Field(new Field("A", Int32Type.Default, true))
            .Field(new Field("B", Int32Type.Default, true))
            .Field(new Field("C", Int32Type.Default, true))
            .Field(new Field("D", Int32Type.Default, true))
            .Field(new Field("E", Int32Type.Default, true))
            .Field(new Field("F", Int32Type.Default, true))
            .Field(new Field("G", Int32Type.Default, true))
            .Field(new Field("H", Int32Type.Default, true))
            .Field(new Field("I", Int32Type.Default, true))
            .Field(new Field("J", Int32Type.Default, true));
        var schema = builder.Build();
        _batch = new RecordBatch(schema, new IArrowArray[schema.FieldsList.Count], 0);
    }

    [Benchmark]
    public void GetColumnByIndex()
    {
        _batch.Column("H", StringComparer.Ordinal);
    }
}

```

Some numbers from my machine

```
BenchmarkDotNet v0.14.0, Windows 10 (10.0.19045.5011/22H2/2022Update)
13th Gen Intel Core i7-13800H, 1 CPU, 20 logical and 14 physical cores
.NET SDK 8.0.306
  [Host]     : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
```

| Method                  | Mean     | Error     | StdDev    | Gen0   | Allocated |
|------------------------ |---------:|----------:|----------:|-------:|----------:|
| GetColumnByIndexLinq    | 67.84 ns | 1.178 ns  | 1.102 ns  | 0.0107 |     136 B |
| GetColumnByIndexForLoop | 9.428 ns | 0.1334 ns | 0.1114 ns |      - |         - |

In theory, we could achieve a greater speedup by maintaining a lookup of column names to ordinals. We already have several lookup structures inside `Schema`, but none of them provides access to ordinal values. However, the speedup from adding another mapping might not warrant adding yet another lookup structure to `Schema`.

If merged, will close #44575.

* GitHub Issue: #44575

Authored-by: George Vanburgh <gvanburgh@bloomberg.net>
Signed-off-by: Curt Hagenlocher <curt@hagenlocher.org>
  • Loading branch information
georgevanburgh authored and pull[bot] committed Jan 5, 2025
1 parent d7140ef commit 46342ea
Showing 1 changed file with 7 additions and 1 deletion.
8 changes: 7 additions & 1 deletion csharp/src/Apache.Arrow/Schema.cs
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,13 @@ public int GetFieldIndex(string name, IEqualityComparer<string> comparer = defau
{
comparer ??= StringComparer.CurrentCulture;

return _fieldsList.IndexOf(_fieldsList.First(x => comparer.Equals(x.Name, name)));
for (int i = 0; i < _fieldsList.Count; i++)
{
if (comparer.Equals(_fieldsList[i].Name, name))
return i;
}

return -1;
}

public Schema RemoveField(int fieldIndex)
Expand Down

0 comments on commit 46342ea

Please sign in to comment.