Multi-Row Insert #25

Closed
AndrewRademacher opened this issue Feb 21, 2015 · 13 comments

@AndrewRademacher
Contributor

The executeMany functionality in the postgresql-simple driver can be the difference between inserting a few hundred rows a second and inserting tens of thousands. Are there any plans to implement this functionality in hasql?
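For reference, batch insertion with postgresql-simple looks roughly like the following sketch (the location table and its columns are made up for illustration; executeMany expands the VALUES clause with one row group per list element):

{-# LANGUAGE OverloadedStrings #-}

import Data.UUID (UUID)
import Database.PostgreSQL.Simple (Connection, executeMany)

-- Insert a whole batch of rows in a single statement.
insertLocations :: Connection -> [(UUID, Double, Double)] -> IO ()
insertLocations conn rows =
  () <$ executeMany conn "insert into location (id, x, y) values (?,?,?)" rows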

@begriffs

+1

Also relates to issue #1.

@nikita-volkov
Owner

I've been planning to explore this subject, but I'm currently swamped.

The proper place to start with this, IMO, is to write a benchmark that proves implementing this feature would make much of a difference at all, since the mechanics behind "postgresql-simple" and "hasql" are very different. Such a benchmark could, for instance, compare inserting a large number of rows using the standard "hasql" API against "executeMany" from "postgresql-simple".
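One possible shape for such a benchmark, sketched with criterion (insertWithHasql and insertWithExecuteMany are hypothetical placeholders for the per-library insertion code the benchmark would actually exercise):

import Criterion.Main (bench, bgroup, defaultMain, nfIO)

-- Hypothetical helpers: each would insert the given number of rows
-- using the respective library. Stubbed out here.
insertWithHasql :: Int -> IO ()
insertWithHasql _ = pure ()

insertWithExecuteMany :: Int -> IO ()
insertWithExecuteMany _ = pure ()

main :: IO ()
main =
  defaultMain
    [ bgroup "insert 10000 rows"
        [ bench "hasql, one statement per row" (nfIO (insertWithHasql 10000))
        , bench "postgresql-simple, executeMany" (nfIO (insertWithExecuteMany 10000))
        ]
    ]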

@AndrewRademacher
Contributor Author

I've set up the benchmark you described at https://github.com/AndrewRademacher/sql-driver-race. It includes both the benchmark code and the results in the results.html file. This test indicates that executeMany in postgresql-simple is about 270 times faster at inserting large batches.

@nikita-volkov
Owner

@AndrewRademacher Thank you. So this settles it, the thing needs to be implemented. However, I'm gonna remain swamped for the foreseeable future, so it's a subject for contribution.

@AndrewRademacher
Contributor Author

Fair enough. I don't really have any experience with this sort of thing, but I'll look into it as well.

@cocreature
Contributor

I've looked into this and it's a bit tricky. libpq does not have special support for multi-row inserts, so one has to modify the query string to include more parameters, and in fact that's exactly what postgresql-simple does. It is also really ugly, imho. One way around this is to pass array parameters through libpq and then use unnest in the query. However, you then run into problems because libpq does not support passing arrays of composites. You can still work around this by passing multiple arrays and then zipping them in SQL.

The good news is: This does the job! I’ve updated the benchmarks and included this workaround and it’s now faster than postgresql-simple (and significantly faster than hasql without this workaround). The bad news: It’s still a bit ugly. I wonder if it would be possible to bundle this nicely into a library function.

@nikita-volkov
Owner

Inspired by the input from @cocreature (Cheers, Moritz!), I've come up with a solution.

First, a bit of insight into the problem. It's true that Postgres prevents us from passing arrays of composites as parameters to queries. The reason is its underlying non-composable OID-based type-identification system: each type has to have a fixed, unique OID, which basically removes anonymous composite types from the picture. If we can't have anonymous composite types, we can't have arrays of them either; hence our problem.

What we can do, though, is work around that by passing a product of arrays instead of an array of products.

Starting from Postgres 9.4, the unnest function can be applied to multiple arrays of different types when used in the from clause (Docs). Thus we can create a select query which turns multiple arrays of different types into rows, e.g.:

select * from unnest(array[1,2,3], array[true, false])

We can then combine that with our ability to use select instead of values in the insert statement:

insert into "location" ("id", "x", "y") select * from unnest ($1, $2, $3)

The final Hasql query for that statement can then look like this:

insertMultipleLocations :: Query (Vector (UUID, Double, Double)) ()
insertMultipleLocations =
  statement sql encoder decoder True
  where
    sql =
      "insert into location (id, x, y) select * from unnest ($1, $2, $3)"
    -- Unzip the vector of rows into one column per field, then encode each
    -- column as a separate array parameter.
    encoder =
      contramap Vector.unzip3 $
      contrazip3 (vector Encoders.uuid) (vector Encoders.float8) (vector Encoders.float8)
      where
        -- Encodes a vector of primitive values as a one-dimensional array.
        vector value =
          Encoders.value (Encoders.array (Encoders.arrayDimension foldl' (Encoders.arrayValue value)))
    decoder =
      Decoders.unit
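For completeness, a hedged sketch of running that statement, assuming a hasql version contemporary with the example above (the pre-1.4 API, where statements live in Hasql.Query and are executed with Hasql.Session.query):

import Data.UUID (UUID)
import Data.Vector (Vector)
import qualified Hasql.Connection as Connection
import qualified Hasql.Session as Session

-- Executes the whole batch as a single statement with three array parameters.
insertBatch :: Connection.Connection -> Vector (UUID, Double, Double) -> IO ()
insertBatch connection rows =
  do
    result <- Session.run (Session.query rows insertMultipleLocations) connection
    either (fail . show) pure result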

I must remind that this solution applies only to Postgres 9.4 and later. For older versions you'll have to simulate the same thing with a more verbose workaround. The answers to this StackOverflow question should be of help.

@axman6
Contributor

axman6 commented Apr 23, 2017

@nikita-volkov Would you be able to add this example to the documentation for Hasql.Encoder? It would be really valuable in the section on arrays.

@nikita-volkov
Owner

@axman6 Can you make a PR?

@mbj
Contributor

mbj commented May 25, 2018

@nikita-volkov Another worthwhile (?) alternative to encoding to a tuple of per-column lists is to use a query like:

INSERT INTO table_name ( field_a, field_b ) SELECT field_a, field_b FROM json_populate_recordset ( null::table_name, '[{"field_a":1,"field_b":2}]' )

Especially when you have a record type with an aeson ToJSON instance that maps to your DB fields. It removes the need to encode records to tuples, which on larger records can cause some duplication.

Despite the flaws of record types, they are easier to handle in bulk than long tuples.
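A hedged sketch of how that could look against a recent hasql (the 1.x API with Hasql.Statement, Encoders.param, Encoders.nonNullable and Encoders.json) and aeson; the Location record and the location table are illustrative only:

{-# LANGUAGE DeriveGeneric, OverloadedStrings #-}

import Data.Aeson (ToJSON, toJSON)
import Data.Functor.Contravariant (contramap)
import GHC.Generics (Generic)
import Hasql.Statement (Statement (..))
import qualified Hasql.Decoders as Decoders
import qualified Hasql.Encoders as Encoders

-- Illustrative record; its JSON field names must match the column names.
data Location = Location { x :: Double, y :: Double }
  deriving (Generic)

instance ToJSON Location

insertLocations :: Statement [Location] ()
insertLocations =
  Statement sql encoder Decoders.noResult True
  where
    sql =
      "insert into location (x, y) \
      \select x, y from json_populate_recordset(null::location, $1)"
    -- The whole batch is encoded as a single json parameter via aeson.
    encoder =
      contramap toJSON (Encoders.param (Encoders.nonNullable Encoders.json))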

If you consider this a worthwhile alternative, please give me a heads-up and I'll PR a documentation addition.

@vdukhovni

It would, I think, be great to have a short version of this discussion in the Encoder documentation: something that briefly explains that composite encoders are not generally viable, but that one can get most of the desired functionality from unnest by encoding a separate array for each desired "primitive" column and then creating the desired row-like objects with unnest (i.e. the Postgres equivalent of zip).

@nikita-volkov
Owner

Care to PR?

@sigma-andex

sigma-andex commented Oct 24, 2022

For anyone else stumbling upon this issue:
I was able to work around it by serialising my list of data to a JSON array of JSON objects and passing it to the insert as a single value:

insert into mytable (myfield)
(
  select
    (val->>'myfield') :: text as myfield
    -- ... add all the fields you need from the record
  from jsonb_array_elements($1 :: jsonb) j(val)
)

My insertion times went down from ~30s for 1000 rows to ~0.5s for 1000 rows.
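In hasql terms, that approach could look roughly like this sketch (the current 1.x API is assumed; the mytable/myfield names follow the query above, and the batch is passed as a Data.Aeson.Value):

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Aeson as Aeson
import Hasql.Statement (Statement (..))
import qualified Hasql.Decoders as Decoders
import qualified Hasql.Encoders as Encoders

-- The whole batch travels as one jsonb parameter and is unpacked
-- into rows by jsonb_array_elements on the server.
insertMyTable :: Statement Aeson.Value ()
insertMyTable =
  Statement sql encoder Decoders.noResult True
  where
    sql =
      "insert into mytable (myfield) \
      \select (val->>'myfield') :: text from jsonb_array_elements($1 :: jsonb) j(val)"
    encoder =
      Encoders.param (Encoders.nonNullable Encoders.jsonb)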
