Multi-Row Insert #25

Closed
AndrewRademacher opened this issue Feb 21, 2015 · 13 comments

@AndrewRademacher
Contributor

The executeMany functionality in the postgresql-simple driver can be the difference between inserting a few hundred rows a second and inserting tens of thousands. Are there any plans to implement this functionality in hasql?
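For reference, batch insertion with postgresql-simple looks roughly like the following sketch (the location table and its columns are made up for illustration; executeMany expands the VALUES clause with one row group per list element):

{-# LANGUAGE OverloadedStrings #-}

import Data.UUID (UUID)
import Database.PostgreSQL.Simple (Connection, executeMany)

-- Insert a whole batch of rows in a single statement.
insertLocations :: Connection -> [(UUID, Double, Double)] -> IO ()
insertLocations conn rows =
  () <$ executeMany conn "insert into location (id, x, y) values (?,?,?)" rows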

@begriffs

+1

Also relates to issue #1.

@nikita-volkov
Owner

I've been planning to explore this subject, but I'm currently swamped.

The proper place to start with this, IMO, is to write a benchmark that proves implementing this feature would make much of a difference at all, since the mechanics behind "postgresql-simple" and "hasql" are very different. Such a benchmark could, for instance, compare inserting a large number of rows using the standard "hasql" API against "executeMany" from "postgresql-simple".
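One possible shape for such a benchmark, sketched with criterion (insertWithHasql and insertWithExecuteMany are hypothetical placeholders for the per-library insertion code the benchmark would actually exercise):

import Criterion.Main (bench, bgroup, defaultMain, nfIO)

-- Hypothetical helpers: each would insert the given number of rows
-- using the respective library. Stubbed out here.
insertWithHasql :: Int -> IO ()
insertWithHasql _ = pure ()

insertWithExecuteMany :: Int -> IO ()
insertWithExecuteMany _ = pure ()

main :: IO ()
main =
  defaultMain
    [ bgroup "insert 10000 rows"
        [ bench "hasql, one statement per row" (nfIO (insertWithHasql 10000))
        , bench "postgresql-simple, executeMany" (nfIO (insertWithExecuteMany 10000))
        ]
    ]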

@AndrewRademacher
Contributor Author

I've set up the benchmark you described at https://github.com/AndrewRademacher/sql-driver-race. It includes both the benchmark code and the results in the results.html file. This test indicates that executeMany in postgresql-simple is about 270 times faster at inserting large batches.

@nikita-volkov
Owner

@AndrewRademacher Thank you. So this settles it, the thing needs to be implemented. However, I'm gonna remain swamped for the foreseeable future, so it's a subject for contribution.

@AndrewRademacher
Contributor Author

Fair enough. I don't really have any experience with this sort of thing, but I'll look into it as well.

@cocreature
Contributor

I've looked into this and it's a bit tricky. libpq does not have special support for multi-row inserts, so one has to modify the query string to include more parameters, and in fact that's exactly what postgresql-simple does. It is also really ugly, imho. One way around this is to pass array parameters through libpq and then use unnest in the query. However, you then run into problems because libpq does not support passing arrays of composites. You can still work around this by passing multiple arrays and then zipping them in SQL.

The good news is: This does the job! I’ve updated the benchmarks and included this workaround and it’s now faster than postgresql-simple (and significantly faster than hasql without this workaround). The bad news: It’s still a bit ugly. I wonder if it would be possible to bundle this nicely into a library function.

@nikita-volkov
Owner

Inspired by the input from @cocreature (Cheers, Moritz!), I've come up with a solution.

First, a bit of insight into the problem. It's true that Postgres prevents us from passing arrays of composites as parameters to queries. The reason is its underlying non-composable OID-based type-identification system: each type has to have a fixed, unique OID, which basically removes anonymous composite types from the picture. If we can't have anonymous composite types, we can't have arrays of them either; hence our problem.

What we can do, though, is work around that by passing a product of arrays instead of an array of products.

Starting from Postgres 9.4, the unnest function can be applied to multiple arrays of different types when used in the from clause (Docs). Thus we can create a select query which turns multiple arrays of different types into rows, e.g.:

select * from unnest(array[1,2,3], array[true, false])

We can then combine that with our ability to use select instead of values in the insert statement:

insert into "location" ("id", "x", "y") select * from unnest ($1, $2, $3)

The final Hasql query for that statement can then look like this:

insertMultipleLocations :: Query (Vector (UUID, Double, Double)) ()
insertMultipleLocations =
  statement sql encoder decoder True
  where
    sql =
      "insert into location (id, x, y) select * from unnest ($1, $2, $3)"
    -- Unzip the vector of rows into one column per field, then encode each
    -- column as a separate array parameter.
    encoder =
      contramap Vector.unzip3 $
      contrazip3 (vector Encoders.uuid) (vector Encoders.float8) (vector Encoders.float8)
      where
        -- Encodes a vector of primitive values as a one-dimensional array.
        vector value =
          Encoders.value (Encoders.array (Encoders.arrayDimension foldl' (Encoders.arrayValue value)))
    decoder =
      Decoders.unit
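For completeness, a hedged sketch of running that statement, assuming a hasql version contemporary with the example above (the pre-1.4 API, where statements live in Hasql.Query and are executed with Hasql.Session.query):

import Data.UUID (UUID)
import Data.Vector (Vector)
import qualified Hasql.Connection as Connection
import qualified Hasql.Session as Session

-- Executes the whole batch as a single statement with three array parameters.
insertBatch :: Connection.Connection -> Vector (UUID, Double, Double) -> IO ()
insertBatch connection rows =
  do
    result <- Session.run (Session.query rows insertMultipleLocations) connection
    either (fail . show) pure result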

I must remind that this solution applies only to Postgres 9.4 and later. For older versions you'll have to simulate the same thing with a more verbose workaround. The answers to this StackOverflow question should be of help.

@axman6
Contributor

axman6 commented Apr 23, 2017

@nikita-volkov Would you be able to add this example to the documentation for Hasql.Encoder? It would be really valuable in the section on arrays.

@nikita-volkov
Owner

@axman6 Can you make a PR?

@mbj
Contributor

mbj commented May 25, 2018

@nikita-volkov Another worthwhile (?) alternative to encoding to a tuple of per-column lists is to use a query like:

INSERT INTO table_name ( field_a, field_b ) SELECT field_a, field_b FROM json_populate_recordset ( null::table_name, '[{"field_a":1,"field_b":2}]' )

Especially when you have a record type with an aeson ToJSON instance that maps to your DB fields. It removes the need to encode records to tuples, which on larger records can cause some duplication.

Despite the flaws of record types, they are easier to handle in bulk than long tuples.
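A hedged sketch of how that could look against a recent hasql (the 1.x API with Hasql.Statement, Encoders.param, Encoders.nonNullable and Encoders.json) and aeson; the Location record and the location table are illustrative only:

{-# LANGUAGE DeriveGeneric, OverloadedStrings #-}

import Data.Aeson (ToJSON, toJSON)
import Data.Functor.Contravariant (contramap)
import GHC.Generics (Generic)
import Hasql.Statement (Statement (..))
import qualified Hasql.Decoders as Decoders
import qualified Hasql.Encoders as Encoders

-- Illustrative record; its JSON field names must match the column names.
data Location = Location { x :: Double, y :: Double }
  deriving (Generic)

instance ToJSON Location

insertLocations :: Statement [Location] ()
insertLocations =
  Statement sql encoder Decoders.noResult True
  where
    sql =
      "insert into location (x, y) \
      \select x, y from json_populate_recordset(null::location, $1)"
    -- The whole batch is encoded as a single json parameter via aeson.
    encoder =
      contramap toJSON (Encoders.param (Encoders.nonNullable Encoders.json))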

If you consider this a worthwhile alternative, please give me a heads-up and I'll PR a documentation addition.

@vdukhovni

It would, I think, be great to have a short version of this discussion in the Encoder documentation: something that briefly explains that composite encoders are not generally viable, but that one can get most of the desired functionality from unnest by encoding a separate array for each desired "primitive" column and then creating the desired row-like objects with unnest (i.e. the Postgres equivalent of zip).

@nikita-volkov
Owner

Care to PR?

@sigma-andex

sigma-andex commented Oct 24, 2022

For anyone else stumbling upon this issue:
I was able to work around it by serialising my list of data to a JSON array of JSON objects and passing it to the insert as a single value:

insert into mytable (myfield)
(
  select
    (val->>'myfield') :: text as myfield
    -- ... add all the fields you need from the record
  from jsonb_array_elements($1 :: jsonb) j(val)
)

My insertion times went down from ~30s for 1000 rows to ~0.5s for 1000 rows.
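In hasql terms, that approach could look roughly like this sketch (the current 1.x API is assumed; the mytable/myfield names follow the query above, and the batch is passed as a Data.Aeson.Value):

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Aeson as Aeson
import Hasql.Statement (Statement (..))
import qualified Hasql.Decoders as Decoders
import qualified Hasql.Encoders as Encoders

-- The whole batch travels as one jsonb parameter and is unpacked
-- into rows by jsonb_array_elements on the server.
insertMyTable :: Statement Aeson.Value ()
insertMyTable =
  Statement sql encoder Decoders.noResult True
  where
    sql =
      "insert into mytable (myfield) \
      \select (val->>'myfield') :: text from jsonb_array_elements($1 :: jsonb) j(val)"
    encoder =
      Encoders.param (Encoders.nonNullable Encoders.jsonb)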
