[Docs] Read costs could be interpreted as much higher than they'll end up being #398
---
As I was doing some database design today, it occurred to me to double-check the pricing details. I was commuting, so I only had access to the docs, and having read them, I was alarmed to the point of considering dropping PlanetScale altogether.

## The Question

The question I was trying to answer was:
## The Docs

So I headed to the docs, specifically the Billing page, and there it is. Under *PlanetScale charges on three factors* it says (bolds are mine):
Well, how would you find a row in a DB without inspecting some rows? I know that under the hood it's a binary search on an index, but that is still a form of "inspection of any kind". Then, under *Understanding rows read*, it says:
Searching for a specific record is work. Then it goes:
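To make the distinction concrete, here is a toy model (my own illustration, not PlanetScale's actual accounting) of the "binary search on an index" point made above: a point lookup *compares* against O(log n) index entries, but only the matching row is ultimately fetched, which is consistent with the 1-row-read result in Experiment 1 below.

```python
import bisect

# Toy model of an index lookup: the index is a sorted list of
# (key, row_id) pairs. A point lookup binary-searches the index
# (O(log n) comparisons), but only the single matching row is
# actually fetched and counted as a row read.
index = sorted((i, i) for i in range(1, 11))  # 10 rows, keyed by id
keys = [k for k, _ in index]

def point_lookup(key):
    """Return the row ids matching `key`; only these count as rows read."""
    lo = bisect.bisect_left(keys, key)
    hi = bisect.bisect_right(keys, key)
    return [row_id for _, row_id in index[lo:hi]]

rows = point_lookup(1)
print(len(rows))  # 1 row read, despite ~log2(10) index comparisons
```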
OK, so if this is a special case, then other selects won't be? Note that all the doc citations above seem to guard against cost *under*-calculation.

## Experiment 1

At this point I got really concerned, so when I got back home I ran this query on a table with 10 rows:

```sql
SELECT comments.id, comments.text FROM comments WHERE comments.id = 1;
```

And that yielded a read of 1 row.

Full reproduction queries:

```sql
CREATE TABLE comments (
  id int NOT NULL AUTO_INCREMENT PRIMARY KEY,
  text varchar(255) NOT NULL
);

INSERT INTO comments (text)
VALUES ('A'), ('B'), ('C'), ('D'), ('E'), ('F'), ('G'), ('H'), ('I'), ('G');

SELECT comments.id, comments.text FROM comments WHERE comments.id = 1;
```

## Experiment 2

```sql
SELECT comments.id, comments.text FROM comments WHERE comments.text = "A";
```

That will yield 10 reads (makes sense: with no index on `text`, it's a full scan).

## Experiment 3

Setup (note the index on `text`):

```sql
CREATE TABLE comments (
  id int NOT NULL AUTO_INCREMENT PRIMARY KEY,
  text varchar(255) NOT NULL,
  KEY `text_idx` (`text`)
);

INSERT INTO comments (text)
VALUES ('123'), ('234'), ('345'), ('456'), ('567'), ('678'), ('789'), ('890'), ('901'), ('102');
```

Then:

```sql
SELECT * from comments WHERE (text LIKE '123'); # 1 read (1 returned).
SELECT * from comments WHERE (text LIKE '1%');  # 2 reads (2 returned).
SELECT * from comments WHERE (text LIKE '%1%'); # 10 reads (3 returned).
```

This generally makes sense. Just for sanity, with these values:

```sql
VALUES ('000'), ('001'), ('010'), ('011'), ('100'), ('101'), ('110'), ('111');
```

this search is 2 reads:

```sql
SELECT * from comments WHERE (text LIKE '01%');
```

## The Odd Freebie Case

I can't quite understand how I got 0 reads here:

### Setup

```sql
CREATE TABLE comments (
  id int NOT NULL AUTO_INCREMENT PRIMARY KEY,
  text varchar(255) NOT NULL,
  FULLTEXT(text)
);

INSERT INTO comments (text)
VALUES ('some word here'),
       ('some word there'),
       ('some boy likes'),
       ('some girl likes'),
       ('blue red green'),
       ('red sun'),
       ('red moon'),
       ('dark side of the moon');
```

### Query

```sql
SELECT * FROM comments WHERE MATCH(text) AGAINST ('some' IN NATURAL LANGUAGE MODE);
```

## Conclusion

So it seems that with an index search (unless a full scan is needed), you don't get charged for anything but the returned rows.

## Suggestion

It seems to me it would be wise to add a note on this to the billing page. It is oriented around what you *will* pay for; it would be nice to add a bit more on what you *won't*.
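The three access patterns in Experiment 3 can be modelled with a sorted index: an exact match and a prefix match only examine the matching key range, while a leading-wildcard pattern (`'%1%'`) cannot use the index and must examine every row. A toy Python sketch of that model (my own illustration, not PlanetScale's accounting) reproduces the 1 / 2 / 10 read counts observed above:

```python
import bisect

values = ['123', '234', '345', '456', '567', '678',
          '789', '890', '901', '102']
index = sorted(values)  # stand-in for a B-tree index on `text`

def rows_read_exact(needle):
    # LIKE '123' -> range scan over the exact key: reads only matches
    lo = bisect.bisect_left(index, needle)
    hi = bisect.bisect_right(index, needle)
    return hi - lo

def rows_read_prefix(prefix):
    # LIKE '1%' -> range scan over keys starting with the prefix
    lo = bisect.bisect_left(index, prefix)
    hi = bisect.bisect_left(index, prefix + '\uffff')
    return hi - lo

def rows_read_contains(sub):
    # LIKE '%1%' -> leading wildcard defeats the index: every row is examined,
    # even though only a few are returned
    reads = len(index)
    matches = [v for v in index if sub in v]
    return reads, len(matches)

print(rows_read_exact('123'))    # 1
print(rows_read_prefix('1'))     # 2
print(rows_read_contains('1'))   # (10, 3)
```

The gap between `reads` and `matches` in the last case is exactly the gap the billing docs are warning about.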
---
Hey @Izhaki,

Thanks a lot for writing this up. I appreciate the thoughtfulness and completeness of your comment, with the reproduction queries and everything.

Generally, the included row reads and writes for the different plans were all designed with ample room for growth, so you should not need to worry about whether a query triggers 10 or 1,000 row reads. It's obviously important for performance, but I would argue that, from a billing point of view, we allow enough headroom for you to experiment or make mistakes.

In the end, it all depends on how MySQL decides to execute a query. A good rule of thumb is that if MySQL has sufficient indexes to satisfy a query, the number of row reads will be a lot lower than if it had to run a full table scan for each query.

That being said, we know that good database schema design isn't easy. I sometimes have users reach out to us about getting close to or exceeding the limits of their plan, e.g. due to a missing index or some other kind of misconfiguration. Or, when we stumble upon such cases in our metrics, we reach out proactively and figure it out with the user. Developers make mistakes, and writing and maintaining software is a complex task. We have no desire to exploit any of this complexity; rather, we aim to grow along with and support the users who have decided to use PlanetScale for their business, no matter how big or small it may be.

That said, I agree that we have some work to do in this area and that we could provide more clarity around the more fluid billing aspects. I'm hoping that, in the future, we will be able to offer a way for users to upload their schema along with some more information on data size and usage statistics, and that we can then provide them with an estimate. I'm also going to run this discussion post by our documentation team for feedback, as this may be a good opportunity to iterate on our billing documentation.

Thanks again for the feedback! Have a good rest of your week.
---
@Izhaki What are you using to know how many rows were read/affected/etc.?