Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indexing for query on data->>'created_at' #28

Open
FranckPachot opened this issue Dec 19, 2024 · 2 comments
Open

Indexing for query on data->>'created_at' #28

FranckPachot opened this issue Dec 19, 2024 · 2 comments

Comments

@FranckPachot
Copy link
Contributor

FranckPachot commented Dec 19, 2024

For the following query (PostgresSwitch4):

SELECT data
            FROM github.pulls
            WHERE (to_timestamp((data->>'created_at')::text, 'YYYY-MM-DD"T"HH24:MI:SS"Z"') >= NOW() - INTERVAL '3 months')
            LIMIT 10;

Why is it cast from text to timestamp and then back to text?
The problem is that this function is not immutable and cannot be indexed.

With the followingquery, an index on github.pulls ((data->>'created_at') asc) could be used

SELECT data
            FROM github.pulls
            WHERE data->>'created_at' >= to_char(NOW() - INTERVAL '3 months','YYYY-MM-DD"T"HH24:MI:SS"Z"')
            LIMIT 10;
@dbazhenov
Copy link
Owner

I've spent very little time on SQL queries so far.

The main idea was to make 4 switches with different types of queries, including bad queries that run slow or even extremely slow, to find problems in monitoring or to show the impact of bad queries on performance.

Initially when I started the project, I wanted to compare MySQL, PG, MongoDB with the same data and queries. But then it turned into something more.

I plan to improve both Dataset and SQL switch queries in the future.

I'll tell you right away, Switch 1 and Switch 2 will be for good queries, Switch 3 and Switch 4 for bad, slow or no index.

I can also make some Switch to delete the index on power on and create it on power off.

There is a lot of room and flexibility for improvement.
You can add new tables and columns.
As a dataset I use GitHub data from API, right now only Repos and Pulls, but in the future I will add other data like Issues, Comments, Tags or something like that.

By the way, you can change the Dataset type to GitHub, put as GitHub_ORG your organization, for demo_app_dataset service. Then the app will import your organization's repository data into the dataset. But you also need to set GITHUB_API for using it

@dbazhenov
Copy link
Owner

I'll try to add this enhancement in an upcoming release.

SELECT data
FROM github.pulls
WHERE data->>'created_at' >= to_char(NOW() - INTERVAL '3 months','YYYY-MM-DD"T"HH24:MI:SS"Z"')
LIMIT 10;

I would appreciate your other suggestions on how to improve SQL queries that generate load.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants