This repository has been archived by the owner on Feb 26, 2018. It is now read-only.

Implement commit_contains(commit_hash, commit_blob)bool UDF #7

Open
ajnavarro opened this issue Feb 14, 2018 · 7 comments

Comments

@ajnavarro

No description provided.

@ajnavarro
Author

Depends on src-d/go-mysql-server#1

@erizocosmico erizocosmico self-assigned this Feb 21, 2018
@erizocosmico

This is missing the repo_id parameter, right?

@ajnavarro
Author

After discussing it, we decided not to add repo_id. The performance of these UDFs will be improved using indexes. At the beginning they will be really slow.

@erizocosmico

So, if the repo_id is missing and the only things the UDF has are commit_hash and commit_blob, how are we supposed to retrieve that info?

Repository Pool does not have all repositories open, right? So you can't just iterate over them all until you find a match. The UDF should receive something with the repo associated with the given row, or something along those lines. Otherwise, where is the UDF supposed to look?

@ajnavarro
Author

ajnavarro commented Feb 26, 2018

A given commit hash will always either contain the specified blob or not. In the future, we will have a bitmap index to answer this kind of question. Right now, the only way we have to do it is to iterate over all the repositories.
Also, if the commit is repeated in several repositories, it will appear n times in the result.
Also, you don't need to have all the repositories open: you can iterate over them, emit the commits for each repository, and filter out the ones that don't match.
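
A minimal sketch of the brute-force approach described above, written in Python for brevity rather than in the project's own language. Everything here is an assumption made for illustration: the `Repo` mapping (commit hash to the set of blob hashes reachable from that commit), the function names, and the sample hashes are hypothetical, and a real implementation would walk Git objects through the repository pool instead of using in-memory dictionaries.

```python
from typing import Dict, Iterator, Set, Tuple

# Hypothetical in-memory model: commit hash -> blob hashes reachable from it.
Repo = Dict[str, Set[str]]


def commit_contains(repos: Dict[str, Repo], commit_hash: str, blob_hash: str) -> bool:
    """Brute force: scan repositories one by one until one holds the commit."""
    for commits in repos.values():
        blobs = commits.get(commit_hash)
        if blobs is not None:
            # A commit hash pins its tree, so the first hit settles the answer.
            return blob_hash in blobs
    return False


def matching_rows(repos: Dict[str, Repo], blob_hash: str) -> Iterator[Tuple[str, str]]:
    """Emit commits one repository at a time, dropping the ones that don't match."""
    for repo_id, commits in repos.items():
        for commit_hash, blobs in commits.items():
            if blob_hash in blobs:
                yield repo_id, commit_hash


# Sample data: commit "c1" exists in two repositories.
REPOS: Dict[str, Repo] = {
    "repo-a": {"c1": {"b1", "b2"}, "c2": {"b2"}},
    "repo-b": {"c1": {"b1", "b2"}},
}

if __name__ == "__main__":
    print(commit_contains(REPOS, "c1", "b1"))
    print(list(matching_rows(REPOS, "b1")))
```

With this sample data, `c1` shows up once per repository that holds it, which matches the "n times" behaviour mentioned above. Only one repository needs to be inspected at a time.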

@erizocosmico

So, for each row that uses that UDF we have to iterate all repositories again?

@ajnavarro
Author

Right now, yes. In the future it will be a simple query to an index. Also, the UDF could be improved to execute at the table iterator level, like another column. That way, you don't need to iterate over all the repositories again for each column.
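
To illustrate the "simple query to an index" idea, here is a rough sketch in Python. The thread does not say how the planned index would be laid out, so this is only an assumption: a plain dictionary of sets stands in for the bitmap index, and `build_index` and `commit_contains_indexed` are hypothetical names.

```python
from typing import Dict, Set

# Hypothetical model: repo id -> commit hash -> blob hashes reachable from it.
Repos = Dict[str, Dict[str, Set[str]]]


def build_index(repos: Repos) -> Dict[str, Set[str]]:
    """Walk every repository once and record the blobs seen for each commit."""
    index: Dict[str, Set[str]] = {}
    for commits in repos.values():
        for commit_hash, blobs in commits.items():
            index.setdefault(commit_hash, set()).update(blobs)
    return index


def commit_contains_indexed(index: Dict[str, Set[str]], commit_hash: str, blob_hash: str) -> bool:
    """Answer the UDF with a lookup instead of a scan over all repositories."""
    return blob_hash in index.get(commit_hash, set())


if __name__ == "__main__":
    repos: Repos = {
        "repo-a": {"c1": {"b1", "b2"}, "c2": {"b2"}},
        "repo-b": {"c1": {"b1", "b2"}},
    }
    index = build_index(repos)
    print(commit_contains_indexed(index, "c2", "b2"))
```

The cost of iterating over the repositories is paid once when the index is built. After that, each row that calls the UDF costs a single lookup.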


2 participants