This repository has been archived by the owner on Jan 24, 2018. It is now read-only.

naming of pip package should think of whole GA4GH ecosystem to come #355

Closed
diekhans opened this issue Apr 16, 2015 · 20 comments

Comments

@diekhans
Contributor

Sorry, you guys moved so fast I completely missed you actually got the server release into pypi.

I think we need to rename the package to take into account that there will be a lot more GA4GH software in pypi. I would suggest ga4gh-refserver.

Anyway, congrats, I owe you all a beer.

@dcolligan
Member

Is there any other ga4gh software on pypi currently? If not, when do we expect there to be?

@diekhans
Contributor Author

AFAIK, none so far and not soon enough.


@jeromekelleher
Contributor

Well, we're covering both the reference client and server currently (and I'm not sure it would make sense to break them up). What other ga4gh software do you envisage on pypi?

I'm not against the idea of changing the name, I'm just not convinced there's much point...

@diekhans
Contributor Author

I sure hope there is a point ;-) If we don't end up with a rich
collection of software, I want to do something else.

The client could be very useful against other servers.
Not sure that makes it worth the effort.

Things that really should be independent:

  • python client-side library
  • conformance suite
  • software associated with containers and workflow
  • data validation software
  • standard conversion software

Mark


@jeromekelleher
Contributor

I agree these are all useful things, but a lot of them are in here already (client side library, conversion tools) or are planned (conformance suite). As it turns out, the main dependency we have is pysam, which we need on the client side if we are going to do SAM/VCF conversion. Sure, we could split all these things up, but I just don't see the value in it. I think that it's a lot more convenient to just pip install ga4gh and get the full suite of tools from the GA4GH reference implementation. From a development perspective, it's definitely a lot easier for us to just have one package in which we reuse as much code as possible across the different tools and maintain uniform interfaces by doing this (see the CLI programs, for example).

Sure, I also hope that there will be piles of other software related to GA4GH on PyPI, but they will all be more specific and (hopefully) not done by us. They can call themselves whatever they want. Even more hopefully, if GA4GH becomes a boring data interchange standard that is just there and nobody thinks about, they won't need to mention GA4GH in their names at all.

@lh3
Member

lh3 commented Apr 17, 2015

Do we have the full list of package dependencies, broken down into server and client code? For example, I guess the client doesn't need Flask? Sorry if this is obvious; I'm not familiar with Python packaging...

@jeromekelleher
Contributor

That's true @lh3, the client doesn't need Flask. We haven't broken the requirements down in detail between the client and the server, and the server does have a few extra dependencies. In practice though, the build time for pysam completely dwarfs the installation time for all the other packages, so I don't think it's really worth the effort of trying to split them up.

The full list of dependencies we need for installation is in setup.py
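
One way to express the client/server split discussed here without creating separate packages is the `extras_require` option of setuptools. The following is only a minimal sketch: it assumes, hypothetically, that pysam is the one dependency every install needs and that Flask is the one server-only dependency. The real lists are in setup.py and are not reproduced here.

```python
# Hypothetical dependency split; the actual requirements live in setup.py.
CORE_REQUIRES = ["pysam"]               # assumed to be needed by every install
EXTRAS_REQUIRE = {"server": ["Flask"]}  # assumed server-only extra


def setup_kwargs():
    """Keyword arguments that a setup.py could pass to setuptools.setup()."""
    return {
        "name": "ga4gh",
        "install_requires": CORE_REQUIRES,
        "extras_require": EXTRAS_REQUIRE,
    }


def requirements_for(extra=None):
    """List what pip would be asked to install for `ga4gh` or `ga4gh[extra]`."""
    kwargs = setup_kwargs()
    extras = kwargs["extras_require"].get(extra, []) if extra else []
    return kwargs["install_requires"] + extras
```

With a layout like this, `pip install ga4gh` would skip Flask and `pip install "ga4gh[server]"` would add it, while everything still ships as a single package.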

@diekhans
Contributor Author

The convenience of one install is useful to a lot of people.

However, it also creates version synchronization issues for the
less tightly bound packages. For instance, one may want to
release a new version of the client library without bundling in
the server.

I think an elegant approach to this is to make `ga4gh' a
meta-package that has dependencies on the full range of
packages. This addresses convenience, clarity, and
flexibility.

all at the cost of developer time ;-)
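
The meta-package idea above could look roughly like the sketch below. It is an illustration under stated assumptions, not an existing layout: the component names `ga4gh-client` and `ga4gh-server` are hypothetical and are not published packages, and the version number is a placeholder.

```python
# Hypothetical component packages; names follow a ga4gh-<component> pattern.
COMPONENTS = ["ga4gh-client", "ga4gh-server"]


def meta_package_kwargs(version, components=COMPONENTS):
    """Build setup() arguments for a meta-package that ships no code itself."""
    return {
        "name": "ga4gh",
        "version": version,
        "packages": [],                         # nothing to install directly
        "install_requires": list(components),   # everything comes from dependencies
    }

# In a setup.py this would be used as:
#     from setuptools import setup
#     setup(**meta_package_kwargs("0.1.0"))   # version number is a placeholder
```

Users would still run `pip install ga4gh` and get everything, while each component could be released on its own schedule. Pinning component versions in `install_requires` would be one way to keep a given bundle synchronized.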


@diekhans
Contributor Author

I don't think the issue is install time. It's flexibility and
clarity about what the components actually are. It is also
maddening when a dependency that is unnecessary for what one is
trying to do ends up breaking things.

See my previous comment on the meta-package. I think that addresses
all user needs.


@pgrosu

pgrosu commented Apr 17, 2015

Mark, so would we be shifting to following enterprise software development best practices and SOPs? That's a major shift, though I would welcome it.

~p

@jeromekelleher
Contributor

Well, we can always transition to a metapackage in the future when this becomes a problem @diekhans. I agree with your points, I just think it's premature. We can make a bunch of packages called ga4gh-client, ga4gh-server, etc later on, if/when there is a real desire for it.

@diekhans
Contributor Author

I am a big fan of putting off work until it's actually needed!


@diekhans
Contributor Author

Shifting towards modern engineering practices is very important.
The packaging is a minor part of that.

The reference server group is actually leading the way. It's
the rest of GA4GH that needs to catch up. We can't be putting
out GA4GH schema `releases' that don't have a reference server
implementation and compliance suite. Committing to the master
should be the first step towards release, not the last.


@pgrosu

pgrosu commented Apr 17, 2015

I wouldn't say that either server team or schemas teams are ahead of each other, just on different paths that connect at different times. Actually server probably should be changed to something like GA4GH Wire Protocol Validation Suite. What ideally would be nice would be each time a PR in schemas gets merged, then the server automatically gets triggered and would pull all the Avro files and rebuild the server and check it against all the tests. If new Avro files are introduced, then skeleton code would be added that would not interfere with the compilation or running of the program. The whole process would be automated and nothing but the integration modules' code would need to be adapted. Basically server is a subset of schemas and should be part of Travis. This would enable one to easily switch the serialization protocol in case we choose other ones in the future, which recently got raised again in the following issue:

ga4gh/ga4gh-schemas#287

~p
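
One piece of the automation described above, detecting newly introduced schema files that the server does not yet implement, could be sketched as follows. This is a rough illustration only: it assumes the schema files carry an `.avdl` extension, and the function names and schema names are made up for the example.

```python
import tempfile
from pathlib import Path


def schema_names(schema_dir, suffix=".avdl"):
    """Collect schema names (file stems) found under a schemas checkout."""
    return {path.stem for path in Path(schema_dir).rglob("*" + suffix)}


def missing_implementations(schema_dir, implemented):
    """Return schemas that have no server implementation yet, sorted by name."""
    return sorted(schema_names(schema_dir) - set(implemented))


# Demo with a throwaway directory standing in for a schemas checkout.
with tempfile.TemporaryDirectory() as checkout:
    for name in ("reads", "variants", "newthing"):   # illustrative schema names
        (Path(checkout) / (name + ".avdl")).write_text("")
    todo = missing_implementations(checkout, implemented={"reads", "variants"})
    print(todo)
```

A CI job triggered by a schemas merge could run a check like this and either generate skeleton code for each reported name or fail the build until an implementation exists.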

@diekhans
Contributor Author

Paul Grosu notifications@github.com writes:

> I wouldn't say that either server team or schemas teams are ahead of each
> other, just on different paths that connect at different times.

I will buy that; but the server does have working code ;-)

> Actually server
> probably should be changed to something like GA4GH Wire Protocol Validation
> Suite.

The reference server will be useful in its own right for small groups that
don't need a highly scalable server. Reference implementation is a standard
way to think about this.

The compliance suite needs to be able to run against other
servers, provided they have the compliance data set loaded.

> What ideally would be nice would be each time a PR in schemas gets
> merged, then the server automatically gets triggered and would pull all the
> Avro files and rebuild the server and check it against all the tests.

YES!

And when new schemas are added, they must be implemented in the server and compliance
suite before they are even considered for release.

> Travis. This would enable one to easily switch the serialization protocol in
> case we choose other ones in the future, which recently got raised again in the
> following issue:

Right now, the serialization alternative I would like to try is
the Avro binary encoding.
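
For a sense of what the Avro binary encoding does differently from JSON, here is a small hand-rolled sketch of two of its primitives: longs are written as zig-zag variable-length integers, strings as a length followed by UTF-8 bytes, and record fields are concatenated in schema order without field names. The record and its field names are hypothetical, and a real implementation would use an Avro library rather than this code.

```python
import json


def encode_long(n):
    """Avro binary int/long: zig-zag the value, then write a base-128 varint."""
    z = (n << 1) ^ (n >> 63)            # zig-zag; valid for 64-bit signed values
    out = bytearray()
    while z & ~0x7F:                    # more than 7 bits left to write
        out.append((z & 0x7F) | 0x80)   # low 7 bits with the continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s):
    """Avro binary string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


# Hypothetical two-field record (a string, then a long), encoded in schema order.
record = {"referenceName": "chr1", "start": 1000}
binary = encode_string(record["referenceName"]) + encode_long(record["start"])
as_json = json.dumps(record).encode("utf-8")
print(len(binary), len(as_json))
```

Because field names and punctuation are not repeated in every message, the binary form is considerably smaller than the JSON form for a record like this one.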

@pgrosu

pgrosu commented Apr 17, 2015

I thought of the reference implementation more as the proof-of-concept standard for interpreting the spec, against which all other servers will be built, though based on my previous post I had something larger and more modular in mind, where the API schemas do more of the work. Yes, the binary approach would also improve throughput, though the server could have a plugin for any serialization format, so it need not be architected around any specific one: basically protocol-agnostic.

In any case, I think we agree that many things can be decoupled, modularized and automated; and, I'm glad that we have the same mindset on streamlining the process, which I'm really happy about :)

~p
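
The protocol-agnostic server described above could be approached with a small serializer registry, sketched below. All names here are hypothetical and the message is made up; the point is only that request handlers look up an encoder by name instead of calling a specific format directly.

```python
import json

# Registry mapping a format name to an (encode, decode) pair.
SERIALIZERS = {}


def register(name, encode, decode):
    """Plug in a serialization format without touching the request handlers."""
    SERIALIZERS[name] = (encode, decode)


def serialize(name, obj):
    """Encode obj to bytes using the registered format."""
    return SERIALIZERS[name][0](obj)


def deserialize(name, payload):
    """Decode bytes back to an object using the registered format."""
    return SERIALIZERS[name][1](payload)


# JSON is registered as one plugin; a binary format would be registered the same way.
register(
    "json",
    lambda obj: json.dumps(obj).encode("utf-8"),
    lambda payload: json.loads(payload.decode("utf-8")),
)

message = {"pageSize": 10}   # made-up message
assert deserialize("json", serialize("json", message)) == message
```

Adding another format would then be a single `register` call, with no change to the code that handles requests.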

@dcolligan
Member

In the spirit of "putting off work until it's actually needed" I'm going to close this issue since there aren't any other ga4gh python packages that are going to be deployed imminently, and the release cycle and packaging is working well enough as is. We can revisit this when any of those become an issue.

@diekhans
Contributor Author

Is there a way of marking it deferred? I hate losing
discussions and starting over. I will be very disappointed if
this doesn't become an issue.


@dcolligan
Member

GitHub allows reopening of issues at any time. It still stays in the database even though closed, and an easy search can retrieve it.

@jeromekelleher
Contributor

I agree with @dcolligan --- let's close this issue and revive it when these issues become pressing.


5 participants