Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider migrating python scripts & modules to separate repo #1754

Closed
rkm opened this issue Feb 12, 2024 · 7 comments · Fixed by #1804
Closed

Consider migrating python scripts & modules to separate repo #1754

rkm opened this issue Feb 12, 2024 · 7 comments · Fixed by #1804
Labels
type/question Further information is requested

Comments

@rkm
Copy link
Member

rkm commented Feb 12, 2024

These are updated independently to the rest of the repo, but have to pass the same relatively long CI checks and must be released along with the main codebase. We should consider whether this is sensible to maintain, or if it would simplify version changes and releases to migrate them to another (new?) repo.

We'll also (hopefully) soon be dropping the Java CTP service from this repo in favour of running ctp-anon-cli through the C# service wrapper (via #1679). After this it will just be the c# services and python modules remaining.

Thoughts @howff ?

@rkm rkm added the type/question Further information is requested label Feb 12, 2024
@jas88
Copy link
Member

jas88 commented Feb 12, 2024

Sounds sensible. We should also be able to build CTP into .Net code the way we do with NER in IsIdentifiable and either integrate it or have it a standalone Linux+Windows binary, rather than needing a JRE provided separately.

@howff
Copy link
Contributor

howff commented Feb 12, 2024

It's a shame the CI doesn't work like a Makefile and only build things that have changed!

Yes, I've been meaning to restructure the repos

  • SemEHR - contains three completely different things, none of which will ever track their upstream any more.
    • anonymisation of plain text (specifically the output from DicomToText)
    • annotation of plain text into json and metadata
    • web service for a database of annotations
  • StructuredReports - the general tooling around the use of SRs: scripts, test data
  • SmiServices - the library, the SRAnonTool application, and some other scripts
    and in the process also restructure the collection of scripts into something more sensible.

I must admit there's a certain consistency to keeping some things inside the SmiServices repo, especially if some of the library or tools depend on other APIs for example rabbit messages or yaml configs or logging or filesystem layout. However that might not actually be the case and the two themes C# and Python do indeed move independently.

I could consolidate everything from SmiServices into the StructuredReports repo?
The web service should probably also move there.
The anonymisation and annotation could maybe stay where they are? Or be turned into modules instead of scripts.

@jas88
Copy link
Member

jas88 commented Feb 12, 2024

@howff There is an option for something like "skip CI if only *.cs files have changed", usually used to ignore *.md changes so documentation updates don't trigger full CI runs.

I think I like the idea of moving everything that isn't part of the smi command line binary into a separate repo (and unifying smi some more, too - fewer .csproj files would streamline building and avoid file conflicts). By everything from SmiServices into the SR repo you meant everything SR-specific?

@howff
Copy link
Contributor

howff commented Feb 12, 2024

src/applications/SRAnonTool and src/common/Smi_Common_Python

@rkm
Copy link
Member Author

rkm commented Feb 12, 2024

@howff There is an option for something like "skip CI if only *.cs files have changed", usually used to ignore *.md changes so documentation updates don't trigger full CI runs.

This is an option, but it feels quite brittle when I've done it in the past and still doesn't allow the different products to be tagged/released under different version schemes. We should decouple them so they can evolve independently.

Consolidating everything into the StructuredReports repo is probably a sensible choice. If we want further separation we could split this into a reusable (pip-installable) library module and the various applications, but not sure there's as clear a motivation for that. Happy for you to guide on this, Andrew.

@howff
Copy link
Contributor

howff commented Feb 14, 2024

ok I've copied the two paths (above) into the StructuredReports repo.

I've also done some work there and in the semehr repo to simplify the scripts.

They could do with having the CI stuff you've done in SmiServices ported into them.

@rkm
Copy link
Member Author

rkm commented Feb 14, 2024

I can take a look at porting the CI bits tomorrow, if you want?

@rkm rkm mentioned this issue Apr 23, 2024
11 tasks
@rkm rkm closed this as completed in #1804 Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/question Further information is requested
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants