deps-conda #1072

Open
wants to merge 3 commits into
base: master

Collaborator

@bertsky bertsky commented Jul 11, 2023

This starts Conda with deps-conda as a replacement for Apt with deps-ubuntu to install system dependencies.

System dependencies should be encapsulated better than via fixed Linux distributions in OCR-D. Long ago I expressed my conviction that we can find some existing universal mechanism for porting. IMHO Conda fits that description quite well.

So the idea is to allow OCR-D modules to express their system requirements in a deps-conda rule, which will run conda install .... Of course, modules could do even more and provide a full conda build (i.e. build.sh and meta.yaml) and have the Makefile simply delegate to that.

For example, the ocrd/tesserocr Dockerfile could be as simple as

FROM ocrd/core
COPY Makefile .
RUN make deps-conda
RUN pip install .

where that would simply be defined as

deps-conda:
	conda install -c conda-forge tesseract leptonica

(which would give us an up to date libtesseract against which pip install tesserocr will compile).

To make that work, this PR lays the foundation in ocrd/core:

  • install Conda, if not already present
  • install system dependencies for ocrd, ocrd_utils etc. via its own deps-conda rule

Technically, we could separate the second commit (i.e. switching from apt to conda in the Dockerfiles) and save it for another day (when we have the courage).

- factor `get-conda` out of `deps-cuda` so it can run independently
- add `deps-conda` as alternative to `deps-ubuntu`: installing
  system dependencies (including Python) for core
- when doing `make deps-cuda` or `make deps-conda` and `conda` is
  not already installed, run `make get-conda` first
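
The conditional dependency described in the last bullet could look roughly like the sketch below. This is not the code from the PR: `HAVE_CONDA` and `CONDA_PACKAGES` are hypothetical names, the installer steps are left out, and only `python` is listed because the commit message mentions it.

```makefile
# Sketch only, not the rules from this PR.
# CONDA_PACKAGES is a hypothetical placeholder for the real package list.
CONDA_PACKAGES ?= python

# Look for conda on PATH when the Makefile is parsed; empty means "missing".
HAVE_CONDA := $(shell command -v conda 2>/dev/null)

.PHONY: get-conda deps-conda

# Only when conda is missing does deps-conda gain get-conda as a prerequisite.
ifeq ($(HAVE_CONDA),)
deps-conda: get-conda
endif

get-conda:
	@echo "conda installer steps would go here"

deps-conda:
	conda install -y -c conda-forge $(CONDA_PACKAGES)
```

Because the check happens at parse time, a freshly installed `conda` still has to be reachable on `PATH` by the time the `deps-conda` recipe runs.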
@bertsky bertsky marked this pull request as ready for review July 11, 2023 23:58
@bertsky bertsky requested review from kba and joschrew July 11, 2023 23:58
@mikegerber
Contributor

Thanks for this PR, it nicely illustrated some things for me. Definitely going to look into using conda myself.

@kba
Member

kba commented Jan 24, 2024

This works like a charm, both in Docker and natively. With micromamba installed and conda symlinked to it, running make deps-conda installs everything necessary, without sudo and faster.

Considering that we want to future-proof OCR-D, we should really consider switching to conda/mamba for the system dependencies. It isn't even an either-or situation: it's still possible to install geos etc. via apt-get (and we might want to retain deps-ubuntu for that purpose), but consistent platform-independent builds are definitely worth the additional tooling. This should also be ported to ocrd_all and to the processor projects that have a make deps-ubuntu rule.

One thing I noticed is that pyenv and micromamba interfere with each other, so we have to make sure (i.e. document) that only one of venv, pyenv, conda etc. is used at a time.

We could also add a make get-conda-native target to install micromamba into the root directory of the repository and activate it via .env.

@bertsky
Collaborator Author

bertsky commented Jan 24, 2024

> One thing I noticed is that pyenv and micromamba interfere with each other, so we have to make sure (i.e. document) that only one of venv, pyenv, conda etc. is used at a time.

Yes, since conda cannot run "inside" a venv (only the reverse works), we would no longer be able to support installing into an existing venv/pyenv (only into an existing conda env). But that's a small price to pay IMO.

Regarding ocrd_all, if we are afraid of breaking too much at the same time, we could (for a while at least) support both options, perhaps configuring with a makefile variable, say SYS_BACKEND ?= apt (or "ubuntu") vs conda. The CI/CD could even build both variants (to see which fares better).
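
A rough sketch of such a switch, assuming GNU make. `SYS_BACKEND` is the name floated above; the umbrella target `deps` is hypothetical, and the `deps-ubuntu` and `deps-conda` rules are assumed to be defined elsewhere in the same Makefile.

```makefile
# Sketch only: choose the system-dependency backend via a variable.
# `deps` is a hypothetical umbrella target; deps-ubuntu and deps-conda
# are assumed to exist elsewhere in this Makefile.
SYS_BACKEND ?= apt

.PHONY: deps

ifeq ($(SYS_BACKEND),conda)
deps: deps-conda
else ifeq ($(SYS_BACKEND),apt)
deps: deps-ubuntu
else
$(error unknown SYS_BACKEND '$(SYS_BACKEND)', expected 'apt' or 'conda')
endif
```

With this in place, `make deps` would keep the apt behaviour and `make deps SYS_BACKEND=conda` would opt into conda, so a CI matrix could build both variants from the same Makefile.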

And indeed, switching to conda in ocrd_all only makes sense once we have deps-conda rules for all modules (as replacement for deps-ubuntu where that's necessary at all). It would not even have to be in the respective modules' Makefile: in lieu of that, the rules could be written in ocrd_all's Makefile, as was prominently done for ocrd_tesserocr sysdeps (CUSTOM_DEPS mechanism etc).

BTW, one more benefit of conda over venv is that we can re-use downloaded packages, even if they are in different (think: sub-) envs. Which is necessary as long as we don't container/service-compartmentalise ocrd_all. (And once that is done, sharing of layers can help save disk space, if utilised properly.)

> We could also add a make get-conda-native target to install micromamba into the root directory of the repository and activate it via .env.

Then I am more in favour of rewriting the existing get-conda rule to be as general as possible (i.e. covering the non-Docker case to equal satisfaction). We already have CONDA_PREFIX overridden in the Dockerfiles, so we could just change the Makefile's default to $(PWD)/.conda or something. But the /etc/profile.d hack for installing environment variables would need to be changed (perhaps by generating some shell script $(CONDA_PREFIX)/activate which sets the environment variables). I don't have a good idea for an ldconfig search path in userspace, though. (And I do not consider LD_LIBRARY_PATH a valid option: it is but a last resort.) Thus, perhaps the whole idea of a "native, yet unprivileged" conda installation is not fruitful.
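
For illustration, generating such an activation script could look roughly like this. It is only a sketch: `$(CURDIR)` stands in for the `$(PWD)` mentioned above, the two exported variables are guesses at what the /etc/profile.d snippet provides, and the ldconfig problem is left unsolved.

```makefile
# Sketch only: write a small script that users can source to pick up the
# local conda prefix. The exported variables are illustrative guesses.
CONDA_PREFIX ?= $(CURDIR)/.conda

$(CONDA_PREFIX)/activate:
	mkdir -p $(CONDA_PREFIX)
	printf 'export CONDA_PREFIX="%s"\n' '$(CONDA_PREFIX)' > $@
	printf 'export PATH="%s/bin:$$PATH"\n' '$(CONDA_PREFIX)' >> $@
```

A user would then run `make .conda/activate` once and `. .conda/activate` in each shell. Since `?=` only sets a default, a `CONDA_PREFIX` inherited from the environment (as in the Dockerfiles) still takes precedence.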
