
I want to create a Python environment with the data science libraries NumPy, pandas, PyTorch, and Hugging Face Transformers. I use Miniconda to create the environment and to download and install the libraries. conda install has a flag, --download-only, that downloads the required packages without installing them, so that they can be installed later from a local directory. However, even when conda only downloads the packages without installing them, it also extracts them.

Is it possible to download the packages without extracting them and extract them afterwards before installation?

jsqs
    Maybe you can expand on what your motivation is. Why do you require finer-grained control over how Conda manages the package cache? What advantage is to be had in preventing the extraction if ultimately one is going to install the package? – merv Feb 15 '21 at 14:57

1 Answer


There is no simple command in the CLI to prevent the extraction step. The extraction is regarded as part of the FETCH operation to populate the package cache before running the LINK operation to transfer the package to the specified environment.

The alternative is to do this manually. Naively, one could search Anaconda Cloud and download the packages by hand; however, it is probably better to go through the solver to ensure package compatibility. All the operations that would be run can be viewed by including the --json flag. That output can be filtered down to just the tarball URLs, which can then be downloaded directly. Here's a script along these lines (assuming Linux/Unix):

File: conda-download.sh

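#!/bin/bash -l
# Solve the environment as a dry run (-d), so nothing is created or installed,
# then pull the package URLs out of the JSON report and download them.
conda create -dn null --json "$@" |\
    grep '"url"' | grep -oE 'https[^"]+' |\
    xargs wget -c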

which can be used as

./conda-download.sh -c conda-forge -c pytorch numpy pandas pytorch transformers

that is, it accepts every argument that conda create would, and downloads all the tarballs into the current directory.
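If the filter finds no URLs (for example, when the script is run without package arguments, or when every package is already cached), GNU xargs still invokes wget once with no arguments, which produces a "wget: missing URL" error. The following is a minimal sketch of a guarded variant, not a tested replacement for the script above. The function names extract_urls and conda_download are hypothetical, chosen only for illustration; the conda and grep invocations are the same ones used above.

```shell
#!/bin/bash
# Sketch only: extract_urls and conda_download are illustrative names.

# Read conda's JSON report on stdin and print one https URL per line.
# "|| true" keeps an empty result from being treated as a failure.
extract_urls() {
    grep '"url"' | grep -oE 'https[^"]+' || true
}

# Solve the environment as a dry run, then download the listed packages.
# All arguments are passed through to "conda create" unchanged.
conda_download() {
    local urls
    urls=$(conda create -dn null --json "$@" | extract_urls)
    if [ -z "$urls" ]; then
        echo "No package URLs found; check the package arguments." >&2
        return 1
    fi
    printf '%s\n' "$urls" | xargs wget -c
}
```

After sourcing these definitions in a shell where conda is available, conda_download -c conda-forge numpy would behave like the script above, except that it stops with a message instead of calling wget when there is nothing to download.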

Ignoring Cached Packages

If you already have some packages cached, the script above will not download them again, because conda only lists fetch operations for packages that are missing from the cache. If you instead wish to download all the tarballs needed for an environment, you could use this alternate version, which overrides the package cache with an empty temporary directory:

File: conda-download-all.sh

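#!/bin/bash -l
# Point conda at an empty package cache so every package is reported as a fetch.
tmp_dir=$(mktemp -d)

CONDA_PKGS_DIRS="$tmp_dir" conda create -dn null --json "$@" |\
    grep '"url"' | grep -oE 'https[^"]+' |\
    xargs wget -c

# Remove the temporary cache directory.
rm -r "$tmp_dir"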
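The temporary-cache idea can also be factored into a small helper that always removes the directory afterwards and passes along the exit status of the wrapped command. This is a sketch under assumptions: with_empty_pkg_cache is a hypothetical name, not part of conda, and the only conda-specific piece is the CONDA_PKGS_DIRS variable already used above.

```shell
#!/bin/bash
# Sketch only: with_empty_pkg_cache is an illustrative helper name.

# Run the given command with CONDA_PKGS_DIRS pointing at a fresh, empty
# directory, then delete that directory whether or not the command succeeded.
with_empty_pkg_cache() {
    local tmp_dir status=0
    tmp_dir=$(mktemp -d) || return 1
    CONDA_PKGS_DIRS="$tmp_dir" "$@" || status=$?
    rm -rf "$tmp_dir"
    return "$status"
}
```

For example, with_empty_pkg_cache conda create -dn null --json numpy could take the place of the conda line in the script above, with its output piped through the same grep and wget steps.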
merv
  • Thank you for your answer. First, I will answer the question from your comment on my question. I want to create a Docker image of minimum size and send it to a remote machine that does not have internet access. So, I want an image that, when run, creates the environment I want and then runs my script. Since there will be no access to the internet, the packages must already be in the image. As I said, I also want the image to be as small as possible. .... – jsqs Feb 17 '21 at 08:05
  • The size of the Docker image differs greatly before and after the downloaded packages are extracted. For example, the PyTorch GPU packages are 1.3 GB, and when extracted they end up above 5 GB. If there is something better I can do, let me know. – jsqs Feb 17 '21 at 08:05
  • I don't have any packages installed, so I used your first script. I get the error `wget: missing URL Usage: wget [OPTION]... [URL]...` You also say that it "can be used as" ./conda-download.sh -c conda-forge -c pytorch numpy pandas pytorch transformers. Since I am not very experienced with Linux, could you elaborate? – jsqs Feb 17 '21 at 09:37
  • @jsqs the point is to give the script all the packages you want installed, and it will download all the tarballs. Running the script with no arguments won’t do anything. I’m not going to adjust the answer to address the Docker stuff. Maybe ask a new question about that. The best practice, if one must use Conda in Docker, is to create the environment with all the packages, then run `conda clean --all` to delete the package cache. But if you really are concerned about Docker image size, then *don’t use Conda*; it is purely a convenience. – merv Feb 18 '21 at 02:44
  • Thank you @merv, you have been very helpful. I did not manage to run `./conda-download.sh -c conda-forge -c pytorch numpy pandas pytorch transformers`; I get errors. I don't know if the problem is that I am running it in a lightweight Docker image (Ubuntu, ~70 MB). `conda clean --all` does make my containers smaller. – jsqs Feb 19 '21 at 13:29