-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathenvironments.qmd
348 lines (240 loc) · 10.2 KB
/
environments.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
# Managing Environments {#sec-environments}
## Context
This is a big one for the R user. When you are unfamiliar with how Python virtual environments work, it is probably one of the most frustrating concepts to pick up from scratch.
But fear not - to fully get to grips with how Python virtual environments work and package installation works, it is good to remind ourselves how this works in R.
### How R package installation works
Well, *remind ourselves....* Python installation in R is likely something you have not really thought about, beyond writing `install.packages()`.
This is one of the huge benefits that makes R easy to pick up for "non-programmers", a lot of this stuff is handled for you.
#### Building from binaries vs. building from source
### How Python package installation works
pip, conda, miniconda, mambo PyPi... where do I start?
## With conda
### Setting up an environment from scratch
conda is ...
To create a project environment which manages both R (r-base) and Python (python):
``` sh
conda create -n env_name r-base python=3.12
```
This sets up a new environment called `env_name` with both R and Python install.
Notice you can specify which version of R and/or Python to install.
#### Python packages
You can then install Packages packages with:
``` sh
conda install numpy
```
#### Python packages on Github
For Python packages on Github:
``` sh
conda install git pip
# Now use pip install with git+<repo-url>
pip install git+https://github.com/davidwilby/amundsen-sea-low-index
# If you need a specific optional dependency:
pip install asli[s3]@git+https://github.com/davidwilby/amundsen-sea-low-index
```
#### R packages
R packages are installed with the same `conda install` command. Note that in this case, package names are pre-pended with `r-`:
``` sh
conda install r-dplyr
```
#### R packages on Github, or not on CRAN
For R packages on Github:
Taken from [this issue](https://stackoverflow.com/questions/52061664/install-r-package-from-github-using-conda), but does not work for me
``` sh
conda install conda build
conda skeleton cran <github-url>
# Replace with your version of R
conda build --R=4.2.2 r-<lowercase-packagename>
```
This creates a "conda skeleton" of the R package hosted on Github.
**note, explain what a conda skeleton is?**
**Alternative approach, specify R version in conda but use `renv::restore()` for non-CRAN packages**
All packages installed and used should now be recorded in the environment, with their appropriate version number.
### Exporting conda environment: environment.yml
To access and share your environment, run:
``` sh
conda env export > environment.yml
```
This will create an environment.yml file in your project directory. This will allow for this environment to be set up on other systems.
### Installing from an environment.yml
For others (or for yourself on a different system) to set up an identical environment, run:
``` sh
conda -f create env_name environment.yml
```
If you can't (or don't want to) use conda, it is possible to manage the Python and R environments in the same project, but treat their management as seperate processes.
## Without conda
### Python without conda
In Python, this is fairly straightforward, and you can use this method regardless of the level of permissions you have on your machine. Here, you would use `venv`, and `requirements.txt`.
#### Setting up a venv and requirements.txt
``` sh
# To create your venv
python -m venv env_name
```
This creates you virtual environment. To use it, and install your package you need to "activate" it.
``` sh
# To activate your venv
source env_name/bin/activate
```
Now you have activated your environment, you can install a packages using `pip`.
``` sh
pip install pathlib
```
To keep track of the packages your project depends on, record them in a file called `requirements.txt`. It is a very simple file that looks like this:
``` txt
configparser==7.1.0
pathlib==1.0.1
xarray==2024.6.0
```
All it records is `packagename==1.0.0`, very simple!
If you are not sure what packages you have imported.
``` sh
pip freeze
```
This should print them in the terminal.
#### Installation in a new venv from a requirements.txt
To set up a new environment and install the required packages from a specified requirements.txt:
``` sh
# To create your venv
python -m venv env_name
source env_name/bin/activate
pip install -r requirements.txt
```
### R without conda
To manage R dependencies, you have a couple of options, depending on the level of permissions you have on the system you are working on.
#### With sudo permissions
If you have sudo permissions, we can use the `pak` package. The great things about `pak` is that, unlike `install.packages()` and `renv::install()` (more on renv later), it will automatically fetch the pre-built binaries for your operating system, distribution and version. More detail on pak, and how it operates in [R in Production](https://r-in-production.org/packages.html#pak).
It will also automatically install any system-level dependencies that your R package may require. This is especially useful on linux systems, where system dependency installation can vastly differ between distributions.
If we have sudo permissions, and are using `pak`, your approach could be as simple as providing an `install.R` file which installs the right packages for you.
First install `pak`:
``` sh
Rscript -e install.packages('pak')
```
Now it can be called in your set up script. You would normally only run this once on deployment.
``` r
#!/usr/bin/env Rscript
# Usage R -f install.R
# Selecting p3m.dev is an optional step for linux distros
# It will speed up installation and prevents the risk of installation
# failing on external C libraries
# This is because CRAN only provides source packages for linux
# and not binary
# see: https://r-in-production.org/packages.html#installing-a-package-on-linux
# For Ubuntu 24.04
options(repos = c(CRAN = "https://p3m.dev/cran/__linux__/noble/latest"))
# For Rocky 9
# options(repos = c(CRAN = "https://p3m.dev/cran/__linux__/rhel9/latest"))
pak::pak("readr")
pak::pak("paws")
pak::pak("ini")
pak::pak("assertr")
pak::pak("dplyr")
# A package on Github:
pak::pak("thomaszwagerman/butterfly")
```
Call it with:
``` sh
R -f install.R
```
#### Without sudo permission
Without sudo permission on your machine, you might have trouble running installation commands such as `install.packages()` or `pak::pak()`, as R might be trying to install your packages into a shared library, where you do not have 'write' permission.
In this case, the path of least resistance would be to use `renv`. To manage your environment.
If you have not used `renv` before, it is highly recommended you read [Getting Started with renv](https://rstudio.github.io/renv/) before reading further.
To start using `renv`:
``` r
install.packages('renv')
renv::init()
```
This will install and set up `renv` for you. `renv::init()` generates a `renv.lock` file based on the packages you have installed and used.
An extract from a `renv.lock` is shown below. You will notice it specifies the version of R used, which repositories it has used for installation, as well as packages and their associated version and download source.
``` json
{
"R": {
"Version": "4.4.1",
"Repositories": [
{
"Name": "P3M",
"URL": "https://packagemanager.posit.co/cran/__linux__/centos7/latest"
}
]
},
"Packages": {
"MASS": {
"Package": "MASS",
"Version": "7.3-59",
"Source": "Repository",
"Repository": "CRAN",
"Requirements": [
"R",
"grDevices",
"graphics",
"methods",
"stats",
"utils"
],
"Hash": "0cafd6f0500e5deba33be22c46bf6055"
},
"R6": {
"Package": "R6",
"Version": "2.5.1",
"Source": "Repository",
"Repository": "CRAN",
"Requirements": [
"R"
],
"Hash": "470851b6d5d0ac559e9d01bb352b4021"
}
}
}
```
<!-- You might have noticed, that the Repositories URL is set to a specific -->
<!-- destination: `https://packagemanager.posit.co/cran/__linux__/centos7/latest`. -->
This destination is set to a specific operating system, **centos7**. This URL is obtained from the [Posit Package Manager](https://p3m.dev/client/#/repos/cran/setup).
When using `pak`, this is automatically fetched for us. Unfortunately for us, `renv` does not use `pak`. To prevent having to manually change this URL for each deployment on a different system, we need to insert this URL depending on the operating system we are working on.
[Shannon Pileggi](https://www.pipinghotdata.com/posts/2024-09-16-ease-renvrestore-by-updating-your-repositories-to-p3m/)
With the above in mind, the `install.R` script would look like this:
``` r
#!/usr/bin/env Rscript
# Usage R -f hpc_setup.R
#
# This will not work with opensuse and sle,
# naming inconsistencies across distros is hard
# This is not an R project, so need to manually "activate" renv
source("renv/activate.R")
install.packages("pkgcache")
# Moving on to installing r and system dependencies with renv.lock
# Have R obtain the current platform distro and release
os <- data.frame(
distribution = pkgcache::current_r_platform_data()$distribution,
release = pkgcache::current_r_platform_data()$release
)
os$release <- round(as.numeric((os$release)))
# Some wrangling to make matching more reliable across distros
ppm_platforms <- pkgcache::ppm_platforms()
# Take the word "linux" out of distribution names
ppm_platforms$distribution <- gsub("linux", "", ppm_platforms$distribution)
# Makes matching rocky distro possible
ppm_platforms$release <- round(as.numeric((ppm_platforms$release)))
# Match with pak's ppm_platforms
os_table <- merge(
os,
ppm_platforms
)
if (os_table$os == "linux") {
p3m_url <- paste0(
"https://p3m.dev/cran/__linux__/",
os_table$binary_url,
"/latest"
)
} else {
p3m_url <- "https://p3m.dev/cran/latest"
}
renv::lockfile_modify(repos = c(
P3M = p3m_url
)) |>
renv::lockfile_write()
renv::restore()
```
In summary, this script should:
1. Activate `renv` and install `pkgcache`.
2. Detect which **os**, **distribution** and **version** R is being run on.
3. Concatenate the **package manager URL** and the **os binary URL**.
4. Modify the `renv.lock` file to point the repos URL to the correct binary URL.