<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>CRAN | B101nfo</title>
    <link>https://llrs.dev/categories/cran/</link>
      <atom:link href="https://llrs.dev/categories/cran/index.xml" rel="self" type="application/rss+xml" />
    <description>CRAN</description>
    <generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><copyright>If it is code you can copy and reuse (MIT) if it is text, please cite and reuse CC-BY 2024.</copyright><lastBuildDate>Sun, 05 May 2024 00:00:00 +0000</lastBuildDate>
    <image>
      <url>img/map[gravatar:%!s(bool=false) shape:circle]</url>
      <title>CRAN</title>
      <link>https://llrs.dev/categories/cran/</link>
    </image>
    
    <item>
      <title>Packaging R: getting in repositories</title>
      <link>https://llrs.dev/post/2024/05/05/packaging-r-getting-in/</link>
      <pubDate>Sun, 05 May 2024 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/post/2024/05/05/packaging-r-getting-in/</guid>
      <description>


&lt;p&gt;After the previous post collecting information about repositories, I want to gather here my thoughts on adding a package to a repository and on how repositories are recognized.
As in the previous post, this builds on the assumption that one already has one or more packages and wants to distribute them.&lt;/p&gt;
&lt;p&gt;This is meant as a reflection on what an R repository is, not as a guide for R package developers.
However, their feedback is welcome to help define what an ideal repository would look like.&lt;/p&gt;
&lt;div id=&#34;package-submission&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Package submission&lt;/h2&gt;
&lt;p&gt;An R repository needs a way to incorporate packages.
The CRAN submission process starts with &lt;a href=&#34;https://cran.r-project.org/submit.html&#34;&gt;a form&lt;/a&gt;, while Bioconductor submissions are made through a &lt;a href=&#34;https://github.com/Bioconductor/Contributions/issues/new&#34;&gt;GitHub issue&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The submission then usually goes through an automated check.
Until the automated check passes, probably no one will look at the submission.
This reduces the hours a human must dedicate to managing submissions.
If a human is kept in the loop, one can appeal the result of the automated process by contacting them or, if the failure is random, resubmit the package.&lt;/p&gt;
&lt;div class=&#34;float&#34;&gt;
&lt;img src=&#34;images/submissions.png&#34; alt=&#34;Package submission checks: first a check of the package; if it is not new, a dependency check within the repository; if all checks pass, the package is added to the repository.&#34; /&gt;
&lt;div class=&#34;figcaption&#34;&gt;&lt;strong&gt;Package submission checks&lt;/strong&gt;: first a check of the package; if it is not new, a dependency check within the repository; if all checks pass, the package is added to the repository.&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Generally, a package must first pass a quality check before it is considered for further integration tests.
These integration tests usually check the new version of a package against the packages that depend on it, also known as reverse dependencies.&lt;/p&gt;
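&lt;p&gt;As a rough illustration of what finding reverse dependencies involves, here is a minimal base R sketch. It assumes a hypothetical repository index with one row per package and DESCRIPTION-style &lt;code&gt;Depends&lt;/code&gt; and &lt;code&gt;Imports&lt;/code&gt; fields; the helpers &lt;code&gt;parse_deps()&lt;/code&gt; and &lt;code&gt;reverse_deps()&lt;/code&gt; are made up for this example and are not how CRAN or Bioconductor implement their checks.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical repository index: one row per package (NA = field absent)
index = data.frame(
  Package = c("pkgA", "pkgB", "pkgC", "pkgD"),
  Depends = c(NA, "R (>= 4.1.0), pkgA", NA, NA),
  Imports = c(NA, NA, "pkgA (>= 1.2.0), pkgB", "pkgC")
)

# Turn a field such as "R (>= 4.1.0), pkgA" into bare package names
parse_deps = function(field) {
  if (is.na(field)) return(character())
  deps = trimws(strsplit(field, ",")[[1]])
  deps = sub("\\s*\\(.*\\)$", "", deps)  # drop version requirements
  setdiff(deps, "R")                      # R itself is not a package to test
}

# Packages that list `pkg` directly in Depends or Imports
reverse_deps = function(pkg, index) {
  uses_pkg = vapply(seq_len(nrow(index)), function(i) {
    deps = c(parse_deps(index$Depends[i]), parse_deps(index$Imports[i]))
    pkg %in% deps
  }, logical(1))
  index$Package[uses_pkg]
}

# A new version of pkgA would be checked together with these packages
reverse_deps("pkgA", index)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Only pkgB and pkgC are returned: pkgD imports pkgC, not pkgA, so it is only an indirect (recursive) reverse dependency. For a real repository index, base R already provides &lt;code&gt;tools::package_dependencies()&lt;/code&gt; with its &lt;code&gt;reverse = TRUE&lt;/code&gt; argument, which works on the matrix returned by &lt;code&gt;available.packages()&lt;/code&gt;.&lt;/p&gt;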
&lt;/div&gt;
&lt;div id=&#34;package-maintenance&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Package maintenance&lt;/h2&gt;
&lt;p&gt;Once a package is included in a repository it usually needs to be maintained.&lt;/p&gt;
&lt;p&gt;There are many moving pieces: chip architectures, operating systems, R versions, and other packages.
All this means that authors need to keep their packages in good shape if they want them to remain useful to users.
Of course, if one doesn’t want to do that, they do not need a repository to share their package.&lt;/p&gt;
&lt;div class=&#34;float&#34;&gt;
&lt;img src=&#34;images/checks.png&#34; alt=&#34;Graphic showing time, different R versions and checks. Repositories check the packages they host at multiple levels.&#34; /&gt;
&lt;div class=&#34;figcaption&#34;&gt;&lt;strong&gt;Graphic showing time, different R versions and checks.&lt;/strong&gt; Repositories check the packages they host at multiple levels.&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;As a result, at any given time there must be tests for each package under different conditions, as shown in image 2.
This also means that a package can be archived from the repository for failing the checks in place.&lt;/p&gt;
&lt;p&gt;Repositories provide these checks as a service to users.
They guarantee that the R packages in the repository work well together and (mostly) pass the same set of checks.
This is what builds their reputation and usage among users (this is true beyond R: Debian, Ubuntu, …).&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;closing-remarks&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Closing remarks&lt;/h2&gt;
&lt;p&gt;There are several official repositories; how submission works when a package submitted to one of them depends on packages from other repositories is a matter for another post.&lt;/p&gt;
&lt;p&gt;There is some discussion about what an R repository is.
The importance of CRAN and Bioconductor has led to some confusion.
There are generally two meanings of what a CRAN-like repository is:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;One where &lt;code&gt;install.packages()&lt;/code&gt; works (this is defined by how the files and binaries are organized, and will be a topic for another time).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;One where all the checks described here are in place and &lt;code&gt;install.packages()&lt;/code&gt; works too.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
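&lt;p&gt;To make the first definition concrete, here is a minimal sketch of a local repository in that sense, built with base R only. It assumes a hypothetical package &lt;code&gt;toypkg&lt;/code&gt; whose source tarball contains nothing but a DESCRIPTION file; the relevant pieces are the &lt;code&gt;src/contrib&lt;/code&gt; directory and the &lt;code&gt;PACKAGES&lt;/code&gt; index written by &lt;code&gt;tools::write_PACKAGES()&lt;/code&gt;. None of the checks of the second definition are involved.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Repository root with the source package area that install.packages() looks for
repo = file.path(tempdir(), "toyrepo")
contrib = file.path(repo, "src", "contrib")
dir.create(contrib, recursive = TRUE, showWarnings = FALSE)

# A hypothetical package source directory containing only a DESCRIPTION file
build_dir = file.path(tempdir(), "toybuild")
dir.create(file.path(build_dir, "toypkg"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("Package: toypkg",
             "Version: 0.1.0",
             "Title: Toy Package",
             "Description: Only used to illustrate a repository layout.",
             "License: MIT"),
           file.path(build_dir, "toypkg", "DESCRIPTION"))

# Source packages are stored as name_version.tar.gz
old_wd = setwd(build_dir)
utils::tar(file.path(contrib, "toypkg_0.1.0.tar.gz"), files = "toypkg",
           compression = "gzip", tar = "internal")
setwd(old_wd)

# Write the PACKAGES index files that R reads to list what is available
tools::write_PACKAGES(contrib, type = "source")
read.dcf(file.path(contrib, "PACKAGES"), fields = c("Package", "Version"))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With that layout in place, &lt;code&gt;available.packages(repos = paste0(&#34;file://&#34;, repo), type = &#34;source&#34;)&lt;/code&gt; should list toypkg (the &lt;code&gt;file://&lt;/code&gt; form assumes a Unix-like absolute path).&lt;/p&gt;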
&lt;p&gt;r-universe uses the first definition, but it could be used to generate repositories with checks that comply with the second definition.
Other repositories that use the first definition are the &lt;a href=&#34;https://packagemanager.posit.co/client/#/&#34;&gt;&lt;em&gt;Posit&lt;/em&gt; Public &lt;em&gt;Package Manager&lt;/em&gt;&lt;/a&gt; and the &lt;a href=&#34;https://r4pi.org/&#34;&gt;R4Pi repository&lt;/a&gt; (which provides binaries for Raspberry Pi OS).&lt;/p&gt;
&lt;p&gt;As the second definition is stricter, I’ll focus on it, as this post has done.&lt;/p&gt;
&lt;p&gt;PS: This post might be edited, as it has been sitting on my computer for several months.
I prefer to publish it and improve it with feedback, so let me know if you have any additions.&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>New rtweet release: 2.0.0</title>
      <link>https://llrs.dev/post/2024/02/16/new-rtweet-release-2-0-0/</link>
      <pubDate>Fri, 16 Feb 2024 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/post/2024/02/16/new-rtweet-release-2-0-0/</guid>
      <description>


&lt;p&gt;This is a brief announcement of rtweet version 2.0.0.
This major version change signals the move from API v1.1 to API v2.&lt;/p&gt;
&lt;p&gt;There haven’t been many changes since 1.2.1, but the new major version signals that API v1.1 is deprecated.&lt;/p&gt;
&lt;p&gt;The previous release was a bit rushed to meet the CRAN maintainers’ requirement to fix an error, and it wasn’t polished.
Some users complained that it was difficult to find out what worked.
In this release I focused mostly on making life easier for users:&lt;/p&gt;
&lt;p&gt;Now there is a help page documenting the functions deprecated in the move from API v1.1 to API v2: see &lt;code&gt;help(&#34;rtweet-deprecated&#34;, &#34;rtweet&#34;)&lt;/code&gt;.
I also made it easier for rtweet to work with API v2: the release of httr2 version 1.0.0 helped avoid some workarounds in the authentication process.&lt;/p&gt;
&lt;p&gt;I also updated the vignettes to follow the most recent recommendations.
I am not sure the streaming vignette is up to date (keep reading to see why I left it as is).&lt;/p&gt;
&lt;p&gt;Last, following CRAN policy, users who create rtweet data can now delete it with &lt;code&gt;client_clean()&lt;/code&gt; and &lt;code&gt;auth_clean()&lt;/code&gt;.&lt;/p&gt;
&lt;div id=&#34;future-releases&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Future releases&lt;/h1&gt;
&lt;p&gt;Over the last year I &lt;a href=&#34;https://github.com/ropensci/rtweet/issues/763&#34;&gt;asked the community&lt;/a&gt; for a co-maintainer interested in the package.
Unfortunately, the people who showed some interest didn’t commit to it in the end.&lt;/p&gt;
&lt;p&gt;At the same time I &lt;a href=&#34;https://llrs.dev/post/2023/02/16/rtweet-future/&#34;&gt;also asked&lt;/a&gt; for &lt;a href=&#34;https://www.buymeacoffee.com/llrs&#34;&gt;donations&lt;/a&gt; to pay for API access.
Access to most endpoints, which is needed to test and develop the package, currently costs 100€.
However, this is more than half of what I spent on groceries last month.&lt;br /&gt;
Other packages like &lt;a href=&#34;https://cran.r-project.org/package=academictwitteR&#34;&gt;academictwitteR&lt;/a&gt; are also stopping development and support.
Although it has not been archived from CRAN, its README has this note:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note this repo is now ARCHVIED due to changes to the Twitter API. The paid API means open-source development of this package is no longer feasible.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Similarly, without financial help and community interest I won’t invest more time in it.&lt;br /&gt;
This is the last version I will release.
I have other interests and I would like to focus on other projects.
My focus will be on updating and releasing some other packages of mine.
I also want to focus more on my own company to help the R community (and beyond).
I will write about the company shortly.&lt;/p&gt;
&lt;p&gt;There have been some discussions on social media about how to signal that a package is deprecated.
The only method available on CRAN that I know of is to declare a package ORPHANED.
I have asked CRAN to declare the package ORPHANED.&lt;/p&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;details&gt;
&lt;pre&gt;&lt;code&gt;## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.3.1 (2023-06-16)
##  os       Ubuntu 22.04.4 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language en
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Europe/Madrid
##  date     2024-02-24
##  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  blogdown      1.18    2023-06-19 [1] CRAN (R 4.3.1)
##  bookdown      0.37    2023-12-01 [1] CRAN (R 4.3.1)
##  bslib         0.6.1   2023-11-28 [1] CRAN (R 4.3.1)
##  cachem        1.0.8   2023-05-01 [1] CRAN (R 4.3.1)
##  cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.1)
##  digest        0.6.34  2024-01-11 [1] CRAN (R 4.3.1)
##  evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.2)
##  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.1)
##  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.3.2)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.3.1)
##  jsonlite      1.8.8   2023-12-04 [1] CRAN (R 4.3.1)
##  knitr         1.45    2023-10-30 [1] CRAN (R 4.3.2)
##  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.2)
##  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.1)
##  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.1)
##  rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.3.1)
##  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.1)
##  sass          0.4.8   2023-12-06 [1] CRAN (R 4.3.1)
##  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.1)
##  xfun          0.42    2024-02-08 [1] CRAN (R 4.3.1)
##  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.3.1)
## 
##  [1] /home/lluis/bin/R/4.3.1
##  [2] /opt/R/4.3.1/lib/R/library
## 
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Submissions accepted on the first try</title>
      <link>https://llrs.dev/post/2024/01/10/submission-cran-first-try/</link>
      <pubDate>Wed, 10 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/post/2024/01/10/submission-cran-first-try/</guid>
      <description>


&lt;p&gt;Recently someone on social media said that their submissions to CRAN do not succeed on the first try.
In this post I’ll try to answer how often submissions to CRAN are accepted on the first try.&lt;/p&gt;
&lt;p&gt;First we need data on the submissions to CRAN.
We can download the last 3 years of CRAN submission history thanks to &lt;a href=&#34;https://r-hub.github.io/cransays/articles/dashboard.html&#34;&gt;cransays&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cdh &amp;lt;- cransays::download_history()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here is the bulk of the processing of the submission history.
It is explained in other posts, but basically I keep only one entry per package and snapshot, try to distinguish new submissions from changes within the same submission, and calculate some date-related variables.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;dplyr&amp;quot;, warn.conflicts = FALSE)
library(&amp;quot;lubridate&amp;quot;, warn.conflicts = FALSE)
library(&amp;quot;tidyr&amp;quot;, warn.conflicts = FALSE)
diff0 &amp;lt;- structure(0, class = &amp;quot;difftime&amp;quot;, units = &amp;quot;hours&amp;quot;)
cran &amp;lt;- cdh |&amp;gt; 
  filter(!is.na(version)) |&amp;gt; 
  distinct() |&amp;gt; 
  arrange(package, snapshot_time) |&amp;gt; 
  group_by(package, snapshot_time) |&amp;gt; 
  # Remove some duplicated packages in different folders
  mutate(n = seq_len(n())) |&amp;gt; 
  filter(n == n()) |&amp;gt; 
  ungroup() |&amp;gt; 
  select(-n) |&amp;gt; 
  arrange(package, snapshot_time, version) |&amp;gt; 
  # Packages last seen in the queue less than 24 hours ago are considered the same submission 
  # (even if their version number differs)
  mutate(diff_time = difftime(snapshot_time, lag(snapshot_time), units = &amp;quot;hours&amp;quot;),
         diff_time = if_else(is.na(diff_time), diff0, diff_time), # Fill NAs
         diff_v = version != lag(version),
         diff_v = if_else(is.na(diff_v), TRUE, diff_v), # Fill NAs
         near_t = abs(diff_time) &amp;lt;= 24,
         resubmission = !near_t | diff_v, 
         resubmission = if_else(resubmission == FALSE &amp;amp; diff_time == 0, 
                               TRUE, resubmission),
         resubmission_n = cumsum(as.numeric(resubmission)),
         new_version = !near(diff_time, 1, tol = 24) &amp;amp; diff_v, 
         new_version = if_else(new_version == FALSE &amp;amp; diff_time == 0, 
                               TRUE, new_version),
         submission_n = cumsum(as.numeric(new_version)), .by = package) |&amp;gt; 
  select(-diff_time, -diff_v, -new_version, -near_t) |&amp;gt; 
  mutate(version = package_version(version, strict = FALSE),
         date = as_date(snapshot_time))&lt;/code&gt;&lt;/pre&gt;
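&lt;p&gt;The core of the grouping rule above can be sketched in base R for a single package. This is a simplified, hypothetical example: it only mirrors the &lt;code&gt;resubmission_n&lt;/code&gt; logic (a new entry starts when the version changes or when more than 24 hours pass between snapshots) and skips the de-duplication steps.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical snapshots of one package in the CRAN queue
start = as.POSIXct("2023-03-01 10:00", tz = "UTC")
snapshots = data.frame(
  snapshot_time = start + 3600 * c(0, 1, 2, 30, 31, 80),  # hours after start
  version = c("1.0.0", "1.0.0", "1.0.0", "1.0.0", "1.0.1", "1.0.1")
)
snapshots = snapshots[order(snapshots$snapshot_time), ]

# Hours since the previous snapshot (Inf for the first one)
gap = c(Inf, as.numeric(diff(snapshots$snapshot_time), units = "hours"))
# Did the version change with respect to the previous snapshot?
changed = c(TRUE, snapshots$version[-1] != snapshots$version[-nrow(snapshots)])
# A version change or a gap of more than 24 hours starts a new entry
snapshots$resubmission_n = cumsum(changed | gap > 24)
snapshots&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here the snapshot taken 28 hours after the previous one starts a new entry even though the version is still 1.0.0, which is how a resubmission of the same version is detected.&lt;/p&gt;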
&lt;p&gt;Now we need to compare this with the CRAN archive to know whether the submissions were accepted.&lt;/p&gt;
&lt;p&gt;First we need to retrieve the archive data:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cran_archive &amp;lt;- tools:::CRAN_archive_db()
# When row binding, data.frames that have only one row lose their row name:
# handle those cases to keep the version number
archived &amp;lt;- vapply(cran_archive, NROW, numeric(1L))
names(cran_archive)[archived == 1L] &amp;lt;- vapply(cran_archive[archived == 1L], rownames, character(1L))
# Merge current and archive data
cran_dates &amp;lt;- do.call(rbind, cran_archive)
cran_dates$type &amp;lt;- &amp;quot;archived&amp;quot;
current &amp;lt;- tools:::CRAN_current_db()
current$type &amp;lt;- &amp;quot;available&amp;quot;
cran_h &amp;lt;- rbind(current, cran_dates)
# Keep minimal CRAN data archive
cran_h$pkg_v &amp;lt;- basename(rownames(cran_h))
rownames(cran_h) &amp;lt;- NULL
cda &amp;lt;- cran_h |&amp;gt; 
  mutate(strcapture(x = pkg_v, &amp;quot;^(.+)_([0-9]*.+).tar.gz$&amp;quot;, 
                    proto = data.frame(package = character(), version = character())),
         package = if_else(is.na(package), pkg_v, package)) |&amp;gt; 
  arrange(package, mtime) |&amp;gt; 
  mutate(acceptance_n = seq_len(n()), .by = package) |&amp;gt; 
  select(package, pkg_v, version, acceptance_n, date = mtime, uname, type) |&amp;gt; 
  mutate(date = as_date(date))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We use &lt;code&gt;tools:::CRAN_current_db()&lt;/code&gt; because &lt;code&gt;available.packages()&lt;/code&gt; filters packages based on the OS and other options (see its &lt;code&gt;filters&lt;/code&gt; argument).&lt;/p&gt;
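&lt;p&gt;The &lt;code&gt;strcapture()&lt;/code&gt; call above splits archive file names into package and version. Here is a minimal sketch on hypothetical file names; this pattern escapes the dots in &lt;code&gt;.tar.gz&lt;/code&gt; (the pattern above leaves them unescaped, which also matches but is looser). It also shows why versions are later converted with &lt;code&gt;package_version()&lt;/code&gt;: compared as text, 1.14.10 can sort before 1.14.9.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical archive file names following the name_version.tar.gz convention
files = c("data.table_1.14.10.tar.gz", "data.table_1.14.9.tar.gz", "R6_2.5.1.tar.gz")
parsed = strcapture("^(.+)_([0-9][0-9.-]*)\\.tar\\.gz$", files,
                    proto = data.frame(package = character(), version = character()))
parsed

# Order the data.table versions as versions rather than as strings
dt = parsed[parsed$package == "data.table", ]
dt$version[order(package_version(dt$version))]&lt;/code&gt;&lt;/pre&gt;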
&lt;p&gt;We can make a quick detour to plot how many versions of each package were accepted and when each package was first published:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;ggplot2&amp;quot;)
cdas &amp;lt;- cda |&amp;gt; 
  summarize(available = if_else(any(type == &amp;quot;available&amp;quot;), &amp;quot;available&amp;quot;, &amp;quot;archived&amp;quot;),
            published = min(date),
            n_published = max(acceptance_n),
            .by = package)

ggplot(cdas) + 
  geom_point(aes(published, n_published, col = available, shape = available)) +
  theme_minimal() +
  theme(legend.position = c(0.7, 0.8), legend.background = element_rect()) +
  labs(x = element_blank(), y = &amp;quot;Versions&amp;quot;, col = &amp;quot;Status&amp;quot;, shape = &amp;quot;Status&amp;quot;,
       title = &amp;quot;First publication of packages and versions published&amp;quot;) +
  scale_x_date(expand = expansion(), date_breaks = &amp;quot;2 years&amp;quot;, date_labels = &amp;quot;%Y&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://llrs.dev/post/2024/01/10/submission-cran-first-try/index.en_files/figure-html/cran-published-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;In summary, 6291 packages are archived and 20304 are available.
We can also see that one package had more than 150 versions before it was archived.&lt;/p&gt;
&lt;p&gt;Now we can compare the submission history with the CRAN archive:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cran_subm &amp;lt;- cran |&amp;gt; 
  summarise(
    resubmission_n = max(resubmission_n, na.rm = TRUE),
    submission_n = max(submission_n, na.rm = TRUE),
    # The number of submissions 
    submissions = resubmission_n - submission_n + 1,
    date = min(date),
    .by = c(&amp;quot;package&amp;quot;, &amp;quot;version&amp;quot;)) |&amp;gt; 
  arrange(package, version)
# Filter to those packages submitted in the period we have data
cda_acc &amp;lt;- cda |&amp;gt; 
  filter(date &amp;gt;= min(cran_subm$date)) |&amp;gt; 
  select(-pkg_v) |&amp;gt; 
  mutate(version = package_version(version, FALSE))

accepted_subm &amp;lt;- merge(cda_acc, cran_subm, by = c(&amp;quot;package&amp;quot;, &amp;quot;version&amp;quot;),
             suffixes = c(&amp;quot;.cran&amp;quot;, &amp;quot;.subm&amp;quot;), all = TRUE, sort = FALSE) |&amp;gt; 
  arrange(package, version, date.cran, date.subm) |&amp;gt; 
  mutate(submissions = if_else(is.na(submissions), 1, submissions),
         acceptance_n = if_else(is.na(acceptance_n), 0, acceptance_n))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s explore this data a little:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;lp &amp;lt;- scales::label_percent(accuracy = 0.1)
accepted_subm |&amp;gt; 
  summarize(cransays = sum(!is.na(date.subm)),
            cran = sum(!is.na(date.cran)),
            missed_submissions = cran - cransays,
            percentage_missed = lp(missed_submissions/cran))&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;center&#34;&gt;cransays&lt;/th&gt;
&lt;th align=&#34;center&#34;&gt;cran&lt;/th&gt;
&lt;th align=&#34;center&#34;&gt;missed_submissions&lt;/th&gt;
&lt;th align=&#34;center&#34;&gt;percentage_missed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;center&#34;&gt;46525&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;50413&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;3888&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;7.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This means that &lt;a href=&#34;https://r-hub.github.io/cransays/articles/dashboard.html&#34;&gt;cransays&lt;/a&gt;, the package used to collect this data, misses ~8% of submissions, probably because they are handled in less than an hour!
Another explanation might be that during some periods the cransays bot didn’t work well.&lt;/p&gt;
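&lt;p&gt;The missed submissions come from the full outer join in the &lt;code&gt;merge()&lt;/code&gt; call above (&lt;code&gt;all = TRUE&lt;/code&gt;): versions found only on CRAN get &lt;code&gt;NA&lt;/code&gt; in &lt;code&gt;date.subm&lt;/code&gt;, and submissions not (yet) accepted get &lt;code&gt;NA&lt;/code&gt; in &lt;code&gt;date.cran&lt;/code&gt;. Here is a minimal sketch with made-up data for three hypothetical packages:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Made-up versions published on CRAN and versions seen in the submission queue
cran_toy = data.frame(package = c("pkgA", "pkgA", "pkgB"),
                      version = c("1.0.0", "1.1.0", "0.1.0"),
                      date = as.Date(c("2023-01-10", "2023-06-01", "2023-03-05")))
subm_toy = data.frame(package = c("pkgA", "pkgB", "pkgC"),
                      version = c("1.1.0", "0.1.0", "0.0.1"),
                      date = as.Date(c("2023-05-28", "2023-03-01", "2023-07-01")))

# Full outer join: the shared column "date" gets the .cran / .subm suffixes
both = merge(cran_toy, subm_toy, by = c("package", "version"),
             suffixes = c(".cran", ".subm"), all = TRUE)
both

# On CRAN but never seen in the queue (missed by the bot)
missed = both[is.na(both$date.subm), ]
missed = missed[!is.na(missed$date.cran), ]
# Seen in the queue but not (yet) on CRAN
pending = both[is.na(both$date.cran), ]&lt;/code&gt;&lt;/pre&gt;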
&lt;p&gt;We can also look at how long it takes for a version to be published on CRAN:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;accepted_subm |&amp;gt; 
  filter(!is.na(date.cran)) |&amp;gt; 
  mutate(time_diff = difftime(date.cran, date.subm, units = &amp;quot;weeks&amp;quot;)) |&amp;gt;
  # Number of accepted versions since submissions started being recorded
  mutate(accepted_n = acceptance_n - min(acceptance_n[acceptance_n != 0L], na.rm = TRUE) + 1, .by = package) |&amp;gt; 
  filter(time_diff &amp;gt;= 0) |&amp;gt; 
  ggplot() + 
  geom_point(aes(date.cran, time_diff, col = accepted_n)) +
  theme_minimal() +
  theme(legend.position = c(0.2, 0.8), legend.background = element_rect()) +
  labs(x = &amp;quot;Published on CRAN&amp;quot;, title = &amp;quot;Time since submitted to CRAN&amp;quot;, 
       y = &amp;quot;Weeks&amp;quot;, col = &amp;quot;Accepted&amp;quot;)
## Don&amp;#39;t know how to automatically pick scale for object of type &amp;lt;difftime&amp;gt;.
## Defaulting to continuous.&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://llrs.dev/post/2024/01/10/submission-cran-first-try/index.en_files/figure-html/accepted_subm-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;I explored some of those outliers: one package was submitted in 2021 and then submitted again two years later with the same version number.
In other cases the resubmission happened outside the time tolerance used to group snapshots (see how the “new_version” variable is created in the second code chunk).&lt;/p&gt;
&lt;p&gt;This means that the path to CRAN can be long and that developers do not always change the version number on each submission.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This section was added after I detected problems with the way it was initially published.&lt;/p&gt;
&lt;p&gt;In the following function I calculate the number of submissions and similar information for each package:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;count_submissions &amp;lt;- function(x) {
  x |&amp;gt; 
    mutate(submission_in_period = seq_len(n()),
           date.mix = pmin(date.cran, date.subm, na.rm = TRUE),
           .by = package, .after = acceptance_n) |&amp;gt; 
    summarise(
      # Number of accepted packages on CRAN
      total_accepted = sum(!is.na(date.cran), 0, na.rm = TRUE),
      # At minimum 0 through {cransays}
      through_cransays = sum(!is.na(date.subm), 0, na.rm = TRUE), 
      # In case same version number is submitted at different timepoints
      resubmissions = ifelse(any(!is.na(resubmission_n)), 
                              max(resubmission_n, na.rm = TRUE) - min(resubmission_n, na.rm = TRUE) - through_cransays, 0),
      resubmissions = if_else(resubmissions &amp;lt; 0L, 0L, resubmissions),
      # All submission + those that were duplicated on the submission system
      total_submissions = max(submission_in_period, na.rm = TRUE) + resubmissions,
      # The submissions that were not successful
      total_attempts = total_submissions - total_accepted,
      percentage_failed_submissions = lp(total_attempts/total_accepted), 
      .by = package)
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I created a function to apply the same logic to whichever group I want to analyse.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Another relevant edit is that the selection criteria changed, as I had missed some packages in some analyses and included others that shouldn’t have been.
Now we are ready to apply it to the packages whose first version was accepted on CRAN:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;first_submissions &amp;lt;- accepted_subm |&amp;gt; 
  group_by(package) |&amp;gt; 
  # Keep packages that were eventually accepted
  filter(length(acceptance_n != 0L) &amp;gt; 1L &amp;amp;&amp;amp; any(acceptance_n[acceptance_n != 0L] == 1)) |&amp;gt; 
  # Keep submissions until the first acceptance but not after
  filter(cumsum(acceptance_n) &amp;lt;= 1L &amp;amp; seq_len(n()) &amp;lt;= which(acceptance_n == 1L)) |&amp;gt; 
  ungroup()
ffs &amp;lt;- first_submissions |&amp;gt;   
  count_submissions() |&amp;gt; 
  count(total_attempts, sort = TRUE,  name = &amp;quot;packages&amp;quot;) |&amp;gt; 
  mutate(percentage = lp(packages/sum(packages, na.rm = TRUE)))
ffs&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;center&#34;&gt;total_attempts&lt;/th&gt;
&lt;th align=&#34;center&#34;&gt;packages&lt;/th&gt;
&lt;th align=&#34;center&#34;&gt;percentage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;center&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;3390&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;65.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;center&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;1141&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;21.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;center&#34;&gt;2&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;425&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;8.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;center&#34;&gt;3&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;138&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;2.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;center&#34;&gt;4&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;72&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;1.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;center&#34;&gt;5&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;23&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;0.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;center&#34;&gt;6&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;12&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;0.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;center&#34;&gt;7&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;4&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;0.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;center&#34;&gt;8&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;3&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;0.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;center&#34;&gt;9&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;2&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;0.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;center&#34;&gt;12&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;0.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;center&#34;&gt;16&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;0.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This means that close to 35.0% of new packages had at least one submission rejected before their first version was accepted.
This does not include the packages that are not yet (never?) on CRAN (~1000).&lt;/p&gt;
&lt;p&gt;This points to a problem on both sides:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;developers need to fix and resubmit their packages more often.&lt;/li&gt;
&lt;li&gt;reviewers need to spend more time (approximately 50% of submissions are handled by a human at one point or another).&lt;/li&gt;
&lt;/ul&gt;
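&lt;p&gt;As a quick check of these figures, here is a small base R calculation that uses only the counts from the table above (no new data): the share of new packages accepted on the first try, the share rejected at least once, and the average number of submissions needed for the first accepted version.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Counts copied from the table above: rejected submissions before the first
# accepted version (total_attempts) and number of new packages with that count
ffs_table = data.frame(
  total_attempts = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 16),
  packages = c(3390, 1141, 425, 138, 72, 23, 12, 4, 3, 2, 1, 1)
)
n_packages = sum(ffs_table$packages)
# Share of new packages accepted on the first try
first_try = ffs_table$packages[ffs_table$total_attempts == 0] / n_packages
# Share of new packages rejected at least once
rejected_once = 1 - first_try
# Average submissions needed for the first accepted version (rejections + 1)
mean_submissions = 1 + sum(ffs_table$total_attempts * ffs_table$packages) / n_packages
round(c(n_packages = n_packages, first_try = first_try,
        rejected_once = rejected_once, mean_submissions = mean_submissions), 3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these counts, 3390 of 5212 new packages (65.0%) were accepted on the first try, and a first accepted version needed about 1.57 submissions on average.&lt;/p&gt;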
&lt;p&gt;After this exercise we might wonder whether this happens only with new packages.&lt;br /&gt;
If we look at the submissions that are not the first version of a package, we find the following:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;submissions_with_accepted &amp;lt;- accepted_subm |&amp;gt; 
  # Filter those that were included on CRAN (not all submission rejected)
  filter(any(acceptance_n &amp;gt;= 1), .by = package) |&amp;gt; 
  mutate(date.mix = pmin(date.cran, date.subm, na.rm = TRUE)) |&amp;gt; 
  group_by(package) |&amp;gt; 
  arrange(date.mix) |&amp;gt; 
  filter(
    # Those that start by 0 but next acceptance is 1 or higher
     cumsum(acceptance_n) &amp;gt;= 1L | 
       min(acceptance_n[acceptance_n != 0L], na.rm = TRUE) &amp;gt;= 2) |&amp;gt; 
  ungroup() 
fs_exp &amp;lt;- count_submissions(submissions_with_accepted)
fs_exp |&amp;gt; 
  count(more_accepted = total_accepted &amp;gt; total_attempts, 
            sort = TRUE, name = &amp;quot;packages&amp;quot;) |&amp;gt; 
  mutate(percentage = lp(packages/sum(packages)))&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;center&#34;&gt;more_accepted&lt;/th&gt;
&lt;th align=&#34;center&#34;&gt;packages&lt;/th&gt;
&lt;th align=&#34;center&#34;&gt;percentage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;center&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;15337&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;96.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;center&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;600&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;3.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The large majority of packages (96.2%) had more versions accepted than submissions rejected in the period analysed.
Failing the checks on CRAN is still normal, but how many submissions are rejected per package?&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;ggrepel&amp;quot;)
ggplot(fs_exp) +
  geom_abline(slope = 1, intercept = 0, linetype = 2) +
  geom_count(aes(total_accepted, total_attempts)) +
  geom_label_repel(aes(total_accepted, total_attempts, label = package), data = . %&amp;gt;% filter(total_attempts &amp;gt;= 10)) +
  labs(x = &amp;quot;CRAN versions&amp;quot;, y = &amp;quot;Rejected submissions&amp;quot;,  size = &amp;quot;Packages&amp;quot;,
       title = &amp;quot;Packages after the first version&amp;quot;, subtitle = &amp;quot;for the period analyzed&amp;quot;) +
  scale_size(trans = &amp;quot;log10&amp;quot;) +
  theme_minimal() +
  theme(legend.position = c(0.8, 0.7), legend.background = element_rect())&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://llrs.dev/post/2024/01/10/submission-cran-first-try/index.en_files/figure-html/failed-exp-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We can see that some packages had more than 30 versions accepted on CRAN in these 3 years without a single rejected submission.
Congratulations!&lt;/p&gt;
&lt;p&gt;Others have a high number of rejected submissions and very few versions:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fs_exp |&amp;gt; 
  count(total_attempts &amp;gt; total_accepted, name = &amp;quot;packages&amp;quot;) |&amp;gt; 
  mutate(percentage = lp(packages/sum(packages)))&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;center&#34;&gt;total_attempts &amp;gt; total_accepted&lt;/th&gt;
&lt;th align=&#34;center&#34;&gt;packages&lt;/th&gt;
&lt;th align=&#34;center&#34;&gt;percentage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;center&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;15792&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;99.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;center&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;145&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;0.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Close to 1% of packages had more rejected submissions than accepted versions, that is, more than two submissions per accepted version.&lt;/p&gt;
&lt;p&gt;Last, we can look at the overall experience of all developers:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fs &amp;lt;- count_submissions(accepted_subm)

ggplot(fs) +
  geom_abline(slope = 1, intercept = 0, linetype = 2) +
  geom_count(aes(total_accepted, total_attempts)) +
  geom_label_repel(aes(total_accepted, total_attempts, label = package), 
                   data = . %&amp;gt;% filter(total_attempts &amp;gt;= 12)) +
  labs(x = &amp;quot;CRAN versions&amp;quot;, y = &amp;quot;Rejected submissions&amp;quot;,  size = &amp;quot;Packages&amp;quot;,
       title = &amp;quot;All packages submissions&amp;quot;, subtitle = &amp;quot;for the period analyzed ~174 weeks&amp;quot;) +
  theme_minimal() +
  scale_size(trans = &amp;quot;log10&amp;quot;) +
  theme(legend.position = c(0.8, 0.7), legend.background = element_rect())&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://llrs.dev/post/2024/01/10/submission-cran-first-try/index.en_files/figure-html/plot-failed-submissions-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;It doesn’t change much compared to the plot of experienced maintainers.
Note that this only adds the packages that were never accepted and the submissions made until a package was first accepted.
So the changes should only be visible in the bottom left corner of the plot.&lt;/p&gt;
&lt;p&gt;Overall, 14.5% of the attempts end up being rejected.&lt;/p&gt;
&lt;div id=&#34;main-take-away&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Main take away&lt;/h2&gt;
&lt;p&gt;Submitting to CRAN is not easy on the first try, and it usually requires 2 submissions for each accepted version.&lt;br /&gt;
While the &lt;a href=&#34;https://cran.r-project.org/doc/manuals/r-devel/R-exts.html&#34;&gt;Writing R Extensions&lt;/a&gt; manual is clear, it might be too extensive for many cases.&lt;br /&gt;
The &lt;a href=&#34;https://cran.r-project.org/web/packages/policies.html&#34;&gt;CRAN policy&lt;/a&gt; is short but might not be clear enough for new maintainers.&lt;br /&gt;
A document in between might be &lt;a href=&#34;https://r-pkgs.org/&#34;&gt;R packages&lt;/a&gt;, but it is still extensive and focused on a small, opinionated set of packages.&lt;br /&gt;
A CRAN Task View or some training might help reduce the overall problem.&lt;br /&gt;
For struggling maintainers, clearer technical or editorial decisions might be a good solution.&lt;/p&gt;
&lt;p&gt;In addition, the packages with the most submission problems are not new ones: experienced maintainers also have trouble getting their packages accepted.&lt;br /&gt;
This might hint at difficulties replicating the CRAN checks or environments, or at the scale of the checks (dependency checks).&lt;br /&gt;
Focusing on helping these packages’ maintainers might be a good way to help the CRAN maintainers reduce their load.&lt;/p&gt;
&lt;p&gt;This analysis could also be improved if we knew whether each rejection was automatic or manual.&lt;br /&gt;
This would show the burden on CRAN volunteers, and perhaps help define the problem better and propose better solutions.&lt;br /&gt;
It could be attempted by looking at the last folder a package reached in the submission process, but it would still not be clear what the most frequent problem is.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;bonus&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Bonus&lt;/h2&gt;
&lt;p&gt;Of all the new packages, the first accepted version of more than half is already archived (either replaced by a newer version or because the whole package was archived):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;accepted_subm |&amp;gt; 
  filter(acceptance_n == 1L) |&amp;gt; 
  count(status = type, name = &amp;quot;packages&amp;quot;) |&amp;gt; 
  mutate(percentage = lp(packages/sum(packages)))&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;center&#34;&gt;status&lt;/th&gt;
&lt;th align=&#34;center&#34;&gt;packages&lt;/th&gt;
&lt;th align=&#34;center&#34;&gt;percentage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;center&#34;&gt;archived&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;4763&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;65.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;center&#34;&gt;available&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;2517&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;34.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Of them, how many are fully archived?&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fully_archived &amp;lt;- accepted_subm |&amp;gt;
  filter(acceptance_n != 0L) |&amp;gt; 
  filter(any(acceptance_n == 1L), .by = package) |&amp;gt; 
  summarize(archived = all(type == &amp;quot;archived&amp;quot;), .by = package) |&amp;gt; 
  count(archived, name = &amp;quot;packages&amp;quot;) |&amp;gt; 
  mutate(percentage = lp(packages/sum(packages)))
fully_archived&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;center&#34;&gt;archived&lt;/th&gt;
&lt;th align=&#34;center&#34;&gt;packages&lt;/th&gt;
&lt;th align=&#34;center&#34;&gt;percentage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;center&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;6783&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;93.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;center&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;497&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;6.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Only 6.8% of these packages were fully archived at the end of the analysed period (2020-09-12 to 2024-01-20).&lt;/p&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;details&gt;
&lt;pre&gt;&lt;code&gt;## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.3.1 (2023-06-16)
##  os       Ubuntu 22.04.3 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Europe/Madrid
##  date     2024-01-20
##  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  blogdown      1.18    2023-06-19 [1] CRAN (R 4.3.1)
##  bookdown      0.37    2023-12-01 [1] CRAN (R 4.3.1)
##  bslib         0.6.1   2023-11-28 [1] CRAN (R 4.3.1)
##  cachem        1.0.8   2023-05-01 [1] CRAN (R 4.3.1)
##  cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.1)
##  colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.1)
##  digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.1)
##  dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.3.1)
##  evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.2)
##  fansi         1.0.6   2023-12-08 [1] CRAN (R 4.3.1)
##  farver        2.1.1   2022-07-06 [1] CRAN (R 4.3.1)
##  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.1)
##  generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.1)
##  ggplot2     * 3.4.4   2023-10-12 [1] CRAN (R 4.3.1)
##  ggrepel     * 0.9.5   2024-01-10 [1] CRAN (R 4.3.1)
##  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.1)
##  gtable        0.3.4   2023-08-21 [1] CRAN (R 4.3.1)
##  highr         0.10    2022-12-22 [1] CRAN (R 4.3.1)
##  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.3.2)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.3.1)
##  jsonlite      1.8.8   2023-12-04 [1] CRAN (R 4.3.1)
##  knitr       * 1.45    2023-10-30 [1] CRAN (R 4.3.2)
##  labeling      0.4.3   2023-08-29 [1] CRAN (R 4.3.2)
##  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.2)
##  lubridate   * 1.9.3   2023-09-27 [1] CRAN (R 4.3.1)
##  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.1)
##  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.1)
##  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.1)
##  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.1)
##  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.1)
##  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.1)
##  Rcpp          1.0.12  2024-01-09 [1] CRAN (R 4.3.1)
##  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.1)
##  rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.3.1)
##  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.1)
##  sass          0.4.8   2023-12-06 [1] CRAN (R 4.3.1)
##  scales        1.3.0   2023-11-28 [1] CRAN (R 4.3.1)
##  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.1)
##  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.1)
##  tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.3.1)
##  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.1)
##  timechange    0.2.0   2023-01-11 [1] CRAN (R 4.3.1)
##  utf8          1.2.4   2023-10-22 [1] CRAN (R 4.3.2)
##  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.1)
##  withr         2.5.2   2023-10-30 [1] CRAN (R 4.3.2)
##  xfun          0.41    2023-11-01 [1] CRAN (R 4.3.2)
##  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.3.1)
## 
##  [1] /home/lluis/bin/R/4.3.1
##  [2] /opt/R/4.3.1/lib/R/library
## 
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Packaging R: repositories</title>
      <link>https://llrs.dev/post/2023/12/09/packaging-r-repositories/</link>
      <pubDate>Sat, 09 Dec 2023 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/post/2023/12/09/packaging-r-repositories/</guid>
      <description>


&lt;p&gt;In this post I want to collect some thoughts about R repositories.
In R we have multiple repositories that store packages for users.
In this post I want to write about the purpose, functionality, benefits and drawbacks of R repositories and how packages are managed.
The goal is to summarize what I’ve learnt these last years about them.
I’ll also collect some information about them from various sources to make it easier for myself to find it later on.&lt;/p&gt;
&lt;p&gt;I am writing this because I am worried about the future of CRAN and R.
Due to multiple circumstances, the current situation is not sustainable as is.
I hope that this post will help me understand the past and the present, and define some concrete steps to take.&lt;/p&gt;
&lt;div id=&#34;history&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;History&lt;/h1&gt;
&lt;p&gt;I was not there, but the first repository started around April 1997.
This repository is CRAN: the Comprehensive R Archive Network.
The &lt;a href=&#34;https://stat.ethz.ch/pipermail/r-devel/1997-April/017026.html&#34;&gt;first mention&lt;/a&gt; I found is already about changes in it, but it was not until the end of the month when &lt;a href=&#34;https://stat.ethz.ch/pipermail/r-announce/1997/000001.html&#34;&gt;it was announced&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;CRAN was created by a few volunteers, some of whom are still maintaining it 25 years later.
The current team is listed on &lt;a href=&#34;https://cran.r-project.org/CRAN_team.htm&#34;&gt;their website&lt;/a&gt;.
From the beginning it was “a collection of sites which carry identical material, consisting of the R&amp;amp;R R distribution(s), the contributed extensions, documentation for R, and binaries.”&lt;/p&gt;
&lt;p&gt;Omegahat was another repository created &lt;a href=&#34;https://omegahat.net/&#34;&gt;shortly after CRAN&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The Omega project began in July, 1998, with discussions among designers responsible for three current statistical languages (S, R, and Lisp-Stat), with the idea of working together on new directions with special emphasis on web-based software, Java, the Java virtual machine, and distributed computing.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Many developers of Omegahat were in the R Core or CRAN team.
It was available as a repository in the R source code but was definitively removed in R 4.1, in 2021&lt;a href=&#34;#fn1&#34; class=&#34;footnote-ref&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Bioconductor was the next major repository to appear.
It was funded by Robert Gentleman and others in 2004 (it started the mailing list).
A paper describing it &lt;a href=&#34;https://doi.org/10.1186/gb-2004-5-10-r80&#34;&gt;appeared in late 2004&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;an initiative for the collaborative creation of extensible software for computational biology and bioinformatics.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Throughout their history, repositories have evolved with R, and R with them.
For example, R was released twice a year at the beginning, and so was Bioconductor.
But when R moved to a single release per year (in 2013 with version 3.0), Bioconductor kept two releases a year.
This introduced some problems when installing packages from Bioconductor, as a single R release can be compatible with two Bioconductor releases&lt;a href=&#34;#fn2&#34; class=&#34;footnote-ref&#34; id=&#34;fnref2&#34;&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;.&lt;/p&gt;
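&lt;p&gt;To illustrate that one-to-two mapping, here is a hand-written lookup table for a few recent R versions (my own table for illustration, not an official API; &lt;code&gt;bioc_for_r()&lt;/code&gt; is a hypothetical helper). In practice &lt;code&gt;BiocManager&lt;/code&gt; resolves the matching release, and a specific one can be requested with &lt;code&gt;BiocManager::install(version = ...)&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hand-written lookup (not an official API): each R x.y series pairs with two Bioconductor releases
bioc_releases &amp;lt;- list(
  &amp;quot;4.0&amp;quot; = c(&amp;quot;3.11&amp;quot;, &amp;quot;3.12&amp;quot;),
  &amp;quot;4.1&amp;quot; = c(&amp;quot;3.13&amp;quot;, &amp;quot;3.14&amp;quot;),
  &amp;quot;4.2&amp;quot; = c(&amp;quot;3.15&amp;quot;, &amp;quot;3.16&amp;quot;),
  &amp;quot;4.3&amp;quot; = c(&amp;quot;3.17&amp;quot;, &amp;quot;3.18&amp;quot;)
)

# Hypothetical helper: Bioconductor releases usable with an R version (NULL if not in the table)
bioc_for_r &amp;lt;- function(r_version = getRversion()) {
  parts &amp;lt;- strsplit(as.character(r_version), &amp;quot;.&amp;quot;, fixed = TRUE)[[1]]
  bioc_releases[[paste(parts[1:2], collapse = &amp;quot;.&amp;quot;)]]
}

bioc_for_r(&amp;quot;4.3.1&amp;quot;)
# Requesting one release explicitly needs BiocManager and a network connection:
# BiocManager::install(version = &amp;quot;3.18&amp;quot;)&lt;/code&gt;&lt;/pre&gt;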
&lt;p&gt;In other cases, checks have evolved.
For instance &lt;a href=&#34;https://en.wikipedia.org/wiki/Oracle_Solaris&#34;&gt;Solaris&lt;/a&gt; was used to test packages in CRAN until 2021, if I recall correctly, because it allowed testing with a proprietary C or C++ compiler.
This led to discovering bugs, but also to more distress for R package developers, who had difficulties checking their packages in that environment.&lt;/p&gt;
&lt;p&gt;Other checks evolve with R, becoming stricter over time: in early versions of R the use of NAMESPACE was not regulated.
But since R 2.15 (2012) it has been compulsory, even for data-only packages&lt;a href=&#34;#fn3&#34; class=&#34;footnote-ref&#34; id=&#34;fnref3&#34;&gt;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;.
This was synchronized with the repositories’ checks.&lt;/p&gt;
&lt;p&gt;Last, some goals/desires of CRAN have not been fulfilled (or were abandoned).
For example, from the start CRAN aimed to have packages authenticated (see the bottom of &lt;a href=&#34;https://stat.ethz.ch/pipermail/r-announce/1997/000001.html&#34;&gt;the announcement&lt;/a&gt;).
This might be due to lack of time or resources, or the plans might be in progress but require (volunteer) time.&lt;/p&gt;
&lt;p&gt;With time, different repositories arose:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;MRAN, which was available from September 17th, 2014 to July 1st, 2022.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The RStudio Public Package Manager, later renamed &lt;a href=&#34;https://packagemanager.posit.co/&#34;&gt;Posit Public Package Manager&lt;/a&gt;, has provided &lt;a href=&#34;https://posit.co/blog/the-road-to-building-ten-million-binaries/&#34;&gt;binaries for several OS&lt;/a&gt; since 2019.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There is the &lt;a href=&#34;https://pkgs.r4pi.org/&#34;&gt;R4pi repository&lt;/a&gt; with binaries for Raspberry Pi.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I remember a proteomics repository being available.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;rOpenSci started its own repository which later evolved into the &lt;a href=&#34;https://r-universe.org&#34;&gt;r-universe&lt;/a&gt;.
The r-universe currently provides binaries of packages hosted in git repositories.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;literature&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Literature&lt;/h1&gt;
&lt;p&gt;The role and prominence of the repositories have led to many articles being written about them.
I wanted to link and collect some of them for easier retrieval.&lt;/p&gt;
&lt;p&gt;I was wondering how CRAN is described by the volunteers that built it.
From the announcing email:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;CRAN is a collection of sites which carry identical material, consisting of the R&amp;amp;R R distribution(s), the contributed extensions, documentation for R, and binaries.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;From the &lt;a href=&#34;https://cran.r-project.org&#34;&gt;website&lt;/a&gt; (at 2023/12/09):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;CRAN is a network of ftp and web servers around the world that store identical, up-to-date, versions of code and documentation for R.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Initially there was R News, with an article dedicated to CRAN and one to Omegahat too.
These articles usually describe new package additions but sometimes they also provide information about changes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RN-2001-1-cran&#34;&gt;CRAN-2001-1&lt;/a&gt;: It lists new packages, &lt;a href=&#34;https://journal.r-project.org/news/RN-2001-2-cran&#34;&gt;CRAN-2001-2&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RN-2001-3-cran&#34;&gt;CRAN-2001-3&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/articles/RN-2001-008/&#34;&gt;Omegahat-2001-3&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RN-2002-1-cran&#34;&gt;CRAN-2002-1&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RN-2002-2-cran/&#34;&gt;CRAN-2002-2&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RN-2002-3-cran/&#34;&gt;CRAN-2002-3&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RN-2003-1-cran/&#34;&gt;CRAN-2003-1&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RN-2003-2-cran/&#34;&gt;CRAN-2003-2&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RN-2003-3-cran/&#34;&gt;CRAN-2003-3&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RN-2004-1-cran/&#34;&gt;CRAN-2004-1&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RN-2004-2-cran/&#34;&gt;CRAN-2004-2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RN-2005-1-cran/&#34;&gt;CRAN-2005-1&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RN-2005-2-cran/&#34;&gt;CRAN-2005-2&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Since 2006 there has also been an article about Bioconductor.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RN-2006-2-cran/&#34;&gt;CRAN-2006-2&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RN-2006-2-bioc&#34;&gt;Bioc-2006-2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RN-2007-1-cran/&#34;&gt;CRAN-2007-1&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RN-2007-2-cran/&#34;&gt;CRAN-2007-2&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RN-2007-2-bioc&#34;&gt;Bioc-2007-2&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RN-2007-3-cran/&#34;&gt;CRAN-2007-3&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RN-2008-1-cran/&#34;&gt;CRAN-2008-1&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RN-2008-1-bioc&#34;&gt;Bioc-2008-1&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RN-2008-2-cran/&#34;&gt;CRAN-2008-2&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RN-2008-2-bioc&#34;&gt;Bioc-2008-2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Later it became the &lt;a href=&#34;https://journal.r-project.org/&#34;&gt;R Journal&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/issues/2009-1/RJ-2009-1.pdf&#34;&gt;CRAN-2009-1&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/issues/2009-2/RJ-2009-2.pdf&#34;&gt;CRAN-2009-2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/issues/2010-1/RJ-2010-1.pdf&#34;&gt;CRAN-2010-1&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/issues/2010-2/RJ-2010-2.pdf&#34;&gt;CRAN-2010-2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/issues/2011-1/RJ-2011-1.pdf&#34;&gt;CRAN-2011-1&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/issues/2011-2/RJ-2011-2.pdf&#34;&gt;CRAN and Bioconductor 2011-2&lt;/a&gt;.
The Bioconductor section mentions conferences and important directions for the Bioconductor core.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/issues/2012-1/RJ-2012-1.pdf&#34;&gt;CRAN-2012-1&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/issues/2012-2/RJ-2012-2.pdf&#34;&gt;CRAN and Bioconductor 2012-2&lt;/a&gt;: Mentions &lt;code&gt;biocLite()&lt;/code&gt; to install packages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RJ-2013-1-cran&#34;&gt;CRAN-2013-1&lt;/a&gt; &lt;a href=&#34;https://journal.r-project.org/news/RJ-2013-1-bioconductor/&#34;&gt;Bioc-2013-1&lt;/a&gt;: mentions better integration of parallel evaluation.&lt;br /&gt;
&lt;a href=&#34;https://journal.r-project.org/news/RJ-2013-2-cran/&#34;&gt;CRAN-2013-2&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RJ-2013-2-bioconductor/&#34;&gt;Bioc-2013-2&lt;/a&gt;: Mentions AnnotationHub again.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RJ-2014-1-cran/&#34;&gt;CRAN-2014-1&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RJ-2014-1-bioconductor/&#34;&gt;Bioc-2014-1&lt;/a&gt;: Mentions the git-svn bridge to synchronize the git and svn repositories.&lt;br /&gt;
&lt;a href=&#34;https://journal.r-project.org/news/RJ-2014-2-cran/&#34;&gt;CRAN-2014-2&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RJ-2014-2-bioconductor/&#34;&gt;Bioc-2014-2&lt;/a&gt;: Bioconductor 3.0 release; besides packages, Amazon Machine Images and Docker images are offered.
Packages are required to pass BiocCheck, a separate package with Bioconductor-specific checks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RJ-2015-1-cran/&#34;&gt;CRAN-2015-1&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RJ-2015-1-bioconductor/&#34;&gt;Bioc-2015-1&lt;/a&gt;: Similar to the previous issue, plus encouragement to follow the guidelines for package submission.&lt;br /&gt;
&lt;a href=&#34;https://journal.r-project.org/news/RJ-2015-2-cran/&#34;&gt;CRAN-2015-2&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RJ-2015-2-bioconductor/&#34;&gt;Bioc-2015-2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RJ-2016-1-cran/&#34;&gt;CRAN-2016-1&lt;/a&gt;: this article includes a plot of the number of CRAN packages over time and no longer lists all the new packages.
It explicitly mentions that the CRAN team asked for help processing package submissions and some people stepped up.
&lt;a href=&#34;https://journal.r-project.org/news/RJ-2016-1-bioconductor/&#34;&gt;Bioc-2016-1&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RJ-2016-2-cran/&#34;&gt;CRAN-2016-2&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RJ-2016-2-bioc/&#34;&gt;Bioc-2016-2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RJ-2017-1-cran/&#34;&gt;CRAN-2017-1&lt;/a&gt;: mentions changes in CRAN checks, adding new memory access and static code analysis checks.
It mentions that the submission process has become more automated.
It also mentions changes in the CRAN Repository Policy.
&lt;a href=&#34;https://journal.r-project.org/news/RJ-2017-1-bioc/&#34;&gt;Bioc-2017-1&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RJ-2018-1-cran/&#34;&gt;CRAN-2018-1&lt;/a&gt;: checks in alternative BLAS/LAPACK implementations, the submission pipeline is defined.
For the first time, the actions taken by CRAN reviewers are reported in two categories: automatic and manual.
Changes in repository policy are listed.
Changes in the location of the package repository archive. &lt;a href=&#34;https://journal.r-project.org/news/RJ-2018-1-bioc/&#34;&gt;Bioc-2018-1&lt;/a&gt;&lt;br /&gt;
&lt;a href=&#34;https://journal.r-project.org/news/RJ-2018-2-cran/&#34;&gt;CRAN-2018-2&lt;/a&gt;: Changes in policy: packages should not give check warnings or errors.
&lt;a href=&#34;https://journal.r-project.org/news/RJ-2018-2-bioc/&#34;&gt;Bioc-2018-2&lt;/a&gt;: Moved to BiocManager to install packages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RJ-2019-1-cran/&#34;&gt;CRAN-2019-1&lt;/a&gt;: More mentions of CRAN mirror security.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RJ-2019-2-cran/&#34;&gt;CRAN-2019-2&lt;/a&gt;: Updates in checklist for CRAN submissions, &lt;a href=&#34;https://journal.r-project.org/news/RJ-2019-2-bioc/&#34;&gt;Bioc-2019-2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RJ-2020-1-cran/&#34;&gt;CRAN-2020-1&lt;/a&gt;: Many changes in CRAN policies.
&lt;a href=&#34;https://journal.r-project.org/news/RJ-2020-2-cran/&#34;&gt;CRAN-2020-2&lt;/a&gt;: Many changes to CRAN policies.
&lt;a href=&#34;https://journal.r-project.org/news/RJ-2020-2-bioc/&#34;&gt;Bioc-2020-2&lt;/a&gt;: Announces the Technical and Community advisory boards (as well as the project-wide Code of Conduct).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RJ-2021-1-cran/&#34;&gt;CRAN-2021-1&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RJ-2021-1-bioc/&#34;&gt;Bioc-2021-1&lt;/a&gt;: Mentions conferences that will be virtual.&lt;br /&gt;
&lt;a href=&#34;https://journal.r-project.org/news/RJ-2021-2-cran/&#34;&gt;CRAN-2021-2&lt;/a&gt;: Shows an &lt;a href=&#34;https://cran.r-project.org/incoming/&#34;&gt;incoming&lt;/a&gt; path (see &lt;a href=&#34;https://r-hub.github.io/cransays/articles/dashboard.html&#34;&gt;this friendly viewer&lt;/a&gt;). &lt;a href=&#34;https://journal.r-project.org/news/RJ-2021-2-bioc/&#34;&gt;Bioc-2021-2&lt;/a&gt;: Mentions AnVIL and two online workshops to develop workflows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RJ-2022-1-cran/&#34;&gt;CRAN-2022-1&lt;/a&gt;: Lists a change in CRAN policy and the CRAN Task View Initiative.&lt;br /&gt;
&lt;a href=&#34;https://journal.r-project.org/news/RJ-2022-2-cran/&#34;&gt;CRAN-2022-2&lt;/a&gt;: Lists some more repository policies.
&lt;a href=&#34;https://journal.r-project.org/news/RJ-2022-2-bioconductor/&#34;&gt;Bioc-2022-2&lt;/a&gt;: Lists infrastructure updates (and its funding), changes in the core team and new initiatives.&lt;br /&gt;
&lt;a href=&#34;https://journal.r-project.org/news/RJ-2022-3-cran/&#34;&gt;CRAN-2022-3&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RJ-2022-3-bioconductor/&#34;&gt;Bioc-2022-3&lt;/a&gt;&lt;br /&gt;
&lt;a href=&#34;https://journal.r-project.org/news/RJ-2022-4-cran/&#34;&gt;CRAN-2022-4&lt;/a&gt;, &lt;a href=&#34;https://journal.r-project.org/news/RJ-2022-4-bioconductor/&#34;&gt;Bioc-2022-4&lt;/a&gt;: the default branch renaming, the partnership with Outreachy and the blog are featured.
Several working groups provide updates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://journal.r-project.org/news/RJ-2023-1-cran/&#34;&gt;CRAN-2023-1&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In addition, several articles and blog posts have appeared.
From those I found it is worth mentioning the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://doi.org/10.17713/ajs.v41i1.188&#34;&gt;Are There Too Many R Packages?&lt;/a&gt; and &lt;a href=&#34;https://www.r-bloggers.com/2014/04/does-r-have-too-many-packages/&#34;&gt;derived posts&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://www.jumpingrivers.com/blog/security-r-hacking-bioconductor/&#34;&gt;Hacking Bioconductor&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And my own posts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://llrs.dev/post/2021/12/07/reasons-cran-archivals/&#34;&gt;Reasons CRAN packages are archived&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/&#34;&gt;CRAN files part 1&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://llrs.dev/post/2022/07/28/cran-files-2/&#34;&gt;CRAN files part 2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/&#34;&gt;CRAN maintained packages&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://llrs.dev/post/2021/01/31/cran-review/&#34;&gt;CRAN review&lt;/a&gt; (and the &lt;a href=&#34;https://llrs.dev/talk/user-2021/&#34;&gt;talk at useRs 2021&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://llrs.dev/post/2020/07/31/bioconductor-submissions-reviews/&#34;&gt;Bioconductor review&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://llrs.dev/post/2020/09/02/ropensci-submissions/&#34;&gt;rOpenSci&lt;/a&gt; reviews&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The article &lt;a href=&#34;https://journal.r-project.org/articles/RJ-2009-014/&#34;&gt;“Aspects of the Social Organization and Trajectory of the R Project”&lt;/a&gt;, from the R Journal 2009, also has a section about CRAN, noting that it “is challenged by its own success”.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;characteristics&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Characteristics&lt;/h1&gt;
&lt;p&gt;The predominance of CRAN and its role as the primary and default R repository have led to some special treatment of the repository.&lt;/p&gt;
&lt;p&gt;CRAN checks are in the R source code itself,
while other repositories have their own checks in separate tools.
In addition, the environment variables CRAN uses are documented in the &lt;a href=&#34;https://cran.r-project.org/doc/manuals/r-release/R-ints.html&#34;&gt;R-internals&lt;/a&gt; manual (they are more or less accessible in the &lt;a href=&#34;https://svn.r-project.org/R-dev-web/trunk/CRAN/&#34;&gt;svn repository&lt;/a&gt; too).&lt;/p&gt;
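&lt;p&gt;As a minimal sketch (not the CRAN team’s own setup), those built-in checks can be run locally with &lt;code&gt;R CMD check --as-cran&lt;/code&gt;. The tarball name is hypothetical; &lt;code&gt;_R_CHECK_CRAN_INCOMING_REMOTE_&lt;/code&gt; is one of the variables documented in R-internals, set here to skip the checks that need network access:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical tarball, e.g. produced by R CMD build on a package folder
tarball &amp;lt;- &amp;quot;mypkg_0.1.0.tar.gz&amp;quot;

# --as-cran switches on the CRAN incoming checks that ship with R itself
check_args &amp;lt;- c(&amp;quot;CMD&amp;quot;, &amp;quot;check&amp;quot;, &amp;quot;--as-cran&amp;quot;, tarball)

# Environment variables documented in R-internals tune those checks;
# this one skips the parts of the incoming checks that need network access
check_env &amp;lt;- &amp;quot;_R_CHECK_CRAN_INCOMING_REMOTE_=false&amp;quot;

# Only run when the tarball exists, so the sketch is safe to source as-is
if (file.exists(tarball)) {
  status &amp;lt;- system2(file.path(R.home(&amp;quot;bin&amp;quot;), &amp;quot;R&amp;quot;), check_args, env = check_env)
  message(&amp;quot;R CMD check exit status: &amp;quot;, status)
}&lt;/code&gt;&lt;/pre&gt;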
&lt;p&gt;Others who know more have stated the benefits of CRAN too. This text is copied from Henrik Bengtsson in &lt;a href=&#34;https://community-bioc.slack.com/archives/CLF37V6C8/p1698869264884649?thread_ts=1698804037.467439&amp;amp;cid=CLF37V6C8&#34; title=&#34;Link to the thread&#34;&gt;Bioconductor Slack&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;FOREVER ARCHIVE:&lt;/p&gt;
&lt;p&gt;The first one is that it publishes packages and versions of them until the end of time.
When a package has been published on CRAN, it takes a lot for it to be removed from there.
I don’t know if it ever happened, but I can imagine a package can be fully removed if it was illegally published in the first place (e.g. copyright, illegal content, ...) or malicious.&lt;/p&gt;
&lt;p&gt;INSTALLATION SERVICE:&lt;/p&gt;
&lt;p&gt;Then CRAN also provides a R package repository service for installing packages on CRAN using built-in R functions.
The set of packages in the package repo is a subset of all packages on CRAN.
The CRAN package repo makes a promise that all packages listed in PACKAGES can be installed.
If they cannot make that promise, they’ll archive the package (=remove it from PACKAGES).
I should also say, install.packages(url) can be used to install from the set of packages that are archived.
Technically, old package versions are always archived.&lt;/p&gt;
&lt;p&gt;CHECK SERVICE:&lt;/p&gt;
&lt;p&gt;The content of the R package repository is guided by the CRAN package checks that run on R-oldrel, R-release, and R-devel across multiple platforms.
The minimal requirement is that no package should remain in the package repository if the checks detects ERRORs (and those errors are not due to recently introduced bugs in R-devel).
WARNINGs can also cause a package to be archived, but that process often takes longer.
AFAIK, NOTEs are not a cause for a package being archived (but I could be wrong).
The CRAN incoming checks, which you have to pass when you submit a new package, or an updated version, will make sure that the published package pass with all OKs.
(It’s possible to argue for NOTEs being false positives, or for them not to be fixed, but that requires a manual approval by the CRAN Team).&lt;/p&gt;
&lt;/blockquote&gt;
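&lt;p&gt;As a minimal sketch of how an archived version can be reached, the hypothetical helper &lt;code&gt;archive_url()&lt;/code&gt; below builds the URL of a source tarball following the layout of the CRAN archive (&lt;code&gt;src/contrib/Archive/&amp;lt;package&amp;gt;/&amp;lt;package&amp;gt;_&amp;lt;version&amp;gt;.tar.gz&lt;/code&gt;).
The version number is only an example of an older XML release and is assumed to be in the archive; the installation itself is commented out because it needs internet access and the system libraries the package requires.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical helper: URL of an archived source tarball on CRAN
archive_url &amp;lt;- function(package, version,
                        cran = &amp;quot;https://cran.r-project.org&amp;quot;) {
  sprintf(&amp;quot;%s/src/contrib/Archive/%s/%s_%s.tar.gz&amp;quot;,
          cran, package, package, version)
}
url &amp;lt;- archive_url(&amp;quot;XML&amp;quot;, &amp;quot;3.98-1.7&amp;quot;)
url
## [1] &amp;quot;https://cran.r-project.org/src/contrib/Archive/XML/XML_3.98-1.7.tar.gz&amp;quot;
# Installing from the URL (needs internet access and a build toolchain):
# install.packages(url, repos = NULL, type = &amp;quot;source&amp;quot;)&lt;/code&gt;&lt;/pre&gt;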
&lt;p&gt;I think there are many more resources discussing R repositories.
If you know of others, I’ll be happy to update this post.&lt;/p&gt;
&lt;p&gt;Rather than dragging this out any longer, I’ll publish this post and keep collecting articles I might have missed.&lt;/p&gt;
&lt;p&gt;Lastly, Uwe Ligges gave a talk about &lt;a href=&#34;https://www.youtube.com/watch?v=-vX-CDiiZKI&#34;&gt;CRAN at useR! 2017&lt;/a&gt;; thanks to Tim Taylor for &lt;a href=&#34;https://fosstodon.org/@_TimTaylor/111612010185631808&#34;&gt;sharing it&lt;/a&gt;. The video explains why the Solaris OS was used.&lt;/p&gt;
&lt;p&gt;It has come to my attention that there is an article by G. Brooke Anderson and Dirk Eddelbuettel about the structure of R package repositories (among other things): &lt;a href=&#34;https://journal.r-project.org/archive/2017/RJ-2017-026/RJ-2017-026.pdf&#34;&gt;Hosting Data Packages via drat: A Case Study with Hurricane Exposure Data&lt;/a&gt;.&lt;/p&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;details&gt;
&lt;pre&gt;&lt;code&gt;## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.3.1 (2023-06-16)
##  os       Ubuntu 22.04.3 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Europe/Madrid
##  date     2024-01-15
##  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  blogdown      1.18    2023-06-19 [1] CRAN (R 4.3.1)
##  bookdown      0.37    2023-12-01 [1] CRAN (R 4.3.1)
##  bslib         0.6.1   2023-11-28 [1] CRAN (R 4.3.1)
##  cachem        1.0.8   2023-05-01 [1] CRAN (R 4.3.1)
##  cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.1)
##  digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.1)
##  evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.2)
##  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.1)
##  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.3.2)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.3.1)
##  jsonlite      1.8.8   2023-12-04 [1] CRAN (R 4.3.1)
##  knitr         1.45    2023-10-30 [1] CRAN (R 4.3.2)
##  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.2)
##  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.1)
##  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.1)
##  rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.3.1)
##  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.1)
##  sass          0.4.8   2023-12-06 [1] CRAN (R 4.3.1)
##  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.1)
##  xfun          0.41    2023-11-01 [1] CRAN (R 4.3.2)
##  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.3.1)
## 
##  [1] /home/lluis/bin/R/4.3.1
##  [2] /opt/R/4.3.1/lib/R/library
## 
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;footnotes footnotes-end-of-document&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;In version 3.1.2, &lt;a href=&#34;https://cran.r-project.org/doc/manuals/NEWS.3&#34;&gt;Omegahat didn’t provide&lt;/a&gt; Windows binaries, and in 4.1 it was removed from the default repositories (see 4.1 in &lt;a href=&#34;https://cran.r-project.org/doc/manuals/r-release/NEWS.html&#34;&gt;NEWS(.4)&lt;/a&gt;).&lt;a href=&#34;#fnref1&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn2&#34;&gt;&lt;p&gt;This led to the need for a special function to install packages from Bioconductor:
initially the &lt;code&gt;biocLite&lt;/code&gt; function and later the &lt;a href=&#34;https://cran.r-project.org/package=BiocManager&#34;&gt;BiocManager package&lt;/a&gt;.&lt;a href=&#34;#fnref2&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn3&#34;&gt;&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/doc/manuals/NEWS.2&#34;&gt;NEWS in 2.15 section&lt;/a&gt;&lt;a href=&#34;#fnref3&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>CRAN maintained packages</title>
      <link>https://llrs.dev/post/2023/05/03/cran-maintained-packages/</link>
      <pubDate>Wed, 03 May 2023 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/post/2023/05/03/cran-maintained-packages/</guid>
      <description>


&lt;p&gt;Package managers play a paramount role for software developers.
In R, the CRAN team provides a platform to test and host packages.
This means ensuring that R dependencies are up to date and that the software required by some packages is also available on CRAN.&lt;/p&gt;
&lt;p&gt;This makes it possible to test ~20000 packages frequently (daily for most packages) on several architectures and R versions.
In addition, the team tests updates for compatibility with the packages that depend on them, and tests and reviews new packages.&lt;/p&gt;
&lt;p&gt;Most of the work with packages is automated, but it often requires human intervention (&lt;a href=&#34;https://journal.r-project.org/news/RJ-2022-4-cran/#cran-package-submissions&#34;&gt;50% of the submissions&lt;/a&gt;).
Another time-consuming activity is maintaining packages abandoned by their original maintainers.&lt;/p&gt;
&lt;p&gt;While newer packages are &lt;a href=&#34;https://llrs.dev/post/2021/12/07/reasons-cran-archivals/&#34;&gt;often archived from CRAN&lt;/a&gt;, some old packages have been adopted by CRAN.
The &lt;a href=&#34;https://cran.r-project.org/CRAN_team.htm&#34;&gt;CRAN team&lt;/a&gt; is &lt;a href=&#34;https://mastodon.social/@henrikbengtsson/110186925898457474&#34;&gt;looking for help&lt;/a&gt; maintaining them.&lt;/p&gt;
&lt;p&gt;In this post I’ll explore the packages maintained by CRAN.&lt;/p&gt;
&lt;div id=&#34;cran-in-packages&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;CRAN in packages&lt;/h1&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;packages_db &amp;lt;- as.data.frame(tools::CRAN_package_db())
cran_author &amp;lt;- grep(&amp;quot;CRAN Team&amp;quot;, x = packages_db$Author, ignore.case = TRUE)
cran_authorsR &amp;lt;- grep(&amp;quot;CRAN Team&amp;quot;, x = packages_db$`Authors@R`, ignore.case = TRUE)
CRAN_TEAM_mentioned &amp;lt;- union(cran_author, cran_authorsR)
unique(packages_db$Package[CRAN_TEAM_mentioned])
## [1] &amp;quot;fBasics&amp;quot;   &amp;quot;fMultivar&amp;quot; &amp;quot;geiger&amp;quot;    &amp;quot;plotrix&amp;quot;   &amp;quot;RCurl&amp;quot;     &amp;quot;RJSONIO&amp;quot;  
## [7] &amp;quot;udunits2&amp;quot;  &amp;quot;XML&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In some of these packages the CRAN team appears as a contributor because they provided help or code to fix bugs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=geiger&#34;&gt;geiger&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=fMultivar&#34;&gt;fMultivar&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=fBasics&#34;&gt;fBasics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=udunits2&#34;&gt;udunits2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In others they are the maintainers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=XML&#34;&gt;XML&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=RCurl&#34;&gt;RCurl&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=RJSONIO&#34;&gt;RJSONIO&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of these three packages, RJSONIO is the newest (first released in 2010) and requires fewer updates (lately 1 or 2 a year).
In contrast, in 2022 RCurl and XML required 4 and 5 updates, respectively.
I will focus on these two packages, as they are the ones for which the CRAN team is looking for new maintainers.&lt;/p&gt;
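&lt;p&gt;A minimal sketch of how such yearly counts can be tallied from a package’s release dates. The helper &lt;code&gt;releases_per_year()&lt;/code&gt; and the dates are made up for illustration; with real data, the dates could come from the &lt;code&gt;mtime&lt;/code&gt; column of &lt;code&gt;tools:::CRAN_archive_db()&lt;/code&gt; (older versions) plus the &lt;code&gt;Published&lt;/code&gt; field of &lt;code&gt;tools::CRAN_package_db()&lt;/code&gt; (current version), both of which are used later in this post.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical helper: number of releases per calendar year
releases_per_year &amp;lt;- function(dates) {
  table(format(as.Date(dates), &amp;quot;%Y&amp;quot;))
}
# Made-up release dates, for illustration only
toy_dates &amp;lt;- c(&amp;quot;2021-03-01&amp;quot;, &amp;quot;2022-01-10&amp;quot;, &amp;quot;2022-04-02&amp;quot;,
               &amp;quot;2022-09-15&amp;quot;, &amp;quot;2023-02-20&amp;quot;)
releases_per_year(toy_dates)&lt;/code&gt;&lt;/pre&gt;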
&lt;/div&gt;
&lt;div id=&#34;rcurl-and-xml&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;RCurl and XML&lt;/h1&gt;
&lt;div id=&#34;circular-dependency&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Circular dependency&lt;/h2&gt;
&lt;p&gt;XML and RCurl depend on each other.&lt;/p&gt;
&lt;p&gt;Each package is a direct dependency of one of its own direct dependencies!
How can that be?
If we go to the &lt;a href=&#34;https://cran.r-project.org/package=RCurl&#34;&gt;RCurl&lt;/a&gt; page we see “Suggests: XML”, and on the &lt;a href=&#34;https://cran.r-project.org/package=XML&#34;&gt;XML&lt;/a&gt; page RCurl is listed in Suggests too.
This circular dependency is allowed because each package lists the other only in Suggests, a weak dependency that is not needed to install the package.&lt;/p&gt;
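&lt;p&gt;A minimal sketch of why the cycle only appears when weak dependencies are included.
It uses a hypothetical, hand-made package database (the field values are simplified and illustrative, not copied from CRAN) with the columns that &lt;code&gt;tools::package_dependencies()&lt;/code&gt; reads, so it runs without internet access:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;tools&amp;quot;)
# Hypothetical, simplified package database (a character matrix, like
# the one returned by available.packages())
toy_db &amp;lt;- cbind(Package   = c(&amp;quot;XML&amp;quot;, &amp;quot;RCurl&amp;quot;),
                Depends   = c(NA, &amp;quot;methods&amp;quot;),
                Imports   = c(&amp;quot;methods, utils&amp;quot;, &amp;quot;bitops&amp;quot;),
                LinkingTo = NA,
                Suggests  = c(&amp;quot;RCurl&amp;quot;, &amp;quot;XML&amp;quot;),
                Enhances  = NA)
# Strong dependencies (Depends, Imports, LinkingTo): no cycle
strong_toy &amp;lt;- package_dependencies(c(&amp;quot;XML&amp;quot;, &amp;quot;RCurl&amp;quot;), db = toy_db,
                                   which = &amp;quot;strong&amp;quot;)
# Weak dependencies (Suggests): each package lists the other
suggests_toy &amp;lt;- package_dependencies(c(&amp;quot;XML&amp;quot;, &amp;quot;RCurl&amp;quot;), db = toy_db,
                                     which = &amp;quot;Suggests&amp;quot;)
strong_toy
suggests_toy&lt;/code&gt;&lt;/pre&gt;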
&lt;p&gt;A first step to reduce potential problems would be to separate them.
This would make it easier to understand which package is worth prioritizing, and possible missteps would have less impact.&lt;/p&gt;
&lt;p&gt;If we &lt;a href=&#34;https://github.com/search?q=repo%3Acran%2FXML%20RCurl&amp;amp;type=code&#34;&gt;search the XML source code for RCurl&lt;/a&gt;, we find some code in the &lt;code&gt;inst/&lt;/code&gt; folder.
If these two cases were removed, the package could drop its dependency on RCurl.&lt;/p&gt;
&lt;p&gt;Similarly, if we &lt;a href=&#34;https://github.com/search?q=repo%3Acran%2FRCurl%20XML&amp;amp;type=code&#34;&gt;search the RCurl source code for XML&lt;/a&gt;, we find some code in the &lt;code&gt;inst/&lt;/code&gt; folder and in some examples.
If these three cases were removed, the package could drop its dependency on XML.&lt;/p&gt;
&lt;p&gt;RCurl has been &lt;a href=&#34;https://diffify.com/R/RCurl/1.95-4.9/1.98-1.12&#34;&gt;more stable&lt;/a&gt; than XML, which has seen &lt;a href=&#34;https://diffify.com/R/XML/3.98-1.7/3.99-0.14&#34;&gt;new functions added and one removed&lt;/a&gt; since CRAN started maintaining it.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;relevant-data&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Relevant data&lt;/h2&gt;
&lt;p&gt;We will look at four sets of data for each package: &lt;a href=&#34;#dependencies&#34;&gt;dependencies&lt;/a&gt;, &lt;a href=&#34;#releases&#34;&gt;releases&lt;/a&gt;, &lt;a href=&#34;#maintainers&#34;&gt;maintainers&lt;/a&gt; and &lt;a href=&#34;#downloads&#34;&gt;downloads&lt;/a&gt;.&lt;/p&gt;
&lt;div id=&#34;dependencies&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Dependencies&lt;/h3&gt;
&lt;p&gt;Both packages have system dependencies (libxml2 for XML and libcurl for RCurl), which might make maintenance harder.
In addition, a large number of packages depend on them.
We can gather these reverse dependencies from CRAN and Bioconductor software packages:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;tools&amp;quot;)
# Look up only software dependencies in Bioconductor
options(repos = BiocManager::repositories()[c(&amp;quot;BioCsoft&amp;quot;, &amp;quot;CRAN&amp;quot;)])
ap &amp;lt;- available.packages()
# Packages that depend directly on RCurl or XML (reverse dependencies of any type)
all_deps &amp;lt;- package_dependencies(c(&amp;quot;RCurl&amp;quot;, &amp;quot;XML&amp;quot;), 
                                 reverse = TRUE, db = ap, which = &amp;quot;all&amp;quot;)
all_unique_deps &amp;lt;- unique(unlist(all_deps, FALSE, FALSE))
# Direct dependencies of these packages: all types and strong only
first_deps &amp;lt;- package_dependencies(all_unique_deps, db = ap, which = &amp;quot;all&amp;quot;)
first_deps_strong &amp;lt;- package_dependencies(all_unique_deps, db = ap, which = &amp;quot;strong&amp;quot;)
# Is XML or RCurl a strong dependency (Depends, Imports or LinkingTo)?
strong &amp;lt;- sapply(first_deps_strong, function(x){any(c(&amp;quot;XML&amp;quot;, &amp;quot;RCurl&amp;quot;) %in% x)})
# Recursive strong dependencies
deps_strong &amp;lt;- package_dependencies(all_unique_deps, recursive = TRUE, 
                                 db = ap, which = &amp;quot;strong&amp;quot;)
# Packages that depend on each of them
first_rdeps &amp;lt;- package_dependencies(all_unique_deps, 
                                   reverse = TRUE, db = ap, which = &amp;quot;all&amp;quot;)
# Recursive dependencies of all types
deps_all &amp;lt;- package_dependencies(all_unique_deps, recursive = TRUE, 
                                 db = ap, which = &amp;quot;all&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In total, 495 packages depend directly on them (and 8 more in Bioconductor annotation packages: recount3, ENCODExplorerData, UCSCRepeatMasker, gDNAinRNAseqData, qdap, qdapTools, metaboliteIDmapping and curatedBreastData).&lt;/p&gt;
&lt;p&gt;These two packages with their dependencies are used one way or another by around 20000 packages (about 90% of CRAN and Bioconductor)!
If these packages fail, the impact on the community would be huge.&lt;/p&gt;
&lt;p&gt;To reduce the impact of these dependencies, we should look at the direct dependencies of these packages.
We also looked at their reverse dependencies to assess the impact of each package on other packages.&lt;/p&gt;
&lt;p&gt;Knowing which packages these are, and who maintains them, will help decide the best course of action.&lt;/p&gt;
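&lt;p&gt;The difference between depending on XML or RCurl in any way and depending on them strongly (the &lt;code&gt;strong&lt;/code&gt; flag computed above) can be illustrated with another hypothetical toy database.
The packages &lt;code&gt;pkgA&lt;/code&gt;, &lt;code&gt;pkgB&lt;/code&gt; and &lt;code&gt;pkgC&lt;/code&gt; are made up: &lt;code&gt;pkgA&lt;/code&gt; imports XML, &lt;code&gt;pkgB&lt;/code&gt; only suggests RCurl and &lt;code&gt;pkgC&lt;/code&gt; uses neither.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;tools&amp;quot;)
# Hypothetical toy database: pkgA imports XML, pkgB suggests RCurl
toy_db &amp;lt;- cbind(Package   = c(&amp;quot;XML&amp;quot;, &amp;quot;RCurl&amp;quot;, &amp;quot;pkgA&amp;quot;, &amp;quot;pkgB&amp;quot;, &amp;quot;pkgC&amp;quot;),
                Depends   = NA,
                Imports   = c(NA, NA, &amp;quot;XML&amp;quot;, NA, &amp;quot;utils&amp;quot;),
                LinkingTo = NA,
                Suggests  = c(&amp;quot;RCurl&amp;quot;, &amp;quot;XML&amp;quot;, NA, &amp;quot;RCurl&amp;quot;, NA),
                Enhances  = NA)
# Every package that depends on XML or RCurl in any way
rdeps_all &amp;lt;- package_dependencies(c(&amp;quot;XML&amp;quot;, &amp;quot;RCurl&amp;quot;), db = toy_db,
                                  reverse = TRUE, which = &amp;quot;all&amp;quot;)
# Only those that need them to install or load (Depends, Imports, LinkingTo)
rdeps_strong &amp;lt;- package_dependencies(c(&amp;quot;XML&amp;quot;, &amp;quot;RCurl&amp;quot;), db = toy_db,
                                     reverse = TRUE, which = &amp;quot;strong&amp;quot;)
rdeps_all
rdeps_strong&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this sketch &lt;code&gt;pkgB&lt;/code&gt; counts as a reverse dependency of RCurl but would not be affected at install time if RCurl disappeared, which is why the strong dependencies are tracked separately.&lt;/p&gt;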
&lt;/div&gt;
&lt;div id=&#34;releases&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Releases&lt;/h3&gt;
&lt;p&gt;A first approach is to look at the number of releases and their dates to assess whether a package has an active maintainer:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;archive &amp;lt;- tools:::CRAN_archive_db()[all_unique_deps]
packages &amp;lt;- tools::CRAN_package_db()
library(&amp;quot;dplyr&amp;quot;)
library(&amp;quot;BiocPkgTools&amp;quot;)
# First CRAN release: the oldest archived tarball
# (packages with a single release have no archive entry)
fr &amp;lt;- vapply(archive, function(x) {
  if (is.null(x)) {
    return(NA)
  }
  as.Date(min(x$mtime))
}, FUN.VALUE = Sys.Date())
# First Bioconductor release, estimated from the download statistics
fr_bioc &amp;lt;- biocDownloadStats(&amp;quot;software&amp;quot;) |&amp;gt; 
  filter(Package %in% all_unique_deps) |&amp;gt; 
  firstInBioc() |&amp;gt; 
  pull(Date, name = Package)
first_release &amp;lt;- c(as.Date(fr[!is.na(fr)]), as.Date(fr_bioc))[all_unique_deps]
last_update &amp;lt;- packages$Published[match(all_unique_deps, packages$Package)]
# Number of CRAN releases: archived versions plus the current one
releases &amp;lt;- vapply(archive, NROW, numeric(1L)) + 1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We only have this information for CRAN packages:&lt;br /&gt;
Bioconductor has two releases every year, and while maintainers can release patched versions of packages in between, that information is not stored (or not easily retrieved; the versions are still available in the &lt;a href=&#34;https://code.bioconductor.org&#34;&gt;git server&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Even if Bioconductor maintainers don’t modify a package, its version number increases with each release.
In addition, a version update in git doesn’t propagate to users unless the package passes its checks.
For all these reasons, it doesn’t make sense to count releases of Bioconductor packages.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;maintainers&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Maintainers&lt;/h3&gt;
&lt;p&gt;Now that we know which packages are more active, we can look up the people behind them.
This way we can prioritize working with maintainers that are known to be active&lt;a href=&#34;#fn1&#34; class=&#34;footnote-ref&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;maintainers &amp;lt;- packages_db$Maintainer[match(all_unique_deps, packages_db$Package)]
maintainers &amp;lt;- trimws(gsub(&amp;quot;&amp;lt;.+&amp;gt;&amp;quot;, &amp;quot;&amp;quot;, maintainers))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once again, the Bioconductor repository doesn’t provide a file to gather this kind of data.&lt;/p&gt;
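&lt;p&gt;To make the clean-up step above concrete: &lt;code&gt;gsub()&lt;/code&gt; drops the e-mail address between angle brackets and &lt;code&gt;trimws()&lt;/code&gt; removes the leftover space. The Maintainer value here is a made-up example:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical Maintainer field
maintainer_field &amp;lt;- &amp;quot;Jane Doe &amp;lt;jane.doe@example.org&amp;gt;&amp;quot;
trimws(gsub(&amp;quot;&amp;lt;.+&amp;gt;&amp;quot;, &amp;quot;&amp;quot;, maintainer_field))
## [1] &amp;quot;Jane Doe&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because &lt;code&gt;.+&lt;/code&gt; is greedy, a field with more than one pair of angle brackets would lose everything between the first and the last bracket; the Maintainer field holds a single address, so this is not a problem here.&lt;/p&gt;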
&lt;/div&gt;
&lt;div id=&#34;downloads&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Downloads&lt;/h3&gt;
&lt;p&gt;Another variable we can use is the number of downloads of these packages.
Packages that are downloaded more are probably used more, so a breaking change in them would affect more people than one in other packages.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;cranlogs&amp;quot;)
acd &amp;lt;- cran_downloads(intersect(all_unique_deps, packages_db$Package), 
                          when = &amp;quot;last-month&amp;quot;)
cran_pkg &amp;lt;- summarise(acd, downloads = sum(count), .by = package)
loc &amp;lt;- Sys.setlocale(locale = &amp;quot;C&amp;quot;)
bioc_d &amp;lt;- vapply(setdiff(all_unique_deps, packages_db$Package), function(x){
  pkg &amp;lt;- pkgDownloadStats(x)
  tail(pkg$Nb_of_downloads, 1)
  }, numeric(1L))
bioc_pkg &amp;lt;- data.frame(package = names(bioc_d), downloads = bioc_d)
downloads &amp;lt;- rbind(bioc_pkg, cran_pkg)
rownames(downloads) &amp;lt;- downloads$package
dwn &amp;lt;- downloads[all_unique_deps, ]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The CRAN logs are provided by the global CRAN mirror (sponsored by RStudio).&lt;br /&gt;
The Bioconductor numbers come from the Bioconductor infrastructure, which provides the total number of downloads and the number of downloads from distinct IPs&lt;a href=&#34;#fn2&#34; class=&#34;footnote-ref&#34; id=&#34;fnref2&#34;&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;analysis&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Analysis&lt;/h2&gt;
&lt;p&gt;We have collected the data that might be relevant.
Now we can start looking at all of it together:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;repo &amp;lt;- vector(&amp;quot;character&amp;quot;, length(all_unique_deps))
ap_deps &amp;lt;- ap[all_unique_deps, ]
repo[startsWith(ap_deps[, &amp;quot;Repository&amp;quot;], &amp;quot;https://bioc&amp;quot;)] &amp;lt;- &amp;quot;Bioconductor&amp;quot;
repo[!startsWith(ap_deps[, &amp;quot;Repository&amp;quot;], &amp;quot;https://bioc&amp;quot;)] &amp;lt;- &amp;quot;CRAN&amp;quot;
deps &amp;lt;- data.frame(package = all_unique_deps,
                   direct_dep_XML = all_unique_deps %in% all_deps$XML,
                   direct_dep_RCurl = all_unique_deps %in% all_deps$RCurl,
                   first_deps_n = lengths(first_deps),
                   deps_all_n = lengths(deps_all),
                   first_rdeps_n = lengths(first_rdeps),
                   first_deps_strong_n = lengths(first_deps_strong), 
                   deps_strong_n = lengths(deps_strong),
                   direct_strong = strong, 
                   releases = releases,
                   strong = strong, 
                   first_release = first_release,
                   last_release = last_update,
                   maintainer = maintainers,
                   downloads = dwn$downloads,
                   repository = repo) |&amp;gt; 
  mutate(type = case_when(direct_dep_XML &amp;amp; direct_dep_RCurl ~ &amp;quot;both&amp;quot;,
                          direct_dep_XML ~ &amp;quot;XML&amp;quot;,
                          direct_dep_RCurl ~ &amp;quot;RCurl&amp;quot;))
rownames(deps) &amp;lt;- NULL
head(deps)&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;colgroup&gt;
&lt;col width=&#34;8%&#34; /&gt;
&lt;col width=&#34;6%&#34; /&gt;
&lt;col width=&#34;7%&#34; /&gt;
&lt;col width=&#34;5%&#34; /&gt;
&lt;col width=&#34;5%&#34; /&gt;
&lt;col width=&#34;6%&#34; /&gt;
&lt;col width=&#34;9%&#34; /&gt;
&lt;col width=&#34;6%&#34; /&gt;
&lt;col width=&#34;6%&#34; /&gt;
&lt;col width=&#34;4%&#34; /&gt;
&lt;col width=&#34;3%&#34; /&gt;
&lt;col width=&#34;6%&#34; /&gt;
&lt;col width=&#34;5%&#34; /&gt;
&lt;col width=&#34;5%&#34; /&gt;
&lt;col width=&#34;4%&#34; /&gt;
&lt;col width=&#34;5%&#34; /&gt;
&lt;col width=&#34;2%&#34; /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;package&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;direct_dep_XML&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;direct_dep_RCurl&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;first_deps_n&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;deps_all_n&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;first_rdeps_n&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;first_deps_strong_n&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;deps_strong_n&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;direct_strong&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;releases&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;strong&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;first_release&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;last_release&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;maintainer&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;downloads&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;repository&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;AnnotationForge&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;26&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2456&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;5&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;10&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;47&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;2012-02-01&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;8113&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Bioconductor&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;both&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;AnnotationHubData&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;33&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2456&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;26&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;136&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;2015-02-01&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;6619&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Bioconductor&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;both&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;autonomics&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;61&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2499&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;34&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;104&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;2021-02-01&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;91&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Bioconductor&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;RCurl&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;BaseSpaceR&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;6&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2456&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;3&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;2013-02-01&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;218&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Bioconductor&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;RCurl&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;BayesSpace&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;34&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2459&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;24&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;161&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;2020-02-01&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;221&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Bioconductor&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;RCurl&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;BgeeDB&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;19&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2457&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;14&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;71&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;2016-02-01&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;238&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Bioconductor&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;RCurl&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;I added some counts and logical values that might help explore this data.&lt;/p&gt;
&lt;p&gt;We will look at how the &lt;a href=&#34;#distribution-dependencies&#34;&gt;dependent packages are distributed between RCurl and XML&lt;/a&gt; and at some plots for a &lt;a href=&#34;#overview&#34;&gt;quick overview&lt;/a&gt;.&lt;/p&gt;
&lt;div id=&#34;distribution-dependencies&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Distribution dependencies&lt;/h3&gt;
&lt;p&gt;Let’s see how many packages depend on each of them:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;deps |&amp;gt; 
  summarise(Packages = n(), deps = sum(first_deps_n),
            q25 = quantile(deps_all_n, probs = 0.25),
            mean_all = mean(deps_all_n),
            q75 = quantile(deps_all_n, probs = 0.75),
            .by = c(direct_dep_XML, direct_dep_RCurl)) |&amp;gt; 
  arrange(-Packages)&lt;/code&gt;&lt;/pre&gt;
&lt;table style=&#34;width:100%;&#34;&gt;
&lt;colgroup&gt;
&lt;col width=&#34;22%&#34; /&gt;
&lt;col width=&#34;25%&#34; /&gt;
&lt;col width=&#34;13%&#34; /&gt;
&lt;col width=&#34;7%&#34; /&gt;
&lt;col width=&#34;7%&#34; /&gt;
&lt;col width=&#34;13%&#34; /&gt;
&lt;col width=&#34;10%&#34; /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;direct_dep_XML&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;direct_dep_RCurl&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;Packages&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;deps&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;q25&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;mean_all&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;q75&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;235&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;3584&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2456&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2365.596&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2458.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;193&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;3187&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2456&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2320.855&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2460.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;67&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1216&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2456&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2423.119&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2457.5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;There are 42 more packages depending only on XML (235) than only on RCurl (193), and just 67 depend on both.&lt;/p&gt;
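&lt;p&gt;Adding up the groups of the table gives the total number of packages that depend directly on each of them; the counts are copied from the table, and the grand total matches the 495 packages found earlier:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Counts copied from the table above
xml_only &amp;lt;- 235
rcurl_only &amp;lt;- 193
both &amp;lt;- 67
c(XML = xml_only + both, RCurl = rcurl_only + both,
  total = xml_only + rcurl_only + both)
##   XML RCurl total 
##   302   260   495&lt;/code&gt;&lt;/pre&gt;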
&lt;/div&gt;
&lt;div id=&#34;overview&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;We can plot some variables to get a quick overview of the packages:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;ggplot2&amp;quot;)
library(&amp;quot;ggrepel&amp;quot;)
deps_wo &amp;lt;- filter(deps, !package %in% c(&amp;quot;XML&amp;quot;, &amp;quot;RCurl&amp;quot;))
deps_wo |&amp;gt; 
  ggplot() +
  geom_point(aes(first_deps_n, downloads, shape = type)) +
  geom_text_repel(aes(first_deps_n, downloads, label = package),
                  data = filter(deps_wo, first_deps_n &amp;gt; 40 | downloads &amp;gt; 10^5)) +
  theme_minimal() +
  scale_y_log10(labels = scales::label_log()) +
  labs(title = &amp;quot;Packages and downloads&amp;quot;, 
       x = &amp;quot;Direct dependencies&amp;quot;, y = &amp;quot;Downloads&amp;quot;, size = &amp;quot;Packages&amp;quot;)
## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:plot1&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/plot1-1.png&#34; alt=&#34;Direct dependencies vs downloads. Many packages have up to 50 direct dependencies and most have fewer than 1000 downloads in a month.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: Direct dependencies vs downloads. Many packages have up to 50 direct dependencies and most have fewer than 1000 downloads in a month.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There is an outlier in Figure &lt;a href=&#34;#fig:plot1&#34;&gt;1&lt;/a&gt;: the mlr package has more than 10k downloads and close to 120 direct dependencies, but fewer than 15 strong dependencies!&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;deps_wo |&amp;gt; 
  ggplot() +
  geom_point(aes(first_deps_n, first_rdeps_n, shape = type)) +
  geom_text_repel(aes(first_deps_n, first_rdeps_n, label = package),
                  data = filter(deps_wo, first_deps_n &amp;gt; 60 | first_rdeps_n &amp;gt; 50)) +
  theme_minimal() +
  scale_y_log10(labels = scales::label_log()) +
  labs(title = &amp;quot;Few dependencies but lots of dependents&amp;quot;,
    x = &amp;quot;Direct dependencies&amp;quot;, y = &amp;quot;Depend on them&amp;quot;, size = &amp;quot;Packages&amp;quot;)
## Warning: Transformation introduced infinite values in continuous y-axis
## Transformation introduced infinite values in continuous y-axis&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:plot2&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/plot2-1.png&#34; alt=&#34;Dependencies vs packages that depend on them. &#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 2: Dependencies vs packages that depend on them.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In general, though, the packages that many others depend on have fewer direct dependencies.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;ggplot2&amp;quot;)
library(&amp;quot;ggrepel&amp;quot;)
deps_wo &amp;lt;- filter(deps, !package %in% c(&amp;quot;XML&amp;quot;, &amp;quot;RCurl&amp;quot;))
deps_wo |&amp;gt; 
  ggplot() +
  geom_vline(xintercept = 20, linetype = 2) +
  geom_point(aes(first_deps_strong_n, downloads, shape = repository)) +
  geom_text_repel(aes(first_deps_strong_n, downloads, label = package),
                  data = filter(deps_wo, first_deps_strong_n &amp;gt; 20 | downloads &amp;gt; 10^5)) +
  theme_minimal() +
  scale_y_log10(labels = scales::label_log()) +
  labs(title = &amp;quot;Packages and downloads&amp;quot;, 
       x = &amp;quot;Direct strong dependencies&amp;quot;, y = &amp;quot;Downloads&amp;quot;, shape = &amp;quot;Repository&amp;quot;)
## Warning: ggrepel: 20 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:plot3&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/plot3-1.png&#34; alt=&#34;Direct strong dependencies vs downloads. Many packages have more than 20 direct strong dependencies.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 3: Direct strong dependencies vs downloads. Many packages have more than 20 direct strong dependencies.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;One observable effect is that many packages do not comply with the current CRAN limit of 20 strong dependencies (as &lt;a href=&#34;https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#index-_005fR_005fCHECK_005fEXCESSIVE_005fIMPORTS_005f&#34;&gt;described in R-internals&lt;/a&gt;).
This suggests that these CRAN packages are old or that this limit is not checked when packages are updated.&lt;/p&gt;
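&lt;p&gt;To make the limit concrete, here is a minimal sketch that counts the strong dependencies (Depends, Imports and LinkingTo) of one package from its DESCRIPTION fields. The field values and the &lt;code&gt;parse_deps()&lt;/code&gt; helper are hypothetical. The exact rules of the CRAN check (for example, whether base packages such as methods are excluded) are the ones described in R-internals; this sketch simply counts every package name except R itself.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical DESCRIPTION fields of a single package
fields &amp;lt;- c(Depends = &amp;quot;R (&amp;gt;= 4.0.0), methods&amp;quot;,
            Imports = &amp;quot;XML (&amp;gt;= 3.0), RCurl, stats&amp;quot;,
            LinkingTo = &amp;quot;Rcpp&amp;quot;)
# Hypothetical helper: split a field on commas, drop version
# requirements such as (&amp;gt;= 3.0) and the R entry
parse_deps &amp;lt;- function(x) {
  pkgs &amp;lt;- trimws(unlist(strsplit(x, &amp;quot;,&amp;quot;)))
  pkgs &amp;lt;- sub(&amp;quot;\\s*\\(.*\\)$&amp;quot;, &amp;quot;&amp;quot;, pkgs)
  setdiff(pkgs[nzchar(pkgs)], &amp;quot;R&amp;quot;)
}
# Unique strong dependencies across the three fields
strong &amp;lt;- unique(unlist(lapply(fields, parse_deps), use.names = FALSE))
strong
# Would this package go over the limit of 20?
length(strong) &amp;gt; 20&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For packages in a repository, &lt;code&gt;tools::package_dependencies()&lt;/code&gt; with &lt;code&gt;which = &amp;quot;strong&amp;quot;&lt;/code&gt; builds this kind of list from a database such as the one returned by &lt;code&gt;available.packages()&lt;/code&gt;.&lt;/p&gt;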
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;data_maintainers &amp;lt;- deps_wo |&amp;gt; 
  filter(!is.na(maintainer)) |&amp;gt; 
  summarize(n = n(), downloads = sum(downloads), .by = maintainer)
data_maintainers |&amp;gt; 
  ggplot() +
  geom_point(aes(n, downloads)) +
  geom_text_repel(aes(n, downloads, label = maintainer),
                  data = filter(data_maintainers, n &amp;gt; 2 | downloads &amp;gt; 10^4)) +
  scale_y_log10(labels = scales::label_log()) +
  scale_x_continuous(breaks = 1:10, minor_breaks = NULL) +
  theme_minimal() +
  labs(title = &amp;quot;CRAN maintainers that depend on XML and RCurl&amp;quot;,
       x = &amp;quot;Packages&amp;quot;, y = &amp;quot;Downloads&amp;quot;)
## Warning: ggrepel: 15 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:plot-maintainers&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/plot-maintainers-1.png&#34; alt=&#34;Looking at maintainers and the number of downloads they have.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 4: Looking at maintainers and the number of downloads they have.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Most maintainers have few packages, some of them highly used, while a few maintainers have many packages that are relatively highly used.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;finding-important-packages&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Finding important packages&lt;/h3&gt;
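&lt;p&gt;We can use a principal component analysis (PCA) of the numeric variables to find which packages are more important.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Numeric columns: first_deps_n, deps_all_n, first_rdeps_n,
# first_deps_strong_n and downloads
cols_pca &amp;lt;- c(4:7, 15)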
pca_all &amp;lt;- prcomp(deps_wo[, cols_pca], scale. = TRUE, center = TRUE)
summary(pca_all)
## Importance of components:
##                          PC1    PC2    PC3     PC4     PC5
## Standard deviation     1.386 1.2478 0.9458 0.65380 0.44846
## Proportion of Variance 0.384 0.3114 0.1789 0.08549 0.04022
## Cumulative Proportion  0.384 0.6954 0.8743 0.95978 1.00000
pca_data &amp;lt;- cbind(pca_all$x, deps_wo)
ggplot(pca_data) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point(aes(PC1, PC2, col = repository, shape = repository)) +
  geom_text_repel(aes(PC1, PC2, label = package), 
                  data = filter(pca_data, abs(PC1) &amp;gt; 2 | abs(PC2) &amp;gt; 2)) +
  theme_minimal() +
  theme(axis.text = element_blank()) +
  labs(title = &amp;quot;PCA of the numeric variables&amp;quot;)
## Warning: ggrepel: 58 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:pca-all&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/pca-all-1.png&#34; alt=&#34;PCA of all packages.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 5: PCA of all packages.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The first principal component (PC1) separates the packages that depend on many packages, while the second (PC2) separates the packages with many downloads and many reverse dependencies, as the &lt;code&gt;rotation&lt;/code&gt; shows:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pca_all$rotation[, 1:2]&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;PC1&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;PC2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;first_deps_n&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.6521642&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.1528947&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;deps_all_n&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.3304698&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.0549046&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;first_rdeps_n&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.1235972&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.6948659&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;first_deps_strong_n&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.6606765&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.0750116&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;downloads&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.1170554&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.6965223&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
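&lt;p&gt;To read this table: each column of the &lt;code&gt;rotation&lt;/code&gt; holds the weights (loadings) that turn the centred and scaled variables into one component, and the proportion of variance of a component is its squared standard deviation divided by the total. The sketch below checks both facts on R’s built-in &lt;code&gt;USArrests&lt;/code&gt; data, not on the packages data of this post. Because the variables are scaled, their variances add up to the number of variables, which is why the five standard deviations in the summary above square to a total of about 5.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Built-in example data, not the packages data of this post
pca &amp;lt;- prcomp(USArrests, scale. = TRUE, center = TRUE)
# Proportion of variance: squared sdev over the total variance
prop_var &amp;lt;- pca$sdev^2 / sum(pca$sdev^2)
round(prop_var, 4)
# With scaled variables the total variance equals the number of variables
sum(pca$sdev^2)
# Scores are the centred and scaled data multiplied by the rotation
scores &amp;lt;- scale(USArrests) %*% pca$rotation
all.equal(unname(scores), unname(pca$x))&lt;/code&gt;&lt;/pre&gt;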
&lt;p&gt;More importantly, Figure &lt;a href=&#34;#fig:pca-all&#34;&gt;5&lt;/a&gt; labels the packages that stand out: RUnit, markdown and rgeos have a high number of downloads and many packages depend on them one way or another.&lt;/p&gt;
&lt;p&gt;However, we can focus on the packages that would not work without RCurl or XML (those with a strong dependency on them):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pca_strong &amp;lt;- prcomp(deps_wo[deps_wo$strong, cols_pca], 
                     scale. = TRUE, center = TRUE)
summary(pca_strong)
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5
## Standard deviation     1.4198 1.3005 0.9373 0.49421 0.41258
## Proportion of Variance 0.4032 0.3382 0.1757 0.04885 0.03404
## Cumulative Proportion  0.4032 0.7414 0.9171 0.96596 1.00000
pca_data_strong &amp;lt;- cbind(pca_strong$x, deps_wo[deps_wo$strong, ])
ggplot(pca_data_strong) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point(aes(PC1, PC2, col = repository, shape = repository)) +
  geom_text_repel(aes(PC1, PC2, label = package),
                  data = filter(pca_data_strong, abs(PC1) &amp;gt; 2 | abs(PC2) &amp;gt; 2)) +
  theme_minimal() +
  theme(axis.text = element_blank()) +
  labs(title = &amp;quot;Important packages depending on XML and RCurl&amp;quot;, 
       subtitle = &amp;quot;PCA of numeric variables of strong dependencies&amp;quot;,
       col = &amp;quot;Repository&amp;quot;, shape = &amp;quot;Repository&amp;quot;)
## Warning: ggrepel: 42 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:pca-strong&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/pca-strong-1.png&#34; alt=&#34;PCA of packages with strong dependency to XML or RCurl.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 6: PCA of packages with strong dependency to XML or RCurl.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The main packages that depend on XML and RCurl are from Bioconductor, followed by mlr and rlist.
rlist depends on XML but only uses 3 functions from it.
mlr uses 5 different functions from XML.&lt;/p&gt;
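&lt;p&gt;Counts like these can be approximated by searching the R code of a package for explicit &lt;code&gt;pkg::fun&lt;/code&gt; calls. The sketch below uses a hypothetical helper, &lt;code&gt;pkg_functions_used()&lt;/code&gt;, on made-up lines of code; it is not necessarily how the numbers above were obtained, and it misses functions imported through the NAMESPACE file and called without the prefix.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical helper: unique function names called as pkg::fun or pkg:::fun
pkg_functions_used &amp;lt;- function(code, pkg) {
  pattern &amp;lt;- paste0(&amp;quot;\\b&amp;quot;, pkg, &amp;quot;:::?[A-Za-z.][A-Za-z0-9._]*&amp;quot;)
  hits &amp;lt;- unlist(regmatches(code, gregexpr(pattern, code, perl = TRUE)))
  # Drop the pkg:: prefix and keep each function once
  sort(unique(sub(&amp;quot;^.*:::?&amp;quot;, &amp;quot;&amp;quot;, hits)))
}
# Made-up lines of code from a package
code &amp;lt;- c(&amp;quot;doc &amp;lt;- XML::xmlParse(txt, asText = TRUE)&amp;quot;,
          &amp;quot;vals &amp;lt;- XML::xpathSApply(doc, path, XML::xmlValue)&amp;quot;,
          &amp;quot;again &amp;lt;- XML::xmlParse(other)&amp;quot;)
pkg_functions_used(code, &amp;quot;XML&amp;quot;)
# returns c(&amp;quot;xmlParse&amp;quot;, &amp;quot;xmlValue&amp;quot;, &amp;quot;xpathSApply&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Applied to every file of the &lt;code&gt;R/&lt;/code&gt; folder of a package source (read with &lt;code&gt;readLines()&lt;/code&gt;), this would give a first approximation.&lt;/p&gt;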
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pca_weak &amp;lt;- prcomp(deps_wo[!deps_wo$strong, cols_pca], 
                   scale. = TRUE, center = TRUE)
summary(pca_weak)
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5
## Standard deviation     1.4500 1.1578 0.9901 0.63980 0.40895
## Proportion of Variance 0.4205 0.2681 0.1960 0.08187 0.03345
## Cumulative Proportion  0.4205 0.6886 0.8847 0.96655 1.00000
pca_data_weak &amp;lt;- cbind(pca_weak$x, deps_wo[!deps_wo$strong, ])
ggplot(pca_data_weak) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point(aes(PC1, PC2, col = type, shape = type)) +
  geom_text_repel(aes(PC1, PC2, label = package), 
                  data = filter(pca_data_weak, abs(PC1) &amp;gt; 2 | abs(PC2) &amp;gt; 2)) +
  theme_minimal() +
  theme(axis.text = element_blank()) +
  labs(title = &amp;quot;PCA of packages in CRAN&amp;quot;, col = &amp;quot;Type&amp;quot;, shape = &amp;quot;Type&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:pca-weak&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/pca-weak-1.png&#34; alt=&#34;Packages with weak dependency to XML or RCurl.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 7: Packages with weak dependency to XML or RCurl.
&lt;/p&gt;
&lt;/div&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;keep &amp;lt;- deps_wo$repository == &amp;quot;CRAN&amp;quot; &amp;amp; deps_wo$strong
pca_cran &amp;lt;- prcomp(deps_wo[keep, cols_pca], 
                     scale. = TRUE, center = TRUE)
summary(pca_cran)
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5
## Standard deviation     1.4174 1.3060 0.9244 0.51813 0.40278
## Proportion of Variance 0.4018 0.3412 0.1709 0.05369 0.03245
## Cumulative Proportion  0.4018 0.7430 0.9139 0.96755 1.00000
pca_data_strong &amp;lt;- cbind(pca_cran$x, deps_wo[keep, ])
ggplot(pca_data_strong) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point(aes(PC1, PC2, col = type, shape = type)) +
  geom_text_repel(aes(PC1, PC2, label = package),
                  data = filter(pca_data_strong, abs(PC1) &amp;gt; 2 | abs(PC2) &amp;gt; 2)) +
  theme_minimal() +
  theme(axis.text = element_blank()) +
  labs(title = &amp;quot;Packages in CRAN&amp;quot;, 
       col = &amp;quot;Type&amp;quot;, shape = &amp;quot;Type&amp;quot;)
## Warning: ggrepel: 26 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:pca-cran&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/pca-cran-1.png&#34; alt=&#34;PCA of packages on CRAN.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 8: PCA of packages on CRAN.
&lt;/p&gt;
&lt;/div&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;keep &amp;lt;- deps_wo$repository == &amp;quot;Bioconductor&amp;quot;  &amp;amp; deps_wo$strong
pca_bioc &amp;lt;- prcomp(deps_wo[keep, cols_pca], 
                     scale. = TRUE, center = TRUE)
summary(pca_bioc)
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5
## Standard deviation     1.4913 1.3703 0.8495 0.33584 0.25281
## Proportion of Variance 0.4448 0.3755 0.1443 0.02256 0.01278
## Cumulative Proportion  0.4448 0.8203 0.9647 0.98722 1.00000
pca_data_strong &amp;lt;- cbind(pca_bioc$x, deps_wo[keep, ])
ggplot(pca_data_strong) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point(aes(PC1, PC2, col = type, shape = type)) +
  geom_text_repel(aes(PC1, PC2, label = package),
                  data = filter(pca_data_strong, abs(PC1) &amp;gt; 2 | abs(PC2) &amp;gt; 2)) +
  theme_minimal() +
  theme(axis.text = element_blank()) +
  labs(title = &amp;quot;Packages in Bioconductor&amp;quot;, 
       subtitle = &amp;quot;PCA of numeric variables of strong dependencies&amp;quot;,
       col = &amp;quot;Type&amp;quot;, shape = &amp;quot;Type&amp;quot;)
## Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:pca-bioc&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/pca-bioc-1.png&#34; alt=&#34;PCA of packages on Bioconductor.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 9: PCA of packages on Bioconductor.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;GenomeInfoDb seems to be the most important of these packages, and it only uses the &lt;code&gt;RCurl::getURL&lt;/code&gt; function.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;outro&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Outro&lt;/h2&gt;
&lt;p&gt;I wanted to explore a bit how these packages got into this position &lt;a href=&#34;#fn3&#34; class=&#34;footnote-ref&#34; id=&#34;fnref3&#34;&gt;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;deps |&amp;gt; 
  filter(strong) |&amp;gt; 
  ggplot() +
  geom_vline(xintercept = as.Date(&amp;quot;2013-06-15&amp;quot;), linetype = 2) +
  geom_point(aes(first_release, downloads, col = type, shape = type, 
                 size = first_deps_strong_n)) +
  geom_label(aes(first_release, downloads, label = package),
             data = filter(deps, package %in% c(&amp;quot;XML&amp;quot;, &amp;quot;RCurl&amp;quot;)), show.legend = FALSE) +
  theme_minimal() +
  scale_y_log10(labels = scales::label_log()) +
  annotate(&amp;quot;text&amp;quot;, x = as.Date(&amp;quot;2014-6-15&amp;quot;), y = 5*10^5, 
           label = &amp;quot;CRAN maintained&amp;quot;, hjust = 0) +
  labs(x = &amp;quot;Release date&amp;quot;, y = &amp;quot;Downloads&amp;quot;, 
       title = &amp;quot;More packages added after CRAN maintenance than before&amp;quot;,
       subtitle = &amp;quot;Release date and downloads&amp;quot;,
       col = &amp;quot;Depends on&amp;quot;, shape = &amp;quot;Depends on&amp;quot;, size = &amp;quot;Direct strong dependencies&amp;quot;) 
## Warning: Removed 34 rows containing missing values (`geom_point()`).&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:deps-time&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/deps-time-1.png&#34; alt=&#34;First release of packages in relation to the maintenance by CRAN of XML and RCurl.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 10: First release of packages in relation to the maintenance by CRAN of XML and RCurl.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The CRAN team has now been maintaining these packages for almost as long as their previous maintainer(s) did.&lt;/p&gt;
&lt;p&gt;Next, we count the packages first released before and after CRAN started maintaining XML and RCurl:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;summarize(deps_wo,
          before = sum(first_release &amp;lt;= as.Date(&amp;quot;2013-06-15&amp;quot;), na.rm = TRUE), 
          later = sum(first_release &amp;gt; as.Date(&amp;quot;2013-06-15&amp;quot;), na.rm = TRUE),
          .by = type)&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;type&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;before&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;later&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;both&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;14&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;52&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;RCurl&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;21&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;150&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;XML&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;63&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;156&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;More packages depending on XML and RCurl have been released since CRAN started maintaining them than before.
Maybe package authors trusted the CRAN team with their dependencies, or there was no other alternative for the functionality.
This might also be explained by the growth of CRAN (and Bioconductor), with more packages being added each day.
However, this places further pressure on the CRAN team to maintain those packages. Removing this burden might give them more free time or more time to dedicate to CRAN.&lt;/p&gt;
&lt;p&gt;A replacement for XML could be &lt;a href=&#34;https://cran.r-project.org/package=xml2&#34;&gt;xml2&lt;/a&gt;, first released in 2015 (which uses the same system dependency libxml2).&lt;br /&gt;
A replacement for RCurl could be &lt;a href=&#34;https://cran.r-project.org/package=curl&#34;&gt;curl&lt;/a&gt;, first released at the end of 2014 (which uses the same system dependency libcurl).&lt;/p&gt;
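&lt;p&gt;As a minimal sketch of what such a migration could look like, assuming xml2 is installed, here is an XPath query written with xml2, with the XML equivalent in comments. The small XML document is made up for the example, and whether xml2 covers every XML function a package uses has to be checked case by case.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;xml2&amp;quot;)
# Made-up XML document
txt &amp;lt;- &amp;quot;&amp;lt;pkgs&amp;gt;&amp;lt;pkg&amp;gt;XML&amp;lt;/pkg&amp;gt;&amp;lt;pkg&amp;gt;RCurl&amp;lt;/pkg&amp;gt;&amp;lt;/pkgs&amp;gt;&amp;quot;
# xml2: parse the string and select nodes with XPath
doc &amp;lt;- read_xml(txt)
nodes &amp;lt;- xml_find_all(doc, &amp;quot;//pkg&amp;quot;)
xml_text(nodes)
# returns c(&amp;quot;XML&amp;quot;, &amp;quot;RCurl&amp;quot;)
# With XML, the same query would be:
# doc &amp;lt;- XML::xmlParse(txt, asText = TRUE)
# XML::xpathSApply(doc, &amp;quot;//pkg&amp;quot;, XML::xmlValue)&lt;/code&gt;&lt;/pre&gt;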
&lt;p&gt;Until their release there were no other replacements for these packages (if there are other packages, please let me know).
It is not clear to me whether those packages could replace XML and RCurl at their first release.&lt;/p&gt;
&lt;p&gt;This highlights the importance of properly replacing packages in the community.
A recent example is the effort of the &lt;a href=&#34;https://r-spatial.org/&#34;&gt;spatial community&lt;/a&gt; led by Roger Bivand and Edzer Pebesma,
where packages have been carefully designed and planned to replace older packages that are going to be retired soon.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;recomendations&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Recommendations&lt;/h1&gt;
&lt;p&gt;My final recommendations are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Disentangle the XML and RCurl circular dependency.&lt;/li&gt;
&lt;li&gt;Evaluate whether the xml2 and curl packages provide enough functionality to replace XML and RCurl, respectively.
If not, see what should be added to these packages or develop alternative packages to fill the gap if needed.&lt;br /&gt;
Perhaps documentation about the alternatives to XML and RCurl could be written to ease the transition and to check whether these packages cover the needed functionality.&lt;/li&gt;
&lt;li&gt;Contact the maintainers seen in Figure &lt;a href=&#34;#fig:plot-maintainers&#34;&gt;4&lt;/a&gt; and the maintainers of the packages seen in Figures &lt;a href=&#34;#fig:pca-all&#34;&gt;5&lt;/a&gt;, &lt;a href=&#34;#fig:pca-strong&#34;&gt;6&lt;/a&gt;, &lt;a href=&#34;#fig:pca-cran&#34;&gt;8&lt;/a&gt; and &lt;a href=&#34;#fig:pca-bioc&#34;&gt;9&lt;/a&gt; so that they replace the functionality they currently get from XML and RCurl.&lt;/li&gt;
&lt;li&gt;Set deprecation warnings on the XML and RCurl packages.&lt;/li&gt;
&lt;li&gt;Archive XML and RCurl packages in CRAN.&lt;/li&gt;
&lt;/ul&gt;
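&lt;p&gt;For the deprecation step, base R provides &lt;code&gt;.Deprecated()&lt;/code&gt;. A minimal sketch, where &lt;code&gt;old_get()&lt;/code&gt; and the package name &lt;code&gt;mypkg&lt;/code&gt; are hypothetical stand-ins for a function of a package being retired:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical function of a package that is being retired
old_get &amp;lt;- function(url) {
  # Warn users and point them to a replacement
  .Deprecated(new = &amp;quot;curl::curl_fetch_memory&amp;quot;, package = &amp;quot;mypkg&amp;quot;)
  url
}
# Capture the warning message instead of printing it
msg &amp;lt;- tryCatch(old_get(&amp;quot;https://example.org&amp;quot;),
                warning = function(w) conditionMessage(w))
cat(msg)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The function keeps working, so existing code does not break, while every call reminds users which replacement to move to.&lt;/p&gt;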
&lt;p&gt;This might take years of moving packages around, but I am confident that once the word is out, package developers will avoid XML and RCurl, and maintainers whose packages currently depend on them will replace them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;On 2024/01/22 the &lt;a href=&#34;https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010359.html&#34;&gt;CRAN team asked for a maintainer of XML&lt;/a&gt;.&lt;/p&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;details&gt;
&lt;pre&gt;&lt;code&gt;## - Session info ---------------------------------------------------------------
##  setting  value
##  version  R version 4.3.1 (2023-06-16)
##  os       Ubuntu 22.04.3 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  C
##  ctype    C
##  tz       Europe/Madrid
##  date     2024-01-22
##  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## - Packages -------------------------------------------------------------------
##  package       * version     date (UTC) lib source
##  Biobase         2.62.0      2023-10-24 [1] Bioconductor
##  BiocFileCache   2.10.1      2023-10-26 [1] Bioconductor
##  BiocGenerics    0.48.1      2023-11-01 [1] Bioconductor
##  BiocManager     1.30.22     2023-08-08 [1] CRAN (R 4.3.1)
##  BiocPkgTools  * 1.20.0      2023-10-24 [1] Bioconductor
##  biocViews       1.70.0      2023-10-24 [1] Bioconductor
##  bit             4.0.5       2022-11-15 [1] CRAN (R 4.3.1)
##  bit64           4.0.5       2020-08-30 [1] CRAN (R 4.3.1)
##  bitops          1.0-7       2021-04-24 [1] CRAN (R 4.3.1)
##  blob            1.2.4       2023-03-17 [1] CRAN (R 4.3.1)
##  blogdown        1.18        2023-06-19 [1] CRAN (R 4.3.1)
##  bookdown        0.37        2023-12-01 [1] CRAN (R 4.3.1)
##  bslib           0.6.1       2023-11-28 [1] CRAN (R 4.3.1)
##  cachem          1.0.8       2023-05-01 [1] CRAN (R 4.3.1)
##  cli             3.6.2       2023-12-11 [1] CRAN (R 4.3.1)
##  codetools       0.2-19      2023-02-01 [2] CRAN (R 4.3.1)
##  colorspace      2.1-0       2023-01-23 [1] CRAN (R 4.3.1)
##  cranlogs      * 2.1.1       2019-04-29 [1] CRAN (R 4.3.1)
##  crul            1.4.0       2023-05-17 [1] CRAN (R 4.3.1)
##  curl            5.2.0       2023-12-08 [1] CRAN (R 4.3.1)
##  DBI             1.2.1       2024-01-12 [1] CRAN (R 4.3.1)
##  dbplyr          2.4.0       2023-10-26 [1] CRAN (R 4.3.2)
##  digest          0.6.34      2024-01-11 [1] CRAN (R 4.3.1)
##  dplyr         * 1.1.4       2023-11-17 [1] CRAN (R 4.3.1)
##  DT              0.31        2023-12-09 [1] CRAN (R 4.3.1)
##  evaluate        0.23        2023-11-01 [1] CRAN (R 4.3.2)
##  fansi           1.0.6       2023-12-08 [1] CRAN (R 4.3.1)
##  farver          2.1.1       2022-07-06 [1] CRAN (R 4.3.1)
##  fastmap         1.1.1       2023-02-24 [1] CRAN (R 4.3.1)
##  fauxpas         0.5.2       2023-05-03 [1] CRAN (R 4.3.1)
##  filelock        1.0.3       2023-12-11 [1] CRAN (R 4.3.1)
##  generics        0.1.3       2022-07-05 [1] CRAN (R 4.3.1)
##  ggplot2       * 3.4.4       2023-10-12 [1] CRAN (R 4.3.1)
##  ggrepel       * 0.9.5       2024-01-10 [1] CRAN (R 4.3.1)
##  gh              1.4.0       2023-02-22 [1] CRAN (R 4.3.1)
##  glue            1.7.0       2024-01-09 [1] CRAN (R 4.3.1)
##  graph           1.80.0      2023-10-24 [1] Bioconductor
##  gtable          0.3.4       2023-08-21 [1] CRAN (R 4.3.1)
##  highr           0.10        2022-12-22 [1] CRAN (R 4.3.1)
##  hms             1.1.3       2023-03-21 [1] CRAN (R 4.3.1)
##  htmltools       0.5.7       2023-11-03 [1] CRAN (R 4.3.2)
##  htmlwidgets   * 1.6.4       2023-12-06 [1] CRAN (R 4.3.1)
##  httpcode        0.3.0       2020-04-10 [1] CRAN (R 4.3.1)
##  httr            1.4.7       2023-08-15 [1] CRAN (R 4.3.1)
##  igraph          1.6.0       2023-12-11 [1] CRAN (R 4.3.1)
##  jquerylib       0.1.4       2021-04-26 [1] CRAN (R 4.3.1)
##  jsonlite        1.8.8       2023-12-04 [1] CRAN (R 4.3.1)
##  knitr         * 1.45        2023-10-30 [1] CRAN (R 4.3.2)
##  labeling        0.4.3       2023-08-29 [1] CRAN (R 4.3.2)
##  lifecycle       1.0.4       2023-11-07 [1] CRAN (R 4.3.2)
##  magrittr        2.0.3       2022-03-30 [1] CRAN (R 4.3.1)
##  memoise         2.0.1       2021-11-26 [1] CRAN (R 4.3.1)
##  munsell         0.5.0       2018-06-12 [1] CRAN (R 4.3.1)
##  pillar          1.9.0       2023-03-22 [1] CRAN (R 4.3.1)
##  pkgconfig       2.0.3       2019-09-22 [1] CRAN (R 4.3.1)
##  purrr           1.0.2       2023-08-10 [1] CRAN (R 4.3.1)
##  R6              2.5.1       2021-08-19 [1] CRAN (R 4.3.1)
##  RBGL            1.78.0      2023-10-24 [1] Bioconductor
##  Rcpp            1.0.12      2024-01-09 [1] CRAN (R 4.3.1)
##  RCurl           1.98-1.14   2024-01-09 [1] CRAN (R 4.3.1)
##  readr           2.1.5       2024-01-10 [1] CRAN (R 4.3.1)
##  rlang           1.1.3       2024-01-10 [1] CRAN (R 4.3.1)
##  rmarkdown       2.25        2023-09-18 [1] CRAN (R 4.3.1)
##  rorcid          0.7.0       2021-01-20 [1] CRAN (R 4.3.1)
##  RSQLite         2.3.5       2024-01-21 [1] CRAN (R 4.3.1)
##  rstudioapi      0.15.0      2023-07-07 [1] CRAN (R 4.3.1)
##  RUnit           0.4.32      2018-05-18 [1] CRAN (R 4.3.1)
##  rvest           1.0.3       2022-08-19 [1] CRAN (R 4.3.1)
##  sass            0.4.8       2023-12-06 [1] CRAN (R 4.3.1)
##  scales          1.3.0       2023-11-28 [1] CRAN (R 4.3.1)
##  sessioninfo     1.2.2       2021-12-06 [1] CRAN (R 4.3.1)
##  stringi         1.8.3       2023-12-11 [1] CRAN (R 4.3.1)
##  stringr         1.5.1       2023-11-14 [1] CRAN (R 4.3.1)
##  tibble          3.2.1       2023-03-20 [1] CRAN (R 4.3.1)
##  tidyselect      1.2.0       2022-10-10 [1] CRAN (R 4.3.1)
##  tzdb            0.4.0       2023-05-12 [1] CRAN (R 4.3.1)
##  utf8            1.2.4       2023-10-22 [1] CRAN (R 4.3.2)
##  vctrs           0.6.5       2023-12-01 [1] CRAN (R 4.3.1)
##  whisker         0.4.1       2022-12-05 [1] CRAN (R 4.3.1)
##  withr           3.0.0       2024-01-16 [1] CRAN (R 4.3.1)
##  xfun            0.41        2023-11-01 [1] CRAN (R 4.3.2)
##  XML             3.99-0.16.1 2024-01-22 [1] CRAN (R 4.3.1)
##  xml2            1.3.6       2023-12-04 [1] CRAN (R 4.3.1)
##  yaml            2.3.8       2023-12-11 [1] CRAN (R 4.3.1)
## 
##  [1] /home/lluis/bin/R/4.3.1
##  [2] /opt/R/4.3.1/lib/R/library
## 
## ------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;footnotes footnotes-end-of-document&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;The &lt;code&gt;maintainer&lt;/code&gt; function only works for installed packages, and I don’t have all these packages installed.&lt;a href=&#34;#fnref1&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn2&#34;&gt;&lt;p&gt;Both logs only count downloads from their own repository, not from other mirrors or installation methods (RSPM, bspm, r2u, …).&lt;a href=&#34;#fnref2&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn3&#34;&gt;&lt;p&gt;I recently learned this word as the opposite of introduction/intro.&lt;a href=&#34;#fnref3&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Los paquetes van a CRAN</title>
      <link>https://llrs.dev/talk/los-paquetes-van-a-cran/</link>
      <pubDate>Fri, 25 Nov 2022 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/talk/los-paquetes-van-a-cran/</guid>
      <description>


&lt;p&gt;The “XII Jornadas de R y I congreso de R Hispano” conference is the meeting point of useRs in Spain, bringing together people from around the country with different backgrounds, such as mathematicians, engineers and forest engineers.
All the talks were in Spanish, although some of the contents were in English.
I submitted this abstract in Spanish (translated here into English):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Sharing our work with the R community usually means submitting a package to
an archive (CRAN, Bioconductor or others). CRAN is the largest repository of R packages and the one R accepts
by default. But what does it take to write a package that CRAN accepts and that stays
on CRAN?&lt;br /&gt;
Whether a package stays on CRAN depends on its quality, because it has to
pass a review process. If the package follows the rules and its quality meets their
criteria, it will last. First, there is an automatic initial check; second, a deeper manual review
of the code. Then, if the suggestions are applied or answered correctly, the
package is included in the archive.&lt;br /&gt;
At each step some rules and criteria are used to decide whether the package moves forward or not. Understanding
what these rules say, the common problems and the reviewers’ comments will help avoid
submitting a package only to have it rejected. The goal is to reduce the friction between sharing our work,
providing useful packages to the community and minimizing the reviewers’ time and effort.&lt;br /&gt;
Using historical data we will look at the usual process, the waiting time until inclusion,
the usual number of reviews before acceptance and the success rate. We will also take
a historical tour of CRAN packages: how long a version lasts on CRAN, the relation
between versions and dependencies, and the usual number of new packages. The aim is to see which
characteristics our package has to meet to be included, so that other users can use
our code with quality guarantees.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It was accepted as a 5-minute flash presentation in a parallel session focused on programming and teaching R.
The room was full and people showed their interest before and after the talk, especially in how easy it would be to keep a package on CRAN or Bioconductor.
If you have other questions let me know!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Exploring CRAN&#39;s files: part 2</title>
      <link>https://llrs.dev/post/2022/07/28/cran-files-2/</link>
      <pubDate>Thu, 28 Jul 2022 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/post/2022/07/28/cran-files-2/</guid>
      <description>


&lt;div id=&#34;introduction&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In the &lt;a href=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/&#34;&gt;first post&lt;/a&gt; of the series we briefly explored packages available on CRAN.
Now I’ll focus on the history of the packages and their size, using the following files:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;packages &amp;lt;- tools::CRAN_package_db()
current &amp;lt;- tools:::CRAN_current_db()
archive &amp;lt;- tools:::CRAN_archive_db()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this part we will use two of them, &lt;code&gt;current&lt;/code&gt; and &lt;code&gt;archive&lt;/code&gt;; let’s see why.&lt;/p&gt;
&lt;div id=&#34;current-file&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;current file&lt;/h3&gt;
&lt;p&gt;The current database has the package size, the modification dates (I assume the modification time is the date the package was added to CRAN) and the user name of who last modified it.
This is the same information returned by &lt;a href=&#34;https://search.r-project.org/R/refmans/base/html/file.info.html&#34;&gt;&lt;code&gt;file.info&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;current[1, 1:10]
##     size isdir mode               mtime               ctime               atime
## A3 42810 FALSE  664 2015-08-16 23:05:54 2022-09-03 12:02:27 2022-09-03 14:00:19
##     uid  gid  uname    grname
## A3 1001 1001 hornik cranadmin&lt;/code&gt;&lt;/pre&gt;
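&lt;p&gt;To see where these columns come from, we can call &lt;code&gt;file.info()&lt;/code&gt; on any local file. This is only a sketch using a temporary file (its name and contents are arbitrary); on a Unix-alike it returns the same columns as the current database, while on Windows the user and group columns are replaced by &lt;code&gt;exe&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Write a throwaway file and inspect its metadata
tmp &amp;lt;- tempfile(fileext = &amp;quot;.tar.gz&amp;quot;)
writeLines(&amp;quot;placeholder&amp;quot;, tmp)
info &amp;lt;- file.info(tmp)
# size is in bytes; mtime, ctime and atime are POSIXct date-times
colnames(info)
info$size&lt;/code&gt;&lt;/pre&gt;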
&lt;/div&gt;
&lt;div id=&#34;archive-file&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;archive file&lt;/h3&gt;
&lt;p&gt;The archive database returns the same information but, as the name suggests, not for the current packages: it covers the archived versions, which are no longer available by default. It is a list with one element per package:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;archive[[1]]
##                     size isdir mode               mtime               ctime
## A3/A3_0.9.1.tar.gz 45252 FALSE  664 2013-02-07 10:00:29 2022-08-22 18:14:53
## A3/A3_0.9.2.tar.gz 45907 FALSE  664 2013-03-26 19:58:40 2022-08-22 18:14:53
##                                  atime  uid  gid  uname    grname
## A3/A3_0.9.1.tar.gz 2022-08-22 17:39:50 1001 1001 hornik cranadmin
## A3/A3_0.9.2.tar.gz 2022-08-22 17:39:50 1010 1001 ligges cranadmin&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The dates match those on the &lt;a href=&#34;https://cran.r-project.org/src/contrib/Archive/A3/&#34;&gt;web’s old sources&lt;/a&gt;, so we can be confident of their meaning.&lt;/p&gt;
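&lt;p&gt;The row names encode both the package and the version, so the release history of a package can be recovered from them. A minimal sketch, using a small hand-made data frame that mimics &lt;code&gt;archive[[1]]&lt;/code&gt; above (only the &lt;code&gt;size&lt;/code&gt; and &lt;code&gt;mtime&lt;/code&gt; columns; the name &lt;code&gt;a3&lt;/code&gt; is just for illustration):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Stand-in for archive[[&amp;quot;A3&amp;quot;]], with values taken from the output above
a3 &amp;lt;- data.frame(
  size = c(45252, 45907),
  mtime = as.POSIXct(c(&amp;quot;2013-02-07 10:00:29&amp;quot;, &amp;quot;2013-03-26 19:58:40&amp;quot;), tz = &amp;quot;UTC&amp;quot;),
  row.names = c(&amp;quot;A3/A3_0.9.1.tar.gz&amp;quot;, &amp;quot;A3/A3_0.9.2.tar.gz&amp;quot;)
)
# Row names look like package/package_version.tar.gz:
# keep what is between the last underscore and the extension
versions &amp;lt;- sub(&amp;quot;^.*_(.*)\\.tar\\.gz$&amp;quot;, &amp;quot;\\1&amp;quot;, rownames(a3))
data.frame(version = versions, date = as.Date(a3$mtime))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;R package names cannot contain underscores, so the last underscore always separates the name from the version.&lt;/p&gt;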
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;cran-history&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;CRAN history&lt;/h2&gt;
&lt;p&gt;As we have seen, some files record the archives of CRAN.
They include the date of modification (moving/editing), the user who did it and, of course, the name and sometimes the version of the package.
These archives are one of CRAN’s great treasures because they make it possible to reproduce experiments or analyses run long ago.&lt;/p&gt;
&lt;p&gt;Note that I’m not totally sure that this archive contains the full record of packages; some early packages might be missing.
I’m also aware of some packages removed by CRAN that no longer appear in these records.&lt;/p&gt;
&lt;p&gt;Nevertheless, this should provide an accurate picture of the packages available through time.
However, as these files have no information about when a package was archived (&lt;a href=&#34;https://llrs.dev/post/2021/12/07/reasons-cran-archivals/&#34;&gt;that is in PACKAGES.in&lt;/a&gt;), I might overestimate the number of packages available at any given moment.&lt;/p&gt;
&lt;p&gt;Remember the plot about &lt;a href=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/#accepted&#34;&gt;acceptance of packages on CRAN?&lt;/a&gt;
That plot only looked at the packages currently available; let’s check it with the whole archive:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:accumulative-packages&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/28/cran-files-2/index.en_files/figure-html/accumulative-packages-1.png&#34; alt=&#34;*Packages in the CRAN archive by date of addition.* There are over 125000 archived package versions on CRAN.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: &lt;em&gt;Packages in the CRAN archive by date of addition.&lt;/em&gt; There are over 125000 archived package versions on CRAN.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;These archived versions come both from packages with few releases and from packages with many.
If we look at which packages had the most releases:&lt;/p&gt;
&lt;template id=&#34;41fb6fac-ce02-4889-ac51-217e365f4058&#34;&gt;&lt;style&gt;
.tabwid table{
  border-spacing:0px !important;
  border-collapse:collapse;
  line-height:1;
  margin-left:auto;
  margin-right:auto;
  border-width: 0;
  display: table;
  margin-top: 1.275em;
  margin-bottom: 1.275em;
  border-color: transparent;
}
.tabwid_left table{
  margin-left:0;
}
.tabwid_right table{
  margin-right:0;
}
.tabwid td {
    padding: 0;
}
.tabwid a {
  text-decoration: none;
}
.tabwid thead {
    background-color: transparent;
}
.tabwid tfoot {
    background-color: transparent;
}
.tabwid table tr {
background-color: transparent;
}
.katex-display {
    margin: 0 0 !important;
}
&lt;/style&gt;&lt;div class=&#34;tabwid&#34;&gt;&lt;style&gt;.cl-e305f260{}.cl-e2fc13c6{font-family:&#39;DejaVu Sans&#39;;font-size:11pt;font-weight:normal;font-style:normal;text-decoration:none;color:rgba(0, 0, 0, 1.00);background-color:transparent;}.cl-e2fc2fdc{margin:0;text-align:left;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);padding-bottom:5pt;padding-top:5pt;padding-left:5pt;padding-right:5pt;line-height: 1;background-color:transparent;}.cl-e2fc2fe6{margin:0;text-align:right;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);padding-bottom:5pt;padding-top:5pt;padding-left:5pt;padding-right:5pt;line-height: 1;background-color:transparent;}.cl-e2fc7a46{width:69.7pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7a5a{width:100.6pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7a64{width:100.6pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7a6e{width:69.7pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 
1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7a6f{width:100.6pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7a82{width:69.7pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7a8c{width:100.6pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7a96{width:69.7pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7a97{width:100.6pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7aa0{width:69.7pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7aa1{width:100.6pt;background-color:transparent;vertical-align: 
middle;border-bottom: 2pt solid rgba(102, 102, 102, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7aaa{width:69.7pt;background-color:transparent;vertical-align: middle;border-bottom: 2pt solid rgba(102, 102, 102, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7aab{width:100.6pt;background-color:transparent;vertical-align: middle;border-bottom: 2pt solid rgba(102, 102, 102, 1.00);border-top: 2pt solid rgba(102, 102, 102, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7ab4{width:69.7pt;background-color:transparent;vertical-align: middle;border-bottom: 2pt solid rgba(102, 102, 102, 1.00);border-top: 2pt solid rgba(102, 102, 102, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}&lt;/style&gt;&lt;table class=&#39;cl-e305f260&#39;&gt;
&lt;thead&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7aab&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;package&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7ab4&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;Releases&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a5a&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;spatstat&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a46&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;206&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a97&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;Matrix&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7aa0&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;204&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a6f&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;mgcv&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a82&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;162&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a64&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;RcppArmadillo&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a6e&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;150&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a64&#34;&gt;&lt;p 
class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;rgdal&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a6e&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;146&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a97&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;nlme&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7aa0&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;143&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a8c&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;caret&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a96&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;139&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a64&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;spdep&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a6e&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;139&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a97&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;lattice&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7aa0&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;137&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a64&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;plotrix&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td 
class=&#34;cl-e2fc7a6e&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;131&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a6f&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;sp&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a82&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;128&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a8c&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;XML&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a96&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;126&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a97&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;Rcmdr&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7aa0&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;123&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a97&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;lme4&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7aa0&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;122&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a5a&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;gstat&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a46&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span 
class=&#34;cl-e2fc13c6&#34;&gt;121&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a8c&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;arm&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a96&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;119&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a64&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;foreign&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a6e&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;117&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a5a&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;party&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a46&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;117&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a64&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;maptools&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a6e&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;113&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7aa1&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;raster&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7aaa&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;108&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/template&gt;
&lt;div class=&#34;flextable-shadow-host&#34; id=&#34;c207439a-5643-4e95-950e-721182ef54dd&#34;&gt;&lt;/div&gt;
&lt;script&gt;
var dest = document.getElementById(&#34;c207439a-5643-4e95-950e-721182ef54dd&#34;);
var template = document.getElementById(&#34;41fb6fac-ce02-4889-ac51-217e365f4058&#34;);
var caption = template.content.querySelector(&#34;caption&#34;);
if(caption) {
  caption.style.cssText = &#34;display:block;text-align:center;&#34;;
  var newcapt = document.createElement(&#34;p&#34;);
  newcapt.appendChild(caption)
  dest.parentNode.insertBefore(newcapt, dest.previousSibling);
}
var fantome = dest.attachShadow({mode: &#39;open&#39;});
var templateContent = template.content;
fantome.appendChild(templateContent);
&lt;/script&gt;

&lt;p&gt;Surprisingly, some packages have more than 200 versions on CRAN!&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:release-distribution&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/28/cran-files-2/index.en_files/figure-html/release-distribution-1.png&#34; alt=&#34;*Releases distribution*. Packages and number of releases&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 2: &lt;em&gt;Releases distribution&lt;/em&gt;. Packages and number of releases
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The most common number of releases is 1 and the median is 3, but the mean is around 6.&lt;/p&gt;
&lt;p&gt;Given all these versions of packages, how big is CRAN as a whole?&lt;/p&gt;
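&lt;p&gt;The number of releases of a package can be sketched as its number of archived versions plus one if it is still on CRAN. The objects below are small hypothetical stand-ins shaped like &lt;code&gt;archive&lt;/code&gt; (one data frame per package) and &lt;code&gt;current&lt;/code&gt; (one row per package, named after it as in the output above); &lt;code&gt;oldpkg&lt;/code&gt; and &lt;code&gt;newpkg&lt;/code&gt; are invented names.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;archive &amp;lt;- list(
  A3 = data.frame(size = c(45252, 45907)),
  oldpkg = data.frame(size = 1000)
)
current &amp;lt;- data.frame(size = c(42810, 5000), row.names = c(&amp;quot;A3&amp;quot;, &amp;quot;newpkg&amp;quot;))
pkgs &amp;lt;- union(names(archive), rownames(current))
# Archived versions per package (0 for packages never archived)
archived &amp;lt;- vapply(archive, nrow, integer(1))[pkgs]
archived[is.na(archived)] &amp;lt;- 0L
# Add the current version for packages still on CRAN
releases &amp;lt;- setNames(archived + (pkgs %in% rownames(current)), pkgs)
# Mode, median and mean of the number of releases
c(mode = as.numeric(names(which.max(table(releases)))),
  median = median(releases), mean = mean(releases))&lt;/code&gt;&lt;/pre&gt;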
&lt;div id=&#34;cran-size&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;CRAN size&lt;/h3&gt;
&lt;p&gt;Have you ever wondered how big CRAN is? Adding up the size of all CRAN source packages gives approximately 96.8 GB.&lt;/p&gt;
&lt;p&gt;This doesn’t include the binaries built for multiple architectures and operating systems.
The size of a package might indicate whether it contains a considerable amount of data.&lt;/p&gt;
&lt;p&gt;Looking at the size of the packages over time we can see this pattern:&lt;/p&gt;
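&lt;p&gt;One way to compute such a total is to add the &lt;code&gt;size&lt;/code&gt; column (in bytes) of the current database and of every data frame in the archive database. A minimal sketch with hypothetical stand-ins; treating a gigabyte as 1e9 bytes is an assumption:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical stand-ins with the same shape as current and archive
current &amp;lt;- data.frame(size = c(42810, 5000), row.names = c(&amp;quot;A3&amp;quot;, &amp;quot;newpkg&amp;quot;))
archive &amp;lt;- list(
  A3 = data.frame(size = c(45252, 45907)),
  oldpkg = data.frame(size = 1000)
)
# Bytes of the current tarballs plus bytes of every archived tarball
total_bytes &amp;lt;- sum(current$size) +
  sum(vapply(archive, function(x) sum(x$size), numeric(1)))
# Gigabytes as 1e9 bytes; divide by 1024^3 for gibibytes instead
total_bytes / 1e9&lt;/code&gt;&lt;/pre&gt;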
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:packages-size&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/28/cran-files-2/index.en_files/figure-html/packages-size-1.png&#34; alt=&#34;*Packages and their median size.* Archived packages have become bigger since 2014. Packages on CRAN have been getting bigger since 2017.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 3: &lt;em&gt;Packages and their median size.&lt;/em&gt; Archived packages have become bigger since 2014. Packages on CRAN have been getting bigger since 2017.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Packages available on CRAN are smaller than those no longer on CRAN.
Also, archived versions of packages still on CRAN are usually bigger than their current versions.
The median size of packages is increasing (quickly).&lt;/p&gt;
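&lt;p&gt;A median size per year, split by whether the package is still on CRAN, can be sketched by flattening the archive list into one row per tarball. The two-package &lt;code&gt;archive&lt;/code&gt; and &lt;code&gt;current&lt;/code&gt; below are hypothetical stand-ins, and using the year of &lt;code&gt;mtime&lt;/code&gt; as the release year is an assumption:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;archive &amp;lt;- list(
  A3 = data.frame(size = c(45252, 45907),
                  mtime = as.POSIXct(c(&amp;quot;2013-02-07&amp;quot;, &amp;quot;2013-03-26&amp;quot;), tz = &amp;quot;UTC&amp;quot;)),
  oldpkg = data.frame(size = c(1000, 3000),
                      mtime = as.POSIXct(c(&amp;quot;2013-05-01&amp;quot;, &amp;quot;2014-01-10&amp;quot;), tz = &amp;quot;UTC&amp;quot;))
)
current &amp;lt;- data.frame(size = 42810, row.names = &amp;quot;A3&amp;quot;)
# One row per archived tarball
versions &amp;lt;- do.call(rbind, lapply(names(archive), function(p) {
  data.frame(package = p, size = archive[[p]]$size, mtime = archive[[p]]$mtime)
}))
versions$year &amp;lt;- as.integer(format(versions$mtime, &amp;quot;%Y&amp;quot;))
versions$on_cran &amp;lt;- versions$package %in% rownames(current)
# Median size per year and availability
medians &amp;lt;- aggregate(size ~ year + on_cran, data = versions, FUN = median)
medians&lt;/code&gt;&lt;/pre&gt;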
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:release-size&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/28/cran-files-2/index.en_files/figure-html/release-size-1.png&#34; alt=&#34;*Package size by release.* Packages are usually small but seem to gain weight with updates.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 4: &lt;em&gt;Package size by release.&lt;/em&gt; Packages are usually small but seem to gain weight with updates.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Typically packages grow with each new release until they reach 50 releases.
Beyond that the plot depends on very few packages and might not be representative.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:release-size2&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/28/cran-files-2/index.en_files/figure-html/release-size2-1.png&#34; alt=&#34;*Package size by release and availability.* Packages no longer on CRAN are usually smaller than those on it. The continuous black line is CRAN&#39;s current size threshold, while the discontinuous black line is the current median size.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 5: &lt;em&gt;Package size by release and availability.&lt;/em&gt; Packages no longer on CRAN are usually smaller than those on it. The continuous black line is CRAN’s current size threshold, while the discontinuous black line is the current median size.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Here we can better appreciate that packages tend to stay below CRAN’s size threshold.
There isn’t much of a difference between packages available on CRAN and those archived.&lt;/p&gt;
&lt;p&gt;Looking at the size of each package’s first release over time gives a more representative view:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:size-time&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/28/cran-files-2/index.en_files/figure-html/size-time-1.png&#34; alt=&#34;*Size of the first release over time*. Package size increases with time, with a peak around 2010, and has been increasing again since 2014 but still hasn&#39;t surpassed the previous record.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 6: &lt;em&gt;Size of the first release over time&lt;/em&gt;. Package size increases with time, with a peak around 2010, and has been increasing again since 2014 but still hasn’t surpassed the previous record.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Package size tends to increase, except for the brief period between 2010 and 2014.
It currently grows more slowly than before that period, but it is close to its maximum.&lt;/p&gt;
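&lt;p&gt;The first release of each package is simply its oldest tarball. A minimal sketch on a hypothetical flattened table with one row per archived tarball (packages with a single release still on CRAN are only in &lt;code&gt;current&lt;/code&gt;, so in practice they would have to be added too):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;versions &amp;lt;- data.frame(
  package = c(&amp;quot;A3&amp;quot;, &amp;quot;A3&amp;quot;, &amp;quot;oldpkg&amp;quot;, &amp;quot;oldpkg&amp;quot;),
  size = c(45252, 45907, 1000, 3000),
  mtime = as.POSIXct(c(&amp;quot;2013-02-07&amp;quot;, &amp;quot;2013-03-26&amp;quot;, &amp;quot;2014-01-10&amp;quot;, &amp;quot;2013-05-01&amp;quot;), tz = &amp;quot;UTC&amp;quot;)
)
# Sort by package and date, then keep the first row of each package
versions &amp;lt;- versions[order(versions$package, versions$mtime), ]
first &amp;lt;- versions[!duplicated(versions$package), ]
first[, c(&amp;quot;package&amp;quot;, &amp;quot;size&amp;quot;, &amp;quot;mtime&amp;quot;)]&lt;/code&gt;&lt;/pre&gt;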
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusions&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusions&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Most packages are not updated often: they have between 1 and 3 releases.
But some packages are updated a lot; this might mean they are data packages rather than software packages, or that they have frequent minor and major updates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Most current packages are smaller than archived ones.
Packages no longer available were usually bigger than those still on CRAN.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Surprisingly, packages grow a lot up to their 25th release.
They also grow over time, except for the period between 2010 and 2014.
This decrease might be due to a change in CRAN policy.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;future-parts&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Future parts&lt;/h2&gt;
&lt;p&gt;In future posts I’ll explore:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;patterns in the acceptance of packages and in their updates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the relation between dependencies, initial release and updates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;who handled the packages.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Exploring CRAN&#39;s files: part 1</title>
      <link>https://llrs.dev/post/2022/07/23/cran-files-1/</link>
      <pubDate>Sat, 23 Jul 2022 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/post/2022/07/23/cran-files-1/</guid>
      <description>


&lt;div id=&#34;introduction&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;There are many great things in base R; one of them is the &lt;a href=&#34;https://search.r-project.org/R/refmans/tools/html/00Index.html&#34;&gt;tools package&lt;/a&gt;.
This package contains the functions used to build and check packages and to create their documentation and manuals.&lt;/p&gt;
&lt;p&gt;As I wanted to know how CRAN works and how it has changed, I looked into the source code of tools.
I found some functions that access freely available files with information about CRAN packages.
Most of them are internal (hence the &lt;code&gt;:::&lt;/code&gt; below) and they are in the &lt;a href=&#34;https://svn.r-project.org/R/trunk/src/library/tools/R/CRANtools.R&#34;&gt;CRANtools.R file&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;packages &amp;lt;- tools::CRAN_package_db()
# current &amp;lt;- tools:::CRAN_current_db()
# archive &amp;lt;- tools:::CRAN_archive_db()
# issues &amp;lt;- tools::CRAN_check_issues()
# alias &amp;lt;- tools:::CRAN_aliases_db()
# rdxrefs &amp;lt;- tools:::CRAN_rdxrefs_db()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As I was not sure of the information on these files I asked on &lt;a href=&#34;https://stat.ethz.ch/pipermail/r-devel/2022-May/081770.html&#34;&gt;R-devel&lt;/a&gt; but I did not receive an answer.
They seem to be quite obscure and, as internal functions, they might be removed without notice, so they shouldn’t be relied upon by other packages.
However, as the files contain information about CRAN they might provide interesting clues about the history of CRAN and how it is operated.&lt;/p&gt;
&lt;p&gt;In this post I will focus on the first file.
I’ll explore a couple of fields, and in future posts I will use the other files to explore more of CRAN’s history.&lt;/p&gt;
&lt;div id=&#34;packages-file&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;packages file&lt;/h3&gt;
&lt;p&gt;First of all a very brief exploration of what is in this file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;##    Package Version Priority                        Depends
## 1       A3   1.0.0     &amp;lt;NA&amp;gt; R (&amp;gt;= 2.15.0), xtable, pbapply
## 2 AATtools   0.0.1     &amp;lt;NA&amp;gt;                   R (&amp;gt;= 3.6.0)
## 3   ABACUS   1.0.0     &amp;lt;NA&amp;gt;                   R (&amp;gt;= 3.1.0)
##                                 Imports LinkingTo
## 1                                  &amp;lt;NA&amp;gt;      &amp;lt;NA&amp;gt;
## 2  magrittr, dplyr, doParallel, foreach      &amp;lt;NA&amp;gt;
## 3 ggplot2 (&amp;gt;= 3.1.0), shiny (&amp;gt;= 1.3.1),      &amp;lt;NA&amp;gt;
##                               Suggests Enhances    License License_is_FOSS
## 1                  randomForest, e1071     &amp;lt;NA&amp;gt; GPL (&amp;gt;= 2)            &amp;lt;NA&amp;gt;
## 2                                 &amp;lt;NA&amp;gt;     &amp;lt;NA&amp;gt;      GPL-3            &amp;lt;NA&amp;gt;
## 3 rmarkdown (&amp;gt;= 1.13), knitr (&amp;gt;= 1.22)     &amp;lt;NA&amp;gt;      GPL-3            &amp;lt;NA&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The packages database has similar information to &lt;code&gt;available.packages()&lt;/code&gt; but with many more columns: publication date, reverse dependencies, X-CRAN-Comment, who packaged it…
Also note that these packages are not filtered to match the R version, OS_type or subarch, and there are near duplicates (I learned about this filtering while reading the great documentation of &lt;a href=&#34;https://search.r-project.org/R/refmans/utils/html/available.packages.html&#34;&gt;&lt;code&gt;available.packages()&lt;/code&gt;&lt;/a&gt; and from some mentions online).&lt;/p&gt;
&lt;p&gt;As the data spans several years, I’ll sometimes show the release dates of different R versions to provide some context.
Without further delay let’s explore the data!&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;accepted&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Published packages&lt;/h2&gt;
&lt;p&gt;CRAN started some time ago (in 1997) but it hasn’t remained frozen.
The package archive (the A in CRAN) has been updated continuously since then.
For instance, the current packages do not include packages that were removed, archived or replaced by updates.&lt;/p&gt;
&lt;p&gt;Packages are first submitted to CRAN and, once accepted, they are published.
As acceptance and publication are usually almost simultaneous, I might use the two terms as synonyms.
Looking at the currently available packages and their publication date, we see the following:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:daily-cran&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/index.en_files/figure-html/daily-cran-1.png&#34; alt=&#34;ggplot2 plot of date vs packages accepted on a given day. Until 2020 fewer than 10 packages were accepted daily. Lately more than 30 are added to CRAN. The plot also displays the R release versions from 2.12 in 2010 to 4.2.0 in 2022.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: &lt;em&gt;Packages accepted on CRAN by the publication date.&lt;/em&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The oldest current package was published in 2010.
This means a package without issues, dependency changes or bugs detected by the automatic checks for 12 years!&lt;/p&gt;
&lt;p&gt;The daily acceptance rate has increased from fewer than 10 packages a day until 2020 to more than 30 in 2022.
Summarizing by month shows the same trend; the small bump in 2020 disappears, but other patterns appear:&lt;/p&gt;
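&lt;p&gt;Counts like these can be sketched from the &lt;code&gt;Published&lt;/code&gt; column of &lt;code&gt;tools::CRAN_package_db()&lt;/code&gt;, which stores dates as &lt;code&gt;YYYY-MM-DD&lt;/code&gt; strings. The small data frame and package names below are hypothetical, so the example runs offline:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;packages &amp;lt;- data.frame(
  Package = c(&amp;quot;pkgA&amp;quot;, &amp;quot;pkgB&amp;quot;, &amp;quot;pkgC&amp;quot;, &amp;quot;pkgD&amp;quot;, &amp;quot;pkgE&amp;quot;),
  Published = c(&amp;quot;2010-03-01&amp;quot;, &amp;quot;2021-05-02&amp;quot;, &amp;quot;2021-05-02&amp;quot;, &amp;quot;2022-01-15&amp;quot;, &amp;quot;2022-06-30&amp;quot;)
)
published &amp;lt;- as.Date(packages$Published)
# Oldest publication date among the current packages
min(published)
# Number of packages published each day
daily &amp;lt;- table(published)
daily&lt;/code&gt;&lt;/pre&gt;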
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:monthly-cran&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/index.en_files/figure-html/monthly-cran-1.png&#34; alt=&#34;ggplot figure with the monthly published packages. Until 2015 it rises very slowly, then it is around 50 monthly packages with some wobbles. In 2022 it rose to over 800 packages.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 2: &lt;em&gt;Monthly packages published to CRAN&lt;/em&gt;. Some monthly variance is observed.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Instead of just one bump we see waves: fewer packages are accepted late in the year and more in the first months of the year.&lt;/p&gt;
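&lt;p&gt;The monthly view, and the seasonal waves, can be sketched by formatting the publication dates before tabulating them. Again, the dates are hypothetical:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;published &amp;lt;- as.Date(c(&amp;quot;2010-03-01&amp;quot;, &amp;quot;2021-05-02&amp;quot;, &amp;quot;2021-05-02&amp;quot;, &amp;quot;2022-01-15&amp;quot;, &amp;quot;2022-06-30&amp;quot;))
# Packages per year and month (months without publications are absent)
monthly &amp;lt;- table(format(published, &amp;quot;%Y-%m&amp;quot;))
monthly
# Packages per calendar month, pooling all years, to look for seasonal waves
table(format(published, &amp;quot;%m&amp;quot;))&lt;/code&gt;&lt;/pre&gt;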
&lt;p&gt;If we look at the accumulated packages on CRAN we see an exponential growth:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:cran-cumsum&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/index.en_files/figure-html/cran-cumsum-1.png&#34; alt=&#34;Plot with the accumulated number of packages on CRAN, rising from a few tens to more than 18000 currently.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 3: &lt;em&gt;Accumulation of packages&lt;/em&gt;. Most of the packages have been published in the last 2 years.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In fact, more of the packages currently on CRAN were published after March 2021 than in all the previous years combined.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:cran-perc&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/index.en_files/figure-html/cran-perc-1.png&#34; alt=&#34;Line with percentages of packages in CRAN by date. Close to 50% of current packages were published between 2010 and 2021.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 4: &lt;em&gt;Percentage of current packages on CRAN according to their date of publication&lt;/em&gt;. Most of them were published/updated in the last year and a half.
&lt;/p&gt;
&lt;/div&gt;
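&lt;p&gt;The cumulative count and the share of packages published after a given date can be sketched as follows, again with made-up dates. The March 2021 cutoff mirrors the text; the dates and variable names are assumptions for illustration.&lt;/p&gt;

```r
# Made-up publication dates of the current version of each package
published = as.Date(c("2010-03-01", "2015-06-15", "2019-09-09",
                      "2021-04-01", "2021-12-24", "2022-05-30"))
# Packages per day, in chronological order, accumulated over time
daily = table(published)
cumulative = cumsum(as.vector(daily))
# Percentage of current packages published up to each date
percentage = 100 * cumulative / length(published)
# Share of current packages published on or after the cutoff
cutoff = as.Date("2021-03-01")
share_recent = mean(published >= cutoff)
share_recent
```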
&lt;p&gt;This is a good time to recall that the date used is the publication date of the current version of each package.
Many had previous versions on CRAN:&lt;/p&gt;
&lt;template id=&#34;9668142b-64d5-4c3d-842e-fbcef8304c16&#34;&gt;&lt;style&gt;
.tabwid table{
  border-spacing:0px !important;
  border-collapse:collapse;
  line-height:1;
  margin-left:auto;
  margin-right:auto;
  border-width: 0;
  display: table;
  margin-top: 1.275em;
  margin-bottom: 1.275em;
  border-color: transparent;
}
.tabwid_left table{
  margin-left:0;
}
.tabwid_right table{
  margin-right:0;
}
.tabwid td {
    padding: 0;
}
.tabwid a {
  text-decoration: none;
}
.tabwid thead {
    background-color: transparent;
}
.tabwid tfoot {
    background-color: transparent;
}
.tabwid table tr {
background-color: transparent;
}
&lt;/style&gt;&lt;div class=&#34;tabwid&#34;&gt;&lt;style&gt;.cl-3baefb4c{}.cl-3ba22c8c{font-family:&#39;DejaVu Sans&#39;;font-size:11pt;font-weight:normal;font-style:normal;text-decoration:none;color:rgba(0, 0, 0, 1.00);background-color:transparent;}.cl-3ba253e2{margin:0;text-align:left;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);padding-bottom:5pt;padding-top:5pt;padding-left:5pt;padding-right:5pt;line-height: 1;background-color:transparent;}.cl-3ba253ec{margin:0;text-align:right;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);padding-bottom:5pt;padding-top:5pt;padding-left:5pt;padding-right:5pt;line-height: 1;background-color:transparent;}.cl-3ba2b7e2{width:88.3pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3ba2b7f6{width:72.5pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3ba2b7f7{width:88.3pt;background-color:transparent;vertical-align: middle;border-bottom: 2pt solid rgba(102, 102, 102, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3ba2b800{width:72.5pt;background-color:transparent;vertical-align: middle;border-bottom: 2pt solid rgba(102, 102, 102, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid 
rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3ba2b80a{width:88.3pt;background-color:transparent;vertical-align: middle;border-bottom: 2pt solid rgba(102, 102, 102, 1.00);border-top: 2pt solid rgba(102, 102, 102, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3ba2b814{width:72.5pt;background-color:transparent;vertical-align: middle;border-bottom: 2pt solid rgba(102, 102, 102, 1.00);border-top: 2pt solid rgba(102, 102, 102, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}&lt;/style&gt;&lt;table class=&#39;cl-3baefb4c&#39;&gt;
&lt;thead&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-3ba2b80a&#34;&gt;&lt;p class=&#34;cl-3ba253e2&#34;&gt;&lt;span class=&#34;cl-3ba22c8c&#34;&gt;First release&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-3ba2b814&#34;&gt;&lt;p class=&#34;cl-3ba253ec&#34;&gt;&lt;span class=&#34;cl-3ba22c8c&#34;&gt;Packages&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-3ba2b7e2&#34;&gt;&lt;p class=&#34;cl-3ba253e2&#34;&gt;&lt;span class=&#34;cl-3ba22c8c&#34;&gt;No&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-3ba2b7f6&#34;&gt;&lt;p class=&#34;cl-3ba253ec&#34;&gt;&lt;span class=&#34;cl-3ba22c8c&#34;&gt;14,294&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-3ba2b7f7&#34;&gt;&lt;p class=&#34;cl-3ba253e2&#34;&gt;&lt;span class=&#34;cl-3ba22c8c&#34;&gt;Yes&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-3ba2b800&#34;&gt;&lt;p class=&#34;cl-3ba253ec&#34;&gt;&lt;span class=&#34;cl-3ba22c8c&#34;&gt;4,113&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/template&gt;
&lt;div class=&#34;flextable-shadow-host&#34; id=&#34;1027b3f4-86a2-414b-90aa-a3bab733e0c0&#34;&gt;&lt;/div&gt;
&lt;script&gt;
var dest = document.getElementById(&#34;1027b3f4-86a2-414b-90aa-a3bab733e0c0&#34;);
var template = document.getElementById(&#34;9668142b-64d5-4c3d-842e-fbcef8304c16&#34;);
var caption = template.content.querySelector(&#34;caption&#34;);
if(caption) {
  caption.style.cssText = &#34;display:block;text-align:center;&#34;;
  var newcapt = document.createElement(&#34;p&#34;);
  newcapt.appendChild(caption)
  dest.parentNode.insertBefore(newcapt, dest.previousSibling);
}
var fantome = dest.attachShadow({mode: &#39;open&#39;});
var templateContent = template.content;
fantome.appendChild(templateContent);
&lt;/script&gt;
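&lt;p&gt;One way to obtain a first release column, sketched under an assumption: a package counts as a first release if no older version of it is known, for instance in the &lt;code&gt;src/contrib/Archive&lt;/code&gt; directory of CRAN. This is not necessarily how the table above was computed, and the package names are hypothetical.&lt;/p&gt;

```r
# Hypothetical package names
current = c("pkgA", "pkgB", "pkgC", "pkgD")
with_previous_versions = c("pkgB", "pkgD", "pkgZ")
# TRUE when no earlier version of the package is known
first_release = !current %in% with_previous_versions
table(`First release` = ifelse(first_release, "Yes", "No"))
```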

&lt;/div&gt;
&lt;div id=&#34;delays&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Processing time&lt;/h2&gt;
&lt;p&gt;Previously I found that &lt;a href=&#34;https://llrs.dev/post/2021/01/31/cran-review/&#34;&gt;CRAN submissions&lt;/a&gt; present some key differences between new packages and already published packages, which affect how long they wait to be published on CRAN.
With the existing data we can estimate how fast the process is by comparing the publication date with the build date.&lt;/p&gt;
&lt;p&gt;The build date is added to the tar.gz file automatically when the developer builds the package via &lt;code&gt;R CMD build&lt;/code&gt;. The publication date, instead, is set by CRAN once the package is accepted.&lt;/p&gt;
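&lt;p&gt;As a sketch, both dates can be read from the DESCRIPTION file of a built and published package: &lt;code&gt;R CMD build&lt;/code&gt; writes a &lt;code&gt;Packaged&lt;/code&gt; field and CRAN adds a &lt;code&gt;Date/Publication&lt;/code&gt; field. The excerpt below is invented, and keeping only the day matches the one-day precision used in this analysis.&lt;/p&gt;

```r
# Invented DESCRIPTION excerpt of a built and published package
desc_text = c("Package: examplepkg",
              "Version: 1.0.0",
              "Packaged: 2022-06-10 09:15:42 UTC; developer",
              "Date/Publication: 2022-06-13 08:50:02 UTC")
con = textConnection(desc_text)
desc = read.dcf(con)
close(con)
# Keep only the date part of both timestamps (one-day precision)
built = as.Date(substr(desc[, "Packaged"], 1, 10))
published = as.Date(substr(desc[, "Date/Publication"], 1, 10))
delay_days = as.numeric(published - built)
delay_days
```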
&lt;p&gt;To visualize the differences I will also check whether new packages behave differently from those that were already on CRAN:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:cran-delays&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/index.en_files/figure-html/cran-delays-1.png&#34; alt=&#34;Histogram of packages and the time between build and publication. They take less than 50 days usually.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 5: &lt;em&gt;Histogram of time difference between building and publishing a package.&lt;/em&gt; Color indicates whether the package is new to CRAN or not. Most of the published packages take about the same time regardless of whether it is their first release or not.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The time between building and publication does not seem to differ much depending on whether it is the first release or not.
The precision is one day, and the process is usually fast, well below 50 days.
Few packages take longer than that between build and publication, and they are too few to be noticeable at this scale.
Since 2016/05/02 there is a &lt;a href=&#34;https://github.com/r-devel/r-svn/blob/676c1183801648b68f8f6719701445b2f9a5e3fd/src/library/tools/R/QC.R#L7583&#34;&gt;check&lt;/a&gt; that raises an issue if the build is older than a month.&lt;/p&gt;
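&lt;p&gt;The idea of that check can be illustrated with a small function. This is a simplified stand-in written for illustration, not the code in &lt;code&gt;QC.R&lt;/code&gt;; the name &lt;code&gt;build_is_old&lt;/code&gt; is hypothetical and a month is approximated as 30 days.&lt;/p&gt;

```r
# Hypothetical helper, not the check implemented in tools/R/QC.R
# Flags a build older than max_days relative to a reference date
build_is_old = function(packaged, today = Sys.Date(), max_days = 30) {
  built = as.Date(substr(packaged, 1, 10))
  as.numeric(today - built) > max_days
}
build_is_old("2022-05-01 10:00:00 UTC; dev", today = as.Date("2022-07-23"))
build_is_old("2022-07-20 10:00:00 UTC; dev", today = as.Date("2022-07-23"))
```

&lt;p&gt;The first call returns TRUE (the build is 83 days old) and the second FALSE (3 days).&lt;/p&gt;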
&lt;p&gt;Note that one might need to build the package multiple times before it is accepted.
Packages published for the first time on CRAN might have been submitted previously, but once they are finally built and pass both the checks and the manual review, they are handled as fast as packages already on CRAN.&lt;/p&gt;
&lt;p&gt;However, this time between build and acceptance might have changed with time:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:cran-delays2&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/index.en_files/figure-html/cran-delays2-1.png&#34; alt=&#34;Smoothed lines of published packages with different linetype and color depending on if it is the first time they are on CRAN or not. New packages currently take less than 4 days and old packages less than 2. This is down from 2018 to 2021, when new packages took above 4 days to be published on CRAN&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 6: &lt;em&gt;Processing time between building the package and being published by date.&lt;/em&gt; There is a large difference between new packages and old ones. New packages usually take more time while existing packages take less than a day currently.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We clearly see a difference in processing time for those packages already on CRAN and those that are not.
Keep in mind that for the few packages from before 2016 the estimation might not be accurate.
At the same time this is consistent with the manual review process (for more information see &lt;a href=&#34;https://llrs.dev/post/2021/01/31/cran-review/&#34;&gt;my previous post&lt;/a&gt; about the review process of CRAN or my &lt;a href=&#34;https://llrs.dev/talk/user-2021/&#34;&gt;talk at useR! 2021&lt;/a&gt;).
It also means that there is a large variation in how long packages take to be handled.
However, this gap seems to be shrinking: while in 2010 new packages took around 2 weeks, nowadays they take less than a week, getting closer to the median of about 1 day between build and publication of existing packages.&lt;/p&gt;
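&lt;p&gt;The median processing time per year, split by first release, could be computed with &lt;code&gt;aggregate&lt;/code&gt;. The six packages below are invented so the example is self-contained.&lt;/p&gt;

```r
# Invented packages with build date, publication date and first release flag
pkgs = data.frame(
  built = as.Date(c("2020-02-01", "2020-03-10", "2020-05-02",
                    "2021-01-04", "2021-06-01", "2021-07-12")),
  published = as.Date(c("2020-02-09", "2020-03-11", "2020-05-03",
                        "2021-01-10", "2021-06-02", "2021-07-13")),
  first_release = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
)
# Days between build and publication, and year of publication
pkgs$delay = as.numeric(pkgs$published - pkgs$built)
pkgs$year = format(pkgs$published, "%Y")
# Median delay for each combination of year and first release
medians = aggregate(delay ~ year + first_release, data = pkgs, FUN = median)
medians
```

&lt;p&gt;The median is used rather than the mean so that the few packages with very long delays do not dominate the yearly value.&lt;/p&gt;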
&lt;p&gt;This difference might be explained by experience: authors and maintainers whose package(s) are already on CRAN know better how to submit a new version that passes the checks without problems.&lt;/p&gt;
&lt;p&gt;It could also be that new packages need more time from the CRAN team.
In 2020 it took longer than in previous years for packages to be added to CRAN.
Maybe the increase in processing time in 2020 was due to the huge volume of submissions CRAN received, or to more checks on the developer side before submitting to CRAN.&lt;/p&gt;
&lt;p&gt;The two explanations are not mutually exclusive.&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;
Do more packages published on the same day mean more processing time? It doesn’t look like it.
&lt;/summary&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:cran-reasons&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/index.en_files/figure-html/cran-reasons-1.png&#34; alt=&#34;ggplot graphic with the processing time and the number of packages accepted the same day. New packages have less delay than already published packages, but the more packages are accepted, the less delay there is.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 7: &lt;em&gt;Packages accepted the same day and processing time.&lt;/em&gt; New packages are accepted sooner than packages already on CRAN with respect to the build date.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Surprisingly, we see a lot of variation in the delay of packages already accepted on CRAN.
In addition, the more new packages are accepted the same day, the less delay there is.
I think this just means that when reviewers work on the submission queue, several packages might be approved together.&lt;/p&gt;
&lt;p&gt;This might also mean that packages have already been built several times before finally being accepted, and by then the errors, warnings and notes have been solved.
Lastly, this could indicate that developers whose package is already on CRAN wait a bit between building and submitting it, as they might take some time to double check before submission (dependencies, several machines, other?), or that there is a time zone difference (submitting at noon in one region, which is night for the reviewers).&lt;/p&gt;
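&lt;p&gt;A rough way to quantify the relation shown in Figure 7 is a rank correlation between the number of packages published on a day and their delay. The data are invented, and using Spearman correlation is an assumption of this sketch, not necessarily what was used for the figure.&lt;/p&gt;

```r
# Invented publication dates and delays in days
pkgs = data.frame(
  published = as.Date(c("2022-01-03", "2022-01-03", "2022-01-03",
                        "2022-01-04", "2022-01-05", "2022-01-05")),
  delay = c(1, 2, 1, 6, 3, 2)
)
# Number of packages published on the same day as each package
same_day = table(format(pkgs$published))
pkgs$n_same_day = as.vector(same_day[format(pkgs$published)])
# Rank correlation: negative values mean busier days, shorter delays
rho = cor(pkgs$n_same_day, pkgs$delay, method = "spearman")
rho
```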
&lt;/details&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;There are packages that have been working without problems for 12 years despite several major changes in R (see figure &lt;a href=&#34;#fig:daily-cran&#34;&gt;1&lt;/a&gt;).
This speaks volumes about the packages’ quality, and about the backward compatibility that R core aims for and CRAN checks enforce.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CRAN accepts an impressive number of packages daily and monthly.
The system and the team do incredible work, mostly in their free time (see figure &lt;a href=&#34;#fig:monthly-cran&#34;&gt;2&lt;/a&gt;).
Many thanks!&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Accepted packages are usually handled very fast, in less than a week (see figure &lt;a href=&#34;#fig:cran-reasons&#34;&gt;7&lt;/a&gt;).
However, it is not possible to separate the time spent in the submission system from the time spent on the developer’s computer.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;future-parts&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Future parts&lt;/h2&gt;
&lt;p&gt;We’ve explored a snapshot of current packages and a brief window into the history of CRAN.
There is much more that can be done with all the other files.&lt;/p&gt;
&lt;p&gt;In future posts I’ll explore:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Patterns in accepting new packages and package updates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Who handled the packages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Size of packages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The relation between dependencies, initial release and updates.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Other suggestions?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Edit&lt;/strong&gt;: Many thanks to &lt;a href=&#34;https://masalmon.eu/&#34;&gt;Maëlle Salmon&lt;/a&gt; and &lt;a href=&#34;https://dirk.eddelbuettel.com/&#34;&gt;Dirk Eddelbuettel&lt;/a&gt; for their feedback on an initial version of this series of posts.&lt;/p&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;details&gt;
&lt;pre&gt;&lt;code&gt;## - Session info -------------------------------------------------------------------------------------------------------
##  setting  value
##  version  R version 4.2.1 (2022-06-23)
##  os       Ubuntu 20.04.4 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  C
##  ctype    C
##  tz       Europe/Madrid
##  date     2022-07-23
##  pandoc   2.18 @ /usr/lib/rstudio/bin/quarto/bin/tools/ (via rmarkdown)
## 
## - Packages -----------------------------------------------------------------------------------------------------------
##  package      * version    date (UTC) lib source
##  assertthat     0.2.1      2019-03-21 [2] RSPM (R 4.2.0)
##  base64enc      0.1-3      2015-07-28 [2] CRAN (R 4.0.0)
##  blogdown       1.10       2022-05-10 [2] RSPM (R 4.2.0)
##  bookdown       0.27       2022-06-14 [2] RSPM (R 4.2.0)
##  bslib          0.4.0      2022-07-16 [2] RSPM (R 4.2.0)
##  cachem         1.0.6      2021-08-19 [2] RSPM (R 4.2.0)
##  cli            3.3.0      2022-04-25 [2] RSPM (R 4.2.0)
##  codetools      0.2-18     2020-11-04 [2] RSPM (R 4.2.0)
##  colorspace     2.0-3      2022-02-21 [2] RSPM (R 4.2.0)
##  crayon         1.5.1      2022-03-26 [2] RSPM (R 4.2.0)
##  curl           4.3.2      2021-06-23 [2] RSPM (R 4.2.0)
##  data.table     1.14.2     2021-09-27 [2] RSPM (R 4.2.0)
##  DBI            1.1.3      2022-06-18 [2] RSPM (R 4.2.0)
##  digest         0.6.29     2021-12-01 [2] RSPM (R 4.2.0)
##  dplyr        * 1.0.9      2022-04-28 [2] RSPM (R 4.2.0)
##  ellipsis       0.3.2      2021-04-29 [2] RSPM (R 4.2.0)
##  evaluate       0.15       2022-02-18 [2] RSPM (R 4.2.0)
##  fansi          1.0.3      2022-03-24 [2] RSPM (R 4.2.0)
##  farver         2.1.1      2022-07-06 [2] RSPM (R 4.2.0)
##  fastmap        1.1.0      2021-01-25 [2] RSPM (R 4.2.0)
##  flextable    * 0.7.2      2022-06-12 [2] RSPM (R 4.2.0)
##  forcats      * 0.5.1      2021-01-27 [2] RSPM (R 4.2.0)
##  gdtools        0.2.4      2022-02-14 [2] RSPM (R 4.2.0)
##  generics       0.1.3      2022-07-05 [2] RSPM (R 4.2.0)
##  geomtextpath * 0.1.0      2022-01-24 [2] CRAN (R 4.2.1)
##  ggplot2      * 3.3.6.9000 2022-06-29 [2] Github (tidyverse/ggplot2@7571122)
##  ggrepel      * 0.9.1      2021-01-15 [2] RSPM (R 4.2.0)
##  glue           1.6.2      2022-02-24 [2] RSPM (R 4.2.0)
##  gtable         0.3.0      2019-03-25 [2] CRAN (R 4.0.0)
##  highr          0.9        2021-04-16 [2] RSPM (R 4.2.0)
##  htmltools      0.5.3      2022-07-18 [2] RSPM (R 4.2.0)
##  jquerylib      0.1.4      2021-04-26 [2] RSPM (R 4.2.0)
##  jsonlite       1.8.0      2022-02-22 [2] RSPM (R 4.2.0)
##  knitr          1.39       2022-04-26 [2] RSPM (R 4.2.0)
##  labeling       0.4.2      2020-10-20 [2] RSPM (R 4.2.0)
##  lattice        0.20-45    2021-09-22 [3] CRAN (R 4.2.0)
##  lifecycle      1.0.1      2021-09-24 [2] RSPM (R 4.2.0)
##  lubridate    * 1.8.0      2021-10-07 [2] RSPM (R 4.2.0)
##  magrittr       2.0.3      2022-03-30 [2] RSPM (R 4.2.0)
##  Matrix         1.4-1      2022-03-23 [2] RSPM (R 4.2.0)
##  mgcv           1.8-40     2022-03-29 [2] RSPM (R 4.2.0)
##  munsell        0.5.0      2018-06-12 [2] RSPM (R 4.2.0)
##  nlme           3.1-158    2022-06-15 [2] RSPM (R 4.2.0)
##  officer        0.4.3      2022-06-12 [2] RSPM (R 4.2.0)
##  pillar         1.8.0      2022-07-18 [2] RSPM (R 4.2.0)
##  pkgconfig      2.0.3      2019-09-22 [2] RSPM (R 4.2.0)
##  purrr          0.3.4      2020-04-17 [2] RSPM (R 4.2.0)
##  R6             2.5.1      2021-08-19 [2] RSPM (R 4.2.0)
##  Rcpp           1.0.9      2022-07-08 [2] RSPM (R 4.2.0)
##  rlang          1.0.4      2022-07-12 [2] RSPM (R 4.2.0)
##  rmarkdown      2.14       2022-04-25 [2] RSPM (R 4.2.0)
##  rstudioapi     0.13       2020-11-12 [2] RSPM (R 4.2.0)
##  rversions    * 2.1.1      2021-05-31 [2] RSPM (R 4.2.0)
##  sass           0.4.2      2022-07-16 [2] RSPM (R 4.2.0)
##  scales         1.2.0      2022-04-13 [2] RSPM (R 4.2.0)
##  sessioninfo    1.2.2      2021-12-06 [2] RSPM (R 4.2.0)
##  stringi        1.7.8      2022-07-11 [2] RSPM (R 4.2.0)
##  stringr        1.4.0      2019-02-10 [2] RSPM (R 4.2.0)
##  systemfonts    1.0.4      2022-02-11 [2] RSPM (R 4.2.0)
##  textshaping    0.3.6      2021-10-13 [2] RSPM (R 4.2.0)
##  tibble         3.1.7      2022-05-03 [2] RSPM (R 4.2.0)
##  tidyr        * 1.2.0      2022-02-01 [2] RSPM (R 4.2.0)
##  tidyselect     1.1.2      2022-02-21 [2] RSPM (R 4.2.0)
##  utf8           1.2.2      2021-07-24 [2] RSPM (R 4.2.0)
##  uuid           1.1-0      2022-04-19 [2] RSPM (R 4.2.0)
##  vctrs          0.4.1      2022-04-13 [2] RSPM (R 4.2.0)
##  withr          2.5.0      2022-03-03 [2] RSPM (R 4.2.0)
##  xfun           0.31       2022-05-10 [2] RSPM (R 4.2.0)
##  xml2           1.3.3      2021-11-30 [2] RSPM (R 4.2.0)
##  yaml           2.3.5      2022-02-21 [2] RSPM (R 4.2.0)
##  zip            2.2.0      2021-05-31 [2] RSPM (R 4.2.0)
## 
##  [1] /home/lluis/bin/R/4.2.1
##  [2] /usr/lib/R/site-library
##  [3] /usr/lib/R/library
## 
## ----------------------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Reasons why packages are archived on CRAN</title>
      <link>https://llrs.dev/post/2021/12/07/reasons-cran-archivals/</link>
      <pubDate>Tue, 07 Dec 2021 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/post/2021/12/07/reasons-cran-archivals/</guid>
      <description>


&lt;p&gt;In the R Consortium Repositories working group, Rich FitzJohn posted &lt;a href=&#34;https://github.com/RConsortium/r-repositories-wg/issues/8#issuecomment-979486806&#34;&gt;a comment&lt;/a&gt; pointing to &lt;a href=&#34;https://cran.r-project.org/src/contrib/PACKAGES.in&#34;&gt;a file&lt;/a&gt; that the CRAN team seems to use to store and check the history of packages.&lt;/p&gt;
&lt;p&gt;The structure is not defined anywhere I could find (I haven’t looked much to be honest).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Package: &amp;lt;package name&amp;gt;
X-CRAN-Comment: Archived on YYYY-MM-DD as &amp;lt;reason&amp;gt;.
X-CRAN-History: Archived on YYYY-MM-DD as &amp;lt;reason&amp;gt;.
  Unarchived on YYYY-MM-DD.
  .
  &amp;lt;Optional clarification of archival reason&amp;gt;
&amp;lt;Optional fields like License_restricts_use, Replaced_by, Maintainer: ORPHANED, OS_type: unix&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I think the X-CRAN-Comment is what appears on the website of an archived package, like on the &lt;a href=&#34;https://cran.r-project.org/package=radix&#34;&gt;radix package&lt;/a&gt; page. However, other comments on the website do not appear in that file.&lt;/p&gt;
&lt;p&gt;In addition, the file lacks some archiving and unarchiving records of some packages, although it has records from 2013 or earlier up to now. Still, we can use it to understand the &lt;em&gt;reasons&lt;/em&gt; for archiving packages, which seems to be the main purpose of the file.&lt;/p&gt;
&lt;div id=&#34;the-data&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;The data&lt;/h1&gt;
&lt;p&gt;The first step is reading the file.
As it has a &lt;code&gt;key: value&lt;/code&gt; structure similar to the DESCRIPTION file of packages, it seems to be in DCF (Debian Control File) format, which is easy to read with the &lt;code&gt;read.dcf&lt;/code&gt; function.&lt;/p&gt;
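&lt;p&gt;A minimal sketch of reading records in this layout with &lt;code&gt;read.dcf&lt;/code&gt;. The two records are invented following the structure shown above; reading the real &lt;code&gt;PACKAGES.in&lt;/code&gt; should only require passing &lt;code&gt;read.dcf&lt;/code&gt; the path to a local copy of the file.&lt;/p&gt;

```r
# Two invented records following the structure of PACKAGES.in
records = c(
  "Package: pkgA",
  "X-CRAN-Comment: Archived on 2021-05-03 as check problems were not corrected in time.",
  "",
  "Package: pkgB",
  "X-CRAN-History: Archived on 2019-02-01 as requested by the maintainer.",
  "  Unarchived on 2020-03-15."
)
con = textConnection(records)
db = read.dcf(con)
close(con)
# One row per package and one column per field; absent fields are NA
dim(db)
colnames(db)
```

&lt;p&gt;Indented lines are continuation lines, so the unarchiving note becomes part of the X-CRAN-History field of pkgB.&lt;/p&gt;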
&lt;div id=&#34;exploring&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Exploring&lt;/h2&gt;
&lt;p&gt;A brief exploration of the data:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
comment
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
history
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
packages
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3612
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2345
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
434
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
70
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Many packages have either comments or history, but relatively few have both.
I’m not sure when each of them is used, as I would expect all packages with a history to also have a comment.&lt;/p&gt;
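&lt;p&gt;A table like the one above can be built by checking which fields are missing (NA) in the matrix returned by &lt;code&gt;read.dcf&lt;/code&gt;. The matrix below is invented and the helper &lt;code&gt;has_field&lt;/code&gt; is hypothetical.&lt;/p&gt;

```r
# Invented matrix shaped like the output of read.dcf(): NA marks a missing field
db = cbind(
  Package = c("pkgA", "pkgB", "pkgC", "pkgD", "pkgE"),
  `X-CRAN-Comment` = c("Archived on 2021-05-03", NA, "Archived on 2020-01-10",
                       NA, "Archived on 2019-07-01"),
  `X-CRAN-History` = c(NA, "Archived on 2018-02-02", "Archived on 2017-03-03",
                       NA, NA)
)
# "yes" when the field is present for a package, "no" otherwise
has_field = function(field) ifelse(is.na(db[, field]), "no", "yes")
counts = table(comment = has_field("X-CRAN-Comment"),
               history = has_field("X-CRAN-History"))
counts
```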
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Replaced_by
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
packages
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
6360
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
101
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Some packages (101) are simply replaced by another package.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Maintainer
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
packages
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
6366
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
95
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Most of the packages that have a Maintainer field are orphaned/archived.
Does it mean that all the others are not orphaned?&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;extracting-reasons&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Extracting reasons&lt;/h2&gt;
&lt;p&gt;Now that it is in an R data structure, we can extract the relevant information (dates, type of action and reasons) for each archival event.
I use &lt;code&gt;strcapture&lt;/code&gt; for this task, with a regular expression to extract the action, the date and the explanation it might have.&lt;/p&gt;
&lt;p&gt;I don’t know how the file is written; probably it is a mix of automated tools and manual editing, so there isn’t a simple way to collect all the information in a structured way.
The structure and the details of what is stored have changed over the years, and some events are missing.
However, the extracted information should be enough for our purposes.&lt;/p&gt;
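&lt;p&gt;A minimal sketch of the extraction with &lt;code&gt;strcapture&lt;/code&gt;. The regular expression is a hypothetical one written for these invented strings, not the one used for the tables below, and it assumes each line of the X-CRAN-History field has already been split into a separate string.&lt;/p&gt;

```r
# Invented event strings in the style of X-CRAN-Comment / X-CRAN-History lines
events = c("Archived on 2021-05-03 as check problems were not corrected in time.",
           "Unarchived on 2021-06-10.",
           "Orphaned on 2019-02-01.")
# Template giving the names and types of the captured groups
proto = data.frame(action = character(), date = character(),
                   reason = character())
# Hypothetical regex: action, ISO date, then whatever follows
pattern = "^([A-Za-z]+) on ([0-9]{4}-[0-9]{2}-[0-9]{2})(.*)$"
parsed = strcapture(pattern, events, proto)
# Tidy up: drop the leading " as " and the final period, normalize types
parsed$reason = sub("\\.$", "", sub("^ as ", "", parsed$reason))
parsed$action = tolower(parsed$action)
parsed$date = as.Date(parsed$date)
parsed
```

&lt;p&gt;Strings that do not match the pattern become rows of NA, which makes it easy to spot entries that need a different regular expression.&lt;/p&gt;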
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Action
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Events
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
archived
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
7096
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
orphaned
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
341
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
removed
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
113
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
renamed
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
replaced
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
4
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
unarchived
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2973
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;As expected, the most common recorded events are archivals, but there are some orphaned packages and even some removed packages.
Also note that the number of orphaning events (341) is greater than the number of packages with the Maintainer field (95), supporting my theory that the format has changed and that this shouldn’t be taken as an exhaustive and complete analysis of archivals.&lt;/p&gt;
&lt;p&gt;How are they along time?&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://llrs.dev/post/2021/12/07/reasons-cran-archivals/index.en_files/figure-html/plots_df-1.png&#34; width=&#34;864&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Even if there are some events recorded from 2009, it seems that this file has been used more in recent years (the last commit related to &lt;a href=&#34;https://github.com/wch/r-source/blame/trunk/src/library/tools/R/QC.R#L7778&#34;&gt;this was in 2015&lt;/a&gt;).
I know that some old events are not recorded in the file, because some packages currently on CRAN had been archived but do not have an unarchived action; the converse could also happen.
So this doesn’t necessarily mean that more packages are archived from CRAN nowadays, but it clearly indicates that at least there is now a more accurate record of archived packages in this file.&lt;/p&gt;
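&lt;p&gt;The yearly counts behind a plot like the one above can be obtained by cross-tabulating the year of each event with its action. The events here are invented, standing in for the output of the extraction step.&lt;/p&gt;

```r
# Invented parsed events
events = data.frame(
  action = c("archived", "archived", "unarchived", "orphaned", "archived"),
  date = as.Date(c("2013-04-02", "2020-07-15", "2020-09-01",
                   "2021-02-10", "2021-11-30"))
)
# Events per year and type of action
per_year = table(year = format(events$date, "%Y"), action = events$action)
per_year
```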
&lt;p&gt;Another source of records of archived packages might be &lt;a href=&#34;http://dirk.eddelbuettel.com/cranberries/cran/removed/&#34;&gt;cranberries&lt;/a&gt;. It would be nice to compare this file with the records in the database there.&lt;/p&gt;
&lt;p&gt;Now that most of the package events are collected and we have the reasons for the actions, we can explore and classify those reasons.
Using some simple regular expressions I search for keywords or sentences.&lt;/p&gt;
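&lt;p&gt;A keyword search of this kind can be sketched with &lt;code&gt;grepl&lt;/code&gt;, one pattern per category. The categories and patterns below are hypothetical examples, not the ones used for the figure.&lt;/p&gt;

```r
# Invented archival reasons
reasons = c("check problems were not corrected in time",
            "requires archived package 'foo'",
            "requested by the maintainer",
            "email to the maintainer is undeliverable")
# Hypothetical keyword patterns for a few categories
patterns = c(
  not_corrected = "not corrected",
  dependencies = "requires archived|depends on",
  request_maintainer = "request(ed)? by the maintainer",
  email = "e-?mail|undeliverable"
)
# One logical column per category: does the reason match the pattern?
matches = sapply(patterns, grepl, x = reasons, ignore.case = TRUE)
matches
```

&lt;p&gt;Because each category is a separate column, a single reason can match several categories, which is what the grouped table further down counts.&lt;/p&gt;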
&lt;p&gt;We can look at the most frequent reasons for archiving packages, the patterns I found in more than 100 cases:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://llrs.dev/post/2021/12/07/reasons-cran-archivals/index.en_files/figure-html/reasons_top-1.png&#34; width=&#34;864&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The most frequent reason is that errors or check problems are not corrected, even after reminders.&lt;br /&gt;
Next are the packages archived because they depend on other packages that are no longer on CRAN.&lt;br /&gt;
There are some packages that are replaced by others, and some maintainers might not want to continue supporting the package when they receive a message from CRAN about fixing an error.&lt;/p&gt;
&lt;p&gt;Policy violation makes it into the top 5, but with fewer than 500 events.
Dependency problems are the sixth cause, followed by email errors (bouncing, incorrect email…), and then very sporadic problems: licensing, not fixing issues after R updates, authorship problems or requests from authors.&lt;/p&gt;
&lt;p&gt;Some of these reasons occur together in the same event, but grouping them we get a similar picture:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
package_not_corrected
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
request_maintainer
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
dependencies
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
other
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
events
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
4366
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1530
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
767
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
374
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
15
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
13
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Surprisingly, the second most frequent group of archiving events is due to many different reasons.
This is probably &lt;a href=&#34;https://en.wikipedia.org/wiki/Pareto_principle&#34;&gt;Pareto’s principle&lt;/a&gt; in action: these events are around 15% of all archiving events, but their causes are very diverse.&lt;/p&gt;
&lt;p&gt;However, if we look at the packages that were archived (not at the request of their maintainers), most of them were archived only once:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Archiving events
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Packages
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5304
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
594
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
115
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
4
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
31
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
8
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This suggests that, once a package is archived, maintainers rarely make the effort to put it back on CRAN; only in a few cases are there multiple attempts.
To check this, we can compare the archived packages with the packages currently available and see how many of them are on CRAN now:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Currently on CRAN
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Packages
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Proportion
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3869
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
64%
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2183
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
36%
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Many packages are back on CRAN despite having been archived in the past, but close to 64% are currently not on CRAN.&lt;/p&gt;
&lt;p&gt;Almost none of the packages back on CRAN still have an &lt;code&gt;X-CRAN-Comment&lt;/code&gt;; the few exceptions are:&lt;/p&gt;
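&lt;p&gt;The comparison above can be sketched as follows. This is a minimal sketch, not the code used for this post. It assumes we already have a character vector with the names of the archived packages. The helper &lt;code&gt;archived_status()&lt;/code&gt; and the toy package names are hypothetical. The names of the packages currently on CRAN could come from the row names of &lt;code&gt;available.packages()&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical helper: how many previously archived packages are
# (not) currently on CRAN, as counts and rounded percentages.
archived_status = function(archived, current) {
  on_cran = factor(ifelse(archived %in% current, "yes", "no"),
                   levels = c("no", "yes"))
  counts = as.vector(table(on_cran))
  data.frame(CRAN = levels(on_cran),
             Packages = counts,
             Proportion = sprintf("%.0f%%", 100 * counts / sum(counts)))
}

# Toy data; with real data (needs internet access) one could use:
# current = rownames(available.packages(repos = "https://cloud.r-project.org"))
archived_status(c("pkgA", "pkgB", "pkgC", "pkgD"), current = c("pkgB", "pkgX"))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the toy example, three of the four archived packages are not in &lt;code&gt;current&lt;/code&gt; (75%) and one is (25%).&lt;/p&gt;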
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Package
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
X-CRAN-Comment
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
geiger
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
&lt;p&gt;Orphaned and corrected on 2022-05-09.&lt;/p&gt;
Repeated notifications about USE_FC_LEN_T were ignored.
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
alphahull
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Versions up to 2.3 have been removed for mirepresentation of authorship.
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
udunits2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Orphaned on 2022-01-06 as installation problems were not corrected.
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
bibtex
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Orphaned and corrected on 2020-09-19 as check problems were not corrected in time.
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The CRAN team might have missed these few packages and not moved their comments to &lt;code&gt;X-CRAN-History&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Some packages that were never archived also have an &lt;code&gt;X-CRAN-History&lt;/code&gt;, but it usually records changes to other fields.&lt;/p&gt;
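&lt;p&gt;Packages that still carry an &lt;code&gt;X-CRAN-Comment&lt;/code&gt; can be listed with a sketch like the one below. It assumes that the metadata downloaded by &lt;code&gt;tools::CRAN_package_db()&lt;/code&gt; includes an &lt;code&gt;X-CRAN-Comment&lt;/code&gt; column. The helper &lt;code&gt;with_cran_comment()&lt;/code&gt; and the toy data are hypothetical. On real data, this lists every current package with a comment, so it would still need to be intersected with the archived package names to reproduce the table above.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical helper: keep the packages whose X-CRAN-Comment is set.
with_cran_comment = function(db) {
  keep = !is.na(db[["X-CRAN-Comment"]])
  db[keep, c("Package", "X-CRAN-Comment"), drop = FALSE]
}

# Toy data mimicking two columns of the CRAN metadata;
# check.names = FALSE keeps the hyphens in the column name.
toy = data.frame(Package = c("pkgA", "pkgB"),
                 `X-CRAN-Comment` = c(NA, "Orphaned as check problems were not corrected."),
                 check.names = FALSE)
with_cran_comment(toy)

# With real data (needs internet access):
# with_cran_comment(tools::CRAN_package_db())&lt;/code&gt;&lt;/pre&gt;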
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;discussion&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Discussion&lt;/h1&gt;
&lt;p&gt;Most packages are archived on CRAN because their maintainers did not correct the errors found by CRAN checks.
These checks clearly help packages keep a high quality, but they come at a high cost for maintainers and especially for the CRAN team.
Maintainers don’t seem to have enough time to fix the issues on time, while the CRAN team sends personalized reminders to maintainers and sometimes even patches for their packages.&lt;/p&gt;
&lt;p&gt;Although everyone shares the goal of having packages corrected and free of issues, there are a few options in light of this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Be more restrictive&lt;/p&gt;
&lt;p&gt;Do not accept a package if it breaks the packages that depend on it, or archive packages as soon as they fail checks.
This would make it harder to keep packages on CRAN but would lift some pressure off the CRAN team.
It would also go against the trend in other languages’ repositories, which often don’t check packages/modules at all and have fewer restrictions on dependencies (so it might be an unpopular decision).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Be more permissive&lt;/p&gt;
&lt;p&gt;One option would be to give maintainers more time to fix issues. I haven’t found any report on how long it takes a package on CRAN to go from an error to a fix, but it is often quite long:
I have seen packages with warnings for months, if not years, that weren’t archived from CRAN.&lt;/p&gt;
&lt;p&gt;Another possibility would be to warn users, when installing a package, if that package or one of its dependencies does not pass all CRAN checks cleanly (without errors or warnings).
This might make users more aware of their dependencies, but it might also add pressure on maintainers who already don’t have enough time to fix the problems of their packages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide more help or tools to maintainers&lt;/p&gt;
&lt;p&gt;Another option is to provide a mechanism for maintainers to receive help fixing their packages.
Currently CRAN requires that new versions of packages that break their reverse dependencies give the other maintainers enough notice in advance to fix their packages.
On the &lt;a href=&#34;https://stat.ethz.ch/mailman/listinfo/r-package-devel&#34;&gt;R-pkg-devel mailing list&lt;/a&gt; there are often requests for help with submissions and with fixing errors detected by CRAN checks, which often result in other maintainers sharing their solutions to the same problem.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The high percentage of packages that, once archived, do not come back to CRAN might be a good place to start helping maintainers, and an opportunity for users to step in and help the maintainers of packages they have been using.
Is something else needed? How would that work?&lt;/p&gt;
&lt;p&gt;At the same time, it is admirable that after so many years there are so few errors in the data.
Still, the archival process might be a good candidate for automation: providing the reason on the package’s webpage, adding it to X-CRAN-Comment, and moving the comment to X-CRAN-History once the package is unarchived.
Knowing more about how the CRAN team performs these actions and how the community could help in the process would benefit everyone.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: This post was updated on 2022/01/02 to improve the parsing of package actions and dates. As a result, the first plot now includes unarchiving events, which slightly modified the second plot of reasons why packages are archived. Overall, this only affected the numbers in the plots, not the conclusions or the discussion.&lt;/p&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;details&gt;
&lt;pre&gt;&lt;code&gt;## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.2.0 (2022-04-22)
##  os       Ubuntu 20.04.4 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Europe/Madrid
##  date     2022-05-09
##  pandoc   2.17.1.1 @ /usr/lib/rstudio/bin/quarto/bin/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package      * version date (UTC) lib source
##  assertthat     0.2.1   2019-03-21 [1] CRAN (R 4.2.0)
##  blogdown       1.9     2022-03-28 [1] CRAN (R 4.2.0)
##  bookdown       0.26    2022-04-15 [1] CRAN (R 4.2.0)
##  bslib          0.3.1   2021-10-06 [1] CRAN (R 4.2.0)
##  cli            3.3.0   2022-04-25 [1] CRAN (R 4.2.0)
##  colorspace     2.0-3   2022-02-21 [1] CRAN (R 4.2.0)
##  ComplexUpset * 1.3.3   2021-12-11 [1] CRAN (R 4.2.0)
##  crayon         1.5.1   2022-03-26 [1] CRAN (R 4.2.0)
##  DBI            1.1.2   2021-12-20 [1] CRAN (R 4.2.0)
##  digest         0.6.29  2021-12-01 [1] CRAN (R 4.2.0)
##  dplyr        * 1.0.9   2022-04-28 [1] CRAN (R 4.2.0)
##  ellipsis       0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
##  evaluate       0.15    2022-02-18 [1] CRAN (R 4.2.0)
##  fansi          1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
##  farver         2.1.0   2021-02-28 [1] CRAN (R 4.2.0)
##  fastmap        1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
##  generics       0.1.2   2022-01-31 [1] CRAN (R 4.2.0)
##  ggplot2      * 3.3.6   2022-05-03 [1] CRAN (R 4.2.0)
##  glue           1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
##  gtable         0.3.0   2019-03-25 [1] CRAN (R 4.2.0)
##  highr          0.9     2021-04-16 [1] CRAN (R 4.2.0)
##  htmltools      0.5.2   2021-08-25 [1] CRAN (R 4.2.0)
##  jquerylib      0.1.4   2021-04-26 [1] CRAN (R 4.2.0)
##  jsonlite       1.8.0   2022-02-22 [1] CRAN (R 4.2.0)
##  knitr          1.39    2022-04-26 [1] CRAN (R 4.2.0)
##  labeling       0.4.2   2020-10-20 [1] CRAN (R 4.2.0)
##  lifecycle      1.0.1   2021-09-24 [1] CRAN (R 4.2.0)
##  magrittr       2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
##  munsell        0.5.0   2018-06-12 [1] CRAN (R 4.2.0)
##  patchwork      1.1.1   2020-12-17 [1] CRAN (R 4.2.0)
##  pillar         1.7.0   2022-02-01 [1] CRAN (R 4.2.0)
##  pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
##  purrr          0.3.4   2020-04-17 [1] CRAN (R 4.2.0)
##  R6             2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
##  rlang          1.0.2   2022-03-04 [1] CRAN (R 4.2.0)
##  rmarkdown      2.14    2022-04-25 [1] CRAN (R 4.2.0)
##  rstudioapi     0.13    2020-11-12 [1] CRAN (R 4.2.0)
##  sass           0.4.1   2022-03-23 [1] CRAN (R 4.2.0)
##  scales         1.2.0   2022-04-13 [1] CRAN (R 4.2.0)
##  sessioninfo    1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
##  stringi        1.7.6   2021-11-29 [1] CRAN (R 4.2.0)
##  stringr        1.4.0   2019-02-10 [1] CRAN (R 4.2.0)
##  tibble         3.1.7   2022-05-03 [1] CRAN (R 4.2.0)
##  tidyselect     1.1.2   2022-02-21 [1] CRAN (R 4.2.0)
##  utf8           1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
##  vctrs          0.4.1   2022-04-13 [1] CRAN (R 4.2.0)
##  withr          2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
##  xfun           0.30    2022-03-02 [1] CRAN (R 4.2.0)
##  yaml           2.3.5   2022-02-21 [1] CRAN (R 4.2.0)
## 
##  [1] /home/lluis/bin/R/4.2.0/lib/R/library
## 
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Packages submission and reviews; how does it work?</title>
      <link>https://llrs.dev/talk/user-2021/</link>
      <pubDate>Sat, 16 Oct 2021 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/talk/user-2021/</guid>
      <description>
&lt;script src=&#34;https://llrs.dev/talk/user-2021/index_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;The abstract I submitted, which was accepted, was:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We benefit from others’ work on R and also by shared packages and for our programming tasks. Occasionally we might generate some piece of software that we want to share with the community. Usually sharing our work with the R community means submitting a package to an archive (CRAN, Bioconductor or others). While each individual archive has some rules they share some common principles.&lt;/p&gt;
&lt;p&gt;If your package follows their rules about the submission process and has a good quality according to their rules it will be included. All submissions have some common sections: First, an initial screening; second, a more profound manual review of the code. Then, if the suggestions are applied or correctly replied then the package is included in the archive.&lt;/p&gt;
&lt;p&gt;On each step some rules and criteria are used to decide if the package moves forward or not. Understanding what these rules say, common problems and comments from reviewers will help avoiding submitting a package to get it rejected. Reducing the friction between sharing our work, providing useful packages to the community and minimizing reviewers’ time and efforts.&lt;/p&gt;
&lt;p&gt;Looking at the review process of three archives of R packages, CRAN, Bioconductor and rOpenSci, I’ll explain common rules, patterns, timelines and checks required to get the package included, as well as personal anecdotes with them. The talk is based on the post analyzing reviews available here: &lt;a href=&#34;https://llrs.dev/tags/reviews/&#34; class=&#34;uri&#34;&gt;https://llrs.dev/tags/reviews/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I received excellent feedback from the reviewers and was given a full talk (I had asked for a poster because I was nervous about presenting to a big audience).&lt;/p&gt;
&lt;p&gt;This talk also received one of the Accessibility Awards.&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
