<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>packages | B101nfo</title>
    <link>https://llrs.dev/tags/packages/</link>
      <atom:link href="https://llrs.dev/tags/packages/index.xml" rel="self" type="application/rss+xml" />
    <description>packages</description>
    <generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><copyright>If it is code you can copy and reuse (MIT) if it is text, please cite and reuse CC-BY 2024.</copyright><lastBuildDate>Thu, 29 Feb 2024 19:00:00 +0200</lastBuildDate>
    <image>
      <url>img/map[gravatar:%!s(bool=false) shape:circle]</url>
      <title>packages</title>
      <link>https://llrs.dev/tags/packages/</link>
    </image>
    
    <item>
      <title>useR madrid: rtweet</title>
      <link>https://llrs.dev/talk/user-madrid-rtweet/</link>
      <pubDate>Thu, 29 Feb 2024 19:00:00 +0200</pubDate>
      <guid>https://llrs.dev/talk/user-madrid-rtweet/</guid>
      <description>


&lt;p&gt;This presentation was in Spanish. I shared the history of my involvement with rtweet and what is happening with the package and the Twitter API.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>CRAN maintained packages</title>
      <link>https://llrs.dev/post/2023/05/03/cran-maintained-packages/</link>
      <pubDate>Wed, 03 May 2023 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/post/2023/05/03/cran-maintained-packages/</guid>
      <description>


&lt;p&gt;The role of package managers in software is paramount for developers.
In R the CRAN team provides a platform to test and host packages.
This means ensuring that R dependencies are up to date and that software required by some packages is also available in CRAN.&lt;/p&gt;
&lt;p&gt;This helps test ~20000 packages frequently (daily for most packages) in several architectures and R versions.
In addition, they test updates for compatibility with the dependencies and test and review new packages.&lt;/p&gt;
&lt;p&gt;Most of the work with packages is automated but often requires human intervention (&lt;a href=&#34;https://journal.r-project.org/news/RJ-2022-4-cran/#cran-package-submissions&#34;&gt;50% of the submissions&lt;/a&gt;).
Another time-consuming activity is keeping up packages abandoned by their original maintainers.&lt;/p&gt;
&lt;p&gt;While newer packages are &lt;a href=&#34;https://llrs.dev/post/2021/12/07/reasons-cran-archivals/&#34;&gt;archived from CRAN often&lt;/a&gt;, some old packages were adopted by CRAN.
The &lt;a href=&#34;https://cran.r-project.org/CRAN_team.htm&#34;&gt;CRAN team&lt;/a&gt; is &lt;a href=&#34;https://mastodon.social/@henrikbengtsson/110186925898457474&#34;&gt;looking for help&lt;/a&gt; maintaining those.&lt;/p&gt;
&lt;p&gt;In this post I’ll explore the packages maintained by CRAN.&lt;/p&gt;
&lt;div id=&#34;cran-in-packages&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;CRAN in packages&lt;/h1&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;packages_db &amp;lt;- as.data.frame(tools::CRAN_package_db())
cran_author &amp;lt;- grep(&amp;quot;CRAN Team&amp;quot;, x = packages_db$Author, ignore.case = TRUE)
cran_authorsR &amp;lt;- grep(&amp;quot;CRAN Team&amp;quot;, x = packages_db$`Authors@R`, ignore.case = TRUE)
CRAN_TEAM_mentioned &amp;lt;- union(cran_author, cran_authorsR)
unique(packages_db$Package[CRAN_TEAM_mentioned])
## [1] &amp;quot;fBasics&amp;quot;   &amp;quot;fMultivar&amp;quot; &amp;quot;geiger&amp;quot;    &amp;quot;plotrix&amp;quot;   &amp;quot;RCurl&amp;quot;     &amp;quot;RJSONIO&amp;quot;  
## [7] &amp;quot;udunits2&amp;quot;  &amp;quot;XML&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In some of these packages the CRAN team appears as a contributor because they provided help/code to fix bugs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=geiger&#34;&gt;geiger&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=fMultivar&#34;&gt;fMultivar&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=fBasics&#34;&gt;fBasics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=udunits2&#34;&gt;udunits2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In others they are the maintainers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=XML&#34;&gt;XML&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=RCurl&#34;&gt;RCurl&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=RJSONIO&#34;&gt;RJSONIO&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of these three packages, RJSONIO is the newest (first release in 2010) and requires fewer updates (lately 1 or 2 a year).
However, in 2022 RCurl and XML required 4 and 5 updates respectively.
I will focus on these two packages, as they are the ones for which the CRAN team is looking for new maintainers.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;rcurl-and-xml&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;RCurl and XML&lt;/h1&gt;
&lt;div id=&#34;circular-dependency&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Circular dependency&lt;/h2&gt;
&lt;p&gt;Both XML and RCurl depend on each other.&lt;/p&gt;
&lt;p&gt;We can see that the packages are direct dependencies of one of their direct dependencies!
How can that be?
If we go to the &lt;a href=&#34;https://cran.r-project.org/package=RCurl&#34;&gt;RCurl&lt;/a&gt; website we see “Suggests: XML”, and on the &lt;a href=&#34;https://cran.r-project.org/package=XML&#34;&gt;XML&lt;/a&gt; website RCurl is listed too.
This circular dependency is allowed because they have each other in Suggests.&lt;/p&gt;
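&lt;p&gt;To see why a cycle in Suggests is not a problem, here is a minimal sketch with a made-up database (&lt;code&gt;pkgA&lt;/code&gt;, &lt;code&gt;pkgB&lt;/code&gt; and &lt;code&gt;pkgC&lt;/code&gt; are hypothetical names, not real packages). The cycle only shows up when we ask for the Suggests field, not for the strong dependencies (Depends, Imports and LinkingTo):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;tools&amp;quot;)
# Toy database with the dependency columns found in available.packages()
toy_db &amp;lt;- cbind(Package = c(&amp;quot;pkgA&amp;quot;, &amp;quot;pkgB&amp;quot;),
                Depends = c(NA, NA),
                Imports = c(&amp;quot;pkgC&amp;quot;, NA),
                LinkingTo = c(NA, NA),
                Suggests = c(&amp;quot;pkgB&amp;quot;, &amp;quot;pkgA&amp;quot;),
                Enhances = c(NA, NA))
rownames(toy_db) &amp;lt;- toy_db[, &amp;quot;Package&amp;quot;]
pkgs &amp;lt;- c(&amp;quot;pkgA&amp;quot;, &amp;quot;pkgB&amp;quot;)
# Strong dependencies: no link between pkgA and pkgB
strong_toy &amp;lt;- package_dependencies(pkgs, db = toy_db, which = &amp;quot;strong&amp;quot;)
# Suggests: each package lists the other one
suggests_toy &amp;lt;- package_dependencies(pkgs, db = toy_db, which = &amp;quot;Suggests&amp;quot;)
# TRUE when the two packages suggest each other
cycle &amp;lt;- &amp;quot;pkgB&amp;quot; %in% suggests_toy$pkgA &amp;amp;&amp;amp; &amp;quot;pkgA&amp;quot; %in% suggests_toy$pkgB&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because suggested packages are not needed to install a package, neither of the two has to be installed before the other.&lt;/p&gt;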
&lt;p&gt;A first step to reduce any possible problem would be to separate them.
This would make it easier to understand which package is worth prioritizing, and possible missteps would have less impact.&lt;/p&gt;
&lt;p&gt;If we look at &lt;a href=&#34;https://github.com/search?q=repo%3Acran%2FXML%20RCurl&amp;amp;type=code&#34;&gt;XML source code for RCurl we find&lt;/a&gt; some code in the &lt;code&gt;inst/&lt;/code&gt; folder.
If these two cases were removed the package could remove its dependency on RCurl.&lt;/p&gt;
&lt;p&gt;Similarly, if we look at &lt;a href=&#34;https://github.com/search?q=repo%3Acran%2FRCurl%20XML&amp;amp;type=code&#34;&gt;RCurl source code for XML we find&lt;/a&gt; some code in the &lt;code&gt;inst/&lt;/code&gt; folder and in some examples.
If these three cases were removed the package could remove its dependency on XML.&lt;/p&gt;
&lt;p&gt;RCurl has been &lt;a href=&#34;https://diffify.com/R/RCurl/1.95-4.9/1.98-1.12&#34;&gt;more stable&lt;/a&gt; than XML, which has seen &lt;a href=&#34;https://diffify.com/R/XML/3.98-1.7/3.99-0.14&#34;&gt;new functions added and one removed&lt;/a&gt; since CRAN started maintaining it.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;relevant-data&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Relevant data&lt;/h2&gt;
&lt;p&gt;We will look at 4 sets of data for each package: &lt;a href=&#34;#dependencies&#34;&gt;dependencies&lt;/a&gt;, &lt;a href=&#34;#releases&#34;&gt;releases&lt;/a&gt;, &lt;a href=&#34;#maintainers&#34;&gt;maintainers&lt;/a&gt; and &lt;a href=&#34;#downloads&#34;&gt;downloads&lt;/a&gt;.&lt;/p&gt;
&lt;div id=&#34;dependencies&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Dependencies&lt;/h3&gt;
&lt;p&gt;Both packages have some system dependencies which might make the maintenance harder.
In addition they have a large number of dependencies.
We can gather the dependencies in CRAN and Bioconductor software packages:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;tools&amp;quot;)
# Look up only software dependencies in Bioconductor
options(repos = BiocManager::repositories()[c(&amp;quot;BioCsoft&amp;quot;, &amp;quot;CRAN&amp;quot;)])
ap &amp;lt;- available.packages()
all_deps &amp;lt;- package_dependencies(c(&amp;quot;RCurl&amp;quot;, &amp;quot;XML&amp;quot;), 
                                 reverse = TRUE, db = ap, which = &amp;quot;all&amp;quot;)
all_unique_deps &amp;lt;- unique(unlist(all_deps, FALSE, FALSE))
first_deps &amp;lt;- package_dependencies(all_unique_deps, db = ap, which = &amp;quot;all&amp;quot;)
first_deps_strong &amp;lt;- package_dependencies(all_unique_deps, db = ap, which = &amp;quot;strong&amp;quot;)
strong &amp;lt;- sapply(first_deps_strong, function(x){any(c(&amp;quot;XML&amp;quot;, &amp;quot;RCurl&amp;quot;) %in% x)})
deps_strong &amp;lt;- package_dependencies(all_unique_deps, recursive = TRUE, 
                                 db = ap, which = &amp;quot;strong&amp;quot;)
first_rdeps &amp;lt;- package_dependencies(all_unique_deps, 
                                   reverse = TRUE, db = ap, which = &amp;quot;all&amp;quot;)
deps_all &amp;lt;- package_dependencies(all_unique_deps, recursive = TRUE, 
                                 db = ap, which = &amp;quot;all&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;They have 495 direct dependencies (and 8 more in annotation packages in Bioconductor: recount3, ENCODExplorerData, UCSCRepeatMasker, gDNAinRNAseqData, qdap, qdapTools, metaboliteIDmapping and curatedBreastData).&lt;/p&gt;
&lt;p&gt;These two packages with their dependencies are used one way or another by around 20000 packages (about 90% of CRAN and Bioconductor)!
If these packages fail the impact on the community will be huge.&lt;/p&gt;
&lt;p&gt;To reduce the impact of the dependencies we should look up the direct dependencies.
But we also looked at the reverse dependencies to assess the impact of the package on the other packages.&lt;/p&gt;
&lt;p&gt;Knowing which these are, and who maintains them, will help decide the best course of action.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;releases&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Releases&lt;/h3&gt;
&lt;p&gt;A first approach is looking into the number of releases and dates to assess if the package has an active maintainer or not:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;archive &amp;lt;- tools:::CRAN_archive_db()[all_unique_deps]
packages &amp;lt;- tools::CRAN_package_db()
library(&amp;quot;dplyr&amp;quot;)
library(&amp;quot;BiocPkgTools&amp;quot;)
fr &amp;lt;- vapply(archive, function(x) {
  if (is.null(x)) {
    return(NA)
  }
  as.Date(x$mtime[1])
}, FUN.VALUE = Sys.Date())
fr_bioc &amp;lt;- biocDownloadStats(&amp;quot;software&amp;quot;) |&amp;gt; 
  filter(Package %in% all_unique_deps) |&amp;gt; 
  firstInBioc() |&amp;gt; 
  pull(Date, name = Package)
first_release &amp;lt;- c(as.Date(fr[!is.na(fr)]), as.Date(fr_bioc))[all_unique_deps]
last_update &amp;lt;- packages$Published[match(all_unique_deps, packages$Package)]
releases &amp;lt;- vapply(archive, NROW, numeric(1L)) + 1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We only have information about CRAN packages:&lt;br /&gt;
Bioconductor has two releases every year, and while the maintainers can release patched versions of packages between them, that information is not stored (or at least not easily retrieved: the versions are still available in the &lt;a href=&#34;https://code.bioconductor.org&#34;&gt;git server&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Even if Bioconductor maintainers didn’t modify the package the version number increases with each release.
But the version update in the git doesn’t propagate to users automatically unless their checks pass.
For all these reasons it doesn’t make sense to count releases of packages in Bioconductor.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;maintainers&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Maintainers&lt;/h3&gt;
&lt;p&gt;Now that we know which packages are more active, we can look up the people behind them.
This way we can prioritize working with maintainers that are known to be active&lt;a href=&#34;#fn1&#34; class=&#34;footnote-ref&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;maintainers &amp;lt;- packages_db$Maintainer[match(all_unique_deps, packages_db$Package)]
maintainers &amp;lt;- trimws(gsub(&amp;quot;&amp;lt;.+&amp;gt;&amp;quot;, &amp;quot;&amp;quot;, maintainers))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once again, the Bioconductor repository doesn’t provide a file to gather this kind of data.&lt;/p&gt;
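&lt;p&gt;As a small illustration of the cleaning step above, the regular expression drops the e-mail address between angle brackets and &lt;code&gt;trimws()&lt;/code&gt; removes the leftover space. The maintainer below is a made-up example:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical Maintainer field, in the usual name plus e-mail format
example_maintainer &amp;lt;- &amp;quot;Jane Doe &amp;lt;jane.doe@example.org&amp;gt;&amp;quot;
# Remove everything between the angle brackets, then the leftover whitespace
clean_maintainer &amp;lt;- trimws(gsub(&amp;quot;&amp;lt;.+&amp;gt;&amp;quot;, &amp;quot;&amp;quot;, example_maintainer))
clean_maintainer
## [1] &amp;quot;Jane Doe&amp;quot;&lt;/code&gt;&lt;/pre&gt;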
&lt;/div&gt;
&lt;div id=&#34;downloads&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Downloads&lt;/h3&gt;
&lt;p&gt;Another variable we can use is the number of downloads of said packages.
Packages that are downloaded more are probably used more, so a breaking change in them will affect more people than a change in other packages.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;cranlogs&amp;quot;)
acd &amp;lt;- cran_downloads(intersect(all_unique_deps, packages_db$Package), 
                          when = &amp;quot;last-month&amp;quot;)
cran_pkg &amp;lt;- summarise(acd, downloads = sum(count), .by = package)
loc &amp;lt;- Sys.setlocale(locale = &amp;quot;C&amp;quot;)
bioc_d &amp;lt;- vapply(setdiff(all_unique_deps, packages_db$Package), function(x){
  pkg &amp;lt;- pkgDownloadStats(x)
  tail(pkg$Nb_of_downloads, 1)
  }, numeric(1L))
bioc_pkg &amp;lt;- data.frame(package = names(bioc_d), downloads = bioc_d)
downloads &amp;lt;- rbind(bioc_pkg, cran_pkg)
rownames(downloads) &amp;lt;- downloads$package
dwn &amp;lt;- downloads[all_unique_deps, ]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The logs are provided by the global mirror of CRAN (sponsored by RStudio).&lt;br /&gt;
The Bioconductor infrastructure provides the total number of downloads and the number of downloads from distinct IPs&lt;a href=&#34;#fn2&#34; class=&#34;footnote-ref&#34; id=&#34;fnref2&#34;&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;analysis&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Analysis&lt;/h2&gt;
&lt;p&gt;We collected the data that might be relevant.
Now, we can start looking at all the data gathered:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;repo &amp;lt;- vector(&amp;quot;character&amp;quot;, length(all_unique_deps))
ap_deps &amp;lt;- ap[all_unique_deps, ]
repo[startsWith(ap_deps[, &amp;quot;Repository&amp;quot;], &amp;quot;https://bioc&amp;quot;)] &amp;lt;- &amp;quot;Bioconductor&amp;quot;
repo[!startsWith(ap_deps[, &amp;quot;Repository&amp;quot;], &amp;quot;https://bioc&amp;quot;)] &amp;lt;- &amp;quot;CRAN&amp;quot;
deps &amp;lt;- data.frame(package = all_unique_deps,
                   direct_dep_XML = all_unique_deps %in% all_deps$XML,
                   direct_dep_RCurl = all_unique_deps %in% all_deps$RCurl,
                   first_deps_n = lengths(first_deps),
                   deps_all_n = lengths(deps_all),
                   first_rdeps_n = lengths(first_rdeps),
                   first_deps_strong_n = lengths(first_deps_strong), 
                   deps_strong_n = lengths(deps_strong),
                   direct_strong = strong, 
                   releases = releases,
                   strong = strong, 
                   first_release = first_release,
                   last_release = last_update,
                   maintainer = maintainers,
                   downloads = dwn$downloads,
                   repository = repo) |&amp;gt; 
  mutate(type = case_when(direct_dep_XML &amp;amp; direct_dep_RCurl ~ &amp;quot;both&amp;quot;,
                          direct_dep_XML ~ &amp;quot;XML&amp;quot;,
                          direct_dep_RCurl ~ &amp;quot;RCurl&amp;quot;))
rownames(deps) &amp;lt;- NULL
head(deps)&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;colgroup&gt;
&lt;col width=&#34;8%&#34; /&gt;
&lt;col width=&#34;6%&#34; /&gt;
&lt;col width=&#34;7%&#34; /&gt;
&lt;col width=&#34;5%&#34; /&gt;
&lt;col width=&#34;5%&#34; /&gt;
&lt;col width=&#34;6%&#34; /&gt;
&lt;col width=&#34;9%&#34; /&gt;
&lt;col width=&#34;6%&#34; /&gt;
&lt;col width=&#34;6%&#34; /&gt;
&lt;col width=&#34;4%&#34; /&gt;
&lt;col width=&#34;3%&#34; /&gt;
&lt;col width=&#34;6%&#34; /&gt;
&lt;col width=&#34;5%&#34; /&gt;
&lt;col width=&#34;5%&#34; /&gt;
&lt;col width=&#34;4%&#34; /&gt;
&lt;col width=&#34;5%&#34; /&gt;
&lt;col width=&#34;2%&#34; /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;package&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;direct_dep_XML&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;direct_dep_RCurl&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;first_deps_n&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;deps_all_n&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;first_rdeps_n&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;first_deps_strong_n&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;deps_strong_n&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;direct_strong&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;releases&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;strong&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;first_release&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;last_release&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;maintainer&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;downloads&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;repository&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;AnnotationForge&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;26&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2456&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;5&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;10&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;47&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;2012-02-01&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;8113&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Bioconductor&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;both&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;AnnotationHubData&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;33&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2456&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;26&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;136&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;2015-02-01&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;6619&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Bioconductor&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;both&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;autonomics&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;61&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2499&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;34&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;104&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;2021-02-01&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;91&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Bioconductor&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;RCurl&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;BaseSpaceR&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;6&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2456&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;3&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;2013-02-01&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;218&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Bioconductor&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;RCurl&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;BayesSpace&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;34&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2459&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;24&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;161&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;2020-02-01&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;221&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Bioconductor&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;RCurl&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;BgeeDB&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;19&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2457&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;14&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;71&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;2016-02-01&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;238&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Bioconductor&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;RCurl&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;I added some numbers and logical values that might help exploring this data.&lt;/p&gt;
&lt;p&gt;We will look at the &lt;a href=&#34;#distribution-dependencies&#34;&gt;distribution of dependencies between RCurl and XML&lt;/a&gt; and at some plots to get a &lt;a href=&#34;#overview&#34;&gt;quick view&lt;/a&gt;.&lt;/p&gt;
&lt;div id=&#34;distribution-dependencies&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Distribution dependencies&lt;/h3&gt;
&lt;p&gt;Let’s see how many packages depend on each of them:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;deps |&amp;gt; 
  summarise(Packages = n(), deps = sum(first_deps_n),
            q25 = quantile(deps_all_n, probs = 0.25),
            mean_all = mean(deps_all_n),
            q75 = quantile(deps_all_n, probs = 0.75),
            .by = c(direct_dep_XML, direct_dep_RCurl)) |&amp;gt; 
  arrange(-Packages)&lt;/code&gt;&lt;/pre&gt;
&lt;table style=&#34;width:100%;&#34;&gt;
&lt;colgroup&gt;
&lt;col width=&#34;22%&#34; /&gt;
&lt;col width=&#34;25%&#34; /&gt;
&lt;col width=&#34;13%&#34; /&gt;
&lt;col width=&#34;7%&#34; /&gt;
&lt;col width=&#34;7%&#34; /&gt;
&lt;col width=&#34;13%&#34; /&gt;
&lt;col width=&#34;10%&#34; /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;direct_dep_XML&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;direct_dep_RCurl&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;Packages&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;deps&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;q25&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;mean_all&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;q75&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;235&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;3584&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2456&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2365.596&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2458.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;FALSE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;193&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;3187&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2456&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2320.855&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2460.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;67&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1216&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2456&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2423.119&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2457.5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;There are ~40 more packages depending on XML than on RCurl (235 vs 193), and just 67 depend on both of them.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;overview&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;We can plot some variables to get a quick overview of the packages:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;ggplot2&amp;quot;)
library(&amp;quot;ggrepel&amp;quot;)
deps_wo &amp;lt;- filter(deps, !package %in% c(&amp;quot;XML&amp;quot;, &amp;quot;RCurl&amp;quot;))
deps_wo |&amp;gt; 
  ggplot() +
  geom_point(aes(first_deps_n, downloads, shape = type)) +
  geom_text_repel(aes(first_deps_n, downloads, label = package),
                  data = filter(deps_wo, first_deps_n &amp;gt; 40 | downloads &amp;gt; 10^5)) +
  theme_minimal() +
  scale_y_log10(labels = scales::label_log()) +
  labs(title = &amp;quot;Packages and downloads&amp;quot;, 
       x = &amp;quot;Direct dependencies&amp;quot;, y = &amp;quot;Downloads&amp;quot;, size = &amp;quot;Packages&amp;quot;)
## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:plot1&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/plot1-1.png&#34; alt=&#34;Direct dependencies vs downloads. Many packages have up to 50 direct dependencies and most have below 1000 downloads in a month.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: Direct dependencies vs downloads. Many packages have up to 50 direct dependencies and most have below 1000 downloads in a month.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There is an outlier in Figure &lt;a href=&#34;#fig:plot1&#34;&gt;1&lt;/a&gt;: the mlr package has more than 10k downloads and close to 120 direct dependencies, but fewer than 15 strong dependencies!&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;deps_wo |&amp;gt; 
  ggplot() +
  geom_point(aes(first_deps_n, first_rdeps_n, shape = type)) +
  geom_text_repel(aes(first_deps_n, first_rdeps_n, label = package),
                  data = filter(deps_wo, first_deps_n &amp;gt; 60 | first_rdeps_n &amp;gt; 50)) +
  theme_minimal() +
  scale_y_log10(labels = scales::label_log()) +
  labs(title = &amp;quot;Few dependencies but lots of dependents&amp;quot;,
    x = &amp;quot;Direct dependencies&amp;quot;, y = &amp;quot;Depend on them&amp;quot;, size = &amp;quot;Packages&amp;quot;)
## Warning: Transformation introduced infinite values in continuous y-axis
## Transformation introduced infinite values in continuous y-axis&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:plot2&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/plot2-1.png&#34; alt=&#34;Dependencies vs packages that depend on them. &#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 2: Dependencies vs packages that depend on them.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In general, though, the packages that have more packages depending on them have fewer direct dependencies.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;ggplot2&amp;quot;)
library(&amp;quot;ggrepel&amp;quot;)
deps_wo &amp;lt;- filter(deps, !package %in% c(&amp;quot;XML&amp;quot;, &amp;quot;RCurl&amp;quot;))
deps_wo |&amp;gt; 
  ggplot() +
  geom_vline(xintercept = 20, linetype = 2) +
  geom_point(aes(first_deps_strong_n, downloads, shape = repository)) +
  geom_text_repel(aes(first_deps_strong_n, downloads, label = package),
                  data = filter(deps_wo, first_deps_strong_n &amp;gt; 20 | downloads &amp;gt; 10^5)) +
  theme_minimal() +
  scale_y_log10(labels = scales::label_log()) +
  labs(title = &amp;quot;Packages and downloads&amp;quot;, 
       x = &amp;quot;Direct strong dependencies&amp;quot;, y = &amp;quot;Downloads&amp;quot;, shape = &amp;quot;Repository&amp;quot;)
## Warning: ggrepel: 20 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:plot3&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/plot3-1.png&#34; alt=&#34;Direct strong dependencies vs downloads. Many packages have more than 20 direct imports.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 3: Direct strong dependencies vs downloads. Many packages have more than 20 direct imports.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;One observable effect is that many packages do not comply with the current CRAN limit of 20 strong dependencies (as &lt;a href=&#34;https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#index-_005fR_005fCHECK_005fEXCESSIVE_005fIMPORTS_005f&#34;&gt;described in R-internals&lt;/a&gt;).
This suggests that these CRAN packages are old or that this limit is not checked in package updates.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;data_maintainers &amp;lt;- deps_wo |&amp;gt; 
  filter(!is.na(maintainer)) |&amp;gt; 
  summarize(n = n(), downloads = sum(downloads), .by = maintainer)
data_maintainers |&amp;gt; 
  ggplot() +
  geom_point(aes(n, downloads)) +
  geom_text_repel(aes(n, downloads, label = maintainer),
                  data = filter(data_maintainers, n &amp;gt; 2 | downloads &amp;gt; 10^4)) +
  scale_y_log10(labels = scales::label_log()) +
  scale_x_continuous(breaks = 1:10, minor_breaks = NULL) +
  theme_minimal() +
  labs(title = &amp;quot;CRAN maintainers that depend on XML and RCurl&amp;quot;,
       x = &amp;quot;Packages&amp;quot;, y = &amp;quot;Downloads&amp;quot;)
## Warning: ggrepel: 15 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:plot-maintainers&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/plot-maintainers-1.png&#34; alt=&#34;Looking at maintainers and the number of downloads they have.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 4: Looking at maintainers and the number of downloads they have.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Most maintainers have few packages, some of them highly used, but some maintainers have many packages that are relatively highly used.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;finding-important-packages&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Finding important packages&lt;/h3&gt;
&lt;p&gt;We can use a PCA to find which packages are more important.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cols_pca &amp;lt;-  c(4:7, 15)
pca_all &amp;lt;- prcomp(deps_wo[, cols_pca], scale. = TRUE, center = TRUE)
summary(pca_all)
## Importance of components:
##                          PC1    PC2    PC3     PC4     PC5
## Standard deviation     1.386 1.2478 0.9458 0.65380 0.44846
## Proportion of Variance 0.384 0.3114 0.1789 0.08549 0.04022
## Cumulative Proportion  0.384 0.6954 0.8743 0.95978 1.00000
pca_data &amp;lt;- cbind(pca_all$x, deps_wo)
ggplot(pca_data) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point(aes(PC1, PC2, col = repository, shape = repository)) +
  geom_text_repel(aes(PC1, PC2, label = package), 
                  data = filter(pca_data, abs(PC1) &amp;gt; 2 | abs(PC2) &amp;gt; 2)) +
  theme_minimal() +
  theme(axis.text = element_blank()) +
  labs(title = &amp;quot;PCA of the numeric variables&amp;quot;)
## Warning: ggrepel: 58 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:pca-all&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/pca-all-1.png&#34; alt=&#34;PCA of all packages.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 5: PCA of all packages.
&lt;/p&gt;
&lt;/div&gt;
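&lt;p&gt;In this first PCA we can see some packages that have many downloads and/or depend on many packages.
The loadings in &lt;code&gt;rotation&lt;/code&gt; show which variables drive each component: the first one mostly reflects &lt;code&gt;first_deps_n&lt;/code&gt; and &lt;code&gt;first_deps_strong_n&lt;/code&gt;, and the second one &lt;code&gt;first_rdeps_n&lt;/code&gt; and &lt;code&gt;downloads&lt;/code&gt;:&lt;/p&gt;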
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pca_all$rotation[, 1:2]&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;PC1&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;PC2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;first_deps_n&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.6521642&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.1528947&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;deps_all_n&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.3304698&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.0549046&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;first_rdeps_n&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.1235972&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.6948659&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;first_deps_strong_n&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.6606765&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.0750116&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;downloads&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.1170554&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.6965223&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
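&lt;p&gt;To read a loadings table like this one, it can help to pick, for each component, the variable with the largest absolute loading.
A minimal sketch on simulated data (the &lt;code&gt;toy&lt;/code&gt; data frame and its column names are invented for illustration and are not the data used in this post):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(42)
# Simulated numeric variables, not the real package data
toy &amp;lt;- data.frame(deps = rpois(50, 5),
                  rdeps = rpois(50, 20),
                  downloads = rlnorm(50, 8))
pca_toy &amp;lt;- prcomp(toy, scale. = TRUE, center = TRUE)
# Variable with the largest absolute loading on each component
top_var &amp;lt;- apply(abs(pca_toy$rotation), 2, function(x) names(which.max(x)))
# Share of the variance explained by each component
var_explained &amp;lt;- pca_toy$sdev^2 / sum(pca_toy$sdev^2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The sign of a loading is arbitrary, so only its absolute size is compared here.&lt;/p&gt;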
&lt;p&gt;More importantly, among the packages labelled in figure &lt;a href=&#34;#fig:pca-all&#34;&gt;5&lt;/a&gt; are RUnit, markdown and rgeos, which have a high number of downloads and many packages depend on them one way or another.&lt;/p&gt;
&lt;p&gt;However, we can focus on the packages that would not work without RCurl or XML:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pca_strong &amp;lt;- prcomp(deps_wo[deps_wo$strong, cols_pca], 
                     scale. = TRUE, center = TRUE)
summary(pca_strong)
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5
## Standard deviation     1.4198 1.3005 0.9373 0.49421 0.41258
## Proportion of Variance 0.4032 0.3382 0.1757 0.04885 0.03404
## Cumulative Proportion  0.4032 0.7414 0.9171 0.96596 1.00000
pca_data_strong &amp;lt;- cbind(pca_strong$x, deps_wo[deps_wo$strong, ])
ggplot(pca_data_strong) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point(aes(PC1, PC2, col = repository, shape = repository)) +
    geom_text_repel(aes(PC1, PC2, label = package), 
                  data = filter(pca_data_strong, abs(PC1) &amp;gt; 2 | abs(PC2) &amp;gt; 2)) +
  theme_minimal() +
  theme(axis.text = element_blank()) +
  labs(title = &amp;quot;Important packages depending on XML and RCurl&amp;quot;, 
       subtitle = &amp;quot;PCA of numeric variables of strong dependencies&amp;quot;,
       col = &amp;quot;Repository&amp;quot;, shape = &amp;quot;Repository&amp;quot;)
## Warning: ggrepel: 42 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:pca-strong&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/pca-strong-1.png&#34; alt=&#34;PCA of packages with strong dependency to XML or RCurl.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 6: PCA of packages with strong dependency to XML or RCurl.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The main packages that depend on XML and RCurl are from Bioconductor, followed by mlr and rlist.
rlist depends on XML but only uses 3 functions from it.
mlr uses 5 different functions from XML.&lt;/p&gt;
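&lt;p&gt;One way to get such counts is to search the source code of a package for namespaced calls.
The sketch below does this on a few made-up lines of code; it is not necessarily how the numbers above were obtained, and it would miss functions imported through the NAMESPACE file.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Made-up source lines standing in for the R files of a package
src &amp;lt;- c(&amp;quot;doc &amp;lt;- XML::xmlParse(path)&amp;quot;,
         &amp;quot;root &amp;lt;- XML::xmlRoot(doc)&amp;quot;,
         &amp;quot;vals &amp;lt;- XML::xpathSApply(doc, query, XML::xmlValue)&amp;quot;,
         &amp;quot;again &amp;lt;- XML::xmlParse(other)&amp;quot;)
# Extract every XML::something and keep the distinct function names
calls &amp;lt;- unlist(regmatches(src, gregexpr(&amp;quot;XML::[[:alnum:]._]+&amp;quot;, src)))
used &amp;lt;- sort(unique(sub(&amp;quot;^XML::&amp;quot;, &amp;quot;&amp;quot;, calls)))
length(used)&lt;/code&gt;&lt;/pre&gt;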
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pca_weak &amp;lt;- prcomp(deps_wo[!deps_wo$strong, cols_pca], 
                   scale. = TRUE, center = TRUE)
summary(pca_weak)
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5
## Standard deviation     1.4500 1.1578 0.9901 0.63980 0.40895
## Proportion of Variance 0.4205 0.2681 0.1960 0.08187 0.03345
## Cumulative Proportion  0.4205 0.6886 0.8847 0.96655 1.00000
pca_data_weak &amp;lt;- cbind(pca_weak$x, deps_wo[!deps_wo$strong, ])
ggplot(pca_data_weak) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point(aes(PC1, PC2, col = type, shape = type)) +
  geom_text_repel(aes(PC1, PC2, label = package), 
                  data = filter(pca_data_weak, abs(PC1)&amp;gt; 2 | abs(PC2) &amp;gt; 2)) +
  theme_minimal() +
  theme(axis.text = element_blank()) +
  labs(title = &amp;quot;PCA of packages in CRAN&amp;quot;, col = &amp;quot;Type&amp;quot;, shape = &amp;quot;Type&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:pca-weak&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/pca-weak-1.png&#34; alt=&#34;Packages with weak dependency to XML or RCurl.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 7: Packages with weak dependency to XML or RCurl.
&lt;/p&gt;
&lt;/div&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;keep &amp;lt;- deps_wo$repository == &amp;quot;CRAN&amp;quot; &amp;amp; deps_wo$strong
pca_cran &amp;lt;- prcomp(deps_wo[keep, cols_pca], 
                     scale. = TRUE, center = TRUE)
summary(pca_cran)
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5
## Standard deviation     1.4174 1.3060 0.9244 0.51813 0.40278
## Proportion of Variance 0.4018 0.3412 0.1709 0.05369 0.03245
## Cumulative Proportion  0.4018 0.7430 0.9139 0.96755 1.00000
pca_data_strong &amp;lt;- cbind(pca_cran$x, deps_wo[keep, ])
ggplot(pca_data_strong) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point(aes(PC1, PC2, col = type, shape = type)) +
    geom_text_repel(aes(PC1, PC2, label = package), 
                  data = filter(pca_data_strong, abs(PC1) &amp;gt; 2 | abs(PC2) &amp;gt; 2)) +
  theme_minimal() +
  theme(axis.text = element_blank()) +
  labs(title = &amp;quot;Packages in CRAN&amp;quot;, 
       col = &amp;quot;Type&amp;quot;, shape = &amp;quot;Type&amp;quot;)
## Warning: ggrepel: 26 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:pca-cran&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/pca-cran-1.png&#34; alt=&#34;PCA of packages on CRAN.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 8: PCA of packages on CRAN.
&lt;/p&gt;
&lt;/div&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;keep &amp;lt;- deps_wo$repository == &amp;quot;Bioconductor&amp;quot;  &amp;amp; deps_wo$strong
pca_bioc &amp;lt;- prcomp(deps_wo[keep, cols_pca], 
                     scale. = TRUE, center = TRUE)
summary(pca_bioc)
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5
## Standard deviation     1.4913 1.3703 0.8495 0.33584 0.25281
## Proportion of Variance 0.4448 0.3755 0.1443 0.02256 0.01278
## Cumulative Proportion  0.4448 0.8203 0.9647 0.98722 1.00000
pca_data_strong &amp;lt;- cbind(pca_bioc$x, deps_wo[keep, ])
ggplot(pca_data_strong) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point(aes(PC1, PC2, col = type, shape = type)) +
    geom_text_repel(aes(PC1, PC2, label = package), 
                  data = filter(pca_data_strong, abs(PC1) &amp;gt; 2 | abs(PC2) &amp;gt; 2)) +
  theme_minimal() +
  theme(axis.text = element_blank()) +
  labs(title = &amp;quot;Packages in Bioconductor&amp;quot;, 
       subtitle = &amp;quot;PCA of numeric variables of strong dependencies&amp;quot;,
       col = &amp;quot;Type&amp;quot;, shape = &amp;quot;Type&amp;quot;)
## Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:pca-bioc&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/pca-bioc-1.png&#34; alt=&#34;PCA of packages on Bioconductor.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 9: PCA of packages on Bioconductor.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;GenomeInfoDb seems to be the most important package, and it only uses the &lt;code&gt;RCurl::getURL&lt;/code&gt; function.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;outro&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Outro&lt;/h2&gt;
&lt;p&gt;I wanted to explore a bit how these packages got into this position &lt;a href=&#34;#fn3&#34; class=&#34;footnote-ref&#34; id=&#34;fnref3&#34;&gt;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;deps |&amp;gt; 
  filter(strong) |&amp;gt; 
  ggplot() +
  geom_vline(xintercept = as.Date(&amp;quot;2013-06-15&amp;quot;), linetype = 2) +
  geom_point(aes(first_release, downloads, col = type, shape = type, 
                 size = first_deps_strong_n)) +
  geom_label(aes(first_release, downloads, label = package),
             data = filter(deps, package %in% c(&amp;quot;XML&amp;quot;, &amp;quot;RCurl&amp;quot;)), show.legend = FALSE) +
  theme_minimal() +
  scale_y_log10(labels = scales::label_log()) +
  annotate(&amp;quot;text&amp;quot;, x = as.Date(&amp;quot;2014-6-15&amp;quot;), y = 5*10^5, 
           label = &amp;quot;CRAN maintained&amp;quot;, hjust = 0) +
  labs(x = &amp;quot;Release date&amp;quot;, y = &amp;quot;Downloads&amp;quot;, 
       title = &amp;quot;More packages added after CRAN maintenance than before&amp;quot;,
       subtitle = &amp;quot;Release date and downloads&amp;quot;,
       col = &amp;quot;Depends on&amp;quot;, shape = &amp;quot;Depends on&amp;quot;, size = &amp;quot;Direct strong dependencies&amp;quot;) 
## Warning: Removed 34 rows containing missing values (`geom_point()`).&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:deps-time&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2023/05/03/cran-maintained-packages/index.en_files/figure-html/deps-time-1.png&#34; alt=&#34;First release of packages in relation to the maintenance by CRAN of XML and RCurl.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 10: First release of packages in relation to the maintenance by CRAN of XML and RCurl.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The CRAN team has maintained these packages for almost as long as the previous maintainer(s?) did.&lt;/p&gt;
&lt;p&gt;Next, we look at the dependencies added after CRAN started maintaining them:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;summarize(deps_wo,
          before = sum(first_release &amp;lt;= as.Date(&amp;quot;2013-06-15&amp;quot;), na.rm = TRUE), 
          later = sum(first_release &amp;gt; as.Date(&amp;quot;2013-06-15&amp;quot;), na.rm = TRUE),
          .by = type)&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;type&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;before&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;later&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;both&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;14&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;52&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;RCurl&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;21&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;150&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;XML&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;63&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;156&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;More packages were released after the CRAN team took over maintenance than before.
Maybe package authors trusted the CRAN team with their dependencies, or there was no alternative for the functionality.
This might also be explained by the expansion of CRAN (and Bioconductor), with more packages being added each day.
However, this places further pressure on the CRAN team to maintain those packages. Removing this burden might free up more of their time, for themselves or for CRAN.&lt;/p&gt;
&lt;p&gt;A replacement for XML could be &lt;a href=&#34;https://cran.r-project.org/package=xml2&#34;&gt;xml2&lt;/a&gt;, first released in 2015 (which uses the same system dependency libxml2).&lt;br /&gt;
A replacement for RCurl could be &lt;a href=&#34;https://cran.r-project.org/package=curl&#34;&gt;curl&lt;/a&gt;, first released at the end of 2014 (which uses the same system dependency libcurl).&lt;/p&gt;
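&lt;p&gt;For simple tasks the translation between the two XML packages is fairly direct.
A small sketch, assuming xml2 is installed, that extracts the text of some nodes from an in-memory document (the XML snippet is invented for the example, and the XML equivalents are shown as comments):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;xml2&amp;quot;)
txt &amp;lt;- &amp;quot;&amp;lt;pkgs&amp;gt;&amp;lt;pkg&amp;gt;XML&amp;lt;/pkg&amp;gt;&amp;lt;pkg&amp;gt;RCurl&amp;lt;/pkg&amp;gt;&amp;lt;/pkgs&amp;gt;&amp;quot;
doc &amp;lt;- read_xml(txt)
# With XML: doc &amp;lt;- XML::xmlParse(txt, asText = TRUE)
nodes &amp;lt;- xml_find_all(doc, &amp;quot;//pkg&amp;quot;)
# With XML: XML::xpathSApply(doc, &amp;quot;//pkg&amp;quot;, XML::xmlValue)
pkg_names &amp;lt;- xml_text(nodes)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Similarly, a call such as &lt;code&gt;RCurl::getURL(url)&lt;/code&gt; can often be replaced by &lt;code&gt;rawToChar(curl::curl_fetch_memory(url)$content)&lt;/code&gt;, although options and encoding handling differ between the two packages.&lt;/p&gt;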
&lt;p&gt;Until their release there was no other replacement for these packages (if there are other packages, please let me know).
It is not clear to me whether those packages could already replace XML and RCurl at their first release.&lt;/p&gt;
&lt;p&gt;This highlights the importance of properly replacing packages in the community.
A recent example is the effort of the &lt;a href=&#34;https://r-spatial.org/&#34;&gt;spatial community&lt;/a&gt;, led by Roger Bivand and Edzer Pebesma,
where packages have been carefully designed and planned to replace older packages that are going to be retired soon.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;recomendations&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Recommendations&lt;/h1&gt;
&lt;p&gt;As final recommendations, I suggest:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Disentangle the XML and RCurl circular dependency.&lt;/li&gt;
&lt;li&gt;Evaluate whether the xml2 and curl packages provide enough functionality to replace XML and RCurl, respectively.
If not, see what should be added to these packages or how to develop alternative packages to fill the gap.&lt;br /&gt;
Documentation on the alternatives to XML and RCurl could be written to ease the transition and to evaluate whether the functionality is covered by these packages.&lt;/li&gt;
&lt;li&gt;Contact package maintainers so that they replace the functionality for which they currently depend on XML and RCurl: the maintainers seen in figure &lt;a href=&#34;#fig:plot-maintainers&#34;&gt;4&lt;/a&gt; and those of the packages seen in figures &lt;a href=&#34;#fig:pca-all&#34;&gt;5&lt;/a&gt;, &lt;a href=&#34;#fig:pca-strong&#34;&gt;6&lt;/a&gt;, &lt;a href=&#34;#fig:pca-cran&#34;&gt;8&lt;/a&gt;, and &lt;a href=&#34;#fig:pca-bioc&#34;&gt;9&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Set deprecation warnings on the XML and RCurl packages.&lt;/li&gt;
&lt;li&gt;Archive XML and RCurl packages in CRAN.&lt;/li&gt;
&lt;/ul&gt;
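&lt;p&gt;For the deprecation step, base R already provides &lt;code&gt;.Deprecated()&lt;/code&gt;.
A minimal sketch of how a function could point users to its replacement; &lt;code&gt;old_parse()&lt;/code&gt; is a hypothetical function, not one exported by XML:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical wrapper standing in for a function of the package to retire
old_parse &amp;lt;- function(text) {
  .Deprecated(new = &amp;quot;xml2::read_xml&amp;quot;, package = &amp;quot;XML&amp;quot;)
  text
}
# The call still returns its value but signals a warning pointing to the replacement
res &amp;lt;- withCallingHandlers(
  old_parse(&amp;quot;some text&amp;quot;),
  warning = function(w) {
    message(&amp;quot;Deprecated: &amp;quot;, conditionMessage(w))
    invokeRestart(&amp;quot;muffleWarning&amp;quot;)
  }
)&lt;/code&gt;&lt;/pre&gt;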
&lt;p&gt;This might take years of moving packages around but I am confident that once the word is out, package developers will avoid XML and RCurl and current maintainers that depend on them will replace them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;On 2024/01/22 the &lt;a href=&#34;https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010359.html&#34;&gt;CRAN team asked for a maintainer of XML&lt;/a&gt;.&lt;/p&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;details&gt;
&lt;pre&gt;&lt;code&gt;## - Session info ---------------------------------------------------------------
##  setting  value
##  version  R version 4.3.1 (2023-06-16)
##  os       Ubuntu 22.04.3 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  C
##  ctype    C
##  tz       Europe/Madrid
##  date     2024-01-22
##  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## - Packages -------------------------------------------------------------------
##  package       * version     date (UTC) lib source
##  Biobase         2.62.0      2023-10-24 [1] Bioconductor
##  BiocFileCache   2.10.1      2023-10-26 [1] Bioconductor
##  BiocGenerics    0.48.1      2023-11-01 [1] Bioconductor
##  BiocManager     1.30.22     2023-08-08 [1] CRAN (R 4.3.1)
##  BiocPkgTools  * 1.20.0      2023-10-24 [1] Bioconductor
##  biocViews       1.70.0      2023-10-24 [1] Bioconductor
##  bit             4.0.5       2022-11-15 [1] CRAN (R 4.3.1)
##  bit64           4.0.5       2020-08-30 [1] CRAN (R 4.3.1)
##  bitops          1.0-7       2021-04-24 [1] CRAN (R 4.3.1)
##  blob            1.2.4       2023-03-17 [1] CRAN (R 4.3.1)
##  blogdown        1.18        2023-06-19 [1] CRAN (R 4.3.1)
##  bookdown        0.37        2023-12-01 [1] CRAN (R 4.3.1)
##  bslib           0.6.1       2023-11-28 [1] CRAN (R 4.3.1)
##  cachem          1.0.8       2023-05-01 [1] CRAN (R 4.3.1)
##  cli             3.6.2       2023-12-11 [1] CRAN (R 4.3.1)
##  codetools       0.2-19      2023-02-01 [2] CRAN (R 4.3.1)
##  colorspace      2.1-0       2023-01-23 [1] CRAN (R 4.3.1)
##  cranlogs      * 2.1.1       2019-04-29 [1] CRAN (R 4.3.1)
##  crul            1.4.0       2023-05-17 [1] CRAN (R 4.3.1)
##  curl            5.2.0       2023-12-08 [1] CRAN (R 4.3.1)
##  DBI             1.2.1       2024-01-12 [1] CRAN (R 4.3.1)
##  dbplyr          2.4.0       2023-10-26 [1] CRAN (R 4.3.2)
##  digest          0.6.34      2024-01-11 [1] CRAN (R 4.3.1)
##  dplyr         * 1.1.4       2023-11-17 [1] CRAN (R 4.3.1)
##  DT              0.31        2023-12-09 [1] CRAN (R 4.3.1)
##  evaluate        0.23        2023-11-01 [1] CRAN (R 4.3.2)
##  fansi           1.0.6       2023-12-08 [1] CRAN (R 4.3.1)
##  farver          2.1.1       2022-07-06 [1] CRAN (R 4.3.1)
##  fastmap         1.1.1       2023-02-24 [1] CRAN (R 4.3.1)
##  fauxpas         0.5.2       2023-05-03 [1] CRAN (R 4.3.1)
##  filelock        1.0.3       2023-12-11 [1] CRAN (R 4.3.1)
##  generics        0.1.3       2022-07-05 [1] CRAN (R 4.3.1)
##  ggplot2       * 3.4.4       2023-10-12 [1] CRAN (R 4.3.1)
##  ggrepel       * 0.9.5       2024-01-10 [1] CRAN (R 4.3.1)
##  gh              1.4.0       2023-02-22 [1] CRAN (R 4.3.1)
##  glue            1.7.0       2024-01-09 [1] CRAN (R 4.3.1)
##  graph           1.80.0      2023-10-24 [1] Bioconductor
##  gtable          0.3.4       2023-08-21 [1] CRAN (R 4.3.1)
##  highr           0.10        2022-12-22 [1] CRAN (R 4.3.1)
##  hms             1.1.3       2023-03-21 [1] CRAN (R 4.3.1)
##  htmltools       0.5.7       2023-11-03 [1] CRAN (R 4.3.2)
##  htmlwidgets   * 1.6.4       2023-12-06 [1] CRAN (R 4.3.1)
##  httpcode        0.3.0       2020-04-10 [1] CRAN (R 4.3.1)
##  httr            1.4.7       2023-08-15 [1] CRAN (R 4.3.1)
##  igraph          1.6.0       2023-12-11 [1] CRAN (R 4.3.1)
##  jquerylib       0.1.4       2021-04-26 [1] CRAN (R 4.3.1)
##  jsonlite        1.8.8       2023-12-04 [1] CRAN (R 4.3.1)
##  knitr         * 1.45        2023-10-30 [1] CRAN (R 4.3.2)
##  labeling        0.4.3       2023-08-29 [1] CRAN (R 4.3.2)
##  lifecycle       1.0.4       2023-11-07 [1] CRAN (R 4.3.2)
##  magrittr        2.0.3       2022-03-30 [1] CRAN (R 4.3.1)
##  memoise         2.0.1       2021-11-26 [1] CRAN (R 4.3.1)
##  munsell         0.5.0       2018-06-12 [1] CRAN (R 4.3.1)
##  pillar          1.9.0       2023-03-22 [1] CRAN (R 4.3.1)
##  pkgconfig       2.0.3       2019-09-22 [1] CRAN (R 4.3.1)
##  purrr           1.0.2       2023-08-10 [1] CRAN (R 4.3.1)
##  R6              2.5.1       2021-08-19 [1] CRAN (R 4.3.1)
##  RBGL            1.78.0      2023-10-24 [1] Bioconductor
##  Rcpp            1.0.12      2024-01-09 [1] CRAN (R 4.3.1)
##  RCurl           1.98-1.14   2024-01-09 [1] CRAN (R 4.3.1)
##  readr           2.1.5       2024-01-10 [1] CRAN (R 4.3.1)
##  rlang           1.1.3       2024-01-10 [1] CRAN (R 4.3.1)
##  rmarkdown       2.25        2023-09-18 [1] CRAN (R 4.3.1)
##  rorcid          0.7.0       2021-01-20 [1] CRAN (R 4.3.1)
##  RSQLite         2.3.5       2024-01-21 [1] CRAN (R 4.3.1)
##  rstudioapi      0.15.0      2023-07-07 [1] CRAN (R 4.3.1)
##  RUnit           0.4.32      2018-05-18 [1] CRAN (R 4.3.1)
##  rvest           1.0.3       2022-08-19 [1] CRAN (R 4.3.1)
##  sass            0.4.8       2023-12-06 [1] CRAN (R 4.3.1)
##  scales          1.3.0       2023-11-28 [1] CRAN (R 4.3.1)
##  sessioninfo     1.2.2       2021-12-06 [1] CRAN (R 4.3.1)
##  stringi         1.8.3       2023-12-11 [1] CRAN (R 4.3.1)
##  stringr         1.5.1       2023-11-14 [1] CRAN (R 4.3.1)
##  tibble          3.2.1       2023-03-20 [1] CRAN (R 4.3.1)
##  tidyselect      1.2.0       2022-10-10 [1] CRAN (R 4.3.1)
##  tzdb            0.4.0       2023-05-12 [1] CRAN (R 4.3.1)
##  utf8            1.2.4       2023-10-22 [1] CRAN (R 4.3.2)
##  vctrs           0.6.5       2023-12-01 [1] CRAN (R 4.3.1)
##  whisker         0.4.1       2022-12-05 [1] CRAN (R 4.3.1)
##  withr           3.0.0       2024-01-16 [1] CRAN (R 4.3.1)
##  xfun            0.41        2023-11-01 [1] CRAN (R 4.3.2)
##  XML             3.99-0.16.1 2024-01-22 [1] CRAN (R 4.3.1)
##  xml2            1.3.6       2023-12-04 [1] CRAN (R 4.3.1)
##  yaml            2.3.8       2023-12-11 [1] CRAN (R 4.3.1)
## 
##  [1] /home/lluis/bin/R/4.3.1
##  [2] /opt/R/4.3.1/lib/R/library
## 
## ------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;footnotes footnotes-end-of-document&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;The &lt;code&gt;maintainer&lt;/code&gt; function only works for installed packages, and I don’t have all these packages installed.&lt;a href=&#34;#fnref1&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn2&#34;&gt;&lt;p&gt;Both logs only count downloads from their own repository and not from other mirrors or approaches (RSPM, bspm, r2u, …).&lt;a href=&#34;#fnref2&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn3&#34;&gt;&lt;p&gt;I recently found this as opposite of introduction/intro.&lt;a href=&#34;#fnref3&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>experDesign: follow up</title>
      <link>https://llrs.dev/post/2023/04/09/experdesign-follow-up/</link>
      <pubDate>Sun, 09 Apr 2023 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/post/2023/04/09/experdesign-follow-up/</guid>
      <description>


&lt;p&gt;I am happy to announce a new release of experDesign.
Install it from CRAN with:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;install.packages(&amp;quot;experDesign&amp;quot;)
library(&amp;quot;experDesign&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This new release focuses on two trickier aspects of designing an experiment:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Checking the samples of your experiment.&lt;/li&gt;
&lt;li&gt;How to continue stratifying your conditions after some initial batch.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These functions should be used once you have collected your samples and before carrying out the experiment.
With them you can make an informed decision about what might happen with your experiment.&lt;/p&gt;
&lt;div id=&#34;checking-your-samples&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Checking your samples&lt;/h1&gt;
&lt;p&gt;The new function &lt;code&gt;check_data()&lt;/code&gt; will warn you if it finds some known issues with your data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;experDesign&amp;quot;)
library(&amp;quot;MASS&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we take the survey dataset from the MASS package we can see that it has some issues:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;data(survey, package = &amp;quot;MASS&amp;quot;)
check_data(survey)
## Warning: Two categorical variables don&amp;#39;t have all combinations.
## Warning: Some values are missing.
## Warning: There is a combination of categories with no replicates; i.e. just one
## sample.
## [1] FALSE&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we instead fabricate our own dataset, we might realize we have a problem:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rdata &amp;lt;- expand.grid(sex = c(&amp;quot;M&amp;quot;, &amp;quot;F&amp;quot;), class = c(&amp;quot;lower&amp;quot;, &amp;quot;median&amp;quot;, &amp;quot;high&amp;quot;))
stopifnot(&amp;quot;Same samples/rows as combinations of classes&amp;quot; = nrow(rdata) == 2*3)
check_data(rdata)
## Warning: There is a combination of categories with no replicates; i.e. just one
## sample.
## [1] FALSE
# We create some new samples with the same conditions
rdata2 &amp;lt;- rbind(rdata, rdata)
check_data(rdata2)
## [1] TRUE&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One might decide to go ahead with what is available, use only some of those samples, or wait to collect more samples for the experiment.&lt;/p&gt;
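&lt;p&gt;To see what the warning about replicates means, one can cross-tabulate the categorical variables: a cell with a single sample has no replicate.
A base R sketch of that idea (it only illustrates the concept and is not how &lt;code&gt;check_data()&lt;/code&gt; is implemented):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rdata &amp;lt;- expand.grid(sex = c(&amp;quot;M&amp;quot;, &amp;quot;F&amp;quot;), class = c(&amp;quot;lower&amp;quot;, &amp;quot;median&amp;quot;, &amp;quot;high&amp;quot;))
# Number of samples for each combination of categories
counts &amp;lt;- table(rdata$sex, rdata$class)
# Is there any combination with just one sample?
no_replicates &amp;lt;- any(counts == 1)
# Duplicating the samples gives every combination a replicate
rdata2 &amp;lt;- rbind(rdata, rdata)
counts2 &amp;lt;- table(rdata2$sex, rdata2$class)
no_replicates2 &amp;lt;- any(counts2 == 1)&lt;/code&gt;&lt;/pre&gt;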
&lt;/div&gt;
&lt;div id=&#34;follow-up&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Follow up&lt;/h1&gt;
&lt;p&gt;Imagine you have 100 samples that you distribute in 4 batches of 25 samples each.
Later, you collect 80 more samples to analyze.
You want these new samples to be analyzed together with those previous 100 samples.
Will it be possible? How should you distribute your new samples in groups of 25?&lt;/p&gt;
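&lt;p&gt;The arithmetic behind this question is simple.
With 80 new samples and at most 25 samples per batch, a base R sketch (the variable names are only for illustration) gives the minimum number of batches and a balanced size for each:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;new_samples &amp;lt;- 80
max_batch &amp;lt;- 25
# Minimum number of batches that can hold all the new samples
n_batches &amp;lt;- ceiling(new_samples / max_batch)
# Spread the samples as evenly as possible across the batches
sizes &amp;lt;- rep(new_samples %/% n_batches, n_batches)
extra &amp;lt;- new_samples %% n_batches
sizes[seq_len(extra)] &amp;lt;- sizes[seq_len(extra)] + 1
sizes&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Balanced batches avoid ending up with three full batches and a last one with only a handful of samples.&lt;/p&gt;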
&lt;p&gt;Using the same dataset from &lt;code&gt;MASS&lt;/code&gt;, imagine that we first collected 118 observations and later 119 more:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;survey1 &amp;lt;- survey[1:118, ]
survey2 &amp;lt;- survey[119:nrow(survey), ]
# Using a low number of iterations to speed up the process;
# in practice you should use an even higher number than the default
fu &amp;lt;- follow_up(survey1, survey2, size_subset = 50, iterations = 10)
## Warning: There are some problems with the data.
## Warning: There are some problems with the new samples and the batches.
## Warning: There are some problems with the new data.
## Warning: There are some problems with the old data.&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Like &lt;code&gt;check_data()&lt;/code&gt;, this function reports whether there are problems with the observations.
One can check each collection with &lt;code&gt;check_data()&lt;/code&gt; to know more about the problems found.&lt;/p&gt;
&lt;p&gt;If you have already performed the experiment on your observations you can also check the distribution:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Create the first batch
variables &amp;lt;- c(&amp;quot;Sex&amp;quot;, &amp;quot;Smoke&amp;quot;, &amp;quot;Age&amp;quot;)
survey1 &amp;lt;- survey1[, variables]
index1 &amp;lt;- design(survey1, size_subset = 50, iterations = 10)
## Warning: There might be some problems with the data use check_data().
r_survey &amp;lt;- inspect(index1, survey1)
# Create the second batch with &amp;quot;new&amp;quot; students
survey2 &amp;lt;- survey2[, variables]
survey2$batch &amp;lt;- NA
# Prepare the follow up
all_classroom &amp;lt;- rbind(r_survey, survey2)
fu2 &amp;lt;- follow_up2(all_classroom, size_subset = 50, iterations = 10)
## Warning: There are some problems with the data.
## Warning: There are some problems with the new samples and the batches.
## Warning: There are some problems with the new data.
## Warning: There are some problems with the old data.
tail(fu2)
## [1] &amp;quot;NewSubset2&amp;quot; &amp;quot;NewSubset2&amp;quot; &amp;quot;NewSubset2&amp;quot; &amp;quot;NewSubset2&amp;quot; &amp;quot;NewSubset2&amp;quot;
## [6] &amp;quot;NewSubset3&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using this function will help to decide which new observations go to which new batches.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;closing-remarks&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Closing remarks&lt;/h1&gt;
&lt;p&gt;The famous quote from Fisher goes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“To consult the statistician after an experiment is finished is often merely to ask him to conduct a &lt;em&gt;post mortem&lt;/em&gt; examination. He can perhaps say what the experiment died of.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This emphasizes the importance of involving a statistician early on in the experimental design process.&lt;br /&gt;
Unfortunately, in some cases it may be too late to involve a statistician in the experimental design process, or unforeseen circumstances may have messed up the design of your carefully planned experiment.&lt;/p&gt;
&lt;p&gt;My aim with this package is to provide practical tools for statisticians, bioinformaticians, and anyone who works with data.
These tools are designed to be easy to use and can be used to analyze data in a variety of contexts.
Let me know if it is helpful in your case.&lt;/p&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;details&gt;
&lt;pre&gt;&lt;code&gt;## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.2.2 (2022-10-31)
##  os       Ubuntu 22.04.2 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language en_US
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Europe/Madrid
##  date     2023-04-09
##  pandoc   2.19.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package     * version  date (UTC) lib source
##  blogdown      1.16     2022-12-13 [1] CRAN (R 4.2.2)
##  bookdown      0.33     2023-03-06 [1] CRAN (R 4.2.2)
##  bslib         0.4.2    2022-12-16 [1] CRAN (R 4.2.2)
##  cachem        1.0.7    2023-02-24 [1] CRAN (R 4.2.2)
##  cli           3.6.1    2023-03-23 [1] CRAN (R 4.2.2)
##  digest        0.6.31   2022-12-11 [1] CRAN (R 4.2.2)
##  evaluate      0.20     2023-01-17 [1] CRAN (R 4.2.2)
##  experDesign * 0.2.0    2023-04-05 [1] CRAN (R 4.2.2)
##  fastmap       1.1.1    2023-02-24 [1] CRAN (R 4.2.2)
##  htmltools     0.5.4    2022-12-07 [1] CRAN (R 4.2.2)
##  jquerylib     0.1.4    2021-04-26 [1] CRAN (R 4.2.2)
##  jsonlite      1.8.4    2022-12-06 [1] CRAN (R 4.2.2)
##  knitr         1.42     2023-01-25 [1] CRAN (R 4.2.2)
##  MASS        * 7.3-58.1 2022-08-03 [2] CRAN (R 4.2.2)
##  R6            2.5.1    2021-08-19 [1] CRAN (R 4.2.2)
##  rlang         1.1.0    2023-03-14 [1] CRAN (R 4.2.2)
##  rmarkdown     2.20     2023-01-19 [1] CRAN (R 4.2.2)
##  rstudioapi    0.14     2022-08-22 [1] CRAN (R 4.2.2)
##  sass          0.4.5    2023-01-24 [1] CRAN (R 4.2.2)
##  sessioninfo   1.2.2    2021-12-06 [1] CRAN (R 4.2.2)
##  xfun          0.37     2023-01-31 [1] CRAN (R 4.2.2)
##  yaml          2.3.7    2023-01-23 [1] CRAN (R 4.2.2)
## 
##  [1] /home/lluis/bin/R/4.2.2
##  [2] /opt/R/4.2.2/lib/R/library
## 
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>rtweet future</title>
      <link>https://llrs.dev/post/2023/02/16/rtweet-future/</link>
      <pubDate>Thu, 16 Feb 2023 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/post/2023/02/16/rtweet-future/</guid>
      <description>


&lt;div id=&#34;background-how-i-became-the-maintainer-of-rtweet&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Background: how I became the maintainer of rtweet&lt;/h2&gt;
&lt;p&gt;I didn’t want to maintain rtweet.
It might sound strange coming from its maintainer, but I didn’t want the responsibility of writing software that 12k people install monthly.
My offer was always to help with it so that the users could benefit from improvements and bug fixes on the package.
I initially thought that having permissions to close and label issues would help the community and the maintainer.&lt;/p&gt;
&lt;p&gt;I was not totally altruistic.
As &lt;a href=&#34;https://ropensci.org/blog/2022/10/17/maintain-or-co-maintain-an-ropensci-package/&#34;&gt;recently recommended by Maëlle and Steffi&lt;/a&gt;, I had some interest in the package.
I had a bot posting some plots daily.
I wanted to have alt text in the tweets.
There was a pending PR for just that.
I could fork and use my own version or help with the package.&lt;/p&gt;
&lt;p&gt;I got in touch with rOpenSci about the package.
After some time waiting to hear back from the author of the package (many thanks, &lt;a href=&#34;https://mikewk.com/&#34;&gt;Michael W. Kearney&lt;/a&gt;!), I was given edit permissions to the repository (under the helping eye of Scott Chamberlain).
At the time I got &lt;a href=&#34;https://github.com/ropensci/rtweet/issues/471&#34;&gt;permission from rOpenSci&lt;/a&gt;, there were 167 issues and PRs open.&lt;/p&gt;
&lt;p&gt;As I started going through them and changing the source code, people showed up to help (Thanks!).
New contributors helped by closing issues via new PRs, or simply by advising on function names or &lt;a href=&#34;https://github.com/ropensci/rtweet/issues/445#issuecomment-790042423&#34;&gt;asking for its future&lt;/a&gt;.
Another developer gained access to the repository and contributed their expertise (resulting in some &lt;a href=&#34;https://twitter.com/hadleywickham/status/1365661250269683713&#34;&gt;funny moments&lt;/a&gt;).
Despite all the (breaking) changes, we tried to make the package easier to maintain.
After waiting some time for further community feedback, I released a new version through CRAN.&lt;/p&gt;
&lt;p&gt;Shortly after, for some reason my bot got fewer views and less engagement.
It made me realize that it might be an opportunity to start a professional project around the content of what this bot did.
In addition, although I learn a lot from the online communities, I decided to connect more with the local community.
I am currently involved with re-launching the &lt;a href=&#34;https://barcelonar.org&#34;&gt;Barcelona R user group&lt;/a&gt; and organizing the &lt;a href=&#34;http://r-es.org/2023/02/09/las-xiii-jornadas-congreso-r-hispano-2023-seran-en-barcelona/&#34;&gt;Spanish R conference&lt;/a&gt;.
These activities take time and energy away from package maintenance (not just rtweet).
However, &lt;strong&gt;my support for the users of rtweet remains&lt;/strong&gt;.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;images/allison_community_support.png&#34; alt=&#34;&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;The community support I received was great. I hope to pass it on. CC-BY-4.0 &lt;a href=&#34;https://twitter.com/allison_horst&#34;&gt;Artwork by &lt;span class=&#34;citation&#34;&gt;@allison_horst&lt;/span&gt;.&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;current-situation-supporting-the-v1-and-adding-support-to-the-v2-api&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Current situation: supporting the v1 and adding support to the v2 API&lt;/h2&gt;
&lt;p&gt;To maintain rtweet I needed to adapt to and understand Twitter’s API (v1) documentation.
The documentation didn’t always match the functionality.
rtweet follows the API tightly, so any change might break the package.
For instance, the addition of the edit field broke the parsing of the tweets.&lt;/p&gt;
&lt;p&gt;At the same time one of the oldest issues in rtweet is moving to the new API v2.
With the 1.0 release, I decided it was time to stop adding features relying on the old API (v1).
It was clear that Twitter was moving to deprecate v1 and replace it with the newer API.
There are many benefits of the new API for developers and users.
But I realized that I should still fix bugs related to the v1 endpoints.&lt;/p&gt;
&lt;p&gt;The 1.1 release has recently provided some bug fixes and support for v2.
There is also an outstanding issue preventing new users from connecting to Twitter (fixed on GitHub).
The streaming endpoint in v1 stopped working before the announced date.
So the new release included support for the replacement endpoint in v2.&lt;/p&gt;
&lt;p&gt;As you may have guessed, between releases I have been thinking about and working on support for API v2.
The foundations to support the new endpoints were already set and easily expanded to new endpoints.
In the development version of the package (in the devel branch), it is possible to retrieve bookmarks, retrieve the archive if you have academic access, among other endpoints.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;going-forward-supporting-rtweet-users-of-v2-api&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Going forward: supporting rtweet users of v2 API&lt;/h2&gt;
&lt;p&gt;Recently (~3 weeks), there have been &lt;a href=&#34;https://twitter.com/TwitterDev&#34;&gt;announcements&lt;/a&gt; about future changes in who will be able to access the API and how.
Simultaneously, there have been unannounced restrictions affecting other tools using the API.
Recent changes and announcements are driven by the need for money to sustain Twitter.
Maintaining rtweet with its current functionality might require funds.&lt;/p&gt;
&lt;p&gt;I will continue supporting rtweet and the freely accessible endpoints.
But given that I will have less time and energy for rtweet, I am looking for a &lt;strong&gt;&lt;em&gt;co&lt;/em&gt;-maintainer&lt;/strong&gt; to help me:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Support new endpoints, use httr2, test an API in CI, …&lt;/li&gt;
&lt;li&gt;Review changes to avoid new bugs.&lt;/li&gt;
&lt;li&gt;Help with issues and questions that the transition to API v2 and the current uncertainty bring.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Get in touch with me in &lt;a href=&#34;https://github.com/ropensci/rtweet/issues/763&#34;&gt;this issue&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;From time to time, I receive bug reports and petitions from premium API users.
Currently, premium users get access to more data, elevated request rates and other endpoints.
Helping them is usually challenging, and developing support for these endpoints will be difficult.
To continue supporting these users I’ve set up a &lt;a href=&#34;https://www.buymeacoffee.com/llrs&#34;&gt;☕ buymeacoffee&lt;/a&gt; and I, and my co-maintainer, will be open to consulting jobs about rtweet.
With these jobs and funding we will be able to support them and implement new endpoints for all the users.&lt;/p&gt;
&lt;p&gt;There is scarce information about the API changes and prices.
Changes might come suddenly.
We’ll do our best to keep up and inform all users.
I hope this will be a good way to &lt;strong&gt;continue supporting the package and the community&lt;/strong&gt; of users.
Let me know if you have other suggestions.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;images/decorated_r.png&#34; alt=&#34;&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Decorating/contributing to R. CC-BY-4.0 &lt;a href=&#34;https://twitter.com/allison_horst&#34;&gt;artwork by &lt;span class=&#34;citation&#34;&gt;@allison_horst&lt;/span&gt;.&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Thanks Maëlle for your support and reviewing the post.&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Accessing REDCap from R</title>
      <link>https://llrs.dev/post/2023/02/08/accessing-redcap-from-r/</link>
      <pubDate>Wed, 08 Feb 2023 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/post/2023/02/08/accessing-redcap-from-r/</guid>
      <description>


&lt;p&gt;In this post, I want to summarize some of the packages to connect to &lt;a href=&#34;https://www.project-redcap.org/&#34;&gt;REDCap&lt;/a&gt;.
For those who don’t know, REDCap is a database designed for clinical usage, which allows easy data collection of patients’ responses by clinicians and interactions with the patients via surveys.&lt;/p&gt;
&lt;p&gt;It has specific features such as scheduling surveys sent to patients, compatibility with tablets and mobile phones for data entry while visiting patients, grouping data in instruments (for repeating the same questions multiple times), multiple choice and check buttons, and different arms (like paths for patients).
Most importantly, it is relatively easy to manage by clinical administrators.&lt;/p&gt;
&lt;p&gt;On CRAN there are ~11 &lt;a href=&#34;https://search.r-project.org/?P=REDCap&amp;amp;SORT=&amp;amp;HITSPERPAGE=10&amp;amp;DB=cran-info&amp;amp;DEFAULTOP=and&amp;amp;FMT=query&amp;amp;xDB=all&amp;amp;xFILTERS=.%7E%7E&#34;&gt;packages mentioning it&lt;/a&gt; at the time of writing.
The purpose of this post is to help decide which packages can be helpful in which situations.
This post won’t be a deep analysis or comparison of capabilities; it describes some of the best and worst features of each package.&lt;/p&gt;
&lt;div id=&#34;redcapr&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;REDCapR&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=REDCapR&#34;&gt;REDCapR&lt;/a&gt; is the official package to connect to the database.
It allows you to read, write and filter the requests.
It has some security-related functions.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;redcaptidier&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;REDCapTidieR&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=REDCapTidieR&#34;&gt;REDCapTidieR&lt;/a&gt; is a package that provides summaries of tables and helps with nested tibbles data by arm.
It depends on REDCapR.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;tidyredcap&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;tidyREDCap&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=tidyREDCap&#34;&gt;tidyREDCap&lt;/a&gt; is a package that simplifies the tables for instruments and choose-all or choose-one question types.
It is easy to make tables and it depends on REDCapR.
It requires the first and last columns to make instruments.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;images/redcap_design.jpg&#34; alt=&#34;&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Screenshot of a design with several instruments in a single arm (from &lt;a href=&#34;https://www.project-redcap.org/&#34; class=&#34;uri&#34;&gt;https://www.project-redcap.org/&lt;/a&gt;)&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;redcapexporter&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;REDCapExporter&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=REDCapExporter&#34;&gt;REDCapExporter&lt;/a&gt; is a package to build a data package from a database for redistribution.
It does not depend on REDCapR.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;redcapapi&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;redcapAPI&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=redcapAPI&#34;&gt;redcapAPI&lt;/a&gt; is a package for making data accessible and analysis-ready as quickly as possible.
It has extensive documentation in a wiki but no vignette or examples.
It does not depend on REDCapR.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;redcapdm&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;REDCapDM&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=REDCapDM&#34;&gt;REDCapDM&lt;/a&gt; is a package that provides functions to read and manage REDCap data and identify missing or extreme values as well as transform the data provided by the API.
It depends on REDCapR.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;reviewr&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;ReviewR&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=ReviewR&#34;&gt;ReviewR&lt;/a&gt; is a package that creates a shiny website with data from the database to explore it.
It uses REDCapR to connect to your instance.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;rccola&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;rccola&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=rccola&#34;&gt;rccola&lt;/a&gt; is a package to provide a secure connection to the database but it doesn’t provide any handling of the data.
It uses redcapAPI to connect to the database.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://llrs.dev/post/2023/02/08/accessing-redcap-from-r/index.en_files/figure-html/unnamed-chunk-1-1.png&#34; alt=&#34;Barplot with the dependencies: from less to more: REDCapExporter, rccola, redcapAPI, REDCapR, tidyREDCap, REDCapDM, REDCapTidieR, ReviewR&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;other-packages&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Other packages&lt;/h2&gt;
&lt;p&gt;Other packages mention REDCap:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=nmadb&#34;&gt;nmadb&lt;/a&gt;: which implements its own connection procedure for a specific REDCap database of network meta-analyses.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=distcomp&#34;&gt;distcomp&lt;/a&gt;: which allows computation on distributed data, including data stored in REDCap.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=cgmanalysis&#34;&gt;cgmanalysis&lt;/a&gt;: which mentions that data produced is compatible with REDCap.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;I’m sure that many packages briefly described here can do much more than what I understood from a glance at their documentation and DESCRIPTION.&lt;/p&gt;
&lt;p&gt;Most packages provide some data for the examples (and probably tests), while others do not.
This is a technical problem that might impact users if there are no examples in the functions.&lt;/p&gt;
&lt;p&gt;REDCapR is used by most packages to access the database, but most of the packages focus on transforming the data provided by the API or the exported data.
It highlights that the data exported is useful but that depending on the preferences of the users it needs to be transformed for easy usage.&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Exploring CRAN&#39;s files: part 2</title>
      <link>https://llrs.dev/post/2022/07/28/cran-files-2/</link>
      <pubDate>Thu, 28 Jul 2022 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/post/2022/07/28/cran-files-2/</guid>
      <description>


&lt;div id=&#34;introduction&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In the &lt;a href=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/&#34;&gt;first post&lt;/a&gt; of the series we briefly explored packages available on CRAN.
Now I’ll focus on the history of the packages and their size using the following files:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;packages &amp;lt;- tools::CRAN_package_db()
current &amp;lt;- tools:::CRAN_current_db()
archive &amp;lt;- tools:::CRAN_archive_db()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this part we will use two files: The &lt;code&gt;current&lt;/code&gt; and the &lt;code&gt;archive&lt;/code&gt;, let’s see why.&lt;/p&gt;
&lt;div id=&#34;current-file&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;current file&lt;/h3&gt;
&lt;p&gt;The current database has the package size, the date of modification (which I assume is the date it was added to CRAN) and the user name of whoever last modified it.
This is the same information returned by &lt;a href=&#34;https://search.r-project.org/R/refmans/base/html/file.info.html&#34;&gt;&lt;code&gt;file.info&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;current[1, 1:10]
##     size isdir mode               mtime               ctime               atime
## A3 42810 FALSE  664 2015-08-16 23:05:54 2022-09-03 12:02:27 2022-09-03 14:00:19
##     uid  gid  uname    grname
## A3 1001 1001 hornik cranadmin&lt;/code&gt;&lt;/pre&gt;
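&lt;p&gt;As a minimal sketch of that equivalence, the snippet below calls &lt;code&gt;file.info()&lt;/code&gt; on a temporary local file, so it runs without access to CRAN.
The file is a placeholder created only for this illustration, not a real package tarball.
On Unix-alikes the result should have the same columns as the output above; on Windows the user and group columns differ.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical placeholder file standing in for a package tarball
tmp &amp;lt;- tempfile(fileext = &amp;quot;.tar.gz&amp;quot;)
writeLines(&amp;quot;placeholder&amp;quot;, tmp)

# One row per file, with size, mode and timestamps as columns
info &amp;lt;- file.info(tmp)
info_cols &amp;lt;- colnames(info)
info_cols

# Individual fields can be read like any other data.frame column
info$size
info$mtime

# Remove the placeholder file
unlink(tmp)&lt;/code&gt;&lt;/pre&gt;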
&lt;/div&gt;
&lt;div id=&#34;archive-file&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;archive file&lt;/h3&gt;
&lt;p&gt;The archive database returns the same information but, as you might guess from the name, it doesn’t cover the current packages: it covers the packages in the archive, which are no longer available by default.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;archive[[1]]
##                     size isdir mode               mtime               ctime
## A3/A3_0.9.1.tar.gz 45252 FALSE  664 2013-02-07 10:00:29 2022-08-22 18:14:53
## A3/A3_0.9.2.tar.gz 45907 FALSE  664 2013-03-26 19:58:40 2022-08-22 18:14:53
##                                  atime  uid  gid  uname    grname
## A3/A3_0.9.1.tar.gz 2022-08-22 17:39:50 1001 1001 hornik cranadmin
## A3/A3_0.9.2.tar.gz 2022-08-22 17:39:50 1010 1001 ligges cranadmin&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The date matches that available on the &lt;a href=&#34;https://cran.r-project.org/src/contrib/Archive/A3/&#34;&gt;web’s old sources&lt;/a&gt;, so we can be confident of its meaning.&lt;/p&gt;
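&lt;p&gt;Since the archive database is a list with one data frame per package and one row per archived tarball, counting rows gives the number of archived versions of each package.
The sketch below uses a small hand-made list, &lt;code&gt;toy_archive&lt;/code&gt;, with a hypothetical package &lt;code&gt;pkgB&lt;/code&gt;, instead of the real database, so it runs offline.
I assume the same approach would carry over to the real object, but that is not checked here.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Toy stand-in for the archive database (assumption: same shape as above)
toy_archive &amp;lt;- list(
  A3 = data.frame(
    size = c(45252, 45907),
    row.names = c(&amp;quot;A3/A3_0.9.1.tar.gz&amp;quot;, &amp;quot;A3/A3_0.9.2.tar.gz&amp;quot;)
  ),
  pkgB = data.frame(
    size = 1000,
    row.names = &amp;quot;pkgB/pkgB_0.1.0.tar.gz&amp;quot;
  )
)

# Number of archived tarballs per package (the current release is not included)
archived &amp;lt;- vapply(toy_archive, nrow, integer(1))
sort(archived, decreasing = TRUE)

# Total size of the archived tarballs per package, in bytes
vapply(toy_archive, function(x) sum(x$size), numeric(1))&lt;/code&gt;&lt;/pre&gt;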
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;cran-history&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;CRAN history&lt;/h2&gt;
&lt;p&gt;As we have seen there are some files about the archives of CRAN.
These include information about date of modification (moving/editing) and user who did it and of course name and sometimes version of the package.
These archives are the great treasure of CRAN because they help make experiments or analyses that were run long ago reproducible.&lt;/p&gt;
&lt;p&gt;Note that I’m not totally sure that this archive contains the full record of packages; some initial packages might be missing.
I’m also aware of some packages removed by CRAN which no longer appear in these records.&lt;/p&gt;
&lt;p&gt;Nevertheless, this should provide an accurate picture of packages available through time.
Also, as there is no information here about when a package is archived (&lt;a href=&#34;https://llrs.dev/post/2021/12/07/reasons-cran-archivals/&#34;&gt;there is on PACKAGES.in&lt;/a&gt;), I might overestimate the packages available at any given moment.&lt;/p&gt;
&lt;p&gt;Remember the plot about &lt;a href=&#34;#accepted&#34;&gt;acceptance of packages on CRAN?&lt;/a&gt;
That plot only looked at the packages currently available; let’s check it with the whole archive:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:accumulative-packages&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/28/cran-files-2/index.en_files/figure-html/accumulative-packages-1.png&#34; alt=&#34;*Packages on CRAN archive by their addition to it.* There are over 125000 archives on CRAN.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: &lt;em&gt;Packages on CRAN archive by their addition to it.&lt;/em&gt; There are over 125000 archives on CRAN.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;All these archives come from packages with few releases and from packages with many releases.
If we look at which packages had the most releases:&lt;/p&gt;
&lt;template id=&#34;41fb6fac-ce02-4889-ac51-217e365f4058&#34;&gt;&lt;style&gt;
.tabwid table{
  border-spacing:0px !important;
  border-collapse:collapse;
  line-height:1;
  margin-left:auto;
  margin-right:auto;
  border-width: 0;
  display: table;
  margin-top: 1.275em;
  margin-bottom: 1.275em;
  border-color: transparent;
}
.tabwid_left table{
  margin-left:0;
}
.tabwid_right table{
  margin-right:0;
}
.tabwid td {
    padding: 0;
}
.tabwid a {
  text-decoration: none;
}
.tabwid thead {
    background-color: transparent;
}
.tabwid tfoot {
    background-color: transparent;
}
.tabwid table tr {
background-color: transparent;
}
.katex-display {
    margin: 0 0 !important;
}
&lt;/style&gt;&lt;div class=&#34;tabwid&#34;&gt;&lt;style&gt;.cl-e305f260{}.cl-e2fc13c6{font-family:&#39;DejaVu Sans&#39;;font-size:11pt;font-weight:normal;font-style:normal;text-decoration:none;color:rgba(0, 0, 0, 1.00);background-color:transparent;}.cl-e2fc2fdc{margin:0;text-align:left;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);padding-bottom:5pt;padding-top:5pt;padding-left:5pt;padding-right:5pt;line-height: 1;background-color:transparent;}.cl-e2fc2fe6{margin:0;text-align:right;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);padding-bottom:5pt;padding-top:5pt;padding-left:5pt;padding-right:5pt;line-height: 1;background-color:transparent;}.cl-e2fc7a46{width:69.7pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7a5a{width:100.6pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7a64{width:100.6pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7a6e{width:69.7pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 
1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7a6f{width:100.6pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7a82{width:69.7pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7a8c{width:100.6pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7a96{width:69.7pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7a97{width:100.6pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7aa0{width:69.7pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7aa1{width:100.6pt;background-color:transparent;vertical-align: 
middle;border-bottom: 2pt solid rgba(102, 102, 102, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7aaa{width:69.7pt;background-color:transparent;vertical-align: middle;border-bottom: 2pt solid rgba(102, 102, 102, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7aab{width:100.6pt;background-color:transparent;vertical-align: middle;border-bottom: 2pt solid rgba(102, 102, 102, 1.00);border-top: 2pt solid rgba(102, 102, 102, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-e2fc7ab4{width:69.7pt;background-color:transparent;vertical-align: middle;border-bottom: 2pt solid rgba(102, 102, 102, 1.00);border-top: 2pt solid rgba(102, 102, 102, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}&lt;/style&gt;&lt;table class=&#39;cl-e305f260&#39;&gt;
&lt;thead&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7aab&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;package&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7ab4&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;Releases&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a5a&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;spatstat&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a46&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;206&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a97&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;Matrix&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7aa0&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;204&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a6f&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;mgcv&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a82&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;162&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a64&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;RcppArmadillo&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a6e&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;150&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a64&#34;&gt;&lt;p 
class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;rgdal&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a6e&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;146&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a97&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;nlme&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7aa0&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;143&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a8c&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;caret&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a96&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;139&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a64&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;spdep&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a6e&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;139&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a97&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;lattice&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7aa0&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;137&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a64&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;plotrix&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td 
class=&#34;cl-e2fc7a6e&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;131&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a6f&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;sp&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a82&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;128&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a8c&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;XML&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a96&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;126&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a97&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;Rcmdr&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7aa0&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;123&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a97&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;lme4&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7aa0&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;122&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a5a&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;gstat&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a46&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span 
class=&#34;cl-e2fc13c6&#34;&gt;121&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a8c&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;arm&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a96&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;119&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a64&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;foreign&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a6e&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;117&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a5a&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;party&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a46&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;117&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7a64&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;maptools&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7a6e&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;113&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-e2fc7aa1&#34;&gt;&lt;p class=&#34;cl-e2fc2fdc&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;raster&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-e2fc7aaa&#34;&gt;&lt;p class=&#34;cl-e2fc2fe6&#34;&gt;&lt;span class=&#34;cl-e2fc13c6&#34;&gt;108&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/template&gt;
&lt;div class=&#34;flextable-shadow-host&#34; id=&#34;c207439a-5643-4e95-950e-721182ef54dd&#34;&gt;&lt;/div&gt;
&lt;script&gt;
var dest = document.getElementById(&#34;c207439a-5643-4e95-950e-721182ef54dd&#34;);
var template = document.getElementById(&#34;41fb6fac-ce02-4889-ac51-217e365f4058&#34;);
var caption = template.content.querySelector(&#34;caption&#34;);
if(caption) {
  caption.style.cssText = &#34;display:block;text-align:center;&#34;;
  var newcapt = document.createElement(&#34;p&#34;);
  newcapt.appendChild(caption)
  dest.parentNode.insertBefore(newcapt, dest.previousSibling);
}
var fantome = dest.attachShadow({mode: &#39;open&#39;});
var templateContent = template.content;
fantome.appendChild(templateContent);
&lt;/script&gt;

&lt;p&gt;Surprisingly there are packages with more than 200 versions on CRAN!&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:release-distribution&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/28/cran-files-2/index.en_files/figure-html/release-distribution-1.png&#34; alt=&#34;*Releases distribution*. Packages and number of releases&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 2: &lt;em&gt;Releases distribution&lt;/em&gt;. Packages and number of releases
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The most common number of releases is 1, a typical package has 3, and the mean is around 6.&lt;/p&gt;
&lt;p&gt;Given all these different versions of packages, how big are all the packages on CRAN?&lt;/p&gt;
&lt;div id=&#34;cran-size&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;CRAN size&lt;/h3&gt;
&lt;p&gt;Have you ever wondered how big CRAN is? Adding up the file sizes of the source packages, all CRAN source packages take approximately 96.8 GB.&lt;/p&gt;
&lt;p&gt;This doesn’t include binaries for multiple architectures and operating systems.
The package size might indicate whether the package has a considerable amount of data.&lt;/p&gt;
&lt;p&gt;Looking at the size of the packages over time we can see this pattern:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:packages-size&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/28/cran-files-2/index.en_files/figure-html/packages-size-1.png&#34; alt=&#34;*Packages and their median size.* Archived packages have become bigger since 2014. Packages on CRAN have been getting bigger since 2017.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 3: &lt;em&gt;Packages and their median size.&lt;/em&gt; Archived packages have become bigger since 2014. Packages on CRAN have been getting bigger since 2017.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Packages available on CRAN are usually smaller than those no longer on CRAN.
However, archived versions of packages still on CRAN are usually bigger than their current versions.
The median size of packages is increasing (quickly).&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:release-size&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/28/cran-files-2/index.en_files/figure-html/release-size-1.png&#34; alt=&#34;*Size of package with releases.* Packages are usually small but seem to gain weight when updating.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 4: &lt;em&gt;Size of package with releases.&lt;/em&gt; Packages are usually small but seem to gain weight when updating.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Typically packages increase their size with each new release until they reach around 50 releases.
For higher release counts this plot depends on very few packages and might not be representative.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:release-size2&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/28/cran-files-2/index.en_files/figure-html/release-size2-1.png&#34; alt=&#34;*Size of package with releases by availability.* Packages no longer in CRAN are usually smaller than those in it. The continuous black line is CRAN&#39;s current threshold, while the discontinuous black line is current median size.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 5: &lt;em&gt;Size of package with releases by availability.&lt;/em&gt; Packages no longer in CRAN are usually smaller than those in it. The continuous black line is CRAN’s current threshold, while the discontinuous black line is current median size.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Here we can appreciate better how packages tend to be below the CRAN threshold.
There isn’t much of a difference between packages available on CRAN and those archived.&lt;/p&gt;
&lt;p&gt;If we look at the size of the first release of each package over time we’ll see a representative view:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:size-time&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/28/cran-files-2/index.en_files/figure-html/size-time-1.png&#34; alt=&#34;*Size of the first release by time*. Package size increases with time with a peak around 2010 and increasing again since 2014 but still hasn&#39;t surpassed the previous record.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 6: &lt;em&gt;Size of the first release by time&lt;/em&gt;. Package size increases with time with a peak around 2010 and increasing again since 2014 but still hasn’t surpassed the previous record.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Package size tends to increase except for the brief period 2010-2014.
Currently it increases less than before that period but is close to its maximum.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusions&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusions&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Most packages are not updated much, between 1 and 3 times.
But some packages are updated quite a lot; this might mean they are data packages rather than software packages, or that they have frequent minor and major updates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Most current packages are smaller than archived ones.
Packages no longer available were usually bigger than those still on CRAN.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Surprisingly, packages increase their size a lot until the 25th release.
Size also increases over time, except for a period between 2010 and 2014.
This decrease might be due to a change in CRAN policy.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;future-parts&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Future parts&lt;/h2&gt;
&lt;p&gt;In future posts I’ll explore:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;patterns accepting packages and updates in packages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the relation between dependencies, initial release and updates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;who handled the packages.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Exploring CRAN&#39;s files: part 1</title>
      <link>https://llrs.dev/post/2022/07/23/cran-files-1/</link>
      <pubDate>Sat, 23 Jul 2022 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/post/2022/07/23/cran-files-1/</guid>
      <description>


&lt;div id=&#34;introduction&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;There are many great things in base R; one of them is the &lt;a href=&#34;https://search.r-project.org/R/refmans/tools/html/00Index.html&#34;&gt;tools package&lt;/a&gt;.
This package has the functions that are used to build, check and create packages, documentation and manuals.&lt;/p&gt;
&lt;p&gt;As I wanted to know how CRAN works and how it has changed, I looked into the source code of tools.
I found some internal functions that access freely available files with information about CRAN packages.
These private functions are in the &lt;a href=&#34;https://svn.r-project.org/R/trunk/src/library/tools/R/CRANtools.R&#34;&gt;CRANtools.R file&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;packages &amp;lt;- tools::CRAN_package_db()
# current &amp;lt;- tools:::CRAN_current_db()
# archive &amp;lt;- tools:::CRAN_archive_db()
# issues &amp;lt;- tools::CRAN_check_issues()
# alias &amp;lt;- tools:::CRAN_aliases_db()
# rdxrefs &amp;lt;- tools:::CRAN_rdxrefs_db()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As I was not sure what information these files hold, I asked on &lt;a href=&#34;https://stat.ethz.ch/pipermail/r-devel/2022-May/081770.html&#34;&gt;R-devel&lt;/a&gt; but I did not receive an answer.
They seem to be quite obscure and, as private functions, they might be removed without notice and shouldn’t be used in any dependency.
However, as the files contain information about CRAN they might provide interesting clues about the history of CRAN and how it is operated.&lt;/p&gt;
&lt;p&gt;In this post I will focus on the first file.
I’ll explore a couple of fields and in future posts I will use the other files to explore more about CRAN history.&lt;/p&gt;
&lt;div id=&#34;packages-file&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;packages file&lt;/h3&gt;
&lt;p&gt;First of all a very brief exploration of what is in this file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;##    Package Version Priority                        Depends
## 1       A3   1.0.0     &amp;lt;NA&amp;gt; R (&amp;gt;= 2.15.0), xtable, pbapply
## 2 AATtools   0.0.1     &amp;lt;NA&amp;gt;                   R (&amp;gt;= 3.6.0)
## 3   ABACUS   1.0.0     &amp;lt;NA&amp;gt;                   R (&amp;gt;= 3.1.0)
##                                 Imports LinkingTo
## 1                                  &amp;lt;NA&amp;gt;      &amp;lt;NA&amp;gt;
## 2  magrittr, dplyr, doParallel, foreach      &amp;lt;NA&amp;gt;
## 3 ggplot2 (&amp;gt;= 3.1.0), shiny (&amp;gt;= 1.3.1),      &amp;lt;NA&amp;gt;
##                               Suggests Enhances    License License_is_FOSS
## 1                  randomForest, e1071     &amp;lt;NA&amp;gt; GPL (&amp;gt;= 2)            &amp;lt;NA&amp;gt;
## 2                                 &amp;lt;NA&amp;gt;     &amp;lt;NA&amp;gt;      GPL-3            &amp;lt;NA&amp;gt;
## 3 rmarkdown (&amp;gt;= 1.13), knitr (&amp;gt;= 1.22)     &amp;lt;NA&amp;gt;      GPL-3            &amp;lt;NA&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This file has similar information to &lt;code&gt;available.packages()&lt;/code&gt; but with many more columns, such as published date, reverse dependencies, X-CRAN-Comment, who packaged it…
Also note that these packages are not filtered to match R version, OS_type or subarch, and there are near-duplicates (I learned about this filtering while reading the great documentation of &lt;a href=&#34;https://search.r-project.org/R/refmans/utils/html/available.packages.html&#34;&gt;&lt;code&gt;available.packages()&lt;/code&gt;&lt;/a&gt; and also finding some mentions online).&lt;/p&gt;
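&lt;p&gt;A minimal sketch of how such near-duplicates could be spotted, assuming they show up as several rows sharing the same &lt;code&gt;Package&lt;/code&gt; name. The small data frame below is made up for illustration and only mimics a few columns of the real file:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical rows standing in for the output of tools::CRAN_package_db()
pkgs &amp;lt;- data.frame(
  Package = c(&#34;pkgA&#34;, &#34;pkgB&#34;, &#34;pkgB&#34;),
  Version = c(&#34;1.0.0&#34;, &#34;2.1.0&#34;, &#34;2.0.5&#34;),
  Depends = c(&#34;R (&amp;gt;= 3.6.0)&#34;, &#34;R (&amp;gt;= 4.1.0)&#34;, NA)
)
# Names that appear in more than one row
dup_names &amp;lt;- unique(pkgs$Package[duplicated(pkgs$Package)])
# All the rows of those packages, to compare their versions and requirements
pkgs[pkgs$Package %in% dup_names, ]&lt;/code&gt;&lt;/pre&gt;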
&lt;p&gt;As we have data from several years I’ll sometimes show the release dates of different R versions to provide some context.
Without further delay let’s explore the data!&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;accepted&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Published packages&lt;/h2&gt;
&lt;p&gt;CRAN started some time ago (in 1997) but it hasn’t remained frozen.
The package archive (the A in CRAN) has been updated continuously since then.
For instance, the current packages do not include packages that were removed or archived, or versions replaced by updates.&lt;/p&gt;
&lt;p&gt;First, packages are submitted to CRAN and, once accepted, they are published.
As acceptance and publication usually happen almost at the same time, I might use the two terms as synonyms.
Looking at the currently available packages and their publication date, we can see the following:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:daily-cran&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/index.en_files/figure-html/daily-cran-1.png&#34; alt=&#34;ggplot2 plot of date vs packages accepted on a given day. Until 2020 less than 10 packages were accepted daily. Lately more than 30 are added to CRAN. The plot also displays the R release versions from 2.12 in 2010 to 4.2.0 in 2022.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: &lt;em&gt;Packages accepted on CRAN by the publication date.&lt;/em&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The oldest package still current was published in 2010.
This means a package without issues, dependency changes or bugs detected by the automatic checks for 12 years!&lt;/p&gt;
&lt;p&gt;The daily rate of acceptance has increased from less than 10 a day until 2020 to more than 30 this year, 2022.
If we summarize that information by month we see the same trend; the little bump in 2020 disappears, but other patterns show up:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:monthly-cran&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/index.en_files/figure-html/monthly-cran-1.png&#34; alt=&#34;ggplot figure with the monthly published packages. Until 2015 it rises very slowly, then it is around 50 monthly packages and there are some wobbles. In 2022 it rose to over 800 packages.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 2: &lt;em&gt;Monthly packages published to CRAN&lt;/em&gt;. Some monthly variance is observed.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Instead of just one bump we see some waves, with fewer packages accepted on CRAN late in the year and an increase in the first months of the year.&lt;/p&gt;
&lt;p&gt;If we look at the accumulated packages on CRAN we see an exponential growth:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:cran-cumsum&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/index.en_files/figure-html/cran-cumsum-1.png&#34; alt=&#34;Plot with the cumulative number of packages in CRAN. Rising from a few tens to currently more than 18000.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 3: &lt;em&gt;Accumulation of packages&lt;/em&gt;. Most of the packages have been published in the last 2 years.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In fact, more of the packages currently on CRAN were added since March 2021 than in all the previous years.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:cran-perc&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/index.en_files/figure-html/cran-perc-1.png&#34; alt=&#34;Line with percentages of packages in CRAN by date. Close to 50% of current packages were published between 2010 and 2021.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 4: &lt;em&gt;Percentage of current packages on CRAN according to their date of publication&lt;/em&gt;. Most of them were published/updated in the last year and a half.
&lt;/p&gt;
&lt;/div&gt;
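&lt;p&gt;The accumulated count and the percentage plotted above boil down to a running total over the sorted publication dates. A minimal sketch with made-up dates (the real ones would come from the publication date column of the packages file):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Made-up publication dates, one per package
published &amp;lt;- as.Date(c(&#34;2021-04-01&#34;, &#34;2010-05-01&#34;, &#34;2022-02-14&#34;,
                       &#34;2019-03-10&#34;, &#34;2021-11-20&#34;))
published &amp;lt;- sort(published)
# Running count of packages and their share of the current total
accumulated &amp;lt;- seq_along(published)
percentage &amp;lt;- 100 * accumulated / length(published)
data.frame(published, accumulated, percentage)&lt;/code&gt;&lt;/pre&gt;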
&lt;p&gt;This is a good time to remember that the date being used is the publication date of the current version of each package.
Many had previous versions on CRAN:&lt;/p&gt;
&lt;template id=&#34;9668142b-64d5-4c3d-842e-fbcef8304c16&#34;&gt;&lt;style&gt;
.tabwid table{
  border-spacing:0px !important;
  border-collapse:collapse;
  line-height:1;
  margin-left:auto;
  margin-right:auto;
  border-width: 0;
  display: table;
  margin-top: 1.275em;
  margin-bottom: 1.275em;
  border-color: transparent;
}
.tabwid_left table{
  margin-left:0;
}
.tabwid_right table{
  margin-right:0;
}
.tabwid td {
    padding: 0;
}
.tabwid a {
  text-decoration: none;
}
.tabwid thead {
    background-color: transparent;
}
.tabwid tfoot {
    background-color: transparent;
}
.tabwid table tr {
background-color: transparent;
}
&lt;/style&gt;&lt;div class=&#34;tabwid&#34;&gt;&lt;style&gt;.cl-3baefb4c{}.cl-3ba22c8c{font-family:&#39;DejaVu Sans&#39;;font-size:11pt;font-weight:normal;font-style:normal;text-decoration:none;color:rgba(0, 0, 0, 1.00);background-color:transparent;}.cl-3ba253e2{margin:0;text-align:left;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);padding-bottom:5pt;padding-top:5pt;padding-left:5pt;padding-right:5pt;line-height: 1;background-color:transparent;}.cl-3ba253ec{margin:0;text-align:right;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);padding-bottom:5pt;padding-top:5pt;padding-left:5pt;padding-right:5pt;line-height: 1;background-color:transparent;}.cl-3ba2b7e2{width:88.3pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3ba2b7f6{width:72.5pt;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3ba2b7f7{width:88.3pt;background-color:transparent;vertical-align: middle;border-bottom: 2pt solid rgba(102, 102, 102, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3ba2b800{width:72.5pt;background-color:transparent;vertical-align: middle;border-bottom: 2pt solid rgba(102, 102, 102, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid 
rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3ba2b80a{width:88.3pt;background-color:transparent;vertical-align: middle;border-bottom: 2pt solid rgba(102, 102, 102, 1.00);border-top: 2pt solid rgba(102, 102, 102, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3ba2b814{width:72.5pt;background-color:transparent;vertical-align: middle;border-bottom: 2pt solid rgba(102, 102, 102, 1.00);border-top: 2pt solid rgba(102, 102, 102, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}&lt;/style&gt;&lt;table class=&#39;cl-3baefb4c&#39;&gt;
&lt;thead&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-3ba2b80a&#34;&gt;&lt;p class=&#34;cl-3ba253e2&#34;&gt;&lt;span class=&#34;cl-3ba22c8c&#34;&gt;First release&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-3ba2b814&#34;&gt;&lt;p class=&#34;cl-3ba253ec&#34;&gt;&lt;span class=&#34;cl-3ba22c8c&#34;&gt;Packages&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-3ba2b7e2&#34;&gt;&lt;p class=&#34;cl-3ba253e2&#34;&gt;&lt;span class=&#34;cl-3ba22c8c&#34;&gt;No&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-3ba2b7f6&#34;&gt;&lt;p class=&#34;cl-3ba253ec&#34;&gt;&lt;span class=&#34;cl-3ba22c8c&#34;&gt;14,294&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&#34;overflow-wrap:break-word;&#34;&gt;&lt;td class=&#34;cl-3ba2b7f7&#34;&gt;&lt;p class=&#34;cl-3ba253e2&#34;&gt;&lt;span class=&#34;cl-3ba22c8c&#34;&gt;Yes&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;td class=&#34;cl-3ba2b800&#34;&gt;&lt;p class=&#34;cl-3ba253ec&#34;&gt;&lt;span class=&#34;cl-3ba22c8c&#34;&gt;4,113&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/template&gt;
&lt;div class=&#34;flextable-shadow-host&#34; id=&#34;1027b3f4-86a2-414b-90aa-a3bab733e0c0&#34;&gt;&lt;/div&gt;
&lt;script&gt;
var dest = document.getElementById(&#34;1027b3f4-86a2-414b-90aa-a3bab733e0c0&#34;);
var template = document.getElementById(&#34;9668142b-64d5-4c3d-842e-fbcef8304c16&#34;);
var caption = template.content.querySelector(&#34;caption&#34;);
if(caption) {
  caption.style.cssText = &#34;display:block;text-align:center;&#34;;
  var newcapt = document.createElement(&#34;p&#34;);
  newcapt.appendChild(caption)
  dest.parentNode.insertBefore(newcapt, dest.previousSibling);
}
var fantome = dest.attachShadow({mode: &#39;open&#39;});
var templateContent = template.content;
fantome.appendChild(templateContent);
&lt;/script&gt;

&lt;/div&gt;
&lt;div id=&#34;delays&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Processing time&lt;/h2&gt;
&lt;p&gt;Previously I found that &lt;a href=&#34;https://llrs.dev/post/2021/01/31/cran-review/&#34;&gt;CRAN submissions&lt;/a&gt; present some key differences between new packages and already published packages, which affect how long they need to wait to be published on CRAN.
With the existing data we can see how fast the process is by comparing the published date with the build date.&lt;/p&gt;
&lt;p&gt;The build date is added to the tar.gz file automatically when the developer builds the package via &lt;code&gt;R CMD build&lt;/code&gt;. However, the published date is set by CRAN once the packages are accepted on CRAN.&lt;/p&gt;
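&lt;p&gt;A minimal sketch of how this difference could be computed, assuming the build timestamp is stored in a &lt;code&gt;Packaged&lt;/code&gt; column (date, time and the user who built it) and the publication date in a &lt;code&gt;Published&lt;/code&gt; column. The rows below are made up for illustration:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical rows mimicking two columns of the packages file
pkgs &amp;lt;- data.frame(
  Package = c(&#34;pkgA&#34;, &#34;pkgB&#34;),
  Packaged = c(&#34;2022-07-01 10:15:00 UTC; alice&#34;, &#34;2022-06-20 08:00:00 UTC; bob&#34;),
  Published = c(&#34;2022-07-02&#34;, &#34;2022-06-27&#34;)
)
# Keep only the date part (first 10 characters) of the build timestamp
build_date &amp;lt;- as.Date(substr(pkgs$Packaged, 1, 10))
publish_date &amp;lt;- as.Date(pkgs$Published)
# Days between building the package and its publication on CRAN
pkgs$delay &amp;lt;- as.numeric(publish_date - build_date)
pkgs[, c(&#34;Package&#34;, &#34;delay&#34;)]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As only the date is kept, the precision of this delay is one day.&lt;/p&gt;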
&lt;p&gt;To visualize the differences I will also check whether new packages differ from those that were already on CRAN:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:cran-delays&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/index.en_files/figure-html/cran-delays-1.png&#34; alt=&#34;Histogram of packages and the time between build and publication. They take less than 50 days usually.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 5: &lt;em&gt;Histogram of time difference between building and publishing a package.&lt;/em&gt; Color indicates if the package is new to CRAN or not. Most of the published packages take more or less the same time regardless of if it is the first time or not.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There doesn’t seem to be much difference between the build date and the publication date depending on whether it is the first release or not.
The precision is just a day and this is usually a fast process, well below 50 days.
Few packages spend that long between build and publication, and they are too few to be noticeable at this scale.
Since 2016/05/02 there is a &lt;a href=&#34;https://github.com/r-devel/r-svn/blob/676c1183801648b68f8f6719701445b2f9a5e3fd/src/library/tools/R/QC.R#L7583&#34;&gt;check&lt;/a&gt; that raises an issue if the build is older than a month.&lt;/p&gt;
&lt;p&gt;Note that one might need to build the package multiple times before it is accepted.
Packages published for the first time on CRAN might have been submitted previously, but once they finally build and pass the checks and manual review they are handled as fast as packages already on CRAN.&lt;/p&gt;
&lt;p&gt;However, this time between build and acceptance might have changed with time:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:cran-delays2&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/index.en_files/figure-html/cran-delays2-1.png&#34; alt=&#34;Smoothed lines of published packages with different linetype and color depending on if it is the first time they are on CRAN or not. New packages currently take less than 4 days and old packages less than 2. This is down from 2018 to 2021, when new packages took above 4 days to be published on CRAN&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 6: &lt;em&gt;Processing time between building the package and being published by date.&lt;/em&gt; There is a large difference between new packages and old ones. New packages usually take more time while existing packages currently take less than a day.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We clearly see a difference in processing time between packages already on CRAN and those that are not.
Keep in mind that for the few packages from before 2016 the estimation might not be accurate.
At the same time this is consistent with the manual review process (for more information see &lt;a href=&#34;https://llrs.dev/post/2021/01/31/cran-review/&#34;&gt;my previous post&lt;/a&gt; about the review process of CRAN or my &lt;a href=&#34;https://llrs.dev/talk/user-2021/&#34;&gt;talk at the useR2021&lt;/a&gt;).
It also means that there is a huge variation in how long packages take to be handled.
However, this seems to be decreasing: while in 2010 it took around 2 weeks, nowadays it takes less than a week, and it is getting closer to the median of 1 day between being built and appearing on CRAN that existing packages take.&lt;/p&gt;
&lt;p&gt;This difference might be explained by experience: authors and maintainers whose package(s) are already on CRAN know better how to submit a new version that passes the checks without problems.&lt;/p&gt;
&lt;p&gt;It could also be that new packages need more time from the CRAN team.
In 2020 we see it took longer than in previous years for packages to be added on CRAN.
Maybe the increase in the processing time in 2020 was due to the huge volume of submissions CRAN received or to more checks on the developer side before submitting to CRAN.&lt;/p&gt;
&lt;p&gt;The two explanations are not mutually exclusive.&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;
Do more packages published the same day mean more processing time? It doesn’t look like it.
&lt;/summary&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:cran-reasons&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://llrs.dev/post/2022/07/23/cran-files-1/index.en_files/figure-html/cran-reasons-1.png&#34; alt=&#34;ggplot graphic with the time of processing time and the number of packages accepted the same day. New packages have less delay than already published packages, but the more packages are accepted, the less delay there is.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 7: &lt;em&gt;Packages accepted the same day and processing time.&lt;/em&gt; New packages are accepted sooner than packages already on CRAN with respect to the build date.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Surprisingly, we see a lot of variation on the delay of packages already accepted on CRAN.
In addition, the more new packages accepted the same day, the less delay there is.
I think this just means that when reviewers work on the submission queue several packages might be approved.&lt;/p&gt;
&lt;p&gt;This might also mean that packages have already been built several times before finally being accepted, once the errors, warnings and notes have been solved.
Last, this could indicate that developers with a package already on CRAN wait a bit between building and submitting, perhaps to double check before submission (dependencies, several machines, other?), or that there is a time zone difference (submitting at noon in one region but during the reviewers’ night).&lt;/p&gt;
&lt;/details&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;There are packages that have been working without problems for 12 years despite several major changes in R (See figure &lt;a href=&#34;#fig:daily-cran&#34;&gt;1&lt;/a&gt;).
This speaks volumes about the packages’ quality, and about the backward compatibility that R core aims for and CRAN checks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CRAN accepts an incredible number of packages daily and monthly.
The system and the team are doing incredible work, mostly in their free time (See figure &lt;a href=&#34;#fig:monthly-cran&#34;&gt;2&lt;/a&gt;).
Many thanks!&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Accepted packages are handled very fast, usually in less than a week (See figure &lt;a href=&#34;#fig:cran-reasons&#34;&gt;7&lt;/a&gt;).
But it is not possible to separate the time spent in the submission system from the time spent on the developer’s computer.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;future-parts&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Future parts&lt;/h2&gt;
&lt;p&gt;We’ve explored a snapshot of current packages and a brief window into the history of CRAN.
There is much more that can be done with all the other files.&lt;/p&gt;
&lt;p&gt;In future posts I’ll explore:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;patterns accepting packages and updates in packages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;who handled the packages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Size of packages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the relation between dependencies, initial release and updates.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Other suggestions?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Edit&lt;/strong&gt;: Many thanks to &lt;a href=&#34;https://masalmon.eu/&#34;&gt;Maëlle Salmon&lt;/a&gt; and &lt;a href=&#34;https://dirk.eddelbuettel.com/&#34;&gt;Dirk Eddelbuettel&lt;/a&gt; for their feedback on an initial version of this series of posts.&lt;/p&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;details&gt;
&lt;pre&gt;&lt;code&gt;## - Session info -------------------------------------------------------------------------------------------------------
##  setting  value
##  version  R version 4.2.1 (2022-06-23)
##  os       Ubuntu 20.04.4 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  C
##  ctype    C
##  tz       Europe/Madrid
##  date     2022-07-23
##  pandoc   2.18 @ /usr/lib/rstudio/bin/quarto/bin/tools/ (via rmarkdown)
## 
## - Packages -----------------------------------------------------------------------------------------------------------
##  package      * version    date (UTC) lib source
##  assertthat     0.2.1      2019-03-21 [2] RSPM (R 4.2.0)
##  base64enc      0.1-3      2015-07-28 [2] CRAN (R 4.0.0)
##  blogdown       1.10       2022-05-10 [2] RSPM (R 4.2.0)
##  bookdown       0.27       2022-06-14 [2] RSPM (R 4.2.0)
##  bslib          0.4.0      2022-07-16 [2] RSPM (R 4.2.0)
##  cachem         1.0.6      2021-08-19 [2] RSPM (R 4.2.0)
##  cli            3.3.0      2022-04-25 [2] RSPM (R 4.2.0)
##  codetools      0.2-18     2020-11-04 [2] RSPM (R 4.2.0)
##  colorspace     2.0-3      2022-02-21 [2] RSPM (R 4.2.0)
##  crayon         1.5.1      2022-03-26 [2] RSPM (R 4.2.0)
##  curl           4.3.2      2021-06-23 [2] RSPM (R 4.2.0)
##  data.table     1.14.2     2021-09-27 [2] RSPM (R 4.2.0)
##  DBI            1.1.3      2022-06-18 [2] RSPM (R 4.2.0)
##  digest         0.6.29     2021-12-01 [2] RSPM (R 4.2.0)
##  dplyr        * 1.0.9      2022-04-28 [2] RSPM (R 4.2.0)
##  ellipsis       0.3.2      2021-04-29 [2] RSPM (R 4.2.0)
##  evaluate       0.15       2022-02-18 [2] RSPM (R 4.2.0)
##  fansi          1.0.3      2022-03-24 [2] RSPM (R 4.2.0)
##  farver         2.1.1      2022-07-06 [2] RSPM (R 4.2.0)
##  fastmap        1.1.0      2021-01-25 [2] RSPM (R 4.2.0)
##  flextable    * 0.7.2      2022-06-12 [2] RSPM (R 4.2.0)
##  forcats      * 0.5.1      2021-01-27 [2] RSPM (R 4.2.0)
##  gdtools        0.2.4      2022-02-14 [2] RSPM (R 4.2.0)
##  generics       0.1.3      2022-07-05 [2] RSPM (R 4.2.0)
##  geomtextpath * 0.1.0      2022-01-24 [2] CRAN (R 4.2.1)
##  ggplot2      * 3.3.6.9000 2022-06-29 [2] Github (tidyverse/ggplot2@7571122)
##  ggrepel      * 0.9.1      2021-01-15 [2] RSPM (R 4.2.0)
##  glue           1.6.2      2022-02-24 [2] RSPM (R 4.2.0)
##  gtable         0.3.0      2019-03-25 [2] CRAN (R 4.0.0)
##  highr          0.9        2021-04-16 [2] RSPM (R 4.2.0)
##  htmltools      0.5.3      2022-07-18 [2] RSPM (R 4.2.0)
##  jquerylib      0.1.4      2021-04-26 [2] RSPM (R 4.2.0)
##  jsonlite       1.8.0      2022-02-22 [2] RSPM (R 4.2.0)
##  knitr          1.39       2022-04-26 [2] RSPM (R 4.2.0)
##  labeling       0.4.2      2020-10-20 [2] RSPM (R 4.2.0)
##  lattice        0.20-45    2021-09-22 [3] CRAN (R 4.2.0)
##  lifecycle      1.0.1      2021-09-24 [2] RSPM (R 4.2.0)
##  lubridate    * 1.8.0      2021-10-07 [2] RSPM (R 4.2.0)
##  magrittr       2.0.3      2022-03-30 [2] RSPM (R 4.2.0)
##  Matrix         1.4-1      2022-03-23 [2] RSPM (R 4.2.0)
##  mgcv           1.8-40     2022-03-29 [2] RSPM (R 4.2.0)
##  munsell        0.5.0      2018-06-12 [2] RSPM (R 4.2.0)
##  nlme           3.1-158    2022-06-15 [2] RSPM (R 4.2.0)
##  officer        0.4.3      2022-06-12 [2] RSPM (R 4.2.0)
##  pillar         1.8.0      2022-07-18 [2] RSPM (R 4.2.0)
##  pkgconfig      2.0.3      2019-09-22 [2] RSPM (R 4.2.0)
##  purrr          0.3.4      2020-04-17 [2] RSPM (R 4.2.0)
##  R6             2.5.1      2021-08-19 [2] RSPM (R 4.2.0)
##  Rcpp           1.0.9      2022-07-08 [2] RSPM (R 4.2.0)
##  rlang          1.0.4      2022-07-12 [2] RSPM (R 4.2.0)
##  rmarkdown      2.14       2022-04-25 [2] RSPM (R 4.2.0)
##  rstudioapi     0.13       2020-11-12 [2] RSPM (R 4.2.0)
##  rversions    * 2.1.1      2021-05-31 [2] RSPM (R 4.2.0)
##  sass           0.4.2      2022-07-16 [2] RSPM (R 4.2.0)
##  scales         1.2.0      2022-04-13 [2] RSPM (R 4.2.0)
##  sessioninfo    1.2.2      2021-12-06 [2] RSPM (R 4.2.0)
##  stringi        1.7.8      2022-07-11 [2] RSPM (R 4.2.0)
##  stringr        1.4.0      2019-02-10 [2] RSPM (R 4.2.0)
##  systemfonts    1.0.4      2022-02-11 [2] RSPM (R 4.2.0)
##  textshaping    0.3.6      2021-10-13 [2] RSPM (R 4.2.0)
##  tibble         3.1.7      2022-05-03 [2] RSPM (R 4.2.0)
##  tidyr        * 1.2.0      2022-02-01 [2] RSPM (R 4.2.0)
##  tidyselect     1.1.2      2022-02-21 [2] RSPM (R 4.2.0)
##  utf8           1.2.2      2021-07-24 [2] RSPM (R 4.2.0)
##  uuid           1.1-0      2022-04-19 [2] RSPM (R 4.2.0)
##  vctrs          0.4.1      2022-04-13 [2] RSPM (R 4.2.0)
##  withr          2.5.0      2022-03-03 [2] RSPM (R 4.2.0)
##  xfun           0.31       2022-05-10 [2] RSPM (R 4.2.0)
##  xml2           1.3.3      2021-11-30 [2] RSPM (R 4.2.0)
##  yaml           2.3.5      2022-02-21 [2] RSPM (R 4.2.0)
##  zip            2.2.0      2021-05-31 [2] RSPM (R 4.2.0)
## 
##  [1] /home/lluis/bin/R/4.2.1
##  [2] /usr/lib/R/site-library
##  [3] /usr/lib/R/library
## 
## ----------------------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Upgrading rtweet to 1.0.2</title>
      <link>https://llrs.dev/post/2022/07/04/rtweet-1-0-0/</link>
      <pubDate>Mon, 04 Jul 2022 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/post/2022/07/04/rtweet-1-0-0/</guid>
      <description>


&lt;p&gt;In this post I will provide some examples of what has changed between rtweet 0.7.0 and rtweet 1.0.2.
I hope both the changes and this guide will help all users.
I highlight the most important and interesting changes in this blog post; for a full list of changes you can consult the &lt;a href=&#34;https://docs.ropensci.org/rtweet/news/index.html&#34;&gt;NEWS&lt;/a&gt;.&lt;/p&gt;
&lt;div id=&#34;big-breaking-changes&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;&lt;strong&gt;Big breaking changes&lt;/strong&gt;&lt;/h2&gt;
&lt;div id=&#34;more-consistent-output&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;More consistent output&lt;/h3&gt;
&lt;p&gt;This is probably what will affect the most users.
All functions that return data about tweets&lt;a href=&#34;#fn1&#34; class=&#34;footnote-ref&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt; now return the same columns.&lt;/p&gt;
&lt;p&gt;For example if we search some tweets we’ll get the following columns:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;&amp;gt; tweets &amp;lt;- search_tweets(&amp;quot;weather&amp;quot;)
&amp;gt; colnames(tweets)
 [1] &amp;quot;created_at&amp;quot;                    &amp;quot;id&amp;quot;                           
 [3] &amp;quot;id_str&amp;quot;                        &amp;quot;full_text&amp;quot;                    
 [5] &amp;quot;truncated&amp;quot;                     &amp;quot;display_text_range&amp;quot;           
 [7] &amp;quot;entities&amp;quot;                      &amp;quot;metadata&amp;quot;                     
 [9] &amp;quot;source&amp;quot;                        &amp;quot;in_reply_to_status_id&amp;quot;        
[11] &amp;quot;in_reply_to_status_id_str&amp;quot;     &amp;quot;in_reply_to_user_id&amp;quot;          
[13] &amp;quot;in_reply_to_user_id_str&amp;quot;       &amp;quot;in_reply_to_screen_name&amp;quot;      
[15] &amp;quot;geo&amp;quot;                           &amp;quot;coordinates&amp;quot;                  
[17] &amp;quot;place&amp;quot;                         &amp;quot;contributors&amp;quot;                 
[19] &amp;quot;is_quote_status&amp;quot;               &amp;quot;retweet_count&amp;quot;                
[21] &amp;quot;favorite_count&amp;quot;                &amp;quot;favorited&amp;quot;                    
[23] &amp;quot;retweeted&amp;quot;                     &amp;quot;lang&amp;quot;                         
[25] &amp;quot;quoted_status_id&amp;quot;              &amp;quot;quoted_status_id_str&amp;quot;         
[27] &amp;quot;quoted_status&amp;quot;                 &amp;quot;possibly_sensitive&amp;quot;           
[29] &amp;quot;retweeted_status&amp;quot;              &amp;quot;text&amp;quot;                         
[31] &amp;quot;favorited_by&amp;quot;                  &amp;quot;scopes&amp;quot;                       
[33] &amp;quot;display_text_width&amp;quot;            &amp;quot;quoted_status_permalink&amp;quot;      
[35] &amp;quot;quote_count&amp;quot;                   &amp;quot;timestamp_ms&amp;quot;                 
[37] &amp;quot;reply_count&amp;quot;                   &amp;quot;filter_level&amp;quot;                 
[39] &amp;quot;query&amp;quot;                         &amp;quot;withheld_scope&amp;quot;               
[41] &amp;quot;withheld_copyright&amp;quot;            &amp;quot;withheld_in_countries&amp;quot;        
[43] &amp;quot;possibly_sensitive_appealable&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;rtweet now minimizes the processing of tweets and returns only the data provided by the API, in a form that is easier to handle in R.
However, to preserve the nested nature of the data returned, some fields are now nested inside others.
For example, the fields &lt;code&gt;&#34;bbox_coords&#34;&lt;/code&gt;, &lt;code&gt;&#34;geo_coords&#34;&lt;/code&gt; and &lt;code&gt;&#34;coords_coords&#34;&lt;/code&gt; were previously returned as separate columns, but they are now nested inside &lt;code&gt;&#34;place&#34;&lt;/code&gt;, &lt;code&gt;&#34;coordinates&#34;&lt;/code&gt; or &lt;code&gt;&#34;geo&#34;&lt;/code&gt;, depending on where they are provided.
Some columns previously calculated by rtweet are no longer returned, like &lt;code&gt;&#34;rtweet_favorite_count&#34;&lt;/code&gt;.
At the same time, rtweet provides new columns about each tweet, like the &lt;code&gt;&#34;withheld_*&#34;&lt;/code&gt; columns.&lt;/p&gt;
&lt;p&gt;If you scanned through the columns you might have noticed that columns &lt;code&gt;&#34;user_id&#34;&lt;/code&gt; and &lt;code&gt;&#34;screen_name&#34;&lt;/code&gt; are no longer returned.
This data is still returned by the API but it is now made available to the rtweet users via &lt;code&gt;users_data()&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;&amp;gt; colnames(users_data(tweets))
 [1] &amp;quot;id&amp;quot;                      &amp;quot;id_str&amp;quot;                 
 [3] &amp;quot;name&amp;quot;                    &amp;quot;screen_name&amp;quot;            
 [5] &amp;quot;location&amp;quot;                &amp;quot;description&amp;quot;            
 [7] &amp;quot;url&amp;quot;                     &amp;quot;protected&amp;quot;              
 [9] &amp;quot;followers_count&amp;quot;         &amp;quot;friends_count&amp;quot;          
[11] &amp;quot;listed_count&amp;quot;            &amp;quot;created_at&amp;quot;             
[13] &amp;quot;favourites_count&amp;quot;        &amp;quot;verified&amp;quot;               
[15] &amp;quot;statuses_count&amp;quot;          &amp;quot;profile_image_url_https&amp;quot;
[17] &amp;quot;profile_banner_url&amp;quot;      &amp;quot;default_profile&amp;quot;        
[19] &amp;quot;default_profile_image&amp;quot;   &amp;quot;withheld_in_countries&amp;quot;  
[21] &amp;quot;derived&amp;quot;                 &amp;quot;withheld_scope&amp;quot;         
[23] &amp;quot;entities&amp;quot; &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This blog post should help you find the right data columns, but if you don’t find what you are looking for it might be nested inside a column.&lt;br /&gt;
Try using &lt;code&gt;dplyr::glimpse()&lt;/code&gt; to explore the data and locate nested columns.
For example, the entities column (which is present in both tweets and users) has the following useful fields:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;&amp;gt; names(tweets$entities[[1]])
[1] &amp;quot;hashtags&amp;quot;      &amp;quot;symbols&amp;quot;       &amp;quot;user_mentions&amp;quot; &amp;quot;urls&amp;quot;         
[5] &amp;quot;media&amp;quot; &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Similarly if you look up a user via &lt;code&gt;search_users()&lt;/code&gt; or &lt;code&gt;lookup_users()&lt;/code&gt; you’ll get consistent data:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;&amp;gt; users &amp;lt;- lookup_users(c(&amp;quot;twitter&amp;quot;, &amp;quot;rladiesglobal&amp;quot;, &amp;quot;_R_Foundation&amp;quot;))
&amp;gt; colnames(users)
 [1] &amp;quot;id&amp;quot;                      &amp;quot;id_str&amp;quot;                 
 [3] &amp;quot;name&amp;quot;                    &amp;quot;screen_name&amp;quot;            
 [5] &amp;quot;location&amp;quot;                &amp;quot;description&amp;quot;            
 [7] &amp;quot;url&amp;quot;                     &amp;quot;protected&amp;quot;              
 [9] &amp;quot;followers_count&amp;quot;         &amp;quot;friends_count&amp;quot;          
[11] &amp;quot;listed_count&amp;quot;            &amp;quot;created_at&amp;quot;             
[13] &amp;quot;favourites_count&amp;quot;        &amp;quot;verified&amp;quot;               
[15] &amp;quot;statuses_count&amp;quot;          &amp;quot;profile_image_url_https&amp;quot;
[17] &amp;quot;profile_banner_url&amp;quot;      &amp;quot;default_profile&amp;quot;        
[19] &amp;quot;default_profile_image&amp;quot;   &amp;quot;withheld_in_countries&amp;quot;  
[21] &amp;quot;derived&amp;quot;                 &amp;quot;withheld_scope&amp;quot;         
[23] &amp;quot;entities&amp;quot;               &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can use &lt;code&gt;tweets_data()&lt;/code&gt; to retrieve information about their latest tweet:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;&amp;gt; colnames(tweets_data(users))
 [1] &amp;quot;created_at&amp;quot;                    &amp;quot;id&amp;quot;                           
 [3] &amp;quot;id_str&amp;quot;                        &amp;quot;text&amp;quot;                         
 [5] &amp;quot;truncated&amp;quot;                     &amp;quot;entities&amp;quot;                     
 [7] &amp;quot;source&amp;quot;                        &amp;quot;in_reply_to_status_id&amp;quot;        
 [9] &amp;quot;in_reply_to_status_id_str&amp;quot;     &amp;quot;in_reply_to_user_id&amp;quot;          
[11] &amp;quot;in_reply_to_user_id_str&amp;quot;       &amp;quot;in_reply_to_screen_name&amp;quot;      
[13] &amp;quot;geo&amp;quot;                           &amp;quot;coordinates&amp;quot;                  
[15] &amp;quot;place&amp;quot;                         &amp;quot;contributors&amp;quot;                 
[17] &amp;quot;is_quote_status&amp;quot;               &amp;quot;retweet_count&amp;quot;                
[19] &amp;quot;favorite_count&amp;quot;                &amp;quot;favorited&amp;quot;                    
[21] &amp;quot;retweeted&amp;quot;                     &amp;quot;lang&amp;quot;                         
[23] &amp;quot;retweeted_status&amp;quot;              &amp;quot;possibly_sensitive&amp;quot;           
[25] &amp;quot;quoted_status&amp;quot;                 &amp;quot;display_text_width&amp;quot;           
[27] &amp;quot;user&amp;quot;                          &amp;quot;full_text&amp;quot;                    
[29] &amp;quot;favorited_by&amp;quot;                  &amp;quot;scopes&amp;quot;                       
[31] &amp;quot;display_text_range&amp;quot;            &amp;quot;quoted_status_id&amp;quot;             
[33] &amp;quot;quoted_status_id_str&amp;quot;          &amp;quot;quoted_status_permalink&amp;quot;      
[35] &amp;quot;quote_count&amp;quot;                   &amp;quot;timestamp_ms&amp;quot;                 
[37] &amp;quot;reply_count&amp;quot;                   &amp;quot;filter_level&amp;quot;                 
[39] &amp;quot;metadata&amp;quot;                      &amp;quot;query&amp;quot;                        
[41] &amp;quot;withheld_scope&amp;quot;                &amp;quot;withheld_copyright&amp;quot;           
[43] &amp;quot;withheld_in_countries&amp;quot;         &amp;quot;possibly_sensitive_appealable&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can merge them via:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;users_and_last_tweets &amp;lt;- cbind(users, id_str = tweets_data(users)[, &amp;quot;id_str&amp;quot;])&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the future (&lt;a href=&#34;#future&#34;&gt;see below&lt;/a&gt;), helper functions will make managing the output of rtweet easier.&lt;/p&gt;
&lt;p&gt;Finally, &lt;code&gt;get_followers()&lt;/code&gt; and &lt;code&gt;get_friends()&lt;/code&gt; now return the same columns:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;&amp;gt; colnames(get_followers(&amp;quot;_R_Foundation&amp;quot;))
[1] &amp;quot;from_id&amp;quot; &amp;quot;to_id&amp;quot;  
&amp;gt; colnames(get_friends(&amp;quot;_R_Foundation&amp;quot;))
[1] &amp;quot;from_id&amp;quot; &amp;quot;to_id&amp;quot;  &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will make it easier to build networks of connections (although you might want to convert screen names to ids or vice versa).&lt;/p&gt;
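&lt;p&gt;As a minimal sketch of why the shared columns help, the two results can be stacked into one edge list. The data frames below are made-up stand-ins for the output of &lt;code&gt;get_followers()&lt;/code&gt; and &lt;code&gt;get_friends()&lt;/code&gt;, and the direction of each edge (follower in &lt;code&gt;from_id&lt;/code&gt;) is an assumption:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical stand-ins for the output of get_followers() and get_friends()
followers &amp;lt;- data.frame(from_id = c(&amp;quot;a&amp;quot;, &amp;quot;b&amp;quot;), to_id = &amp;quot;_R_Foundation&amp;quot;,
                        stringsAsFactors = FALSE)
friends &amp;lt;- data.frame(from_id = &amp;quot;_R_Foundation&amp;quot;, to_id = c(&amp;quot;b&amp;quot;, &amp;quot;c&amp;quot;),
                      stringsAsFactors = FALSE)

# Same columns, so both results stack into a single edge list
edges &amp;lt;- rbind(followers, friends)

# Accounts connected in both directions
mutual &amp;lt;- intersect(followers$from_id, friends$to_id)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;An edge list like this is the usual input for network packages, once screen names and ids are made consistent.&lt;/p&gt;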
&lt;/div&gt;
&lt;div id=&#34;more-consistent-interface&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;More consistent interface&lt;/h3&gt;
&lt;p&gt;All paginated functions that don’t return tweets now use a consistent pagination interface (except the premium endpoints).
They all store the “next cursor” in an &lt;code&gt;rtweet_cursor&lt;/code&gt; attribute, which will be automatically retrieved when you use the &lt;code&gt;cursor&lt;/code&gt; argument.
This will make it easier to continue a query you started:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;users &amp;lt;- get_followers(&amp;quot;_R_Foundation&amp;quot;)
users
     
# use `cursor` to find the next &amp;quot;page&amp;quot; of results
more_users &amp;lt;- get_followers(&amp;quot;_R_Foundation&amp;quot;, cursor = users)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Functions that return tweets support &lt;code&gt;max_id&lt;/code&gt; and &lt;code&gt;since_id&lt;/code&gt; to find earlier and later tweets respectively:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Retrieve all the tweets made since the previous request
newer &amp;lt;- search_tweets(&amp;quot;weather&amp;quot;, since_id = tweets)
# Retrieve tweets made before the previous request
older &amp;lt;- search_tweets(&amp;quot;weather&amp;quot;, max_id = tweets)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you want more tweets than the rate limits of the API allow, you can use &lt;code&gt;retryonratelimit&lt;/code&gt; to wait as long as needed:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;long &amp;lt;- search_tweets(&amp;quot;weather&amp;quot;, n = 1000, retryonratelimit = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will keep your terminal busy until the 1000 tweets are retrieved.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;saving-data&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Saving data&lt;/h3&gt;
&lt;p&gt;An unexpected consequence of returning more data (now matching that returned by the API) is that it is harder to save it in a tabular format.
For instance, one tweet might have one media item, mention two users and have three hashtags.
There isn’t a simple way to save this in a single row uniformly for all tweets without causing confusion.&lt;/p&gt;
&lt;p&gt;This resulted in deprecating &lt;code&gt;save_as_csv&lt;/code&gt;, &lt;code&gt;read_twitter_csv&lt;/code&gt; and related functions because they don’t work with the new data structure and it won’t be possible to load the complete data from a csv.
They will be removed in later versions.&lt;/p&gt;
&lt;p&gt;Many users will benefit from saving to RDS (e.g., &lt;code&gt;saveRDS()&lt;/code&gt; or &lt;code&gt;readr::write_rds()&lt;/code&gt;), and those wanting to export to tabular format can simplify the data to include only that of interest before saving with generic R functions (e.g., &lt;code&gt;write.csv()&lt;/code&gt; or &lt;code&gt;readr::write_csv()&lt;/code&gt;).&lt;/p&gt;
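&lt;p&gt;A minimal sketch of both options, using a small made-up data frame with a nested list column (the &lt;code&gt;toy&lt;/code&gt; object and its columns are hypothetical stand-ins, not real rtweet output):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical stand-in for rtweet output: one flat column, one nested list column
toy &amp;lt;- data.frame(id_str = c(&amp;quot;1&amp;quot;, &amp;quot;2&amp;quot;), stringsAsFactors = FALSE)
toy$entities &amp;lt;- list(
  list(hashtags = c(&amp;quot;rstats&amp;quot;, &amp;quot;weather&amp;quot;)),
  list(hashtags = character(0))
)

# RDS keeps the nested structure intact
rds_file &amp;lt;- tempfile(fileext = &amp;quot;.rds&amp;quot;)
saveRDS(toy, rds_file)
restored &amp;lt;- readRDS(rds_file)

# For a tabular export, first reduce the data to flat columns of interest
flat &amp;lt;- data.frame(
  id_str = toy$id_str,
  n_hashtags = vapply(toy$entities, function(x) length(x$hashtags), integer(1)),
  stringsAsFactors = FALSE
)
csv_file &amp;lt;- tempfile(fileext = &amp;quot;.csv&amp;quot;)
write.csv(flat, csv_file, row.names = FALSE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which columns to keep in the flat version depends on the analysis; the hashtag count here is only an illustration.&lt;/p&gt;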
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;other-breaking-changes&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;&lt;strong&gt;Other breaking changes&lt;/strong&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Accessibility is important and for this reason if you tweet via &lt;code&gt;post_tweet()&lt;/code&gt; and add an image, gif or video you’ll need to provide the media alternative text.
Without &lt;code&gt;media_alt_text&lt;/code&gt; it will not allow you to post.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;tweet_shot()&lt;/code&gt; has been deprecated as it no longer works correctly.
It might be possible to bring it back, but the code is complex and I do not understand enough to maintain it.
If you’re interested in seeing this feature return, check out the discussion about this &lt;a href=&#34;https://github.com/ropensci/rtweet/issues/458&#34;&gt;issue&lt;/a&gt; and let me know if you have any suggestions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;rtweet also used to provide functions for data on &lt;code&gt;emojis&lt;/code&gt;, &lt;code&gt;langs&lt;/code&gt; and &lt;code&gt;stopwordslangs&lt;/code&gt;.
These are useful resources for text mining in general, not only in tweets. However, they need to be kept up to date to be helpful and would be better placed in other packages; for instance, emojis is now in the &lt;a href=&#34;https://cran.r-project.org/package=bdpar&#34;&gt;bdpar package&lt;/a&gt;.
Therefore they are no longer available in rtweet.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The functions like &lt;code&gt;suggested_*()&lt;/code&gt; have been removed as they have been broken since 2019.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;easier-authentication&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;&lt;strong&gt;Easier authentication&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;An exciting part of this release has been a big rewrite of the authentication protocol.
While it is compatible with previous rtweet authentication methods, it also has some important new functions which make it easier to work with rtweet and the Twitter API in different ways.&lt;/p&gt;
&lt;div id=&#34;different-ways-to-authenticate&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Different ways to authenticate&lt;/h3&gt;
&lt;p&gt;If you just want to test the package, use the default authentication &lt;code&gt;auth_setup_default()&lt;/code&gt; that comes with rtweet.
If you use it for one or two days you won’t notice any problem.&lt;/p&gt;
&lt;p&gt;If you want to use the package for more than a couple of days, I recommend you set up your own token via &lt;code&gt;rtweet_user()&lt;/code&gt;.
It will open a window to authenticate via the authenticated account in your default browser.
This authentication won’t allow you to do everything but it will avoid running out of requests and being rate-limited.&lt;/p&gt;
&lt;p&gt;If you plan to make heavy use of the package, I recommend registering yourself as a developer and using one of the following two mechanisms, depending on your plans:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Collect data and analyze: &lt;code&gt;rtweet_app()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Set up a bot: &lt;code&gt;rtweet_bot()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Find more information in the &lt;a href=&#34;https://docs.ropensci.org/rtweet/articles/auth.html&#34;&gt;Authentication with rtweet vignette&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;storing-credentials&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Storing credentials&lt;/h3&gt;
&lt;p&gt;Previously rtweet saved each token created, but now non-default tokens are only saved if you ask. You can save them manually via &lt;code&gt;auth_save(token, &#34;my_app&#34;)&lt;/code&gt;.
As a bonus, if you name your token as default (&lt;code&gt;auth_save(token, &#34;default&#34;)&lt;/code&gt;), it will be used automatically upon loading the library.&lt;/p&gt;
&lt;p&gt;Further, tokens are now saved in the location output by &lt;code&gt;tools::R_user_dir(&#34;rtweet&#34;, &#34;config&#34;)&lt;/code&gt;, rather than in your home directory.
If you have previous tokens saved or problems identifying which token is which, use &lt;code&gt;auth_sitrep()&lt;/code&gt;.
This will provide clues about which tokens might be duplicated or misconfigured, but it won’t check if they work.
It will also automatically move your tokens to the new path.&lt;/p&gt;
&lt;p&gt;To check which credentials you have stored use &lt;code&gt;auth_list()&lt;/code&gt; and load them via &lt;code&gt;auth_as(&#34;my_app&#34;)&lt;/code&gt;.
All the rtweet functions will use the latest token loaded with &lt;code&gt;auth_as&lt;/code&gt; (unless you manually specify one when calling it).
If you are not sure which token you are using, you can use &lt;code&gt;auth_get()&lt;/code&gt;: it will return the token in use, list the available tokens or ask you to authenticate.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;other-changes-of-note&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;&lt;strong&gt;Other changes of note&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;This is a list of other changes that aren’t too big or are not breaking changes but are worth a mention:&lt;/p&gt;
&lt;div id=&#34;iteration-and-continuation-of-requests&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Iteration and continuation of requests&lt;/h3&gt;
&lt;p&gt;Using cursors, pagination or waiting until you can make more queries is now easier.
For example you can continue previous requests via:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;users &amp;lt;- get_followers(&amp;quot;_R_Foundation&amp;quot;)
users

# use `cursor` to find the next &amp;quot;page&amp;quot; of results
more_users &amp;lt;- get_followers(&amp;quot;_R_Foundation&amp;quot;, cursor = users)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;additions&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Additions&lt;/h3&gt;
&lt;p&gt;There is now a function to find a thread of a user.
You can start from any tweet and it will find all the tweets of the thread:
&lt;code&gt;tweet_threading(&#34;1461776330584956929&#34;)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;There is a lot of interest in downloading and keeping track of interactions on Twitter.
The amount of interest is big enough that Twitter is releasing a new API to provide more information of this nature.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;future&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;&lt;strong&gt;Future&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Twitter API v2 is being released and soon it will replace API v1.
rtweet up to now, including this release, uses API v1 so it will need to adapt to the new endpoints and new data returned.&lt;/p&gt;
&lt;p&gt;First will be the streaming endpoints in November, so expect more (breaking?) changes around those dates if not earlier.&lt;/p&gt;
&lt;p&gt;I would also like to make it easier for users, dependencies and the package itself to handle the outputs.
In this regard I would like to provide some classes to handle the different types of objects it returns.&lt;/p&gt;
&lt;p&gt;This will help avoid some of the current shortcomings.
Specifically I would like to provide functions to make it easier to reply to previous tweets,
extract nested data, and subset tweets and the accompanying user information.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusions&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;&lt;strong&gt;Conclusions&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;While I made many breaking changes I hope these changes will smooth future development and help both users and maintainers.&lt;/p&gt;
&lt;p&gt;Feel free to ask on the &lt;a href=&#34;https://discuss.ropensci.org/tag/rtweet&#34;&gt;rOpenSci community&lt;/a&gt; if you have questions about the transition or find something amiss.
Please let me know! It will help me prioritize which endpoints are more relevant to the community.
(And yes, the academic archive endpoint is on the radar.)&lt;/p&gt;
&lt;p&gt;It is also possible that I overlooked something and thought the code was working when it isn’t.
For example, after several months of changing the way the API is parsed, several users found it wasn’t handling some elements.
Let me know of such or similar cases and I’ll try to fix it.&lt;/p&gt;
&lt;p&gt;In case you find a bug, check the open issues and if it has not already been reported, open an &lt;a href=&#34;https://github.com/ropensci/rtweet/issues/&#34;&gt;issue on GitHub&lt;/a&gt;.
Don’t forget to make a &lt;a href=&#34;https://cran.r-project.org/web/packages/reprex/readme/README.html&#34;&gt;reprex&lt;/a&gt; and if possible provide the id of the tweets you are having trouble with.
Unfortunately it has happened that when I came to look at a bug I couldn’t reproduce it as I wasn’t able to find the tweet which caused the error.&lt;/p&gt;
&lt;p&gt;This release includes contributions from Hadley Wickham, Bob Rudis, Alex Hayes, Simon Heß, Diego Hernán, Michael Chirico, Jonathan Sidi, Jon Harmon, Andrew Fraser and many others who reported bugs or provided feedback.
Many thanks to all for using it, for your interest in keeping it working and for improving rtweet for everyone.&lt;/p&gt;
&lt;p&gt;Finally, you can read the whole &lt;a href=&#34;https://docs.ropensci.org/rtweet/news/index.html&#34;&gt;NEWS online&lt;/a&gt; and the examples.&lt;/p&gt;
&lt;p&gt;Happy tweeting!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;acknowledgements&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Acknowledgements&lt;/h2&gt;
&lt;p&gt;This is a repost of the &lt;a href=&#34;https://ropensci.org/blog/2022/07/21/rtweet-1-0-0/&#34;&gt;entry for rOpenSci&lt;/a&gt;.
The post was edited and improved by Yanina Bellini Saibene and Steffi LaZerte, the community manager and assistant. Many thanks.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&#34;footnotes footnotes-end-of-document&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;Specifically these: &lt;code&gt;get_favorites()&lt;/code&gt;, &lt;code&gt;get_favorites_user()&lt;/code&gt;, &lt;code&gt;get_mentions()&lt;/code&gt;,
&lt;code&gt;get_my_timeline()&lt;/code&gt;, &lt;code&gt;get_retweets()&lt;/code&gt;, &lt;code&gt;get_timeline()&lt;/code&gt;, &lt;code&gt;get_timeline_user()&lt;/code&gt;,
&lt;code&gt;lists_statuses()&lt;/code&gt;, &lt;code&gt;lookup_statuses()&lt;/code&gt;, &lt;code&gt;lookup_tweets()&lt;/code&gt;, &lt;code&gt;search_30day()&lt;/code&gt;,
&lt;code&gt;search_fullarchive()&lt;/code&gt;, &lt;code&gt;search_tweets()&lt;/code&gt;, &lt;code&gt;tweet_shot()&lt;/code&gt; and &lt;code&gt;tweet_threading()&lt;/code&gt;.&lt;a href=&#34;#fnref1&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Reasons why packages are archived on CRAN</title>
      <link>https://llrs.dev/post/2021/12/07/reasons-cran-archivals/</link>
      <pubDate>Tue, 07 Dec 2021 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/post/2021/12/07/reasons-cran-archivals/</guid>
      <description>


&lt;p&gt;In the Repositories working group of the R Consortium, Rich FitzJohn posted &lt;a href=&#34;https://github.com/RConsortium/r-repositories-wg/issues/8#issuecomment-979486806&#34;&gt;a comment&lt;/a&gt; pointing to &lt;a href=&#34;https://cran.r-project.org/src/contrib/PACKAGES.in&#34;&gt;a file&lt;/a&gt; that seems to be where the CRAN team stores and checks the package history.&lt;/p&gt;
&lt;p&gt;The structure is not defined anywhere I could find (I haven’t looked much to be honest).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Package: &amp;lt;package name&amp;gt;
X-CRAN-Comment: Archived on YYYY-MM-DD as &amp;lt;reason&amp;gt;.
X-CRAN-History: Archived on YYYY-MM-DD as &amp;lt;reason&amp;gt;.
  Unarchived on YYYY-MM-DD.
  .
  &amp;lt;Optional clarification of archival reason&amp;gt;
&amp;lt;Optional fields like License_restricts_use, Replaced_by, Maintainer: ORPHANED, OS_type: unix&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I think the X-CRAN-Comment is what appears on the website of an archived package, like on &lt;a href=&#34;https://cran.r-project.org/package=radix&#34;&gt;radix package&lt;/a&gt;. However, other comments on the website do not appear on that file.&lt;/p&gt;
&lt;p&gt;In addition, the file is missing records of the archiving and unarchiving of some packages, although it does hold records from 2013 or earlier up to now. Still, we can use it to understand the &lt;em&gt;reasons&lt;/em&gt; for archiving packages, which seems to be the main purpose of the file.&lt;/p&gt;
&lt;div id=&#34;the-data&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;The data&lt;/h1&gt;
&lt;p&gt;The first step is to read the records.
The file has a &lt;code&gt;key: value&lt;/code&gt; structure similar to the DESCRIPTION file of packages, so it seems to be in DCF (Debian Control File) format, which is easy to read with the &lt;code&gt;read.dcf&lt;/code&gt; function.&lt;/p&gt;
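&lt;p&gt;A minimal sketch of that reading step, using a made-up record written in the structure shown above (the package name, dates and reasons are hypothetical, not a real CRAN entry):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical record following the structure shown above
record &amp;lt;- c(
  &amp;quot;Package: examplepkg&amp;quot;,
  &amp;quot;X-CRAN-Comment: Archived on 2021-01-01 as check problems were not corrected.&amp;quot;,
  &amp;quot;X-CRAN-History: Archived on 2020-01-01 as check problems were not corrected.&amp;quot;,
  &amp;quot;  Unarchived on 2020-02-01.&amp;quot;
)

# read.dcf() accepts a connection; all = TRUE returns a data frame
con &amp;lt;- textConnection(record)
parsed &amp;lt;- read.dcf(con, all = TRUE)
close(con)

# Each field becomes a column; indented lines continue the previous field
colnames(parsed)
parsed[[&amp;quot;X-CRAN-History&amp;quot;]]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The same call should work on a connection to the file linked above, though I have only sketched it here on a toy record.&lt;/p&gt;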
&lt;div id=&#34;exploring&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Exploring&lt;/h2&gt;
&lt;p&gt;A brief exploration of the data:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
comment
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
history
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
packages
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3612
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2345
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
434
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
70
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Many packages have either a comment or a history, but relatively few have both.
I’m not sure when each of them is used, as I would expect every package with a history to also have a comment.&lt;/p&gt;
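&lt;p&gt;A cross-tabulation like the one above could be computed along these lines. This is only a sketch: the field values are made up, and &lt;code&gt;has_field&lt;/code&gt; is a hypothetical helper, not a function from the original analysis.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Made-up field values for five packages; NA means the field is absent
comment &amp;lt;- c(
  "Archived on 2021-01-15 as check problems were not corrected.",
  NA,
  "Archived on 2020-07-01 at the request of the maintainer.",
  NA,
  "Archived on 2021-03-09 as email to the maintainer is undeliverable."
)
history &amp;lt;- c(
  NA,
  "Archived on 2019-03-02. Unarchived on 2019-06-10.",
  NA,
  NA,
  "Archived on 2018-05-20. Unarchived on 2018-09-01."
)

# Hypothetical helper: turn presence/absence into a yes/no label
has_field &amp;lt;- function(x) ifelse(is.na(x), "no", "yes")

# Count packages for each combination of comment and history
counts &amp;lt;- as.data.frame(
  table(comment = has_field(comment), history = has_field(history)),
  responseName = "packages"
)
counts[order(counts$packages, decreasing = TRUE), ]&lt;/code&gt;&lt;/pre&gt;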
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Replaced_by
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
packages
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
6360
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
101
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Some packages (101) are simply replaced by some other package.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Maintainer
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
packages
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
6366
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
95
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Most of the packages that have a Maintainer field are orphaned/archived.
Does it mean that all the others are not orphaned?&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;extracting-reasons&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Extracting reasons&lt;/h2&gt;
&lt;p&gt;Now that the data is in an R data structure, we can extract the relevant information: dates, type of action and reasons for each archival event.
I use &lt;code&gt;strcapture&lt;/code&gt; for this task, with a regex to extract the action, the date and the explanation it might have.&lt;/p&gt;
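&lt;p&gt;As an illustration, here is a sketch of that kind of extraction on three made-up event lines. The pattern is my simplification for this example, not necessarily the regex used for the results below, and the real file has many more variations.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Made-up event lines in the style of X-CRAN-Comment / X-CRAN-History
events &amp;lt;- c(
  "Archived on 2021-01-15 as check problems were not corrected in time.",
  "Unarchived on 2019-06-10.",
  "Orphaned on 2022-01-06 as installation problems were not corrected."
)

# Hypothetical pattern: action, ISO date, then whatever text follows
pattern &amp;lt;- "^([[:alpha:]]+) on ([0-9]{4}-[0-9]{2}-[0-9]{2})[ .]*(.*)$"
proto &amp;lt;- data.frame(action = character(), date = character(),
                    reason = character(), stringsAsFactors = FALSE)

# One row per input line; lines that do not match would become NA rows
parsed &amp;lt;- strcapture(pattern, events, proto, perl = TRUE)

parsed$action &amp;lt;- tolower(parsed$action)
parsed$date &amp;lt;- as.Date(parsed$date)
# Drop the leading "as " and the trailing period from the explanation
parsed$reason &amp;lt;- sub("\\.$", "", sub("^as ", "", parsed$reason))
parsed&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Keeping the date as text in the prototype and converting it afterwards with &lt;code&gt;as.Date&lt;/code&gt; avoids relying on how &lt;code&gt;strcapture&lt;/code&gt; coerces columns.&lt;/p&gt;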
&lt;p&gt;I don’t know how the file is written; it is probably a mix of automated tools and manual editing, so there isn’t a simple way to collect all the information in a structured way.
The structure and the details of what is stored have changed over the years, and some events are missing.
However, the extracted information should be enough for our purposes.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Action
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Events
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
archived
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
7096
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
orphaned
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
341
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
removed
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
113
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
renamed
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
replaced
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
4
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
unarchived
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2973
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;As expected, the most common recorded events are archivals, but there are some orphaned packages and even some removed packages.
Also note that the number of orphaned events is greater than the number of packages with the Maintainer field, supporting my theory that the format has changed and that this shouldn’t be taken as an exhaustive and complete analysis of archivals.&lt;/p&gt;
&lt;p&gt;How are they distributed over time?&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://llrs.dev/post/2021/12/07/reasons-cran-archivals/index.en_files/figure-html/plots_df-1.png&#34; width=&#34;864&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Even if there are some events recorded from 2009, it seems that this file has been used more in recent years (the last commit related to &lt;a href=&#34;https://github.com/wch/r-source/blame/trunk/src/library/tools/R/QC.R#L7778&#34;&gt;this was in 2015&lt;/a&gt;).
I know that some old events are not recorded in the file, because some packages currently on CRAN were archived at some point but have no unarchived action; the converse could also happen.
So this doesn’t necessarily mean that more packages are being archived from CRAN now, but it is a clear indication that at least there is now a more accurate record of archived packages in this file.&lt;/p&gt;
&lt;p&gt;Another source of records of archived packages might be &lt;a href=&#34;http://dirk.eddelbuettel.com/cranberries/cran/removed/&#34;&gt;cranberries&lt;/a&gt;. It would be nice to compare this file with the records on the database there.&lt;/p&gt;
&lt;p&gt;Now that most of the package events are collected and we have the reasons for the actions, we can explore and classify those reasons.
Using some simple regexes I search for keywords or sentences.&lt;/p&gt;
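&lt;p&gt;A sketch of what such a keyword classification could look like. The reasons and the patterns below are made up for illustration; the category names are borrowed from the table further down, but the actual patterns behind it may differ.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Made-up reasons, as they might come out of the extraction step
reasons &amp;lt;- c(
  "check problems were not corrected in time",
  "requires archived package 'pkgA'",
  "at the request of the maintainer",
  "email to the maintainer is undeliverable",
  "check problems were not corrected and requires archived package 'pkgB'"
)

# Hypothetical keyword patterns, one per category
patterns &amp;lt;- c(
  package_not_corrected = "not corrected",
  dependencies = "requires|depends on",
  request_maintainer = "request of the maintainer"
)

# Logical matrix with one column per category;
# a reason may match several categories or none
flags &amp;lt;- sapply(patterns, grepl, x = reasons, ignore.case = TRUE)
classified &amp;lt;- data.frame(reason = reasons, flags)

# Number of events matching each category
colSums(flags)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because each category is a separate logical column, an event can fall into more than one category, and events that match no pattern remain visible as rows with all &lt;code&gt;FALSE&lt;/code&gt;.&lt;/p&gt;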
&lt;p&gt;We can look at the most frequent reasons for archiving packages, that is, the patterns I found with more than 100 cases:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://llrs.dev/post/2021/12/07/reasons-cran-archivals/index.en_files/figure-html/reasons_top-1.png&#34; width=&#34;864&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The most frequent reason is that errors or check problems are not corrected, even when there are reminders.&lt;br /&gt;
Next are the packages archived because they depend on other packages that are no longer on CRAN.&lt;br /&gt;
Some packages are replaced by others, and some maintainers might not want to continue supporting the package when they receive a message from CRAN about fixing an error.&lt;/p&gt;
&lt;p&gt;Policy violations make it into the top 5, but with fewer than 500 events.
Dependency problems are the sixth cause, followed by email errors (bouncing, incorrect email…), and then come very sporadic problems about licenses, not fixing problems after R updates, authorship problems or requests from authors.&lt;/p&gt;
&lt;p&gt;Several of these reasons can apply to the same event, but grouping the reasons together we get a similar table:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
package_not_corrected
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
request_maintainer
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
dependencies
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
other
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
events
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
4366
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1530
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
767
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
374
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
15
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
13
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Surprisingly, the second most frequent group of archiving actions is due to many different reasons.
This is probably the &lt;a href=&#34;https://en.wikipedia.org/wiki/Pareto_principle&#34;&gt;Pareto principle&lt;/a&gt; in action, because they are around 22% of the archiving events (1530 of the 7072 in the table) but their causes are very diverse.&lt;/p&gt;
&lt;p&gt;However, if we look at the packages which were archived (not at the request of maintainers), most of them were archived just once:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Events
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
packages
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5304
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
594
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
115
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
4
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
31
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
5
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
8
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This suggests that once a package is archived, maintainers do not make the effort to put it back on CRAN, except in very few cases where there are multiple attempts.
To check, we can look at the currently available packages and see how many of the archived ones are still present on CRAN:&lt;/p&gt;
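&lt;p&gt;A sketch of this check with made-up package names. In a real session the vector &lt;code&gt;on_cran&lt;/code&gt; could come from &lt;code&gt;rownames(available.packages())&lt;/code&gt;, which needs network access, so a stand-in is used here.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Made-up names of packages with at least one archival event
archived_pkgs &amp;lt;- c("pkgA", "pkgB", "pkgC")
# Stand-in for rownames(available.packages())
on_cran &amp;lt;- c("pkgB", "pkgZ")

# Is each archived package currently available?
status &amp;lt;- ifelse(archived_pkgs %in% on_cran, "yes", "no")
counts &amp;lt;- table(CRAN = status)
proportion &amp;lt;- round(100 * prop.table(counts))

data.frame(
  CRAN = names(counts),
  Packages = as.vector(counts),
  Proportion = paste0(as.vector(proportion), "%")
)&lt;/code&gt;&lt;/pre&gt;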
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
CRAN
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
Packages
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Proportion
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
no
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3869
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
64%
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
yes
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
2183
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
36%
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;About 36% of these packages are currently on CRAN despite having been archived in the past, but close to 64% are not.&lt;/p&gt;
&lt;p&gt;Almost all of those that are on CRAN now have no &lt;code&gt;X-CRAN-Comment&lt;/code&gt;, except for a few:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
Package
&lt;/th&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
X-CRAN-Comment
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
geiger
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
&lt;p&gt;Orphaned and corrected on 2022-05-09.&lt;/p&gt;
Repeated notifications about USE_FC_LEN_T were ignored.
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
alphahull
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Versions up to 2.3 have been removed for mirepresentation of authorship.
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
udunits2
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Orphaned on 2022-01-06 as installation problems were not corrected.
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
bibtex
&lt;/td&gt;
&lt;td style=&#34;text-align:left;&#34;&gt;
Orphaned and corrected on 2020-09-19 as check problems were not corrected in time.
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The CRAN team might have missed these few packages and not moved the comments to X-CRAN-History.&lt;/p&gt;
&lt;p&gt;It also happens that some packages that are not archived don’t have an X-CRAN-History, but they usually have other fields changed.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;discussion&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Discussion&lt;/h1&gt;
&lt;p&gt;Most packages archived on CRAN are archived because the maintainers did not correct errors found in the package by CRAN checks.
It is clear that the CRAN checks help packages maintain high quality, but this has a high cost for the maintainers and especially for the CRAN team.
Maintainers don’t seem to have enough time to fix the issues on time, and the CRAN team sends personalized reminders to maintainers and sometimes patches for the packages.&lt;/p&gt;
&lt;p&gt;Although having packages corrected and free of issues is the common goal, there are a few options in light of this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Be more restrictive&lt;/p&gt;
&lt;p&gt;Prevent a package from being accepted if it breaks dependencies, or archive packages when they fail checks.
This would make it harder to keep packages on CRAN but would lift some pressure from the CRAN team.
It would go against the trend in other languages’ repositories, which often don’t check the packages/modules and have even fewer restrictions on dependencies (so it might be an unpopular decision).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Be more permissive&lt;/p&gt;
&lt;p&gt;One option would be to allow more time for maintainers to fix issues. I haven’t found any report of how long it takes from an error appearing to a fix reaching CRAN, but often it is quite long.
I have seen packages with a warning for months, if not years, that weren’t archived from CRAN.&lt;/p&gt;
&lt;p&gt;Another option would be to warn users, when they install packages, that a package or one of its dependencies is not clean on all CRAN checks (that is, it has errors or warnings).
This might help make users more conscious of their dependencies, but it might also add pressure on maintainers who already don’t have enough time to fix the problems in their packages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide more help or tools to maintainers&lt;/p&gt;
&lt;p&gt;Another option is to provide a mechanism for maintainers to receive help or to fix the package.
Currently CRAN requires maintainers of new versions of packages that break dependencies to give the other maintainers enough notice in advance to fix their packages.
On the &lt;a href=&#34;https://stat.ethz.ch/mailman/listinfo/r-package-devel&#34;&gt;R-pkg-devel mailing list&lt;/a&gt; there are often requests for help with submitting packages and fixing errors detected by CRAN checks, which often result in other maintainers sharing their solutions to the same problem.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The high percentage of packages that, once archived, do not come back to CRAN might be a good place to start helping maintainers, and an opportunity for users to step in and help the maintainers of packages they have been using.
Is there a need for something else? How would that work?&lt;/p&gt;
&lt;p&gt;At the same time, it is admirable that after so many years there are so few errors in the data.
However, the archival process might be a good one to automate: providing the reason on the webpage, adding it to X-CRAN-Comment, and moving the comment to X-CRAN-History once the package is unarchived.
Knowing more about how these actions are performed by the CRAN team and how the community could help with the process would be beneficial to all.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: This post was updated on 2022/01/02 to improve the parsing of actions and dates on packages. As a result, the first plot now includes unarchived events, and the second plot, about the reasons why packages are archived, changed slightly. Overall this only affected the numbers in the plots, not the conclusions or the discussion.&lt;/p&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;details&gt;
&lt;pre&gt;&lt;code&gt;## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.2.0 (2022-04-22)
##  os       Ubuntu 20.04.4 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Europe/Madrid
##  date     2022-05-09
##  pandoc   2.17.1.1 @ /usr/lib/rstudio/bin/quarto/bin/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package      * version date (UTC) lib source
##  assertthat     0.2.1   2019-03-21 [1] CRAN (R 4.2.0)
##  blogdown       1.9     2022-03-28 [1] CRAN (R 4.2.0)
##  bookdown       0.26    2022-04-15 [1] CRAN (R 4.2.0)
##  bslib          0.3.1   2021-10-06 [1] CRAN (R 4.2.0)
##  cli            3.3.0   2022-04-25 [1] CRAN (R 4.2.0)
##  colorspace     2.0-3   2022-02-21 [1] CRAN (R 4.2.0)
##  ComplexUpset * 1.3.3   2021-12-11 [1] CRAN (R 4.2.0)
##  crayon         1.5.1   2022-03-26 [1] CRAN (R 4.2.0)
##  DBI            1.1.2   2021-12-20 [1] CRAN (R 4.2.0)
##  digest         0.6.29  2021-12-01 [1] CRAN (R 4.2.0)
##  dplyr        * 1.0.9   2022-04-28 [1] CRAN (R 4.2.0)
##  ellipsis       0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
##  evaluate       0.15    2022-02-18 [1] CRAN (R 4.2.0)
##  fansi          1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
##  farver         2.1.0   2021-02-28 [1] CRAN (R 4.2.0)
##  fastmap        1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
##  generics       0.1.2   2022-01-31 [1] CRAN (R 4.2.0)
##  ggplot2      * 3.3.6   2022-05-03 [1] CRAN (R 4.2.0)
##  glue           1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
##  gtable         0.3.0   2019-03-25 [1] CRAN (R 4.2.0)
##  highr          0.9     2021-04-16 [1] CRAN (R 4.2.0)
##  htmltools      0.5.2   2021-08-25 [1] CRAN (R 4.2.0)
##  jquerylib      0.1.4   2021-04-26 [1] CRAN (R 4.2.0)
##  jsonlite       1.8.0   2022-02-22 [1] CRAN (R 4.2.0)
##  knitr          1.39    2022-04-26 [1] CRAN (R 4.2.0)
##  labeling       0.4.2   2020-10-20 [1] CRAN (R 4.2.0)
##  lifecycle      1.0.1   2021-09-24 [1] CRAN (R 4.2.0)
##  magrittr       2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
##  munsell        0.5.0   2018-06-12 [1] CRAN (R 4.2.0)
##  patchwork      1.1.1   2020-12-17 [1] CRAN (R 4.2.0)
##  pillar         1.7.0   2022-02-01 [1] CRAN (R 4.2.0)
##  pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
##  purrr          0.3.4   2020-04-17 [1] CRAN (R 4.2.0)
##  R6             2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
##  rlang          1.0.2   2022-03-04 [1] CRAN (R 4.2.0)
##  rmarkdown      2.14    2022-04-25 [1] CRAN (R 4.2.0)
##  rstudioapi     0.13    2020-11-12 [1] CRAN (R 4.2.0)
##  sass           0.4.1   2022-03-23 [1] CRAN (R 4.2.0)
##  scales         1.2.0   2022-04-13 [1] CRAN (R 4.2.0)
##  sessioninfo    1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
##  stringi        1.7.6   2021-11-29 [1] CRAN (R 4.2.0)
##  stringr        1.4.0   2019-02-10 [1] CRAN (R 4.2.0)
##  tibble         3.1.7   2022-05-03 [1] CRAN (R 4.2.0)
##  tidyselect     1.1.2   2022-02-21 [1] CRAN (R 4.2.0)
##  utf8           1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
##  vctrs          0.4.1   2022-04-13 [1] CRAN (R 4.2.0)
##  withr          2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
##  xfun           0.30    2022-03-02 [1] CRAN (R 4.2.0)
##  yaml           2.3.5   2022-02-21 [1] CRAN (R 4.2.0)
## 
##  [1] /home/lluis/bin/R/4.2.0/lib/R/library
## 
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Workshop: Creating packages</title>
      <link>https://llrs.dev/talk/workshop-creating-packages/</link>
      <pubDate>Wed, 20 Oct 2021 00:00:00 +0000</pubDate>
      <guid>https://llrs.dev/talk/workshop-creating-packages/</guid>
      <description>
&lt;script src=&#34;https://llrs.dev/talk/workshop-creating-packages/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;I was invited by Francis Mensha on R-devel slack to give this workshop on April 29th for the R User Group Ghana.&lt;/p&gt;
&lt;p&gt;It was my first time teaching R and doing it remotely. I expected people to follow the workshop, run the code and then ask questions, but I was asked to run the code myself. I hope it was easy enough to follow.&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
