Accessing REDCap from R

Wed, 08 Feb 2023 00:00:00 +0000

In this post, I want to summarize some of the packages to connect to REDCap. For those who don’t know, REDCap is a database designed for clinical usage, which allows easy data collection of patients’ responses by clinicians and interactions with the patients via surveys.

It has specific features such as scheduling surveys sent to patients, compatibility with tablets and mobile phones for data entry while visiting patients, grouping data in instruments (for repeating the same questions multiple times), multiple choice and check buttons, and different arms (like paths for patients). Most importantly, it is relatively easy for clinical administrators to manage.

On CRAN there are ~11 packages mentioning it at the time of writing. The purpose of this post is to help decide which packages can be helpful in which situations. This post won’t be a deep analysis or comparison of capabilities; it describes some of the best and worst features of each package.
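One way to get a count like this is to search the CRAN package metadata for the word. The sketch below is only an illustration: `mentions_term()` is a helper name made up for this post, the toy data frame contains invented packages, and the commented-out call to `tools::CRAN_package_db()` assumes network access to CRAN. The number you get today will likely differ from the one above.

```r
# Hypothetical helper: return the names of packages whose Title or
# Description mention a term (case-insensitive regular expression match).
mentions_term <- function(db, term = "REDCap") {
  text <- paste(db$Title, db$Description)
  hits <- grepl(term, text, ignore.case = TRUE)
  sort(db$Package[hits])
}

# Toy metadata with made-up packages, only to show the shape of the input.
toy_db <- data.frame(
  Package = c("pkgB", "pkgA", "pkgC"),
  Title = c("Read REDCap Data", "Plotting Helpers", "Tidy Tables"),
  Description = c("Connects to an API.", "Draws plots.", "Works with redcap exports."),
  stringsAsFactors = FALSE
)
mentions_term(toy_db)

# With network access, the same helper can be applied to the real CRAN metadata:
# db <- tools::CRAN_package_db()
# mentions_term(db)
```

Matching on the title and description is a loose criterion: it also picks up packages that only mention REDCap in passing, which is why a manual look at each hit is still needed.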

REDCapR

REDCapR is the official package to connect to the database. It allows you to read and write records and to filter what is requested. It has some security-related functions.
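As a rough illustration of what reading with REDCapR looks like, here is a minimal sketch. It assumes the API URL and token are stored in environment variables named `REDCAP_URI` and `REDCAP_TOKEN` (names chosen for this example, not something the package requires), and `read_project()` is a hypothetical wrapper. `REDCapR::redcap_read()` returns a list whose `data` element holds the records.

```r
# Hypothetical wrapper around REDCapR::redcap_read().
# The environment variable names are an assumption of this example.
read_project <- function(uri = Sys.getenv("REDCAP_URI"),
                         token = Sys.getenv("REDCAP_TOKEN")) {
  # Fail early instead of sending a request without credentials.
  if (!nzchar(uri) || !nzchar(token)) {
    stop("Set the REDCap API URL and token before reading.")
  }
  result <- REDCapR::redcap_read(redcap_uri = uri, token = token)
  result$data  # data frame with the exported records
}

# Needs a reachable REDCap instance and a valid token:
# records <- read_project()
```

Keeping the token out of the script, for instance in an environment variable as assumed here, avoids committing it by accident. `redcap_read()` also accepts arguments such as `fields` and `records` to restrict what is requested.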

REDCapTidieR

REDCapTidieR is a package that provides summaries of tables and helps with nested tibbles of data by arm. It depends on REDCapR.

tidyREDCap

tidyREDCap is a package that simplifies the tables for instruments and choose-all or choose-one question types. It is easy to make tables and it depends on REDCapR. To build the table of an instrument it requires the first and last columns of that instrument.

Screenshot of a design with several instruments in a single arm (from https://www.project-redcap.org/)

REDCapExporter

REDCapExporter is a package to build a data package from a database for redistribution. It does not depend on REDCapR.

redcapAPI

redcapAPI is a package for making data accessible and analysis-ready as quickly as possible. It has extensive documentation in a wiki but no vignette or examples, and it does not depend on REDCapR.

REDCapDM

REDCapDM is a package that provides functions to read and manage REDCap data and identify missing or extreme values as well as transform the data provided by the API. It depends on REDCapR.

ReviewR

ReviewR is a package that creates a shiny website with data from the database to explore it. It uses REDCapR to connect to your instance.

rccola

rccola is a package to provide a secure connection to the database but it doesn’t provide any handling of the data. It uses redcapAPI to connect to the database.

Other packages

Other packages mention REDCap:

nmadb: implements its own connection procedure for a specific REDCap database of network meta-analyses.
distcomp: allows computation on distributed data, including data stored in REDCap.
cgmanalysis: mentions that the data it produces is compatible with REDCap.

Conclusion

I’m sure that many packages briefly described here can do much more than what I understood from a glance at their documentation and DESCRIPTION.

Most packages provide some data for the examples (and probably tests), while others do not. This is a technical problem that might impact users if there are no examples in the functions.

REDCapR is used by most packages to access the database, but most of the packages focus on transforming the data provided by the API or the exported data. This highlights that the exported data is useful but, depending on the preferences of the users, needs to be transformed for easy usage.

Concepts around open source/free software

Wed, 16 Mar 2022 00:00:00 +0000

This post is to lay out some concepts I picked up after reading “The Making and Maintenance of Open Source Software”. Having these concepts in mind might help me in my contributions to R and OSS in general. I write down these thoughts to come back to them in future posts.

The book classifies projects along two axes, contributor growth and user growth:

	High user growth	Low user growth
High contributor growth	Federations	Clubs
Low contributor growth	Stadiums	Toys
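
The table can be read as a simple lookup from two yes/no questions. The function below is only an illustration of that reading: `classify_project()` and its arguments are names invented here, and no numeric threshold for what counts as high growth is given in these notes, so the inputs are plain logical values.

```r
# Hypothetical lookup reproducing the 2x2 table above.
# Both arguments are logical: TRUE means "high growth".
classify_project <- function(high_contributor_growth, high_user_growth) {
  if (high_contributor_growth && high_user_growth) {
    "Federation"
  } else if (high_contributor_growth) {
    "Club"
  } else if (high_user_growth) {
    "Stadium"
  } else {
    "Toy"
  }
}

# Many users but few contributors:
classify_project(high_contributor_growth = FALSE, high_user_growth = TRUE)
```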

It also classifies projects according to the following characteristics:

Technical
Support
Ease of participation
User adoption
Contributor growth

Code seems like a common good, which requires the following characteristics:

Intrinsic motivation
Modular
Granular
Low cost of coordination

In the author’s opinion, only maintainers are interested in the success of the whole community, and they need to make trade-offs between the different interests of the community around the project.

Motivation is very important, and I classify it by its source and its sign:

	Positive	Negative
Intrinsic	Learn skills	Burn out
Extrinsic	Social benefits	Friction, or lack of feedback

Following the book, contributors can be grouped into two types:

Invested: Lurk before making a contribution, learn the quirks of the community
Casual: Adding value to themselves and others

Contributors might spend a lot of time learning about the community before making their first contribution (or showing themselves). That’s why knowing that this is someone’s first contribution doesn’t tell you whether they will continue contributing to the project.

Users can be classified into two groups: passive, who use the software and nothing else, and active. Active users might do one of the following:

Educate others: write a blog post, or material
Spread the word: Announce they use the software
Support: Answer other users’ questions
File bug reports

The health of the project depends on its popularity, its dependencies, and the active and future maintenance of the software.

However, the book says that not all contributors are equal. For instance, losing a maintainer causes more harm than losing a casual contributor.

The source of this is that software is like a puppy. The value of the code lies in how alive it is: static code has no value, but once it is being used it is very valuable.

For this reason the maintenance cost once there are users is very high. However, in general there are few ways to know how many users a piece of software has.

This produces marginal costs for maintainers, which are driven by what kind of good the software is:

	Excludable	Non-Excludable
Rivalrous	Private goods	Common goods
Non-Rivalrous	Club goods	Public goods

Costs are mainly the attention that users and contributors demand from the maintainers. Users are like cars on a highway: initially there is no problem, but at high levels of traffic adding new lanes doesn’t solve the traffic jam.

However, costs also increase with more requests, the bandwidth needed to download the software, and hosting.

More users lead to more requests, which compete for the maintainers’ attention and push them to do less proactive and more reactive work.

As a result, projects start with a very simple organization, evolve into disorganized complexity, and then into organized complexity just to cope with the costs of the project. In this organized complexity, relationships with maintainers become important.

Value = usage + dependencies - maintenance + substitutability + switching cost + enabling, while Cost = development + maintenance + attention.

The common (and scarce) good is the attention of both maintainers and developers, which requires a judgement call on which kind of requests they dedicate their time to: extractive or non-extractive.

Once their reputation or recognition is high enough, the benefit for maintainers is almost nonexistent.

The book cites several communities (Python, Ruby, Linux, JavaScript, Java), but I don’t think it used the R community as a source. So what are the implications of these concepts for the R community? How do we help maintainers keep up with their work or let in new maintainers?

Reproducibility

## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.1.2 (2021-11-01)
##  os       Ubuntu 20.04.4 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Europe/Madrid
##  date     2022-03-16
##  pandoc   2.17.1.1 @ /usr/lib/rstudio/bin/quarto/bin/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  blogdown      1.8.1   2022-02-19 [1] Github (rstudio/blogdown@9af7733)
##  bookdown      0.24    2021-09-02 [1] CRAN (R 4.1.2)
##  bslib         0.3.1   2021-10-06 [1] CRAN (R 4.1.2)
##  cli           3.2.0   2022-02-14 [1] CRAN (R 4.1.2)
##  digest        0.6.29  2021-12-01 [1] CRAN (R 4.1.2)
##  evaluate      0.15    2022-02-18 [1] CRAN (R 4.1.2)
##  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.2)
##  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.2)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.1.2)
##  jsonlite      1.8.0   2022-02-22 [1] CRAN (R 4.1.2)
##  knitr         1.37    2021-12-16 [1] CRAN (R 4.1.2)
##  magrittr      2.0.2   2022-01-26 [1] CRAN (R 4.1.2)
##  R6            2.5.1   2021-08-19 [1] CRAN (R 4.1.2)
##  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.1.2)
##  rmarkdown     2.13    2022-03-10 [1] CRAN (R 4.1.2)
##  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.2)
##  sass          0.4.0   2021-05-12 [1] CRAN (R 4.1.2)
##  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.1.2)
##  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.1.2)
##  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.1.2)
##  xfun          0.30    2022-03-02 [1] CRAN (R 4.1.2)
##  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.1.2)
## 
##  [1] /home/lluis/bin/R/4.1.2/lib/R/library
## 
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
