An introduction to using Rcpp modules in an R package

 

 

Introduction

The aim of this post is to provide a minimal example of using Rcpp modules within an R package. The code and all files for this example can be found at https://github.com/jmcurran/minModuleEx.

What are Rcpp Modules?

Rcpp modules allow R programmers to expose their C++ classes to R. By “expose” I mean the ability to instantiate a C++ object from within R, and to call methods which have been defined in the C++ class definition. I am sure there are many reasons why this is useful, but the main reason for me is that it provides a simple mechanism to create multiple instances of the same class. An example of where I have used this is my multicool package, which generates permutations of multisets. One can certainly imagine a situation where you might need to generate the permutations of two or more multisets at the same time. multicool allows you to do this by instantiating multiple multicool objects.

The Files

I will assume that you, the reader, know how to create a package which uses Rcpp. If you do not, then I suggest you look at the section entitled “Creating a New Package” on the RStudio support site. Important: Although it is mentioned in the text, the image displayed on that page does not show that you should change the Type: drop-down box to Package w/ Rcpp.

[Screenshot: Creating a package with Rcpp]

Choosing Package w/ Rcpp makes sure that the fields needed to link to and import Rcpp are set for you in the DESCRIPTION file.

There are five files in this minimal example. They are:

  • DESCRIPTION
  • NAMESPACE
  • R/minModuleEx-package.R
  • src/MyClass.cpp
  • R/zzz.R

I will discuss each of these in turn.

DESCRIPTION

This is the standard DESCRIPTION file that all R packages have. The lines that are important are:

Depends: Rcpp (>= 0.12.8)
Imports: Rcpp (>= 0.12.8)
LinkingTo: Rcpp
RcppModules: MyModule

The Imports and LinkingTo lines should be generated by RStudio. The RcppModules: line should contain the name(s) of the module(s) that you want to use in this package. I have only one module in this package, which is unimaginatively named MyModule. The module exposes two classes, MyClass and AnotherClass.

NAMESPACE and R/minModuleEx-package.R

The first of these is the standard NAMESPACE file and it is automatically generated using roxygen2. To make sure this happens you need to select Project Options… from the Tools menu. It will bring up the following dialogue box:

[Screenshot: Project Options]

Select the Build Tools tab, and make sure that the Generate documentation with Roxygen checkbox is ticked. Then click on the Configure… button and make sure that all the checkboxes that are ticked in the image below are also ticked in your dialogue box:

[Screenshot: Configuring Roxygen]

Note: If you don’t want to use Roxygen, then you do not need the R/minModuleEx-package.R file, and you simply need to put the following three lines in the NAMESPACE file:

export(AnotherClass)
export(MyClass)
useDynLib(minModuleEx)

You need to notice two things. Firstly, this NAMESPACE explicitly exports the two classes MyClass and AnotherClass. This means these classes are available to the user from the command prompt. If you only want the classes to be available to R functions inside the package, then you do not need to export them. Secondly, as previously noted, if you are using Roxygen, then these export statements are generated automatically from the comments just before each class declaration in the C++ code, which is discussed in the next section. The useDynLib(minModuleEx) statement is generated from the line

#' @useDynLib minModuleEx

in the R/minModuleEx-package.R file.

src/MyClass.cpp

This file contains the C++ definitions of both classes (MyClass and AnotherClass). There is nothing particularly special about these class declarations, although the comment lines before the class declarations,

//' @export MyClass
class MyClass{

and

//' @export AnotherClass
class AnotherClass{

generate the export statements in the NAMESPACE file.

This file also contains the Rcpp Module definition:

RCPP_MODULE(MyModule) {
  using namespace Rcpp;

  class_<MyClass>( "MyClass")
    .default_constructor("Default constructor") // This exposes the default constructor
    .constructor<NumericVector>("Constructor with an argument") // This exposes the other constructor
    .method("print", &MyClass::print) // This exposes the print method
    .property("Bender", &MyClass::getBender, &MyClass::setBender) // and this shows how we set up a property
  ;

  class_<AnotherClass>("AnotherClass")
    .default_constructor("Default constructor")
    .constructor<int>("Constructor with an argument")
    .method("print", &AnotherClass::print)
  ;
}

In this module I have:

  1. Two classes MyClass and AnotherClass.
  2. Each class has:
    • A default constructor
    • A constructor which takes arguments from R
    • A print method
  3. In addition, MyClass demonstrates the use of a property field which (simplistically) provides the user with simple retrieval from and assignment to a scalar class member variable. It is unclear to me whether it works for other data types, but anecdotally, I had no luck with matrices.
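
The class definitions themselves are not reproduced in this post. As a rough illustration only, classes with the interface exposed above might look something like the sketch below. It is written in plain C++ so that it compiles without Rcpp: std::vector<double> stands in for NumericVector and std::printf for Rcpp's printing functions, and the member names x and n are assumptions made for this sketch. The defaults (1, 2, 3 for MyClass and 0 for AnotherClass) are chosen to match the output shown later in the post; the code in the repository may well differ.

```cpp
// Hypothetical sketch of classes matching the interface exposed by MyModule.
// Assumption: std::vector<double> replaces Rcpp::NumericVector and
// std::printf replaces Rcpp's printing, so no Rcpp headers are needed.
#include <cstdio>
#include <utility>
#include <vector>

class MyClass {
  std::vector<double> x;  // values shown by print(); the name x is assumed
  bool bBender;           // flag behind the "Bender" property

public:
  // Default constructor: holds 1, 2, 3 and leaves Bender switched off
  MyClass() : x{1.0, 2.0, 3.0}, bBender(false) {}

  // Constructor taking values supplied by the caller
  explicit MyClass(std::vector<double> values)
      : x(std::move(values)), bBender(false) {}

  // Getter/setter pair of the kind that .property() ties to one name
  bool getBender() const { return bBender; }
  void setBender(bool value) { bBender = value; }

  // Prints the stored values, or Bender's reply when the flag is set
  void print() const {
    if (bBender) {
      std::printf("Bite my shiny metal ass!\n");
      return;
    }
    for (double v : x) std::printf("%f ", v);
    std::printf("\n");
  }
};

class AnotherClass {
  int n;  // single stored value; the name n is assumed

public:
  AnotherClass() : n(0) {}                     // default constructor
  explicit AnotherClass(int value) : n(value) {}  // constructor with an argument

  void print() const { std::printf("%d\n", n); }
};
```

The point of the getter/setter pair is that the member variable itself stays private; only the two accessor functions need to be visible to whatever exposes the class.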

R/zzz.R

As you might guess from the nonsensical name, it is not essential to call this file zzz.R. The name comes from a suggestion from Dirk Eddelbuettel. It contains a single, but absolutely essential, line of code:

loadModule("MyModule", TRUE)

This code can actually be in any of the R files in your package. However, if you explicitly put it in R/zzz.R then it is easy to remember where it is.

Using the Module from R

Once the package is built and loaded, using the classes from the module is very straightforward. To instantiate a class you use the new function. E.g.

m = new(MyClass)
a = new(AnotherClass)

This code will call the default constructor for each class. If you want to call a constructor which has arguments, then they can be added to the call to new. E.g.

set.seed(123)
m1 = new(MyClass, rnorm(10))

Each of these objects has a print method which can be called using the $ operator. E.g.

m$print()
a$print()
m1$print()

The output is

> m$print()
1.000000 2.000000 3.000000
> a$print()
0
> m1$print()
1.224082 0.359814 0.400771 0.110683 -0.555841 1.786913 0.497850 -1.966617 0.701356 -0.472791

The MyClass class has a module property, a concept also used in C#. A property is a scalar class member variable that can be both retrieved and set. For example, m1 was constructed with the default value of bBender = FALSE, but we can easily change it to TRUE:

m1$Bender = TRUE
m1$print()

Now our object m1 behaves more like Bender when asked to do something 🙂

> m1$print()
Bite my shiny metal ass!

Hopefully this will help you to use Rcpp modules in your project. This is a great feature of Rcpp and really makes it even more powerful.
