R Markdown and undergraduates

I am seriously considering introducing R Markdown for assignments in our second-year statistics course. The folks at RStudio have made some great improvements in the latest version of R Markdown (R Markdown V2). You can now add a Markdown document template to your R package, which lets you provide a document skeleton for the user with as much information as you like, link CSS files (if you are producing HTML), and specify the output document format. The last of these is an especially important addition to RStudio.

The latest version of RStudio incorporates Pandoc, which is a great format translation utility (and probably more) written by John MacFarlane. It is an important addition to RStudio because it makes it easy to author documents in Microsoft Word, as well as HTML, LaTeX, and PDF. I am sure that emphasizing the importance of having the option to export to Word will cause some eye-rolling and groans, but I would remind you that we are teaching approximately 800 undergrads a year in this class, most of whom will never take another statistics class and will join a workforce where Microsoft Word is the dominant platform. I like LaTeX too (I do not think I will ever write another book in Word), but it is not about what I like. I should also mention that there are some pretty neat features in the new R Markdown, like authoring HTML slides in ioslides format, or PDF/Beamer presentations, and creating HTML documents with embedded Shiny apps (interactive statistics apps).
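
As a rough sketch of what this looks like from the R console, the same source file can be rendered to Word or HTML by changing only the output format. The file name and document contents below are made up for illustration, and the rendering step is guarded in case the rmarkdown package or Pandoc is not installed.

```r
# Write a tiny, hypothetical R Markdown document to a temporary folder.
rmd <- file.path(tempdir(), "assignment1.Rmd")
writeLines(c(
  "---",
  "title: \"Assignment 1\"",
  "output: word_document",
  "---",
  "",
  "The mean speed in the cars data is `r round(mean(cars$speed), 1)`."
), rmd)

# Render only if the rmarkdown package and Pandoc are both available.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # Same source, two output formats; render() returns the output file path.
  docx <- rmarkdown::render(rmd, output_format = "word_document", quiet = TRUE)
  html <- rmarkdown::render(rmd, output_format = "html_document", quiet = TRUE)
  print(basename(c(docx, html)))
}
```

In RStudio the Knit button does the same job, so students would not normally need to type the render() call themselves.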

I think on the whole the students should deal with this pretty well, especially since they can tidy up their documents to their own satisfaction in Word — not saying that RStudio produces messy documents, but rather that the facility to edit post rendering is available.

Help?

However, there is one stumbling block that I hope my readers might provide some feedback on: the issue of loading data. My class is a data analysis class. Every assignment comes with its own data sets. The students are happy, after a while, using read.csv() or read.table() in conjunction with file.choose(). However, from my own point of view, reproducible research documents with commands that require user input quickly become tedious, because you tend to compile/render multiple times whilst getting your code and your document right. So we are going to have to teach something different.

As background, our institution has large computing labs that any registered student can use. The machines boot in either Linux or Windows 7 (currently, and I do not think that is likely to change soon given how much people loathe Windows 8 and what a headache it is for IT support). There is moderate market penetration of Apple laptops in the student body (I would say around 10%).

So here is my problem: we have to teach the concept of file paths to a large body of students who, on the whole, do not have this concept in their skill set and who will find it foreign/archaic/voodoo. They will also regard this as another burdensome thing to learn on top of a whole lot of other things they do not want to learn, like R and R Markdown. To make things worse, we have to deal with file paths over multiple platforms.
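
For what it is worth, R itself smooths over some of the platform differences. Here is a minimal sketch, using a made-up folder and file name inside a temporary directory, of building a path without hard-coding the separator:

```r
# Hypothetical course folder (a stand-in for somewhere in a student's home folder).
course_dir <- file.path(tempdir(), "stats-course", "assignment1")
dir.create(course_dir, recursive = TRUE, showWarnings = FALSE)

# file.path() joins the pieces with "/", which R accepts on Windows, Linux and Mac.
data_file <- file.path(course_dir, "heights.csv")
write.csv(data.frame(height = c(170, 182, 165)), data_file, row.names = FALSE)

# normalizePath() shows the fully qualified path in the platform's own notation.
print(normalizePath(data_file))

# Read the file back using the path we built.
heights <- read.csv(data_file)
print(mean(heights$height))
```

The point of the sketch is only that a path is a string made of folder names, and that forward slashes work on all three platforms from within R.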

My thoughts so far are:

  • Making tutorial videos
  • Providing the data for each assignment in an R package that is loaded at the start of the document
  • Providing code in the template document that reads the data from the web

I do not really like the last two options as they let the students avoid learning how to read data into R. Obviously this is not a problem for those who do not go on, but it shifts the burden for those who do. So your thoughts please.

Update

One option that has sort of occurred to me before is that in the video I could show how the fully qualified path name to a file can be obtained using file.choose(), and then the students could simply copy and paste that into their R code.
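
A sketch of how that might look, with the interactive step done once in the console and only the pasted path kept in the document. The file here is a stand-in created in a temporary folder so that the example runs anywhere; the names probe_path and probe.csv are made up.

```r
# Step 1 (console only, never in the document): choose the file once.
# file.choose()   # prints the full path as a quoted string

# Step 2 (in the document): paste the printed path as a string.
# A student would paste whatever file.choose() printed; this stand-in
# path is built here only so the sketch is self-contained.
probe_path <- file.path(tempdir(), "probe.csv")
write.csv(data.frame(x = 1:3, y = c(2.5, 3.1, 4.8)), probe_path, row.names = FALSE)

# The document now renders without any user input.
probe <- read.csv(probe_path, header = TRUE)
str(probe)
```

One detail that may help on Windows: the console prints the path with doubled backslashes, so copying the printed string, quotes and all, gives something that can be pasted straight into read.csv().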


6 thoughts on “R Markdown and undergraduates”

  1. Have you tried to teach them the meaning of a working directory? After they understand this concept, use relative paths from now on. Or to make it even easier, create projects in RStudio. When you open an RStudio project, the working directory is automatically set to the root directory of the project. Then live in this small universe and make the world self-contained by using relative paths only.

    file.choose() is just too bad for reproducible research, since its returned value is unpredictable. Any functions that require human interactions should probably be avoided in R Markdown documents.
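
The working-directory idea in this comment can be sketched in a few lines. The folder and file names are invented for illustration, and a temporary directory stands in for a real project folder.

```r
# A stand-in "project" folder holding both the data and the document.
project_dir <- file.path(tempdir(), "assignment2")
dir.create(project_dir, showWarnings = FALSE)
write.csv(data.frame(group = c("a", "b"), score = c(10, 12)),
          file.path(project_dir, "scores.csv"), row.names = FALSE)

# Make that folder the working directory; setwd() returns the previous one.
old_wd <- setwd(project_dir)

# A relative path is resolved against the working directory,
# so no drive letter or home folder appears in the code.
scores <- read.csv("scores.csv")
print(getwd())
print(scores)

# Restore the previous working directory.
setwd(old_wd)
```

As far as I am aware, knitr evaluates code chunks with the working directory set to the folder containing the document by default, so a data file saved next to the .Rmd file can usually be read by its bare file name when rendering.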

    1. The concept of a working directory is what I would have to teach them, Yihui. I think a project is probably too heavyweight for an undergrad class that mostly will not go on to do further statistics. I am thinking about a three-stage process where I get them to save the data file to a particular location, save their Markdown document to the same location, and then set the working directory to that location. I do expect a moderate amount of confusion, however.

      Whilst I have your attention however — the documentation on markdown.rstudio.com regarding the way metadata is extracted and used is quite sparse. For example I would like some control over how the date field is used. Any pointers?

  2. I had great success using RMarkdown last year in an ANOVA class. I solved the data entry problem by using read.csv("http://website/data.csv"). That is, I put all the data on a website and let R download the file each time.

    It requires an internet connection, but that does not seem to be a problem these days. Students have personal web pages, so they can put their own data there for you to grab. Worked great for me.

    1. Thanks, Michael. I have thought about that – I used that mechanism in a book I wrote. However, I do want the students to be able to read a file from disk when they leave my course 🙂

  3. I find myself writing code like this at the start of a data analysis script:
    wbf <- # file.choose()
    "C:\\Files\\Consulting\\David\\Rift Valley\\Probe.csv"
    #"C:\\Users\\maj\\Documents\\Consulting\\David\\Rift Valley\\Probe.csv"
    wbdf <- read.csv(wbf, header=TRUE)

    I usually begin with just the file.choose() and then replace that with the pathname.

    Here by moving the #'s around I can find the data on two of my machines, or just look for it all over again. The old pathnames jog my memory about what I was doing and where. But your "update" idea is probably simpler for teaching.

    1. There are a number of things that we sweep under the carpet, Murray, as “too hard.” File paths are not rocket science, and I think that is the way that I am going to approach this. Namely, “here is something I expect you to learn. Here is how it works, and here are some examples.”
