Intermingle R and Python in R Markdown

This blog post shows how to use R and Python inside one document: how to intermingle code, plain text and graphics, and how to share data between R and Python.

Why R Markdown?

R Markdown is great for making reproducible reports or presentations and lets you intermingle code, plain text and figures. It comes with a wide variety of output formats, e.g. PDF, HTML, Word or LaTeX Beamer slides. You can even embed Shiny apps in an HTML report. For example, this document was generated with R Markdown.

In the following we will show how to combine R and Python code chunks inside one R Markdown document and how to share data between these chunks. This makes R Markdown a great tool for data scientists who want to make the most of what R and Python offer for data science.

R code chunks

Using R code in R Markdown is easy:

library(ggplot2)
head(iris)
ggplot(iris, aes(Sepal.Length, Sepal.Width, col = Species)) +
  geom_point(size = 3) + theme_bw()

Python code chunks

We can also include Python code chunks. We set knitr::knit_engines$set(python = reticulate::eng_python) in the setup chunk so that knitr runs the Python code through reticulate. Make sure to install the newest reticulate package via devtools::install_github("rstudio/reticulate").

import numpy as np
x = np.arange(10)
y = x**2

Objects created in one Python chunk are still available in the next Python chunk.

print(y)

We can also include plots.

import matplotlib.pyplot as plt
plt.plot(x, y)
plt.show()

Share data between R and Python chunks

To share objects between Python and R chunks, we write them to a file at the end of one chunk and load that file again in the next chunk. The best format for this is probably feather, which can be read and written from both R and Python.

To get the iris data set into Python using feather, first save it in an R chunk.

library(feather)
write_feather(iris, "iris.feather")

Then we can load the file in a Python chunk.

import feather
iris = feather.read_dataframe("iris.feather")
print(iris.head())

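To pass a data frame from a Python chunk to an R chunk, you can run

import pandas as pd
# feather was already imported in the previous Python chunk
d = {'col1': [1, 2, 6, 8], 'col2': ["first", "lab", "d", "car"]}
df = pd.DataFrame(data=d)
# write the data frame to disk so an R chunk can pick it up
feather.write_dataframe(df, "df.feather")
print(df.head())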

and load it in an R chunk with

df <- read_feather("df.feather")
df