What is ActivePapers?
ActivePapers is an ongoing project developed by Konrad Hinsen that aims to make computations reproducible and publishable. You can find all the details of the project here.
Design
To make computations reproducible and publishable, the idea is to create an ActivePaper: a single file that contains all your data and code. Again, see the details here. ActivePapers implementations use HDF5 as the underlying storage format, which means that an ActivePaper is an HDF5 file. One advantage is that you can inspect the datasets in an ActivePaper with generic HDF5 tools such as HDFView.
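To make the "code and data in one HDF5 file" idea concrete, here is a minimal sketch using plain h5py rather than ActivePapers itself. It stores a small script as a string dataset and a NumPy array as a numeric dataset in the same file. The file name `bundle.h5`, the `SCRIPT` contents and the dataset paths are illustrative assumptions that only loosely mimic the `code/` and `data/` layout of an ActivePaper; the real ActivePapers format stores more metadata than this.

```python
import h5py
import numpy as np

# Hypothetical source code to bundle alongside its data.
SCRIPT = "import numpy as np\nresult = np.arange(5) * 2\n"

def write_bundle(path):
    """Store one script (as a string) and one array in a single HDF5 file."""
    with h5py.File(path, "w") as f:
        # Variable-length UTF-8 string dataset holding the source code;
        # the intermediate group "code" is created automatically.
        f.create_dataset("code/example", data=SCRIPT, dtype=h5py.string_dtype())
        # Numeric dataset holding the input data.
        f.create_dataset("data/inputs/values", data=np.arange(5))

def read_bundle(path):
    """Return the stored source code and array."""
    with h5py.File(path, "r") as f:
        source = f["code/example"][()]
        if isinstance(source, bytes):  # h5py 3 returns variable-length strings as bytes
            source = source.decode("utf-8")
        values = f["data/inputs/values"][:]
    return source, values

write_bundle("bundle.h5")
source, values = read_bundle("bundle.h5")
print(values)  # [0 1 2 3 4]
```

Any generic HDF5 viewer opened on `bundle.h5` would show both the script and the array, which is the property ActivePapers relies on.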
How do we create an ActivePaper file?
To create our ActivePaper file, we will use the ActivePapers Python edition. You can find an installation guide here; installation should be straightforward.
A more comprehensive tutorial already exists. Here I only propose to create a very simple ActivePaper and show how to extract its data.
With this ActivePaper, I want to create two arrays, add them, and plot the result. So the first step is to write the Python code that performs these operations.
Creating the arrays
I write this code in the file 'create_data.py':
from activepapers.contents import data
import numpy as np
# Create a group for the input data
inputs = data.create_group('inputs')
# Create a NumPy array: 0, 1, ..., 99
arr = np.arange(100)
# Store the array twice in the group, as two datasets
inputs['dataset_1'] = arr
inputs['dataset_2'] = arr
Adding the arrays
I write this code in the file 'adding_data.py':
from activepapers.contents import data
# Create a group for the output data
output = data.create_group('output')
input_data = data['inputs']
# Read the 2 input arrays; np.int (an alias of the built-in int) was removed in NumPy 1.24
arr_1 = input_data['dataset_1'][:].astype(int)
arr_2 = input_data['dataset_2'][:].astype(int)
total = arr_1 + arr_2  # named "total" to avoid shadowing the built-in sum()
# Write the output
output['sum'] = total
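Since both inputs are the same array, `output/sum` should simply hold the even numbers 0, 2, ..., 198. The same computation in plain NumPy, outside ActivePapers, gives a reference to compare the stored result against later. This is only a sketch; the variable names mirror the datasets above but nothing here touches an ActivePaper.

```python
import numpy as np

# Same inputs as create_data.py: two copies of 0, 1, ..., 99.
dataset_1 = np.arange(100)
dataset_2 = np.arange(100)

# Same operation as adding_data.py, casting to the built-in int type.
total = dataset_1.astype(int) + dataset_2.astype(int)

print(total[:5], total[-1])  # [0 2 4 6 8] 198
assert np.array_equal(total, 2 * np.arange(100))
```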
Plot
I write this code in the file 'plot.py':
import matplotlib
matplotlib.use('PDF')  # non-interactive PDF backend; without it, the PDF I produced was corrupted
import matplotlib.pyplot as plt
import numpy as np
from activepapers.contents import data, open_documentation

def plot(x, y, fontsize=19, output='plot.pdf'):
    fig = plt.figure(figsize=(7, 7))
    ax = fig.add_subplot(1, 1, 1)
    # data
    ax.plot(x, y, '-', linewidth=2.0, label='normal')
    # axis labels
    ax.set_xlabel('x', fontsize=fontsize)
    ax.set_ylabel('y', fontsize=fontsize)
    # font size of the tick labels
    ax.tick_params(labelsize=fontsize)
    # Add minor ticks every 10 units and draw a grid on them
    x_max = x.max()
    ax.set_xticks(np.arange(0.0, x_max + 1, 10.0), minor=True)
    y_max = y.max()
    ax.set_yticks(np.arange(0.0, y_max + 1, 10.0), minor=True)
    ax.grid(which='minor', alpha=0.9)
    return fig

# Plot and save into the documentation section of the ActivePaper
x = data['inputs/dataset_1'][:]
y = data['output/sum'][:]
fig = plot(x, y)
# A file object has no extension, so name the output format explicitly
fig.savefig(open_documentation('plot.pdf', 'w'), format='pdf')
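With the data above, `x` runs from 0 to 99 and `y` from 0 to 198, so the `np.arange(0.0, max + 1, 10.0)` calls in `plot()` put minor grid lines every 10 units, up to the largest multiple of 10 that does not exceed the (integer) data maximum. The sketch below isolates that computation; `minor_ticks` is a hypothetical helper name, not part of the original script.

```python
import numpy as np

def minor_ticks(values, step=10.0):
    """Minor tick positions as computed in plot(): 0, step, ... up to values.max()."""
    return np.arange(0.0, values.max() + 1, step)

x = np.arange(100)  # same values as data['inputs/dataset_1']
y = 2 * x           # same values as data['output/sum']

print(minor_ticks(x))  # 10 ticks: 0, 10, ..., 90
print(minor_ticks(y))  # 20 ticks: 0, 10, ..., 190
```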
Documentation
You can also add documentation, for example a README.txt like this one:
1) DATA
=======
Inputs: creating 2 arrays
Output: adding these 2 arrays
2) CODE
=======
create_data.py: create the inputs
adding_data.py: compute the output
plot.py: plot the data
Creating the ActivePaper
Now we can generate the ActivePaper:
aptool -p test.ap create -d matplotlib
This creates an ActivePaper named 'test.ap' and declares its external dependencies (here matplotlib), i.e. Python modules that are required but not available as ActivePapers.
Then we check the README.txt (from the documentation/ directory) and the Python scripts (from the code/ directory) into the ActivePaper:
aptool checkin -t text documentation/README.txt
aptool checkin -t calclet code/*.py
Then comes the "magic": you can run the code stored inside the ActivePaper, and the results are written inside the ActivePaper itself:
aptool run create_data # creating the data
aptool run adding_data # adding the data
aptool run plot # generating the plot
We now have a single file, test.ap, containing the code, the inputs and the outputs. You can inspect its contents with aptool:
aptool ls
As expected, it produces:
code/adding_data
code/create_data
code/plot
data/inputs/dataset_1
data/inputs/dataset_2
data/output/sum
documentation/README
documentation/plot.pdf
Since the ActivePaper file is in fact an HDF5 file, you can read its datasets with many generic HDF5 tools, in particular HDFView. We can also do it from Python with the h5py library. For example, this script prints the output dataset:
import h5py

with h5py.File('test.ap', 'r') as f:
    dset_output = f['data/output/sum']
    print(dset_output)     # summary of the dataset: name, shape, type
    print(dset_output[:])  # the stored values
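To get a listing similar to `aptool ls` with h5py alone, you can walk the file with `visititems`. This is a sketch under the assumption that h5py is installed; `list_datasets` is a hypothetical helper. A raw HDF5 walk may also show internal groups and datasets that ActivePapers uses for its own bookkeeping and that `aptool ls` does not display.

```python
import h5py

def list_datasets(path):
    """Return the full paths of all datasets in an HDF5 file, sorted."""
    names = []

    def collect(name, obj):
        # visititems calls this for every group and dataset below the root;
        # keep only datasets. Returning None lets the walk continue.
        if isinstance(obj, h5py.Dataset):
            names.append(name)

    with h5py.File(path, "r") as f:
        f.visititems(collect)
    return sorted(names)
```

For example, `for name in list_datasets('test.ap'): print(name)` should print `data/output/sum` among its entries.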
You can also easily extract the code and the documentation from the ActivePaper with:
aptool checkout documentation
aptool checkout code