Accessing simulation data

A single run generates a SamplePath object which is a recursive array provided by RecursiveArrayTools.jl.

sample_path = simulate(state, model, SortingDirect(), tfinal = 100.0, save_points = 0:25:100.0)

sample_path
t: 5-element Array{Float64,1}:
   0.0
  25.0
  50.0
  75.0
 100.0
x: 5-element Array{Array{Int64,1},1}:
 [10, 0, 0, 0, 0]   
 [1, 9, 1, 16, 58]  
 [0, 10, 0, 17, 141]
 [0, 10, 0, 18, 143]
 [0, 10, 0, 17, 142]

Access the state at the first time point:

sample_path[1]
5-element Array{Int64,1}:
 10
  0
  0
  0
  0

Access the third component at the second time point:

sample_path[3,2]
1

Access entire history for the third and fourth components:

sample_path[[3,4], :]
2×5 Array{Int64,2}:
 0   1   0   0   0
 0  16  17  18  17

Running multiple simulations generates an Ensemble, a collection of SamplePath objects; that is, Vector{SamplePath}. These objects also support indexing:

# ensemble of 10 trajectories
ensemble = [simulate(state, model, SortingDirect(), tfinal = 100.0, save_points = 0:25:100.0) for _ in 1:10];
10-element Array{SamplePath{Int64,2,Array{Array{Int64,1},1},Array{Float64,1}},1}:
 t: [0.0, 25.0, 50.0, 75.0, 100.0]
x: Array{Int64,1}[[10, 0, 0, 0, 0], [3, 7, 0, 7, 53], [0, 10, 0, 10, 48], [0, 10, 0, 15, 104], [1, 9, 0, 12, 105]]  
 t: [0.0, 25.0, 50.0, 75.0, 100.0]
x: Array{Int64,1}[[10, 0, 0, 0, 0], [4, 6, 0, 5, 14], [5, 5, 0, 3, 15], [4, 6, 0, 5, 12], [6, 4, 0, 7, 11]]         
 t: [0.0, 25.0, 50.0, 75.0, 100.0]
x: Array{Int64,1}[[10, 0, 0, 0, 0], [3, 7, 0, 8, 31], [0, 10, 1, 15, 176], [1, 9, 0, 24, 184], [0, 10, 0, 15, 186]] 
 t: [0.0, 25.0, 50.0, 75.0, 100.0]
x: Array{Int64,1}[[10, 0, 0, 0, 0], [2, 8, 0, 6, 30], [2, 8, 0, 3, 31], [2, 8, 0, 10, 26], [3, 7, 0, 7, 28]]        
 t: [0.0, 25.0, 50.0, 75.0, 100.0]
x: Array{Int64,1}[[10, 0, 0, 0, 0], [3, 7, 1, 4, 22], [1, 9, 0, 15, 106], [0, 10, 0, 14, 129], [1, 9, 0, 17, 124]]  
 t: [0.0, 25.0, 50.0, 75.0, 100.0]
x: Array{Int64,1}[[10, 0, 0, 0, 0], [3, 7, 0, 11, 57], [1, 9, 0, 9, 55], [1, 9, 0, 10, 53], [2, 8, 0, 11, 53]]      
 t: [0.0, 25.0, 50.0, 75.0, 100.0]
x: Array{Int64,1}[[10, 0, 0, 0, 0], [1, 9, 0, 11, 57], [0, 10, 0, 14, 112], [1, 9, 0, 15, 112], [0, 10, 0, 12, 111]]
 t: [0.0, 25.0, 50.0, 75.0, 100.0]
x: Array{Int64,1}[[10, 0, 0, 0, 0], [10, 0, 1, 0, 0], [1, 9, 2, 18, 193], [2, 8, 0, 28, 255], [0, 10, 0, 23, 252]]  
 t: [0.0, 25.0, 50.0, 75.0, 100.0]
x: Array{Int64,1}[[10, 0, 0, 0, 0], [2, 8, 1, 6, 25], [1, 9, 0, 10, 39], [3, 7, 0, 10, 39], [1, 9, 0, 5, 39]]       
 t: [0.0, 25.0, 50.0, 75.0, 100.0]
x: Array{Int64,1}[[10, 0, 0, 0, 0], [3, 7, 0, 7, 15], [2, 8, 0, 4, 16], [6, 4, 0, 11, 33], [0, 10, 1, 14, 163]]     

Retrieve an individual sample path:

ensemble[2]
t: 5-element Array{Float64,1}:
   0.0
  25.0
  50.0
  75.0
 100.0
x: 5-element Array{Array{Int64,1},1}:
 [10, 0, 0, 0, 0]
 [4, 6, 0, 5, 14]
 [5, 5, 0, 3, 15]
 [4, 6, 0, 5, 12]
 [6, 4, 0, 7, 11]

Index into a SamplePath:

ensemble[2][[3,4], :]
2×5 Array{Int64,2}:
 0  0  0  0  0
 0  5  3  5  7

Working with DataFrames

Both SamplePath and Ensemble implement the IterableTables.jl interface. This means that they act as data sources that can be converted into any supported data sink. For example, you can convert an Ensemble into a DataFrame:

using DataFrames

DataFrame(ensemble)

50 rows × 7 columns

trialtX1X2X3X4X5
Int64Float64Int64Int64Int64Int64Int64
110.0100000
2125.0370753
3150.001001048
4175.0010015104
51100.019012105
620.0100000
7225.0460514
8250.0550315
9275.0460512
102100.0640711
1130.0100000
12325.0370831
13350.0010115176
14375.019024184
153100.0010015186
1640.0100000
17425.0280630
18450.0280331
19475.02801026
204100.0370728
2150.0100000
22525.0371422
23550.019015106
24575.0010014129
255100.019017124
2660.0100000
27625.03701157
28650.0190955
29675.01901053
306100.02801153
3170.0100000
32725.01901157
33750.0010014112
34775.019015112
357100.0010012111
3680.0100000
37825.0100100
38850.019218193
39875.028028255
408100.0010023252
4190.0100000
42925.0281625
43950.01901039
44975.03701039
459100.0190539
46100.0100000
471025.0370715
481050.0280416
491075.06401133
5010100.0010114163

Each component of the state vector appears as a column, along with trajectory trial and timestamp t columns. If the conversion does not produce the expected result, one may be able to force the correct behavior using the tablefy function:

import BioSimulator: tablefy

DataFrame(tablefy(sample_path)) # DataFrame(sample_path) is currently incorrect

5 rows × 6 columns

tX1X2X3X4X5
Float64Int64Int64Int64Int64Int64
10.0100000
225.01911658
350.0010017141
475.0010018143
5100.0010017142

Saving simulation results

Because SamplePath and Ensemble support iteration, you can save simulation data directly using Julia's I/O interface. Alternatively, you can use existing packages such as CSV.jl:

using CSV

CSV.write(file, result, delim = '\t')

Obtaining summary statistics

Summary statistics for ensembles are supported directly:

using Statistics

mean(ensemble)
std(result)
var(result)

Customizing output

It is possible to customize the data stored in a SamplePath with the keyword argument save_function in simulate. This function must accept exactly three arguments:

function myfunction(simulator, state, model)
    # your code here
end

It has access to data stored in a simulator object, the process's state, and the underlying model. This customized function should return the data to be stored in a SamplePath. Data can be any object, including custom Julia types. As an example, the function save_state is one of the default options that simply saves a copy of the system state at a given time. Note the use of annotations to dispatch on the type of state

# this gets called for well-mixed simulations
save_state(simulator, state::Vector{T}, model) where T <: Int = copy(state)

# this gets called for interacting particle systems
save_state(simulator, state::Lattice, model) = Configuration(state)

Another option is save_rates which copies the rates for each possible jump

save_rates(simulator, state, model) = copy(jump_rates(simulator))

BioSimulator provides the following high-level interface for accessing data:

BioSimulator.jump_ratesFunction
jump_rates(simulator::ExactSimulator)

Retrieve the current rates for each jump.

If the cache type is HasRates(), then rate[j] is the rate for reaction channel j. Otherwise if the cache type is HasSums(), then rate[j] is the cumulative sum(rate[1:j]).

Tip
  1. Avoid heavy computations inside a save function. It is usually better to store the data needed and do the work outside the simulation loop. Benchmark different implementations to see what approach works best for your application.

  2. Some data, such as jump_rates(simulator) and state are stored as numerical arrays. These objects are mutable so you should generally return copies rather than the object itself.

  3. The generated SamplePath object will support multi-dimensional indexing provided its data is stored as an array. For example, if we set save_function = save_rates, then trajectory[k,j] will return rate[j] at time trajectory.t[k].