7 Software Comparison

7.1 Summary

In R, we implemented the complete power simulation process, using all parameters (\(\beta_0\), \(\beta_1\), \(\omega_0\), \(\tau_0\), \(\tau_1\), \(\rho\), \(\sigma\)) to generate simulated data (see functions sim_data and single_run).

In Python, we also used all parameters (\(\beta_0\), \(\beta_1\), \(\omega_0\), \(\tau_0\), \(\tau_1\), \(\rho\), \(\sigma\)) to generate simulated data (see functions sim_data and single_run). However, there are two slight differences between R and Python:

Simulation of the sampling of subjects:
- R: we used the function rnorm_multi from package {faux} to generate a table of n simulated values from a multivariate normal distribution by specifying the means (\(\mu = 0\)) and standard deviations (\(sd = (\tau_0, \tau_1)\)), plus the correlation (\(r = \rho\)).
- Python: we used the function multivariate_normal from package {numpy} to generate a table of n simulated values from a multivariate normal distribution by specifying the means (\(\mu = [0, 0]\)) and covariance matrix (\(cov = [[\tau_0^2, \rho * \tau_0 * \tau_1], [\rho * \tau_0 * \tau_1, \tau_1^2]]\)).
Mixed Effects Models:
- R: we specified the correlation between the by-subject random intercept and random slope in the formula passed to the lmer function: liking_ij ~ 1 + genre_i + (1 | song_id) + (1 + genre_i | subj_id).
- Python: because of limitations of the function mixedlm from package {statsmodels}, we did not specify this correlation in the model.
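
The Python parameterisation of the subject sampling step can be sketched as follows. This is a minimal illustration, not the code of sim_data itself: the helper names subject_cov and sample_subject_effects and the numeric values of \(\tau_0\), \(\tau_1\) and \(\rho\) are assumptions chosen only for the example. It shows how the standard deviations and the correlation that R passes to rnorm_multi map onto the covariance matrix that numpy expects.

```python
import numpy as np


def subject_cov(tau_0, tau_1, rho):
    """Build the 2x2 covariance matrix from two SDs and their correlation."""
    off_diag = rho * tau_0 * tau_1  # covariance = correlation * sd_0 * sd_1
    return np.array([[tau_0 ** 2, off_diag],
                     [off_diag, tau_1 ** 2]])


def sample_subject_effects(n, tau_0, tau_1, rho, seed=None):
    """Draw n (random intercept, random slope) pairs with mean zero."""
    rng = np.random.default_rng(seed)
    cov = subject_cov(tau_0, tau_1, rho)
    return rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)


# Placeholder parameter values, used only for illustration.
effects = sample_subject_effects(n=1000, tau_0=7.0, tau_1=4.0, rho=0.2, seed=1)
print(effects.shape)            # one row per subject, two columns
print(np.corrcoef(effects.T))   # sample correlation should be near rho
```

Dividing the off-diagonal entry of the covariance matrix by \(\tau_0 \tau_1\) recovers \(\rho\), so both specifications describe the same distribution.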

In Stata, we used only four parameters (\(\beta_0\), \(\beta_1\), \(\tau_0\), \(\sigma\)) to generate simulated data, so the code for data simulation and mixed-effects analysis differs noticeably (see functions sim_data and single_run).

7.2 Absolute Running Time

Note: All results presented below were obtained on a single laptop; the device details are listed in Table 7.1.

Table 7.1: Basic Information
	Info
Model	Microsoft Laptop 3
Windows Edition	Windows 11 Home Insider Preview 25977.1000
CPU	Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz 4 cores
RAM	DDR4 16GB

Table 7.2 lists the absolute running time for each software package. The times differ considerably, and the number of replicates is not the same for every package.

Table 7.2: Absolute Running Time
Software	Info	Version	Parameters	Replicates	Absolute Time
R	R is a dynamic and highly efficient programming language for statistical computing and graphics.	\(\substack{\text{R: 4.3.1} \\ \text{lmerTest: 3.1} \\ \text{lme4: 1.1}}\)	\(\beta_0\), \(\beta_1\), \(\omega_0\), \(\tau_0\), \(\tau_1\), \(\rho\), \(\sigma\)	100	4m13s
Python	Python is a high-level, general-purpose programming language.	\(\substack{\text{Python: 3.11.6} \\ \text{numpy: 1.26.1} \\ \text{pandas: 2.1.1} \\ \text{statsmodels: 0.14.0}}\)	\(\beta_0\), \(\beta_1\), \(\omega_0\), \(\tau_0\), \(\tau_1\), \(\rho\), \(\sigma\)	30	21m27s
Stata	Stata is a general-purpose statistical software package developed by StataCorp for data manipulation, visualization, statistics, and automated reporting.	\(\substack{\text{Stata: Stata17MP}}\)	\(\beta_0\), \(\beta_1\), \(\tau_0\), \(\sigma\)	30	6m7s
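
Because the replicate counts in Table 7.2 differ, a per-replicate figure may be easier to compare than the absolute times. The sketch below does that arithmetic. It assumes that each reported time is the total wall-clock time for all replicates and that run time scales roughly linearly with the number of replicates; the helper to_seconds and the dictionary names are introduced here only for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair from Table 7.2 into seconds."""
    return 60 * minutes + seconds


# (total time in seconds, number of replicates), as reported in Table 7.2.
runs = {
    "R": (to_seconds(4, 13), 100),
    "Python": (to_seconds(21, 27), 30),
    "Stata": (to_seconds(6, 7), 30),
}

# Average time per replicate, under the linear-scaling assumption above.
per_replicate = {name: total / reps for name, (total, reps) in runs.items()}

for name, secs in per_replicate.items():
    print(f"{name}: {secs:.2f} s per replicate")
```

Under these assumptions the averages are about 2.53 s for R, 42.90 s for Python, and 12.23 s for Stata. The Stata figure is for a model with fewer parameters, so it is not strictly comparable to the other two.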