Recently, I spent some time implementing the same task in both Stata and Python. Despite this very useful reference, I struggled to get the output on the two platforms to match exactly. Here is a list of things to watch out for when you're puzzled about why the two sets of code yield different output.
Suppose you have a dataframe with columns called permno, date, retrf, and mktrf. Here retrf is the excess stock return and mktrf is the excess market return. The goal is to compute the beta for each stock in a 60-month rolling window with at least 24 valid observations.
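Written out, and assuming that "beta" here means the usual OLS slope of the excess stock return on the excess market return (which is what a covariance-over-variance calculation gives), the target for stock i in month m is roughly the following. The symbols x, y, and W are my own shorthand, not names from the code.

```latex
% x_t   = mktrf in month t
% y_it  = retrf of stock i in month t
% W_im  = months t in [m-60, m-1] in which y_it is not missing
\hat{\beta}_{i,m}
  = \frac{\frac{1}{|W_{i,m}|}\sum_{t \in W_{i,m}} (x_t - \bar{x})(y_{i,t} - \bar{y}_i)}
         {\frac{1}{|W_{i,m}|}\sum_{t \in W_{i,m}} (x_t - \bar{x})^2},
  \qquad \text{defined only if } |W_{i,m}| \ge 24 .
```

Both means are taken over the same set of months, so the 1/|W| factors cancel and the choice between population and sample normalisation does not affect beta, as long as numerator and denominator use the same one.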
Stata Implementation
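* Target month as a Stata monthly date (date is assumed to be a monthly date variable)
local m = tm(2020m12)
* Flag the 60 months preceding the target month
gen byte window = inrange(date,`m'-60,`m'-1)
* Number of non-missing returns per stock inside the window
egen byte obs = count(retrf) if window, by(permno)
* Mean market return over the rows that enter the regression
egen Mmktrf = mean(mktrf) if retrf<. & window & obs>=24, by(permno)
* Squared deviations and cross products, only for valid rows
gen xx = (mktrf-Mmktrf)^2 if retrf<. & window & obs>=24
gen xy = (mktrf-Mmktrf)*retrf if retrf<. & window & obs>=24
egen Mxx = mean(xx), by(permno)
egen Mxy = mean(xy), by(permno)
* beta = cov(mktrf, retrf) / var(mktrf); beta must already exist (e.g. gen beta = .)
replace beta = Mxy/Mxx if date ==`m'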
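Note that the xy line multiplies the demeaned market return by the raw stock return, not the demeaned one. This still yields the covariance, because deviations from the mean sum to zero. A short derivation, with x standing for mktrf and y for retrf (my notation):

```latex
\sum_{t} (x_t - \bar{x})(y_t - \bar{y})
  = \sum_{t} (x_t - \bar{x})\, y_t \;-\; \bar{y} \sum_{t} (x_t - \bar{x})
  = \sum_{t} (x_t - \bar{x})\, y_t ,
\qquad \text{since } \sum_{t} (x_t - \bar{x}) = 0 .
```

This only holds when the mean of x is computed over exactly the same rows as the sum, which is why the same if condition is used for Mmktrf, xx, and xy.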
Python Implementation
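import pandas as pd

# Target month as a Timestamp so that the DateOffset arithmetic below works
ym = pd.Timestamp('2020-12-01')
# Keep the 60 months preceding the target month
sub_df = df[(df['ldate'] >= ym - pd.DateOffset(months = 60)) &
            (df['ldate'] <= ym - pd.DateOffset(months = 1))]
# Per-stock covariance matrix; ldate is left out so cov only sees the two return columns
covariance = sub_df[['daret_rf', 'mktrf', 'permno']].groupby('permno').cov(min_periods = 24, ddof = 0).reset_index()
# Every second row is the mktrf row: cov(mktrf, daret_rf) and var(mktrf)
numerator = covariance[['permno', 'daret_rf']][1::2].set_index('permno')['daret_rf']
denominator = covariance[['permno', 'mktrf']][1::2].set_index('permno')['mktrf']
betas = pd.DataFrame(numerator.divide(denominator)).reset_index()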
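For comparison, here is a minimal sketch that follows the Stata steps more literally: it drops rows with a missing return first, then computes the mean of the cross products over the mean of the squared deviations. The helper name beta_for_month and the synthetic demo data are my own illustrations, not part of the original code, and I am assuming that rows with a missing mktrf should be dropped as well.

```python
import numpy as np
import pandas as pd


def beta_for_month(df, ym, window=60, min_obs=24):
    """Beta per permno from the `window` months before `ym` (hypothetical helper)."""
    ym = pd.Timestamp(ym)
    in_window = ((df['ldate'] >= ym - pd.DateOffset(months=window)) &
                 (df['ldate'] <= ym - pd.DateOffset(months=1)))
    # Keep only complete rows, mirroring the retrf<. condition in the Stata code
    sub = df.loc[in_window].dropna(subset=['daret_rf', 'mktrf'])

    def one_stock(g):
        # Too few valid observations: leave beta missing
        if len(g) < min_obs:
            return np.nan
        x = g['mktrf'] - g['mktrf'].mean()
        # mean(xy) / mean(xx), as in the Stata listing
        return (x * g['daret_rf']).mean() / (x ** 2).mean()

    return sub.groupby('permno')[['mktrf', 'daret_rf']].apply(one_stock).rename('beta')


# Synthetic data for illustration only: 60 months, two made-up permnos
rng = np.random.default_rng(0)
dates = pd.date_range('2015-12-01', '2020-11-01', freq='MS')
mkt = rng.normal(0.005, 0.04, len(dates))
stock1 = 1.5 * mkt + rng.normal(0, 0.01, len(dates))
stock1[:5] = np.nan    # 55 valid observations
stock2 = 0.8 * mkt + rng.normal(0, 0.01, len(dates))
stock2[20:] = np.nan   # only 20 valid observations, below the threshold
demo = pd.DataFrame({
    'permno': np.repeat([10001, 10002], len(dates)),
    'ldate': np.tile(dates.values, 2),
    'mktrf': np.tile(mkt, 2),
    'daret_rf': np.concatenate([stock1, stock2]),
})

betas = beta_for_month(demo, '2020-12-01')
print(betas)
```

In this demo the first stock should get a beta equal to its OLS slope on the valid rows, and the second should come out as NaN because it has fewer than 24 valid returns. Running both this and the groupby/cov version on the same data is a quick way to see whether missing values are the source of a discrepancy.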
Nota Bene