Programming in Stata vs. Python

Recently, I've spent some time implementing a particular task in both Stata and Python. Despite this very useful reference, I struggled to get the output of the two platforms to match exactly. Here is a list of things to watch out for when you're puzzled about why the two sets of code yield different output.

1. Computing Beta

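Suppose you have a dataframe with columns called permno, date, retrf, and mktrf.

retrf is the excess stock return and mktrf is the excess market return. The Python code in this section refers to the date and excess stock return columns as ldate and daret_rf.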

The goal is to compute the beta for each stock in a 60-month rolling window with at least 24 valid observations.
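As a reference point for both implementations, the quantity being computed can be written as the OLS slope of retrf on mktrf over the window. The notation is my own and not from any particular source: r_{i,t} is retrf for stock i in month t, m_t is mktrf, W_i is the set of months in the window with a valid return for stock i, and N_i is the number of such months.

```latex
\hat{\beta}_i
  = \frac{\frac{1}{N_i}\sum_{t \in W_i} (m_t - \bar{m}_i)\, r_{i,t}}
         {\frac{1}{N_i}\sum_{t \in W_i} (m_t - \bar{m}_i)^2},
\qquad
\bar{m}_i = \frac{1}{N_i}\sum_{t \in W_i} m_t,
\qquad
N_i \ge 24 .
```

The numerator does not need to demean r_{i,t}, because the deviations (m_t - \bar{m}_i) already sum to zero over W_i.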

Stata Implementation

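* Target month, stored in a local macro
local m = tm(2020m12)
* Flag the 60 months that precede the target month
gen byte window = inrange(date,`m'-60,`m'-1)
* Count valid stock returns in the window for each stock
egen byte obs = count(retrf) if window, by(permno)

* Mean market return over the valid observations
egen Mmktrf = mean(mktrf) if retrf<. & window & obs>=24, by(permno)

gen xx = (mktrf-Mmktrf)^2 if retrf<. & window & obs>=24
gen xy = (mktrf-Mmktrf)*retrf if retrf<. & window & obs>=24

* Plain means, so these are the population variance and covariance
egen Mxx = mean(xx), by(permno)
egen Mxy = mean(xy), by(permno)

* Assumes beta already exists, e.g. from an earlier "gen beta = ."
replace beta = Mxy/Mxx if date ==`m'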

Python Implementation

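import pandas as pd

# df is assumed to hold ldate as a datetime64 column
ym = pd.Timestamp('2020-12-01')
# Keep the 60 months that precede ym
sub_df = df[(df['ldate'] >= ym - pd.DateOffset(months = 60)) &
            (df['ldate'] <= ym - pd.DateOffset(months = 1))]

# ldate is left out so that cov() only sees the two return columns
covariance = sub_df[['permno', 'daret_rf', 'mktrf']].groupby('permno').cov(min_periods = 24, ddof = 0).reset_index()
# Rows alternate daret_rf, mktrf within each permno; [1::2] keeps the mktrf rows
numerator = covariance[['permno', 'daret_rf']][1::2].set_index('permno')['daret_rf']
denominator = covariance[['permno', 'mktrf']][1::2].set_index('permno')['mktrf']

betas = pd.DataFrame(numerator.divide(denominator)).reset_index()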
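One way to track down a mismatch is to redo the Stata steps literally in pandas and compare the result with the cov-based numbers. The sketch below is only an illustration: the function name beta_stata_style, the toy data, and the use of the column names retrf and mktrf are my own assumptions, and it expects a frame that has already been restricted to the window.

```python
import numpy as np
import pandas as pd

def beta_stata_style(window_df, min_obs=24):
    """Hypothetical helper that mirrors the Stata steps on a windowed frame."""
    # Keep rows with a valid stock return, like `if retrf<.` in Stata.
    valid = window_df.dropna(subset=["retrf"])
    # Enforce the minimum number of valid observations per stock.
    n_obs = valid.groupby("permno")["retrf"].transform("count")
    valid = valid[n_obs >= min_obs].copy()
    # Demean the market return within each stock (Mmktrf in the Stata code).
    dev = valid["mktrf"] - valid.groupby("permno")["mktrf"].transform("mean")
    valid["xx"] = dev ** 2
    valid["xy"] = dev * valid["retrf"]
    # Plain means, i.e. population moments (Mxx and Mxy in the Stata code).
    means = valid.groupby("permno")[["xx", "xy"]].mean()
    return (means["xy"] / means["xx"]).rename("beta")

# Made-up data: stock 10001 has 36 months, stock 10002 only 10.
rng = np.random.default_rng(0)
n = 36
mkt = rng.normal(0.01, 0.04, n)
toy = pd.DataFrame({
    "permno": [10001] * n + [10002] * 10,
    "mktrf": np.concatenate([mkt, mkt[:10]]),
    "retrf": np.concatenate([1.5 * mkt + rng.normal(0, 0.01, n), mkt[:10]]),
})

betas_check = beta_stata_style(toy)
print(betas_check)
```

Stock 10002 drops out because it has fewer than 24 observations. Writing the steps out this way makes it easier to see which rows enter each mean, which is often where the two platforms diverge.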

Nota Bene

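The Stata implementation above computes the population covariance: Mxx and Mxy are plain means, so they divide by N. The pandas cov method computes the sample covariance by default (ddof = 1, dividing by N - 1), which is why the Python code passes ddof = 0.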
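To see how large the difference is, here is a small check on made-up numbers (the frame demo and its values are purely illustrative). It compares the two normalizations and then forms beta from each.

```python
import numpy as np
import pandas as pd

# Made-up excess returns, with no missing values.
rng = np.random.default_rng(1)
demo = pd.DataFrame({"mktrf": rng.normal(0.0, 0.05, 30)})
demo["retrf"] = 0.8 * demo["mktrf"] + rng.normal(0.0, 0.02, 30)

pop = demo.cov(ddof=0)   # population moments, divides by N
sample = demo.cov()      # pandas default (ddof=1), divides by N - 1

n = len(demo)
scale = (n - 1) / n      # population = sample * (N - 1) / N

beta_pop = pop.loc["mktrf", "retrf"] / pop.loc["mktrf", "mktrf"]
beta_sample = sample.loc["mktrf", "retrf"] / sample.loc["mktrf", "mktrf"]

print(pop.loc["mktrf", "retrf"], sample.loc["mktrf", "retrf"] * scale)
print(beta_pop, beta_sample)
```

The covariances themselves differ by the factor (N - 1) / N, but the factor cancels in the ratio as long as the numerator and the denominator are computed over the same observations. With missing values that may no longer hold, since pandas drops missing values pairwise, so the variance of mktrf can be based on more rows than its covariance with the stock return.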