EDA: Exploratory data analysis

1. Read the protocol.
2. Familiarize yourself with the CRF.
3. Start EDA with the data file.
4. Keep a clear goal in mind => the primary outcome (DV) and, probably, the IDV of interest.

Stata commands:

clear
set more off
cd "C:\DMHT_2553_54_55\"
use "07_DMHT2554_Data_STATA.dta" , clear

* Continuous outcome -> dichotomize with a complex definition
desc
lookfor 18
li b1811 b1812 in 1/20
li b1811 b1812 b1821 b1822 in 1/20

* Set implausible readings (< 30) to missing
replace b1811 = . if b1811 < 30
replace b1812 = . if b1812 < 30
replace b1821 = . if b1821 < 30
replace b1822 = . if b1822 < 30

* Mean of the two readings (missing if either reading is missing)
gen bpsys = (b1811 + b1821)/2
gen bpdia = (b1812 + b1822)/2
li b1811 b1812 b1821 b1822 bpsys bpdia in 1/20

* bpyn = 1 if bpsys < 130 and bpdia < 80, otherwise 0
* Note: because of the |, a record with only one of the two means available is coded 0
gen bpyn = .
replace bpyn = 1 if bpsys < 130 & bpdia < 80 /* Need Ref */
replace bpyn = 0 if bpyn != 1 & (bpsys != . | bpdia != .)
li b1811 b1812 b1821 b1822 bpsys bpdia bpyn in 1/20

tab bpyn
tab opd
tab bpyn
tab opd bpyn
tab opd bpyn, row
xi: logistic bpyn i.opd

* Continuous outcome, longitudinal data
use "09_DMHT2553_54_55_Data_Final_20130522.dta", clear
tabstat dm_hba1c, stat(n mean sd median min max) by(year)
tabstat dm_hba1c, stat(n mean sd median min max) by(type)
tabstat dm_hba1c, stat(n mean sd median min max) by(sex)
ttest dm_hba1c, by(sex)
regress dm_hba1c sex
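
The dichotomization step in the blood-pressure part of the commands above can be checked on a few toy records before running it on the real file. The sketch below is only an illustration: the four input rows are hypothetical values, not taken from the DMHT data, and bpyn_strict is a made-up variable name for one possible stricter rule (classify only when both mean pressures are available). Whether that stricter rule is appropriate depends on the reference still to be found for the cut-offs.

```stata
* Toy data: hypothetical values, not taken from the DMHT files
clear
input b1811 b1812 b1821 b1822
120 70 124 74
150 95 146 91
  0  0 128 78
120 10 124  .
end

* Treat implausible readings (< 30) as missing, as in the notes
foreach v of varlist b1811 b1812 b1821 b1822 {
    replace `v' = . if `v' < 30
}

* Mean of the two readings; missing if either reading is missing
gen bpsys = (b1811 + b1821)/2
gen bpdia = (b1812 + b1822)/2

* Rule from the notes: coded 0 whenever at least one mean is available
gen bpyn = .
replace bpyn = 1 if bpsys < 130 & bpdia < 80
replace bpyn = 0 if bpyn != 1 & (bpsys != . | bpdia != .)

* Stricter variant (an assumption): classify only when both means exist
gen byte bpyn_strict = (bpsys < 130 & bpdia < 80) if !missing(bpsys, bpdia)

list, clean
```

In the last toy record only the systolic mean is available, so the two rules disagree: bpyn is 0 while bpyn_strict stays missing. Listing such records in the real data (for example with li ... if missing(bpsys) != missing(bpdia)) shows how many observations are affected by the choice.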