**************************************
********* Analysis 3 of PCTC *********
******** Vu Dien - 12 Sep 2013 *******
**************************************

*==========================================================================
***** Independent variables **********
*--------------------------------------------------------------------------
* Variables   Description                        Values
*--------------------------------------------------------------------------
* bw          birth weight                       continuous
* bwgroup     birth weight, 2 groups             1 Low BW  0 Normal BW
*==========================================================================
***** Dependent variables ************
*--------------------------------------------------------------------------
* Variables   Description                        Values
*--------------------------------------------------------------------------
* time        time to eruption of the 1st tooth  months
* erupted     an erupted tooth                   1 Yes  0 No
*==========================================================================
***** Potential confounding factors **
*--------------------------------------------------------------------------
* Variables   Description                        Values
*--------------------------------------------------------------------------
* msmoke      mother's smoking status            1 Yes  0 No
* alc         mother's alcohol drinking status   1 Yes  0 No
* shs         mother's passive smoking status    1 Yes  0 No
* mage        mother's age                       continuous
* medu        mother's highest education level   1-->6
* income      family's income                    continuous
* sex         child's gender                     1 male
*                                                2 female
* ga          gestational age at labor           continuous
* site        study site                         1 North
*                                                2 Northeast
*                                                3 Central
*                                                4 South
*                                                5 Bangkok
* bfeed       breast feeding                     1 Yes  0 No
*==========================================================================

*==========================================================================
*Step 1: Find the code of variables in CRF files
*==========================================================================
*--------------------------------------------------------------------------------------
* variable  CRF                     file name       in that file            create new name
*--------------------------------------------------------------------------------------
* shs       ANT_B02B_ENG            b02b_a          HB22 (current)          yes
* cigs      ANT_B02B_ENG            b02b_a          HB22A                   yes
* msmoke    ANT_B02A_ENG            b02a_a          B22 (current)           yes
* mcigs     ANT_B02A_ENG            b02a_a          B22A                    yes
* erupted   ANT_C08_EN (6 months)   c08_1_a         c85 (at 6 months)       yes
*           ANT_D03_EN (12 months)  d03_a           d31 (at 12 months)      yes
* time      ANT_C08_EN (6 months)   c08_1_a         c85 (at 6 months)       yes
*           ANT_D03_EN (12 months)  d03_a           d31 (at 12 months)      yes
* mage      ANT_K02_ENG             k02_a           k21e1                   yes
* medu      ANT_K02_ENG             aj_ladda_23apr  k21ig                   yes
* alc       ANT_B02A_ENG            b02a_a          B23 (yes/no)            yes
* income                            aj_ladda_23apr  income                  no
* sex                               aj_ladda_23apr  sex                     no
* bw        ANT_B05_ENG             b05_a           B53B                    yes
* ga        ANT_B04_ENG             b04_a           B42                     yes
* site      ANT_B02B_ENG            b02b_a          idmot (the 1st char)    yes
* bfeed     ANT_C02_ENG             c02_a           C21A (1 month)          yes (0 No 1 Yes)
*           ANT_C07_ENG             c07_a           C71A (3 months)         yes (0 being used 1 stop)
*           ANT_C03B_ENG            c03b_a          C3B8 (3 months)         yes (1 breast 2 bottle 3 both)
*           ANT_C08_ENG             c08_1_a         C81A1 (6 months)        yes (1 breast 2 stop)
*--------------------------------------------------------------------------------------

*==================================================================
*Step 2: Convert the files which contain those variables into Stata
*==================================================================
*Using command INSHEET to convert .txt into .dta
insheet using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Dataset\PCTC\PCTC Data\b02b_a.txt",clear /*4256 obs*/
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\b02b_a.dta"

insheet using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Dataset\PCTC\PCTC Data\b02a_a.txt",clear /*4421 obs*/
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\b02a_a.dta"

insheet using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Dataset\PCTC\PCTC Data\c08_1_a.txt",clear /*4370 obs*/
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\c08_1_a.dta"

insheet using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Dataset\PCTC\PCTC Data\d03_a.txt",clear /*4116 obs*/
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\d03_a.dta"

insheet using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Dataset\PCTC\PCTC Data\k02_a.txt",clear /*4490 obs*/
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\k02_a.dta"

insheet using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Dataset\PCTC\PCTC Data\Aj_ladda_23apr.txt", clear /*4245 obs*/
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\Aj_ladda_23apr.dta"

insheet using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Dataset\PCTC\PCTC Data\b05_a.txt",clear /*4379 obs*/
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\b05_a.dta"

insheet using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Dataset\PCTC\PCTC Data\b04_a.txt",clear /*4355 obs*/
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\b04_a.dta"

insheet using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Dataset\PCTC\PCTC Data\c02_a.txt",clear /*1786 obs*/
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\c02_a.dta"

insheet using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Dataset\PCTC\PCTC Data\c07_a.txt",clear /*4398 obs*/
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\c07_a.dta"

insheet using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Dataset\PCTC\PCTC Data\c03b_a.txt",clear /*4316 obs*/
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\c03b_a.dta"

insheet using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Dataset\PCTC\PCTC Data\aj_ja_nbl.txt",clear /*4245 obs*/
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\aj_ja_nbl.dta"
*----------------------------------------------------------------------------------------------

*=============================================================
*Step 3: Drop the variables which will not be used to analyze.
*        In other words, keep only the variables of interest
*=============================================================
*Using command KEEP to keep only the variables of interest
use "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\b02b_a.dta",clear
keep idmot hb21 hb22 hb22a
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\b02b_a.dta"

use "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\b02a_a.dta",clear
keep idmot b22 b22a b23
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\b02a_a.dta"

use "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\c08_1_a.dta",clear
gen sdate6=date /*create SDATE6: Date of interview at 6 months*/
keep idchd c81a1 c85 sdate6
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\c08_1_a.dta"

use "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\d03_a.dta",clear
gen sdate12=date /*create SDATE12: Date of interview at 12 months*/
keep idchd d31 sdate12
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\d03_a.dta"

use "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\k02_a.dta",clear
keep idmot k21e1
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\k02_a.dta"

use "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\Aj_ladda_23apr.dta",clear
keep idchd idmot sex k21ig income
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\Aj_ladda_23apr.dta"

use "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\b05_a.dta",clear
keep idchd b53b
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\b05_a.dta"

use "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\b04_a.dta",clear
keep idmot b42
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\b04_a.dta"

use "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\c02_a.dta",clear
keep idchd c21a
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\c02_a.dta"

use "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\c07_a.dta",clear
keep idchd c71a
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\c07_a.dta"

use "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\c03b_a.dta",clear
keep idchd c3b8
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\c03b_a.dta"

use "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC converting\aj_ja_nbl.dta",clear
keep idchd cbirthd
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\aj_ja_nbl.dta"
*-------------------------------------------------------------------------------------------

*==================================================================================
*Step 5: Merge all files together, using file "Aj_ladda_23apr.dta" as the master file
*==================================================================================
*Merge the master file with the 4 files which have IDMOT only. Option unmatched(master)
* keeps each IDCHD of the master file even if its IDMOT is missing in any of the 4 files.
*Using command JOINBY with option unmatched(master) to merge files, save as finaldataset.dta
use "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\Aj_ladda_23apr.dta", clear
joinby idmot using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\b02a_a.dta", unmatched(master)
drop _merge
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC joinby\finaldataset.dta"

joinby idmot using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\b02b_a.dta", unmatched(master)
drop _merge
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC joinby\finaldataset.dta", replace

joinby idmot using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\k02_a.dta", unmatched(master)
drop _merge
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC joinby\finaldataset.dta", replace

joinby idmot using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\b04_a.dta", unmatched(master)
drop _merge
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC joinby\finaldataset.dta", replace
*Now, there are 4245 obs in finaldataset.dta

*Next, open the master file, which is c08_1_a.dta (4370 obs), then merge the master file
* with finaldataset.dta, using IDCHD as key variable, unmatched(master)
use "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\c08_1_a.dta", clear /*4370 obs*/
joinby idchd using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC joinby\finaldataset.dta", unmatched(master)
drop _merge
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC joinby\finaldataset1.dta"

joinby idchd using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\b05_a.dta", unmatched(master)
drop _merge
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC joinby\finaldataset1.dta", replace

joinby idchd using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\c02_a.dta", unmatched(master)
drop _merge
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC joinby\finaldataset1.dta", replace

joinby idchd using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\c07_a.dta", unmatched(master)
drop _merge
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC joinby\finaldataset1.dta", replace

joinby idchd using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\c03b_a.dta", unmatched(master)
drop _merge
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC joinby\finaldataset1.dta", replace

joinby idchd using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\aj_ja_nbl.dta", unmatched(master)
drop _merge
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC joinby\finaldataset1.dta", replace
*Now, there are 4370 obs in finaldataset1.dta

*Now there is only one file left to be merged. That file is "d03_a.dta", which contains
* variable D31 (time of tooth eruption interviewed at 12 months)
*We will merge that file with the file finaldataset1.dta
*Case 3: unmatched(using) ==> 4116 obs
use "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC joinby\finaldataset1.dta", clear
joinby idchd using "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC selecting vars\d03_a.dta", unmatched(using)
drop _merge
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\PCTC joinby\finaldataset1_using.dta",replace
*---------------------------------------------------------------------------------------------------------------

*============================
*Step 6: Create new variables
*============================
use "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\Seminar 3\PCTC joinby\finaldataset1_using.dta", clear
save "D:\Hoc Hanh\KKU\YEAR II\TAKASILA\Assignment\Seminar 3\PCTC joinby\finaldataset1_using_analyze.dta"

*Create variable DOB based on CBIRTHD
split cbirthd, p(/)
destring cbirthd3, replace
replace cbirthd3=cbirthd3+543 /*BE year = CE year + 543*/
tostring cbirthd3, replace
gen newcbirthd=cbirthd1 + "/"+ cbirthd2 + "/"+ cbirthd3
gen dob=date(newcbirthd, "MDY")
format %td dob /* in human readable format*/

*Create variable SD6 and SD12 (Date of survey at 6 months and 12 months)
tostring sdate6, replace
replace sdate6 = "0" + sdate6 if strlen(sdate6)==5 /*restore the leading zero of the day*/
gen sd6=date(sdate6,"DMY",2600)
format %td sd6
tostring sdate12, replace
replace sdate12 = "0" + sdate12 if strlen(sdate12)==5
gen sd12=date(sdate12,"DMY",2600)
format %td sd12
gen childage=(sd12-dob)/30.5 /*age in months at the 12-month interview*/

*Create var TIME: the month when 1st tooth erupted
gen time = c85
replace time = "" if regexm(time, "^[0-9]") == 0 /*Values starting with a non-numeric character are set to missing*/
replace time = "" if time == "0" /*Set to missing because 0 is an impossible value*/
destring time, replace
replace time = d31 if time == . & d31!=-9 /*Change to the month of eruption assessed at 12 months (var d31)*/

*Create variable ERUPTED: tooth eruption status (1 yes, 0 no)
gen erupted = 0
replace erupted = 1 if time >=1 & time!=.
replace time=childage if time==. | time==0 /*after replacing time by d31, there are still missing values and value 0*/

*Create var SHS: secondhand smoking (yes/no)
recode hb21 (-9=.)
recode hb22 (-9=.) (-8=.)
recode hb22a (-9=.) (-8=.)
gen shs=.
replace shs = 1 if hb22a != 0 & hb22a !=.
replace shs = 1 if hb21 == 1
replace shs = 1 if hb22 == 1
replace shs = 0 if hb21 == 2
replace shs = 0 if hb22a == 0 & hb21 != 1
la def noyes 0 "No" 1 "Yes" /*Create label*/
la val shs noyes

/*
*Create var CIGS: number of cigarettes smoked by fathers (continuous)
gen cigs=hb22a
replace cigs = 0 if shs == 0

*Create var MSMOKE: mother's smoking status (yes/no)
gen msmoke=b22
recode msmoke (-9=.) (-8=.)
la val msmoke noyes

*Create var MCIGS: number of cigarettes smoked by mothers (continuous)
gen mcigs=b22a
recode mcigs (-9=.) (-8=.)
*/

*Create var MAGE, MEDU, ALC: age, education level, alcohol status of mothers
gen mage=k21e1
recode mage (-9=.) (0=.)
gen magegr=1 if mage<=24
replace magegr=2 if mage>24 & mage<. /*mage<. keeps missing ages out of the "Adult" group*/
la def l_magegr 1 "Youth" 2 "Adult"
la val magegr l_magegr

gen medu=k21ig
recode medu (-9=.)
la def l_medu 1 "Illiterate" 2 "Primary school" 3 "High school" 4 "Vocational training" 5 "University and higher" 6 "Other"
la val medu l_medu

gen alc=b23
recode alc (-9=.) (2=0)
la val alc noyes

/*
*Create intervals and new categorical variables for MAGE
xtile mage3t = mage, nq(3)
la def lmage3t 1 "13-24" 2 "25-30" 3 "31-48" /*Create label*/
la val mage3t lmage3t
tabstat mage, stat (n min max) by(mage3t)
*/

*Create var BW, GA: birthweight and gestational age of infants
gen bw=b53b
gen ga=b42

*Create new categorical variable of Birthweight
gen bwgroup=.
replace bwgroup=0 if bw>=2500 & bw<. /*bw<. stops missing birth weights being coded as Normal BW*/
replace bwgroup=1 if bw<2500
la def l_bwgroup 0 "Normal BW" 1 "Low BW"
la val bwgroup l_bwgroup

*Create new categorical variable of Gestational Age
gen gagroup=.
replace gagroup=0 if ga>=37 & ga<. /*ga<. stops missing gestational ages being coded as Term birth*/
replace gagroup=1 if ga<37
la def l_termbirth 0 "Term birth" 1 "Preterm birth"
la val gagroup l_termbirth

*Create var SITE: the study sites
gen site=trunc(idchd/10000000)
la def l_site 1 "North" 2 "Northeast" 3 "Central" 4 "South" 5 "Bangkok"
la val site l_site

*Recode variable SEX
recode sex (2=0)
la def l_sex 0 "female" 1 "male"
la val sex l_sex

*Create var BFEED (breast feeding 1 Yes 0 No)
destring c21a, gen (c21b)
gen bfeed=.
replace bfeed=0 if c21b == 0
replace bfeed=0 if c3b8 == 2
replace bfeed=1 if c21b == 1 | c3b8 == 1 | c71a == 0 | c81a1 == 1
la val bfeed noyes

*=================================
*Step 7: Start to analyze the data
*=================================
*------------------------------------------------------------------------------------------------------
* Table 1: Demographic characteristics
* For mothers
tab magegr site, col miss
tab medu site, col miss
tab alc site, col miss
tab shs site, col miss
* For infants
tab sex site, col miss
tab gagroup site, col miss
tab bfeed site, col miss
*------------------------------------------------------------------------------------------------------

*------------------------------------------------------------------------------------------------------
* Table 2: Percentage of Low BW infants across the 5 sites
tab bwgroup site, col miss
*------------------------------------------------------------------------------------------------------

*------------------------------------------------------------------------------------------------------
* Table 3: Crude hazard ratios (HR) of tooth eruption for each explanatory factor
* Event = erupted
stset time, failure(erupted)
local listvar "bwgroup mage i.medu alc shs sex gagroup bfeed"
foreach var of local listvar {
    stcox `var', strata(site)
}
local listvar "bwgroup medu alc shs sex gagroup bfeed"
foreach var of local listvar {
    stsum, by(`var')
}
*------------------------------------------------------------------------------------------------------

*------------------------------------------------------------------------------------------------------
* Table 4: Adjusted HR of tooth eruption for each explanatory factor
*-------------------------------------------------------------------------------------
* Step 1: Stratified analysis
*Section 3.1 Effect of MEDU on the association between BW and ERUPTED
cc erupted bwgroup, by(medu) /*Test of homogeneity (M-H) p=0.347 */
*Section 3.2 Effect of ALC on the association between BW and ERUPTED
cc erupted bwgroup, by(alc) /*Test of homogeneity (M-H) p=0.515 */
*Section 3.3 Effect of SHS on the association between BW and ERUPTED
cc erupted bwgroup, by(shs) /*Test of homogeneity (M-H) p=0.924 */
*Section 3.4 Effect of SEX on the association between BW and ERUPTED
cc erupted bwgroup, by(sex) /*Test of homogeneity (M-H) p=0.290 */
*Section 3.5 Effect of GAGROUP on the association between BW and ERUPTED
cc erupted bwgroup, by(gagroup) /*Test of homogeneity (M-H) p=0.245 */
*Section 3.6 Effect of BFEED on the association between BW and ERUPTED
cc erupted bwgroup, by(bfeed) /*Test of homogeneity (M-H) p=0.836 */
*Section 3.7 Effect of MAGEGR on the association between BW and ERUPTED
cc erupted bwgroup, by(magegr) /*Test of homogeneity (M-H) p=0.654 */
*No test of homogeneity has a p-value <= 0.2. Therefore, we do not create any interaction term.
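
* A compact alternative to Sections 3.1-3.7 (a sketch, not part of the original
* analysis): loop over the candidate effect modifiers and run the same stratified
* 2x2 analysis for each one. The loop name `v' is arbitrary; the variable list is
* copied from the sections above, and the p-values must still be read off the output.
foreach v of varlist medu alc shs sex gagroup bfeed magegr {
    display as text _newline "Stratified by `v'"
    cc erupted bwgroup, by(`v')
}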
*-------------------------------------------------------------------------------------
* Step 2: Multivariable analysis : Cox regression
* The initial model - the full model
stcox bwgroup sex gagroup, strata(site)
* SO THIS IS THE FINAL MODEL
stcox bwgroup sex gagroup, strata(site)
tab bwgroup erupted, row
tab sex erupted, row
tab gagroup erupted, row
*------------------------------------------------------------------------------------------------------
stset time, failure(erupted)
stci /*Median time to tooth eruption*/
graph box time, over(site) /*Box plot to see median time*/

* To test the equality of the survival function between the Normal BW and Low BW groups
* H0: S[NormalBW](t) =  S[LowBW](t) (Survival is the same)
* H1: S[NormalBW](t) != S[LowBW](t) (Survival is not the same)
sts test bwgroup, strata(site)
*------------------------------------------------------------------------------------------------------

*------------------------------------------------------------------------------------------------------
* Figure 3: Difference in the probability of an erupted tooth between the Low BW group and the Normal BW group
* Event = erupted
stset time, failure(erupted)
sts graph, by(bwgroup) ci fail tmin(2) tmax(15)
*------------------------------------------------------------------------------------------------------
*The End
*------------------------------------------------------------------------------------------------------

* Thinking about imputation for missing data
mi set mlong
mi register imputed mage
mi impute regress mage shs cigs bw, add(200) rseed(10394)
mi estimate: stcox shs mage

* Table 6: Crude effect of each factor on DTE
* Multiple logistic regression
*======================================================================================================
*******************************************************************************************************
* Research question is that "Does SHS affect DTE?"
* SHS is the "risk of interest"
*******************************************************************************************************
*======================================================================================================

*====================================================================================
***** Dependent and independent variables
*------------------------------------------------------------------------------------
* Variables   Description                        Values
*------------------------------------------------------------------------------------
* delay       delayed first tooth eruption       1 Yes  0 No
* shs         secondhand smoking                 1 Yes  0 No
*====================================================================================
***** Potential confounding factors
*------------------------------------------------------------------------------------
* Variables   Description                        Values
*------------------------------------------------------------------------------------
* msmoke      mother's smoking status            1 Yes  0 No
* mage        mother's age                       1 13-24
*                                                2 25-30
*                                                3 31-48
* medu        mother's highest education level   1-->6
* alc         mother's alcohol drinking status   1 Yes  0 No
* income      family's income                    1 Low <=66k
*                                                2 Medium 66k-158k
*                                                3 High >=158k
* sex         child's gender                     1 male
*                                                0 female
* bwgroup     birth weight                       1 Low BW
*                                                0 Normal BW
* gagroup     gestational age at labor           1 < 37 wks
*                                                0 >=37 wks
* site        study site                         1 North
*                                                2 Northeast
*                                                3 Central
*                                                4 South
*                                                5 Bangkok
*=====================================================================================
* Step 1: Exploring the data and univariate analysis
list delay shs msmoke mage medu alc income sex bw ga site
tab delay
ci delay
*-------------------------------------------------------------------------------------
* Step 2: Bivariate (crude) analysis
* Section 2.1 Crude effect of SHS on DELAY
cs delay shs, or
*Section 2.2 Crude effect of MSMOKE on DELAY
cs delay msmoke, or /*p =0.67*/
*Section 2.3 Crude effect of MAGE3T on DELAY
tab mage3t delay, row chi2 exact /*p =0.068*/
csi 248 305 1159 1325, or /*to see OR of group 2 compared to group 1, p=0.44*/
csi 285 305 1070 1325, or /*to see OR of group 3 compared to group 1, p=0.11*/
logistic delay mage
*Section 2.4 Crude effect of MEDU on DELAY
tab medu delay, row chi2 exact /*p=0.01*/
replace medu=5 if medu==6 /*collapsed two categories because category 6 has few observations*/
tab medu delay, row chi2 exact /*p<0.001*/
csi 443 37 1694 190, or
csi 239 37 989 190, or
csi 52 37 303 190, or
csi 68 37 390 190, or
*or can use this command
logistic delay i.medu
*Section 2.5 Crude effect of ALC on DELAY
cs delay alc, or /*p =0.79*/
*Section 2.6 Crude effect of INCOME3T on DELAY
tab income3t delay, row chi2 exact /*p<0.001*/
csi 254 328 1188 1116, or
csi 249 328 1191 1116, or
logistic delay income
*Section 2.7 Crude effect of SEX on DELAY
cs delay sex, or /*p<0.001*/
*Section 2.8 Crude effect of BWGROUP on DELAY
cs delay bwgroup, or /*p<0.001*/
*Section 2.9 Crude effect of GAGROUP on DELAY
cs delay gagroup, or /*p=0.02*/
*Section 2.10 Crude effect of SITE on DELAY
tab site delay, row chi2 /*p<0.001*/
csi 319 125 829 697, or
csi 154 125 760 697, or
csi 139 125 695 697, or
csi 103 125 582 697, or
*or can use this command
logistic delay i.site
*-------------------------------------------------------------------------------------
* Step 3: Stratified analysis
*Section 3.1 Effect of BWGROUP on the association between SHS and DELAY
cc delay shs, by(bwgroup) /*Test of homogeneity (M-H) p = 0.13*/
*Section 3.2 Effect of GAGROUP on the association between SHS and DELAY
cc delay shs, by(gagroup) /*Test of homogeneity (M-H) p < 0.001*/
*Section 3.3 Effect of SITE on the association between SHS and DELAY
cc delay shs, by(site) /*Test of homogeneity (M-H) p = 0.023*/
*Section 3.4 Effect of SEX on the association between SHS and DELAY
cc delay shs, by(sex) /*Test of homogeneity (M-H) p < 0.001*/
*Section 3.5 Effect of INCOME on the association between SHS and DELAY
cc delay shs, by(income3t) /*Test of homogeneity (M-H) p = 0.725*/
*Section 3.6 Effect of MAGE on the association between SHS and DELAY
cc delay shs, by(mage3t) /*Test of homogeneity (M-H) p = 0.0134*/
*Section 3.7 Effect of MEDU on the association between SHS and DELAY
cc delay shs, by(medu) /*Test of homogeneity (M-H) p = 0.074*/
* We select SHS*SEX, SHS*MAGE, and SHS*GAGROUP
*-------------------------------------------------------------------------------------
* Step 4: Multivariable analysis : Logistic regression
* Create interaction variables
gen s_sex = shs * sex
gen s_mage = shs * mage
gen s_gagr = shs * gagroup
* Section 4.1. The initial model - the full model
xi: logistic delay shs mage i.medu sex bwgroup gagroup i.site s_sex s_mage s_gagr
est store full
* Section 4.2. Model without s_mage as s_mage has highest p value of 0.016
xi: logistic delay shs mage i.medu sex bwgroup gagroup i.site s_sex s_gagr
lrtest full, force /* p=0.015, so need to keep s_mage in the model*/
* Section 4.3. Model without s_sex as s_mage has higher ordered term
xi: logistic delay shs mage i.medu sex bwgroup gagroup i.site s_mage s_gagr
lrtest full, force /* p<0.001, so need to keep s_sex in the model*/
* Section 4.4. Model without s_gagr as s_gagr has higher ordered term
xi: logistic delay shs mage i.medu sex bwgroup gagroup i.site s_sex s_mage
lrtest full, force /* p<0.001, so need to keep s_gagr in the model*/
* Considering 3 interaction terms, we decided to keep only s_gagr
*********************************************************************
* Now, we start to run again from step 1 with the full model
* Full model
xi: logistic delay shs mage i.medu sex bwgroup gagroup i.site s_gagr
est store full
* Remove medu
xi: logistic delay shs mage sex bwgroup gagroup i.site s_gagr
lrtest full, force
* p = 0.027, so we can't remove this variable medu
* Remove mage
xi: logistic delay shs i.medu sex bwgroup gagroup i.site s_gagr
lrtest full, force
* p < 0.001, so we can't remove this variable mage
* The final model is the 1st model (full model)
xi: logistic delay shs mage i.medu sex bwgroup gagroup i.site s_gagr
*------------------------------------------------------------------------------------------
* I tried this one:
* Backward Stepwise:
xi: sw logistic delay shs mage i.medu sex bwgroup gagroup i.site s_mage, pr(0.2)
est store full
* Step 5: Assessing model adequacy: test for goodness of fit of the model
estat gof /*goodness-of-fit test*/
* Step 6: Obtaining measure of associations from the model
*------------------------------------------------------------------------------------------------------

*------------------------------------------------------------------------------------------------------
* Table 7: Model of association between the number of cigarettes smoked by the fathers
*          and the time of first tooth eruption
corr cigs time
* then I see no correlation between cigs and time
regress time cigs mage bw ga income alc
* Draw a regression line with 95% CI
twoway lfitci time cigs, stdf || scatter time cigs /*stdf: SE for the forecast*/
*------------------------------------------------------------------------------------------------------
* Compare the missing group and the completed group
* Create variable PS (0: completed, 1: missing)
gen ps=0
replace ps=1 if shs==.
la def l_ps 0 "Completed" 1 "Missing"
la val ps l_ps
tab medu ps, col chi2
tab alc ps, col chi2
tab sex ps, col chi2
tab bwgroup ps, col chi2
tab gagroup ps, col chi2
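
* The five tabulations above can also be written as one loop (a sketch, not part of
* the original analysis). It assumes PS has already been created as shown above;
* the loop name `v' is arbitrary and the variable list is copied from those commands.
foreach v of varlist medu alc sex bwgroup gagroup {
    tab `v' ps, col chi2
}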