

Here v012 is the mother's age, the b2 variables are the children's years of birth, and the b4 variables are the children's sex. So the first subject, aged 30, has had two births, one in 2000 and one in 2005, both boys. Since she has not had a third birth yet, her values for b2_03 and b4_03 are missing. I would like to convert this dataset to one where I have one observation per child born, i.e. from the wide format to the long format. Specifically, I would like the columns to just be the caseid of the mother, age of the mother, year of birth of the child, and sex of the child. (Because Stata reads in the whole data set at the same time, it has special commands to construct variables for data in long format.)

There are a number of ways of doing this in R, including melt() and cast(), the plyr package, aggregate(), and others. However, after a good deal of struggle and looking things up, I found that the reshape() function is the most intuitive and user-friendly for the needs of this problem. You don't need to use melt() and cast() at all, which are difficult to manipulate in my opinion.

The reshape() function takes in a number of important parameters that will be necessary for our transformation (there are more parameters than this, but I've boiled it down to the ones that are crucial):

reshape(data, varying = NULL, timevar = "time", idvar = "id", direction, sep = ".")

varying = columns in the wide format that correspond to a single column in the long format
timevar = name of the new variable that differentiates multiple observations from the same individual
idvar = variable in your dataset that identifies multiple records from the same individual
direction = "wide" if you're going from long to wide and "long" if you're going from wide to long
sep = the symbol that separates the name of a varying column from its number

Ok, so the most important thing about the reshape() function is that you have to give it variable names that it can understand. Specifically, the columns that I want to turn into one column should all follow the same structure in the naming convention. It can be anything, as long as the names follow the same text and number pattern. If the dataset is not in this format, you will have a lot of problems and I suggest changing your variable names before trying to convert. The value of sep depends on what sits between the text and the number: for names with an underscore (as in b2_03) you would use sep = "_", for names with nothing in between you would use sep = "", and for names with a period you would use sep = "."; all of these are valid. All variables that we want to convert into one column can be put into the varying parameter, and R will sort them out based on the naming patterns.
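To make the wide-to-long conversion concrete, here is a minimal sketch with reshape(). The data frame is made up for illustration: only the first mother (aged 30, boys born in 2000 and 2005) comes from the description, while the second mother, the new column name birth_order, and the coding of b4 as 1 = boy, 2 = girl are assumptions.

```r
# Hypothetical wide data: one row per mother.
# Assumed coding for b4: 1 = boy, 2 = girl.
wide <- data.frame(
  caseid = c(1, 2),
  v012   = c(30, 25),
  b2_01  = c(2000, 2003),
  b2_02  = c(2005, NA),
  b2_03  = c(NA, NA),
  b4_01  = c(1, 2),
  b4_02  = c(1, NA),
  b4_03  = c(NA, NA)
)

# Wide to long: names are split at "_" into a variable name (b2, b4)
# and a number (01, 02, 03), which ends up in the timevar column.
long <- reshape(
  wide,
  varying   = c("b2_01", "b2_02", "b2_03", "b4_01", "b4_02", "b4_03"),
  timevar   = "birth_order",   # hypothetical name for the new column
  idvar     = "caseid",
  direction = "long",
  sep       = "_"
)

# Drop the empty slots (births that have not happened) and sort.
long <- long[!is.na(long$b2), ]
long <- long[order(long$caseid, long$birth_order), ]
print(long)

# Other naming patterns only change sep.
dot   <- data.frame(id = 1:2, x.1 = c(5, 6), x.2 = c(7, 8))
nosep <- data.frame(id = 1:2, x1  = c(5, 6), x2  = c(7, 8))
long_dot   <- reshape(dot,   varying = c("x.1", "x.2"), idvar = "id",
                      direction = "long", sep = ".")
long_nosep <- reshape(nosep, varying = c("x1", "x2"),   idvar = "id",
                      direction = "long", sep = "")
```

The result has one row per child, with the columns caseid, v012, birth_order, b2 (year of birth) and b4 (sex). Filtering on missing b2 is one way to remove the unused birth slots; whether that is appropriate depends on how missing values are recorded in the real data.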
