ReshapeWide KeepItems property in SDTL

9 posts / 0 new
Last post
ReshapeWide KeepItems property in SDTL

Dear all,

I am working on mapping transformations done within an ETL platform called Pentaho Data Intergration and one of the tranformations I would like to map is the Pentaho denormalizer. It is equivalent to ReshapeWide in SDTL. I have identified my IdVar and IndexVar but I have not been able to work out the "KeepItems", what sort of items go into this array?

Thanks!

ReshapeWide

Hi Chifundo,

ReshapeWide is a real problem.  As you know, the number of variables created by ReshapeWide depends upon the data.  Since C2Metadata is limited to only working with metadata files, we don't have a way to determine which variables will be created by a ReshapeWide command.  So, we have set it aside.

However, I would be willing to work with you to come up with a viable ReshapeWide schema for cases where the number of new variables is known.

Please take a look at the SDTL changes that are described in this document, which don't appear in the SDTL COGS yet:

https://docs.google.com/spreadsheets/d/1BBpcygqalNAb-35_P077wXHGt3SVYQ6D...

The proposals include a new approach to the  ReshapeLong schema, and the addition of ReshapeItemDescription, CompositeVariableName, and IteratorDescription, which may be helpful in creating a workable ReshapeWide schema.

Let's talk about it.

         George

ReshapeWide KeepItems property in SDTL

Hi Chifundo,

Below is a way that we can describe a ReshapeWide in SDTL.  The example is similar to an example in the Stata manual.  This would involve the addition of a couple of properties to the ReshapeItemDescription item in SDTL.  However, this example gives explicit values for the number of new variables created -- see the NumberRangeExpression.  For this to work, you would need to know what values the index variable (i.e., year) takes in the dataset.  For categorical variables this information could be in the DDI already.  The DDI may also have minimum and maximum values, but it would probably not show gaps.  

What do you think?

        George

 

 

Stata command:
reshape wide inc ue, i(region id) j(year)
 

SDTL:
{"command": "ReshapeWide",
    "MakeItems": [
            "ReshapeItemDescription":
                {"$type": "ReshapeItemDescription",
                "SourceVariableName": "inc",
                "Stub": "inc",
                "IndexValueList":
                {"$type":"NumberRangeExpression",
                        "From": "80",
                        "To": "83",
                        "By": "1"}
                    },
            "ReshapeItemDescription":
                {"$type": "ReshapeItemDescription",
                "SourceVariableName": "ue",
                "Stub": "ue",
                "IndexValueList":
                {"$type":"NumberRangeExpression",
                        "From": "80",
                        "To": "83",
                        "By": "1"}
                    }
            ],
    "IDVariables": [
                {"$type": "VariableSymbolExpression","VariableName": "region"},
                {"$type": "VariableSymbolExpression","VariableName": "id"}
                ]
        }
      

 

Thanks George.

Thanks George.

I have added my anonymised data, both long and wide formats here: https://docs.google.com/spreadsheets/d/1KLM2L0nhcl6HNUN5GUCDIJovhJ5bmELd...

I do understand the problem that one runs into with reshape wide if they do not have the data being manipulated. We are working on potentially repeatable data transformations across a network of research sites, the data will be available to the individual developing the documentation so the solution that you are providing here works for us.

I think DDI codebook provides frequency tables for a categorical index variable. these frequency tables will not tell which observations have which category so, i guess not sufficient for the reshape to work properly?

Here is my attempt at creating the SDTL for my ReshapeWide:

{"command": "ReshapeWide",

      "MakeItems": [

         "ReshapeItemDescription":

          {"$type": "ReshapeItemDescription",

         "SourceVariableName": "Records",

         "Stub": "",

         "IndexValueList":

         {"$type":"StringListExpression",

           "From": "illegal",

           "To": "legal",

           "By": "1"}

       }

   ],

"IDVariables": [

     {"$type": "VariableSymbolExpression","VariableName": "Centre"},

     {"$type": "VariableSymbolExpression","VariableName": "MetricTable"},

     {"$type": "VariableSymbolExpression","VariableName": "QMetric"}

     ]

}

Did I get the SDTL right?

 

Many thanks!

 

Chifundo

I am probably not using the

I am probably not using the stringlistexpression correctly, not sure how to specify the elements of the list...

ReshapeWide KeepItems property in SDTL

This looks right, but I think that the StringListExpression should look like this:  

{"$type":"StringListExpression",
    "values":[ "illegal", "legal"]
    }

We also have a StringRangeExpression, but it expects and alphabetic progression.  So, listing the values is better here.

I'll add this to SDTL COGS, but with a not about reservations on using it.

ReshapeWide KeepItems property in SDTL

Also, we need this to identify the variable that is the source of the index:

"IndexVariableName":"Status"

You can see revised examples here:

https://gitlab.com/altergc/sdtl-cogs/blob/master/CompositeTypes/ReshapeW...

 

I added data to example 2...

Great! many thanks. I will update the SDTL for my reshape wide with the StringListExpression and IndexVariableName as you have indicated in your last response.

I created a fork from here: https://gitlab.com/altergc/sdtl-cogs/blob/master/CompositeTypes/ReshapeW...

I then added the data for example 2 and the Stata reshape wide line to help users to understand. I submitted a merge request for your review to see if my additions are necessary

I added data to example 2...

Thanks!

Please let me know if you discover any problems as you implement this.

Log in or register to post comments