Metadata-Version: 1.0
Name: EasyFrames
Version: 0.1.2
Summary: Classes and methods for easily executing Stata-like commands on pandas dataframes.
Home-page: http://pypi.python.org/pypi/EasyFrames/
Author: Shafique Jamal
Author-email: shafique.jamal@gmail.com
License: LICENSE.txt
Description: # EasyFrames
        
        ## Summary
        
        This package makes it easier to perform some basic operations using a Pandas dataframe. For example, suppose you have the following dataset:
        
        ```
            age       educ fridge  has_car  hh  house_rooms  id  male     prov  weighthh
        0   44  secondary    yes        1   1            3   1     1       BC         2
        1   43   bachelor    yes        1   1            3   2     0       BC         2
        2   13    primary    yes        1   1            3   3     1       BC         2
        3   70     higher     no        1   2            2   1     1  Alberta         3
        4   23   bachelor    yes        0   3            1   1     1       BC         2
        5   20  secondary    yes        0   3            1   2     0       BC         2
        6   37     higher     no        1   4            3   1     1  Alberta         3
        7   35     higher     no        1   4            3   2     0  Alberta         3
        8    8    primary     no        1   4            3   3     0  Alberta         3
        9   15    primary     no        1   4            3   4     0  Alberta         3
        ```
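        
        To follow along, the sample dataset can be rebuilt by hand. The sketch below is not part of EasyFrames; it only assumes pandas is installed and copies the values from the table above into a dataframe named `df`, the name the later snippets use.
        
        ```python
        import pandas as pd
        
        # Values copied column by column from the sample table above.
        df = pd.DataFrame({
            'age':         [44, 43, 13, 70, 23, 20, 37, 35, 8, 15],
            'educ':        ['secondary', 'bachelor', 'primary', 'higher', 'bachelor',
                            'secondary', 'higher', 'higher', 'primary', 'primary'],
            'fridge':      ['yes', 'yes', 'yes', 'no', 'yes', 'yes', 'no', 'no', 'no', 'no'],
            'has_car':     [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
            'hh':          [1, 1, 1, 2, 3, 3, 4, 4, 4, 4],
            'house_rooms': [3, 3, 3, 2, 1, 1, 3, 3, 3, 3],
            'id':          [1, 2, 3, 1, 1, 2, 1, 2, 3, 4],
            'male':        [1, 0, 1, 1, 1, 0, 1, 0, 0, 0],
            'prov':        ['BC', 'BC', 'BC', 'Alberta', 'BC', 'BC',
                            'Alberta', 'Alberta', 'Alberta', 'Alberta'],
            'weighthh':    [2, 2, 2, 3, 2, 2, 3, 3, 3, 3],
        })
        
        print(df)
        ```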
        
        If you are using Stata, and you want to add a column with the household size, the command is simple:
        
        `egen hhsize = count(id), by(hh)`
        
        If you are using Pandas and have the dataset loaded as df, you might have to do something like:
        
        ```
        import pandas as pd
        
        # Count the rows in each household, then merge the counts back onto every row
        result = df.groupby('hh')['hh'].agg(['count'])
        result.rename(columns={'count': 'hhsize'}, inplace=True)
        merged = pd.merge(df, result, left_on='hh', right_index=True, how='left')
        ```
        
        Using this package, the command would be:
        
        ```
        from easyframes.easyframes import hhkit
        
        myhhkit = hhkit()
        df = myhhkit.egen(df, operation='count', groupby='hh', col='hh', column_label='hhsize')
        ```
        
        and Bob's your uncle:
        
        ```
           id  hh fridge  age  male  house_rooms  has_car  weighthh     prov       educ  hhsize
        0   1   1    yes   44     1            3        1         2       BC  secondary       3
        1   2   1    yes   43     0            3        1         2       BC   bachelor       3
        2   3   1    yes   13     1            3        1         2       BC    primary       3
        3   1   2     no   70     1            2        1         3  Alberta     higher       1
        4   1   3    yes   23     1            1        0         2       BC   bachelor       2
        5   2   3    yes   20     0            1        0         2       BC  secondary       2
        6   1   4     no   37     1            3        1         3  Alberta     higher       4
        7   2   4     no   35     0            3        1         3  Alberta     higher       4
        8   3   4     no    8     0            3        1         3  Alberta    primary       4
        9   4   4     no   15     0            3        1         3  Alberta    primary       4
        ```
        
        OK, so that doesn't save much typing or space. But suppose you also want the average age in each household. You would simply add:
        ```
        df = myhhkit.egen(df, operation='mean', groupby='hh', col='age', column_label='mean age in hh')
        ```
        and the result:
        ```
           id  hh fridge  age  male  house_rooms  has_car  weighthh     prov       educ  hhsize  mean age in hh
        0   1   1    yes   44     1            3        1         2       BC  secondary       3       33.333333
        1   2   1    yes   43     0            3        1         2       BC   bachelor       3       33.333333
        2   3   1    yes   13     1            3        1         2       BC    primary       3       33.333333
        3   1   2     no   70     1            2        1         3  Alberta     higher       1       70.000000
        4   1   3    yes   23     1            1        0         2       BC   bachelor       2       21.500000
        5   2   3    yes   20     0            1        0         2       BC  secondary       2       21.500000
        6   1   4     no   37     1            3        1         3  Alberta     higher       4       23.750000
        7   2   4     no   35     0            3        1         3  Alberta     higher       4       23.750000
        8   3   4     no    8     0            3        1         3  Alberta    primary       4       23.750000
        9   4   4     no   15     0            3        1         3  Alberta    primary       4       23.750000
        ```
        
        You can also include or exclude certain rows. For example, suppose we want the household size to count only members over the age of 22:
        ```
        df = myhhkit.egen(df, operation='count', groupby='hh', col='hh', column_label='hhs_o22', include=df['age']>22)
        ```
        The result:
        ```
           id  hh fridge  age  male  house_rooms  has_car  weighthh     prov       educ  hhs_o22
        0   1   1    yes   44     1            3        1         2       BC  secondary        2
        1   2   1    yes   43     0            3        1         2       BC   bachelor        2
        2   3   1    yes   13     1            3        1         2       BC    primary        2
        3   1   2     no   70     1            2        1         3  Alberta     higher        1
        4   1   3    yes   23     1            1        0         2       BC   bachelor        1
        5   2   3    yes   20     0            1        0         2       BC  secondary        1
        6   1   4     no   37     1            3        1         3  Alberta     higher        2
        7   2   4     no   35     0            3        1         3  Alberta     higher        2
        8   3   4     no    8     0            3        1         3  Alberta    primary        2
        9   4   4     no   15     0            3        1         3  Alberta    primary        2
        ```
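        
        For comparison, the same conditional count can be sketched in plain pandas. This illustrates the idea only and is not how EasyFrames implements `include`; it uses just the `hh` and `age` columns from the sample data.
        
        ```python
        import pandas as pd
        
        # Only the two columns needed for this illustration.
        df = pd.DataFrame({
            'hh':  [1, 1, 1, 2, 3, 3, 4, 4, 4, 4],
            'age': [44, 43, 13, 70, 23, 20, 37, 35, 8, 15],
        })
        
        # Boolean mask playing the role of the include= argument.
        over_22 = df['age'] > 22
        
        # Summing the mask (as 0/1) within each household counts the rows where it
        # is True; transform() writes that count back to every row of the household.
        df['hhs_o22'] = over_22.astype(int).groupby(df['hh']).transform('sum')
        
        print(df)
        ```
        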
        You can also exclude members over 22 years of age:
        ```
        # Members over 22 are left out, so the label reflects "22 or under"
        df = myhhkit.egen(df, operation='count', groupby='hh', col='hh', column_label='hhs_le22',
                          exclude=df['age']>22)
        ```
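        
        No output is shown for this call, so here is a plain-pandas sketch of a complementary count. It assumes `exclude` simply drops the rows where the mask is true, which this README implies but does not state. `count_where` is a hypothetical helper written for this example, not an EasyFrames function.
        
        ```python
        import pandas as pd
        
        df = pd.DataFrame({
            'hh':  [1, 1, 1, 2, 3, 3, 4, 4, 4, 4],
            'age': [44, 43, 13, 70, 23, 20, 37, 35, 8, 15],
        })
        
        def count_where(frame, mask):
            """Hypothetical helper: per-household count of rows where mask is True."""
            return mask.astype(int).groupby(frame['hh']).transform('sum')
        
        over_22 = df['age'] > 22
        
        included = count_where(df, over_22)    # members over 22
        excluded = count_where(df, ~over_22)   # members aged 22 or under
        
        # The two complementary counts add up to the full household size.
        hhsize = df.groupby('hh')['hh'].transform('count')
        print((included + excluded).equals(hhsize))
        ```
        
        In this sketch household 2, whose only member is 70, gets a count of 0. Whether EasyFrames reports 0 or a missing value in that case is not shown here.
        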
        If you don't specify the column label, then a default is constructed:
        ```
        df = myhhkit.egen(df, operation='mean', groupby='hh', col='age')
        ```
        ```
           id  hh fridge  age  male  house_rooms  has_car  weighthh     prov       educ  (mean) age by hh
        0   1   1    yes   44     1            3        1         2       BC  secondary         33.333333
        1   2   1    yes   43     0            3        1         2       BC   bachelor         33.333333
        2   3   1    yes   13     1            3        1         2       BC    primary         33.333333
        3   1   2     no   70     1            2        1         3  Alberta     higher         70.000000
        4   1   3    yes   23     1            1        0         2       BC   bachelor         21.500000
        5   2   3    yes   20     0            1        0         2       BC  secondary         21.500000
        6   1   4     no   37     1            3        1         3  Alberta     higher         23.750000
        7   2   4     no   35     0            3        1         3  Alberta     higher         23.750000
        8   3   4     no    8     0            3        1         3  Alberta    primary         23.750000
        9   4   4     no   15     0            3        1         3  Alberta    primary         23.750000
        ```
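        
        Judging from this one example, the default label follows the pattern `(operation) col by groupby`. The sketch below shows how such a default could be built in plain pandas. `egen_sketch` is a hypothetical stand-in written for illustration; it is not the EasyFrames implementation and ignores `include` and `exclude`.
        
        ```python
        import pandas as pd
        
        def egen_sketch(df, operation, groupby, col, column_label=None):
            """Hypothetical egen-style helper; not the EasyFrames code."""
            if column_label is None:
                # Assumed default format, inferred from '(mean) age by hh' above.
                column_label = '({}) {} by {}'.format(operation, col, groupby)
            out = df.copy()
            # transform() broadcasts each group's statistic back to its rows.
            out[column_label] = df.groupby(groupby)[col].transform(operation)
            return out
        
        df = pd.DataFrame({
            'hh':  [1, 1, 1, 2, 3, 3, 4, 4, 4, 4],
            'age': [44, 43, 13, 70, 23, 20, 37, 35, 8, 15],
        })
        
        df = egen_sketch(df, operation='mean', groupby='hh', col='age')
        print(df)
        ```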
Platform: UNKNOWN
