Metadata-Version: 1.2
Name: data_patterns
Version: 0.1.14
Summary: Package for generating and evaluating patterns in quantitative reports
Home-page: https://github.com/DeNederlandscheBank/data-patterns
Author: De Nederlandsche Bank
Author-email: ECDB_berichten@dnb.nl
License: MIT/X license
Description: =============
        data-patterns
        =============
        
        
        .. image:: https://img.shields.io/pypi/v/data_patterns.svg
                :target: https://pypi.python.org/pypi/data_patterns
                :alt: Pypi Version
        .. image:: https://img.shields.io/travis/DeNederlandscheBank/data-patterns.svg
                :target: https://travis-ci.org/DeNederlandscheBank/data-patterns
                :alt: Build Status
        .. image:: https://readthedocs.org/projects/data-patterns/badge/?version=latest
                :target: https://data-patterns.readthedocs.io/en/latest/?badge=latest
                :alt: Documentation Status
        .. image:: https://img.shields.io/badge/License-MIT/X-blue.svg
                :target: https://github.com/DeNederlandscheBank/data-patterns/blob/master/LICENSE
                :alt: License
        
        Package for generating and evaluating data-patterns in quantitative reports
        
        * Free software: MIT/X license
        * Documentation: https://data-patterns.readthedocs.io.
        
        
        Features
        --------
        
        Here is what the package does:
        
        - Generating and evaluating patterns in structured datasets and exporting to Excel and JSON
        - Transforming generated patterns into XBRL validation rules and Pandas code
        - Evaluating reporting data with data quality rules published by De Nederlandsche Bank (to be provided)
        
        Quick overview
        --------------
        
        To install the package
        
        ::
        
            pip install data_patterns
            
        
        To introduce the features of the this package define the following Pandas DataFrame::
        
            df = pd.DataFrame(columns = ['Name',       'Type',             'Assets', 'TV-life', 'TV-nonlife' , 'Own funds', 'Excess'],
                              data   = [['Insurer  1', 'life insurer',     1000,     800,       0,             200,         200], 
                                        ['Insurer  2', 'non-life insurer', 4000,     0,         3200,          800,         800], 
                                        ['Insurer  3', 'non-life insurer', 800,      0,         700,           100,         100],
                                        ['Insurer  4', 'life insurer',     2500,     1800,      0,             700,         700], 
                                        ['Insurer  5', 'non-life insurer', 2100,     0,         2200,          200,         200], 
                                        ['Insurer  6', 'life insurer',     9000,     8800,      0,             200,         200],
                                        ['Insurer  7', 'life insurer',     9000,     0,         8800,          200,         200],
                                        ['Insurer  8', 'life insurer',     9000,     8800,      0,             200,         200],
                                        ['Insurer  9', 'non-life insurer', 9000,     0,         8800,          200,         200],
                                        ['Insurer 10', 'non-life insurer', 9000,     0,         8800,          200,         199.99]])
            df.set_index('Name', inplace = True)
        
        Start by defining a PatternMiner::
        
            miner = data_patterns.PatternMiner(df)
        
        To generate patterns use the find-function of this object::
        
            df_patterns = miner.find({'name'      : 'equal values', 
                                      'pattern'   : '=',
                                      'parameters': {"min_confidence": 0.5,
                                                     "min_support"   : 2}})
        
        The result is a DataFrame with the patterns that were found. The first part of the DataFrame now contains
        
        +----+--------------+------------+--------------+------------+--------+-----------+----------+
        | id |pattern_id    |P columns   |relation type |Q columns   |support |exceptions |confidence|
        +====+==============+============+==============+============+========+===========+==========+
        |  0 |equal values  |[Own funds] |=             |[Excess]    |9       |1          |0.9       |
        +----+--------------+------------+--------------+------------+--------+-----------+----------+
        |  1 |equal values  |[Excess]    |=             |[Own funds] |9       |1          |0.9       | 
        +----+--------------+------------+--------------+------------+--------+-----------+----------+
        
        The miner finds two patterns; the first states that the 'Own funds'-column is identical to the 'Excess'-column in 9 of the 10 cases (with a confidence of 90 %, there is one case where the equal-pattern does not hold), and the second pattern is identical to the first but with the columns reversed.
        
        We can also find patterns using expressions::
        
            df_patterns = miner.find({'name'      : 'equal values', 
                                      'expression'   : '{.*.*}={.*.*}',
                                      'parameters': {"min_confidence": 0.5,
                                                     "min_support"   : 2}})
                                                     
        This will give the same result.
        
        Expressions can be written as followed:
        
        1. Put it in a structure like above
        2. Columns are given with '{}', example: '{Assests} > 0'
        3. If you want to find matches with columns you can do '{.*.*}' (this will match all columns), example: '{.*TV.*} > 0' (will match TV-life and TV-nonlife)
        4. Conditional statements go with IF, THEN together with & and | (and/or), example: 'IF ({.*TV-life.*} = 0) THEN ({.*TV-nonlife.*} = 8800) & {.*As.*} > 0)' Note: AND is only used when you want the reverse of this statement, such as 'IF ({.*TV-life.*} = 0) THEN ({.*TV-nonlife.*} = 8800) & {.*As.*} > 0) AND IF ({.*TV-life.*} = 0) THEN ~({.*TV-nonlife.*} = 8800) & {.*As.*} > 0)'
        5. Use "@" if you do not have a specific value, example: 'IF ({.*Ty.*} = "@") THEN ({.*As.*} = "@")'
        
        To analyze data with the generated set of data-patterns use the analyze function with the dataframe with the data as input::
        
            df_results = miner.analyze(df)
        
        The result is a DataFrame with the results. If we select ``result_type = False`` then the first part of the output contains
        
        +-----------+--------------+-------------+------------+-------------+------------+---------+---------+
        |index      |result_type   |pattern_id   |P columns   |relation type|Q columns   |P values |Q values |
        +-----------+--------------+-------------+------------+-------------+------------+---------+---------+
        |Insurer 10 |False         |equal values |[Own funds] |=            |[Excess]    |[200]    |[199.99] |
        +-----------+--------------+-------------+------------+-------------+------------+---------+---------+
        |Insurer 10 |False         |equal values |[Excess]    |=            |[Own funds] |[199.99] |[200]    |
        +-----------+--------------+-------------+------------+-------------+------------+---------+---------+
        
        Other patterns you can use are '>', '<', '<=', '>=', '!=', 'sum', and '-->'. 
        
        Read the documentation for more features.
        
        
        
        =======
        History
        =======
        
        0.1.0 (2019-10-27)
        ------------------
        
        * Development release.
        
        0.1.11 (2019-11-6)
        ------------------
        
        * First release on PyPI.
        
Keywords: data_patterns
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.0, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*
