# -*- coding: utf-8 -*-
from setuptools import setup

packages = \
['concise_concepts',
 'concise_concepts.conceptualizer',
 'concise_concepts.examples']

package_data = \
{'': ['*']}

install_requires = \
['gensim>=4,<5',
 'scipy>=1.7,<2.0',
 'sense2vec>=2.0.1,<3.0.0',
 'spacy>=3,<4',
 'spaczz>=0.5.4,<0.6.0']

setup_kwargs = {
    'name': 'concise-concepts',
    'version': '0.8.0',
    'description': 'This repository contains an easy and intuitive approach to few-shot NER using most similar expansion over spaCy embeddings. Now with entity confidence scores!',
    'long_description': '# Concise Concepts\nWhen wanting to apply NER to concise concepts, it is really easy to come up with examples, but pretty difficult to train an entire pipeline. Concise Concepts uses few-shot NER based on word embedding similarity to get you going\nwith easy! Now with entity scoring!\n\n\n[![Python package](https://github.com/Pandora-Intelligence/concise-concepts/actions/workflows/python-package.yml/badge.svg?branch=main)](https://github.com/Pandora-Intelligence/concise-concepts/actions/workflows/python-package.yml)\n[![Current Release Version](https://img.shields.io/github/release/pandora-intelligence/concise-concepts.svg?style=flat-square&logo=github)](https://github.com/pandora-intelligence/concise-concepts/releases)\n[![pypi Version](https://img.shields.io/pypi/v/concise-concepts.svg?style=flat-square&logo=pypi&logoColor=white)](https://pypi.org/project/concise-concepts/)\n[![PyPi downloads](https://static.pepy.tech/personalized-badge/concise-concepts?period=total&units=international_system&left_color=grey&right_color=orange&left_text=pip%20downloads)](https://pypi.org/project/concise-concepts/)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/ambv/black)\n\n\n## Usage\nThis library defines matching patterns based on the most similar words found in each group, which are used to fill a [spaCy EntityRuler](https://spacy.io/api/entityruler). To better understand the rule definition, I recommend playing around with the [spaCy Rule-based Matcher Explorer](https://demos.explosion.ai/matcher).\n\n### Tutorials\n- [TechVizTheDataScienceGuy](https://www.youtube.com/c/TechVizTheDataScienceGuy) created a [nice tutorial](https://prakhar-mishra.medium.com/few-shot-named-entity-recognition-in-natural-language-processing-92d31f0d1143) on how to use it.\n\n- [I](https://www.linkedin.com/in/david-berenstein-1bab11105/) created a [tutorial](https://www.rubrix.ml/blog/concise-concepts-rubrix/) in collaboration with Rubrix.\n\nThe section [Matching Pattern Rules](#matching-pattern-rules) expands on the construction, analysis and customization of these matching patterns.\n\n\n# Install\n\n```\npip install concise-concepts\n```\n\n# Quickstart\n\nTake a look at the [configuration section](#configuration) for more info.\n\n## Spacy Pipeline Component\n\nNote that, [custom embedding models](#custom-embedding-models) are passed via `model_path`.\n\n```python\nimport spacy\nfrom spacy import displacy\n\nimport concise_concepts\n\ndata = {\n    "fruit": ["apple", "pear", "orange"],\n    "vegetable": ["broccoli", "spinach", "tomato"],\n    "meat": ["beef", "pork", "fish", "lamb"],\n}\n\ntext = """\n    Heat the oil in a large pan and add the Onion, celery and carrots.\n    Then, cook over a medium–low heat for 10 minutes, or until softened.\n    Add the courgette, garlic, red peppers and oregano and cook for 2–3 minutes.\n    Later, add some oranges and chickens. """\n\nnlp = spacy.load("en_core_web_lg", disable=["ner"])\n\nnlp.add_pipe(\n    "concise_concepts",\n    config={\n        "data": data,\n        "ent_score": True,  # Entity Scoring section\n        "verbose": True,\n        "exclude_pos": ["VERB", "AUX"],\n        "exclude_dep": ["DOBJ", "PCOMP"],\n        "include_compound_words": False,\n        "json_path": "./fruitful_patterns.json",\n    },\n)\ndoc = nlp(text)\n\noptions = {\n    "colors": {"fruit": "darkorange", "vegetable": "limegreen", "meat": "salmon"},\n    "ents": ["fruit", "vegetable", "meat"],\n}\n\nents = doc.ents\nfor ent in ents:\n    new_label = f"{ent.label_} ({ent._.ent_score:.0%})"\n    options["colors"][new_label] = options["colors"].get(ent.label_.lower(), None)\n    options["ents"].append(new_label)\n    ent.label_ = new_label\ndoc.ents = ents\n\ndisplacy.render(doc, style="ent", options=options)\n```\n![](https://raw.githubusercontent.com/Pandora-Intelligence/concise-concepts/master/img/example.png)\n\n## Standalone\n\nThis might be useful when iterating over few_shot training data when not wanting to reload larger models continuously.\nNote that, [custom embedding models](#custom-embedding-models) are passed via `model`.\n\n```python\nimport gensim\nimport spacy\n\nfrom concise_concepts import Conceptualizer\n\nmodel = gensim.downloader.load("fasttext-wiki-news-subwords-300")\nnlp = spacy.load("en_core_web_sm")\ndata = {\n    "disease": ["cancer", "diabetes", "heart disease", "influenza", "pneumonia"],\n    "symptom": ["headache", "fever", "cough", "nausea", "vomiting", "diarrhea"],\n}\nconceptualizer = Conceptualizer(nlp, data, model)\nconceptualizer.nlp("I have a headache and a fever.").ents\n\ndata = {\n    "disease": ["cancer", "diabetes"],\n    "symptom": ["headache", "fever"],\n}\nconceptualizer = Conceptualizer(nlp, data, model)\nconceptualizer.nlp("I have a headache and a fever.").ents\n```\n\n# Configuration\n## Matching Pattern Rules\nA general introduction about the usage of matching patterns in the [usage section](#usage).\n### Customizing Matching Pattern Rules\nEven though the baseline parameters provide a decent result, the construction of these matching rules can be customized via the config passed to the spaCy pipeline.\n\n - `exclude_pos`: A list of POS tags to be excluded from the rule-based match.\n - `exclude_dep`: A list of dependencies to be excluded from the rule-based match.\n - `include_compound_words`:  If True, it will include compound words in the entity. For example, if the entity is "New York", it will also include "New York City" as an entity.\n - `case_sensitive`: Whether to match the case of the words in the text.\n\n\n### Analyze Matching Pattern Rules\nTo motivate actually looking at the data and support interpretability, the matching patterns that have been generated are stored as `./main_patterns.json`. This behavior can be changed by using the `json_path` variable via the config passed to the spaCy pipeline.\n\n\n## Fuzzy matching using `spaczz`\n\n - `fuzzy`: A boolean value that determines whether to use fuzzy matching\n\n```python\ndata = {\n    "fruit": ["apple", "pear", "orange"],\n    "vegetable": ["broccoli", "spinach", "tomato"],\n    "meat": ["beef", "pork", "fish", "lamb"]\n}\n\nnlp.add_pipe("concise_concepts", config={"data": data, "fuzzy": True})\n```\n\n## Most Similar Word Expansion\n\n- `topn`: Use a specific number of words to expand over.\n\n```python\ndata = {\n    "fruit": ["apple", "pear", "orange"],\n    "vegetable": ["broccoli", "spinach", "tomato"],\n    "meat": ["beef", "pork", "fish", "lamb"]\n}\n\ntopn = [50, 50, 150]\n\nassert len(topn) == len\n\nnlp.add_pipe("concise_concepts", config={"data": data, "topn": topn})\n```\n\n## Entity Scoring\n\n- `ent_score`: Use embedding based word similarity to score entities against their groups\n\n```python\nimport spacy\nimport concise_concepts\n\ndata = {\n    "ORG": ["Google", "Apple", "Amazon"],\n    "GPE": ["Netherlands", "France", "China"],\n}\n\ntext = """Sony was founded in Japan."""\n\nnlp = spacy.load("en_core_web_lg")\nnlp.add_pipe("concise_concepts", config={"data": data, "ent_score": True, "case_sensitive": True})\ndoc = nlp(text)\n\nprint([(ent.text, ent.label_, ent._.ent_score) for ent in doc.ents])\n# output\n#\n# [(\'Sony\', \'ORG\', 0.5207586), (\'Japan\', \'GPE\', 0.7371268)]\n```\n\n## Custom Embedding Models\n\n- `model_path`: Use `sense2vec.Sense2Vec`, `gensim.Word2vec` `gensim.FastText`, or `gensim.KeyedVectors` model from the [pre-trained gensim](https://radimrehurek.com/gensim/downloader.html) library or a custom model path.\n- `model`: within [standalone usage](#standalone), it is possible to pass these models directly.\n\n```python\ndata = {\n    "fruit": ["apple", "pear", "orange"],\n    "vegetable": ["broccoli", "spinach", "tomato"],\n    "meat": ["beef", "pork", "fish", "lamb"]\n}\n\n# model from https://radimrehurek.com/gensim/downloader.html or path to local file\nmodel_path = "glove-wiki-gigaword-300"\n\nnlp.add_pipe("concise_concepts", config={"data": data, "model_path": model_path})\n````\n',
    'author': 'David Berenstein',
    'author_email': 'david.m.berenstein@gmail.com',
    'maintainer': 'None',
    'maintainer_email': 'None',
    'url': 'https://github.com/pandora-intelligence/concise-concepts',
    'packages': packages,
    'package_data': package_data,
    'install_requires': install_requires,
    'python_requires': '>=3.8,<3.12',
}


setup(**setup_kwargs)
