# -*- coding: utf-8 -*-
from setuptools import setup

packages = \
['top_github_scraper']

package_data = \
{'': ['*']}

install_requires = \
['bs4>=0.0.1,<0.0.2',
 'ipython>=7.21.0,<8.0.0',
 'pandas>=1.2.2,<2.0.0',
 'python-dotenv>=0.15.0,<0.16.0',
 'requests>=2.25.1,<3.0.0',
 'rich>=9.12.0,<10.0.0',
 'tqdm>=4.58.0,<5.0.0']

setup_kwargs = {
    'name': 'top-github-scraper',
    'version': '0.1.1',
    'description': 'Scrape top GitHub repositories and users based on keyword',
    'long_description': '# Top Github Users Scraper\n\nScrape top Github repositories and users based on keywords. \n\nI used this tool to analyze the top 1k machine learning users in [this article](https://towardsdatascience.com/i-scraped-more-than-1k-top-machine-learning-github-profiles-and-this-is-what-i-found-1ab4fb0c0474?sk=68156d6b1c05614d356645728fe02584).\n\n![demo](https://github.com/khuyentran1401/top-github-scraper/blob/master/figures/demo.gif?raw=True)\n\n## Setup\n\n**Installation**\n```bash\npip install top-github-scraper\n```\n**Add Credentials**\n\nTo make sure you can scrape many repositories and users, add your GitHub\'s credentials to `.env` file.\n```bash\ntouch .env\n```\nAdd your username and [token](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token) to `.env` file:\n```bash\nGITHUB_USERNAME=yourusername\nGITHUB_TOKEN=yourtoken\n```\n## Usage\n### Get Top Github Repositories\' URLs\n```python\nfrom top_github_scraper import get_top_repo_urls\n\nget_top_repo_urls(keyword="machine learning", stop_page=20)\n```\n\nOutput at `top_repo_urls_<keyword>_<sort_by>_<start_page>_<end_page>.json`:\n```python\n[\n    "/josephmisiti/awesome-machine-learning",\n    "/wepe/MachineLearning",\n    "/udacity/machine-learning",\n    "/Jack-Cherish/Machine-Learning",\n    "/ZuzooVn/machine-learning-for-software-engineers",\n    "/rasbt/python-machine-learning-book",\n    "/lawlite19/MachineLearning_Python",\n    "/lazyprogrammer/machine_learning_examples",\n    "/trekhleb/homemade-machine-learning",\n    "/ujjwalkarn/Machine-Learning-Tutorials"\n]\n```\n\n### Get Top Github Repositories\' Information\n```python\nfrom top_github_scraper import get_top_repos\n\nget_top_repos("machine learning", stop_page=20)\n```\nOutput for 1 repository at `top_repo_info_<keyword>_<sort_by>_<start_page>_<end_page>.json` :\n```python\n{\n        "stargazers_count": 48620,\n        "forks_count": 12155,\n        "contributors": {\n            "login": [\n                "josephmisiti",\n                "josephmmisiti",\n                "hslatman",\n                "0asa",\n                "ajkl",\n                "ipcenas",\n                "cogmission",\n                "spekulatius",\n                "basickarl",\n                "NathanEpstein"\n            ],\n            "url": [\n                "https://api.github.com/users/josephmisiti",\n                "https://api.github.com/users/josephmmisiti",\n                "https://api.github.com/users/hslatman",\n                "https://api.github.com/users/0asa",\n                "https://api.github.com/users/ajkl",\n                "https://api.github.com/users/ipcenas",\n                "https://api.github.com/users/cogmission",\n                "https://api.github.com/users/spekulatius",\n                "https://api.github.com/users/basickarl",\n                "https://api.github.com/users/NathanEpstein"\n            ],\n            "contributions": [\n                671,\n                105,\n                21,\n                12,\n                11,\n                9,\n                8,\n                7,\n                7,\n                7\n            ]\n        }\n    }\n```\n\n### Get Top Github Contributors\' Profiles\n```python\nfrom top_github_scraper import get_top_contributors\n\nget_top_contributors("machine learning", stop_page=20)\n```\nOutput at `top_contributor_info_<keyword>_<sort_by>_<start_page>_<end_page>.csv`:\n\n|| login | url | type | name | company | location | email | hireable | bio | public_repos | public_gists | followers |following\n| ------------- |:-------------:|:-------------:| :-----:| :-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|\n| 0 | josephmisiti | https://api.github.com/users/josephmisiti | User | Joseph Misiti | Math & Pencil |"Brooklyn, NY"|  | True | Mathematician & Co-founder of Math & Pencil|229|142|2705|275\n1|josephmmisiti|https://api.github.com/users/josephmmisiti|User|||||||0|0|2|0\n2|hslatman|https://api.github.com/users/hslatman|User|Herman Slatman|DistributIT|||||133|20|469|67\n3|0asa|https://api.github.com/users/0asa|User|Vincent Botta| | Belgium|||"Innovation Engineer @evs-broadcast, previously Data Scientist @kensuio, E-Marketing Tools Manager @Diagenode, cofounder @Antibody-Adviser and photographer"|35|15|25|16\n4|ajkl|https://api.github.com/users/ajkl|User|Ajinkya Kale|||kaleajinkya@gmail.com|||58|1|29|4\n5|ipcenas|https://api.github.com/users/ipcenas|User|||||||79|0|1|0\n6|cogmission|https://api.github.com/users/cogmission|User|David Ray||Third planet from the sun...|cognitionmission@gmail.com||Humanity\'s freedom and abundance through the pursuit of technological innovation in the area of cognitive applications - Cognition Mission|30|19|54|44\n7|spekulatius|https://api.github.com/users/spekulatius|User|Peter Thaleikis|@bringyourownideas |127.0.0.1||True|Software engineer focused on solutions using open source and simply filling in the gaps to fulfill the requirements.|42|1|232|920\n8|basickarl|https://api.github.com/users/basickarl|User|Karl Morrison||"Malmö, Sweden"|karl@basickarl.io||The question is: Will you take me seriously|5|1|12|6\n9|NathanEpstein|https://api.github.com/users/NathanEpstein|User|Nathan Epstein||"New York, NY"|nathanepst@gmail.com|True||23|12|208|0\n\n### Get Top Github Users\' Profiles\n```python\nfrom top_github_scraper import get_top_users\n\nget_top_users("machine learning", stop_page=20)\n```\nOutput at `top_user_info_<keyword>_<start_page>_<end_page>.csv`\n\n|| login | url | type | name | company | location | email | hireable | bio | public_repos | public_gists | followers |following\n| ------------- |:-------------:|:-------------:| :-----:| :-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|\n0|rasbt|https://api.github.com/users/rasbt|User|Sebastian Raschka|UW-Madison|"Madison, WI"|||"Machine Learning researcher & open source contributor. Author of ""Python Machine Learning."" Asst. Prof. of Statistics @ UW-Madison."|71|5|13888|35\n1|tqchen|https://api.github.com/users/tqchen|User|Tianqi Chen|"CMU, OctoML"||||Large scale Machine Learning|28|1|8611|126\n2|halfrost|https://api.github.com/users/halfrost|User|halfrost|@Alibaba | Shanghai China|i@halfrost.com||💪天道酬勤，勤能补拙。博观而约取，厚积而薄发。Gopher / Rustacean / iOS Dev. / Machine Learning / Retired acmer / Math / Philosophy / Technical Writer.|22|0|8566|314\n3|ageron|https://api.github.com/users/ageron|User|Aurélien Geron||Paris|||Author of the book Hands-On Machine Learning with Scikit-Learn and TensorFlow. Former PM of YouTube video classification and founder & CTO of a telco operator.|43|16|8383|2\n4|chiphuyen|https://api.github.com/users/chiphuyen|User|Chip Huyen|https://snorkel.ai|"Mountain View, CA"||True|Developing tools and best practices for machine learning production.|19|1|7839|15\n5|rhiever|https://api.github.com/users/rhiever|User|Randy Olson|FOXO BioScience|"Vancouver, WA"|rso@randalolson.com||"Chief Data Scientist, @FOXOBioScience. AI, Machine Learning, and Data Visualization specialist. Community leader for /r/DataIsBeautiful."|77|17|5363|13\n6|lexfridman|https://api.github.com/users/lexfridman|User|Lex Fridman|MIT|"Cambridge, MA"|||"AI researcher working on autonomous vehicles, human-robot interaction, and machine learning at MIT and beyond."|2|0|5031|0\n7|eriklindernoren|https://api.github.com/users/eriklindernoren|User|Erik Linder-Norén||"Stockholm, Sweden"|eriklindernoren@gmail.com||"ML engineer at Apple. Excited about machine learning, basketball and building things."|24|0|3764|11\n8|roboticcam|https://api.github.com/users/roboticcam|User|A/Prof Richard Xu                 徐亦达教授|University of Technology Sydney|Sydney Australia|||"I am an A/Professor in Machine Learning at UTS. manage a large research team of postdoc, PhD students close to 30 people"|10|0|3561|0\n9|ogrisel|https://api.github.com/users/ogrisel|User|Olivier Grisel|Inria|"Paris, France"|olivier.grisel@ensta.org||Machine Learning Engineer a Inria Saclay (Parietal team).|174|93|3237|116\n\n### Parameters\n\n* **get_top_urls**\n    * `keyword` : str\n        Keyword to search for (.i.e, machine learning)\n    * `sort_by`: str \n        sort by best match or most stars, by default `\'\'`, which will sort by best match. \n        Use `\'stars\'` to sort by most stars.\n    * `save_path` : str, optional\n        where to save the output file, by default `"top_repo_urls"`\n    * `start_page` : int, optional\n        page number to start scraping from, by default `1`\n    * `stop_page` : int, optional\n        page number of the last page to scrape, by default `50`\n* **get_top_repos**\n    * `keyword` : str\n        Keyword to search for (.i.e, machine learning)\n    * `sort_by`: str \n        sort by best match or most stars, by default `\'\'`, which will sort by best match. \n        Use `\'stars\'` to sort by most stars.\n    * `max_n_top_contributors`: int\n        number of top contributors in each repository to scrape from, by default `10`\n    * `start_page` : int, optional\n        page number to start scraping from, by default `1`\n    * `stop_page` : int, optional\n        page number of the last page to scrape, by default `50`\n    * `url_save_path` : str, optional\n        where to save the output file of URLs, by default `"top_repo_urls"`\n    * `repo_save_path` : str, optional\n        where to save the output file of repositories\' information, by default `"top_repo_info"`\n* **get_top_users**\n    * `keyword` : str\n        Keyword to search for (.i.e, machine learning)\n    * `sort_by`: str \n        sort by best match or most stars, by default \'\', which will sort by best match. \n        Use \'stars\' to sort by most stars.\n    * `max_n_top_contributors`: int\n        number of top contributors in each repository to scrape from, by default `10`\n    * `start_page` : int, optional\n        page number to start scraping from, by default `1`\n    * `stop_page` : int, optional\n        page number of the last page to scrape, by default `50`\n    * `url_save_path` : str, optional\n        where to save the output file of URLs, by default `"top_repo_urls"`\n    * `repo_save_path` : str, optional\n        where to save the output file of repositories\' information, by default `"top_repo_info"`\n    * `user_save_path` : str, optional\n        where to save the output file of users\' profiles, by default `"top_contributor_info"`\n* **get_top_user_urls**\n    * `keyword` : str\n        Keyword to search for (.i.e, machine learning)\n    * `save_path` : str, optional\n        where to save the output file, by default `"top_repo_urls"`\n    * `start_page` : int, optional\n        page number to start scraping from, by default `1`\n    * `stop_page` : int, optional\n        page number of the last page to scrape, by default `50`\n## How the Data is Scraped\n\n`top-github-scraper` scrapes the owners as well as the contributors of the top repositories that pop up in the search when searching for a specific keyword on GitHub.\n\n![image](https://github.com/khuyentran1401/top-github-scraper/blob/master/figures/machine_learning_results.png?raw=True)\n\nFor each user, `top-github-scraper` scrapes 16 data points:\n* `login`: username\n* `url`: URL of the user\n* `contributions`: Number of contributions to the repository that the user is scraped from\n* `stargazers_count`: Number of stars of the repository that the user is scraped from\n* `forks_count`: Number of forks of the repository that the user is scraped from\n* `type`: Whether this account is a user or an organization\n* `name`: Name of the user\n* `company`: User\'s company\n* `location`: User\'s location\n* `email`: User\'s email\n* `hireable`: Whether the user is hireable\n* `bio`: Short description of the user\n* `public_repos`: Number of public repositories the user has (including forked repositories)\n* `public_gists`: Number of public repositories the user has (including forked gists)\n* `followers`: Number of followers the user has\n* `following`: Number of people the user is following\n\n',
    'author': 'khuyentran1401',
    'author_email': 'khuyentran1476@gmail.com',
    'maintainer': None,
    'maintainer_email': None,
    'url': 'https://github.com/khuyentran1401/top-github-scraper',
    'packages': packages,
    'package_data': package_data,
    'install_requires': install_requires,
    'python_requires': '>=3.7.1,<4.0',
}


setup(**setup_kwargs)
