Metadata-Version: 2.1
Name: pytorch-warmup
Version: 0.1.0
Summary: A PyTorch Extension for Learning Rate Warmup
Home-page: https://github.com/Tony-Y/pytorch_warmup
Author: Takenori Yamamoto
Author-email: yamamoto.takenory@gmail.com
License: UNKNOWN
Description: # A PyTorch Extension for Learning Rate Warmup
        
        This library contains PyTorch implementations of the warmup schedules described in [On the adequacy of untuned warmup for adaptive optimization](https://arxiv.org/abs/1910.04209).
        
        <p align="center"><img src="https://github.com/Tony-Y/pytorch_warmup/raw/master/examples/plots/figs/warmup_schedule.png" alt="Warmup schedule" width="400"/></p>
        
        ![Python package](https://github.com/Tony-Y/pytorch_warmup/workflows/Python%20package/badge.svg)
        [![PyPI version shields.io](https://img.shields.io/pypi/v/pytorch-warmup.svg)](https://pypi.python.org/pypi/pytorch-warmup/)
        [![PyPI license](https://img.shields.io/pypi/l/pytorch-warmup.svg)](https://pypi.python.org/pypi/pytorch-warmup/)
        [![PyPI pyversions](https://img.shields.io/pypi/pyversions/pytorch-warmup.svg)](https://pypi.python.org/pypi/pytorch-warmup/)
        
        ## Installation
        
        Make sure you have Python 3.7+ and PyTorch 1.1+. Then, run the following command:
        
        ```
        python setup.py install
        ```
        
        or
        
        ```
        pip install -U pytorch-warmup
        ```
        
        ## Usage
        
        ### Sample Code
        
        The scheduled learning rate is dampened by multiplying it by the warmup factor:
        
        <p align="center"><img src="https://github.com/Tony-Y/pytorch_warmup/raw/master/examples/emnist/figs/learning_rate.png" alt="Learning rate" width="400"/></p>
        
        #### Approach 1
        [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Tony-Y/colab-notebooks/blob/master/PyTorch_Warmup_Approach1_chaining.ipynb)
        
        When the learning rate schedule uses the global iteration number, the untuned linear warmup can be used as follows:
        
        ```python
        import torch
        import pytorch_warmup as warmup
        
        optimizer = torch.optim.AdamW(params, lr=0.001, betas=(0.9, 0.999), weight_decay=0.01)
        num_steps = len(dataloader) * num_epochs
        lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_steps)
        warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
        for epoch in range(1, num_epochs+1):
            for batch in dataloader:
                optimizer.zero_grad()
                loss = ...
                loss.backward()
                optimizer.step()
                with warmup_scheduler.dampening():
                    lr_scheduler.step()
        ```
        
        If you want to use learning rate scheduler "chaining," which is supported in PyTorch 1.4.0 and later, simply place the schedulers' `step()` calls in the suite of the `with` statement:
        ```python
        lr_scheduler1 = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
        lr_scheduler2 = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
        warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
        for epoch in range(1, num_epochs+1):
            for batch in dataloader:
                ...
                optimizer.step()
                with warmup_scheduler.dampening():
                    lr_scheduler1.step()
                    lr_scheduler2.step()
        ```
        
        #### Approach 2
        [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Tony-Y/colab-notebooks/blob/master/PyTorch_Warmup_Approach2_chaining.ipynb)
        
        When the learning rate schedule uses the epoch number, the warmup schedule can be used as follows:
        
        ```python
        lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[num_epochs//3], gamma=0.1)
        warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
        for epoch in range(1, num_epochs+1):
            for i, batch in enumerate(dataloader):
                optimizer.zero_grad()
                loss = ...
                loss.backward()
                optimizer.step()
                if i < len(dataloader)-1:
                    # Dampen the lr every iteration except the last one of the epoch,
                    # which is handled together with the epoch-wise scheduler step below.
                    with warmup_scheduler.dampening():
                        pass
            with warmup_scheduler.dampening():
                lr_scheduler.step()
        ```
        
        ### Warmup Schedules
        
        #### Manual Warmup
        
        For `LinearWarmup` and `ExponentialWarmup`, the warmup factor `w(t)` depends on the warmup period, which must be specified manually.
        
        ##### Linear
        
        `w(t) = min(1, t / warmup_period)`
        
        ```python
        warmup_scheduler = warmup.LinearWarmup(optimizer, warmup_period=2000)
        ```
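        
        As a quick sanity check, the factor can be evaluated by hand (a standalone sketch of the formula above, not part of the library API):
        
        ```python
        def linear_warmup_factor(t, warmup_period):
            # w(t) = min(1, t / warmup_period): ramps up linearly, then stays at 1.
            return min(1.0, t / warmup_period)
        
        # Halfway through a 2000-step warmup period, the factor is 0.5,
        # so the effective lr is half the scheduled value.
        print(linear_warmup_factor(1000, 2000))  # 0.5
        print(linear_warmup_factor(3000, 2000))  # 1.0 (warmup finished)
        ```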
        
        ##### Exponential
        
        `w(t) = 1 - exp(-t / warmup_period)`
        
        ```python
        warmup_scheduler = warmup.ExponentialWarmup(optimizer, warmup_period=1000)
        ```
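        
        Unlike the linear schedule, the exponential factor approaches 1 asymptotically rather than reaching it exactly; at `t = warmup_period` it is about 63% of the full value. A standalone sketch of the formula:
        
        ```python
        import math
        
        def exponential_warmup_factor(t, warmup_period):
            # w(t) = 1 - exp(-t / warmup_period)
            return 1.0 - math.exp(-t / warmup_period)
        
        print(round(exponential_warmup_factor(1000, 1000), 3))  # 0.632, i.e. 1 - 1/e
        print(exponential_warmup_factor(5000, 1000) > 0.99)     # True: nearly full lr after 5 periods
        ```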
        
        #### Untuned Warmup
        
        For `UntunedLinearWarmup` and `UntunedExponentialWarmup`, the warmup period is determined automatically from Adam's `beta2` parameter.
        
        ##### Linear
        
        `warmup_period = 2 / (1 - beta2)`
        
        ```python
        warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
        ```
        
        ##### Exponential
        
        `warmup_period = 1 / (1 - beta2)`
        
        ```python
        warmup_scheduler = warmup.UntunedExponentialWarmup(optimizer)
        ```
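        
        With Adam's default `beta2 = 0.999`, the formulas above give warmup periods of about 2000 and 1000 steps, respectively. A sketch of the arithmetic (the library itself reads `beta2` from the optimizer's parameter groups, so these values need not be computed by hand):
        
        ```python
        beta2 = 0.999  # Adam's default second-moment decay rate
        
        linear_period = 2 / (1 - beta2)       # ~2000 steps for UntunedLinearWarmup
        exponential_period = 1 / (1 - beta2)  # ~1000 steps for UntunedExponentialWarmup
        
        print(round(linear_period), round(exponential_period))  # 2000 1000
        ```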
        
        #### RAdam Warmup
        
        For `RAdamWarmup`, the warmup factor is derived from Adam's `beta2` parameter, following the rectification term of [RAdam](https://arxiv.org/abs/1908.03265); please see that paper for the details.
        
        ```python
        warmup_scheduler = warmup.RAdamWarmup(optimizer)
        ```
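        
        For reference, the rectification term from the RAdam paper can be sketched as below. This is a simplified standalone version for illustration; the library's actual implementation may differ in details such as step offsets:
        
        ```python
        import math
        
        def radam_warmup_factor(t, beta2=0.999):
            # rho_inf: maximum length of the approximated SMA (simple moving average).
            rho_inf = 2.0 / (1.0 - beta2) - 1.0
            # rho_t: SMA length at step t.
            rho_t = rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)
            if rho_t <= 4.0:
                # Variance of the adaptive term is intractable early on.
                return 0.0
            # Rectification term r_t from the RAdam paper.
            return math.sqrt(
                (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
                / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
            )
        ```
        
        The factor is zero for the first few steps and approaches 1 as training proceeds, which is what produces the warmup behavior.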
        
        ### Apex's Adam
        
        The Apex library provides [FusedAdam](https://nvidia.github.io/apex/optimizers.html#apex.optimizers.FusedAdam), an Adam optimizer tuned for CUDA devices. FusedAdam can be used together with the warmup schedulers. For example:
        
        [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Tony-Y/colab-notebooks/blob/master/PyTorch_Warmup_FusedAdam.ipynb)
        
        ```python
        optimizer = apex.optimizers.FusedAdam(params, lr=0.001, betas=(0.9, 0.999), weight_decay=0.01)
        lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_steps)
        warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
        ```
        
        
        ## License
        
        MIT License
        
        Copyright (c) 2019 Takenori Yamamoto
        
Platform: UNKNOWN
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Operating System :: OS Independent
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=3.7
Description-Content-Type: text/markdown
