Metadata-Version: 2.1
Name: comb-py
Version: 0.0.1a0
Summary: Configure experiments
Home-page: https://github.com/stes/comb
Author: Steffen Schneider
Author-email: stes@hey.com
License: UNKNOWN
Project-URL: Bug Tracker, https://github.com/stes/comb/issues
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Utilities
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: install_requires
License-File: LICENSE

![](./logo.svg)

A minimal package for configuring and exploring parameter `comb`inations of machine learning experiments.

`comb` is a small and simplistic library. `comb` is able to

- quickly and cleanly define grid searches and random searches, and handle parameter dependencies
- add configuration options to existing projects. Existing modules and packages can be turned into a registry, enabling python classes to become accessible and searchable via a string name.


For more complicated workflows, configuration packages like [hydra](https://github.com/facebookresearch/hydra) are better suited.

## Installation

Install `comb-py` from [pypi](https://pypi.org/project/comb-py/):

```python
$ pip install comb-py
```

and import it via

```python
import comb
```

`comb` does not have any dependencies beyond the python standard library. It works for `python>=3.7`.

## Defining an experiment (`comb.sweep`)

In your project, create sweeps directly in python ─ create the following file under `sweep/my_first_experiment.py`

```python
from comb import sweep
from comb.types import zipped, grid

@sweep.register("example-sweep")
class MyExperiment(sweep.Sweep):

    @property
    def script(self):
        return "my/exp.py"

    def get_random_args(self):
        return dict(
            # define a method to sample arguments from
            foo = np.random.choice([42, 73])
        )

    def get_fixed_args(self):
        return dict(
            # zip N dependen arguments together
            bar = zipped("hello", "check out"),
            baz = zipped("world", "comb"),
            # define a search grid (1x2 combinations)
            # over two parameters
            blubb  = grid("star"),
            blubb2 = grid("wars", "treck"),
        )
```

and generate a joblist using

```bash
$ python -m comb example-sweep
exp.py --bar hello --baz world --blubb star --blubb2 wars --foo 73
exp.py --bar hello --baz world --blubb star --blubb2 treck --foo 73
exp.py --bar check out --baz comb --blubb star --blubb2 wars --foo 73
exp.py --bar check out --baz comb --blubb star --blubb2 treck --foo 73
```

## Parametrizing an experiment (`comb.registry`)

`comb` makes it very easy to reference design choices within your experiment by names. Suppose you wanted to add a few datasets and loss functions to a machine learning experiment.

Turn your python module packages or packages into registries by a simple call to `comb.registry.add_helper_functions`:

``` python
# datasets.py
from comb import registry
registry.add_helper_functions(__name__)

@register("mnist")
class MNIST(): pass 

@register("svhn")
class SVHN(): pass 


# loss_functions.py
from comb import registry
registry.add_helper_functions(__name__)

@register("mse")
class MeanSquaredError(): pass

@register("infonce")
class InfoNCE(): pass
```

Afterwards, you can easily list and instantiate your functions:

``` python
>>> import datasets
>>> datasets.get_options("*")
mnist svhn
>>> datasets.init("mnist")
MNIST()
```

## Scheduling experiments

`comb` does not attempt to provide ways to actually launch these experiments ─ there are plenty tools better suited for this.
To name a few suggestions, the following workflows are possible:

### Using GNU parallel

Scheduling a maximum of 2 consecutive jobs via [GNU parallel](https://www.gnu.org/software/parallel/) (similar results can be achived via e.g. `xargs`):

``` bash
$ python -m comb bash-example || exit 1 | parallel --jobs 2 'echo Scheduling job {#}; eval {}'
```

### Using SLURM

Scheduling a job array via slurm:

``` bash
mkdir -p submitted
python -m comb bash-example > jobs.lst
num_jobs=$(wc -l jobs.lst)
jobid=$(sbatch -a 1-${num_jobs} --wrap 'job=$(sed -n ${SLURM_ARRAY_TASK_ID}p jobs.lst); srun ${job}')
mv jobs.lst submitted/{jobid}.lst 
```

## License

`comb` is released under an MIT License.

