Metadata-Version: 2.1
Name: cvmfs-server-scraper
Version: 0.0.1
Summary: Scrape metadata from CVMFS Stratum servers.
Home-page: https://github.com/eessi/cvmfs-server-scraper
Author: Terje Kvernes
Author-email: terje@kvernes.no
License: MIT
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.6.0
Description-Content-Type: text/markdown
License-File: LICENSE


# CVMFS server scraper and Prometheus exporter

This tool scrapes the public metadata sources from a set of stratum0 and stratum1 servers. It grabs:

    - cvmfs/info/v1/repositories.json 

Then, for every repository it finds and has not been told to ignore, it grabs:

    - cvmfs/<repo>/.cvmfs_status.json
    - cvmfs/<repo>/.cvmfspublished
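
As a rough illustration of the list above, the following sketch builds the URLs that would be fetched for one server. The helper `metadata_urls`, the `http://` scheme, and the repository name `repo.example.org` are assumptions made for this example; they are not part of the package's API.

````python
def metadata_urls(server, repos, ignore_repos=()):
    """Build the metadata URLs for one server (hypothetical helper)."""
    base = f"http://{server}/cvmfs"  # assumption: plain HTTP
    # The repository listing is fetched once per server.
    urls = [f"{base}/info/v1/repositories.json"]
    for repo in repos:
        if repo in ignore_repos:
            continue  # ignored repositories are skipped entirely
        # Two metadata files are fetched per repository.
        urls.append(f"{base}/{repo}/.cvmfs_status.json")
        urls.append(f"{base}/{repo}/.cvmfspublished")
    return urls


urls = metadata_urls(
    "aws-eu-west1.stratum1.cvmfs.eessi-infra.org",
    ["repo.example.org", "ci.eessi-hpc.org"],
    ignore_repos=["ci.eessi-hpc.org"],
)
for url in urls:
    print(url)
````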

# Usage

````python
#!/usr/bin/env python3

from cvmfsscraper.main import scrape, scrape_server

# server = scrape_server("aws-eu-west1.stratum1.cvmfs.eessi-infra.org")

servers = scrape(
    servers=[
        "aws-eu-west1.stratum1.cvmfs.eessi-infra.org",
        "bgo-no.stratum1.cvmfs.eessi-infra.org",
    ],
    ignore_repos=[
        "ci.eessi-hpc.org",
    ],
)

print(servers[0])

# f-strings format any attribute type, so no explicit str() calls are needed.
for repo in servers[0].repositories:
    print(f"Repo: {repo.name}")
    print(f"Root size: {repo.root_size}")
    print(f"Revision: {repo.revision}")
    print(f"Revision timestamp: {repo.revision_timestamp}")
    print(f"Last snapshot: {repo.last_snapshot}")
````

# Data structure

## Server 

A server object, representing a specific server that has been scraped.

````python
servers = scrape(...)
server_one = servers[0]
````

### Name
 
#### Type: Attribute

`server.name`

#### Returns

The name of the server, usually its fully qualified domain name.


### GeoApi status

#### Type: Attribute

`server.geoapi_status`

#### Returns

An integer value within `[0, 1, 2, 9]`, with the following meaning:

- 0 : OK
- 1 : GeoApi gives wrong location
- 2 : No response
- 9 : The server has no repository available so the GeoApi cannot be tested
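
For readable output, these codes can be translated into text. The sketch below is one way to do that; the names `GEOAPI_STATUS` and `describe_geoapi_status` are hypothetical and not provided by the package.

````python
# Descriptions copied from the list above; the names here are hypothetical.
GEOAPI_STATUS = {
    0: "OK",
    1: "GeoApi gives wrong location",
    2: "No response",
    9: "The server has no repository available so the GeoApi cannot be tested",
}


def describe_geoapi_status(code):
    """Return a human-readable description for a geoapi_status value."""
    return GEOAPI_STATUS.get(code, f"Unknown status: {code}")


print(describe_geoapi_status(0))
````

With a scraped server, this would be called as `describe_geoapi_status(server.geoapi_status)`.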

### Repositories 

#### Type: Attribute

`server.repositories`

#### Returns

A list of repository objects, empty if no repositories are scraped on the server.

### Ignored repositories

#### Type: Attribute

`server.ignored_repositories`

#### Returns

A list of repository names that the scraper is to ignore.

### Forced repositories

#### Type: Attribute

`server.forced_repositories`

#### Returns

A list of repository names that the scraper is forced to scrape on this server. If a repository name appears in both `ignored_repositories` and `forced_repositories`, it will be scraped.
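
The precedence rule can be sketched as follows. The function `should_scrape` is a hypothetical helper that only illustrates the rule described here; it is not the package's actual implementation.

````python
def should_scrape(name, ignored_repositories, forced_repositories):
    """Decide whether a repository is scraped (illustrative sketch only)."""
    # A forced repository is always scraped, even if it is also ignored.
    if name in forced_repositories:
        return True
    # Otherwise it is scraped unless it has been ignored.
    return name not in ignored_repositories


# "ci.eessi-hpc.org" is both ignored and forced, so forcing wins.
print(should_scrape("ci.eessi-hpc.org", ["ci.eessi-hpc.org"], ["ci.eessi-hpc.org"]))
````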

## Repository

A repository object, representing a single repository on a scraped server.

````python
servers = scrape(...)
repo_one = servers[0].repositories[0]
````

### Name

#### Type: Attribute

`repo_one.name`

#### Returns

The fully qualified name of the repository.

### Server

#### Type: Attribute

`repo_one.server`

#### Returns

The server object to which the repository belongs.


### Path

#### Type: Attribute

`repo_one.path`

#### Returns

The path for the repository on the server. May differ from the name. To get a complete URL, one can do:

`url = "http://" + repo_one.server.name + repo_one.path`

### Status attributes

These attributes are populated from `.cvmfs_status.json`:

| Attribute | Value |
| --- | --- |
| last_gc | Timestamp of last garbage collection |
| last_snapshot | Timestamp of the last snapshot |

Information from `.cvmfspublished` is also provided. For explanations of these keys, please see CVMFS' [official documentation](https://cvmfs.readthedocs.io/en/stable/cpt-details.html). The Field column in the table gives the corresponding key in `.cvmfspublished`.

| Attribute | Field |
| --- | --- |
| alternative_name | A |
| full_name | N |
| is_garbage_collectable | G |
| metadata_cryptographic_hash | M |
| micro_cataogues | L |
| reflog_checksum_cryptographic_hash | Y |
| revision_timestamp | T |
| root_catalogue_ttl | D |
| root_cryptographic_hash | C |
| root_size | B |
| root_path_hash | R |
| signature | The end signature blob |
| signing_certificate_cryptographic_hash | X |
| tag_history_cryptographic_hash | H |
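
To show how the table relates to the file, here is a minimal sketch that maps field keys to attribute names. It assumes that each line of `.cvmfspublished` starts with a one-letter key followed directly by its value, and that a line containing only `--` separates the fields from the signature. The function `parse_published`, the mapping name, and the sample text are hypothetical; values are kept as strings, and the signature is not handled.

````python
# Field keys and attribute names taken from the table above.
FIELD_TO_ATTRIBUTE = {
    "A": "alternative_name",
    "N": "full_name",
    "G": "is_garbage_collectable",
    "M": "metadata_cryptographic_hash",
    "L": "micro_cataogues",
    "Y": "reflog_checksum_cryptographic_hash",
    "T": "revision_timestamp",
    "D": "root_catalogue_ttl",
    "C": "root_cryptographic_hash",
    "B": "root_size",
    "R": "root_path_hash",
    "X": "signing_certificate_cryptographic_hash",
    "H": "tag_history_cryptographic_hash",
}


def parse_published(text):
    """Map the key/value lines of a manifest to attribute names (sketch)."""
    attributes = {}
    for line in text.splitlines():
        if line == "--":
            break  # assumption: everything after this is the signature
        if not line:
            continue
        key, value = line[0], line[1:]
        name = FIELD_TO_ATTRIBUTE.get(key)
        if name is not None:  # keys missing from the table are skipped
            attributes[name] = value
    return attributes


# Made-up sample content, not real repository data.
sample = "Cabc123\nB1024\nNrepo.example.org\nT1700000000\n--\nsignature"
parsed = parse_published(sample)
print(parsed["full_name"])
````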


