Merged
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,5 +1,7 @@
# project-specific
tmp/
test-download/
vault-token.dat

# Byte-compiled / optimized / DLL files
__pycache__/
56 changes: 41 additions & 15 deletions README.md
@@ -166,6 +166,10 @@ docker run --rm -v $(pwd):/data dbpedia/databus-python-client download $DOWNLOAD
  - If the dataset/files to be downloaded require vault authentication, you need to provide a vault token with `--vault-token /path/to/vault-token.dat`. See [Registration (Access Token)](#registration-access-token) for details on how to get a vault token.
- `--databus-key`
  - If the databus is protected and needs API key authentication, you can provide the API key with `--databus-key YOUR_API_KEY`.
- `--convert-to`
  - Enables on-the-fly compression format conversion during download. Supported formats: `bz2`, `gz`, `xz`. Downloaded files will be automatically decompressed and recompressed to the target format. Example: `--convert-to gz` converts all downloaded compressed files to gzip format.
- `--convert-from`
  - Optional filter to specify which source compression format should be converted. Use with `--convert-to` to convert only files with a specific compression format. Example: `--convert-to gz --convert-from bz2` converts only `.bz2` files to `.gz`, leaving other formats unchanged.
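
The conversion described for `--convert-to` and `--convert-from` amounts to streaming each file through a decompressor and then a compressor. The following is a minimal sketch using only Python's standard library, not the client's actual implementation. The names `OPENERS`, `detect_format`, and `convert_file` are hypothetical, and the sketch assumes that the format is recognized from the file suffix, that files already in the target format are skipped, and that the original file is removed after conversion.

```python
import bz2
import gzip
import lzma
import shutil
from pathlib import Path
from typing import Optional

# Hypothetical mapping from format name to the stdlib opener for that format.
OPENERS = {"bz2": bz2.open, "gz": gzip.open, "xz": lzma.open}


def detect_format(path: Path) -> Optional[str]:
    """Return 'bz2', 'gz' or 'xz' based on the file suffix, else None."""
    suffix = path.suffix.lstrip(".")
    return suffix if suffix in OPENERS else None


def convert_file(path: Path, convert_to: str,
                 convert_from: Optional[str] = None) -> Path:
    """Recompress `path` to `convert_to` and return the resulting path.

    Files that are uncompressed, already in the target format, or excluded
    by the `convert_from` filter are returned unchanged.
    """
    source_format = detect_format(path)
    if source_format is None or source_format == convert_to:
        return path
    if convert_from is not None and source_format != convert_from:
        return path

    # e.g. data.ttl.bz2 -> data.ttl.gz
    target = path.with_suffix("." + convert_to)
    # Stream in chunks so large dumps never have to fit in memory.
    with OPENERS[source_format](path, "rb") as src, \
            OPENERS[convert_to](target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()  # assumption: the original file is dropped after conversion
    return target
```

Under these assumptions, `convert_file(Path("data.ttl.bz2"), "gz", convert_from="bz2")` mirrors `--convert-to gz --convert-from bz2`, while passing `convert_from="xz"` would leave the same file untouched.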

**Help and further information on download command:**
```bash
@@ -178,23 +182,33 @@ docker run --rm -v $(pwd):/data dbpedia/databus-python-client download --help
 Usage: databusclient download [OPTIONS] DATABUSURIS...

   Download datasets from databus, optionally using vault access if vault
-  options are provided.
+  options are provided. Supports on-the-fly compression format conversion
+  using --convert-to and --convert-from options.

 Options:
-  --localdir TEXT     Local databus folder (if not given, databus folder
-                      structure is created in current working directory)
-  --databus TEXT      Databus URL (if not given, inferred from databusuri,
-                      e.g. https://databus.dbpedia.org/sparql)
-  --vault-token TEXT  Path to Vault refresh token file
-  --databus-key TEXT  Databus API key to download from protected databus
-  --all-versions      When downloading artifacts, download all versions
-                      instead of only the latest
-  --authurl TEXT      Keycloak token endpoint URL  [default:
-                      https://auth.dbpedia.org/realms/dbpedia/protocol/openid-
-                      connect/token]
-  --clientid TEXT     Client ID for token exchange  [default: vault-token-
-                      exchange]
-  --help              Show this message and exit.
+  --localdir TEXT             Local databus folder (if not given, databus
+                              folder structure is created in current working
+                              directory)
+  --databus TEXT              Databus URL (if not given, inferred from
+                              databusuri, e.g.
+                              https://databus.dbpedia.org/sparql)
+  --vault-token TEXT          Path to Vault refresh token file
+  --databus-key TEXT          Databus API key to download from protected
+                              databus
+  --all-versions              When downloading artifacts, download all
+                              versions instead of only the latest
+  --authurl TEXT              Keycloak token endpoint URL  [default:
+                              https://auth.dbpedia.org/realms/dbpedia/protocol
+                              /openid-connect/token]
+  --clientid TEXT             Client ID for token exchange  [default: vault-
+                              token-exchange]
+  --convert-to [bz2|gz|xz]    Target compression format for on-the-fly
+                              conversion during download (supported: bz2, gz,
+                              xz)
+  --convert-from [bz2|gz|xz]  Source compression format to convert from
+                              (optional filter). Only files with this
+                              compression will be converted.
+  --help                      Show this message and exit.
```

#### Examples of using the download command
@@ -247,6 +261,18 @@ databusclient download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHER
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHERE { ?sub dcat:downloadURL ?x . } LIMIT 10' --databus https://databus.dbpedia.org/sparql
```

**Download with Compression Conversion**: download files and convert them to a different compression format on the fly
```bash
# Convert all compressed files to gzip format
databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01 --convert-to gz

# Convert only bz2 files to xz format, leaving other compression formats unchanged
databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals --convert-to xz --convert-from bz2

# Download a collection and unify all files to bz2 format
databusclient download https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 --convert-to bz2
```

<a id="cli-deploy"></a>
### Deploy
