Merged
Changes from 3 commits
1 change: 1 addition & 0 deletions .gitignore
@@ -1,5 +1,6 @@
# project-specific
tmp/
test-download/
vault-token.dat

# Byte-compiled / optimized / DLL files
56 changes: 41 additions & 15 deletions README.md
@@ -174,6 +174,10 @@ docker run --rm -v $(pwd):/data dbpedia/databus-python-client download $DOWNLOAD
Note: Vault tokens are only required for certain protected Databus hosts (for example: `data.dbpedia.io`, `data.dev.dbpedia.link`). The client now detects those hosts and will fail early with a clear message if a token is required but not provided. Do not pass `--vault-token` for public downloads.
- `--databus-key`
- If the databus is protected and needs API key authentication, you can provide the API key with `--databus-key YOUR_API_KEY`.
- `--convert-to`
- Enables on-the-fly compression format conversion during download. Supported formats: `bz2`, `gz`, `xz`. Downloaded files will be automatically decompressed and recompressed to the target format. Example: `--convert-to gz` converts all downloaded compressed files to gzip format.
- `--convert-from`
- Optional filter to specify which source compression format should be converted. Use with `--convert-to` to convert only files with a specific compression format. Example: `--convert-to gz --convert-from bz2` converts only `.bz2` files to `.gz`, leaving other formats unchanged.
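
The decompress-and-recompress step described for `--convert-to` can be pictured with the Python standard library alone. This is a minimal sketch, not the client's actual implementation: the `OPENERS` mapping and the `recompress` helper are hypothetical names introduced here for illustration.

```python
import bz2
import gzip
import lzma
import shutil

# Hypothetical mapping from the format names accepted by --convert-to and
# --convert-from to standard-library openers; the client's internals may differ.
OPENERS = {"bz2": bz2.open, "gz": gzip.open, "xz": lzma.open}


def recompress(src_path, dst_path, source_format, target_format):
    """Decompress src_path and write it to dst_path in the target format."""
    with OPENERS[source_format](src_path, "rb") as src, \
            OPENERS[target_format](dst_path, "wb") as dst:
        # copyfileobj streams in chunks, so the whole file is never held in memory
        shutil.copyfileobj(src, dst)
```

For example, `recompress("labels.ttl.bz2", "labels.ttl.gz", "bz2", "gz")` would produce a gzip file with the same decompressed content.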

**Help and further information on download command:**
```bash
@@ -186,23 +190,33 @@ docker run --rm -v $(pwd):/data dbpedia/databus-python-client download --help
Usage: databusclient download [OPTIONS] DATABUSURIS...

Download datasets from databus, optionally using vault access if vault
options are provided.
options are provided. Supports on-the-fly compression format conversion
using --convert-to and --convert-from options.

Options:
--localdir TEXT Local databus folder (if not given, databus folder
structure is created in current working directory)
--databus TEXT Databus URL (if not given, inferred from databusuri,
e.g. https://databus.dbpedia.org/sparql)
--vault-token TEXT Path to Vault refresh token file
--databus-key TEXT Databus API key to download from protected databus
--all-versions When downloading artifacts, download all versions
instead of only the latest
--authurl TEXT Keycloak token endpoint URL [default:
https://auth.dbpedia.org/realms/dbpedia/protocol/openid-
connect/token]
--clientid TEXT Client ID for token exchange [default: vault-token-
exchange]
--help Show this message and exit.
--localdir TEXT Local databus folder (if not given, databus
folder structure is created in current working
directory)
--databus TEXT Databus URL (if not given, inferred from
databusuri, e.g.
https://databus.dbpedia.org/sparql)
--vault-token TEXT Path to Vault refresh token file
--databus-key TEXT Databus API key to download from protected
databus
--all-versions When downloading artifacts, download all
versions instead of only the latest
--authurl TEXT Keycloak token endpoint URL [default:
https://auth.dbpedia.org/realms/dbpedia/protocol
/openid-connect/token]
--clientid TEXT Client ID for token exchange [default: vault-
token-exchange]
--convert-to [bz2|gz|xz] Target compression format for on-the-fly
conversion during download (supported: bz2, gz,
xz)
--convert-from [bz2|gz|xz] Source compression format to convert from
(optional filter). Only files with this
compression will be converted.
--help Show this message and exit.
```

#### Examples of using the download command
@@ -255,6 +269,18 @@ databusclient download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHER
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHERE { ?sub dcat:downloadURL ?x . } LIMIT 10' --databus https://databus.dbpedia.org/sparql
```

**Download with Compression Conversion**: download files and convert them to a different compression format on the fly
```bash
# Convert all compressed files to gzip format
databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01 --convert-to gz

# Convert only bz2 files to xz format, leaving other compressions unchanged
databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals --convert-to xz --convert-from bz2

# Download a collection and unify all files to bz2 format
databusclient download https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 --convert-to bz2
```
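
How `--convert-to` and `--convert-from` combine in the examples above can be summarized as a small decision function. The sketch below is illustrative only: `plan_conversion` is a hypothetical helper, and it assumes that the source format is detected from the file extension and that files already in the target format are left alone. Neither assumption is confirmed by the client's documentation.

```python
from pathlib import Path

SUPPORTED = ("bz2", "gz", "xz")  # formats listed for --convert-to / --convert-from


def plan_conversion(filename, convert_to, convert_from=None):
    """Return the converted file name, or None if the file stays unchanged."""
    suffix = Path(filename).suffix.lstrip(".")
    if convert_to is None or suffix not in SUPPORTED:
        return None  # no conversion requested, or not a supported compressed file
    if convert_from is not None and suffix != convert_from:
        return None  # filtered out by --convert-from
    if suffix == convert_to:
        return None  # assumed: already in the target format, nothing to do
    return str(Path(filename).with_suffix("." + convert_to))


print(plan_conversion("labels.ttl.bz2", "xz", "bz2"))  # labels.ttl.xz
print(plan_conversion("labels.ttl.gz", "xz", "bz2"))   # None
```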

<a id="cli-deploy"></a>
### Deploy
