Supported image formats #6712

Draft
gthvn1 wants to merge 5 commits into xapi-project:master from xcp-ng:gtn-image-formats

Conversation

@gthvn1
Contributor

@gthvn1 gthvn1 commented Oct 16, 2025

This PR implements the supported image format mechanism proposed in this design document: https://xapi-project.github.io/new-docs/design/sm-supported-image-formats/index.html

  • It adds supported image formats to the SM object if the SM plugin specifies it in its DRIVER_INFO.
  • When the information is available, you can select which image format to use as the destination during a VM or VDI migration.

This feature is particularly useful because XCP-ng is adding support for the Qcow2 format in SMAPI to allow VDIs larger than 2TB. So in the near future (we're currently releasing the beta version), some SRs will support multiple formats such as VHD, RAW, and Qcow2.

With this patch, it becomes possible to migrate a VM with VHD disks on one SR to another SR with Qcow2 disks. If an SM plugin does not provide information about the supported image formats, the behavior remains unchanged.

For more details see the specification.

@gthvn1 gthvn1 changed the title from "Gtn image formats" to "Supported image formats" Oct 16, 2025
Comment thread ocaml/idl/datamodel.ml Outdated
Comment thread ocaml/idl/datamodel.ml
Comment thread ocaml/tests/common/test_common.ml
Comment thread ocaml/xapi-cli-server/cli_operations.ml Outdated
Comment thread ocaml/xapi/xapi_vm_migrate.ml Outdated
@lindig
Contributor

lindig commented Oct 16, 2025

I would like to see the supported image formats documented and checked more explicitly. These appear to be mysterious strings, but users of neither the API nor the CLI should have to guess them.

@gthvn1
Contributor Author

gthvn1 commented Oct 17, 2025

These appear to be mysterious strings, but users of neither the API nor the CLI should have to guess them.

Yes, in my mind the user should first check using SM.get_all_records to get the list, and then, when doing a migration, pick a format from that list.

Comment thread ocaml/idl/datamodel.ml Outdated
Comment thread ocaml/xapi-cli-server/record_util.ml Outdated
Comment thread ocaml/xapi-cli-server/records.ml Outdated
Comment thread ocaml/xapi/xapi_vm_migrate.ml Outdated
Comment thread ocaml/idl/schematest.ml Outdated
Comment thread ocaml/idl/datamodel_vm.ml Outdated
Comment thread ocaml/idl/datamodel_vm.ml Outdated
Comment thread ocaml/xapi/xapi_vm_migrate.ml Outdated
Comment thread ocaml/xapi/xapi_vm_migrate.ml Outdated
@gthvn1 gthvn1 force-pushed the gtn-image-formats branch from 63a9fe1 to 173e37b Compare October 22, 2025 13:38
Comment thread ocaml/xapi/xapi_vm_migrate.ml Outdated
@gthvn1 gthvn1 force-pushed the gtn-image-formats branch 5 times, most recently from dbfc215 to ee5b3d6 Compare November 4, 2025 17:09
@lindig
Contributor

lindig commented Nov 12, 2025

@psafont could you take another look?

@psafont
Member

psafont commented Nov 12, 2025

There's an issue with VDI migration when qcow2 is involved. Guillaume is investigating whether the issue is in this PR or on SM's side.

@gthvn1
Contributor Author

gthvn1 commented Nov 12, 2025

It should be fixed by @AnthoineB. So I will run another series of manual tests covering all combinations (qcow2 -> vhd, vhd -> qcow2, vhd -> vhd and qcow2 -> qcow2). I will also try different kinds of SR (shared and not).

Comment thread ocaml/xapi/xapi_vm_migrate.ml Outdated
Comment thread ocaml/xapi/xapi_vm_migrate.ml Outdated
@gthvn1
Contributor Author

gthvn1 commented Nov 17, 2025

For information, I tried several VDI pool migrations on our XCP-ng with qcow2 enabled. I was able to migrate from one EXT SR to another, switching between formats (vhd -> qcow2 -> vhd). I also tried a wrong format (vhdx), which fails like this:

The server failed to handle your request, due to an internal error. The given message may give details useful for debugging the problem.
message: Generated_record_utils.Record_failure("Expected one of 'raw', 'vhd', 'qcow2', got vhdx")

I will run more tests using VM migration (I already tried a few migrations successfully).
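
The rejection quoted above can be illustrated with a minimal sketch of string-to-format validation. The type and function names are hypothetical and are not taken from the PR; only the three accepted strings and the wording of the message come from the quoted output. Lowercasing the input is an assumption, and the real error is additionally wrapped in `Generated_record_utils.Record_failure`.

```ocaml
(* Hypothetical names; only the accepted strings and the message wording
   are taken from the error output quoted in the thread. *)
type image_format = Raw | Vhd | Qcow2

(* Assumption: matching is case-insensitive. *)
let image_format_of_string s =
  match String.lowercase_ascii s with
  | "raw" -> Ok Raw
  | "vhd" -> Ok Vhd
  | "qcow2" -> Ok Qcow2
  | _ ->
      Error (Printf.sprintf "Expected one of 'raw', 'vhd', 'qcow2', got %s" s)

let () =
  match image_format_of_string "vhdx" with
  | Ok _ -> print_endline "accepted"
  | Error msg -> print_endline msg
```

Returning a `result` rather than raising keeps the decision about how to surface the error (CLI message, API failure) with the caller.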

@psafont
Member

psafont commented Dec 3, 2025

This looks good, but needs a rebase. I think Guillaume wanted to retest with the newest xcp-ng builds

@gthvn1
Contributor Author

gthvn1 commented Dec 4, 2025

Sure, I will rebase and rebuild it on top of our latest builds to run our CI with this modification.

@gthvn1 gthvn1 force-pushed the gtn-image-formats branch from 97b33f9 to 3151fa3 Compare January 7, 2026 09:08
@gthvn1
Contributor Author

gthvn1 commented Jan 7, 2026

As spotted by @psafont, the issue with the SMAPIv3 driver was that the plugin in the xapi-storage-script generator required a default parameter (I added a default one for xapi but not for the generator).
I have updated the PR accordingly and rebased it.
I'm currently running our storage CI, but a first test with the file-based SMAPIv3 driver shows that it is working. I should have results by the end of the day (the CI takes several hours).

@gthvn1
Contributor Author

gthvn1 commented Jan 8, 2026

For your information, I was able to run all the storage tests on our CI with no errors. Of course, our CI doesn't use the new field, but at least this validates that it works with current configurations.
To test the new parameter, I only do manual migrations.

Member

@psafont psafont left a comment

I thought this would be almost ready to go, but I've seen that there's a TODO, and the generator for smapiv3 plugins has not been modified, which is not what I was expecting. Can you expand on those? The latter may be left for later because it shouldn't break anything; it just doesn't add nice support for the feature in smapiv3. But I don't understand the implication in the mirror call.

Comment thread ocaml/idl/datamodel_vm.ml Outdated
Comment thread ocaml/idl/datamodel_common.ml Outdated
let schema_major_vsn = 5

-let schema_minor_vsn = 792
+let schema_minor_vsn = 793
Member

Suggested change
-let schema_minor_vsn = 793
+let schema_minor_vsn = 794

Contributor Author

Are you sure? Currently it is 792 right?

Comment thread ocaml/xapi-storage-script/main.ml Outdated
@gthvn1
Contributor Author

gthvn1 commented Jan 8, 2026

I thought this would be almost ready to go, but I've seen that there's a TODO, and the generator for smapiv3 plugins has not been modified, which is not what I was expecting. Can you expand on those?

Mmm strange because I see the default value that was missing previously [@default []]. Are you talking about something else? Because it was the issue I had with smapiv3 driver.

Oh damn, yes, I forgot about this TODO in xapi-storage-script. As we don't really use smapiv3, I only ensured that it doesn't break things, but I don't know the impact on mirroring either. I will check that...

@gthvn1 gthvn1 force-pushed the gtn-image-formats branch from 3151fa3 to d6757e8 Compare January 12, 2026 08:50
@gthvn1
Contributor Author

gthvn1 commented Jan 12, 2026

While checking for smapiv3 plugin, I looked at what has been run in our CI, and in fact we are only testing the creation and deletion of SRs, as well as VMs on the SR. It appears that we are not doing any migration tests because the drivers we are currently shipping do not support migration. I will look into how to proceed further.
Note that I replaced the TODO with a message in the log.

@psafont
Member

psafont commented Jan 12, 2026

Thanks for looking into this

@lindig
Contributor

lindig commented Feb 11, 2026

Is this still active? If not, could it be moved to draft?

@psafont psafont marked this pull request as draft February 11, 2026 10:03
@psafont
Member

psafont commented Feb 11, 2026

This still needs to be tested on smapiv3, which is currently not possible since there's no backend available for testing that supports migration.

@gthvn1
Contributor Author

gthvn1 commented Feb 24, 2026

I started adding some quick tests for storage migration. I can add more tests, or I can propose it independently of the supported image format, since the pool migration currently uses an extra parameter.

Comment thread ocaml/quicktest/quicktest_vm_migration.ml Outdated
@gthvn1 gthvn1 force-pushed the gtn-image-formats branch from a2c6a3c to 97d0fcd Compare April 16, 2026 08:21
@gthvn1
Contributor Author

gthvn1 commented Apr 16, 2026

  • Rebased on master
  • Removed supported_image_format from the query result in the plugin generator because, currently, even a default value forces the plugin to implement it. The generated code makes the field mandatory, and that breaks current SMAPIv3 plugins. We have proposed a patch to fix this: "Consider field with default option as optional" (mirage/ocaml-rpc#190)

As there are no SMAPIv3 plugins that support live migration, we are using an empty list for now, and it cannot be modified at this time.
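
The behaviour wanted from a field with a default can be sketched without ocaml-rpc: a plugin reply that omits the key decodes to the default instead of failing. This is only an illustration under stated assumptions. The `value` type is a hand-rolled stand-in and not the ocaml-rpc representation, and the key name `supported_image_formats` and the function names are hypothetical.

```ocaml
(* Hand-rolled stand-in for an RPC value; NOT the ocaml-rpc type. *)
type value = Str of string | Seq of value list | Dict of (string * value) list

(* Keep only the string elements of a sequence. *)
let strings_of = function
  | Seq vs -> List.filter_map (function Str s -> Some s | _ -> None) vs
  | _ -> []

(* Hypothetical decoder: a missing key falls back to the default (an empty
   list), so plugins that never heard of the field keep working. *)
let supported_image_formats_of query_result =
  match query_result with
  | Dict fields -> (
    match List.assoc_opt "supported_image_formats" fields with
    | Some v -> strings_of v
    | None -> []
  )
  | _ -> []

let () =
  let legacy_reply = Dict [("name", Str "legacy-plugin")] in
  Printf.printf "%d formats advertised\n"
    (List.length (supported_image_formats_of legacy_reply))
```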

@gthvn1 gthvn1 force-pushed the gtn-image-formats branch from 97d0fcd to 60e1b90 Compare April 16, 2026 08:38
@gthvn1
Contributor Author

gthvn1 commented Apr 16, 2026

  • Fix schema hash

)
)

let receive_start _ctx ~dbg:_ ~sr:_ ~vdi_info:_ ~id:_ ~similar:_ =
Member

Are you sure it's safe to modify calls like this? Could you check the reason there are 3 different receive_start calls? We might need newer ones. I think it might be related to backwards compatibility when doing RPUs

Contributor Author

Yes, I will check. But you are correct that there are different flavors for backwards compatibility.

Contributor Author

After testing migrations from a host with a supported image format to another host (in a different pool) that does not support it (in both directions) I didn’t encounter any issues.

As I understand it, receive_start is called on the source and prepares the mirroring. In the end, it sends a VDI.create XML-RPC call (and some others) to the destination, but I don’t see any issue with receive_start itself.

The process will fail if you explicitly try to use the image format, because the source checks whether the destination SR uses an SM type that supports that format. However, if the option is not set, an empty list is used and no checks are performed against the destination SR, so it should work.

That said, even if I think that it is safe to modify the call, I agree that it doesn’t really make sense for a deprecated API.

I’ll also check intra-pool migration, but as far as I remember, VDI migration behaves the same way.
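
The check described in this comment can be sketched roughly as follows. This is an illustration, not the PR's code: the function name and signature are hypothetical, the error wording mirrors the message reported elsewhere in this thread, and skipping the check when the SM advertises no formats is one reading of the notes in the thread rather than confirmed behaviour.

```ocaml
(* Hypothetical helper: [supported] stands for the formats advertised by the
   destination SR's SM type, [requested] for the optional destination format. *)
let check_image_format ~sm_name ~supported requested =
  match requested with
  | None -> Ok () (* option not set: nothing to check *)
  | Some _ when supported = [] -> Ok () (* assumption: no list, no check *)
  | Some fmt when List.mem fmt supported -> Ok ()
  | Some fmt ->
      let listed =
        String.concat "; " (List.map (Printf.sprintf "%S") supported)
      in
      Error
        (Printf.sprintf
           "Image format %s is not supported by %s where list is [%s]" fmt
           sm_name listed)

let () =
  match check_image_format ~sm_name:"XXX" ~supported:["raw"; "vhd"] (Some "qcow2") with
  | Ok () -> print_endline "ok"
  | Error msg -> print_endline msg
```

Because the check runs on the source before any data moves, a rejected format costs nothing more than an error message.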

Contributor Author

@gthvn1 gthvn1 Apr 22, 2026

So it looks like there is an issue if you want to do an RPU (but it is not related to receive_start). The scenario involves two hosts, A (master) and B:

  1. vdi pool migrate from A to B (ok)
  2. update A (so now A supports image format)
  3. reboot A
  4. we want to update B so we need to move the VM back to A but:
# xe vdi-pool-migrate uuid=4701fa33-c63d-49ff-baad-80e008dd6ab3 sr-uuid=21c7d26b-1ce7-04ea-1685-ba45e50c4191
You tried to call a method with the incorrect number of parameters. The fully-qualified method name that you used, and the number of received and expected parameters are returned.
method: VDI.pool_migrate
expected: 3
received: 4

There is an issue with the database because the new field is only available on the new host so B complains.

Apr 22 11:01:45 xcp-gtn-ip13 xapi: [ warn||326 HTTPS 10.1.38.12->:::80|host.request_backup D:d98b24f2a7cf|Xapi_database__Db_xml] no lifetime information about SM.supported_image_formats, ignoring
Apr 22 11:01:45 xcp-gtn-ip13 xapi: [ warn||326 HTTPS 10.1.38.12->:::80|host.request_backup D:d98b24f2a7cf|xmlrpc_client] stunnel pid: 347171 caught Db_exn.DBCache_NotFound("missing column", "SM", "supported_image_formats")
Apr 22 11:01:45 xcp-gtn-ip13 xapi: [error||326 :::80|dispatch:host.request_backup D:6671e841a40d|backtrace] host.request_backup D:d98b24f2a7cf failed with exception Db_exn.DBCache_NotFound("missing column", "SM", "supported_image_formats")
Apr 22 11:01:45 xcp-gtn-ip13 xapi: [error||326 :::80|dispatch:host.request_backup D:6671e841a40d|backtrace] Raised Db_exn.DBCache_NotFound("missing column", "SM", "supported_image_formats")
Apr 22 11:01:45 xcp-gtn-ip13 xapi: [error||326 :::80|dispatch:host.request_backup D:6671e841a40d|backtrace] 1/21 xapi Raised at file ocaml/database/schema.ml, line 190

I'm investigating that...

Contributor Author

@gthvn1 gthvn1 Apr 22, 2026

See the comment below for the explanation

Comment thread ocaml/xapi/message_forwarding.ml Outdated
gthvn1 added 5 commits April 22, 2026 16:02
When running `xe sm-list params=all` you will now have the info of
supported image formats if the SM plugin specified it in its DRIVER_INFO.
The field is called `supported-image-formats`. If the plugin doesn't
provide the info the field will be empty.

This patch modifies the datamodel and adds a new field to store this
information in the SM object.

Signed-off-by: Guillaume <guillaume.thouvenin@vates.tech>
This patch allows specifying the destination format for individual VDIs
mapped to a destination SR. It adds a new parameter to `VM.migrate_send`
and `VM.assert_can_migrate` API. It also adds a new parameter to XE CLI.
The format to specify the image format is `image-format:<source VDI
UUID>=<destination image format>`. If the given image format cannot be
validated, an error is returned.

It also adds a new parameter to `VDI.pool_migrate`. This new parameter
takes a string naming the destination format. This string is
used to check whether the destination SR supports the expected format. If
the check fails or cannot be performed due to missing information on the
destination SR, an error is returned.

Signed-off-by: Guillaume <guillaume.thouvenin@vates.tech>
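
The `image-format:<source VDI UUID>=<destination image format>` convention from the commit message above can be sketched as a small key/value filter. This is a minimal illustration under assumptions: the helper name is hypothetical, and the CLI parameters are assumed to arrive as already-split `(key, value)` pairs.

```ocaml
let prefix = "image-format:"

(* Hypothetical helper: collect (source VDI UUID, destination format) pairs
   from key/value parameters, ignoring keys without the prefix. *)
let vdi_format_map params =
  let plen = String.length prefix in
  List.filter_map
    (fun (key, value) ->
      if String.length key > plen && String.sub key 0 plen = prefix then
        Some (String.sub key plen (String.length key - plen), value)
      else
        None)
    params

let () =
  [("uuid", "<VM UUID>"); ("image-format:<source VDI UUID>", "vhd")]
  |> vdi_format_map
  |> List.iter (fun (vdi, fmt) -> Printf.printf "%s -> %s\n" vdi fmt)
```

Validating each extracted format string against the accepted values would be a separate step.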
Update VM.MigrateSend call to include new VdiFormatMap parameter.

Signed-off-by: Guillaume <guillaume.thouvenin@vates.tech>
A new field supported_image_format and new parameters have been added for:
  - VM.migrate_send
  - VM.assert_can_migrate
  - VDI.pool_migrate

Signed-off-by: Guillaume <guillaume.thouvenin@vates.tech>
Introduce a new quicktest covering local VDI migration between two
Storage Repositories (SRs).

Add a `migration_path` filter that injects a `(src, dst)` SR pair
derived from an `SR.srs` constraint. The filter selects a single
valid migration path (if available) and generates one test case.
If fewer than two compatible SRs exist, no test is produced.

The test:
- Creates a VDI on the source SR
- Attaches it to a temporary VM
- Calls VDI.pool_migrate to the destination SR
- Verifies that the VDI's SR has changed accordingly
- Cleans up safely, tracking ownership transfer when migration
  replaces the original VDI

Signed-off-by: Guillaume <guillaume.thouvenin@vates.tech>
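
The selection rule in the commit message above (one `(src, dst)` pair, or no test when fewer than two SRs qualify) can be sketched like this. It is a stand-in, not the quicktest code: SRs are modelled as plain strings, the input list is assumed to contain no duplicates, and `test_cases` is a hypothetical helper.

```ocaml
(* Stand-in for the filter: take the first two candidate SRs as the
   migration path, or None when fewer than two exist. *)
let migration_path srs =
  match srs with
  | src :: dst :: _ -> Some (src, dst)
  | _ -> None

(* Hypothetical helper: exactly one test case per valid path, none otherwise. *)
let test_cases srs =
  match migration_path srs with
  | Some (src, dst) -> [Printf.sprintf "migrate VDI from %s to %s" src dst]
  | None -> []

let () = List.iter print_endline (test_cases ["sr-a"; "sr-b"; "sr-c"])
```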
@gthvn1
Contributor Author

gthvn1 commented Apr 22, 2026

For information I'm currently running these scenarios:

  • Cross pool migration

    • Without using the option supported image format (and without QCow2 support)
      • migration from A (with patches) to B (no patches): xe vm-migrate ...
      • migration from B (no patches) to A (with patches): xe vm-migrate ...
      • migration from A (with patches) to B (with patches): xe vm-migrate ...
    • Using the option supported image format (+ QCow2 support)
      • migration from B VHD to A (currently it generates a QCow2 file): xe vm-migrate ...
      • migration from A QCow2 to B QCow2: xe vm-migrate ...
      • migration from B QCow2 to A VHD: xe vm-migrate ...image-format:<VDIUUID>=vhd
      • migration from A VHD to B RAW: xe vm-migrate ...image-format:<VDIUUID>=raw
        • Failed (see below the error)
      • migration from A Qcow2 to B RAW: xe vm-migrate ...image-format:<VDIUUID>=raw
        • Failed (same as above)
      • migration from B RAW to A Qcow2: xe vm-migrate ...image-format:<VDIUUID>=qcow2
  • Intra pool migration (simulate RPU)

    • Without using the option supported image format (no qcow2 support)
      • migration from A1 (no patches) to A2 (no patches): "evacuate" master (xe vdi-pool-migrate because I don't have shared SR so not real evacuate)
      • migration from A2 (no patches) to A1 (with patches): xe vdi-pool-migrate ...
      • migration from A1 (with patches) to A2 (with patches): xe vdi-pool-migrate ...
    • Using the option supported image format (+ QCow2 support)
      • migration from A2 VHD to A1 (currently it generates a QCow2 file): xe vdi-pool-migrate ...
      • migration from A1 QCow2 to A2 QCow2: xe vdi-pool-migrate ...
      • migration from A2 QCow2 to A1 VHD: xe vdi-pool-migrate ... image-format:<VDI_UUID>=vhd
      • migration from A1 VHD to A2 Qcow2: xe vdi-pool-migrate ... image-format:<VDI_UUID>=qcow2
      • migration from A2 QCow2 to A1 RAW: xe vdi-pool-migrate ... image-format:<VDI_UUID>=raw
        • Failed (same as above)
  • quicktest (newly added in this PR):

    • smapiv1 migration
    • smapiv3 migration: cannot be tested because we don't have any plugin that supports migration
  • Expected failures (check is done before migration):

    • passing a wrong parameter to vdi-pool-migrate: message: Storage_error ([S(Internal_error);S(Storage_error ([S(Migration_preparation_failure);S(Storage_error ([S(Backend_error);[S(SR_BACKEND_FAILURE_66);[S();S(Invalid VDI type [opterr=Invalid VDI type None]);S()]]]))]))])
    • passing a wrong parameter to vm-migrate
    • passing a parameter that is not in the list of supported_image_format
      • vdi pool migrate: Got vdi: Image format qcow2 is not supported by XXX where list is ["raw"; "vhd"]
      • vm migrate: Got vdi: Image format qcow2 is not supported by XXX where list is ["raw"; "vhd"]
  • NOTES:

    • By default, the supported image format list of an SM is empty. In this case there is no check before migration
    • By "QCow2 support" I mean installing xcp-ng qcow2
    • "with patches" means a host with this PR applied
    • Tests are in this order because it allows running them consecutively
  • FAILURES:

    • Cross-pool and intra-pool migration from qcow2 to raw.
      • Note: this is not related to this PR since it also happens with qcow2 from xcp-ng-testing.
# xe vdi-pool-migrate uuid=882cedbd-e8ef-4c5d-9a57-b99b91516b49  sr-uuid=d835d019-1b97-0080-1bc0-eb52542db3f5 dest-img-format=raw
The server failed to handle your request, due to an internal error. The given message may give details useful for debugging the problem.
message: Storage_error ([S(Internal_error);S(Storage_error ([S(Migration_preparation_failure);S(Storage_error ([S(Unimplemented);S(VDI.snapshot)]))]))])

@gthvn1 gthvn1 force-pushed the gtn-image-formats branch from 60e1b90 to d3f45c6 Compare April 22, 2026 18:54
"Migrate a VDI to a specified SR, while the VDI is attached to a \
running guest."
"Migrate a VDI to a specified SR, while it is attached to a running \
guest. You can specify the image format for the destination."
Contributor Author

Add which formats are allowed: RAW, VHD and QCow2

Comment thread ocaml/idl/datamodel.ml
; (Map (String, String), "options", "Other parameters")
; ( Map (String, String)
, "options"
, "Extra parameters. Supports: \"dest-img-format\" (raw|vhd|qcow2) \
Contributor Author

We probably want to be consistent with VM.pool-migrate that uses "img-format" ?

@gthvn1
Contributor Author

gthvn1 commented Apr 24, 2026

  • I have passed all tests described here
    • everything looks good
  • The failures we are seeing concern RAW migration, but it appears to be a consistent issue with the current version of XCP-ng.
