How to set dp MoE #304

@zhaozheng09

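I want 12 experts on each GPU, with top-4 selection.
When I set parallel_type == 1, I see all-to-all (a2a) in the timeline.
When I set parallel_type == 0, I see allgather in the timeline.

I only want data-parallel (dp) MoE on each GPU, without these collectives. How should I configure it?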

        import torch
        from tutel import moe as tutel_moe

        self.ff_out = tutel_moe.moe_layer(
            gate_type={'type': 'top', 'k': 4},  # top-4 gating
            model_dim=512,
            experts={
                'num_experts_per_device': 12,  # 12 experts on every GPU
                'type': 'ffn',
                'hidden_size_per_expert': 2048,
                'activation_fn': lambda x: torch.nn.functional.relu(x),
            },
            parallel_type='data',
            # Set skip_allreduce=True on every expert parameter.
            scan_expert_func=lambda name, param: setattr(param, 'skip_allreduce', True),
        )
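
To make the routing I am after concrete, here is a minimal sketch of top-4 selection among 12 local experts for a single token, in plain Python. It is only an illustration and not Tutel's implementation: `top_k_gate` and the example scores are hypothetical, and normalizing the weights with a softmax over the selected experts is an assumption.

```python
import math


def top_k_gate(logits, k):
    """Return (expert_index, weight) pairs for the k largest logits.

    Weights are a softmax over the selected logits only, so they sum to 1.
    This normalization is an assumption, not necessarily what Tutel does.
    """
    if not 0 < k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected experts.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical gate scores of one token for the 12 experts on one GPU.
scores = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, 3.0, -0.5, 0.2, 1.0, -2.0, 0.3]
selected = top_k_gate(scores, k=4)
```

Since all 12 experts live on the same GPU, every selected index refers to a local expert, which is why I would expect no cross-GPU exchange of tokens to be needed in this setup.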
