Add OLMoE #32406
Merged
Changes from 23 of 33 commits
082973c Add OLMoE (Muennighoff)
8588576 Add OLMoE (Muennighoff)
f1c569e Updates (Muennighoff)
f6ea7c5 Make norm optional; add keys (Muennighoff)
4d56722 Add output (Muennighoff)
8b176d9 Add (Muennighoff)
452da8d Fix dtype (Muennighoff)
140bafb Fix eos config (Muennighoff)
91f95fd Update (Muennighoff)
6c20b73 Add OLMoE (Muennighoff)
171602e git pushMerge branch 'olmoe' of https://github.com/Muennighoff/transf… (Muennighoff)
30a4feb Fix OLMoE path (Muennighoff)
698f156 Merge branch 'huggingface:main' into olmoe (Muennighoff)
474f8e8 Format (Muennighoff)
e7e2ce3 git stah popMerge branch 'olmoe' of https://github.com/Muennighoff/tr… (Muennighoff)
d3eeef0 Format (Muennighoff)
28cdfd8 Rmv copy statement (Muennighoff)
58aed4a Rmv copy statement (Muennighoff)
f9fbd12 Format (Muennighoff)
16ed9e1 Add copies (Muennighoff)
b9a045a Cp rotary (Muennighoff)
4c598be Fix aming (Muennighoff)
50507ea Fix naming (Muennighoff)
1d9b006 Merge branch 'huggingface:main' into olmoe (Muennighoff)
b9948cc Update RoPE integration; num_logits_to_keep; Add copy statements (Muennighoff)
e97ae0e Add eps to config (Muennighoff)
fd0baf5 Format (Muennighoff)
79e0ecc Add aux loss (Muennighoff)
758a808 Adapt router_aux_loss_coef (Muennighoff)
efdcda6 Update md (Muennighoff)
42145af Merge branch 'huggingface:main' into olmoe (Muennighoff)
34ef8f5 Adapt (Muennighoff)
30aace4 adapt tests (Muennighoff)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,45 @@ | ||
| <!-- | ||
| | ||
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | ||
| the License. You may obtain a copy of the License at | ||
| | ||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
| | ||
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | ||
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations under the License. | ||
| | ||
| ⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be | ||
| rendered properly in your Markdown viewer. | ||
| | ||
| --> | ||
| | ||
| # OLMoE | ||
| | ||
| ## Overview | ||
| | ||
| The OLMoE model was proposed in [TODO](TODO) by TODO. | ||
| | ||
| OLMoE is a series of **O**pen **L**anguage **Mo**dels which are **M**ixture-**o**f-**E**xperts designed to enable the science of language models. We release all code, checkpoints, logs, and details involved in training these models. | ||
| | ||
| The abstract from the paper is the following: | ||
| | ||
| *TODO* | ||
| | ||
| This model was contributed by [Muennighoff](https://hf.co/Muennighoff). | ||
| The original code can be found [here](https://github.com/allenai/OLMoE). | ||
| | ||
| | ||
| ## OlmoeConfig | ||
| | ||
| [[autodoc]] OlmoeConfig | ||
| | ||
| ## OlmoeModel | ||
| | ||
| [[autodoc]] OlmoeModel | ||
|     - forward | ||
| | ||
| ## OlmoeForCausalLM | ||
| | ||
| [[autodoc]] OlmoeForCausalLM | ||
|     - forward | ||
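
For readers unfamiliar with the term, the Mixture-of-Experts idea named in the overview above can be sketched in plain Python. This is an illustrative sketch only, not the routing code added in this PR: the helper names (`softmax`, `route_top_k`, `moe_forward`), the toy scalar experts, and the choice to leave the top-k weights unnormalized are all assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return [(expert_index, weight), ...] for the k most probable experts.

    Assumption: weights are the raw softmax probabilities and are not
    renormalized over the selected experts.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(i, probs[i]) for i in top]


def moe_forward(x, router_logits, experts, k):
    """Weighted sum of the outputs of the k selected experts for one input."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Hypothetical toy experts operating on a single float instead of a hidden state.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
```

Only the selected experts are evaluated for a given input, which is why a sparse model of this kind can hold many more parameters than it uses per token.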
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -164,6 +164,7 @@ | ||
| nougat, | ||
| nystromformer, | ||
| olmo, | ||
| olmoe, | ||
| oneformer, | ||
| openai, | ||
| opt, | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,55 @@ | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| from typing import TYPE_CHECKING | ||
| | ||
| from ...utils import ( | ||
|     OptionalDependencyNotAvailable, | ||
|     _LazyModule, | ||
|     is_torch_available, | ||
| ) | ||
| | ||
| | ||
| _import_structure = { | ||
|     "configuration_olmoe": ["OlmoeConfig"], | ||
| } | ||
| | ||
| try: | ||
|     if not is_torch_available(): | ||
|         raise OptionalDependencyNotAvailable() | ||
| except OptionalDependencyNotAvailable: | ||
|     pass | ||
| else: | ||
|     _import_structure["modeling_olmoe"] = [ | ||
|         "OlmoeForCausalLM", | ||
|         "OlmoeModel", | ||
|         "OlmoePreTrainedModel", | ||
|     ] | ||
| | ||
| if TYPE_CHECKING: | ||
|     from .configuration_olmoe import OlmoeConfig | ||
| | ||
|     try: | ||
|         if not is_torch_available(): | ||
|             raise OptionalDependencyNotAvailable() | ||
|     except OptionalDependencyNotAvailable: | ||
|         pass | ||
|     else: | ||
|         from .modeling_olmoe import ( | ||
|             OlmoeForCausalLM, | ||
|             OlmoeModel, | ||
|             OlmoePreTrainedModel, | ||
|         ) | ||
| | ||
| else: | ||
|     import sys | ||
| | ||
|     sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__) | ||
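
The file above registers names in `_import_structure` and hands them to `_LazyModule` so that submodules are imported only when one of their names is first accessed. A minimal sketch of that general idea, using only the standard library, might look like the following. The `LazyModule` class here is a hypothetical, simplified stand-in written for illustration and is not the `_LazyModule` implementation from `transformers`; the standard-library modules `json` and `fractions` stand in for `configuration_olmoe` and `modeling_olmoe`.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hypothetical simplified lazy module: imports a submodule on first attribute access."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Invert {module_name: [attr, ...]} into {attr: module_name} for lookup.
        self._name_to_module = {
            attr: module_name
            for module_name, attrs in import_structure.items()
            for attr in attrs
        }
        self.__all__ = list(self._name_to_module)

    def __getattr__(self, attr):
        # Reached only when normal attribute lookup fails, i.e. before caching.
        if attr not in self._name_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        module = importlib.import_module(self._name_to_module[attr])
        value = getattr(module, attr)
        setattr(self, attr, value)  # cache so later lookups bypass __getattr__
        return value


# Assumption for the demo: absolute stdlib module names instead of relative submodules.
lazy = LazyModule("lazy_demo", {"json": ["dumps"], "fractions": ["Fraction"]})
```

Accessing `lazy.dumps` triggers the import of `json` and caches the function on the module object; names that were never registered raise `AttributeError` as usual.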