elastic · kosabogi · May 29, 2026 · May 29, 2026 · Jun 1, 2026 · Jun 1, 2026
@@ -2,4 +2,4 @@ Each cloud connected service has its own licensing and payment requirements.
 
 * AutoOps for ECE, ECK, and self-managed clusters is available for free across all [self-managed license types](https://www.elastic.co/subscriptions). It does not consume ECUs.
 
-* The Elastic {{infer-cap}} Service (EIS) for ECE, ECK, and self-managed clusters requires a [self-managed Enterprise license](https://www.elastic.co/subscriptions) or a self-managed free trial. Note that [EIS pricing](/explore-analyze/elastic-inference/eis-supported-models.md#pricing) is usage-based. Using EIS consumes ECUs.
+* The Elastic {{infer-cap}} Service (EIS) for ECE, ECK, and self-managed clusters requires a [self-managed Enterprise license](https://www.elastic.co/subscriptions) or a self-managed free trial. Note that [EIS pricing](/explore-analyze/elastic-inference/eis.md#pricing) is usage-based. Using EIS consumes ECUs.
@@ -180,6 +180,6 @@ For these models, you only need to create new {{infer}} endpoints if you want to
 
 ## Regions and billing
 
-For information about EIS regions and request routing, refer to [Region and hosting](/explore-analyze/elastic-inference/eis-supported-models.md#eis-regions).
+For information about EIS regions and request routing, refer to [Region and hosting](eis-region-and-hosting.md).
 
-EIS is billed per million tokens and consumes ECUs. For details on pricing and usage tracking, refer to [Pricing](/explore-analyze/elastic-inference/eis-supported-models.md#pricing) and [Monitor your token usage](/explore-analyze/elastic-inference/eis-supported-models.md#monitor-your-token-usage).
+EIS is billed per million tokens and consumes ECUs. For details on pricing and usage tracking, refer to [Pricing](eis.md#pricing) and [Monitor your token usage](eis.md#monitor-your-token-usage).
@@ -0,0 +1,25 @@
+---
+navigation_title: Rate limits
+applies_to:
+  stack: ga
+  serverless: ga
+description: Learn about rate limits for Elastic Inference Service (EIS) models.
+---
+
+# Rate limits [eis-rate-limits]
+
+This page lists the rate limits that apply to Elastic {{infer-cap}} Service (EIS) models.
+
+Exceeding a limit results in HTTP 429 responses from the server until the sliding window advances and part of the limit resets.
+
+| Model                                             | Requests/minute | Tokens/minute (ingest)  | Tokens/minute (search)  | Notes                    |
+|---------------------------------------------------|-----------------|-------------------------|-------------------------|--------------------------|
+| Elastic Managed LLMs {applies_to}`stack: ga 9.3+` | 2,000           | -                       | -                       | No rate limit on tokens  |
+| ELSER {applies_to}`stack: ga 9.0+`                | 6,000           | 6,000,000               | 600,000                 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.  |
+| Jina Embeddings v5 Nano {applies_to}`stack: ga 9.3+`   | 6,000           | 6,000,000               | 600,000                 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.  |
+| Jina Embeddings v5 Small {applies_to}`stack: ga 9.3+`   | 6,000           | 6,000,000               | 600,000                 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.  |
+| Jina Embeddings v3 {applies_to}`stack: ga 9.3+`   | 6,000           | 6,000,000               | 600,000                 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.  |
+| Jina Embeddings v5 (Small) {applies_to}`stack: ga 9.3+`   | 6,000           | 6,000,000               | 600,000                 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.  |
+| Jina Embeddings v5 (Nano) {applies_to}`stack: ga 9.3+`   | 6,000           | 6,000,000               | 600,000                 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.  |
+| Jina Reranker v2 {applies_to}`stack: ga 9.3+`     | 600             | -                       | 6,000,000               | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.  |
+| Jina Reranker v3 {applies_to}`stack: ga 9.3+`     | 600             | -                       | 6,000,000               | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.  |
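
The rate limits page says that exceeding a limit returns HTTP 429 until the sliding window moves on. A client could handle that by retrying with exponential backoff. The following is a minimal sketch, not part of this PR: `call_with_backoff` and the `send` callable are hypothetical names, and the delay values are assumptions rather than anything EIS documents.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is a hypothetical zero-argument callable returning (status, body);
    it stands in for whatever HTTP client issues the inference request.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Assumed policy: double the wait each time and add jitter so the
            # sliding window has time to advance before the next attempt.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    # Still rate limited after the last attempt; hand the 429 back to the caller.
    return status, body
```

Passing `sleep` as a parameter keeps the function testable without real waiting; production code would leave the default `time.sleep`.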
@@ -0,0 +1,25 @@
+---
+navigation_title: Region and hosting
+applies_to:
+  stack: ga
+  serverless: ga
+description: Learn which regions host Elastic Inference Service (EIS) and how inference requests are routed.
+---
+
+# Region and hosting [eis-regions]
+
+This page lists the {{aws}} and {{gcp}} regions where Elastic {{infer-cap}} Service (EIS) is available and explains how {{infer}} requests are routed.
+
+**{{aws}}:**
+
+* `us-east-1` (Virginia)
+
+**{{gcp}}:**
+
+* `asia-southeast1` (Singapore)
+* `europe-west1` (Belgium)
+* `us-east4` (Virginia)
+
+All {{infer}} requests sent through EIS are routed to the nearest region, regardless of where your {{es}} deployment or {{serverless-short}} project is hosted.
+
+Depending on the model being used, request processing may involve Elastic {{infer}} infrastructure and, in some cases, trusted third-party model providers. For example, ELSER and Jina requests are processed entirely within Elastic {{infer}} infrastructure. Other models, such as large language models or third-party embedding models, may involve additional processing by their respective model providers, which can operate in different cloud platforms or regions.
@@ -20,6 +20,8 @@ The corresponding {{kib}} connectors and {{infer}} endpoints for these models ar
 The **{{infer-cap}} Regions** column shows the regions where {{infer}} requests are processed and where data is sent.
 ::::
 
+For region availability and request routing, refer to [Region and hosting](eis-region-and-hosting.md). For rate limits, refer to [Rate limits](eis-rate-limits.md).
+
 ### LLM chat models
 
 :::{csv-include} chat-models.csv
@@ -45,61 +47,3 @@ The **{{infer-cap}} Regions** column shows the regions where {{infer}} requests
 * After the listed end-of-life (EOL) date, the model is no longer available for {{infer}} use and requests will fail. You need to actively transition to another model before the EOL date, there is no automated migration.
 * Elastic makes every effort to use third party providers who do not use inputs to train models, and do not retain any data (zero data retention). Browse the tables on this page to double-check the status of a specific model.
 ::::
-
-## Region and hosting [eis-regions]
-
-Elastic {{infer-cap}} Service is currently available in these regions:
-
-**AWS:**
-
-* `us-east-1` (Virginia)
-
-**GCP:**
-
-* `asia-southeast1` (Singapore)
-* `europe-west1` (Belgium)
-* `us-east4` (Virginia)
-
-All {{infer}} requests sent through EIS are routed to the nearest region, regardless of where your {{es}} deployment or {{serverless-short}} project is hosted.
-
-Depending on the model being used, request processing may involve Elastic {{infer}} infrastructure and, in some cases, trusted third-party model providers. For example, ELSER and Jina requests are processed entirely within Elastic {{infer}} infrastructure. Other models, such as large language models or third-party embedding models, may involve additional processing by their respective model providers, which can operate in different cloud platforms or regions.
-
-## Rate limits
-
-The service enforces rate limits on an ongoing basis. Exceeding a limit results in HTTP 429 responses from the server until the sliding window moves on further and parts of the limit resets.
-
-| Model                                             | Request/minute  | Tokens/minute (ingest)  | Tokens/minute (search)  | Notes                    |
-|---------------------------------------------------|-----------------|-------------------------|-------------------------|--------------------------|
-| Elastic Managed LLMs {applies_to}`stack: ga 9.3+` | 2000            | -                       | -                       | No rate limit on tokens  |
-| ELSER {applies_to}`stack: ga 9.0+`                | 6,000           | 6,000,000               | 600,000                 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.  |
-| Jina Embeddings v5 Nano {applies_to}`stack: ga 9.3+`   | 6,000           | 6,000,000               | 600,000                 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.  |
-| Jina Embeddings v5 Small {applies_to}`stack: ga 9.3+`   | 6,000           | 6,000,000               | 600,000                 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.  |
-| Jina Embeddings v3 {applies_to}`stack: ga 9.3+`   | 6,000           | 6,000,000               | 600,000                 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.  |
-| Jina Embeddings v5 (Small) {applies_to}`stack: ga 9.3+`   | 6,000           | 6,000,000               | 600,000                 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.  |
-| Jina Embeddings v5 (Nano) {applies_to}`stack: ga 9.3+`   | 6,000           | 6,000,000               | 600,000                 | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.  |
-| Jina Reranker v2 {applies_to}`stack: ga 9.3+`     | 600             | -                       | 6,000,000               | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.  |
-| Jina Reranker v3 {applies_to}`stack: ga 9.3+`     | 600             | -                       | 6,000,000               | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first.  |
-
-## Pricing
-
-All models on EIS incur a charge per million tokens. Certain LLM providers charge different prices depending on the prompt size. The pricing details are available on our [Pricing page](https://www.elastic.co/pricing/serverless-search).
-
-This pricing model differs from the existing [Machine Learning Nodes](https://www.elastic.co/docs/explore-analyze/machine-learning/data-frame-analytics/ml-trained-models), which is billed through VCUs consumed.
-
-### Token-based billing
-
-EIS is billed per million tokens used:
-
-* For **chat** models, input and output tokens are billed. Longer conversations with extensive context or detailed responses will consume more tokens.
-* For **embeddings** models, only input tokens are billed.
-
-Tokens are the fundamental units that language models process for both input and output. Tokenizers convert text into numerical data by segmenting it into subword units. A token can be a complete word, part of a word, or a punctuation mark, depending on the model's trained tokenizer and the frequency patterns in its training data.
-
-For example, the sentence `It was the best of times, it was the worst of times.` contains 52 characters but would tokenize into approximately 14 tokens with a typical word-based approach, though the exact count varies by tokenizer.
-
-### Monitor your token usage
-
-To track your token consumption:
-
-1. Navigate to [**Billing > Usage**](https://cloud.elastic.co/billing/usage) in the {{ecloud}} Console.
-2. Look for line items where the **Billing dimension** is set to "Inference".
@@ -213,3 +213,27 @@ You can now use `semantic_text` with the new ELSER endpoint on EIS. To learn how
 ##### Get started with semantic search with ELSER on EIS
 
 [Semantic Search with `semantic_text`](/solutions/search/semantic-search/semantic-search-semantic-text.md) has a detailed tutorial on using the `semantic_text` field and using the ELSER endpoint on EIS instead of the default endpoint. This is a great way to get started and try the new endpoint.
+
+## Pricing [pricing]
+
+All models on EIS incur a charge per million tokens. Certain LLM providers charge different prices depending on the prompt size. The pricing details are available on our [Pricing page](https://www.elastic.co/pricing/serverless-search).
+
+This pricing model differs from the existing [Machine Learning Nodes](https://www.elastic.co/docs/explore-analyze/machine-learning/data-frame-analytics/ml-trained-models), which are billed through VCUs consumed.
+
+### Token-based billing
+
+EIS is billed per million tokens used:
+
+* For **chat** models, input and output tokens are billed. Longer conversations with extensive context or detailed responses will consume more tokens.
+* For **embeddings** models, only input tokens are billed.
+
+Tokens are the fundamental units that language models process for both input and output. Tokenizers convert text into numerical data by segmenting it into subword units. A token can be a complete word, part of a word, or a punctuation mark, depending on the model's trained tokenizer and the frequency patterns in its training data.
+
+For example, the sentence `It was the best of times, it was the worst of times.` contains 52 characters but would tokenize into approximately 14 tokens with a typical word-based approach, though the exact count varies by tokenizer.
+
+### Monitor your token usage [monitor-your-token-usage]
+
+To track your token consumption:
+
+1. Navigate to [**Billing > Usage**](https://cloud.elastic.co/billing/usage) in the {{ecloud}} Console.
+2. Look for line items where the **Billing dimension** is set to "Inference".
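
The word-based count in the Token-based billing example can be reproduced with a short script. This is only a sketch of a simple word-and-punctuation split, not the tokenizer of any EIS model, and `price_per_million` is a placeholder value; actual prices are on the Pricing page.

```python
import re


def word_based_tokens(text):
    # Split into runs of word characters and single punctuation marks.
    # Real model tokenizers use subword units and will give different counts.
    return re.findall(r"\w+|[^\w\s]", text)


def estimated_cost(token_count, price_per_million):
    # price_per_million is a hypothetical figure supplied by the caller.
    return token_count / 1_000_000 * price_per_million


sentence = "It was the best of times, it was the worst of times."
tokens = word_based_tokens(sentence)
print(len(sentence), len(tokens))  # prints "52 14"
```

For chat models, both input and output token counts would be passed through `estimated_cost`; for embeddings models, only the input count.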
@@ -44,6 +44,8 @@ toc:
       - file: elastic-inference/eis.md
         children:
           - file: elastic-inference/eis-supported-models.md
+          - file: elastic-inference/eis-region-and-hosting.md
+          - file: elastic-inference/eis-rate-limits.md
           - file: elastic-inference/connect-self-managed-cluster-to-eis.md
           - hidden: elastic-inference/ml-node-vs-eis.md
       - file: elastic-inference/external.md

@@ -874,6 +874,19 @@ redirects:
       - to: 'explore-analyze/elastic-inference/eis-supported-models.md'
         anchors:
           'supported-models': 
+  'explore-analyze/elastic-inference/eis-supported-models.md':
+    to: 'explore-analyze/elastic-inference/eis-supported-models.md'
+    many:
+      - to: 'explore-analyze/elastic-inference/eis-region-and-hosting.md'
+        anchors:
+          'eis-regions': 'eis-regions'
+      - to: 'explore-analyze/elastic-inference/eis-rate-limits.md'
+        anchors:
+          'rate-limits': 'eis-rate-limits'
+      - to: 'explore-analyze/elastic-inference/eis.md'
+        anchors:
+          'pricing': 'pricing'
+          'monitor-your-token-usage': 'monitor-your-token-usage'
 # Split off links to inference UI pages
   'explore-analyze/elastic-inference/inference-api.md':
     to: 'explore-analyze/elastic-inference/inference-api.md'