Wiz Scan Summary
To detect these findings earlier in the dev lifecycle, try using the Wiz Code VS Code Extension.
| e.valid_at = $edge_data.valid_at,
| e.invalid_at = $edge_data.invalid_at,
| e.fact_embedding = join([x IN coalesce($edge_data.fact_embedding, []) | toString(x) ], ","),
| e.episodes = join($edge_data.episodes, ",")
Neptune queries silently drop custom attributes during save
High Severity
The new Neptune queries use explicit property lists instead of the previous SET e = removeKeyFromMap(...) pattern, but this omits custom attributes. Both EntityEdge and EntityNode have an attributes: dict[str, Any] field. The save() methods spread these attributes into the data dictionary via edge_data.update(self.attributes or {}). The old approach copied all properties including these spread attributes. The new explicit lists only include built-in fields, causing any custom attributes to be silently dropped on save. The same issue affects both edge queries and node queries for Neptune.
Additional Locations (1)
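A minimal Python sketch of the failure mode (the helper and field names are illustrative stand-ins for the `EntityEdge.save()` internals, not actual Graphiti code): spreading attributes into the data dict only helps if the query writes every key, and an explicit property list filters the extras out.

```python
# Illustrative sketch of the data-loss path described above.

BUILT_IN_EDGE_FIELDS = {
    "uuid", "name", "fact", "group_id", "created_at",
    "valid_at", "invalid_at", "fact_embedding", "episodes",
}

def build_edge_data(edge_fields, attributes):
    # Mirrors the save() pattern: custom attributes are spread into the
    # same dict as the built-in fields (edge_data.update(self.attributes or {})).
    edge_data = dict(edge_fields)
    edge_data.update(attributes or {})
    return edge_data

def properties_written_by_explicit_list(edge_data):
    # The new query SETs only a fixed list of properties, so anything
    # outside that list silently never reaches the graph.
    return {k: v for k, v in edge_data.items() if k in BUILT_IN_EDGE_FIELDS}

edge_data = build_edge_data(
    {"uuid": "e1", "name": "WORKS_AT", "fact": "Alice works at Acme"},
    {"confidence": 0.9, "source": "crm"},  # custom attributes
)
written = properties_written_by_explicit_list(edge_data)
# The custom "confidence" and "source" keys are absent from `written`.
```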
| continue
| elif isinstance(ts_value, int | float):
| # Unix timestamp
| return datetime.fromtimestamp(ts_value, tz=timezone.utc)
Missing OSError handling for invalid Unix timestamps
Low Severity
The _extract_reference_time_from_json method handles string timestamp parsing failures with a try/except that continues to the next field, but the numeric timestamp branch at lines 127-129 has no exception handling. datetime.fromtimestamp() raises OSError for out-of-range timestamps (e.g., very large numbers like millisecond timestamps, or negative values on some platforms). The outer except only catches json.JSONDecodeError, TypeError, and ValueError, so OSError propagates up and causes the episode to fail instead of gracefully falling back to the current time.
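A hedged sketch of the suggested hardening (the function name is illustrative; the real code lives inside `_extract_reference_time_from_json`): catch the out-of-range errors locally and fall back instead of crashing the episode.

```python
from datetime import datetime, timezone

def parse_numeric_timestamp(ts_value):
    """Treat the value as a Unix timestamp in seconds, returning None
    instead of letting an out-of-range value propagate."""
    try:
        return datetime.fromtimestamp(ts_value, tz=timezone.utc)
    except (OSError, OverflowError, ValueError):
        # OSError/OverflowError: value out of range for the platform;
        # ValueError: e.g. NaN, or a result year outside datetime's range,
        # as happens when a millisecond timestamp is interpreted as seconds.
        return None
```

The caller can then fall back to the current time when `None` is returned, matching the graceful path the string branch already takes.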
8df1283 to fb49a25
# Conflicts:
#	mcp_server/pyproject.toml
#	mcp_server/uv.lock
#	pyproject.toml
#	uv.lock

diff --git c/mcp_server/pyproject.toml i/mcp_server/pyproject.toml
index 886b6f9..499d72a 100644
--- c/mcp_server/pyproject.toml
+++ i/mcp_server/pyproject.toml
@@ -7,7 +7,7 @@ requires-python = ">=3.10,<4"
 dependencies = [
     "mcp>=1.9.4",
     "openai>=1.91.0",
-    "graphiti-core[falkordb]==0.26.3",
+    "graphiti-core[falkordb,neptune]==0.26.3",
     "pydantic-settings>=2.0.0",
     "pyyaml>=6.0",
     "typing-extensions>=4.0.0",

diff --git c/mcp_server/uv.lock i/mcp_server/uv.lock
index 87c8651..0a689a0 100644
--- c/mcp_server/uv.lock
+++ i/mcp_server/uv.lock
@@ -718,7 +718,7 @@ wheels = [

 [[package]]
 name = "graphiti-core"
 version = "0.26.3"
-source = { registry = "https://pypi.org/simple" }
+source = { editable = "../" }
 dependencies = [
     { name = "diskcache" },
     { name = "neo4j" },
This reverts commit b9d7dcf.
diff --git c/graphiti_core/driver/neptune_driver.py i/graphiti_core/driver/neptune_driver.py
index 0a57ab3..9a02037 100644
--- c/graphiti_core/driver/neptune_driver.py
+++ i/graphiti_core/driver/neptune_driver.py
@@ -196,6 +196,7 @@ class NeptuneDriver(GraphDriver):
             timeout=30,  # 30 second timeout to prevent hanging connections
             max_retries=5,  # Enable retry logic with exponential backoff
             retry_on_timeout=True,  # Retry when timeout occurs
+            retry_on_status=[502, 503, 504],  # Retry on gateway errors and service unavailable
         )

     def _sanitize_parameters(self, query, params: dict):

diff --git c/graphiti_core/prompts/dedupe_edges.py i/graphiti_core/prompts/dedupe_edges.py
index c6e4359..8687712 100644
--- c/graphiti_core/prompts/dedupe_edges.py
+++ i/graphiti_core/prompts/dedupe_edges.py
@@ -41,6 +41,12 @@ class Versions(TypedDict):

 def resolve_edge(context: dict[str, Any]) -> list[Message]:
+    existing_facts_count = len(context.get('existing_edges', []))
+    invalidation_candidates_count = len(context.get('edge_invalidation_candidates', []))
+
+    existing_range = f'0 to {existing_facts_count - 1}' if existing_facts_count > 0 else 'none (empty list)'
+    invalidation_range = f'0 to {invalidation_candidates_count - 1}' if invalidation_candidates_count > 0 else 'none (empty list)'
+
     return [
         Message(
             role='system',
@@ -50,16 +56,25 @@ def resolve_edge(context: dict[str, Any]) -> list[Message]:
         Message(
             role='user',
             content=f"""
-        Task:
-        You will receive TWO separate lists of facts. Each list uses 'idx' as its index field, starting from 0.
-
+        You will analyze a NEW FACT against two separate lists of existing facts.
+
+        ═══════════════════════════════════════════════════════════════
+        LIST A: EXISTING FACTS (for duplicate detection)
+        ═══════════════════════════════════════════════════════════════
+        Count: {existing_facts_count} facts
+        Valid idx range: {existing_range}
+
         1. DUPLICATE DETECTION:
         - If the NEW FACT represents identical factual information as any fact in EXISTING FACTS, return those idx values in duplicate_facts.
         - Facts with similar information that contain key differences should NOT be marked as duplicates.
         - Return idx values from EXISTING FACTS.
         - If no duplicates, return an empty list for duplicate_facts.

-        2. CONTRADICTION DETECTION:
+        2. FACT TYPE CLASSIFICATION:
+        - Given the predefined FACT TYPES, determine if the NEW FACT should be classified as one of these types.
+        - Return the fact type as fact_type or DEFAULT if NEW FACT is not one of the FACT TYPES.
+
+        3. CONTRADICTION DETECTION:
         - Based on FACT INVALIDATION CANDIDATES and NEW FACT, determine which facts the new fact contradicts.
         - Return idx values from FACT INVALIDATION CANDIDATES.
         - If no contradictions, return an empty list for contradicted_facts.
@@ -73,17 +88,63 @@ def resolve_edge(context: dict[str, Any]) -> list[Message]:
        1. Some facts may be very similar but will have key differences, particularly around numeric values in the facts. Do not mark these facts as duplicates.

+        <FACT TYPES>
+        {context['edge_types']}
+        </FACT TYPES>
+
         <EXISTING FACTS>
         {context['existing_edges']}
-        </EXISTING FACTS>
-
-        <FACT INVALIDATION CANDIDATES>
+
+        ═══════════════════════════════════════════════════════════════
+        LIST B: FACT INVALIDATION CANDIDATES (for contradiction detection)
+        ═══════════════════════════════════════════════════════════════
+        Count: {invalidation_candidates_count} facts
+        Valid idx range: {invalidation_range}
+
         {context['edge_invalidation_candidates']}
-        </FACT INVALIDATION CANDIDATES>
-
-        <NEW FACT>
+
+        ═══════════════════════════════════════════════════════════════
+        NEW FACT TO ANALYZE
+        ═══════════════════════════════════════════════════════════════
         {context['new_edge']}
-        </NEW FACT>
+
+        ═══════════════════════════════════════════════════════════════
+        FACT TYPES FOR CLASSIFICATION
+        ═══════════════════════════════════════════════════════════════
+        {context['edge_types']}
+
+        ═══════════════════════════════════════════════════════════════
+        YOUR RESPONSE MUST INCLUDE THREE FIELDS
+        ═══════════════════════════════════════════════════════════════
+
+        1. duplicate_facts (list of integers)
+           SOURCE: Use idx values ONLY from LIST A (EXISTING FACTS)
+           VALID RANGE: {existing_range}
+           PURPOSE: Identify which facts in LIST A are duplicates of the NEW FACT
+           CRITERIA: Facts must represent identical factual information (minor wording differences OK)
+           NOTE: Facts with key differences (especially numeric values) are NOT duplicates
+           IF NO DUPLICATES: Return empty list []
+
+        2. contradicted_facts (list of integers)
+           SOURCE: Use idx values ONLY from LIST B (FACT INVALIDATION CANDIDATES)
+           VALID RANGE: {invalidation_range}
+           PURPOSE: Identify which facts in LIST B are contradicted by the NEW FACT
+           CRITERIA: Facts that are logically incompatible with the NEW FACT
+           IF NO CONTRADICTIONS: Return empty list []
+
+        3. fact_type (string)
+           SOURCE: Choose from FACT TYPES listed above
+           PURPOSE: Classify the NEW FACT's type
+           DEFAULT: Return 'DEFAULT' if NEW FACT doesn't match any predefined FACT TYPES
+
+        ═══════════════════════════════════════════════════════════════
+        CRITICAL WARNINGS
+        ═══════════════════════════════════════════════════════════════
+        - LIST A and LIST B are COMPLETELY SEPARATE with INDEPENDENT indexing
+        - Do NOT use idx values from LIST B in duplicate_facts field
+        - Do NOT use idx values from LIST A in contradicted_facts field
+        - Each list starts indexing from 0 independently
+        - Verify your idx values are within the valid ranges specified above
         """,
         ),
     ]

diff --git c/graphiti_core/utils/maintenance/edge_operations.py i/graphiti_core/utils/maintenance/edge_operations.py
index 9fc356a..3fb19c7 100644
--- c/graphiti_core/utils/maintenance/edge_operations.py
+++ i/graphiti_core/utils/maintenance/edge_operations.py
@@ -444,8 +444,7 @@ async def resolve_extracted_edges(
     resolved_edges: list[EntityEdge] = []
     invalidated_edges: list[EntityEdge] = []
     for result in results:
-        resolved_edge = result[0]
-        invalidated_edge_chunk = result[1]
+        resolved_edge, invalidated_edge_chunk, _ = result  # Third value (duplicate_edges) not needed here
         resolved_edges.append(resolved_edge)
         invalidated_edges.extend(invalidated_edge_chunk)

diff --git c/mcp_server/src/graphiti_mcp_server.py i/mcp_server/src/graphiti_mcp_server.py
index 2b9a855..b264c40 100644
--- c/mcp_server/src/graphiti_mcp_server.py
+++ i/mcp_server/src/graphiti_mcp_server.py
@@ -92,6 +92,12 @@
 logging.getLogger('uvicorn.access').setLevel(logging.WARNING)  # Reduce access l
 logging.getLogger('mcp.server.streamable_http_manager').setLevel(
     logging.WARNING
 )  # Reduce MCP noise
+logging.getLogger('opensearch').setLevel(
+    logging.ERROR
+)  # Only log actual errors, not transient warnings
+logging.getLogger('urllib3.connectionpool').setLevel(
+    logging.ERROR
+)  # Suppress retry warnings

 # Patch uvicorn's logging config to use our format
2f0f64c to 037da15
| <FACT TYPES>
| {context['edge_types']}
| </FACT TYPES>
Missing edge_types in context causes KeyError
High Severity
The prompt template accesses context['edge_types'] at two locations, but the context dictionary built in resolve_extracted_edge function (in edge_operations.py lines 555-559) only includes existing_edges, new_edge, and edge_invalidation_candidates. When this prompt is called, it will raise a KeyError for the missing edge_types key, causing the edge deduplication to fail.
Additional Locations (1)
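A minimal reproduction in Python (the context shape is taken from the comment above; values are placeholders). Accessing the missing key raises, while `.get()` with a default is one defensive option if edge types can legitimately be absent; the real fix is adding `edge_types` when the context is built.

```python
# Context as built in resolve_extracted_edge today: no 'edge_types' key.
context = {
    'existing_edges': [],
    'new_edge': {'fact': 'Alice works at Acme'},
    'edge_invalidation_candidates': [],
}

try:
    section = f"<FACT TYPES>\n{context['edge_types']}\n</FACT TYPES>"
except KeyError:
    section = None  # this KeyError is what currently breaks edge deduplication

safe_types = context.get('edge_types', 'DEFAULT')  # never raises
```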
| 2. CONTRADICTION DETECTION:
| 2. FACT TYPE CLASSIFICATION:
| - Given the predefined FACT TYPES, determine if the NEW FACT should be classified as one of these types.
| - Return the fact type as fact_type or DEFAULT if NEW FACT is not one of the FACT TYPES.
Response model missing required fact_type field
Medium Severity
The prompt instructs the LLM to return a fact_type field (documented at lines 73-75 and 135-138), but the EdgeDuplicate response model only defines duplicate_facts and contradicted_facts fields. The LLM-generated fact_type value will either be silently discarded or cause validation errors, making the fact type classification feature non-functional.
Additional Locations (1)
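A sketch of the model change the comment implies. The real `EdgeDuplicate` is a Pydantic model; a dataclass stands in here so the example is dependency-free, and the fact type name is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class EdgeDuplicate:
    duplicate_facts: list = field(default_factory=list)
    contradicted_facts: list = field(default_factory=list)
    # The field missing today: without it, the LLM's classification is lost.
    fact_type: str = 'DEFAULT'

llm_payload = {
    'duplicate_facts': [0],
    'contradicted_facts': [],
    'fact_type': 'EMPLOYMENT',  # hypothetical fact type name
}
resolved = EdgeDuplicate(**llm_payload)
```

Defaulting `fact_type` to `'DEFAULT'` keeps older responses without the field valid.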
| n.group_id = $entity_data.group_id,
| n.created_at = $entity_data.created_at,
| n.summary = $entity_data.summary,
| n.name_embedding = join([x IN coalesce($entity_data.name_embedding, []) | toString(x) ], ",")
Neptune entity nodes lose custom attributes on save
High Severity
The Neptune query now explicitly sets only name, group_id, created_at, summary, and name_embedding. The previous query used removeKeyFromMap to set all properties from $entity_data, which included custom attributes merged via entity_data.update(self.attributes). Custom entity attributes are now silently dropped on save.
Additional Locations (1)
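One possible shape of a fix, sketched in Python: derive the SET clause from the keys actually present in the data dict instead of hard-coding them, so merged custom attributes survive. Illustrative only, not the actual driver code; real code would also have to sanitize key names before interpolating them into query text.

```python
EMBEDDING_FIELDS = {'name_embedding'}  # Neptune stores embeddings as joined strings

def build_entity_set_clause(entity_data):
    clauses = []
    for key in entity_data:
        if key == 'uuid':
            continue  # assumed to be bound in the MERGE, not re-set
        if key in EMBEDDING_FIELDS:
            clauses.append(
                f'n.{key} = join([x IN coalesce($entity_data.{key}, []) | toString(x)], ",")'
            )
        else:
            clauses.append(f'n.{key} = $entity_data.{key}')
    return 'SET ' + ', '.join(clauses)

entity_data = {
    'uuid': 'n1', 'name': 'Alice', 'group_id': 'g1',
    'created_at': '2024-01-01T00:00:00Z', 'summary': '...',
    'name_embedding': [0.1, 0.2],
    'role': 'engineer',  # custom attribute merged via entity_data.update(...)
}
query = build_entity_set_clause(entity_data)
```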
| + group_filter_query
| + """
| RETURN DISTINCT id(n) as id, n.name_embedding as embedding
| LIMIT $batch_limit
Neptune query uses n but filter references c
High Severity
The new Neptune-specific query in community_similarity_search uses MATCH (n:Community) but the filter at line 1090 references c.group_id. When group_ids is provided, the query will fail with an undefined variable error because c is never defined in the Neptune code path.
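A minimal sketch of the mismatch and one fix: build the group filter against whichever alias the surrounding MATCH uses, instead of hard-coding `c`. The query fragments are illustrative, not the exact search code.

```python
def community_group_filter(alias, group_ids):
    # Emit the filter against the caller's alias; empty when no group_ids.
    return f'WHERE {alias}.group_id IN $group_ids' if group_ids else ''

# Today's Neptune path: MATCH binds `n`, but the reused filter references `c`.
broken = 'MATCH (n:Community) ' + 'WHERE c.group_id IN $group_ids'  # `c` undefined
fixed = 'MATCH (n:Community) ' + community_group_filter('n', ['g1'])
```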
| - Do NOT use idx values from LIST B in duplicate_facts field
| - Do NOT use idx values from LIST A in contradicted_facts field
| - Each list starts indexing from 0 independently
| - Verify your idx values are within the valid ranges specified above
Prompt contradicts data indexing and consuming code
High Severity
The new prompt has contradictory indexing instructions. The data passed uses continuous idx numbering (invalidation candidates start where existing facts end), and the consuming code in edge_operations.py expects this. However, the "CRITICAL WARNINGS" section says lists have "INDEPENDENT indexing" starting from 0, and contradicted_facts must only use LIST B indices. The invalidation_range (line 48) also computes a 0-based range that doesn't match the actual offset-based idx values in the data. This will cause the LLM to return incorrect idx values, leading to wrong edges being invalidated.
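If the continuous idx numbering the consuming code expects is kept, the displayed ranges must be offset rather than restarted at 0. A sketch under that assumption (names mirror the variables introduced in the diff):

```python
def idx_ranges(existing_count, invalidation_count):
    existing_range = (
        f'0 to {existing_count - 1}' if existing_count > 0 else 'none (empty list)'
    )
    start = existing_count  # candidates continue where existing facts end
    invalidation_range = (
        f'{start} to {start + invalidation_count - 1}'
        if invalidation_count > 0 else 'none (empty list)'
    )
    return existing_range, invalidation_range

existing_range, invalidation_range = idx_ranges(3, 2)
# With 3 existing facts, the 2 invalidation candidates are idx 3 and 4.
```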
| n.group_id = $entity_data.group_id,
| n.created_at = $entity_data.created_at,
| n.summary = $entity_data.summary,
| n.name_embedding = join([x IN coalesce($entity_data.name_embedding, []) | toString(x) ], ",")
Neptune save queries silently drop entity/edge attributes
High Severity
The Neptune save queries were changed from SET n/e = removeKeyFromMap(...) (which wrote all properties from the data dict, including dynamically flattened attributes) to explicit field-by-field SET statements. The explicit statements omit attributes entirely. Since the save methods in nodes.py and edges.py call entity_data.update(self.attributes or {}) to merge attributes as top-level keys, the old approach saved them as graph properties. Now those attribute properties are silently dropped, causing data loss for any Neptune users with custom entity or edge types that define additional attributes.
Additional Locations (2)
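A regression-style guard for the data loss described above: every key merged into the data dict should appear among the properties the SET clause writes. The clause below is a trimmed stand-in for the actual Neptune query, and the extraction is deliberately crude.

```python
import re

def written_properties(set_clause):
    # Extract the `n.<prop> =` / `e.<prop> =` targets from a SET clause.
    return set(re.findall(r'\b[ne]\.(\w+)\s*=', set_clause))

set_clause = (
    'SET n.name = $entity_data.name, n.group_id = $entity_data.group_id, '
    'n.created_at = $entity_data.created_at, n.summary = $entity_data.summary'
)
entity_data = {
    'name': 'Alice', 'group_id': 'g1', 'created_at': '2024-01-01',
    'summary': '...', 'role': 'engineer',  # 'role' is a custom attribute
}
missing = set(entity_data) - written_properties(set_clause)
# `missing` reveals exactly the attributes the explicit query drops.
```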
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
| api_key=api_key,
| base_url=base_url,
| default_headers=custom_headers if custom_headers else None,
| )
Duplicated custom headers builder logic across factories
Low Severity
The custom headers building logic (reading from config extra_headers, OPENAI_EXTRA_HEADERS env var, and X_SESSION_ID env var, plus creating an AsyncOpenAI client) is duplicated nearly identically between LLMClientFactory.create and EmbedderFactory.create. This increases maintenance burden and risks inconsistent bug fixes if one copy is updated but not the other.
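One way to deduplicate this is a single shared helper both `LLMClientFactory.create` and `EmbedderFactory.create` call. This is a sketch: the helper name, the assumed `OPENAI_EXTRA_HEADERS` format (comma-separated `Key=Value` pairs), the precedence order, and the `X-Session-Id` header name are guesses from the comment, not the actual implementation.

```python
import os

def build_custom_headers(config_extra_headers=None):
    """Merge config-supplied headers with the OPENAI_EXTRA_HEADERS and
    X_SESSION_ID environment variables; return None when nothing is set."""
    headers = dict(config_extra_headers or {})
    env_headers = os.environ.get('OPENAI_EXTRA_HEADERS')
    if env_headers:
        # Assumed format: comma-separated Key=Value pairs.
        for pair in env_headers.split(','):
            key, _, value = pair.partition('=')
            if key.strip() and value.strip():
                headers[key.strip()] = value.strip()
    session_id = os.environ.get('X_SESSION_ID')
    if session_id:
        headers['X-Session-Id'] = session_id  # assumed header name
    return headers or None
```

Each factory would then only need `default_headers=build_custom_headers(config.extra_headers)` when constructing the `AsyncOpenAI` client, so a fix lands in one place.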


No description provided.