feat: Excel search improvements (multi-keyword + row context)#59

Closed
k-ibaraki wants to merge 7 commits into main from feat/issues-55-56-search-improvements

Conversation

@k-ibaraki

Member

Summary

  • ✅ Multi-keyword OR search with comma-separated keywords
  • ✅ include_surrounding_cells parameter for getting row context in one API call
  • ✅ Comprehensive test coverage and documentation

Changes

  • Add include_surrounding_cells: bool = False parameter to search
  • Implement comma-separated multi-keyword OR search
  • Update tool description for better LLM usability
  • Add English and Japanese documentation

Test Plan

  • All existing tests pass (backward compatibility)
  • New test methods for include_surrounding_cells feature
  • New test methods for multi-keyword search
  • Quality checks pass (ruff, ty)

Performance Impact

  • API calls: N+1 → 1 (96% reduction with include_surrounding_cells=True)
  • Default behavior unchanged (backward compatible)

Closes #55
Closes #56

🤖 Generated with Claude Code

k-ibaraki and others added 2 commits February 10, 2026 23:52
- Add include_surrounding_cells parameter to get row context in search
  - Reduces API calls from N+1 to 1 when row data needed
  - Default False for backward compatibility

- Add comma-separated multi-keyword OR search
  - Example: query='budget,forecast'
  - Backward compatible with single keyword

- Update tool description for LLM usability
- Add comprehensive test coverage
- Fix dictionary iteration issue in _scan_sheet

Resolves #55, #56

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Document multi-keyword OR search usage
- Document include_surrounding_cells parameter
- Add search best practices and examples
- Update both English and Japanese documentation

Related to #55, #56

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 10, 2026 14:53
@gemini-code-assist

Contributor

Summary of Changes

Hello @k-ibaraki, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Excel search capabilities by introducing multi-keyword OR search functionality and an option to retrieve full row context for matching cells. These improvements aim to make the sharepoint_excel tool more powerful and efficient, particularly for LLM interactions, by reducing the number of API calls needed to gather comprehensive data around search results.

Highlights

  • Multi-keyword OR Search: Implemented the ability to search for multiple keywords using a comma-separated string, applying OR logic.
  • Row Context Retrieval: Added a new include_surrounding_cells parameter to the search function, allowing users to retrieve the entire row data for each matching cell in a single API call.
  • API Efficiency Improvement: The include_surrounding_cells feature significantly reduces API calls from N+1 to 1 when row context is required, improving performance.
  • Enhanced Documentation: Updated both English and Japanese documentation, including READMEs and usage guides, to reflect the new search capabilities and parameters.
  • Comprehensive Testing: Introduced new test cases to ensure the correct functionality and backward compatibility of both multi-keyword search and row context retrieval.

Changelog
  • README.md
    • Updated search mode description to detail multi-keyword OR search and row context.
  • README_ja.md
    • Updated search mode description in Japanese to detail multi-keyword OR search and row context.
  • docs/usage.md
    • Added include_surrounding_cells parameter to the sharepoint_excel tool documentation.
    • Updated query parameter description to mention comma-separated OR search.
    • Added new sections demonstrating advanced search features, including multiple keyword search, search with row context, and combining both.
  • docs/usage_ja.md
    • Added include_surrounding_cells parameter to the sharepoint_excel tool documentation in Japanese.
    • Updated query parameter description in Japanese to mention comma-separated OR search.
    • Added new sections in Japanese demonstrating advanced search features.
  • src/server.py
    • Added include_surrounding_cells as a new boolean parameter to the sharepoint_excel function.
    • Modified the query parameter's docstring to clarify multi-keyword OR search support.
    • Passed the new include_surrounding_cells parameter to the parser.search_cells method.
    • Updated the tool's description to reflect the new search features and their performance benefits.
  • src/sharepoint_excel.py
    • Added include_surrounding_cells parameter to the search_cells method signature and its docstring.
    • Modified the _scan_sheet method to parse comma-separated queries into multiple keywords for OR logic.
    • Implemented logic within _scan_sheet to retrieve and include full row_data for matching cells when include_surrounding_cells is true.
    • Converted sheet._cells.values() to a list before iteration to prevent dictionary modification errors during row data retrieval.
  • tests/test_server.py
    • Updated the test_excel_search_mode test to explicitly pass include_surrounding_cells=False to the mocked search_cells method.
  • tests/test_sharepoint_excel.py
    • Added private helper methods _create_test_data_excel and _create_search_test_excel for generating test Excel files.
    • Introduced new test cases for test_search_with_surrounding_cells_disabled, test_search_with_surrounding_cells_enabled, test_search_with_surrounding_cells_multiple_matches, and test_search_with_surrounding_cells_empty_cells.
    • Added new test cases for multi-keyword search: test_search_single_keyword_backward_compatible, test_search_multiple_keywords_comma_separated, test_search_multiple_keywords_with_spaces, test_search_multiple_keywords_no_match, test_search_multiple_keywords_across_sheets, and test_search_multiple_keywords_with_surrounding_cells.
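
The `list(...)` conversion noted for src/sharepoint_excel.py guards against a general Python behavior: inserting keys into a dict while iterating over it raises `RuntimeError`. The sketch below reproduces that with a plain dict as a hypothetical stand-in for the sheet's internal cell store; `get_or_create` is an illustrative helper, not openpyxl's API.

```python
# A plain dict stands in for the sheet's internal cell store (assumption).
cells = {(1, 1): "budget", (1, 3): "forecast"}


def get_or_create(store, row, col):
    """Hypothetical accessor that creates a missing cell on first access."""
    return store.setdefault((row, col), None)


def scan_unsafe(store):
    # Iterates the live dict while the loop body may insert new keys.
    for (row, _col) in store:
        get_or_create(store, row, 2)


def scan_safe(store):
    # Snapshots the keys first, so later inserts do not affect the iteration.
    for (row, _col) in list(store):
        get_or_create(store, row, 2)


try:
    scan_unsafe(dict(cells))
    unsafe_failed = False
except RuntimeError:
    # "dictionary changed size during iteration"
    unsafe_failed = True

safe_store = dict(cells)
scan_safe(safe_store)
```

The snapshot costs one extra list of references per sheet, which is usually small next to the cost of the scan itself.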

Copilot AI left a comment

Contributor

Pull request overview

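This PR improves both the usability and the retrieval efficiency of Excel search, and the direction of reducing operational load (extra API calls) is very good. It keeps the existing search mode intact while adding multi-keyword support and row-context retrieval as opt-in features.

Changes:

  • Add comma-separated multi-keyword OR search to Excel search
  • Add include_surrounding_cells, which includes the cells of the same row in each search result
  • Update the tool description, README/usage docs (Japanese and English), and tests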

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

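File | Description
--- | ---
src/sharepoint_excel.py | Implements the search API extension (multiple keywords, row data attached to matches)
src/server.py | Extends the MCP tool function's arguments, call, and description
tests/test_sharepoint_excel.py | Adds unit tests covering the new features (row context, multiple keywords)
tests/test_server.py | Reflects the new parameter in the arguments passed from the server layer
docs/usage.md | Adds usage notes for the new search options (English)
docs/usage_ja.md | Adds usage notes for the new search options (Japanese)
README.md | Updates the search mode feature description (English)
README_ja.md | Updates the search mode feature description (Japanese)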

Comment on lines +511 to +526
# First match (D2: "在庫あり", i.e. "in stock")
match1 = result["matches"][0]
assert match1["coordinate"] == "D2"
assert match1["value"] == "在庫あり"
assert "row_data" in match1
assert len(match1["row_data"]) == 4
assert match1["row_data"][1]["value"] == "商品A"

# Second match (D4: "在庫わずか", i.e. "low stock")
match2 = result["matches"][1]
assert match2["coordinate"] == "D4"
assert match2["value"] == "在庫わずか"
assert "row_data" in match2
assert len(match2["row_data"]) == 4
assert match2["row_data"][1]["value"] == "商品C"

Copilot AI Feb 10, 2026

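This test assumes that result["matches"][0] / [1] come back in the order D2 → D4, but the order of search results depends on implementation details (openpyxl's internal dictionary order and the traversal order) and can change. If the API does not guarantee ordering, the test will be more stable if it looks up each match by coordinate (or sorts by coordinate before comparing).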

Suggested change
- # First match (D2: "在庫あり")
- match1 = result["matches"][0]
- assert match1["coordinate"] == "D2"
- assert match1["value"] == "在庫あり"
- assert "row_data" in match1
- assert len(match1["row_data"]) == 4
- assert match1["row_data"][1]["value"] == "商品A"
- # Second match (D4: "在庫わずか")
- match2 = result["matches"][1]
- assert match2["coordinate"] == "D4"
- assert match2["value"] == "在庫わずか"
- assert "row_data" in match2
- assert len(match2["row_data"]) == 4
- assert match2["row_data"][1]["value"] == "商品C"
+ # Index the matches by coordinate so each one can be looked up directly
+ matches_by_coord = {m["coordinate"]: m for m in result["matches"]}
+ # Match at D2 ("在庫あり")
+ match_d2 = matches_by_coord["D2"]
+ assert match_d2["value"] == "在庫あり"
+ assert "row_data" in match_d2
+ assert len(match_d2["row_data"]) == 4
+ assert match_d2["row_data"][1]["value"] == "商品A"
+ # Match at D4 ("在庫わずか")
+ match_d4 = matches_by_coord["D4"]
+ assert match_d4["value"] == "在庫わずか"
+ assert "row_data" in match_d4
+ assert len(match_d4["row_data"]) == 4
+ assert match_d4["row_data"][1]["value"] == "商品C"

Copilot uses AI. Check for mistakes.
Comment thread src/sharepoint_excel.py Outdated
Comment on lines +336 to +344
if include_surrounding_cells:
    row_cells = sheet[cell.row]
    match_entry["row_data"] = [
        {
            "value": self._serialize_value(c.value),
            "coordinate": c.coordinate,
        }
        for c in row_cells
    ]

Copilot AI Feb 10, 2026

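When include_surrounding_cells=True, the whole row is fetched with row_cells = sheet[cell.row], but in openpyxl this access can materialize cells up to max_column (empty cells also become Cell objects), sharply increasing memory and CPU use. On large sheets, or on wide ranges that carry only formatting, even a single search hit can produce a huge row_data. It is safer to add safeguards such as (1) building row_data only from existing cells, or restricting it to the needed column range, and (2) truncating at a limit equivalent to the existing config.excel_max_data_cols.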
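
As a rough illustration of safeguards (1) and (2), the sketch below builds row data only from cells that already exist and drops anything beyond a column cap. It uses a plain dict keyed by (row, column) as a hypothetical stand-in for the sheet's cell store; `build_row_data`, `column_letter`, and `max_cols` are illustrative names, with `max_cols` playing the role of an `excel_max_data_cols`-style limit.

```python
from typing import Any

# Hypothetical stand-in for a sparse cell store keyed by (row, column).
CellStore = dict[tuple[int, int], Any]


def column_letter(col: int) -> str:
    """Convert a 1-based column index to an Excel-style letter (1 -> A, 27 -> AA)."""
    letters = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def build_row_data(cells: CellStore, row: int, max_cols: int) -> list[dict[str, Any]]:
    """Collect only cells that already exist in `row`, up to column `max_cols`."""
    existing = sorted(
        (
            (col, value)
            for (r, col), value in cells.items()
            if r == row and col <= max_cols and value is not None
        ),
        key=lambda item: item[0],
    )
    return [
        {"coordinate": f"{column_letter(col)}{row}", "value": value}
        for col, value in existing
    ]


cells: CellStore = {
    (2, 1): 1,
    (2, 2): "Item A",
    (2, 4): "In stock",
    (2, 5000): "stray value far to the right",
    (3, 1): 2,
}
row_data = build_row_data(cells, row=2, max_cols=100)
```

Because nothing is read through row indexing, no empty cells are created, and the stray value in column 5000 does not widen the result.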

Comment thread src/sharepoint_excel.py Outdated
include_surrounding_cells: If True, include all cells in the same row as the matched cell
"""
# Parse the comma-separated query into multiple keywords
keywords = [kw.strip() for kw in query.split(",") if kw.strip()]

Copilot AI Feb 10, 2026

keywords = [kw.strip() for kw in query.split(",") if kw.strip()] can produce an empty list (for example, query="," or " , "). In that case any(...) is always False, so a mistyped query silently succeeds with zero hits. The API would be clearer if it flagged invalid queries explicitly, for example by raising ValueError when keywords is empty, or by adding warnings to the JSON returned by search_cells.

Suggested change
- keywords = [kw.strip() for kw in query.split(",") if kw.strip()]
+ keywords = [kw.strip() for kw in query.split(",") if kw.strip()]
+ # If no keyword can be extracted, treat the query as invalid (for example, a typo)
+ if not keywords:
+     raise ValueError(
+         "Invalid search query. Specify at least one valid keyword other than a comma."
+     )


@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces valuable improvements to the Excel search functionality, including multi-keyword OR search and the ability to fetch row context for matches. However, the implementation of the include_surrounding_cells feature creates a significant Denial of Service (DoS) vulnerability due to a lack of resource limits and deduplication logic, potentially leading to excessive memory consumption and data duplication. The search_cells method also lacks crucial data size limits. Additionally, there is an opportunity to refactor duplicated code to improve maintainability.

Comment thread src/sharepoint_excel.py
Comment on lines +336 to +346
if include_surrounding_cells:
    row_cells = sheet[cell.row]
    match_entry["row_data"] = [
        {
            "value": self._serialize_value(c.value),
            "coordinate": c.coordinate,
        }
        for c in row_cells
    ]

matches.append(match_entry)

security-high high

The implementation of include_surrounding_cells can lead to significant memory exhaustion. For every matching cell, the entire row's data is fetched and added to the response. If multiple cells in the same row match the query, the entire row is duplicated in the response for each match. In a wide sheet (e.g., 2000 columns), a few hundred matches can result in a massive JSON response that could crash the server or the client.

Remediation:

  1. Deduplicate row data so that each row is only included once per search result, even if multiple cells in that row match.
  2. Implement a hard limit on the total number of matches or cells returned in a single search request.
  3. Apply the excel_max_data_cols limit to the row context fetching.
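
Remediation steps 1 and 2 can be sketched as follows, assuming a response shape in which row data lives in a separate mapping keyed by row number instead of being copied into every match. The function name, the `rows` and `truncated` keys, and the `max_matches` parameter are all hypothetical and not taken from the PR.

```python
from typing import Any


def group_matches_by_row(
    matches: list[dict[str, Any]],
    rows: dict[int, list[Any]],
    max_matches: int,
) -> dict[str, Any]:
    """Return capped matches plus each matching row's data exactly once."""
    kept = matches[:max_matches]
    row_data: dict[int, list[Any]] = {}
    for match in kept:
        row = match["row"]
        if row not in row_data:  # a row is stored once, however many cells matched
            row_data[row] = rows[row]
    return {
        "matches": kept,
        "rows": row_data,
        "truncated": len(matches) > max_matches,
    }


all_rows = {2: ["Item A", "In stock", "In stock soon"], 5: ["Item B", "In stock"]}
found = [
    {"coordinate": "B2", "row": 2},
    {"coordinate": "C2", "row": 2},
    {"coordinate": "B5", "row": 5},
]
result = group_matches_by_row(found, all_rows, max_matches=2)
```

With this shape, response size grows with the number of distinct matching rows instead of the number of matching cells.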

Comment thread src/sharepoint_excel.py
Comment on lines +362 to +372
if include_surrounding_cells:
    row_cells = sheet[cell.row]
    match_entry["row_data"] = [
        {
            "value": self._serialize_value(c.value),
            "coordinate": c.coordinate,
        }
        for c in row_cells
    ]

matches.append(match_entry)

security-high high

This fallback loop for search_cells also suffers from the same resource exhaustion vulnerability as the primary loop. It duplicates entire row data for every match when include_surrounding_cells is True, without any limits on the number of matches or row width.

Comment thread src/sharepoint_excel.py
- for cell in sheet._cells.values():
+ # Note: converting to a list before iterating avoids errors caused by
+ # the sheet's internal dictionary changing while row_data is fetched
+ for cell in list(sheet._cells.values()):

security-medium medium

The search_cells method lacks crucial data size limits (excel_max_data_rows, excel_max_data_cols) that are enforced in parse_to_json, creating a Denial of Service (DoS) vulnerability with large workbooks. It is recommended to apply these limits to prevent excessive resource consumption. Additionally, the logic for creating a match entry is duplicated across if hasattr(sheet, "_cells") and its else block, which could be refactored into a helper function to improve maintainability.
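
One possible shape for the suggested helper is sketched below. It assumes both scan paths can supply a callable that produces the row data on demand; `make_match_entry` and `row_loader` are hypothetical names, not code from the PR.

```python
from typing import Any, Callable, Optional

# Callable that returns the row context for a match when invoked (assumption).
RowLoader = Callable[[], list[dict[str, Any]]]


def make_match_entry(
    coordinate: str, value: Any, row_loader: Optional[RowLoader] = None
) -> dict[str, Any]:
    """Build one match entry; row data is attached only when a loader is given."""
    entry: dict[str, Any] = {"coordinate": coordinate, "value": value}
    if row_loader is not None:
        entry["row_data"] = row_loader()
    return entry


# Both the `_cells` path and the fallback path could call the same helper.
plain = make_match_entry("D2", "In stock")
with_row = make_match_entry(
    "D2", "In stock", row_loader=lambda: [{"coordinate": "A2", "value": 1}]
)
```

Keeping entry construction in one place means any size limit added later only has to be applied once.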

k-ibaraki and others added 2 commits February 11, 2026 00:07
Based on real-world usage feedback, OR search was causing poor results
(23 hits → 7 attempts → 26,600 tokens). AND search provides better
precision for narrowing down results.

Changes:
- Replace comma-separated OR search with space-separated AND search
- Update keyword parsing: split(",") → split()
- Change match logic: any() → all()
- Update all tests and documentation

Example:
- Before: query="予算,利益" ("budget,profit"; OR, finds cells with either keyword)
- After: query="予算 報告" ("budget report"; AND, finds cells with both keywords)

Use case: "簾舞 連絡先" ("簾舞" is a place name, "連絡先" means "contact information") finds cells containing both keywords in one call,
instead of getting too many irrelevant results with OR search.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Update all documentation to reflect the change from OR to AND search:
- README.md / README_ja.md: Update feature description
- docs/usage.md / docs/usage_ja.md: Update examples and best practices

Examples changed:
- "budget,forecast" (OR) → "budget report" (AND)
- "予算,見積" ("budget,estimate"; OR) → "予算 報告" ("budget report"; AND)

Best practice guidance added:
- Start with single keyword
- Add more keywords to narrow down if needed

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@k-ibaraki

Member Author

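Revised based on feedback

After feedback from real-world agent use that OR search was hard to work with, I changed it to AND search.

Problems

  • OR search: 23 hits → 7 attempts → 26,600 tokens consumed
  • Combining unrelated concepts (停電,対応業者,連絡先: "power outage, contractor, contact information") made the results explode

Fix

OR search (removed) → AND search (implemented)

# Before (OR search)
query="予算,利益"  # contains either keyword → too many results

# After (AND search)
query="簾舞 連絡先"  # contains both keywords → narrowed down in one call

Implementation changes

  • Keyword parsing: split(",") → split()
  • Match logic: any() → all()
  • Updated all tests and documentation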
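
The two changes listed above (keyword parsing and match logic) can be sketched side by side. This is a minimal illustration that assumes plain, case-sensitive substring matching against a cell's text; the function names are hypothetical and the PR's actual matching rules are not shown in this thread.

```python
def matches_or(cell_text: str, query: str) -> bool:
    """Comma-separated keywords; True if ANY keyword occurs in the cell text."""
    keywords = [kw.strip() for kw in query.split(",") if kw.strip()]
    return any(kw in cell_text for kw in keywords)


def matches_and(cell_text: str, query: str) -> bool:
    """Whitespace-separated keywords; True if ALL keywords occur in the cell text."""
    keywords = query.split()
    # all([]) is True, so an empty keyword list must be rejected explicitly.
    return bool(keywords) and all(kw in cell_text for kw in keywords)


cells = ["budget report 2025", "budget forecast", "profit summary"]
or_hits = [c for c in cells if matches_or(c, "budget,profit")]
and_hits = [c for c in cells if matches_and(c, "budget report")]
```

Here the OR query matches all three cells while the AND query matches only the first, which is the narrowing effect described above.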

Best practices

  1. Search with a single keyword first
  2. If there are too many results, add keywords to narrow them down with AND search

✅ All tests pass
✅ Quality checks pass

k-ibaraki and others added 3 commits February 11, 2026 00:21
- Add empty query validation (ValueError on empty/whitespace-only)
- Add error handling for include_surrounding_cells row_data retrieval
- Add match count limits (1000 max, 500 warning threshold)
- Add warnings array to search response for actionable feedback
- Add comprehensive test coverage for edge cases

Improves production robustness based on agent feedback

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Document empty query restriction
- Document 1000 match limit and 500 warning threshold
- Add error handling documentation (row_data_error field)
- Add warnings array documentation
- Update workflow recommendations for large result sets

Helps users write effective search queries

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Add query guidelines (empty query, specific keywords)
- Document performance considerations (1000 limit, 500 warning)
- Add error handling examples
- Add good/bad query examples
- Add warning handling examples

Helps users write effective search queries

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
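
The validation and limits described in these three commits might look roughly like the sketch below. The 1000 and 500 figures and the `matches` and `warnings` keys come from the commit messages and tests; the function names, the warning wording, and whether the threshold comparison is inclusive are assumptions.

```python
MAX_MATCHES = 1000    # hard cap, per the commit message
WARN_THRESHOLD = 500  # warning threshold, per the commit message


def parse_query(query: str) -> list[str]:
    """Split a query on whitespace; reject empty or whitespace-only input."""
    keywords = query.split()
    if not keywords:
        raise ValueError("query must contain at least one keyword")
    return keywords


def finalize_matches(matches: list[dict]) -> dict:
    """Cap the match list and attach human-readable warnings."""
    warnings = []
    total = len(matches)
    if total > MAX_MATCHES:
        warnings.append(
            f"{total} matches found; only the first {MAX_MATCHES} are returned."
        )
    elif total >= WARN_THRESHOLD:  # inclusive comparison is an assumption
        warnings.append(f"{total} matches found; consider adding keywords.")
    return {"matches": matches[:MAX_MATCHES], "warnings": warnings}


small = finalize_matches([{"coordinate": "A1"}] * 10)
large = finalize_matches([{"coordinate": "A1"}] * 1200)
```

Returning warnings alongside the results, instead of failing, lets a calling agent decide whether to refine the query.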
@k-ibaraki

Member Author

Accuracy actually got worse, so I am discarding this.

@k-ibaraki k-ibaraki closed this Feb 10, 2026
@k-ibaraki k-ibaraki deleted the feat/issues-55-56-search-improvements branch February 10, 2026 15:28

Development

Successfully merging this pull request may close these issues.

  • Improve search functionality and clarify usage guidelines
  • Add include_row_data parameter to Excel search for efficient data retrieval

2 participants