diff --git a/README.md b/README.md
index edf1a7a..0acad93 100644
--- a/README.md
+++ b/README.md
@@ -33,8 +33,10 @@ Two authentication methods are supported:
   - Automatic method selection for SharePoint vs OneDrive files
 - **sharepoint_excel**
   - Read or search Excel files in SharePoint
-  - Search mode: find cells containing specific text with `query` parameter
-  - Read mode: get data from specific sheets/ranges with `sheet` and `cell_range` parameters
+  - **Search mode**: find cells containing specific text with `query` parameter
+    - **Multiple keyword AND search**: use space-separated keywords (e.g., `"budget forecast"` finds cells containing both)
+    - **Row context**: set `include_surrounding_cells=True` to get entire row data (reduces API calls from N+1 to 1)
+  - **Read mode**: get data from specific sheets/ranges with `sheet` and `cell_range` parameters
     - **Automatic header inclusion**: when `cell_range` is specified, frozen rows (headers) are automatically included by default
     - Set `include_frozen_rows=False` to get only the specified range
     - For sheets with `frozen_rows=0`, use `expand_axis_range=True` to include row 1 (for columns) or column A (for rows)
diff --git a/README_ja.md b/README_ja.md
index 2a1c61e..d997e56 100644
--- a/README_ja.md
+++ b/README_ja.md
@@ -33,8 +33,10 @@ stdioとHTTPの両方のトランスポートに対応しています。
   - SharePoint/OneDriveファイルに応じた自動メソッド選択
 - **sharepoint_excel**
   - SharePoint上のExcelファイルの読み取りと検索
-  - 検索モード: `query`パラメータで特定テキストを含むセルを検索
-  - 読み取りモード: `sheet`と`cell_range`パラメータで特定シート/範囲を取得
+  - **検索モード**: `query`パラメータで特定テキストを含むセルを検索
+    - **複数キーワードAND検索**: スペース区切りでキーワード指定(例: `"予算 報告"`で両方を含むセルを検索)
+    - **行コンテキスト**: `include_surrounding_cells=True`で行全体のデータを取得(API呼び出しをN+1から1に削減)
+  - **読み取りモード**: `sheet`と`cell_range`パラメータで特定シート/範囲を取得
     - **ヘッダー自動追加**: `cell_range`指定時、デフォルトで固定行(ヘッダー)を自動的に含める
     - `include_frozen_rows=False`を指定すると、指定範囲のみを取得
     - `frozen_rows=0`のシートでは、`expand_axis_range=True`で1行目(列の場合)またはA列(行の場合)から自動取得
diff --git a/docs/usage.md b/docs/usage.md
index 6ad7611..8ad9d39 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -205,9 +205,10 @@ The `sharepoint_excel` tool allows you to read and search Excel files in SharePo
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
 | `file_path` | str | Required | Excel file path |
-| `query` | str \| None | None | Search keyword (enables search mode) |
+| `query` | str \| None | None | Search keyword(s); enables search mode. Space-separated keywords are combined with AND |
 | `sheet` | str \| None | None | Sheet name (get specific sheet only) |
 | `cell_range` | str \| None | None | Cell range (e.g., "A1:D10") |
+| `include_surrounding_cells` | bool | False | Get entire row data for each match in search mode |
 
 ### Basic Workflow
 
@@ -275,6 +276,141 @@ result = sharepoint_excel(
 )
 ```
 
+### Advanced Search Features
+
+#### Multiple Keyword Search (AND Logic)
+
+Search for cells containing all of the specified keywords (space-separated):
+
+```python
+# Find cells containing both "Budget" AND "Report" (matching is case-sensitive)
+result = sharepoint_excel(
+    file_path="/sites/finance/Shared Documents/report.xlsx",
+    query="Budget Report"
+)
+```
+
+**Response:**
+```json
+{
+  "file_path": "/sites/finance/Shared Documents/report.xlsx",
+  "mode": "search",
+  "query": "Budget Report",
+  "match_count": 2,
+  "matches": [
+    {"sheet": "Sheet1", "coordinate": "A1", "value": "Budget Report 2024"},
+    {"sheet": "Summary", "coordinate": "C3", "value": "Annual Budget Report"}
+  ]
+}
+```
+
+**Use cases:**
+- Narrow down search results: `"簾舞 連絡先"` (a place name plus "contact") finds cells containing both keywords
+- Best practice: start with a single keyword and add more if the results are too broad
+
+#### Search with Row Context
+
+Get entire row data for each match in a single API call:
+
+```python
+# Search and get row context
+result = sharepoint_excel(
+    file_path="/sites/finance/Shared Documents/report.xlsx",
+    query="Total Revenue",
+    include_surrounding_cells=True
+)
+```
+
+**Response structure:**
+```json
+{
+  "file_path": "/sites/finance/Shared Documents/report.xlsx",
+  "mode": "search",
+  "query": "Total Revenue",
+  "match_count": 1,
+  "matches": [
+    {
+      "sheet": "Sheet1",
+      "coordinate": "B10",
+      "value": "Total Revenue",
+      "row_data": [
+        {"value": "2024", "coordinate": "A10"},
+        {"value": "Total Revenue", "coordinate": "B10"},
+        {"value": 1500000, "coordinate": "C10"},
+        {"value": "USD", "coordinate": "D10"}
+      ]
+    }
+  ]
+}
+```
+
+**When to use:**
+- `include_surrounding_cells=False` (default): Locate cells only
+- `include_surrounding_cells=True`: Get row context immediately, without N follow-up reads (one call instead of N+1)
+
+#### Combining Multiple Keywords with Row Context
+
+```python
+# Search with multiple keywords (AND) and get row context
+result = sharepoint_excel(
+    file_path="/sites/finance/Shared Documents/report.xlsx",
+    query="revenue forecast",
+    include_surrounding_cells=True
+)
+```
+
+#### Search Best Practices and Limitations
+
+**Query Guidelines:**
+- Query cannot be empty or whitespace-only
+- Use specific keywords to reduce match count
+- Combine multiple keywords with spaces for AND search (e.g., `"budget 2024"`); matching is a case-sensitive substring match
+
+**Performance Considerations:**
+- Search is limited to **1000 matches maximum**
+- A warning is added at **500+ matches**; consider refining the query
+- Use `include_surrounding_cells=True` only when row context is needed
+- For large result sets, narrow down with more specific keywords
+
+**Error Handling:**
+- If row data retrieval fails, the match is still returned, with a `row_data_error` field instead of `row_data`
+- Check the response's `warnings` array, if present, for actionable feedback
+
+**Examples:**
+
+```python
+# Good: Specific keywords
+result = sharepoint_excel(
+    file_path="/sites/finance/Shared Documents/report.xlsx",
+    query="Q4 revenue forecast"
+)
+
+# Avoid: too generic (may hit the 1000-match cap)
+result = sharepoint_excel(
+    file_path="/sites/finance/Shared Documents/report.xlsx",
+    query="data"
+)
+
+# Handle large results
+import json
+result_json = sharepoint_excel(
+    file_path="/sites/finance/Shared Documents/report.xlsx",
+    query="budget"
+)
+result = json.loads(result_json)
+
+if "warnings" in result:
+    print("Search feedback:", result["warnings"])
+    # Refine query based on feedback
+
+# Row context with a narrow query keeps the response small
+result = sharepoint_excel(
+    file_path="/sites/finance/Shared Documents/report.xlsx",
+    query="Total Revenue Q4",
+    include_surrounding_cells=True
+)
+```
+
 ### JSON Output Format
 
 #### Read Mode (Default)
diff --git a/docs/usage_ja.md b/docs/usage_ja.md
index 16f0329..f91b594 100644
--- a/docs/usage_ja.md
+++ b/docs/usage_ja.md
@@ -205,9 +205,10 @@ results = sharepoint_docs_search(
 | パラメータ | 型 | デフォルト | 説明 |
 |-----------|------|---------|-------------|
 | `file_path` | str | 必須 | Excelファイルのパス |
-| `query` | str \| None | None | 検索キーワード(検索モードを有効化) |
+| `query` | str \| None | None | 検索キーワード(検索モードを有効化、スペース区切りでAND検索) |
 | `sheet` | str \| None | None | シート名(特定シートのみ取得) |
 | `cell_range` | str \| None | None | セル範囲(例: "A1:D10") |
+| `include_surrounding_cells` | bool | False | 検索モード時、マッチした行の全セルを取得 |
 
 ### 基本的なワークフロー
 
@@ -275,6 +276,141 @@ result = sharepoint_excel(
 )
 ```
 
+### 高度な検索機能
+
+#### 複数キーワード検索(AND論理)
+
+スペース区切りで複数のキーワードを指定して、全てに一致するセルを検索:
+
+```python
+# "予算" AND "報告" の両方を含むセルを検索
+result = sharepoint_excel(
+    file_path="/sites/finance/Shared Documents/report.xlsx",
+    query="予算 報告"
+)
+```
+
+**レスポンス例:**
+```json
+{
+  "file_path": "/sites/finance/Shared Documents/report.xlsx",
+  "mode": "search",
+  "query": "予算 報告",
+  "match_count": 2,
+  "matches": [
+    {"sheet": "Sheet1", "coordinate": "A1", "value": "2024年度 予算 報告書"},
+    {"sheet": "Summary", "coordinate": "C3", "value": "年次予算報告"}
+  ]
+}
+```
+
+**使用例:**
+- 検索結果を絞り込む: `"簾舞 連絡先"`で両方のキーワードを含むセルを検索
+- ベストプラクティス: まず単一キーワードで検索し、結果が多すぎる場合はキーワードを追加
+
+#### 行コンテキスト付き検索
+
+マッチしたセルと同じ行の全データを1回のAPI呼び出しで取得:
+
+```python
+# 検索と行コンテキスト取得
+result = sharepoint_excel(
+    file_path="/sites/finance/Shared Documents/report.xlsx",
+    query="売上合計",
+    include_surrounding_cells=True
+)
+```
+
+**レスポンス構造:**
+```json
+{
+  "file_path": "/sites/finance/Shared Documents/report.xlsx",
+  "mode": "search",
+  "query": "売上合計",
+  "match_count": 1,
+  "matches": [
+    {
+      "sheet": "Sheet1",
+      "coordinate": "B10",
+      "value": "売上合計",
+      "row_data": [
+        {"value": "2024", "coordinate": "A10"},
+        {"value": "売上合計", "coordinate": "B10"},
+        {"value": 1500000, "coordinate": "C10"},
+        {"value": "円", "coordinate": "D10"}
+      ]
+    }
+  ]
+}
+```
+
+**使い分け:**
+- `include_surrounding_cells=False`(デフォルト): セル位置の特定のみ
+- `include_surrounding_cells=True`: 即座にコンテキスト取得(API呼び出しをN+1回から1回に削減)
+
+#### 複数キーワードと行コンテキストの組み合わせ
+
+```python
+# 複数キーワード検索(AND) + 行コンテキスト取得
+result = sharepoint_excel(
+    file_path="/sites/finance/Shared Documents/report.xlsx",
+    query="売上 予測",
+    include_surrounding_cells=True
+)
+```
+
+#### 検索のベストプラクティスと制限事項
+
+**クエリのガイドライン:**
+- 空文字列や空白のみのクエリは指定できません
+- マッチ数を減らすため、具体的なキーワードを使用してください
+- スペース区切りで複数キーワードを指定するとAND検索になります(例: `"予算 2024"`)。大文字と小文字は区別されます
+
+**パフォーマンスに関する考慮事項:**
+- 検索結果は**最大1000件**に制限されています
+- **500件以上**の場合、クエリの絞り込みを推奨する警告が`warnings`に追加されます
+- `include_surrounding_cells=True` は行コンテキストが必要な場合のみ使用してください
+- 大量の結果が返される場合は、より具体的なキーワードで絞り込んでください
+
+**エラーハンドリング:**
+- 行データ取得に失敗した場合も、マッチ自体は `row_data_error` フィールド付きで返されます
+- レスポンスの `warnings` 配列で実用的なフィードバックを確認できます
+
+**使用例:**
+
+```python
+# 良い例: 具体的なキーワード
+result = sharepoint_excel(
+    file_path="/sites/finance/Shared Documents/report.xlsx",
+    query="Q4 売上 予測"
+)
+
+# 避けるべき: 汎用的すぎる(上限の1000件に達する可能性)
+result = sharepoint_excel(
+    file_path="/sites/finance/Shared Documents/report.xlsx",
+    query="データ"
+)
+
+# 大量結果の処理
+import json
+result_json = sharepoint_excel(
+    file_path="/sites/finance/Shared Documents/report.xlsx",
+    query="予算"
+)
+result = json.loads(result_json)
+
+if "warnings" in result:
+    print("検索フィードバック:", result["warnings"])
+    # フィードバックに基づいてクエリを絞り込む
+
+# 安全な行コンテキスト使用
+result = sharepoint_excel(
+    file_path="/sites/finance/Shared Documents/report.xlsx",
+    query="合計売上 Q4",
+    include_surrounding_cells=True
+)
+```
+
 ### JSON出力形式
 
 #### 読み取りモード(デフォルト)
diff --git a/src/server.py b/src/server.py
index a36c4ab..a4423ef 100644
--- a/src/server.py
+++ b/src/server.py
@@ -456,6 +456,7 @@ def sharepoint_excel(
     include_frozen_rows: bool = True,
     include_cell_styles: bool = False,
     expand_axis_range: bool = False,
+    include_surrounding_cells: bool = False,
     ctx: Context | None = None,
 ) -> str:
     """
@@ -463,7 +464,7 @@ def sharepoint_excel(
 
     Args:
         file_path: Excelファイルのパス
-        query: 検索キーワード(指定すると検索モード)
+        query: 検索キーワード(スペース区切りで複数指定可能、AND検索)
         sheet: シート名(特定シートのみ取得)
         cell_range: セル範囲(例: "A1:D10")
             - 推奨形式: "A1:D10"(開始セル:終了セル)
@@ -478,6 +479,9 @@ def sharepoint_excel(
         expand_axis_range: 単一列/行の部分範囲を開始側に自動拡張(default: false)
             True: 例 "J50:J100" → "J1:J100"(行1に拡張)
             frozen_rows=0でヘッダー文脈が不明な場合に使用
+        include_surrounding_cells: 検索モード時、マッチした行の全セルを含める(default: false)
+            True: マッチしたセルと同じ行の全セルデータを取得(API呼び出しをN+1回から1回に削減)
+            False: マッチしたセルの座標と値のみ返す
         ctx: FastMCP context (injected automatically)
 
     Returns:
@@ -497,7 +501,12 @@ def sharepoint_excel(
 
         # 検索モード
         if query:
-            return parser.search_cells(file_path, query, sheet_name=sheet)
+            return parser.search_cells(
+                file_path,
+                query,
+                sheet_name=sheet,
+                include_surrounding_cells=include_surrounding_cells,
+            )
 
         # 読み取りモード
         return parser.parse_to_json(
@@ -544,22 +553,36 @@ def register_tools():
     mcp.tool(
         description=(
             "Read or search Excel files in SharePoint. "
-            "Search mode: use 'query' parameter to find cells containing specific text (returns cell locations). "
-            "Read mode: use 'sheet' and 'cell_range' parameters to retrieve data from specific sections. "
-            "When cell_range is specified with include_frozen_rows=True (default), frozen rows are automatically "
-            "included even if they are outside the specified range. frozen_rows indicates the number of header rows "
-            "frozen at the top of the sheet (typically column headers). "
-            "Response includes cell data in 'rows' (value and coordinate) and structural information "
-            "(sheet name, dimensions, frozen_rows, frozen_cols, freeze_panes when present, merged_ranges when merged cells exist). "
-            "Cell styles (include_cell_styles, default: false): background colors and sizes. Use only for color-coded data extraction. "
-            "Header detection: For sheets with frozen_rows > 0, headers are automatically included with include_frozen_rows=True (default). "
-            "For sheets with frozen_rows=0, headers are not automatically included and context may be unclear. "
-            "ALWAYS read exactly 5 rows for header check: 'A1:Z5' (NOT 'A1:Z50' or more). "
-            "Prefer 'query' search when possible to locate data first. "
-            "Workflow: 1) Search OR read 'A1:Z5' for header check, "
-            "2) Read specific range (include_frozen_rows adds frozen headers automatically), "
-            "3) If frozen_rows=0 and header context is unclear, retry with expand_axis_range=True "
-            "to auto-include row 1 (for columns) or column A (for rows)."
+            # 検索モード
+            "Search mode: use 'query' parameter to find cells containing text. "
+            "Multiple keywords (space-separated) perform a case-sensitive AND search (e.g., 'budget forecast' finds cells with both). "
+            "Query cannot be empty or whitespace-only. "
+            "Set include_surrounding_cells=True to get entire row data for each match "
+            "(default: False returns only the matched cell). "
+            "Reduces API calls from N+1 to 1 when row context is needed. "
+            "WARNING: Search is limited to 1000 matches max. "
+            "If you get 500+ matches, consider refining your query with more specific keywords. "
+            # 読み取りモード
+            "Read mode: use 'sheet' and 'cell_range' parameters to retrieve data. "
+            "When cell_range is specified with include_frozen_rows=True (default), "
+            "frozen rows are automatically included. "
+            # レスポンス構造
+            "Response includes cell data in 'rows' (value and coordinate) and "
+            "structural information (sheet name, dimensions, frozen_rows, etc.). "
+            "Search responses may include a 'warnings' array with actionable feedback. "
+            # エラーハンドリング
+            "Error handling: If row_data retrieval fails with include_surrounding_cells=True, "
+            "the match is still returned with a 'row_data_error' field. "
+            # スタイル情報
+            "Cell styles (include_cell_styles=False by default): background colors and sizes. "
+            # ヘッダー検出
+            "Header detection: For frozen_rows > 0, headers auto-included with include_frozen_rows=True. "
+            "For frozen_rows=0, read 'A1:Z5' for header check (max 5 rows). "
+            # 推奨ワークフロー
+            "Workflow: 1) Search with specific keywords (avoid too generic terms), "
+            "2) If 500+ matches, refine query, "
+            "3) Use include_surrounding_cells=True for context when needed, "
+            "4) Read a specific range if more data is required (if frozen_rows=0 and headers are unclear, retry with expand_axis_range=True)."
         )
     )(sharepoint_excel)
     logging.info("Registered tool: sharepoint_excel")
diff --git a/src/sharepoint_excel.py b/src/sharepoint_excel.py
index 9fe194a..b96126a 100644
--- a/src/sharepoint_excel.py
+++ b/src/sharepoint_excel.py
@@ -18,6 +18,10 @@
 
 logger = logging.getLogger(__name__)
 
+# 検索マッチ数の制限(DoS対策)
+MAX_SEARCH_MATCHES = 1000  # 最大マッチ数
+MAX_SEARCH_MATCHES_WARNING = 500  # 警告閾値
+
 
 class SharePointExcelParser:
     """SharePoint Excelファイル解析クライアント"""
@@ -34,20 +38,34 @@ def search_cells(
         file_path: str,
         query: str,
         sheet_name: str | None = None,
+        include_surrounding_cells: bool = False,
     ) -> str:
         """
         セル内容を検索して該当位置を返す
 
         Args:
             file_path: Excelファイルのパス
-            query: 検索キーワード
+            query: 検索キーワード(スペース区切りで複数指定可能、AND検索)
+                空文字列や空白のみは許可されない
             sheet_name: 検索対象シート名(指定時はまずそのシートを検索し、マッチ0件なら全シート検索にフォールバック)
+            include_surrounding_cells: Trueの場合、マッチしたセルと同じ行の全セルを含める(デフォルト: False)
 
         Returns:
             JSON文字列(マッチしたセルの位置情報)
+
+        Raises:
+            ValueError: queryが空または空白のみの場合
         """
+        # Query validation
+        if not query or not query.strip():
+            raise ValueError(
+                "Search query cannot be empty. "
+                "Please provide at least one keyword to search for."
+            )
+
         logger.info(
-            f"Searching cells in Excel file: {file_path} (query={query}, sheet={sheet_name})"
+            f"Searching cells in Excel file: {file_path} (query={query}, sheet={sheet_name}, "
+            f"include_surrounding_cells={include_surrounding_cells})"
         )
 
         try:
@@ -67,33 +85,71 @@ def search_cells(
 
             # sheet_name 指定がある場合はそのシートを優先して検索
             if sheet_name:
                 if sheet_name in workbook.sheetnames:
-                    self._scan_sheet(workbook[sheet_name], sheet_name, query, matches)
+                    self._scan_sheet(
+                        workbook[sheet_name],
+                        sheet_name,
+                        query,
+                        matches,
+                        include_surrounding_cells=include_surrounding_cells,
+                    )
                     # マッチが無ければ全シート走査にフォールバック
                     if len(matches) == 0:
                         for sn in workbook.sheetnames:
                             if sn == sheet_name:
                                 continue
-                            self._scan_sheet(workbook[sn], sn, query, matches)
+                            self._scan_sheet(
+                                workbook[sn],
+                                sn,
+                                query,
+                                matches,
+                                include_surrounding_cells=include_surrounding_cells,
+                            )
                 else:
                     # sheet_name が存在しない場合は「指定なし」と同じ扱いで全シート検索
                     warnings.append(
                         f"Sheet '{sheet_name}' not found. Searching all sheets instead."
                     )
                     for sn in workbook.sheetnames:
-                        self._scan_sheet(workbook[sn], sn, query, matches)
+                        self._scan_sheet(
+                            workbook[sn],
+                            sn,
+                            query,
+                            matches,
+                            include_surrounding_cells=include_surrounding_cells,
+                        )
             else:
                 # 全シート検索
                 for sn in workbook.sheetnames:
-                    self._scan_sheet(workbook[sn], sn, query, matches)
+                    self._scan_sheet(
+                        workbook[sn],
+                        sn,
+                        query,
+                        matches,
+                        include_surrounding_cells=include_surrounding_cells,
+                    )
 
             logger.info(f"Found {len(matches)} matches for query '{query}'")
 
+            # 結果数に応じた警告
+            match_count = len(matches)
+
+            if match_count >= MAX_SEARCH_MATCHES:
+                warnings.append(
+                    f"Search reached maximum limit ({MAX_SEARCH_MATCHES} matches). "
+                    "Results may be incomplete. Consider using more specific keywords."
+                )
+            elif match_count >= MAX_SEARCH_MATCHES_WARNING:
+                warnings.append(
+                    f"Large number of matches ({match_count}). "
+                    "Consider refining your query for more precise results."
+                )
+
             result = {
                 "file_path": file_path,
                 "mode": "search",
                 "query": query,
-                "match_count": len(matches),
+                "match_count": match_count,
                 "matches": matches,
             }
             if warnings:
@@ -270,10 +326,24 @@ def _scan_sheet(
         sheet_name_for_result: str,
         query: str,
         matches: list[dict[str, Any]],
+        include_surrounding_cells: bool = False,
+        max_matches: int | None = None,
     ) -> None:
         """
         シート内のセルを走査してqueryに一致するセルをmatchesに追加する
+
+        Args:
+            sheet: 走査対象のシート
+            sheet_name_for_result: 結果に含めるシート名
+            query: 検索クエリ(スペース区切りで複数キーワード指定可能、AND検索)
+            matches: マッチ結果を格納するリスト
+            include_surrounding_cells: Trueの場合、マッチしたセルと同じ行の全セルを含める
+            max_matches: 最大マッチ数(Noneの場合はMAX_SEARCH_MATCHES使用)
         """
+        # スペース区切りで複数キーワードを解析(AND検索)
+        keywords = [kw.strip() for kw in query.split() if kw.strip()]
+        limit = max_matches if max_matches is not None else MAX_SEARCH_MATCHES
+
         # 空シートを避ける意図
         if sheet.dimensions:
             # パフォーマンスのため_cellsを優先し、無い場合は公開APIにフォールバック
@@ -281,31 +351,95 @@ def _scan_sheet(
             # その場合はiter_rows()を使用するフォールバックロジックが動作します。
             if hasattr(sheet, "_cells"):
                 # 実在セルのみを走査(高速)
-                for cell in sheet._cells.values():
+                # Note: リストに変換してから反復処理することで、
+                # row_data取得時のシート内部の辞書変更によるエラーを回避
+                for cell in list(sheet._cells.values()):
+                    # 制限チェック
+                    if len(matches) >= limit:
+                        logger.warning(
+                            f"Reached maximum match count ({limit}) in sheet '{sheet_name_for_result}'. "
+                            "Consider refining your search query for better results."
+                        )
+                        break
+
                     if cell.value is not None:
                         cell_value_str = str(cell.value)
-                        if query in cell_value_str:
-                            matches.append(
-                                {
-                                    "sheet": sheet_name_for_result,
-                                    "coordinate": cell.coordinate,
-                                    "value": self._serialize_value(cell.value),
-                                }
-                            )
+                        # AND検索: 全てのキーワードにマッチ
+                        if all(keyword in cell_value_str for keyword in keywords):
+                            match_entry = {
+                                "sheet": sheet_name_for_result,
+                                "coordinate": cell.coordinate,
+                                "value": self._serialize_value(cell.value),
+                            }
+
+                            # 行データを追加
+                            if include_surrounding_cells:
+                                try:
+                                    row_cells = sheet[cell.row]
+                                    match_entry["row_data"] = [
+                                        {
+                                            "value": self._serialize_value(c.value),
+                                            "coordinate": c.coordinate,
+                                        }
+                                        for c in row_cells
+                                    ]
+                                except Exception as e:
+                                    # row_data取得失敗時はマッチエントリのみ保持(フォールバック)
+                                    logger.warning(
+                                        f"Failed to get row data for cell {cell.coordinate} "
+                                        f"in sheet '{sheet_name_for_result}': {e}"
+                                    )
+                                    # row_dataなしでマッチを記録
+                                    match_entry["row_data_error"] = str(e)
+
+                            matches.append(match_entry)
             else:
                 # openpyxl公開APIを使用(互換性確保)
                 for row in sheet.iter_rows(values_only=False):
+                    # 制限チェック
+                    if len(matches) >= limit:
+                        logger.warning(
+                            f"Reached maximum match count ({limit}) in sheet '{sheet_name_for_result}'. "
+                            "Consider refining your search query for better results."
+                        )
+                        break
+
                     for cell in row:
+                        # 内側のループでも制限チェック
+                        if len(matches) >= limit:
+                            break
+
                         if cell.value is not None:
                             cell_value_str = str(cell.value)
-                            if query in cell_value_str:
-                                matches.append(
-                                    {
-                                        "sheet": sheet_name_for_result,
-                                        "coordinate": cell.coordinate,
-                                        "value": self._serialize_value(cell.value),
-                                    }
-                                )
+                            # AND検索: 全てのキーワードにマッチ
+                            if all(keyword in cell_value_str for keyword in keywords):
+                                match_entry = {
+                                    "sheet": sheet_name_for_result,
+                                    "coordinate": cell.coordinate,
+                                    "value": self._serialize_value(cell.value),
+                                }
+
+                                # 行データを追加
+                                if include_surrounding_cells:
+                                    try:
+                                        row_cells = sheet[cell.row]
+                                        match_entry["row_data"] = [
+                                            {
+                                                "value": self._serialize_value(c.value),
+                                                "coordinate": c.coordinate,
+                                            }
+                                            for c in row_cells
+                                        ]
+                                    except Exception as e:
+                                        # row_data取得失敗時はマッチエントリのみ保持(フォールバック)
+                                        logger.warning(
+                                            f"Failed to get row data for cell {cell.coordinate} "
+                                            f"in sheet '{sheet_name_for_result}': {e}"
+                                        )
+                                        # row_dataなしでマッチを記録
+                                        match_entry["row_data_error"] = str(e)
+
+                                matches.append(match_entry)
 
     def _calculate_header_range(self, cell_range: str, frozen_rows: int) -> str | None:
         """
diff --git a/tests/test_server.py b/tests/test_server.py
index 1fac27f..01670dc 100644
--- a/tests/test_server.py
+++ b/tests/test_server.py
@@ -244,7 +244,10 @@ def test_excel_search_mode(
 
         # 検索メソッドが呼ばれることを確認
         mock_excel_parser.search_cells.assert_called_once_with(
-            "/sites/test/Shared Documents/test.xlsx", "売上", sheet_name=None
+            "/sites/test/Shared Documents/test.xlsx",
+            "売上",
+            sheet_name=None,
+            include_surrounding_cells=False,
         )
         # parse_to_jsonは呼ばれない
         mock_excel_parser.parse_to_json.assert_not_called()
diff --git a/tests/test_sharepoint_excel.py b/tests/test_sharepoint_excel.py
index 7705414..4ef2b4d 100644
--- a/tests/test_sharepoint_excel.py
+++ b/tests/test_sharepoint_excel.py
@@ -397,6 +397,281 @@ def test_search_cells_multiple_sheets(self):
         assert "Sheet1" in sheets
         assert "Sheet2" in sheets
 
+    def _create_test_data_excel(self) -> bytes:
+        """テスト用のデータExcelファイルを作成(検索テスト用)"""
+        wb = Workbook()
+        ws = wb.active
+        ws.title = "Sheet1"
+        # ヘッダー行
+        ws["A1"] = "ID"
+        ws["B1"] = "名前"
+        ws["C1"] = "金額"
+        ws["D1"] = "備考"
+        # データ行
+        ws["A2"] = 1
+        ws["B2"] = "商品A"
+        ws["C2"] = 1000
+        ws["D2"] = "在庫あり"
+        ws["A3"] = 2
+        ws["B3"] = "商品B"
+        ws["C3"] = 2000
+        ws["D3"] = "売上好調"
+        ws["A4"] = 3
+        ws["B4"] = "商品C"
+        ws["C4"] = 1500
+        ws["D4"] = "在庫わずか"
+
+        excel_bytes = BytesIO()
+        wb.save(excel_bytes)
+        excel_bytes.seek(0)
+        return excel_bytes.getvalue()
+
+    def _create_search_test_excel(self) -> bytes:
+        """複数キーワード検索テスト用のExcelファイルを作成(AND検索用)"""
+        wb = Workbook()
+        ws = wb.active
+        ws.title = "Sheet1"
+        ws["A1"] = "2024年度 予算 報告書"
+        ws["A2"] = "売上予測"
+        ws["A3"] = "経費明細"
+        ws["A4"] = "利益 計算 シート"
+        ws["A5"] = "予算"
+
+        ws2 = wb.create_sheet("Sheet2")
+        ws2["A1"] = "予算案 データ"
+        ws2["A2"] = "データ分析"
+
+        excel_bytes = BytesIO()
+        wb.save(excel_bytes)
+        excel_bytes.seek(0)
+        return excel_bytes.getvalue()
+
+    def test_search_with_surrounding_cells_disabled(self):
+        """デフォルト動作(include_surrounding_cells=False)の確認"""
+        excel_bytes = self._create_test_data_excel()
+        self.mock_download_client.download_file.return_value = excel_bytes
+
+        parser = SharePointExcelParser(self.mock_download_client)
+        result_json = parser.search_cells(
+            "/test/file.xlsx", "商品B", include_surrounding_cells=False
+        )
+
+        result = json.loads(result_json)
+        assert result["match_count"] == 1
+        assert len(result["matches"]) == 1
+
+        match = result["matches"][0]
+        assert match["coordinate"] == "B3"
+        assert match["value"] == "商品B"
+        # row_dataは含まれない
+        assert "row_data" not in match
+
+    def test_search_with_surrounding_cells_enabled(self):
+        """include_surrounding_cells=Trueで行データ取得"""
+        excel_bytes = self._create_test_data_excel()
+        self.mock_download_client.download_file.return_value = excel_bytes
+
+        parser = SharePointExcelParser(self.mock_download_client)
+        result_json = parser.search_cells(
+            "/test/file.xlsx", "商品B", include_surrounding_cells=True
+        )
+
+        result = json.loads(result_json)
+        assert result["match_count"] == 1
+        match = result["matches"][0]
+
+        # row_dataが含まれる
+        assert "row_data" in match
+        row_data = match["row_data"]
+        assert len(row_data) == 4  # A3, B3, C3, D3
+
+        # 各セルのデータを確認
+        assert row_data[0]["coordinate"] == "A3"
+        assert row_data[0]["value"] == 2
+        assert row_data[1]["coordinate"] == "B3"
+        assert row_data[1]["value"] == "商品B"
+        assert row_data[2]["coordinate"] == "C3"
+        assert row_data[2]["value"] == 2000
+        assert row_data[3]["coordinate"] == "D3"
+        assert row_data[3]["value"] == "売上好調"
+
+    def test_search_with_surrounding_cells_multiple_matches(self):
+        """複数マッチ時の行データ取得"""
+        excel_bytes = self._create_test_data_excel()
+        self.mock_download_client.download_file.return_value = excel_bytes
+
+        parser = SharePointExcelParser(self.mock_download_client)
+        result_json = parser.search_cells(
+            "/test/file.xlsx", "在庫", include_surrounding_cells=True
+        )
+
+        result = json.loads(result_json)
+        assert result["match_count"] == 2
+        assert len(result["matches"]) == 2
+
+        # 1つ目のマッチ(D2: "在庫あり")
+        match1 = result["matches"][0]
+        assert match1["coordinate"] == "D2"
+        assert match1["value"] == "在庫あり"
+        assert "row_data" in match1
+        assert len(match1["row_data"]) == 4
+        assert match1["row_data"][1]["value"] == "商品A"
+
+        # 2つ目のマッチ(D4: "在庫わずか")
+        match2 = result["matches"][1]
+        assert match2["coordinate"] == "D4"
+        assert match2["value"] == "在庫わずか"
+        assert "row_data" in match2
+        assert len(match2["row_data"]) == 4
+        assert match2["row_data"][1]["value"] == "商品C"
+
+    def test_search_with_surrounding_cells_empty_cells(self):
+        """空セル(None)も含まれることを確認"""
+        wb = Workbook()
+        ws = wb.active
+        ws.title = "Sheet1"
+        ws["A1"] = "データ"
+        ws["B1"] = None  # 空セル
+        ws["C1"] = "情報"
+
+        excel_bytes = BytesIO()
+        wb.save(excel_bytes)
+        excel_bytes.seek(0)
+
+        self.mock_download_client.download_file.return_value = excel_bytes.getvalue()
+
+        parser = SharePointExcelParser(self.mock_download_client)
+        result_json = parser.search_cells(
+            "/test/file.xlsx", "データ", include_surrounding_cells=True
+        )
+
+        result = json.loads(result_json)
+        assert result["match_count"] == 1
+        match = result["matches"][0]
+
+        # row_dataに空セルも含まれる
+        assert "row_data" in match
+        row_data = match["row_data"]
+        assert len(row_data) == 3
+        assert row_data[0]["value"] == "データ"
+        assert row_data[1]["value"] is None  # 空セル
+        assert row_data[2]["value"] == "情報"
+
+    def test_search_single_keyword_backward_compatible(self):
+        """単一キーワード(後方互換性)"""
+        excel_bytes = self._create_search_test_excel()
+        self.mock_download_client.download_file.return_value = excel_bytes
+
+        parser = SharePointExcelParser(self.mock_download_client)
+        result_json = parser.search_cells("/test/file.xlsx", "予算")
+
+        result = json.loads(result_json)
+        assert result["match_count"] == 3
+        # "2024年度 予算 報告書", "予算", "予算案 データ"
+        values = [m["value"] for m in result["matches"]]
+        assert "2024年度 予算 報告書" in values
+        assert "予算" in values
+        assert "予算案 データ" in values
+
+    def test_search_multiple_keywords_space_separated(self):
+        """スペース区切りAND検索"""
+        excel_bytes = self._create_search_test_excel()
+        self.mock_download_client.download_file.return_value = excel_bytes
+
+        parser = SharePointExcelParser(self.mock_download_client)
+        result_json = parser.search_cells("/test/file.xlsx", "予算 報告")
+
+        result = json.loads(result_json)
+        # "2024年度 予算 報告書"のみがマッチ(両方のキーワードを含む)
+        assert result["match_count"] == 1
+        values = [m["value"] for m in result["matches"]]
+        assert "2024年度 予算 報告書" in values
+
+    def test_search_multiple_keywords_with_extra_spaces(self):
+        """前後の余分なスペースの処理"""
+        excel_bytes = self._create_search_test_excel()
+        self.mock_download_client.download_file.return_value = excel_bytes
+
+        parser = SharePointExcelParser(self.mock_download_client)
+        # 余分なスペースを含むキーワード指定
+        result_json = parser.search_cells("/test/file.xlsx", " 予算 報告 ")
+
+        result = json.loads(result_json)
+        # スペースがトリムされて正しくマッチ
+        assert result["match_count"] == 1
+        values = [m["value"] for m in result["matches"]]
+        assert "2024年度 予算 報告書" in values
+
+    def test_search_multiple_keywords_no_match(self):
+        """全キーワードを含むセルがない場合"""
+        excel_bytes = self._create_search_test_excel()
+        self.mock_download_client.download_file.return_value = excel_bytes
+
+        parser = SharePointExcelParser(self.mock_download_client)
+        # "予算"を含むが"在庫"を含まないため、マッチなし
+        result_json = parser.search_cells("/test/file.xlsx", "予算 在庫")
+
+        result = json.loads(result_json)
+        assert result["match_count"] == 0
+        assert result["matches"] == []
+
+    def test_search_multiple_keywords_across_sheets(self):
+        """複数シートにまたがるAND検索"""
+        excel_bytes = self._create_search_test_excel()
+        self.mock_download_client.download_file.return_value = excel_bytes
+
+        parser = SharePointExcelParser(self.mock_download_client)
+        result_json = parser.search_cells("/test/file.xlsx", "予算 データ")
+
+        result = json.loads(result_json)
+        # "予算案 データ"(Sheet2)のみがマッチ
+        assert result["match_count"] == 1
+        assert result["matches"][0]["sheet"] == "Sheet2"
+        assert result["matches"][0]["value"] == "予算案 データ"
+
+    def test_search_multiple_keywords_with_surrounding_cells(self):
+        """AND検索とinclude_surrounding_cellsの組み合わせ"""
+        excel_bytes = self._create_search_test_excel()
+        self.mock_download_client.download_file.return_value = excel_bytes
+
+        parser = SharePointExcelParser(self.mock_download_client)
+        result_json = parser.search_cells(
+            "/test/file.xlsx", "利益 計算", include_surrounding_cells=True
+        )
+
+        result = json.loads(result_json)
+        assert result["match_count"] == 1
+
+        # マッチにrow_dataが含まれる
+        match = result["matches"][0]
+        assert "row_data" in match
+        assert len(match["row_data"]) > 0
+        assert match["value"] == "利益 計算 シート"
+
+    def test_search_empty_query_raises_error(self):
+        """空クエリでValueErrorが発生すること"""
+        parser = SharePointExcelParser(self.mock_download_client)
+        excel_bytes = self._create_test_excel()
+        self.mock_download_client.download_file.return_value = excel_bytes
+
+        with pytest.raises(ValueError, match="cannot be empty"):
+            parser.search_cells("/test/file.xlsx", "")
+
+        with pytest.raises(ValueError, match="cannot be empty"):
+            parser.search_cells("/test/file.xlsx", " ")
+
+        with pytest.raises(ValueError, match="cannot be empty"):
+            parser.search_cells("/test/file.xlsx", "\t\n")
+
+    def test_search_whitespace_only_query_raises_error(self):
+        """空白のみのクエリでValueErrorが発生すること"""
+        parser = SharePointExcelParser(self.mock_download_client)
+        excel_bytes = self._create_test_excel()
+        self.mock_download_client.download_file.return_value = excel_bytes
+
+        with pytest.raises(ValueError, match="cannot be empty"):
+            parser.search_cells("/test/file.xlsx", " ")
+
     def test_parse_specific_sheet(self):
         """特定シートのみ取得するテスト"""
         excel_bytes = self._create_multi_sheet_excel()
@@ -1780,3 +2055,84 @@ def test_omit_null_dimensions(self):
 
         # dimensionsがNoneの場合は省略
         assert "dimensions" not in sheet
+
+    def test_search_with_surrounding_cells_error_handling(self):
+        """row_data取得失敗時もマッチがrow_data_error付きで返ることを確認"""
+        from openpyxl.worksheet.worksheet import Worksheet
+
+        class RowAccessFailingSheet(Worksheet):
+            """sheet[行番号]による行アクセスだけを失敗させるテスト用シート"""
+
+            def __getitem__(self, key):
+                if isinstance(key, int):
+                    raise RuntimeError("simulated row access failure")
+                return super().__getitem__(key)
+
+        wb = Workbook()
+        ws = RowAccessFailingSheet(wb, title="Sheet1")
+        ws["A1"] = "keyword"
+
+        parser = SharePointExcelParser(self.mock_download_client)
+        matches: list[dict] = []
+        parser._scan_sheet(
+            ws, "Sheet1", "keyword", matches, include_surrounding_cells=True
+        )
+
+        # マッチ自体は返され、row_dataの代わりにrow_data_errorが付与される
+        assert len(matches) == 1
+        assert matches[0]["coordinate"] == "A1"
+        assert "row_data" not in matches[0]
+        assert "simulated row access failure" in matches[0]["row_data_error"]
+
+    def test_search_match_limit_reached(self):
+        """マッチ数上限到達時の動作確認"""
+        # 1000個以上のセルを持つExcelを作成
+        wb = Workbook()
+        ws = wb.active
+        ws.title = "LargeSheet"
+
+        # 1100個のセルに共通キーワードを設定
+        for i in range(1, 1101):
+            ws[f"A{i}"] = f"common_keyword_{i}"
+
+        excel_bytes = BytesIO()
+        wb.save(excel_bytes)
+        excel_bytes.seek(0)
+
+        self.mock_download_client.download_file.return_value = excel_bytes.getvalue()
+
+        parser = SharePointExcelParser(self.mock_download_client)
+        result_json = parser.search_cells("/test/file.xlsx", "common_keyword")
+        result = json.loads(result_json)
+
+        # 上限に達したことを確認
+        assert result["match_count"] == 1000
+        assert "warnings" in result
+        assert any("maximum limit" in w for w in result["warnings"])
+
+    def test_search_match_warning_threshold(self):
+        """警告閾値到達時の動作確認"""
+        # 500-999個のマッチを返すExcelを作成
+        wb = Workbook()
+        ws = wb.active
+        ws.title = "MediumSheet"
+
+        # 600個のセルに共通キーワードを設定
+        for i in range(1, 601):
+            ws[f"A{i}"] = f"keyword_{i}"
+
+        excel_bytes = BytesIO()
+        wb.save(excel_bytes)
+        excel_bytes.seek(0)
+
+        self.mock_download_client.download_file.return_value = excel_bytes.getvalue()
+
+        parser = SharePointExcelParser(self.mock_download_client)
+        result_json = parser.search_cells("/test/file.xlsx", "keyword")
+        result = json.loads(result_json)
+
+        assert result["match_count"] == 600
+        assert result["match_count"] >= 500
+        assert result["match_count"] < 1000
+        assert "warnings" in result
+        assert any("Large number of matches" in w for w in result["warnings"])
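For callers of the `sharepoint_excel` tool, a search-mode response can be reduced to one list of values per match while honoring the fallbacks documented in this change: `row_data` when row context was requested, `row_data_error` when the row could not be read, and the optional `warnings` array. The sketch below assumes only the JSON fields shown in the docs above; `summarize_matches` is a hypothetical helper, not part of the server or parser.

```python
import json
from typing import Any


def summarize_matches(response_json: str) -> tuple[list[list[Any]], list[str]]:
    """Hypothetical helper: flatten a search-mode response into row values.

    Returns (rows, messages): one list of values per match, plus the
    response's warnings followed by any per-match row_data_error notes.
    """
    response = json.loads(response_json)
    rows: list[list[Any]] = []
    # `warnings` may be absent, so default to an empty list.
    messages: list[str] = list(response.get("warnings", []))

    for match in response.get("matches", []):
        if "row_data" in match:
            # Row context was requested and read: keep values in column order.
            rows.append([cell["value"] for cell in match["row_data"]])
        else:
            # No row context (not requested, or retrieval failed): use the matched cell.
            rows.append([match["value"]])
            if "row_data_error" in match:
                messages.append(
                    f"{match['sheet']}!{match['coordinate']}: {match['row_data_error']}"
                )
    return rows, messages
```

Because the helper falls back to the matched cell's own value whenever `row_data` is missing, the same code handles responses produced with `include_surrounding_cells=False`.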