Skip to content

feat(cpex-scraper): add file logger for better debugging and tracing#4376

Open
faizkhairi wants to merge 3 commits into
nusmodifications:masterfrom
faizkhairi:feat/cpex-scraper-file-logger
Open

feat(cpex-scraper): add file logger for better debugging and tracing#4376
faizkhairi wants to merge 3 commits into
nusmodifications:masterfrom
faizkhairi:feat/cpex-scraper-file-logger

Conversation

@faizkhairi
Copy link
Copy Markdown

Summary

  • Add a lightweight file logger (src/logger.ts) that writes timestamped lines to both stdout and a log file
  • Log files are written to scrapers/cpex-scraper/logs/ with format cpex-YYYY-MM-DD.HH-mm-ss.log
  • The logs directory is created automatically at runtime (already covered by root .gitignore)
  • Update src/index.ts to use the file logger instead of the default console
  • Zero new dependencies -- uses only Node.js built-in fs module
  • Satisfies the existing Pick<Console, 'log'> interface, so scraper.ts is unchanged

Related Issue

Closes #4371

Test Plan

  • Run pnpm --filter nusmods-cpex-scraper build and verify compilation succeeds
  • Run the scraper and verify log file is created in logs/ directory
  • Verify console output still shows timestamped log lines
  • Verify existing unit tests still pass (pnpm --filter nusmods-cpex-scraper test)

Add a lightweight file logger that writes timestamped lines to both
stdout and a log file in the logs/ directory. The log directory is
created automatically at runtime (already gitignored by root .gitignore).

This matches the logging pattern used by the nus-v2 scraper, making
it easier to debug and trace CPEx scraper runs.

- Add src/logger.ts with createFileLogger() utility
- Update src/index.ts to use the file logger instead of console
@vercel
Copy link
Copy Markdown

vercel Bot commented Mar 24, 2026

@faizkhairi is attempting to deploy a commit to the modsbot's projects Team on Vercel.

A member of the Team first needs to authorize it.

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.39%. Comparing base (988c6fd) to head (2ab9d26).
⚠️ Report is 227 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4376      +/-   ##
==========================================
+ Coverage   54.52%   56.39%   +1.86%     
==========================================
  Files         274      317      +43     
  Lines        6076     6962     +886     
  Branches     1455     1679     +224     
==========================================
+ Hits         3313     3926     +613     
- Misses       2763     3036     +273     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

- Fix log directory path: '../logs' resolved to build/logs/ at runtime
  (inside gitignored build dir). Changed to '../../logs' to resolve to
  scrapers/cpex-scraper/logs/ (the scraper root)
- Make close() return Promise to ensure all buffered data is flushed
  to disk before process exits
- Await close() in both success and error paths in index.ts
- Add error handler on write stream to prevent unhandled crash
@faizkhairi
Copy link
Copy Markdown
Author

faizkhairi commented Mar 24, 2026

Note on testing: I was unable to run the full CI suite locally (pnpm run ci) as I used a clone. Happy to add unit tests for logger.ts if desired, the main cases would be:

  1. log() writes timestamped lines to both stdout and the file stream
  2. close() resolves after the stream is fully flushed
  3. Constructor creates the logs directory if it doesn't exist
  4. Stream error handler prevents unhandled crashes

The existing scraper.test.ts tests are unaffected since FileLogger is only instantiated in index.ts, not in the testable scraper.ts.

Let me know if you'd like me to add these tests.

@jloh02 jloh02 self-assigned this May 18, 2026
@jloh02
Copy link
Copy Markdown
Member

jloh02 commented May 18, 2026

@greptile

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 18, 2026

Confidence Score: 3/5

Safe to merge for normal runs, but the promise chain in index.ts has a structural flaw that could produce a misleading exit code and an unhandled rejection when the write stream itself errors.

The logger logic is straightforward and the happy path works correctly. The concern is in index.ts: attaching .catch() after .then() means any failure inside the .then() close call is caught by the same handler that handles scrape failures — the stream close error is reported as 'Failed to scrape', the exit code is set to 1 even though the scrape succeeded, and a second close() call on an already-destroyed stream can leave an unhandled rejection.

scrapers/cpex-scraper/src/index.ts — the .then()/.catch() promise chain needs attention for the logger close handling.

Important Files Changed

Filename Overview
scrapers/cpex-scraper/src/index.ts Wires the new FileLogger into the scrape entry point; the .then()/.catch() structure double-closes the logger stream and can produce an unhandled rejection + wrong exit code on stream failure.
scrapers/cpex-scraper/src/logger.ts New file adding timestamped dual-output (stdout + file) logger; minor: redundant existsSync guard before mkdirSync({ recursive: true }), and pad2 is duplicated from scraper.ts.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[index.ts starts] --> B[createFileLogger]
    B --> C{logs/ dir exists?}
    C -- No --> D[mkdirSync recursive]
    C -- Yes --> E[createWriteStream cpex-timestamp.log]
    D --> E
    E --> F[scrapeCPEx runs]
    F --> G{resolves?}
    G -- Yes --> H[logger.close in .then]
    H --> I{close succeeds?}
    I -- Yes --> J[Done]
    I -- No --> K[.catch fires with stream error]
    K --> L[logger.log misleading error]
    K --> M[logger.close double-close]
    M --> N{close succeeds?}
    N -- No --> O[Unhandled rejection]
    N -- Yes --> P[process.exitCode = 1 wrong]
    G -- No --> Q[.catch fires with scrape error]
    Q --> R[logger.log error]
    R --> S[logger.close]
    S --> T[process.exitCode = 1]

    style K fill:#ffcccc
    style O fill:#ff6666
    style P fill:#ffcccc
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
scrapers/cpex-scraper/src/index.ts:23-29
**Double-close on stream failure, wrong exit code**

If `logger.close()` in the `.then()` block rejects (e.g. the write stream was already destroyed due to a disk error), the `.catch()` handler fires with that stream error as `error`. This causes three problems at once: `logger.log(...)` writes to the already-errored stream (silently dropped), `logger.close()` is called a second time on a destroyed stream (which can reject again with no further handler → unhandled promise rejection), and `process.exitCode = 1` is set even though the scrape itself succeeded.

The standard fix is to use `.finally()` so `close()` runs exactly once regardless of outcome.

### Issue 2 of 2
scrapers/cpex-scraper/src/logger.ts:39-41
The `existsSync` guard is redundant here because `mkdirSync` with `{ recursive: true }` is a no-op when the directory already exists — it never throws in that case. The extra check adds a TOCTOU window (another process could remove the directory between the check and the mkdir) without buying anything.

```suggestion
  fs.mkdirSync(LOG_DIR, { recursive: true });
```

Reviews (1): Last reviewed commit: "Merge branch 'master' into feat/cpex-scr..." | Re-trigger Greptile

Comment on lines +23 to 29
}).then(async () => {
await logger.close();
}).catch(async (error) => {
logger.log(`Failed to scrape: ${error}`);
await logger.close();
process.exitCode = 1;
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Double-close on stream failure, wrong exit code

If logger.close() in the .then() block rejects (e.g. the write stream was already destroyed due to a disk error), the .catch() handler fires with that stream error as error. This causes three problems at once: logger.log(...) writes to the already-errored stream (silently dropped), logger.close() is called a second time on a destroyed stream (which can reject again with no further handler → unhandled promise rejection), and process.exitCode = 1 is set even though the scrape itself succeeded.

The standard fix is to use .finally() so close() runs exactly once regardless of outcome.

Prompt To Fix With AI
This is a comment left during a code review.
Path: scrapers/cpex-scraper/src/index.ts
Line: 23-29

Comment:
**Double-close on stream failure, wrong exit code**

If `logger.close()` in the `.then()` block rejects (e.g. the write stream was already destroyed due to a disk error), the `.catch()` handler fires with that stream error as `error`. This causes three problems at once: `logger.log(...)` writes to the already-errored stream (silently dropped), `logger.close()` is called a second time on a destroyed stream (which can reject again with no further handler → unhandled promise rejection), and `process.exitCode = 1` is set even though the scrape itself succeeded.

The standard fix is to use `.finally()` so `close()` runs exactly once regardless of outcome.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +39 to +41
if (!fs.existsSync(LOG_DIR)) {
fs.mkdirSync(LOG_DIR, { recursive: true });
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The existsSync guard is redundant here because mkdirSync with { recursive: true } is a no-op when the directory already exists — it never throws in that case. The extra check adds a TOCTOU window (another process could remove the directory between the check and the mkdir) without buying anything.

Suggested change
if (!fs.existsSync(LOG_DIR)) {
fs.mkdirSync(LOG_DIR, { recursive: true });
}
fs.mkdirSync(LOG_DIR, { recursive: true });
Prompt To Fix With AI
This is a comment left during a code review.
Path: scrapers/cpex-scraper/src/logger.ts
Line: 39-41

Comment:
The `existsSync` guard is redundant here because `mkdirSync` with `{ recursive: true }` is a no-op when the directory already exists — it never throws in that case. The extra check adds a TOCTOU window (another process could remove the directory between the check and the mkdir) without buying anything.

```suggestion
  fs.mkdirSync(LOG_DIR, { recursive: true });
```

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add file logger for CPEx scraper

2 participants