This document provides a high-level overview of the String Extractor's architecture, directory structure, and internal APIs.
The main entry point of the application is src/cli/strings.ml. It handles command-line argument parsing using Core.Command, sets up the Lwt runtime, and initiates the file traversal process.
/
├── src/
│   ├── cli/ # Main CLI application logic
│   │   ├── strings.ml # CLI entry point, traversal coordination
│   │   ├── vue.ml # Vue-specific parsing and extraction logic
│   │   └── generate.ml # Localization file generation (.strings, .json)
│   ├── parsing/ # Core parsers using Angstrom and Flow
│   │   ├── basic.ml # Common parsing utilities and combinators
│   │   ├── js_ast.ml # Flow AST walker for string extraction
│   │   ├── js.ml # JavaScript string extraction entry point
│   │   ├── pug.ml # Native Pug template parsing
│   │   ├── html.ml # HTML template parsing
│   │   ├── astro.ml # Native Astro file scanning (frontmatter, I18n, expressions)
│   │   ├── strings.ml # .strings file parsing logic
│   │   └── ... # Other specialized parsers (vue blocks, styles)
│   ├── quickjs/ # Interface to QuickJS for JS/TS/Pug parsing
│   │   ├── quickjs.ml # OCaml FFI to QuickJS
│   │   ├── quickjs.cpp # C++ implementation of the bridge
│   │   └── parsers.js # JS-based parsers running in QuickJS
│   └── utils/ # Shared utility modules
│       ├── collector.ml # State container for collected strings/errors
│       ├── io.ml # I/O helpers
│       ├── timing.ml # Performance measurement
│       └── exception.ml # Exception handling
├── strings/ # Directory where .strings files are managed
├── dune-project # Dune build system configuration
└── README.md # Project overview and usage instructions
- Strings.main: Coordinates the entire run, including directory traversal and result generation.
- Vue.parse: Splits a .vue file into its constituent parts (template, script, style).
- Generate.write_english: Creates english.strings and english.json from the collected strings.
- Generate.write_other: Updates existing translations for other languages.
- Parsing.Basic: Provides foundational Angstrom parsers for whitespace, strings, and standard error handling.
- Parsing.Js.extract_to_collector: Entry point for scanning JavaScript source code.
- Parsing.Js_ast.extract: A comprehensive walker for the Flow AST that identifies and extracts strings from L("...") calls.
- Parsing.Pug.collect: Traverses the native Pug AST to extract strings.
- Parsing.Astro.parser / Parsing.Astro.collect: Native Angstrom scanner for .astro files. Segments a file into frontmatter, <script> blocks, {...} expressions (brace matching respects strings, template literals, and comments), and <I18n> / <i18n> blocks. collect enqueues I18n slot text into strings, all code segments into possible_scripts (always parsed as TSX so JSX in expressions works), re-scans expressions containing <I18n / <i18n for nested I18n blocks, and emits a non-fatal warning when I18n text contains {placeholders} without the is:raw directive. parser takes unit because it uses an internal shared buffer; create a fresh parser per file.
- Parsing.Strings.parse: Parses existing .strings files into a lookup table. Takes a Lwt_io.input_channel and returns a string Core.String.Table.t Lwt.t.
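The brace matching described for .astro expressions can be illustrated with a small standalone function. This is only a sketch using the standard library, not the Angstrom-based scanner in astro.ml; the name find_matching_brace is hypothetical, and it simplifies by treating a template literal as opaque text up to its closing backtick (nested ${...} templates, regex literals, and apostrophes in JSX text are not handled).

```ocaml
(* Hypothetical helper: given [s] and the index [start] of a '{', return the
   index of the matching '}' while ignoring braces that appear inside string
   literals, template literals, and comments. *)
let find_matching_brace (s : string) (start : int) : int option =
  let n = String.length s in
  (* Skip a literal delimited by [q]; [i] is the index just after the opening
     delimiter. Returns the index just past the closing delimiter, or [n]. *)
  let rec skip_quoted q i =
    if i >= n then n
    else if s.[i] = '\\' then skip_quoted q (i + 2)
    else if s.[i] = q then i + 1
    else skip_quoted q (i + 1)
  in
  (* Skip to the end of a // comment (stops at the newline). *)
  let rec skip_line_comment i =
    if i >= n || s.[i] = '\n' then i else skip_line_comment (i + 1)
  in
  (* Skip past the closing */ of a block comment, or to [n] if unterminated. *)
  let rec skip_block_comment i =
    if i + 1 >= n then n
    else if s.[i] = '*' && s.[i + 1] = '/' then i + 2
    else skip_block_comment (i + 1)
  in
  let rec go depth i =
    if i >= n then None
    else
      match s.[i] with
      | ('"' | '\'' | '`') as q -> go depth (skip_quoted q (i + 1))
      | '/' when i + 1 < n && s.[i + 1] = '/' ->
          go depth (skip_line_comment (i + 2))
      | '/' when i + 1 < n && s.[i + 1] = '*' ->
          go depth (skip_block_comment (i + 2))
      | '{' -> go (depth + 1) (i + 1)
      | '}' -> if depth = 1 then Some i else go (depth - 1) (i + 1)
      | _ -> go depth (i + 1)
  in
  if start < n && s.[start] = '{' then go 0 start else None

let () =
  (* The '}' inside the string and inside the comment are both ignored. *)
  let src = "{ t(\"a}b\") /* } */ }" in
  match find_matching_brace src 0 with
  | Some i -> Printf.printf "expression ends at index %d\n" i
  | None -> print_endline "unbalanced expression"
```

Tracking depth in a single pass keeps the scan linear in the input length; an unterminated literal or comment simply yields None rather than an exception.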
- Quickjs.extract_to_collector: Offloads extraction to QuickJS for TypeScript and advanced Pug templates.
- Utils.Collector.create: Initializes a new string collection state for a specific file. (type t = { path: string; strings: string Queue.t; possible_scripts: string Queue.t; file_errors: string Queue.t; warnings: string Queue.t })
- Utils.Collector.render_errors / Utils.Collector.render_warnings: Render collected errors (❌, fatal) and warnings (⚠️, non-fatal) for terminal output.
- Utils.Collector.blit_transfer: Merges results from one collector into another.
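To make the collector's shape concrete, here is a minimal sketch of the record above built on the standard library's Queue. It is an illustration under assumptions, not the code in collector.ml: the labelled ~path argument and the merge_into function (standing in for blit_transfer) are hypothetical, and the project itself may rely on Core's Queue.

```ocaml
module Collector = struct
  (* Same fields as the record documented above, with Stdlib queues. *)
  type t = {
    path : string;
    strings : string Queue.t;
    possible_scripts : string Queue.t;
    file_errors : string Queue.t;
    warnings : string Queue.t;
  }

  (* Create an empty collector for the file at [path]. *)
  let create ~path =
    {
      path;
      strings = Queue.create ();
      possible_scripts = Queue.create ();
      file_errors = Queue.create ();
      warnings = Queue.create ();
    }

  (* Hypothetical merge: move every queued item from [src] to the end of the
     matching queue in [dst]. Queue.transfer leaves the source queue empty. *)
  let merge_into ~src ~dst =
    Queue.transfer src.strings dst.strings;
    Queue.transfer src.possible_scripts dst.possible_scripts;
    Queue.transfer src.file_errors dst.file_errors;
    Queue.transfer src.warnings dst.warnings
end

let () =
  let parent = Collector.create ~path:"App.vue" in
  let child = Collector.create ~path:"App.vue" in
  Queue.add "Hello" parent.Collector.strings;
  Queue.add "World" child.Collector.strings;
  Collector.merge_into ~src:child ~dst:parent;
  (* Items appear in insertion order: parent's first, then child's. *)
  Queue.iter print_endline parent.Collector.strings
```

Keeping errors and warnings in separate queues lets a caller treat the first as fatal and the second as advisory, which matches the render_errors / render_warnings split.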
- Initiation: strings.exe starts, parses CLI flags, and identifies the target directory.
- Traversal: Uses Lwt to cooperatively walk the directory tree via Lwt_list and Lwt_pool.
- Dispatch: For each supported file extension, the corresponding parser in src/parsing is invoked.
- Collection: Parsers find strings (usually inside L()) and add them to a Collector.t.
- Generation: Generate.ml aggregates strings from all collectors and updates the strings/ directory.
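The dispatch step above can be sketched as a plain function from a path to a parser choice. The variant names and the exact extension list are assumptions made for illustration; the real dispatch lives in src/cli/strings.ml and runs inside Lwt, which this synchronous sketch leaves out.

```ocaml
(* Hypothetical labels for the parser families named in this document. *)
type parser_kind = Flow_js | Quickjs_ts | Vue | Pug | Html | Astro

(* Pick a parser from the file extension (case-insensitive).
   The extension list is assumed; None means the file is skipped. *)
let dispatch (path : string) : parser_kind option =
  match String.lowercase_ascii (Filename.extension path) with
  | ".js" -> Some Flow_js
  | ".ts" -> Some Quickjs_ts
  | ".vue" -> Some Vue
  | ".pug" -> Some Pug
  | ".html" -> Some Html
  | ".astro" -> Some Astro
  | _ -> None

(* Human-readable label, used only by the demo below. *)
let describe = function
  | Flow_js -> "JavaScript via Flow"
  | Quickjs_ts -> "TypeScript via QuickJS"
  | Vue -> "Vue single-file component"
  | Pug -> "Pug template"
  | Html -> "HTML template"
  | Astro -> "Astro file"

let () =
  List.iter
    (fun path ->
      match dispatch path with
      | Some kind -> Printf.printf "%s -> %s\n" path (describe kind)
      | None -> Printf.printf "%s -> skipped\n" path)
    [ "src/App.vue"; "pages/index.astro"; "lib/util.ts"; "README.md" ]
```

Returning an option keeps unsupported files out of the pipeline without raising, so the traversal can continue past them.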
The project implements a multi-layered testing strategy:
- Inline Tests: Using ppx_inline_test (e.g. let%test_unit) together with ppx_assert (e.g. [%test_eq]), logic can be tested directly within the source files. This is primarily used for parser validation in src/parsing/.
- Standard Test Suite: Located in tests/test_runner.ml, this suite runs the inline tests via ppx_inline_test and uses ppx_assert to verify:
  - JavaScript string extraction via Flow_parser.
  - HTML extraction via SZXX and Pug extraction via Angstrom.
  - Apple-style .strings file parsing (via Lwt_main.run and Lwt_io).
- Integration Testing: The tests/fixtures/ directory contains sample files of all supported types (including demo.astro). The CLI can be run against these fixtures to verify end-to-end extraction and output generation (.strings and .json files).
The tests/dune file configures the test library and enables inline tests for the module.
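As an illustration of the kind of check the suite runs on Apple-style .strings parsing, the sketch below parses "key" = "value"; lines into a lookup table and verifies the result. It is a simplified stand-in under stated assumptions: it uses the standard library's Hashtbl and an in-memory string where the project uses Core.String.Table and a Lwt_io.input_channel, plain assert where the suite uses [%test_eq], and OCaml's own string-escape rules (via Scanf's %S) rather than the full .strings grammar. The names parse_line and parse_strings are hypothetical.

```ocaml
(* Parse one line of the form  "key" = "value";
   Returns None for blank lines, comment lines, or malformed input. *)
let parse_line (line : string) : (string * string) option =
  try Scanf.sscanf line " %S = %S ;" (fun k v -> Some (k, v))
  with Scanf.Scan_failure _ | End_of_file | Failure _ -> None

(* Build a key -> translation table from the full file contents.
   A later duplicate key overwrites an earlier one. *)
let parse_strings (contents : string) : (string, string) Hashtbl.t =
  let table = Hashtbl.create 16 in
  String.split_on_char '\n' contents
  |> List.iter (fun line ->
         match parse_line line with
         | Some (k, v) -> Hashtbl.replace table k v
         | None -> ());
  table

let () =
  let table =
    parse_strings "/* Greeting */\n\"Hello\" = \"Bonjour\";\n\n\"Bye\" = \"Au revoir\";\n"
  in
  (* Comment and blank lines are ignored; both entries are present. *)
  assert (Hashtbl.find_opt table "Hello" = Some "Bonjour");
  assert (Hashtbl.find_opt table "Bye" = Some "Au revoir");
  assert (Hashtbl.length table = 2)
```

With ppx_inline_test enabled, the same assertions could sit in a let%test_unit block next to the parser instead of a top-level let ().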