feat(components): add missing languages to Code Text Splitter #6290
deepak0x wants to merge 2 commits into FlowiseAI:main
The Code Text Splitter only exposed 16 languages natively supported by @langchain/textsplitters. The Python LangChain library supports many more. This adds 9 additional languages (c, csharp, cobol, elixir, haskell, kotlin, lua, powershell, ts) with custom separators ported from Python LangChain, while keeping existing languages on the native fromLanguage() path.
Code Review
This pull request expands the CodeTextSplitter to support a variety of new programming languages by introducing custom separators and updating the language selection options. The review feedback highlights several language-specific inaccuracies in the separator lists, such as the use of 'class' in C and 'implements' in C#, and recommends refining the fallback logic in the initialization method to ensure better error handling.
C: replace class with struct/union/enum (C has no class keyword). C#: remove implements (C# uses :), add namespace and struct. Elixir: remove while (not a keyword in Elixir). Kotlin: remove case (Kotlin uses when). Fallback: return default splitter instead of calling fromLanguage with an unsupported language.
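The corrections above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the separator lists are abbreviated and hypothetical, and the names `CUSTOM_SEPARATORS`, `DEFAULT_SEPARATORS`, and `resolveSeparators` are assumptions made for this example. Only the keyword fixes and the default-fallback behaviour come from the review notes.

```typescript
// Generic separators used when a language has no custom list.
// Assumption: these mirror the usual recursive character splitter defaults.
const DEFAULT_SEPARATORS: string[] = ['\n\n', '\n', ' ', '']

// Hypothetical, abbreviated separator lists with the review fixes applied.
const CUSTOM_SEPARATORS = new Map<string, string[]>([
    // C has no `class` keyword: split on struct/union/enum instead.
    ['c', ['\nstruct ', '\nunion ', '\nenum ', '\nif ', '\nfor ', '\nwhile ', ...DEFAULT_SEPARATORS]],
    // C# uses `:` rather than `implements`; `namespace` and `struct` are added.
    ['csharp', ['\nnamespace ', '\nclass ', '\nstruct ', '\ninterface ', '\nenum ', ...DEFAULT_SEPARATORS]],
    // Elixir has no `while` keyword, so it is left out.
    ['elixir', ['\ndefmodule ', '\ndef ', '\ndefp ', '\nif ', '\ncase ', ...DEFAULT_SEPARATORS]],
    // Kotlin uses `when` rather than `case`.
    ['kotlin', ['\nclass ', '\nfun ', '\nif ', '\nfor ', '\nwhen ', ...DEFAULT_SEPARATORS]]
])

// Fallback: an unknown language gets the default separators instead of
// being passed on to a language-specific code path that would reject it.
function resolveSeparators(language: string): string[] {
    return CUSTOM_SEPARATORS.get(language) ?? DEFAULT_SEPARATORS
}
```

Keeping the fallback inside the lookup means a typo or an unlisted language degrades to plain paragraph and line splitting rather than throwing during initialization.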
@HenryHengZJ, the Code Text Splitter was missing a bunch of languages that Python LangChain already supports. New languages: C, C#, COBOL, Elixir, Haskell, Kotlin, Lua, PowerShell, TypeScript. The Gemini bot caught a few wrong keywords in the separator lists I ported (for example, C doesn't have a class keyword).
Proposed changes
The Code Text Splitter node only listed the 16 languages natively supported by the JS @langchain/textsplitters package. The Python LangChain library supports many more languages that users have been requesting (C#, COBOL, Kotlin, TypeScript, etc). This PR adds 9 additional languages with custom separators ported from the Python LangChain source, while keeping existing languages on the native fromLanguage() path so there is zero regression risk.

New languages: c, csharp, cobol, elixir, haskell, kotlin, lua, powershell, ts
The dropdown is now sorted alphabetically for easier discovery.
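As a rough sketch of the routing and dropdown ordering described above: the 16 native identifiers below are what I understand the JS package to support, so treat that list as an assumption. `chooseStrategy`, `DropdownOption`, and `languageOptions` are hypothetical names for this example, not the node's real API.

```typescript
type SplitStrategy = 'native' | 'custom' | 'default'

// Assumption: the 16 languages handled natively by the JS text splitter package.
const NATIVE_LANGUAGES = [
    'cpp', 'go', 'java', 'js', 'php', 'proto', 'python', 'rst',
    'ruby', 'rust', 'scala', 'swift', 'markdown', 'latex', 'html', 'sol'
] as const

// The 9 languages this PR adds via custom separators.
const CUSTOM_LANGUAGES = [
    'c', 'csharp', 'cobol', 'elixir', 'haskell', 'kotlin', 'lua', 'powershell', 'ts'
] as const

const nativeSet = new Set<string>(NATIVE_LANGUAGES)
const customSet = new Set<string>(CUSTOM_LANGUAGES)

// Existing languages stay on the native path; only new ones use custom separators.
function chooseStrategy(language: string): SplitStrategy {
    if (nativeSet.has(language)) return 'native'
    if (customSet.has(language)) return 'custom'
    return 'default'
}

interface DropdownOption {
    label: string
    name: string
}

// Merge both lists and sort alphabetically by identifier for the dropdown.
const languageOptions: DropdownOption[] = [...NATIVE_LANGUAGES, ...CUSTOM_LANGUAGES]
    .map((name) => ({ label: name, name }))
    .sort((a, b) => a.name.localeCompare(b.name))
```

Checking the native set first is what keeps the previously supported languages on their existing code path.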
Issue(s)
Closes #3752
How to test or reproduce
pnpm build && pnpm start
Types of changes
Checklist