Tidy's getting a bit large (4k lines!). So I'm thinking of moving all of the word list information/auditing over to a new project that I'm tentatively calling WLA.
My question: Should I now remove all of this list attribute/information code from Tidy? Tidy users, thanks to the magic of Unix philosophy/piping, can simply pipe Tidy's output over to wla, which will do the same thing as running tidy -AAAA:
tidy -D t eff.txt | wla
We could remove not only all of the attribute-printing code, but also the -G/-g options, which I bet are confusing for users when compared to the -D/-d options!
Besides reducing codebase size, this also allows WLA to act more like a "true" auditor of word lists. One issue with having Tidy serve as both a word list creator and a word list auditing tool is that, if a list had duplicate or blank lines, Tidy would quietly remove these before printing attributes, which is kind of a lie. For example:
tidy -A counts this as 3 words, rather than 4 or 5, since it automatically removes duplicates and blank lines before calculating attribute values like list length.
In contrast, we can have wla count this as 5 "lines", then warn users that there are both blank and duplicate lines present.
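To make the difference concrete, here is a rough sketch of the kind of line-level audit wla could do before any cleaning happens. It's in Python purely for illustration; the function name `audit_lines`, the returned keys, and the sample list are all made up for this example and are not taken from Tidy or WLA.

```python
from collections import Counter


def audit_lines(text):
    """Count raw lines, blank lines, and duplicate lines without modifying the list.

    Hypothetical helper for illustration only; not part of Tidy or WLA.
    """
    lines = text.splitlines()
    # A line counts as blank if it is empty or whitespace-only
    blank = sum(1 for line in lines if line.strip() == "")
    # Tally every non-blank line so repeats can be detected
    counts = Counter(line for line in lines if line.strip() != "")
    # Each occurrence beyond the first counts as one duplicate line
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    return {
        "lines": len(lines),
        "blank": blank,
        "duplicates": duplicates,
        "unique_words": len(counts),
    }


# Made-up sample list: 5 lines, one blank, one duplicate
sample = "apple\nbanana\n\napple\ncherry\n"
report = audit_lines(sample)
print(report)
if report["blank"] or report["duplicates"]:
    print("warning: list contains blank and/or duplicate lines")
```

On this sample, the audit reports 5 lines but only 3 unique words, so the user sees both the raw count and the reason the two numbers differ.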