Is there an easy-to-use Python module that does English or Finnish text validation?
It would be OK if I could just check that the words exist in a user-defined dictionary, and possibly check that the grammar is somewhat okay.
I am planning to implement a fancy validator for the contents of a directory I set up a while back. This involves some simple stuff, like checking that the config scripts won't crash and that they do their job properly. It's all quite easy otherwise.
For the validator, I should just be able to input whole files or strings of Unicode text.
-
I'm not sure what you're trying to do, but if you're looking for something that can say 'this is valid English' or 'this is valid Finnish', then you're looking at a class of problems that is quite likely unsolvable.
If not, then use a dictionary and/or letter frequencies with Bayesian analysis to determine whether a given text is English-like or Finnish-like. If you're trying to auto-detect a language, this is likely the best route, although you'll run into problems with mixed-language text.
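To make that a bit more concrete, here is a rough sketch of both ideas using only the standard library. Everything in it is an assumption for illustration: the function names (`unknown_words`, `train`, `classify`), the tiny sample texts, and the choice of single-letter frequencies with add-one smoothing and a uniform prior. A real setup would want much larger training texts and probably letter pairs or triples rather than single letters.

```python
import math
import re
from collections import Counter

# Runs of Unicode letters (word characters minus digits and underscore).
WORD_RE = re.compile(r"[^\W\d_]+")


def unknown_words(text, dictionary):
    """Return the words in `text` that are missing from the user-defined dictionary."""
    known = {w.lower() for w in dictionary}
    return [w for w in WORD_RE.findall(text) if w.lower() not in known]


def train(samples):
    """Build per-language letter counts from a {language: sample_text} mapping."""
    return {lang: Counter(c for c in text.lower() if c.isalpha())
            for lang, text in samples.items()}


def classify(text, models):
    """Return the language with the highest naive Bayes log-likelihood.

    Assumes a uniform prior over languages and add-one smoothing, so a
    letter never seen in training does not zero out a language's score.
    """
    letters = [c for c in text.lower() if c.isalpha()]
    alphabet = set().union(*models.values()) | set(letters)
    scores = {}
    for lang, counts in models.items():
        total = sum(counts.values()) + len(alphabet)
        scores[lang] = sum(math.log((counts[c] + 1) / total) for c in letters)
    return max(scores, key=scores.get)


# Hypothetical, far-too-small training samples; substitute real corpora.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and sleeps all evening",
    "finnish": "nopea ruskea kettu hyppää laiskan koiran yli ja nukkuu koko illan",
}

if __name__ == "__main__":
    models = train(SAMPLES)
    print(classify("hyvää päivää kaikille", models))
    print(unknown_words("Hello wrold", {"hello", "world"}))
```

Both functions take plain Unicode strings, so whole files can be passed in after reading them with the right encoding. Note that `classify` always returns one of the trained languages, so on its own it cannot say "neither", and mixed-language text will still come out as whichever language dominates the letter counts.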