Sunday, May 1, 2011

English and/or Finnish text validation.

Is there an easy-to-use Python module that does English or Finnish text validation?

It would be enough to check that the words exist in a user-defined dictionary, and possibly that the grammar is roughly okay.

I am planning to implement a fancy validator for the contents of a directory I put together a while back. It involves some simple things, such as checking that the config scripts won't crash and that they do their job properly. Apart from the text validation, it's all quite easy.

The validator should accept whole files or strings of Unicode text as input.
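The dictionary part of this could look roughly like the sketch below. It uses only the standard library; the names `load_words` and `unknown_words` and the sample word list are made up for illustration and do not come from any existing module. It does not attempt grammar checking, and plain lookup will flag inflected Finnish forms unless they are listed in the dictionary.

```python
import re
import unicodedata

# Runs of Unicode letters: keeps "ä" and "ö", drops digits and underscores.
WORD_RE = re.compile(r"[^\W\d_]+")

def load_words(lines):
    """Build a lookup set from an iterable of lines, one word per line."""
    return {line.strip().casefold() for line in lines if line.strip()}

def unknown_words(text, known):
    """Return words in text that are missing from known, in first-seen order."""
    # Normalise so that "a" + combining diaeresis is treated as a single "ä".
    text = unicodedata.normalize("NFC", text)
    missing = []
    for word in WORD_RE.findall(text):
        key = word.casefold()
        if key not in known and key not in missing:
            missing.append(key)
    return missing

# Hypothetical user-defined dictionary; a real one would be read from a file.
known = load_words(["hello", "world", "hyvää", "päivää"])
print(unknown_words("Hello wrold, hyvää päivää!", known))  # ['wrold']
```

A file object opened with `open(path, encoding="utf-8")` can be passed straight to `load_words`, and the result of `f.read()` to `unknown_words`, so the same two functions cover both whole files and strings.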

From stackoverflow
  • I'm not sure what you're trying to do, but if you're looking for something that can say 'this is valid English' or 'this is valid Finnish', then you're looking at a class of problems that is quite likely unsolvable.

    If not, then use a dictionary and/or letter frequencies and Bayesian analysis to determine whether a given text is English-like or Finnish-like. If you're trying to auto-detect a language, this is likely the best route, although you'll run into problems with mixed-language text.
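As a rough illustration of the letter-frequency idea, here is a minimal naive Bayes sketch. The two training sentences are invented, far too small for real use, and stand in for proper corpora; the class name `LetterBayes` is likewise hypothetical. Each language is scored by the smoothed log-likelihood of the text's letters, and the higher score wins.

```python
import math
from collections import Counter

def letter_counts(text):
    """Count alphabetic characters, case-insensitively."""
    return Counter(ch for ch in text.lower() if ch.isalpha())

class LetterBayes:
    def __init__(self, samples):
        # samples maps a language label to training text for that language.
        self.counts = {lang: letter_counts(text) for lang, text in samples.items()}
        self.alphabet = set().union(*self.counts.values())

    def log_likelihood(self, lang, text):
        counts = self.counts[lang]
        total = sum(counts.values())
        size = len(self.alphabet) + 1  # one extra bucket for unseen letters
        # Add-one smoothing keeps unseen letters from giving log(0).
        return sum(n * math.log((counts[ch] + 1) / (total + size))
                   for ch, n in letter_counts(text).items())

    def guess(self, text):
        # Equal priors assumed, so the best likelihood decides.
        return max(self.counts, key=lambda lang: self.log_likelihood(lang, text))

# Toy training data (assumption); real use needs much larger samples.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog while the children "
          "watch which way the weather changes",
    "fi": "hyvää päivää tämä on suomenkielinen teksti jossa on paljon "
          "pitkiä sanoja ja ääkkösiä kuten yö ja äiti",
}
model = LetterBayes(SAMPLES)
print(model.guess("which way the weather"))  # en
print(model.guess("päivää yötä ääni"))       # fi
```

Comparing the two log-likelihoods instead of taking only the maximum gives a crude confidence measure; mixed-language text tends to produce scores that are close together.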
