# libucd
This library provides a highly accurate set of heuristics that attempt to determine the character set used to encode some input text. This is extremely useful when your program has to handle an input file which is supplied without any encoding metadata.
The original libucd code was written by Netscape Communications Corporation and is available at http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/. Unfortunately, the Firefox project removed most of the encoding-detection functions in its latest versions, even though the multi-language detector is widely used in other open-source projects. It therefore became important to maintain a standalone version of the library that supports most of the language detection. This project was set up for that purpose and has since been extended with more languages, utilities, and packaging.
Pulls together:

- libicu

We have a build system based on autoconf/automake; simply do this incantation:
```shell
./configure
make
```
It also supports building packages for Linux distributions such as RedHat/CentOS, Debian/Ubuntu, and Arch Linux.
RedHat/CentOS

```shell
./autogen.sh
make rpm
```
Debian/Ubuntu

```shell
./autogen.sh
debuild -c -uc -us
```
Pacman

```shell
cd pacman
makepkg -Asf
```
Android
Add a line to the `Android.mk` file in your `jni` folder, for example:

```makefile
include jni/libucd/Android.mk
```

and then run `ndk-build`.
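For context, a consumer's `jni/Android.mk` might look like the following sketch. The module name `ucd` and the application file names are assumptions for illustration; check libucd's own `Android.mk` for the actual module name it defines:

```makefile
# Application's jni/Android.mk (illustrative sketch; the module name
# "ucd" and the source file "myapp.c" are assumptions, not taken from
# libucd's build files).
LOCAL_PATH := $(call my-dir)

include $(CLEAR_VARS)
LOCAL_MODULE    := myapp
LOCAL_SRC_FILES := myapp.c           # hypothetical application source
LOCAL_STATIC_LIBRARIES := ucd        # link against the libucd module
include $(BUILD_EXECUTABLE)

# Pull in libucd's build rules, as described above.
include jni/libucd/Android.mk
```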
See `libucd.h` or the man pages, and `utils/sample.c`, for an example.
Directory layout:

- `debian/`, `rpm/`, `pacman/`
- `doc/`
- `man/`
- `include/`
- `src/`
- `utils/`
- `test/`: Wikipedia index pages in target languages, sometimes in multiple encodings. The pages were manually stripped of English and boilerplate content, in the hope that the remaining text is significant and typical. Used to check how the detection works.
- `langstats/`
The library is subject to the GNU General Public License, Version 2. Alternatively, it may be used under the terms of the GNU Lesser General Public License, Version 2.1.