regex/docs/UpdatingUnicodeFiles.md
To update the Unicode data files, follow these steps:
1. Download the following files into a directory named `dat` in your current working directory. If updating to another version, replace `12.1.0` with the version you are aiming for.
    - `UnicodeData.txt` (https://www.unicode.org/Public/12.1.0/ucd/UnicodeData.txt)
    - `CaseFolding.txt` (https://www.unicode.org/Public/12.1.0/ucd/CaseFolding.txt)
    - `SpecialCasing.txt` (https://www.unicode.org/Public/12.1.0/ucd/SpecialCasing.txt)
    - `PropertyAliases.txt` (https://www.unicode.org/Public/12.1.0/ucd/PropertyAliases.txt)
    - `PropertyValueAliases.txt` (https://www.unicode.org/Public/12.1.0/ucd/PropertyValueAliases.txt)
    - `NameAliases.txt` (https://www.unicode.org/Public/12.1.0/ucd/NameAliases.txt)
    - `ucd.nounihan.flat.xml` (https://www.unicode.org/Public/12.1.0/ucdxml/ucd.nounihan.flat.zip)
    - `emoji-data.txt` (https://unicode.org/Public/emoji/12.0/emoji-data.txt)
2. Run `src/com.oracle.truffle.regex/tools/unicode-script.sh`. This generates the following files in `dat`:
    - `UnicodeFoldTable.txt`
    - `NonUnicodeFoldTable.txt`
    - `PythonFoldTable.txt`
3. Run `src/com.oracle.truffle.regex/tools/generate_case_fold_table.clj >> src/com.oracle.truffle.regex/src/com/oracle/truffle/regex/tregex/parser/CaseFoldTable.java` to generate the new case fold tables and append them to `CaseFoldTable.java`. Then open `CaseFoldTable.java` in an editor and replace the old character data with the new definitions.
    - The script can also be run with `java -jar clojure-1.8.0.jar --init src/com.oracle.truffle.regex/tools/generate_case_fold_table.clj --eval '(-main)'`.
4. Run `src/com.oracle.truffle.regex/tools/generate_unicode_properties.py > src/com.oracle.truffle.regex/src/com/oracle/truffle/regex/charset/UnicodePropertyData.java`. This rewrites `UnicodePropertyData.java` to contain the new definitions of Unicode properties.
5. Run the `main` method of `com.oracle.truffle.regex.charset.UnicodeGeneralCategoriesGenerator` and replace `src/com.oracle.truffle.regex/src/com/oracle/truffle/regex/charset/UnicodeGeneralCategories.java` with its output.
6. Run `src/com.oracle.truffle.regex/tools/generate_ruby_case_folding.py` and replace `src/com/oracle/truffle/regex/tregex/parser/flavors/RubyCaseFoldingData.java` with its output.
7. Run `src/com.oracle.truffle.regex/tools/generate_name_alias_table.py` and replace `src/com/oracle/truffle/regex/chardata/UnicodeCharacterAliases.java` with its output.
8. Run `mx eclipseformat` to fix any code formatting issues.

All of the above steps are automated by `run_scripts.sh`. The script assumes that the following tools are installed: `clojure`, `python3`, `wget`, and `unzip`.
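
Before starting an update, it can help to list the download URLs for the target version and to check that the required tools are available. The following is a minimal sketch, not part of the repository: the helper names (`ucd_urls`, `emoji_data_url`, `missing_tools`) are hypothetical, and the URL layout is assumed to match the 12.x layout listed above. Other Unicode releases may place files differently, so verify the URLs before relying on them.

```python
import shutil

# Paths relative to https://www.unicode.org/Public/<version>/,
# taken from the file list in this document.
UCD_FILES = [
    "ucd/UnicodeData.txt",
    "ucd/CaseFolding.txt",
    "ucd/SpecialCasing.txt",
    "ucd/PropertyAliases.txt",
    "ucd/PropertyValueAliases.txt",
    "ucd/NameAliases.txt",
    "ucdxml/ucd.nounihan.flat.zip",
]

# Tools that this document says run_scripts.sh expects to be installed.
REQUIRED_TOOLS = ["clojure", "python3", "wget", "unzip"]


def ucd_urls(version):
    """Return the UCD download URLs for a Unicode version such as '12.1.0'."""
    base = f"https://www.unicode.org/Public/{version}/"
    return [base + path for path in UCD_FILES]


def emoji_data_url(emoji_version):
    """Return the emoji-data.txt URL, assuming the 12.x directory layout."""
    return f"https://unicode.org/Public/emoji/{emoji_version}/emoji-data.txt"


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    for url in ucd_urls("12.1.0") + [emoji_data_url("12.0")]:
        print(url)
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
```

Running the sketch prints one URL per line, so its output could be passed to a downloader such as `wget`, followed by a note about any tools that were not found. It performs no downloads itself.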