Package org.apache.uima.internal.util
Class CharacterUtils
java.lang.Object
org.apache.uima.internal.util.CharacterUtils
Collection of utilities for character handling. Contains utilities for semi-automatically
creating lexer rules.
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
private static class | CharacterUtils.CharRange | Represents character range.
-
Constructor Summary
Constructors
Constructor | Description
CharacterUtils() | Constructor for CharacterUtils.
-
Method Summary
Modifier and Type | Method | Description
private static ArrayList<CharacterUtils.CharRange> | getCharacterRanges(int[] charSpecs) |
static ArrayList<CharacterUtils.CharRange> | getDigitRange | Generate an ArrayList of CharRanges for what Java considers to be a digit.
static ArrayList<CharacterUtils.CharRange> | getLetterRange | Generate an ArrayList of CharRanges for what Java considers to be a letter.
private static final boolean | isType(char c, int[] types) |
static void | main |
static void | printAntlrLexRule(String name, ArrayList<CharacterUtils.CharRange> charRanges) |
static void | printJavaCCLexRule(String name, ArrayList<CharacterUtils.CharRange> charRanges) |
static String | toHexString(char c) | Create a hex representation of the UTF-16 encoding of a Java char.
static String | toUnicodeChar(char c) | Create a hex representation of the UTF-16 encoding of a Java char.
-
Constructor Details
-
CharacterUtils
public CharacterUtils()
Constructor for CharacterUtils.
-
-
Method Details
-
isType
private static final boolean isType(char c, int[] types)
-
getCharacterRanges
-
toUnicodeChar
Create a hex representation of the UTF-16 encoding of a Java char. This is the representation that Java understands when reading source code.
Parameters:
c - The char to be encoded.
Returns:
String Hex representation of the character. For example, the result of encoding 'A' would be "\u0041".
-
toHexString
Create a hex representation of the UTF-16 encoding of a Java char. This is the representation that's understood by the JavaCC lexer.
Parameters:
c - The char to be encoded.
Returns:
String Hex representation of the character. For example, the result of encoding 'A' would be "0x0041".
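The two encodings described for toUnicodeChar and toHexString can be illustrated with a short sketch. It is not the UIMA implementation: the class and method names below are hypothetical, and both the backslash-u prefix for the source-code form and the use of lowercase hex digits are assumptions, since this page only shows the "0x0041" example.

```java
// Illustrative sketch only; class and method names are hypothetical,
// not taken from the UIMA source.
final class CharEncodingSketch {

    // Java source-style escape: a backslash, the letter 'u', then the
    // UTF-16 code unit as four zero-padded hex digits (assumed format).
    static String toUnicodeEscape(char c) {
        return String.format("\\u%04x", (int) c);
    }

    // "0x"-prefixed form, matching the "0x0041" example in the text.
    static String toHex(char c) {
        return String.format("0x%04x", (int) c);
    }

    public static void main(String[] args) {
        System.out.println(toUnicodeEscape('A'));
        System.out.println(toHex('A'));
    }
}
```

Both helpers pad to four digits because a Java char is a single 16-bit UTF-16 code unit, so every value fits in four hex digits.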
-
getLetterRange
Generate an ArrayList of CharRanges for what Java considers to be a letter. This serves as input to Unicode-agnostic lexers like ANTLR.
Returns:
ArrayList A list of character ranges.
-
getDigitRange
Generate an ArrayList of CharRanges for what Java considers to be a digit. This serves as input to Unicode-agnostic lexers like ANTLR.
Returns:
ArrayList A list of character ranges.
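One way to build such range lists is to scan every UTF-16 code unit and merge consecutive matches into inclusive ranges. The sketch below assumes that "what Java considers to be a letter/digit" means Character.isLetter and Character.isDigit; the Range class and all method names are hypothetical stand-ins, not the actual CharRange or getLetterRange/getDigitRange code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Illustrative sketch only; not the UIMA implementation.
final class CharRangeSketch {

    // Inclusive range of UTF-16 code units (hypothetical stand-in for CharRange).
    static final class Range {
        final char start;
        final char end;

        Range(char start, char end) {
            this.start = start;
            this.end = end;
        }
    }

    // Walks all char values and groups consecutive matches into ranges.
    static List<Range> collectRanges(IntPredicate matches) {
        List<Range> ranges = new ArrayList<>();
        int runStart = -1; // -1 means no run is currently open
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            if (matches.test(c)) {
                if (runStart < 0) {
                    runStart = c; // open a new run
                }
            } else if (runStart >= 0) {
                ranges.add(new Range((char) runStart, (char) (c - 1)));
                runStart = -1; // close the run at the previous char
            }
        }
        if (runStart >= 0) {
            // A run that reaches the last char value is closed here.
            ranges.add(new Range((char) runStart, Character.MAX_VALUE));
        }
        return ranges;
    }

    static List<Range> letterRanges() {
        return collectRanges(c -> Character.isLetter((char) c));
    }

    static List<Range> digitRanges() {
        return collectRanges(c -> Character.isDigit((char) c));
    }
}
```

With this approach the first digit range is '0' to '9' and the first letter range is 'A' to 'Z'; later ranges depend on the Unicode version supported by the running JDK.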
-
printAntlrLexRule
-
printJavaCCLexRule
-
main
-