Question:
Write a function that determines if each characters in a String appear no more than once.
There are three main types of encoding - UTF and ASCII.
Unicode is used by almost all operating systems to define the internal text coding system. Unicode assigns each character a unique number (its code point). One of the mapping methods used is UTF. UTF-8 takes a minimum of 8 bits and a maximum of 32 bits per character. This means there are 1,111,998 possible characters.
ASCII is more limited towards English language, and has 128 possible characters. Each char is assigned a code point from 0 to 127.
Let's assume ASCII for simplicity.
If our string is much longer than 128 characters long, we can use a boolean array of size 128. Each character encountered in our string we will flag as true for that index. If we encounter an array cell that is already set to true, then this means that the character has already appeared, so we return false. This way, the maximum number iterations will be 128.
The time complexity is O(n), but the space complexity is O(k) where k is the size of number of characters the encoding may generate.
static boolean isUniqueArray(String str) {
// string can't be longer than num encoding chars
if (str.length() > NUMBER_OF_CHARS) {
return false;
}
boolean[] array = new boolean[NUMBER_OF_CHARS];
int curr;
for (int i = 0; i < str.length(); i++) {
curr = (int) str.charAt(i);
if (array[curr]) {
return false;
} else {
array[curr] = true;
}
}
return true;
}
If the string is much shorter than the amount of characters in the character encoding, it would be good to use a bag or set. This way, the number of iterations is kept, at most, the length of the String.
Here's the algorithm pseudocode:
c
.c
is in our set. If not, insert. If it is, return false.The time complexity remains the same as above O(n), but the space complexity is lowered to O(l), where l is the length of our string.
static boolean isUniqueSet(String str) {
// string can't be longer than num encoding chars
if (str.length() > NUMBER_OF_CHARS) {
return false;
}
HashSet<Character> charSet = new HashSet<>();
Character c;
for (int i = 0; i < str.length(); i++) {
c = str.charAt(i);
if (charSet.contains(c)) {
return false;
} else {
charSet.add(c);
}
}
return true;
}
In case you aren't allowed to use additional space, we can implement an algorithm that compares each character to all other characters in the array. We can do this easily with two for-loops.
Although this approach takes quadratic time (O(n2)), there is no extra auxillary space needed.
static boolean isUniqueCheck(String str) {
// string can't be longer than num encoding chars
if (str.length() > NUMBER_OF_CHARS) {
return false;
}
char c;
for (int i = 0; i < str.length(); i++) {
c = str.charAt(i);
for (int j = i+1; j < str.length(); j++) {
if (c == str.charAt(j))
return false;
}
}
return true;
}
You may also sort the array, then see if any character repeats itself. Depending on the algorithm used to sort, you may or may not use auxillary space.
The time and space complexity depends on the algorithm, but we can bring the time complexity down to O(n log n) and space down to O(1).
static boolean isUniqueSort(String str) {
// string can't be longer than num encoding chars
if (str.length() > NUMBER_OF_CHARS) {
return false;
}
char[] charArray = str.toCharArray();
Arrays.sort(charArray);
for (int i = 0; i < charArray.length - 1; i++) {
if (charArray[i] == charArray[i+1])
return false;
}
return true;
}
Came up with a better solution or have a question? Comment below!