To check whether an FSSn is real, compare its check character with the remainder of xxxxxxzzz / 31. The check character is one of the digits 0-9 or one of the letters A, B, C, D, E, F, H, J, K, L, M, N, P, R, S, T, U, V, W, X, Y. A remainder of 0-9 maps to the same digit, and a remainder of 10-30 maps to the letters in the order given. So, for example, 10 is A and 20 is M.
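As a quick worked example of the mapping, here is a minimal sketch in R that computes the expected check character for a single ID. The ID `131052-308T` is an illustrative value assumed for this example, and `check.chars`, `digits`, `remainder` and `expected` are just names picked here.

```r
# Illustrative ID (an assumption, not taken from the post)
id <- "131052-308T"

# The 31 check characters in order: remainder 0 maps to "0", 10 to "A", 30 to "Y"
check.chars <- strsplit("0123456789ABCDEFHJKLMNPRSTUVWXY", "")[[1]]

# Join the six digits before the separator with the three digits after it
digits <- paste0(substr(id, 1, 6), substr(id, 8, 10))
remainder <- as.integer(digits) %% 31

# R vectors are indexed from 1, so remainder r is found at position r + 1
expected <- check.chars[remainder + 1]
print(c(remainder = remainder, expected = expected))
```

Here 131052308 leaves a remainder of 25 when divided by 31, and position 25 in the list (counting from 0) is T, which matches the last character of the ID.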
It makes sense to drop O, I and Z from the alphabet because they can be confused with 0, 1 and 2 (G and Q are left out as well), but S is kept even though handwritten S and 5 get mixed up quite often.
The algorithm in R would be the following:
# Given x, a character vector of FSSns
finID.real = function(x){
  # If an FSSn is NA or does not have exactly 11 characters, replace it with ""
  x[is.na(x) | nchar(x) != 11] = ""
  check.char = c(0:9, LETTERS) # LETTERS is the whole alphabet A-Z
  check.char = check.char[-c(17,19,25,27,36)] # Remove the unwanted letters G, I, O, Q and Z
  last.chars = substr(x,11,11) # The check characters
  # substr and paste are vectorized, so no apply is needed.
  # as.integer("5") gives 5; non-numeric input gives NA with a warning.
  x = as.integer(paste(substr(x,1,6),substr(x,8,10),sep="")) %% 31
  x = x+1 # Add 1 because vectors in R are indexed from 1, not 0.
  # check.char[x] is NA wherever x is NA, so the comparison is NA there
  bol = last.chars == check.char[x]
  bol[is.na(bol)] = FALSE # Changing NAs to FALSE
  return(bol) # Returning a TRUE / FALSE vector.
}
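To try the function out, it helps to have a few IDs whose check character is known to be right. The sketch below builds such test IDs. `make.fssn` is a hypothetical helper name, the "-" separator and the sample date parts are assumptions made for illustration, and the call to `finID.real` only runs if the function above has been defined in the session.

```r
# Hypothetical helper: build an 11-character ID from a six-digit date part
# and a three-digit individual number, with "-" assumed as the separator
make.fssn <- function(date.part, individual) {
  check.chars <- strsplit("0123456789ABCDEFHJKLMNPRSTUVWXY", "")[[1]]
  remainder <- as.integer(paste0(date.part, individual)) %% 31
  paste0(date.part, "-", individual, check.chars[remainder + 1])
}

# Two IDs with a correct check character
good <- make.fssn(c("131052", "010199"), c("308", "900"))
# Corrupt the check character of the first ID: "T" becomes "U"
bad <- sub("T$", "U", good[1])
ids <- c(good, bad)
print(ids)

# Only runs when finID.real from the post is available
if (exists("finID.real")) print(finID.real(ids))
```

With the function defined, the result should be TRUE for the two generated IDs and FALSE for the corrupted one.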