To check whether an FSSn is valid, the check character (the last character) is matched against the remainder of xxxxxxzzz / 31, where xxxxxxzzz is the nine-digit number formed from the six digits before the separator and the three digits after it. The check character is one of the digits 0-9 or one of the letters A, B, C, D, E, F, H, J, K, L, M, N, P, R, S, T, U, V, W, X, Y. A remainder of 0-9 is matched to the digit itself, and a remainder of 10-30 is matched to the letters in the order they are given. So for example 10 is A and 20 is M.
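As a worked example of the rule above, here is a small R sketch that computes the check character for a single FSSn by hand. The input `131052-308T` is an illustrative value that does not come from the text above, and the variable names are chosen here only for the example.

```r
# Illustrative input, assumed for this example only
fssn <- "131052-308T"

# The 31 possible check characters, in remainder order 0..30
check.chars <- strsplit("0123456789ABCDEFHJKLMNPRSTUVWXY", "")[[1]]

# Join the six digits before the separator with the three digits after it
digits <- paste0(substr(fssn, 1, 6), substr(fssn, 8, 10))  # "131052308"

remainder <- as.integer(digits) %% 31   # 25
expected <- check.chars[remainder + 1]  # "T" (+1 because R indexes from 1)

expected == substr(fssn, 11, 11)        # TRUE
```

The remainder 25 lands on T because the letters start at remainder 10 and skip G, I, O and Q on the way.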
It makes sense to drop O, I, and Z from the alphabet because they can be confused with 0, 1 and 2 (G and Q are left out as well), but S is kept even though handwritten S and 5 quite often get mixed up.
The algorithm in R would be the following:

    # Given x, a vector of FSSns
    finID.real = function(x) {
      # If an FSSn does not have exactly 11 characters,
      # or is NA, we change it to ""
      bad = nchar(x) != 11 | is.na(x)
      if (sum(bad) >= 1) x[bad] = ""
      check.char = c(0:9, LETTERS)  # LETTERS is the whole alphabet
      check.char = check.char[-c(17, 19, 25, 27, 36)]  # Remove the unwanted letters G, I, O, Q, Z
      last.chars = substr(x, 11, 11)  # The check characters
      x = matrix(x, nrow = length(x))  # The standard trick so that apply works
      x = apply(x, 1, function(x) {
        # as.integer("5") gives 5
        x = as.integer(paste(substr(x, 1, 6), substr(x, 8, 10), sep = "")) %% 31
        x = x + 1  # Add 1 because vectors in R are indexed from 1, not 0
        return(x)
      })
      # ifelse returns either the check character or NA,
      # which is then compared with the last character
      bol = last.chars == ifelse(!is.na(x), check.char[x], NA)
      if (sum(is.na(bol)) >= 1) bol[is.na(bol)] = FALSE  # Changing NAs to FALSE
      return(bol)  # Returning a TRUE/FALSE vector
    }
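Since `substr`, `paste0` and `as.integer` are all vectorized in R, the same check can also be written without the matrix and `apply` step. The following is only a sketch of that idea: the name `fssn_valid`, the regular expression used to pre-filter the input, and the sample values are assumptions made for this example.

```r
# Hypothetical vectorized variant; fssn_valid is a name made up for this sketch
fssn_valid <- function(x) {
  x <- as.character(x)
  # The 31 possible check characters, in remainder order 0..30
  check.chars <- strsplit("0123456789ABCDEFHJKLMNPRSTUVWXY", "")[[1]]

  # Assumed pre-filter: 6 digits, any separator, 3 digits, any check character
  ok <- !is.na(x) & grepl("^[0-9]{6}.[0-9]{3}.$", x)

  result <- rep(FALSE, length(x))  # everything is FALSE unless it passes
  digits <- paste0(substr(x[ok], 1, 6), substr(x[ok], 8, 10))
  expected <- check.chars[as.integer(digits) %% 31 + 1]
  result[ok] <- expected == substr(x[ok], 11, 11)
  result
}

fssn_valid(c("131052-308T", "131052-308U", "12345", NA))
# TRUE FALSE FALSE FALSE
```

Filtering the input first means that malformed strings and NA never reach `as.integer`, so no NA handling is needed afterwards.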