So when at work I had to use ATC codes to get the substances out or the pharmaceutical(?) use. So while no-one was looking I naturally had to devise the following script. This snippet is more or less done, a part of a imaginary whole if you will. It lacks the functionality to search for partial matches. So you can't just look for all the "B01" codes. Probably doing that extension next when I have the time. Also you can't use this for substance to ATC code. But for now this one way ticket, the basic concept as a snippet. Naturally comments about making it more readable and/or efficient are always welcome.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
require(gdata) | |
require("RCurl") | |
require("XML") | |
AtcSearch <- function(AtcCodes){ | |
#Function inside a function is bad practice... | |
#DummySearch searches for one atc code | |
DummySearch <- function(Code){ | |
#A link to WHO search | |
htmlLink <- paste("http://www.whocc.no/atc_ddd_index/?code=", Code, sep="") | |
doc <- htmlParse(getURL(htmlLink)) | |
Drug <- xpathSApply(doc, "//td", xmlValue) | |
#No idea why these three are needed | |
Drug <- gsub("Â", "", Drug) | |
Drug <- gsub("\\s", " ", Drug) | |
Drug <- trim(Drug) #gdata just for this... | |
#if no drug is found, returning NULL | |
if(sum(Drug == Code) == 0 | length(Drug) == 0) return() | |
atcLevels <- xpathSApply(doc, "//b", xmlValue) | |
out <- matrix(Drug[which(Drug == Code):length(Drug)], ncol= which(Drug == Code)-1, byrow=TRUE) | |
#Basic handling of multiple DDDs and Units | |
if(nrow(out) > 1) out <- t(apply(out, 2, function(x) paste(x[x != ""], collapse="/"))) | |
out <- data.frame(out, t(atcLevels)) | |
colnames(out) <- c(Drug[1:6], "Anatomical", "Therapeutic", "Subgroup", "ChemicalSubgroup") | |
out | |
} | |
#lapply to get all the codes. | |
ret <- lapply(AtcCodes, DummySearch) | |
do.call("rbind", ret) | |
} | |
#Some examples | |
AtcSearch("N01AX15") | |
AtcSearch(c("B01AA01", "N01AX15")) | |
#Not perfect for example, the who database isn't complete. | |
AtcSearch("QN01AX99") |
Hallo Xachriel,
ReplyDeleteI have been looking something like this for a while. I wonder whether you had the chance to go further with it and reverse the serach for example (name -> ATC class) or make a batch query ...
Great thanks
Andrea Zaliani