2013-03-03

RMark: Parallel fast merging

Last time we looked at a different method of merging multiple files via fast mergin g. I highly recommend checking the fast merging function before we will translate it to a parallel solution. Parallel programming in this case means we are using more then one processor core. So even if you have a fancy 8 core machine R only uses one core if you don't use parallel functions.
In R parallel programming can be done for example with snow package. When we change our function from 1 core solution to a multicore solution we use snow packages parLapply function instead of lapply function. This change will magically make our independent loops run faster without doing any major changes to the script. As parLapply behaves almost exactly the same as lapply. So let us present the script first, the explanation will be given after that. There's alot more case handling in multicore case as we can't call our multicore solution if we only need one core.



For example if we have 100 frames to merge and a 4 core machine. We choose a section size of 25 parts (nparts=25). Which means that we have 4 sections. Section 1 will be frames from 1-25, section 2 will be frames 26-50, and so on. Now we tell the first core to merge the frames from section 1, the second core will merge frames outlined in section 2, and so on. After that is done we only have 4 frames, which can we will merge using the standard way. With this fast merging should atleast two times faster when using 2 cores. Let us see if that is the case, unfortunately I only have 4 cores.

2 comments:

  1. https://stackoverflow.com/questions/61027511/inner-join-on-large-datasets-using-mc-fast-merging-function

    please help

    ReplyDelete
  2. When I initially commented I appear to have clicked on the -Notify me when new comments are added- checkbox and now each time a comment is added I receive four emails with the exact same comment. There has to be a way you are able to remove me from that service? Many thanks!

    ReplyDelete