I don't know how efficient this is, since it depends on current and future optimizations in Spark's engine, but you can try doing the following:
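The pipeline being described is a `zipWithIndex` → `filter` → `map` → `first` chain. Since running it requires a Spark cluster, here is a plain-Python sketch of the same three steps (the names `nth_element` and `data` are illustrative, not part of any Spark API):

```python
# Plain-Python sketch of the Spark pipeline described below (no cluster needed).
# The PySpark equivalent would be:
#   rdd.zipWithIndex().filter(lambda p: p[1] == 9).map(lambda p: p[0]).first()

def nth_element(data, n):
    """Return the element at 0-based index n using the zip-with-index idea."""
    pairs = ((value, idx) for idx, value in enumerate(data))  # zipWithIndex
    matches = (pair for pair in pairs if pair[1] == n)        # filter(_._2 == n)
    values = (value for value, _ in matches)                  # map(_._1)
    return next(values)                                       # first()

print(nth_element(range(100, 200), 9))  # → 109
```

Like the Spark version, each stage is lazy here (generators), so nothing past the matching element is materialized when `first()`/`next()` pulls the result.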
The first function transforms the RDD into pairs (value, idx), with idx running from 0 upwards. The second function keeps the element whose idx == 9 (the 10th). The third function recovers the original value, which is then returned.

The first function could be pulled up by the execution engine and influence the behavior of the whole processing. Give it a try.

In any case, if n is very large, this method is efficient in that it does not require collecting an array of the first n elements on the driver node.
Unfortunately, zipWithIndex requires a full pass over the data to calculate the index offset of each partition. It is still probably your best bet though.