Class RDDSampleUtils

java.lang.Object
org.apache.sedona.core.utils.RDDSampleUtils

public class RDDSampleUtils extends Object
Utility methods for choosing how many records to sample from an RDD before partitioning it.
  • Constructor Details

    • RDDSampleUtils

      public RDDSampleUtils()
  • Method Details

    • getSampleNumbers

      public static int getSampleNumbers(int numPartitions, long totalNumberOfRecords, int givenSampleNumbers)
      Returns the number of samples to take in order to partition the RDD into the specified number of partitions.

      The number of partitions cannot exceed half the number of records in the RDD.

      Returns the total number of records if it is less than 1000. Otherwise, returns 1% of the total number of records or twice the number of partitions, whichever is larger. Never returns a number greater than Integer.MAX_VALUE.

      If the desired number of samples (givenSampleNumbers) is not -1, returns that number instead of computing one.

      Parameters:
      numPartitions - the desired number of partitions
      totalNumberOfRecords - the total number of records in the RDD
      givenSampleNumbers - the desired number of samples, or -1 to have the number computed
      Returns:
      the number of samples to take
      Throws:
      IllegalArgumentException - if the requested number of samples exceeds the total number of records, or if the requested number of partitions exceeds half the total number of records
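
      The rules described above can be restated as a short, self-contained sketch. This is not the Sedona source: the class SampleNumbersSketch and the method sampleNumbers are hypothetical names used only for illustration. The order of the checks is an assumption (an explicit sample count is handled first, and the partition check applies only when the count is computed), as is the use of integer division when taking 1% of the records.

```java
// Illustrative sketch only: restates the documented rules, NOT the Sedona implementation.
final class SampleNumbersSketch {

    // Hypothetical helper mirroring the documented contract of getSampleNumbers.
    static int sampleNumbers(int numPartitions, long totalNumberOfRecords, int givenSampleNumbers) {
        // An explicit request (anything other than -1) is returned as-is,
        // as long as it does not exceed the number of records.
        // Assumption: this check comes before the partition check.
        if (givenSampleNumbers != -1) {
            if (givenSampleNumbers > totalNumberOfRecords) {
                throw new IllegalArgumentException(
                        "requested samples exceed total number of records");
            }
            return givenSampleNumbers;
        }

        // The number of partitions may not exceed half the number of records.
        // Comparing 2 * partitions against the total avoids rounding issues.
        long twicePartitions = 2L * numPartitions;
        if (twicePartitions > totalNumberOfRecords) {
            throw new IllegalArgumentException(
                    "requested partitions exceed half of total number of records");
        }

        // Small inputs: sample every record.
        if (totalNumberOfRecords < 1000) {
            return (int) totalNumberOfRecords;
        }

        // Otherwise take the larger of 1% of the records (assumed to round down)
        // and twice the number of partitions, capped at Integer.MAX_VALUE.
        long onePercent = totalNumberOfRecords / 100;
        long samples = Math.max(onePercent, twicePartitions);
        return (int) Math.min(samples, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(sampleNumbers(10, 500L, -1));      // fewer than 1000 records
        System.out.println(sampleNumbers(10, 100_000L, -1));  // 1% of records dominates
        System.out.println(sampleNumbers(600, 2_000L, -1));   // twice the partitions dominates
        System.out.println(sampleNumbers(10, 100_000L, 42));  // explicit sample count
    }
}
```

      With the real class, the equivalent call would be RDDSampleUtils.getSampleNumbers(numPartitions, totalNumberOfRecords, -1), passing -1 when no particular sample count is required.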