  1. Sampling Queries - Spark 4.0.1 Documentation

    Sampling Queries. Description: The TABLESAMPLE statement is used to sample the table. It supports the following sampling methods: TABLESAMPLE (x ROWS): Sample the table down …

  2. pyspark.sql.DataFrame.sampleBy — PySpark 4.0.1 documentation

    fractions: dict, sampling fraction for each stratum. If a stratum is not specified, we treat its fraction as zero. seed: int, optional, random seed. Returns a new DataFrame that represents the stratified …

  3. DataSketches - The Apache Software Foundation

    The first focuses on methods and theory for data sketching and sampling. The second focuses on application and includes code examples using the Apache DataSketches project.

  4. Basic Statistics - RDD-based API - Spark 4.0.1 Documentation

    Sampling without replacement requires one additional pass over the RDD to guarantee sample size, whereas sampling with replacement requires two additional passes. Find full example …

  5. Sample — sample • SparkR - Apache Spark

    Arguments: x: A SparkDataFrame. withReplacement: Sampling with replacement or not. fraction: The (rough) sample target fraction. seed: Randomness seed value. Default is a random seed.

  6. TABLESAMPLE Clause - The Apache Software Foundation

    Because the sampling works by selecting a random set of data files, the proportion of sampled data from the table may be greater than the specified percentage, based on the number and …

  7. Probability Distributions :: Apache Solr Reference Guide

    Sampling All probability distributions support sampling. The sample function returns one or more random samples from a probability distribution. Below is an example drawing a single sample …

  8. Returns a stratified sample without replacement — sampleBy

    Arguments: x: A SparkDataFrame. col: column that defines strata. fractions: A named list giving sampling fraction for each stratum. If a stratum is not specified, we treat its fraction as zero. …

  9. Reservoir Sampling Sketches - datasketches.apache.org

    Reservoir sampling provides a way to construct a uniform random sample of size k from an unweighted stream of items, without knowing the final length of the stream in advance.

  10. Data Sampling | Apache Kylin

    Aug 18, 2022 · Kylin provides the data sampling function to facilitate table data analysis. With data sampling, you can collect table characteristics, such as cardinality, max value, and min …
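
The reservoir sampling idea summarized in result 9 (a uniform random sample of size k from a stream whose length is not known in advance) can be sketched in a few lines of plain Python. This is a minimal illustration of the classic single-pass replacement scheme, not the Apache DataSketches implementation or API; the function name `reservoir_sample` and its `seed` parameter are hypothetical and exist only for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k items drawn uniformly at random from stream in one pass.

    Hypothetical helper for illustration only (not a DataSketches API).
    """
    if k < 0:
        raise ValueError("k must be non-negative")

    rng = random.Random(seed)  # local generator so a seed gives repeatable runs
    reservoir: List[T] = []

    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item survives with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item

    return reservoir


if __name__ == "__main__":
    # The stream is consumed lazily; its length is never inspected.
    print(reservoir_sample(iter(range(1_000_000)), k=5, seed=42))
```

If the stream holds fewer than k items, the sketch simply returns all of them. Memory use stays proportional to k regardless of how long the stream is.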