pyspark.sql.functions.shuffle

pyspark.sql.functions.shuffle(col)
Array function: Generates a random permutation of the given array.

New in version 2.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters

col : Column or str
    The name of the column or expression to be shuffled.

Returns

Column
    A new column that contains an array of elements in random order.
Notes

The shuffle function is non-deterministic, meaning the order of the output array can differ on each execution.

Examples

Example 1: Shuffling a simple array

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([([1, 20, 3, 5],)], ['data'])
>>> df.select(sf.shuffle(df.data)).show()
+-------------+
|shuffle(data)|
+-------------+
|[1, 3, 20, 5]|
+-------------+

Example 2: Shuffling an array with null values

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([([1, 20, None, 3],)], ['data'])
>>> df.select(sf.shuffle(df.data)).show()
+----------------+
|   shuffle(data)|
+----------------+
|[20, 3, NULL, 1]|
+----------------+

Example 3: Shuffling an array with duplicate values

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([([1, 2, 2, 3, 3, 3],)], ['data'])
>>> df.select(sf.shuffle(df.data)).show()
+------------------+
|     shuffle(data)|
+------------------+
|[3, 2, 1, 3, 2, 3]|
+------------------+

Example 4: Shuffling an array with different types of elements

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([(['a', 'b', 'c', 1, 2, 3],)], ['data'])
>>> df.select(sf.shuffle(df.data)).show()
+------------------+
|     shuffle(data)|
+------------------+
|[1, c, 2, a, b, 3]|
+------------------+
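Because shuffle is non-deterministic, the example outputs above show only one possible ordering; code that checks shuffled results should compare element contents rather than order. A minimal pure-Python sketch of this order-insensitive check (no Spark session required; `random.sample` stands in for Spark's shuffle as an illustrative assumption):

```python
import random

def shuffle_array(arr):
    """Return a random permutation of arr, mimicking shuffle() on an array column."""
    # random.sample with k=len(arr) returns a new list containing every
    # element of arr exactly once, in random order (duplicates preserved).
    return random.sample(arr, k=len(arr))

data = [1, 2, 2, 3, 3, 3]
shuffled = shuffle_array(data)

# Order may differ between runs, but the multiset of elements is unchanged.
assert sorted(shuffled) == sorted(data)
```

The same idea applies in Spark itself: wrapping both sides in `sort_array` before comparing makes an equality test robust to the random ordering.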