pyspark.pandas.DataFrame.spark.repartition
- spark.repartition(num_partitions)
- Returns a new DataFrame repartitioned into num_partitions partitions. The resulting DataFrame is hash partitioned.
- Parameters
- num_partitions : int
- The target number of partitions. 
 
- Returns
- DataFrame
 
- Examples
>>> psdf = ps.DataFrame({"age": [5, 5, 2, 2],
...                      "name": ["Bob", "Bob", "Alice", "Alice"]}).set_index("age")
>>> psdf.sort_index()
      name
age
2    Alice
2    Alice
5      Bob
5      Bob
>>> new_psdf = psdf.spark.repartition(7)
>>> new_psdf.to_spark().rdd.getNumPartitions()
7
>>> new_psdf.sort_index()
      name
age
2    Alice
2    Alice
5      Bob
5      Bob