  1. Mastering Spark: RDDs vs. DataFrames | Miles Cole

    Oct 10, 2024 · While both RDDs and DataFrames use lazy evaluation, DataFrames benefit from query optimization, where the Catalyst optimizer can reorganize and compress transformation steps for …

  2. To V-Order or Not: Making the Case for Selective Use of V-Order in ...

    Sep 17, 2024 · Fabric Spark Runtimes currently enable V-Order optimization by default as a Spark configuration. V-Order is a Parquet write optimization that seeks to logically organize data based on …

  3. Automating V-Order: A Targeted Approach for Direct Lake Models

    Jan 31, 2025 · I’ve previously blogged in detail about V-Order optimization. In this post, I want to revisit the topic and demonstrate how V-Order can be strategically enabled in a programmatic fashion.

  4. Miles Cole | A Microsoft data & analytics blog

    A Microsoft data & analytics blog. Last December (2024) I published a blog seeking to explore the question of whether data engineers in Microsoft Fabric should ditch Spark for DuckDB or Polars. Six …

  5. Cluster Configuration Secrets for Spark: Unlocking Parallel Processing ...

    Feb 19, 2024 · Something I’ve always found challenging in PaaS Spark platforms, such as Databricks and Microsoft Fabric, is efficiently leveraging compute resources to maximize parallel job execution …

  6. Querying Databases in Apache Spark: Pandas vs. Spark API vs. Pandas …

    Jan 24, 2024 · Apache Spark offers tremendous capability, regardless of the implementation—be it Microsoft Fabric or Databricks. However, with vast capabilities comes the risk of using the wrong …

  7. Page 2 of 8 for Miles Cole | A Microsoft data & analytics blog

    I’ve previously blogged in detail about V-Order optimization. In this post, I want to revisit the topic and demonstrate how V-Order can be strategically enabled in a programmatic fashion.

  8. Mastering Spark: Session vs. DataFrameWriter vs. Table Configs

    Dec 20, 2024 · V-Order Example There are exceptions to the standard precedence rule for transient writer configs. In the example below, we have V-Order enabled at the session level, but when writing …

  9. Page 4 of 8 for Miles Cole | A Microsoft data & analytics blog

    Fabric Spark Runtimes currently enable V-Order optimization by default as a Spark configuration. V-Order is a Parquet write optimization that seeks to logically organize data based on the same storage …

  10. Mastering Spark: The Art and Science of Table Compaction

    Feb 26, 2025 · If there’s anything that data engineers agree about, it’s that table compaction is important. Often one of the first big lessons that folks will learn early on is that not compacting tables can …