Metadata-Version: 2.1
Name: dask-expr
Version: 1.1.19
Summary: High Level Expressions for Dask
Maintainer-email: Matthew Rocklin <mrocklin@gmail.com>
License: BSD
Project-URL: Source code, https://github.com/dask-contrib/dask-expr/
Keywords: dask pandas
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: System :: Distributed Computing
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: dask==2024.11.2
Requires-Dist: pyarrow>=14.0.1
Requires-Dist: pandas>=2
Provides-Extra: analyze
Requires-Dist: crick; extra == "analyze"
Requires-Dist: distributed; extra == "analyze"
Requires-Dist: graphviz; extra == "analyze"

Dask Expressions
================

Dask DataFrames with query optimization.

This is a rewrite of Dask DataFrame that includes query
optimization and generally improved organization.

For more background, see our blog posts:
- [Dask Expressions overview](https://blog.dask.org/2023/08/25/dask-expr-introduction)
- [TPC-H benchmark results vs. Dask DataFrame](https://docs.coiled.io/blog/tpch.html)

Example
-------

```python
import dask_expr as dx

df = dx.datasets.timeseries()
df.head()

df.groupby("name").x.mean().compute()
```

Query Representation
--------------------

Dask-expr encodes user code in an expression tree:

```python
>>> df.x.mean().pprint()

Mean:
  Projection: columns='x'
    Timeseries: seed=1896674884
```
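To make the idea of an expression tree concrete, here is a minimal sketch in plain Python. It is a toy model, not dask-expr's actual classes: the `Expr` dataclass and its `tree_repr` method are hypothetical names chosen for illustration. Each operation wraps the node it was called on, and rendering walks the tree from the outermost operation down to the data source.

```python
from dataclasses import dataclass, field


@dataclass
class Expr:
    """Toy expression node (hypothetical): an operation, its parameters, its children."""

    op: str
    params: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def tree_repr(self, indent=0):
        # Render this node, then each child indented two more spaces.
        args = " ".join(f"{key}={value!r}" for key, value in self.params.items())
        header = " " * indent + f"{self.op}:" + (f" {args}" if args else "")
        lines = [header]
        for child in self.children:
            lines.append(child.tree_repr(indent + 2))
        return "\n".join(lines)


# Stand-in for df.x.mean(): nothing is computed, each step only records intent.
source = Expr("Timeseries", {"seed": 1896674884})
projection = Expr("Projection", {"columns": "x"}, [source])
mean = Expr("Mean", children=[projection])

print(mean.tree_repr())
```

Because the tree is only a description of the work, it can be inspected and rewritten before any data is touched.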

The expression tree is optimized, and may be rewritten, before execution:

```python
>>> df.x.mean().optimize().pprint()

Div:
  Sum:
    Fused(375f9):
    | Projection: columns='x'
    |   Timeseries: dtypes={'x': <class 'float'>} seed=1896674884
  Count:
    Fused(375f9):
    | Projection: columns='x'
    |   Timeseries: dtypes={'x': <class 'float'>} seed=1896674884
```
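In the optimized tree above, `Mean` has been replaced by a `Div` of a `Sum` and a `Count`. The sketch below illustrates why that form suits partitioned data. It uses plain Python lists as stand-ins for partitions, and the function names are hypothetical, so it should be read as a conceptual illustration rather than as dask-expr's implementation.

```python
# Each inner list stands in for one partition of column "x".
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]


def partition_sum(part):
    # Partial result for the Sum branch of the tree.
    return sum(part)


def partition_count(part):
    # Partial result for the Count branch of the tree.
    return len(part)


def mean_via_sum_and_count(parts):
    # Reduce every partition independently, then combine the partial results.
    total = sum(partition_sum(p) for p in parts)
    count = sum(partition_count(p) for p in parts)
    # Div: a single division of two scalars at the very end.
    return total / count


result = mean_via_sum_and_count(partitions)

# Averaging the per-partition means gives a different answer when
# partitions have unequal lengths, which is why Sum and Count are kept separate.
naive = sum(sum(p) / len(p) for p in partitions) / len(partitions)

print(result, naive)
```

Sums and counts can be computed per partition and combined in any order, whereas per-partition means cannot be averaged directly unless every partition has the same length.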

Stability
---------

Dask-expr has been the default backend for `dask.dataframe` since version 2024.3.0 of Dask.

API Coverage
------------

Dask-expr covers almost all of the Dask DataFrame API. The only missing feature is:

- Named GroupBy aggregations
