Available Graph Adapters
The available graph adapters are listed below.
Use `from hamilton import base` to use these Graph Adapters:
| Name | What it does | When you’d use it |
|---|---|---|
| | This executes the Hamilton dataflow locally on a machine in a single-threaded, single-process fashion. It assumes a pandas dataframe as a result. | This is the default GraphAdapter that Hamilton uses. Use this when you want to execute on a single machine, without parallelization, and you want a pandas dataframe as output. |
| | This executes the Hamilton dataflow locally on a machine in a single-threaded, single-process fashion. It allows you to specify a ResultBuilder to control the type of the object that is returned. | Use this when you want to execute on a single machine, without parallelization, and you want to control the type of the object that is returned. |
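
The role a ResultBuilder plays can be sketched without Hamilton itself. The snippet below is a toy model under stated assumptions, not Hamilton's implementation: `NODES`, `ToyDictResult`, `ToyRowsResult`, and `execute` are all hypothetical names made up for illustration. It only shows the idea that the same computed outputs can be handed to interchangeable builders that decide the final return type.

```python
from typing import Any, Callable, Dict, List

# Hypothetical toy "dataflow": each node is a zero-argument function
# that produces one named output column.
NODES: Dict[str, Callable[[], List[int]]] = {
    "spend": lambda: [10, 20, 30],
    "signups": lambda: [1, 2, 3],
}


class ToyDictResult:
    """Hypothetical builder: return the outputs as a plain dict."""

    @staticmethod
    def build_result(**outputs: Any) -> Dict[str, Any]:
        return dict(outputs)


class ToyRowsResult:
    """Hypothetical builder: return the outputs as a list of row dicts."""

    @staticmethod
    def build_result(**outputs: Any) -> List[Dict[str, Any]]:
        names = list(outputs)
        return [dict(zip(names, values)) for values in zip(*outputs.values())]


def execute(final_vars: List[str], result_builder: Any) -> Any:
    """Compute the requested nodes one by one (single thread, single
    process), then let the builder decide the shape of the result."""
    outputs = {name: NODES[name]() for name in final_vars}
    return result_builder.build_result(**outputs)


as_dict = execute(["spend", "signups"], ToyDictResult)
as_rows = execute(["spend", "signups"], ToyRowsResult)
```

Swapping the builder changes only the container that comes back; the computation of each node is untouched, which is the separation the second adapter in the table exposes.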
Experimental Graph Adapters
The following are considered experimental; their API may change. That said, the code is stable, and you should feel comfortable giving it a spin. Let us know how it goes, and what the rough edges are if you find any.
Use `from hamilton.experimental import h_[NAME]` to use these Graph Adapters:
| Name | What it does | When you’d use it |
|---|---|---|
| | This walks the graph and translates it to run on Dask. You can pass a ResultMixin object to the constructor to control the return type that is produced by running on Dask. | Use this if you want to utilize multiple cores on a single machine, or you want to scale to large data set sizes with a Dask cluster that you can connect to. |
| | This walks the graph and translates it to run on Ray. You can pass a ResultMixin object to the constructor to control the return type that is produced by running on Ray. | Use this if you want to utilize multiple cores on a single machine, or you want to scale to larger data set sizes with a Ray cluster that you can connect to. Note: you are still constrained by machine memory size with Ray; you can’t just scale to any dataset size. |
| | This walks the graph and translates it to run on Apache Spark using the Pandas API on Spark (aka Koalas). You can only return either a Koalas Dataframe or a Pandas Dataframe. To do that, use either the stock `base.PandasDataFrameResult` ResultMixin or `h_spark.KoalasDataframeResult`. | You’d generally use this if you have an existing Spark cluster running in your workplace, and you want to scale to very large data set sizes. Note: this GraphAdapter has only been tested on Spark 3.2+, the release in which Koalas became part of the standard Spark library. |
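
What "walking the graph and translating it to run on" another backend means can be illustrated with a small standard-library sketch. This is not how the Dask, Ray, or Spark adapters are implemented: `GRAPH` and `execute` are hypothetical names, and a `ThreadPoolExecutor` stands in for the external execution engine. The sketch only shows the general pattern of visiting each node, submitting it to a backend, and letting independent nodes run concurrently.

```python
from concurrent.futures import Executor, Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical toy graph: node name -> (function, names of upstream nodes).
GRAPH: Dict[str, Tuple[Callable[..., Any], Tuple[str, ...]]] = {
    "a": (lambda: 2, ()),
    "b": (lambda: 3, ()),
    "total": (lambda a, b: a + b, ("a", "b")),
    "doubled": (lambda total: total * 2, ("total",)),
}


def execute(final_vars: List[str], executor: Executor) -> Dict[str, Any]:
    """Walk the graph depth-first and submit every node to the executor."""
    futures: Dict[str, Future] = {}

    def visit(name: str) -> Future:
        if name not in futures:
            fn, deps = GRAPH[name]
            # Submit all upstream nodes first so that independent ones
            # ("a" and "b" here) can run concurrently on the backend.
            dep_futures = [visit(dep) for dep in deps]
            # Wait for upstream values on the calling thread, then submit
            # this node with concrete arguments.
            args = [future.result() for future in dep_futures]
            futures[name] = executor.submit(fn, *args)
        return futures[name]

    return {name: visit(name).result() for name in final_vars}


with ThreadPoolExecutor(max_workers=2) as pool:
    results = execute(["total", "doubled"], pool)
```

Each node is computed once even when several requested outputs depend on it, because its future is cached in `futures`. A real distributed backend would additionally have to move data between workers, which is where the memory and data-size caveats in the table come from.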