Another post on the proliferation of tools. I'm writing this just to keep up to date on what the cool kids are talking about.
Feature Store
- Feature management: "features used for one model can be used in another"
- Feature computation: "feature engineering logic, after being defined, needs to be computed... If the computation of this feature isn't too expensive, it might be acceptable to compute this feature each time [or] you might want to execute it only once, the first time it is required, then store it"
- Feature consistency: "feature definitions written in Python during development might need to be converted into the language used in production... Modern feature stores... unify the logic for both batch features and streaming."
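The compute-once-then-store idea above can be sketched with a toy in-memory store. This is a hypothetical illustration, not any real feature store's API; the names `FeatureStore`, `register`, and `get` are made up:

```python
class FeatureStore:
    """Toy in-memory feature store: a registered feature's engineering
    logic runs only the first time it is requested, then the value
    is cached and reused (including by other models)."""

    def __init__(self):
        self._definitions = {}  # feature name -> computation function
        self._cache = {}        # (feature name, entity id) -> value

    def register(self, name, fn):
        """Define the feature engineering logic once."""
        self._definitions[name] = fn

    def get(self, name, entity_id):
        key = (name, entity_id)
        if key not in self._cache:  # compute only on first request
            self._cache[key] = self._definitions[name](entity_id)
        return self._cache[key]


calls = {"n": 0}

def expensive_feature(user_id):
    """Stand-in for an expensive feature computation."""
    calls["n"] += 1
    return user_id ** 2

store = FeatureStore()
store.register("spend_squared", expensive_feature)
first = store.get("spend_squared", 3)   # computed now
second = store.get("spend_squared", 3)  # served from the cache
```

Whether caching beats recomputing depends on how expensive the logic is, as the quote above notes.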
MLflow is an open source project from Databricks. By declaring its dependencies in your code, you can serve models over HTTP; however, you must still build a Docker container for its server yourself.
Metaflow was open sourced by Netflix. It's a framework that creates a DAG of the pipeline a data scientist may use.
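The DAG-of-steps idea can be sketched with the standard library alone. This is not Metaflow's real API, just an illustration of declaring a pipeline as steps with dependencies and running them in topological order:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical pipeline: each step lists the steps it depends on.
pipeline = {
    "load_data": [],
    "clean": ["load_data"],
    "featurize": ["clean"],
    "train": ["featurize"],
}

# Topological order guarantees each step runs after its dependencies.
order = list(TopologicalSorter(pipeline).static_order())

results = {}
steps = {
    "load_data": lambda: [1, 2, 3],
    "clean": lambda: [x for x in results["load_data"] if x > 1],
    "featurize": lambda: [x * 10 for x in results["clean"]],
    "train": lambda: sum(results["featurize"]),
}
for name in order:
    results[name] = steps[name]()
```

A real framework adds retries, resumption, and remote execution on top of this ordering.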
Kubeflow is an open source offering from Google.
Seldon Core helps to deploy pickled SKLearn models. Because the model is served as-is from its pickle, any pre-processing of the input must be done before calling the endpoint. For example, you'd call the served model with something like:
curl -s -d '{"data": {"ndarray":[[1.0, 2.0, 5.0, 6.0]]}}' \
  -X POST http://localhost:8003/seldon/seldon/sklearn/api/v1.0/predictions \
  -H "Content-Type: application/json"
(from here).
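Since the pre-processing has to happen client-side, the call can be sketched in Python with the standard library. The payload shape and endpoint URL come from the curl example above; the scaling step is a made-up stand-in for whatever pre-processing your model was trained with:

```python
import json
from urllib import request


def scale(features, factor=0.5):
    """Example client-side pre-processing (hypothetical): the pickled
    model sees exactly what we send, so scaling/encoding must happen here."""
    return [x * factor for x in features]


def build_payload(rows):
    """Wrap rows in the {"data": {"ndarray": ...}} shape from the curl example."""
    return json.dumps({"data": {"ndarray": rows}})


payload = build_payload([scale([1.0, 2.0, 5.0, 6.0])])

# To actually call the deployed model (same endpoint as the curl example):
# req = request.Request(
#     "http://localhost:8003/seldon/seldon/sklearn/api/v1.0/predictions",
#     data=payload.encode(),
#     headers={"Content-Type": "application/json"},
# )
# response = request.urlopen(req)
```

The HTTP call is left commented out since it needs a running Seldon deployment.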
Development Tools
Apache TVM is "an end to end Machine Learning Compiler Framework for CPUs, GPUs and accelerators".