Apache Airflow - Workflow Orchestration Platform
Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring data workflows and ETL pipelines.
Apache Flink is a distributed stream processing framework for stateful computations over unbounded and bounded data streams.
Apache Hadoop is an open-source framework for distributed storage and processing of large data sets across clusters of commodity hardware.
Apache Hive is a data warehouse infrastructure built on top of Apache Hadoop that provides SQL-like querying capabilities for large-scale …
Apache Kafka is a distributed event streaming platform used for high-throughput, fault-tolerant real-time data pipelines and streaming …
Apache Spark is a multi-language engine for large-scale data processing, machine learning, and streaming analytics.
Apache Superset is a modern, open-source data exploration and visualization platform designed for interactive analytics and dashboard …
Azure HDInsight is a managed cloud service for running open-source big data frameworks including Apache Spark, Hadoop, Hive, HBase, and …
A comprehensive reference for Chroma: the open-source embedding database for AI applications, local development, and lightweight production …
ClickHouse is an open-source columnar database management system optimized for real-time analytical queries on large datasets.
dbt (data build tool) is an open-source transformation framework that enables analytics engineers to transform data in warehouses using SQL …
DuckDB is an in-process analytical database management system designed for fast OLAP queries on local data without requiring a separate …
Eclipse Mosquitto is an open-source lightweight MQTT message broker for implementing publish/subscribe messaging in IoT and M2M …
Grafana is an open-source analytics and interactive visualization platform for monitoring data from Prometheus, Elasticsearch, InfluxDB, and …
Great Expectations is an open-source Python library for validating, documenting, and profiling data to ensure data quality in pipelines.
A comprehensive reference for Hugging Face: the model hub, Transformers library, datasets, and deployment options for open-source AI models.
Hugging Face Transformers is an open-source library providing thousands of pretrained models for NLP, computer vision, audio, and multimodal …
InfluxDB is an open-source time series database designed for high-write-throughput storage and real-time querying of timestamped data from …
Keycloak is an open-source identity and access management solution providing single sign-on, user federation, and identity brokering for …
Knative is an open-source platform that extends Kubernetes to provide serverless workload management with automatic scaling to zero and …
Kubeflow is an open-source machine learning platform that makes deploying, scaling, and managing ML workflows on Kubernetes simple and …
Metabase is an open-source business intelligence tool that enables non-technical users to ask questions about data and visualize results …
MinIO is a high-performance, S3-compatible object storage system designed for large-scale AI and data infrastructure workloads.
A comprehensive reference for MLflow: experiment tracking, model registry, deployment, and lifecycle management for enterprise ML and AI …
Neo4j is an open-source native graph database that stores and queries data as nodes and relationships, optimized for connected data …
Novu is an open-source notification infrastructure platform for managing multi-channel notifications across email, SMS, push, in-app, and …
Ollama is an open-source tool for running large language models locally on personal hardware with a simple command-line interface.
Whisper is an open-source automatic speech recognition model by OpenAI that provides robust, multilingual speech-to-text transcription.
OpenFaaS is an open-source framework for building and deploying serverless functions and microservices on Kubernetes and Docker Swarm.
OpenTelemetry is a vendor-neutral open-source observability framework for generating, collecting, and exporting telemetry data (traces, …
Prefect is an open-source workflow orchestration framework that makes it easy to build, observe, and react to data pipelines using Python.
Prometheus is an open-source systems monitoring and alerting toolkit designed for reliability, featuring a dimensional data model and …
A comprehensive reference for Qdrant: vector similarity search, payload filtering, collection management, and deployment patterns for …
Rasa is an open-source framework for building contextual AI assistants and chatbots with natural language understanding and dialogue …
spaCy is an open-source library for advanced natural language processing in Python, designed for production use with fast, accurate NLP …
Supabase is an open-source backend-as-a-service platform providing a PostgreSQL database, authentication, real-time subscriptions, storage, …
Temporal is an open-source durable execution platform for building reliable, long-running workflows and distributed applications.
Tesseract is an open-source optical character recognition engine that extracts text from images and scanned documents in over 100 languages.
TimescaleDB is an open-source time-series database built as a PostgreSQL extension, optimized for fast ingest and complex queries on …
vLLM is an open-source library for high-throughput, low-latency serving of large language models using PagedAttention memory management.
A comprehensive reference for Weaviate: open-source vector search, hybrid retrieval, generative search modules, and self-hosted deployment …