Bootstrapping FL Projects

Overview

Bootstrapping configures an existing Flower federated learning (FL) project to work with Syft-Flwr. It turns a standard Flower project into a privacy-preserving FL setup that can run across multiple datasites.

Prerequisites

Before bootstrapping, you need:

An existing Flower project with pyproject.toml
Valid datasite email addresses for participants
An aggregator (server) email address
Python 3.9 or higher
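
The file and interpreter requirements above can be checked before running bootstrap. The following is a minimal sketch, assuming a hypothetical helper named check_prerequisites that is not part of Syft-Flwr:

```python
import sys
from pathlib import Path


def check_prerequisites(project_dir, min_python=(3, 9)):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper for illustration; not part of syft_flwr.
    """
    problems = []
    # Tuple comparison: (3, 11, ...) >= (3, 9) and so on.
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    if not (Path(project_dir) / "pyproject.toml").is_file():
        problems.append("pyproject.toml not found in project directory")
    return problems
```

Calling check_prerequisites("./my-fl-project") and printing any returned messages gives a quick sanity check; the email prerequisites are validated by bootstrap itself.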

Basic Bootstrap Process

Using the CLI

The simplest way to bootstrap a project is with the syft_flwr CLI:

syft_flwr bootstrap /path/to/flower-project \
  --aggregator data-scientist@openmined.org \
  --datasites data-owner-1@hospital.org,data-owner-2@clinic.com

Interactive Mode

If you omit the required arguments, the CLI prompts for them:

syft_flwr bootstrap /path/to/flower-project
# Enter the datasite email of the Aggregator (Flower Server): data-scientist@openmined.org
# Enter a comma-separated email of datasites of the Flower Clients: data-owner-1@hospital.org,data-owner-2@clinic.com
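
Both the --datasites flag and the interactive prompt take a single comma-separated string, while the Python API takes a list. A small sketch of the conversion, assuming a hypothetical parse_datasites helper (the CLI's own parsing may differ):

```python
def parse_datasites(raw):
    """Split a comma-separated string of emails into a clean list.

    Hypothetical helper; strips whitespace and drops empty entries.
    """
    return [part.strip() for part in raw.split(",") if part.strip()]


datasites = parse_datasites("data-owner-1@hospital.org, data-owner-2@clinic.com")
```

The resulting list can be passed as the datasites argument of bootstrap().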

Using Python API

from pathlib import Path
from syft_flwr.bootstrap import bootstrap

project_dir = Path("./my-fl-project")
aggregator = "data-scientist@openmined.org"
datasites = [
    "data-owner-1@hospital.org",
    "data-owner-2@clinic.com"
]

bootstrap(
    flwr_project_dir=project_dir,
    aggregator=aggregator,
    datasites=datasites
)

Transport Configuration

Syft-Flwr supports two transport mechanisms:

SyftBox (Default)
P2P (Colab/Drive)

The SyftBox transport (the default) uses the local SyftBox client with RPC and end-to-end encryption.

bootstrap(
    flwr_project_dir=project_dir,
    aggregator=aggregator,
    datasites=datasites,
    transport="syftbox"  # Default
)

SyftBox features:

End-to-end encryption enabled by default
Uses SyftBox Go client for file sync
RPC communication via futures database
Best for production deployments

The P2P transport uses peer-to-peer file sync via Google Drive or OneDrive.

bootstrap(
    flwr_project_dir=project_dir,
    aggregator=aggregator,
    datasites=datasites,
    transport="p2p"
)

P2P features:

File-based message passing
Works in Google Colab notebooks
No encryption (relies on Drive security)
Great for experimentation

Auto-Detection

When transport=None, Syft-Flwr automatically detects the environment:

bootstrap(
    flwr_project_dir=project_dir,
    aggregator=aggregator,
    datasites=datasites,
    transport=None  # Auto-detect
)
# In Colab: Uses "p2p"
# Otherwise: Uses "syftbox"
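
The detection rule in the comments above could be implemented roughly as follows. This is only a sketch: it assumes Colab is recognized by the google.colab module being loaded, which is a common idiom but not necessarily the check Syft-Flwr performs, and detect_transport is a hypothetical name.

```python
import sys


def detect_transport(loaded_modules=None):
    """Guess a transport name from the runtime environment.

    Assumption: Colab is identified by the `google.colab` module
    being loaded; syft_flwr's real check may differ.
    """
    modules = sys.modules if loaded_modules is None else loaded_modules
    return "p2p" if "google.colab" in modules else "syftbox"
```

Passing transport explicitly remains the safer choice when the environment is ambiguous.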

What Bootstrap Does

1. Validates Project Structure

Bootstrap verifies:

pyproject.toml exists in the project directory
main.py doesn’t already exist (to avoid overwriting)
All email addresses are valid datasites

# bootstrap.py:86-98
def __validate_flwr_project_dir(flwr_project_dir: Union[str, Path]) -> Path:
    flwr_pyproject = flwr_project_dir / "pyproject.toml"
    flwr_main_py = flwr_project_dir / "main.py"

    if flwr_main_py.exists():
        raise FileExistsError(f"File '{flwr_main_py}' already exists")

    if not flwr_project_dir.exists():
        raise FileNotFoundError(f"Directory '{flwr_project_dir}' not found")

    if not flwr_pyproject.exists():
        raise FileNotFoundError(f"File '{flwr_pyproject}' not found")

2. Updates pyproject.toml

Bootstrap adds Syft-Flwr configuration to your pyproject.toml:

[project]
name = "my-fl-project"
dependencies = [
    "syft_flwr==0.1.0",  # Added automatically
    # ... your existing dependencies
]

[tool.syft_flwr]
app_name = "aggregator@example.com_my-fl-project_1234567890"
datasites = ["client1@example.com", "client2@example.com"]
aggregator = "aggregator@example.com"
transport = "syftbox"  # or "p2p"

[tool.flwr.app.config]
partition-id = 0
num-partitions = 1
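
To inspect what bootstrap wrote, the [tool.syft_flwr] table can be read back with a TOML parser. The sketch below parses an inline copy of the example above; it assumes tomllib (standard library from Python 3.11) and falls back to the third-party tomli package on older interpreters.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # older interpreters: pip install tomli
    import tomli as tomllib

PYPROJECT = """
[project]
name = "my-fl-project"
dependencies = ["syft_flwr==0.1.0"]

[tool.syft_flwr]
app_name = "aggregator@example.com_my-fl-project_1234567890"
datasites = ["client1@example.com", "client2@example.com"]
aggregator = "aggregator@example.com"
transport = "syftbox"
"""

config = tomllib.loads(PYPROJECT)
syft_cfg = config["tool"]["syft_flwr"]

# Participants are the aggregator plus every datasite.
participants = [syft_cfg["aggregator"], *syft_cfg["datasites"]]
print(syft_cfg["transport"], participants)
```

For a real project, open pyproject.toml in binary mode and use tomllib.load instead of the inline string.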

3. Generates main.py

Bootstrap creates main.py, the entry point that starts either the Flower server or a Flower client:

# Generated main.py structure
import sys
from pathlib import Path
from syft_flwr.run import syftbox_run_flwr_client, syftbox_run_flwr_server

def main():
    project_dir = Path(__file__).parent
    
    if "-s" in sys.argv or "--server" in sys.argv:
        syftbox_run_flwr_server(project_dir)
    else:
        syftbox_run_flwr_client(project_dir)

if __name__ == "__main__":
    main()
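
The branching in the generated file depends only on the command-line flags. The same rule, isolated as a small function for illustration (select_role is a hypothetical name, not something bootstrap generates):

```python
def select_role(argv):
    """Return "server" if a server flag is present, otherwise "client"."""
    return "server" if "-s" in argv or "--server" in argv else "client"


# Mirrors the generated main.py: a server flag selects the aggregator role.
role_with_flag = select_role(["main.py", "--server"])
role_without_flag = select_role(["main.py"])
```

In other words, the same main.py is shipped to every participant, and the flag decides whether it runs as the aggregator or as a datasite client.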

Project Structure

After bootstrapping, your project structure looks like this:

my-fl-project/
├── pyproject.toml          # Updated with syft_flwr config
├── main.py                 # Generated entry point
├── my_fl_project/
│   ├── __init__.py
│   ├── server_app.py      # Your Flower ServerApp
│   ├── client_app.py      # Your Flower ClientApp
│   └── task.py            # Your ML logic
└── README.md

Email Validation

Datasite and aggregator identifiers must be well-formed email addresses of the form name@domain.tld:

# Valid examples
aggregator = "data-scientist@openmined.org"  # ✓
datasites = [
    "client1@hospital.edu",  # ✓
    "user@clinic.co.uk"      # ✓
]

# Invalid examples
aggregator = "invalid-email"           # ✗ No @ or domain
datasites = ["@example.com"]           # ✗ No local part
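
A pre-check along these lines can catch obvious mistakes before calling bootstrap. The pattern below is a deliberately simplified assumption for illustration; it is not the rule used by Syft-Flwr's is_valid_datasite, which may be stricter or looser.

```python
import re

# Simplified pattern: non-empty local part, "@", and a dotted domain.
# Illustrative only; not the library's actual validation rule.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s.]+(\.[^@\s.]+)+$")


def looks_like_datasite(value):
    """Return True if value has the shape name@domain.tld."""
    return bool(_EMAIL_RE.match(value))


candidates = ["client1@hospital.edu", "user@clinic.co.uk", "invalid-email", "@example.com"]
rejected = [c for c in candidates if not looks_like_datasite(c)]
```

Here rejected holds the two invalid examples from the listing above.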

Complete Example

Here’s a full workflow from creating a Flower project to bootstrapping:

Create Flower Project

flwr new my-fl-project --framework pandas
cd my-fl-project

Bootstrap with Syft-Flwr

syft_flwr bootstrap . \
  --aggregator data-scientist@openmined.org \
  --datasites hospital1@med.org,hospital2@med.org

Verify Configuration

cat pyproject.toml | grep syft_flwr
ls main.py  # Should exist now

Troubleshooting

“main.py already exists”

Bootstrap won’t overwrite an existing main.py. Remove it first, then run bootstrap again:

rm main.py
syft_flwr bootstrap .
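
If the existing main.py contains code worth keeping, renaming it is safer than deleting it. A minimal sketch, assuming a hypothetical back_up_main helper that is not part of Syft-Flwr:

```python
from pathlib import Path


def back_up_main(project_dir):
    """Rename main.py to main.py.bak so bootstrap can regenerate it.

    Returns the backup path, or None if there was nothing to move.
    Hypothetical helper; not part of syft_flwr.
    """
    main_py = Path(project_dir) / "main.py"
    if not main_py.exists():
        return None
    backup = main_py.with_name("main.py.bak")
    main_py.replace(backup)  # overwrites an older backup, if any
    return backup
```

After calling back_up_main("."), the bootstrap command can be run as usual.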

“Invalid datasite” Error

Ensure all emails match the pattern name@domain.tld:

# bootstrap.py:131-136
if not is_valid_datasite(aggregator):
    raise ValueError(f"'{aggregator}' is not a valid datasite")

for ds in datasites:
    if not is_valid_datasite(ds):
        raise ValueError(f"{ds} is not a valid datasite")

“pyproject.toml not found”

Make sure you’re in a valid Flower project directory:

ls pyproject.toml  # Must exist
cat pyproject.toml | grep "tool.flwr"  # Should have Flower config

Next Steps

Run Simulations: test your bootstrapped project locally
Transport Configuration: deep dive into transport options
Multi-Client Setup: deploy across multiple datasites
