Running Local Simulations

Overview

Simulations allow you to test your federated learning workflow locally before deploying to real datasites. Syft-Flwr simulates a multi-party FL environment using mock datasets and temporary client instances.

Prerequisites

A bootstrapped Syft-Flwr project (see Bootstrapping Projects)
Mock datasets prepared for testing
Python 3.9 or higher

Quick Start

Using the CLI

syft_flwr run /path/to/project \
  --mock-dataset-paths /data/client1,/data/client2

Interactive Mode

If you don’t provide dataset paths, the CLI will prompt you:

syft_flwr run ./my-fl-project
# Enter comma-separated paths to mock datasets: /data/hospital1,/data/hospital2
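
The prompt expects a single comma-separated string. How the CLI splits that string internally is not shown here, so the following is only a minimal sketch of one tolerant way to do it; `parse_mock_dataset_paths` is a hypothetical helper, not part of syft_flwr.

```python
from pathlib import Path


def parse_mock_dataset_paths(raw: str) -> list[Path]:
    """Split a comma-separated string into paths, ignoring blanks and stray spaces."""
    return [Path(part.strip()) for part in raw.split(",") if part.strip()]


if __name__ == "__main__":
    print(parse_mock_dataset_paths("/data/hospital1, /data/hospital2"))
```

Stripping whitespace and dropping empty segments means a trailing comma or a space after a comma does not produce a bogus path.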

Using Python API

from pathlib import Path
from syft_flwr.run_simulation import run

project_dir = Path("./my-fl-project")
mock_datasets = [
    "/data/hospital1",
    "/data/hospital2"
]

success = run(project_dir, mock_datasets)
if success:
    print("Simulation completed successfully!")

How Simulations Work

1. Mock RDS Client Setup

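Simulations create temporary RDS (Remote Data Science) clients for each participant: one for the aggregator and one per datasite, all sharing a single simulated SyftBox network directory: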

# run_simulation.py:41-65
def _setup_mock_rds_clients(
    project_dir: Path, aggregator: str, datasites: list[str]
) -> tuple[Path, list[RDSClient], RDSClient]:
    """Setup mock RDS clients for the given project directory"""
    simulated_syftbox_network_dir = Path(tempfile.gettempdir(), project_dir.name)
    
    # Create aggregator client
    ds_syftbox_client = create_temp_client(
        email=aggregator, workspace_dir=simulated_syftbox_network_dir
    )
    ds_rds_client = init_session(
        host=aggregator, email=aggregator, syftbox_client=ds_syftbox_client
    )
    
    # Create data owner clients
    do_rds_clients = []
    for datasite in datasites:
        do_syftbox_client = create_temp_client(
            email=datasite, workspace_dir=simulated_syftbox_network_dir
        )
        do_rds_client = init_session(
            host=datasite, email=datasite, syftbox_client=do_syftbox_client
        )
        do_rds_clients.append(do_rds_client)
    
    return simulated_syftbox_network_dir, do_rds_clients, ds_rds_client
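
One consequence of the path expression in the excerpt is worth spelling out: the simulated network directory depends only on the project directory's name, not its full path. The sketch below reproduces just that expression with the standard library; `simulated_network_dir` is a hypothetical name used for illustration.

```python
import tempfile
from pathlib import Path


def simulated_network_dir(project_dir: Path) -> Path:
    """Mirror the path expression from the excerpt: <system temp dir>/<project dir name>."""
    return Path(tempfile.gettempdir(), project_dir.name)


if __name__ == "__main__":
    a = simulated_network_dir(Path("/work/team-a/my-fl-project"))
    b = simulated_network_dir(Path("/work/team-b/my-fl-project"))
    print(a)
    print(a == b)  # same directory name, so the same simulated network directory
```

If that reading is right, two projects with the same directory name would share a temporary directory when simulated at the same time on one machine, so distinct project names are the safer choice.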

2. Encryption Bootstrap

By default, simulations use end-to-end encryption:

# run_simulation.py:68-130
def _bootstrap_encryption_keys(
    do_clients: list[RDSClient], ds_client: RDSClient
) -> None:
    """Bootstrap the encryption keys for all clients if encryption is enabled."""
    encryption_enabled = (
        os.environ.get(SYFT_FLWR_ENCRYPTION_ENABLED, "true").lower() != "false"
    )
    
    if not encryption_enabled:
        logger.warning("⚠️ Encryption disabled - skipping key bootstrap")
        return
    
    logger.info("🔐 Bootstrapping encryption keys for all participants...")
    
    # Bootstrap server and clients
    # Verify DID documents are accessible
    # ...

To disable encryption for testing:

export SYFT_FLWR_ENCRYPTION_ENABLED=false
syft_flwr run ./my-fl-project -m /data/client1,/data/client2
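
The comparison in the excerpt treats only the literal string "false" (in any letter case) as "disabled". The sketch below isolates that check so the effect of different values is easy to see. It assumes the Python constant `SYFT_FLWR_ENCRYPTION_ENABLED` holds the same name as the exported variable; `encryption_enabled` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Assumption: the constant in the excerpt resolves to this variable name.
ENV_VAR = "SYFT_FLWR_ENCRYPTION_ENABLED"


def encryption_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Same comparison as the excerpt: anything other than 'false' keeps encryption on."""
    env = os.environ if env is None else env
    return env.get(ENV_VAR, "true").lower() != "false"


if __name__ == "__main__":
    for value in ("false", "FALSE", "0", "no", ""):
        print(repr(value), "->", encryption_enabled({ENV_VAR: value}))
```

Under this logic, values such as `0` or `no` would leave encryption enabled, so use `false` exactly when you want to turn it off.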

3. Concurrent Execution

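The server and clients run concurrently as asyncio tasks. The simulation result is the server's exit code; any client tasks still running when the server finishes are cancelled: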

# run_simulation.py:169-231
async def _run_simulated_flwr_project(
    project_dir: Path,
    do_clients: list[RDSClient],
    ds_client: RDSClient,
    mock_dataset_paths: list[Union[str, Path]],
) -> bool:
    """Run all clients and server concurrently"""
    log_dir = project_dir / "simulation_logs"
    log_dir.mkdir(parents=True, exist_ok=True)
    
    main_py_path = project_dir / "main.py"
    
    # Start server
    ds_task = asyncio.create_task(
        _run_main_py(
            main_py_path,
            ds_client._syftbox_client.config_path,
            ds_client.email,
            log_dir,
        )
    )
    
    # Start clients
    client_tasks = []
    for client, mock_dataset_path in zip(do_clients, mock_dataset_paths):
        client_tasks.append(
            asyncio.create_task(
                _run_main_py(
                    main_py_path,
                    client._syftbox_client.config_path,
                    client.email,
                    log_dir,
                    mock_dataset_path,
                )
            )
        )
    
    # Wait for server to complete
    ds_return_code = await ds_task
    
    # Cancel client tasks when server completes
    for task in client_tasks:
        if not task.done():
            task.cancel()
    
    return ds_return_code == 0
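
The "await the server, then cancel the clients" pattern can be tried in isolation. The sketch below replaces the real subprocess launches with toy coroutines (`fake_server` and `fake_client` are stand-ins, not syft_flwr functions). It also gathers the cancelled tasks afterwards, which the excerpt does not show, so that nothing is left pending.

```python
import asyncio


async def fake_server(rounds: int) -> int:
    """Stand-in for the server process: finishes after a few rounds, returns an exit code."""
    for _ in range(rounds):
        await asyncio.sleep(0.01)
    return 0


async def fake_client(name: str) -> None:
    """Stand-in for a client process that keeps serving until it is cancelled."""
    while True:
        await asyncio.sleep(0.01)


async def simulate(n_clients: int = 2) -> tuple[bool, int]:
    server_task = asyncio.create_task(fake_server(rounds=3))
    client_tasks = [
        asyncio.create_task(fake_client(f"client{i}")) for i in range(n_clients)
    ]

    return_code = await server_task  # only the server's result decides success

    for task in client_tasks:
        if not task.done():
            task.cancel()
    # Let the cancellations propagate so no task is left pending.
    results = await asyncio.gather(*client_tasks, return_exceptions=True)
    cancelled = sum(isinstance(r, asyncio.CancelledError) for r in results)
    return return_code == 0, cancelled


if __name__ == "__main__":
    print(asyncio.run(simulate()))
```

Because success is tied to the server alone, a client that crashes early does not by itself fail the run; it shows up as a server that waits for a missing participant.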

Mock Dataset Configuration

Dataset Structure

Each mock dataset path should contain the data for one client:

/data/
├── hospital1/
│   ├── train.csv
│   └── test.csv
└── hospital2/
    ├── train.csv
    └── test.csv

Accessing Datasets in Client Code

Clients access their dataset via the DATA_DIR environment variable:

# client_app.py
import os
import pandas as pd
from pathlib import Path
from syft_flwr.utils import get_syftbox_dataset_path

def load_data():
    # Automatically uses DATA_DIR environment variable
    data_dir = get_syftbox_dataset_path()
    
    df_train = pd.read_csv(data_dir / "train.csv")
    df_test = pd.read_csv(data_dir / "test.csv")
    
    return pd.concat([df_train, df_test], ignore_index=True)
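
For debugging it can help to read the variable directly, without the helper. What `get_syftbox_dataset_path` does internally is not shown on this page, so the following is only a sketch of an explicit check; `data_dir_from_env` is a hypothetical function.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def data_dir_from_env(env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: read DATA_DIR directly and fail with a clear message."""
    env = os.environ if env is None else env
    raw = env.get("DATA_DIR")
    if not raw:
        raise RuntimeError("DATA_DIR is not set; run the client through the simulation")
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"DATA_DIR points to a missing directory: {path}")
    return path
```

Separating "variable not set" from "directory missing" makes it quicker to tell a launch problem from a dataset problem.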

Dataset Path Validation

The simulation resolves and validates every dataset path before execution, raising a ValueError for any path that does not exist:

# run_simulation.py:249-257
def _validate_mock_dataset_paths(mock_dataset_paths: list[str]) -> list[Path]:
    """Validate the mock dataset paths"""
    resolved_paths = []
    for path in mock_dataset_paths:
        path = Path(path).expanduser().resolve()
        if not path.exists():
            raise ValueError(f"Mock dataset path {path} does not exist")
        resolved_paths.append(path)
    return resolved_paths
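
The validation above checks existence only. The concurrent-execution code pairs clients with dataset paths using `zip()`, which stops at the shorter input, so a count mismatch could go unnoticed unless the runner checks it elsewhere (not shown on this page). A hypothetical pre-flight check, assuming you know the list of datasites, might look like this:

```python
from pathlib import Path
from typing import Sequence, Union


def check_dataset_count(
    datasites: Sequence[str], mock_dataset_paths: Sequence[Union[str, Path]]
) -> None:
    """Hypothetical pre-flight check: require exactly one dataset path per datasite."""
    if len(datasites) != len(mock_dataset_paths):
        raise ValueError(
            f"Expected {len(datasites)} mock dataset paths "
            f"(one per datasite), got {len(mock_dataset_paths)}"
        )
```

Calling it before starting a simulation turns a silent mismatch into an immediate, readable error.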

Simulation Logs

Logs are saved to <project_dir>/simulation_logs/:

my-fl-project/
└── simulation_logs/
    ├── data-scientist@openmined.org.log  # Server logs
    ├── hospital1@med.org.log             # Client 1 logs
    └── hospital2@med.org.log             # Client 2 logs

Viewing Logs

# View server logs
cat my-fl-project/simulation_logs/data-scientist@openmined.org.log

# View all logs
tail -f my-fl-project/simulation_logs/*.log
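
When there are many participants, scanning every log by eye gets tedious. The sketch below collects lines containing a keyword from each `.log` file in the log directory. It uses only the standard library; `find_error_lines` is a hypothetical helper, and the keyword "error" is an assumption about what the logs contain.

```python
from pathlib import Path


def find_error_lines(log_dir: Path, needle: str = "error") -> dict[str, list[str]]:
    """Return, per log file, the lines that contain `needle` (case-insensitive)."""
    hits: dict[str, list[str]] = {}
    for log_file in sorted(Path(log_dir).glob("*.log")):
        lines = log_file.read_text(encoding="utf-8", errors="replace").splitlines()
        matching = [line for line in lines if needle.lower() in line.lower()]
        if matching:
            hits[log_file.name] = matching
    return hits


if __name__ == "__main__":
    for name, lines in find_error_lines(Path("my-fl-project/simulation_logs")).items():
        print(f"== {name} ==")
        print("\n".join(lines))
```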

Running in Different Environments

Command Line

# Standard execution
syft_flwr run ./my-fl-project \
  --mock-dataset-paths /data/c1,/data/c2

Returns exit code 0 on success, 1 on failure.

Python Script

from syft_flwr.run_simulation import run

success = run(
    project_dir="./my-fl-project",
    mock_dataset_paths=["/data/c1", "/data/c2"]
)

if success:
    print("Simulation passed!")
else:
    print("Simulation failed!")
    raise SystemExit(1)

Returns True on success, False on failure.

Jupyter Notebook

from syft_flwr.run_simulation import run

# Run simulation (returns asyncio.Task in Jupyter)
task = run(
    project_dir="./my-fl-project",
    mock_dataset_paths=["/data/c1", "/data/c2"]
)

# Wait for completion
success = await task
print(f"Simulation {'passed' if success else 'failed'}")

In a notebook, run() detects the already-running event loop and returns an asyncio.Task to await instead of blocking.
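
The difference between script and notebook behaviour comes down to whether an event loop is already running. The actual detection logic inside `run` is not shown on this page; the sketch below is one common way to write it with the standard library, using hypothetical names (`run_anywhere`, `_simulate`).

```python
import asyncio
from typing import Any, Callable, Coroutine, Union


async def _simulate() -> bool:
    """Stand-in for the real simulation coroutine."""
    await asyncio.sleep(0)
    return True


def run_anywhere(
    coro_factory: Callable[[], Coroutine[Any, Any, bool]] = _simulate,
) -> Union[bool, "asyncio.Task[bool]"]:
    """Return a Task if a loop is already running, else block and return the result."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No running loop (plain script or CLI): safe to start one and block.
        return asyncio.run(coro_factory())
    # A loop is already running (e.g. a notebook): schedule and let the caller await.
    return loop.create_task(coro_factory())
```

This explains why the same call yields a bool in a script but needs `await` in a notebook: calling `asyncio.run` inside a running loop would raise a RuntimeError, so scheduling a task is the only option there.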

Complete Example

Here’s a full simulation workflow:

Prepare Mock Data

mkdir -p /tmp/mock_data/{hospital1,hospital2}

# Copy or generate mock datasets
cp hospital1_train.csv /tmp/mock_data/hospital1/train.csv
cp hospital1_test.csv /tmp/mock_data/hospital1/test.csv
cp hospital2_train.csv /tmp/mock_data/hospital2/train.csv
cp hospital2_test.csv /tmp/mock_data/hospital2/test.csv
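
If you have no files to copy, placeholder data can be generated instead. The sketch below writes small random CSVs in the layout shown earlier, using only the standard library. The column names are made up for illustration and should be replaced with whatever your client code expects; `write_mock_dataset` is a hypothetical helper.

```python
import csv
import random
from pathlib import Path


def write_mock_dataset(root: Path, client: str, rows: int = 20, seed: int = 0) -> Path:
    """Create <root>/<client>/train.csv and test.csv filled with random placeholder rows."""
    rng = random.Random(seed)
    client_dir = Path(root) / client
    client_dir.mkdir(parents=True, exist_ok=True)
    for split in ("train", "test"):
        with open(client_dir / f"{split}.csv", "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["feature_a", "feature_b", "label"])  # made-up columns
            for _ in range(rows):
                writer.writerow(
                    [round(rng.random(), 4), rng.randint(0, 100), rng.randint(0, 1)]
                )
    return client_dir
```

For the layout used in this example, call `write_mock_dataset(Path("/tmp/mock_data"), "hospital1")` and again with `"hospital2"`.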

Run Simulation

syft_flwr run ./fed-analytics-diabetes \
  --mock-dataset-paths /tmp/mock_data/hospital1,/tmp/mock_data/hospital2

Check Results

# Check if simulation succeeded
echo $?  # Should be 0

# Review logs
ls -la ./fed-analytics-diabetes/simulation_logs/
cat ./fed-analytics-diabetes/simulation_logs/*.log

Advanced Configuration

Skipping Module Validation

Set SYFT_FLWR_SKIP_MODULE_CHECK=true to skip module validation. This is useful for parallel test execution:

export SYFT_FLWR_SKIP_MODULE_CHECK=true
syft_flwr run ./my-fl-project -m /data/c1,/data/c2

Custom Temporary Directory

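Simulations create their working directory under the system temporary directory returned by tempfile.gettempdir() (typically /tmp on Linux), in a subdirectory named after the project directory. The directory is removed automatically when the simulation ends, whether it succeeded or failed: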

# run_simulation.py:292-312
async def main():
    try:
        run_success = await _run_simulated_flwr_project(...)
        if run_success:
            logger.success("Simulation completed successfully ✅")
        else:
            logger.error("Simulation failed ❌")
    finally:
        # Clean up the RDS stack
        remove_rds_stack_dir(simulated_syftbox_network_dir)
        # Remove config files and private keys
        remove_rds_stack_dir(simulated_syftbox_network_dir.parent / ".syftbox")
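
This page does not show a dedicated syft_flwr option for changing the location. However, Python's `tempfile.gettempdir()` consults the `TMPDIR` environment variable first, so setting it before launch should redirect a runner that derives its directory this way. That is an inference from the setup excerpt, not something verified against syft_flwr. The sketch demonstrates only the standard-library behaviour, in a fresh child process because `tempfile` caches its result after the first call.

```python
import os
import subprocess
import sys


def temp_dir_seen_by_child(tmpdir: str) -> str:
    """Start a fresh Python process with TMPDIR set and report its tempfile.gettempdir()."""
    env = os.environ.copy()
    env["TMPDIR"] = tmpdir  # must point to an existing, writable directory
    out = subprocess.run(
        [sys.executable, "-c", "import tempfile; print(tempfile.gettempdir())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return out.stdout.strip()
```

If `TMPDIR` points to a directory that does not exist or is not writable, Python falls back to the next candidate location.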

Troubleshooting

“Project directory does not exist”

Ensure the project is bootstrapped:

ls my-fl-project/main.py  # Must exist
ls my-fl-project/pyproject.toml

“Mock dataset path does not exist”

Verify all dataset paths:

ls /data/hospital1/  # Should contain train.csv, test.csv
ls /data/hospital2/

Simulation Hangs

Check logs for errors:

tail -f ./my-fl-project/simulation_logs/*.log

Common issues:

Client code has infinite loop
Server waiting for more clients than provided
Dataset loading errors
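
In automated tests, a hang is easier to handle if the run is bounded by a timeout. The sketch below wraps any command with `subprocess.run(..., timeout=...)`; `run_with_timeout` is a hypothetical helper, and the timeout value is yours to choose.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> Optional[int]:
    """Run a command; return its exit code, or None if it exceeded the timeout."""
    try:
        return subprocess.run(list(cmd), timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return None


if __name__ == "__main__":
    # Harmless demonstration with the current interpreter.
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], timeout_s=30))
```

For a simulation, the command would be the CLI call shown earlier, for example `["syft_flwr", "run", "./my-fl-project", "-m", "/data/client1,/data/client2"]`. Note that only the direct child is killed on timeout; processes it spawned may need separate cleanup.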

“FileNotFoundError: Path .data/ does not exist”

Ensure the DATA_DIR environment variable is set. During a simulation, the runner sets it for each client process:

# run_simulation.py:133-146
env = os.environ.copy()
env["SYFTBOX_CLIENT_CONFIG_PATH"] = str(config_path)
env["DATA_DIR"] = str(dataset_path)  # Set by simulation
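
The copy-and-extend pattern in the excerpt means the variable exists only in the child process, not in your shell. The sketch below reproduces the pattern with a trivial child that echoes what it sees; `child_sees_data_dir` is a hypothetical helper, and the real runner launches `main.py` rather than this one-liner.

```python
import os
import subprocess
import sys


def child_sees_data_dir(dataset_path: str) -> str:
    """Launch a child Python process with DATA_DIR set and return what the child reads."""
    env = os.environ.copy()  # copy, so the parent's environment stays untouched
    env["DATA_DIR"] = str(dataset_path)
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('DATA_DIR', ''))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

This also explains why running client code directly, outside the simulation, can fail with this error: nothing has set DATA_DIR for that process.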


Next Steps

Multi-Client Setup

Offline Training
