Integrate Preclinical into your CI/CD pipeline to automatically test your AI agents before deployment.

Overview

Running Preclinical tests in CI/CD allows you to:
  • Catch regressions before they reach production
  • Enforce quality gates based on pass rates
  • Track agent performance over time
  • Block deployments that don’t meet safety standards

GitHub Actions

Basic Example

Create .github/workflows/preclinical.yml:
name: AI Agent Testing

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Run Preclinical Tests
        env:
          PRECLINICAL_API_KEY: ${{ secrets.PRECLINICAL_API_KEY }}
          AGENT_ID: ${{ secrets.PRECLINICAL_AGENT_ID }}
        run: |
          # Start test run
          RESPONSE=$(curl -s -X POST https://app.preclinical.dev/api/v1/runs \
            -H "Authorization: Bearer $PRECLINICAL_API_KEY" \
            -H "Content-Type: application/json" \
            -d "{\"agent_id\": \"$AGENT_ID\", \"test_mode\": \"demo\"}")

          RUN_ID=$(echo "$RESPONSE" | jq -r '.id')
          echo "Started test run: $RUN_ID"

          # Poll for completion
          while true; do
            STATUS_RESPONSE=$(curl -s "https://app.preclinical.dev/api/v1/runs/$RUN_ID" \
              -H "Authorization: Bearer $PRECLINICAL_API_KEY")

            STATUS=$(echo "$STATUS_RESPONSE" | jq -r '.status')
            PASS_RATE=$(echo "$STATUS_RESPONSE" | jq -r '.pass_rate // 0')

            echo "Status: $STATUS, Pass Rate: $PASS_RATE%"

            if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ] || [ "$STATUS" = "canceled" ]; then
              break
            fi

            sleep 10
          done

          # Check pass rate threshold
          THRESHOLD=80
          if [ $(echo "$PASS_RATE < $THRESHOLD" | bc -l) -eq 1 ]; then
            echo "❌ Pass rate ($PASS_RATE%) is below threshold ($THRESHOLD%)"
            exit 1
          fi

          echo "✅ Pass rate ($PASS_RATE%) meets threshold ($THRESHOLD%)"
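The `while true` loop above has no upper bound, so a stuck run keeps the job alive until the runner's own timeout. A bounded variant is safer; this is a sketch with illustrative names (`poll_until_done` and `check_status` are not part of the Preclinical API):

```shell
# Bounded polling: give up after a fixed number of attempts instead of
# looping until the CI job's own timeout kills it.
poll_until_done() {
  check_cmd=$1            # command that prints the run's current status
  max_attempts=${2:-60}   # 60 attempts x 10s interval = 10-minute ceiling
  interval=${3:-10}
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    status=$("$check_cmd")
    case $status in
      completed|failed|canceled)
        echo "$status"
        return 0
        ;;
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "timeout"
  return 1
}
```

In the workflow, `check_status` would wrap the curl-and-jq status fetch; a `timeout` result (nonzero exit) then fails the job quickly instead of hanging.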

With Reusable Script

Create scripts/run-preclinical-tests.sh:
#!/bin/bash
set -e

API_KEY=${PRECLINICAL_API_KEY:?API key required}
AGENT_ID=${1:?Agent ID required}
THRESHOLD=${2:-80}
TEST_MODE=${3:-demo}

BASE_URL="https://app.preclinical.dev/api/v1"

echo "🚀 Starting Preclinical test run..."

# Start test run
RESPONSE=$(curl -s -X POST "$BASE_URL/runs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"agent_id\": \"$AGENT_ID\", \"test_mode\": \"$TEST_MODE\"}")

RUN_ID=$(echo "$RESPONSE" | jq -r '.id')

if [ -z "$RUN_ID" ] || [ "$RUN_ID" = "null" ]; then
  echo "❌ Failed to start test run"
  echo "$RESPONSE" | jq .
  exit 1
fi

echo "📋 Test run started: $RUN_ID"

# Poll for completion
while true; do
  RESULT=$(curl -s "$BASE_URL/runs/$RUN_ID" \
    -H "Authorization: Bearer $API_KEY")

  STATUS=$(echo "$RESULT" | jq -r '.status')
  PASS_RATE=$(echo "$RESULT" | jq -r '.pass_rate // 0')
  PASSED=$(echo "$RESULT" | jq -r '.passed_count // 0')
  FAILED=$(echo "$RESULT" | jq -r '.failed_count // 0')
  TOTAL=$(echo "$RESULT" | jq -r '.total_scenarios // 0')

  echo "  Status: $STATUS | Passed: $PASSED/$TOTAL | Pass Rate: $PASS_RATE%"

  case $STATUS in
    completed|failed|canceled)
      break
      ;;
  esac

  sleep 10
done

# Final results
echo ""
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "📊 Final Results"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "  Pass Rate: $PASS_RATE%"
echo "  Passed: $PASSED"
echo "  Failed: $FAILED"
echo "  Threshold: $THRESHOLD%"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"

# Check threshold
if [ $(echo "$PASS_RATE < $THRESHOLD" | bc -l) -eq 1 ]; then
  echo "❌ FAILED: Pass rate below threshold"
  exit 1
fi

echo "✅ PASSED: All quality gates met"
Make the script executable (chmod +x scripts/run-preclinical-tests.sh) and commit it, then call it from your workflow:
name: AI Agent Testing

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run Preclinical Tests
        run: ./scripts/run-preclinical-tests.sh ${{ secrets.PRECLINICAL_AGENT_ID }} 85 demo
        env:
          PRECLINICAL_API_KEY: ${{ secrets.PRECLINICAL_API_KEY }}

Using Webhooks for Async Testing

For longer test runs, use webhooks instead of polling:
name: AI Agent Testing (Async)

on:
  push:
    branches: [main]

jobs:
  start-tests:
    runs-on: ubuntu-latest
    steps:
      - name: Start Preclinical Tests
        run: |
          curl -X POST https://app.preclinical.dev/api/v1/runs \
            -H "Authorization: Bearer ${{ secrets.PRECLINICAL_API_KEY }}" \
            -H "Content-Type: application/json" \
            -d '{
              "agent_id": "${{ secrets.PRECLINICAL_AGENT_ID }}",
              "test_mode": "full"
            }'
Then configure a webhook to notify your GitHub repository when tests complete.
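One way to wire this up, assuming you point the Preclinical webhook at a small relay (or an endpoint of your own) that forwards the completion payload to GitHub as a repository_dispatch event. The event type and client_payload fields below are illustrative, not a documented Preclinical payload:

```yaml
name: Preclinical Results

on:
  repository_dispatch:
    types: [preclinical-complete]

jobs:
  gate:
    runs-on: ubuntu-latest
    steps:
      - name: Check pass rate
        run: |
          PASS_RATE='${{ github.event.client_payload.pass_rate }}'
          echo "Reported pass rate: $PASS_RATE%"
          # Fail the job when the reported rate is below 90%
          awk -v r="$PASS_RATE" 'BEGIN { exit (r < 90) ? 1 : 0 }'
```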

GitLab CI

# .gitlab-ci.yml
stages:
  - test

preclinical-test:
  stage: test
  image: alpine:latest
  before_script:
    - apk add --no-cache curl jq bc
  script:
    - |
      # Start test run
      RESPONSE=$(curl -s -X POST https://app.preclinical.dev/api/v1/runs \
        -H "Authorization: Bearer $PRECLINICAL_API_KEY" \
        -H "Content-Type: application/json" \
        -d "{\"agent_id\": \"$AGENT_ID\", \"test_mode\": \"demo\"}")

      RUN_ID=$(echo "$RESPONSE" | jq -r '.id')
      echo "Started test run: $RUN_ID"

      # Poll for completion
      while true; do
        RESULT=$(curl -s "https://app.preclinical.dev/api/v1/runs/$RUN_ID" \
          -H "Authorization: Bearer $PRECLINICAL_API_KEY")

        STATUS=$(echo "$RESULT" | jq -r '.status')
        PASS_RATE=$(echo "$RESULT" | jq -r '.pass_rate // 0')

        echo "Status: $STATUS, Pass Rate: $PASS_RATE%"

        if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ] || [ "$STATUS" = "canceled" ]; then
          break
        fi

        sleep 10
      done

      # Check threshold
      if [ $(echo "$PASS_RATE < 80" | bc -l) -eq 1 ]; then
        echo "Pass rate below threshold"
        exit 1
      fi
  variables:
    PRECLINICAL_API_KEY: $PRECLINICAL_API_KEY
    AGENT_ID: $PRECLINICAL_AGENT_ID

CircleCI

# .circleci/config.yml
version: 2.1

jobs:
  preclinical-test:
    docker:
      - image: cimg/base:stable
    steps:
      - run:
          name: Run Preclinical Tests
          command: |
            # Start test run
            RESPONSE=$(curl -s -X POST https://app.preclinical.dev/api/v1/runs \
              -H "Authorization: Bearer $PRECLINICAL_API_KEY" \
              -H "Content-Type: application/json" \
              -d "{\"agent_id\": \"$AGENT_ID\", \"test_mode\": \"demo\"}")

            RUN_ID=$(echo "$RESPONSE" | jq -r '.id')

            # Poll and check (simplified)
            sleep 60  # Wait for demo tests

            RESULT=$(curl -s "https://app.preclinical.dev/api/v1/runs/$RUN_ID" \
              -H "Authorization: Bearer $PRECLINICAL_API_KEY")

            PASS_RATE=$(echo "$RESULT" | jq -r '.pass_rate // 0')

            if [ $(echo "$PASS_RATE < 80" | bc -l) -eq 1 ]; then
              exit 1
            fi

workflows:
  test:
    jobs:
      - preclinical-test

Best Practices

Use Demo Mode for PRs

Run demo mode (20 scenarios) on pull requests for faster feedback; use full mode on the main branch.
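In GitHub Actions this can be a single step that picks the mode from the triggering event; a sketch reusing the script from above:

```yaml
      - name: Run Preclinical Tests
        env:
          PRECLINICAL_API_KEY: ${{ secrets.PRECLINICAL_API_KEY }}
          # demo for PRs, full for pushes to main
          TEST_MODE: ${{ github.event_name == 'pull_request' && 'demo' || 'full' }}
        run: ./scripts/run-preclinical-tests.sh "${{ secrets.PRECLINICAL_AGENT_ID }}" 80 "$TEST_MODE"
```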

Store Secrets Securely

Never commit API keys. Use your CI provider’s secret management.

Set Appropriate Thresholds

Start with 80% pass rate and adjust based on your agent’s maturity level.
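Note that pass_rate can be fractional (e.g. 87.5), which is why the examples above compare with bc rather than the integer-only `[ -lt ]`. If bc isn't available in your CI image, awk works too; a small helper (the function name is illustrative):

```shell
# Decimal-safe threshold check without bc: exits 0 when rate >= threshold.
meets_threshold() {
  rate=$1
  threshold=$2
  awk -v r="$rate" -v t="$threshold" 'BEGIN { exit (r < t) ? 1 : 0 }'
}

# Example: gate the job on an 80% threshold.
if meets_threshold "87.5" "80"; then
  echo "PASS"
else
  echo "FAIL"
fi
```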

Use Webhooks for Long Runs

For full test runs (100+ scenarios), use webhooks instead of polling to avoid timeouts.

Environment Variables

Variable               Description
PRECLINICAL_API_KEY    Your API key (store as a secret)
PRECLINICAL_AGENT_ID   UUID of the agent to test

Quality Gates

Example threshold configurations:
Environment   Test Mode   Threshold   Rationale
PR checks     demo        75%         Quick validation
Staging       demo        85%         Pre-production gate
Production    full        90%         High safety standard

Troubleshooting

Tests Timing Out

Increase the timeout in your CI configuration or use webhooks for notification instead of polling.
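In GitHub Actions, the per-job ceiling is set with timeout-minutes (the default is 360); an explicit lower value also makes stuck runs fail fast:

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 30   # fail the job if tests take longer than 30 minutes
    steps:
      # ... test steps as above ...
```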

Rate Limiting

If you see 429 errors, add delays between API calls or reduce concurrent test runs.
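A generic retry-with-exponential-backoff wrapper is one way to absorb occasional 429s; this is a sketch, with illustrative names and delays:

```shell
# Retry a command with exponential backoff (1s, 2s, 4s, ...).
retry_with_backoff() {
  max_attempts=$1; shift
  delay=1
  attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

Using curl -sf (rather than -s alone) makes HTTP errors such as 429 exit nonzero, so the wrapper knows to retry, e.g. `retry_with_backoff 5 curl -sf "$BASE_URL/runs/$RUN_ID" -H "Authorization: Bearer $API_KEY"`.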

Pass Rate Fluctuations

AI agents can have variable behavior. Consider:
  • Running multiple test iterations
  • Using rolling averages
  • Setting slightly lower thresholds with manual review for borderline results
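For the rolling-average approach, one lightweight option is to append each run's pass rate to a file kept in your CI cache or as an artifact, then gate on the mean of the last few entries; rates.log and the function name here are illustrative:

```shell
# Average the last N pass rates (one number per line in the history file).
rolling_average() {
  file=$1
  window=${2:-5}
  tail -n "$window" "$file" | awk '{ sum += $1; n++ } END { if (n) printf "%.1f\n", sum / n }'
}
```

For example, `rolling_average rates.log 5` after appending the latest run's pass_rate, then compare the result against your threshold instead of the single-run value.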