Building and Testing AI Website Creation Agents: A Practical Guide

Artificial intelligence (AI) agents are demonstrating remarkable capabilities, tackling complex tasks like generating entire websites from simple prompts. While sophisticated commercial tools require significant development effort, the fundamental principles are accessible. By combining a capable Large Language Model (LLM) focused on coding (like models from Anthropic or Google) with tools allowing file system interaction (reading and writing files), you can build a basic code-generating agent.

However, the path from a basic concept to a reliable tool is paved with challenges. Real usage quickly reveals nuances: an agent might generate functional but aesthetically plain websites unless specifically prompted to settle the overall design and stylesheet early on. Models might occasionally refuse to use the provided tools, or deviate from the libraries and coding conventions you expect.

Managing these complexities often involves refining the system prompt. But as the prompt, tools, and agent workflow grow intricate, testing becomes essential. A small change intended as an improvement might inadvertently break existing functionality. Manual re-testing of every scenario is time-consuming and inefficient.

This guide focuses not just on building an AI agent capable of website creation, but crucially, on making it fully testable from the start.

Setting Up Your Project

We’ll use Python and the Pydantic AI library. First, set up your environment and install the necessary package:

# Using uv (recommended)
uv init lovable-clone-project
cd lovable-clone-project
uv add pydantic-ai

# Or using pip and venv
# python -m venv .venv
# source .venv/bin/activate # On Windows use `.venv\Scripts\activate`
# pip install pydantic-ai
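
The agent code below assumes your model provider's API key is exported as an environment variable; for the Anthropic model used in this guide, Pydantic AI reads ANTHROPIC_API_KEY. A minimal sketch (the key value is a placeholder):

# Placeholder value; use your real key from the provider's console
export ANTHROPIC_API_KEY="your-key-here"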

Building the AI Agent with Pydantic AI

Create a Python file (e.g., lovable_agent.py) and start defining the agent class.

# lovable_agent.py
import os
import shutil
import tempfile
from typing import List, Set, Optional
from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage, UserPromptPart
from pydantic_graph import End
from pydantic_ai.models.openai import OpenAIModel
from openai.types.chat import ChatCompletionMessageParam

# Define template path (assuming a 'template' dir alongside this script)
template_path_base = os.path.dirname(__file__)
template_directory = os.path.join(template_path_base, "template")

class LovableAgent:
    def __init__(self):
        # Note: Ensure you have the necessary API keys set as environment variables
        # e.g., ANTHROPIC_API_KEY
        self.agent = Agent(
            "anthropic:claude-3-5-sonnet-latest", # Or another powerful coding model
            system_prompt=f"""
        You are a coding assistant specialized in building whole new websites from scratch.

        You will be given a basic React, TypeScript, Vite, Tailwind, and Radix UI template and will work on top of that. Use the components from the "@/components/ui" folder when appropriate.

        On the first user request for building the application, prioritize defining the overall application style. Start by modifying the src/index.css and tailwind.config.ts files to set up colors, fonts, and general theme.
        Then, proceed to build the website components and structure. You can call tools in sequence as needed.

        You will be given tools to read files, create files, and update files to carry out your work.

        You CAN access local files using the provided tools. You CANNOT run the application or execute commands, so do NOT suggest doing so.

        <execution_flow>
        1. Understand the user request.
        2. If it's the first build request, update styling files (index.css, tailwind.config.ts).
        3. Call the read_file tool to understand the current state of relevant files before updating or creating new ones.
        4. Build or modify the website components/pages using update_file and create_file tools sequentially.
        5. After completing the requested steps, ask the user for feedback or next steps.
        </execution_flow>

        <files>
        After the user request, the current file structure of the project will be provided within <files/> tags in the prompt. Use this to understand the project layout.
        </files>
        """,
            model_settings={
                "parallel_tool_calls": False, # Enforce sequential tool calls for better control
                "temperature": 0.0,          # For deterministic and focused output
                "max_tokens": 8192,          # Generous token limit for code generation
            },
        )

        self.history: list[ModelMessage] = []
        self.template_path: str = "" # Will be set during processing

        # Define Tools
        @self.agent.tool_plain(docstring_format="google", require_parameter_descriptions=True)
        def read_file(path: str) -> str:
            """Reads the content of a file within the project.

            Args:
                path (str): The relative path to the file to read (e.g., 'src/App.tsx'). Required.

            Returns:
                str: The content of the file, or an error message if not found.
            """
            full_path = os.path.join(self.template_path, path)
            try:
                with open(full_path, "r") as f:
                    return f.read()
            except FileNotFoundError:
                return f"Error: File {path} not found. Double-check the file path relative to the project root."
            except Exception as e:
                return f"Error reading file {path}: {e}"

        @self.agent.tool_plain(docstring_format="google", require_parameter_descriptions=True)
        def update_file(path: str, content: str) -> str:
            """Updates the entire content of an existing file.

            Args:
                path (str): The relative path to the file to update. Required.
                content (str): The full new content to write to the file. Required.

            Returns:
                str: "ok" on success, or an error message.
            """
            full_path = os.path.join(self.template_path, path)
            try:
                # Overwriting only: the file must already exist
                if not os.path.exists(full_path):
                    return f"Error: File {path} not found for updating. Use create_file if it's a new file."
                with open(full_path, "w") as f:
                    f.write(content)
                return "ok"
            except Exception as e:
                return f"Error updating file {path}: {e}"

        @self.agent.tool_plain(docstring_format="google", require_parameter_descriptions=True)
        def create_file(path: str, content: str) -> str:
            """Creates a new file with the given content. Fails if the file already exists.

            Args:
                path (str): The relative path for the new file to create. Required.
                content (str): The initial content to write to the file. Required.

            Returns:
                str: "ok" on success, or an error message.
            """
            full_path = os.path.join(self.template_path, path)
            try:
                if os.path.exists(full_path):
                    return f"Error: File {path} already exists. Use update_file to modify it."
                # Ensure parent directory exists
                os.makedirs(os.path.dirname(full_path), exist_ok=True)
                with open(full_path, "w") as f:
                    f.write(content)
                return "ok"
            except Exception as e:
                return f"Error creating file {path}: {e}"

    # --- Helper to generate file tree ---
    def _generate_directory_tree(self, path: str, ignore_dirs: Optional[Set[str]] = None) -> str:
        # Delegates to the module-level generate_directory_tree helper defined below
        return generate_directory_tree(path, ignore_dirs)


    # --- Agent processing logic ---
    async def process_user_message(
        self, message: str, template_path: str, debug: bool = False
    ) -> tuple[str, list[ChatCompletionMessageParam]]:
        self.template_path = template_path # Set the context for file tools

        tree = self._generate_directory_tree(template_path)

        user_prompt_with_context = f"""{message}

<files>
{tree}
</files>
"""

        final_response_data = ""
        all_new_messages: list[ModelMessage] = []

        async with self.agent.iter(
            user_prompt_with_context, message_history=self.history
        ) as agent_run:
            next_node = agent_run.next_node # Start the execution graph
            nodes_visited = [next_node]
            while not isinstance(next_node, End):
                # Process tool calls and model responses step-by-step
                next_node = await agent_run.next(next_node)
                nodes_visited.append(next_node)

            if not agent_run.result:
                raise Exception("Agent did not produce a result.")

            # Capture the final text response from the agent
            final_response_data = agent_run.result.data if agent_run.result.data else ""

            # Get all messages generated during this run
            new_messages = agent_run.result.new_messages()

            # Clean up the user prompt part for history brevity
            for msg in new_messages:
                cleaned_parts = []
                for part in msg.parts:
                    if isinstance(part, UserPromptPart) and part.content == user_prompt_with_context:
                         # Replace the long context prompt with the original user message
                        cleaned_parts.append(UserPromptPart(content=message))
                    else:
                        cleaned_parts.append(part)
                msg.parts = cleaned_parts
                all_new_messages.append(msg)

        self.history.extend(all_new_messages) # Add to conversation history

        # Convert messages to a standard format (OpenAI's) for testing/logging
        new_messages_openai_format = await self.convert_to_openai_format(
            all_new_messages
        )

        return final_response_data, new_messages_openai_format

    async def convert_to_openai_format(
        self, messages: list[ModelMessage]
    ) -> list[ChatCompletionMessageParam]:
        # Helper to map Pydantic AI messages to OpenAI's format
        openai_model = OpenAIModel("any") # Dummy model just for conversion method access
        new_messages_openai_format: list[ChatCompletionMessageParam] = []
        for message in messages:
            async for openai_message in openai_model._map_message(message):
                 new_messages_openai_format.append(openai_message)
        return new_messages_openai_format

    @classmethod
    def clone_template(cls) -> str:
        # Creates a temporary copy of the template directory for safe testing
        temp_dir = os.path.join(tempfile.mkdtemp(), "lovable_clone_test")
        shutil.copytree(template_directory, temp_dir, dirs_exist_ok=True)
        # Remove node_modules if copied, to keep the clone lightweight
        shutil.rmtree(os.path.join(temp_dir, "node_modules"), ignore_errors=True)
        shutil.rmtree(os.path.join(temp_dir, ".git"), ignore_errors=True)
        return temp_dir

# --- Directory Tree Generation Helpers ---
# Module-level functions used by LovableAgent._generate_directory_tree.
def generate_directory_tree(path: str, ignore_dirs: Optional[Set[str]] = None) -> str:
    """
    Generate a visual representation of the directory structure.
    Args: path: The path to the directory. ignore_dirs: Set of dir names to ignore.
    Returns: A formatted string representing the directory tree.
    """
    if ignore_dirs is None:
        ignore_dirs = {"node_modules", ".git", ".venv", "__pycache__", "dist", "build"}
    root_path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(root_path):
        return f"Error: Path '{path}' is not a valid directory."
    tree_str = ".\n"
    items = _get_directory_contents(root_path, ignore_dirs)
    tree_str += _format_tree(items, "", root_path, ignore_dirs)
    return tree_str

def _get_directory_contents(path: str, ignore_dirs: Set[str]) -> List[str]:
    """ Get sorted directory contents (dirs first). """
    items = []
    try:
        all_items = os.listdir(path)
        dirs = sorted([d for d in all_items if os.path.isdir(os.path.join(path, d)) and d not in ignore_dirs])
        files = sorted([f for f in all_items if not os.path.isdir(os.path.join(path, f))])
        items = dirs + files
    except (PermissionError, FileNotFoundError):
        pass # Ignore directories we can't read
    return items

def _format_tree(items: List[str], prefix: str, path: str, ignore_dirs: Set[str]) -> str:
    """ Recursively format the tree structure. """
    tree_str = ""
    count = len(items)
    for i, item in enumerate(items):
        is_last = i == count - 1
        conn = "└── " if is_last else "├── "
        next_prefix = "    " if is_last else "│   "
        tree_str += f"{prefix}{conn}{item}\n"
        item_path = os.path.join(path, item)
        if os.path.isdir(item_path) and item not in ignore_dirs:
            sub_items = _get_directory_contents(item_path, ignore_dirs)
            if sub_items:
                tree_str += _format_tree(sub_items, prefix + next_prefix, item_path, ignore_dirs)
    return tree_str
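
With the agent class and helpers in place, you can try it manually before adding automated tests. Below is a minimal sketch of a hypothetical entry point (it assumes the template directory described in the next section exists and that your API key is exported); the prompt text and file name are illustrative:

# run_agent.py (hypothetical manual entry point)
import asyncio
from lovable_agent import LovableAgent

async def main():
    agent = LovableAgent()
    # Work on a throwaway copy of the template so the original stays untouched
    workspace = LovableAgent.clone_template()
    response, _ = await agent.process_user_message(
        "Build a landing page for a dog walking startup.", workspace
    )
    print(response)
    print(f"Generated files: {workspace}")

if __name__ == "__main__":
    asyncio.run(main())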

Preparing the Base Template

Our agent works by modifying a starting template. Set up a basic project using React, TypeScript, Vite, Tailwind CSS, and optionally a UI library like Radix UI or Shadcn UI.

  1. Follow the official installation guide for Vite with the React+TS template (a command-level sketch of steps 1-3 appears after this list).
  2. Integrate Tailwind CSS into the Vite project.
  3. Optionally, add Shadcn UI following its documentation (name the project folder template and place it next to your lovable_agent.py).
  4. Install dependencies within the template directory:
    cd template
    npm install
    # Test if it runs
    npm run dev
    cd ..

    This template directory will be the base that the agent modifies.
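
Steps 1-3 above can be sketched at the command level as follows (assuming current Vite, Tailwind CSS, and shadcn/ui tooling; the exact commands and interactive prompts may differ between versions):

# Scaffold a Vite + React + TypeScript project into a folder named "template"
npm create vite@latest template -- --template react-ts
cd template

# Add Tailwind CSS (classic PostCSS setup)
npm install -D tailwindcss postcss autoprefixer
npx tailwindcss init -p

# Optionally initialize shadcn/ui, which creates the "@/components/ui" folder
npx shadcn@latest init
cd ..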

The Crucial Step: Testing with Scenario

Manually testing the agent is tedious. We’ll use langwatch-scenario (Scenario) for automated testing. It uses another LLM (the “testing agent”) to simulate a user interacting with your agent, evaluating its performance against defined criteria.

Install the necessary testing libraries:

# Using uv
uv add pytest langwatch-scenario pytest-asyncio

# Or using pip
# pip install pytest langwatch-scenario pytest-asyncio

Now, create a test file (e.g., tests/test_lovable_agent.py):

# tests/test_lovable_agent.py
import pytest
from lovable_agent import LovableAgent # Import your agent class
from scenario import Scenario, TestingAgent

# Configure the testing agent (uses an LLM to evaluate your agent)
# Ensure ANTHROPIC_API_KEY (or relevant key for the model) is set
Scenario.configure(testing_agent=TestingAgent(model="anthropic/claude-3-5-sonnet-latest"))

@pytest.mark.asyncio # Mark test as asynchronous
async def test_lovable_clone_basic_website():
    # Create a fresh copy of the template for this test run
    temp_template_path = LovableAgent.clone_template()
    print(f"\n--> Test using template copy at: {temp_template_path}\n")

    # Instantiate the agent once so its conversation history persists across turns
    agent_instance = LovableAgent()

    # Define an async wrapper for your agent compatible with Scenario
    async def agent_under_test(message: str, context):
        # We pass the *temporary* path so the file tools operate on the test copy
        response_text, messages_openai_format = await agent_instance.process_user_message(message, temp_template_path)

        # Scenario expects a dictionary, often including the raw messages for evaluation
        return {"response": response_text, "messages": messages_openai_format}

    # Define the test scenario
    scenario = Scenario(
        scenario="User wants to create a simple landing page for their new dog walking startup.",
        agent=agent_under_test, # The wrapper function for your agent
        strategy="""
        1. Send an initial request to generate the landing page structure and basic content.
        2. Follow up with one request to add a specific new section (e.g., 'Services Offered').
        3. Evaluate the final state and provide your verdict based on the criteria.
        """,
        success_criteria=[
            "Agent correctly understood the request for a dog walking landing page.",
            "Agent first modified styling files (index.css or tailwind.config.ts).",
            "Agent used file tools (read, update, create) appropriately.",
            "Agent generated relevant React components (e.g., in src/App.tsx or new files).",
            "Agent successfully added the requested 'Services Offered' section in the follow-up.",
            "The final generated code seems plausible for a simple landing page.",
        ],
        failure_criteria=[
            "Agent refused to perform the task or claimed inability to use tools.",
            "Agent produced placeholder code or incomplete implementations.",
            "Agent failed to modify the core App component (e.g., src/App.tsx).",
            "Agent hallucinated file paths or tool usage.",
            "Agent failed to add the requested new section.",
        ],
        max_turns=5 # Limit the conversation length
    )

    # Run the scenario
    result = await scenario.run()

    print(f"\n--> Test finished. Check generated files at: {temp_template_path}\n")

    # Assert that the scenario passed based on the LLM evaluation
    assert result.success, f"Scenario failed: {result.final_verdict}"

Running the Test and Viewing Results

Execute the test using Pytest:

# Make sure your API keys (e.g., ANTHROPIC_API_KEY) are exported in your environment
pytest tests/test_lovable_agent.py -s # -s shows print statements

You’ll observe the testing agent interacting with your LovableAgent in the console output, simulating the conversation based on the defined strategy. The testing agent will assess whether the success criteria are met and failure criteria are avoided.

After the test completes (hopefully successfully!), navigate to the temporary directory path printed in the output. You can inspect the generated files and even run the website:

cd /tmp/path-to-your-temp-lovable-clone...
npm install
npm run dev

This allows you to visually verify the output of your agent.

Why Testable AI Agents Matter

Building AI agents that interact with complex systems like codebases requires robust testing. Using a framework like Scenario allows you to:

  1. Define Expected Behavior: Clearly state what success and failure look like for specific tasks.
  2. Automate Evaluation: Leverage an LLM to assess your agent’s performance against complex criteria, simulating real user interaction.
  3. Catch Regressions: Quickly identify when changes to your agent’s prompt, tools, or logic break previously working functionality.
  4. Iterate with Confidence: Make improvements knowing you have a safety net to verify behavior.

Feel free to experiment! Modify the system prompt in LovableAgent, change the success/failure criteria in the test, or add new scenarios (e.g., asking the agent to use specific libraries or APIs). This testable setup provides a foundation for building more reliable and sophisticated AI agents.
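
As an example of such an additional scenario, the sketch below follows the same pattern as the test above; the scenario text, criteria, and turn limit are illustrative assumptions rather than recommended values:

# tests/test_lovable_agent.py (an additional, hypothetical test)
@pytest.mark.asyncio
async def test_lovable_clone_reuses_ui_components():
    temp_template_path = LovableAgent.clone_template()
    agent_instance = LovableAgent()

    async def agent_under_test(message: str, context):
        response_text, messages = await agent_instance.process_user_message(message, temp_template_path)
        return {"response": response_text, "messages": messages}

    scenario = Scenario(
        scenario="User wants a pricing page and insists on reusing the existing '@/components/ui' components only.",
        agent=agent_under_test,
        success_criteria=[
            "Agent read relevant files before modifying them.",
            "Agent built the pricing page using components from '@/components/ui'.",
        ],
        failure_criteria=[
            "Agent added or referenced a new UI library instead of the provided components.",
            "Agent produced placeholder or incomplete code.",
        ],
        max_turns=4,
    )

    result = await scenario.run()
    assert result.success, f"Scenario failed: {result.final_verdict}"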


Leverage AI Agent Expertise with Innovative Software Technology

Developing, refining, and rigorously testing AI agents, especially for complex tasks like automated code generation and website creation, requires specialized expertise. At Innovative Software Technology, we excel in harnessing the power of Large Language Models and AI agent frameworks to build custom, reliable solutions. Whether you need to integrate AI agents into your existing workflows, develop bespoke automated systems, or ensure the robustness of your AI applications through comprehensive testing strategies like those discussed here, our team can help. We provide end-to-end services, from prompt engineering and tool development for LLMs like Claude and Gemini to implementing sophisticated testing scenarios using frameworks such as Scenario. Partner with Innovative Software Technology to build cutting-edge, dependable AI agents that drive efficiency and innovation for your business.
