Revolutionize Web Automation with Amazon Nova Act: AI-Powered Browser Control
In our daily digital lives, we constantly interact with websites – filling forms, comparing data across tabs, managing emails, or scheduling appointments. While incredibly useful, many of these web-based tasks are repetitive and time-consuming. Automating them presents challenges; manual methods are tedious, and traditional automation scripts are often brittle, breaking easily when websites update their layout or structure. Existing solutions frequently require deep technical knowledge and ongoing maintenance.
Introducing Amazon Nova Act, a groundbreaking AI model currently in research preview from Amazon Artificial General Intelligence (AGI). Available via the Amazon Nova Act SDK (currently in the US), this technology is designed to understand instructions and perform actions directly within a web browser, mimicking human interaction. Instead of relying on fragile backend integrations or complex selectors, the SDK navigates websites dynamically – clicking buttons, completing forms, and extracting information just like a person would.
How Amazon Nova Act Streamlines Web Tasks
The Amazon Nova Act SDK lets developers and users automate real-world workflows across virtually any website, whether or not it offers structured programmatic access such as an API. It combines natural language instructions, Python scripting, and Playwright automation in a single interface, which simplifies building, testing, and refining website automation sequences. It also supports running multiple workflows concurrently, so repetitive tasks can finish much faster than a person working through them one at a time.
This innovative approach opens doors to simplifying numerous use cases:
- Data Aggregation: Gathering information from multiple sources for tasks like on-call engineering support or market research.
- Process Automation: Automating multi-system processes like submitting leave requests or expense reports.
- Marketing Operations: Streamlining the creation and management of marketing campaigns across different platforms.
- Quality Assurance: Implementing robust automated testing for web application interfaces and functionality.
Built upon the strong multimodal intelligence and agentic workflow capabilities of Amazon Nova foundation models, Amazon Nova Act has received specialized training for planning and executing multi-step actions within browsers. It excels in reliability for atomic actions (like finding an item or clicking a specific button) and demonstrates leading performance on perception benchmarks such as ScreenSpot and GroundUI Web.
The SDK’s design allows for a flexible mix of natural language commands and explicit code. This makes it easier to dissect complex workflows into manageable, reliable steps, with the option to fall back on conventional browser automation techniques when precision is paramount, all managed through a unified programming interface.
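As a rough illustration of that fallback idea, the sketch below tries a natural language step first and switches to explicit browser calls if the step fails. It rests on assumptions the article does not confirm: that act() raises an exception on failure and that the client exposes the underlying Playwright page as client.page. The helper name search_with_fallback and the CSS selector are hypothetical.

```python
def search_with_fallback(client, query: str, selector: str = "input[name='q']") -> str:
    """Try a natural language step first, then fall back to explicit browser calls.

    Assumptions (not confirmed by the article): client.act() raises on failure,
    and client.page is the underlying Playwright Page. The selector is hypothetical.
    """
    try:
        client.act(f"Search for {query} and press enter.")
        return "natural-language"
    except Exception as error:  # broad on purpose: any failed step triggers the fallback
        print(f"act() failed ({error}); falling back to explicit automation")
        client.page.fill(selector, query)    # Playwright: type into the matched input
        client.page.keyboard.press("Enter")  # Playwright: submit the search
        return "playwright"
```

The return value only records which path ran, which can help when deciding whether a step is stable enough to leave to natural language.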
Getting Access to Amazon Nova Act
As Amazon Nova Act is an Amazon AGI research preview, it operates separately from standard AWS services and SDKs. Access requires a unique API key obtained through nova.amazon.com. Users can sign in with their Amazon account to explore Amazon Nova foundation models and request access to the “Act” capability within the Labs section. Access may involve joining a waitlist; confirmation will be sent via email, after which the API key can be generated from the Labs section.
Using the Amazon Nova Act SDK: A Practical Example
Let’s illustrate how Amazon Nova Act functions with a common scenario: finding rental properties and determining their commute times. Manually, this involves searching a site like Zumper for suitable apartments and then individually checking biking distances to a train station on Google Maps for each promising listing.
This process can be automated using the following Python script:
from concurrent.futures import ThreadPoolExecutor, as_completed

import fire
import pandas as pd
from pydantic import BaseModel

from nova_act import NovaAct


class Apartment(BaseModel):
    address: str
    price: str
    beds: str
    baths: str


class ApartmentList(BaseModel):
    apartments: list[Apartment]


class CaltrainBiking(BaseModel):
    biking_time_hours: int
    biking_time_minutes: int
    biking_distance_miles: float


def add_biking_distance(apartment: Apartment, caltrain_city: str, headless: bool) -> CaltrainBiking | None:
    # Initialize NovaAct client for Google Maps
    with NovaAct(
        starting_page="https://maps.google.com/",
        headless=headless,
    ) as client:
        # Instruct the AI to find biking directions
        client.act(
            f"Search for {caltrain_city} Caltrain station and press enter. "
            "Click Directions. "
            f"Enter '{apartment.address}' into the starting point field and press enter. "
            "Click the bicycle icon for cycling directions."
        )
        # Extract the time and distance data using a defined schema
        result = client.act(
            "Return the shortest time and distance for biking", schema=CaltrainBiking.model_json_schema()
        )
        # Validate and parse the result
        if not result.matches_schema:
            print(f"Invalid JSON {result=}")
            return None
        time_distance = CaltrainBiking.model_validate(result.parsed_response)
        return time_distance


def main(
    caltrain_city: str = "Redwood City",
    bedrooms: int = 2,
    baths: int = 1,
    headless: bool = False,
    min_apartments_to_find: int = 5,
):
    all_apartments: list[Apartment] = []

    # Initialize NovaAct client for Zumper
    with NovaAct(
        starting_page="https://zumper.com/",
        headless=headless,
    ) as client:
        # Instruct the AI to search and filter apartments
        client.act(
            "Close any cookie banners. "
            f"Search for apartments near {caltrain_city}, CA, "
            f"then filter for {bedrooms} bedrooms and {baths} bathrooms. "
            "If you see a dialog about saving a search, close it. "
            "If results mode is 'Split', switch to 'List'. "
        )

        # Scroll and extract apartment data until enough are found
        for _ in range(5):  # Scroll down a max of 5 times.
            result = client.act(
                "Return the currently visible list of apartments", schema=ApartmentList.model_json_schema()
            )
            if not result.matches_schema:
                print(f"Invalid JSON {result=}")
                break
            apartment_list = ApartmentList.model_validate(result.parsed_response)
            all_apartments.extend(apartment_list.apartments)
            if len(all_apartments) >= min_apartments_to_find:
                break
            client.act("Scroll down once")

        print(f"Found apartments: {all_apartments}")

    apartments_with_biking = []

    # Use a thread pool to process commute times in parallel
    with ThreadPoolExecutor() as executor:
        future_to_apartment = {
            executor.submit(add_biking_distance, apartment, caltrain_city, headless): apartment
            for apartment in all_apartments
        }
        # Collect results as they complete
        for future in as_completed(future_to_apartment.keys()):
            apartment = future_to_apartment[future]
            caltrain_biking = future.result()
            if caltrain_biking is not None:
                apartments_with_biking.append(apartment.model_dump() | caltrain_biking.model_dump())
            else:
                apartments_with_biking.append(apartment.model_dump())

    # Process and display the combined data
    apartments_df = pd.DataFrame(apartments_with_biking)
    closest_apartment_data = apartments_df.sort_values(
        by=["biking_time_hours", "biking_time_minutes", "biking_distance_miles"]
    )

    print()
    print("Biking time and distance:")
    print(closest_apartment_data.to_string())


if __name__ == "__main__":
    fire.Fire(main)
When initializing the NovaAct client, parameters like the starting URL (starting_page) and whether to run the browser visibly or headlessly (headless) are provided. The core interaction happens via the act() method, which accepts natural language instructions. These instructions can be simple commands or incorporate variables from the script:
client.act("Close any cookie banners.")
or
client.act(f"Search for apartments near {caltrain_city}, CA")
To run this script, first install the necessary libraries:
pip install nova-act pandas pydantic fire
Set the API key as an environment variable:
export NOVA_ACT_API_KEY=<YOUR_API_KEY>
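Because the key is read from the environment, a forgotten export is an easy mistake to make. The following is a minimal sketch of a pre-flight check that fails early with a clear message. The helper name require_api_key is hypothetical, and the article does not describe how the SDK itself reacts to a missing key.

```python
import os


def require_api_key(env=None, name="NOVA_ACT_API_KEY"):
    """Return the API key from the environment, or raise a clear error if it is unset."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the script.")
    return key
```

Calling require_api_key() at the top of main() would stop the script before any browser window opens.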
Executing the script initiates the automated process. Nova Act interacts with Zumper to find apartments matching the criteria, then launches parallel browser instances to query Google Maps for biking commute times for each apartment address. Finally, it compiles and presents the combined data, sorted by commute time – achieving in minutes what would take significantly longer manually.
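One detail of the final sort is worth handling explicitly. In the script, an apartment whose Google Maps lookup returns None is stored without any biking fields, and if every lookup fails the sort columns never exist, so sort_values would raise a KeyError. The sketch below shows one way to order the combined rows in plain Python, with untimed apartments placed last. The helper name sort_by_commute is hypothetical; the field names are the ones defined in CaltrainBiking.

```python
def sort_by_commute(rows):
    """Sort apartment dicts by total biking minutes; rows without a time go last."""

    def key(row):
        hours = row.get("biking_time_hours")
        minutes = row.get("biking_time_minutes")
        if hours is None or minutes is None:
            return (1, 0, 0.0)  # no commute data: sort after every timed row
        distance = row.get("biking_distance_miles") or 0.0
        # Compare by total minutes first, then by distance as a tie-breaker.
        return (0, hours * 60 + minutes, distance)

    return sorted(rows, key=key)
```

The sorted list can still be passed to pd.DataFrame for display, which avoids depending on columns that may be missing.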
This example highlights several key capabilities:
- Natural Language Commands: The act() method translates plain English instructions into browser actions.
- Structured Data Extraction: Nova Act extracts specific data points and returns them in a structured format (JSON), easily parsed using libraries like Pydantic.
- Parallelization: Running multiple NovaAct clients concurrently (using ThreadPoolExecutor) drastically speeds up data collection from various pages or sites.
- Hybrid Programming: The script seamlessly blends natural language AI interaction with standard Python logic for control flow and data processing.
Key Considerations
- Availability: Amazon Nova Act is currently a research preview from Amazon AGI, available in the US. It is expected to be integrated into Amazon Bedrock in the future.
- Cost: During the research preview phase, using the Amazon Nova Act SDK is free.
- Compatibility: The SDK supports macOS and Ubuntu operating systems with Python 3.10 or later.
- Best Practices: For optimal reliability, break down complex tasks into smaller steps, each corresponding to roughly 3-5 browser actions per act() call. Implement robust error handling within the Python script.
- Authentication: For tasks requiring logins, the SDK can be configured to use a local Chrome browser installation, leveraging existing logged-in sessions.
- Experimental Nature: As an early release, the SDK is experimental and may occasionally make mistakes.
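The best-practice advice about small steps and error handling can be sketched as a helper that sends each short instruction as its own act() call and reports which step failed. This assumes act() signals failure by raising an exception, which the article does not state; the helper name run_steps and the shape of the returned dictionary are hypothetical.

```python
def run_steps(client, steps):
    """Send each short instruction as its own act() call and stop at the first failure."""
    for index, step in enumerate(steps, start=1):
        try:
            client.act(step)
        except Exception as error:
            # Report which step failed so it can be retried or rewritten in isolation.
            return {"ok": False, "failed_step": index, "prompt": step, "error": str(error)}
    return {"ok": True, "completed": len(steps)}
```

For example, the Zumper search could be passed as a list such as ["Close any cookie banners.", "Search for apartments near Redwood City, CA.", "Filter for 2 bedrooms and 1 bathroom."], so a failure points at one instruction rather than at a long combined prompt.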
Ready to explore the future of web automation? Getting started with the Amazon Nova Act SDK provides access to powerful generative AI capabilities for automating web interactions reliably and efficiently, without needing intricate knowledge of specific website structures or APIs. Visit nova.amazon.com to request access.