Itinerary Modification Task

Updated 22 January 2026

The itinerary modification task is the automated or semi-automated revision of travel plans to address shifting user preferences and external disruptions.
The methodology involves structured editing operations, dynamic optimization, and LLM-guided pipelines to efficiently update itineraries under various constraints.
Applications span leisure travel and transportation systems, evaluated using metrics such as modification accuracy, responsiveness, and adaptability.

The itinerary modification task encompasses the automated and/or semi-automated revision of an existing travel plan in response to user preference changes or external disruptions, while preserving feasibility, user satisfaction, and system-level constraints. Contemporary research identifies this as a central challenge in real-world travel-assistance systems and benchmarks, given the frequency and diversity of required modifications. Cutting across domains from leisure travel to transportation operations, itinerary modification blends elements of structured editing, dynamic optimization, LLM orchestration, and objective evaluation. The following sections systematize definitions, modeling frameworks, algorithms, datasets, and empirical findings within this area.

1. Formal Problem Definitions and Task Variants

Let $\mathcal{P}$ denote the universe of points-of-interest (POIs). A base itinerary is an ordered list $i = [p_1, p_2, ..., p_k]$, where each $p_j \in \mathcal{P}$ carries attributes including category $c$, spatial coordinates (lat, lon), and popularity. Given $i$ and an external intent (a preference change or a disruption), the central itinerary modification task is to produce a revised itinerary $i'$ that achieves objectives such as preference alignment or disruption resolution.

Three atomic operations define the edit space (Huang et al., 15 Jan 2026):

$o_{\mathrm{add}}$ : Insert POI $q \in \mathcal{P} \setminus i$ at some position.
$o_{\mathrm{replace}}$ : Swap $p_j \in i$ with $q \in \mathcal{P} \setminus i$.
$o_{\mathrm{delete}}$ : Remove $p_j$ from $i$.
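The three atomic operations (add, replace, delete) can be sketched over a plain list of POI identifiers. This is a minimal illustration, not the cited authors' implementation: the function names `op_add`, `op_replace`, and `op_delete` and the string-based POI representation are assumptions made for the example.

```python
from typing import List

Itinerary = List[str]  # POI identifiers in visiting order (assumed representation)


def op_add(itinerary: Itinerary, poi: str, position: int) -> Itinerary:
    """Insert a POI that is not yet in the itinerary at `position`."""
    if poi in itinerary:
        raise ValueError(f"{poi!r} is already in the itinerary")
    if not 0 <= position <= len(itinerary):
        raise IndexError("position out of range")
    return itinerary[:position] + [poi] + itinerary[position:]


def op_replace(itinerary: Itinerary, old: str, new: str) -> Itinerary:
    """Swap an existing POI for one drawn from outside the itinerary."""
    if new in itinerary:
        raise ValueError(f"{new!r} is already in the itinerary")
    idx = itinerary.index(old)  # raises ValueError if `old` is absent
    return itinerary[:idx] + [new] + itinerary[idx + 1:]


def op_delete(itinerary: Itinerary, poi: str) -> Itinerary:
    """Remove an existing POI from the itinerary."""
    idx = itinerary.index(poi)  # raises ValueError if `poi` is absent
    return itinerary[:idx] + itinerary[idx + 1:]
```

Each function returns a new list rather than mutating its input, so the original itinerary stays available for before/after comparison.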

Modification “intents” are categorized into:

Popularity: Disrupt the popularity distribution,
Spatial distance: Disrupt the spatial distance distribution,
Category diversity: Disrupt category diversity (Huang et al., 15 Jan 2026).

In the disruption-aware context, given an original itinerary $i$, a disruption event (type, severity, timestamp, details), and a user profile (including tolerance), the objective is to output a revised itinerary $i'$ that resolves the disruption, aligns with the user profile, and maximizes a weighted utility function over intent preservation, responsiveness, and adaptability (see Section 3) (Karmakar et al., 24 Oct 2025).

In transportation, the modification task generalizes to rescheduling and re-circulating trips and resources post-disruption. Here, itinerary refers to system-level vehicle assignments and routing, recast as an event-activity network under integer programming models (Fekete et al., 2011).

2. Architectures and Algorithmic Paradigms

LLM-Oriented Editing Pipelines

Recent systems such as Roamify and Vaiage instantiate itinerary modification as LLM-guided pipeline architectures. These typically integrate:

Upstream knowledge extraction (web-scraping, NLP, summarization),
Delta editing (“before-after” prompting),
Structured representations (JSON schemas for daily schedules, attraction registries),
Iterative feedback via both textual and map-based interactions (Udandarao et al., 10 Mar 2025, Liu et al., 16 May 2025).

Modification requests (preference/tolerance/configuration changes or external events) are normalized to structured signals (“add”, “remove”, “swap” POIs or update attributes), which are processed through multi-agent or modular LLM-driven optimization and postprocessing (Liu et al., 16 May 2025).
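As an illustration of normalizing a request into a structured signal, the sketch below parses a JSON edit signal and applies it to a day-indexed schedule. The field names (`op`, `poi`, `new_poi`, `day`), the `EditSignal` class, and both helper functions are hypothetical and do not reflect the schemas used by the cited systems.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional

ALLOWED_OPS = {"add", "remove", "swap"}
Schedule = Dict[int, List[str]]  # day number -> ordered POI names (assumed layout)


@dataclass(frozen=True)
class EditSignal:
    op: str
    poi: str
    new_poi: Optional[str] = None  # required for "swap"
    day: Optional[int] = None      # required for "add"


def parse_signal(raw: str) -> EditSignal:
    """Validate a JSON-encoded edit signal before it reaches the planner."""
    data = json.loads(raw)
    op = data.get("op")
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operation: {op!r}")
    if op == "swap" and not data.get("new_poi"):
        raise ValueError("swap requires 'new_poi'")
    if op == "add" and data.get("day") is None:
        raise ValueError("add requires 'day'")
    return EditSignal(op, data["poi"], data.get("new_poi"), data.get("day"))


def apply_signal(schedule: Schedule, signal: EditSignal) -> Schedule:
    """Return a copy of the schedule with the signal applied."""
    updated = {day: list(pois) for day, pois in schedule.items()}
    if signal.op == "add":
        updated.setdefault(signal.day, []).append(signal.poi)
        return updated
    for pois in updated.values():
        if signal.poi in pois:
            idx = pois.index(signal.poi)
            if signal.op == "remove":
                del pois[idx]
            else:  # swap
                pois[idx] = signal.new_poi
            return updated
    raise KeyError(f"{signal.poi!r} not found in schedule")
```

Validating the signal before applying it keeps malformed model output from silently corrupting the schedule.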

For transportation networks, an event–activity integer program is built to maximize recovered trips while reassigning resources (vehicle circulations) and enforcing feasibility under disrupted conditions (Fekete et al., 2011).

Hybrid Algorithmic Schemes

Hybrid frameworks combine evolutionary search with LLM creativity and domain knowledge (GA-LLM). Travel plans are encoded as genotype structures (JSON trees), genetic operators (crossover/mutation) are LLM-guided, and fitness functions integrate soft utility and hard constraint penalties (Shum et al., 9 Jun 2025). This enables exploration of the solution space beyond greedy or single-pass prompting.
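The combination of soft utility and hard-constraint penalties can be illustrated with a small penalty-based fitness function and a mutation-only evolutionary loop. This is a simplified sketch under stated assumptions: the random `mutate` function stands in for the LLM-guided operators, crossover is omitted, plans are flat lists rather than JSON trees, and the penalty weight and constraint choices (budget, no duplicate POIs) are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional

Plan = List[str]


def fitness(plan: Plan, utility: Dict[str, float], cost: Dict[str, float],
            budget: float, penalty: float = 10.0) -> float:
    """Soft utility minus a penalty for each unit of hard-constraint violation."""
    soft = sum(utility[p] for p in plan)
    overspend = max(0.0, sum(cost[p] for p in plan) - budget)
    duplicates = len(plan) - len(set(plan))
    # Both violation terms share one weight here purely for brevity.
    return soft - penalty * (overspend + duplicates)


def mutate(plan: Plan, pool: List[str], rng: random.Random) -> Plan:
    """Random stand-in for an LLM-guided mutation: overwrite one slot."""
    child = list(plan)
    child[rng.randrange(len(child))] = rng.choice(pool)
    return child


def evolve(seed_plan: Plan, pool: List[str], score: Callable[[Plan], float],
           generations: int = 30, population: int = 8,
           rng: Optional[random.Random] = None) -> Plan:
    """Keep the best plan seen so far (elitism) across generations."""
    rng = rng or random.Random(0)
    best = list(seed_plan)
    for _ in range(generations):
        candidates = [mutate(best, pool, rng) for _ in range(population)]
        best = max(candidates + [best], key=score)
    return best
```

Because the current best plan always competes with its offspring, the returned plan never scores lower than the seed plan.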

Corrective Postprocessing (Guardrails)

LLM-generated itineraries often lack robust spatiotemporal consistency. Guardrail frameworks such as Iti-Validator systematically detect and correct violations using rule-based temporal checks (no-overlap, min/max transit, min stay), external flight/time APIs, and deterministic adjustment (Gadbail et al., 4 Sep 2025).

3. Data Generation, Benchmarks, and Evaluation

Dataset Synthesis and Annotated Corpora

The iTIMO dataset operationalizes modification as an intent-driven perturbation: given an itinerary $i$ and an intent, a perturbed itinerary $i'$ is synthesized by atomic edit operations, with stringent hybrid metrics enforcing attribute-level distribution shifts (Huang et al., 15 Jan 2026). This process relies on LLM-based composition, supplemented by function-calling (numerical APIs for distances/diversity) and memory modules (for position/POI diversity).

The TripTide benchmark systematically evaluates LLMs under realistic disruptions, stratifies test cases by disruption category and severity, and incorporates user profile-based tolerance (Karmakar et al., 24 Oct 2025).

Evaluation Metrics

General Modification

Modification Accuracy (Mod): Fraction of outputs matching correct operation and POI(s).
All-Pass Rate (APR): Ensures only specified attribute distributions change.
Soft metrics: Coverage, diversity, personalization uplift, satisfaction score (Udandarao et al., 10 Mar 2025).
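Modification accuracy can be sketched as an exact-match rate over predicted edits. The interpretation of "matching" as an identical operation label plus an identical POI set is an assumption for this example, as are the `Edit` type and the function name.

```python
from typing import FrozenSet, NamedTuple, Sequence


class Edit(NamedTuple):
    op: str               # e.g. "ADD", "REPLACE", "DELETE"
    pois: FrozenSet[str]  # POIs touched by the edit


def modification_accuracy(predicted: Sequence[Edit], gold: Sequence[Edit]) -> float:
    """Fraction of cases where both the operation and the POI set match."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for p, g in zip(predicted, gold) if p == g)
    return hits / len(gold)
```

A prediction with the right operation but the wrong POI counts as a miss under this strict reading.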

Disruption-Aware

Preservation of Intent: Jaccard index over static constraints; sequential consistency.
Responsiveness: Proportion of cases with successful, disruption-resolving edits.
Adaptability: Semantic (BERT embedding drift), spatial (distance-distortion), sequential (edit distance) drift metrics (Karmakar et al., 24 Oct 2025).
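Two of the disruption-aware measures have simple set and sequence forms that can be sketched directly. The functions below assume itineraries are lists of POI names and constraints are sets of strings; the function names are hypothetical, and the semantic and spatial drift metrics are not covered.

```python
from typing import Hashable, Iterable, Sequence


def jaccard(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance between two POI sequences (sequential drift)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or keep
        prev = curr
    return prev[-1]


def responsiveness(resolved: Sequence[bool]) -> float:
    """Proportion of test cases whose edit resolved the disruption."""
    return sum(resolved) / len(resolved) if resolved else 0.0
```

A lower edit distance between the original and revised sequences indicates a less invasive modification.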

Transportation

Trip recovery: Fraction of trips executed (even delayed) post-modification.
Resource assignment: Consistency of vehicle circulations under constraints.

Empirical results highlight:

DELETE is typically the operation models solve best; ADD and REPLACE are challenging for vanilla LLMs (Huang et al., 15 Jan 2026).
RAG (retrieval-augmented generation) and SFT (supervised fine-tuning) yield moderate to substantial performance boosts (Huang et al., 15 Jan 2026).
In disruption scenarios: Responsiveness and intent preservation degrade with increasing trip length and disruption severity (Karmakar et al., 24 Oct 2025).

4. Optimization and Constraint Handling

Structured Preference Encodings

User and system preferences are encoded as normalized weight vectors (e.g., genre sliders), which drive scoring and replacement logic. Time and budget constraints are incorporated as hard limits within the modification process (Udandarao et al., 10 Mar 2025, Liu et al., 16 May 2025).
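A minimal sketch of preference-weighted scoring with a hard budget limit follows. It assumes slider values are normalized to sum to 1 and that a POI's score is the dot product of its genre affinities with those weights; the `Poi` class, the function names, and the scoring rule are illustrative assumptions rather than the cited systems' logic.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Poi:
    name: str
    genres: Dict[str, float]  # genre -> affinity in [0, 1]
    cost: float


def normalize(sliders: Dict[str, float]) -> Dict[str, float]:
    """Scale raw slider values so the weights sum to 1."""
    total = sum(sliders.values())
    if total <= 0:
        raise ValueError("at least one slider must be positive")
    return {genre: value / total for genre, value in sliders.items()}


def score(poi: Poi, weights: Dict[str, float]) -> float:
    """Dot product of POI genre affinities and preference weights."""
    return sum(weights.get(genre, 0.0) * aff for genre, aff in poi.genres.items())


def best_replacement(candidates: Iterable[Poi], weights: Dict[str, float],
                     remaining_budget: float) -> Optional[Poi]:
    """Highest-scoring candidate that respects the budget as a hard limit."""
    affordable = [p for p in candidates if p.cost <= remaining_budget]
    return max(affordable, key=lambda p: score(p, weights), default=None)
```

Filtering on cost before ranking treats the budget as a hard limit, so an expensive POI is never selected however well it matches the preferences.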

Route and Assignment Optimization

Given updated context or feedback, graph-based representations such as TravelGraph capture nodes (POIs, constraints) and edges (temporal/causal relations, costs). Multi-agent planners then solve assignment and routing problems via constrained optimization, subject to day- and slot-level assignment constraints, activity durations, and conflict edges (Liu et al., 16 May 2025).

Event–activity integer programs generalize to rescheduling under resource and operational conflicts (Fekete et al., 2011).

Temporal and Spatial Consistency Enforcement

Validator modules enforce constraints such as:

No schedule overlap: Ensure each segment ends before the next one begins,
Realistic minimum and maximum transit via external API,
Minimum city stay durations, with rule-based corrections applied as needed to restore feasibility (Gadbail et al., 4 Sep 2025).
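The checks above can be illustrated with a deterministic forward-shifting repair pass. This is a sketch under assumptions, not Iti-Validator's algorithm: the `Stay` type and `repair` function are hypothetical, the `transit` callable stands in for an external flight-time API, and only the no-overlap, minimum-transit, and minimum-stay rules are handled (maximum transit is not).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass(frozen=True)
class Stay:
    city: str
    arrive: datetime
    depart: datetime


def repair(stays: List[Stay], min_stay: timedelta,
           transit: Callable[[str, str], timedelta]) -> List[Stay]:
    """Shift stays forward until overlap, transit, and min-stay rules hold."""
    fixed: List[Stay] = []
    for stay in stays:
        arrive = stay.arrive
        if fixed:
            # Earliest feasible arrival: previous departure plus travel time.
            earliest = fixed[-1].depart + transit(fixed[-1].city, stay.city)
            arrive = max(arrive, earliest)
        # Extend the departure if the stay would otherwise be too short.
        depart = max(stay.depart, arrive + min_stay)
        fixed.append(Stay(stay.city, arrive, depart))
    return fixed
```

The pass only ever moves times later, so a single left-to-right sweep is enough to restore these three rules; it makes no attempt to keep the trip within its original end date.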

5. LLM Prompting, Editing Strategies, and Interactive Feedback

Modification logic employs both zero-shot and few-shot prompting. Templates encode before–after pairs, modifications in JSON, and explicit constraint summaries. LLM calls execute not only initial plan generation but also delta editing (modification) by integrating explicit user input, preference updates, and sample exemplars (Udandarao et al., 10 Mar 2025, Karmakar et al., 24 Oct 2025, Huang et al., 15 Jan 2026). Prompts are grounded with attraction schemas, summaries, and current schedules; outputs are validated and, if necessary, post-processed or repaired.
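A before-after prompt for delta editing might be assembled as in the sketch below. The template wording, the `build_edit_prompt` name, and the schedule layout are all hypothetical; the cited systems' actual prompts are not reproduced here. Passing no exemplars gives a zero-shot prompt, while passing exemplars gives a few-shot one.

```python
import json
from typing import Dict, List, Sequence, Tuple

Schedule = Dict[str, List[str]]            # e.g. {"day1": ["Louvre", "Orsay"]}
Exemplar = Tuple[Schedule, str, Schedule]  # (before, request, after)


def build_edit_prompt(schedule: Schedule, request: str,
                      constraints: Sequence[str],
                      exemplars: Sequence[Exemplar] = ()) -> str:
    """Assemble a before-after editing prompt with explicit constraints."""
    parts = [
        "You revise travel itineraries. Return only the updated schedule as JSON.",
        "Constraints:",
    ]
    parts += [f"- {c}" for c in constraints]
    for before, ex_request, after in exemplars:
        parts += [
            "Example",
            "Before: " + json.dumps(before, sort_keys=True),
            "Request: " + ex_request,
            "After: " + json.dumps(after, sort_keys=True),
        ]
    parts += [
        "Before: " + json.dumps(schedule, sort_keys=True),
        "Request: " + request,
        "After:",
    ]
    return "\n".join(parts)
```

Serializing schedules with `sort_keys=True` keeps the prompt stable across runs, which makes model outputs easier to compare and validate.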

Interactive systems support modification via natural language and direct manipulation (e.g., map-based slot adjustment). Agent-based orchestration ensures that user actions propagate through TravelGraph, triggering real-time re-optimization, re-ranking, and re-scheduling with all constraints and dependencies updated (Liu et al., 16 May 2025).

6. Empirical Findings and Systematic Limitations

Recent studies converge on several empirical patterns:

LLMs (e.g., GPT-4o, Qwen2.5-7B-Instruct) excel in semantic and sequential consistency, but spatial and hard-constraint preservation degrade with longer, more complex plans (Karmakar et al., 24 Oct 2025, Huang et al., 15 Jan 2026).
All-pass rate and modification accuracy on simple DELETE tasks can reach 75–85%; for ADD/REPLACE, zero-shot LLMs perform significantly worse (<40%), especially without tool or memory augmentation (Huang et al., 15 Jan 2026).
RAG and SFT (especially Full Fine-Tuning for limited data regimes, LoRA for larger datasets) produce notable but not always additive improvements; prompt-format alignment is critical (Huang et al., 15 Jan 2026).
Deterministic validation/correction (e.g., Iti-Validator) achieves 100% feasibility post-processing, but does not correct for semantic misalignment or user profile inconsistency (Gadbail et al., 4 Sep 2025).
User studies emphasize the need for real-time, low-latency, explainable, and easily modifiable plans, highlighting reduced effort and increased personalization (Udandarao et al., 10 Mar 2025).

Identified limitations include LLMs' difficulties with:

Precise anchor-point selection for insertions/replacements,
Multi-attribute constraint reasoning without external calculator/function-call modules,
Inductive bias towards perturbing early/late slots (position bias),
Hallucination or drift in multi-hop revision scenarios.

7. Advanced Topics and Future Directions

Emerging directions include:

Data-driven user simulators for user-centric perturbations and modification (Huang et al., 15 Jan 2026).
Integration of fine-grained temporal constraints (visit durations, opening hours) and real-world map/transit layers in both plan synthesis and modification (Karmakar et al., 24 Oct 2025, Liu et al., 16 May 2025).
Multi-stage RL or feedback-enhanced LLMs to improve adherence on complex, dynamic editing tasks.
Expansion of datasets (iTIMO, TripTide) to new geographies, languages, and travel modalities.
Modular compositions of LLMs and symbolic agents for explainability and robust constraint satisfaction at scale.

The itinerary modification task has achieved systematic formalization, robust benchmarking, and clear identification of critical limitations, but ensuring adaptability, constraint satisfaction, and high-level semantic stability under open-world perturbations remains an open technical frontier (Huang et al., 15 Jan 2026, Karmakar et al., 24 Oct 2025, Liu et al., 16 May 2025, Udandarao et al., 10 Mar 2025).