Changelog
Follow up on the latest improvements and updates.
Date: May 1, 2026
Incident Window: 07:30 - 10:25 AEST
Systems Affected: Visualcare web application (app.visualcare.com.au)
Summary
At approximately 7:30 AEST, some providers experienced intermittent issues connecting to the Visualcare web application. Worker and Participant access was not affected through Visualcare mobile applications, as those users connect through a separate domain.
Investigation identified that affected administrators’ browsers were holding outdated connection information that pointed to a web server which was no longer available. We adjusted the underlying configuration to prevent recurrence and worked with our customer teams to guide impacted administrators through a quick browser refresh to restore access.
Impact
Administrators across some providers were intermittently unable to access the Visualcare web application (app.visualcare.com.au) during the incident window. Worker and Participant access was unaffected, and no data was lost or compromised. Once administrators completed a browser refresh, access was fully restored.
Timeline
- 07:30: First reports received by the Engineering Team
- 07:30 - 09:30: Triage and investigation
- 09:30: Resolution guidance was shared with customer teams
- 09:30 - 10:25: Customer teams supported administrators through the resolution steps as tickets came in
- 10:25: Email notification sent out to all customers with resolution guidance
Root Cause
The Visualcare web application uses load balancers to distribute administrator traffic across multiple web servers. To keep each administrator’s session consistent, browsers temporarily remember which web server they’re connected to.
Overnight, one of those web servers became unavailable. Because the browser-side connection information was set to refresh once per day rather than more frequently, some administrators’ browsers continued attempting to reach the unavailable server instead of being routed to a healthy one. This is what caused the intermittent connection failures.
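The mechanism above is essentially a client-side cache with a long time-to-live (TTL). The sketch below is illustrative only (hostnames, addresses, and the class are hypothetical); it simulates why a once-per-day refresh interval keeps clients pointed at a server that has since been removed:

```python
import time

class DnsCache:
    """Toy resolver cache: records are reused until their TTL expires."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.cache = {}  # hostname -> (address, cached_at)

    def resolve(self, hostname, lookup):
        entry = self.cache.get(hostname)
        if entry and time.monotonic() - entry[1] < self.ttl:
            return entry[0]  # still "fresh": reused even if the server died
        address = lookup(hostname)
        self.cache[hostname] = (address, time.monotonic())
        return address

# Authoritative answer changes after the unhealthy server is removed.
answers = {"app.example.com": "10.0.0.1"}   # hypothetical addresses
cache = DnsCache(ttl_seconds=86400)         # one-day refresh, as in the incident

first = cache.resolve("app.example.com", answers.get)
answers["app.example.com"] = "10.0.0.2"     # traffic moved to a healthy server
stale = cache.resolve("app.example.com", answers.get)

print(first, stale)  # the cached client never sees the new server
```

Shortening the TTL (or clearing the cache, which is what the browser refresh did) is what lets clients pick up the healthy server quickly.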
We are taking action to address this with improved infrastructure (see Follow-Up Actions).
Resolution
To restore access, affected administrators needed to refresh their browser’s connection information. Depending on the browser, this was achieved by a hard refresh, clearing the browser cache, or opening the application in a private/incognito window.
To prevent recurrence with the existing infrastructure, we shortened the refresh interval on the underlying configuration so that any future change to a web server is picked up by browsers within minutes rather than up to a day.
Follow-Up Actions
Short Term
Audit related configuration to confirm appropriate refresh intervals are in place across the platform.
Medium Term
Containerise the Visualcare web application and API into a new production environment (already scheduled for next week). This work removes the dependency on the load balancing pattern that caused this incident, meaning this class of issue cannot occur in the new environment.
Long Term
Continue investing in platform monitoring and resilience to detect and prevent issues of this nature earlier.
Date: 29 April 2026
Incident Window: 10:20 - 15:35 AEST
Systems Affected: Database cluster, downstream application services
Summary
At approximately 10:20 AEST, the system experienced a rapid spike in database connections, leading to resource exhaustion and degraded application performance. A failover at 10:40 temporarily alleviated the issue.
A second, more severe spike occurred at 14:00, again resulting in database exhaustion. Investigation identified a recently released query related to Commonwealth Unspent Funds as the root cause. The query was executing at high volume due to backlog processing and contained inefficient subqueries.
A hotfix was deployed at 15:35, resolving the issue.

Impact
- Intermittent application degradation and timeouts
- Elevated database connection usage leading to exhaustion
- Reduced system responsiveness during spike windows
- Potential delays in provider statement generation
Timeline
- 10:20: Initial spike in database connections observed
- 10:40: Database failover performed; connection levels stabilised
- 10:40–14:00: Investigation into root cause underway
- 14:00: Second spike; database connections exhausted again
- ~14:10: Problem query identified
- ~14:30–15:30: Query analysis and remediation work
- 15:35: Hotfix deployed; connection usage returns to normal
Root Cause
A query introduced last week for Support at Home contained two unbounded subqueries. This resulted in:
- High memory and CPU overhead per execution
- Limited execution plan selection under load
- Amplified cost when executed concurrently
The issue was triggered by a backlog of statements queries which resulted in an increase in database connections and subsequent exhaustion.
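To illustrate the failure mode (this is a simulation, not Visualcare's actual code), the sketch below models a fixed-size connection pool receiving a backlog of slow statements at once; the pool size, timings, and function names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

POOL_SIZE = 5                       # hypothetical max database connections
pool = threading.Semaphore(POOL_SIZE)
rejected = []

def run_query(query_id, duration):
    """Acquire a connection; fail fast if the pool is exhausted."""
    if not pool.acquire(timeout=0.05):
        rejected.append(query_id)   # surfaces to users as timeouts/errors
        return
    try:
        time.sleep(duration)        # stand-in for an expensive subquery
    finally:
        pool.release()

# A backlog of 20 slow statements arrives at once.
with ThreadPoolExecutor(max_workers=20) as ex:
    for i in range(20):
        ex.submit(run_query, i, 0.5)

print(f"{len(rejected)} of 20 queries could not get a connection")
```

The point is that exhaustion is driven by the product of per-query cost and concurrency: reducing either (the hotfix reduced per-query cost) restores headroom.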
Resolution
- Identified and analysed the problematic query
- Implemented a hotfix to optimise/remove unbounded subqueries
- Reduced per-query load and execution cost
- Deployment at 15:35 resolved connection exhaustion
Blameless Root Cause Statement
The incident was caused by a query design that performed poorly under large backlog processing. When triggered at scale, the query drove up database load and exhausted connections. Improvements are required in query design standards, load validation, and bulk processing controls.
What we are doing about it
Last year Visualcare kicked off an infrastructure upgrade; key updates include containerisation and database sharding. In recent months we have been piloting this with a key partner to assess performance and reliability. The pilot has been successful and goes live in early May. Once complete, we will roll out these improvements to deliver better reliability and performance.
The duration of incidents within the last 30 days has resulted in an uptime of 99.2%.
Overview
On 16 March, 23 March, 9 April and 10 April (outlined in red in the 90-day graph below), vCore and other services experienced performance degradation, including elevated response times.
Despite normal CPU utilisation on the database (~50%) and healthy API indicators, the system exhibited:
- High request latency
- Increased database commit latency
- Application-level slowness
A controlled Aurora failover (reader → writer promotion) restored system performance, resulting in the best observed performance baseline in recent periods.
Last 90 Days of Cluster Connections

The graph above shows the last 90 days of concurrent database connections. It can be seen that spikes in connection count increased gradually until hitting a critical point mid-last-week which caused a feedback loop of sustained connections, resulting in the performance degradation. Note the total flatline of connections at the far right of the graph (outlined in green), indicating that connections are not piling up and that the failover solution has eliminated stale resources and contentions.
Previous performance issues (notably Monday 16 March and Monday 23 March) are also outlined in red.
Database connections from early Thursday

Latency and Load during degradation

Latency and Load reduction following failover
(note: AWS metrics were wiped on failover. The latency and load peaks shown above represent the average load for the previous 2 days)
Database connection stacking from before, during, and after the incident

Impact
- User Impact:
- Slow page loads across the web application
- Intermittent failures/timeouts on user actions
- Business Impact:
- Degraded user experience
- Increased operational load during incident response
- Risk to customer trust due to instability
Detection
Elevated response times reported via application monitoring
Aurora metrics indicated:
- Increased commit latency
- No corresponding spike in CPU or memory utilisation
Apache metrics showed:
- Increased request duration
- Worker saturation symptoms
Timeline
- ~09:00 Apr 9 Performance degradation begins
- ~10:00 Elevated latency observed in application
- ~10:00-10:30 Database metrics reviewed (CPU normal, latency elevated)
- 11:00 Apache / application restarts attempted (no improvement)
- 12:00 Query analysis identifies high-cost UPDATE with full table scan
- 12:30 Indexes added to mitigate query inefficiency
- 13:00 Connection count to database began to reduce
- 14:00 Performance degraded again
- 09:00 Apr 10 Performance remained degraded
- 14:00 Aurora failover initiated (reader promoted to writer)
- 14:05 Immediate restoration of performance
- ~23:00 Database instance size increased
What Happened (Technical Summary)
The system entered a degraded state characterised by high database commit latency and request blocking, despite moderate resource utilisation. Investigation revealed:
- A high-frequency UPDATE query performing a full table scan (~800k rows) due to missing indexing
- Resulting in lock contention and transaction queuing
- Accumulation of long-lived or blocked transactions
- Increasing contention within InnoDB internal structures
Although indexing improvements were applied, the database remained in a degraded state due to residual transactional and locking contention. A failover reset:
- Active connections
- Open transactions
- Lock queues
- InnoDB internal state
- Various other caches
This immediately restored normal performance.
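The production database is Aurora MySQL, but the effect of the missing index can be illustrated with SQLite's query planner, which ships with Python. The table and column names below are invented for the demonstration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, payload TEXT)")
con.executemany("INSERT INTO jobs (status, payload) VALUES (?, ?)",
                [("queued", "x")] * 1000)

update = "UPDATE jobs SET payload = 'done' WHERE status = ?"

# Without an index, the UPDATE must scan every row, holding locks as it goes.
plan_before = con.execute("EXPLAIN QUERY PLAN " + update, ("queued",)).fetchone()[-1]

con.execute("CREATE INDEX idx_jobs_status ON jobs (status)")
plan_after = con.execute("EXPLAIN QUERY PLAN " + update, ("queued",)).fetchone()[-1]

print(plan_before)  # a full SCAN of jobs
print(plan_after)   # a SEARCH using idx_jobs_status
```

In the incident, the scan was ~800k rows per execution at high frequency, so the lock footprint and transaction queuing compounded far faster than the query itself could complete.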
Root Cause
Primary Root Cause
A high-frequency database update query executed without an appropriate index, resulting in:
- Full table scans
- Excessive row-level locking
- Transaction contention under load
Contributing Factors
1. Transaction and Lock Accumulation
- Blocked and queued transactions accumulated over time
- Lock contention propagated across unrelated queries due to shared resources
2. Connection Management Characteristics (PHP + Apache Prefork)
- High number of concurrent database connections
- Long-lived connections increasing contention footprint
3. InnoDB State Degradation Under Contention
- Internal structures (lock queues, undo logs, buffer pool efficiency) degraded under sustained load
- System did not self-recover after contention was introduced
4. Lack of Early Detection Signals
No alerting on:
- Commit latency
- Lock wait time
- Long-running transactions
The issue was detected only after user-visible degradation
5. Delayed Recovery Without Reset
- Restarting application layers (Apache/PHP) did not clear database-level contention
- Only a database failover (hard reset of state) resolved the issue
Why It Affected the Entire System
Although the triggering query targeted a specific table, the impact was systemic due to:
- Shared InnoDB resources (buffer pool, lock manager)
- Transaction queue contention affecting unrelated queries
- Connection pool saturation at the application layer
- Increased commit latency impacting all write operations
Resolution
Immediate mitigation achieved via:
- Aurora failover (reader promoted to writer)
Performance returned to baseline immediately after failover
Lessons Learned
- Moderate CPU utilisation does not indicate database health
- Commit latency is a critical early warning signal
- Database engines can enter degraded states that do not self-recover
- Failover acts as a reset, not a root cause fix
Follow-Up Actions
Short Term
Confirm all high-frequency queries are properly indexed
Enable and review slow query logging (lower threshold temporarily)
Monitor and alert on:
- Commit latency
- Lock wait time
- Active transactions (innodb_trx)
Add visibility into connection counts and states
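As an illustration of the proposed alerts, a minimal threshold check might look like the following. The metric names and limits here are placeholders, not our production values:

```python
# Hypothetical alert thresholds; real values would be tuned per workload.
THRESHOLDS = {
    "commit_latency_ms": 50,
    "lock_wait_ms": 1000,
    "active_transactions": 100,
}

def evaluate_alerts(metrics: dict) -> list[str]:
    """Return the name of every metric exceeding its threshold."""
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0) > limit]

# A sample reading resembling this incident's signature: commit latency and
# lock waits are high even though CPU (not checked here) looks normal.
sample = {"commit_latency_ms": 480, "lock_wait_ms": 12000, "active_transactions": 35}
print(evaluate_alerts(sample))
```

The design point, reflecting the lesson above, is to alert on contention signals directly rather than inferring health from CPU or memory utilisation.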
Medium Term
Review connection management strategy (reduce long-lived connections where possible)
Add dashboards for:
- Transaction age
- Lock contention
- Threads running vs connected
Long Term
- Evaluate architectural changes to reduce high-frequency write contention
- Introduce backpressure or rate limiting on heavy write paths
- Consider read/write isolation improvements or workload partitioning
- Formalise database failover as a controlled operational response (not primary mitigation)
Blameless Summary
This incident was caused by a combination of:
- An inefficient query pattern under load
- Insufficient observability into database contention signals
- Expected but unmanaged behaviour of the database under sustained transactional pressure
No single action or individual directly caused the incident.
The system behaved in line with its current design and constraints.
Incident History (Last 90 Days)

* An Outage indicates that the system was completely inaccessible. Performance Degradation indicates that while the system was slow, it was still accessible and most work could be done, albeit at a less efficient rate.
The duration of actual outages within the last 90 days results in an uptime of 99.86%.
Incident Summary
On Monday 16 March 2026, Visualcare experienced a major service disruption affecting API availability.
The incident was triggered by inefficient database query behaviour within a background worker process responsible for form-related data processing. This resulted in a surge of long-running queries that exhausted available database connections.
As database resources became constrained, API requests were unable to complete, leading to application worker saturation and instability across API nodes, causing significant slowdowns and incomplete queries in both the Visualcare Web Application and Worker Mobile App.
While initial mitigation actions restored partial functionality, the underlying database pressure persisted, resulting in repeated instability until a controlled recovery was completed.
Service access was fully restored at ~13:11 AEST, initially at reduced speed, and the system returned to normal behaviour at ~14:22 AEST as database pressure subsided.
Impact
Customer Impact
- Intermittent failures and timeouts when accessing the platform
- Slow response times and request timeouts
- Periods where the platform was unavailable
Duration
- Start: ~09:56 AEST
- Resolved: ~13:11 AEST
- Total duration: ~3 hours 15 minutes
What Happened
A background worker responsible for processing form data executed queries that scaled poorly under certain data conditions, resulting in significantly longer execution times than expected.

As these queries accumulated:
- Database connections became heavily utilised
- API requests began queuing while waiting for available connections
- Application workers became saturated handling blocked requests
- API nodes became unstable under sustained load
This, combined with elevated system load at the time, accelerated database resource exhaustion, resulting in request backlogs, application worker saturation, and progressive service degradation.
As API nodes became increasingly unstable, overall platform performance deteriorated significantly. Requests were delayed or failed as application processes remained blocked waiting for database access.

Initial recovery actions (traffic redistribution and application restarts) provided only temporary relief, as they did not address the underlying database load:
- The worker process continued generating high database activity
- Resource contention rapidly reoccurred after each recovery attempt
This resulted in a repeating cycle of degradation and partial recovery, significantly extending the duration of the incident.
Detection
The issue was initially identified through customer reports, followed by internal validation of:
- API responsiveness degradation
- Elevated database connection usage
- Application instability
Gap Identified
At the time of the incident, there were limited proactive alerts for:
- Database connection saturation
- Long-running query thresholds
Resolution
Service was restored through:
- Terminating long-running queries across our database shards.
- Controlled recovery of application processes across API nodes
- Careful management of traffic during recovery to prevent recurrence
- Allowing the database load to return to normal operating levels
Once database pressure was reduced and application services stabilised, normal service resumed.
Root Cause
Inefficient database query behaviour in a background worker process led to sustained resource consumption, exhausting database connections and causing cascading failure across API services.
What We’re Improving
We are implementing several improvements to prevent recurrence and strengthen system resilience:
- Workload Protection
- Introduce safeguards to prevent excessive resource consumption from any single workload
- Improve isolation of database usage across different request types
- Query Optimisation & Limits
- Optimise form-related query patterns
- Enforce execution time limits on database queries
- Query Controls
- Detect and manage long-running database activity more proactively
- Observability & Alerting
- Add alerts for:
- Database connection utilisation
- Query execution duration
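One way to enforce a per-query time budget at the application layer is sketched below. This is an illustrative wrapper, not our production implementation; the constant and function names are invented, and a real deployment would also cancel the statement server-side via the database engine's own execution-time limit:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

QUERY_TIMEOUT_S = 0.2          # hypothetical per-query budget

def run_with_timeout(query_fn, timeout=QUERY_TIMEOUT_S):
    """Run a query function, abandoning it if it exceeds its time budget.

    Note this only stops the caller waiting; killing the statement itself
    must happen in the database (e.g. an engine-level execution-time cap).
    """
    with ThreadPoolExecutor(max_workers=1) as ex:
        future = ex.submit(query_fn)
        try:
            return future.result(timeout=timeout)
        except TimeoutError:
            return "TIMED OUT"

fast = run_with_timeout(lambda: "42 rows")                 # completes in time
slow = run_with_timeout(lambda: time.sleep(1) or "never")  # exceeds the budget
print(fast, slow)
```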
Closing Statement
We recognise the impact this incident had and take full responsibility for the disruption.
This event has led to clear improvements in how we:
- Protect shared system resources
- Detect abnormal behaviour earlier
- Maintain stability under load
These changes are already planned and underway to ensure a more resilient and reliable platform moving forward.
Summary
On 12 January 2026, Visualcare experienced a Priority 0 (P0) service incident affecting the Mobile API and related services. The incident was triggered by a sudden and sustained surge of external request traffic, which placed unexpected pressure on backend systems and led to service unavailability.
Core services were restored by 1:35pm AEDT, with degraded performance continuing until 2:25pm AEDT, after which normal service levels were fully re-established.
Incident Classification
- Priority: P0
- Detected: 12:45pm AEDT
- Declared: 12:47pm AEDT
- Stable: 1:35pm AEDT
- Normal: 2:25pm AEDT
Customer Impact
During the incident window:
- The Mobile API was unavailable or intermittently unresponsive
- Some customers experienced timeouts or slow responses in connected Visualcare services
- During the recovery phase, services were available but may have exhibited degraded performance
There was no data loss, no unauthorised access, and no impact to data integrity.
What Happened
The incident was caused by a rapid increase in external request volume directed at the Mobile API. The traffic pattern resulted in a significantly higher number of concurrent requests than typically observed.
As request volume increased, backend processing slowed, and active requests accumulated faster than they could be completed. This led to temporary resource saturation and prevented the system from efficiently accepting or completing new requests.
An initial service restart did not immediately restore normal service. Additional controls were subsequently applied to support stable request handling during high traffic conditions, after which the system recovered.
Detection
The issue was detected through a combination of:
- Internal monitoring indicating elevated load and degraded responsiveness
- Customer reports of service unavailability
The incident was escalated and formally declared a P0 once widespread impact was confirmed.
Resolution
Service recovery occurred in two phases:
- Stabilisation (by 1:35pm AEDT)
- Protective request-handling controls were applied
- Services were restarted in a controlled manner
- Core functionality was restored and customer access resumed
- Performance Recovery (1:35pm–2:25pm AEDT)
- Elevated traffic gradually subsided
- System performance progressively returned to normal levels
- No manual intervention was required for downstream systems once stability was achieved.
Timeline (AEDT)
- 12:35pm – Elevated external request volume begins impacting service responsiveness
- 12:45pm–12:50pm – Mobile API becomes unavailable or severely degraded
- 12:50pm – Incident declared P0
- 12:50pm – Initial restart attempted; elevated traffic persists
- 1:30pm – Additional protective controls applied to manage request load
- 1:35pm – Core services restored (start of degraded performance window)
- 2:25pm – Traffic normalises; full-service performance restored; P0 cleared
Root Cause
A sudden and sustained surge of external requests placed unexpected load on the Mobile API, leading to temporary saturation of request processing capacity. This prevented the system from handling new requests efficiently until traffic was regulated and services were stabilised.
Preventative Actions
To reduce the risk and impact of similar events in the future, we are implementing the following improvements:
- Enhanced controls to better regulate and absorb sudden spikes in request traffic
- Improved monitoring and alerting to detect abnormal traffic patterns earlier
- Additional safeguards to ensure services recover more quickly under extreme load
These actions are actively being tracked through our internal delivery process.
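A common pattern for regulating and absorbing sudden spikes in request traffic is a token bucket, sketched below with hypothetical limits; this illustrates the approach, not our production implementation:

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: absorbs brief bursts up to `capacity`
    tokens, then throttles sustained surges to `rate` requests per second."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False        # request is rejected (or queued) instead of
                            # saturating backend capacity

bucket = TokenBucket(rate=10, capacity=5)      # hypothetical limits
results = [bucket.allow() for _ in range(20)]  # a sudden burst of 20 requests
print(results.count(True), "of 20 admitted")
```

Under this scheme a surge is shed at the edge, so backend processing capacity is never asked to hold more concurrent work than it can complete.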
Closing
We recognise the operational impact this incident may have caused and appreciate your patience.
Visualcare continues to invest in platform resilience and protection to ensure reliable service, even during abnormal traffic conditions.
If you have any questions or would like further clarification, please contact your Customer Success Manager, or the Head of Customer Success, Maddie Hayes (mhayes@visualcare.com.au).
new
Support at Home
Claiming
Support at Home (SAH) Claiming
We’ve released the first version of Support at Home (SAH) Claiming in Visualcare, including CSV export options. This update also introduces new settings for SAH claiming configuration and user-level permissions.
This release supports providers preparing for their first Support at Home claim and introduces the foundational claiming workflow, aligned to the Aged Care Web Services requirements.
What’s New
SAH Claiming – New Claiming Workspace
A new claiming area is now available:
Timesheets > SAH Claiming
This workspace allows you to:
- Create draft SAH claims
- View all SAH claims with real-time status updates (when claiming via API)
- Reconcile completed or rejected claims
- Export claims using bulk CSV if you prefer a file-based workflow
Supported statuses include:
- Draft – items batched and ready to claim
- Submitted – successfully sent to Services Australia
- Rejected – items returned with a reason and available for correction
- Claimed – processed by Services Australia; reconciliation applied
This workflow differs from HCP processes. We strongly recommend familiarising yourself with the new screens and statuses before submitting your first SAH claim.
New SAH Claiming Settings
A new configuration area is available to set your claiming rules:
Settings > Integrations > Manage > SAH Claim Settings
You can now configure:
- Care Management Claiming options to claim each activity individually, or aggregate Care Management per participant before submission
- Default Support at Home Claiming Method
- Rounding Thresholds for SAH service types that use hour as the unit (e.g., Care Management)
- Option to prevent rounding down to 0 minutes
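To illustrate how a rounding threshold interacts with the prevent-zero option, here is a hypothetical sketch; the increment, defaults, and function name are invented, and your configured settings govern actual behaviour:

```python
def round_claim_minutes(actual_minutes: int,
                        increment: int = 15,
                        prevent_zero: bool = True) -> int:
    """Round claimed time to the nearest increment, optionally never
    rounding a non-zero duration down to 0 minutes."""
    rounded = round(actual_minutes / increment) * increment
    if prevent_zero and actual_minutes > 0 and rounded == 0:
        return increment        # keep a minimum claimable amount
    return rounded

print(round_claim_minutes(7))    # would round to 0; bumped up to 15
print(round_claim_minutes(22))   # rounds down to 15
print(round_claim_minutes(23))   # rounds up to 30
```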
User Permissions for SAH Claiming
A new permission setting has been added to control who can access SAH Claiming:
Settings > User Group Security
This allows you to limit SAH claiming to specific roles such as finance, coordinators, or administrators.
Where to Find Everything
SAH Claiming - Timesheets > SAH Claiming
SAH Claim Settings - Settings > Integrations
User Access Control - Settings > User Group Security
Need Help?
Refer to the Support at Home Knowledge Hub for detailed guidance, examples, and setup instructions.
For configuration support or assistance validating your first SAH claim, please contact Support or your Customer Success Manager.
new
Support at Home
AT-HM
Bulk Import
Assistive Technology & Home Modifications (AT-HM) – Bulk Upload Enhancements
We’ve updated the Expenses Bulk Import feature to support uploading AT-HM expense items in bulk. This enhancement helps providers manage high-volume AT-HM records more efficiently and ensures correct mapping of the new AT-HM fields required under Support at Home.
This update is available now under:
Operations > Import CSV > Expenses
What’s New
AT-HM Bulk Import Support
The Expenses Bulk Import template has been extended to include all AT-HM-related fields. You can now bulk upload AT-HM items with full classification and linkage to the required AT-HM attributes.
Newly supported fields include:
- AT-HM Parent
- AT-HM Item / Wraparound
- AT-HM Item Code
- AT-HM Prescribed
- AT-HM First Payment
- AT-HM Loaned
- Home Support Item Code
These fields ensure AT-HM expenses are imported with the correct structure, enabling accurate claiming, reporting, and compliance.
How It Works
- Download the latest Expenses CSV template from the Import CSV screen.
- Populate AT-HM fields where required.
- Upload the completed file under Operations > Import CSV > Expenses.
- Imported AT-HM items will appear in the participant’s expenses and will be available for claiming (where relevant under SAH).
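The workflow above amounts to filling one template row per expense. A hypothetical sketch of generating such a file programmatically is shown below; the column subset and example values are invented for illustration, so always use the headers from the template you download:

```python
import csv
import io

# Hypothetical column subset: names follow this release note, but the
# authoritative headers come from the downloadable Expenses CSV template.
FIELDS = ["Participant", "Amount", "AT-HM Parent", "AT-HM Item Code",
          "AT-HM Prescribed", "AT-HM Loaned", "Home Support Item Code"]

rows = [
    {"Participant": "P-1001", "Amount": "350.00", "AT-HM Parent": "Mobility",
     "AT-HM Item Code": "ATHM-042", "AT-HM Prescribed": "Yes",
     "AT-HM Loaned": "No", "Home Support Item Code": "HSIC-7"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())   # paste into a file and upload via Import CSV
```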
Why This Matters
These enhancements reduce manual entry and improve data consistency across AT-HM records, particularly important as providers transition to Support at Home and manage higher volumes of AT-HM activity.
Need Help?
Refer to the updated AT-HM Bulk Import guide in the Support at Home Knowledge Hub, or contact Support or your Customer Success Manager for assistance.
fixed
improved
Support at Home
Bugs
Incidents
Patch Fixes
- Participant contributions not saving - Fixed an issue where only a small subset of participant contributions were being retrieved due to incorrect API filtering. Contributions are now pulled per participant, ensuring all data is saved correctly.
- Participant supplements missing - Updated validation so supplements are saved even when Services Australia sends partial or inconsistent fields. Supplement records now load and display reliably.
- Unspent HCP funds not showing - Corrected logic for recognising “active” budgets so grandfathered budgets with no end date now appear in the SAH Participant Profile.
- SIRS reportable flag & comments not persisting - Fixed an issue where SIRS records saved with “Reportable” and comments did not reload on edit. Values now persist as expected.
new
NDIS
Services
NDIS Pricing Update - Effective 24 November 2025
The NDIA has released an update to the Pricing Arrangements and Price Limits.
Visualcare has updated the pricing catalogue to reflect the new prices, effective 24 November 2025.
Action required:
The updated catalogue is now available for bulk upload. You can access it from Maintenance > Services > NDIS Support Catalogue and apply the latest pricing in your environment.
More information on how to do this can be found in vDocs: Update NDIS Pricing.
We’ve improved the Expenses → Import CSV feature to make bulk expense uploads more accurate and efficient.
What’s New
- Service Code field added to the expense CSV import process.
- CSV template updated to include the new Service Code column.
Where to find it
Go to Operations → Import CSV and select Expenses from the dropdown.
How It Works
When preparing your CSV, simply enter the Visualcare Service Code for each expense in the new column. The system will attach the correct service to the expense during upload as long as that service is active and linked to the client's agreement, ensuring accurate claiming and reporting.