Recently, I worked on a project where I needed to import a large volume of data into Sitecore Content Hub. Much of this data referenced existing entities through relations—both to taxonomy definitions (like categories or tags) and to non-taxonomy definitions (such as linked assets or custom entities)—that were already present in the system. To maximize import speed and minimize API calls, I decided to cache all referential data at the start of the import.
This approach raised an important technical question: What’s the most efficient way to retrieve and cache large sets of reference data from Content Hub? The Stylelabs WebClient SDK offers three main strategies: QueryAsync, CreateEntityIterator, and CreateEntityScroller.
Here’s what I learned about each.
Technical Note: Content Hub uses Elasticsearch as its underlying search engine. All queries, paging, and scroller operations in the SDK are powered by Elasticsearch, which is optimized for fast, scalable search and retrieval of large datasets.
1. QueryAsync: Simple Paging with Skip/Take
The most direct way to retrieve entities is using the QueryAsync method. By default, this method returns up to 50 items per call, but you can specify the number of items to fetch using the Take property and page through results using Skip.
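As a rough sketch, paging with QueryAsync could look something like this (assuming `client` is an already-initialized `IWebMClient`; the `M.Asset` definition name and page size are just placeholders):

```csharp
using System.Collections.Generic;
using Stylelabs.M.Base.Querying;
using Stylelabs.M.Base.Querying.Linq;
using Stylelabs.M.Sdk.Contracts.Base;

const int pageSize = 50;

// Placeholder query: all entities of the M.Asset definition.
var query = Query.CreateQuery(entities =>
    from e in entities
    where e.DefinitionName == "M.Asset"
    select e);

query.Skip = 0;
query.Take = pageSize;

var cache = new List<IEntity>();

while (true)
{
    var page = await client.Querying.QueryAsync(query);
    cache.AddRange(page.Items);

    // Stop once the last (partial) page has been read.
    if (page.Items.Count < pageSize)
        break;

    query.Skip += pageSize;
}
```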
The QueryAsync method is best used for:
- Small to medium datasets (up to 10,000 items).
- Simple paging scenarios.
Limitations:
- Strict 10,000-item skip limit: Content Hub enforces a maximum skip threshold of 10,000 items; exceeding it results in an Internal Server Error (HTTP 500).
- Performance degrades as skip increases due to offset-based paging.
2. CreateEntityIterator: Offset-Based Iteration (Skip/Take Under the Hood)
The CreateEntityIterator method provides an IEntityIterator for paging through entities. It abstracts the skip/take logic and is ideal for scenarios where you want to iterate over a result set without manually managing offsets. However, it is still subject to the same skip/take and 10,000 item limit as QueryAsync.
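Here is a sketch of the same caching loop using an iterator (again, `client`, the definition name, and the load configuration are illustrative):

```csharp
using Stylelabs.M.Base.Querying;
using Stylelabs.M.Base.Querying.Linq;
using Stylelabs.M.Sdk.Models.Base;

var query = Query.CreateQuery(entities =>
    from e in entities
    where e.DefinitionName == "M.Asset"
    select e);

// The iterator handles skip/take internally; you just move through the pages.
var iterator = client.Querying.CreateEntityIterator(query, EntityLoadConfiguration.Minimal);

while (await iterator.MoveNextAsync())
{
    foreach (var entity in iterator.Current.Items)
    {
        // Cache or process each entity here.
    }
}
```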
The CreateEntityIterator method is best used for:
- Small to medium datasets (up to 10,000 items).
- When you want to avoid manual skip/take management.
Limitations:
- Still subject to the 10,000 item skip limit.
- Uses offset-based paging, so performance can degrade with large offsets.
- Functionally similar to QueryAsync with skip/take, but with a more convenient API.
3. CreateEntityScroller: Cursor-Based Paging for Large Datasets
For large datasets, CreateEntityScroller is the recommended approach. It uses cursor-based pagination, which is more efficient and reliable for traversing large result sets.
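A sketch of the scroller version follows; the 30-second scroll time and the load configuration are illustrative values, not recommendations:

```csharp
using System;
using Stylelabs.M.Base.Querying;
using Stylelabs.M.Base.Querying.Linq;
using Stylelabs.M.Sdk.Models.Base;

var query = Query.CreateQuery(entities =>
    from e in entities
    where e.DefinitionName == "M.Asset"
    select e);

// The scroll time keeps the Elasticsearch scroll context alive between batches;
// keep it just long enough to process one batch.
var scroller = client.Querying.CreateEntityScroller(
    query,
    TimeSpan.FromSeconds(30),
    EntityLoadConfiguration.Minimal);

while (await scroller.MoveNextAsync())
{
    foreach (var entity in scroller.Current.Items)
    {
        // Cache or process each entity here.
    }
}
```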
The CreateEntityScroller method is best used for:
- Large datasets (over 10,000 items).
- High-performance, reliable paging.
Advantages:
- No skip limit: Efficiently handles datasets larger than 10,000 items.
- Consistent performance: Each page fetch is fast, regardless of dataset size.
- Data integrity: Reduces the risk of missing or duplicating items during paging.
Tip: Use the scrollTime parameter wisely—set it just long enough to process each batch, as excessive scroll times can impact backend performance.
Official documentation: See the official Stylelabs WebClient SDK documentation for more details on the various methods.
Happy Sitecore-ing!
–Robbert