Muse® Harvesting

Muse® Harvesting is based on several types of controlled extraction, such as search-based, date-based or feed-based. Records are aggregated from any number of sources and gathered by feed or query, either on a timed schedule or as an ‘on demand’ operation. It features a consistent delivery record format and enhanced ‘virtual’ records, all delivered by file or feed.

The benefits of Muse Harvesting are numerous: it has many unique characteristics, each of which is a key factor in the process.

The harvesting process can connect to any Source through specific Connectors, each able to handle a search language, data extraction, record enhancement, entity extraction and normalization, and all of this is backed by advanced tools for building and maintaining Connectors. The tools are very easy to use: just add an abstract description, map fields by simply painting them and, at the end, use the built-in checkers and the test databases.

The harvesting process handles up to 10k records per run, with regular timed delivery at minute granularity or at irregular intervals. Such features are dependent on the quality of the extraction. Completeness is achieved by using MD5 checksums for transmission, by comparing the number of records retrieved with the number reported, and by multiple retries, either immediate or on the next run, to recover from situations such as network congestion.
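The completeness checks described above can be illustrated with a minimal Python sketch. It is not Muse code: the names `md5_hex`, `verify_batch` and `harvest_with_retries`, and the shape of the data returned by the hypothetical `fetch` callable, are assumptions made only for illustration. It shows a batch being accepted only when the record count and the per-record MD5 checksums match what the source reported, with a retry otherwise.

```python
import hashlib
import time


def md5_hex(payload: bytes) -> str:
    """Return the MD5 digest of a record payload as a hex string."""
    return hashlib.md5(payload).hexdigest()


def verify_batch(records, reported_count, reported_checksums):
    """Check a batch against the count and checksums the source reported.

    Assumption: the source reports one checksum per record, in order.
    """
    if len(records) != reported_count:
        return False
    if len(reported_checksums) != len(records):
        return False
    return all(md5_hex(r) == c for r, c in zip(records, reported_checksums))


def harvest_with_retries(fetch, max_attempts=3, delay_seconds=0.0):
    """Call the hypothetical `fetch` until a batch verifies or attempts run out.

    `fetch` is assumed to return (records, reported_count, reported_checksums).
    """
    for attempt in range(1, max_attempts + 1):
        records, reported_count, reported_checksums = fetch()
        if verify_batch(records, reported_count, reported_checksums):
            return records
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # wait before the immediate retry
    raise RuntimeError(f"batch failed verification after {max_attempts} attempts")
```

In this sketch a failed run raises an error after the last attempt; a scheduler could catch it and defer the harvest to the next timed run instead.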

The harvesting process can deliver either full results or ‘deltas’, meaning that only the records that are new since the last run are retrieved.
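One way to picture a delta run is the short Python sketch below. It assumes each record carries a stable identifier under a hypothetical `"id"` key and that the set of identifiers seen so far is kept between runs; how Muse actually tracks previously delivered records is not stated here, so this is only an illustration of the idea.

```python
def compute_delta(current_records, seen_ids):
    """Return the records not seen in earlier runs, plus the updated ID set.

    Assumption: every record is a dict with a unique "id" field (hypothetical).
    """
    new_records = [r for r in current_records if r["id"] not in seen_ids]
    updated_seen = set(seen_ids) | {r["id"] for r in current_records}
    return new_records, updated_seen


# First run: nothing has been seen yet, so the delta equals the full result.
run1 = [{"id": "a", "title": "First"}, {"id": "b", "title": "Second"}]
delta1, seen = compute_delta(run1, set())

# Second run: only the record with the new identifier is delivered.
run2 = [{"id": "b", "title": "Second"}, {"id": "c", "title": "Third"}]
delta2, seen = compute_delta(run2, seen)
```

A full-results delivery would simply skip the filtering step and return everything from the current run.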