Enhancement Request: URL aggregator optimization

L3 Networker


Today, the stdlib.aggregatorURL aggregator processes a list of URLs, removes duplicates, and manages withdrawals/whitelists. However, it performs no optimization on its output. I would like to recommend the following enhancements:


1. Removal of superfluous URLs

URLs that are already covered by a shorter wildcard URL should be removed from the output list.


Example:

*.domain.com

subdomain.domain.com   <-- REMOVE

host.subdomain.domain.com <-- REMOVE
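A minimal Python sketch of the wildcard pruning requested above. It assumes, as the example does, that "*.domain.com" covers every hostname ending in ".domain.com" at any depth but not "domain.com" itself, and that entries are bare hostnames without paths. The function name remove_superfluous is hypothetical; this is not the stdlib.aggregatorURL code.

```python
def remove_superfluous(urls):
    """Drop entries covered by a '*.' wildcard entry in the same list.

    Assumption: '*.domain.com' covers any hostname ending in '.domain.com'
    (any depth), but not 'domain.com' itself. Entries are bare hostnames.
    """
    # Suffixes of all wildcard entries, e.g. '*.domain.com' -> '.domain.com'
    suffixes = {u[1:] for u in urls if u.startswith("*.")}

    def covered(url):
        for suffix in suffixes:
            # A wildcard never removes itself, but a broader wildcard
            # (e.g. '*.domain.com') does remove '*.sub.domain.com'
            if url.endswith(suffix) and url != "*" + suffix:
                return True
        return False

    # Keep the original order of the surviving entries
    return [u for u in urls if not covered(u)]
```

With the example list, only "*.domain.com" survives. The leading dot in each suffix prevents false matches such as "xdomain.com".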


2. Convert to lowercase before removing duplicate URLs

Today, the aggregator output can include duplicate URLs that differ only in letter case. This can be addressed by converting all URL strings to lowercase before duplicates are removed.


Example:

login.microsoftonline.com
Login.microsoftonline.com
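A sketch of the case-insensitive duplicate removal, assuming the whole string is lowercased as proposed and that each entry should stay at the position of its first occurrence. dedupe_case_insensitive is a hypothetical helper name.

```python
def dedupe_case_insensitive(urls):
    """Lowercase every entry and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for url in urls:
        normalized = url.lower()
        # Only the first spelling of each lowercased URL is kept
        if normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result
```

For the example above this returns a single "login.microsoftonline.com". Host names are case-insensitive, but URL paths can be case-sensitive, so lowercasing entries that carry a path may merge URLs that are not strictly equivalent.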


3. Sorted output

This one is more cosmetic, but it will help users when troubleshooting. Sorting the aggregator output will save time when firewall administrators need to look up URLs received in an EDL. Today, the output is not sorted.
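A sketch of sorted output. Plain sorted() gives alphabetical order; the optional group_by_domain mode is a hypothetical extra, not part of the request, that sorts on the reversed host labels so entries under the same parent domain sit next to each other. It assumes entries look like "host" or "host/path", without a scheme.

```python
def sort_urls(urls, group_by_domain=False):
    """Return a new, sorted list of URL entries."""
    if not group_by_domain:
        # Plain alphabetical order
        return sorted(urls)

    def key(url):
        # Split 'host/path' and compare host labels right to left,
        # e.g. 'www.example.com' -> ['com', 'example', 'www']
        host, _, path = url.partition("/")
        return (host.split(".")[::-1], path)

    return sorted(urls, key=key)
```

For ["www.example.com", "mail.google.com", "example.com"], the grouped mode yields example.com, www.example.com, mail.google.com, keeping both example.com entries together.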


L7 Applicator


Items 1) and 2) should be supported by a URL-specific aggregator. ER#9 has been opened to track this.


Output lists are in fact sorted, but by default they are sorted by time of update, with the most recent entries at the top of the list.

L1 Bithead


Not to derail the original post, but are all the output lists (IPv4 specifically) sorted with the newest entries at the top? I ask because I am trying to work out how to handle the lower EBL limits on the PA-500 and PA-3020.

L7 Applicator


Hi greg.rohel,

Yes, the same applies to all the lists generated by Output nodes of class RedisSet (all the "feed*" prototypes are based on this class).

To cope with platform limits, you can split the lists into multiple sublists using the s and n URL parameters:


Examples:

- topmost 1000 elements of the list

https://<minemeld>/feeds/feed1?n=1000


- elements 1000-2000 of the list

https://<minemeld>/feeds/feed1?s=1000&n=1000
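The s/n split can be scripted on the client side. A sketch, assuming s is a zero-based start offset and n the number of entries returned, as the examples suggest; the helper name and the host in the usage line are hypothetical, and the total list length has to be obtained separately.

```python
from urllib.parse import urlencode


def feed_chunk_urls(base_url, total, chunk_size=1000):
    """Build one feed URL per chunk of `chunk_size` entries.

    Assumption: 's' is a zero-based start offset and 'n' a count,
    matching the ?n=1000 and ?s=1000&n=1000 examples.
    """
    urls = []
    for start in range(0, total, chunk_size):
        # The first chunk only needs n; later chunks add the start offset
        params = {"n": chunk_size} if start == 0 else {"s": start, "n": chunk_size}
        urls.append(f"{base_url}?{urlencode(params)}")
    return urls
```

For example, feed_chunk_urls("https://minemeld.example/feeds/feed1", 2500) returns three URLs ending in ?n=1000, ?s=1000&n=1000 and ?s=2000&n=1000, each of which could be configured as a separate EBL.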


The indicator attribute used to sort the list can be changed in the prototype (this will be possible in the next release of MineMeld). The parameter is called scoring_attribute and by default it is set to last_seen.
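To illustrate how the sort order and the s/n parameters interact, here is a small in-memory emulation. It assumes each indicator is a dict carrying its value and a numeric scoring attribute (last_seen by default), and that a higher score places an entry closer to the top; it is not MineMeld's RedisSet implementation, and feed_page is a hypothetical name.

```python
def feed_page(indicators, s=0, n=None, scoring_attribute="last_seen"):
    """Rank indicators by `scoring_attribute`, highest first, then slice
    the result the way the s (start offset) and n (count) parameters do."""
    # With last_seen as the score, the most recently seen entries come first
    ranked = sorted(indicators, key=lambda ind: ind[scoring_attribute], reverse=True)
    end = None if n is None else s + n
    return [ind["indicator"] for ind in ranked[s:end]]
```

Under these assumptions, ?n=1000 with the default scoring would return the 1000 most recently seen indicators, which is what makes the topmost-n approach useful on platforms with lower EBL limits.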
