ML Spark Exception for processing logs with IPv6 Addresses

Reply
L0 Member

ML Spark Exception for processing logs with IPv6 Addresses

Getting the following error when trying to process CSV:

 

Exception:

 

Caused by: java.lang.NumberFormatException: For input string: "2001:470:ba7e:20::254"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

 

Full trace: 

 

(/opt/Spark/spark/bin/spark-submit --class com.paloaltonetworks.tbd.LogCollectorCompacter --deploy-mode client --supervise /var/www/html/OS/spark/packages/LogCoCo-1.2.4-SNAPSHOT.jar MLServer='10.10.50.100', master='local[3]', debug='false', taskID='65', user='admin', dbUser='root', dbPass='paloalto', dbServer='10.10.50.100:3306', timeZone='Europe/Helsinki', mode='Expedition', input='007254000047808:8.0.3:/var/backup/fw1_traffic_2018_08_17_last_calendar_day.csv', output='/var/expedition/connections.parquet', tempFolder='/var/expedition'; echo /var/backup/fw1_traffic_2018_08_17_last_calendar_day.csv; )>> "/tmp/error_logCoCo" 2>>/tmp/error_logCoCo &
---- CREATING SPARK Session:
warehouseLocation:/PALogs/spark-warehouse
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/Spark/extraLibraries/slf4j-nop-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/Spark/spark-2.1.1-bin-hadoop2.7/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.helpers.NOPLoggerFactory]
+--------------------+---------------+--------+--------------------+
| rowLine| fwSerial|panosver| csvpath|
+--------------------+---------------+--------+--------------------+
|007254000047808:8...|007254000047808| 8.0.3|/var/backup/fw1_t...|
+--------------------+---------------+--------+--------------------+

8.0.0:/var/backup/fw1_traffic_2018_08_17_last_calendar_day.csv
LogCollector&Compacter called with the following parameters:
Parameters for execution
Master[processes]:............ local[3]
User:......................... admin
debug:........................ false
Parameters for Job Connections
Task ID:...................... 65
My IP:........................ 10.10.50.100
Expedition IP:................ 10.10.50.100:3306
Time Zone:.................... Europe/Helsinki
dbUser (dbPassword):.......... root (************)
projectName:.................. demo
Parameters for Data Sources
App Categories (source):........ (Expedition)
CSV Files Path:.................007254000047808:8.0.3:/var/backup/fw1_traffic_2018_08_17_last_calendar_day.csv
Parquet output path:.......... file:///var/expedition/connections.parquet
Temporary folder:............. /var/expedition
---- AppID DB LOAD:
Application Categories loading...
DONE

Logs of format 7.1.x NOT found
Logs of format 8.0.2 found
Logs of format 8.1.0-beta17 NOT found
Logs of format 8.1.0 NOT found
Size of trafficExtended: 50 MB
[Stage 44:> (0 + 3) / 3]Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 44.0 failed 1 times, most recent failure: Lost task 2.0 in stage 44.0 (TID 936, localhost, executor driver): org.apache.spark.SparkException: Failed to execute user defined function(anonfun$18: (string) => bigint)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NumberFormatException: For input string: "2001:470:ba7e:20::254"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
at com.paloaltonetworks.tbd.LogCollectorCompacter$.com$paloaltonetworks$tbd$LogCollectorCompacter$$IPv4ToLong$1(LogCollectorCompacter.scala:275)
at com.paloaltonetworks.tbd.LogCollectorCompacter$$anonfun$18.apply(LogCollectorCompacter.scala:886)
at com.paloaltonetworks.tbd.LogCollectorCompacter$$anonfun$18.apply(LogCollectorCompacter.scala:886)
... 13 more

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1965)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
at com.paloaltonetworks.tbd.LogCollectorCompacter$.main(LogCollectorCompacter.scala:1039)
at com.paloaltonetworks.tbd.LogCollectorCompacter.main(LogCollectorCompacter.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Failed to execute user defined function(anonfun$18: (string) => bigint)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NumberFormatException: For input string: "2001:470:ba7e:20::254"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
at com.paloaltonetworks.tbd.LogCollectorCompacter$.com$paloaltonetworks$tbd$LogCollectorCompacter$$IPv4ToLong$1(LogCollectorCompacter.scala:275)
at com.paloaltonetworks.tbd.LogCollectorCompacter$$anonfun$18.apply(LogCollectorCompacter.scala:886)
at com.paloaltonetworks.tbd.LogCollectorCompacter$$anonfun$18.apply(LogCollectorCompacter.scala:886)
... 13 more
/var/backup/fw1_traffic_2018_08_17_last_calendar_day.csv

L7 Applicator

Re: ML Spark Exception for processing logs with IPv6 Addresses

Hi,

 

thanks we are working to fix it. By now ipv6 analisys will not be supported yet. But we will prevent spark to quit when an ipv6 address is found.

L0 Member

Re: ML Spark Exception for processing logs with IPv6 Addresses

Are we still working on being able to ingest IPv6 addressing in logs, or are you still working on having Spark not quit thus aborting the whole process via stacktrace?  It has been 3 weeks and I have not seen an update here.  I am running Expedition-Beta versoin 1.0.104.

 

Thanks in advance,

Like what you see?

Show your appreciation!

Click Like if a post is helpful to you or if you just want to show your support.

Click Accept as Solution to acknowledge that the answer to your question has been provided.

The button appears next to the replies on topics you’ve started. The member who gave the solution and all future visitors to this topic will appreciate it!

These simple actions take just seconds of your time, but go a long way in showing appreciation for community members and the Live Community as a whole!

The Live Community thanks you for your participation!