I get the following error when trying to process a CSV file:
Exception:
Caused by: java.lang.NumberFormatException: For input string: "2001:470:ba7e:20::254" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
Full trace:
(/opt/Spark/spark/bin/spark-submit --class com.paloaltonetworks.tbd.LogCollectorCompacter --deploy-mode client --supervise /var/www/html/OS/spark/packages/LogCoCo-1.2.4-SNAPSHOT.jar MLServer='10.10.50.100', master='local[3]', debug='false', taskID='65', user='admin', dbUser='root', dbPass='paloalto', dbServer='10.10.50.100:3306', timeZone='Europe/Helsinki', mode='Expedition', input='007254000047808:8.0.3:/var/backup/fw1_traffic_2018_08_17_last_calendar_day.csv', output='/var/expedition/connections.parquet', tempFolder='/var/expedition'; echo /var/backup/fw1_traffic_2018_08_17_last_calendar_day.csv; )>> "/tmp/error_logCoCo" 2>>/tmp/error_logCoCo &
---- CREATING SPARK Session:
warehouseLocation:/PALogs/spark-warehouse
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/Spark/extraLibraries/slf4j-nop-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/Spark/spark-2.1.1-bin-hadoop2.7/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.helpers.NOPLoggerFactory]
+--------------------+---------------+--------+--------------------+
|             rowLine|       fwSerial|panosver|             csvpath|
+--------------------+---------------+--------+--------------------+
|007254000047808:8...|007254000047808|   8.0.3|/var/backup/fw1_t...|
+--------------------+---------------+--------+--------------------+
8.0.0:/var/backup/fw1_traffic_2018_08_17_last_calendar_day.csv
LogCollector&Compacter called with the following parameters:
Parameters for execution
Master[processes]:............ local[3]
User:......................... admin
debug:........................ false
Parameters for Job Connections
Task ID:...................... 65
My IP:........................ 10.10.50.100
Expedition IP:................ 10.10.50.100:3306
Time Zone:.................... Europe/Helsinki
dbUser (dbPassword):.......... root (************)
projectName:.................. demo
Parameters for Data Sources
App Categories (source):........ (Expedition)
CSV Files Path:.................007254000047808:8.0.3:/var/backup/fw1_traffic_2018_08_17_last_calendar_day.csv
Parquet output path:.......... file:///var/expedition/connections.parquet
Temporary folder:............. /var/expedition
---- AppID DB LOAD:
Application Categories loading... DONE
Logs of format 7.1.x NOT found
Logs of format 8.0.2 found
Logs of format 8.1.0-beta17 NOT found
Logs of format 8.1.0 NOT found
Size of trafficExtended: 50 MB
[Stage 44:> (0 + 3) / 3]
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 44.0 failed 1 times, most recent failure: Lost task 2.0 in stage 44.0 (TID 936, localhost, executor driver): org.apache.spark.SparkException: Failed to execute user defined function(anonfun$18: (string) => bigint)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NumberFormatException: For input string: "2001:470:ba7e:20::254"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
    at com.paloaltonetworks.tbd.LogCollectorCompacter$.com$paloaltonetworks$tbd$LogCollectorCompacter$$IPv4ToLong$1(LogCollectorCompacter.scala:275)
    at com.paloaltonetworks.tbd.LogCollectorCompacter$$anonfun$18.apply(LogCollectorCompacter.scala:886)
    at com.paloaltonetworks.tbd.LogCollectorCompacter$$anonfun$18.apply(LogCollectorCompacter.scala:886)
    ... 13 more
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1965)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
    at com.paloaltonetworks.tbd.LogCollectorCompacter$.main(LogCollectorCompacter.scala:1039)
    at com.paloaltonetworks.tbd.LogCollectorCompacter.main(LogCollectorCompacter.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Failed to execute user defined function(anonfun$18: (string) => bigint)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NumberFormatException: For input string: "2001:470:ba7e:20::254"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
    at com.paloaltonetworks.tbd.LogCollectorCompacter$.com$paloaltonetworks$tbd$LogCollectorCompacter$$IPv4ToLong$1(LogCollectorCompacter.scala:275)
    at com.paloaltonetworks.tbd.LogCollectorCompacter$$anonfun$18.apply(LogCollectorCompacter.scala:886)
    at com.paloaltonetworks.tbd.LogCollectorCompacter$$anonfun$18.apply(LogCollectorCompacter.scala:886)
    ... 13 more
/var/backup/fw1_traffic_2018_08_17_last_calendar_day.csv
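From the trace, the failure happens inside a function named IPv4ToLong, which ends up calling Integer.parseInt on the string "2001:470:ba7e:20::254", an IPv6 address. The source of LogCollectorCompacter is not shown here, so the following is only a minimal sketch of how such a failure can arise, assuming the helper splits the address on dots and parses each piece as an integer. The class and method names (`Ipv4ToLongSketch`, `ipv4ToLong`, `ipv4ToLongOrNull`) are hypothetical and are not the tool's actual code.

```java
// Hypothetical illustration only; not the actual LogCollectorCompacter code.
class Ipv4ToLongSketch {

    // Assumed behaviour: split a dotted IPv4 string and fold the octets into a long.
    // An IPv6 string contains no dots, so the whole string reaches Integer.parseInt
    // and a NumberFormatException is thrown.
    static long ipv4ToLong(String ip) {
        long result = 0;
        for (String part : ip.split("\\.")) {
            result = result * 256 + Integer.parseInt(part);
        }
        return result;
    }

    // One possible guard: return null for anything that is not a plain IPv4 address
    // (for example, IPv6 addresses containing ':') instead of failing the whole job.
    static Long ipv4ToLongOrNull(String ip) {
        if (ip == null || ip.indexOf(':') >= 0) {
            return null;
        }
        try {
            return ipv4ToLong(ip);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(ipv4ToLong("10.10.50.100"));
        System.out.println(ipv4ToLongOrNull("2001:470:ba7e:20::254"));
        try {
            ipv4ToLong("2001:470:ba7e:20::254");
        } catch (NumberFormatException e) {
            // Same message as in the trace above
            System.out.println(e.getMessage());
        }
    }
}
```

If this assumption is right, any traffic log row with an IPv6 source or destination address would trigger the same exception, so the CSV itself may be fine and the IPv6 rows are simply not handled by this version of the tool.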