[Bug] Spark Loader insertSuccessCnt statistics incorrect #502

Open
1 task done
liuxiaocs7 opened this issue Aug 3, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@liuxiaocs7
Member

liuxiaocs7 commented Aug 3, 2023

Bug Type

others (please comment below)

Spark Loader completes successfully, but insertSuccessCnt is wrong: it is always 0.

Before submit

  • I had searched in the issues and found no similar issues.

Environment

  • Server Version: v1.0.0
  • Toolchain Version: master
  • Data Size: xx vertices, xx edges

Expected & Actual behavior

23/08/04 00:11:49 INFO HugeGraphSparkLoader:
 Finished  load example/spark/vertex_software.json  data
23/08/04 00:11:49 INFO DAGScheduler: Job 5 finished: foreachPartition at HugeGraphSparkLoader.java:154, took 1.023988 s
23/08/04 00:11:49 INFO DAGScheduler: ResultStage 3 (foreachPartition at HugeGraphSparkLoader.java:154) finished in 1.021 s
23/08/04 00:11:49 INFO HugeGraphSparkLoader:
 Finished  load example/spark/edge_knows.json  data
23/08/04 00:11:49 INFO DAGScheduler: Job 4 is finished. Cancelling potential speculative or zombie tasks for this job
23/08/04 00:11:49 INFO TaskSchedulerImpl: Killing all running tasks in stage 3: Stage finished
23/08/04 00:11:49 INFO DAGScheduler: Job 4 finished: foreachPartition at HugeGraphSparkLoader.java:154, took 1.025300 s
23/08/04 00:11:49 INFO HugeGraphSparkLoader:
 Finished  load example/spark/vertex_person.json  data
23/08/04 00:11:49 INFO HugeGraphSparkLoader:
 ------------The data load task is complete-------------------

 insertSuccessCnt:       0
 ---------------------------------------------

23/08/04 00:11:49 INFO SparkUI: Stopped Spark web UI at http://192.168.34.164:4040
23/08/04 00:11:49 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/08/04 00:11:49 INFO MemoryStore: MemoryStore cleared
23/08/04 00:11:49 INFO BlockManager: BlockManager stopped
23/08/04 00:11:49 INFO BlockManagerMaster: BlockManagerMaster stopped
23/08/04 00:11:49 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/08/04 00:11:49 INFO SparkContext: Successfully stopped SparkContext
23/08/04 00:11:49 INFO SparkContext: SparkContext already stopped.
23/08/04 00:11:49 INFO SparkContext: SparkContext already stopped.
23/08/04 00:12:49 INFO ShutdownHookManager: Shutdown hook called
23/08/04 00:12:49 INFO ShutdownHookManager: Deleting directory /tmp/spark-41fba547-151e-4a5a-8982-fa9df80978db
23/08/04 00:12:49 INFO ShutdownHookManager: Deleting directory /tmp/spark-8c9b8b45-a9cf-4876-a719-b7446f4e46cd
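
One note on how a cross-executor success counter is usually kept in Spark: a count incremented inside foreachPartition runs on the executors, so it is only visible on the driver if it goes through an accumulator; a plain driver-side field updated in the closure still reads 0 when the summary is printed. The snippet below is a minimal, self-contained sketch of that accumulator pattern, not the HugeGraphSparkLoader implementation; the class and variable names are made up for illustration.

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.util.LongAccumulator;

    public class InsertCountSketch {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("insert-count-sketch").setMaster("local[*]");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                // Register the counter on the driver; executor-side increments are merged back here.
                LongAccumulator insertSuccessCnt = sc.sc().longAccumulator("insertSuccessCnt");

                sc.parallelize(Arrays.asList("v1", "v2", "v3"), 2)
                  .foreachPartition(rows -> {
                      while (rows.hasNext()) {
                          rows.next();              // pretend this row was inserted successfully
                          insertSuccessCnt.add(1L); // a plain long field updated here would stay 0 on the driver
                      }
                  });

                // Prints 3; a counter that never leaves the executors would still show 0 here.
                System.out.println("insertSuccessCnt: " + insertSuccessCnt.value());
            }
        }
    }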

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

// Define schema
    schema.propertyKey("name").asText().ifNotExist().create();
    schema.propertyKey("age").asInt().ifNotExist().create();
    schema.propertyKey("city").asText().ifNotExist().create();
    schema.propertyKey("weight").asDouble().ifNotExist().create();
    schema.propertyKey("lang").asText().ifNotExist().create();
    schema.propertyKey("date").asText().ifNotExist().create();
    schema.propertyKey("price").asDouble().ifNotExist().create();

    schema.vertexLabel("person")
            .properties("name", "age", "city")
            .useCustomizeStringId()
            .nullableKeys("age", "city")
            .ifNotExist()
            .create();

    schema.vertexLabel("software")
            .properties("name", "lang", "price")
            .useCustomizeStringId()
            .ifNotExist()
            .create();

    schema.edgeLabel("knows")
            .sourceLabel("person")
            .targetLabel("person")
            .properties("date", "weight")
            .ifNotExist()
            .create();

    schema.edgeLabel("created")
            .sourceLabel("person")
            .targetLabel("software")
            .properties("date", "weight")
            .ifNotExist()
            .create();

{
  "vertices": [
    {
      "label": "person",
      "input": {
        "type": "file",
        "path": "example/spark/vertex_person.json",
        "format": "JSON",
        "header": ["name", "age", "city"],
        "charset": "UTF-8",
        "skipped_line": {
          "regex": "(^#|^//).*"
        }
      },
      "id": "name",
      "null_values": ["NULL", "null", ""]
    },
    {
      "label": "software",
      "input": {
        "type": "file",
        "path": "example/spark/vertex_software.json",
        "format": "JSON",
        "header": ["id","name", "lang", "price","ISBN"],
        "charset": "GBK"
      },
      "id": "name",
      "ignored": ["ISBN"]
    }
  ],
  "edges": [
    {
      "label": "knows",
      "source": ["source_name"],
      "target": ["target_name"],
      "input": {
        "type": "file",
        "path": "example/spark/edge_knows.json",
        "format": "JSON",
        "date_format": "yyyyMMdd",
        "header": ["source_name","target_name", "date", "weight"]
      },
      "field_mapping": {
        "source_name": "name",
        "target_name": "name"
      }
    }
  ]
}
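
To tell whether only the reported counter is off (rather than the load itself failing), the actual vertex/edge counts can be checked directly against the server. Below is a minimal sketch using hugegraph-client and a Gremlin count; the endpoint http://localhost:8080, the graph name hugegraph, and the org.apache.hugegraph package names are assumptions (older client releases use the com.baidu.hugegraph packages), so adjust them to the actual deployment.

    import org.apache.hugegraph.driver.GremlinManager;
    import org.apache.hugegraph.driver.HugeClient;
    import org.apache.hugegraph.structure.gremlin.ResultSet;

    public class LoadResultCheck {
        public static void main(String[] args) {
            // Assumed endpoint and graph name; adjust to the actual deployment.
            HugeClient client = HugeClient.builder("http://localhost:8080", "hugegraph").build();
            GremlinManager gremlin = client.gremlin();

            // Count what actually landed in the graph after the Spark load.
            ResultSet vertices = gremlin.gremlin("g.V().count()").execute();
            ResultSet edges = gremlin.gremlin("g.E().count()").execute();
            System.out.println("vertices: " + vertices.iterator().next().getObject());
            System.out.println("edges: " + edges.iterator().next().getObject());

            client.close();
        }
    }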
@liuxiaocs7 liuxiaocs7 added the bug Something isn't working label Aug 3, 2023
@liuxiaocs7 liuxiaocs7 changed the title [Bug] Spark Loader [Bug] Spark Loader insertSuccessCnt statistics incorrect Aug 3, 2023
@haohao0103
Contributor

@liuxiaocs7 Hi bro, what is your command to submit the task? Is --sink-type specified? My guess is that insertSuccessCnt only works with bypass load data...
