Flink-Hadoop Compilation & Simple Test

Table of Contents

  1. Preparation

  2. Downloading the Source Package

  3. Pre-Build Steps

    1. Configuring CDH Dependency Support
    2. Compiling flink-shaded
    3. Removing Flink Test Modules
    4. Configuring the maven-assembly-plugin
    5. Adding Chinese Mirrors for node/npm Dependencies
    6. Configuring Maven Repositories for the Kafka Schema Registry
  4. Running the Flink Build

  5. Extracting the Compiled Package

  6. Single-Node Deployment Test

Preparation

  • Maven 3
  • JDK 1.8
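
Both are assumed to be on the PATH; a quick sanity check before starting:

mvn -version    # should report Maven 3.x
java -version   # should report "1.8.0_..."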

Downloading the Source Package

Flink website: https://flink.apache.org/

Download: Apache Flink 1.10.0 Source Release
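
For example, fetching and unpacking the source from the Apache archive (standard release layout):

wget https://archive.apache.org/dist/flink/flink-1.10.0/flink-1.10.0-src.tgz
tar -zxvf flink-1.10.0-src.tgz
cd flink-1.10.0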

Pre-Build Steps

Configuring CDH Dependency Support

Maven cannot resolve CDH dependencies out of the box. Edit conf/settings.xml in the Maven installation directory as follows (cloudera-releases here is the repository id configured in the Flink source):

<mirrors>
  <mirror>
    <id>nexus-aliyun</id>
    <mirrorOf>*,!cloudera-releases,!cloudera</mirrorOf>
    <name>Nexus aliyun</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public</url>
  </mirror>

  <!-- hortonworks maven -->
  <mirror>
    <id>nexus-hortonworks</id>
    <mirrorOf>*,!central</mirrorOf>
    <name>Nexus hortonworks</name>
    <url>https://repo.hortonworks.com/content/groups/public/</url>
  </mirror>

  <mirror>
    <id>confluent</id>
    <mirrorOf>confluent</mirrorOf>
    <name>Nexus public mirror</name>
    <url>http://packages.confluent.io/maven/</url>
  </mirror>
</mirrors>

Note: the hortonworks repository is sometimes unreliable; if it gives trouble, just comment it out.
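
To confirm that the mirrors were actually picked up, Maven's help plugin can print the merged settings:

mvn help:effective-settings | grep -A 4 '<mirror>'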

Edit flink-1.10.0/pom.xml and add:

<!-- add the CDH repository -->
<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
  </repository>
</repositories>
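
Before kicking off the long build, it is worth checking that the Cloudera repository is reachable. A minimal sketch using the standard dependency:get goal (the hadoop-common artifact is just a convenient probe):

mvn dependency:get \
    -Dartifact=org.apache.hadoop:hadoop-common:2.6.0-cdh5.16.2 \
    -DremoteRepositories=https://repository.cloudera.com/artifactory/cloudera-repos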

Compiling flink-shaded

Each Flink release depends on a matching flink-shaded version; Flink 1.10.0 uses flink-shaded 9.0. If it is not compiled, the build fails because flink-shaded-hadoop-2:jar:2.6.0-cdh5.16.2-9.0 cannot be found, with an error like the following (this sample is from an analogous Flink 1.9.1 / flink-shaded 7.0 build; the failure mode is identical):

[ERROR] Failed to execute goal on project flink-hadoop-fs: Could not resolve dependencies for project org.apache.flink:flink-hadoop-fs:jar:1.9.1: The following artifacts could not be resolved: org.apache.flink:flink-shaded-hadoop-2:jar:2.6.0-cdh5.14.2-7.0, org.apache.hadoop:hadoop-hdfs:jar:tests:2.6.0-cdh5.14.2, org.apache.hadoop:hadoop-common:jar:tests:2.6.0-cdh5.14.2: Could not find artifact org.apache.flink:flink-shaded-hadoop-2:jar:2.6.0-cdh5.14.2-7.0 in nexus-hortonworks (https://repo.hortonworks.com/content/groups/public/) -> [Help 1]

Therefore, flink-shaded-hadoop-2 must be compiled manually and installed into the local Maven repository.

  1. Download flink-shaded-9.0-src.tgz

  2. Modify the project pom.xml files

    Add the cloudera Maven repository to flink-shaded-9.0/pom.xml:

    <!-- add the CDH repository -->
    <repositories>
      <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
      </repository>
    </repositories>

    Also add it to flink-shaded-9.0/flink-shaded-hadoop-2/pom.xml:

    <!-- add the CDH repository -->
    <repositories>
      <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
      </repository>
    </repositories>

    In flink-shaded-9.0/flink-shaded-hadoop-2-uber/pom.xml, add the following dependency inside the dependencyManagement tag:

    <dependency>
      <groupId>commons-cli</groupId>
      <artifactId>commons-cli</artifactId>
      <version>1.3.1</version>
    </dependency>

    Note: this step must not be skipped; otherwise the build succeeds but Flink will not start, and the .out file throws the following error:

    Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.cli.Option.builder(Ljava/lang/String;)Lorg/apache/commons/cli/Option$Builder;

    The reason: after packaging, the bundled commons-cli is version 1.2, in which the Option.builder method does not exist (it was added in commons-cli 1.3).

  3. Start the build:

    mvn clean install -DskipTests -Dhadoop.version=2.6.0-cdh5.16.2
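
After the install finishes, the shaded artifacts should sit in the local repository, and dependency:tree can confirm which commons-cli version the uber module now resolves (paths assume the default ~/.m2 location):

ls ~/.m2/repository/org/apache/flink/flink-shaded-hadoop-2/2.6.0-cdh5.16.2-9.0/
ls ~/.m2/repository/org/apache/flink/flink-shaded-hadoop-2-uber/2.6.0-cdh5.16.2-9.0/

# run from the flink-shaded source root; should now show commons-cli 1.3.1
mvn -pl flink-shaded-hadoop-2-uber dependency:tree \
    -Dincludes=commons-cli:commons-cli -Dhadoop.version=2.6.0-cdh5.16.2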

Removing Flink Test Modules

Remove the following test modules from the Flink root pom.xml to avoid build failures (a sed sketch follows the list):

<module>flink-tests</module>
<module>flink-end-to-end-tests</module>
<module>flink-yarn-tests</module>
<module>flink-fs-tests</module>
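
A minimal sketch of the removal with sed (assumes GNU sed and that each <module> entry sits on its own line, as in the stock pom.xml; keep a backup or rely on git to revert):

cd flink-1.10.0
sed -i \
    -e '/<module>flink-tests<\/module>/d' \
    -e '/<module>flink-end-to-end-tests<\/module>/d' \
    -e '/<module>flink-yarn-tests<\/module>/d' \
    -e '/<module>flink-fs-tests<\/module>/d' \
    pom.xml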

Configuring the maven-assembly-plugin

Edit flink-1.10.0/flink-libraries/pom.xml and add the maven-assembly-plugin; otherwise the build fails.

<build>
  <plugins>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
          <manifest>
            <mainClass></mainClass>
          </manifest>
        </archive>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

Adding Chinese Mirrors for node/npm Dependencies

Add domestic (Chinese) mirrors for flink-runtime-web by editing flink-1.10.0/flink-runtime-web/pom.xml.
Since Flink 1.9.x, the flink-runtime-web module has pulled in the frontend-maven-plugin, which installs node and a set of frontend dependencies; configure mirror download roots, or the downloads will fail:

<nodeDownloadRoot>https://registry.npm.taobao.org/dist/</nodeDownloadRoot>
<npmDownloadRoot>https://registry.npmjs.org/npm/-/</npmDownloadRoot>

Note that the frontend-maven-plugin entry already exists; we are only modifying it. The full configuration after the change:

<plugin>
  <groupId>com.github.eirslett</groupId>
  <artifactId>frontend-maven-plugin</artifactId>
  <version>1.6</version>
  <executions>
    <execution>
      <id>install node and npm</id>
      <goals>
        <goal>install-node-and-npm</goal>
      </goals>
      <configuration>
        <nodeDownloadRoot>https://registry.npm.taobao.org/dist/</nodeDownloadRoot>
        <npmDownloadRoot>https://registry.npmjs.org/npm/-/</npmDownloadRoot>
        <nodeVersion>v10.9.0</nodeVersion>
      </configuration>
    </execution>
    <execution>
      <id>npm install</id>
      <goals>
        <goal>npm</goal>
      </goals>
      <configuration>
        <arguments>ci --cache-max=0 --no-save --no-bin-links</arguments>
        <environmentVariables>
          <HUSKY_SKIP_INSTALL>true</HUSKY_SKIP_INSTALL>
        </environmentVariables>
      </configuration>
    </execution>
    <execution>
      <id>npm run build</id>
      <goals>
        <goal>npm</goal>
      </goals>
      <configuration>
        <arguments>run build</arguments>
      </configuration>
    </execution>
  </executions>
  <configuration>
    <workingDirectory>web-dashboard</workingDirectory>
  </configuration>
</plugin>

Otherwise the build fails with:

[ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.6:npm (npm install) on project flink-runtime-web_2.11: Failed to run task: 'npm ci --cache-max=0 --no-save' failed. org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1) -> [Help 1]
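
When this error shows up, it is worth first checking whether the configured download root is reachable at all. On 64-bit Linux the plugin should assemble the node URL as root + v<version>/node-v<version>-linux-x64.tar.gz, so a plain curl probe looks like this (adjust for your platform):

curl -sI https://registry.npm.taobao.org/dist/v10.9.0/node-v10.9.0-linux-x64.tar.gz | head -n 1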

Configuring Maven Repositories for the Kafka Schema Registry

The required jars cannot be downloaded from the central Maven repository, so add the following to Maven's settings file (already configured at the beginning of this article):

<mirrors>
  <mirror>
    <id>confluent</id>
    <mirrorOf>confluent</mirrorOf>
    <name>Nexus public mirror</name>
    <url>http://packages.confluent.io/maven/</url>
  </mirror>
</mirrors>

Also add the following repository configuration to the Flink project's root pom:

<repository>
  <id>confluent</id>
  <url>http://packages.confluent.io/maven/</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
</repository>
<repository>
  <id>ICM</id>
  <url>http://maven.icm.edu.pl/artifactory/repo/</url>
</repository>

Otherwise dependencies such as io.confluent:kafka-schema-registry-client, kafka-schema-registry-parent and rest-utils-parent will not be found.
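
A cheap pre-flight check is to pull one Confluent artifact directly (the version below is only illustrative; use whichever version the build reports as missing):

mvn dependency:get \
    -Dartifact=io.confluent:kafka-schema-registry-client:3.3.1 \
    -DremoteRepositories=http://packages.confluent.io/maven/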

Running the Flink Build

mvn -T2C clean package -DskipTests -Pvendor-repos -Dhadoop.version=2.6.0-cdh5.16.2 -Dscala-2.12 -Drat.skip=true

# Parameter notes:
# -Pinclude-hadoop                    bundles the hadoop jar into lib/ (optional; not used in the command above)
# -Pvendor-repos                      activates a Maven profile that adds the repositories of popular Hadoop vendors such as Cloudera, Hortonworks and MapR
# -Dhadoop.version=2.6.0-cdh5.16.2    selects the Hadoop (CDH) version
# -Dscala-2.12                        selects the Scala version (the default is 2.11; set this if your code uses a different one)

Most problems encountered during compilation are related to dependency downloads; the fixes for the common ones are described above. For packages that still will not download, fetch them manually from a Maven mirror or the official repository and install them into the local repository.
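
Because this is a multi-module reactor build, a failure does not force a restart from scratch: Maven prints a resume hint ("mvn <args> -rf :<module>") after every failed run, and the build can continue from the failing module. A sketch (the module name below is just an example; use the one Maven prints):

mvn -T2C clean package -DskipTests -Pvendor-repos \
    -Dhadoop.version=2.6.0-cdh5.16.2 -Dscala-2.12 -Drat.skip=true \
    -rf :flink-runtime-web_2.11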

If all required dependencies have already been downloaded, the build takes roughly half an hour, depending on the machine.
A successful build looks like this:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] force-shading ...................................... SUCCESS [ 2.031 s]
[INFO] flink .............................................. SUCCESS [ 1.561 s]
[INFO] flink-annotations .................................. SUCCESS [ 4.092 s]
[INFO] flink-shaded-curator ............................... SUCCESS [ 5.122 s]
[INFO] flink-metrics ...................................... SUCCESS [ 0.663 s]
[INFO] flink-metrics-core ................................. SUCCESS [ 4.475 s]
[INFO] flink-test-utils-parent ............................ SUCCESS [ 0.676 s]
[INFO] flink-test-utils-junit ............................. SUCCESS [ 3.405 s]
[INFO] flink-core ......................................... SUCCESS [ 19.344 s]
[INFO] flink-java ......................................... SUCCESS [ 7.728 s]
[INFO] flink-queryable-state .............................. SUCCESS [ 0.193 s]
[INFO] flink-queryable-state-client-java .................. SUCCESS [ 1.361 s]
[INFO] flink-filesystems .................................. SUCCESS [ 1.046 s]
[INFO] flink-hadoop-fs .................................... SUCCESS [ 3.766 s]
[INFO] flink-runtime ...................................... SUCCESS [01:07 min]
[INFO] flink-scala ........................................ SUCCESS [01:35 min]
[INFO] flink-mapr-fs ...................................... SUCCESS [ 3.541 s]
[INFO] flink-filesystems :: flink-fs-hadoop-shaded ........ SUCCESS [ 8.875 s]
[INFO] flink-s3-fs-base ................................... SUCCESS [ 28.538 s]
[INFO] flink-s3-fs-hadoop ................................. SUCCESS [ 22.938 s]
[INFO] flink-s3-fs-presto ................................. SUCCESS [ 30.871 s]
[INFO] flink-swift-fs-hadoop .............................. SUCCESS [ 49.025 s]
[INFO] flink-oss-fs-hadoop ................................ SUCCESS [ 25.729 s]
[INFO] flink-azure-fs-hadoop .............................. SUCCESS [ 29.693 s]
[INFO] flink-optimizer .................................... SUCCESS [ 3.194 s]
[INFO] flink-clients ...................................... SUCCESS [ 1.632 s]
[INFO] flink-streaming-java ............................... SUCCESS [ 10.892 s]
[INFO] flink-test-utils ................................... SUCCESS [ 8.855 s]
[INFO] flink-runtime-web .................................. SUCCESS [05:56 min]
[INFO] flink-streaming-scala .............................. SUCCESS [01:37 min]
[INFO] flink-table ........................................ SUCCESS [ 0.818 s]
[INFO] flink-table-common ................................. SUCCESS [ 4.308 s]
[INFO] flink-table-api-java ............................... SUCCESS [ 5.168 s]
[INFO] flink-table-api-java-bridge ........................ SUCCESS [ 3.254 s]
[INFO] flink-table-api-scala .............................. SUCCESS [ 19.390 s]
[INFO] flink-table-api-scala-bridge ....................... SUCCESS [ 22.887 s]
[INFO] flink-sql-parser ................................... SUCCESS [ 13.172 s]
[INFO] flink-state-backends ............................... SUCCESS [ 0.712 s]
[INFO] flink-statebackend-rocksdb ......................... SUCCESS [ 4.609 s]
[INFO] flink-libraries .................................... SUCCESS [ 1.007 s]
[INFO] flink-cep .......................................... SUCCESS [ 11.703 s]
[INFO] flink-table-planner ................................ SUCCESS [04:36 min]
[INFO] flink-connectors ................................... SUCCESS [ 1.105 s]
[INFO] flink-orc .......................................... SUCCESS [ 10.208 s]
[INFO] flink-jdbc ......................................... SUCCESS [ 10.759 s]
[INFO] flink-hadoop-compatibility ......................... SUCCESS [ 18.036 s]
[INFO] flink-table-runtime-blink .......................... SUCCESS [ 12.010 s]
[INFO] flink-table-planner-blink .......................... SUCCESS [05:52 min]
[INFO] flink-hbase ........................................ SUCCESS [07:06 min]
[INFO] flink-hcatalog ..................................... SUCCESS [ 22.569 s]
[INFO] flink-metrics-jmx .................................. SUCCESS [ 2.176 s]
[INFO] flink-connector-kafka-base ......................... SUCCESS [ 19.286 s]
[INFO] flink-connector-kafka-0.9 .......................... SUCCESS [ 7.931 s]
[INFO] flink-connector-kafka-0.10 ......................... SUCCESS [ 3.107 s]
[INFO] flink-connector-kafka-0.11 ......................... SUCCESS [ 7.556 s]
[INFO] flink-formats ...................................... SUCCESS [ 0.361 s]
[INFO] flink-json ......................................... SUCCESS [ 8.512 s]
[INFO] flink-connector-elasticsearch-base ................. SUCCESS [ 15.884 s]
[INFO] flink-connector-elasticsearch2 ..................... SUCCESS [ 38.263 s]
[INFO] flink-connector-elasticsearch5 ..................... SUCCESS [ 40.943 s]
[INFO] flink-connector-elasticsearch6 ..................... SUCCESS [ 15.306 s]
[INFO] flink-csv .......................................... SUCCESS [ 2.397 s]
[INFO] flink-connector-hive ............................... SUCCESS [01:39 min]
[INFO] flink-connector-rabbitmq ........................... SUCCESS [ 1.635 s]
[INFO] flink-connector-twitter ............................ SUCCESS [ 6.631 s]
[INFO] flink-connector-nifi ............................... SUCCESS [ 3.832 s]
[INFO] flink-connector-cassandra .......................... SUCCESS [ 31.134 s]
[INFO] flink-avro ......................................... SUCCESS [ 15.516 s]
[INFO] flink-connector-filesystem ......................... SUCCESS [ 6.685 s]
[INFO] flink-connector-kafka .............................. SUCCESS [ 7.954 s]
[INFO] flink-connector-gcp-pubsub ......................... SUCCESS [ 4.723 s]
[INFO] flink-sql-connector-elasticsearch6 ................. SUCCESS [ 16.608 s]
[INFO] flink-sql-connector-kafka-0.9 ...................... SUCCESS [ 2.115 s]
[INFO] flink-sql-connector-kafka-0.10 ..................... SUCCESS [ 4.332 s]
[INFO] flink-sql-connector-kafka-0.11 ..................... SUCCESS [ 1.628 s]
[INFO] flink-sql-connector-kafka .......................... SUCCESS [ 7.450 s]
[INFO] flink-connector-kafka-0.8 .......................... SUCCESS [ 4.267 s]
[INFO] flink-avro-confluent-registry ...................... SUCCESS [ 11.695 s]
[INFO] flink-parquet ...................................... SUCCESS [ 8.175 s]
[INFO] flink-sequence-file ................................ SUCCESS [ 1.294 s]
[INFO] flink-examples ..................................... SUCCESS [ 0.457 s]
[INFO] flink-examples-batch ............................... SUCCESS [ 48.271 s]
[INFO] flink-examples-streaming ........................... SUCCESS [ 32.402 s]
[INFO] flink-examples-table ............................... SUCCESS [ 31.118 s]
[INFO] flink-examples-build-helper ........................ SUCCESS [ 0.274 s]
[INFO] flink-examples-streaming-twitter ................... SUCCESS [ 1.147 s]
[INFO] flink-examples-streaming-state-machine ............. SUCCESS [ 0.807 s]
[INFO] flink-examples-streaming-gcp-pubsub ................ SUCCESS [ 6.252 s]
[INFO] flink-container .................................... SUCCESS [ 1.893 s]
[INFO] flink-queryable-state-runtime ...................... SUCCESS [ 3.343 s]
[INFO] flink-gelly ........................................ SUCCESS [ 10.752 s]
[INFO] flink-gelly-scala .................................. SUCCESS [01:01 min]
[INFO] flink-gelly-examples ............................... SUCCESS [ 49.843 s]
[INFO] flink-cep-scala .................................... SUCCESS [ 28.524 s]
[INFO] flink-state-processor-api .......................... SUCCESS [ 5.763 s]
[INFO] flink-table-uber ................................... SUCCESS [ 15.688 s]
[INFO] flink-table-uber-blink ............................. SUCCESS [ 3.355 s]
[INFO] flink-sql-client ................................... SUCCESS [ 23.316 s]
[INFO] flink-quickstart ................................... SUCCESS [ 2.348 s]
[INFO] flink-quickstart-java .............................. SUCCESS [ 2.894 s]
[INFO] flink-quickstart-scala ............................. SUCCESS [ 3.156 s]
[INFO] flink-contrib ...................................... SUCCESS [ 0.203 s]
[INFO] flink-connector-wikiedits .......................... SUCCESS [ 2.453 s]
[INFO] flink-mesos ........................................ SUCCESS [01:05 min]
[INFO] flink-yarn ......................................... SUCCESS [ 6.167 s]
[INFO] flink-metrics-dropwizard ........................... SUCCESS [ 0.927 s]
[INFO] flink-metrics-graphite ............................. SUCCESS [ 0.302 s]
[INFO] flink-metrics-influxdb ............................. SUCCESS [ 1.825 s]
[INFO] flink-metrics-prometheus ........................... SUCCESS [ 1.091 s]
[INFO] flink-metrics-statsd ............................... SUCCESS [ 0.853 s]
[INFO] flink-metrics-datadog .............................. SUCCESS [ 1.633 s]
[INFO] flink-metrics-slf4j ................................ SUCCESS [ 0.880 s]
[INFO] flink-python ....................................... SUCCESS [ 4.793 s]
[INFO] flink-scala-shell .................................. SUCCESS [ 43.526 s]
[INFO] flink-dist ......................................... SUCCESS [ 18.899 s]
[INFO] flink-docs ......................................... SUCCESS [ 5.133 s]
[INFO] flink-ml-parent .................................... SUCCESS [ 0.399 s]
[INFO] flink-ml-api ....................................... SUCCESS [ 4.031 s]
[INFO] flink-ml-lib ....................................... SUCCESS [ 1.550 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 18:19 min (Wall Clock)
[INFO] Finished at: 2020-01-02T23:01:28+08:00
[INFO] Final Memory: 334M/847M
[INFO] ------------------------------------------------------------------------

Extracting the Compiled Package

After a successful build, the distribution is located under flink-1.10.0/flink-dist/target/flink-1.10.0-bin; compress the directory, upload it to the server, and it is ready to configure and use.
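
A minimal sketch of the packaging and upload (the user, host and target directory are placeholders):

cd flink-1.10.0/flink-dist/target/flink-1.10.0-bin
tar -czf flink-1.10.0-cdh5.16.2.tar.gz flink-1.10.0/
scp flink-1.10.0-cdh5.16.2.tar.gz user@hadoop:/opt/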

Single-Node Deployment Test

  1. Edit flink-conf.yaml

    jobmanager.rpc.address: hadoop
    taskmanager.numberOfTaskSlots: 4
    rest.port: 18081
  2. Edit slaves

    hadoop
  3. Start the cluster (a smoke test follows this list)

    start-cluster.sh
  4. Open the web UI

    hadoop:18081
  5. Quickly generate a Flink project on Windows

    mvn archetype:generate \
    -DarchetypeGroupId=org.apache.flink \
    -DarchetypeArtifactId=flink-quickstart-scala \
    -DarchetypeVersion=${1:-1.10.0} \
    -DgroupId=com.tunan.flink \
    -DartifactId=tunan-flink \
    -Dversion=1.0.0 \
    -Dpackage=com.tunan.quickstart \
    -DinteractiveMode=false
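
Once the cluster is up, a quick smoke test is to submit one of the examples bundled with the distribution; run from the Flink home directory:

./bin/flink run examples/streaming/WordCount.jar

With no arguments, WordCount uses built-in sample data and prints its result to the TaskManager's .out file; the job should also show as FINISHED in the web UI at hadoop:18081.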

References:

Apache Flink official compilation guide

Compiling Flink 1.9.1 from source with hadoop-2.6.0-cdh5.16.2 support

Adding Kafka Schema Registry pom dependencies to Maven

Adding CDH Hadoop pom dependencies to Maven

Author: Tunan
Link: http://yerias.github.io/2020/05/10/flink/1/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stated otherwise.