How to install parquet-tools on Ubuntu 18.04 LTS without building from source

I have seen:

and some material about installing thrift. I really don't want to build thrift and parquet-mr from source; all I want is parquet-tools.

I am on:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.5 LTS
Release:        18.04
Codename:       bionic
$

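Things I have tried:

  • Downloaded the source from the Apache GitHub

  • Tried building from source following the instructions here and here. I ran into many different errors.

  • Built from master and from some release tags, e.g. 1.11.x. Got various errors such as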

    org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on project parquet-generator: Error rendering velocity resource.
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
        ...
    Caused by: org.apache.maven.plugin.MojoExecutionException: Error rendering velocity resource.
        at org.apache.maven.plugin.resources.remote.ProcessRemoteResourcesMojo.processResourceBundles (ProcessRemoteResourcesMojo.java:1246)
        ...
    Caused by: java.lang.NullPointerException
        at java.util.Objects.requireNonNull (Objects.java:203)
        ...
    
  • Installed thrift using sudo apt-get install thrift-compiler (this installs 0.9.x, but then building parquet-mr fails with an error):

    [DEBUG]   (f) arguments = [-c, thrift -version | fgrep 'Thrift version 0.12.0' && exit 0;
                          echo "=================================================================================";
                          echo "========== [FATAL] Build is configured to require Thrift version 0.12.0 ==========";
                          echo -n "========== Currently installed: ";
                          thrift -version;
                          echo "=================================================================================";
                          exit 1]
    
  • Tried building thrift from source; I got some errors:

    checking whether we are cross compiling... configure: error: in `/home/kash/vm_share/thrift-0.13.0':
    configure: error: cannot run C compiled programs.
    
  • Tried looking for a prebuilt thrift 0.12.0/0.13.0 but couldn't find one. It seems only 0.9.0 is available for bionic.
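The version gate in the build output above boils down to a literal grep over the output of thrift -version. A minimal sketch of the same check, which may help confirm what is installed before starting a long build; the function name thrift_version_matches is hypothetical, and the expected string format is taken from the build log:

```shell
#!/usr/bin/env bash
# Sketch: succeed only if a `thrift -version` output reports the wanted version.
# thrift_version_matches is a hypothetical helper name, not part of thrift or parquet-mr.
thrift_version_matches() {
  local output="$1" wanted="$2"
  # -F matches the string literally, like the fgrep in the build's own check.
  printf '%s\n' "$output" | grep -qF "Thrift version ${wanted}"
}

# Only query thrift if it is actually installed.
if command -v thrift >/dev/null 2>&1; then
  if thrift_version_matches "$(thrift -version)" "0.12.0"; then
    echo "thrift matches the version the build expects"
  else
    echo "thrift version mismatch: $(thrift -version)" >&2
  fi
fi
```

With the 0.9.x package from apt this would report a mismatch, which is consistent with the FATAL message in the log.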

Please! I just want to view the metadata of parquet files on the command line.

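Answer 1

So I eventually managed to compile it from source.

Summary

  1. Compile thrift with --host=x86_64
  2. Use the apache-parquet-1.11.11 tag on the parquet-mr repo instead of master
  3. Update the thrift dependency version from 0.12.0 to 0.13.0 in parquet-mr/pom.xml and add the Maven Central repo (codehaus is defunct):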
+    <repository>
+      <id>mvnrepository</id>
+      <url>https://repo1.maven.org/maven2/</url>
+    </repository>
...
-    <thrift.version>0.12.0</thrift.version>
+    <thrift.version>0.13.0</thrift.version>
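The version half of that diff can also be applied non-interactively. A minimal sketch, assuming GNU sed (the default on Ubuntu) and that the property appears in pom.xml exactly as shown in the diff; bump_thrift_version is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: bump <thrift.version> in a pom.xml in place.
# bump_thrift_version is a hypothetical helper, not part of parquet-mr.
bump_thrift_version() {
  local pom="$1" old="$2" new="$3"
  local old_re
  # Escape the dots so "0.12.0" is matched literally rather than as a regex.
  old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
  # GNU sed supports -i for in-place editing.
  sed -i "s|<thrift\.version>${old_re}</thrift\.version>|<thrift.version>${new}</thrift.version>|" "$pom"
}

# Example, run from the parent directory of the parquet-mr checkout:
# bump_thrift_version parquet-mr/pom.xml 0.12.0 0.13.0
```

This only covers the version property; the repository entry for Maven Central still has to be added to the pom by hand.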

# install dependencies as described here: https://thrift.apache.org/docs/install/debian.html

# install thrift from source
wget -nv http://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz
tar xzf thrift-0.13.0.tar.gz
cd thrift-0.13.0
chmod +x ./configure
./configure --host=x86_64 --disable-libs
sudo make install

# build parquet-tools from source
git clone https://github.com/Parquet/parquet-mr.git
cd parquet-mr
git checkout apache-parquet-1.11.11

# build only parquet-tools and its dependencies
# had to skip tests because one failed
mvn package -pl parquet-tools -am -Plocal -Dmaven.test.skip=true

# Use
java -jar parquet-tools/target/parquet-tools-*.jar --help

# Or if you're lazy like me:
alias parquet-tools="java -jar $(realpath ./parquet-tools/target/parquet-tools-*.jar)"

parquet-tools -h
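Since the original goal was to view file metadata, the meta subcommand of parquet-tools is the relevant one. A minimal sketch of a wrapper around it, assuming a parquet-tools executable is on PATH; the function name show_parquet_meta and the file name data.parquet are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: print the metadata of one parquet file via `parquet-tools meta`.
# show_parquet_meta is a hypothetical helper name.
show_parquet_meta() {
  local file="$1"
  # Fail early with a clear message if the file is missing.
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 2
  fi
  # Assumes a real parquet-tools executable on PATH.
  if ! command -v parquet-tools >/dev/null 2>&1; then
    echo "parquet-tools not found on PATH" >&2
    return 127
  fi
  parquet-tools meta "$file"
}

# Example (data.parquet is a placeholder):
# show_parquet_meta data.parquet
```

An alias like the one above is only expanded in interactive shells, so inside a script it may be simpler to call java -jar on the built jar directly or to put a small wrapper script on PATH.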

Answer 2

If you're interested, you can do it with Homebrew:

brew install parquet-tools

It worked for me (on 20.04 LTS), but it did take a while and pulled in a lot of dependencies.
