Reading a CSV file from Azure Blob storage with PySpark

Views: 226 | Published: 2020/9/17 22:14:45 | Tags: azure apache-spark pyspark azure-storage azure-hdinsight

Problem description

I'm trying to do a machine learning project using a PySpark HDInsight cluster on Microsoft Azure. To work on the cluster I use a Jupyter notebook. My data (a CSV file) is stored in Azure Blob storage.

According to the documentation, the syntax of the path to my file is:

path = 'wasb[s]://springboard@6zpbt6muaorgs.blob.core.windows.net/movies_plus_genre_info_2.csv'

However, when I try to read the CSV file with the following command:

csvFile = spark.read.csv(path, header=True, inferSchema=True)

I get the following error:

'java.net.URISyntaxException: Illegal character in scheme name at index 4: wasb[s]://springboard@6zpbt6muaorgs.blob.core.windows.net/movies_plus_genre_info_2.csv'

[Screenshot: the error as it appears in the notebook]

Any ideas on how to fix this?
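One likely reading of the error: the `[s]` in the documented syntax is notation for an optional `s`, so the scheme is meant to be typed as either `wasb` or `wasbs`, not literally `wasb[s]`. The exception points at index 4, which is the `[` character, and square brackets are not allowed in a URI scheme name. The sketch below illustrates this in plain Python, without Spark. The helpers `wasb_path` and `scheme_is_valid` are hypothetical names introduced here for illustration and are not part of PySpark or any Azure library. Whether `wasb` or `wasbs` is the right choice depends on how the storage account is configured, which the question does not state.

```python
import re

# RFC 3986: a URI scheme is a letter followed by letters, digits, "+", "-" or "."
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*$")


def wasb_path(container, account, blob, secure=True):
    """Build a wasb:// or wasbs:// path (hypothetical helper, for illustration only)."""
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{blob}"


def scheme_is_valid(uri):
    """Check only the scheme part (the text before '://') against the RFC 3986 rule."""
    scheme = uri.split("://", 1)[0]
    return bool(SCHEME_RE.match(scheme))


# The path exactly as written in the question: "[" sits at index 4 of the string.
bad_path = "wasb[s]://springboard@6zpbt6muaorgs.blob.core.windows.net/movies_plus_genre_info_2.csv"

# The same container, account, and file name with a literal scheme.
path = wasb_path("springboard", "6zpbt6muaorgs", "movies_plus_genre_info_2.csv")

print(scheme_is_valid(bad_path))  # False
print(scheme_is_valid(path))      # True

# On the cluster, the read call from the question would then be (assumes an
# existing `spark` session and that the cluster can access the storage account):
# csvFile = spark.read.csv(path, header=True, inferSchema=True)
```

Passing `secure=False` produces the `wasb://` form instead, for accounts that are accessed without TLS.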
